Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse empty arguments in (invalid) Apply more often #14463

Merged
merged 1 commit into from Feb 17, 2022

Conversation

adampauls
Copy link
Contributor

@adampauls adampauls commented Feb 13, 2022

Attempts to improve parsing of incomplete Apply nodes. Makes two improvements:

  • foo(x,<EOF> will now parse as foo(x, <errorTermTree>) instead of foo(x)
  • foo(x = , <EOF|)> will now parse as foo(x = <errorTermTree>, <errorTermTree>) instead of foo(x = <errorTermTree>).

As a side effect, makes the parser more robust to errors in comma-separated lists by adding , to the list of "safe points" that the parser will skip to in the face of an error. That is, foo(x, , y) will parse as foo(x, <errorTermTree>, y) instead of foo(x, <errorTermTree>). I am unsure if there are more ramifications than I'm seeing for that last change. I'm also not sure how best to test the parser when there are errors.

val key = s"${relativize(srcpos.source.file.toString)}:${srcpos.line + 1}"
// If an annotated // error is placed on the last line of the file, EOF errors can show up past the last.
// Without this adjustment, it's not possible to annotate // error on errors whose position is EOF
// in some cases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I add anypos-error for positions that can't be expressed in comment.

dff31fb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Too tired to think about off-by-one just now.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did, and I'll happily use that if it's better. I think it's a little unfortunate not to be able to add a // error at the end of a file to indicate that the error should be at the end of the file though? I agree this isn't great, but I'd rather not use anypos all else equal. Happy to if this is what gets the PR through!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely. There is the endless thing on github whether the file ends in a newline, and therefore what is the true line count. This came up in a scala 2 issue where the line number mattered. As a vi user, my core belief is always terminate and there is not an empty line after that, it's just EOF. But I recognize that reasonable people may use tabs instead of spaces. I meant to say stubborn. Now I'm curious to follow your issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I reverted this change because I realized how to get the existing reporting mechanism to be happy).

@@ -881,7 +881,8 @@ object Parsers {
val next = in.lookahead.token
next == LBRACKET || next == LPAREN

/** Is current ident a `*`, and is it followed by a `)` or `, )`? */
/** Is current ident a `*`, and is it followed by a `)`, `, )`, `,EOF`? The latter two are not
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a correct use of this function, or at least does not make it worse than it was before. Without this, the special handling for EOF in the comma lookahead is unnecessary, and also leads to a better error when a comma is followed by an EOF.

@@ -285,7 +285,7 @@ object Tokens extends TokensCommon {

final val endMarkerTokens = identifierTokens | BitSet(IF, WHILE, FOR, MATCH, TRY, NEW, THROW, GIVEN, VAL, THIS)

final val skipStopTokens = BitSet(SEMI, NEWLINE, NEWLINES, RBRACE, RPAREN, RBRACKET, OUTDENT)
final val skipStopTokens = BitSet(SEMI, NEWLINE, NEWLINES, RBRACE, RPAREN, RBRACKET, OUTDENT, COMMA)
Copy link
Contributor Author

@adampauls adampauls Feb 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly controversial. This makes it so that foo(bar, <some error>, baz) still properly parses baz.

Copy link
Contributor Author

@adampauls adampauls Feb 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the downside of this is though. Only two tests were affected, and IMHO the extra errors are an improvement.

@adampauls adampauls changed the title [WIP] Parse empty arguments in (invalid) Apply more often Parse empty arguments in (invalid) Apply more often Feb 13, 2022
@som-snytt
Copy link
Contributor

There are some old tests in the repo due to the similar thing in scala 2

➜  dotty git:(test/current) find . -name "*5702*"
./tests/neg/t5702-neg-bad-and-wild.scala
./tests/pos/t5702-pos-infix-star.scala
./tests/untried/neg/t5702-neg-bad-xbrace.scala
./tests/untried/neg/t5702-neg-ugly-xbrace.scala
./tests/untried/neg/t5702-neg-ugly-xbrace.check
./tests/untried/neg/t5702-neg-bad-xbrace.check
./tests/untried/neg/t5702-neg-bad-and-wild.check
./tests/untried/neg/t5702-neg-bad-brace.scala
./tests/untried/neg/t5702-neg-bad-brace.check

@adampauls
Copy link
Contributor Author

I added the tests from t5702 that didn't involve XML. I think the errors are acceptable?

@@ -1,28 +0,0 @@
t5702-neg-bad-and-wild.scala:10: error: bad simple pattern: bad use of _* (a sequence pattern must be the last pattern)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not deleted but moved. The diff is too big for GH to realize it though.

@@ -1,17 +0,0 @@

object Test {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not deleted, just moved. Diff too big.

| an identifier expected, but ',' found
-- [E040] Syntax Error: tests/neg/t1625.scala:2:32 ---------------------------------------------------------------------
2 | def foo(x: String*, y: String*, c: String*): Int // error: an identifier expected, but ',' found // error: an identifier expected, but ',' found
| ^
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of better error messages IMHO: you get an error on both invalid occurrences.

@adampauls
Copy link
Contributor Author

Cc @tgodzik, might be related to scalameta/metals#3625.

@tgodzik
Copy link
Contributor

tgodzik commented Feb 14, 2022

Cc @tgodzik, might be related to scalameta/metals#3625.

It might be! I will take a look at the PR!

Copy link
Contributor

@tgodzik tgodzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be super useful to have this 🎉

We could workaround it with an artificial cursor, but it would probably be more efficient to handle it in the compiler.

I left a couple of comments, but we would need to get someone from the core compiler team to take a look as I am not confident enough about the impact of the changes.

Maybe @odersky ?

tests/neg/arg-eof.scala Show resolved Hide resolved
@@ -1,5 +1,5 @@
class A[T]
object o {
// Testing compiler crash, this test should be modified when named type argument are completely implemented
val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found
val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found // error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, there seems to be no message accompanying the message

Copy link
Contributor Author

@adampauls adampauls Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second message here is not ideal:

Declaration of value x not allowed here: only classes can have declared but undefined members

This is arguably a regression. I'm not quite sure what's happening, but I noticed there were a bunch of untagged, less-than-ideal error messages elsewhere so I left it. Think it's worth fixing here? Leaving a TODO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, if there are other wierd errors then I guess we can create a Todo with an issue to improve this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it a bit more. When parsing a comma-separated list, if thing you're looking for (in this case, a type tree) ends before you hit a comma, the parser thinks you're done the list, so the parser reads T then thinks it's done the list because it doesn't see a ,. It's then expecting to see a ], and issues the first error. Then it skips to the next safe point, which in the past was the next ], but now is the ,. So I guess the parser sees this as val x: A[T], T = Int] = ???, so that explains why the parser produces a val x: A[T] and hence the new error. I don't know why there aren't errors resulting from the , T = Int] = ??? suffix, but presumably there are and they are just suppressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a change that gets rid of this (bad) error by changing skip to look for a comma only if we're not explicitly trying to find a statement separator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the added error then?

tests/neg/t5702-neg-bad-brace.check Show resolved Hide resolved
@@ -278,7 +278,7 @@ object Parsers {
if (in.token == EOF) incompleteInputError(msg)
else
syntaxError(msg, offset)
skip()
skip(stopAtComma = true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes it so we skip to the next argument if we encounter an error in a comma separated list.

@@ -346,7 +346,7 @@ object Parsers {
false // it's a statement that might be legal in an outer context
else
in.nextToken() // needed to ensure progress; otherwise we might cycle forever
skip()
skip(stopAtComma=false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we are explicitly looking to skip to a place that should end a statement, which a , never will.

Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a net improvement to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants