Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loose parser fails to parse ArrayExpression on multiple lines #199

Open
mrennie opened this issue Jan 14, 2015 · 9 comments
Open

Loose parser fails to parse ArrayExpression on multiple lines #199

mrennie opened this issue Jan 14, 2015 · 9 comments

Comments

@mrennie
Copy link
Contributor

mrennie commented Jan 14, 2015

Using the latest from master parse the following:

var a = [1,
2,
3];

The resulting AST has three root nodes in the body (I would expect only a variableDeclaration node) and has only one element in the elements array of the ArrayExpression node (I would expect three Literal nodes).

Acorn.parse(...) properly parses the same expression correctly.

@RReverser
Copy link
Member

This is kind of by design:

The expected use for this is to first try acorn.parse, and only
if that fails switch to parse_dammit. The loose parser might
parse badly indented code incorrectly, so don't use it as
your default parser.

Loose parser expects that code is already corrupted, so it doesn't rely strictly on token list, but rather on indentation in order to complete same-indented expressions as incomplete lists, i.e.:

var a = [1,

is treated as single-line array (which seems to be the most valid assumption based on same-styled code) but

var a = [
  1,

would be treated as incomplete multiline array.

If you specifically mess up style by mixing indentations, it of course gets confused and can't determine what did you want to write. So better to follow advice from comment - first try to parse your code with valid parser, and only when it fails, fallback to acorn_loose for style-based approximations.

@mrennie
Copy link
Contributor Author

mrennie commented Jan 14, 2015

This is kind of by design:

Thanks for the pointer, but this feels like a serious flaw. I assumed (obviously incorrectly) that both parsers could parse valid JavaScript code, just that parse_dammit could recover in certain failure cases.

So better to follow advice from comment - first try to parse your code with valid parser, and only when it > fails, fallback to acorn_loose for style-based approximations.

This would not be ideal. In Orion 99% of parsing is done on files that are not complete (i.e. for content assist, marking occurrences, etc) and some of the files are very large - the performance impact of having to parse the same file twice would not be good, especially if I get a different result based on formatting.

@marijnh
Copy link
Member

marijnh commented Jan 14, 2015

I did at some point sketch out a design for a loose parser that would parse in two passes, first trying to match brackets, and then doing the actual parsing. I never got around to implementing it because it is a lot of work. If someone (IBM?) is willing to invest money in this and pay me to build it, get in contact.

@mrennie
Copy link
Contributor Author

mrennie commented Jan 14, 2015

I would be more than happy to work on ensuring parse_dammit can parse valid JavaScript source if you would be interested in such a contribution.

@marijnh
Copy link
Member

marijnh commented Jan 17, 2015

Feel free to take a stab at it. But keep performance in mind -- the loose parser should also remain fast.

@mrennie
Copy link
Contributor Author

mrennie commented Jun 27, 2017

I was playing around with this the other day, and I added the following to the loose parser in parseExprList (which fixed up the array case I originally reported):

var blockHeuristic = this.last && this.last.type === acorn.tokTypes.bracketL;
while (!this.closes(close, indent + 1, line, blockHeuristic)) {
...

Does this fix make sense for the array case? If so I can open a PR for it.

@marijnh
Copy link
Member

marijnh commented Jun 28, 2017

Why would treating parenthesized and bracketed expression lists differently be a good idea?

@RReverser
Copy link
Member

@marijnh Not going into the details of the suggested fix, in general one reason could be that a(1, 2, 3) is pretty common so it's reasonable to expect continuation of arguments, while a[1, 2, 3] is rather not and it's more reasonable to expect that it's an array and not indexing?

@mrennie
Copy link
Contributor Author

mrennie commented Jul 5, 2017

The main reason for this change was to fix Orion bug 518330 - so that we could have completions (from Tern) for requirejs regardless of whitespace around the imports.

I also did consider that it is far more likely than not the starting '[' would delimit an array (as @RReverser pointed out).

@marijnh marijnh changed the title parse_dammit fails to parse ArrayExpression on multiple lines Loose parser fails to parse ArrayExpression on multiple lines Sep 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants