
Consider whether parse-less tokenizing is viable in the long run #589

Open
marijnh opened this issue Sep 18, 2017 · 4 comments
Comments

marijnh commented Sep 18, 2017

A few years ago we adopted the 'sweet.js algorithm', which tries to infer enough syntactic context from the token stream alone to disambiguate things like the division operator and the start of a regexp, so that the tokenizer can be run without also running the parser.

New ES versions have increasingly complicated the story here, and neither sweet.js itself nor Esprima, which also implements this, seems motivated to keep up with them. We've been slowly complicating our algorithm, but it's getting shaky, and PR #575 is now the first place where we've reintroduced a dependency of the tokenizer on the parser.

Maybe, instead of putting further energy into this, and dealing with the bugs that come from this complexity, we should just deprecate tokenizing without also running a parser?

(Though that would have consequences like making it impossible to, in the future, reframe our lookahead approach or loose parser in terms of actual token lookahead, so that'd have to be considered carefully.)

marijnh added a commit that referenced this issue Sep 10, 2018
It'll now only run in plain-tokenizer or loose mode, so that the
parser can use its actual knowledge about the syntax to drive
disambiguation of / and regexps.

Issue #589

Closes #552
marijnh commented Sep 10, 2018

In effe659, I've changed the parser to use its own knowledge about the syntax structure, without running the sweet.js token context magic at all, but left that magic in for the case where we're only reading tokens. It's a bit kludgy, but not quite as disruptive as completely dropping the tokenizer-only functionality, so I think it's an okay compromise.

@marijnh marijnh closed this as completed Sep 10, 2018
marijnh commented Sep 11, 2018

Reopening (and I've reverted the patches in 1a07466). I realized that my alternative approach completely broke independent tokenizing of template strings with interpolated fields, so that's a non-starter unless we decide to drop support for tokenizing entirely. Also, the way it required the parser to drive the tokenizer (via re-tokenizing slashes when at an expression, and setting a flag at the right moment when tokenizing template strings) was pretty awkward in its own right.

So the problem described in this issue remains until we come up with some better approach, and I'm reopening #552.

@marijnh marijnh reopened this Sep 11, 2018
marijnh commented Sep 11, 2018

Going to punt on this for version 6.0. Can't find a good solution, and it's better to stick with the existing bad solution than to pivot to a new bad solution.

marijnh commented Nov 5, 2018

Since f0cbb35 the parser forces a regexp token when it sees a / operator in expression position. So now we're effectively using the parser to drive tokenizing, though the old heuristics still run as well, in order to keep plain tokenizing without parsing possible.
