
Consider whether parse-less tokenizing is viable in the long run #589

Open
marijnh opened this issue Sep 18, 2017 · 4 comments
Comments

marijnh commented Sep 18, 2017

A few years ago we adopted the 'sweet.js algorithm', which tries to infer enough syntactic context from the token stream alone to disambiguate things like the division operator and the start of a regexp, so that the tokenizer can be run without also running the parser.

New ES versions have increasingly complicated the story here, and neither sweet.js itself nor Esprima, which also implements this, seems motivated to keep up with them. We've been slowly complicating our algorithm, but it's getting shaky, and PR #575 is now the first place where we've reintroduced a dependency of the tokenizer on the parser.

Maybe, instead of putting further energy into this, and dealing with the bugs that come from this complexity, we should just deprecate tokenizing without also running a parser?

(Though that would have consequences like making it impossible to, in the future, reframe our lookahead approach or loose parser in terms of actual token lookahead, so that'd have to be considered carefully.)

marijnh added a commit that referenced this issue Sep 10, 2018
It'll now only run in plain-tokenizer or loose mode, so that the
parser can use its actual knowledge about the syntax to drive
disambiguation of / and regexps.

Issue #589

Closes #552
marijnh commented Sep 10, 2018

In effe659, I've changed the parser to use its own knowledge about the syntax structure, without running the sweet.js token context magic at all, but left that magic in for the case where we're only reading tokens. It's a bit kludgy, but not quite as disruptive as completely dropping the tokenizer-only functionality, so I think it's an okay compromise.

@marijnh marijnh closed this as completed Sep 10, 2018
marijnh commented Sep 11, 2018

Reopening (and I've reverted the patches in 1a07466). I realized that my alternative approach completely broke independent tokenizing of template strings with interpolated fields, so that's a non-starter unless we decide to drop support for tokenizing entirely. Also, the way it required the parser to drive the tokenizer (via re-tokenizing slashes when at an expression, and setting a flag at the right moment when tokenizing template strings) was pretty awkward in its own right.

So the problem described in this issue remains until we come up with some better approach, and I'm reopening #552.

@marijnh marijnh reopened this Sep 11, 2018
marijnh commented Sep 11, 2018

Going to punt on this for version 6.0. Can't find a good solution, and it's better to stick with the existing bad solution than to pivot to a new bad solution.

marijnh commented Nov 5, 2018

Since f0cbb35 the parser forces a regexp token when it sees a / operator in expression position. So now we're effectively using the parser to drive tokenizing, though the old heuristics still run as well, in order to keep plain tokenizing without parsing possible.
