Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve errors location tracking #14130

Conversation

tolmasky
Copy link
Contributor

@tolmasky tolmasky commented Jan 11, 2022

Summary

The main change in this PR is to remove our dependence on getLineInfo by instead relying more heavily on the Position versions of the various locations that are passed around during parsing. Aside from thus "naturally" accounting for startLine and startColumn with no additional work, this also avoids the pathalogical cases when getLineInfo can't rely on previously cached locations and must thus re-iterate through the entire file to re-generate line and column information, despite the fact that other parts of the program already have this information. Although this is perhaps not too terrible in the default { errorRecovery: false } case of calling parse, it can be quite devastating to performance if you are running with { errorRecovery: true }, like we do in RunKit.

Approach

Perhaps the most important change is that I have added an index member to the Position class, which tracks the source index (the value represented by, e.g.,State.[start|end] vs. State.[start|end]Loc):

class Position
{
  line: number;
  column: number;
  index: number;
}

The reason for this is that these two values do exist in a one-to-one mapping, and thus anywhere where they must be passed separately simply introduces the possibility of erroneously getting them out of sync. It is also almost always the case that various methods, from an interface perspective, either care about both, or just the Position, and thus we are not creating too many cases where we are creating extra objects during the parse. That is to say, the places where it is beneficial to just update state.pos all still behave that way since they are in tight loops, not updated and then calling into other functions (that we weren't already doing this for).

I have made this change fairly defensively by making index be a non-enumerable member of Position:

Object.defineProperty(this, "index", { enumerable: false, value: index });

This way, I wouldn't have to re-generate every existing test to include this new addition, and it also won't show up unexpectedly if anyone is JSON-ing these trees or whatever. I am however happy to just make it a normal member variable though, if that is preferred. I just figured I'd start off doing it this way, as it is easy to change, and so that more attention can be paid to the fewer "meaningful" test changes instead of being bombarded with a PR that changes just about every test.

As such, a few key methods have changed:

raise()

- raise(pos: number, template: ErrorTemplate, ...params: any)
+ raise(template: ErrorTemplate, origin: Origin, ...params: any)

Where Origin is defined as:

type Origin = {| node: Node |} | {| at: Position |};

Aside from the usage of Position instead of the previous number index, the idea here is to make the most common use cases super simple, as well as needing less Flow gymnastics (and IMO making it more readable). Namely, in the very common case where you want to raise an error "from" a node, the code looks something like this:

this.raise(Errors.InvalidPropertyBindingPattern, {
  node: expr,
});

This then does the right thing by extracting the Position information from the node in question. Additionally, if we in the future want to change errors to support ranges (so that you can for example underline the entire node vs. just point to the beginning), we could now much more easily do this for all these calls, since we'd only need to update raise to also grab the end from the node too.

If however you still want to pass a specific location for the error, you use the at form:

this.raise(Errors.UnexpectedImportExport, {
  at: this.state.startLoc,
});

This behaves much like the old version, except taking a Position instead of a number index. I believe both of these read closer to the throw SomeError(info) pattern they are approximating, as well as making it immediately obvious whether they are needing to do complex location stuff or just defaulting to the "standard" case of identifying a node as having an error.

unexpected()

The other big error change is to unexpected. unexpected used to take either an error template or a token, but now only does the token version:

- unexpected(pos?: ?number, messageOrType: ErrorTemplate | TokenType): empty
+ unexpected(loc?: ?Position, type?: ?TokenType): empty

The reason for this change is that you really don't "get" anything from using the ErrorTemplate form of unexpected, as it collapses down to just being a call to raise with pos set to this.state.start and automatically thrown. In other words, there's no special "unexpected messaging" or anything taking place, since the ErrorTemplate circumvents all that logic, and it's basically synonymous with just doing throw this.raise(). Despite this, it requires a fair amount of flow gymnastics to support these two unrelated types. As such, I replaced the few instances of this usage with throw this.raise() and left unexpected() to solely handle the token case.

expectContextual()

There's no big change in expectContextual, except that thanks to the above unexpected change, it can now throw a nicer error. Previously, something like this.expectContextual(tt._as) would throw an error message saying "Unexpected token", whereas now it will throw an error that says ""Unexpected token, expected as".

Closes #14144

Side-effects

As a side-effect of this change, a number of function interfaces became simpler, now only requiring a location be passed in instead of the index/Position pair. We could actually do this with way more methods, but I limited it to solely those methods that were directly affected by my changes necessary for fixing errors here. This is particularly nice for methods that take optional locations, as this removes the ambiguity about whether you can pass just the integer index or if you must either pass both or neither.

Additionally, getLineInfo and getLocationForPosition have been removed, since they aren't needed anymore.

More recoverable errors

I'd like to focus on this more after this PR, since errorRecovery is kind of what set us down this path, but just due to the nature of this change in particular, I was able to easily make Errors.ElementAfterRest a recoverable error, and the tests have been updated accordingly to reflect this.

Closes #14123.

Q                       A
Fixed Issues? Fixes #14123, Fixes #14144
Patch: Bug Fix? Bug Fix
Major: Breaking Change? No
Minor: New Feature? No
Tests Added + Pass? Yes
Documentation PR Link No
Any Dependency Changes? No
License MIT

@existentialism existentialism added PR: Bug Fix 🐛 A type of pull request used for our changelog categories pkg: parser and removed PR: Bug Fix 🐛 A type of pull request used for our changelog categories labels Jan 11, 2022
@@ -78,7 +78,7 @@ export default class ParserError extends CommentsParser {
else if (pos === this.state.lastTokStart) loc = this.state.lastTokStartLoc;
else if (pos === this.state.end) loc = this.state.endLoc;
else if (pos === this.state.lastTokEnd) loc = this.state.lastTokEndLoc;
else loc = getLineInfo(this.input, pos);
else loc = getLineInfo(this.state.startLoc.line, this.input, pos);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.state.startLoc will always be updated when a new AST node is parsed. We can make getLineInfo a parser method and then read startLine from parser options.

export function getLineInfo(input: string, offset: number): Position {
let line = 1;
export function getLineInfo(
startLine: number,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we support startLine, then we should also support startColumn.

@tolmasky
Copy link
Contributor Author

tolmasky commented Jan 11, 2022

I am looking at possibly just making the various raise methods take a Position instead of a... "pos" (it's really hard to talk about the difference between "loc" and "pos" when "loc" is a Position, however it appears in location.js, not position.js, and that in turn has a SourceLocation, which is different from both pos and loc. I wish "pos" was instead called "charIndex" or something...). Anyways, it appears that the actual places that rely on sending in custom "pos" are fairly constrained, and so far, all have access to the corresponding loc as well. This could potentially obviate the need for any of this instead of moving getLineInfo (a fairly inefficient function) around. However, there might be some tricky place where that isn't the case.

It would also be nice if "loc" and "pos" weren't separate entities that would need to be passed around together. For example, if Position simply contained a charIndex property, that way you could pass one "thing" around everywhere:

export class Position {
  line: number;
  column: number;
  charIndex: number;
}

But this is of course a breaking change and outside the scope of all this.

@tolmasky
Copy link
Contributor Author

I have a significantly different change that accounts for column, etc. too. Will push either later today or tomorrow.

@tolmasky
Copy link
Contributor Author

I'm going to try to get an updated PR submitted later today. Just writing a bunch of tests and stuff now.

@tolmasky tolmasky force-pushed the fix-some-errors-still-ignore-start-line-option-14123 branch from d251559 to 6e475c5 Compare January 14, 2022 18:49
@tolmasky
Copy link
Contributor Author

OK, I have added a bunch of info to the main section above explaining this change. I am now in the process of trying something with the tests that would allow us to try every existing test with different startLine/startColumn changes and thus avoid having to figure out the right places to test this.

Comment on lines 84 to 88
raise(
pos: number,
{ code, reasonCode, template }: ErrorTemplate,
origin: Origin,
...params: any
): Error | empty {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid having to always specify a new object when calling .raise, maybe we could divide it in two separate functions:

raise(pos: Position, { code, reasonCode, template }: ErrorTemplate, ...params: any): Error | empty {
  return this.raiseWithData(pos, { code, reasonCode }, template, ...params);
}

raiseAtNode(node: Node, { code, reasonCode, template }: ErrorTemplate, ...params: any): Error | empty {
  return this.raiseWithData(origin.node.loc.start, { code, reasonCode }, template, ...params);
}

Copy link
Contributor Author

@tolmasky tolmasky Jan 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually originally had this, but decided against it for a couple reasons:

  1. I wanted it to read like throwing an error "throw SomeError(metadata)", which calls out what matters most at that line "hey, this is a MissingSemicolonError" or "hey, this is RestTrailingCommaError" vs. front-loading the information that matters least, the associated metadata, when reading the code, like doing throw "metadata on SomeError";
  2. I think we will potentially want to add more to this object later. As I mentioned in the above, we may eventually want to expand this to { at: length: } for ranges (or { start, end } or however you want to do that), and I think it would also eventually be nice to be able to supply context info (we currently have this info but it gets stringified for example, but can be very useful to the user to be able to do error.enumName vs. parsing message for enumName). As such, I wanted to put in the scaffolding for something that could carry additional info that could be easily read in the future.
  3. From my testing there seems to be zero performance benefit to avoiding object creation at this point, for two reasons. First, it is a seldom occurring event, as it happens once per error raise (vs. getLineInfo which was an iterative event that can lead to an O(N) slowdown per Error, where N = source length). Secondly, if we did consider it to be performance sensitive, then we'd for example be much better served by instead changing raiseWithData to take (missingPlugin?: Array<string>, code?: string) instead of { missingPlugin?: Array<string>, code?: string, } (since this object data object gets thrown away immediately when yet another object is created on the corresponding call to _raise), and this change would basically affect all error raises "for free" transparently since it's an internal function call in raise. However, I also don't think this is a performance issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good 👍

Comment on lines 19 to 20
// this.index = index;
Object.defineProperty(this, "index", { enumerable: false, value: index });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go in a minor, because it's a new observable property on loc objects in the AST (making it non-enumerable hides it from the fixtures, but it's still there 😛).

However, I would like to merge this PR as soon as possible because:

  1. This PR also fixes different errorRecovery exceptions and some other bugs
  2. This PR conflicts with virtually everything touching the parser, and rebasing it will not be easy.

For now, we can hide this index property in a new const indexes = new WeakMap() exported by this file, and do

Suggested change
// this.index = index;
Object.defineProperty(this, "index", { enumerable: false, value: index });
// this.index = index;
indexes.set(this, index);

And replace /([\w\.]+)\.index/g with indexes.get($1). Then, we can discuss if we want to expose it as a public property in Babel 7.17.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, can make that change.

Copy link

@pieh pieh Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolo-ribaudo @tolmasky despite swap to WeakMap just having typed member in

seems to actually init this field with void 0 ( see https://www.runpkg.com/?@babel/parser@7.16.10/lib/index.js#9 ).

This did trip some of our tests ( TBF the test was in practice too strict and didn't allow for any uncounted for fields, so we will be relaxing them a bit ), but most importantly, I just want to check with you wether this is acceptable for you given the comments above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pieh, I actually reported this immediately after we shipped as I also noticed it: #14182. Apologies though, as I had the discussion with @nicolo-ribaudo offline, where he advised that it was probably better to just hash it out here: #14174, as we might just be adding them in fully in the next minor update. But I'll let @nicolo-ribaudo comment more here as I am definitely open to removing that line for the next patch update if its causing any issues.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no problem here from my side (as I mentioned - our tests/validation were just too strict and brittle because of that - I really should just validate keys we do need (which is line/column, but don't enforce that they are only possible fields in Position). This is especially true as babel is not the only "source" we are handling errors from.

My comment was just to make sure you are aware as this sounds like something you wanted to avoid in patch release. For us it really doesn't matter (in practice, and not semantics) if it's a patch or minor given that right now we do use ^ version selector (and not pinned version or ~ for pinned minor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, and again, apologies! This one's on me, I just forgot to delete the declaration after switching the uses of it.

@tolmasky tolmasky force-pushed the fix-some-errors-still-ignore-start-line-option-14123 branch 7 times, most recently from 99fa96b to 5b4d55d Compare January 17, 2022 17:39
@tolmasky
Copy link
Contributor Author

tolmasky commented Jan 17, 2022

OK, I've gone ahead and implemented the indexes change, and removed the testing change to not complicate this PR further. My plan is to make 2 other PRs after this:

  1. One that just reverts the last commit that switches .index to the WeakMap approach. We can use that PR to discuss whether want to use .index or not in the next minor patch (or sometime after).

  2. I'll put the testing changes into their own separate PR, that way I can devote that one to explaining them and so forth.

const { shorthandAssign, doubleProto, optionalParameters } =
): void {
if (!andThrow || !refExpressionErrors) {
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of checkExpressionErrors is used in parseUpdate. I am surprised that the test does not fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this was difficult to understand, but I think this is what's going on:

  1. The intent, as I understand it, of refExpressionErrors is to "hold onto" things that would be errors in expressions, but not in patterns.
  2. The only places patterns can appear is in variable declarations, function parameters, and assignment expressions (which are ambiguous until we reach an = sign).
  3. In the first two, refExpressionErrors is just passed down as null, so it is irrelevant for our discussion here.
  4. However, if it the current expression was determined to be an assignment expression in parseMaybeAssign, then refExpressionErrors will have already been cleared since we know this was a pattern (
    refExpressionErrors.doubleProtoLoc = null; // reset because double __proto__ is valid in assignment expression
    )
  5. As such, I think the check in parseUpdate never passes, as by this point refExpressionErrors has been cleared. parseUpdate is the only place that passes false for andThrow (and also the only place that bothers checking the result).

This is why no tests fail. I've done my best to construct expressions that find some other path where this line would have refExpressionErrors and rely on andThrow being false to avoid throwing them, but I haven't been able to. If you can confirm the above, I can try to modify the code further to reflect how it's being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to add that I think it's totally possible that there is a relevant case that's handled by this and simply not tested, and I'm happy to change this code back to that, and also add said test if we can find it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I went ahead and restored the old behavior I think, although I still can't seem to figure out how it behaves differently, but figured it wasn't worth blocking this PR on figuring that out. I can file a bug on exploring this further. We should try to get a test in there or figure out why it's never entered.

@@ -40,8 +41,7 @@ export default class ScopeHandler<IScope: Scope = Scope> {
scopeStack: Array<IScope> = [];
declare raise: raiseFunction;
declare inModule: boolean;
undefinedExports: Map<string, number> = new Map();
undefinedPrivateNames: Map<string, number> = new Map();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR the pos: Position is expanded into an origin: Origin. Personally I still prefer the argument order of previous .raise() signature because the params are coupled with the error template.

raise(origin: Origin, template: ErrorTemplate, ...params: any)

@tolmasky
Copy link
Contributor Author

So, just upfront, I know this is a somewhat stylistic change so if you are not convinced by the following arguments, then I will absolutely swap the order back. That being said, I just wanted to check that you read my first response to Nicolo regarding this here, and a couple additional thoughts:

  1. As I mentioned to Nicolo above, my hope is to eventually use the rider "options" argument to pass this (currently limited) error metadata to raise such that it can also be attached to the eventually produced SyntaxError. In an ideal world with associated types, it would be great to have something like this.raise(Errors.InvalidEnumName, { node, enumName, suggestion }), such that someone writing dev tools would have access to these important items without having to parse error messages. Without fancy types though, even just this.raise(Errors.InvalidEnumName, { at: location, metadata }) feels more akin to how this would work in a pure-throw-no-recovery code-style of doing throw Errors.InvalidEnumName(location, metadata), with the added benefit of "named parameters". I actually started doing this when I was working on this change, which is where that swap originally happened, and then when I scaled the PR back, I decided to keep the swap just to float the above thinking. All this to say, I am onboard with params mattering, and would like to make it even more readable in a future commit, by "having our cake and eating it too" and making all associated data feel attached to the template. I figured I'd do this change now as a very small number of Errors actually currently take advantage of params (~14%), and I'd love to increase that going forward.

  2. As I was making my way though this, it became clear that like unexpected, the default situation is to throw the error from "here", and thus in a later change we could probably further simplify a lot of this code down to this.raise(Errors.Whatever) and infer the right place from lacking Origin argument, IOW, the function interface could be changed to:

    raise(error: ErrorTemplate, origin?: Origin, ...params);

    Thus, the common case would be to just do this.raise(Errors.Whatever), and we'd know what to do. This would compare to the (IMO) more confusing option of: this.raise(null /*what is this?*/, Errors.Whatever); (which we currently do with unexpected(null, token)). It happens to be that the default (currently) for unexpected is also to not pass the expected token, and thus this is "felt" less since you are much more often just taking the default for both: this.unexpected(). If we get better at reporting what we'd like to see however (like for example I know a few places where it would be nice to have something like this.unexpected(null, Identifier) (which you can't currently do since there's no single 'identifier' token, but you can imagine a Token | Concept which would allow that), you'll see more and more this.unexpected(null, thing).

  3. This last point kind of segue-ways to the most "style-y" argument that really just comes down to my feelings as someone expecting to soon wade through all the errors to try to make as many of them recoverable as possible ;), which is that the thing I want to stand out is what the error is, and not where the error happens to be coming from (which is always some variation of "here", in the boring case of this.state.startLoc or node, it becomes fairly monotonous noise, and in the more complicated cases of "position mechanics", very much stands in the way of just figuring out what the intent of the error is, as it is more likely to trigger a line wrap and thus for the Error "name" to be 2 lines removed from the error raise).

Anyways, as I stated earlier, please don't let the wall of text make you feel that am unwilling to swap it back , just wanted to share various "error thoughts" I had while making my way through this, and I am happy to swap it back if they are not convincing.

…an correctly and efficiently track line and column offsets. Additionally:

1. Make Errors.ElementAfterRest a recoverable error.
2. Make expectContextual throw a nicer error telling you what it expected to find.

Reviewed by @tolmasky.
…coverable, and that expectedContextual now throws an error message with the expected token. Also, update `packages/babel-parser/test/fixtures/es2015/uncategorised/277/input.js` since it had a different syntax error that was previously not reached since it would first fail on the fact that there was an element after rest.

Reviewed by @tolmasky.
…er to make index an actual property in the next minor release.

Reviewed by @tolmasky.
@tolmasky tolmasky force-pushed the fix-some-errors-still-ignore-start-line-option-14123 branch from 5b4d55d to 22d3112 Compare January 17, 2022 23:46
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I noticed that there are a bunch of FIXME: comments, but they mostly seem about "explain why it's safe to get the prev/next pos here".

// after a Hack-style pipe body is parsed.
// The `startPos` is the starting position of the pipe body.

checkHackPipeBodyEarlyErrors(startPos: number): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing this unused code.

@@ -32,24 +35,7 @@ export class SourceLocation {
}
}

// The `getLineInfo` function is mostly useful when the
// `locations` option is off (for performance reasons) and you
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, it looks like this option doesn't exist anymore.

@nicolo-ribaudo nicolo-ribaudo changed the title Fix some errors that ignore startLine by adding startLine to getLineInfo Improve errors location tracking Jan 18, 2022
@nicolo-ribaudo nicolo-ribaudo merged commit 478a970 into babel:main Jan 18, 2022
@nicolo-ribaudo nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories and removed PR: Bug Fix 🐛 A type of pull request used for our changelog categories labels Jan 18, 2022
@tolmasky tolmasky mentioned this pull request Mar 1, 2022
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Apr 23, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories
Projects
None yet
5 participants