New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type-safe ParseErrors #14320
Type-safe ParseErrors #14320
Conversation
a32b757
to
aeb60c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow this is a long PR description!
I only reviewed the first files so far.
|
||
import { ParseErrorCodes, toParseErrorCredentials } from "../parse-error"; | ||
|
||
export default (_: typeof toParseErrorCredentials) => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to reduce the boilerplate, we could export type _ = typeof toParseErrorCredentials
from parse-error.js
, and just to
export default (_: _) => ({
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this one is really unfortunate. If you look at the flow/typescript/placeholder/etc. errors, they don't need this at all, because they just call toParseErrorClasses(_ => ...)
, and so get the function's type inferred for free. The standard errors can't do this because I want to export them from parse-error, and do to the infuriating way that import works (this wouldn't be a problem using require
), I can't use toParseErrorClasses
in these files (see me complaining about this here: https://twitter.com/tolmasky/status/1492204196019453952?s=21 ). I chose not to export toParseErrorCredentials
for this reason, since its an ugly artifact of the Flow/ESM combination (it wouldn't be necessary with Typescript/ESM or with Flow/require), we don't need it for any of the plugin error definitions, and we can hopefully completely get rid of it when we switch to Typescript. But, happy to do this one if you want, just wanted to explain the reasoning (or rather, just vent about ESM and Flow ;) )
const description = isForOf ? "for-of statement" : "for-in statement"; | ||
this.checkLVal(init, description); | ||
const type = isForOf ? "ForOfStatement" : "ForInStatement"; | ||
this.checkLVal(init, { in: { type } }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is in
expected to be a node in theory? If we only need the parent type, we can make checkLVal
take a parentType: string
parameter instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh (cries in anguish remembering how I first wrote the whole thing to work exactly like that, only to then discover...) that then we can't differentiate between "postfix operation" and "prefix operation". I wish we just had PrefixUpdateExpression
and PostfixUpdateExpression
. The current official answer is "this represents the "human-type-identifying aspects of a node, which in almost all circumstances is just "type", but due to the decision of merging postfix and prefix operations into one kind of node, is not always the case".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would have to be "ancestorType" and not "parentType" since we're finicky about which ones we identify, and it is not always the "parent" node, it is often 2 or 3 parents up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand on this a little more, just because I spent so much time thinking about it, I can imagine a future where, when relevant, we include the "error node" as part of an error. So, you can imagine "higher level" errors like this have AST information (specifically, a relevant node), as opposed to "flat text information" (a Position
). In such a world, it would be fine to simply supply "the whole node" to the error, and this UpdateExpression
problem "goes away", as ancestor would be the straight-up complete node, and the toNodeDescription
has access to everything it needs. I did not want to down that route yet though, and so chose to provide "only the relevant portions of the node" (you could imagine in the future needing to include computed
for MemberExpressions, etc.). This allows us to be forward-compatible with a future that has such a "give you the whole node" type system, while limiting what we are promising in the current API. Or at least, that was the rationalizing I gave myself since I wish the type
was just sufficient (in all cases, even like with MemberExpression).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally had placed a comment for another section here.
this.raise(Errors.InvalidLhs, { node: init }, "for-loop"); | ||
this.raise(Errors.InvalidLhs, { | ||
at: init, | ||
ancestor: { type: "ForStatement" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here: is the ancestor
expected to be a node, or do we only need parentType
?
...ges/babel-parser/test/fixtures/typescript/cast/unparenthesized-assert-and-assign/output.json
Show resolved
Hide resolved
packages/babel-parser/test/fixtures/typescript/declare/const-new-line/output.json
Outdated
Show resolved
Hide resolved
packages/babel-parser/test/fixtures/typescript/const/no-initializer/output.json
Show resolved
Hide resolved
.../fixtures/typescript/type-only-import-export-specifiers/export-type-only-keyword/output.json
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should consider renaming InvalidOrUnexpectedToken to UnexpectedCharacter, or removing it and just using UnexpectedToken instead.
I agree, however since the reasonCode is part of API, we have to defer such changes to Babel 8.
packages/babel-parser/src/parse-error/pipeline-operator-errors.js
Outdated
Show resolved
Hide resolved
fa2c7c7
to
cc74ece
Compare
SourceTypeModuleError: "BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED", | ||
}); | ||
|
||
export type ParseErrorCode = $Values<typeof ParseErrorCodes>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that Flow reports this as $Values<{| SourceTypeModuleError: string, SyntaxError: string |}>
, but then it does the correct thing and doesn't allow me to assign any string to it.
528989c
to
aa060de
Compare
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Reviewed by @tolmasky.
Reviewed by @tolmasky.
Reviewed by @tolmasky.
Reviewed by @tolmasky.
Reviewed by @tolmasky.
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Co-authored-by: Huáng Jùnliàng <jlhwung@gmail.com>
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Co-authored-by: Nicolò Ribaudo <nicolo.ribaudo@gmail.com>
Reviewed by @tolmasky.
…return type can be null. Reviewed by @tolmasky.
Reviewed by @tolmasky.
aa060de
to
12faef0
Compare
The main point of this PR is to deliver on the original promise made in PR #14130 where the re-arrangement of the
raise
parameters would allow us to eventually provide a more descriptive and type-safe way of raising errors. However, during the execution of this change, a number of other bugs popped up (most notably #14317, #14318, and #14319), which were also tackled here. The summaries below describe each of these "conceptually separate" (but code-connected) changes.Type-safe
Parser.raise
As mentioned above,
raise
is now "Error class" aware and type-checks any additional parameters passed to it for forming the resulting error message. So, for example, instead of doing this:You now do this:
I think this makes these errors more understandtable at the point of
raise
-ing them, since they can get somewhat confusing when they start to have multiple parameters. Additionally, without type-checking and good names, they can quickly become out of sync with their definitions. For example, prior to this change we passed an unused argument when raisingErrors.ModuleAttributeDifferentFromType
:This is now a type error. Additionally, passing the wrong type for each parameter is also a type error. For example, if you accidentally pass a
Node
to something expecting astring
, you'll see something like this:Defining ParseErrors
The way you define these type-safe errors is fairly straight-forward too (although even more straight-forward in Typescript, where I originally prototyped this...). Similar to before, you create an object where the key represents the
ReasonCode
, only you pass in atoMessage
function if the message is dynamic instead of the hand-rolled templates we were using before:The
_
wrapper is an unfortunate workaround for a Flow bug that will be fixed after their big year-long rewrite. Hopefully though by then we will have just ported the parser to Typescript and then it will just look like this (the Typescript implementation exists on a separate branch so it will be easy to swap in):The API allows you to do a couple of other things, like easily pass in a SyntaxPlugin:
As well as conditionally override the
ReasonCode
andErrorCode
on a per Error basis:Live errors
A natural consequence of the implementation is that these errors are also live. That is to say, if you now mutate the error after the fact:
The error message will now also reflect the new information. This is important for us at RunKit (it allows us to update error messages when previous cells change without having to re-parse or do string manipulation), but also gives me the necessary tools to significantly expand our fuzz testing. Since errors now contain semantically meaningful information, that can be both interpreted and modified, it is possible to combine this with placeholders to create a ton of variety from existing tests. For example, something like
let %%name%% = 10; let %%name%% = 20
, can both be trivially mutated, but also have the resulting error trivially mutated as well (duplicateIdentifierError.name = random_name
), allowing us to quickly generate the expected errors without needing a ton of fixture, the same way we do with line number fuzzing currently. Another simple place to start could be to make most identifiers in our error fixtures be placeholders, and then insert reserved words into all our existing tests expecting them to be UnexpectedReservedWord/Keyword errors.ParseErrors
now also have aclone
method to copy errors with changes so that you don't have to necessarily mutate the original error:Any property left out of
clone
's argument will just be taken from the original (this is recursively true for thedetails
property).No more need for dry-error-messages eslint rule, and no more ignoring the dry-error-messages eslint rule
A natural result of this is that we no longer need the custom dry-error-messages linter rule, this is now enforced by the type system, and furthermore there are no remaining instances where this rule is ignored. All errors are now properly declared through the
toParseErrorClasses
and all instances of "inline" calls of on-the-flyErrorTemplate
s have been removed.report-error-message-format
As mentioned above, we now get the
dry-error-message
behavior for free through the type system, and it is fully enforced on all errors now. As such, this linter rule was removed. This however leavesreport-error-message-format
rule, which after this PR won't do anything since it still only operates on calls tomakeErrorTemplates
, which no longer exists. I have purposefully not updated this for a two of reasons:Our error messages now make use of template strings, so it would be non-trivial to execute this rule statically.
Even prior to this change, many error messages evaded this plugin (most notably, "Unexpected token") since they were being passed inline. As such, had I updated this rule to work with the new error system, it would have lead to either needing to disable it for a number of error messages, or caused a lot of test churn as I updated a bunch of tests to include periods at the end. I wanted to keep the test updates in this PR focused on the meaningful behavior changes and not drown them all out in a sea of updated tests.
I think the best course of action is to change this in a subsequent commit to be a rule that is checked at test time, that way we are not relying on being able to discern what the final message would be after the template string is fully resolved. In the test runner I can easily loop through all the messages in the error array and throw an error if they don't match our desired format.
Further ParseError Notes
ParseError
s still respect setting theirmessage
property directly. (e.g. doingerror.message = "custom message";
).ParseError
s still show up asSyntaxError
s in stack traces, logs, etc. (AKA, theconstructor
's name isSyntaxError
, despite each error type now being a custom subclass). This makes the errors behave externally like they used to in most meaningful ways (for example, we didn't have to regenerate any tests because of the changed Error class hierarchy).ErrorParser
intermediate class and just putraise
andraiseOverwrite
directly inTokenizer
, sinceTokenizer
directly makes use of these functions and has all the necessary data to implement them. It for example already implementsrecordStrictModeErrors
itself, so not sure why a more fundamental method likeraise
is punted to a later subclass. Additionally, since these are the only two remaining methods fromErrorParser
(everything else from that file was either no longer necessary, likeraiseWithData
, or has a home is the actualParseError
class), it doesn't really add much to the class.unexpected
,expectPlugin
, andexpectOnePlugin
fromUtilParser
toTokenizer
for the same reason:Tokenizer
has all the necessary data to implement these functions (it's not "waiting" for something introduced in a later subclass to be able to implement it), and usesunexpected()
andexpectPlugin
directly already.reasonCode
is also part of the parameterized typeParseError<ReasonCode, Properties>
. This can be really nice and useful for compiler error messages, but not trivially possible with flow.InvalidOrUnexpectedToken
andUnexpectedToken
, although the former produces a slightly different message "unexpected character". Seems like we should consider renamingInvalidOrUnexpectedToken
toUnexpectedCharacter
, or removing it and just usingUnexpectedToken
instead.pipelineOperator
to theSyntaxPlugin
enum, and thus as a plugin that can show up in aParseError
'ssyntaxPlugin
field. This was mainly since it was nice to be able to separate these errors out into their own file, but this doesn't mean that they have to also have the syntaxPlugin field set. If we prefer, I can keep easily keep the organizational aspect of this while still not exposing this property.checkLVal bug
As mentioned above, in the process of updating the various calls to
raise
, I ran into a few bugs. One of them came up when trying to makeInvalidLHS
andInvalidLhsBinding
use our new type-safe system, since it was being passed a seemingly arbitrary raw "context description" string. Such context strings happens in other places of the code as well, namelyparseImportSpecifierLocal
andparseHackPipeBody
. I've done my best to coalesce these various "node description" strings to instead be a "UI concern" of errors inparse-error/to-node-description.js
, instead of a code concern for various functions that internally end up callingcheckLVal
and thus need to provide context strings. What this means is that instead of having to have various methods (likecheckLVal
,parseImportSpecifierLocal
,parseHackPipeBody
, etc.) keep track of, and pass around, description strings, the error instead receives the relevant node type (a piece of information we already have in most cases), and then the "human friendly" description is generated by the error when creating the message itself. This makes it easier to keep a consistent terminology throughout all errors that want to do this sort of thing, and simplifies the code by removing an extra "noisy" parameter (that was often not even used -- for example, whilecheckLVal
only ever produces an error message that references assignments, parentheses, or destructuring, it still forced every caller to come up with an english name for the parent node in question).This reorganization surfaced a real bug: the fact that we currently don't always identify invalid uses of
let
variable names. Versions of this bug affected not just the base implementation but actually further plugins as well (you'd fix it for just JavaScript and have other related bugs still in Typescript for example), and it all came down to the fact thatcheckLVal
both has a lot of parameters, as well as complicated recursive behavior. Just considering the JavaScript parser momentarily, fixing the bug there was somewhat straight forward in that it was the result of recursive calls accidentally not forwarding all the arguments they were passed, and thus those parameters would get reset to their default values (in other wordsdisallowLetBindings
would eventually be dropped if the recursion path took the right turns and was deep enough).However, this was then further complicated in the Typescript implementation, since it inadvertantly changed the meaning of the
checkLVal
's parameters. The original source of this bug is that the Typescript parser didn't properly copy over all the default initializations for the various parameters in itscheckLVal
overridden definition. This means that binding would by defualt beundefined
instead ofBIND_NONE
. To compensate, Typescript.checkLVal would thus treatbinding = falsey
as "none", and thus not treatBIND_NONE
as "none". Interestingly, this sometimes lead to a "two wrongs make a right" situation because existing calls to justcheckLVal(blah)
would all of a sudden act as if they were passingundefined
for binding, which Typescript would respect, and so things would occasionally "seem" fine. However, things would break down during cross-inheritance recursive calls where the two "belief systems" would come in conflict as to whether "none" should be represented byundefined
orBIND_NONE
. This has its own interesting behavior, that I won't go into much more detail about, except to say that the resulting behavior was thatbinding
actually ended up becoming almost an inadvertent "state variable" that was gating recursive calls.Anyways, to try to make this all significantly simpler, I made the following change.
checkLVal
now only exists in the LVal parser (lval.js). Instead of overriding this method, subclassers override a separateisValidLVal
method, which is way simpler, and decouples the question of whether this particular node is an LVal or needs to be traversed from the actual traversal and other various book-keeping state that makes up the 5 arguments that get passed around incheckLVal
. The goal was to remove subclassing recursion and re-entrancy which in my experience leads to incredibly difficult situations to reason about like this. Instead,isValidLVal
is only "linearly recursive". That is to say, subclass implementations only every "call up" and never wind up back inside of themselves. You handle the special nodes you define, and otherwise hand off the task to the superclass, and are never in charge of any traversals yourself. For example, inplaceholders.js
:Anyways, this fixes #14317.
TypeScript Changes
In the process of investigating the original let bug, and in updating the errors in Typescript generally, I encountered further keyword and reserved word bugs. Additionally, I discovered that our external tests weren't as robust as they could be, and thus made it hard to match the new results. Below is a brief explanation of each change at the high level:
Typescript tests
The tests we pull from Typescript proper aren't in a format that we were consuming properly from what I could tell. To list a few issues:
sourceType: "module"
, despite many tests explicitly testing non-strict mode situations. I now use "unambiguous" and look for the "@alwaysStrict" to hopefully parse each test in the proper mode.twoslasher
stuff to try to get as close as possible to something we can make sense of in our tests, splitting out the individual files and throwing out any file formats that don't make sense for our tests, like JSOn and markdown for example. This does a better job of consuming most tests, at the expense of a small number of tests (3 or so I think?) that were relying on everything being inlined together to (accidentally I think?) simulate the behavior of having imported those files (or at least sharing common namespace)..d.ts
files vs..ts
filesscripts/parser-tests/typescript/error-codes.js
.The end result is that the
scripts/parser-tests/typescript/allowlist.txt
is now dramatically shorter (301 fewer "exceptions"). However, I do feel it's important to state that the external typescript testing process is fairly tenuous, as success has always been measured simply by whether both the original test and our test both agree on whetheran
error is generated, any error, even different errors. I certainly ran into this myself, where for example previously completely unrelated errors were thrown (including the file being the wrong encoding I think) and thus our behavior was deemed "correct" simply because we "expected" an error in this file. I know that this is tricky no matter what, since we're only really capable of syntax errors, and the typescript error suite covers semantic errors and tests sets of files instead of just single files, but I think there might be more we could do here. For example, I think a potential avenue of exploration is trying to include the TSErrorCodes in our own errors, so as to try to do a better job at identifying whether our errors match with theirs. In that setting, we could have "partial successes", where we throw a subset of the errors what are expected from a given test, and we can better track how we stack up against typescript proper..d.ts handling
Just to get one of the easier changes out of the way first, I've made it so that if the
DTS
plugin option is not explicitly set tofalse
, and thesourceFilename
option provided ends with a.d.ts
file extension, we automatically assume the entire file is in an ambient context (in other words, thatDTS
had been set totrue
). I think this is a very safe change (since as we'll see below we were kind of always inadvertently getting DTS behavior previously), and also I think this realistically matches the expected behavior. As mentioned above, this was instrumental in getting a lot of tests to pass since we were now trivially using the current ambient context mode when parsing.Reserved words are once again meaningful in Typescript
OK, now we'll get into the meat of the Typescript changes, which are ultimately related to the
checkLVal
changes above. As I was figuring out why thelet
stuff continued to be broken in Typescript after implementing the fix in the main JavaScript parser, I discovered that it was arising for a variety of other reasons in the Typescript parser as well. I'm not sure if it was always this way, or simply changed at some later point, but all reserved word collision-checking is disabled in typescript prior to this PR, by givingcheckReservedWord
an empty implementation. The reason I don't think that this was always the intention (or perhaps remains not currently the intention), is because there is plenty of code in the rest of the Typescript parser that does try deal with keywords and reserved words. However, for the most part, this code has no effect due to thecheckReservedWord
situation. For example, the import specifier code explicitly goes out of its way to keep track of whether keyword identifiers are allowed or not. For example,parseTypeOnlyImportExportSpecifier
's has fairly complicated code to track whether to allow keywords in import specifiers during the "as as as" parsing. However, this code actually doesn't work in 2 out of 3 cases (you're supposed to callparseModuleExportName
instead ofparseIdentifier
), but this is never noticed since keywords are just turned off everywhere.The source of the decision to disable
checkReservedWord
seems to be that reserved words are allowed in surprising places when parsing typescript declarations (in other words, in ambient contexts). This is called out in thecheckReservedWord
empty implementation. However, this means that currentlyfunction if({ const, while, with, let }){}
is totally valid typescript. The fix is to simply disablecheckReservedWord
only when in ambient contexts. However, we also aren't always currently setting the ambient context correctly...Appropriately determine when to establish an ambient context
So the next part of this fix was to also catch all the remaining edge cases where we should be in an ambient context. Previously, we weren't setting the ambient context state when in a
declare
declaration. For example, indeclare class ...
will now haveinAmbientContext
set totrue
. As mentioned earlier, we now set ambient contexts depending on the filename as well, so this catches those cases.const declarations in ambient contexts
One final important change with ambient contexts is that you are now only allowed to leave off the initializer on const declarations in ambient contexts. So
const x: number;
is normal code (just like with typescript proper). Similarly,const x:number = 10
is not allowed in ambient contexts. However, just like with typescript proper, type-less literal consts initializers are allowed in ambient contexts (e.g.const x = 1
orconst x = "hi"
). This fixed a bunch of third party tests, and made it so I had to adddeclare
in front of a few consts in our tests. I had the beginnings of the same fix for functions, but those are harder since implementation-less functions are "kind of" allowed if they are overloads. This can be somewhat trivially implemented in the same way it's done in the Typescript parser, but is complicated by the fact that we have to keep track of ESTree style methods as well as normal Babel methods, etc. For this reason, you can still just outright leave out implementations from functions anywhere in the code, matching Babel's current behavior.Interfaces and enums are parsed as statements
The other reason (I think) for the lack of reserved word checking in Typescript is to allow
interface
andenum
to escape error detection and then be hijacked during expression parsing to implement these features. These two are now simply handled inparseStatementContent
, the same wayclass
is in normal JavaScript, and thus avoids this problem entirely. It is just the context-sensitive Typescript keywords (likemodule
andabstract
) that need to be handled in the expression code, much likelet
in normal JavaScript.