-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Request: Expand processor support to automatically process source maps #17398
Comments
I seem to recall discussing this previously at some point but can't find the reference at the moment. I know we have definitely looked at working with source maps several times in the past but always came to the conclusion that it was actually more complex than it was valuable. In the case of processors, it's debatable that sourcemaps would be a simpler solution for reconstructing source code locations, and we'd then need to support two ways of doing the same thing for at least one major version. I'm not opposed to exploring further, but I'd love to see a prototype-y bit of example code that shows what the experience would be for processor writers. Maybe look at @btmills if you're still hanging around I'd be interested in hearing your thoughts on this. |
Sure, happy to prototype when I have a bit of time. It'll at least give an opportunity to flesh out how feasible auto-fixing might be with sourcemaps, or whether that's the sort of thing that's going to need to be configurable by the processor. e.g. I can see the markdown processor being fine with autofixing because it just pulls out whole chunks of code; but for the XUL processor case where there's much more complex logic that could be easily broken by autofix transforms it would probably need to disable autofixing and only highlight issues. I'll focus on "can we map issues back without too much complication?", and then we can discuss whether to move forward to an RFC 😄 |
You can probably start with an even simpler prototype: rewrite the markdown processor so that it creates and stores a sourcemap in |
Yup, that was my intention, no point using a lot of time poking at ESLint core if the proposal might not make it to RFC 😄 |
I don't recall participating in conversations about source maps myself, but I did find a few previous threads. I naively approached the Markdown processor's mapping assuming that adjustments would be straightforward "subtract …Then I had to handle under-indented lines that start before column Source maps have many more features than the Markdown processor requires. If source maps can eliminate the existing translation complexity without adding undue complexity of their own, that would reduce a significant barrier for building processors. A prototype is a great way to see if that plays out. There are extensive tests for the different cases that'll indicate whether the prototype is equivalent. |
OK, so while I was initially optimistic about the capabilities of source maps, after a bit of experimentation and a closer look at exactly what a source map is representing, it looks like this isn't going to be viable to implement in a general way that ESLint could handle by itself. Just in case source maps come up again in the future, here's the problems with this approach I've found: Source maps do not create a direct mapping between all valid line/column positions in the generated and source code, nor do they map ranges of positions. They will (in general) map any column on a line to the nearest explicit mapping backwards on that line. The expectation is that mappings are generated for each parsed token, and the mapped line/column will refer to the start of the same token in the source and generated code. This means it is not possible to map from an arbitrary position in the generated code back to its exact corresponding position in the source - it can only map to the nearest mapped token position. This means that to generate a source map, it requires more than knowing what portions of the document you need to map into the generated code - a full parse is required so you can map known tokens of whatever language you are using. But crucially, it also means that the reverse mapping also requires this parse information - since the source map only gives the start position of any given token, the parsed source information is required to actually make sense of what any position mapping (especially one that is a range) is trying to refer to in the source. As a simple, but hopefully demonstrative example, if we have the following:
// generated
var someVariable = someFunctionCall("something something tyqo something"); There's a few ways we might think about constructing a source map:
Now let's look at what happens if we had a hypothetical rule that highlighted the typo in the string, e.g.
So, why don't any of these work for ESLint?
So, TLDR, source maps aren't a magic mapping tool that will allow ESLint to map line/column positions automatically for any arbitrary generated code. They would require ESLint to have some knowledge of the structure of the original source, and the whole point of processors is that they're keeping anything non-Javascript at arms length from ESLint. |
Having gone through all this though, and given the complexity present in the markdown processor to manage what is mostly a 1-1 mapping of JS from the source to the generated code, I think there's potentially room for an alternative here that could still help eliminate a lot of the finicky mapping code. As a broad stroke, the idea would be that instead of providing a string as the processed code, the processor could create a This could then allow ESLint to perform the exact mapping when possible, and apply auto-fixes if the fix covers a contiguous mapped region. Any problems/fixes that can't be automatically mapped would then be passed to Imagined example for the markdown processor: function generateMappingForCodeBlock(generatedFileName, text, node, directives) {
const generatedCode = new GeneratedCode(generatedFileName, text);
for (const directive of directives) {
generatedCode.writeUnmapped("/* ");
generatedCode.writeFromSource(directive.position);
generatedCode.writeUnmapped(" */\n");
}
const baseIndent = getBaseIndent(text, node);
const startLine = node.position.start.line + 1;
const endLine = node.position.end.line - 1;
if (baseIndent === 0) {
// The entire block is mapped 1-1, so just write the whole thing.
generatedCode.writeFromSource({ start: startLine, end: endLine });
} else {
// The block is indented, so we need to write each line individually.
for (const lineNo = startLine; lineNo <= endLine; lineNo++) {
generatedCode.writeFromSource({ start: { line: lineNo, column: baseIndent + 1 }, end: lineNo });
}
}
return generatedCode;
} Obviously there's some conventions I've introduced in there about exactly how you can specify positions by eliding column info, that's to make things a bit easier on the developer by punting things like "what's the last column of the line?" into the code generator. And note that at no point does the developer have to think about the position information in the generated source - it's all tracked by
Internally, the code generator keeps track of mappings from the generated code to the source, handling intricacies like writing multiple lines at once, or how long lines are etc, and can then both generate the final processed code for ESLint and exactly map (hopefully most) problems back to the source text. In this case, the markdown processor would then only need to handle the edge cases in
Obviously, this would get much more complicated if there's a more complex transformation going on than picking bits of JS code verbatim out from other file formats, but I think what's proposed above potentially covers a lot of what processors were originally designed to handle, and if more complex transformations are needed, they can just drop back down to manual handling. |
Thanks for all of the information on source maps, that's very helpful and will undoubtedly help when future discussions come on up. For the |
ESLint version
8.45.0
What problem do you want to solve?
Processors as currently specified are required to perform all the work of mapping to plain JS, and then mapping ESLint's error messages back to the source file. That keeps ESLint core tight and allows arbitrary files to be processed, it's a great extension point.
But it can be daunting to figure out the post-processing step, and mapping back from the intermediate files to the source file.
What do you think is the correct solution?
I'd like to propose expanding the processor functionality by adding support for returning Source Maps alongside the pre-processed files from
preprocess
.If a source map is provided, then ESLint could automatically resolve the source location of its own warning messages, and the file/messages would not need passing back to
postprocess
. I believe this means it could also automatically apply any fixes, though I'm not 100% on that!In the best-case, where every file returned from
preprocess
has an associated source map,postprocess
wouldn't even need to be called - there'd be nothing for it to do.This could look like the following:
This would mean that if users wanted to integrate other tools that already generate source maps as processors, this would be almost trivial to accomplish 😄
Given that mozilla provides a well-supported library for processing source maps, I don't think this would be too complex to integrate into ESLint.
Participation
Additional comments
I understand that this is already possible to accomplish today by processor developers by writing in the support themselves and keeping their own map of files -> sourcemaps that
preprocess
/postprocess
share. But I think giving this first-class support could make that implementation a lot cleaner, and could encourage new processor development.Some of this might be eliminated by #16999, but equally implementing an entire language plugin might be more daunting if the same result could be accomplished via a processor and source maps 😄.
The motivating example for this was expressed in #17380 and #17397, where source map support could make this sort of pre-processor easier to implement as a processor in ESLint.
The text was updated successfully, but these errors were encountered: