Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline sourcesContent vs. sources files: which should maintainers use? #42

Open
justingrant opened this issue May 17, 2023 · 13 comments
Open

Comments

@justingrant
Copy link

There are two ways to expose source code with sourcemaps: either the code can be in sourcesContent or maintainers can ship their source files that are pointed to by sources. (Hopefully not both, although I don't think both is illegal in the spec, which may be a mistake.)

Many maintainers don't like shipping the files in sources, because it generates support issues for them. Here's a few of them, excerpted from this issue:

  • If I ship sources, then users will mistakenly think they can patch my TS code and complain when that does nothing.
  • If I ship sources, then users will mistakenly import .ts files directly e.g. import { foo } from 'some-package/src/bar.ts'; and when I change my TS code, e.g. to upgrade to a new TS version, they'll file bugs and complain that I broke their app even though they're not using my public API.
  • Bundlers may be configured to compile TS code in node_modules.
  • Poorly-configured packages may include TS sources and transpiled JS in the same folder, leading the TS file to be resolved first by import or require.

However, sourcesContent also has problems:

  • It makes map files larger, which could slow down use cases that don't require the actual sources (e.g. generating call stacks)
  • VS Code currently treats sourcesContent as a second-class citizen. If sources files are not present, several VS Code workflows don't work well, or not at all.
  • For developers who want to navigate through library code when a debugger isn't running, e.g. to understand how the library works, sourcesContent isn't really helpful.

Should the new sourcemaps spec have an opinion about whether sourcesContent or sources are the best option for maintainers? Would it simplify things a lot for the ecosystem if there were just one recommended solution?

@kamilogorek
Copy link
Member

This is somewhat in line with one of the goals listed in #22

Associating the original source file with the compiled/minified file

  • Harden the spec: Specify semantics of URLs and resolution

By achieving the goal above, it'd make using sources way easier than it is now, however from my perspective, it's still not that trivial nonetheless.
That's one of the reasons why we at Sentry internally rewrite all source maps to include sourcesContent instead of sources if possible, by inlining them during the upload process - https://github.com/getsentry/rust-sourcemap/blob/6088d5e6ec7212332bf91ea82f91f7f3eac4e848/src/types.rs#L27-L30

@mitsuhiko
Copy link
Contributor

We touched on this briefly in one of the recent meetings and I think different folks have different wishes here. While I think we also hate large JSON files, we still prefer large JSON files over accidentally loading the wrong things.

@sjrd
Copy link

sjrd commented May 17, 2023

During iterative development at least, sources are much more efficient than sourcesContent. You don't want to copy over the contents of every file you touch every time you regenerate a source map file. For distribution, things may be different, as you're not regenerating the source map every few seconds.

@mitsuhiko
Copy link
Contributor

Yes, there is a big difference between local development / browser debug tools and the situation you have in a post-hoc tool like Sentry.

@justingrant
Copy link
Author

I wonder if there could be some simplification where we recommend that packages be published one way (either as sourcesContent or sources) and then there'd be an install-time option in the npm/yarn/etc CLI that could change to the other way... or even omit installing of sourcemaps altogether in environments where disk space is scarce.

One thing I've observed consistently is that sourcesContent is more robust in the face of build-step and configuration mistakes. I've filed so many PRs against OSS packages where sources point to files that are not in the npm package. Here's a few common reasons that I've seen these problems:

The core problem is that any time two different files need to be in sync, it dramatically increases the chance of failure. The more atomic sourcemaps are, the harder it is to screw up. Sometimes I wonder if the best solution solution might be if all debug info were in a single file (e.g. sourcemap.tgz) at the root of the package as as sibling to package.json. Then sourcemaps would either always be there or never be there for the entire package.

@llllvvuu
Copy link

  • VS Code currently treats sourcesContent as a second-class citizen. If sources files are not present, several VS Code workflows don't work well, or not at all.

These workflows (go-to-definitions and breakpoints) are actually really important IMO. Worth noting that virtual files increases complexity not only for VS Code but for all IDEs. Rehydrating the original physical files as per #42 (comment) is a neat solution though.

@ehoogeveen-medweb
Copy link

Another reason to avoid sourcesContent is to avoid exposing the original source code, which may contain comments and other details (e.g. unmangled variable names) that aren't necessarily intended to be public. It may be a somewhat crude form of security, but I think it's one to keep in mind.

In my experience however, if browsers are able to load a source map then they will often assume that files in sources will be reachable and if they aren't, debugging or unminifying via the devtools may break.

Personally I would prefer to see browsers behave better in this situation, as requiring source maps to be unreachable if the sources are unreachable means that (depending on what exactly the server returns) you may get a warning in the devtools for every unreachable source map.

@mitsuhiko
Copy link
Contributor

Another reason to avoid sourcesContent is to avoid exposing the original source code, which may contain comments and other details (e.g. unmangled variable names) that aren't necessarily intended to be public. It may be a somewhat crude form of security, but I think it's one to keep in mind.

This is mostly why I hope we can get debugId or something like this in. Source maps really should not be hosted publicly most of the time (with or without sources content). On the other hand sourcesContent is absolutely necessary for a lot of tools to work properly at all as they are parsing sources to restore scopes.

@justingrant
Copy link
Author

Source maps really should not be hosted publicly most of the time (with or without sources content).

IMO it's helpful to clearly differentiate two different sourcemap use cases:

  • Sourcemaps for libraries, where sources must be publicly and easily available so that app developers can use IDE features like debug breakpoints, debugger step into, debugger watch using from-source variable names, editor "Go to Implementation", etc. (Sometimes libraries are non-public, but this is a relatively rare case that can be handled via some opt-out special-casing in build steps.)
  • Sourcemaps for apps, where generally the only reason to have sourcemaps is to enable the app's own engineers (and tools that they use, like Sentry or IDE debuggers) to see the sources when debugging. These will be private by default.

I think you're talking about app sourcemaps only, right?

IMO, enabling all OSS libraries to publicly expose their original sources in an IDE-friendly way should be a top goal of this working group, because debugging an app becomes vastly harder if the only way to see actual library sources is to clone the repo from GitHub and manually try to associate the transpiled and/or minified code being debugged with the sources from GitHub. I have to do that frequently, and it's horribly inefficient.

avoid exposing the original source code, which may contain comments and other details (e.g. unmangled variable names) that aren't necessarily intended to be public. It may be a somewhat crude form of security, but I think it's one to keep in mind.

My understanding of sourcemap generation is that it has two logical parts: first generating sourcemaps for an app's source code, and then combining that sourcemap with sourcemaps for all libraries used by the app. I'm not sure if bundlers actually do it this way, but at least this is how I think about it without thinking about implementation details.

If sourcemap-generating options (like blanking comments and/or obscuring variable names) are something that build tools want to support, then I think it's probably helpful to scope those options to the first, per-package step above (generating sourcemaps from original sources) and not globally at the "merge sourcemaps together" step, because it seems quite helpful for technically-adept users to debug into and report bugs about problems in apps (or in OSS libraries used by those apps) even when those apps' owners don't want to show their (non-OSS-library) proprietary sources to the world.

Note that sourcesContent would seem to be a better fit than sources for cases where an app developer wants to present different source code than the app's actual sources. Otherwise you'd have two similar-but-not-identical files sitting on every developer's machine after builds, which seems like a recipe for accidentally looking at the wrong files and not realizing it.

@mitsuhiko
Copy link
Contributor

I think you're talking about app sourcemaps only, right?

Personally I think these two problems are the same. In either case I would like to disassociate the downloading of the source maps from the built artifacts (see also #41 (comment)).

IMO, enabling all OSS libraries to publicly expose their original sources in an IDE-friendly way should be a top goal of this working group

That I agree with in principle, but I'm not convinced that the solution will be publicly hosting them identified by a URL. But that I believe is a conversation for the other thread.

sources itself raises a lot of problems and in practice right now I vastly prefer when people ship the source. See for instance this issue: #44

@llllvvuu
Copy link

llllvvuu commented Sep 3, 2023

IMO it's helpful to clearly differentiate two different sourcemap use cases:

It's also reasonable for sourcemaps to look different in each case (not that I'm necessarily advocating for), since the bundler will transform the library sourcemap into an app sourcemap. So sources in dev can become sourcesContent in prod. Actually this may be the way many people are doing right now.

@ehoogeveen-medweb
Copy link

I'd like to walk back my earlier comment as your comments made me realize that once the source map itself is private, it doesn't matter whether it links to similarly private sources or contains them. Similarly, a source map with inaccessible sources is not useful because there's nothing to map onto.

I think the reason I wanted them separate initially is that usually, CLI tools that generate source maps place them in the same path as the files that use them, which means you can't make them private with security on a directory level (which is what I'm limited to currently). But because being unable to load the sources referenced by the source map seemed to break the devtools of various browsers, I moved the source maps to a different path with the same security as the source itself.

So I no longer have a fundamental concern about sourcesContent from a security point of view, although I think the options offered by CLI tools can be a limiting factor in practice (in my case, I moved away from CLI tools entirely).

@justingrant
Copy link
Author

justingrant commented Sep 4, 2023

In either case I would like to disassociate the downloading of the source maps from the built artifacts (see also #41 (comment)).

This seems like a good goal, but I think the right path to achieve it may be different for the app case vs. the OSS library case.

In the OSS library case, there's an npm CLI involved in both the publishing and downloading (aka installing) ends. So I could imagine a solution that adds options to npm install (e.g. npm i --sourcemapsRemoved and npm i --sourcemapsOnly) that could enable that decoupling for all packages, rather than waiting for maintainers to update their build pipelines to upload packages and sourcemaps separately. (And introducing new points of failure into an already-brittle publishing pipeline.).

I've been trying to get OSS maintainers to fix their sourcemaps for 4+ years, and it's an uphill battle. Any solution that is implemented in the platform pieces (npm, browsers, IDEs, etc.) will deliver value to developers many (for some packages, infinite) years quicker than something that requires maintainers to change anything.

(npm isn't the only package manager or package repo, but IMO the same points above apply to the small number of other package managers and repos: it's straightforward to change the installer to either omit sourcemaps or omit non-sourcemap files from a download.)

For apps (and private libraries), a few things are true that are different than for OSS libraries:

  • The source may be private
  • Even if for some reason the source is not private, app sourcemaps' primary purpose is to help the app developer and her colleagues at the same company, not to help external developers learn how to use and/or file bugs and/or fork the app.
  • The sourcemaps may be submitted to 3rd-party services like Sentry, or internal tools that use sourcemaps to help in the development, QA, and/or tech support process.

For the app case, it seems like the right place to do the decoupling would be in a build step and/or bundler plugin. I could even see sourcemap consumers like Sentry taking the lead in creating and maintaining such a tool if it combined stripping sourcemaps from bundles and submitting those sourcemaps to a server like Sentry's in a single CLI tool.

I guess it's possible for the same build step or plugin to send both sourcemap-stripped code and sourcemaps to npm too, but at that point npm has to deal with sourcemaps getting out of sync to the package, being built with different source, etc. Those problems wouldn't exist if npm was doing the splitting on the server API handler. So it seems unlikely that npm would want to offer a publish API or CLI that accepted two separate code vs. sourcemaps files as inputs.

Also, for legacy consumers of npm packages who don't know about splitting, the full package (including sourcemaps) still needs to be available somehow via npm install, so if sourcemaps were separately submittable from packages, then NPM still needs to do the work to merge them into a single download... and somehow fix its checksum. Seems easier for NPM to go in the reverse direction and do the splitting on install instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants