Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node.js APIs are inconvenient to use with URL strings #48994

Closed
GeoffreyBooth opened this issue Aug 1, 2023 · 46 comments
Closed

Node.js APIs are inconvenient to use with URL strings #48994

GeoffreyBooth opened this issue Aug 1, 2023 · 46 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. path Issues and PRs related to the path subsystem. stale url Issues and PRs related to the legacy built-in url module.

Comments

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Aug 1, 2023

Problem

Spinning off from #48740 (comment), many of our APIs such as fs.readFile accept URL instances (what you get from new URL) but not URL strings (like import.meta.url, or the return value of the soon-to-be-unflagged import.meta.resolve). In the case of many (all?) of these, all strings are interpreted as paths. This is frustrating since in ESM, we have easy access to URL strings such as import.meta.url but path strings require using helpers such as fileURLToPath.

Original Idea

Please see discussion in the full thread below; while this first idea kicked us off, we’re still brainstorming other solutions. If/when we reach a consensus on ideas that seem worth implementing, I’ll update this top post with a summary.

Wherever feasible, all Node.js APIs that can accept URL strings should do so. In particular this is most relevant to the fs APIs, especially the ones that already accept URL instances. To avoid ambiguity with path strings, such APIs should only interpret URL strings that begin with file:.

I presume that this would be a semver-major change, to avoid needing to first check for the existence of a file or folder named file: in the local path; or perhaps we could add such a check now in order to land this and backport it, and remove such a check in a semver major.

We would also need to consider the security implications. Per @aduh95:

the fact is that readFile('file:///etc/passwd') currently throws (unless there is a local folder named file:, but that’s unlikely), it would concern me if we were to release a new version of node where it no longer throws.

I don’t really see how this is a security concern, but I concede that there might be issues to consider. Perhaps some can be addressed via permissions or policies. I do feel however that since URL strings are so prevalent in ESM, we should require a high bar for security concerns to outweigh usability for this feature.

cc @nodejs/loaders @nodejs/modules @nodejs/security

@GeoffreyBooth GeoffreyBooth added fs Issues and PRs related to the fs subsystem / file system. url Issues and PRs related to the legacy built-in url module. path Issues and PRs related to the path subsystem. feature request Issues that request new features to be added to Node.js. esm Issues and PRs related to the ECMAScript Modules implementation. labels Aug 1, 2023
@Qard
Copy link
Member

Qard commented Aug 2, 2023

The only part of ESM that consistently has a URL that I'm aware of is import.meta.url. Specifiers can accept URLs, but the vast majority of uses are path strings. I don't see how changing everything, and in a knowingly backwards-incompatible way, would be preferable to just adding a path string to import.meta alongside the URL that's already there. 🤷🏻

File URLs are not paths and I don't think they ever should have been treated as such. The absolute nature is verbose, which has poor usability, and putting them in the URL class conflates structural validation of the URL format with correctness of the path string which is inaccurate. There are valid paths on some systems which are not valid URLs.

I'm 👎🏻 on this. I feel even supporting URL instances in the fs API was a mistake. If we wanted a typed representation we should have proposed a Path type as an equivalent to URL specifically for file paths.

@tniessen
Copy link
Member

tniessen commented Aug 2, 2023

@GeoffreyBooth In the very example you gave, @aduh95 explained that file: may refer to a directory within the process's current working directory. Are you suggesting an ambiguous interpretation of such paths? If so, that seems like something we absolutely should not do.

@GeoffreyBooth
Copy link
Member Author

I don’t see how changing everything, and in a knowingly backwards-incompatible way, would be preferable to just adding a path string to import.meta alongside the URL that’s already there. 🤷🏻

With regard to the second point, I would be happy to add a path string to import.meta if we can get a WinterCG-blessed API to do so. Whether or not we add that doesn’t affect whether or not we make the change I propose here; we can do both.

As for “changing everything,” currently readFile‘s first parameter accepts all the following forms of input: string, Buffer, URL, or FileHandle. It would still allow all those forms of input. Currently, string input throws if it’s a file URL (unless the user happens to have a folder or file literally named file:, with a colon in the name). What I’m proposing here is that it no longer throw, and use the presence of file: as the first five characters of the string as the method to disambiguate paths from URL strings.

In the very example you gave, @aduh95 explained that file: may refer to a directory within the process’s current working directory. Are you suggesting an ambiguous interpretation of such paths? If so, that seems like something we absolutely should not do.

In Node today it might refer literally to a folder named file:, with a colon, yes. In my proposal, if a user actually has such a folder, they can continue to reference it as a path by prepending ./ to it or making it an absolute path.

I’m not proposing making the string parsing ambiguous. We have two options for how to handle disambiguation, and we would document what method we’re choosing:

  1. Strings beginning with file: are always interpreted as URL strings and that’s it. Everything else is a path, like today.

  2. When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

The first option would be a semver-major change, because of the extreme edge case of a file or folder actually named file:. The second option could land today and get backported, unless people think that changing the current “throw on URL string” behavior would be breaking to change. I would suggest that even if we land and backport the second option, we switch to the first option in the next major; or we could just go straight to option 1 as a semver major.

@targos
Copy link
Member

targos commented Aug 2, 2023

I'm not convinced this option would help. People don't want (or very rarely) to readFile(import.meta.url).
They want to manipulate paths with the path module first (for example to read a data file relatively to the module's path).
For that, they have to convert the file URL string to a path string

@GeoffreyBooth
Copy link
Member Author

People don’t want (or very rarely) to readFile(import.meta.url).

The kind of use case I had in mind was something like readFile(new URL('.', import.meta.url) + 'config/app.json'), essentially the same as readFile(join(__dirname, 'config/app.json')). You wouldn’t need join because URLs are always forward slashes regardless of operating system. It adds a bit more predictability to working with references to files, and it lets us avoid needing any helpers at all, whether join or fileURLToPath.

@LiviaMedeiros
Copy link
Contributor

LiviaMedeiros commented Aug 2, 2023

A directory name equal or starting with file: is something that easily occurs in a situation where directory structure reflects remote paths, hence things like fs.mkdir('file:///etc/npm', { recursive: true }) or fs.rm('file:///etc', { recursive: true }) switching from ./file:/ base to / might be too much of a breaking change.
We also can't have meaningful check if file with such name exists for methods that work with paths that might not exist yet (e.g. writeFile).

This is frustrating since in ESM, we have easy access to URL strings such as import.meta.url but path strings require using helpers such as fileURLToPath.

For APIs that accept URL instances, we can use new URL(import.meta.url) instead of path strings.

readFile(new URL('.', import.meta.url) + 'config/app.json')

The correct way here is readFile(new URL('config/app.json', import.meta.url))

Overall, I think the directions of improvement might be:

  • making node:path APIs accept with URL instances, converting them to absolute paths
  • Path class that would guarantee its content to be a valid path (e.g. existing methods can construct a path with NUL character or path longer than PATH_MAX without any warning) and have methods like Path.prototype.toURL() and Path.fromURL()
  • helper methods somewhere that would make path-specific parts (extension, basename, dirname, relative path, etc.) work with URL
  • if the namespace like import.meta.node gets standardized, maybe import.meta.node.URL with frozen URL instance rather than string
  • fetch('file:///path/to/local/file') that would make reading operations protocol-agnostic

@aduh95
Copy link
Contributor

aduh95 commented Aug 2, 2023

  • fetch('file:///path/to/local/file') that would make reading operations protocol-agnostic

+1 to that, related: wintercg/fetch#5

@ShogunPanda
Copy link
Contributor

2. When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

+1 to that.

I also think adding the string version of import.meta.url (for instance named import.meta.path) would also help a very lot.

@bnoordhuis
Copy link
Member

first check for the existence of a file or folder named file: in the local path

One word: TOCTOU

As to the general proposal:

  1. Have a string always mean the same thing
  2. Have a string mean different things depending on $factors

Formulated like that, I hope everyone agrees 1 is strictly superior to 2.

@JakobJingleheimer
Copy link
Contributor

  1. When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

This seems sensible to me.

As to the general proposal:

  1. Have a string always mean the same thing
  2. Have a string mean different things depending on $factors

2 is already the reality of strings. The two kinds of strings here are easily differentiated and are extremely prevalent in the wild.

@arcanis
Copy link
Contributor

arcanis commented Aug 2, 2023

Generally, I don't understand what makes url so good at manipulating filesystem paths.

2 is already the reality of strings. The two kinds of strings here are easily differentiated and are extremely prevalent in the wild.

Yes, having different APIs on Windows and Posix is really a pain (so much we ended up writing a translation layer in Yarn to only ever deal with posix path, even on Windows), but I'm not convinced using a completely different data structure, one that wasn't created for filesystem purposes, is needed to address that.

At least in this proposal we're talking about URL strings, rather than URL instances, but unless the path module also supports it, I don't see it being very useful. I certainly wouldn't use the URL constructor myself to do path operations (and that's when it's even possible - if you need to retrieve the extname for example, things become unreadable).

When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

That would make fs.writeFile('file:///tmp/foo.txt') unable to create files. It would also have consistency issues (which sometimes turn into attack vectors), and perf implications (this lookup would make file:// paths slower than others).

@arcanis
Copy link
Contributor

arcanis commented Aug 2, 2023

Also something of note: I wonder how this would interact with third-party tools that check whether something is absolute or not by checking if the first character is / or \. They would have to be adapted to also check for this file: protocol (yes, perhaps in theory they should use path.isAbsolute, but it's an easy mistake to do).

For example, I could see it being a problem on archive unpackers (tar, zip) if they protect against absolute paths by manually checking the first characters. In this scenario, a malicious archive could perhaps contain an absolute file that wouldn't be detected as such, and have write access on any file on the disk.

@tniessen
Copy link
Member

tniessen commented Aug 2, 2023

I think @bnoordhuis has summarized sufficiently why this is something that we absolutely should not do.

2 is already the reality of strings.

@JakobJingleheimer Could you elaborate on this? What different string formats does Node.js interpret beyond whatever the underlying operating system or file system implements?

@JakobJingleheimer
Copy link
Contributor

Could you elaborate on this? What different string formats does Node.js interpret beyond whatever the underlying operating system or file system implements?

I meant for the web in general:

  • file:, http:, wss:, etc are URLs
  • /foo/bar/quz are paths
  • /foo/ is a regex

It seems reasonable for readFile to accept a file URL, and if they already accept URL instances, accepting a URL string seems expected (I would guess under the hood, node immediately converts the URL instance to a url string anyway). We now have an efficient util for detecting whether a string is a valid URL, so perf isn't a concern.

@Jamesernator
Copy link

  • Path class that would guarantee its content to be a valid path (e.g. existing methods can construct a path with NUL character or path longer than PATH_MAX without any warning) and have methods like Path.prototype.toURL() and Path.fromURL()

This is what I mainly use nowadays, via a wrapper around fs/os that is essentially just Python's pathlib Path object. Having fs capabilities on such objects, like Python does, is also useful as it means I can do path manipulation and fs operations in the same abstraction.

@tniessen
Copy link
Member

tniessen commented Aug 2, 2023

It seems reasonable for readFile to accept a file URL, and if they already accept URL instances, accepting a URL string seems expected

@JakobJingleheimer I (still) strongly disagree. JavaScript and the web have a messy history that has often favored convenience over sanity, but we should learn from past mistakes and not add to them. As multiple people have explained, accepting file URLs as strings will lead either to unjustified breaking changes, ambiguity, or race conditions. None of these options seem acceptable, so I personally don't think this there's any chance this feature request will be adopted.

@bmeck
Copy link
Member

bmeck commented Aug 2, 2023

I think maybe this is the wrong question request. Making an API that would work for both or just focused on URLs (and thus able to work on all protocols) seems reasonable to suggest from my perspective, this avoids a lot of collision complexity and backwards compatibility. Additionally the behavior of /../ segments actually differs in real filesystems which traverse iteratively whereas things like the URL constructor normalize eagerly leading to surprises like:

let u = new URL('./symdir', import.meta.url)
fs.readdirSync(new URL('./..', u)).length // 58, doesn't even see symlink due to /../ normalization
fs.readdirSync(path.join(url.fileURLToPath(u), '..')).length // 53 goes through symlink

@guybedford
Copy link
Contributor

Firstly it's important not to let pragmatism be trumped by technicalities. import.meta.resolve is the way to resolve assets in ES modules.

Build tools can detect usage of const x = import.meta.resolve('./asset.raw') because it is statically analyzable like a dynamic import, and they can provide streamlined asset relocation support (eg, see ncc's asset relocation loader).

There is a big benefit to aligning on a contextual asset story that is standards based.

These edge cases are completely valid as a technical concern, but I don't think most users would appreciate the difference between readFile(URL) and readFile(URLString). While most users would certainly benefit from the standard ergonomics of readFile(import.meta.resolve('./asset')), which is in fact nicer than import.meta.filename and import.meta.dirname. I don't think we need to preclude work on those specs either though here.

If the major concern is one of compatibility and breaking edge cases, perhaps there's a flag to disable the behaviour?

A standards first asset solution would be the best line to consider in my opinion though. And if not this, then we should pick up a spec like https://github.com/tc39/proposal-asset-references and drive it towards the use case. I think this issue describes the simplest story we'll get though.

@GeoffreyBooth
Copy link
Member Author

I think the writeFile case is a good reason why the logic should be the simpler approach: strings beginning with file: are treated as URL strings, else they’re path strings. And that’s it. It’s synchronous, so there shouldn’t be any race condition concerns; and it’s simple, where we can explain it to users in a sentence. It would be a semver-major change, but there were other reasons that this was getting pushed toward being a semver-major change anyway. I think the usability benefits outweigh the edge case concerns around folders named file:.

@mcollina
Copy link
Member

mcollina commented Aug 2, 2023

@guybedford I disagree that the import.meta.resolve() is statically analyzable in the broader sense. Many of my use cases for __dirname is to ship a folder of assets (with possibly more folders within). Moreover, import.meta.resolve() still returns a file://, which means it would still need to be translated to a path for further processing.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Aug 2, 2023

The Deno docs for import.meta.resolve provide another use case for this:

const worker = new Worker(import.meta.resolve("./worker.ts"));

Our Worker constructor accepts only path strings or URL objects.

@bmeck
Copy link
Member

bmeck commented Aug 2, 2023

I'd note that data:, blob: (generally), etc. also can be done sync if we are talking more generically about things.

@tniessen
Copy link
Member

tniessen commented Aug 2, 2023

Firstly it's important not to let pragmatism be trumped by technicalities. (...) These edge cases are completely valid as a technical concern, but I don't think most users would appreciate the difference between readFile(URL) and readFile(URLString)

I think the usability benefits outweigh the edge case concerns around folders named file:.

I realize that this might be an unpopular opinion in the JavaScript space, but I firmly believe that correctness and safety outweigh convenience.

where we can explain it to users in a sentence

That's the problem. You have to explain it. Users must be aware of it. This does not make things simpler. It increases complexity. Things are simple right now: it's a URL if and only if it's URL.

If my understanding of your proposal is accurate, it leads to problematic inconsistencies.

// Throws an error.
fs.mkdirSync(new URL("http://example.com/bar"), { recursive: true })
// Creates a directory named "http:" relative to the working directory.
fs.mkdirSync("http://example.com/bar", { recursive: true })
// Creates a directory named "example.com" in the root directory.
fs.mkdirSync("file://example.com/bar", { recursive: true })

The Deno docs for import.meta.resolve provide another use case for this:

Yes, of course they do, because Deno implements Web Workers, whose first argument is a URL string. Node.js does not implement Web Workers. See #43583.

@Qard
Copy link
Member

Qard commented Aug 2, 2023

The file scheme can also include hostname presence, meaning file://other_host/home is a valid file URL pointing to /home on other_host which would mean to actually follow the spec properly we'd have to start adapting fs to start handling networked file system locations and that's a whole other giant pile of complications.

@bnoordhuis
Copy link
Member

Several people pointed out irredeemable flaws in your proposal. You can't hand-wave that away by saying "pointing out all the various edge cases [..] is a bit premature", you need to engage with them.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Aug 2, 2023

Several people pointed out irredeemable flaws in your proposal. You can’t hand-wave that away by saying “pointing out all the various edge cases [..] is a bit premature”, you need to engage with them.

No, I really don’t. I’m writing this issue as a user, not as a developer. As a user, I want to be able to use URL strings with Node APIs. That’s it, that’s the feature request. We don’t expect users to come up with implementations and solve all edge cases.

What I expect and would appreciate from the others commenting on this thread is suggestions for how to achieve my feature request. If the ideas I brainstormed don’t work, fine, propose others. It’s not my responsibility to come up with successive ideas and try to defend them. You all are very smart people and I’m sure you can contribute ideas on how to make my life as a user easier. #48994 (comment) is a great example of what I consider to be a constructive, collaborative comment.

@aduh95
Copy link
Contributor

aduh95 commented Aug 2, 2023

The only actionable request I can find in this thread is the one given by Guy:

While most users would certainly benefit from the standard ergonomics of readFile(import.meta.resolve('./asset')), which is in fact nicer than import.meta.filename and import.meta.dirname.

So maybe we should rename this issue to "Node.js import.meta.resolve should return a URL instance", and it'd be likely a more productive conversation, given that I think there's a clear opposition to add support for strings that would not be paths for APIs interfacing with the FS.

@GeoffreyBooth You claim it's a user request, but you are also suggesting a technical solution (support for URL strings in node:fs), which you should either defend, or retract the proposal and the discussion can move on to other solutions.

@GeoffreyBooth
Copy link
Member Author

The only actionable request I can find in this thread is the one given by Guy:

There were several good ideas in #48994 (comment) too.

I’m happy to rewrite the initial post but I don’t know what I would change it to. Basically I want to make working with URL strings easier, and that was an example of one place where I could see us making a change to do so. If we can narrow down a list of other places, or other actionable steps, I’ll update the top post with wherever this discussion goes.

As for import.meta.resolve, its signature is locked per spec; Guy spent years arguing with WHATWG to try to get them to allow it to be async, without luck. We can’t just change its return value from a string to a URL. However, I was just thinking how useful such an API would be; perhaps we should propose import.meta.resolveURL that does the same as import.meta.resolve but returns a URL instance.

@aduh95
Copy link
Contributor

aduh95 commented Aug 2, 2023

There were several good ideas in #48994 (comment) too.

Yes sorry, I didn't meant to disregard Livia's ideas, which are all relevant btw. But Lydia's comment is listing possible solutions, not defining the problem, and that's what I meant by "actionable requests".

We can’t just change its return value from a string to a URL

Maybe we can, what we care about is to preserve cross-compat, but returning a URL or a string is almost always transparent to the user, it would be the same kind of deviation that we have with setTimeout. The HTML spec doesn't tell what type to return, or maybe I'm missing something in the spec.
I would much prefer this solution tbh.

@GeoffreyBooth
Copy link
Member Author

The HTML spec doesn't tell what type to return, or maybe I'm missing something in the spec.
I would much prefer this solution tbh.

I can support a new API import.meta.resolveURL that is identical to import.meta.resolve but returns an URL instance. That avoids the compatibility issues. Regarding the spec, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta/resolve defines the return value as a string. I can’t find this defined in the WHATWG spec either, but I just tested in Chrome and it returns a primitive string from import.meta.resolve, as MDN says it should (and matching import.meta.url, which makes sense). I think this API and spec is pretty well locked at this point.

I guess the broader “defining the problem” is that I want to work with URLs more easily in ESM in Node. Yes another approach besides getting Node APIs to accept URL strings is to more easily get URL instances, though we’re back to adding on import.meta to do so since that’s the only way we can access information about the current module apparently. I don’t want to use file paths instead because I want to write code that works when run from HTTPS modules or data: URLs, not just file-based modules.

@ovflowd
Copy link
Member

ovflowd commented Aug 3, 2023

I probably do not have extra valuable points here, as many already have pointed out why, but I genuinely (as a user) do not think this should land (at least the way it was initially asked).

I understand that users request features, and as a user, I'm genuinely interested in knowing how the maintainers and developers of X can support me with said feature. Having that said, that doesn't mean developers should implement X because said users want.

I genuinely believe in the convenience of having simple URL strings, I genuinely like that, and as a user, I love that.

As a developer, I wonder what that said convenience implies regarding inconsistencies, security issues, and underlying complexities. Some examples in this issue have proven that the current ask might not be ideal.

I'd like to know if, instead of rectifying the same ask, the original feature request could be redesigned in a way that suits scenarios that we want without breaking existing code and creating numerous issues. This sort of breaking-change can, and will, break numerous production codes that expect said behavior to be... well.. said behavior. It reminds me (even if it's an entirely separate scenario) of the infamous discussions regarding "undefined and null and the fact that one of those should be removed".

I do believe this feature request has a future; I sympathize with the general idea, but I believe there might be different ways of accomplishing similar solutions without changing well-known APIs (and a lot of APIs). For example, import.meta.resolveURL. (I do share this pain, even in the nodejs/nodejs.org codebase!)

@LiviaMedeiros
Copy link
Contributor

LiviaMedeiros commented Aug 3, 2023

But Lydia's comment is listing possible solutions, not defining the problem, and that's what I meant by "actionable requests".

To clarify why the said suggestions are in huge offset from the initial proposal of this issue (i.e. oriented on URL instances rather than strings, and not touching fs subsystem in particular), I'd define the root problem here as "sometimes there is refusal to support and use URL objects for unknown or wrong reasons".

To address the issue of urlstrings directly, in pure theory, I think there are only two ways to tell if "X is URL":

  • X instanceof URL (or at least internal/url.isURLInstance(X) which is currently used)
  • X.toString() is expected to be valid well-formed URL by default

The second is theoretically feasible in one of two ways:

  • if X.toString() starts with ., / or \, it's a path. Otherwise it's URL
    we probably shouldn't can't make file: protocol an odd one, because it would add ambiguity (what should user expect from trying to read nfs://localhost/tmp/leases?)
  • only explicit X instanceof Path means path, any string is URL

This approach would still allow us to use "relative URLs", we just have to resolve the input string like this: new URL(X, url.pathToFileURL(process.cwd() + path.sep)). And if user needs to pass a relative path that looks like URL, all they need to do is percent-encode it, i.e. encodeURIComponent('file:') would mean the same as ./file:.

With this, it is feasible to make a pretty consistent userland wrapper around node:fs, node:child_process and etc. that will solve this exact issue in a literal way, at the cost of making "legacy paths" much less usable. Although I'm not quite sure if there are usecases where fsurl.readFile(urlstring) is much better than fs.readFile(new URL(urlstring)).
I think it's obvious that we can't have this in core because it would break everything, and we must not try to make it less breaking by adding heuristics and inconsistent behaviour.

As for urlstrings themselves, I don't think it's a big problem in returning a string that is guaranteed to be a URL (e.g. what import.meta.resolve() does), or to take a string parameter that must be an absolute or relative URL (e.g. what fetch() does), but there are problems with both:

  • parsing urlstring directly, as this is extremely error-prone. Even the trivial task of testing if protocol is file: has wrong implementations in the wild such as /file:/.test(location.href) or location.toString().match(/file:/).
  • operating on urlstring directly, as this is also extremely error-prone. For example, readFile(new URL('.', import.meta.url) + 'config/app.json') would work only when: 'config/app.json' === encodeURIComponent('config') + '/' + encodeURIComponent('app.json'), new URL('.', import.meta.url).hash === '', and new URL('.', import.meta.url).search === ''. Which are true in this particular example, but assumption that urlInstance.toString() + '/' + subPath is equivalent of path.join(basePath, subPath) is clearly not.

As for import.meta and its specs, I'd like to point out that both import.meta.url and import.meta.resolve are part of HTML Standard, hence there is no problem in Node.js having it working with URL instances, except for strict compatibility with browsers.
Personally I would prefer to see them returning URL instances, as long as it doesn't impact the performance. Apparently it doesn't even have to be frozen, the HTML specs allow to overwrite import.meta.url with any value.
The reason behind exposing import.meta.url as string is that the URL Standard explicitly asks to do so in such APIs, however it might be reasonable to expose URL instance for convenience (like how in web a lot of read-only operations on document.location would suffer if it was exposed only as document.url string).

@arcanis
Copy link
Contributor

arcanis commented Aug 3, 2023

Perhaps an alternative would be, like we have path.posix and path.win32, to have a fs.url. It becomes a little more annoying with fs.url.promises, I guess. I don't know how it'd affect the startup time.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Aug 3, 2023

Perhaps an alternative would be, like we have path.posix and path.win32, to have a fs.url. It becomes a little more annoying with fs.url.promises, I guess. I don’t know how it’d affect the startup time.

I was just thinking, if we can’t keep overloading the first parameter to the fs APIs then we could go the fs/promises route and create fs/url, where it’s the same as fs/promises except that strings are always treated as a URL. So fs/url readFile‘s first param input would still be string, Buffer, URL, or FileHandle, but the string form would always be interpreted as an URL string and never as a path.

@GeoffreyBooth GeoffreyBooth changed the title Node.js APIs that accept URL instances should also accept URL strings Node.js APIs are inconvenient to use with URL strings Aug 3, 2023
@LiviaMedeiros
Copy link
Contributor

Here's very dirty proof of concept, proxying node:fs/promises methods and interpreting input as URL strings in userland

import { readFile } from 'fsURL';
// these are all the same
await readFile('/etc/fstab'); // relative url that starts from file:///
await readFile('../../../../../../../../../etc/fstab'); // relative url that works with subdirectory depth <= 9
await readFile('file:///etc/fstab'); // absolute url
await readFile(new URL('file:///etc/fstab')); // URL instance
await readFile(Buffer.from('file:///etc/fstab')); // Buffer instance

// these point to test file assuming cwd to be one level higher
await readFile(import.meta.url); // absolute url of this file
await readFile('fsURL/test.mjs'); // relative url that starts from cwd
await readFile('./fsURL/test.mjs'); // relative url that explicitly starts from cwd

For the reasons described above, I don't think we should have this in Node.js core.

@mcollina
Copy link
Member

mcollina commented Aug 4, 2023

As a side note, support for file:// URLs in fetch() is often asked. So maybe we are onto something with this issue.

@LiviaMedeiros folks are really using path too to manipulate URLs. I think you'd need to match path.url as well.

@tniessen
Copy link
Member

tniessen commented Aug 4, 2023

I was just thinking, if we can’t keep overloading the first parameter to the fs APIs then we could go the fs/promises route and create fs/url, where it’s the same as fs/promises except that strings are always treated as a URL.

I doubt that maintaining a third (or fourth if we separate sync and async APIs) node:fs API just so that folks don't have to do the trivial conversion using fileURLToPath() is a viable approach. Besides, if strings are always treated as a URL, it will be extremely tricky to define and justify return values of, for example, readlink().

@bmeck
Copy link
Member

bmeck commented Aug 4, 2023

I rather think the idea from @mcollina above might be simpler than trying to make filesystem specific fs work like URLs. Why can't we just have new APIs that are built to handle things like in-memory and remote content? It isn't just fs that is affected here, things like fetch() already support data:. I do think mixing permissions with network and file system is always a cause for heavy security review but in this case how things actually resolve particularly with symlinks differs between the 2 modes of operation which is something to be extremely careful about.

@guybedford
Copy link
Contributor

Supporting fetch('file:///...') seems like it would be useful and Deno et al already do this I believe.

But still having support for an fs.readFile('file:///path/to/thing') where the check is startsWith('file:') as the exception that exactly does preclude a bunch of former use cases would still be interesting to explore. file: names could still be achieved with relative or absolute pathing - fs.readFile('./file:'), fs.readFile('/file:/'), so I'm still not quite sure I fully understand the argument against, short of being very very clear and intentional about the edge cases and security implications.

@Qard
Copy link
Member

Qard commented Aug 4, 2023

Why can't we just have new APIs that are built to handle things like in-memory and remote content?

I have actually been thinking it would be nice if we had a new fs API more based on web standards and with, as you suggest, the ability to map to different targets like in-memory representations, remote content, over an archive, etc. We could have various APIs which return a FileSystemDirectoryHandle and let you interact with content from any sort of source or target.

@isaacs
Copy link
Contributor

isaacs commented Aug 5, 2023

What happens if you have a directory in the cwd named file:? Handling url strings in fs smells like a really confusing and perilous security footgun, tbh.

I think it's best for security and intelligibility if either there's a node:fs/url, like @mcollina suggests, or only extend fs to handle file url objects but not url strings.

fetch('file://...') is an obvious win imo, though.

@GeoffreyBooth
Copy link
Member Author

What happens if you have a directory in the cwd named file:? Handling url strings in fs smells like a really confusing and perilous security footgun, tbh.

This was discussed above: we just define in the docs that path strings beginning with file: are interpreted as file URLs, and that’s that. Yes it’s a special case, yes it’s a breaking change, yes it means that if you actually have a folder named file: you need to reference it via ./file: or an absolute path. Many people on this thread have expressed strong opposition but the criticism seems mostly around UX (strings should always be paths, this special case is too confusing) which I don’t really find persuasive because the lack of support for file URLs is itself bad UX, so it’s really a question of choosing between one compromise or the other. I don’t recall any technical objections why it couldn’t work1, just principled objections (which are fine, UX is important, but “should we do it” is a different category from “can we do it” or “is it a security risk”).

All that said, overloading APIs that accept path strings to also accept URL strings is just one potential solution, and some of the other ideas like fs/url or fetch(fileURL) are arguably more promising to explore.

or only extend fs to handle file url objects but not url strings.

fs APIs already accept URL instances. The ask here is about convenience, trying to get to a similar level of ease as __dirname and __filename in CommonJS. And yes, maybe one of the solutions is to make URL instances more prevalent instead of URL strings. We can’t change import.meta.url or import.meta.resolve, but we could potentially add import.meta.URL (?) or import.meta.resolveURL for URL-instance versions.

Footnotes

  1. There was the technical objection around the variation of “starts with file: and exists on disk” and I’m persuaded that that particular option can’t work.

@isaacs
Copy link
Contributor

isaacs commented Aug 6, 2023

fs APIs already accept URL instances.

TIL! Sorry for the non sequitur suggestion ;)

Yes it’s a special case, yes it’s a breaking change, yes it means that if you actually have a folder named file: you need to reference it via ./file: or an absolute path.

I think this is the sort of workaround that's going to be a lot more fraught than it seems at first. Like, "breaking change" can mean "this will blow up or otherwise obviously not work unless you change your code to accommodate it", but in this case, it's more like "this will function normally, but potentially do completely the wrong thing".

If you have code that does something like:

for (const f of await readdir('.')) {
  doSomethingWithFile(f)
}

Then it's going to potentially be a juicy security target if I get that code to run after managing to create ./file:/etc/passwd or something. Yes, best practice is arguably to always path.resolve() such things, but I don't know if anyone can even provide a rough estimate about how reliably that's done in the wild. I am somewhat anxious about the security advisories I'm going to have to deal with in glob and tar (not to mention chmodr, chownr, mkdirp, etc.) if fs starts treating file:... strings as urls.

We can’t change import.meta.url or import.meta.resolve, but we could potentially add import.meta.URL (?) or import.meta.resolveURL for URL-instance versions.

If the goal is just making it easier to have something like __filename and __dirname available in ESM environments, which can be transparently passed to fs methods, then yes, I think providing a URL-object equivalent on import.meta seems pretty reasonable, assuming it doesn't run afoul of the es module specification.

Or, honestly, just telling people "wrap file://... strings in new URL(...)" seems pretty reasonable as well.

Perhaps it could also be worthwhile to add a url.pathOrFileURL(...) that will turn either a path or a file URL or file URL string into a file URL object? Then at least there'd be a single method people could wrap everything in, that'll always dtrt, and make the conversion explicit.

const pathOrFileURL = (input: string | URL): URL => {
  if (input instanceof URL) {
    if (input.protocol !== 'file:') throw new Error('not a file URL');
    return input;
  }
  return input.startsWith('file:') ? new URL(input) : pathToFileURL(input);
}

Copy link
Contributor

github-actions bot commented Feb 2, 2024

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Feb 2, 2024
Copy link
Contributor

github-actions bot commented Mar 3, 2024

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. path Issues and PRs related to the path subsystem. stale url Issues and PRs related to the legacy built-in url module.
Projects
Status: Pending Triage
Development

No branches or pull requests

17 participants