Node.js APIs are inconvenient to use with URL strings #48994

GeoffreyBooth · 2023-08-01T23:49:05Z

Problem

Spinning off from #48740 (comment), many of our APIs such as fs.readFile accept URL instances (what you get from new URL) but not URL strings (like import.meta.url, or the return value of the soon-to-be-unflagged import.meta.resolve). In the case of many (all?) of these, all strings are interpreted as paths. This is frustrating since in ESM, we have easy access to URL strings such as import.meta.url but path strings require using helpers such as fileURLToPath.

Original Idea

Please see discussion in the full thread below; while this first idea kicked us off, we’re still brainstorming other solutions. If/when we reach a consensus on ideas that seem worth implementing, I’ll update this top post with a summary.

Wherever feasible, all Node.js APIs that can accept URL strings should do so. In particular this is most relevant to the fs APIs, especially the ones that already accept URL instances. To avoid ambiguity with path strings, such APIs should only interpret URL strings that begin with file:.

I presume that this would be a semver-major change, to avoid needing to first check for the existence of a file or folder named file: in the local path; or perhaps we could add such a check now in order to land this and backport it, and remove such a check in a semver major.

We would also need to consider the security implications. Per @aduh95:

the fact is that readFile('file:///etc/passwd') currently throws (unless there is a local folder named file:, but that’s unlikely), it would concern me if we were to release a new version of node where it no longer throws.

I don’t really see how this is a security concern, but I concede that there might be issues to consider. Perhaps some can be addressed via permissions or policies. I do feel however that since URL strings are so prevalent in ESM, we should require a high bar for security concerns to outweigh usability for this feature.

cc @nodejs/loaders @nodejs/modules @nodejs/security

The text was updated successfully, but these errors were encountered:

Qard · 2023-08-02T00:42:01Z

The only part of ESM that consistently has a URL that I'm aware of is import.meta.url. Specifiers can accept URLs, but the vast majority of uses are path strings. I don't see how changing everything, and in a knowingly backwards-incompatible way, would be preferable to just adding a path string to import.meta alongside the URL that's already there. 🤷🏻

File URLs are not paths and I don't think they ever should have been treated as such. The absolute nature is verbose, which has poor usability, and putting them in the URL class conflates structural validation of the URL format with correctness of the path string which is inaccurate. There are valid paths on some systems which are not valid URLs.

I'm 👎🏻 on this. I feel even supporting URL instances in the fs API was a mistake. If we wanted a typed representation we should have proposed a Path type as an equivalent to URL specifically for file paths.

tniessen · 2023-08-02T01:19:27Z

@GeoffreyBooth In the very example you gave, @aduh95 explained that file: may refer to a directory within the process's current working directory. Are you suggesting an ambiguous interpretation of such paths? If so, that seems like something we absolutely should not do.

GeoffreyBooth · 2023-08-02T03:47:55Z

I don’t see how changing everything, and in a knowingly backwards-incompatible way, would be preferable to just adding a path string to import.meta alongside the URL that’s already there. 🤷🏻

With regard to the second point, I would be happy to add a path string to import.meta if we can get a WinterCG-blessed API to do so. Whether or not we add that doesn’t affect whether or not we make the change I propose here; we can do both.

As for “changing everything,” currently readFile‘s first parameter accepts all the following forms of input: string, Buffer, URL, or FileHandle. It would still allow all those forms of input. Currently, string input throws if it’s a file URL (unless the user happens to have a folder or file literally named file:, with a colon in the name). What I’m proposing here is that it no longer throw, and use the presence of file: as the first five characters of the string as the method to disambiguate paths from URL strings.

In the very example you gave, @aduh95 explained that file: may refer to a directory within the process’s current working directory. Are you suggesting an ambiguous interpretation of such paths? If so, that seems like something we absolutely should not do.

In Node today it might refer literally to a folder named file:, with a colon, yes. In my proposal, if a user actually has such a folder, they can continue to reference it as a path by prepending ./ to it or making it an absolute path.

I’m not proposing making the string parsing ambiguous. We have two options for how to handle disambiguation, and we would document what method we’re choosing:

Strings beginning with file: are always interpreted as URL strings and that’s it. Everything else is a path, like today.
When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

The first option would be a semver-major change, because of the extreme edge case of a file or folder actually named file:. The second option could land today and get backported, unless people think that changing the current “throw on URL string” behavior would be breaking to change. I would suggest that even if we land and backport the second option, we switch to the first option in the next major; or we could just go straight to option 1 as a semver major.

targos · 2023-08-02T04:01:10Z

I'm not convinced this option would help. People don't want (or very rarely) to readFile(import.meta.url).
They want to manipulate paths with the path module first (for example to read a data file relatively to the module's path).
For that, they have to convert the file URL string to a path string

GeoffreyBooth · 2023-08-02T04:10:45Z

People don’t want (or very rarely) to readFile(import.meta.url).

The kind of use case I had in mind was something like readFile(new URL('.', import.meta.url) + 'config/app.json'), essentially the same as readFile(join(__dirname, 'config/app.json')). You wouldn’t need join because URLs are always forward slashes regardless of operating system. It adds a bit more predictability to working with references to files, and it lets us avoid needing any helpers at all, whether join or fileURLToPath.

LiviaMedeiros · 2023-08-02T04:13:27Z

A directory name equal or starting with file: is something that easily occurs in a situation where directory structure reflects remote paths, hence things like fs.mkdir('file:///etc/npm', { recursive: true }) or fs.rm('file:///etc', { recursive: true }) switching from ./file:/ base to / might be too much of a breaking change.
We also can't have meaningful check if file with such name exists for methods that work with paths that might not exist yet (e.g. writeFile).

This is frustrating since in ESM, we have easy access to URL strings such as import.meta.url but path strings require using helpers such as fileURLToPath.

For APIs that accept URL instances, we can use new URL(import.meta.url) instead of path strings.

readFile(new URL('.', import.meta.url) + 'config/app.json')

The correct way here is readFile(new URL('config/app.json', import.meta.url))

Overall, I think the directions of improvement might be:

making node:path APIs accept with URL instances, converting them to absolute paths
Path class that would guarantee its content to be a valid path (e.g. existing methods can construct a path with NUL character or path longer than PATH_MAX without any warning) and have methods like Path.prototype.toURL() and Path.fromURL()
helper methods somewhere that would make path-specific parts (extension, basename, dirname, relative path, etc.) work with URL
if the namespace like import.meta.node gets standardized, maybe import.meta.node.URL with frozen URL instance rather than string
fetch('file:///path/to/local/file') that would make reading operations protocol-agnostic

aduh95 · 2023-08-02T05:37:26Z

fetch('file:///path/to/local/file') that would make reading operations protocol-agnostic

+1 to that, related: wintercg/fetch#5

ShogunPanda · 2023-08-02T07:13:28Z

2. When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

+1 to that.

I also think adding the string version of import.meta.url (for instance named import.meta.path) would also help a very lot.

bnoordhuis · 2023-08-02T07:46:17Z

first check for the existence of a file or folder named file: in the local path

One word: TOCTOU

As to the general proposal:

Have a string always mean the same thing
Have a string mean different things depending on $factors

Formulated like that, I hope everyone agrees 1 is strictly superior to 2.

JakobJingleheimer · 2023-08-02T08:05:43Z

When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

This seems sensible to me.

As to the general proposal:

Have a string always mean the same thing

Have a string mean different things depending on $factors

2 is already the reality of strings. The two kinds of strings here are easily differentiated and are extremely prevalent in the wild.

arcanis · 2023-08-02T08:20:46Z

Generally, I don't understand what makes url so good at manipulating filesystem paths.

2 is already the reality of strings. The two kinds of strings here are easily differentiated and are extremely prevalent in the wild.

Yes, having different APIs on Windows and Posix is really a pain (so much we ended up writing a translation layer in Yarn to only ever deal with posix path, even on Windows), but I'm not convinced using a completely different data structure, one that wasn't created for filesystem purposes, is needed to address that.

At least in this proposal we're talking about URL strings, rather than URL instances, but unless the path module also supports it, I don't see it being very useful. I certainly wouldn't use the URL constructor myself to do path operations (and that's when it's even possible - if you need to retrieve the extname for example, things become unreadable).

When parsing a string beginning with file:, Node would first check to see if a file or folder named file: existed on disk. If it does, treat the string as a path; otherwise, treat the string as an URL string. Strings not beginning with file: are paths, like today.

That would make fs.writeFile('file:///tmp/foo.txt') unable to create files. It would also have consistency issues (which sometimes turn into attack vectors), and perf implications (this lookup would make file:// paths slower than others).

arcanis · 2023-08-02T08:27:40Z

Also something of note: I wonder how this would interact with third-party tools that check whether something is absolute or not by checking if the first character is / or \. They would have to be adapted to also check for this file: protocol (yes, perhaps in theory they should use path.isAbsolute, but it's an easy mistake to do).

For example, I could see it being a problem on archive unpackers (tar, zip) if they protect against absolute paths by manually checking the first characters. In this scenario, a malicious archive could perhaps contain an absolute file that wouldn't be detected as such, and have write access on any file on the disk.

tniessen · 2023-08-02T08:38:34Z

I think @bnoordhuis has summarized sufficiently why this is something that we absolutely should not do.

2 is already the reality of strings.

@JakobJingleheimer Could you elaborate on this? What different string formats does Node.js interpret beyond whatever the underlying operating system or file system implements?

JakobJingleheimer · 2023-08-02T08:54:19Z

Could you elaborate on this? What different string formats does Node.js interpret beyond whatever the underlying operating system or file system implements?

I meant for the web in general:

file:, http:, wss:, etc are URLs
/foo/bar/quz are paths
/foo/ is a regex

It seems reasonable for readFile to accept a file URL, and if they already accept URL instances, accepting a URL string seems expected (I would guess under the hood, node immediately converts the URL instance to a url string anyway). We now have an efficient util for detecting whether a string is a valid URL, so perf isn't a concern.

Jamesernator · 2023-08-02T11:38:10Z

Path class that would guarantee its content to be a valid path (e.g. existing methods can construct a path with NUL character or path longer than PATH_MAX without any warning) and have methods like Path.prototype.toURL() and Path.fromURL()

This is what I mainly use nowadays, via a wrapper around fs/os that is essentially just Python's pathlib Path object. Having fs capabilities on such objects, like Python does, is also useful as it means I can do path manipulation and fs operations in the same abstraction.

tniessen · 2023-08-02T11:50:10Z

It seems reasonable for readFile to accept a file URL, and if they already accept URL instances, accepting a URL string seems expected

@JakobJingleheimer I (still) strongly disagree. JavaScript and the web have a messy history that has often favored convenience over sanity, but we should learn from past mistakes and not add to them. As multiple people have explained, accepting file URLs as strings will lead either to unjustified breaking changes, ambiguity, or race conditions. None of these options seem acceptable, so I personally don't think this there's any chance this feature request will be adopted.

bmeck · 2023-08-02T13:47:52Z

I think maybe this is the wrong question request. Making an API that would work for both or just focused on URLs (and thus able to work on all protocols) seems reasonable to suggest from my perspective, this avoids a lot of collision complexity and backwards compatibility. Additionally the behavior of /../ segments actually differs in real filesystems which traverse iteratively whereas things like the URL constructor normalize eagerly leading to surprises like:

let u = new URL('./symdir', import.meta.url)
fs.readdirSync(new URL('./..', u)).length // 58, doesn't even see symlink due to /../ normalization
fs.readdirSync(path.join(url.fileURLToPath(u), '..')).length // 53 goes through symlink

guybedford · 2023-08-02T15:34:46Z

Firstly it's important not to let pragmatism be trumped by technicalities. import.meta.resolve is the way to resolve assets in ES modules.

Build tools can detect usage of const x = import.meta.resolve('./asset.raw') because it is statically analyzable like a dynamic import, and they can provide streamlined asset relocation support (eg, see ncc's asset relocation loader).

There is a big benefit to aligning on a contextual asset story that is standards based.

These edge cases are completely valid as a technical concern, but I don't think most users would appreciate the difference between readFile(URL) and readFile(URLString). While most users would certainly benefit from the standard ergonomics of readFile(import.meta.resolve('./asset')), which is in fact nicer than import.meta.filename and import.meta.dirname. I don't think we need to preclude work on those specs either though here.

If the major concern is one of compatibility and breaking edge cases, perhaps there's a flag to disable the behaviour?

A standards first asset solution would be the best line to consider in my opinion though. And if not this, then we should pick up a spec like https://github.com/tc39/proposal-asset-references and drive it towards the use case. I think this issue describes the simplest story we'll get though.

GeoffreyBooth · 2023-08-02T16:10:02Z

I think the writeFile case is a good reason why the logic should be the simpler approach: strings beginning with file: are treated as URL strings, else they’re path strings. And that’s it. It’s synchronous, so there shouldn’t be any race condition concerns; and it’s simple, where we can explain it to users in a sentence. It would be a semver-major change, but there were other reasons that this was getting pushed toward being a semver-major change anyway. I think the usability benefits outweigh the edge case concerns around folders named file:.

mcollina · 2023-08-02T16:13:34Z

@guybedford I disagree that the import.meta.resolve() is statically analyzable in the broader sense. Many of my use cases for __dirname is to ship a folder of assets (with possibly more folders within). Moreover, import.meta.resolve() still returns a file://, which means it would still need to be translated to a path for further processing.

GeoffreyBooth · 2023-08-02T16:50:44Z

The Deno docs for import.meta.resolve provide another use case for this:

const worker = new Worker(import.meta.resolve("./worker.ts"));

Our Worker constructor accepts only path strings or URL objects.

bmeck · 2023-08-02T16:59:15Z

I'd note that data:, blob: (generally), etc. also can be done sync if we are talking more generically about things.

tniessen · 2023-08-02T17:00:49Z

Firstly it's important not to let pragmatism be trumped by technicalities. (...) These edge cases are completely valid as a technical concern, but I don't think most users would appreciate the difference between readFile(URL) and readFile(URLString)

I think the usability benefits outweigh the edge case concerns around folders named file:.

I realize that this might be an unpopular opinion in the JavaScript space, but I firmly believe that correctness and safety outweigh convenience.

where we can explain it to users in a sentence

That's the problem. You have to explain it. Users must be aware of it. This does not make things simpler. It increases complexity. Things are simple right now: it's a URL if and only if it's URL.

If my understanding of your proposal is accurate, it leads to problematic inconsistencies.

// Throws an error.
fs.mkdirSync(new URL("http://example.com/bar"), { recursive: true })
// Creates a directory named "http:" relative to the working directory.
fs.mkdirSync("http://example.com/bar", { recursive: true })
// Creates a directory named "example.com" in the root directory.
fs.mkdirSync("file://example.com/bar", { recursive: true })

The Deno docs for import.meta.resolve provide another use case for this:

Yes, of course they do, because Deno implements Web Workers, whose first argument is a URL string. Node.js does not implement Web Workers. See #43583.

Qard · 2023-08-02T19:14:49Z

The file scheme can also include hostname presence, meaning file://other_host/home is a valid file URL pointing to /home on other_host which would mean to actually follow the spec properly we'd have to start adapting fs to start handling networked file system locations and that's a whole other giant pile of complications.

bnoordhuis · 2023-08-02T21:38:42Z

Several people pointed out irredeemable flaws in your proposal. You can't hand-wave that away by saying "pointing out all the various edge cases [..] is a bit premature", you need to engage with them.

GeoffreyBooth · 2023-08-02T21:49:29Z

Several people pointed out irredeemable flaws in your proposal. You can’t hand-wave that away by saying “pointing out all the various edge cases [..] is a bit premature”, you need to engage with them.

No, I really don’t. I’m writing this issue as a user, not as a developer. As a user, I want to be able to use URL strings with Node APIs. That’s it, that’s the feature request. We don’t expect users to come up with implementations and solve all edge cases.

What I expect and would appreciate from the others commenting on this thread is suggestions for how to achieve my feature request. If the ideas I brainstormed don’t work, fine, propose others. It’s not my responsibility to come up with successive ideas and try to defend them. You all are very smart people and I’m sure you can contribute ideas on how to make my life as a user easier. #48994 (comment) is a great example of what I consider to be a constructive, collaborative comment.

aduh95 · 2023-08-02T22:04:33Z

The only actionable request I can find in this thread is the one given by Guy:

While most users would certainly benefit from the standard ergonomics of readFile(import.meta.resolve('./asset')), which is in fact nicer than import.meta.filename and import.meta.dirname.

So maybe we should rename this issue to "Node.js import.meta.resolve should return a URL instance", and it'd be likely a more productive conversation, given that I think there's a clear opposition to add support for strings that would not be paths for APIs interfacing with the FS.

@GeoffreyBooth You claim it's a user request, but you are also suggesting a technical solution (support for URL strings in node:fs), which you should either defend, or retract the proposal and the discussion can move on to other solutions.

GeoffreyBooth · 2023-08-02T22:09:48Z

The only actionable request I can find in this thread is the one given by Guy:

There were several good ideas in #48994 (comment) too.

I’m happy to rewrite the initial post but I don’t know what I would change it to. Basically I want to make working with URL strings easier, and that was an example of one place where I could see us making a change to do so. If we can narrow down a list of other places, or other actionable steps, I’ll update the top post with wherever this discussion goes.

As for import.meta.resolve, its signature is locked per spec; Guy spent years arguing with WHATWG to try to get them to allow it to be async, without luck. We can’t just change its return value from a string to a URL. However, I was just thinking how useful such an API would be; perhaps we should propose import.meta.resolveURL that does the same as import.meta.resolve but returns a URL instance.

aduh95 · 2023-08-02T22:30:56Z

There were several good ideas in #48994 (comment) too.

Yes sorry, I didn't meant to disregard Livia's ideas, which are all relevant btw. But Lydia's comment is listing possible solutions, not defining the problem, and that's what I meant by "actionable requests".

We can’t just change its return value from a string to a URL

Maybe we can, what we care about is to preserve cross-compat, but returning a URL or a string is almost always transparent to the user, it would be the same kind of deviation that we have with setTimeout. The HTML spec doesn't tell what type to return, or maybe I'm missing something in the spec.
I would much prefer this solution tbh.

GeoffreyBooth · 2023-08-03T04:32:25Z

The HTML spec doesn't tell what type to return, or maybe I'm missing something in the spec.
I would much prefer this solution tbh.

I can support a new API import.meta.resolveURL that is identical to import.meta.resolve but returns an URL instance. That avoids the compatibility issues. Regarding the spec, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta/resolve defines the return value as a string. I can’t find this defined in the WHATWG spec either, but I just tested in Chrome and it returns a primitive string from import.meta.resolve, as MDN says it should (and matching import.meta.url, which makes sense). I think this API and spec is pretty well locked at this point.

I guess the broader “defining the problem” is that I want to work with URLs more easily in ESM in Node. Yes another approach besides getting Node APIs to accept URL strings is to more easily get URL instances, though we’re back to adding on import.meta to do so since that’s the only way we can access information about the current module apparently. I don’t want to use file paths instead because I want to write code that works when run from HTTPS modules or data: URLs, not just file-based modules.

ovflowd · 2023-08-03T13:54:25Z

I probably do not have extra valuable points here, as many already have pointed out why, but I genuinely (as a user) do not think this should land (at least the way it was initially asked).

I understand that users request features, and as a user, I'm genuinely interested in knowing how the maintainers and developers of X can support me with said feature. Having that said, that doesn't mean developers should implement X because said users want.

I genuinely believe in the convenience of having simple URL strings, I genuinely like that, and as a user, I love that.

As a developer, I wonder what that said convenience implies regarding inconsistencies, security issues, and underlying complexities. Some examples in this issue have proven that the current ask might not be ideal.

I'd like to know if, instead of rectifying the same ask, the original feature request could be redesigned in a way that suits scenarios that we want without breaking existing code and creating numerous issues. This sort of breaking-change can, and will, break numerous production codes that expect said behavior to be... well.. said behavior. It reminds me (even if it's an entirely separate scenario) of the infamous discussions regarding "undefined and null and the fact that one of those should be removed".

I do believe this feature request has a future; I sympathize with the general idea, but I believe there might be different ways of accomplishing similar solutions without changing well-known APIs (and a lot of APIs). For example, import.meta.resolveURL. (I do share this pain, even in the nodejs/nodejs.org codebase!)

LiviaMedeiros · 2023-08-03T14:04:52Z

But Lydia's comment is listing possible solutions, not defining the problem, and that's what I meant by "actionable requests".

To clarify why the said suggestions are in huge offset from the initial proposal of this issue (i.e. oriented on URL instances rather than strings, and not touching fs subsystem in particular), I'd define the root problem here as "sometimes there is refusal to support and use URL objects for unknown or wrong reasons".

To address the issue of urlstrings directly, in pure theory, I think there are only two ways to tell if "X is URL":

X instanceof URL (or at least internal/url.isURLInstance(X) which is currently used)
X.toString() is expected to be valid well-formed URL by default

The second is theoretically feasible in one of two ways:

if X.toString() starts with ., / or \, it's a path. Otherwise it's URL
we probably shouldn't can't make file: protocol an odd one, because it would add ambiguity (what should user expect from trying to read nfs://localhost/tmp/leases?)
only explicit X instanceof Path means path, any string is URL

This approach would still allow us to use "relative URLs", we just have to resolve the input string like this: new URL(X, url.pathToFileURL(process.cwd() + path.sep)). And if user needs to pass a relative path that looks like URL, all they need to do is percent-encode it, i.e. encodeURIComponent('file:') would mean the same as ./file:.

With this, it is feasible to make a pretty consistent userland wrapper around node:fs, node:child_process and etc. that will solve this exact issue in a literal way, at the cost of making "legacy paths" much less usable. Although I'm not quite sure if there are usecases where fsurl.readFile(urlstring) is much better than fs.readFile(new URL(urlstring)).
I think it's obvious that we can't have this in core because it would break everything, and we must not try to make it less breaking by adding heuristics and inconsistent behaviour.

As for urlstrings themselves, I don't think it's a big problem in returning a string that is guaranteed to be a URL (e.g. what import.meta.resolve() does), or to take a string parameter that must be an absolute or relative URL (e.g. what fetch() does), but there are problems with both:

parsing urlstring directly, as this is extremely error-prone. Even the trivial task of testing if protocol is file: has wrong implementations in the wild such as /file:/.test(location.href) or location.toString().match(/file:/).
operating on urlstring directly, as this is also extremely error-prone. For example, readFile(new URL('.', import.meta.url) + 'config/app.json') would work only when: 'config/app.json' === encodeURIComponent('config') + '/' + encodeURIComponent('app.json'), new URL('.', import.meta.url).hash === '', and new URL('.', import.meta.url).search === ''. Which are true in this particular example, but assumption that urlInstance.toString() + '/' + subPath is equivalent of path.join(basePath, subPath) is clearly not.

As for import.meta and its specs, I'd like to point out that both import.meta.url and import.meta.resolve are part of HTML Standard, hence there is no problem in Node.js having it working with URL instances, except for strict compatibility with browsers.
Personally I would prefer to see them returning URL instances, as long as it doesn't impact the performance. Apparently it doesn't even have to be frozen, the HTML specs allow to overwrite import.meta.url with any value.
The reason behind exposing import.meta.url as string is that the URL Standard explicitly asks to do so in such APIs, however it might be reasonable to expose URL instance for convenience (like how in web a lot of read-only operations on document.location would suffer if it was exposed only as document.url string).

arcanis · 2023-08-03T15:35:40Z

Perhaps an alternative would be, like we have path.posix and path.win32, to have a fs.url. It becomes a little more annoying with fs.url.promises, I guess. I don't know how it'd affect the startup time.

GeoffreyBooth · 2023-08-03T16:06:11Z

Perhaps an alternative would be, like we have path.posix and path.win32, to have a fs.url. It becomes a little more annoying with fs.url.promises, I guess. I don’t know how it’d affect the startup time.

I was just thinking, if we can’t keep overloading the first parameter to the fs APIs then we could go the fs/promises route and create fs/url, where it’s the same as fs/promises except that strings are always treated as a URL. So fs/url readFile‘s first param input would still be string, Buffer, URL, or FileHandle, but the string form would always be interpreted as an URL string and never as a path.

LiviaMedeiros · 2023-08-03T17:07:49Z

Here's very dirty proof of concept, proxying node:fs/promises methods and interpreting input as URL strings in userland

import { readFile } from 'fsURL';
// these are all the same
await readFile('/etc/fstab'); // relative url that starts from file:///
await readFile('../../../../../../../../../etc/fstab'); // relative url that works with subdirectory depth <= 9
await readFile('file:///etc/fstab'); // absolute url
await readFile(new URL('file:///etc/fstab')); // URL instance
await readFile(Buffer.from('file:///etc/fstab')); // Buffer instance

// these point to test file assuming cwd to be one level higher
await readFile(import.meta.url); // absolute url of this file
await readFile('fsURL/test.mjs'); // relative url that starts from cwd
await readFile('./fsURL/test.mjs'); // relative url that explicitly starts from cwd

For the reasons described above, I don't think we should have this in Node.js core.

mcollina · 2023-08-04T07:54:27Z

As a side note, support for file:// URLs in fetch() is often asked. So maybe we are onto something with this issue.

@LiviaMedeiros folks are really using path too to manipulate URLs. I think you'd need to match path.url as well.

tniessen · 2023-08-04T09:29:22Z

I was just thinking, if we can’t keep overloading the first parameter to the fs APIs then we could go the fs/promises route and create fs/url, where it’s the same as fs/promises except that strings are always treated as a URL.

I doubt that maintaining a third (or fourth if we separate sync and async APIs) node:fs API just so that folks don't have to do the trivial conversion using fileURLToPath() is a viable approach. Besides, if strings are always treated as a URL, it will be extremely tricky to define and justify return values of, for example, readlink().

bmeck · 2023-08-04T14:30:05Z

I rather think the idea from @mcollina above might be simpler than trying to make filesystem specific fs work like URLs. Why can't we just have new APIs that are built to handle things like in-memory and remote content? It isn't just fs that is affected here, things like fetch() already support data:. I do think mixing permissions with network and file system is always a cause for heavy security review but in this case how things actually resolve particularly with symlinks differs between the 2 modes of operation which is something to be extremely careful about.

guybedford · 2023-08-04T18:47:54Z

Supporting fetch('file:///...') seems like it would be useful and Deno et al already do this I believe.

But still having support for an fs.readFile('file:///path/to/thing') where the check is startsWith('file:') as the exception that exactly does preclude a bunch of former use cases would still be interesting to explore. file: names could still be achieved with relative or absolute pathing - fs.readFile('./file:'), fs.readFile('/file:/'), so I'm still not quite sure I fully understand the argument against, short of being very very clear and intentional about the edge cases and security implications.

Qard · 2023-08-04T23:23:45Z

Why can't we just have new APIs that are built to handle things like in-memory and remote content?

I have actually been thinking it would be nice if we had a new fs API more based on web standards and with, as you suggest, the ability to map to different targets like in-memory representations, remote content, over an archive, etc. We could have various APIs which return a FileSystemDirectoryHandle and let you interact with content from any sort of source or target.

isaacs · 2023-08-05T16:47:03Z

What happens if you have a directory in the cwd named file:? Handling url strings in fs smells like a really confusing and perilous security footgun, tbh.

I think it's best for security and intelligibility if either there's a node:fs/url, like @mcollina suggests, or only extend fs to handle file url objects but not url strings.

fetch('file://...') is an obvious win imo, though.

GeoffreyBooth · 2023-08-05T21:45:37Z

What happens if you have a directory in the cwd named file:? Handling url strings in fs smells like a really confusing and perilous security footgun, tbh.

This was discussed above: we just define in the docs that path strings beginning with file: are interpreted as file URLs, and that’s that. Yes it’s a special case, yes it’s a breaking change, yes it means that if you actually have a folder named file: you need to reference it via ./file: or an absolute path. Many people on this thread have expressed strong opposition but the criticism seems mostly around UX (strings should always be paths, this special case is too confusing) which I don’t really find persuasive because the lack of support for file URLs is itself bad UX, so it’s really a question of choosing between one compromise or the other. I don’t recall any technical objections why it couldn’t work¹, just principled objections (which are fine, UX is important, but “should we do it” is a different category from “can we do it” or “is it a security risk”).

All that said, overloading APIs that accept path strings to also accept URL strings is just one potential solution, and some of the other ideas like fs/url or fetch(fileURL) are arguably more promising to explore.

or only extend fs to handle file url objects but not url strings.

fs APIs already accept URL instances. The ask here is about convenience, trying to get to a similar level of ease as __dirname and __filename in CommonJS. And yes, maybe one of the solutions is to make URL instances more prevalent instead of URL strings. We can’t change import.meta.url or import.meta.resolve, but we could potentially add import.meta.URL (?) or import.meta.resolveURL for URL-instance versions.

There was the technical objection around the variation of “starts with file: and exists on disk” and I’m persuaded that that particular option can’t work. ↩

isaacs · 2023-08-06T00:36:04Z

fs APIs already accept URL instances.

TIL! Sorry for the non sequitur suggestion ;)

Yes it’s a special case, yes it’s a breaking change, yes it means that if you actually have a folder named file: you need to reference it via ./file: or an absolute path.

I think this is the sort of workaround that's going to be a lot more fraught than it seems at first. Like, "breaking change" can mean "this will blow up or otherwise obviously not work unless you change your code to accommodate it", but in this case, it's more like "this will function normally, but potentially do completely the wrong thing".

If you have code that does something like:

for (const f of await readdir('.')) {
  doSomethingWithFile(f)
}

Then it's going to potentially be a juicy security target if I get that code to run after managing to create ./file:/etc/passwd or something. Yes, best practice is arguably to always path.resolve() such things, but I don't know if anyone can even provide a rough estimate about how reliably that's done in the wild. I am somewhat anxious about the security advisories I'm going to have to deal with in glob and tar (not to mention chmodr, chownr, mkdirp, etc.) if fs starts treating file:... strings as urls.

We can’t change import.meta.url or import.meta.resolve, but we could potentially add import.meta.URL (?) or import.meta.resolveURL for URL-instance versions.

If the goal is just making it easier to have something like __filename and __dirname available in ESM environments, which can be transparently passed to fs methods, then yes, I think providing a URL-object equivalent on import.meta seems pretty reasonable, assuming it doesn't run afoul of the es module specification.

Or, honestly, just telling people "wrap file://... strings in new URL(...)" seems pretty reasonable as well.

Perhaps it could also be worthwhile to add a url.pathOrFileURL(...) that will turn either a path or a file URL or file URL string into a file URL object? Then at least there'd be a single method people could wrap everything in, that'll always dtrt, and make the conversion explicit.

const pathOrFileURL = (input: string | URL): URL => {
  if (input instanceof URL) {
    if (input.protocol !== 'file:') throw new Error('not a file URL');
    return input;
  }
  return input.startsWith('file:') ? new URL(input) : pathToFileURL(input);
}

github-actions · 2024-02-02T01:22:09Z

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions · 2024-03-03T01:23:48Z

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

GeoffreyBooth mentioned this issue Aug 2, 2023

Is ESM supposed to only work with file:// URLs nodejs/TSC#1423

Open

GeoffreyBooth changed the title ~~Node.js APIs that accept URL instances should also accept URL strings~~ Node.js APIs are inconvenient to use with URL strings Aug 3, 2023

LiviaMedeiros mentioned this issue Sep 3, 2023

url: fix toPathIfFileUrl with string URL input #49466

Open

GeoffreyBooth mentioned this issue Sep 7, 2023

Customizing import.meta nodejs/loaders#159

Open

LiviaMedeiros mentioned this issue Sep 11, 2023

Discussion: New “ESM by default” mode #49432

Open

This was referenced Sep 18, 2023

Implement synchronous import.meta.resolve(…) oven-sh/bun#2472

Closed

import.meta.resolve(…) is documented to return a string, but returns a URL object #49695

Closed

GeoffreyBooth mentioned this issue Dec 21, 2023

vm: support using the default loader to handle dynamic import() #51244

Merged

github-actions bot added the stale label Feb 2, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 3, 2024

Node.js APIs are inconvenient to use with URL strings #48994

Node.js APIs are inconvenient to use with URL strings #48994

Comments

GeoffreyBooth commented Aug 1, 2023 • edited

Problem

Original Idea

Qard commented Aug 2, 2023

tniessen commented Aug 2, 2023

GeoffreyBooth commented Aug 2, 2023

targos commented Aug 2, 2023

GeoffreyBooth commented Aug 2, 2023

LiviaMedeiros commented Aug 2, 2023 • edited

aduh95 commented Aug 2, 2023

ShogunPanda commented Aug 2, 2023

bnoordhuis commented Aug 2, 2023

JakobJingleheimer commented Aug 2, 2023

arcanis commented Aug 2, 2023 • edited

arcanis commented Aug 2, 2023

tniessen commented Aug 2, 2023

JakobJingleheimer commented Aug 2, 2023

Jamesernator commented Aug 2, 2023

tniessen commented Aug 2, 2023

bmeck commented Aug 2, 2023 • edited by aduh95

guybedford commented Aug 2, 2023

GeoffreyBooth commented Aug 2, 2023

mcollina commented Aug 2, 2023

GeoffreyBooth commented Aug 2, 2023 • edited

bmeck commented Aug 2, 2023

tniessen commented Aug 2, 2023 • edited

Qard commented Aug 2, 2023 • edited

bnoordhuis commented Aug 2, 2023

GeoffreyBooth commented Aug 2, 2023 • edited

aduh95 commented Aug 2, 2023 • edited

GeoffreyBooth commented Aug 2, 2023

aduh95 commented Aug 2, 2023

GeoffreyBooth commented Aug 3, 2023

ovflowd commented Aug 3, 2023 • edited

LiviaMedeiros commented Aug 3, 2023 • edited

arcanis commented Aug 3, 2023

GeoffreyBooth commented Aug 3, 2023 • edited

LiviaMedeiros commented Aug 3, 2023

mcollina commented Aug 4, 2023

tniessen commented Aug 4, 2023

bmeck commented Aug 4, 2023

guybedford commented Aug 4, 2023

Qard commented Aug 4, 2023

isaacs commented Aug 5, 2023

GeoffreyBooth commented Aug 5, 2023

Footnotes

isaacs commented Aug 6, 2023

github-actions bot commented Feb 2, 2024

github-actions bot commented Mar 3, 2024

GeoffreyBooth commented Aug 1, 2023 •

edited

LiviaMedeiros commented Aug 2, 2023 •

edited

arcanis commented Aug 2, 2023 •

edited

bmeck commented Aug 2, 2023 •

edited by aduh95

GeoffreyBooth commented Aug 2, 2023 •

edited

tniessen commented Aug 2, 2023 •

edited

Qard commented Aug 2, 2023 •

edited

GeoffreyBooth commented Aug 2, 2023 •

edited

aduh95 commented Aug 2, 2023 •

edited

ovflowd commented Aug 3, 2023 •

edited

LiviaMedeiros commented Aug 3, 2023 •

edited

GeoffreyBooth commented Aug 3, 2023 •

edited