Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: deprecate url.parse() #44919

Merged
merged 8 commits into from Oct 11, 2022
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
25 changes: 22 additions & 3 deletions doc/api/deprecations.md
Expand Up @@ -2288,6 +2288,9 @@ future release.

<!-- YAML
changes:
- version: REPLACEME
pr-url: https://github.com/nodejs/node/pull/44919
description: \`url.parse()` is deprecated again in DEP0169.
- version:
- v15.13.0
- v14.17.0
Expand All @@ -2300,7 +2303,7 @@ changes:

Type: Deprecation revoked

The [Legacy URL API][] is deprecated. This includes [`url.format()`][],
The [legacy URL API][] is deprecated. This includes [`url.format()`][],
[`url.parse()`][], [`url.resolve()`][], and the [legacy `urlObject`][]. Please
use the [WHATWG URL API][] instead.

Expand Down Expand Up @@ -3258,7 +3261,7 @@ changes:
description: Runtime deprecation.
-->

Type: Runtime.
Type: Runtime

The implicit suppression of uncaught exceptions in Node-API callbacks is now
deprecated.
Expand All @@ -3267,7 +3270,22 @@ Set the flag [`--force-node-api-uncaught-exceptions-policy`][] to force Node.js
to emit an [`'uncaughtException'`][] event if the exception is not handled in
Node-API callbacks.

[Legacy URL API]: url.md#legacy-url-api
### DEP0169: Insecure url.parse()

<!-- YAML
changes:
- version:
- REPLACEME
pr-url: https://github.com/nodejs/node/pull/44918
Trott marked this conversation as resolved.
Show resolved Hide resolved
description: Documentation-only deprecation.
-->

Type: Documentation-only

[`url.parse()`][] behavior is not standardized and prone to errors that
have security implications. Use the [WHATWG URL API][] instead. CVEs are not
issued for `url.parse()` vulnerabilities.

[NIST SP 800-38D]: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
[RFC 6066]: https://tools.ietf.org/html/rfc6066#section-3
[RFC 8247 Section 2.4]: https://www.rfc-editor.org/rfc/rfc8247#section-2.4
Expand Down Expand Up @@ -3413,6 +3431,7 @@ Node-API callbacks.
[alloc_unsafe_size]: buffer.md#static-method-bufferallocunsafesize
[from_arraybuffer]: buffer.md#static-method-bufferfromarraybuffer-byteoffset-length
[from_string_encoding]: buffer.md#static-method-bufferfromstring-encoding
[legacy URL API]: url.md#legacy-url-api
[legacy `urlObject`]: url.md#legacy-urlobject
[static methods of `crypto.Certificate()`]: crypto.md#class-certificate
[subpath exports]: packages.md#subpath-exports
Expand Down
22 changes: 9 additions & 13 deletions doc/api/url.md
Expand Up @@ -27,7 +27,7 @@ The `node:url` module provides two APIs for working with URLs: a legacy API that
is Node.js specific, and a newer API that implements the same
[WHATWG URL Standard][] used by web browsers.

A comparison between the WHATWG and Legacy APIs is provided below. Above the URL
A comparison between the WHATWG and legacy APIs is provided below. Above the URL
`'https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash'`, properties
of an object returned by the legacy `url.parse()` are shown. Below it are
properties of a WHATWG `URL` object.
Expand Down Expand Up @@ -63,7 +63,7 @@ const myURL =
new URL('https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash');
```

Parsing the URL string using the Legacy API:
Parsing the URL string using the legacy API:

```mjs
import url from 'node:url';
Expand Down Expand Up @@ -1521,6 +1521,9 @@ The formatting process operates as follows:
<!-- YAML
added: v0.1.25
changes:
- version: REPLACEME
pr-url: https://github.com/nodejs/node/pull/44918
Trott marked this conversation as resolved.
Show resolved Hide resolved
description: Documentation-only deprecation.
- version:
- v15.13.0
- v14.17.0
Expand All @@ -1540,7 +1543,7 @@ changes:
when no query string is present.
-->

> Stability: 3 - Legacy: Use the WHATWG URL API instead.
> Stability: 0 - Deprecated: Use the WHATWG URL API instead.

* `urlString` {string} The URL string to parse.
* `parseQueryString` {boolean} If `true`, the `query` property will always
Expand All @@ -1562,16 +1565,9 @@ A `URIError` is thrown if the `auth` property is present but cannot be decoded.

`url.parse()` uses a lenient, non-standard algorithm for parsing URL
strings. It is prone to security issues such as [host name spoofing][]
and incorrect handling of usernames and passwords.

`url.parse()` is an exception to most of the legacy APIs. Despite its security
concerns, it is legacy and not deprecated because it is:

* Faster than the alternative WHATWG `URL` parser.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know which one is faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect @anonrig might be able to answer that. Generally, I'd say url.parse() is usually faster, but not necessarily by a whole lot in the typical use cases. And it is more susceptible to things like host spoofing than WHATWG URL. But anything @anonrig says to the contrary should be treated as the more informed opinion that it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Url.parse is faster but not spec compliant.

Copy link
Member

@styfle styfle Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When parsing a relative URL, then there is no possibility of host spoofing or username/password issues, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can’t answer that question with 100% uncertainity, and I think nobody can.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase the question:

Has anyone reported bugs or vulnerabilities for relative urls as inputs to url.parse()?

Copy link
Member Author

@Trott Trott Jan 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url.parse() used to treat the \bhttp://example.com/\b as a hostless URL. WHATWG URL strips leading and trailing backspaces and treats it as a fully-qualified URL. So that's an example where url.parse() mishandling a URL and treating it like a hostless URL could lead to problems. (This issue has been fixed.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm thinking more along the lines of URLs that start with /, like req.url

function handler(req, res) {
 if (req.url.startsWith('/')) {
   const parsed = url.parse(req.url)
   parsed.search = '?v=2'
   const result = url.format(parsed)
   res.end(`Modified url: ${result}`)
 } else {
   throw new Error('Expected req.url to be relative')
 }
}

This is the case for which url.parse() and url.format() is useful, and theres not a good solution today using new URL().

* Easier to use with regards to relative URLs than the alternative WHATWG `URL` API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a solution for relative URLs using WHATWG?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you know the base URL, then yes. If not, you have to fake it, perhaps with an identifiably bogus base URL. In that second case, it's a solution, but perhaps not a particularly elegant one.

My opinion only, but a lot of the current problems with url.parse() result from the design decision (probably sensible at the time) to handle both full URLs and relative URLs without knowing the base URL.

Ref: #12682
Ref: whatwg/url#531

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Trott

My worry is that marking url.parse() as deprecated is that we introduce new problems when someone assumes new URL() will work as a replacement but it wont for relative URLS. And in fact, I think most code I've seen is intending to use it with relative urls like that of url.parse(req.url).

I remember when this was "deprecated" the first time, it confused a lot of people and then it was reverted to say "legacy" which I think was the right decision.

Or maybe the solution here is to document how to parse relative urls, such as req.url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's whatwg/url#421 that would fit the use-case, unfortunately the proposal look rather stalled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve seen that one too.

It has a label “needs implementer interest”. Does that need to be a browser or can we say Node.js is interested in implementing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal comes from Node.js, I think it's implied that Node.js is interested, I think we need other implementers as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aduh95 The discussion around whatwg/url#421 more or less has been continued in whatwg/url#531.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@styfle I have spent a lot of effort to find a solution for relative URLs that agrees with the WHATWG Standard, see e.g. this comment and also this URL Specification and this library.

This was indeed far from trivial due to design decisions made in the WHATWG standard.

I have an idea for a simple API ready, but I stopped working on it for a while. In part because it is unlikely that the WHATWG standard can incrementally add support for relative URLs. This leads to an understandable resistance, which nonetheless is frustrating to me and something that I don’t yet know how to solve.

The other reason is simply being preoccupied with other work at the moment.

Suggestions and ideas are appreciated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks interesting. I think WHATWG is probably the wrong place to standardize this since browsers aren't interested. I think WinterCG is a better choice for standardization of relative url parsing since it is focused on server-side JS.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think that marking this API as deprecated should've been gated on a suitable replacement being made available, and perhaps that would've introduced some incentive to come to a consensus on what that replacement should look like.

Relative URLs are a thing, and we need a suitable API to replace this one.

* Widely relied upon within the npm ecosystem.

Use with caution.
and incorrect handling of usernames and passwords. Do not use with untrusted
input. CVEs are not issued for `url.parse()` vulnerabilities. Use the
[WHATWG URL][] API instead.

### `url.resolve(from, to)`

Expand Down