URL's on incoming requests #71

wesleytodd · 2020-10-02T14:24:28Z

I think the new api's should use URL as the basis for req.url (or whatever equivalent we expose). As you can see from the thread above, there is some question about how today these are relative urls and the new spec does not support that. My thought, and what I think we need to discuss, is that the "base" should be derived from the incoming request headers and fall back to the server address.

If we come to an agreement on this I am happy to write up a more detailed description. Agenda added so we can discuss in the next meeting as well.

The text was updated successfully, but these errors were encountered:

mcollina · 2020-10-02T14:28:07Z

I'm highly -1 on using new URL() for req.url. It's extremely slow and inefficient due to the host and protocol parsing.

ronag · 2020-10-02T14:29:35Z

I usually use, https://www.npmjs.com/package/request-target. Maybe worth looking into?

ronag · 2020-10-02T14:30:39Z

https://github.com/nodejs/next-10/blob/master/VALUES_AND_PRIORITIZAION.md might also be worth considering, i.e. "Performance" vs "Web API compatibility".

Ethan-Arrowood · 2020-10-02T14:32:38Z

could we do something like req.url returns the faster solution and then req.url() would return the new URL() version? This way users have to opt-in to the performance decrease if they want it?

Additionally, could we look into improving host and protocol parsing? Or is has that already been attempted?

wesleytodd · 2020-10-02T14:34:15Z

This is why I was strongly questioning adding that (nodejs/node#35323) in yesterdays meeting.

The perf is a big concern I was not aware of (do we have benchmarks on this, or details on why?), but is also a good reason to push back on the web api compat technical value issue.

ronag · 2020-10-02T14:36:35Z

do we have benchmarks on this, or details on why and what parts are slow?

From https://www.npmjs.com/package/request-target

$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
request-target x 552,748 ops/sec ±0.54% (344809 samples)

wesleytodd · 2020-10-02T14:38:04Z

I hadn't clicked your link, but yeah this looks good. I will dig into what it is doing, thanks for the link!

wesleytodd · 2020-10-02T14:43:52Z

Well I very quickly ended on this line, and I am fairly confident we will be able to find a better (and even better perf) way than a big regex.

But this leads me to my next question: do we really want to deviate from the existing URL apis?

Seems like this would add "yet another url" to node core, right? Maybe we could get fast parsing but also expose an api which looks like a URL instance, but this gets us into a similar situation of "its a stream but not". Anyway, just thinking out loud on this.

styfle · 2020-10-02T15:06:29Z

It's extremely slow and inefficient due to the host and protocol parsing.

Is the slow performance of new URL() inherent to the way its spec'ed or just the Node.js implementation?

If its the latter, then this would be a good excuse to re-evaluate the implementation to squeeze out better performance.

styfle · 2020-10-02T18:56:51Z

Overall, I think providing a way to retrieve req.url as a URL would be a great addition to Node core because the URL object is the future, url.parse() is a legacy of the past, before there was a standard.

Using standard JS URL objects allows code to be shared between the browser, Node.js, etc.

One of the big foot-guns today is parsing url query string params. Consider this code:

const { parse } = require('url');
const handler = (req, res) => {
  let { pathname = '/', query = {} } = parse(req.url || '', true);
  const { user = '' } = query;
  console.log(user.toLowerCase());
}

This works fine with https://example.com/api?user=foo but throws when two parameters with the same name are detected https://example.com/api?user=foo&user=bar because user suddenly becomes an array instead of a string.

Consider this similar code using URL:

const handler = (req, res) => {
  const url = new URL(req.url, 'https://example.com');
  const user = url.searchParams.get('user') || '';
  console.log(user.toLowerCase());
}

This bug doesn't happen with URL because it separates getAll() vs get(). Now we won't crash in production.

My workaround for the time being looks like this:

export function getURL(req) {
  const proto = req.headers['x-forwarded-proto'] || 'https';
  const host = req.headers['x-forwarded-host'] || req.headers.host || 'example.com';
  return new URL(req.url || '/', `${proto}://${host}`);
}

ljharb · 2020-10-02T19:08:49Z

It makes sense to me to offer both (obviously under different names) - ie, req.url as the string, and req.getURL() as the lazily-computed-but-memoized URL object.

wesleytodd · 2020-10-02T19:40:19Z

It makes sense to me to offer both

I think before we make any decisions on this, we need to know if we can help fix the perf issues of URL. If we cannot do that, then I think we should consider alternate api's to expose both. If not, I think one api is best.

My workaround for the time being looks like this:

@styfle There are security concerns here, but if we "do it right" I think it is a reasonable approach (if you know those headers can only be set by trusted proxies, you should be fine). The added support for the meta headers would also change this a bit, but not in a bad way.

Also, we would default them to the values computed from server.address() I think.

stevenvachon · 2020-10-02T19:52:31Z

It makes sense to me to offer both (obviously under different names) - ie, req.url as the string, and req.getURL() as the lazily-computed-but-memoized URL object.

Why both when there's URL::href?

styfle · 2020-10-02T19:54:42Z

Modifying req.url to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.

ghermeto · 2020-10-02T20:16:54Z

Modifying req.url to return a URL object instead of a string would be a breaking change and it doesn't seem worth the churn to the ecosystem.

I don't think this is an issue if we are discussing a whole new API

styfle · 2020-10-02T20:24:51Z

Maybe I misunderstood the intent of this discussion 🤔

I thought req in the original post was referring to the IncomingMessage class, the first parameter to createServer listener.

If you modify the url property on IncomingMessage, then it would be a breaking change, right?

awwright · 2020-10-03T21:40:34Z

Possibly a tangent, but if I were to pick the naming, I would use req.target to mean "what the client provided literally" and req.uri to mean "Resolved (full, absolute) URI"

And if I were modifying the Node.js API, I would deprecate req.url or alias it to req.target instead.

wesleytodd · 2020-10-05T16:54:37Z

Maybe I misunderstood the intent of this discussion

This discussion is related to this: #55

Because the entire api will be new, breaking is fine.

I would use req.target to mean "what the client provided literally" and req.uri to mean "Resolved (full, absolute) URI"

While I see the goal of this naming, I am strongly against bikeshedding names (at least at this point). I also do not think this would be a meaningful change, nor one which would make it easier/better for any of our constituencies. Once we nail down the particulars of what we need to offer on this api, we can discuss again if name changes will be something to pursue, but lets keep on focus for now.

wesleytodd · 2020-10-15T18:01:26Z

To recap our discussion on the meeting today:

What would folks think of providing an api at server creation time to pass your url implementation. By default we would create a URL instance, but if you passed a function to the createServer call your function would be called instead. This would be similar to the existing api where you can pass IncomingMessage and ServerResponse. So you could pass URL, for example. Thoughts?

mcollina · 2020-10-15T21:36:55Z

I'm still generically -1 to the idea of using

new URL(`${protocol}://${host}${path}`) // simplified

in a hot path by default.
It creates unnecessary computation by generating a string that is then parsed. It also creates a few objects which are slow to create as well (UrlSearchParams).

A similar discussion can be made for the Headers object of fetch and the relative standards.

None of this was defined with throughput in mind. If we want Node.js to shine, it needs to be faster, not slower.

stevenvachon · 2020-10-15T22:08:17Z

@mcollina "better" is better than "faster" too.

ghermeto · 2020-10-15T22:20:30Z

Agree with @mcollina. Anything that affects performance should be opt-in. In this particular case, if I don't need to parse the URL in most of my requests, why would I pay the price on all of them?

wesleytodd · 2020-10-19T19:35:59Z

The plan which @ronag and I discussed in this gist involves a multi-layer api. As a compromise, if we kept the "core" api with the string req.url, would it be acceptable to have the higher level api use the URL constructor? Since the higher level api is intended to usable directly (as in not via a framework) it would be preferable to me so we don't introduce "yet another" url implementation.

ghermeto · 2020-10-19T19:43:46Z

Can you elaborate on:

Since the higher level api is intended to usable directly

and

we don't introduce "yet another" url implementation

wesleytodd · 2020-10-19T19:55:19Z

Since the higher level api is intended to usable directly

I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.

we don't introduce "yet another" url implementation

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

ghermeto · 2020-10-19T20:10:08Z

I think we should probably agree on which APIs have which intended constituencies. @ronag outlined a bit of what the goals of the different undici apis are, and the layering which he does enables really low level usage, but also has "direct end user api's" which are high level. The goal was to take a similar approach (see the table in the comment linked above). What we don't yet have is a clear statement on the target usage of the layers.

And what is wrong with using:

const url = new URL(req.url);

Seems simple enough if a user wants to parse the URL and can also be provided by the frameworks if they must...

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

agree with you and I believe this gives an opportunity to let users (or frameworks) chose how they want to parse a URL. As you suggested, we can provide a function to createServer that tells the server how to parse the URL. If not provided, just returns a string by default:

http.createServer({
  allowHTTP1: true,
  parseUrl(stringUrl) {
    return new URL(stringUrl);
  }
})

styfle · 2023-01-17T22:14:46Z

One of the proposals above was to utilize a specialty parser like request-target for the url. My concern with this is we already have 2 implementations, this would add a third.

Agreed, that will likely lead to more confusion.

I think the more this conversation goes on we need more clarity on what the use case are. I think that most cases will be that only the pathname is used and that might lean more toward just telling users to parse it themselves if necessary.

@wesleytodd In addition to pathname, I think query string is quite common too.

I think what'd need to make this work is: URL.from({ host, path, protocol }). In this way we can skip the non-needed parts of the algorithm that are really expensive.

@mcollina I'm not sure that solves the problem of parsing req.url into some object. Would you be able to URL.from(req.url) or even URL.from(req)?

sheplu · 2023-01-18T13:27:13Z

thanks for your message @styfle ! But this group isn't really active anymore (last two years I would say)

styfle · 2023-01-19T00:04:32Z

I see. I've been bouncing around different issues and all seem to be dead ends.

mcollina · 2023-02-20T13:02:26Z

@mcollina I'm not sure that solves the problem of parsing req.url into some object. Would you be able to URL.from(req.url) or even URL.from(req)?

I think URL.from(req) would be fantastic. We probably can't shipt that API, but possibly http.getURL(req), which would be ok. Also req.getURL() might work.

styfle · 2023-02-21T16:57:17Z

possibly http.getURL(req), which would be ok. Also req.getURL() might work.

Either option sounds great!

We could use original-url as a starting point (or even vendor it into node core)

mcollina · 2023-02-21T22:50:35Z

original-url uses the unsafe require('url').parse method. I don't think anybody should use that implementation as there are multiple security problems with it. The best approach would be to pass via new URL() or analog.

mcollina · 2023-02-21T22:51:06Z

would you like to send a PR for this?

awwright · 2023-02-22T04:46:16Z

How is url.parse unsafe? If the incoming request conforms to the generic URI syntax (which obviously should be verified, as you always do for untrusted user input, right?), there's no reason there should be problems.

styfle · 2023-02-23T20:55:13Z

would you like to send a PR for this?

Looks like someone already did: watson/original-url#11

marco-ippolito · 2024-02-13T15:42:53Z

issue related: nodejs/node#51311

wesleytodd added the web-server-frameworks-agenda label Oct 2, 2020

ronag mentioned this issue Oct 2, 2020

doc: add compatibility/interop technical value nodejs/node#35323

Closed

2 tasks

wesleytodd mentioned this issue Oct 2, 2020

Relative URLs in WHATWG URL API nodejs/node#12682

Closed

github-actions bot mentioned this issue Oct 9, 2020

Node.js Foundation Web Server Team Meeting 2020-10-15 #72

Closed

github-actions bot mentioned this issue Jan 1, 2021

Node.js Foundation Web Server Team Meeting 2021-01-07 #82

Closed

github-actions bot mentioned this issue Jan 15, 2021

Node.js Foundation Web Server Team Meeting 2021-01-21 #83

Closed

github-actions bot mentioned this issue Jan 29, 2021

Node.js Foundation Web Server Team Meeting 2021-02-04 #84

Closed

github-actions bot mentioned this issue Feb 12, 2021

Node.js Foundation Web Server Team Meeting 2021-02-18 #85

Closed

github-actions bot mentioned this issue Feb 26, 2021

Node.js Foundation Web Server Team Meeting 2021-03-04 #86

Closed

github-actions bot mentioned this issue Mar 12, 2021

Node.js Foundation Web Server Team Meeting 2021-03-18 #87

Closed

github-actions bot mentioned this issue Mar 26, 2021

Node.js Foundation Web Server Team Meeting 2021-04-01 #88

Closed

github-actions bot mentioned this issue Apr 9, 2021

Node.js Foundation Web Server Team Meeting 2021-04-15 #89

Closed

github-actions bot mentioned this issue Apr 23, 2021

Node.js Foundation Web Server Team Meeting 2021-04-29 #90

Closed

trentm mentioned this issue Mar 2, 2023

Deprecate url.parse API in favor of URL constructor watson/original-url#11

Open

mhdawson mentioned this issue Jan 16, 2024

Node.js Web Server Frameworks Meeting 2024-01-30 #99

Closed

mhdawson mentioned this issue Jan 26, 2024

Node.js Web Server Frameworks Meeting 2024-01-30 #100

Open

mhdawson mentioned this issue Feb 9, 2024

Node.js Web Server Frameworks Meeting 2024-02-13 #103

Open

mhdawson mentioned this issue Feb 23, 2024

Node.js Web Server Frameworks Meeting 2024-02-27 #104

Open

wesleytodd mentioned this issue Feb 27, 2024

Http server message.url does not follow the whatwg url standard and does not evaluate ../ nodejs/node#51311

Open

mhdawson mentioned this issue Mar 8, 2024

Node.js Web Server Frameworks Meeting 2024-03-12 #106

Open

mhdawson mentioned this issue Mar 22, 2024

Node.js Web Server Frameworks Meeting 2024-03-26 #107

Closed

mhdawson mentioned this issue Apr 5, 2024

Node.js Web Server Frameworks Meeting 2024-04-09 #111

Open

mhdawson mentioned this issue Apr 19, 2024

Node.js Web Server Frameworks Meeting 2024-04-23 #112

Open

wesleytodd removed the web-server-frameworks-agenda label Apr 23, 2024

URL's on incoming requests #71

URL's on incoming requests #71

Comments

wesleytodd commented Oct 2, 2020

mcollina commented Oct 2, 2020

ronag commented Oct 2, 2020

ronag commented Oct 2, 2020

Ethan-Arrowood commented Oct 2, 2020

wesleytodd commented Oct 2, 2020 • edited

ronag commented Oct 2, 2020

wesleytodd commented Oct 2, 2020

wesleytodd commented Oct 2, 2020

styfle commented Oct 2, 2020

styfle commented Oct 2, 2020 • edited

ljharb commented Oct 2, 2020

wesleytodd commented Oct 2, 2020

stevenvachon commented Oct 2, 2020

styfle commented Oct 2, 2020 • edited

ghermeto commented Oct 2, 2020

styfle commented Oct 2, 2020 • edited

awwright commented Oct 3, 2020

wesleytodd commented Oct 5, 2020

wesleytodd commented Oct 15, 2020 • edited

mcollina commented Oct 15, 2020

stevenvachon commented Oct 15, 2020

ghermeto commented Oct 15, 2020

wesleytodd commented Oct 19, 2020 • edited

ghermeto commented Oct 19, 2020

wesleytodd commented Oct 19, 2020

ghermeto commented Oct 19, 2020

styfle commented Jan 17, 2023

sheplu commented Jan 18, 2023

styfle commented Jan 19, 2023 • edited

mcollina commented Feb 20, 2023

styfle commented Feb 21, 2023

mcollina commented Feb 21, 2023

mcollina commented Feb 21, 2023

awwright commented Feb 22, 2023

styfle commented Feb 23, 2023

marco-ippolito commented Feb 13, 2024

wesleytodd commented Oct 2, 2020 •

edited

styfle commented Oct 2, 2020 •

edited

styfle commented Oct 2, 2020 •

edited

styfle commented Oct 2, 2020 •

edited

wesleytodd commented Oct 15, 2020 •

edited

wesleytodd commented Oct 19, 2020 •

edited

styfle commented Jan 19, 2023 •

edited