Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null bytes in url could cause some problems #39592

Closed
maple3142 opened this issue Jul 30, 2021 · 9 comments · Fixed by #42313
Closed

Null bytes in url could cause some problems #39592

maple3142 opened this issue Jul 30, 2021 · 9 comments · Fixed by #42313
Labels
url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@maple3142
Copy link

Version

v16.6.0

Platform

Linux MAPLE 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 GNU/Linux

Subsystem

url

What steps will reproduce the bug?

There are two bugs about null byte:

const url = require('url')
const u = url.parse('http://[127.0.0.1\0c8763]:8000/')
console.log(u.hostname) // '127.0.0.1\0c8763'
new URL('a\0b')

And the error will be:

Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at new NodeError (node:internal/errors:371:5)
    at onParseError (node:internal/url:536:9)
    at new URL (node:internal/url:612:5) {
  input: 'a',
  code: 'ERR_INVALID_URL'

The error input is apprently truncated by the null byte.

How often does it reproduce? Is there a required condition?

I think this could only happen when attacker is trying to bypass some SSRF filter in some scenario, but I think it is almost unlikely to happen in realworld.

const url = require('url')
const http = require('http')

const u = url.parse('http://[127.0.0.1\0.github.io]:8000/')
console.log(u)

if (!u.hostname.endsWith('.github.io')) {
	console.log('Sorry, you can only fetch *.github.io')
	process.exit(1)
}

http.request(
	{
		host: u.hostname, // null byte truncated
		port: u.port,
		path: u.path,
		headers: {
			Host: 'xx' // http will automatically set host header by default, and \0 will cause an error in header
		}
	},
	msg => {
		msg.on('data', data => {
			console.log(data.toString())
		})
	}
)
	.on('error', console.error)
	.end()

What is the expected behavior?

It should be invalid url, and http module shouldn't accept null byte.

What do you see instead?

Parsed successfully into a hostname with null byte.

Additional information

No response

@RaisinTen
Copy link
Contributor

url.parse() has a legacy status: https://nodejs.org/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost
Why not just use the WHATWG URL API? It already classifies both urls as invalid ones.

@targos targos added url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Aug 9, 2021
@targos
Copy link
Member

targos commented Aug 9, 2021

@nodejs/url

@Trott
Copy link
Member

Trott commented Mar 8, 2022

The second bug described here, where new URL('a\0b') reports the input as 'a' in the error message, originates in the C++ code. The null character gets treated as the end-of-string marker, but only in the error path. It gets encoded and handled correctly, for example, in new URL('http://example.com/a\0b'). (Safari and Firefox also handle that last one correctly, but Chrome throws. I believe Chrome is deviating from the spec, but even if I'm wrong about that, for the purposes of this conversation, it doesn't matter.)

Possible solutions:

  1. Use std::string instead of char * and then be very careful so we can preserve the NULL. (I suppose this approach might not work depending on the nature of how the string is handled elsewhere in the C++ code.)
  2. Don't report the input string that caused the error back to the user. (Chrome and Safari both take this route.)
  3. Don't worry about it. Report the truncated string. (Firefox takes this approach.)
  4. Preserve the input in JavaScript and re-use it when an error is thrown from C++.

@TimothyGu
Copy link
Member

Chrome's behavior is indeed counter to the spec. https://crbug.com/1099721

Trott added a commit to Trott/io.js that referenced this issue Mar 9, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: nodejs#39592
@Trott
Copy link
Member

Trott commented Mar 9, 2022

4. Preserve the input in JavaScript and re-use it when an error is thrown from C++.

This is the approach used in #42263.

Trott added a commit to Trott/io.js that referenced this issue Mar 9, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: nodejs#39592
Trott added a commit to Trott/io.js that referenced this issue Mar 9, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: nodejs#39592
Trott added a commit to Trott/io.js that referenced this issue Mar 9, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: nodejs#39592
nodejs-github-bot pushed a commit that referenced this issue Mar 11, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@Trott
Copy link
Member

Trott commented Mar 11, 2022

  1. Preserve the input in JavaScript and re-use it when an error is thrown from C++.

This is the approach used in #42263.

The WHATWG URL error message issue has been fixed. The legacy url.parse() has several possible solutions. I think the best is to bail on parsing when there is a NULL character (and possibly other C0 characters) in the host (and possibly a few other places?) and return an object with only path/pathname/href values set to anything other than null.

@Trott
Copy link
Member

Trott commented Mar 11, 2022

The WHATWG URL error message issue has been fixed. The legacy url.parse() has several possible solutions. I think the best is to bail on parsing when there is a NULL character (and possibly other C0 characters) in the host (and possibly a few other places?) and return an object with only path/pathname/href values set to anything other than null.

https://url.spec.whatwg.org/#host-miscellaneous

A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, U+0020 SPACE, U+0023 (#), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), U+005B ([), U+005C (\), U+005D (]), U+005E (^), or U+007C (|).

A forbidden domain code point is a forbidden host code point, a C0 control, U+0025 (%), or U+007F DELETE.

@Trott
Copy link
Member

Trott commented Mar 12, 2022

Looks like forbiddenHostChars in url.js omits NULL even though it includes tab, lf, cr, space, etc. from the list above. It also skips checking on IPv6 hostnames. These seems like bugs that should be fixed. PR coming soon....

Trott added a commit to Trott/io.js that referenced this issue Mar 12, 2022
@Trott
Copy link
Member

Trott commented Mar 12, 2022

#42313

Trott added a commit to Trott/io.js that referenced this issue Mar 12, 2022
nodejs-github-bot pushed a commit that referenced this issue Mar 14, 2022
Fixes: #39592

PR-URL: #42313
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
bengl pushed a commit that referenced this issue Mar 21, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
danielleadams pushed a commit that referenced this issue Apr 21, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
danielleadams pushed a commit that referenced this issue Apr 24, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
danielleadams pushed a commit that referenced this issue Apr 24, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
danielleadams pushed a commit that referenced this issue Apr 24, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: #39592

PR-URL: #42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
xtx1130 pushed a commit to xtx1130/node that referenced this issue Apr 25, 2022
A null character in the middle of an invalid URL was resulting in an
error message that truncated the input string. This preserves the entire
input string in the error message.

Refs: nodejs#39592

PR-URL: nodejs#42263
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
xtx1130 pushed a commit to xtx1130/node that referenced this issue Apr 25, 2022
Fixes: nodejs#39592

PR-URL: nodejs#42313
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants
@Trott @TimothyGu @targos @maple3142 @RaisinTen and others