Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect validation of version encoding #57

Open
matt-phylum opened this issue Nov 8, 2023 · 8 comments
Open

Incorrect validation of version encoding #57

matt-phylum opened this issue Nov 8, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@matt-phylum
Copy link

After parsing the version number out of a PURL string, packageurl-js validates that the version number was encoded exactly the way that packageurl-js would have encoded it. This seems like an extra thing that packageurl-js does, undirected by the spec. Even if the spec did say to validate that the version number was percent encoded, that doesn't actually mean anything because technically any ASCII string that doesn't contain '%' is a valid percent encoded string by definition.

This is problematic because different PURL implementations encode different sets of characters. If a version number contains a character that one PURL implementation unnecessary escapes and packageurl-js does not, or vice versa, packageurl-js will correctly parse the version number and then throw an exception because the encoding validation failed. For example, packageurl-js will fail to decode this valid PURL produced by packageurl-java: pkg:t/n@1%3A2.

@chrsch-dev
Copy link

Hi, I observed the same behaviour which seems to be regression with the newest released version (1.2.1).
the call PackageURL.fromString('pkg:maven/a@2+3') doesn't produce an error. But the call PackageURL.fromString('pkg:maven/a@2%2B3') returns an error: Uncaught Error: Invalid purl: version must be percent-encoded.

When using version 1.2.0 I get the following:
PackageURL.fromString('pkg:maven/a@2+3') -> Uncaught Error: Invalid purl: version must be percent-encoded
PackageURL.fromString('pkg:maven/a@2%2B3') -> Purl is valid no error

The behaviour of version 1.2.0 is the one I expected

@matt-phylum
Copy link
Author

matt-phylum commented Nov 8, 2023

They should both be valid.

2+3 is produced by althonos/packageurl.rs, package-url/packageurl-js, phylum-dev/purl.

2%2B3 is produced by package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift.

Be careful with version numbers that contain + because some implementations will incorrectly convert it into a space: package-url/packageurl-dotnet, package-url/packageurl-ruby. (package-url/purl-spec#261) All other implementations correctly parse it as +, except for this implementation which throws an exception.

@steven-esser steven-esser added the bug Something isn't working label Nov 8, 2023
@steven-esser
Copy link
Collaborator

steven-esser commented Nov 8, 2023

We had a few external contributions around this issue that seem to have created more problems. My bad for not looking deeper at the contributions/test additions and taking them at face value. I will take a look at this later tonight and hopefully have a fix.

steven-esser added a commit that referenced this issue Nov 9, 2023
* remove unneed exception raising for the version param
* do not skip URL encoding `:` and `+` characters
* add test cases from old issues

refs: #45, #46, #57

Signed-off-by: Steven Esser <me@stevenesser.com>
@steven-esser
Copy link
Collaborator

#61 with the fix for this.

@chrsch-dev
Copy link

chrsch-dev commented Nov 9, 2023

They should both be valid.

2+3 is produced by althonos/packageurl.rs, package-url/packageurl-js, phylum-dev/purl.

2%2B3 is produced by package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift.

Be careful with version numbers that contain + because some implementations will incorrectly convert it into a space: package-url/packageurl-dotnet, package-url/packageurl-ruby. (package-url/purl-spec#261) All other implementations correctly parse it as +, except for this implementation which throws an exception.

Hm, in the purl specification it says for the version A version must be a percent-encoded string. So I would assume pkg:maven/a@2+3 is not correct. It has to be pkg:maven/a@2%2B3 to be valid.

@matt-phylum
Copy link
Author

matt-phylum commented Nov 9, 2023

2+3 is a percent encoded string. Any ASCII string that doesn't contain '%' is percent-encoded, and decodes into the same string. "Percent-encoded" only describes a method of encoding, without specifying which characters need to be encoded (besides '%', obviously). The PURL spec does not specify that '+' should be encoded, and I think encoding it is a minor spec deviation. Especially with the way the spec describes encoding, it seems unreasonable to expect every implementation to create exactly the same canonicalized PURLs down to the percent-encoding, but it's important that implementations are permissive enough to handle both PURLs written by other implementations and PURLs written by humans.

When the spec says the version must be percent-encoded, it means that the version must be decoded on read and encoded on write, using the characters described in the encoding section.

@chrsch-dev
Copy link

Sorry for the late response but I don't get why 2+3 is a valid version string in a purl. If you have a look in the Percent-encoding wikipedia entry at the section Percent-encoding in a URI you can see that after percent encoding the + should be replaced by %2B since + is a reserved character.

Maybe I miss something. Would be great if you could shed some light here.

@matt-phylum
Copy link
Author

Reserved characters are not the characters that are supposed to be encoded. Wikipedia isn't really the best source, but it says "Other characters in a URI must be percent-encoded." and "When a character from the reserved set (a "reserved character") has a special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded." This is not the case here. + is not an other character, and in this context it does not have special meaning that we are avoiding by encoding it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants