Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine and strengthen diagnostics for problematic URLs #1342

Merged
merged 25 commits into from Apr 13, 2024

Conversation

EliahKagan
Copy link
Contributor

@EliahKagan EliahKagan commented Apr 12, 2024

This pull request fixed RUSTSEC-2024-0335. This advisory has also been published locally on this repository and globally in the GitHub Advisory Database. I plan to further expand this description, but the advisory will remain the main source of information.

This PR can be seen as a sequel to #1032, which addressed RUSTSEC-2023-0064 (GHSA-rrjw-j4m2-mf34). As there, https://secure.phabricator.com/T12961 has information on the type of vulnerability patched here.

This pull request fixed a variant of that vulnerability where the non-mandatory username portion of a URL could still begin with a hyphen even when used at the beginning of an argument to an external SSH client, thereby smuggling command-line options to the client. It added and used a user_as_argument method to Url to facilitate matching the safety status of the username, and also an analogous host_as_argument to allow the existing correct host handling code to be made clearer. While adding tests of the fix and the new methods, it also made testing a bit more robust for other variants that were already fixed.

Some more information is in commit messages. Some information of lesser interest than the advisory, commit messages, and the above summary is retained below.

Old PR description

Click to expand

This adds more unit tests and journey tests for existing error messages about URLs that are often not usable as remotes because they cause problems when used with external commands. It also distinguishes a further subcategory of these URLs, adds both kinds of test cases for them, and ensures they are identified and that the associated error messages are accurate.

For background on what led me to some of the new Url methods, see that comment from an intermediate stage. I had at that time thought the existing ambiguous method (and the new ambiguous method I added) should be removed, but now I think that may not be the case, though I've moved code over to using the new ones.

This may be ready, though I strongly suspect some revisions may be needed, stylistically, in naming, or otherwise.

Two things I had thought of while working on this but that seemed better to leave out of this PR, unless requested, were:

  • Fuzz testing already touches on some of the kinds of inputs that are already handled properly, for which this PR only adds more unit and journey tests. I am unsure if fuzz testing should be extended to cover the other cases that this PR addresses, and if so, if there is anything this PR should do for that.
  • As before, checking for strange paths is done separately from the gix-url facilities related to that. This doesn't seem like a problem, but I am curious if something ought to be documented to avoid regressions and/or to advise application developers. I haven't identified such an area though; the documentation seems to cover the issues.

The most important points and considerations are not those, but rather are presented outside this PR, such as in code comments/documentation and commit messages.

- Harmonize minor nearby related inconsistencies in documentation.

- Add the new assertions that don't require the new method (these
  are less important than the forthcoming ones that will call it).
This returns `None` if the username begins with a `-`, which would
confuse command-line applications.

It is analogous to the `Url::host_argument_safe()` and
`Url::path_argument_safe()` methods (introduced in d80b5f6), but
for usernames rather than hosts or paths.
- Add assertions where user_argument_safe returns None, but because
  there really is no username, rather than it starting with a `-`.

- Add tests where more parts are present. (Most existing tests did
  not have the optional username part. Most also did/do not have
  the optional password part, but there's no password_argument_safe
  method, and probably no need for one because a password is not
  passed explicitly on the command line as an argument or the front
  of a larger argument--hopefully never, but if so, then it would
  be an operand argument following an option argument and would be
  prevented from being interpreted as an option that way.)

- Add comments to clarify the assertions that specifically verify
  that unsafe parts of URLs are rejected when using *_argument_safe
  methods.
The committed snapshot out of the output was renamed in f72ecce to
fix a spelling error. This updates references to it from gix.sh.
This also adds a snapshot from running them, which shows the
expected failure. (However, in this commit the only snapshot that
is added is the one that was actually generated on the non-CI test
run. This pertains to the `clone` subcommand. Journey tests of the
`receive` subcommand are only run on CI, including the new test. It
should have a committed snapshot, but that's not included in this
commit.)

These differ from the ambiguous host tests, in that ambiguous hosts
are dangerous in the `gix` command when they appear in an `ssh://`
URL (with no `username@` preceding them), while ambiguous paths are
instead dangerous in the `gix` command when they appear in a
`username@host:path/repo` "URL".

The new tests should pass already, and the test of the `clone`
subcommand is confirmed to pass (as noted above, journey tests of
the `receive` subcommand are only run on CI). This is because
ambiguous hosts and ambiguous paths are already handled. In
contrast, ambiguous usernames, which have no journey tests yet, are
not yet handled in gix-transport.
This test should currently pass, but it, like the other journey
tests of the `receive` subcommand, runs only on CI. This is an
expected snapshot, so that if the test, code under test, or
expectation expressed in the snapshot is wrong, then it will be
revealed that there is a problem.

This also expresses the expectation to humans, which may help in
fixing the new test if it turns out to be wrong. Note however that
this does mean that the journey snapshot here is not strictly being
used for approval testing.
These should not pass yet, because gix-transport and the gix binary
do not yet have special handling to reject ambiguous usernames that
can be interpreted as command-line options to ssh clients.

These new journey tests may need further refinement.
The tests appear to work, at least partially, in that a test
failure, of the kind expected due to gix-transport not yet checking
for usernames that ssh clients can interpret as an option argument,
is observed now that the current output has snapshots to compare to
(the comparison fails, which at this point is intended).

Not all tests ran, because of the runner script's fail-fast
behavior and because the `receive` subcommand tests only run on CI.

While I believe having these snapshots in now will make it eaiser
to ensure nothing essential was forgotten, once ambiguous username
handling is implemented, this does make these journey tests no
longer strictly approval tests (since the snapshots incorporate
foreknowledge of the intended behavior).
Not all of these tests can pass yet, since gix-transport does not
yet detect and refuse to proceed with leading-hypnen usernames.
Some pass; those that do not are, as expected:

- ambiguous_user_is_disallowed_explicit_ssh
- ambiguous_user_is_disallowed_implicit_ssh
- ambiguous_user_and_host_remain_disallowed_together_explicit_ssh
- ambiguous_user_and_host_remain_disallowed_together_implicit_ssh

This also adds AmbiguousUserName in one of the enums that will need
to have it, but nothing fails with this error yet; it is introduced
now only to facilitate writing unit tests that assert it.
This detects ambiguous usernames in dangerous cases where they
would be passed to external commands to form SSH connections, if
they would be misinterpreted as option arguments.

This change is analogous to b06a0dd, hardening `gix-transport` and
applications that use it against options smuggled in URLs, but for
the non-mandatory username portion of a URL, rather than the host
and path portions that were covered there.

For example, commands like these no longer pass `-F...` options to
`ssh`:

    gix clone 'ssh://-Fconfigfile@example.com/abc'
    gix clone -- '-Fconfigfile@example.com:abc/def'

Instead, they refuse to run `ssh`, producing the error:

    Error: Username '-Fconfigfile' could be mistaken for a command-line argument
To note that if the `-G` check ever also uses the username, then
that has to be checked to ensure it is safe.
This renames it to `looks_like_command_line_option`, because both
options and non-option arguments are arguments. The condition it
checks for is that something looks like an option and therefore
should *not* be used as a command-line argument.
This isn't correct yet, because:

- It doesn't compile because of the overcomplicated way I'm doing
  borrowing. The enum can just have references as type arguments
  instead.

- The enum should be public.

(Those are noted here rather than in the next commit that will fix
them, because it is not useful to include them in a message
generated from it if it is titled as a conventional commit.)

In addition, while some existing methods are implemented in terms
of these now so there is some test coverage that way, it's far from
complete and these really need their own tests.
This adds three methods to `Url`:

- `Url::user_as_argument`
- `Url::host_as_argument`
- `Url::path_as_argument`

They return `ArgumentSafety` values that distingiush three cases:

1. There is no wrapped value (`Absent`).

2. The wrapped value is usable and presumably safe (`Usable`).

3. The wrapped value is dangerous and should not be passed as a
   command-line argument because it could be interpreted as an
   option due to starting with `-`. The value itself may still be
   useful and safe to include in error messages.

`user_as_argument` and `host_as_argument` are the most useful ones,
as they serve as alternatives to `user_argument_safe` and
`host_argument_safe` whose return values are unambiguous. The
`user_argument_safe` and `host_argument_safe` methods don't
distinguish between the absence of a username or host, and the
presence of a username or host that is unsafe to pass as an
argument.

`path_as_argument` included for parity, in case it is useful to
write code that handles the three cases similarly. However, there
is no underlying ambiguity in the return value of the corresponding
`path_argument_safe` method, since unlike `user` and `host`, the
`path` is not an option type.
The three new `Url::*_as_argument` methods were only tested
indirectly before (and the path one not at all). This adds tests
for them so they have the same direct coverage as the corresponding
`*_argument_safe` methods.

The `Debug` and `PartialEq` traits added to `ArgumentSafety` are
primarily added to allow the new assertions to be written in a more
streamlined way (without matching).
Right now this is more locked down than necessary, serving to check
that ArgumentSafety can be easily matched on in scenarios where one
may wish to reject both usernames and hosts that are ambiguous.

This is verified by all tests passing except these that fail:

- ambiguous_host_is_allowed_with_user_explicit_ssh
- ambiguous_host_is_allowed_with_user_implicit_ssh

Prohibiting ambiguous hosts even when the username is present and
cannot itself be confused with a command-line option isn't needed
in gix-transport, which passes URLs of the form `user@host` anytime
the username is present. However, if an application ever passes
the username as a separate argument rather than as part of the
netloc with the host, then being able to express that in a natural
and non-error-prone way is important.
Since the `user@` prefix makes it safe as gix-transport uses it,
and there are test cases specifying that it should be permitted.
This is to test a code path where a host is passed to an ssh
command and it's important to ensure it doesn't start with `-`.
This code path uses `-G` to help figure out what SSH client is
being used, and most of the time it is not followed. The existing
check is sufficiently simple it is most likely correct. But it
seems never to to have been covered in automated tests: neither
unit tests nor any journey tests.

This does not yet manage to test is successfully, because it does
not succeed at getting that specific code path to be followed.
This test was absent before, but seems important to add to verify
that the code there was correct (it was), but also because I plan
to refactor that code to use `host_as_argument` rather than the
older (in some cases potentially ambiguous) `host_argument_safe`.

This is specific to the case where a command like `ssh -G` -- but
where `ssh` is an uncertain command such that gix-transport tries
to figure out what is -- is used. A netloc is passed as part of
that, but the netloc used consists only of a host and not a
username, even when this is a step in a longer process that will
later use (and check the safety of) the username. So only the host
has to be checked (at that stage) to make sure it cannot masquerade
as an option argument.
Instead of `Url::host_argument_safe()`.

This also removes the comment about how the username should be
handled if added, mostly because I didn't really need to add that
comment.
@EliahKagan EliahKagan marked this pull request as ready for review April 13, 2024 01:07
@Byron Byron self-requested a review April 13, 2024 08:10
- put more standard-derives on `ArgumentSafety` to allow more convenient usage.
- remove `path_as_argument()` method as it wasn't adding anything. This allowed the `ArgumentSafety` wrapper to be simpler.
- turn assertion comments into messages.
Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic PR and it's like you have been doing Rust for a long time.

I love how you picked up on the commenting style to selectively incorporate changelog messages. On that note, I have a removed a method that was certainly mentioned in a commit, and I accept that the changelog will now be slightly misleading without rewriting history. It's a systematic shortcoming.

Further, the addition of ArgumentSafety is a great one which leads to nicer error messages than was possible before. The previous implementation didn't handle this very well as a the possible states weren't correctly represented.

I will now see to creating a release once CI is green.

@@ -345,12 +345,33 @@ title "gix commit-graph"
}
)
)
(with "an ambiguous ssh username which could be mistaken for an argument"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate having this tested in all layers of the application, making it very clear that the safety checks are actually made in the right place and effective.

This comment is just to mention that I avoid adding more journey tests these days especially if we know that the gix (CLI) has to go through plumbing crates, which are typically tested on unit or integration level, hence we would only duplicate efforts here.

These days, I'd only use journey test if the behaviour is truly not tested anywhere else, which is also something I try to avoid. Journey tests aren't used anymore to pin down features or behaviour of the CLI either, as it's a 'can have breaking changes at any time' development tool. Rust is typically great at assuring that if it compiles, it works, assuming that each plumbing crate and the gix crate have the needed tests to validate behaviour.

@@ -0,0 +1 @@
Error: The repository path '-oProxyCommand=open$IFS-aCalculator/bar' could be mistaken for a command-line argument
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that it seems to inject terminal escape codes ([2K) here, but not in the snapshots further below.
I'd expect it not to do that, but it's not important enough to even look into it right now.

host.into()
}

let host_maybe_with_user_as_ssh_arg = match (url.user_as_argument(), url.host_as_argument()) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great use of match and one of the main reasons Rust is so powerful! Never again can one miss handling a possible program state, as long as one remembers to match on them that is :).

@Byron Byron merged commit 1272542 into Byron:main Apr 13, 2024
19 checks passed
@EliahKagan EliahKagan deleted the strange-usernames branch April 13, 2024 14:36
@EliahKagan
Copy link
Contributor Author

EliahKagan commented Apr 13, 2024

Now that the advisory GHSA-98p4-xjmm-8mfh is published, I've updated this PR description to link to it and provide a small amount of additional information. In the near future, I'll likely edit it further to add more information, including about the approach taken, as well as to give it a clearer title.

I hope it's okay that I have also changed the "Affected versions" field of the advisory itself (the repository-local advisory, since as far as I know it is not yet in advisory databases) from <= 0.41.3 to < 0.42.0. This makes clearer that there are no existing or forthcoming versions of the gix-transport crate lower than 0.42.0 that contain a fix. If this change is not wanted, it can of course be undone.

I am uncertain if there are any other crates--such as related to the gix CLI tool?--that should be listed as affected. If so, they could be added to the advisory. This can be done without replacing gix-transport, which is the most important affected crate: the advisory could list both.

@Byron
Copy link
Owner

Byron commented Apr 14, 2024

I hope it's okay that I have also changed the "Affected versions" field of the advisory itself (the repository-local advisory, since as far as I know it is not yet in advisory databases) from <= 0.41.3 to < 0.42.0. This makes clearer that there are no existing or forthcoming versions of the gix-transport crate lower than 0.42.0 that contain a fix. If this change is not wanted, it can of course be undone.

Yes, thanks a lot for catching this. I somehow behaved like the bound of unpatched versions can't be changed.

I am uncertain if there are any other crates--such as related to the gix CLI tool?--that should be listed as affected. If so, they could be added to the advisory. This can be done without replacing gix-transport, which is the most important affected crate: the advisory could list both.

Right, I wasn't aware that multiple items can be added. Instead, I added the affected items downstream as a 'Tip' right after the summary. I have corrected that by adding two more items accordingly and removed the 'Tip'.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Apr 15, 2024

I see the RUSTSEC advisory was published! 🎉

I've updated the pull request description to include a link to that, but I still do plan to expand it further later, as noted.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Apr 15, 2024

It looks like I should have included CWE-88 in the weaknesses list in GHSA-98p4-xjmm-8mfh. Its Example 1 clarifies that it encompasses this kind of vulnerability. That also looks like the (only) CWE that was attempted to be listed in the earlier GHSA-rrjw-j4m2-mf34. (That actually lists no CWEs, but CWE-88 was intended.) I can edit the repository advisory to add CWE-88, and it looks like RUSTSEC advisories don't list CWEs so nothing has to be added there.

But should CWE-88 be added as a second CWE, or should it replace CWE-77? CWE-77 is often overused, but I think this is mostly because of uses where CWE-78 should've been used instead. It seems to me that CWE-77 still applies and may not be completely covered by CWE-88, but I'm not sure.

Edit: Updated to clarify the relevance of CWE-88 in the history of the earlier advisory.

@Byron
Copy link
Owner

Byron commented Apr 15, 2024

To me, as a layperson (strangely enough), and judging only by the title, they all seem pretty similar and I feel that they all somewhat apply. Since CWE assignments are probably useful for people to find (or better understand) an advisory, maybe having multiple slightly overlapping CWEs isn't the worst thing to do.

But I am definitely not sure either.

@EliahKagan
Copy link
Contributor Author

That sounds reasonable. I'll try to look a bit more into how, if at all, CWE-77 and CWE-88 tend to be used together, then I'll add CWE-88 while also keeping CWE-77 unless I've found a reason to do otherwise.

@riking
Copy link

riking commented Apr 16, 2024

Although ssh supports aliases for hosts, even if git-upload-pack could be made an alias, that is made difficult by the URL-encoding transformation.

Note that DNS Search Domains could turn that into a valid domain: git-upload-pack.corp.example.com

However, it will be difficult to place a malicious host in a machine's search domains, as the search domains typically have common control with the machine they are configured on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants