
Check for remote address in allow-list update should ignore underlay #5735

Closed
bnaecker opened this issue May 10, 2024 · 2 comments · Fixed by #5751
Labels
api Related to the API. networking Related to the networking. nexus Related to nexus

Comments

@bnaecker
Collaborator

When attempting to update the allow-list for user-facing services, Nexus ensures that the address of the peer sending the request is contained in the allow-list itself. This is a safety mechanism to prevent locking oneself out. However, there are still ways one can get locked out. As an example, if a user sets the allow-list to enable traffic from some external proxy, but that proxy is issued a new IP address without notice, they won't be able to get back into the rack to change the list.

To address this, we could make use of the Nexus external API proxy offered by wicketd. This works at the transport layer and runs over the underlay network. That means it's not subject to the allow-list checks at all, since it's over a completely different interface. However, clients will never be able to actually update the allow-list through the proxy, because the containment check mentioned at the beginning will fail! Users don't know (and can't know) the underlay address, so Nexus will always reject the list.

We should relax this check in Nexus. The remote address must either be on the allow-list, or on the underlay network. It's not yet clear to me if that means "same rack subnet" or "any rack subnet", or if we want to do an explicit lookup in the database to make sure it's in a known-valid rack.
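The relaxed check described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual Nexus implementation: the function name `peer_may_update_allowlist` is invented, and it approximates "on the underlay" as "shares the rack's IPv6 subnet prefix" (assumed here to be a /56), whereas the issue notes the real definition (same rack subnet, any rack subnet, or a database lookup) was still undecided.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Hypothetical sketch: permit an allow-list update if the peer is
/// either on the user-managed allow-list itself, or appears to have
/// arrived over the underlay network.
///
/// `rack_subnet` is assumed to be the base address of the rack's /56
/// underlay prefix; the real check might instead consult the database
/// for all known rack subnets.
fn peer_may_update_allowlist(
    peer: IpAddr,
    allow_list: &[IpAddr],
    rack_subnet: Ipv6Addr,
) -> bool {
    // Original safety mechanism: the peer must be in the list.
    if allow_list.contains(&peer) {
        return true;
    }
    // Relaxation: underlay traffic is IPv6, so compare the top 56 bits
    // of the peer's address against the rack subnet prefix.
    match peer {
        IpAddr::V6(v6) => {
            let prefix = |a: Ipv6Addr| u128::from(a) >> (128 - 56);
            prefix(v6) == prefix(rack_subnet)
        }
        // IPv4 peers can only come from the external network.
        IpAddr::V4(_) => false,
    }
}
```

An underlay peer then passes even with an empty allow-list, while an external peer must appear on the list as before.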

@bnaecker bnaecker added api Related to the API. networking Related to the networking. nexus Related to nexus labels May 10, 2024
bnaecker added a commit that referenced this issue May 10, 2024
- Fixes #5735
- When checking the IP allowlist, we ensure the peer making the request
  is contained by the list itself. This works in most circumstances, but
  isn't correct when the request is proxied through `wicketd`, such as
  in a recovery situation. Add special handling for when the peer isn't
  in the allow-list, by checking whether it's instead part of any rack
  subnet.
- Add datastore method specifically for checking if an address is part
  of a rack subnet.
- Add unique index on the rack table for looking up subnets quickly, and
  a schema migration with that index.
@jgallagher
Contributor

An idle thought: the nexus techport proxy is an entirely separate dropshot server:

omicron/nexus/src/lib.rs

Lines 179 to 192 in c472464

```rust
let http_server_techport_external = {
    let server_starter_external_techport =
        dropshot::HttpServerStarter::new_with_tls(
            &techport_server_config,
            external_api(),
            Arc::clone(&apictx),
            &log.new(o!("component" => "dropshot_external_techport")),
            tls_config.map(dropshot::ConfigTls::Dynamic),
        )
        .map_err(|error| {
            format!("initializing external techport server: {}", error)
        })?;
    server_starter_external_techport.start()
};
```

I wonder if it would be easier to stash something in the respective server contexts to indicate which is which, and thus disable the check of the request IP based on which dropshot server is handling the request?

@bnaecker
Collaborator Author

That's an interesting idea; I like it. I'm looking at the server contexts now, and they're all currently shared between the three servers (internal, external, and techport) through an Arc. I think we'd need a deeper copy to make this work, or we'd need to split the internal fields up in a more structured way so that we can add a new field like this. The ServerContext has clearly grown organically; it looks like it dates from before we even had the internal server, since it includes the latency tracking for the internal API endpoints. So this might be a good time to make such a change in any case!

bnaecker added a commit that referenced this issue May 14, 2024
- Adds a server kind enum, used to distinguish which API server is
  running any particular handler.
- Wraps the existing `ServerContext` into a higher-level `ApiContext`
  type, which includes the former in addition to the kind of server
  handling the request.
- Fixes #5735
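The shape of that fix can be sketched as below. This is a simplified illustration of the approach the commit message describes, not the actual omicron code: `ServerContext` is a stand-in, and the method name `enforce_allowlist` is invented. The key idea is that the shared state stays behind a single `Arc`, while the cheap-to-copy server kind differs per dropshot server.

```rust
use std::sync::Arc;

/// Which API server is running a given handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ServerKind {
    Internal,
    External,
    Techport,
}

/// Stand-in for Nexus's real, organically-grown `ServerContext`,
/// which remains shared between all three servers.
#[allow(dead_code)]
struct ServerContext;

/// Wrapper handed to each dropshot server: the shared context plus
/// the kind of server handling the request.
#[derive(Clone)]
struct ApiContext {
    context: Arc<ServerContext>,
    kind: ServerKind,
}

impl ApiContext {
    /// Hypothetical helper: the allow-list containment check can be
    /// skipped when the request arrived via the techport proxy.
    fn enforce_allowlist(&self) -> bool {
        self.kind != ServerKind::Techport
    }
}
```

With this arrangement a handler never needs to inspect the remote address to decide whether the check applies; it only consults the kind of server that received the request.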