
Check for remote address in allow-list update should ignore underlay #5735

Closed
bnaecker opened this issue May 10, 2024 · 2 comments · Fixed by #5751
Labels
api Related to the API. networking Related to the networking. nexus Related to nexus

Comments

@bnaecker
Collaborator

When attempting to update the allow-list for user-facing services, Nexus ensures that the address of the peer sending the request is contained in the allow-list itself. This is a safety mechanism to prevent locking oneself out. However, there are still ways one can get locked out. As an example, if a user sets the allow-list to enable traffic from some external proxy, but that proxy is issued a new IP address without notice, they won't be able to get back into the rack to change the list.

To address this, we could make use of the Nexus external API proxy offered by wicketd. This works at the transport layer and runs over the underlay network. That means it's not subject to the allow-list checks at all, since it's over a completely different interface. However, clients will never be able to actually update the allow-list through the proxy, because the containment check mentioned at the beginning will fail! Users don't know (and can't know) the underlay address, so Nexus will always reject the list.

We should relax this check in Nexus. The remote address must either be on the allow-list, or on the underlay network. It's not yet clear to me if that means "same rack subnet" or "any rack subnet", or if we want to do an explicit lookup in the database to make sure it's in a known-valid rack.
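The relaxed check described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual Nexus implementation: the function name `peer_may_update_allowlist` is invented, and it approximates "on the underlay" as "shares the rack's IPv6 subnet prefix" (assumed here to be a /56), whereas the issue notes the real definition (same rack subnet, any rack subnet, or a database lookup) was still undecided.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Hypothetical sketch: permit an allow-list update if the peer is
/// either on the user-managed allow-list itself, or appears to have
/// arrived over the underlay network.
///
/// `rack_subnet` is assumed to be the base address of the rack's /56
/// underlay prefix; the real check might instead consult the database
/// for all known rack subnets.
fn peer_may_update_allowlist(
    peer: IpAddr,
    allow_list: &[IpAddr],
    rack_subnet: Ipv6Addr,
) -> bool {
    // Original safety mechanism: the peer must be in the list.
    if allow_list.contains(&peer) {
        return true;
    }
    // Relaxation: underlay traffic is IPv6, so compare the top 56 bits
    // of the peer's address against the rack subnet prefix.
    match peer {
        IpAddr::V6(v6) => {
            let prefix = |a: Ipv6Addr| u128::from(a) >> (128 - 56);
            prefix(v6) == prefix(rack_subnet)
        }
        // IPv4 peers can only come from the external network.
        IpAddr::V4(_) => false,
    }
}
```

An underlay peer then passes even with an empty allow-list, while an external peer must appear on the list as before.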

@bnaecker bnaecker added api Related to the API. networking Related to the networking. nexus Related to nexus labels May 10, 2024
bnaecker added a commit that referenced this issue May 10, 2024
- Fixes #5735
- When checking the IP allowlist, we ensure the peer making the request
  is contained by the list itself. This works in most circumstances, but
  isn't correct when the request is proxied through `wicketd`, such as
  in a recovery situation. Add special handling for when the peer isn't
  in the allow-list, by checking whether it's instead part of any rack
  subnet.
- Add datastore method specifically for checking if an address is part
  of a rack subnet.
- Add unique index on the rack table for looking up subnets quickly, and
  a schema migration with that index.
@jgallagher
Contributor

An idle thought: the nexus techport proxy is an entirely separate dropshot server:

omicron/nexus/src/lib.rs

Lines 179 to 192 in c472464

```rust
let http_server_techport_external = {
    let server_starter_external_techport =
        dropshot::HttpServerStarter::new_with_tls(
            &techport_server_config,
            external_api(),
            Arc::clone(&apictx),
            &log.new(o!("component" => "dropshot_external_techport")),
            tls_config.map(dropshot::ConfigTls::Dynamic),
        )
        .map_err(|error| {
            format!("initializing external techport server: {}", error)
        })?;
    server_starter_external_techport.start()
};
```

I wonder if it would be easier to stash something in the respective server contexts to indicate which is which, and thus disable the check of the request IP based on which dropshot server is handling the request?

@bnaecker
Collaborator Author

That's an interesting idea; I like it. I'm looking at the server contexts now, and they're all currently shared between the three servers (internal, external, and techport) through an Arc. I think we'd need a deeper copy to make this work, or we'd need to split the internal fields up in a more structured way so that we can add a new field like this. The ServerContext has clearly grown organically; it looks like it dates from before we even had the internal server, since it includes the latency tracking for the internal API endpoints. So this might be a good time to make such a change in any case!

bnaecker added a commit that referenced this issue May 14, 2024
- Adds a server kind enum, used to distinguish which API server is
  running any particular handler.
- Wraps the existing `ServerContext` into a higher-level `ApiContext`
  type, which includes the former in addition to the kind of server
  handling the request.
- Fixes #5735
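The shape of that fix can be sketched as below. This is a simplified illustration of the approach the commit message describes, not the actual omicron code: `ServerContext` is a stand-in, and the method name `enforce_allowlist` is invented. The key idea is that the shared state stays behind a single `Arc`, while the cheap-to-copy server kind differs per dropshot server.

```rust
use std::sync::Arc;

/// Which API server is running a given handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ServerKind {
    Internal,
    External,
    Techport,
}

/// Stand-in for Nexus's real, organically-grown `ServerContext`,
/// which remains shared between all three servers.
#[allow(dead_code)]
struct ServerContext;

/// Wrapper handed to each dropshot server: the shared context plus
/// the kind of server handling the request.
#[derive(Clone)]
struct ApiContext {
    context: Arc<ServerContext>,
    kind: ServerKind,
}

impl ApiContext {
    /// Hypothetical helper: the allow-list containment check can be
    /// skipped when the request arrived via the techport proxy.
    fn enforce_allowlist(&self) -> bool {
        self.kind != ServerKind::Techport
    }
}
```

With this arrangement a handler never needs to inspect the remote address to decide whether the check applies; it only consults the kind of server that received the request.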