nexus does not re-use CruciblePantryClient during disk imports. #5717
Comments
It probably makes sense to audit Nexus for any other clients that are constructed on the fly like this.
I think we have a number of differently-configured clients (different timeouts, for instance), so we may need to be a bit more clever, but Artemis and I are going to try out a basic version of this change with the Crucible Pantry client.
I realize I just suggested @smklein deprioritize connection pooling a bit, but I think this is the kind of thing we need it for. Keeping individual clients around is definitely wrong if they're configured with specific IP addresses (which is pretty common throughout omicron, I think?), since they may go stale if reconfigurator adjusts services. I'm less sure about keeping clients around that are given DNS names, but then we're subject to however reqwest does DNS resolution (which is also something we ultimately want to control as part of the connection pool).
It's pretty common to have one-off Progenitor-generated clients with specific IPs around. I think this and the CRDB connection pooling issues are not orthogonal, but a stop-gap solution for HTTP seems easier at least. It's kind of weird how expensive constructing one of these clients is.
Fixes #5717. @faithanalog will follow up with the performance numbers she's collecting now. We should probably do this for the other clients we generate on the fly in Nexus, too, but this is probably the worst offender to fix.
While attempting to upload 64 400-MiB disk images concurrently to madrid (4-sled racklette), I caused Nexus to consume 100% of the sled's CPU resources. These uploads were extremely slow, and over 50% of them failed to fully upload their disks.
madrid is running fcf7980
You can read the full adventure here https://github.com/oxidecomputer/artemiscellaneous/blob/main/journal/2024-05-06-slow-disk-uploads.md#adventures-in-parallel-boot-disk-uploads
But here are the important bits:
I uploaded the images like this:
Each image uploaded at about 300 KiB/s, or about 18 MiB/s aggregate average. This generated tremendous resource contention in Nexus.
prstat from the sled with nexus:
flamegraph of nexus (view raw for interactivity)
I was uploading these images from an instance in dogfood which is located in the same physical place as madrid, meaning I had a high-speed connection. You can see the graph of network traffic from the dogfood instance is very sporadic despite this.
The problem, ultimately, is that Nexus is not caching its `CruciblePantryClient`. The `oxide` CLI splits the image up into 512 KiB blocks. Each time one of these 512 KiB blocks is uploaded, Nexus creates a new `CruciblePantryClient` from scratch, and pays all the overhead of setting up that client. The flamegraph shows the overhead is mostly down to reqwest preparing a TLS-capable connector (though it won't use TLS at all for this connection), which causes openssl to read and parse the system's trusted certificate roots.

We are managing to upload 37 of these blocks per second, but we have 64 CLI invocations * 8 worker threads per invocation = 512 client worker threads fighting over this throughput, which leads to a number of them failing, presumably due to connection timeouts.

This problem is probably not unique to massively parallel uploads; uploading even a few images in parallel likely generates considerable load (though much less contention), but I have not measured this.
Caching the `CruciblePantryClient` between block uploads should make this a lot better.