WIP/RFC: Prepared parameters #1484
base: master
Conversation
Btw., in the "after" screenshot there is still a delay of ~260 ms, some of which is not accounted for by the current tracing. So there is still some potential for reducing the initial latency further, but prepared parameters shave off most of it.
Just a quick comment on the plots part of this: I've noticed recently that we always call lines 1421 to 1423 in 3255055, even though, if we have pre-instantiated plots, we just return them directly (lines 1446 to 1448 in 3255055).
That said, we could use the opportunity to update existing plots with the current nav/sig shape and any parameter changes, either if the acquisition changed, or for a UDF / plot which depends on some kind of extra_shape determined from the parameters.
[...] Great point!
Ah, yeah, we need to figure out how/if that's possible with things like the bare matplotlib backend, but at least for those backends where it's easily supported, we could add a way to re-use plot instances across these changes.
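The plot-reuse idea above could be sketched as a small shape-keyed cache: reuse the existing plot instance when the nav/sig shapes are unchanged between runs, and recreate it otherwise. The `PlotCache` and `DummyPlot` names and the shape-keyed invalidation rule are illustration-only assumptions, not LiberTEM API:

```python
class DummyPlot:
    """Stand-in for a live plot bound to a given nav/sig shape."""
    def __init__(self, nav_shape, sig_shape):
        self.nav_shape = nav_shape
        self.sig_shape = sig_shape


class PlotCache:
    """Toy cache that reuses plot instances across UDF runs."""
    def __init__(self):
        self._plots = {}

    def get(self, key, nav_shape, sig_shape):
        plot = self._plots.get(key)
        # Recreate the plot only if the shapes changed between runs:
        if plot is None or (plot.nav_shape, plot.sig_shape) != (nav_shape, sig_shape):
            plot = DummyPlot(nav_shape, sig_shape)
            self._plots[key] = plot
        return plot


cache = PlotCache()
p1 = cache.get("sum", (16, 16), (128, 128))
p2 = cache.get("sum", (16, 16), (128, 128))  # same shapes: instance reused
p3 = cache.get("sum", (32, 32), (128, 128))  # nav shape changed: recreated
print(p1 is p2, p1 is p3)  # True False
```

A real implementation would also have to handle backends (like bare matplotlib) where rebinding an existing figure to a new shape may not be straightforward.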
This is by no means finished, but it is already very useful for keeping constant parameters on the workers across UDF runs, and for making sure they are scattered well before the data from the detector arrives (in the live processing use case).

Things to be figured out:

- Do we want to make this a general "preparation" API, including preparing plots etc.?
- How does this interact with the dynamic parameter updating API?
- Tests
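The core idea can be sketched as: scatter the constant parameters to all workers once, get back a cheap handle, and have later UDF runs refer to the parameters only through that handle. Everything here (`Context`, `prepare_parameters`, the handle mechanics) is a hypothetical in-process toy for illustration, not the actual LiberTEM API:

```python
import uuid


class Worker:
    def __init__(self):
        self.param_store = {}  # handle -> large constant parameter set


class Context:
    def __init__(self, n_workers=4):
        self.workers = [Worker() for _ in range(n_workers)]

    def prepare_parameters(self, **params):
        """Scatter constant parameters ahead of time; return a cheap handle."""
        handle = str(uuid.uuid4())
        for w in self.workers:
            # The expensive transfer happens here, once, before any UDF run:
            w.param_store[handle] = params
        return handle

    def run_udf(self, handle):
        # Each task only looks the parameters up by handle; nothing large
        # is shipped at run time:
        return [w.param_store[handle] for w in self.workers]


ctx = Context()
masks = list(range(1000))  # stand-in for a ~200 MiB mask stack
handle = ctx.prepare_parameters(masks=masks)
results = ctx.run_udf(handle)
print(all(r["masks"] is masks for r in results))  # True (in-process toy)
```

In a real distributed setting the workers would of course hold copies rather than the same object; the point is only that `run_udf` no longer pays the transfer cost.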
Motivation: with LiberTEM-live, the workers should be ready to receive data as soon as the UDF run starts. If we have large-ish parameters, this is not the case: we first need to scatter the parameters before the actual run can start. This causes substantial jitter at the beginning, and putting data into the MP queues is also not properly parallelizable (I tried...). That means you have `O(number_of_workers)` work at the beginning, and in the worst case your receiver code falls over, for example because the shared memory area fills up.

One possible solution is to pre-scatter constant parameters, and to pass only a handle to the UDF once the data is scattered to the workers.
In my testing on offline data, with 24 workers and a ~200 MiB parameter set, the latency from calling `run_udf` to running the first task on a worker was reduced from ~1.9 s to ~270 ms. The UDF is a dummy no-op, but the parameter set is a realistic set of pre-computed masks for a radial Fourier analysis, and is similar in size to a mask stack for SSB.

Example code:

Before:

After:
This is a request for discussion and exploration of the solution space, and won't be merged like this.
Questions:
(PR includes changes from #1483, will rebase once that is merged.)

Contributor Checklist:
Reviewer Checklist:
/azp run libertem.libertem-data
passed