Allow to call `get_results` lazily #1596

sk1p · 2024-02-19T15:00:00Z

Right now, UDF.get_results is called for every intermediate result, for every UDF that is running, when using run_udf_iter. The user of run_udf_iter may not be interested in all of these intermediates, especially if they are very small. Additionally, if the get_results function is relatively expensive, and partitions comparatively small, this can result in a potential slow-down, as get_results becomes the bottleneck. A sample profile where this happened in practice looks like this:

(get_results is highlighted and takes ~50% in this case)

Instead of eagerly generating results for every intermediate result, if we only lazily call it, we can decouple the computation and visualization in a way that we only pay the get_results cost if we are actually "drawing" an updated visualization frame (in this case, we send it over the network, but the same principle applies by backpressure)

This would remove another reason to manually set the frames_per_partition parameter in LiberTEM-live use cases.

The text was updated successfully, but these errors were encountered:

uellue · 2024-02-19T16:15:02Z

The Analysis interface is doing this already, perhaps this can be applied as a blueprint?

Maybe we can also use dask.delayed?

uellue · 2024-02-19T16:18:39Z

LiberTEM/src/libertem/analysis/radialfourier.py

Line 173 in 9a907a9

def resultlist():

This callable is returned as an analysis result, for example.

sk1p · 2024-02-19T16:55:34Z

The Analysis interface is doing this already, perhaps this can be applied as a blueprint?
[...]
This callable is returned as an analysis result, for example.

I think this can be done in a generic way that works for all UDFs the same way, instead of allowing UDFs to opt-in via changing get_results or similar, like it's done for analyses. We can just decide not to call get_results if a result is not used by the consumer of the iteration. There's no explicit guarantee in the UDF interface that each merge call means get_results will be called afterwards, for example (another case to think about: per-node merge trees in the distributed use-case, where results from worker processes on a node are merged together before being sent to a main node, which is an option we should keep open for ourselves). We can clarify this in the documentation, to make clear that get_results should be a pure function.

An example implementation could switch UDFResults with a UDFResultsLazy class, with the same outer interface, just instead of having the final results, is has the merge results plus a reference to the UDFs themselves. On accessing UDFResults.buffers, get_results is called (need to be careful about buffers that are re-used, concurrency etc., where it gets a bit interesting - we may actually need a lock to guarantee atomic access to partial results, without interleaving merge calls with result accesses)

Maybe we can also use dask.delayed?

I think that would be overkill for this usage (would create a hard dependency on dask in core UDF code, and always need dask resources/threads/clients/...) - we are not building up complicated graphs I think, and pure-Python should be enough (lazy == a function instead of a value). If you are already using the dask delayed executor for the computation, the result is already lazy (well, and you can't actually use run_udf_iter so 🤷)

sk1p added this to the 0.15 milestone Apr 23, 2024

sk1p added a commit to sk1p/LiberTEM that referenced this issue Apr 25, 2024

WIP: make get_results lazy, refs LiberTEM#1596

d22f435

sk1p linked a pull request Apr 25, 2024 that will close this issue

WIP: make get_results lazy #1632

Open

10 tasks

sk1p added a commit to sk1p/LiberTEM that referenced this issue Apr 29, 2024

WIP: make get_results lazy, refs LiberTEM#1596

936ecfb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to call `get_results` lazily #1596

Allow to call `get_results` lazily #1596

sk1p commented Feb 19, 2024

uellue commented Feb 19, 2024

uellue commented Feb 19, 2024

sk1p commented Feb 19, 2024

Allow to call get_results lazily #1596

Allow to call get_results lazily #1596

Comments

sk1p commented Feb 19, 2024

uellue commented Feb 19, 2024

uellue commented Feb 19, 2024

sk1p commented Feb 19, 2024

Allow to call `get_results` lazily #1596

Allow to call `get_results` lazily #1596