Allow to call `get_results` lazily #1596
Comments
The Analysis interface is doing this already, perhaps this can be applied as a blueprint? Maybe we can also use `dask.delayed`?
This callable is returned as an analysis result, for example.
I think this can be done in a generic way that works for all UDFs the same way, instead of allowing UDFs to opt-in via changing An example implementation could switch
I think that would be overkill for this usage (would create a hard dependency on
Right now, `UDF.get_results` is called for every intermediate result, for every UDF that is running, when using `run_udf_iter`. The user of `run_udf_iter` may not be interested in all of these intermediates, especially if they are very small. Additionally, if the `get_results` function is relatively expensive and the partitions are comparatively small, this can result in a slow-down, as `get_results` becomes the bottleneck. A sample profile where this happened in practice looks like this:

(`get_results` is highlighted and takes ~50% in this case)

Instead of eagerly generating results for every intermediate result, we could call `get_results` lazily. That decouples computation from visualization, so that we only pay the `get_results` cost when we are actually "drawing" an updated visualization frame (in this case, we send it over the network, but the same principle applies through backpressure).

This would remove another reason to manually set the `frames_per_partition` parameter in LiberTEM-live use cases.
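To make the proposal more concrete, here is a minimal sketch of the lazy pattern in plain Python. It does not use the LiberTEM API: `LazyResult`, `iter_lazy_results`, and the `merge` / `get_results` callables passed in are hypothetical stand-ins for whatever `run_udf_iter` would do internally, and the sketch assumes that the merged state can be captured safely until the consumer asks for it.

```python
from typing import Callable, Generic, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class LazyResult(Generic[T]):
    """Hypothetical wrapper: defers an expensive computation until first use."""

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute = compute
        self._value: Optional[T] = None
        self._done = False

    def get(self) -> T:
        # Run the computation at most once and cache its value.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def iter_lazy_results(
    partitions: Iterable,
    merge: Callable,
    get_results: Callable,
) -> Iterator[LazyResult]:
    """Stand-in for an iterating UDF runner (assumption, not the real one).

    Merging happens eagerly for every partition, but get_results is only
    wrapped, not called, so the consumer decides when to pay for it.
    """
    state = None
    for part in partitions:
        state = merge(state, part)
        # Bind the current state as a default argument so that each
        # LazyResult sees the state as of its own partition.
        yield LazyResult(lambda s=state: get_results(s))
```

A consumer that only redraws occasionally would call `.get()` on the items it actually displays and drop the rest, so the `get_results` cost scales with the number of drawn frames rather than with the number of partitions. With mutable result buffers, the captured state would either need to be copied or the lazy result would have to be consumed before the next merge; this sketch leaves that question open.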