-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Partitioning based on ROIs #764
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's a really small change for such a feature!
Interface looks good, we'll see how it handles when implementing algorithms. 👍
So, in numpy, once you apply fancy indexing, the affected dimensions are collapsed into one flat dimension, which is "compressed" to the number of nonzero entries. This is also how I currently handle the "normal" ROI. When dealing with overlapping ROIs, assigning coordinates to tiles becomes a bit more difficult. Just passing on the coordinates from the wrapped dataset (as in the API sketch in this PR) does not work, as the coordinates would all start at zero. My proposal here is to also flatten the navigation axis of the whole |
Discussion with @sk1p on behavior (WIP): Two options to implement the navigation dimension:
In the second case, we need tools to translate between the coordinates in the new navigation dimension and the navigation dimension of the underlying dataset. As a comment, the automatisms of
|
Maybe the |
Sounds good! 👍 |
FYI, in the Ptycho 4.0 call we just discussed this item. Since Simeon is picking up work on the CUDA/Alpaka side of this, we should put it on the short list after 0.6 is out. We can already test with SSB and @Sniper2k 's merge function, taking into account that SSB results change if you constrain it differently. :-) |
Codecov Report
@@ Coverage Diff @@
## master #764 +/- ##
===========================================
- Coverage 68.57% 35.70% -32.88%
===========================================
Files 260 261 +1
Lines 11751 11882 +131
Branches 1606 1618 +12
===========================================
- Hits 8058 4242 -3816
- Misses 3385 7434 +4049
+ Partials 308 206 -102
Continue to review full report at Codecov.
|
`ranges` are a pre-defined list of `[start, stop)` tuples. This can, for example, be used by the caching layer to get consistent partitioning, or by ROI partitions which need overlapping ranges. Also sprinkled some type annotations to improve vs code type inference.
* Add helper method: Slice.translate * `RoiPartition` now has a slice associated, in the flat "outer" coordinate system * Implement some new required methods: * `need_decode`, `adjust_tileshape`, `set_corrections`, `get_base_shape`, `get_io_backend`, `get_locations`, ... * Update `seq` dataset: add `ranges` parameter for `get_partitions` * UDFRunner: delegate `shape_for_roi` to `Partition` * Fix `test_get_single_macrotile` assetion: add missing reshape
Closes #651 - this is still only a simple sketch of the API, although the basic functionality is already there.
Refs Ptychography-4-0/ptychography#6
Still to do:
UDFRunner
, which depends on having a slice associated to thePartition
.Partition.shape_for_roi(roi)
method, which can override the currentpartition_shape=partition.slice.adjust_for_roi(roi).shape
used when constructing theUDFMeta
object, which can then also be used in the short-circuit logic that decides not to generate tasks for empty ROIsroi
argument torun_udf
on aRoiDataset
. Needs tests for the case that thePartition
ROIs actually overlap, and theroi
selects elements from the overlapping region, etc.DataSet.meta
property on allDataSet
implsQuestion about partition sizing: as the user may have requirements for overlapping etc. we don't have control over the size and number of partitions, as we can't use generic logic here. I guess this is the question: where do the partition ROIs come from? Should the user pass in a callable that generates the ROIs? For example, the decisions could depend on the number of GPUs available on the system, and the available memory, so this is actually a per-node decision. I guess the size of each ROI should fit the memory of the "smallest" GPU, for the edge case of having more than one GPU, but with different memory sizes.
Contributor Checklist:
changes were made to the GUI)