Useful window functions for large rasters. #2765

Draft: groutr wants to merge 5 commits into main

groutr (Contributor) commented Feb 9, 2023

This PR adds new functionality to the Window class and the windows submodule.

  • Window.__contains__: w1 in w2 determines whether w1 is contained entirely within w2.
  • Window.area: Property that returns the area of a window.
  • windows.neighbors: Return the 8 neighboring windows of a given window in clockwise order (filtering 0-area windows).
  • windows.subdivide: Subdivide a given window into (x, y) sized subwindows.
  • windows.merge: Merge two windows into a larger window that exactly covers the two input windows.
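To make the semantics in the list above concrete, here is a minimal pure-Python sketch. It is not the code from this PR: `Win` is a hypothetical stand-in for a window (pixel offsets plus sizes), and the signatures are assumptions. In particular, `neighbors` takes an explicit `bounds` window to clip against and starts at the top-left neighbor, neither of which the description specifies.

```python
from typing import Iterator, List, NamedTuple


class Win(NamedTuple):
    """Hypothetical stand-in for a raster window, in pixel units."""
    col_off: int
    row_off: int
    width: int
    height: int


def contains(outer: Win, inner: Win) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        inner.col_off >= outer.col_off
        and inner.row_off >= outer.row_off
        and inner.col_off + inner.width <= outer.col_off + outer.width
        and inner.row_off + inner.height <= outer.row_off + outer.height
    )


def merge(a: Win, b: Win) -> Win:
    """Smallest window that covers both `a` and `b`."""
    col0 = min(a.col_off, b.col_off)
    row0 = min(a.row_off, b.row_off)
    col1 = max(a.col_off + a.width, b.col_off + b.width)
    row1 = max(a.row_off + a.height, b.row_off + b.height)
    return Win(col0, row0, col1 - col0, row1 - row0)


def subdivide(win: Win, x: int, y: int) -> Iterator[Win]:
    """Yield subwindows at most `x` wide and `y` high, row by row.

    Subwindows on the right and bottom edges are clipped to `win`.
    """
    for r in range(win.row_off, win.row_off + win.height, y):
        h = min(y, win.row_off + win.height - r)
        for c in range(win.col_off, win.col_off + win.width, x):
            w = min(x, win.col_off + win.width - c)
            yield Win(c, r, w, h)


def neighbors(win: Win, bounds: Win) -> List[Win]:
    """Same-sized neighbors of `win`, clockwise from the top-left.

    Each neighbor is clipped to `bounds` (an assumption of this sketch);
    neighbors left with zero area are dropped.
    """
    # (column step, row step) for each of the 8 directions, clockwise.
    steps = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
    result = []
    for dc, dr in steps:
        c0 = win.col_off + dc * win.width
        r0 = win.row_off + dr * win.height
        left = max(c0, bounds.col_off)
        top = max(r0, bounds.row_off)
        right = min(c0 + win.width, bounds.col_off + bounds.width)
        bottom = min(r0 + win.height, bounds.row_off + bounds.height)
        if right > left and bottom > top:
            result.append(Win(left, top, right - left, bottom - top))
    return result
```

For a 10 x 6 window split into 4 x 4 tiles, `subdivide` yields six subwindows (the right column and bottom row are clipped), merging the first and last tile gives back the full window, and the top-left tile has three neighbors once the ones falling outside the bounds are dropped.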

A minor utility function is also added to dtypes.py:

  • Added dtypes.array_bytes to compute the size of an array in bytes given a dtype and shape. I've used a similar function to compute optimal window sizes and avoid memory overallocation when processing large rasters.
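As a rough illustration of how such a helper can drive window sizing, the sketch below computes an array's size in bytes and derives the largest square window that fits a memory budget. The signature of `array_bytes` here is an assumption, not necessarily the one in this PR, and `max_square_window` is a hypothetical helper. To stay dependency-free it uses a small lookup table; with NumPy, `numpy.dtype(name).itemsize` gives the same per-element sizes.

```python
from math import isqrt, prod
from typing import Sequence

# Bytes per element for a few common raster dtypes (illustrative subset).
ITEMSIZE = {
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "int32": 4,
    "float32": 4,
    "float64": 8,
}


def array_bytes(dtype: str, shape: Sequence[int]) -> int:
    """Size in bytes of an array with the given dtype name and shape."""
    return ITEMSIZE[dtype] * prod(shape)


def max_square_window(dtype: str, bands: int, budget_bytes: int) -> int:
    """Largest n such that a (bands, n, n) array fits in `budget_bytes`.

    Hypothetical helper: it only accounts for the array itself, not for
    any temporary copies made during processing.
    """
    pixels = budget_bytes // (ITEMSIZE[dtype] * bands)
    return isqrt(pixels)


# A 3-band float32 block of 512 x 512 pixels occupies 3 MiB.
block = array_bytes("float32", (3, 512, 512))  # 3145728

# Window side that keeps a 3-band float32 read under 64 MiB.
side = max_square_window("float32", 3, 64 * 1024 * 1024)
```

The resulting side length can then be passed as the subwindow size when splitting a raster into chunks.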

I welcome any feedback.

sgillies (Member) commented Apr 8, 2024

@groutr I'm not much in favor of growing the windows API except where we would use new methods to make rasterio's code better or to radically improve usability for common use cases in user code. My original intent was to offer users a class that would be a small usability improvement over needing to track 4 values (2 offsets and 2 lengths).

Windows only exist in raster space and don't have an area. Thus I'm 👎 on that. A measure of pixel volume or data size would be more appropriate.

Rasterio has some CLI commands that should be able to work on very large rasters, rio-calc and rio-merge, specifically. Are the new methods you propose useful to those commands?

I'd love to get opinions from @snowman2 and @vincentsarago. My own feeling on utility functions and classes is evolving towards this: utility methods borrow against the long-term viability of a project in order to attract more users. Am I wrong? I could be.

groutr (Contributor, Author) commented Apr 10, 2024

@sgillies I totally understand the desire to make the API small and functional. These functions emerged after writing several raster processing utilities where it was critical to process the raster in chunks.

The intent here, and the reason it's marked as a draft, was to share some methods that simplified my raster processing, in case they prove general and useful enough to include in rasterio. No hard feelings if few or none of these are considered useful.

I believe some of the concepts could be quite useful for basic chunk-based processing, whether for large rasters that don't fit in memory or for possible parallelism. Several of the window functions were initially developed when I created the tiled merge PR (#2221) but have since been refined further. Some of these window functions would be useful for rio-calc and rio-merge. rio-convert could also benefit from processing rasters as chunks.
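The chunk-based pattern described here can be sketched without any raster library. In the example below, a nested list stands in for a raster band, `chunks` is a hypothetical generator of (row_off, col_off, height, width) tuples, and each chunk is reduced independently in a thread pool before the partial results are combined. With rasterio, each chunk would instead correspond to a windowed read such as `dataset.read(window=...)`, so only one chunk per worker needs to be in memory at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

Chunk = Tuple[int, int, int, int]  # (row_off, col_off, height, width)


def chunks(rows: int, cols: int, size: int) -> Iterator[Chunk]:
    """Yield square chunks of side `size`, clipped at the grid edges."""
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            yield (r, c, min(size, rows - r), min(size, cols - c))


def chunk_sum(grid: List[List[int]], chunk: Chunk) -> int:
    """Sum the values of `grid` that fall inside one chunk."""
    r, c, h, w = chunk
    return sum(sum(row[c:c + w]) for row in grid[r:r + h])


def chunked_sum(grid: List[List[int]], size: int, workers: int = 4) -> int:
    """Sum a grid chunk by chunk, processing chunks in a thread pool."""
    rows, cols = len(grid), len(grid[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda ch: chunk_sum(grid, ch), chunks(rows, cols, size))
        return sum(partials)


# A 5 x 7 grid holding the values 0..34, processed in 3 x 3 chunks.
grid = [[r * 7 + c for c in range(7)] for r in range(5)]
total = chunked_sum(grid, 3)
```

A sum combines trivially across chunks; operations that need neighboring pixels, such as filters, would also have to read some overlap around each chunk.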

Of the code here, I believe subdivide is likely the most useful. It's definitely the function I've used the most.

2 participants