
feat: new ChessboardGeoSampler #2067

Open · wants to merge 9 commits into main

Conversation

@killian31 (Author) commented on May 17, 2024

Add ChessboardGeoSampler to TorchGeo

Description

This pull request introduces ChessboardGeoSampler, a sampler that draws patches in a chessboard-like (alternating) pattern, minimizing overlap and redundant computation during evaluation.

Features

  • Chessboard Pattern Sampling: Reduces overlap and redundant computations.
  • Configurable Size and Stride: Allows setting patch size and stride.
  • Supports Different Units: Accepts pixel or CRS units.
  • Region of Interest (ROI): Samples from a specified ROI.
  • Deterministic Sampling: Control over start position.

Tests

  • Added unit tests in tests/samplers/test_single.py:
    • test_len: Verifies length calculation.
    • test_integer_multiple: Tests sampling with integer multiple sizes.
    • test_float_multiple: Tests sampling with non-integer multiple sizes.

Example Usage

from torchgeo.samplers import ChessboardGeoSampler, Units

# CustomGeoDataset stands in for any GeoDataset (e.g. a RasterDataset).
dataset = CustomGeoDataset()
size = (5, 5)      # patch size in CRS units
stride = (5, 5)    # step between patches in CRS units
sampler = ChessboardGeoSampler(dataset, size, stride, units=Units.CRS, start=0)

# Iterate over the sampled bounding boxes.
for bbox in sampler:
    print(bbox)

Example of train-val-test split using ChessboardGeoSampler (blue tiles are training samples from RandomGeoSampler, orange and purple are validation and test samples from ChessboardGeoSampler):
[Figure: splits_chessboard_toy]

github-actions bot added the testing (Continuous integration testing) and samplers (Samplers for indexing datasets) labels on May 17, 2024

@adamjstewart (Collaborator)

How is this different from GridGeoSampler?

@killian31 (Author)

> How is this different from GridGeoSampler?

It samples patches in an alternating pattern (like a chessboard), reducing overlap. The GridGeoSampler samples every patch in a regular grid.
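For illustration only, here is a minimal sketch of the indexing idea (hypothetical, not the PR's actual code): on a rows x cols grid of patch positions, keep only the cells whose (row + col) parity matches start.

# Hypothetical sketch of the chessboard indexing idea, not the PR's implementation.
rows, cols, start = 4, 4, 0
kept = [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == start]
print(kept)  # [(0, 0), (0, 2), (1, 1), (1, 3), (2, 0), ...]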

@calebrob6 (Member)

This is definitely useful to have, and it is used in geoML papers, e.g. https://www.nature.com/articles/s41467-021-24638-z

@adamjstewart (Collaborator)

@calebrob6 that example is for train-test splitting, not for sampling. We already have random_grid_cell_assignment for that use case.
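For reference, a usage sketch of that splitter (argument names and defaults as I understand the current TorchGeo API; check the docs for the exact signature):

import torch
from torchgeo.datasets import random_grid_cell_assignment

# Split a GeoDataset into train/val/test by randomly assigning grid cells.
train_ds, val_ds, test_ds = random_grid_cell_assignment(
    dataset, [0.6, 0.2, 0.2], grid_size=6, generator=torch.Generator().manual_seed(0)
)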

adamjstewart and others added 5 commits May 17, 2024 13:59
@isaaccorley (Collaborator)

@killian31 could you plot some example patches and overlay them on the original full image? The math seems to check out, but I just want to make sure we aren't missing an increment somewhere.

@adamjstewart (Collaborator)

It's still not clear what the purpose or advantage of this sampler is compared to what we already have.

From what you've described so far, it sounds like you're trying to split your dataset. Does random_grid_cell_assignment solve your needs? Or do things have to be deterministic for your use case? I wouldn't be opposed to a grid_cell_assignment function that does the same thing without the randomness.

Of course, we could use the sampler as an alternative to the splitter that gets a similar result. Initially we had no splitters, only samplers. But I think we're trying to move in a direction where splitters replace most of the features of samplers, as splitters are more familiar to torchvision users than samplers are. I'm trying to balance making things intuitive and having "more than one way to do things" with maintaining a limited amount of features without code duplication.

@isaaccorley (Collaborator)

Does this need to be its own class? Can this be achieved with GridGeoSampler by setting the stride equal to twice the patch size?

@adamjstewart (Collaborator)

You would need 2 GridGeoSamplers offset by 1 row/col to achieve the same thing. But again, I don't know why you would want to sample in this way unless you were trying to achieve a checkerboard dataset split, which we already support via other mechanisms.
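For illustration, a rough sketch of that two-sampler workaround (dataset, roi, and a CRS-unit patch size are assumed; which ROI edges to shift depends on how GridGeoSampler anchors its grid):

from torchgeo.datasets import BoundingBox
from torchgeo.samplers import GridGeoSampler, Units

patch = 5.0           # patch size in CRS units
stride = 2 * patch    # skip every other cell in both directions

# First sampler covers the (even row, even col) cells of the ROI.
sampler_a = GridGeoSampler(dataset, size=patch, stride=stride, roi=roi, units=Units.CRS)

# Second sampler covers the (odd row, odd col) cells: same grid, ROI shifted
# diagonally by one patch.
shifted = BoundingBox(roi.minx + patch, roi.maxx, roi.miny + patch, roi.maxy, roi.mint, roi.maxt)
sampler_b = GridGeoSampler(dataset, size=patch, stride=stride, roi=shifted, units=Units.CRS)

# Together the two samplers visit one "colour" of the checkerboard.
boxes = list(sampler_a) + list(sampler_b)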

@killian31 (Author)

> It's still not clear what the purpose or advantage of this sampler is compared to what we already have.
>
> From what you've described so far, it sounds like you're trying to split your dataset. Does random_grid_cell_assignment solve your needs? Or do things have to be deterministic for your use case? I wouldn't be opposed to a grid_cell_assignment function that does the same thing without the randomness.
>
> Of course, we could use the sampler as an alternative to the splitter that gets a similar result. Initially we had no splitters, only samplers. But I think we're trying to move in a direction where splitters replace most of the features of samplers, as splitters are more familiar to torchvision users than samplers are. I'm trying to balance making things intuitive and having "more than one way to do things" with maintaining a limited amount of features without code duplication.

Indeed, I do use it at the splitting stage: I split my full ROI (the area inside the whole dataset's boundaries) into two ROIs, one for training and one for both validation and test. I then define a RandomGeoSampler inside the training ROI and two ChessboardGeoSamplers inside the shared val/test ROI: one for the validation set with start=0 and one for the test set with start=1 (see the sketch below). I am not sure random_grid_cell_assignment can help, since I need the val-test split to not be random. Maybe I can come up with a grid_cell_assignment that is better suited to the direction you are taking; I'll let you know!
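For concreteness, a minimal sketch of that setup (train_roi and eval_roi are assumed to be precomputed BoundingBoxes, and the roi keyword on ChessboardGeoSampler is assumed to mirror the other samplers):

from torchgeo.samplers import RandomGeoSampler, Units
from torchgeo.samplers import ChessboardGeoSampler  # added by this PR

# Training patches are drawn randomly inside the training ROI.
train_sampler = RandomGeoSampler(dataset, size=(5, 5), length=1000, roi=train_roi, units=Units.CRS)

# Validation and test share one ROI but use alternating chessboard cells.
val_sampler = ChessboardGeoSampler(dataset, (5, 5), (5, 5), roi=eval_roi, units=Units.CRS, start=0)
test_sampler = ChessboardGeoSampler(dataset, (5, 5), (5, 5), roi=eval_roi, units=Units.CRS, start=1)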

@Bencpr do you have any clearer explanation of our needs?

@adamjstewart (Collaborator)

Thanks for the explanation!

Yes, I think grid_cell_assignment is the right strategy if you want it to be deterministic instead of random. You should be able to modify the existing random_grid_cell_assignment code to remove any randomness. Ideally, it would support a variable number of splits. Maybe the fractions parameter can decide how this works. The default could be [0.5, 0.5], which gives you what you currently have. But if I input [0.6, 0.2, 0.2], it would iterate through all grid cells (row by row and col by col) and assign them as follows:

  1. train
  2. train
  3. train
  4. val
  5. test
  6. train
  7. ...

Of course, if this sounds too difficult, we could save that feature for version 2.
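For illustration, a sketch of that deterministic assignment rule (a hypothetical helper, not an existing TorchGeo function; the repeating period must turn each fraction into a whole number of cells):

from itertools import accumulate

def assign_cells(num_cells: int, fractions: list[float], period: int = 5) -> list[int]:
    """Return a split index (0 = train, 1 = val, ...) for each grid cell."""
    bounds = list(accumulate(fractions))        # e.g. [0.6, 0.8, 1.0]
    splits = []
    for cell in range(num_cells):
        pos = (cell % period + 0.5) / period    # cell's fractional position in the block
        splits.append(next(i for i, b in enumerate(bounds) if pos <= b))
    return splits

print(assign_cells(7, [0.6, 0.2, 0.2]))  # [0, 0, 0, 1, 2, 0, 0] -> train, train, train, val, test, train, train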
