Overhaul of train.py and adding Chesapeake CVPR trainer #103

calebrob6 · 2021-09-04T00:54:28Z

Some key points

Changed how the configuration files are structured. See conf/defaults.yaml for the gist of this. There are now experiment.module and experiment.datamodule keys that get passed as kwargs to the LightningModule and LightningDataModule for a given task. This makes everything quite flexible, and because of that flexibility I could get rid of some nastiness in train.py.
Fixed the tests accordingly
train.py saves the final config that is used for the experiment to the experiment output directory
Made a trainer for the Chesapeake CVPR dataset
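A minimal sketch of the kwargs-forwarding idea described above, plus saving the final config to the output directory. It assumes a plain dict in place of the OmegaConf config and uses json as a stdlib stand-in for however train.py actually writes the config. `ToyTask`, `ToyDataModule`, `TASK_TO_CLASSES`, `build_experiment`, the task name, and `experiment_config.json` are all hypothetical, not names from the PR.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Tuple, Type

# Hypothetical experiment config. Only the experiment.module and
# experiment.datamodule keys come from the PR description; the task name and
# all values below are made up for illustration.
CONFIG: Dict[str, Any] = {
    "experiment": {
        "task": "chesapeake_cvpr",
        "module": {"learning_rate": 1e-3, "in_channels": 4, "num_classes": 7},
        "datamodule": {"batch_size": 32, "num_workers": 4},
    }
}


class ToyTask:
    """Stand-in for a LightningModule; keeps whatever kwargs it receives."""

    def __init__(self, **kwargs: Any) -> None:
        self.hparams = dict(kwargs)


class ToyDataModule:
    """Stand-in for a LightningDataModule; ignores kwargs it does not use."""

    def __init__(self, batch_size: int = 1, num_workers: int = 0, **kwargs: Any) -> None:
        self.batch_size = batch_size
        self.num_workers = num_workers


# Hypothetical registry: task name -> (module class, datamodule class).
TASK_TO_CLASSES: Dict[str, Tuple[Type[Any], Type[Any]]] = {
    "chesapeake_cvpr": (ToyTask, ToyDataModule),
}


def build_experiment(config: Dict[str, Any], output_dir: Path) -> Tuple[Any, Any]:
    """Instantiate the task/datamodule pair and save the config used."""
    experiment = config["experiment"]
    task_class, datamodule_class = TASK_TO_CLASSES[experiment["task"]]
    # Each sub-dict is forwarded verbatim as keyword arguments.
    task = task_class(**experiment.get("module", {}))
    datamodule = datamodule_class(**experiment.get("datamodule", {}))
    # Record the final config alongside the experiment's outputs.
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "experiment_config.json").write_text(json.dumps(config, indent=2))
    return task, datamodule


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "chesapeake_cvpr"
    task, datamodule = build_experiment(CONFIG, out_dir)
    saved = json.loads((out_dir / "experiment_config.json").read_text())
```

With this layout, adding a task only needs a registry entry and a config file; the training script never has to know each task's constructor arguments.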

And... it is working well! Here is output from TensorBoard showing the val image, val mask, and predictions:

Things to add in the nearish future:

A way to train on a concatenation of the train splits from different states
Augmentations
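For the first item, the indexing logic behind training on several per-state splits at once can be sketched in plain Python. In a PyTorch codebase `torch.utils.data.ConcatDataset` already does this; the sketch below only shows the idea, and `ConcatSplits` plus the `de`/`md` split contents are illustrative stand-ins rather than real dataset objects.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


class ConcatSplits(Sequence[T]):
    """Index several per-state splits as if they were one dataset."""

    def __init__(self, splits: List[Sequence[T]]) -> None:
        self.splits = splits
        # Running totals, e.g. lengths [3, 2, 4] -> [3, 5, 9].
        self.cumulative = list(accumulate(len(s) for s in splits))

    def __len__(self) -> int:
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, index: int) -> T:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the split this global index falls into, then offset into it.
        split_idx = bisect_right(self.cumulative, index)
        start = self.cumulative[split_idx - 1] if split_idx > 0 else 0
        return self.splits[split_idx][index - start]


# Hypothetical stand-ins for two states' train splits.
de_train = ["de-0", "de-1", "de-2"]
md_train = ["md-0", "md-1"]
combined = ConcatSplits([de_train, md_train])
```

Using `bisect_right` on the cumulative lengths also skips over any empty split correctly, since equal running totals push the lookup past it.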


codecov-commenter · 2021-09-04T03:45:28Z

Codecov Report

Merging #103 (58782d9) into main (04355ec) will decrease coverage by 4.33%.
The diff coverage is 27.71%.

@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   83.65%   79.32%   -4.34%     
==========================================
  Files          31       32       +1     
  Lines        1927     2089     +162     
==========================================
+ Hits         1612     1657      +45     
- Misses        315      432     +117
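The headline percentages follow directly from the hit and line counts above (coverage = hits / lines). A quick recomputation, written here as a check rather than anything Codecov itself runs:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines / tracked lines."""
    return 100.0 * hits / lines


base_lines, base_hits = 1927, 1612  # main (04355ec)
head_lines, head_hits = 2089, 1657  # #103 (58782d9)

base_cov = coverage_pct(base_hits, base_lines)
head_cov = coverage_pct(head_hits, head_lines)
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(f"{base_cov:.2f}% -> {head_cov:.2f}% ({head_cov - base_cov:+.2f} points)")
# prints "83.65% -> 79.32% (-4.33 points)"
print(f"misses: {base_misses} -> {head_misses} (+{head_misses - base_misses})")
# prints "misses: 315 -> 432 (+117)"
```

The recomputed change matches the 4.33% decrease in the report's summary line.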

Impacted Files	Coverage Δ
torchgeo/trainers/landcoverai.py	`48.85% <ø> (ø)`
torchgeo/trainers/naipchesapeake.py	`29.72% <ø> (ø)`
torchgeo/trainers/sen12ms.py	`62.16% <ø> (ø)`
torchgeo/datasets/chesapeake.py	`66.47% <26.66%> (-2.52%)`	⬇️
torchgeo/trainers/chesapeake.py	`26.84% <26.84%> (ø)`
torchgeo/trainers/__init__.py	`100.00% <100.00%> (ø)`
torchgeo/trainers/cyclone.py	`42.35% <100.00%> (ø)`


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04355ec...58782d9.

adamjstewart · 2021-09-04T15:01:44Z

torchgeo/trainers/chesapeake.py

+            # Render the image, ground truth mask, and predicted mask for the first
+            # image in the batch
+            img = np.rollaxis(  # convert image to channels last format
+                batch["image"][0].cpu().numpy(), 0, 3
+            )
+            mask = batch["mask"][0].cpu().numpy()
+            pred = y_hat_hard[0].cpu().numpy()
+            fig, axs = plt.subplots(1, 3, figsize=(12, 4))
+            axs[0].imshow(img[:, :, :3])
+            axs[0].axis("off")
+            axs[1].imshow(mask, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
+            axs[1].axis("off")
+            axs[2].imshow(pred, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
+            axs[2].axis("off")


Can we instead move the plotting logic to the ChesapeakeCVPR dataset class? That's how RasterDataset and VectorDataset are designed. Then you can just call dataset.plot(sample) and the code can be reused by anyone using the dataset in their own code.

RasterDataset plot takes a single Tensor as input, while this code is plotting (image, mask, predictions). Do you imagine multiple versions of plot?

Ah, I think this is the first GeoDataset we have with both image and mask. We should change plot to take in a sample dict instead.
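A minimal sketch of a plot that takes a sample dict instead of a single Tensor, reusing the rendering logic from the trainer snippet above. It assumes NumPy arrays under the keys "image" (channels-first), "mask", and an optional "prediction". The function name `plot_sample`, the "prediction" key, and the colormap colors are hypothetical stand-ins (the trainer's actual `CMAP` is not shown), so this is not torchgeo's API.

```python
from typing import Dict, Optional

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from matplotlib.figure import Figure

# Hypothetical 7-class colormap (classes 0..6, matching vmin=0, vmax=6 above).
CMAP = ListedColormap(
    ["#000000", "#00c5ff", "#267300", "#a3ff73", "#816647", "#9c9c9c", "#ffaa00"]
)


def plot_sample(sample: Dict[str, np.ndarray], title: Optional[str] = None) -> Figure:
    """Plot "image" (C, H, W), "mask" (H, W) and, if present, "prediction" (H, W)."""
    image = np.moveaxis(sample["image"], 0, -1)  # channels-first -> channels-last
    label_kwargs = dict(vmin=0, vmax=6, cmap=CMAP, interpolation="none")
    panels = [("Image", image[:, :, :3], {}), ("Mask", sample["mask"], label_kwargs)]
    if "prediction" in sample:
        panels.append(("Prediction", sample["prediction"], label_kwargs))

    fig, axs = plt.subplots(1, len(panels), figsize=(4 * len(panels), 4))
    for ax, (name, data, kwargs) in zip(axs, panels):
        ax.imshow(data, **kwargs)
        ax.set_title(name)
        ax.axis("off")
    if title is not None:
        fig.suptitle(title)
    return fig
```

In a trainer's validation step the same function could be called with the batch's first image and mask plus `"prediction": y_hat_hard[0]`, and anyone using the dataset directly can call it on a sample without a prediction.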

adamjstewart

Gave things another review. My biggest complaint with this chunk of code is a lack of documentation and testing. I realize we're kind of busy trying to finish experiments before we can publish things, but we need to make sure that this code is actually well-tested and documented before we start making releases.

adamjstewart · 2021-09-09T19:30:07Z

conf/chesapeake_cvpr.yaml

@@ -0,0 +1,22 @@
+trainer:


My biggest complaint about all this OmegaConf stuff is that there doesn't seem to be any documentation on what options are supported or what possible values they can take. There's no way to get a help message without argparse.

This is discussed in much more detail here -- facebookresearch/hydra#633.

For now I think comments in the yaml files are OK -- we can do more here. I think there will be few scenarios in which a user is trying to configure experiments without looking at the trainer code.
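One lightweight option beyond YAML comments is to describe each config section with a dataclass and generate a help message from it. OmegaConf accepts dataclasses as structured configs, so a schema like this could also type-check values loaded from YAML. The class name and its fields below are hypothetical examples, not the trainer's real options.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Type


@dataclass
class ChesapeakeCVPRDataModuleOptions:
    """Hypothetical schema for an experiment.datamodule section."""

    batch_size: int = field(default=32, metadata={"help": "Samples per batch."})
    num_workers: int = field(default=4, metadata={"help": "DataLoader worker processes."})
    patch_size: int = field(default=256, metadata={"help": "Patch side length in pixels."})


def help_text(schema: Type[Any]) -> str:
    """Render a --help style summary from dataclass fields and their metadata."""
    lines = [f"{schema.__name__}:"]
    for f in fields(schema):
        type_name = getattr(f.type, "__name__", str(f.type))
        default = "required" if f.default is MISSING else f"default={f.default!r}"
        lines.append(f"  {f.name} ({type_name}, {default}): {f.metadata.get('help', '')}")
    return "\n".join(lines)


print(help_text(ChesapeakeCVPRDataModuleOptions))
```

A side benefit: building the dataclass from a config section, e.g. `ChesapeakeCVPRDataModuleOptions(**section)`, raises `TypeError` on a misspelled key instead of silently ignoring it.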

conf/task_defaults/chesapeake_cvpr.yaml

torchgeo/trainers/chesapeake.py

adamjstewart · 2021-09-09T19:33:33Z

torchgeo/trainers/chesapeake.py

+)
+
+
+class ChesapeakeCVPRSegmentationTask(LightningModule):


These trainer modules desperately need unit tests at some point, our overall code coverage is plummeting, see #109

Have you tried running a trainer :)?

We ran into a very obvious problem in "tested" code this morning: I wanted to run python train.py with the cyclone dataset. This works in my environment because the dataset is where I expect and this works in tests because we have fake data where we expect it. This did not work in practice because other people don't have the dataset downloaded.

At the moment train.py isn't "tested" code. In the long run, I want to have integration tests that use real data and may take several hours to run. These will only be run on release branches, not on main/PRs. Everything is set up to do this except someone needs to actually write the tests. The unit tests will only be for coverage and catching minor issues.
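One way to keep slow, real-data integration tests out of main/PR runs is to gate them on an environment variable that only release-branch CI sets. A minimal sketch with the standard library's `unittest`; the variables `RUN_INTEGRATION_TESTS` and `CHESAPEAKE_CVPR_ROOT` and the default path are hypothetical, not anything torchgeo defines.

```python
import os
import unittest

# Hypothetical flags: release-branch CI would export RUN_INTEGRATION_TESTS=1
# and point CHESAPEAKE_CVPR_ROOT at a real copy of the dataset.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION_TESTS") == "1"
DATA_ROOT = os.environ.get("CHESAPEAKE_CVPR_ROOT", "data/chesapeake_cvpr")


class TestChesapeakeCVPRIntegration(unittest.TestCase):
    @unittest.skipUnless(RUN_INTEGRATION, "integration tests only run on release branches")
    def test_real_data_present(self) -> None:
        # A real integration test would build the datamodule and trainer on
        # real data and run a short fit; this only checks the data is present,
        # which is exactly what failed for the cyclone example above.
        self.assertTrue(os.path.isdir(DATA_ROOT), f"dataset not found at {DATA_ROOT}")
```

Run with `python -m unittest`; without the variable set, the test is reported as skipped rather than failing on machines that lack the data.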

torchgeo/trainers/cyclone.py

calebrob6 · 2021-09-09T21:33:05Z

@adamjstewart:

Fixed line endings
Added some docstrings. Many of these methods override PyTorch Lightning methods, so I'm not sure what we gain by annotating them further. How is this normally handled?
Got rid of pin_memory=False, left shuffle=True/False to be explicit
I'm not sure what to do about your other comments. I broadly agree with what you are saying but don't see the benefit of spending a lot of time getting things perfect until early Oct.

adamjstewart · 2021-09-09T21:37:47Z

I'm not sure what we gain by annotating them further? How is this normally handled?

If you look at our current API docs, you'll see that many of these methods are undocumented and only mention the names of arguments, not what they mean. We should try adding autodoc_inherit_docstrings and see if this will add the superclass docstrings. I don't want to document all inherited members, but I think this will inherit docstrings without documenting non-overridden members.
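Sphinx's `autodoc_inherit_docstrings` setting (set in `conf.py`) controls whether an override without its own docstring picks up its parent's. The standard library shows the same kind of fallback through `inspect.getdoc`. In this sketch, `Base`, `Task`, and `training_step` are hypothetical stand-ins for a Lightning base class and a torchgeo trainer.

```python
import inspect


class Base:
    """Stand-in for a library base class such as a LightningModule."""

    def training_step(self, batch, batch_idx):
        """Compute and return the training loss for one batch."""
        raise NotImplementedError


class Task(Base):
    # Overrides the method without writing a new docstring.
    def training_step(self, batch, batch_idx):
        return 0.0


task = Task()
print(task.training_step.__doc__)  # prints "None": the override has no docstring
print(inspect.getdoc(task.training_step))  # falls back to Base's docstring
```

This only fills in docstrings for methods a subclass actually overrides; inherited members that are not overridden are still governed by whether autodoc is told to document them.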

I'm not sure what to do about your other comments. I broadly agree with what you are saying but don't see the benefit of spending a lot of time getting things perfect until early Oct.

Yep, that's fine. None of these suggestions are requirements that need to be done now, just pointing them out so we can think about them. As soon as we submit the paper I plan on going back and adding a lot of unit tests and fixing bugs so we can safely release.

calebrob6 added 8 commits September 4, 2021 00:51

Refactoring how the trainer modules are called from train.py and how the configuration files are structured

e57109b

Turning off pin_memory to avoid some annoying warning messages

fba6506

Changing how the ChesapeakeCVPR dataset returns samples

625ef06

Working draft of trainer

2a1382e

Now stuff trains

b71ad1d

Formatting

8cdb422

Fixing tests

69c7bc4

Adding example configuration file for the Chesapeake CVPR dataset

6307709

Some weird corner case fixes

04d7c27

adamjstewart reviewed Sep 4, 2021


Checksum false fix

58782d9

adamjstewart added the trainers PyTorch Lightning trainers label Sep 7, 2021

calebrob6 added 2 commits September 9, 2021 15:09

Some modifications to conf files

6d825f2

Changes to fix Cyclone

f66fd02

adamjstewart previously approved these changes Sep 9, 2021


calebrob6 added 2 commits September 9, 2021 20:34

Adding an empty line to conf files

9a180e6

Getting rid of pin_memory=False and proper typing of kwargs

9780216

calebrob6 dismissed adamjstewart’s stale review via 9780216 September 9, 2021 20:51

Some docstrings

5ac51ed

adamjstewart approved these changes Sep 9, 2021


calebrob6 merged commit cb3d7e1 into main Sep 9, 2021

calebrob6 deleted the feature/chesapeake_cvpr_trainer branch September 9, 2021 22:12

adamjstewart added this to the 0.1.0 milestone Nov 20, 2021

