Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul of train.py and adding Chesapeake CVPR trainer #103

Merged
merged 15 commits into from Sep 9, 2021

Conversation

calebrob6
Copy link
Member

@calebrob6 calebrob6 commented Sep 4, 2021

Some key points

  • Changed how the configuration files are structured. See conf/defaults.yaml for the gist of this. There is now experiment.module and experiment.datamodule keys that get passed as kwargs to the LightningModule and LightningDataModule for a given task. This makes everything quite flexible. Because everything is a bit more flexible I could get rid of some nastyness in train.py.
  • Fixed the tests accordingly
  • train.py saves the final config that is used for the experiment to the experiment output directory
  • Made a trainer for the Chesapeake CVPR dataset

And.... it is working well! Here is output from tensorboard showing val image, val mask, predictions:
image

Things to add in the nearish future:

  • A way to train on a concat of different state train splits
  • Augmentations

@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2021

Codecov Report

Merging #103 (58782d9) into main (04355ec) will decrease coverage by 4.33%.
The diff coverage is 27.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   83.65%   79.32%   -4.34%     
==========================================
  Files          31       32       +1     
  Lines        1927     2089     +162     
==========================================
+ Hits         1612     1657      +45     
- Misses        315      432     +117     
Impacted Files Coverage Δ
torchgeo/trainers/landcoverai.py 48.85% <ø> (ø)
torchgeo/trainers/naipchesapeake.py 29.72% <ø> (ø)
torchgeo/trainers/sen12ms.py 62.16% <ø> (ø)
torchgeo/datasets/chesapeake.py 66.47% <26.66%> (-2.52%) ⬇️
torchgeo/trainers/chesapeake.py 26.84% <26.84%> (ø)
torchgeo/trainers/__init__.py 100.00% <100.00%> (ø)
torchgeo/trainers/cyclone.py 42.35% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04355ec...58782d9. Read the comment docs.

Comment on lines +139 to +152
# Render the image, ground truth mask, and predicted mask for the first
# image in the batch
img = np.rollaxis( # convert image to channels last format
batch["image"][0].cpu().numpy(), 0, 3
)
mask = batch["mask"][0].cpu().numpy()
pred = y_hat_hard[0].cpu().numpy()
fig, axs = plt.subplots(1, 3, figsize=(12, 4))
axs[0].imshow(img[:, :, :3])
axs[0].axis("off")
axs[1].imshow(mask, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
axs[1].axis("off")
axs[2].imshow(pred, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
axs[2].axis("off")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead move the plotting logic to the ChesapeakeCVPR dataset class? That's how RasterDataset and VectorDataset are designed. Then you can just call dataset.plot(sample) and the code can be reused by anyone using the dataset in their own code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RasterDataset plot takes a single Tensor as input, while this code is plotting (image, mask, predictions). Do you imagine multiple versions of plot?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this is the first GeoDataset we have with both image and mask. We should change plot to take in a sample dict instead.

@adamjstewart adamjstewart added the trainers PyTorch Lightning trainers label Sep 7, 2021
adamjstewart
adamjstewart previously approved these changes Sep 9, 2021
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave things another review. My biggest complaint with this chunk of code is a lack of documentation and testing. I realize we're kind of busy trying to finish experiments before we can publish things, but we need to make sure that this code is actually well-tested and documented before we start making releases.

@@ -0,0 +1,22 @@
trainer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest complaint about all this OmegaConf stuff is that there doesn't seem to be any documentation on what options are supported or what possible values they can take. There's no way to get a help message without argparse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is discussed in much more detail here -- facebookresearch/hydra#633.

For now I think comments in the yaml files are OK -- we can do more here. I think there will be few scenarios in which a user is trying to configure experiments without looking at the trainer code.

conf/task_defaults/chesapeake_cvpr.yaml Outdated Show resolved Hide resolved
torchgeo/trainers/chesapeake.py Outdated Show resolved Hide resolved
)


class ChesapeakeCVPRSegmentationTask(LightningModule):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These trainer modules desperately need unit tests at some point, our overall code coverage is plummeting, see #109

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried running a trainer :)?

We ran into a very obvious problem in "tested" code this morning: I wanted to run python train.py with the cyclone dataset. This works in my environment because the dataset is where I expect and this works in tests because we have fake data where we expect it. This did not work in practice because other people don't have the dataset downloaded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment train.py isn't "tested" code. In the long run, I want to have integration tests that use real data and may take several hours to run. These will only be run on release branches, not on main/PRs. Everything is set up to do this except someone needs to actually write the tests. The unit tests will only be for coverage and catching minor issues.

torchgeo/trainers/cyclone.py Show resolved Hide resolved
torchgeo/trainers/cyclone.py Show resolved Hide resolved
torchgeo/trainers/cyclone.py Outdated Show resolved Hide resolved
@calebrob6
Copy link
Member Author

@adamjstewart:

  • Fixed line endings
  • Added some docstrings. Many of these methods are overriding pytorch lightning methods so I'm not sure what we gain by annotating them further? How is this normally handled?
  • Got rid of pin_memory=False, left shuffle=True/False to be explicit
  • I'm not sure what to do about your other comments. I broadly agree with what you are saying but don't see the benefit of spending a lot of time getting things perfect until early Oct.

@adamjstewart
Copy link
Collaborator

I'm not sure what we gain by annotating them further? How is this normally handled?

If you look at our current API docs, you'll see that many of these methods are undocumented and only mention the names of arguments, not what they mean. We should try adding autodoc_inherit_docstrings and see if this will add the superclass docstrings. I don't want to document all inherited members, but I think this will inherit docstrings without documenting non-overridden members.

I'm not sure what to do about your other comments. I broadly agree with what you are saying but don't see the benefit of spending a lot of time getting things perfect until early Oct.

Yep, that's fine. None of these suggestions are requirements that need to be done now, just pointing them out so we can think about them. As soon as we submit the paper I plan on going back and adding a lot of unit tests and fixing bugs so we can safely release.

@calebrob6 calebrob6 merged commit cb3d7e1 into main Sep 9, 2021
@calebrob6 calebrob6 deleted the feature/chesapeake_cvpr_trainer branch September 9, 2021 22:12
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
trainers PyTorch Lightning trainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants