Add examples for detection models finetuning #30422
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Force-pushed from 8d2dca0 to 824b883
@amyeroberts could you please review? Further improvements of the training pipeline can be done in a next PR.
Looks great! Thanks for all the work adding this and the additional PRs to make these models work well in our library ❤️
Just a few small comments - main one about not using the image processor to process the images for speed considerations. All comments for the trainer script obviously apply to the non-trainer one.
```python
import numpy as np


def augment_and_transform_batch(examples, transform, image_processor):
    images = []
    annotations = []
    for image_id, image, objects in zip(examples["image_id"], examples["image"], examples["objects"]):
        image = np.array(image.convert("RGB"))

        # apply augmentations
        output = transform(image=image, bboxes=objects["bbox"], category=objects["category"])
        images.append(output["image"])

        # format annotations in COCO format
        formatted_annotations = format_image_annotations_as_coco(
            image_id, output["category"], objects["area"], output["bboxes"]
        )
        annotations.append(formatted_annotations)

    # Apply the image processor transformations: resizing, rescaling, normalization
    result = image_processor(images=images, annotations=annotations, return_tensors="pt")

    return result
```
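For reference, here is a sketch of how such a batch transform is typically wired up with 🤗 Datasets (`train_augmentations` and the dataset split names are assumptions for illustration, not taken from this PR):

```python
from functools import partial

# Bind the augmentation pipeline and image processor, then let `datasets`
# apply the function lazily to every batch that is accessed.
train_transform_batch = partial(
    augment_and_transform_batch, transform=train_augmentations, image_processor=image_processor
)
dataset["train"] = dataset["train"].with_transform(train_transform_batch)
```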
I'd recommend not using the image processor at all. They're very slow, in part because the resizing is done in Pillow (for historical reasons). This means every image is converted numpy -> PIL.Image.Image -> numpy. Instead, I'd do all of the transformations within the library of choice (albumentations, torchvision, etc.) and use the image processor only for values like `size`, if needed.
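A hedged illustration of that suggestion: the checkpoint's preprocessing values can be read off the processor and reused inside an Albumentations pipeline, so images stay as numpy arrays throughout (`image_processor` is assumed to be a DETR-style processor loaded elsewhere):

```python
import albumentations as A

# Reuse the checkpoint's normalization constants outside the image processor,
# avoiding the numpy -> PIL -> numpy round-trip entirely.
transform = A.Compose(
    [A.Normalize(mean=image_processor.image_mean, std=image_processor.image_std)],
    bbox_params=A.BboxParams(format="coco", label_fields=["category"]),
)
```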
I suggest leaving `image_processor` for correct input formatting and turning off padding and resizing:
```python
image_processor = AutoImageProcessor.from_pretrained(
    model_args.image_processor_name or model_args.model_name_or_path,
    # At this moment we recommend using an external transform to pad and resize images.
    # It's faster and yields much better results for object-detection models.
    do_pad=False,
    do_resize=False,
    # We will save the image size parameter in the config just for reference
    size={"longest_edge": data_args.image_square_size},
    **common_pretrained_args,
)
```
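With `do_pad` and `do_resize` turned off, the processor still handles rescaling, normalization, and converting the COCO annotations into the label format the model expects, which is what "correct input formatting" refers to above.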
For padding and resizing I suggest the following strategy:

- Resize the longest side of the image to `image_square_size`
- Pad the image to `image_square_size x image_square_size`

This strategy yields much better results in terms of mAP and also almost removes the batch dependency for evaluation.
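One way to express this strategy with Albumentations (a sketch; the exact composition and padding position used in the example scripts may differ):

```python
import albumentations as A
import cv2

image_square_size = 600  # placeholder value for data_args.image_square_size

resize_and_pad = A.Compose(
    [
        # Resize the longest side to `image_square_size`, preserving aspect ratio
        A.LongestMaxSize(max_size=image_square_size),
        # Pad the result to a fixed `image_square_size x image_square_size` canvas
        A.PadIfNeeded(
            min_height=image_square_size,
            min_width=image_square_size,
            border_mode=cv2.BORDER_CONSTANT,
            value=0,
        ),
    ],
    bbox_params=A.BboxParams(format="coco", label_fields=["category"]),
)
```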
Here are two models trained with both strategies and evaluated with batch sizes 8 and 1:

Deformable DETR fine-tuned with such padding achieves mAP@0.5..0.95 = 0.5414, while on Papers with Code the top model, TridentNet, achieves a lower mAP@0.5..0.95 = 0.529 on the CPPE-5 dataset (this leaderboard is probably outdated, but it can still be used as a reference).
Please let me know what you think about this. Is it worth adding this strategy to the image processors too?
Thanks for the detailed explanation and runs. OK, sounds good to me!
Regarding adding this to the image processors, it's a bit tricky as we need to account for backwards compatibility: even though this produces better results, DETR is a commonly used model and we shouldn't change the default behaviour. One option would be to add a flag to the image processor, which allows the user to pick the padding strategy, falling back to the current one by default
Ok, got it. I will update the accelerate example and make sure the tests pass.
Regarding image processors, I understand the backward compatibility issues; here are some options we could implement:

- Add a `preserve_image_ratio` flag (default `False`). In combination with `size={"height": ..., "width": ...}` it can be used to implement the suggested strategy: an image is resized to respect the height/width depending on its longest side, and then padded to the specified size. It is flexible, so we can use it even for non-square sizes. But I don't like that it is not evident, and `preserve_image_ratio=False` may be confusing for the `size={"longest_edge": ..., "shortest_edge": ...}` option, because that size option preserves the image ratio by default.
- As you suggested, add a `pad_strategy` flag (default `"batch"`). For `"batch"` it will follow the current behaviour; for `pad_strategy="size"` it will pad to `size={"height": ..., "width": ...}`. The problem with `size={"longest_edge": ..., "shortest_edge": ...}` is that we do not know which one is the height and which is the width, but for `pad_strategy="size"` we can raise an error asking for a size dict with explicit height and width.

Both options will require changing the current resize and pad logic, but the second one seems better to me; let me know if you have any thoughts on that. A rough sketch of the second option is below.
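To make the idea concrete, here is what the padding branch might look like inside an image processor (`pad_strategy` and the `pad_to` helper are hypothetical names for illustration, not an existing transformers API):

```python
def pad(self, images, pad_strategy="batch", size=None):
    if pad_strategy == "batch":
        # Current behaviour: pad every image to the largest height/width in the batch
        target_height = max(image.shape[0] for image in images)
        target_width = max(image.shape[1] for image in images)
    elif pad_strategy == "size":
        # Proposed behaviour: pad to a fixed size, independent of batch composition
        if size is None or "height" not in size or "width" not in size:
            raise ValueError("pad_strategy='size' requires a size dict with 'height' and 'width'")
        target_height, target_width = size["height"], size["width"]
    else:
        raise ValueError(f"Unknown pad_strategy: {pad_strategy}")
    # `pad_to` is a hypothetical helper that zero-pads an HWC array to the target shape
    return [pad_to(image, target_height, target_width) for image in images]
```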
Agreed - let's go with the second option 🤝
Something is wrong with
@amyeroberts the comments are addressed and the tests pass. Can you please approve if it is OK now?
Thanks for adding this and iterating! Looks great 🤗
It would be good to have the updated image processing logic included, such that we bypass the expensive resizing in the image processors, but happy to merge as-is and include in a follow-up.
I suspect we might have to change the `image_square_size` argument to something more flexible, to account for models which accept non-square inputs. Let's leave it for now and cross that bridge when we come to it.
@qubvel BTW I'm seeing these fail during multi-GPU tests. Here's the trace from our nightly:

```
FAILED examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_object_detection - IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
output = module(*input, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 1611, in forward
loss_dict = criterion(outputs_loss, labels)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2210, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in forward
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in <listcomp>
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
IndexError: index 2 is out of bounds for dimension 0 with size 2
```
@muellerzr thanks for letting me know, I will try to figure out why that happens.
What does this PR do?
Add examples showing how to fine-tune DETR, DETA, Deformable DETR, Conditional DETR, and YOLOS with the Trainer and with Accelerate.
Introduced evaluation in the Trainer API for detection models. It is now possible to train models with ongoing evaluation and metrics tracking (fixed in #30267); this unblocks selecting the best checkpoint based on a metric instead of the loss (see the sketch below).
Simplified the metrics computation pipeline, reducing bounding box and output format conversions.
Finetuned models can be found here.
W&B report can be found here.
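As a rough illustration of what the new evaluation flow enables (a sketch with placeholder values, assuming a `compute_metrics` built on torchmetrics' `MeanAveragePrecision` that reports an `eval_map` metric):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="detr-finetuned-cppe5",   # placeholder
    eval_strategy="epoch",               # evaluate with metrics during training
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_map",    # pick the checkpoint by mAP, not loss
    greater_is_better=True,
)
```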
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@NielsRogge