Add examples for detection models finetuning #30422
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Force-pushed from 8d2dca0 to 824b883
@amyeroberts could you please review? Further improvements of the training pipeline can be done in a next PR.
Looks great! Thanks for all the work adding this and the additional PRs to make these models work well in our library ❤️
Just a few small comments - main one about not using the image processor to process the images for speed considerations. All comments for the trainer script obviously apply to the non-trainer one.
```python
import numpy as np


def augment_and_transform_batch(examples, transform, image_processor):
    images = []
    annotations = []
    for image_id, image, objects in zip(examples["image_id"], examples["image"], examples["objects"]):
        image = np.array(image.convert("RGB"))

        # apply augmentations
        output = transform(image=image, bboxes=objects["bbox"], category=objects["category"])
        images.append(output["image"])

        # format annotations in COCO format
        formatted_annotations = format_image_annotations_as_coco(
            image_id, output["category"], objects["area"], output["bboxes"]
        )
        annotations.append(formatted_annotations)

    # Apply the image processor transformations: resizing, rescaling, normalization
    result = image_processor(images=images, annotations=annotations, return_tensors="pt")

    return result
```
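For reference, here is a sketch of how such a batch transform is typically wired up with 🤗 Datasets (`train_augmentations` and the dataset split names are assumptions for illustration, not taken from this PR):

```python
from functools import partial

# Bind the augmentation pipeline and image processor, then let `datasets`
# apply the function lazily to every batch that is accessed.
train_transform_batch = partial(
    augment_and_transform_batch, transform=train_augmentations, image_processor=image_processor
)
dataset["train"] = dataset["train"].with_transform(train_transform_batch)
```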
I'd recommend not using the image processor at all. They're very slow, in part because the resizing is done in Pillow (for historical reasons). This means every image is converted numpy -> PIL.Image.Image -> numpy. Instead, I'd do all of the transformations within the library of choice (albumentations, torchvision, etc.) and use the image processor only for values like `size`, if needed.
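A hedged illustration of that suggestion: the checkpoint's preprocessing values can be read off the processor and reused inside an Albumentations pipeline, so images stay as numpy arrays throughout (`image_processor` is assumed to be a DETR-style processor loaded elsewhere):

```python
import albumentations as A

# Reuse the checkpoint's normalization constants outside the image processor,
# avoiding the numpy -> PIL -> numpy round-trip entirely.
transform = A.Compose(
    [A.Normalize(mean=image_processor.image_mean, std=image_processor.image_std)],
    bbox_params=A.BboxParams(format="coco", label_fields=["category"]),
)
```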
I suggest leaving `image_processor` for correct input formatting and turning off padding and resizing:
```python
image_processor = AutoImageProcessor.from_pretrained(
    model_args.image_processor_name or model_args.model_name_or_path,
    # At this moment we recommend using an external transform to pad and resize images.
    # It's faster and yields much better results for object-detection models.
    do_pad=False,
    do_resize=False,
    # We will save the image size parameter in the config just for reference
    size={"longest_edge": data_args.image_square_size},
    **common_pretrained_args,
)
```
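With `do_pad` and `do_resize` turned off, the processor still handles rescaling, normalization, and converting the COCO annotations into the label format the model expects, which is what "correct input formatting" refers to above.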
For padding and resizing I suggest the following strategy:

- Resize the longest side of the image to `image_square_size`
- Pad the image to `image_square_size x image_square_size`

This strategy yields much better results in terms of mAP and also almost removes the batch dependency for evaluation.
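One way to express this strategy with Albumentations (a sketch; the exact composition and padding position used in the example scripts may differ):

```python
import albumentations as A
import cv2

image_square_size = 600  # placeholder value for data_args.image_square_size

resize_and_pad = A.Compose(
    [
        # Resize the longest side to `image_square_size`, preserving aspect ratio
        A.LongestMaxSize(max_size=image_square_size),
        # Pad the result to a fixed `image_square_size x image_square_size` canvas
        A.PadIfNeeded(
            min_height=image_square_size,
            min_width=image_square_size,
            border_mode=cv2.BORDER_CONSTANT,
            value=0,
        ),
    ],
    bbox_params=A.BboxParams(format="coco", label_fields=["category"]),
)
```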
Here are two models trained with both strategies and evaluated with batch sizes 8 and 1:

Deformable DETR fine-tuned with such padding achieves mAP@0.5..0.95 = 0.5414, while on Papers with Code the top model, TridentNet, achieves a lower mAP@0.5..0.95 = 0.529 on the CPPE-5 dataset (this leaderboard is probably outdated, but it can still be used as a reference).
Please let me know what you think about this. Is it worth adding this strategy to the image processors too?
Thanks for the detailed explanation and runs. OK, sounds good to me!
Regarding adding this to the image processors, it's a bit tricky as we need to account for backwards compatibility: even though this produces better results, DETR is a commonly used model and we shouldn't change the default behaviour. One option would be to add a flag to the image processor, which allows the user to pick the padding strategy, falling back to the current one by default
Ok, got it. I will update the accelerate example and make sure the tests pass.
Regarding image processors, I understand the backward compatibility issues; here are some options we could implement:

- Add a `preserve_image_ratio` flag (default `False`). In combination with `size={"height": ..., "width": ...}` it can be used to implement the suggested strategy: an image is resized to respect the height/width depending on its longest side, and then padded to the specified size. It is flexible, so we can use it even for non-square sizes. But I don't like that it is not evident, and `preserve_image_ratio=False` may be confusing for the `size={"longest_edge": ..., "shortest_edge": ...}` option, because that size option preserves the image ratio by default.
- As you suggested, add a `pad_strategy` flag (default `"batch"`). For `"batch"` it will follow the current behaviour; for `pad_strategy="size"` it will pad to `size={"height": ..., "width": ...}`. The problem with `size={"longest_edge": ..., "shortest_edge": ...}` is that we do not know which one is the height and which is the width, but for `pad_strategy="size"` we can raise an error asking for a size dict with explicit height and width.

Both options will require changing the current resize and pad logic, but the second one seems better to me; let me know if you have any thoughts on that. A rough sketch of the second option is below.
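To make the idea concrete, here is what the padding branch might look like inside an image processor (`pad_strategy` and the `pad_to` helper are hypothetical names for illustration, not an existing transformers API):

```python
def pad(self, images, pad_strategy="batch", size=None):
    if pad_strategy == "batch":
        # Current behaviour: pad every image to the largest height/width in the batch
        target_height = max(image.shape[0] for image in images)
        target_width = max(image.shape[1] for image in images)
    elif pad_strategy == "size":
        # Proposed behaviour: pad to a fixed size, independent of batch composition
        if size is None or "height" not in size or "width" not in size:
            raise ValueError("pad_strategy='size' requires a size dict with 'height' and 'width'")
        target_height, target_width = size["height"], size["width"]
    else:
        raise ValueError(f"Unknown pad_strategy: {pad_strategy}")
    # `pad_to` is a hypothetical helper that zero-pads an HWC array to the target shape
    return [pad_to(image, target_height, target_width) for image in images]
```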
Agreed - let's go with the second option 🤝
Something is wrong with
@amyeroberts the comments are addressed and the tests pass. Can you please approve if it is OK now?
Thanks for adding this and iterating! Looks great 🤗
It would be good to have the updated image processing logic included, such that we bypass the expensive resizing in the image processors, but happy to merge as-is and include in a follow-up.
I suspect we might have to change the `image_square_size` argument to something more flexible, to account for models which accept non-square inputs. Let's leave it for now and cross that bridge when we come to it.
@qubvel BTW I'm seeing these fail during multi-GPU tests. Here's the trace from our nightly:

```
FAILED examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_object_detection - IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
output = module(*input, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 1611, in forward
loss_dict = criterion(outputs_loss, labels)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2210, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in forward
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in <listcomp>
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
IndexError: index 2 is out of bounds for dimension 0 with size 2
```
@muellerzr thanks for letting me know, I will try to figure out why that happens.
What does this PR do?
Add examples showing how to fine-tune DETR, DETA, Deformable DETR, Conditional DETR, and YOLOS with the Trainer and with Accelerate.
Introduced evaluation in the Trainer API for detection models. It is now possible to train models with ongoing evaluation and metrics tracking (fixed in #30267); this unblocks selecting the best checkpoint based on a metric instead of the loss (see the sketch below).
Simplified the metrics computation pipeline, reducing bounding box and output format conversions.
Finetuned models can be found here.
W&B report can be found here.
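As a rough illustration of what the new evaluation flow enables (a sketch with placeholder values, assuming a `compute_metrics` built on torchmetrics' `MeanAveragePrecision` that reports an `eval_map` metric):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="detr-finetuned-cppe5",   # placeholder
    eval_strategy="epoch",               # evaluate with metrics during training
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_map",    # pick the checkpoint by mAP, not loss
    greater_is_better=True,
)
```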
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@NielsRogge