
[SegGPT] Fix seggpt image processor #29550

Merged (12 commits) on Apr 26, 2024

Conversation

@EduardoPach (Contributor):

What does this PR do?

Fixes #29549 and adds appropriate tests to prevent this issue from recurring in the future.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

c.c. @amyeroberts


@EduardoPach (Contributor Author):

The fix is probably not ideal (the current try..except approach), but I just wanted to start the conversation with a working solution.

@amyeroberts (Collaborator) left a comment:

Thanks for working on this and adding tests ❤️ !

As you suspected, try / except isn't the way to go, but it should be simple to handle, as there's already conditional mask logic in the image processor.

Comment on lines 351 to 354
try:
    images = make_list_of_images(images)
except ValueError:
    images = make_list_of_images(images, expected_ndims=2)
amyeroberts (Collaborator):

No, we shouldn't use try / except here. In fact, I don't think this call is robust if the input image has two dimensions, as that's not currently compatible with many of the image transformations.

It would be better to have two separate methods, _preprocess_image and _preprocess_mask, each of which calls _preprocess. _preprocess_mask can then use mask_to_rgb before calling _preprocess, and we can remove this step entirely when we process images.

EduardoPach (Contributor Author):

mask_to_rgb currently expects a list of numpy arrays, and this conversion happens in the _preprocess_step method with make_list_of_images and to_numpy_array. The problem is that we currently accept/expect both 2D (semantic maps) and 3D (already converted to RGB with some logic) images for the mask prompt. Maybe an easy fix would be to allow passing the expected dimension for the mask, something like prompt_mask_dimension, and then pass that to make_list_of_images. Wdyt?

amyeroberts (Collaborator):

mask_to_rgb currently expects a list of numpy arrays

I don't think this is true: mask_to_rgb expects a single image.

Maybe an easy fix would be to allow the input of the expected dimension for the mask something like prompt_mask_dimension and then pass that to make_list_of_images, wdyt?

This isn't ideal. It would be better for mask_to_rgb to accept masks both with and without channels and convert to RGB as necessary in both cases.
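A sketch of what such a dual-mode mask_to_rgb could look like. The `palette` argument and the pass-through behaviour here are assumptions for illustration, not the real transformers signature.

```python
import numpy as np

def mask_to_rgb(mask: np.ndarray, palette=None) -> np.ndarray:
    """Accept a (H, W) class-index mask or an already-RGB (H, W, 3) mask
    and return an (H, W, 3) RGB array. (Illustrative sketch.)"""
    if mask.ndim == 3 and mask.shape[-1] == 3:
        return mask  # already RGB: pass through unchanged
    if mask.ndim != 2:
        raise ValueError(f"Unsupported mask shape: {mask.shape}")
    if palette is not None:
        # Map each class index to its RGB colour via fancy indexing.
        return np.asarray(palette)[mask]
    # No palette: duplicate the single channel across RGB.
    return np.stack([mask] * 3, axis=-1)
```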

EduardoPach (Contributor Author):

Oops, my bad: mask_to_rgb expects a single image, you're right.

But, still, the current problem lies in make_list_of_images. Even if we split _preprocess into _preprocess_image and _preprocess_mask, this problem would still be there, since make_list_of_images takes an ImageInput as an argument, which can be PIL, numpy, etc. And even if we had this parsing logic somewhere else, it would still be ambiguous whether a 3D mask is a batch of channel-less (2D) masks or a single RGB mask (for inputs other than PIL).

amyeroberts (Collaborator):

OK, I see, so you're saying that the difficulty comes from the fact that the passed-in masks can be (B, H, W) or (B, C, H, W)? The current approach wouldn't fix this, as it would still make a batch of (B, H, W) masks a list of a [batch] instead of [mask_0, mask_1, mask_2, ...]

I would just add a custom make_list_of_images for this model, which takes the images and masks and makes a list of the masks based on the batch size of the input images.

EduardoPach (Contributor Author):

@amyeroberts Yeah, but there's another issue: if the image is 3D, it can be either a (B, H, W) batched semantic map or a (C, H, W) RGB image. This is only a problem when we have ChannelDimension.FIRST and B=3.

Therefore, even creating a custom make_list_of_images (or make_list_of_masks, since the problem is in the masks), we'll have ambiguity in the case where image.ndim == 3.

This is the portion of make_list_of_images that raises problems:

if is_valid_image(images):
    if images.ndim == expected_ndims + 1:
        # Batch of images
        images = list(images)
    elif images.ndim == expected_ndims:
        # Single image
        images = [images]
    else:
        raise ValueError(
            f"Invalid image shape. Expected either {expected_ndims + 1} or {expected_ndims} dimensions, but got"
            f" {images.ndim} dimensions."
        )
    return images

At this point the images are either an np.ndarray or a torch.Tensor in the context of SegGpt.

I've created a new function called make_list_of_masks in 7ea7a41 to deal with that, but I had to make the assumption that if images is 3D and the first or last dim is 3, it's a single RGB image.
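A simplified, self-contained sketch of that heuristic (not the actual code from 7ea7a41) makes the ambiguity concrete:

```python
import numpy as np

def make_list_of_masks(masks: np.ndarray) -> list:
    """Sketch of the heuristic: treat a 3D array as a single RGB mask
    if its first or last dim is 3, else as a batch of 2D masks."""
    if masks.ndim == 2:
        return [masks]                     # single (H, W) semantic map
    if masks.ndim == 3:
        if masks.shape[0] == 3 or masks.shape[-1] == 3:
            return [masks]                 # assumed: single RGB mask
        return list(masks)                 # batch of (H, W) maps
    if masks.ndim == 4:
        return list(masks)                 # batch of RGB masks
    raise ValueError(f"Unsupported mask shape: {masks.shape}")
```

The weak spot is exactly the batch_size = 3 case raised in the review: a batch of 3 grayscale maps has shape (3, H, W) and is indistinguishable from a single channels-first RGB mask.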

@EduardoPach (Contributor Author):

c.c. @NielsRogge

@NielsRogge (Contributor) left a comment:

Thanks for improving this!

@amyeroberts (Collaborator) left a comment:

Very nice - thanks for working on this!

My only comment is that this would still fail for batch_size = 3.

Comment on lines 84 to 85
# Assuming that if first or last dimension is 3 (Channel first or last) it is RGB
images = [images] if (images.shape[0] == 3 or images.shape[-1] == 3) else list(images)
amyeroberts (Collaborator):

This won't work if 3 grayscale maps are batched together.

One option (not great) is getting the batch size, or image size, from the images and having it as an optional argument that can be passed in. We can use 3 as a default.
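A sketch of that suggestion, with `batch_size` as the hypothetical optional argument (names and defaults assumed, not the merged code):

```python
import numpy as np

def make_list_of_masks(masks: np.ndarray, batch_size=None) -> list:
    """Disambiguate a 3D mask array using the known batch size of the
    accompanying images. (Illustrative sketch of the suggestion.)"""
    if masks.ndim == 2:
        return [masks]              # single (H, W) semantic map
    if masks.ndim == 3:
        if batch_size is not None and masks.shape[0] == batch_size:
            return list(masks)      # batch of (H, W) grayscale maps
        return [masks]              # otherwise assume a single RGB mask
    return list(masks)              # 4D: batch of RGB masks
```

Note the corner case remains: a single channels-first RGB mask alongside batch_size == 3 images is still ambiguous, which is why this option is called "not great".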

EduardoPach (Contributor Author):

Indeed not ideal, so I made a new proposal in f372a40

Instead of having to deal with that, I added a new optional argument called segmentation_maps that follows other image processors' conventions. So, to obtain the prompt_masks in the BatchFeature, one should specify either prompt_masks or segmentation_maps, where segmentation_maps is the 2-dimensional input that we convert to RGB when num_labels is specified.

@EduardoPach EduardoPach requested a review from amyeroberts April 19, 2024 20:38
@EduardoPach EduardoPach changed the title Fix seggpt image processor [SegGPT] Fix seggpt image processor Apr 23, 2024
Comment on lines 29 to 30
- When using binary masks or segmentation maps as prompts, pass them to the [`SegGptImageProcessor`] `segmentation_maps` argument; if you have already converted your prompt mask to RGB, pass it to the `prompt_masks` argument.
- It's highly advisable to pass `num_labels` (not counting the background) when using `segmentation_maps` during preprocessing and postprocessing with [`SegGptImageProcessor`] for your use case.
amyeroberts (Collaborator):

No, let's not do this. Having two different inputs like this is both error-prone and an antipattern. The image processor can selectively process based on the do_xxx flags. For example, if the input image was already resized, we wouldn't have a different input such as resized_images; we would still pass in images and set do_resize=False. In this case, it looks like what we want is do_convert_rgb.
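A minimal sketch of the flag-based pattern being described; the function name and the channel-tiling fallback are illustrative assumptions, not the actual SegGptImageProcessor code:

```python
import numpy as np

def preprocess_masks(prompt_masks, do_convert_rgb=True):
    """One input, with a do_convert_rgb flag controlling whether the
    RGB conversion step runs. (Sketch of the do_xxx flag pattern.)"""
    masks = [np.asarray(m) for m in prompt_masks]
    if do_convert_rgb:
        # Convert 2D class-index maps to 3-channel RGB. A real palette
        # lookup would go here; we just tile the channel for the sketch.
        masks = [np.stack([m] * 3, axis=-1) for m in masks]
    return masks
```

Callers who already hold RGB masks simply pass do_convert_rgb=False, mirroring how do_resize=False works for pre-resized images.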

EduardoPach (Contributor Author):

That makes perfect sense! I introduced the do_convert_rgb flag and updated the tests accordingly in 2185b3e

@amyeroberts (Collaborator) left a comment:

Looking good - thanks for iterating on this!

Just a few places left to tidy up

Comment on lines -206 to -210
input_data_format (`ChannelDimension` or `str`, *optional*):
The channel dimension format for the input image. If unset, the channel dimension format is inferred
from the input image. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
amyeroberts (Collaborator):

All image processor methods should accept this as an input argument

EduardoPach (Contributor Author):

This method is specific to 2D (no-channel) masks, so I see no reason for it to have an input_data_format. Should I add it to keep the same signature?

amyeroberts (Collaborator):

OK, in this case let's leave it as-is.

Comment on lines 450 to 452
built, assuming that class_idx 0 is the background, to map the prompt mask from a single class_idx
channel to a 3 channel RGB. Not specifying this will result in the prompt mask either being passed
through as is if it is already in RGB format or being duplicated across the channel dimension.
amyeroberts (Collaborator):

Is the RGB info here still correct?

EduardoPach (Contributor Author):

It was, but I updated it nonetheless in 017bd48

@amyeroberts (Collaborator) left a comment:

Thanks for working on this, fixing the code to make the user experience better, and adding tests ❤️

@amyeroberts amyeroberts merged commit 6d4cabd into huggingface:main Apr 26, 2024
18 checks passed
Successfully merging this pull request may close these issues.

SegGptImageProcessor broken for 2D prompt masks that are not PIL images
3 participants