-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SegGPT] Fix seggpt image processor #29550
[SegGPT] Fix seggpt image processor #29550
Conversation
The fix is probably not ideal (the current way with try..except), but just wanted to start the conversation with a working solution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this and adding tests ❤️ !
As you suspected, try / except isn't the way to go, but it should be simple to handle as there's already conditional mask logic in the image processor
try: | ||
images = make_list_of_images(images) | ||
except ValueError: | ||
images = make_list_of_images(images, expected_ndims=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - we shouldn't use try / expect here. In fact, I don't think this call is robust if the input image has two dimensions as that's not currently compatible with many of the image transformations.
It would be better to have two separate methods, _preprocess_image
and _preprocess_mask
, each of which call _preprocess
. _preprocess_mask
can then use mask_to_rgb
, before calling _preprocess
and we can remove this step entirely for when we process images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mask_to_rgb
currently expects a list of numpy arrays and this conversion happens on the _preprocess_step
method with make_list_of_images
and to_numpy_array
. The problem is that we are currently accepting/expecting 2D (semantic maps) and 3D (if already converted to RGB with some logic) images for mask prompt. Maybe an easy fix would be to allow the input of the expected dimension for the mask something like prompt_mask_dimension
and then pass that to make_list_of_images
, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mask_to_rgb currently expects a list of numpy arrays
I don't think this is true - mask_to_rgb
expects a single image
Maybe an easy fix would be to allow the input of the expected dimension for the mask something like prompt_mask_dimension and then pass that to make_list_of_images, wdyt?
This isn't ideal. It would be better for mask_to_rgb
accept masks with and without channels and convert as necessary to RGB for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops my bad, mask_to_rgb
expects a single image you're right.
But, still, the current problem lies on the make_list_of_images
even if we split _preprocess
to _preprocess_image
and _preprocess_mask
this problem would still be there and since make_list_of_images
takes an ImageInput
as arg which can be either PIL
, numpy
, ... and even if we had this logic to parse the input somewhere else it would still be ambiguous if a 3D mask is a batch of zero channel images or a single RGB mask (for inputs other than PIL
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see, so you're saying that the difficulty comes from the face the passed in masks can be (B, H, W)
or (B, C, H, W)
? The current approach wouldn't fix this, as it would still make a batch of (B, H, W)
masks a list of a [batch]
instead of [mask_0, mask_1, mask_2, ... ]
I would just add a custom make_list_of_images
for this model, which tasks the images and masks and makes a list of the masks based on the batch size of the input images
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amyeroberts Yeah, but there's another issue to that in the sense that if the image is 3D it can either be (B, H, W)
batched semantic map or (C, H, W)
RGB image. This will only be a problem if we have ChannelDimension.First
and B=3
Therefore, even creating a custom make_list_of_images
(or make_list_of_masks
as the problem is in the masks) we'll have ambiguity in the case where image.ndim==3
.
This is the portion of make_list_of_images
that raises problems:
if is_valid_image(images):
if images.ndim == expected_ndims + 1:
# Batch of images
images = list(images)
elif images.ndim == expected_ndims:
# Single image
images = [images]
else:
raise ValueError(
f"Invalid image shape. Expected either {expected_ndims + 1} or {expected_ndims} dimensions, but got"
f" {images.ndim} dimensions."
)
return images
At this point the images are either np.ndarray
or a torch.tensor
in the context of SegGpt
.
I've created a new function called make_list_of_masks
here 7ea7a41 and deal with that, but had to make an assumption that if images
is 3D then if the last of first dim is 3 it's a single RGB image.
…ocessor
c.c. @NielsRogge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice - thanks for working on this!
Only comment is this would still fail for batch_size = 3.
# Assuming that if first or last dimension is 3 (Channel first or last) it is RGB | ||
images = [images] if (images.shape[0] == 3 or images.shape[-1] == 3) else list(images) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if 3 grayscale maps are batched together.
One option (not great) is getting the batch size, or image size, from the images and having this as an optional argument that can be passed in. We can use 3 as a default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed not ideal, therefore I made a new proposal here f372a40
Instead of having to deal with that, I added a new optional argument called segmentation_maps
that follows other image processor's conventions. So, to obtain the prompt_masks
in the BatchedFeature
one should specify prompt_masks
or segmentation_maps
where segmentation_maps
should be the 2-dimensional input that we convert to RGB when num_labels
is specified.
…ocessor
docs/source/en/model_doc/seggpt.md
Outdated
- When using binary masks or segmentation maps as prompts pass it to [`SegGptImageProcessor`] `segmentation_maps` argument, if you already converted your prompt mask to RGB then pass it to `prompt_masks` argument. | ||
- It's highly advisable to pass `num_labels` when using `segmetantion_maps` (not considering background) during preprocessing and postprocessing with [`SegGptImageProcessor`] for your use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, let's not do this. Having two different inputs like this is both error prone and an antipattern. The image processor can selectively process based on the do_xxx
flags e.g. if the input image was already resized, we wouldn't have a different input e.g. resized_images
we would still pass in images
and set do_resize=False
. In this case, it looks like what we want is do_convert_rgb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes perfect sense! Introduced the do_convert_rgb
flag and updated the tests accordingly here 2185b3e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good - thanks for iterating on this!
Just a few places left to tidy up
input_data_format (`ChannelDimension` or `str`, *optional*): | ||
The channel dimension format for the input image. If unset, the channel dimension format is inferred | ||
from the input image. Can be one of: | ||
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | ||
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All image processor methods should accept this as an input argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is specific for 2D
(no channels) masks, thus I see no reason for it to have an input_data_format
should I add it to keep the same signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - in this case let's leave as-is
built, assuming that class_idx 0 is the background, to map the prompt mask from a single class_idx | ||
channel to a 3 channel RGB. Not specifying this will result in the prompt mask either being passed | ||
through as is if it is already in RGB format or being duplicated across the channel dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the RGB info here still correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was, but updated nonetheless here 017bd48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, fixing the code to make the users experience better and adding tests ❤️
What does this PR do?
Fixes #29549 and add appropriate tests to avoid this issue to repeat itself in the future
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
c.c. @amyeroberts