
Bug with train class method for MobileViTForSemanticSegmentation #30676

Open · 2 of 4 tasks
travisddavies opened this issue May 6, 2024 · 2 comments

travisddavies commented May 6, 2024

System Info

  • transformers version: 4.38.2
  • Platform: Linux-6.8.7-arch1-1-x86_64-with-glibc2.39
  • Python version: 3.11.9
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@amyeroberts @pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import cv2
from transformers import MobileViTImageProcessor, MobileViTForSemanticSegmentation

filepath = "drive/MyDrive/T55HDV_20190910T001109_20190910T012858_S2B.jpg"
image = cv2.imread(filepath)
# Take one channel of the mask and binarize it to {0, 1}
mask = cv2.imread('drive/MyDrive/mask.jpg')[:, :, 0]
mask = torch.tensor(mask)
mask = (mask > 0).float()
print(image.shape, mask.shape)

image_processor = MobileViTImageProcessor(do_reduce_labels=False)

id2label = {
    0: "background",
    1: "object",
}

label2id = {v: k for k, v in id2label.items()}

model = MobileViTForSemanticSegmentation.from_pretrained(
    "apple/mobilevit-small",
    num_labels=2,
    id2label=id2label,
    label2id=label2id)

inputs = image_processor(images=image, segmentation_maps=mask, return_tensors="pt")

model.train()
outputs = model(pixel_values=inputs["pixel_values"], labels=inputs['labels'])

Expected behavior

outputs should contain both loss and logits.

qubvel (Member) commented May 6, 2024

Hi @travisddavies, can you please specify what is going wrong, or provide an error traceback?

I was able to run the following code:

import numpy as np
from transformers import MobileViTImageProcessor, MobileViTForSemanticSegmentation

image = np.ones((512, 512, 3), dtype=np.uint8)
mask = np.ones((512, 512), dtype=np.uint8)

image_processor = MobileViTImageProcessor(do_reduce_labels=False)

id2label = {
    0: "background",
    1: "object",
}

label2id = {v: k for k, v in id2label.items()}

model = MobileViTForSemanticSegmentation.from_pretrained(
    "apple/mobilevit-small",
    num_labels=2,
    id2label=id2label,
    label2id=label2id)

# In training mode, batch norm needs batch size > 1, so duplicate the image and mask
inputs = image_processor(images=[image, image], segmentation_maps=[mask, mask], return_tensors="pt")

model.train()
outputs = model(pixel_values=inputs["pixel_values"], labels=inputs['labels'])

print(outputs.keys())
# >>> odict_keys(['loss', 'logits'])
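This constraint comes from PyTorch's batch norm rather than from transformers itself: in training mode, BatchNorm needs more than one value per channel to compute batch statistics, and the globally pooled ASPP branch of the segmentation head (the conv_1x1 layers visible in the warning below) reduces each channel to a single value per sample. A minimal sketch of the underlying behavior, using plain torch.nn rather than MobileViT:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
bn.train()

# After global average pooling each channel holds a single value per sample;
# batch statistics cannot be computed from one value, so training mode fails.
pooled = torch.randn(1, 256, 1, 1)
try:
    bn(pooled)
except ValueError as err:
    print(err)  # Expected more than 1 value per channel when training, ...

# With batch size 2 there are two values per channel, so the same call succeeds.
out = bn(torch.randn(2, 256, 1, 1))
print(out.shape)  # torch.Size([2, 256, 1, 1])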

qubvel added the Vision label May 6, 2024
travisddavies (Author) commented

That seems to have worked with that input:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
config.json: 100% 70.0k/70.0k [00:00<00:00, 3.79MB/s]
pytorch_model.bin: 100% 22.5M/22.5M [00:00<00:00, 201MB/s]
Some weights of MobileViTForSemanticSegmentation were not initialized from the model checkpoint at apple/mobilevit-small and are newly initialized: ['segmentation_head.aspp.convs.0.convolution.weight', 'segmentation_head.aspp.convs.0.normalization.bias', 'segmentation_head.aspp.convs.0.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.0.normalization.running_mean', 'segmentation_head.aspp.convs.0.normalization.running_var', 'segmentation_head.aspp.convs.0.normalization.weight', 'segmentation_head.aspp.convs.1.convolution.weight', 'segmentation_head.aspp.convs.1.normalization.bias', 'segmentation_head.aspp.convs.1.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.1.normalization.running_mean', 'segmentation_head.aspp.convs.1.normalization.running_var', 'segmentation_head.aspp.convs.1.normalization.weight', 'segmentation_head.aspp.convs.2.convolution.weight', 'segmentation_head.aspp.convs.2.normalization.bias', 'segmentation_head.aspp.convs.2.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.2.normalization.running_mean', 'segmentation_head.aspp.convs.2.normalization.running_var', 'segmentation_head.aspp.convs.2.normalization.weight', 'segmentation_head.aspp.convs.3.convolution.weight', 'segmentation_head.aspp.convs.3.normalization.bias', 'segmentation_head.aspp.convs.3.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.3.normalization.running_mean', 'segmentation_head.aspp.convs.3.normalization.running_var', 'segmentation_head.aspp.convs.3.normalization.weight', 'segmentation_head.aspp.convs.4.conv_1x1.convolution.weight', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.bias', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.running_mean', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.running_var', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.weight', 'segmentation_head.aspp.project.convolution.weight', 'segmentation_head.aspp.project.normalization.bias', 'segmentation_head.aspp.project.normalization.num_batches_tracked', 'segmentation_head.aspp.project.normalization.running_mean', 'segmentation_head.aspp.project.normalization.running_var', 'segmentation_head.aspp.project.normalization.weight', 'segmentation_head.classifier.convolution.bias', 'segmentation_head.classifier.convolution.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
odict_keys(['loss', 'logits'])
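Note that the batch-size constraint only applies in training mode; in eval mode batch norm uses its running statistics, so a single image works for inference. A minimal sketch, assuming the inputs variable from the snippet above holds the processed image(s):

model.eval()
with torch.no_grad():
    outputs = model(pixel_values=inputs["pixel_values"])
print(outputs.logits.shape)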
