
Bug with train class method for MobileViTForSemanticSegmentation #30676

Open · 2 of 4 tasks
travisddavies opened this issue May 6, 2024 · 2 comments

travisddavies commented May 6, 2024

System Info

  • transformers version: 4.38.2
  • Platform: Linux-6.8.7-arch1-1-x86_64-with-glibc2.39
  • Python version: 3.11.9
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@amyeroberts @pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import cv2
from transformers import MobileViTImageProcessor, MobileViTForSemanticSegmentation

filepath = "drive/MyDrive/T55HDV_20190910T001109_20190910T012858_S2B.jpg"
image = cv2.imread(filepath)
# Take one channel of the mask and binarize it to {0, 1}
mask = cv2.imread('drive/MyDrive/mask.jpg')[:, :, 0]
mask = torch.tensor(mask)
mask = (mask > 0).float()
print(image.shape, mask.shape)

image_processor = MobileViTImageProcessor(do_reduce_labels=False)

id2label = {
    0: "background",
    1: "object",
}

label2id = {v: k for k, v in id2label.items()}

model = MobileViTForSemanticSegmentation.from_pretrained(
    "apple/mobilevit-small",
    num_labels=2,
    id2label=id2label,
    label2id=label2id)

inputs = image_processor(images=image, segmentation_maps=mask, return_tensors="pt")

model.train()
outputs = model(pixel_values=inputs["pixel_values"], labels=inputs['labels'])

Expected behavior

outputs should contain both loss and logits.

qubvel (Member) commented May 6, 2024

Hi @travisddavies, can you please specify what is going wrong, or provide an error traceback?

I was able to run the following code:

import numpy as np
from transformers import MobileViTImageProcessor, MobileViTForSemanticSegmentation

image = np.ones((512, 512, 3), dtype=np.uint8)
mask = np.ones((512, 512), dtype=np.uint8)

image_processor = MobileViTImageProcessor(do_reduce_labels=False)

id2label = {
    0: "background",
    1: "object",
}

label2id = {v: k for k, v in id2label.items()}

model = MobileViTForSemanticSegmentation.from_pretrained(
    "apple/mobilevit-small",
    num_labels=2,
    id2label=id2label,
    label2id=label2id)

# In training mode, batch norm needs batch size > 1, so duplicate the image and mask
inputs = image_processor(images=[image, image], segmentation_maps=[mask, mask], return_tensors="pt")

model.train()
outputs = model(pixel_values=inputs["pixel_values"], labels=inputs['labels'])

print(outputs.keys())
# >>> odict_keys(['loss', 'logits'])
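This constraint comes from PyTorch's batch norm rather than from transformers itself: in training mode, BatchNorm needs more than one value per channel to compute batch statistics, and the globally pooled ASPP branch of the segmentation head (the conv_1x1 layers visible in the warning below) reduces each channel to a single value per sample. A minimal sketch of the underlying behavior, using plain torch.nn rather than MobileViT:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
bn.train()

# After global average pooling each channel holds a single value per sample;
# batch statistics cannot be computed from one value, so training mode fails.
pooled = torch.randn(1, 256, 1, 1)
try:
    bn(pooled)
except ValueError as err:
    print(err)  # Expected more than 1 value per channel when training, ...

# With batch size 2 there are two values per channel, so the same call succeeds.
out = bn(torch.randn(2, 256, 1, 1))
print(out.shape)  # torch.Size([2, 256, 1, 1])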

qubvel added the Vision label May 6, 2024
travisddavies (Author) commented

That seems to have worked with that input:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
config.json: 100% 70.0k/70.0k [00:00<00:00, 3.79MB/s]
pytorch_model.bin: 100% 22.5M/22.5M [00:00<00:00, 201MB/s]
Some weights of MobileViTForSemanticSegmentation were not initialized from the model checkpoint at apple/mobilevit-small and are newly initialized: ['segmentation_head.aspp.convs.0.convolution.weight', 'segmentation_head.aspp.convs.0.normalization.bias', 'segmentation_head.aspp.convs.0.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.0.normalization.running_mean', 'segmentation_head.aspp.convs.0.normalization.running_var', 'segmentation_head.aspp.convs.0.normalization.weight', 'segmentation_head.aspp.convs.1.convolution.weight', 'segmentation_head.aspp.convs.1.normalization.bias', 'segmentation_head.aspp.convs.1.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.1.normalization.running_mean', 'segmentation_head.aspp.convs.1.normalization.running_var', 'segmentation_head.aspp.convs.1.normalization.weight', 'segmentation_head.aspp.convs.2.convolution.weight', 'segmentation_head.aspp.convs.2.normalization.bias', 'segmentation_head.aspp.convs.2.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.2.normalization.running_mean', 'segmentation_head.aspp.convs.2.normalization.running_var', 'segmentation_head.aspp.convs.2.normalization.weight', 'segmentation_head.aspp.convs.3.convolution.weight', 'segmentation_head.aspp.convs.3.normalization.bias', 'segmentation_head.aspp.convs.3.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.3.normalization.running_mean', 'segmentation_head.aspp.convs.3.normalization.running_var', 'segmentation_head.aspp.convs.3.normalization.weight', 'segmentation_head.aspp.convs.4.conv_1x1.convolution.weight', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.bias', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.num_batches_tracked', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.running_mean', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.running_var', 'segmentation_head.aspp.convs.4.conv_1x1.normalization.weight', 'segmentation_head.aspp.project.convolution.weight', 'segmentation_head.aspp.project.normalization.bias', 'segmentation_head.aspp.project.normalization.num_batches_tracked', 'segmentation_head.aspp.project.normalization.running_mean', 'segmentation_head.aspp.project.normalization.running_var', 'segmentation_head.aspp.project.normalization.weight', 'segmentation_head.classifier.convolution.bias', 'segmentation_head.classifier.convolution.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
odict_keys(['loss', 'logits'])
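Note that the batch-size constraint only applies in training mode; in eval mode batch norm uses its running statistics, so a single image works for inference. A minimal sketch, assuming the inputs variable from the snippet above holds the processed image(s):

model.eval()
with torch.no_grad():
    outputs = model(pixel_values=inputs["pixel_values"])
print(outputs.logits.shape)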
