Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to train with custom data #744

Open
karthik-v-b opened this issue Mar 20, 2023 · 1 comment
Open

Unable to train with custom data #744

karthik-v-b opened this issue Mar 20, 2023 · 1 comment

Comments

@karthik-v-b
Copy link

karthik-v-b commented Mar 20, 2023

Hello @zylo117 , Thanks for the great code base.
I tried to run the shape model it works fine in colab. And then i tried with my own custom data which has 3 classes, modified the folder name and added my dataset there and changed accordingly in yml file too but while training with below code

! python train.py -c 0 -p vehiclecm --head_only True --lr 1e-3 --batch_size 32 --load_weights /content/Yet-Another-EfficientDet-Pytorch/weights/efficientdet-d0.pth --num_epochs 1 --save_interval 100

i get the error as below, can you help on this.

loading annotations into memory...
Done (t=1.29s)
creating index...
index created!
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
loading annotations into memory...
Done (t=0.47s)
creating index...
index created!
[Warning] Ignoring Error(s) in loading state_dict for EfficientDetBackbone:
size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([810, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([27, 64, 1, 1]).
size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([810]) from checkpoint, the shape in current model is torch.Size([27]).
[Warning] Don't panic if you see this, this might be because you load a pretrained weights with different number of classes. The rest of the weights should be loaded already.
[Info] loaded weights: efficientdet-d0.pth, resuming checkpoint from step: 0
[Info] freezed backbone
0% 0/236 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [32,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [33,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [34,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [35,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [36,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [37,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [38,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [39,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [9,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [10,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [11,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [12,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [13,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [14,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [15,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [16,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [17,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [18,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [19,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [20,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [21,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [22,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [23,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [24,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [25,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [26,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [27,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [28,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [29,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [30,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
[Error] Traceback (most recent call last):
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 226, in train
cls_loss, reg_loss = model(imgs, annot, obj_list=params.obj_list)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 80, in forward
cls_loss, reg_loss = self.criterion(classification, regression, anchors, annotations)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/efficientdet/loss.py", line 115, in forward
bce = -(targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification))
File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 39, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 834, in rsub
return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
0% 1/236 [00:46<3:00:25, 46.06s/it][Error] Traceback (most recent call last):
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 226, in train
cls_loss, reg_loss = model(imgs, annot, obj_list=params.obj_list)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 75, in forward
_, regression, classification, anchors = self.model(imgs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/backbone.py", line 70, in forward
_, p3, p4, p5 = self.backbone_net(inputs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/efficientdet/model.py", line 436, in forward
x = self.model._conv_stem(x)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/efficientnet/utils_extra.py", line 46, in forward
x = self.conv(x)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

cuDNN error: CUDNN_STATUS_MAPPING_ERROR
1% 2/236 [00:49<1:20:48, 20.72s/it][Error] Traceback (most recent call last):
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 226, in train
cls_loss, reg_loss = model(imgs, annot, obj_list=params.obj_list)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/train.py", line 75, in forward
_, regression, classification, anchors = self.model(imgs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/backbone.py", line 70, in forward
_, p3, p4, p5 = self.backbone_net(inputs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/efficientdet/model.py", line 436, in forward
x = self.model._conv_stem(x)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/Yet-Another-EfficientDet-Pytorch/efficientnet/utils_extra.py", line 46, in forward
x = self.conv(x)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

The yml file i used is

Screenshot 2023-03-20 133553

Dataset folder structure:
Screenshot 2023-03-20 133810

Link to the annotation file

Link to the train dataset

Link to the val dataset

Seeking your help to solve this error and to train on this dataset.

@lihao4096
Copy link

Have you solved the problems?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants