Segfault during GPU export of Reshape after importing model from ONNX (#154)
Hi, @jan-haug! I can't reproduce:

❯ docker run --rm --gpus all -ti -v $(pwd):/io pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime bash
root@7c0263e9a633:/workspace# cd /io
root@7c0263e9a633:/io# pip3 install onnx2torch
Collecting onnx2torch
Downloading onnx2torch-1.5.6-py3-none-any.whl (115 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 115.6/115.6 kB 722.2 kB/s eta 0:00:00
Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (1.13.1)
Requirement already satisfied: torchvision>=0.9.0 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (0.14.1)
Collecting onnx>=1.9.0
Downloading onnx-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 14.1 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.16.4 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (1.22.3)
Collecting protobuf<4,>=3.20.2
Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 19.6 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=3.6.2.1 in /opt/conda/lib/python3.10/site-packages (from onnx>=1.9.0->onnx2torch) (4.4.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torchvision>=0.9.0->onnx2torch) (2.28.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/conda/lib/python3.10/site-packages (from torchvision>=0.9.0->onnx2torch) (9.3.0)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (2022.9.24)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (1.26.13)
Installing collected packages: protobuf, onnx, onnx2torch
Successfully installed onnx-1.13.1 onnx2torch-1.5.6 protobuf-3.20.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@7c0263e9a633:/io# python3 reshape_segfault.py
root@7c0263e9a633:/io#
The reason is to ensure backward compatibility:
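(The code snippet that followed this comment appears to have been dropped from this copy of the thread. As a rough sketch of the pattern under discussion, not the actual onnx2torch source, a converter module can branch on `torch.onnx.is_in_onnx_export()` so that export-time tracing takes a different path than eager execution; all names below are assumptions:)

```python
import torch
from torch import nn


class ReshapeWithCustomExport(nn.Module):
    """Hypothetical sketch of an export-aware Reshape module.

    Not the onnx2torch implementation, only an illustration of the
    if-condition pattern referenced in the thread.
    """

    @staticmethod
    def _do_reshape(input_tensor: torch.Tensor, shape: torch.Tensor) -> torch.Tensor:
        # ONNX Reshape semantics: a 0 in the target shape means
        # "keep the corresponding input dimension".
        dims = [
            input_tensor.shape[i] if int(s) == 0 else int(s)
            for i, s in enumerate(shape)
        ]
        return torch.reshape(input_tensor, dims)

    def forward(self, input_tensor: torch.Tensor, shape: torch.Tensor) -> torch.Tensor:
        if torch.onnx.is_in_onnx_export():
            # In the real converter this branch routes through a custom
            # export helper so that a single Reshape node lands in the
            # graph; per the report, removing the branch avoided the
            # GPU segfault.
            return self._do_reshape(input_tensor, shape)
        return self._do_reshape(input_tensor, shape)
```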
@senysenyseny16 my bad, I forgot the … With that, I also get the error when following the Docker setup you posted.
I have reproduced the problem. Thanks for your report; I think the problem is in the C++ code, I'll try to debug it.
Thanks! FWIW, this also pertains to (some) other ops where a custom mapping is implemented in this way, for example …
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
Any update on this?
We think that the problem is in PyTorch's export mechanism; we are unlikely to be able to do anything about it for now. Sorry for the late reply.
When you load a `Reshape` node, an `OnnxReshape` instance is created with custom logic for the torch.onnx export (see https://github.com/ENOT-AutoDL/onnx2torch/blob/main/onnx2torch/node_converters/reshape.py#L32-L33). However, I'm getting segmentation faults when exporting the torch model using the GPU with this logic. Removing this if-condition (from the link above) entirely fixes the issue for me. What is the reason for this handling, and is there a way around it, or could it be extended to work with CUDA too?

I'm running `torch==1.13.1` and exporting with ONNX opset 14. CPU export works fine, but that's not really an option in my case, unfortunately.

Standalone reproducer: