
Segfault during GPU export of Reshape after importing model from ONNX #154

Status: Open
jan-haug opened this issue on Mar 22, 2023 · 10 comments
Labels: bug (Something isn't working)

@jan-haug (Contributor) commented on Mar 22, 2023:

When you load a Reshape node, an OnnxReshape instance is created with custom logic for the torch.onnx export. (See https://github.com/ENOT-AutoDL/onnx2torch/blob/main/onnx2torch/node_converters/reshape.py#L32-L33 )

However, I'm getting segmentation faults when exporting the torch model on the GPU with this logic. Removing this if-condition (from the link above) entirely fixes the issue for me. What is the reason for this handling, and is there a way around it, or could it be extended to work with CUDA too?

        if torch.onnx.is_in_onnx_export():
            return DefaultExportToOnnx.export(forward_lambda, 'Reshape', input_tensor, shape, {})

I'm running torch==1.13.1 and exporting with ONNX opset 14. CPU export works fine, but unfortunately that's not really an option in my case.
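
For reference, removing that branch amounts to something like the monkey-patch below (a minimal sketch only: it assumes the forward(input_tensor, shape) signature from the linked reshape.py and re-implements the plain reshape path, bypassing the custom ONNX export entirely):

from onnx2torch.node_converters.reshape import OnnxReshape


def _plain_reshape_forward(self, input_tensor, shape):
    # ONNX Reshape semantics: a 0 in `shape` means "copy this dimension from the input".
    target = [int(input_tensor.shape[i]) if int(dim) == 0 else int(dim) for i, dim in enumerate(shape)]
    return input_tensor.reshape(target)


# Illustration of "removing the if-condition", applied before torch.onnx.export.
OnnxReshape.forward = _plain_reshape_forward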

Standalone reproducer:

import os
import onnx2torch
import tempfile
import torch


class ReshapeModel(torch.nn.Module):
    def forward(self, x):
        return x.reshape(-1, 512)


def test_export():
    tmp_path = tempfile.mkdtemp()
    sample = torch.rand((1, 512, 1, 1))
    model = ReshapeModel()
    out_path = os.path.join(tmp_path, "temp.onnx")

    # Export the original model and convert it back to PyTorch.
    torch.onnx.export(model, sample, out_path)
    model_reconstructed = onnx2torch.convert(out_path)

    # Move the converted model to the GPU; re-exporting it is where the segfault occurs.
    model_reconstructed.to("cuda")
    torch.onnx.export(model_reconstructed, sample, out_path)


if __name__ == "__main__":
    test_export()
@senysenyseny16 (Collaborator) commented:

Hi, @jan-haug!

I can't reproduce:

❯ docker run --rm --gpus all -ti -v $(pwd):/io pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime bash
root@7c0263e9a633:/workspace# cd /io
root@7c0263e9a633:/io# pip3 install onnx2torch
Collecting onnx2torch
  Downloading onnx2torch-1.5.6-py3-none-any.whl (115 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 115.6/115.6 kB 722.2 kB/s eta 0:00:00
Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (1.13.1)
Requirement already satisfied: torchvision>=0.9.0 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (0.14.1)
Collecting onnx>=1.9.0
  Downloading onnx-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 14.1 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.16.4 in /opt/conda/lib/python3.10/site-packages (from onnx2torch) (1.22.3)
Collecting protobuf<4,>=3.20.2
  Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 19.6 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=3.6.2.1 in /opt/conda/lib/python3.10/site-packages (from onnx>=1.9.0->onnx2torch) (4.4.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torchvision>=0.9.0->onnx2torch) (2.28.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/conda/lib/python3.10/site-packages (from torchvision>=0.9.0->onnx2torch) (9.3.0)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (2022.9.24)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.9.0->onnx2torch) (1.26.13)
Installing collected packages: protobuf, onnx, onnx2torch
Successfully installed onnx-1.13.1 onnx2torch-1.5.6 protobuf-3.20.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@7c0263e9a633:/io# python3 reshape_segfault.py 
root@7c0263e9a633:/io# 

What is the reason for this handling and is there a way around this or could it be extended to work with cuda too?

The reason is to ensure backward compatibility of the round trip ONNX -> PyTorch -> ONNX: without this handling, the converted PyTorch code is sometimes exported back to ONNX as a set of primitive operations rather than as the original ONNX operation.
In other words, it guarantees, for example: ONNX Gather -> PyTorch Gather implementation -> ONNX Gather.
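
For illustration, that guarantee can be checked with a small script like the one below (a sketch on CPU, where export works; file names are placeholders):

import onnx
import onnx2torch
import torch


class ReshapeModel(torch.nn.Module):  # same toy model as in the reproducer above
    def forward(self, x):
        return x.reshape(-1, 512)


sample = torch.rand((1, 512, 1, 1))
torch.onnx.export(ReshapeModel(), sample, "original.onnx")

# Round trip: ONNX -> PyTorch -> ONNX. The re-exported graph should still
# contain a Reshape node instead of a decomposition into primitive ops.
converted = onnx2torch.convert("original.onnx")
torch.onnx.export(converted, sample, "roundtrip.onnx")

reexported_ops = {node.op_type for node in onnx.load("roundtrip.onnx").graph.node}
assert "Reshape" in reexported_ops, reexported_ops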

senysenyseny16 added the bug (Something isn't working) label and self-assigned this issue on Mar 23, 2023.
@jan-haug (Contributor, Author) commented:

@senysenyseny16 my bad, I forgot the if __name__ == "__main__" block that actually calls the function in the script. I updated the example above.

With that, I also get the error when following the docker setup you posted.

@senysenyseny16 (Collaborator) commented:

I have reproduced the problem. Thanks for your report; I think the problem is in the C++ code, and I'll try to debug it.

@jan-haug (Contributor, Author) commented:

Thanks! FWIW, this also pertains to (some) other ops where a custom mapping is implemented in this way, for example Slice, so it doesn't seem to be specific to the exact operation the mapping is used for; see the sketch below.
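
For example, a Slice variant of the reproducer (a sketch, not tested in exactly this form) looks like:

import torch


class SliceModel(torch.nn.Module):
    def forward(self, x):
        # x[:, :256] is exported to ONNX as a Slice node, which onnx2torch maps
        # to a module with the same custom torch.onnx export handling.
        return x[:, :256]


# Reusing test_export from the reproducer with SliceModel in place of
# ReshapeModel should hit the same crash during the GPU re-export.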

@github-actions bot commented:
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions bot commented:
This issue was closed because it has been stalled for 10 days with no activity.

@jan-haug (Contributor, Author) commented:

Any update on this?

@senysenyseny16 (Collaborator) commented:

@jan-haug

We think the problem is in the export mechanism of PyTorch, so we are unlikely to be able to do anything about it right now.
Maybe the export in 2.0 is better 🙅.

Sorry for the late reply.
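
For anyone who wants to check whether the newer exporter behaves differently, here is a sketch (assuming PyTorch >= 2.1, where torch.onnx.dynamo_export is available; paths are placeholders and this is untested here):

import onnx2torch
import torch

# Re-export the converted model with the TorchDynamo-based exporter instead of
# the legacy TorchScript-based torch.onnx.export that crashes above.
converted = onnx2torch.convert("temp.onnx").to("cuda")
sample = torch.rand((1, 512, 1, 1), device="cuda")

onnx_program = torch.onnx.dynamo_export(converted, sample)
onnx_program.save("temp_dynamo.onnx")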
