predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr #126009

geraldstanje · 2024-05-11T21:22:39Z

🐛 Describe the bug

Hi, I see the following error. It looks like torch.compile worked fine, but when I invoke prediction afterwards it errors out:

predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr

Full log:
torch_error.txt

Docker image used:

# Use the SageMaker DLC
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker

# Install additional dependencies
RUN python -m pip install torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/cu118
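Because the `pip install` line above does not pin versions, the torch, torch-tensorrt, and tensorrt wheels it resolves may not match the image's PyTorch 2.1.0 / CUDA 11.8 stack. A minimal sketch for printing what actually ended up installed, using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are assumed to be the ones from the `pip install` line:

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip install line in the Dockerfile.
    for name, version in installed_versions(["torch", "torch-tensorrt", "tensorrt"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run inside the built container, this shows whether the resolved versions differ from the DLC's 2.1.0; `torch.version.cuda` additionally reports the CUDA version the installed torch wheel was built for.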

How was the model compiled?

import torch
import torch_tensorrt  # provides the "torch_tensorrt" torch.compile backend
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, backend="torch_tensorrt", dynamic=False,
                                options={"truncate_long_and_double": True,
                                         "precision": torch.half,
                                         "debug": True,
                                         "min_block_size": 1,
                                         "optimization_level": 4,
                                         "use_python_runtime": False})
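`torch.compile` is lazy: the call above only wraps the module, and graph capture plus the backend run on the first forward call. That is why the backend error shows up at prediction time even though `torch.compile` itself appeared to succeed. A minimal sketch, with a hypothetical helper `compile_and_warm_up`, that does one warm-up call right after compiling so backend failures surface while the model is loaded; the compile function is passed in so it can be `torch.compile`:

```python
def compile_and_warm_up(module, compile_fn, example_args=(), example_kwargs=None, **compile_kwargs):
    """Compile `module` with `compile_fn`, then call the result once.

    With a lazy compiler such as torch.compile, the backend only runs on the
    first call, so making that call here surfaces backend errors at load time
    instead of inside the first prediction.
    """
    compiled = compile_fn(module, **compile_kwargs)
    compiled(*example_args, **(example_kwargs or {}))  # triggers capture + backend
    return compiled
```

Hypothetical usage: inside `torch.no_grad()`, call `compile_and_warm_up(model.model_body[0].auto_model, torch.compile, example_kwargs=dummy_batch, backend="torch_tensorrt", dynamic=False, options=...)`, where `dummy_batch` is assumed to be a tokenized example batch (e.g. `input_ids` and `attention_mask`) of the shape used in production, since `dynamic=False` specializes on input shapes.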

To rule out that the issue lies elsewhere, I tested the following torch.compile call, which works fine:

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, mode="reduce-overhead")

Versions

PyTorch 2.1

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang


ezyang · 2024-05-14T18:25:55Z

This should probably be sent to the TensorRT repo.

geraldstanje · 2024-05-14T18:42:34Z

@ezyang so the torch.compile TensorRT backend (the JIT approach) is part of the TensorRT repo?

xmfan · 2024-05-20T17:20:55Z

@geraldstanje From torch_error.txt, it seems that graph capture succeeded and there are a couple of errors from the TensorRT backend:

2024-05-10T21:15:33.961Z	2024-05-10T21:15:33,744 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-21:15:33] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
...
2024-05-10T21:15:33,747 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - TypeError: pybind11::init(): factory function returned nullptr
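The first log line may be the more telling one: in the CUDA runtime's `cudaError_t` enum, 35 is `cudaErrorInsufficientDriver`, meaning the installed NVIDIA driver is older than the CUDA runtime being used. The later `pybind11` `nullptr` error could plausibly be a downstream effect of TensorRT failing to initialize CUDA, though the log alone does not prove that. A minimal sketch that asks the driver which CUDA version it supports, assuming Linux with the driver library available as `libcuda.so.1`; the helper names are hypothetical:

```python
import ctypes


def format_cuda_version(encoded):
    """Turn CUDA's integer encoding (1000 * major + 10 * minor) into 'major.minor'."""
    return f"{encoded // 1000}.{(encoded % 1000) // 10}"


def driver_cuda_version():
    """Return the newest CUDA version the installed driver supports, or None.

    Calls cuDriverGetVersion from the CUDA driver library; returns None if the
    library cannot be loaded (e.g. no NVIDIA driver) or the call fails.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:  # 0 == CUDA_SUCCESS
        return None
    return format_cuda_version(version.value)


if __name__ == "__main__":
    print("driver supports CUDA:", driver_cuda_version() or "unknown (no libcuda)")
```

If the reported version is lower than the CUDA version the installed torch and TensorRT wheels target (for torch, `torch.version.cuda`), that would be consistent with error 35.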

Please reopen the issue if you believe the problem is caused by the graph sent to TensorRT.

ezyang added the oncall: pt2 label May 14, 2024

xmfan closed this as completed May 20, 2024
