predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr #126009

Closed
geraldstanje opened this issue May 11, 2024 · 3 comments

Comments

@geraldstanje

geraldstanje commented May 11, 2024

🐛 Describe the bug

Hi, I see the following error. It looks like torch.compile worked fine, but when I invoke the prediction after that, it errors out:

predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr

full log:
torch_error.txt

Used docker image:

# use sagemaker DLC
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker

# Install additional dependencies
RUN python -m pip install torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/cu118
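
The pip command above does not pin versions, so it may leave the container with a different PyTorch, Torch-TensorRT, or TensorRT build than the base image's 2.1.0/cu118 stack. A minimal sketch for checking what actually got installed, using only the standard library; `installed_versions` is a hypothetical helper name, and the distribution names are taken from the pip command. Wheels from the download.pytorch.org index carry a local suffix such as `+cu118`, which shows the CUDA build.

```python
from importlib import metadata

# Distribution names as they appear in the pip command above.
PACKAGES = ("torch", "torch-tensorrt", "tensorrt")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run inside the container to see which builds are present.
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the built image and comparing the output against the base image tag is one way to spot a CUDA mismatch between the packages.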

How was the model compiled?

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, backend="torch_tensorrt", dynamic=False,
                                options={"truncate_long_and_double": True,
                                         "precision": torch.half,
                                         "debug": True,
                                         "min_block_size": 1,
                                         "optimization_level": 4,
                                         "use_python_runtime": False})

To rule out that the issue is somewhere else, I tested with the following torch.compile call, which works fine:

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, mode="reduce-overhead")
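
torch.compile defers backend compilation until the compiled module is first called, which is why the TensorRT failure appears at prediction time rather than at the torch.compile line. A minimal sketch of a warm-up step that triggers compilation early and falls back to another callable if the backend fails; `warm_up_or_fall_back` and its arguments are hypothetical names, not part of PyTorch or Torch-TensorRT.

```python
import logging

logger = logging.getLogger(__name__)


def warm_up_or_fall_back(compiled_fn, fallback_fn, example_input):
    """Call compiled_fn once to force lazy compilation.

    Returns compiled_fn if the warm-up call succeeds, otherwise logs the
    exception and returns fallback_fn (hypothetical helper, for illustration).
    """
    try:
        compiled_fn(example_input)
    except Exception:
        # Backend errors such as the one in this issue surface here,
        # on the first call, not when torch.compile is invoked.
        logger.exception("Compiled backend failed during warm-up; using fallback")
        return fallback_fn
    return compiled_fn
```

In this thread's setup, the first argument would be the module compiled with backend="torch_tensorrt" and the second the uncompiled module or the mode="reduce-overhead" variant, assuming an example batch is available at load time.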

Versions

PyTorch 2.1

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang

@ezyang
Contributor

ezyang commented May 14, 2024

This should probably be sent to the tensorrt repo.

@geraldstanje
Author

@ezyang So torch.compile with tensorrt (the JIT approach) is part of the tensorrt repo?

@xmfan
Member

xmfan commented May 20, 2024

@geraldstanje From torch_error.txt, it seems that graph capture was successful and that there are a couple of errors from the tensorrt backend:

2024-05-10T21:15:33.961Z	2024-05-10T21:15:33,744 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-21:15:33] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
...
2024-05-10T21:15:33,747 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - TypeError: pybind11::init(): factory function returned nullptr

Please re-open the issue if you believe the problem is caused by the graph sent to tensorrt.
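
The log excerpt above can be checked mechanically. In the CUDA runtime API, error code 35 is cudaErrorInsufficientDriver, meaning the installed NVIDIA driver is older than the CUDA runtime in use. Reading that as the cause here is an interpretation of the code, not something confirmed in this thread. A minimal sketch that pulls such codes out of a log; `cuda_init_errors` is a hypothetical helper, and only the code seen in this issue is mapped.

```python
import re

# Only the code that appears in this issue is mapped; extend as needed.
# 35 is cudaErrorInsufficientDriver in the CUDA runtime API.
KNOWN_CUDA_ERRORS = {35: "cudaErrorInsufficientDriver"}

# Matches the TensorRT warning text quoted in the log excerpt.
_PATTERN = re.compile(r"CUDA initialization failure with error: (\d+)")


def cuda_init_errors(log_text):
    """Return a list of (code, name) tuples for each CUDA init failure found."""
    results = []
    for match in _PATTERN.finditer(log_text):
        code = int(match.group(1))
        results.append((code, KNOWN_CUDA_ERRORS.get(code, "unknown")))
    return results
```

Applied to the contents of torch_error.txt, this would list every CUDA initialization failure TensorRT reported before the pybind11 TypeError.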

xmfan closed this as completed May 20, 2024