UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). #8269
Comments
Probably. Try updating |
How did you install |
I downloaded the .whl file and installed it using pip |
Hmm, if you freeze the following example

import torch
print("torch.cuda.is_available:", torch.cuda.is_available())
print("torch.cuda.device_count:", torch.cuda.device_count())

does it work? Or does Does the target computer have the NVIDIA driver installed? |
That |
FWIW, if I freeze

import torch
print("torch.cuda.is_available:", torch.cuda.is_available())
print("torch.cuda.device_count:", torch.cuda.device_count())

in a clean python 3.9.18 virtual environment on my Fedora 39 desktop (with If you set up a python virtual environment on the target system and install |
When the driver versions of two computers are the same, torch.cuda.is_available() returns True. Why does the driver version affect the use of CUDA? Is this related to the incomplete package included during packaging? |
Ah - in that case, the problem is likely that we collect a part of driver libraries that we shouldn't be collecting. |
If you are using Alternatively, you can add pyinstaller/PyInstaller/depend/dylib.py Lines 187 to 223 in 249d8fc
Does that fix the problem? |
If this is still Can you check what are the contents of |
Hmmm... in that case, can you rebuild again (with
to see where the |
What are contents of the |
That's not from Your hook file seems to be from 2023.10 or earlier. Although since it is an old brute-force hook that collects whole So next question is, what are the contents of the (If you were not rebuilding with |
Is this conda-installed |
I installed it with pip from the .whl file, not with conda |
Where did you get the .whl file from? |
The hook does not really support conda-installed |
Hmm, actually, looks like cuDNN libs are part of
and if I freeze a test torch program in that miniconda environment, they all end up collected:
|
This looks like external cuDNN (in What happens if you temporarily remove |
If I temporarily remove /home/zhang/cuda/lib64 from LD_LIBRARY_PATH in the target environment, the program can run,
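For anyone wanting to try the same experiment without editing their shell profile, here is a minimal sketch of launching a program with selected directories filtered out of LD_LIBRARY_PATH. The helper name `strip_library_dirs` is made up for illustration and is not part of PyInstaller or torch; the directory in the usage comment is the one mentioned in this thread, and `./train_program` is a placeholder for the frozen executable.

```python
import os


def strip_library_dirs(env, unwanted_dirs, var="LD_LIBRARY_PATH"):
    """Return a copy of env with unwanted_dirs removed from a path-list variable.

    Hypothetical helper for illustration only; the original mapping is not modified.
    """
    unwanted = {os.path.normpath(d) for d in unwanted_dirs}
    cleaned = dict(env)
    # Split the variable into its entries, ignoring empty ones.
    entries = [e for e in cleaned.get(var, "").split(os.pathsep) if e]
    kept = [e for e in entries if os.path.normpath(e) not in unwanted]
    if kept:
        cleaned[var] = os.pathsep.join(kept)
    else:
        # Nothing left: drop the variable instead of leaving it empty.
        cleaned.pop(var, None)
    return cleaned


# Possible usage (paths are placeholders):
#   import subprocess
#   env = strip_library_dirs(os.environ, ["/home/zhang/cuda/lib64"])
#   subprocess.run(["./train_program"], env=env)
```

This only changes the environment of the child process, so the external CUDA install stays usable for other programs.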
Hmmm, yeah, those
Can you open Do you also have an external CUDA toolkit in |
You cannot, that's the problem.
Do you have CUDA and cuDNN installed in the conda environment, then (for example, it would be installed via conda if this is the same environment that you previously had conda-installed torch in). |
I.e., the binary dependency analysis on Linux tries to resolve the shared lib dependencies via |
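To see whether the library search path exposes an external CUDA/cuDNN install at all, a rough sketch like the following can list matching files per directory. The function name and the file-name patterns are assumptions chosen for illustration (they cover only a few well-known library names), and the sketch does not reproduce PyInstaller's actual dependency analysis.

```python
import fnmatch
import os

# Assumed patterns for a few common CUDA/cuDNN libraries; extend as needed.
CUDA_PATTERNS = ("libcudnn*.so*", "libcublas*.so*", "libcudart*.so*")


def find_external_cuda_libs(search_path, patterns=CUDA_PATTERNS):
    """Map each directory in a path-list string to the CUDA-like files it contains.

    Hypothetical helper: directories that do not exist or have no matches are skipped.
    """
    found = {}
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        matches = sorted(
            name
            for name in os.listdir(directory)
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
        )
        if matches:
            found[directory] = matches
    return found


# Possible usage:
#   for directory, names in find_external_cuda_libs(os.environ.get("LD_LIBRARY_PATH", "")).items():
#       print(directory, names)
```

An empty result on both the build machine and the target machine would suggest that LD_LIBRARY_PATH is not the route by which a second set of libraries gets picked up.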
Uh, wait, just to reconfirm - are you now using conda-installed or pip-installed torch? |
Pip installed torch
I have installed the CUDA environment on the target computer, and the executable packaged with onefile first loads CUDA from the local environment, which causes an error. If the local CUDA is removed, the program runs normally
|
OK, I've tried to build and run the following example program

# train_program.py
from ultralytics import YOLO
model = YOLO('yolov8n.yaml').load('yolov8n.pt') # build from YAML and transfer weights
results = model.train(data='coco128.yaml', epochs=2, imgsz=640)

with system-installed python 3.9.18 (i.e., no conda) with clean virtual environment and pip-installed The
to their copies in the
(note that not all If I try to run it (on the build machine), it ends up crashing when starting first training epoch, with
This one seems to be caused by Either removing
Based on the screenshots you've provided so far, it indeed looks like a second set of CUDA/cuDNN libraries is collected from somewhere (hence |
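One way to check a build for a second set of libraries is to walk the output directory and report library file names that resolve to more than one distinct file. The sketch below assumes a onedir-style build directory; the function name and the prefixes are illustrative choices, not something the PyInstaller hooks provide. Symlinks are resolved so that a link and its target count as a single copy.

```python
import os
from collections import defaultdict


def find_duplicate_libs(root, prefixes=("libcudnn", "libcublas", "libcudart")):
    """Return library file names under root that exist as more than one distinct file.

    Hypothetical helper: the result maps file name -> sorted list of resolved paths.
    """
    copies = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(prefixes) and ".so" in name:
                # realpath collapses symlinks onto the file they point to.
                copies[name].add(os.path.realpath(os.path.join(dirpath, name)))
    return {name: sorted(paths) for name, paths in copies.items() if len(paths) > 1}


# Possible usage (the path is a placeholder for the build output directory):
#   for name, paths in find_duplicate_libs("dist/train_program").items():
#       print(name, paths)
```

Any name reported here is a candidate for the kind of mixed library set discussed above, though the check says nothing about which copy the loader ends up using.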
OK, so now you've switched to If I build the test training program using this version, it seems to work fine out of the box, without having to remove If I add external cuDNN to I'll need to check where and how this dependency leak occurs, and what we can do about it. For now, the only way you can work around it is to ensure that you don't have external CUDA/cuDNN in If I take the initial build (the one I said worked out of the box) and run it in an environment that has external cuDNN in Aside from removing external cuDNN from For now you need to do this manually, but eventually, our hooks will be able to add these missing symlinks automatically. So to summarize:
I cannot really help with this, as I cannot reproduce the problem (without the code and data you are using), which might or might not be related to other issues we've seen here. |
OK, thank you so much! |
Hello, when I used PyInstaller to package the training code (torch, CUDA, Linux), CUDA cannot be used on another computer. Is this because CUDA wasn't packaged?
The error is
Thank you for your help