I cannot make it work on GPU for training #5776
Comments
I made it work using a devcontainer based on the NNI 2.7 Dockerfile.
Still having problems with version 3.0.
I have a somewhat similar issue.
When I work this way, the code runs on CPU. But when I run the code as follows:
It creates 800+ Python files and the WebUI link no longer opens. Either my PC crashes (because of all those files) or the page shows Running 0. Why?
I am having the same problem as Rajesh90123.
Description of the issue
I cannot run any experiment on GPU.
I have tried a Tesla P4, a P100, and a GTX 1060. I can only make it work on CPU.
I have tried many configurations: useActiveGpu set to True or False, trialGpuNumber set to 1, and gpuIndices: '0'. However, it never completed training a single architecture.
I have tried both outside and inside a Docker container.
Configuration
nni/examples/trials/mnist-pytorch/config.yml
Outside a Docker container
Environment
Log message
nnimanager.log
Here, the GPU information cannot be retrieved.
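Since nnimanager.log reports that GPU information cannot be retrieved, one quick check is whether the NVIDIA driver's `nvidia-smi` tool can see the GPUs from the same environment (host or container) where NNI runs. This is a minimal diagnostic sketch, not part of NNI and not a claim about how NNI collects GPU data: the helpers `parse_gpu_csv` and `query_gpus` are hypothetical, and only standard `nvidia-smi --query-gpu` flags are used.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=index,name,memory.used --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = [field.strip() for field in line.split(",")]
        memory = parts[-1]
        gpus.append({
            "index": int(parts[0]),              # first field: GPU index
            "name": ", ".join(parts[1:-1]),      # everything between index and memory: GPU name
            # last field: used memory in MiB, or None if the driver reports e.g. "[N/A]"
            "memory_used_mib": int(memory) if memory.isdigit() else None,
        })
    return gpus


def query_gpus():
    """Return the GPUs visible to nvidia-smi, or None if nvidia-smi is not on PATH (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if nvidia-smi cannot talk to the driver
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi not found: the NVIDIA driver tools are not visible here")
    elif not gpus:
        print("nvidia-smi ran but reported no GPUs")
    else:
        for gpu in gpus:
            print(f"GPU {gpu['index']}: {gpu['name']} ({gpu['memory_used_mib']} MiB used)")
```

If this lists the GPUs on the host but fails inside the container, the container was likely started without GPU access (for example, without Docker's `--gpus all` option).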
experiment.log
A timeout occurs because the data cannot be retrieved.
Inside a Docker container
Dockerfile
Log message
nnimanager.log
experiment.log
When I'm using CPU only:
I get what I expected to get with the GPU: the WebUI, the experiment's trials, and so on.
How to reproduce it?
If running from a Docker container:
Then in both cases:
As a result, the WebUI doesn't start: it times out trying to retrieve data, because the experiment won't load on the GPU.
Notes