If multiple GPUs are present at server and devices parameter is set to specific GPU, catboost allocates GPU memory at other GPUs #2649

Open
dremovd opened this issue Apr 24, 2024 · 1 comment
dremovd commented Apr 24, 2024

Problem:
When the server has multiple GPUs and the `devices` parameter is set to a specific GPU, catboost still allocates GPU memory on the other GPUs.

catboost version:
1.2.5
Operating System:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

CPU:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-254
Off-line CPU(s) list: 255
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7713 64-Core Processor
GPU:
nvidia-smi
Wed Apr 24 11:25:21 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 Off | Off |
| 90% 26C P8 27W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 On | 00000000:23:00.0 Off | Off |
| 90% 25C P8 26W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 On | 00000000:41:00.0 Off | Off |
| 90% 26C P8 19W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 On | 00000000:61:00.0 Off | Off |
| 90% 25C P8 24W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce RTX 4090 On | 00000000:81:00.0 Off | Off |
| 90% 27C P8 28W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce RTX 4090 On | 00000000:A1:00.0 Off | Off |
| 90% 27C P8 38W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce RTX 4090 On | 00000000:C1:00.0 Off | Off |
| 90% 26C P8 33W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce RTX 4090 On | 00000000:E1:00.0 Off | Off |
| 90% 28C P8 32W / 400W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

catboost_params = {
    'iterations': 5000,
    'learning_rate': 0.02,
    'max_depth': 7,
    'random_state': 0,
    'task_type': 'GPU',
    'devices': '0',
    'gpu_ram_part': 0.85,
    'border_count': 64,
}
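
One way to confirm which GPUs actually receive allocations during training is to poll per-GPU memory from `nvidia-smi` in a second terminal. The sketch below is only an illustration: the helper names and the 100 MiB threshold are assumptions, not part of the original report, and it assumes `nvidia-smi` is on the `PATH` and reports numeric `memory.used` values.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse 'index, memory.used' CSV rows into {gpu_index: used_MiB}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def query_memory_used():
    """Ask nvidia-smi for the memory currently used on each GPU (in MiB)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


def gpus_in_use(usage, threshold_mib=100):
    """Return the GPU indices whose used memory exceeds the (assumed) threshold."""
    return sorted(i for i, used in usage.items() if used > threshold_mib)
```

Calling `gpus_in_use(query_memory_used())` while training runs with `'devices': '0'` would be expected to return only `[0]` if the parameter were fully respected; any other indices in the result would match the behaviour described in this report.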

Collaborator

ek-ak commented Apr 27, 2024

Hello!
Please try limiting the list of available devices with the environment variable CUDA_VISIBLE_DEVICES.
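
A minimal sketch of that suggestion, assuming the variable is set from Python before anything in the process initializes CUDA (it can equally be set in the shell that launches the script). The helper `restrict_visible_gpus` and the `train` function are hypothetical names, and the use of `CatBoostClassifier` is an assumption, since the report does not say which estimator was used.

```python
import os


def restrict_visible_gpus(device_ids):
    """Hypothetical helper: expose only the given physical GPUs to CUDA.

    Call this before any library initializes CUDA in the current process,
    because the variable is read when the CUDA runtime starts up.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Make only physical GPU 0 visible to this process.
restrict_visible_gpus([0])

catboost_params = {
    'iterations': 5000,
    'learning_rate': 0.02,
    'max_depth': 7,
    'random_state': 0,
    'task_type': 'GPU',
    # Visible GPUs are renumbered from 0, so the single visible GPU is '0'.
    'devices': '0',
    'gpu_ram_part': 0.85,
    'border_count': 64,
}


def train(X, y):
    """Assumed training entry point; catboost is imported after the env var is set."""
    from catboost import CatBoostClassifier  # assumption: a classification task

    model = CatBoostClassifier(**catboost_params)
    model.fit(X, y)
    return model
```

Note that the indices in `devices` then refer to the visible GPUs, not the physical ones: with `CUDA_VISIBLE_DEVICES=3`, physical GPU 3 is addressed as `'devices': '0'`.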
