
Centralize GPU configuration vars #4264

Open · wants to merge 1 commit into main
Conversation

dhiltgen (Collaborator) commented on May 8, 2024

This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.

Fixes #4139

Example output with the ROCm gfx override set:

2024/05/08 19:33:27 routes.go:993: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 OLLAMA_DEBUG:true OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
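
For context, a log line like the one above falls out of collecting the centralized settings into one map and logging it once at startup. A minimal sketch of that pattern in Go (AsMap and the field set shown are illustrative, not necessarily the PR's exact identifiers):

// envconfig: gather the centralized settings for the startup log line.
func AsMap() map[string]string {
	return map[string]string{
		"CUDA_VISIBLE_DEVICES":     CudaVisibleDevices,
		"HSA_OVERRIDE_GFX_VERSION": HsaOverrideGfxVersion,
		"OLLAMA_MAX_QUEUE":         strconv.Itoa(MaxQueuedRequests),
		// ...remaining settings elided...
	}
}

// routes.go, at startup: slog renders the map as the env="map[...]" value above.
slog.Info("server config", "env", AsMap())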

Diff context from the review thread below (OLLAMA_MAX_QUEUE parsing, followed by the new GPU settings):

	// Parse OLLAMA_MAX_QUEUE; keep the default when the value is not a positive integer.
	p, err := strconv.Atoi(onp)
	if err != nil || p <= 0 {
		slog.Error("invalid setting", "OLLAMA_MAX_QUEUE", onp, "error", err)
	} else {
		MaxQueuedRequests = p
	}
}

CudaVisibleDevices = clean("CUDA_VISIBLE_DEVICES")
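
The clean helper presumably just reads the environment variable and normalizes it; a minimal sketch, assuming it strips surrounding quotes and whitespace (the PR's actual implementation may differ):

// clean reads an environment variable and trims quotes and spaces, so that
// HSA_OVERRIDE_GFX_VERSION="10.3.0" and HSA_OVERRIDE_GFX_VERSION=10.3.0
// behave the same.
func clean(key string) string {
	return strings.Trim(os.Getenv(key), "\"' ")
}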

@dhiltgen is this public variable CudaVisibleDevices meant to be used in cudaGetVisibleDevicesEnv?

func cudaGetVisibleDevicesEnv(gpuInfo []GpuInfo) (string, string) {

dhiltgen (Collaborator, Author) replied:

Good question.

For NVIDIA GPUs, we use their C libraries to discover the GPUs. We've recently switched to using the Driver API as our primary source, falling back to the CUDA runtime library if that fails (we'll most likely remove the CUDA runtime code in a few releases, as long as the Driver API proves reliable). Those libraries already implement filtering based on this environment variable, so the calls here and here will only return a subset of GPUs if the user has set it.
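
Schematically, the discovery order described here is a simple try-then-fallback (function names below are hypothetical, for illustration only):

// Hypothetical sketch of the discovery order: Driver API first, then the
// CUDA runtime library as a fallback.
func discoverNvidiaGPUs() []GpuInfo {
	if gpus, err := driverAPIGetGPUs(); err == nil {
		return gpus
	}
	// Fallback path, likely to be removed once the Driver API proves reliable.
	if gpus, err := cudaRuntimeGetGPUs(); err == nil {
		return gpus
	}
	return nil
}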

In our scheduler, we pick which GPU (of the GPUs exposed) to run a model on, and when we run the subprocess for inference, we wire up the environment variable to ensure that subprocess uses exactly the set of GPUs we want it to use.
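
Wiring the scheduler's choice into the inference subprocess looks roughly like this (a sketch, not the PR's exact code; runnerPath and gpuIDs are placeholders):

// Restrict the inference subprocess to exactly the GPUs the scheduler picked.
cmd := exec.Command(runnerPath, args...)
cmd.Env = append(os.Environ(), "CUDA_VISIBLE_DEVICES="+strings.Join(gpuIDs, ","))
if err := cmd.Start(); err != nil {
	slog.Error("failed to start runner", "error", err)
}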

Successfully merging this pull request may close these issues.

only 1 GPU found -- regression 1.32 -> 1.33