
Centralize GPU configuration vars #4264

Open · wants to merge 1 commit into main
Conversation

dhiltgen (Collaborator) commented on May 8, 2024

This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.

Fixes #4139

Example output with the ROCm gfx override set:

2024/05/08 19:33:27 routes.go:993: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 OLLAMA_DEBUG:true OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
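
For context, a log line like the one above falls out of collecting the centralized settings into one map and logging it once at startup. A minimal sketch of that pattern in Go (AsMap and the field set shown are illustrative, not necessarily the PR's exact identifiers):

// envconfig: gather the centralized settings for the startup log line.
func AsMap() map[string]string {
	return map[string]string{
		"CUDA_VISIBLE_DEVICES":     CudaVisibleDevices,
		"HSA_OVERRIDE_GFX_VERSION": HsaOverrideGfxVersion,
		"OLLAMA_MAX_QUEUE":         strconv.Itoa(MaxQueuedRequests),
		// ...remaining settings elided...
	}
}

// routes.go, at startup: slog renders the map as the env="map[...]" value above.
slog.Info("server config", "env", AsMap())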

Diff context from the review thread below (OLLAMA_MAX_QUEUE parsing, followed by the new GPU settings):

	// Parse OLLAMA_MAX_QUEUE; keep the default when the value is not a positive integer.
	p, err := strconv.Atoi(onp)
	if err != nil || p <= 0 {
		slog.Error("invalid setting", "OLLAMA_MAX_QUEUE", onp, "error", err)
	} else {
		MaxQueuedRequests = p
	}
}

CudaVisibleDevices = clean("CUDA_VISIBLE_DEVICES")
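
The clean helper presumably just reads the environment variable and normalizes it; a minimal sketch, assuming it strips surrounding quotes and whitespace (the PR's actual implementation may differ):

// clean reads an environment variable and trims quotes and spaces, so that
// HSA_OVERRIDE_GFX_VERSION="10.3.0" and HSA_OVERRIDE_GFX_VERSION=10.3.0
// behave the same.
func clean(key string) string {
	return strings.Trim(os.Getenv(key), "\"' ")
}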

@dhiltgen is this public variable CudaVisibleDevices meant to be used in cudaGetVisibleDevicesEnv?

func cudaGetVisibleDevicesEnv(gpuInfo []GpuInfo) (string, string) {

dhiltgen (Collaborator, Author) replied:

Good question.

For NVIDIA GPUs, we use their C libraries to discover the GPUs. We've recently switched to using the Driver API as our primary source, falling back to the CUDA runtime library if that fails (we'll most likely remove the CUDA runtime code in a few releases, as long as the Driver API proves reliable). Those libraries already implement filtering based on this environment variable, so the calls here and here will only return a subset of GPUs if the user has set it.
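
Schematically, the discovery order described here is a simple try-then-fallback (function names below are hypothetical, for illustration only):

// Hypothetical sketch of the discovery order: Driver API first, then the
// CUDA runtime library as a fallback.
func discoverNvidiaGPUs() []GpuInfo {
	if gpus, err := driverAPIGetGPUs(); err == nil {
		return gpus
	}
	// Fallback path, likely to be removed once the Driver API proves reliable.
	if gpus, err := cudaRuntimeGetGPUs(); err == nil {
		return gpus
	}
	return nil
}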

In our scheduler, we pick which GPU (of the GPUs exposed) to run a model on, and when we run the subprocess for inference, we wire up the environment variable to ensure that subprocess uses exactly the set of GPUs we want it to use.
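
Wiring the scheduler's choice into the inference subprocess looks roughly like this (a sketch, not the PR's exact code; runnerPath and gpuIDs are placeholders):

// Restrict the inference subprocess to exactly the GPUs the scheduler picked.
cmd := exec.Command(runnerPath, args...)
cmd.Env = append(os.Environ(), "CUDA_VISIBLE_DEVICES="+strings.Join(gpuIDs, ","))
if err := cmd.Start(); err != nil {
	slog.Error("failed to start runner", "error", err)
}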

Successfully merging this pull request may close these issues.

only 1 GPU found -- regression 1.32 -> 1.33