.........................................................................................
llama_new_context_with_model: n_ctx = 65536
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA_Host KV buffer size = 8192.00 MiB
llama_new_context_with_model: KV self size = 8192.00 MiB, K (f16): 4096.00 MiB, V (f16): 4096.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.50 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4507.00 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4725934080
llama_new_context_with_model: failed to allocate compute buffers
llama_init_from_gpt_params: error: failed to create context with model 'C:\Users\Administrator\.ollama\models\blobs\sha256-912687f2b75ca31331bfcf8b55a34a366dbb0f31df6bf65bc464c1d2431b92be'
{"function":"load_model","level":"ERR","line":410,"model":"C:\Users\Administrator\.ollama\models\blobs\sha256-912687f2b75ca31331bfcf8b55a34a366dbb0f31df6bf65bc464c1d2431b92be","msg":"unable to load model","tid":"11992","timestamp":1715077029}
https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF/tree/main
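For reference, the 8192 MiB KV cache in the log is exactly what a 64k-token f16 cache costs for a model with the usual Llama-3-8B geometry (32 layers, 8 KV heads, head dim 128). A minimal back-of-the-envelope sketch in Python; the geometry values are assumptions, not read from the GGUF:

```python
# Hypothetical sanity check (not part of llama.cpp), assuming the
# standard Llama-3-8B geometry: 32 layers, 8 KV heads, head dim 128,
# with the KV cache kept in f16 as in the log above.
n_ctx      = 65536   # context length from the log
n_layers   = 32
n_kv_heads = 8
head_dim   = 128
bytes_f16  = 2

# K and V each hold n_ctx * n_layers * n_kv_heads * head_dim values,
# hence the leading factor of 2.
kv_bytes = 2 * n_ctx * n_layers * n_kv_heads * head_dim * bytes_f16
print(f"KV cache: {kv_bytes / 2**20:.2f} MiB")  # 8192.00 MiB, matching the log

# The allocation that actually failed on the GPU:
compute_buf_bytes = 4725934080
print(f"compute buffer: {compute_buf_bytes / 2**20:.2f} MiB")  # ~4507.00 MiB
```

Note that the KV cache here is placed in host memory (CUDA_Host), so it is the ~4.5 GiB compute buffer, which also grows with n_ctx, that no longer fits on the GPU. Lowering the context (e.g. ollama's num_ctx parameter, or -c/--ctx-size when running llama.cpp directly) should shrink that buffer enough for the model to load.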