
GPU version build not using GPU #114

dspasyuk opened this issue Aug 6, 2023 · 11 comments

@dspasyuk commented Aug 6, 2023

Hi Everyone,

I am trying to build llama-node for GPU. I followed the guide at https://llama-node.vercel.app/docs/cuda, but the version of llama.cpp I get from a manual build uses the CPU, not the GPU. When I build llama.cpp directly in the llama-sys folder using the following command:

make clean && LLAMA_CUBLAS=1 make -j
it produces a perfectly fine GPU-enabled executable that works with no problem.

Am I missing something?
Here are my full build commands:

git clone https://github.com/Atome-FE/llama-node.git
cd llama-node/
rustup target add x86_64-unknown-linux-musl
git submodule update --init --recursive
pnpm install --ignore-scripts
cd packages/llama-cpp/
pnpm build:cuda

Then I get a libllama.so file in ~/.llama-node which, when used, does not use the GPU. Here is my script to run it:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
import os from "os";

// "~" is not expanded by path.resolve, so resolve against the home directory instead.
const model = path.resolve(os.homedir(), "CODE/models/vicuna-7b-v1.3.ggmlv3.q4_0.bin");
const llama = new LLM(LLamaCpp);

const config = {
  modelPath: model,
  enableLogging: true,
  nCtx: 1024,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
  nGpuLayers: 40, // layers to offload to the GPU
};

const template = "How do I train you to read my documents?";
const prompt = `A chat between a user and an assistant. USER: ${template} ASSISTANT:`;

const params = {
  nThreads: 4,
  nTokPredict: 2048,
  topK: 40,
  topP: 0.1,
  temp: 0.2,
  repeatPenalty: 1,
  prompt,
};

const run = async () => {
  await llama.load(config);
  await llama.createCompletion(params, (response) => {
    process.stdout.write(response.token);
  });
};

run();
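
For reference, one quick way to check whether the libllama.so in ~/.llama-node was actually built with CUDA support is to inspect its linked libraries. The snippet below is a minimal sketch, assuming a Linux system where ldd is on the PATH and the cuBLAS build links the CUDA runtime dynamically; a CPU-only build should show no CUDA libraries.

// Sketch: inspect the rebuilt libllama.so with ldd. A cuBLAS build normally
// links against libcublas/libcudart; a CPU-only build does not.
// Assumes Linux, ldd available, and dynamic linking of the CUDA libraries.
import { execFileSync } from "node:child_process";
import os from "node:os";
import path from "node:path";

const lib = path.join(os.homedir(), ".llama-node", "libllama.so");
const deps = execFileSync("ldd", [lib], { encoding: "utf8" });
console.log(/cublas|cudart/i.test(deps)
  ? "CUDA libraries linked - GPU build"
  : "no CUDA libraries linked - CPU-only build");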

Any help appreciated

@shaileshminsnapsys

I am facing the same issue. Can anyone please guide us on this?

@dspasyuk commented Aug 18, 2023

I ended up using llama.cpp directly. It works very well on the GPU. You can write a simple wrapper in Node.js without Rust. I can share the code if you want.
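
To sketch what such a wrapper can look like: it mostly comes down to spawning a llama.cpp binary built with LLAMA_CUBLAS=1 and streaming its stdout. The example below is a minimal illustration, not the actual llcui code; the binary path, model path, and flags are placeholders and may differ between llama.cpp versions.

// Minimal sketch of a Node.js wrapper around a llama.cpp binary built with
// LLAMA_CUBLAS=1. Paths and flags below are placeholders; adjust for your build.
import { spawn } from "node:child_process";

const LLAMA_BIN = "./llama.cpp/main"; // path to the compiled llama.cpp binary
const MODEL = "./models/vicuna-7b-v1.3.ggmlv3.q4_0.bin";

function complete(prompt, onToken) {
  return new Promise((resolve, reject) => {
    const proc = spawn(LLAMA_BIN, [
      "-m", MODEL,
      "-p", prompt,
      "-n", "2048",           // max tokens to predict
      "--n-gpu-layers", "40", // layers to offload to the GPU
    ]);
    proc.stdout.on("data", (chunk) => onToken(chunk.toString())); // generated text
    proc.stderr.on("data", () => {});                             // llama.cpp logs go to stderr
    proc.on("error", reject);
    proc.on("close", (code) => (code === 0 ? resolve() : reject(new Error(`exit ${code}`))));
  });
}

complete("A chat between a user and an assistant. USER: Hello ASSISTANT:",
  (t) => process.stdout.write(t));

GPU offloading is then handled entirely by llama.cpp itself; the Node.js side only forwards the prompt and streams the output.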

@shaileshminsnapsys

@deonis1 It would be a great help. Please share the code.

@dspasyuk

@shaileshminsnapsys No problem, the code is here: https://github.com/deonis1/llcui

@shaileshminsnapsys

Thank you @deonis1, I'll check out the code.

Thank you for your help.

@dspasyuk

Let me know if you have any issues.

@shaileshminsnapsys commented Aug 22, 2023

@deonis1 Thank you so much, your code helped me a lot to achieve my goal.

Many thanks!

@dspasyuk

@shaileshminsnapsys No problem, there is a new version if you are interested.

@shaileshminsnapsys

@deonis1 I would love to see the new version. Thank you!

@dspasyuk

@shaileshminsnapsys The new version, which supports embeddings (MongoDB or text documents), has been released. You can find it at the new URL:
https://github.com/deonis1/llama.cui

@shaileshminsnapsys

@deonis1 Wow, that's amazing. Thanks, I'll definitely give it a try.
