Slow on Llama_cpp_python #4
Hi @dnhkng, apologies for the slowness, and thanks for raising the issue. Unfortunately, there are 2 different routes to improve performance:
I do want to use CAPPr. I need the probabilities (yes, I know we can't use these as confidence values), but I think I have found something useful. I need to run a few thousand inferences over a long prompt, with a large number of categories, to test this though. Any other tips would be welcome.
Another tip: if all of the prompts start with the same long substring, e.g., system instructions and few-shot examples, use
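The shared-prefix tip above can be illustrated with a toy sketch. Here, `encode_prefix` is a hypothetical stand-in for the expensive forward pass over the shared prompt prefix (in a real backend this would be something like reusing a KV cache); the point is just that the expensive work runs once, no matter how many prompts share the prefix.

```python
from functools import lru_cache

CALLS = {"n": 0}

@lru_cache(maxsize=None)
def encode_prefix(prefix: str) -> int:
    """Stand-in for the expensive forward pass over the shared prefix."""
    CALLS["n"] += 1           # count how often the expensive work actually runs
    return hash(prefix)       # stand-in for cached model state

def answer(prefix: str, question: str) -> int:
    """Stand-in for a per-question pass that reuses the cached prefix state."""
    state = encode_prefix(prefix)
    return state ^ hash(question)

shared = "SYSTEM: classify the text. EXAMPLES: ..."
for q in ["text 1", "text 2", "text 3"]:
    answer(shared, q)
# encode_prefix ran only once, despite three questions
```

With a few thousand inferences over one long prompt, avoiding that repeated prefix computation is usually the single biggest win.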
Thanks for the tips!
I'm currently using Outlines for category selection, but I want to try a new approach that requires probabilities. Initially, I planned on doing the naive thing: tokenise the categories, collect the probabilities for each token, and do the whole thing in one batch the size of the number of categories. But I assume that sometimes the same category wording can be produced via multiple token combinations... So I found your library to use instead.
Makes sense. How long are the completions? Despite the long prompt, it's still surprising to me that 12 completions take a minute. Maybe the prompt can be refactored a bit to work with Also, I found this point from your previous comment interesting:
Can you elaborate on this / point to a reference? I've barely studied this.
I just mean the fact that LLMs are trained with a cross-entropy loss on next tokens. This leads to overconfidence, as the loss does not penalise miscalibration. But maybe that's not a huge issue?
I believe that research is about neural networks in general leading to overconfidence. Cross entropy (CE) / negative log-likelihood (NLL) is also used to fit logistic regression, for example. A short mathematical argument will make it clear that CE/NLL is great for calibration: the loss is That aside, I take your point. LLMs are NNs, and they shouldn't be expected to be calibrated. Though I think it's worth researching how calibrated CAPPr's probabilities are. Hopefully they turn out to be helpful for your task.
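The calibration property claimed above can be checked numerically: cross entropy / NLL is a proper scoring rule, meaning that if the true positive rate is `p_true`, the expected NLL of predicting probability `q` is minimized exactly at `q = p_true`. A small grid search illustrates this (the loss definition is the standard binary NLL, not code from the thread):

```python
import math

def expected_nll(q: float, p_true: float) -> float:
    """Expected binary NLL of predicting q when the true positive rate is p_true."""
    return -(p_true * math.log(q) + (1 - p_true) * math.log(1 - q))

p_true = 0.7
qs = [i / 100 for i in range(1, 100)]            # candidate predictions 0.01..0.99
best_q = min(qs, key=lambda q: expected_nll(q, p_true))
# the expected loss is minimized at the calibrated prediction q = 0.7
```

So the loss itself rewards calibrated probabilities; any miscalibration in LLMs comes from the optimization and model class, not from the choice of CE/NLL.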
Exactly. The tests I'm planning are to see how it works in practice. As for how the algorithm works, does it sum the probabilities of all paths? E.g., assume we are using a character-level LLM and want to calculate the probability for the word "foobar"; it can be tokenised as: Of course, we can split it into chunks and recursively calculate the probability. We need to build the graph, find all the sub-chunks, and get the logits required to fill in the branches. Once this is done, we can use the probabilities to find the probability of each path and back-calculate the average probability per token. Is this what CAPPr does?
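The "sum over all tokenization paths" idea described above can be sketched with dynamic programming. The toy vocabulary and its probabilities below are made up, and token probabilities are assumed independent of context purely for illustration; a real LLM would supply conditional probabilities at each branch of the graph.

```python
# Made-up token probabilities (context-independent, for illustration only)
vocab_prob = {"f": 0.1, "o": 0.1, "b": 0.1, "a": 0.1, "r": 0.1,
              "foo": 0.05, "bar": 0.05, "foobar": 0.01}

def total_path_probability(word: str) -> float:
    """Sum the probability of every segmentation of `word` into vocab tokens.

    dp[i] = total probability of all segmentations of word[:i].
    """
    dp = [0.0] * (len(word) + 1)
    dp[0] = 1.0
    for i in range(1, len(word) + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in vocab_prob:
                dp[i] += dp[j] * vocab_prob[piece]
    return dp[len(word)]

p = total_path_probability("foobar")
# includes "foobar", "foo"+"bar", "f"+"o"+"o"+"bar", all single chars, etc.
```

The DP visits each (start, end) substring once, so the cost is quadratic in word length rather than exponential in the number of paths.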
No. It tokenizes See my post about it here and related work for more detail on how it works.
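For contrast with the all-paths approach, here is a minimal sketch of scoring each completion under a single tokenization, averaging the log-probability per completion token. The tokenizer and model call below are stand-ins I made up for the sketch, not CAPPr's actual internals; see the linked post for how the library really works.

```python
import math

def tokenize(text: str) -> list:
    """Stand-in for a real tokenizer: one fixed tokenization per string."""
    return text.split()

def token_logprob(context: list, token: str) -> float:
    """Stand-in model call: every token gets probability 0.5."""
    return math.log(0.5)

def avg_logprob(prompt: str, completion: str) -> float:
    """Average log-prob per completion token, given the prompt as context."""
    context = list(tokenize(prompt))
    total = 0.0
    completion_toks = tokenize(completion)
    for tok in completion_toks:
        total += token_logprob(context, tok)
        context.append(tok)
    return total / len(completion_toks)

best = max(["positive review", "negative review"],
           key=lambda c: avg_logprob("The movie was great.", c))
```

Averaging (rather than summing) keeps longer completions from being penalized simply for having more tokens.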
This seems to work nicely for my use case, but using the llama_cpp_python backend with a long prompt and a dozen completions takes about 1 minute. Any tips on improving speed?