
Add max_new_tokens as a parameter to answer_question(), with default=256 instead of 512 #86

Open · wants to merge 1 commit into base: main

Conversation

@Twenkid Twenkid commented Apr 16, 2024

max_new_tokens is added as a parameter to answer_question(), and the default is lowered from 512 to 256.
Reasons:

  • To reduce the delay when the model enters repetition loops, generating .............. and taking a long time (I found such cases with images containing text), and to give the caller control over it; see the sketch after this list.
  • In test generations I found that about 200 tokens was enough for a complete answer (probably less would do), but 100 was a bit short.
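
A minimal sketch of the proposed change, assuming answer_question() ultimately wraps a Hugging Face transformers generate() call; everything besides the max_new_tokens parameter itself (the prompt format, the other argument names) is illustrative, not the repo's actual code:

```python
# Sketch only: assumes answer_question() calls transformers' generate();
# argument names besides max_new_tokens are hypothetical, and the
# image-embedding wiring is omitted for brevity.
def answer_question(self, image_embeds, question, tokenizer, max_new_tokens=256):
    prompt = f"Question: {question}\n\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = self.text_model.generate(
        inputs.input_ids,
        max_new_tokens=max_new_tokens,  # previously a hard-coded 512
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```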

Also, to avoid the repetition loops, could some sort of interrupt be added, e.g. a callback like in GPT4All, or a constraint that detects repeated tokens and unusual sequences such as these runs of dots (repetition_penalty etc.)? And could a streaming mode be added? (I had a quick look at the code; I may try to figure it out myself.)
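
A rough sketch of what those suggestions could look like with the Hugging Face transformers API (StoppingCriteria, StoppingCriteriaList, and TextStreamer are existing transformers classes; how they would be wired into this repo's generate() call is an assumption):

```python
from transformers import StoppingCriteria, StoppingCriteriaList, TextStreamer

class RepetitionStop(StoppingCriteria):
    """Abort generation once the same token repeats n_repeat times in a row
    (catches degenerate loops such as long runs of '.')."""
    def __init__(self, n_repeat=20):
        self.n_repeat = n_repeat

    def __call__(self, input_ids, scores, **kwargs):
        tail = input_ids[0, -self.n_repeat:]
        return len(tail) == self.n_repeat and bool((tail == tail[0]).all())

# Hypothetical usage inside answer_question(); 'model', 'tokenizer', and
# 'inputs' are assumed to exist as in the sketch above.
output_ids = model.generate(
    inputs.input_ids,
    max_new_tokens=256,
    repetition_penalty=1.2,                            # discourage token loops
    stopping_criteria=StoppingCriteriaList([RepetitionStop()]),
    streamer=TextStreamer(tokenizer),                  # prints tokens as they arrive
)
```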

Sample repetition loop:
Z:\LMSYS-Free-GPT4-Claude-15-4-2024-2024-04-15_18-33-46.mp4_snapshot_05.28.266.png

The image is a screenshot of a website called "LMSys Chatbot Arena". The website is predominantly white with blue and orange accents. The main focus is a banner at the top of the screen, which is white with red and blue text. The text reads "Free GPT4, claude, llama, code llama, mixral-of-experts, command-r-plus.................................................................................... […]

The picture itself contained "..." in its text, which may be what set off the loop.
