
Add max_new_tokens as a parameter to answer_question(), with default=256 instead of 512 #86

Open · wants to merge 1 commit into base: main

Conversation

@Twenkid Twenkid commented Apr 16, 2024

max_new_tokens is added as a parameter to answer_question(), and the default is lowered from 512 to 256.
Reasons:

  • To reduce the delay when the model enters repetition loops, generating .............. and taking a long time (I found such cases with images containing text), and to give the caller control over it; see the sketch after this list.
  • In test generations I found that about 200 tokens was enough for a complete answer (probably less would do), but 100 was a bit short.
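
A minimal sketch of the proposed change, assuming answer_question() ultimately wraps a Hugging Face transformers generate() call; everything besides the max_new_tokens parameter itself (the prompt format, the other argument names) is illustrative, not the repo's actual code:

```python
# Sketch only: assumes answer_question() calls transformers' generate();
# argument names besides max_new_tokens are hypothetical, and the
# image-embedding wiring is omitted for brevity.
def answer_question(self, image_embeds, question, tokenizer, max_new_tokens=256):
    prompt = f"Question: {question}\n\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = self.text_model.generate(
        inputs.input_ids,
        max_new_tokens=max_new_tokens,  # previously a hard-coded 512
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```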

Also, to avoid the repetition loops, could some sort of interrupt be added, e.g. a callback like in GPT4All, or a constraint that detects repeated tokens and unusual sequences such as these runs of dots (repetition_penalty etc.)? And could a streaming mode be added? (I had a quick look at the code; I may try to figure it out myself.)
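
A rough sketch of what those suggestions could look like with the Hugging Face transformers API (StoppingCriteria, StoppingCriteriaList, and TextStreamer are existing transformers classes; how they would be wired into this repo's generate() call is an assumption):

```python
from transformers import StoppingCriteria, StoppingCriteriaList, TextStreamer

class RepetitionStop(StoppingCriteria):
    """Abort generation once the same token repeats n_repeat times in a row
    (catches degenerate loops such as long runs of '.')."""
    def __init__(self, n_repeat=20):
        self.n_repeat = n_repeat

    def __call__(self, input_ids, scores, **kwargs):
        tail = input_ids[0, -self.n_repeat:]
        return len(tail) == self.n_repeat and bool((tail == tail[0]).all())

# Hypothetical usage inside answer_question(); 'model', 'tokenizer', and
# 'inputs' are assumed to exist as in the sketch above.
output_ids = model.generate(
    inputs.input_ids,
    max_new_tokens=256,
    repetition_penalty=1.2,                            # discourage token loops
    stopping_criteria=StoppingCriteriaList([RepetitionStop()]),
    streamer=TextStreamer(tokenizer),                  # prints tokens as they arrive
)
```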

Sample repetition loop:
Z:\LMSYS-Free-GPT4-Claude-15-4-2024-2024-04-15_18-33-46.mp4_snapshot_05.28.266.png

The image is a screenshot of a website called "LMSys Chatbot Arena". The website is predominantly white with blue and orange accents. The main focus is a banner at the top of the screen, which is white with red and blue text. The text reads "Free GPT4, claude, llama, code llama, mixral-of-experts, command-r-plus.................................................................................... […]

The picture itself contained "..." in its text, which may be what set off the loop.
