[UDOP] Add special tokens to tokenizer #29594
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM, let's do better testing: `paragraph<loc_58>. Hey`, for example, with a special case.
Thanks, I've added tests. Should we add |
You can add it and default it to |
Yeah indeed, shouldn't the |
LGTM, breaking but fixing
I don't think it's a breaking change, since adding |
If it does not do anything why are we adding it? |
Haha yes I'll remove it, and I'll remove it for Gemma too |
* Add special tokens
* Add special tokens
* Use fmt
* Uncomment code
* Add test
* Remove scripts
* Address comments
* Improve tests
* Address comment
* Remove flag
What does this PR do?
This PR makes sure the 1201 additional special tokens can be encoded/decoded properly.
Fixes #29591
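To illustrate what "encoded/decoded properly" means for a string like the `paragraph<loc_58>. Hey` case raised in review, here is a minimal, self-contained sketch of a round trip in which special tokens are kept atomic. It is a toy model, not the UDOP tokenizer or the `transformers` implementation: the token set (`<loc_0>` to `<loc_99>`), the id offset, and the names `encode` and `decode` are all assumptions made for this example, and ordinary text is simply mapped character by character.

```python
import re

# Hypothetical special-token set, for illustration only.
# The real tokenizer's 1201 additional tokens are not reproduced here.
NUM_LOC_TOKENS = 100
SPECIAL_TOKENS = [f"<loc_{i}>" for i in range(NUM_LOC_TOKENS)]

# Assumption: special-token ids sit above the Unicode code point range,
# so they can never collide with the character ids used for plain text.
SPECIAL_OFFSET = 0x110000
TOKEN_TO_ID = {tok: SPECIAL_OFFSET + i for i, tok in enumerate(SPECIAL_TOKENS)}
ID_TO_TOKEN = {idx: tok for tok, idx in TOKEN_TO_ID.items()}

# One alternation over all special tokens, longest first. The capture group
# makes re.split keep the matched special tokens in its output.
_SPLIT = re.compile(
    "(" + "|".join(re.escape(t) for t in sorted(SPECIAL_TOKENS, key=len, reverse=True)) + ")"
)


def encode(text: str) -> list[int]:
    """Map each special token to a single id and all other text to code points."""
    ids: list[int] = []
    for piece in _SPLIT.split(text):
        if piece in TOKEN_TO_ID:
            ids.append(TOKEN_TO_ID[piece])  # special token stays atomic
        else:
            ids.extend(ord(ch) for ch in piece)  # plain text, one id per character
    return ids


def decode(ids: list[int]) -> str:
    """Invert encode: special ids become their token string, the rest become characters."""
    return "".join(ID_TO_TOKEN[i] if i in ID_TO_TOKEN else chr(i) for i in ids)


text = "paragraph<loc_58>. Hey"
ids = encode(text)
round_trip = decode(ids)
```

In this sketch `<loc_58>` becomes exactly one id and the decoded string equals the input, which is the kind of property a tokenizer test for special tokens would typically check.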