[UDOP] Add special tokens to tokenizer #29594

NielsRogge · 2024-03-11T16:46:22Z

What does this PR do?

This PR makes sure the 1201 additional special tokens can be encoded/decoded properly.
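A common way for a slow tokenizer to keep added special tokens intact is to split the input on them before the subword model runs, so that each special token maps to a single id. The sketch below is a hypothetical, simplified illustration of that idea, not the UDOP or transformers implementation. `split_on_special_tokens` and the `SPECIAL_TOKENS` subset are assumptions made for this example, and a regex is used purely for brevity.

```python
import re

# Illustrative subset only (assumption); the real UDOP special-token list is not reproduced here.
SPECIAL_TOKENS = ["<loc_5>", "<loc_58>", "<loc_500>", "</s>"]


def split_on_special_tokens(text, special_tokens):
    """Hypothetical helper: split text so each special token is its own chunk."""
    # Longest-first ordering stops a token that is a prefix of another
    # token from winning the regex alternation.
    ordered = sorted(special_tokens, key=len, reverse=True)
    # A capturing group makes re.split keep the matched special tokens.
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    # Drop the empty strings re.split produces at the boundaries.
    return [chunk for chunk in re.split(pattern, text) if chunk]


print(split_on_special_tokens("paragraph<loc_58>. Hey", SPECIAL_TOKENS))
# ['paragraph', '<loc_58>', '. Hey']
```

Each non-special chunk would then go through the regular subword model, while each special chunk is looked up directly in the vocabulary.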

HuggingFaceDocBuilderDev · 2024-03-11T17:07:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tests/models/udop/test_tokenization_udop.py

ArthurZucker

LGTM, let's do better testing, for example with a special case like `paragraph<loc_58>. Hey`.
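The string `paragraph<loc_58>. Hey` can be turned into a round-trip check. The sketch below uses a hypothetical toy tokenizer (`toy_encode`/`toy_decode` with made-up ids) instead of `UdopTokenizer`, so it only illustrates the shape of such a test: the special token should map to exactly one id, and decoding should reproduce the input.

```python
import re

# Made-up ids for illustration (assumption); real UDOP ids differ.
SPECIAL_TO_ID = {"</s>": 1, "<loc_58>": 1000}
ID_TO_SPECIAL = {i: tok for tok, i in SPECIAL_TO_ID.items()}
CHAR_OFFSET = 2000  # regular characters get ids starting here
_SPLIT = re.compile(
    "(" + "|".join(re.escape(t) for t in sorted(SPECIAL_TO_ID, key=len, reverse=True)) + ")"
)


def toy_encode(text):
    ids = []
    for chunk in _SPLIT.split(text):
        if chunk in SPECIAL_TO_ID:
            ids.append(SPECIAL_TO_ID[chunk])  # one id per special token
        else:
            # Stand-in for the subword model: one id per character.
            ids.extend(CHAR_OFFSET + ord(c) for c in chunk)
    return ids


def toy_decode(ids):
    return "".join(
        ID_TO_SPECIAL[i] if i in ID_TO_SPECIAL else chr(i - CHAR_OFFSET) for i in ids
    )


text = "paragraph<loc_58>. Hey"
ids = toy_encode(text)
assert ids.count(SPECIAL_TO_ID["<loc_58>"]) == 1  # special token stays whole
assert toy_decode(ids) == text  # round trip is lossless
```

A real test would run the same two checks against both the slow and the fast tokenizer.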

src/transformers/models/udop/convert_udop_to_hf.py

tests/models/udop/test_tokenization_udop.py

NielsRogge · 2024-03-26T20:51:24Z

Thanks, I've added tests. Should we add `spaces_between_special_tokens` as an argument to `UdopTokenizer`, so that it decodes the same way as the fast tokenizer?

ArthurZucker · 2024-03-27T14:56:06Z

Maybe you can add it and default it to `False` in `decode`? Because I don't think `decode` takes `self.spaces_between_special_tokens` into account, no?

NielsRogge · 2024-04-12T11:48:57Z

Yeah indeed, shouldn't the `_decode` method here take `self.spaces_between_special_tokens` into account in case it exists?
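To make the question concrete: a slow tokenizer's decode step can group runs of regular tokens into sub-texts, keep each special token as its own sub-text, and then join the sub-texts, with a space between them when spaces are requested. The sketch below is a simplified, hypothetical model of the behaviour proposed here (`ToyTokenizer` is not the transformers class): an explicit argument wins, and the instance attribute is the fallback.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a slow tokenizer; not the transformers class."""

    def __init__(self, special_tokens, spaces_between_special_tokens=True):
        self.special_tokens = set(special_tokens)
        self.spaces_between_special_tokens = spaces_between_special_tokens

    def _decode(self, tokens, spaces_between_special_tokens=None):
        # An explicit argument wins; otherwise fall back to the attribute set
        # at construction time (the behaviour proposed in this thread).
        if spaces_between_special_tokens is None:
            spaces_between_special_tokens = self.spaces_between_special_tokens
        sub_texts, current = [], []
        for tok in tokens:
            if tok in self.special_tokens:
                # Flush the run of regular tokens, then emit the special token alone.
                if current:
                    sub_texts.append("".join(current))
                    current = []
                sub_texts.append(tok)
            else:
                current.append(tok)
        if current:
            sub_texts.append("".join(current))
        # The flag only controls the separator placed between sub-texts.
        return (" " if spaces_between_special_tokens else "").join(sub_texts)


tokens = ["para", "graph", "<loc_58>", ".", " Hey"]
tok = ToyTokenizer(["<loc_58>"])
print(tok._decode(tokens))  # paragraph <loc_58> . Hey
print(tok._decode(tokens, spaces_between_special_tokens=False))  # paragraph<loc_58>. Hey
```

Because the flag only changes the separator, the default of `True` yields text that no longer matches the input exactly, which is the kind of slow/fast decoding difference this thread is about.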

ArthurZucker

LGTM; it's breaking, but it's a fix.

NielsRogge · 2024-04-17T17:00:36Z

I don't think it's a breaking change, since adding `spaces_between_special_tokens` as an attribute doesn't do anything: the `_decode` method doesn't use it (whereas I think it should).

ArthurZucker · 2024-04-18T09:39:31Z

If it does not do anything why are we adding it?

NielsRogge · 2024-04-18T18:42:24Z

Haha, yes, I'll remove it, and I'll remove it for Gemma too.

* Add special tokens
* Add special tokens
* Use fmt
* Uncomment code
* Add test
* Remove scripts
* Address comments
* Improve tests
* Address comment
* Remove flag

NielsRogge added 6 commits March 11, 2024 14:41

Add special tokens


af6b301

Add special tokens

f4fb474

Use fmt


5a15a02

Uncomment code


6df7ee5

Add test

a036a6e

Remove scripts

3543e20

NielsRogge commented Mar 11, 2024


tests/models/udop/test_tokenization_udop.py

ArthurZucker reviewed Mar 17, 2024


src/transformers/models/udop/convert_udop_to_hf.py (outdated)

tests/models/udop/test_tokenization_udop.py

NielsRogge added 2 commits March 26, 2024 21:40

Address comments

c1f582f

Improve tests

effd9e7

Address comment

54fd234

NielsRogge requested a review from ArthurZucker April 17, 2024 06:32

ArthurZucker approved these changes Apr 17, 2024


Remove flag

5827eeb

NielsRogge merged commit ecfe9be into huggingface:main Apr 19, 2024
19 checks passed

ydshieh pushed a commit that referenced this pull request Apr 23, 2024

[UDOP] Add special tokens to tokenizer (#29594)

968cda2


