Improve WebM detection #486

Borewit · 2021-08-30T19:25:11Z

Fixes recognition of WebM format.

Fixes: #485

test.js

sindresorhus · 2021-08-30T22:21:54Z

core.js

@@ -736,7 +736,8 @@ async function _fromTokenizer(tokenizer) {
 			while (children > 0) {
 				const element = await readElement();
 				if (element.id === 0x42_82) {
-					return tokenizer.readToken(new Token.StringType(element.len, 'utf-8')); // Return DocType
+					const rawValue = await tokenizer.readToken(new Token.StringType(element.len, 'utf-8'));
+					return rawValue.replace(/\00.*$/g, ''); // Return DocType


Does it document the maximum number of null characters there could be? It would be nice to have a limit in place so it doesn't hang on faulty files that have too many null characters.

There is no maximum; the null characters are used as a kind of padding.
The maximum length of the string read is already bounded by element.len.
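To illustrate the padding point, here is a minimal sketch of trimming a null-padded DocType value. The helper name `stripNullPadding` is hypothetical and not part of this patch; it cuts at the first null character with `indexOf` rather than the regex used in the diff, which should give the same result for a single-line DocType string.

```javascript
// Hypothetical helper (not part of the patch): drop the first NUL
// character and everything after it. The work is bounded by the
// length of the string, which was itself bounded by element.len.
function stripNullPadding(rawValue) {
	const end = rawValue.indexOf('\u0000');
	return end === -1 ? rawValue : rawValue.slice(0, end);
}

// A DocType padded with null characters, as described above.
console.log(JSON.stringify(stripNullPadding('webm\u0000\u0000\u0000\u0000')));

// A value without padding is returned unchanged.
console.log(JSON.stringify(stripNullPadding('matroska')));
```

Since the scan never looks past the end of the already-read string, the amount of padding does not affect termination.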

Ah, you mean checking element.len? The length can exceed the range of a JavaScript number and is encoded in a specific way (see the VINT examples).
By that point the file is already assumed to be EBML, and the code has started consuming the tokenizer and iterating through the EBML elements.
You could argue that the DocType must be a relatively short value, but by then we have already matched 0x1A, 0x45, 0xDF, 0xA3 and 0x42_82. It is extremely unlikely that we reach that point without the format being EBML.
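For readers unfamiliar with the VINT encoding mentioned above, the following is a rough sketch of decoding an EBML element data size, based on my reading of the EBML specification (RFC 8794). The function name `readVintSize` is hypothetical and this is not how `core.js` reads elements; it uses `BigInt` because an 8-octet VINT carries 56 value bits, more than a JavaScript number can represent exactly.

```javascript
// Hypothetical sketch (not the code in core.js): decode an EBML
// variable-size integer used as an element data size.
// The count of leading zero bits in the first octet, plus one,
// gives the width in octets; the marker bit is then masked out.
function readVintSize(bytes, offset = 0) {
	const first = bytes[offset];
	if (first === undefined || first === 0) {
		// A first octet of 0 would mean a width above 8 octets,
		// which this sketch does not support.
		throw new RangeError('Invalid or unsupported VINT');
	}

	let width = 1;
	let mask = 0x80;
	while ((first & mask) === 0) {
		mask >>= 1;
		width++;
	}

	if (offset + width > bytes.length) {
		throw new RangeError('Truncated VINT');
	}

	// Keep only the bits below the marker bit, then append the rest.
	let value = BigInt(first & (mask - 1));
	for (let i = 1; i < width; i++) {
		value = (value << 8n) | BigInt(bytes[offset + i]);
	}

	return {width, value};
}

// One-octet and two-octet encodings of small sizes.
console.log(readVintSize(Uint8Array.of(0x84)));
console.log(readVintSize(Uint8Array.of(0x40, 0x02)));
```

A caller that wanted the limit discussed in this thread could compare the returned `value` against a small cap before reading the string, but as noted above the patch relies on the earlier signature matches instead.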

Fixes recognition of WebM format. Resolves: #485

Borewit force-pushed the fix-issue-485 branch from 721096a to a1730c3 Compare August 30, 2021 19:26

Borewit added the bug label Aug 30, 2021

Borewit requested a review from sindresorhus August 30, 2021 19:27

sindresorhus reviewed Aug 30, 2021

test.js Outdated

sindresorhus reviewed Aug 30, 2021

Ignore leading null values in EBML UTF-8 value

9cc6d84

Fixes recognition of WebM format. Resolves: #485

Borewit force-pushed the fix-issue-485 branch from a1730c3 to 9cc6d84 Compare August 31, 2021 05:27

Borewit changed the title ~~Ignore leading null values in EBML UTF-8 value~~ Ignore trailing null values in EBML UTF-8 value Aug 31, 2021

sindresorhus changed the title ~~Ignore trailing null values in EBML UTF-8 value~~ Improve WebM detection Sep 1, 2021

sindresorhus merged commit b23be62 into main Sep 1, 2021

sindresorhus deleted the fix-issue-485 branch September 1, 2021 00:32

Borewit mentioned this pull request Sep 3, 2021

audio/webm detected as video/webm #488

Closed
