Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp parameter of String.prototype.matchAll is too restrictive #47310

Closed
jgonggrijp opened this issue Jan 4, 2022 · 17 comments
Closed

regexp parameter of String.prototype.matchAll is too restrictive #47310

jgonggrijp opened this issue Jan 4, 2022 · 17 comments
Labels
Working as Intended The behavior described is the intended behavior; this is not a bug

Comments

@jgonggrijp
Copy link

Bug Report

This issue is somewhat related to, though not the same as, #36788 (same standard library method).

🔎 Search Terms

matchAll

🕗 Version & Regression Information

  • This changed between versions 3.6.3 and 4.4.2

⏯ Playground Link

Playground link with relevant code

💻 Code

const searchString = 'The quick brown fox jumps over the lazy dog.';
const query = 'fox';

for (let match of searchString.matchAll(query)) {
    console.log(match);
}

🙁 Actual behavior

The above code is perfectly valid, as String.prototype.matchAll implicitly converts its argument to RegExp. The above playground even works as intended when you click Run. However, the TS type checker rejects the code on the grounds of query not being a RegExp:

Error TS2345: Argument of type 'string' is not assignable to parameter of type 'RegExp'.

🙂 Expected behavior

Passing a string to String.prototype.matchAll should typecheck.

jgonggrijp added a commit to UUDigitalHumanitieslab/readit-interface that referenced this issue Jan 4, 2022
@IllusionMH
Copy link
Contributor

IllusionMH commented Jan 4, 2022

Looks like same as #37211 and other issues where TS doesn't account for 'implicit conversion'
Which leads to https://stackoverflow.com/questions/41750390/what-does-all-legal-javascript-is-legal-typescript-mean

All these "perfectly valid" and "work as expected"

[...'123null123object123jest'.matchAll(123)].flat()
// (3) ['123', '123', '123']
[...'123null123object123jest'.matchAll(null)].flat()
// ['null']
[...'123null123object123jest'.matchAll({})].flat()
// (9) ['o', 'b', 'j', 'e', 'c', 't', 'j', 'e', 't']

@jgonggrijp
Copy link
Author

jgonggrijp commented Jan 4, 2022

Except that Math.max("hello", window.setTimeout, { }) is unreasonable but searchString.matchAll('fox') is not. There is no reason for TS to consider passing a string to matchAll a likely error.

@nmain
Copy link

nmain commented Jan 4, 2022

Implicit conversion of string to regexp is a source of many bugs and security vulnerabilities. If the string is properly sanitized and escaped as to make a suitable regexp, why not show that explicitly in your code with new RegExp(...)?

@jgonggrijp
Copy link
Author

Because it increases code complexity. I see your point, though; I'm running into this in a unittest where the string is defined right above the call to matchAll, so it is obvious to the human reader that there is nothing to worry about. I can see that this isn't as obvious in the more general case.

@jgonggrijp
Copy link
Author

I think fundamentally, I'm expecting TS to just check that types are compatible. Apparently, the intention of TS is to go beyond that and also raise hard errors when compatible types are potentially risky, like a very strict linter. It's a choice I don't like, but of course a project cannot cater to everyone.

Still, assuming that the intention is indeed to behave like a strict linter, I don't understand why 1 + '2' is allowed.

@MartinJohns
Copy link
Contributor

I don't understand why 1 + '2' is allowed.

Because that's not a source of common errors. Implicitly accepting arbitrary strings as regular expressions is.

@RyanCavanaugh RyanCavanaugh added the Working as Intended The behavior described is the intended behavior; this is not a bug label Jan 4, 2022
@RyanCavanaugh
Copy link
Member

If the matchAll argument is hardcoded, then there doesn't seem to be any reason not to write it as /fox/. If the matchAll argument isn't hardcoded - it came from an input - then it's wildly dangerous to accept any string here.

This code, for example, smells to high heaven. I wouldn't allow this anywhere.

const searchString = 'The quick brown fox_hound jumps over the lazy dog.';
const query = 'fox.hound'; // Shouldn't be found, because that is not a substring, idk?

for (let match of searchString.matchAll(query)) {
    console.log(match);
}

@jgonggrijp
Copy link
Author

@MartinJohns

I don't understand why 1 + '2' is allowed.

Because that's not a source of common errors. Implicitly accepting arbitrary strings as regular expressions is.

That's an interesting claim. I can easily think of scenarios where adding a string to a number produces NaN, causing comparisons, conditionals etcetera down the line to behave in unexpected ways. JavaScript is often criticized for allowing string-to-number addition. Do you have a list of sources of common errors in JavaScript somewhere, sorted by frequency, by any chance?

@RyanCavanaugh Is the frequency of the type of error a criterion for deciding whether to check only for compatibility, or also for advisibility?

@RyanCavanaugh
Copy link
Member

I'm not quite sure what you mean by that question.

@jgonggrijp
Copy link
Author

@RyanCavanaugh

Compare the following two snippets of code, both of which are type-compatible according to the ES specification, but inadvisable from an engineering point of view (because they are likely to lead to problems).

// snippet 1 (adding a string to a number)
1 + '2'

// snippet 2 (coercing a string to regex)
searchString.matchAll('fox')

tsc is raising an error in the second case because string-to-regex coercion is inadvisable, as has been discussed at length in the current thread. However, tsc does not raise an error in the first case. When I wrote that I didn't understand this difference in treatment, @MartinJohns suggested that string-to-number addition doesn't cause problems as frequently as string-to-regex coercion.

Disregarding for a moment whether this is actually true, my question to you is the following: is the frequency, by which a compatible-but-inadvisable construct leads to problems, a criterion by which the TS team decides whether to raise an error?

@RyanCavanaugh
Copy link
Member

RyanCavanaugh commented Jan 5, 2022

Thanks, that's a useful framing.

There are indeed a large number of coercions that occur in normal JS that aren't problematic -- even arr[0] technically involves a number-to-string coercion, but because this coercion is very unlikely to result in unexpected behavior, it's not considered an error, whereas the same coercion in "foo".indexOf(3) seems suspect enough to warrant an error. Similar things occur in other places, e.g. arr.slice() technically creates a coercion from undefined to 0, whereas the spec-identical coercion that occurs in 7 & undefined looks a lot like a potential bug.

It is ultimately rather subjective in a lot of places, and often requires attention to edge cases which some people consider to be outside the domain of their programs. For example, parseInt(3.141) is a useful way to truncate floating-point numbers, but parseInt(3e21) is 3, which is a disasterous result -- but if your program doesn't have numbers that big, maybe you don't care.

@jgonggrijp
Copy link
Author

Right. I agree it's subjective.

What is your stance towards string-to-number addition in particular?

@MartinJohns
Copy link
Contributor

1 + '2' will result in a string. While it may be undesirable behaviour, it will not cause any errors.
Coercing the string "[" to a regular expression would result in an error.

@RyanCavanaugh
Copy link
Member

What is your stance towards string-to-number addition in particular?

When TS was introduced, template strings were not a thing, so code like youHave + items.length + itemsInYourCart was super common and unproblematic (and arguably still is). Now that this can be written in a way that makes clear that it's producing a string rather than a number, it's something I think we might eventually add a flag for.

@fatcerberus
Copy link

It’s been my observation that whether or not constructs like these are flagged as errors pretty much comes down to asking: “If someone writes this code in a real program, is it more likely to be a mistake or intentional—and in the latter case, how likely is it to do what the programmer expects?”

It’s quite common (especially pre-template strings) to write things like "Thing " + 2, and what you end up with is "Thing 2", which tends to be what most people expect when they write it. Likewise if the number comes first as in 2 + " cows".

It’s all pretty much driven by a heavy dose of pragmatism. TypeScript is a lot less “religious” than most typed languages because it’s designed to model idiomatic JavaScript.

@fatcerberus
Copy link

Also worth noting is that JS doesn’t really have “undefined behavior” as such - all behavior is well-defined by the specification, even the surprising stuff. But clearly TypeScript can’t allow “anything that won’t throw an error at runtime”, or it would be basically useless. So the line between intentional vs. unintentional coercions can become rather blurry sometimes.

@jgonggrijp
Copy link
Author

@MartinJohns

1 + '2' will result in a string.

Thanks for making me aware of that. I thought it would result in a number due to the first operand, probably because I was confusing JS with another language. Given that it results in a string, I agree it is less problematic than a string-to-regex coercion.

@RyanCavanaugh

If the matchAll argument is hardcoded, then there doesn't seem to be any reason not to write it as /fox/. If the matchAll argument isn't hardcoded - it came from an input - then it's wildly dangerous to accept any string here.

Just for nuance, I'll mention what was going in my unittest. I was iterating over the keys of a small object and checking how often they occurred in a longer string. Due to the iteration, each key was assigned to a variable in turn, so literal regex notation was not an option. The object was a literal just a few lines up, and all keys contained only characters from the alphabet, so given that matchAll accepts strings directly, new RegExp(key, 'g') felt like overkill. Like this:

const testString = 'banana';
_.each({ a: 3, b: 1, n: 2 }, function(count, character) {
    expect(Array.from(testString.matchAll(character)).length).toBe(count);
});

I learned a couple of things, most importantly what TS is aiming for (i.e., more than just checking for type compatibility). Thanks @RyanCavanaugh and @MartinJohns for taking the time to discuss this with me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Working as Intended The behavior described is the intended behavior; this is not a bug
Projects
None yet
Development

No branches or pull requests

6 participants