-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement array shapes for preg_match() $matches #2589
base: 1.11.x
Are you sure you want to change the base?
Conversation
@mvorisek please provide feedback and if you could come up with a few more test-cases, that would be awesome. whats the expected result for phpstan/phpstan#9502 ? |
|
||
public function isFunctionSupported(FunctionReflection $functionReflection, FuncCall $node, TypeSpecifierContext $context): bool | ||
{ | ||
return in_array(strtolower($functionReflection->getName()), ['preg_match'], true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preg_match_all
should be supported as well
also handle/test all relevant preg_ $flags
params like PREG_OFFSET_CAPTURE
, PREG_SET_ORDER
, ... which have substantial impact on the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets do preg_match
first and see what ondrey is thinking about it
memo to me: I need to fix |
@staabm It would be really great to have this implemented now |
I have this PR on my todo list and will rebase it in the near future. You can help by verifying existing test expectations of this PR or help explore missing tests/cases |
c08a944
to
0ce2624
Compare
fix n modifier support PREG_OFFSET_CAPTURE cs separated 8.2 tests support PREG_OFFSET_CAPTURE|PREG_UNMATCHED_AS_NULL fix
Co-authored-by: Gregor Harlan <mail@gh01.de>
Co-Authored-By: PrinsFrank <25006490+PrinsFrank@users.noreply.github.com>
Co-Authored-By: PrinsFrank <25006490+PrinsFrank@users.noreply.github.com>
This pull request has been marked as ready for review. |
In the end I combined both approaches. regex based retrieval of the capturing groups and the AST to identify how many non-optional groups are contained in the regex. since the type inference now requires a regex which is parsable by Hoa\Regex I commented some type-assertions which currently don't work because the regex parser cannot parse it. I think these can be fixed in the future by applying fixes to the I am neither a regex nor a grammer expert, therefore I left these things for other people more confident in this topic. In a future PR I would improve the types to be even more precise for e.g. |
|
||
function doNamedSubpattern(string $s): void { | ||
if (preg_match('/\w-(?P<num>\d+)-(\w)/', $s, $matches)) { | ||
// could be assertType('array{0: string, num: string, 1: string, 2: string, 3: string}', $matches); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we can see one example of a regex which is not parsable with the current hoa grammar:
Fatal error: Uncaught Hoa\Compiler\Llk\Parser::parse(): (0) Unexpected token "-" (range) at line 1 and column 4:
/\w-(?P<num>\d+)-(\w)/
↑
in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Parser.php at line 1.
thrown in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Parser.php on line 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two issues:
a) (?P<xxx>...
is equivalent of supported (?<xxx>...
- spec https://www.pcre.org/original/doc/html/pcrepattern.html (search for ˙?p<˙), grammar: https://github.com/hoaproject/Regex/blob/master/Source/Grammar.pp#L73
b) -
range - it is reserved character strictly if in [
character group only (otherwise it is regular chanracter and very often used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Please provide code suggestions for fixes.
the example work in plain php, so we either need to fix the grammar or leave the typing for these cases behind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind forking hoaproject/Regex
and saving there a php test file I can run with all the issues you found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look into this PR here and find all commented assertions. these are the cases which hoa cannot parse.
I have spent the last the 3 days of freetime for your feature request. take your time to do the tests yourself. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh I see. I got the impression you are working on all these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into more cases later. This one was the most important. What about adding proper Hoa Regex unit testing to phpstan-src? We can assert the parsed regexes using https://github.com/mvorisek/Hoa-Regex/blob/0c8a5cbcf696df8f65168704af14ad5ebd48c641/test.php#L13-L25 function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, why not. Please send a PR to my branch/fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super familiar with phpstan-src codebase - where, what class can I base this unit test on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create a new file in tests/PHPStan/Type
. we can later on move it somewhere else if this doesn't work for ondrej.
-%token anchor \\(bBAZzG)|\^|\$ | ||
+%token anchor \\([bBAZzG])|\^|\$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unreleased hoa/regex upstream fix
hoaproject/Regex@5f670af
patches/Grammar.patch
Outdated
- ( range() | literal() )+ | ||
+ ( <class_> | range() | literal() )+ <range>? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unreleased hoa/regex upstream fix
hoaproject/Regex@ce7fd7b
hoaproject/Regex@e770ada
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2nd patch was reverted, see #2589 (comment)
-capturing: | ||
+#capturing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upstream hoa fix
hoaproject/Regex#38 (comment)
This looks amazing, thanks @staabm. Now I'm just wondering how I can get it to recognize Composer\Pcre\Preg::match and co as well to get the same quality of type output there :s If you can think of some way to make this more reusable by third party libs wrapping preg functions it'd be great. |
at first I want to see what @ondrejmirtes thinks about all this. he was not yet involved. After that: I can think of different ways to make this re-usable. for wrapper-libs which are 100% api compatible with the pcre_* native functions we could e.g.
|
Ok cool thanks. Yes composer/pcre is almost one to one.. Except it throws instead of returning false. For the other api returning object results there it's a bit more complex but anyway I see your point and I'll let you wrap things up here first. Just wanted to let you know in case it can influence design decisions along the way but it's not urgent at all. |
} | ||
assertType('array<string>', $matches); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
assertType('array<string>', $matches); | |
} else { | |
assertType('array{}', $matches); | |
} | |
assertType('array{string}', $matches); |
when pattern is not matched, the $matches
is reset to array{}
- https://3v4l.org/NE0bm
Some initial thoughts:
|
src/Type/Php/RegexShapeMatcher.php
Outdated
|
||
$remainingNonOptionalGroupCount = $this->countNonOptionalGroups($regex); | ||
if ($remainingNonOptionalGroupCount === null) { | ||
// regex could not be parsed by Hoa/Regex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer emitting a PHPStan error with the parse exception message - the PCRE grammar is actually not so complex, so we can fix the Hoa grammar based on issue reports.
This would effectively also validate constant regexps for syntax errors.
The https://3v4l.org/sOXbn parsing hack can be then removed in favor of Hoa parser.
An idea: send a separate PR with DynamicParameterOutType extensions, and once that is merged, you can base this PR on that. |
I tried solving the shapes with Hoa Regex AST, but I couldn't find the information about capturing groups on the ast nodes.
I don't see a way on how to identify where capturing groups are based on the AST.
therefore I tried the way suggested in phpstan/phpstan#9502 (comment)