-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preserve large arrays with same keys through union #3032
preserve large arrays with same keys through union #3032
Conversation
b2762e3
to
22b25e6
Compare
According to issue-bot this fixes 2 issues, please write regression tests for them 😊 Thank you. |
src/Type/TypeCombinator.php
Outdated
/** | ||
* @param non-empty-list<ConstantArrayType[]> $constantArraysMap | ||
*/ | ||
private static function unionConstantArrayTypesWithSameKeys(array $constantArraysMap): ?ConstantArrayType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the looks of the name, there's already a method with a very similar goal: TypeCombinator::reduceArrays
. Can't we call that one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried replacing the call to unionConstantArrayTypesWithSameKeys
with this (if (count($reducedArrayTypes) === 1) {
so that I don't turn off the generalization completely).
$reducedArrayTypes = self::reduceArrays($arrayTypes);
if (count($reducedArrayTypes) === 1) {
return $reducedArrayTypes;
}
It worked with the test-cases that I had, but it is not quite equivalent. I added a test-case to reject this option. The problem is in ConstantArrayType::isKeysSupersetOf
(
$failOnDifferentValueType = true; |
The goal of unionConstantArrayTypesWithSameKeys
is to detect the union of
array{K1: T11, K2: T12, ... KN: T1N}
array{K1: T21, K2: T22, ... KN: T2N}
...
array{K1: TM1, K2: TM2, ... KN: TMN}
and return array{K1: T11|T21|...|TM1, K2: T12|T22|...|TM2, ..., KN: T1N|T2N|...|TMN}
.
If the input doesn't match the simple target situation, then the method just returns null and we can proceed with the generalization.
reduceArrays
seems to do the following:
- Separate non-empty constant arrays, non-constant arrays and empty array.
- Process the constant arrays:
- For each pair check if one array is a "superset" of the other one, and if so merge them together.
- Return an array of all the remaining constant array types, as well as the non-constant arrays and the empty array.
It works in some cases, but it is more complicated than it needs to be for the scenario that I had in mind. And there are a lot of subtle details involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like that having two different methods that do very similar things is gonna bite us in the future. Can we improve reduceArrays
and use it in more places so that it works for fixing these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
processArrayTypes
is certainly quite complicated already, and this isn't making it any simpler. But I'm not sure how to modify reduceArrays
such that it can be used in both places, and it is simpler than having a separate method.
The problem for me is that reduceArrays
maintains tagged unions: E.g. array{a: 1, b: int}|array{a: 2, b: string}
should remain unchanged by reduceArrays
. However, I assume that this is something that we can't afford when the arrays involved are too large. So there is the generalization, which would (assuming lower generalization limit) turn this into non-empty-array<'a'|'b', int|string>
.
With my change we would lose the tagged union, but at least we'd maintain the general shape of the array: array{a: 1|2, b: string|int}
.
So reduceArrays
would have to know whether or not it should try to maintain the tagged unions. Either there would be a boolean flag, or it could re-check the size of the arrays involved, ...
The issue in my use-case happens because after a branching statement (if, ternary operator, cycles, ...) the scopes from non-terminating branches are unioned together and they replace the previous scope. However, in my use-case the array is not modified in any of the branches and yet the result of the union is a more general type than it was before the branching statement.
So perhaps this could be another way of fixing the issue: detect scenarios where the variable is not modified and the new type is wider and keep the previous type instead. But it seems complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either there would be a boolean flag, or it could re-check the size of the arrays involved, ...
sounds good to me to at least try it!
So perhaps this could be another way of fixing the issue
this should also be possible. We have all the information available. There are three scopes at play - let’s say we’re analysing if-else. There the scope before the statement (original) and there’s the if scope and else scope. If the type of the expression is equal in all three, we don’t have to run the union() operation. There might even be performance benefit to that…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me to at least try it!
You have a tendency to be right about these things. 👍
There are three scopes at play - let’s say we’re analysing if-else. There the scope before the statement (original) and there’s the if scope and else scope. If the type of the expression is equal in all three, we don’t have to run the union() operation. There might even be performance benefit to that…
That wouldn't help in my use-case: the types are different in the branches (e.g. the condition is a null-check on one of the dimensions), but their union should be the same (or narrower) than the original scope, which it isn't, because the arrays are too large and they get generalized.
Anyway, after I wrote the previous comment, I realized that it's better to fix it in the union method (if possible), because that is a more general fix.
53438cc
to
2c609e4
Compare
@@ -708,6 +713,10 @@ private static function processArrayTypes(array $arrayTypes): array | |||
} | |||
|
|||
if ($generalArrayOccurred && (!$overflowed || $filledArrays > 1)) { | |||
$reducedArrayTypes = self::reduceArrays($arrayTypes, false); | |||
if (count($reducedArrayTypes) === 1) { | |||
return [self::intersect($reducedArrayTypes[0], ...$accessoryTypes)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, currently the accessory types will always be empty (so I can't test this). But I added them anyway to be safe in case new accessory types are added in the future.
// This is an optimization for current use-case of $preserveTaggedUnions=false, where we need | ||
// one constant array as a result, or we generalize the $constantArrays. | ||
if (!$preserveTaggedUnions) { | ||
return $constantArrays; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm conflicted about this. On one hand, it would feel bad not to have this condition and run a lot of the method uselessly. But on the other hand, the condition depends on the particular usage of $preserveTaggedUnions
. Maybe a better name for $preserveTaggedUnions
would help, but I didn't come up with anything. So I at least added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine 👍
Awesome, I really like this! 👍 |
Thank you! |
Nice one 👍 |
Fixes https://phpstan.org/r/0071beb4-becd-47c0-8a17-b52407da0c22
Fixes phpstan/phpstan#9397
Fixes phpstan/phpstan#10080
Basically, I have a large array shape, which I'm not modifying (and it's not a tagged union either). I just use the elements individually. But this causes the array to be generalized into a "useless"
non-empty-array<...>
. As a workaround, I can pull the elements used in conditions into variables and then use the variables, but it is quite inconvenient.So this PR attempts to maintain the array shape in the simplest case where 2 constant arrays with the same keys are being unioned together.