New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for references and improve UnusedVariable checks (fixes #7254). #7339
Changes from all commits
f945065
3c7d176
bf60620
089700d
7694eb8
57b99be
139c3af
740a101
e268a05
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# ReferenceReusedFromConfusingScope | ||
|
||
Emitted when a reference assigned in a potentially confusing scope is reused later. | ||
Since PHP doesn't scope loops and ifs the same way most other languages do, a common issue is the reuse of a variable | ||
declared in such a scope. Usually it doesn't matter, because a reassignment to an already-defined variable will just | ||
reassign it, but if the already-defined variable is a reference it will change the value of the referred to variable. | ||
|
||
```php | ||
<?php | ||
|
||
$arr = [1, 2, 3]; | ||
foreach ($arr as &$i) { | ||
++$i; | ||
} | ||
|
||
// ...more code, after which the usage of $i as a reference has been forgotten by the programmer | ||
|
||
for ($i = 0; $i < 10; ++$i) { | ||
echo $i; | ||
} | ||
// $arr is now [2, 3, 10] instead of the expected [2, 3, 4] | ||
``` | ||
|
||
To fix the issue, unset the reference after using it in such a scope: | ||
|
||
```php | ||
<?php | ||
|
||
$arr = [1, 2, 3]; | ||
foreach ($arr as &$i) { | ||
++$i; | ||
} | ||
unset($i); | ||
|
||
for ($i = 0; $i < 10; ++$i) { | ||
echo $i; | ||
} | ||
// $arr is correctly [2, 3, 4] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,9 +13,12 @@ | |
use Psalm\Type\Atomic\DependentType; | ||
use Psalm\Type\Atomic\TArray; | ||
use Psalm\Type\Union; | ||
use RuntimeException; | ||
|
||
use function array_keys; | ||
use function array_search; | ||
use function array_shift; | ||
use function assert; | ||
use function count; | ||
use function in_array; | ||
use function is_int; | ||
|
@@ -40,6 +43,49 @@ class Context | |
*/ | ||
public $vars_possibly_in_scope = []; | ||
|
||
/** | ||
* Keeps track of how many times a var_in_scope has been referenced. May not be set for all $vars_in_scope. | ||
* | ||
* @var array<string, int<0, max>> | ||
*/ | ||
public $referenced_counts = []; | ||
|
||
/** | ||
* Maps references to referenced variables for the current scope. | ||
* With `$b = &$a`, this will contain `['$b' => '$a']`. | ||
* | ||
* All keys and values in this array are guaranteed to be set in $vars_in_scope. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fairly confident I caught all the places a var_in_scope might be unset, but if this isn't true for some reason I consider that a bug to be fixed. |
||
* | ||
* To check if a variable was passed or returned by reference, or | ||
* references an object property or array item, see Union::$by_ref. | ||
* | ||
* @var array<string, string> | ||
*/ | ||
public $references_in_scope = []; | ||
|
||
/** | ||
* Set of references to variables in another scope. These references will be marked as used if they are assigned to. | ||
* | ||
* @var array<string, true> | ||
*/ | ||
public $references_to_external_scope = []; | ||
|
||
/** | ||
* A set of globals that are referenced somewhere. | ||
* | ||
* @var array<string, true> | ||
*/ | ||
public $referenced_globals = []; | ||
|
||
/** | ||
* A set of references that might still be in scope from a scope likely to cause confusion. This applies | ||
* to references set inside a loop or if statement, since it's easy to forget about PHP's weird scope | ||
* rules, and assinging to a reference will change the referenced variable rather than shadowing it. | ||
* | ||
* @var array<string, CodeLocation> | ||
*/ | ||
public $references_possibly_from_confusing_scope = []; | ||
|
||
/** | ||
* Whether or not we're inside the conditional of an if/where etc. | ||
* | ||
|
@@ -362,11 +408,6 @@ class Context | |
*/ | ||
public $has_returned = false; | ||
|
||
/** | ||
* @var array<string, bool> | ||
*/ | ||
public $vars_from_global = []; | ||
|
||
/** | ||
* @var array<string, true> | ||
*/ | ||
|
@@ -449,6 +490,28 @@ public function update( | |
} | ||
} | ||
|
||
/** | ||
* Updates the list of possible references from a confusing scope, | ||
* such as a reference created in an if that might later be reused. | ||
*/ | ||
public function updateReferencesPossiblyFromConfusingScope( | ||
Context $confusing_scope_context, | ||
StatementsAnalyzer $statements_analyzer | ||
): void { | ||
$references = $confusing_scope_context->references_in_scope | ||
+ $confusing_scope_context->references_to_external_scope; | ||
foreach ($references as $reference_id => $_) { | ||
if (!isset($this->references_in_scope[$reference_id]) | ||
&& !isset($this->references_to_external_scope[$reference_id]) | ||
&& $reference_location = $statements_analyzer->getFirstAppearance($reference_id) | ||
) { | ||
$this->references_possibly_from_confusing_scope[$reference_id] = $reference_location; | ||
} | ||
} | ||
$this->references_possibly_from_confusing_scope += | ||
$confusing_scope_context->references_possibly_from_confusing_scope; | ||
} | ||
|
||
/** | ||
* @param array<string, Union> $new_vars_in_scope | ||
* | ||
|
@@ -496,19 +559,60 @@ public static function getNewOrUpdatedVarIds(Context $original_context, Context | |
return $redefined_var_ids; | ||
} | ||
|
||
public function remove(string $remove_var_id): void | ||
public function remove(string $remove_var_id, bool $removeDescendents = true): void | ||
{ | ||
unset( | ||
$this->referenced_var_ids[$remove_var_id], | ||
$this->vars_possibly_in_scope[$remove_var_id] | ||
); | ||
|
||
if (isset($this->vars_in_scope[$remove_var_id])) { | ||
$existing_type = $this->vars_in_scope[$remove_var_id]; | ||
unset($this->vars_in_scope[$remove_var_id]); | ||
|
||
$this->removeDescendents($remove_var_id, $existing_type); | ||
if ($removeDescendents) { | ||
$this->removeDescendents($remove_var_id, $existing_type); | ||
} | ||
} | ||
$this->removePossibleReference($remove_var_id); | ||
unset($this->vars_possibly_in_scope[$remove_var_id]); | ||
} | ||
|
||
/** | ||
* Remove a variable from the context which might be a reference to another variable, or | ||
* referenced by another variable. Leaves the variable as possibly-in-scope, unlike remove(). | ||
*/ | ||
public function removePossibleReference(string $remove_var_id): void | ||
{ | ||
if (isset($this->referenced_counts[$remove_var_id]) && $this->referenced_counts[$remove_var_id] > 0) { | ||
// If a referenced variable goes out of scope, we need to update the references. | ||
// All of the references to this variable are still references to the same value, | ||
// so we pick the first one and make the rest of the references point to it. | ||
$references = []; | ||
foreach ($this->references_in_scope as $reference => $referenced) { | ||
if ($referenced === $remove_var_id) { | ||
$references[] = $reference; | ||
unset($this->references_in_scope[$reference]); | ||
} | ||
} | ||
assert(!empty($references)); | ||
$first_reference = array_shift($references); | ||
if (!empty($references)) { | ||
$this->referenced_counts[$first_reference] = count($references); | ||
foreach ($references as $reference) { | ||
$this->references_in_scope[$reference] = $first_reference; | ||
} | ||
} | ||
} | ||
if (isset($this->references_in_scope[$remove_var_id])) { | ||
$reference_count = &$this->referenced_counts[$this->references_in_scope[$remove_var_id]]; | ||
if ($reference_count < 1) { | ||
throw new RuntimeException("Incorrect referenced count found"); | ||
} | ||
--$reference_count; | ||
} | ||
unset( | ||
$this->vars_in_scope[$remove_var_id], | ||
$this->referenced_var_ids[$remove_var_id], | ||
$this->referenced_counts[$remove_var_id], | ||
$this->references_in_scope[$remove_var_id], | ||
$this->references_to_external_scope[$remove_var_id], | ||
); | ||
} | ||
|
||
/** | ||
|
@@ -645,7 +749,7 @@ public function removeDescendents( | |
|
||
foreach ($this->vars_in_scope as $var_id => $type) { | ||
if (preg_match('/' . preg_quote($remove_var_id, '/') . '[\]\[\-]/', $var_id)) { | ||
unset($this->vars_in_scope[$var_id]); | ||
$this->remove($var_id, false); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure why this wasn't unsetting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, vars_possibly_in_scope is just an array of bools. It's nothing without vars_in_scope so yeah, probably overlooked and never caused issue |
||
} | ||
|
||
foreach ($type->getAtomicTypes() as $atomic_type) { | ||
|
@@ -676,7 +780,7 @@ public function removeMutableObjectVars(bool $methods_only = false): void | |
} | ||
|
||
foreach ($vars_to_remove as $var_id) { | ||
unset($this->vars_in_scope[$var_id], $this->vars_possibly_in_scope[$var_id]); | ||
$this->remove($var_id, false); | ||
} | ||
|
||
$clauses_to_keep = []; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really that closely related to everything else in this PR, but I'm pretty sure it's a correct change. I'm not familiar enough with the taint feature to know how to test it, but I worry that without this change some of the taint stuff might mess up the
parent_nodes
for a reference.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix that when we have a reported bug then. I see we have multiple instances of both ways of declaring parent_nodes and I'm not sure when to choose which
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was leaning that way originally, but it looks like this only exists for plugins to use, and based on the name it's supposed to add a source, not remove all other sources as well.
I think the only time
parent_nodes
should be set to an array rather than setting an offset is when the previous nodes are known to be wrong, such as when assigning a variable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on what can cause a variable to have multiple parents though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unsure about that myself. I utilized multiple parents for references, but outside of that I don't know what use it has, other than it probably has something to do with taints.