Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement literal analysis for all pure built-in string and array functions #5723

Closed
wants to merge 9 commits into from

Conversation

danog
Copy link
Collaborator

@danog danog commented May 5, 2021

This PR allows inferring the exact return type of a whitelisted subset of pure built-in functions, by actually executing them during static analysis.
This all started when I implemented proper return types for literal values in the ExplodeReturnTypeProvider: after implementing it manually, I realized that a similar but more generic approach could be used for all pure functions, with due precautions: this is the implementation so far.

Basically, when a call to a whitelisted built-in pure function occurs, all and only the literal types of the parameters provided to the function are extracted, and permutated to generate all combinations of possible exact return types.
If any of the parameter unions contains 0 literals, execution does not occur.
Execution is also skipped if we're running on a version of PHP that does not provide the required function (to make this reproducible across all versions, I suggest adding the symfony polyfills as runtime dependency).

The real underlying reason for this MR is an upcoming implementation of recursive static analysis in psalm, which should allow even more insight into complex systems, by similarly recursively replacing parameter types of functions with the exact literal value passed in a specific call, generating full call stacks and analyzing possible paradoxes/other issues of each single call stack.
Of course, this would also require inferring the exact literal return type of built-in functions, which is what this MR provides.

Generics and conditional types provide a similar, yet much less powerful solution: with recursive static analysis, finding bugs will be even easier, and will not require annotating every single function with generics and rewriting the behavior as a generic conditional.

@danog
Copy link
Collaborator Author

danog commented May 5, 2021

Heh, sometimes I forget that I'm not using phabel))))

@weirdan
Copy link
Collaborator

weirdan commented May 5, 2021

This will likely require some limits to be put in place, to make sure performance doesn't drop too much.

@orklah
Copy link
Collaborator

orklah commented May 5, 2021

For safety reason, we should make sure we only execute functions that are declare pure in psalm. It should be a test to make sure we don't add impure functions

@danog
Copy link
Collaborator Author

danog commented May 5, 2021

This will likely require some limits to be put in place, to make sure performance doesn't drop too much.

Of course, I'll add limitations for the pad and range functions.

If you were talking about the recursive static analysis, of course limitations should also be put in place to avoid excessive depth and width of the recursion tree (the first could actually be emitted as an issue on its own).

For safety reason, we should make sure we only execute functions that are declare pure in psalm. It should be a test to make sure we don't add impure functions

This is already in place, along with a manual whitelist of allowed functions.

@muglug
Copy link
Collaborator

muglug commented May 6, 2021

I'm not sure how I feel about this.

Static analysis should mean static analysis. While Psalm does a few calculations inline (3 + 4 becomes 7 after @orklah's PR) the general assumption is that no functions (other than the ones Psalm uses to operate) are being executed.

More importantly, I'm not convinced that much code would benefit from this — it's exceptionally rare that explode is called with known parameters, for example.

Do you have an example of real-world code that justifies this sort of enhancement?

@danog
Copy link
Collaborator Author

danog commented May 10, 2021

I'm not sure how I feel about this.
Static analysis should mean static analysis.

To be honest, I too thought of this when writing this module.
However, psalm cannot statically analyze native functions: providing a literal return type for native functions will only improve the overall quality of static analysis: it is basically treating native functions as language operators, whose behavior can only be inferred by executing them.

About native language operators:

While Psalm does a few calculations inline (3 + 4 becomes 7 after @orklah's PR)

That may actually be broken, see #5722 :)

Do you have an example of real-world code that justifies this sort of enhancement?

The main reason for this PR is the upcoming recursive static analysis module :)

@danog
Copy link
Collaborator Author

danog commented May 10, 2021

Fixed all remaining test failures, the remaining failures are caused by #5743, #5742, #5722, and the TypeCombiner not understanding that int|positive-int is actually int :)

@danog danog marked this pull request as ready for review May 10, 2021 11:04
@orklah
Copy link
Collaborator

orklah commented May 10, 2021

TypeCombiner not understanding that int|positive-int is actually int :)

Not sure it's related, but sometimes Psalm's tests doesn't display the most precise syntax. It is possible that 0|positive-int would end up displayed as int|positive-int without being exactly equivalent to int

@muglug
Copy link
Collaborator

muglug commented May 10, 2021

Not sure it's related, but sometimes Psalm's tests doesn't display the most precise syntax. It is possible that 0|positive-int would end up displayed as int|positive-int without being exactly equivalent to int

Yes — you can use $varname=== instead of $var in the test assertions to get the exact type.

@muglug
Copy link
Collaborator

muglug commented May 10, 2021

The main reason for this PR is the upcoming recursive static analysis module :)

You've mentioned this a couple of times — what real-world bugs will this detect that Psalm cannot?

@danog
Copy link
Collaborator Author

danog commented May 11, 2021

You've mentioned this a couple of times — what real-world bugs will this detect that Psalm cannot?

A simple example, the one that inspired me to write this PR, is localization: in our project, localization is cached to memcached, but originally it's stored in a simple set of PHP files, that are parsed on runtime if a cache miss occurs.
A recursive static analyzer will be able to do semantic verification by checking along all code paths, to ensure that a lookup of a specific key -> cache miss -> literal lookup from a PHP array always works properly.

@muglug
Copy link
Collaborator

muglug commented May 12, 2021

Sorry, would you mind providing example code for that?

@danog
Copy link
Collaborator Author

danog commented May 12, 2021

A basic example:
a:php

<?php

function i18n(string $key): string
{
    static $cache = [];
    static $phpCache = null;
    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }
    $phpCache ??= include 'en_EN.php';
    return $phpCache[$key];
}

echo i18n('init').PHP_EOL;
echo i18n('welcome').PHP_EOL;

en_EN.php

<?php

return ['init' => 'Hello!'];

The real code is a tad bit more complex (even better suited for semantic analysis), but the core logic is the same.

@muglug
Copy link
Collaborator

muglug commented May 12, 2021

I'm going to close this, I'm afraid — this is not something that I think would be useful to a wide number of Psalm users. Instead I'd encourage you to consider writing your own plugin instead.

@muglug muglug closed this May 12, 2021
@orklah
Copy link
Collaborator

orklah commented May 13, 2021

@danog it would be interesting to check the result of the "check against real project" in the CI to see if the PR would have detected interesting issues in the wild

@danog
Copy link
Collaborator Author

danog commented May 13, 2021

Just ran them, and as expected, nothing relevant was found, at least with the default levels.
This was expected, as this PR was supposed to be in preparation of recursive static analysis; I'll continue working on that when I have smore time, will report results (and eventually PRs ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants