Implement literal analysis for all pure built-in string and array functions #5723

danog · 2021-05-05T12:19:50Z

This PR allows inferring the exact return type of a whitelisted subset of pure built-in functions, by actually executing them during static analysis.
This all started when I implemented proper return types for literal values in the ExplodeReturnTypeProvider: after implementing it manually, I realized that a similar but more generic approach could be used for all pure functions, with due precautions: this is the implementation so far.

Basically, when a call to a whitelisted built-in pure function occurs, all and only the literal types of the parameters provided to the function are extracted, and permutated to generate all combinations of possible exact return types.
If any of the parameter unions contains 0 literals, execution does not occur.
Execution is also skipped if we're running on a version of PHP that does not provide the required function (to make this reproducible across all versions, I suggest adding the symfony polyfills as runtime dependency).

The real underlying reason for this MR is an upcoming implementation of recursive static analysis in psalm, which should allow even more insight into complex systems, by similarly recursively replacing parameter types of functions with the exact literal value passed in a specific call, generating full call stacks and analyzing possible paradoxes/other issues of each single call stack.
Of course, this would also require inferring the exact literal return type of built-in functions, which is what this MR provides.

Generics and conditional types provide a similar, yet much less powerful solution: with recursive static analysis, finding bugs will be even easier, and will not require annotating every single function with generics and rewriting the behavior as a generic conditional.

danog · 2021-05-05T12:24:03Z

Heh, sometimes I forget that I'm not using phabel))))

weirdan · 2021-05-05T14:06:17Z

This will likely require some limits to be put in place, to make sure performance doesn't drop too much.

orklah · 2021-05-05T14:50:14Z

For safety reason, we should make sure we only execute functions that are declare pure in psalm. It should be a test to make sure we don't add impure functions

danog · 2021-05-05T15:14:07Z

This will likely require some limits to be put in place, to make sure performance doesn't drop too much.

Of course, I'll add limitations for the pad and range functions.

If you were talking about the recursive static analysis, of course limitations should also be put in place to avoid excessive depth and width of the recursion tree (the first could actually be emitted as an issue on its own).

For safety reason, we should make sure we only execute functions that are declare pure in psalm. It should be a test to make sure we don't add impure functions

This is already in place, along with a manual whitelist of allowed functions.

muglug · 2021-05-06T01:58:42Z

I'm not sure how I feel about this.

Static analysis should mean static analysis. While Psalm does a few calculations inline (3 + 4 becomes 7 after @orklah's PR) the general assumption is that no functions (other than the ones Psalm uses to operate) are being executed.

More importantly, I'm not convinced that much code would benefit from this — it's exceptionally rare that explode is called with known parameters, for example.

Do you have an example of real-world code that justifies this sort of enhancement?

…ctions

danog · 2021-05-10T08:30:55Z

I'm not sure how I feel about this.
Static analysis should mean static analysis.

To be honest, I too thought of this when writing this module.
However, psalm cannot statically analyze native functions: providing a literal return type for native functions will only improve the overall quality of static analysis: it is basically treating native functions as language operators, whose behavior can only be inferred by executing them.

About native language operators:

While Psalm does a few calculations inline (3 + 4 becomes 7 after @orklah's PR)

That may actually be broken, see #5722 :)

Do you have an example of real-world code that justifies this sort of enhancement?

The main reason for this PR is the upcoming recursive static analysis module :)

danog · 2021-05-10T11:04:25Z

Fixed all remaining test failures, the remaining failures are caused by #5743, #5742, #5722, and the TypeCombiner not understanding that int|positive-int is actually int :)

orklah · 2021-05-10T14:31:42Z

TypeCombiner not understanding that int|positive-int is actually int :)

Not sure it's related, but sometimes Psalm's tests doesn't display the most precise syntax. It is possible that 0|positive-int would end up displayed as int|positive-int without being exactly equivalent to int

muglug · 2021-05-10T15:43:44Z

Not sure it's related, but sometimes Psalm's tests doesn't display the most precise syntax. It is possible that 0|positive-int would end up displayed as int|positive-int without being exactly equivalent to int

Yes — you can use $varname=== instead of $var in the test assertions to get the exact type.

muglug · 2021-05-10T15:45:16Z

The main reason for this PR is the upcoming recursive static analysis module :)

You've mentioned this a couple of times — what real-world bugs will this detect that Psalm cannot?

danog · 2021-05-11T09:06:52Z

You've mentioned this a couple of times — what real-world bugs will this detect that Psalm cannot?

A simple example, the one that inspired me to write this PR, is localization: in our project, localization is cached to memcached, but originally it's stored in a simple set of PHP files, that are parsed on runtime if a cache miss occurs.
A recursive static analyzer will be able to do semantic verification by checking along all code paths, to ensure that a lookup of a specific key -> cache miss -> literal lookup from a PHP array always works properly.

muglug · 2021-05-12T05:15:07Z

Sorry, would you mind providing example code for that?

danog · 2021-05-12T08:18:42Z

A basic example:
a:php

<?php

function i18n(string $key): string
{
    static $cache = [];
    static $phpCache = null;
    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }
    $phpCache ??= include 'en_EN.php';
    return $phpCache[$key];
}

echo i18n('init').PHP_EOL;
echo i18n('welcome').PHP_EOL;

en_EN.php

<?php

return ['init' => 'Hello!'];

The real code is a tad bit more complex (even better suited for semantic analysis), but the core logic is the same.

muglug · 2021-05-12T22:46:00Z

I'm going to close this, I'm afraid — this is not something that I think would be useful to a wide number of Psalm users. Instead I'd encourage you to consider writing your own plugin instead.

orklah · 2021-05-13T06:12:03Z

@danog it would be interesting to check the result of the "check against real project" in the CI to see if the PR would have detected interesting issues in the wild

danog · 2021-05-13T09:20:15Z

Just ran them, and as expected, nothing relevant was found, at least with the default levels.
This was expected, as this PR was supposed to be in preparation of recursive static analysis; I'll continue working on that when I have smore time, will report results (and eventually PRs ;)

danog mentioned this pull request May 5, 2021

Simple arithmetic doesn't seem to work #5722

Closed

danog added 3 commits May 10, 2021 10:21

Implement literal analysis for all pure built-in string and array fun…

6a55fb2

…ctions

PHP 7.1 syntax

21052a3

More php7.1

05f1121

Fix tests, add a few limits

8a8b71e

danog force-pushed the master branch from 65ff686 to 8a8b71e Compare May 10, 2021 09:09

danog added 4 commits May 10, 2021 12:02

Fix leftover issues

db8a471

callable-string is always a non-empty-string

b61fe13

Treat list as disjunction of values

3a76046

Consider non-empty lists

6ea150c

danog marked this pull request as ready for review May 10, 2021 11:04

Work only with lists whose shape is known (TKeyedArray)

66abdae

muglug closed this May 12, 2021

danog mentioned this pull request Nov 19, 2022

Feature: use return type providers for inferred return types #8721

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement literal analysis for all pure built-in string and array functions #5723

Implement literal analysis for all pure built-in string and array functions #5723

danog commented May 5, 2021 •

edited

danog commented May 5, 2021 •

edited

weirdan commented May 5, 2021

orklah commented May 5, 2021

danog commented May 5, 2021

muglug commented May 6, 2021

danog commented May 10, 2021 •

edited

danog commented May 10, 2021

orklah commented May 10, 2021

muglug commented May 10, 2021

muglug commented May 10, 2021

danog commented May 11, 2021 •

edited

muglug commented May 12, 2021

danog commented May 12, 2021

muglug commented May 12, 2021

orklah commented May 13, 2021

danog commented May 13, 2021

Implement literal analysis for all pure built-in string and array functions #5723

Implement literal analysis for all pure built-in string and array functions #5723

Conversation

danog commented May 5, 2021 • edited

danog commented May 5, 2021 • edited

weirdan commented May 5, 2021

orklah commented May 5, 2021

danog commented May 5, 2021

muglug commented May 6, 2021

danog commented May 10, 2021 • edited

danog commented May 10, 2021

orklah commented May 10, 2021

muglug commented May 10, 2021

muglug commented May 10, 2021

danog commented May 11, 2021 • edited

muglug commented May 12, 2021

danog commented May 12, 2021

muglug commented May 12, 2021

orklah commented May 13, 2021

danog commented May 13, 2021

danog commented May 5, 2021 •

edited

danog commented May 5, 2021 •

edited

danog commented May 10, 2021 •

edited

danog commented May 11, 2021 •

edited