rough performance analysis of the current internals #2206

Open

RonnyPfannschmidt opened this issue Jan 18, 2017 · 61 comments
Labels
type: performance (performance or memory problem/improvement)

Comments

@RonnyPfannschmidt
Member

This is the second follow-up to #401.

It's about time we took a good look at what eats time when executing pytest; the last deep look was quite a while ago.

@rajibmitra

Can I work on it?

@RonnyPfannschmidt
Member Author

Feel free to. Please send a mail to the mailing list, and join the IRC channels if IRC works out for you.

nicoddemus added the type: performance label and removed the type: task label on Apr 5, 2018

@boxed
Contributor

boxed commented Oct 24, 2018

I'll look into creating an example project with the performance issues tomorrow at work. It's probably just a matter of creating lots of irrelevant files and directories, I think.

What's the deal with the "marked as off topic" stuff, btw? This issue is about performance issues generally, and I gave a pretty clear example of a huge performance issue. I don't see how I could be more on topic than that ^_-

@blueyed
Contributor

blueyed commented Oct 24, 2018

Cool, looking forward to this.

Re off-topic: this issue appears to be investigating performance altogether / in general, so your issue is not really off-topic, but it should be a separate issue - my intention was to keep this issue here easier to read / skim in the future.

blueyed added a commit to blueyed/pytest that referenced this issue Oct 24, 2018
For pytest's own suite the `cache_info()` looks as follows:

    > session.config._getconftest_pathlist.cache_info()
    CacheInfo(hits=231, misses=19, maxsize=None, currsize=19)

While it does not really make a difference for me this might help with
larger test suites / the case mentioned in
pytest-dev#2206 (comment).
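
For readers unfamiliar with the pattern: the commit memoizes that lookup, and cache_info() reports the hit/miss counts shown above. Below is a minimal, self-contained sketch of the same idea using functools.lru_cache; the function name and the parsing it does are made up, not pytest's actual _getconftest_pathlist.

import os
from functools import lru_cache

@lru_cache(maxsize=None)
def conftest_pathlist(raw: str):
    # pretend this parsing/lookup work is expensive enough to be worth caching
    return tuple(p for p in raw.split(os.pathsep) if p)

conftest_pathlist("a:b:c")
conftest_pathlist("a:b:c")             # second call is a cache hit
print(conftest_pathlist.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
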
@boxed
Contributor

boxed commented Oct 25, 2018

I've done some more investigation on the collect phase speed issues:

3.4.0  6s
3.5.0  6s
3.5.1  6s
3.6.0  3s  (!!!!)
3.6.1  6s
3.6.2  6s
3.6.4  6s
3.7.0  13s (!)
3.7.4  13s
3.8.2  13s
3.9.3  13s

So as you can see I've only tried released versions, but it's narrowed down a lot. I was very impressed by the speed of 3.6.0 - so much that I had to check and recheck that it did indeed find all my tests.

I have written a little test script that demonstrates some (but not all!) of the behavior we see:

from os import makedirs
from os.path import expanduser, join
from shutil import rmtree

base_dir = expanduser('~/Downloads/test_pytest_collect_speed/tests/')

# start from a clean slate (ignore_errors so the first run doesn't fail)
rmtree(base_dir, ignore_errors=True)

for i in range(100):
    d = join(base_dir, f'foo_{i}')
    makedirs(d)
    # make each directory a package
    with open(join(d, '__init__.py'), 'w') as f:
        pass

    for j in range(10):
        with open(join(d, f'test_bar_{j}.py'), 'w') as f:
            f.write('def test_foo(): pass')

Running this script and then pytest -k asdasdasdasdasdas produces these times on my machine:

3.4.0   3.59s
3.6.0   3.80s 
3.6.1   3.78s
3.6.2   3.76s
3.6.4   4.61s
3.7.0   11.59s
3.7.4   12.14s
3.8.0   12.07s
3.9.1   12.97s
3.9.2   12.89s

As you can see, 3.6 isn't much faster with this test, unlike on our real code base, but the slowdown in 3.7 is clearly visible.

@nicoddemus
Member

Thanks @boxed for the analysis.

3.7.0 saw the introduction of package-scoped fixtures; that's very likely the cause of the slowdown, since it changed the collection code significantly.

@boxed
Contributor

boxed commented Oct 25, 2018

That sounded plausible, but looking at the cProfile output I don't think that's the case:

3.6.4:

================================================ 1000 deselected in 6.31 seconds =================================================
         6535056 function calls (6309736 primitive calls) in 7.488 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   188165    1.639    0.000    1.639    0.000 {built-in method posix.stat}
   191421    0.398    0.000    0.684    0.000 posixpath.py:75(join)
     2256    0.324    0.000    2.659    0.001 imp.py:255(find_module)
 45037/51    0.289    0.000    7.341    0.144 callers.py:151(_multicall)
   169685    0.202    0.000    1.655    0.000 genericpath.py:27(isfile)
  44823/2    0.162    0.000    7.338    3.669 hooks.py:244(__call__)
   383193    0.137    0.000    0.156    0.000 {built-in method posix.fspath}
394795/394794    0.119    0.000    0.124    0.000 {built-in method builtins.isinstance}
   300547    0.107    0.000    0.107    0.000 {method 'startswith' of 'str' objects}
153883/153877    0.089    0.000    0.090    0.000 {built-in method builtins.getattr}
46213/44012    0.079    0.000    0.299    0.000 __init__.py:368(_getconftestmodules)
   215437    0.071    0.000    0.140    0.000 posixpath.py:41(_get_sep)
76933/76913    0.071    0.000    0.075    0.000 {built-in method builtins.hasattr}
    34000    0.068    0.000    1.688    0.000 python.py:405(_makeitem)
    12775    0.066    0.000    0.105    0.000 posixpath.py:331(normpath)
   113286    0.060    0.000    0.060    0.000 {method 'split' of 'str' objects}
 45037/51    0.059    0.000    7.341    0.144 manager.py:58(<lambda>)
    68000    0.058    0.000    0.803    0.000 python.py:216(pytest_pycollect_makeitem)
      572    0.057    0.000    0.057    0.000 {built-in method marshal.loads}
   132290    0.056    0.000    0.101    0.000 terminalwriter.py:74(<genexpr>)
    23322    0.053    0.000    0.078    0.000 imp.py:105(get_suffixes)
    18514    0.052    0.000    0.081    0.000 local.py:282(_getbyspec)
    41607    0.051    0.000    0.354    0.000 main.py:409(gethookproxy)
     2004    0.051    0.000    0.051    0.000 {method 'truncate' of '_io.FileIO' objects}
     1200    0.049    0.000    0.049    0.000 {built-in method io.open}
      933    0.045    0.000    0.045    0.000 {built-in method posix.listdir}
    34000    0.044    0.000    0.082    0.000 unittest.py:14(pytest_pycollect_makeitem)
     1000    0.044    0.000    1.817    0.002 python.py:376(collect)
    34000    0.043    0.000    0.067    0.000 python.py:359(_matches_prefix_or_glob_option)
   236510    0.043    0.000    0.045    0.000 {method 'get' of 'dict' objects}
     2000    0.042    0.000    0.258    0.000 fixtures.py:1071(getfixtureclosure)
    32000    0.041    0.000    0.165    0.000 fixtures.py:1209(getfixturedefs)
     1047    0.040    0.000    0.040    0.000 {built-in method marshal.load}
    16533    0.040    0.000    0.071    0.000 posixpath.py:154(dirname)
1649/1628    0.039    0.000    0.108    0.000 {built-in method builtins.__build_class__}
35828/35214    0.038    0.000    0.042    0.000 {built-in method builtins.issubclass}
   187622    0.038    0.000    0.038    0.000 {method 'endswith' of 'str' objects}
    20886    0.035    0.000    0.306    0.000 local.py:371(check)
    53345    0.035    0.000    0.109    0.000 local.py:163(__eq__)

3.7.0:

================================================ 1000 deselected in 17.79 seconds ================================================
         34834354 function calls (34601846 primitive calls) in 18.905 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  7339091    2.219    0.000    2.915    0.000 {built-in method posix.fspath}
  1407287    2.094    0.000    3.809    0.000 posixpath.py:154(dirname)
   192750    1.667    0.000    1.667    0.000 {built-in method posix.stat}
   167757    1.439    0.000    8.886    0.000 common.py:315(parts)
  1406098    1.215    0.000    5.318    0.000 local.py:312(dirpath)
  2826149    1.199    0.000    3.794    0.000 local.py:163(__eq__)
  5680042    0.696    0.000    0.696    0.000 common.py:394(__fspath__)
  1606319    0.464    0.000    0.788    0.000 posixpath.py:41(_get_sep)
   113301    0.428    0.000    7.562    0.000 main.py:523(<genexpr>)
   191150    0.383    0.000    0.655    0.000 posixpath.py:75(join)
  1420042    0.382    0.000    0.382    0.000 {method 'rfind' of 'str' objects}
1796928/1796927    0.373    0.000    0.379    0.000 {built-in method builtins.isinstance}
     2252    0.325    0.000    2.597    0.001 imp.py:255(find_module)
  1427462    0.308    0.000    0.308    0.000 {built-in method __new__ of type object at 0x10a57b840}
 45837/51    0.291    0.000   18.739    0.367 callers.py:157(_multicall)
  1088730    0.283    0.000    0.283    0.000 {method 'rstrip' of 'str' objects}
    56550    0.205    0.000    3.575    0.000 main.py:518(<genexpr>)
   169453    0.196    0.000    1.637    0.000 genericpath.py:27(isfile)
  1513672    0.190    0.000    0.190    0.000 {method 'append' of 'list' objects}
  45623/2    0.173    0.000   18.737    9.368 hooks.py:270(__call__)
1561811/1561136    0.170    0.000    0.171    0.000 {built-in method builtins.len}
   307054    0.104    0.000    0.104    0.000 {method 'startswith' of 'str' objects}
    51612    0.096    0.000    0.158    0.000 local.py:282(_getbyspec)
160583/160577    0.090    0.000    0.091    0.000 {built-in method builtins.getattr}
   182348    0.085    0.000    0.085    0.000 {method 'split' of 'str' objects}
    15674    0.078    0.000    0.124    0.000 posixpath.py:331(normpath)
47213/45012    0.076    0.000    0.296    0.000 __init__.py:368(_getconftestmodules)

I don't see how package-scoped fixtures can explain the number of calls to posix.fspath going from 383193 to 7339091. That's 19x more.

@boxed
Contributor

boxed commented Oct 25, 2018

We're spending the vast majority of the time in this piece of code in _pytest/main.py:

                if not any(x in path.parts() for x in paths):

I'm having trouble wrapping my head around what it's doing. It was introduced with package-scoped fixtures, as you suspected.

@martenbr

Here is some data from running @boxed's test case under pyflame.

The worst offenders in terms of performance regressions were both introduced with the package-scoped fixtures feature.

https://github.com/martenbr/flamecharts/blob/master/3.6.4.svg (8.63s)
https://github.com/martenbr/flamecharts/blob/master/3.7.0.svg (32.83s)
(Download and open in a browser to get interactive charts)

@boxed
Contributor

boxed commented Oct 25, 2018

Well.. we found it. I'll send a PR.

@RonnyPfannschmidt
Member Author

We really ought to take a look at using strings and a set there ^^
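
A rough sketch of that idea - precompute the package directories once as plain strings and test ancestry with set operations, instead of calling parts() and comparing path objects for every collected node. The helper names and example data below are hypothetical, not pytest's actual code:

import os.path

def ancestors(path: str) -> set:
    """Return the path plus all of its parent directories, as plain strings."""
    result = set()
    while True:
        result.add(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return result
        path = parent

package_dirs = {"/repo/tests/foo_1", "/repo/tests/foo_2"}  # hypothetical example data

def is_under_a_package(path: str) -> bool:
    # O(depth) set lookups instead of building path objects and comparing them pairwise
    return not package_dirs.isdisjoint(ancestors(path))

assert is_under_a_package("/repo/tests/foo_1/test_bar_0.py")
assert not is_under_a_package("/repo/tests/other/test_baz.py")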

@boxed
Contributor

boxed commented Oct 25, 2018

It seems to me like a purge of py's path handling is the way to go. It looks like a high-level path-juggling library, but it hits the filesystem a crazy amount.

I also noticed that between 3.7.0 (which I tested my performance fix on) and master, more performance regressions have landed, taking my test from 7 seconds (on 3.7 with my patch) to 9.3 seconds.

@RonnyPfannschmidt
Member Author

@boxed personally I'd be delighted if we could remove all of py.path.*; unfortunately we'll have to keep backward-compatibility shims

boxed mentioned this issue Oct 25, 2018
@boxed
Contributor

boxed commented Oct 25, 2018

Here's a PR: #4237

There's a lot of performance left on the table still, so I'll keep looking. This is still ~50% slower than 3.9.1 on my machine.

@RonnyPfannschmidt
Member Author

Let's commit this to the examples folder - those 1000+ files share content, so it's inexpensive to keep in git - and we'd have a folder directly in the checkout to test with

@RonnyPfannschmidt
Member Author

I made 10,000 files - pre-collect time is insane

@RonnyPfannschmidt
Member Author

This makes me wonder how much of that is pytest hooks/pluggy, and whether the idea of cythonizing the hooks would make a meaningful difference

@bluetech
Member

bluetech commented Apr 24, 2020

Here is some analysis - not of collection, but mostly of execution.

Methodology

I use the idea from #401, without the sleep:

import pytest

@pytest.mark.parametrize("x", range(5000))
def test_foo(x):
    pass

I analyze how much overhead each non-essential plugin adds, relative to the default with all plugins enabled. For each plugin I run with -p no:<plugin> 5 times and take the minimum to compute the overhead.

Specs: Linux, Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, HDD, Python 3.8, current master (7d5f5a8), performance governor, turbo off

Hacky script
#!/usr/bin/env python
from typing import List
import os
import subprocess
import time

plugins = (
    "assertion",
    "cacheprovider",
    "capture",
    "debugging",
    "doctest",
    "faulthandler",
    "freeze_support",
    "junitxml",
    "logging",
    "monkeypatch",
    "nose",
    "pastebin",
    "recwarn",
    "reports",
    "resultlog",
    "setuponly",
    "setupplan",
    "skipping",
    "stepwise",
    "unittest",
    "warnings",
)


def check(plugin: str, iterations: int = 5) -> List[float]:
    timings = []
    for i in range(iterations):
        start = time.perf_counter()
        subprocess.run(
            ["python", "-m", "pytest", "xx.py", "-p", f"no:{plugin}"],
            stdout=subprocess.DEVNULL,
            check=True,
            env={**os.environ, "PYTHONPATH": "src/"},  # keep the parent env, just point PYTHONPATH at the in-repo sources
        )
        timings.append(time.perf_counter() - start)
    return timings


print("PLUGIN", "T1", "T2", "T3", "T4", "T5", "TMIN", "DIFF", "DIFF%")

timings = check("$DEFAULT")  # "$DEFAULT" is not a real plugin name, so nothing is disabled: this measures the baseline
baseline = min(timings)
print(
    "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
        "BASELINE",
        timings[0],
        timings[1],
        timings[2],
        timings[3],
        timings[4],
        min(timings),
        baseline - min(timings),
        ((baseline - min(timings)) / baseline) * 100,
    )
)


for plugin in plugins:
    timings = check(plugin)
    print(
        "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
            plugin,
            timings[0],
            timings[1],
            timings[2],
            timings[3],
            timings[4],
            min(timings),
            baseline - min(timings),
            ((baseline - min(timings)) / baseline) * 100,
        )
    )

Results

Detailed results
PLUGIN T1 T2 T3 T4 T5 TMIN DIFF DIFF%
BASELINE 8.6669s 8.6314s 8.5093s 8.7840s 8.7474s 8.5093s +0.0000s +0.000%
assertion 8.6088s 8.5090s 8.4971s 8.5183s 8.5142s 8.4971s +0.0121s +0.142%
cacheprovider 8.1316s 8.1153s 8.0618s 8.1250s 8.1907s 8.0618s +0.4475s +5.259%
capture 7.5484s 7.6386s 7.6759s 7.5143s 7.6171s 7.5143s +0.9949s +11.692%
debugging 8.5375s 8.6287s 8.5948s 8.7474s 8.5854s 8.5375s -0.0282s -0.332%
doctest 8.7486s 8.6381s 8.5814s 8.5956s 8.6282s 8.5814s -0.0721s -0.848%
faulthandler 8.5920s 8.6088s 8.5411s 8.6846s 8.5611s 8.5411s -0.0319s -0.374%
freeze_support 8.6831s 8.6670s 8.6971s 8.6463s 8.5876s 8.5876s -0.0784s -0.921%
junitxml 8.5813s 8.5482s 8.5990s 8.6958s 8.6254s 8.5482s -0.0389s -0.457%
logging 6.9895s 6.8886s 6.8846s 6.9546s 6.9742s 6.8846s +1.6246s +19.092%
monkeypatch 8.6100s 8.5525s 8.6031s 8.5367s 8.6799s 8.5367s -0.0274s -0.322%
nose 8.5155s 8.5339s 8.4914s 8.5055s 8.5123s 8.4914s +0.0179s +0.210%
pastebin 8.7020s 8.6695s 8.6396s 8.6313s 8.5839s 8.5839s -0.0746s -0.877%
recwarn 8.6273s 8.6493s 8.5004s 8.4378s 8.6495s 8.4378s +0.0714s +0.839%
reports 8.5065s 8.7384s 8.6581s 8.7497s 8.6464s 8.5065s +0.0028s +0.033%
resultlog 8.6606s 8.5702s 8.5818s 8.5576s 8.5925s 8.5576s -0.0483s -0.568%
setuponly 8.7867s 8.5259s 8.5698s 8.5807s 8.6715s 8.5259s -0.0166s -0.195%
setupplan 8.6306s 8.6868s 8.6589s 8.6690s 8.6238s 8.6238s -0.1145s -1.346%
skipping 8.0935s 8.0553s 7.9831s 7.9610s 7.9105s 7.9105s +0.5987s +7.036%
stepwise 8.5489s 8.6844s 8.5964s 8.6576s 8.7008s 8.5489s -0.0397s -0.466%
unittest 8.6372s 8.5504s 8.4745s 8.5486s 8.5737s 8.4745s +0.0347s +0.408%
warnings 7.7409s 7.7247s 7.6440s 7.6950s 7.7402s 7.6440s +0.8652s +10.168%
ALL 3.6315s 3.6125s 3.6365s 3.6323s 3.6572s 3.6125s +4.8968s +57.546%

So overall, for this simple case, the plugins account for 57.5% of the time, almost all of it from these 5:

  • logging (19%)
  • capture (11%)
  • warnings (10%)
  • skipping (7%)
  • cacheprovider (5%)

@boxed
Contributor

boxed commented Apr 24, 2020

@bluetech

I ran this test with pytest (no config): 11.6s.
python -m pytest -p no:logging -p no:warnings -p no:skipping -p no:cacheprovider --assert=plain: 9.2s
With hammett: 0.9s.

Not sure if those parameters are correct - passing any random string there doesn't give an error message, so I'm not sure pytest isn't just silently swallowing an invalid configuration...

Another interesting test is to remove the plain-asserts param and change the test to assert False. Pytest: 19.9s, hammett: 2.2s

(I would argue that hammett actually provides a slightly better user experience for failures right now.)

@bluetech
Member

Hi @boxed, thanks for checking! It's nice to have a lower bound with hammett.

Interesting that it's only 11.6 -> 9.2 for you. Can you describe your system (operating system and hardware)?

Not sure if those parameters are correct, because passing any random string there doesn't give an error message so I'm not sure pytest isn't just swallowing an invalid configuration silently...

You can verify with --trace-config.

Another interesting test is to remove the plain-asserts param and change the test to assert False. Pytest: 19.9s, hammett: 2.2s

Hmm, I haven't looked at assertion rewriting. IIRC assertion rewriting is cached/compiled to .pyc though - is it still 19.9s on a second run?


@RonnyPfannschmidt

This makes me wonder how much of that is pytest hooks/pluggy, and whether the idea of cythonizing the hooks would make a meaningful difference

A large chunk of the remaining time (after turning off the plugins) is taken by _multicall from pluggy. I looked at it and there is not much to optimize there; it doesn't do anything superfluous. I'm not sure even Cython would be able to speed it up much - it is mostly just the overhead of the plugin virtual dispatch. We might look into ways to call it less often, however.
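
That dispatch overhead can be seen in isolation with a toy micro-benchmark against pluggy directly; the hook name and iteration count below are arbitrary, and absolute numbers will of course vary by machine:

import timeit
import pluggy

hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")

class Spec:
    @hookspec
    def myhook(self, x):
        """A single-argument demo hook."""

class Impl:
    @hookimpl
    def myhook(self, x):
        return x + 1

pm = pluggy.PluginManager("demo")
pm.add_hookspecs(Spec)
pm.register(Impl())

impl = Impl()
direct = timeit.timeit(lambda: impl.myhook(1), number=100_000)
hooked = timeit.timeit(lambda: pm.hook.myhook(x=1), number=100_000)  # hook calls take keyword arguments
print(f"direct call: {direct:.3f}s   via pluggy: {hooked:.3f}s")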

@boxed
Contributor

boxed commented Apr 24, 2020

I realized I had run this test in a rather dirty venv that had the django plugin. I made a fresh venv and got Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1.

6 seconds with tests passing. Big improvement from 11s! Still bad compared to hammett's 2s, but at least the same order of magnitude.
39.2 seconds when all tests fail! That's a massive performance degradation compared to 4.6.2.

Assertion rewriting on or off has a small effect. It's very obvious that the degradation in performance is due to the printing at the end.

Hammett also prints local variables, so that's not it.

@bluetech
Member

Assertion rewriting on or off has a small effect. It's very obvious that the degradation in performance is due to the printing at the end.

All tests failing is probably not a very common case, but 39.2s does sound excessive. If you have time to do a cProfile of this scenario, I'd be interested. I'd also be interested to know how long it takes with >& /dev/null (which should remove the terminal overhead, and also internal pytest overhead when outputting to a tty).
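
For reference, one way to capture and inspect such a profile with just the standard library (the file name here is arbitrary):

# capture:  python -m cProfile -o pytest.prof -m pytest path/to/tests
# then inspect the hottest functions:
import pstats

stats = pstats.Stats("pytest.prof")
stats.sort_stats("cumulative").print_stats(30)  # top 30 entries by cumulative time
stats.sort_stats("tottime").print_stats(30)     # top 30 by time spent inside the function itself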

@boxed
Contributor

boxed commented Apr 26, 2020

I tried > /dev/null and it made no difference.

As for a cProfile: it's simple enough to reproduce; it's not specific to my machine.

@boxed
Contributor

boxed commented Apr 27, 2020

I wrote up a little benchmark suite: https://github.com/boxed/test-benchmarks

@joekohlsdorf

joekohlsdorf commented Jun 24, 2020

Neither nose2 nor hammett collects a list of tests before running; they just collect a list of files containing tests (for nose2, based on filename heuristics).
If you want a comparison of test runtime, I suggest you pass a list of tests and view test collection as a separate problem.

From some experience writing our own collector to work around pytest collector slowness, I doubt they are capable of finding all the types of tests that pytest supports.

@boxed
Contributor

boxed commented Jun 24, 2020

What do you mean, "don't collect tests"? Hammett finds all pytest tests, fixtures and parametrized tests and runs them all. What do you mean by "collect" being more than that?

And also note that the benchmarks contain an empty directory. What is pytest "collecting" that justifies it taking so much more time doing nothing?

I can make a benchmark for explicitly passing the full list of tests. Shall we say 100? Or just one? Hammett will run circles around pytest on that too; I will gladly put down a $100 bet on this before having written the benchmark.

Let's be serious though: the benchmarks are not super strange, and they show a real problem in pytest. There is no point in trying to hide simple facts.

@kapilt

kapilt commented Aug 24, 2020

Just as an observation point: while working on a pytest plugin (pytest-terraform) and using xdist on a many-core machine, I've noticed that pytest does some fairly invasive/expensive introspection on any Python module - basically a getattr on everything it can find in a module, trying to find plugins and fixtures. Looking at some of the perf traces, this appears to cause significant call counts and time accumulation at runtime; with xdist this gets multiplied, and pytest collection appears to dominate (60-70%) the overall test execution time. As a first step it might be nice to have some instrumentation to accurately capture the cost of plugin scanning (it's unclear whether that can be done out of tree via a plugin or needs to be in core). Another step would be reworking the pluggy hook and fixture mechanism to populate a module global, so that lookups are explicit instead of implicit. A speed-infrastructure setup would also be useful to prevent significant regressions over time.
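
Roughly, the implicit scan described above looks like the first function below, and the "module global" idea like the second; both are illustrative sketches with made-up attribute names, not pytest internals:

import types

def implicit_scan(module: types.ModuleType) -> list:
    # getattr every name in the module and duck-type it as a possible fixture/hook
    found = []
    for name in dir(module):
        obj = getattr(module, name, None)
        if callable(obj) and getattr(obj, "_is_fixture", False):  # made-up marker attribute
            found.append(obj)
    return found

def explicit_registry(module: types.ModuleType) -> list:
    # only look at the names the module explicitly listed in a module-level global
    return [getattr(module, name) for name in getattr(module, "PYTEST_FIXTURES", ())]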

@nicoddemus
Member

@kapilt thanks for the comment. The timing of your comment is uncanny, because a few days ago @bluetech opened a PR in the direction of reducing the amount of introspection done at collection time: #7671.

@kapilt

kapilt commented Aug 24, 2020

I started playing around with PYTEST_DEBUG (didn't realize it existed when I wrote the previous comment) and with py-spy on test collection time (py-spy record -r 1000 --format speedscope -o test -- pytest tests/*py -k nothing_here). Going by the PYTEST_DEBUG output, the other significant portion of time seems to be spent on directory traversal and examination of non-Python files. For context, I'm running this on the tests for https://github.com/cloud-custodian/cloud-custodian; we have a significant amount of JSON in subdirectories of our test structure representing recorded outputs, and simply manipulating the parameters results in a 2/3 decrease in collection time.

I.e. the delta between running pytest tests and pytest tests/*py:

cloud-custodian on  upstream [!?] is 📦 v0.9.5 via 🐍 v3.8.5 (c7n-3.8) on ☁️  us-east-1
❯ time pytest tests/test*py -k nothing_here
Test session starts (platform: linux, Python 3.8.5, pytest 5.3.5, pytest-sugar 0.9.4)
rootdir: /home/kapilt/projects/cloud-custodian, inifile: tox.ini
plugins: xdist-1.34.0, sugar-0.9.4, forked-1.3.0, cov-2.10.0
collecting ...

Results (0.99s):
    1625 deselected

real    0m1.786s
user    0m1.705s
sys     0m0.084s

cloud-custodian on  upstream [!?] is 📦 v0.9.5 via 🐍 v3.8.5 (c7n-3.8) on ☁️  us-east-1
❯ time pytest tests -k nothing_here
Test session starts (platform: linux, Python 3.8.5, pytest 5.3.5, pytest-sugar 0.9.4)
rootdir: /home/kapilt/projects/cloud-custodian, inifile: tox.ini
plugins: xdist-1.34.0, sugar-0.9.4, forked-1.3.0, cov-2.10.0
collecting ...

Results (4.63s):
    1625 deselected

real    0m5.431s
user    0m5.097s
sys     0m0.332s

Actually running the tests (instead of just collecting) with xdist -n auto on an AMD Ryzen 4000 laptop (8 cores / 16 hardware threads) results in a real-world test time reduction from 38s to 25s.

The PYTEST_DEBUG output showing the plugin processing of other data files in the same directory:

            pytest_collect_file [hook]                                                                                                                                                        
                path: /home/kapilt/projects/cloud-custodian/tests/data/iam/s3-policies.json
                parent: <Session cloud-custodian exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=0>
            finish pytest_collect_file --> [] [hook]    

For plugins that want to load files, it might be better to have some explicit file-extension registration on the session, rather than dispatching on every file found.

@bluetech
Member

@kapilt pytest "has" to dispatch the hook on these files, because there might be a plugin which turns them into a test; check out https://docs.pytest.org/en/stable/example/nonpython.html for an example.

In cloud-custodian it seems all of the data files are nicely encapsulated in one directory, so you might want to tell pytest to skip it. There are amusingly many ways to do it: --ignore, --ignore-glob, the norecursedirs ini option, a collect_ignore global in a conftest, and the pytest_ignore_collect hook in a conftest -- take your pick :)
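
As a concrete illustration of two of those, a conftest.py at the root of the test tree could look like this; the "data" directory name follows cloud-custodian's layout described above, and the hook signature shown is the py.path-based one used by pytest 5.x/6.x (newer versions pass a pathlib.Path):

# conftest.py

# option 1: a plain module-level list; pytest will not collect anything under it
collect_ignore = ["data"]

# option 2: decide dynamically via a hook
def pytest_ignore_collect(path, config):
    if path.basename == "data":
        return True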

@boxed
Contributor

boxed commented Aug 24, 2020

That non-Python-files feature seems like a perfect example of something that should be opt-in, so that not everyone gets the performance hit.

@kapilt

kapilt commented Aug 24, 2020

@bluetech thanks for the pointers re: config options; however, the plethora of overlapping options around this is indicative that many people have had this issue. While I love the configurability/pluggability of pytest, ideally features should not impose significant costs on those not using them, i.e. zero-cost abstractions. In this context, I would hope that having plugins register their interest in file extensions in an early hook would allow fast, non-plugin-based filtering at the session level when collecting files, with a default of just .py files - i.e., zen of Python: explicit is better than implicit.

@bluetech
Member

I kinda agree, but that would be a breaking change at this point, so my preference is to just make it fast instead :)

From the snippet above I see you are using pytest 5.3.5. There have been several performance improvements since then. It would be great if you could test with the latest release, or even pytest master, or even with #7671:

pip install git+https://github.com/pytest-dev/pytest.git@refs/pull/7671/head

@kapilt

kapilt commented Aug 24, 2020

Thanks, I appreciate the pitfalls of compatibility here. We had that version pinned due to issues with pytest-sugar (output-writer compat), along with pytest-xdist and pytest-terraform (both of which needed the tmp factory root location).

Trying #7671 vs trunk, via time pytest tests -k foobar with norecursedirs set, shows a minor improvement (maybe 2%; it's within test-run delta margins), albeit it significantly cleans up the call tree when visualized, as well as the debug output, so it's a net win regardless of perf imho.
(attached flame chart: pytest-7671)

@boxed
Contributor

boxed commented Aug 25, 2020

I think pytest is overly concerned with backwards compatibility. The performance problems are much worse than breaking backwards compatibility, especially if the change just means users need to add some config in setup.cfg (or pyproject.toml, or whatever the flavor of the month is... the need to check multiple files seems like a problem too, but a smaller one :P)

@nicoddemus
Member

@kapilt

Thanks for the performance benchmarks!

the delta between running pytest tests and pytest tests/*py

Indeed it is a bit surprising, but at a cursory glance I see that your test files are all in the root tests directory, and you have many, many other support files inside tests/data, including .py files.

I don't think having pytest call hooks for only some file extensions would really help much in this case; hook calls are supposed to be cheap, and we still need to traverse the directories and list the files anyway.

For this specific case I think the best bet is to use norecursedirs to stop pytest from even recursing into tests/data at all:

[pytest]
norecursedirs=data

This simple configuration change should speed up your specific case significantly.

@boxed

I think pytest is overly concerned with backwards compatibility.

Not sure, I think it is the responsible thing to do given that pytest is part of the infrastructure of so many projects and users.

Having said that, I agree that if we can speed things up by introducing a backward-incompatible change that can be reverted in a simple manner (say, a config option, or an extra hook for plugins to implement), we just need to make sure the change brings real gains to justify the breakage, and do it in the least disruptive way possible.

@boxed
Contributor

boxed commented Aug 26, 2020

@nicoddemus

Not sure, I think it is the responsible thing to do given that pytest is part of the infrastructure of so many projects and users.

Depends on your perspective. Yes, it might introduce some breakage which will take some time to fix, but breakage+fixing is O(1), while the per-invocation overhead is O(n), sometimes even O(n*m). It's not at all obvious which one is bigger.

we just need to make sure the change brings real gains to justify the breakage, and do it in the least disruptive way possible.

Agreed.

@nicoddemus
Member

Depends on your perspective. Yes, it might introduce some breakage which will take some time to fix, but breakage+fixing is O(1), while the per-invocation overhead is O(n), sometimes even O(n*m). It's not at all obvious which one is bigger.

Indeed there's a balance. 👍

@cosmith

cosmith commented Apr 20, 2021

Adding one data point here: running a single test on our django codebase takes about 25 seconds.

>>> pytest --no-migrations --no-cov -p no:warnings -k test_do_nothing
Test session starts (platform: linux, Python 3.8.6, pytest 6.2.2, pytest-sugar 0.9.4)
django: settings: xxx.settings.pytest (from env)
rootdir: /dev-app, configfile: setup.cfg
plugins: timeout-1.4.2, django-4.1.0, xdist-2.2.1, forked-1.3.0, requests-mock-1.8.0, env-0.6.2, subtests-0.4.0, cov-2.11.1, sugar-0.9.4, ddtrace-0.48.0, Faker-6.5.0, celery-4.4.7
timeout: 90.0s
timeout method: signal
timeout func_only: False
collecting ...
 backend/xxx/search/tests.py ✓                                                                                                                                              100% ██████████

Results (24.64s):
       1 passed
    1729 deselected

Running the same but narrowing the search path shaves a good 10 seconds off that:

>>> pytest --no-migrations --no-cov -p no:warnings xxx/*/tests/ xxx/*/tests.py -k test_do_nothing
Test session starts (platform: linux, Python 3.8.6, pytest 6.2.2, pytest-sugar 0.9.4)
django: settings: xxx.settings.pytest (from env)
rootdir: /dev-app, configfile: setup.cfg
plugins: timeout-1.4.2, django-4.1.0, xdist-2.2.1, forked-1.3.0, requests-mock-1.8.0, env-0.6.2, subtests-0.4.0, cov-2.11.1, sugar-0.9.4, ddtrace-0.48.0, Faker-6.5.0, celery-4.4.7
timeout: 90.0s
timeout method: signal
timeout func_only: False
collecting ...
 backend/xxx/search/tests.py ✓                                                                                                                                              100% ██████████

Results (15.82s):
       1 passed
    1729 deselected

Selecting a test that doesn't exist shaves off another 10 to 12 seconds, so it would seem that this does not come from our own code:

pytest --no-migrations --no-cov -p no:warnings xxx/*/tests/ xxx/*/tests.py -k test_i_dont_exist
Test session starts (platform: linux, Python 3.8.6, pytest 6.2.2, pytest-sugar 0.9.4)
django: settings: xxx.settings.pytest (from env)
rootdir: /dev-app, configfile: setup.cfg
plugins: timeout-1.4.2, django-4.1.0, xdist-2.2.1, forked-1.3.0, requests-mock-1.8.0, env-0.6.2, subtests-0.4.0, cov-2.11.1, sugar-0.9.4, ddtrace-0.48.0, Faker-6.5.0, celery-4.4.7
timeout: 90.0s
timeout method: signal
timeout func_only: False
collecting ...

Results (3.32s):
    1730 deselected

[EDIT] After discussing with my coworkers, it seems that not everyone sees the same behavior. It seems to be this slow only on macOS, but not for everybody...

@holmanb
Contributor

holmanb commented Jan 18, 2022

Adding another data point:

Python 3.9.7, pytest 6.0.2

python -m cProfile -o perf.out -m pytest  --collect-only

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      897    3.544    0.004    3.544    0.004 {built-in method builtins.compile}
1097772/553    1.194    0.000    3.050    0.006 /usr/lib/python3.9/ast.py:409(generic_visit)
1098325/553    0.520    0.000    3.051    0.006 /usr/lib/python3.9/ast.py:403(visit)
  2997174    0.495    0.000    0.648    0.000 /usr/lib/python3.9/ast.py:244(iter_fields)
3404940/3404780    0.451    0.000    0.461    0.000 {built-in method builtins.getattr}
  4095206    0.326    0.000    0.332    0.000 {built-in method builtins.isinstance}
   472384    0.240    0.000    0.316    0.000 /usr/lib/python3/dist-packages/_pytest/nodes.py:78(ischildnode)
      553    0.121    0.000    6.839    0.012 /usr/lib/python3/dist-packages/_pytest/_code/source.py:121(findsource)
   144778    0.114    0.000    0.459    0.000 /usr/lib/python3.9/ast.py:419(visit_Constant)
    21524    0.112    0.000    0.437    0.000 /usr/lib/python3/dist-packages/_pytest/fixtures.py:1692(_matchfactories)

Examining the hot path, it looks like recompiling the tests dominates the test-collection time, and walking the AST while rewriting assertions is also significant.

Has caching the modules (after they've been rewritten) to disk for reuse been considered? I haven't given it much thought, but it seems plausible at first consideration. I assume that would turn subsequent collection into primarily a lookup operation (except for modules that have changed, of course).

@RonnyPfannschmidt
Member Author

@holmanb recompiled modules are already cached in .pyc files, with a pytest tag in the name to denote the difference
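
Those tagged caches can be seen on disk; the exact file-name format varies with the Python and pytest versions, but looks roughly like test_foo.cpython-39-pytest-6.2.5.pyc inside __pycache__:

import pathlib

for pyc in pathlib.Path(".").rglob("__pycache__/*pytest*.pyc"):
    print(pyc)  # assertion-rewritten bytecode caches written by pytest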

@boxed
Contributor

boxed commented Jan 18, 2022

Of course caching doesn't really help in CI setups that rebuild from scratch anyway - for example on GitHub Actions.
