Hack traversals for speed #110

inducer · 2022-09-23T18:27:24Z

There's actually a path towards merging this now.

- It would need tests.
- The mapper optimizer needs a way to inline the cache key, so that we don't have to hardcode it.
- ~~The mapper optimizer needs documentation.~~ (Decided against this; it requires a fair bit of care to use, and the sharp edges may not be obvious to users. Added a comment.)

This all started with a contest set up by @kaushikcfd to show that we should all abandon Python for Rust (or some other AOT-compiled, statically typed language).

Here's my fork of @kaushikcfd's gist: https://gist.github.com/inducer/25e3372fd5c82384d437c777ed4153e9

Latest results:

Starting point for me was 5.18s, which is @kaushikcfd's original with the print statements removed.
I'm currently at 1.6s. I am cheating and giving myself credit for the Py3.10 -> Py3.11 transition, but since this is all about changing languages, I guess fair is fair.

inducer · 2022-09-23T18:47:50Z

PyPy is at 0.90s.

kaushikcfd · 2022-09-23T19:00:47Z

For posterity, could you also report the runtime of the Rust port on your machine?

```sh
cargo new --bin symoxide_contest
cd symoxide_contest
cargo add symoxide
curl https://gist.githubusercontent.com/kaushikcfd/74c442a075557dad466cd3daea9c151f/raw/d593ce7d5a6de6764e541921472456c8528efb3b/symoxide_expr_traversal.rs > src/main.rs
cargo run --release
```

inducer · 2022-09-23T19:18:43Z

Just ran this: somewhere around 0.28s.

```
nightly-x86_64-unknown-linux-gnu (default)
rustc 1.66.0-nightly (432abd86f 2022-09-20)
```

inducer · 2022-09-23T19:28:48Z

So pypy times seem super variable. Just reran, with

```
Python 3.8.13 (7.3.9+dfsg-4, Aug 09 2022, 12:51:24)
[PyPy 7.3.9 with GCC 12.1.0]
```

(nominally unchanged from before) and got 0.63s.

inducer · 2022-09-23T19:31:22Z

I have an idea for a Python-side AST-to-AST transform that would eliminate the bounces to rec in most cases by effectively inlining the common case of rec. IDK whether I'll be desperate enough to do it this weekend, but I think it might help, because it would cut the number of frames set up in a pymbolic traversal by roughly a factor of 2.
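A toy sketch of what "inlining the common case of rec" could look like as an AST-to-AST transform, using Python's `ast` module. This is not the transform pymbolic ended up with: `Sum`, `Variable`, `CountingMapper`, `InlineRec` and `build_map_sum` are hypothetical names, and the rewrite only touches `self.rec(x)` calls with a single name or attribute argument, on the assumption that evaluating that argument twice is side-effect free.

```python
import ast
import copy


class Variable:
    """Toy leaf node (hypothetical, not pymbolic's class)."""
    mapper_method = "map_variable"

    def __init__(self, name):
        self.name = name


class Sum:
    """Toy interior node (hypothetical)."""
    mapper_method = "map_sum"

    def __init__(self, children):
        self.children = children


class InlineRec(ast.NodeTransformer):
    """Rewrite self.rec(x) into getattr(self, x.mapper_method)(x).

    Only single-argument calls whose argument is a plain name or attribute
    access are rewritten, since the argument is evaluated twice afterwards.
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "rec"
                and isinstance(func.value, ast.Name) and func.value.id == "self"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], (ast.Name, ast.Attribute))):
            arg = node.args[0]
            method = ast.Call(
                func=ast.Name(id="getattr", ctx=ast.Load()),
                args=[ast.Name(id="self", ctx=ast.Load()),
                      ast.Attribute(value=copy.deepcopy(arg),
                                    attr="mapper_method", ctx=ast.Load())],
                keywords=[])
            return ast.copy_location(
                ast.Call(func=method, args=[arg], keywords=[]), node)
        return node


MAP_SUM_SRC = """
def map_sum(self, expr):
    return 1 + sum(self.rec(child) for child in expr.children)
"""


def build_map_sum(inline):
    """Compile MAP_SUM_SRC, optionally after inlining rec."""
    tree = ast.parse(MAP_SUM_SRC)
    if inline:
        tree = ast.fix_missing_locations(InlineRec().visit(tree))
    namespace = {}
    exec(compile(tree, "<mapper>", "exec"), namespace)
    return namespace["map_sum"]


class CountingMapper:
    """Counts nodes; rec() dispatches on mapper_method."""

    def rec(self, expr):
        return getattr(self, expr.mapper_method)(expr)

    def map_variable(self, expr):
        return 1


class PlainMapper(CountingMapper):
    map_sum = build_map_sum(inline=False)


class InlinedMapper(CountingMapper):
    # Each child is dispatched directly, skipping the rec() frame.
    map_sum = build_map_sum(inline=True)


expr = Sum([Variable("x"), Sum([Variable("y"), Variable("z")])])
print(PlainMapper().rec(expr), InlinedMapper().rec(expr))  # 5 5
print(ast.unparse(InlineRec().visit(ast.parse(MAP_SUM_SRC))))
```

Both mappers compute the same result; the inlined one simply sets up one fewer Python frame per visited child.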

kaushikcfd · 2022-09-24T20:05:03Z

> So pypy times seem super variable. Just reran, with

Maybe the JIT costs were included in the earlier run? Also, not sure if tracing JIT is mature enough to compete with AOT compilers for such workloads in terms of both compilation overheads and quality of generated code.

inducer · 2022-09-24T20:15:14Z

> not sure if tracing JIT is mature enough to compete with AOT compilers

What does "mature" mean to you here? For me, I'm interested in correctness and speed, in that order. 🙂

kaushikcfd · 2022-09-24T20:25:28Z

> What does "mature" mean to you here? For me, I'm interested in correctness and speed, in that order. 🙂

Correctness is the bare minimum¹. I was mainly referring to the JIT overheads (number of recompilations and compilation costs).

¹ Not saying it is trivial :)

inducer · 2022-09-25T18:04:28Z

The "mapper optimizer" is now a thing. It succeeds at removing passed-through *args, **kwargs when those aren't needed, and that's clearly profitable.
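A hypothetical sketch of the `*args`/`**kwargs` stripping described here, again with Python's `ast` module; it is not the pymbolic mapper optimizer's code, and `strip_passthrough`, `only_passed_through` and the sample sources are made up for illustration. The key assumption is that the check has to be mapper-wide: a parameter may only be dropped if *every* method merely forwards it to `self.rec`, because a single method that reads `kwargs` would otherwise stop receiving it.

```python
import ast


def is_rec_call(node):
    """True for calls of the form self.rec(...)."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "rec"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "self")


def forwarded(item, name):
    """True if item is `*name` in a call's args or `**name` in its keywords."""
    if isinstance(item, ast.Starred):
        return isinstance(item.value, ast.Name) and item.value.id == name
    if isinstance(item, ast.keyword):
        return (item.arg is None and isinstance(item.value, ast.Name)
                and item.value.id == name)
    return False


def only_passed_through(func, name):
    """True if every use of `name` in func is a *name/**name inside self.rec()."""
    ok = set()
    for node in ast.walk(func):
        if is_rec_call(node):
            for item in node.args + node.keywords:
                if forwarded(item, name):
                    ok.add(id(item.value))
    return all(id(n) in ok for n in ast.walk(func)
               if isinstance(n, ast.Name) and n.id == name)


def strip_passthrough(module):
    """Drop *args/**kwargs from all methods if no method actually uses them."""
    funcs = [n for n in module.body if isinstance(n, ast.FunctionDef)]
    for kind in ("vararg", "kwarg"):
        params = [(f, getattr(f.args, kind)) for f in funcs]
        if not all(p is None or only_passed_through(f, p.arg) for f, p in params):
            continue  # some method reads them: keep them everywhere
        for f, p in params:
            if p is None:
                continue
            setattr(f.args, kind, None)
            for node in ast.walk(f):
                if is_rec_call(node):
                    node.args = [a for a in node.args if not forwarded(a, p.arg)]
                    node.keywords = [k for k in node.keywords
                                     if not forwarded(k, p.arg)]
    return module


FORWARDING_ONLY = '''
def map_sum(self, expr, *args, **kwargs):
    return 1 + sum(self.rec(ch, *args, **kwargs) for ch in expr.children)

def map_variable(self, expr, *args, **kwargs):
    return 1
'''

KWARGS_USED = FORWARDING_ONLY.replace("return 1\n", 'return kwargs.get("weight", 1)\n')

stripped = strip_passthrough(ast.parse(FORWARDING_ONLY))
kept = strip_passthrough(ast.parse(KWARGS_USED))
print(ast.unparse(stripped))  # no *args/**kwargs left anywhere
print(ast.unparse(kept))      # *args dropped, **kwargs kept in both methods
```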

It also contains a code path for inlining self.rec and/or the cache retrieval, but those are curiously unprofitable. Not sure I understand why, given that I thought we were bound by frame setup.
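To make "inlining the cache retrieval" concrete, here is a toy cached mapper in which `rec()` memoizes by expression, next to a variant whose `map_sum` performs the same lookup and dispatch at the call site. All names are hypothetical, and keying the cache on the expression alone is an assumption; a real cached mapper may also need the extra arguments in its key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    mapper_method = "map_variable"  # class attribute, not a dataclass field


@dataclass(frozen=True)
class Sum:
    children: tuple
    mapper_method = "map_sum"


class CachedCounter:
    """Counts nodes; rec() memoizes results keyed on the (hashable) expression."""

    def __init__(self):
        self._cache = {}

    def rec(self, expr):
        try:
            return self._cache[expr]
        except KeyError:
            result = getattr(self, expr.mapper_method)(expr)
            self._cache[expr] = result
            return result

    def map_variable(self, expr):
        return 1

    def map_sum(self, expr):
        return 1 + sum(self.rec(child) for child in expr.children)


class InlinedCachedCounter(CachedCounter):
    def map_sum(self, expr):
        # Same logic as rec(), written out per child, so no rec() frame is
        # created for children.
        cache = self._cache
        total = 1
        for child in expr.children:
            try:
                value = cache[child]
            except KeyError:
                value = getattr(self, child.mapper_method)(child)
                cache[child] = value
            total += value
        return total


x, y = Variable("x"), Variable("y")
expr = Sum((x, Sum((x, y))))
plain, inlined = CachedCounter(), InlinedCachedCounter()
print(plain.rec(expr), inlined.rec(expr), len(inlined._cache))  # 5 5 4
```

Whether the inlined form wins has to be measured (for example with `timeit`); as noted above, it did not turn out to be profitable here.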

inducer · 2022-09-25T18:17:58Z

The doc failure is sphinx-doc/sphinx#10861.

kaushikcfd · 2022-09-26T19:45:28Z

Bumped the baseline by a bit, see kaushikcfd/symoxide#3 :).

inducer · 2022-10-01T20:32:38Z

The final tally here:

- Py3.11 running the latest version takes 1.57s.
- Py3.11 without the optimizer takes 2.76s.
- Py3.10 with the optimizer takes 2.26s.
- Py3.11 running the benchmark from "Add Kaushik's traversal benchmark" (which has a few improvements for which we shouldn't credit the infrastructure) on 3da6f85 (close to current main) takes 3.9s.
- Py3.10 running the same thing takes 4.19s.
- PyPy 7.3.9 (3.8) takes 0.57s, without taking advantage of the mapper optimizer.
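The relative speedups implied by the tally can be worked out directly. This treats each figure as a single measurement (the thread notes run-to-run variance, at least for PyPy) and reads "without the optimizer" as the latest infrastructure minus the optimizer; the dictionary keys are made-up labels.

```python
# Times in seconds, copied from the tally above.
TIMES = {
    "py311_latest": 1.57,
    "py311_no_optimizer": 2.76,
    "py310_latest": 2.26,
    "py311_3da6f85": 3.9,
    "py310_3da6f85": 4.19,
    "pypy_no_optimizer": 0.57,
}


def speedup(before: str, after: str) -> float:
    """How many times faster `after` is than `before`."""
    return TIMES[before] / TIMES[after]


comparisons = [
    ("whole PR, Py3.11", "py311_3da6f85", "py311_latest"),
    ("mapper optimizer alone, Py3.11", "py311_no_optimizer", "py311_latest"),
    ("other changes in this PR, Py3.11", "py311_3da6f85", "py311_no_optimizer"),
    ("whole PR, Py3.10", "py310_3da6f85", "py310_latest"),
    ("Py3.10 -> Py3.11, latest version", "py310_latest", "py311_latest"),
]
for label, before, after in comparisons:
    print(f"{label}: {speedup(before, after):.2f}x")
```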

There is a loopy failure, but that's just a doctest failure, because this PR turns off flatten-by-default in the IdentityMapper.
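For readers unfamiliar with the term, a toy illustration of what flattening in an identity traversal means: with flattening, a nested sum is spliced into its parent; without it, the tree comes back unchanged. This is not pymbolic's IdentityMapper; `Sum` and `identity` are hypothetical, and it only presumably mirrors why a doctest that prints a rebuilt expression would change its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sum:
    children: tuple


def identity(expr, flatten):
    """Rebuild expr; if flatten is set, splice nested sums into their parent."""
    if not isinstance(expr, Sum):
        return expr  # leaves (here: plain strings) are returned as-is
    children = []
    for child in expr.children:
        new_child = identity(child, flatten)
        if flatten and isinstance(new_child, Sum):
            children.extend(new_child.children)
        else:
            children.append(new_child)
    return Sum(tuple(children))


nested = Sum(("a", Sum(("b", "c"))))
print(identity(nested, flatten=True))   # Sum(children=('a', 'b', 'c'))
print(identity(nested, flatten=False))  # Sum(children=('a', Sum(children=('b', 'c'))))
```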


inducer force-pushed the need-for-speed branch 5 times, most recently from 8ac5423 to 472862e, on September 25, 2022 18:01

inducer force-pushed the need-for-speed branch 2 times, most recently from 457ce74 to 79f21b0, on October 1, 2022 20:24

inducer added 7 commits October 1, 2022 15:34

- Downstream CI: do not fail fast (bac0a7c)
- Bump compat target to 3.8 (for walruses) (b16d9c1)
- Don't flatten in IdentityMapper (7c5fc96)
- Use faster main dispatch, inline into CachedMapper, use single cache key scheme (9fb914c)
- Add Kaushik's traversal benchmark (7dc7cb1)
- Add mapper optimizer (5372b07)
- Use mapper optimizer in traversal benchmark (8c65f8b)

inducer force-pushed the need-for-speed branch from 79f21b0 to 8c65f8b on October 1, 2022 20:36

inducer marked this pull request as ready for review October 1, 2022 20:36

inducer mentioned this pull request on Oct 1, 2022 in: Search for parent class' mapper_method in the Mapper #64 (merged)

inducer merged commit 3f301c5 into main Oct 1, 2022

inducer deleted the need-for-speed branch October 1, 2022 22:29

inducer mentioned this pull request on Oct 2, 2022 in: GCC 11 and 12 have pathological compile time on test_fuzz_expression_code_gen output inducer/loopy#686 (closed)
