New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a PUSHENVACCMANY opcode. #18964
base: master
Are you sure you want to change the base?
Add a PUSHENVACCMANY opcode. #18964
Conversation
@coqbot bench |
@coqbot run full ci |
🏁 Bench results:
🐢 Top 25 slow downs┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 180.5720 184.0040 3.4320 1.90% 233 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/DecodeByExtension.v.html │ │ 76.6340 78.9300 2.2960 3.00% 365 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 216.8910 218.8720 1.9810 0.91% 103 coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html │ │ 62.5680 64.4540 1.8860 3.01% 368 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 255.1950 256.8250 1.6300 0.64% 1629 coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html │ │ 235.7040 237.2370 1.5330 0.65% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 63.0140 64.3190 1.3050 2.07% 609 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 45.9500 47.0050 1.0550 2.30% 926 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/GarageDoor.v.html │ │ 16.2470 17.2340 0.9870 6.07% 607 coq-mathcomp-odd-order/theories/PFsection9.v.html │ │ 2.6920 3.6230 0.9310 34.58% 490 coq-stdlib/Reals/Cauchy/ConstructiveCauchyRealsMult.v.html │ │ 7.0530 7.9560 0.9030 12.80% 881 coq-vst/veric/binop_lemmas4.v.html │ │ 47.9980 48.8710 0.8730 1.82% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 50.4160 51.2510 0.8350 1.66% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 134.7340 135.4950 0.7610 0.56% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 38.4040 39.1310 0.7270 1.89% 224 coq-performance-tests-lite/PerformanceExperiments/rewrite_lift_lets_map.v.html │ │ 39.3170 40.0250 0.7080 1.80% 236 coq-rewriter/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 155.0040 155.7090 0.7050 0.45% 1190 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 98.2100 98.8470 0.6370 0.65% 376 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 25.5250 26.1460 0.6210 2.43% 22 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 25.5210 26.1410 0.6200 2.43% 17 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 99.9450 100.4770 0.5320 0.53% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 3.2750 3.7800 0.5050 15.42% 1060 coq-unimath/UniMath/CategoryTheory/EnrichedCats/Examples/FunctorCategory.v.html │ │ 27.0900 27.5910 0.5010 1.85% 68 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/VerifyDecode.v.html │ │ 19.9990 20.4990 0.5000 2.50% 25 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 17.5520 17.9870 0.4350 2.48% 32 coq-performance-tests-lite/src/pattern.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 257.2970 251.4590 -5.8380 -2.27% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 42.8080 39.9910 -2.8170 -6.58% 236 coq-rewriter-perf-SuperFast/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 64.5210 63.2970 -1.2240 -1.90% 609 coq-bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 12.2070 11.4890 -0.7180 -5.88% 194 coq-fiat-crypto-with-bedrock/src/Fancy/Barrett256.v.html │ │ 67.0150 66.4630 -0.5520 -0.82% 27 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ToFancyWithCasts.v.html │ │ 95.3060 94.7610 -0.5450 -0.57% 999 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 26.0150 25.4800 -0.5350 -2.06% 129 coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/Projective.v.html │ │ 17.7590 17.2620 -0.4970 -2.80% 481 coq-verdi-raft/theories/RaftProofs/EndToEndLinearizability.v.html │ │ 41.2450 40.7660 -0.4790 -1.16% 139 coq-fiat-parsers/src/Parsers/Refinement/SharpenedJSON.v.html │ │ 48.2050 47.8050 -0.4000 -0.83% 558 coq-bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │ │ 131.9040 131.5070 -0.3970 -0.30% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 95.2500 94.8720 -0.3780 -0.40% 968 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 9.7880 9.4260 -0.3620 -3.70% 87 coq-engine-bench-lite/coq/PerformanceDemos/quadratic_reduction.v.html │ │ 24.2840 23.9460 -0.3380 -1.39% 85 coq-fiat-crypto-with-bedrock/src/Curves/Montgomery/AffineProofs.v.html │ │ 17.0700 16.7740 -0.2960 -1.73% 24 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/MulSplit.v.html │ │ 23.5260 23.2300 -0.2960 -1.26% 23 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/Arith.v.html │ │ 4.4570 4.2000 -0.2570 -5.77% 199 coq-fiat-crypto-with-bedrock/src/Fancy/Montgomery256.v.html │ │ 0.6640 0.4090 -0.2550 -38.40% 2108 coq-stdlib/FSets/FMapFacts.v.html │ │ 4.9340 4.6910 -0.2430 -4.93% 320 coq-metacoq-erasure/erasure/theories/ErasureProperties.v.html │ │ 44.1920 43.9520 -0.2400 -0.54% 827 coq-vst/veric/binop_lemmas4.v.html │ │ 1.3210 1.0810 -0.2400 -18.17% 89 coq-engine-bench-lite/coq/PerformanceDemos/quadratic_reduction.v.html │ │ 0.8250 0.6170 -0.2080 -25.21% 736 coq-stdlib/Reals/Cauchy/ConstructiveCauchyReals.v.html │ │ 20.0360 19.8320 -0.2040 -1.02% 325 coq-engine-bench-lite/coq/PerformanceDemos/one_step_reduction.v.html │ │ 20.1720 19.9850 -0.1870 -0.93% 214 coq-engine-bench-lite/coq/PerformanceDemos/one_step_reduction.v.html │ │ 9.5010 9.3210 -0.1800 -1.89% 118 coq-unimath/UniMath/Semantics/LinearLogic/LiftingModel.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
The overall |
The -15% max memory usage in engine-bench-lite may be real too
|
Hard to tell. If the optimization triggers, the generated bytecode is a bit smaller, so it might delay the next garbage collection. But on the other hand, the allocation of the |
Actually, this is a semi-pointless optimization, since a look at the assembly code shows that I had not taken into account the allocation for the pair. So, despite the latest change, the code is still not allocation-free. But what is done is done, and the number of allocations is nonetheless halved. |
When closures are nested inside closures, it might happen that several consecutive values from the environment of the outer closure are pushed onto the stack in order to create the inner closure. This opcode replaces a repeated sequence of opcodes "PUSH; ENVACC" of decreasing indices.
@coqbot bench |
🏁 Bench results:
🐢 Top 25 slow downs┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 61.7760 64.9430 3.1670 5.13% 609 coq-bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 61.8080 64.6620 2.8540 4.62% 609 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 11.8360 12.4830 0.6470 5.47% 194 coq-fiat-crypto-with-bedrock/src/Fancy/Barrett256.v.html │ │ 154.8680 155.5140 0.6460 0.42% 1190 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 2.3940 2.9520 0.5580 23.31% 487 coq-stdlib/Numbers/HexadecimalFacts.v.html │ │ 4.4040 4.8600 0.4560 10.35% 733 coq-category-theory/Construction/Comma/Adjunction.v.html │ │ 44.5720 44.9270 0.3550 0.80% 827 coq-vst/veric/binop_lemmas4.v.html │ │ 0.5920 0.8800 0.2880 48.65% 160 coq-stdlib/Numbers/HexadecimalNat.v.html │ │ 33.8000 34.0670 0.2670 0.79% 839 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 1.0160 1.2830 0.2670 26.28% 88 coq-engine-bench-lite/coq/PerformanceDemos/quadratic_reduction.v.html │ │ 4.3230 4.5650 0.2420 5.60% 199 coq-fiat-crypto-with-bedrock/src/Fancy/Montgomery256.v.html │ │ 2.6590 2.8600 0.2010 7.56% 761 coq-category-theory/Construction/Comma/Adjunction.v.html │ │ 0.3100 0.5080 0.1980 63.87% 705 coq-stdlib/Numbers/HexadecimalFacts.v.html │ │ 17.6040 17.7920 0.1880 1.07% 31 coq-engine-bench-lite/coq/PerformanceDemos/pattern.v.html │ │ 13.0320 13.2170 0.1850 1.42% 1028 coq-unimath/UniMath/CategoryTheory/LocalizingClass.v.html │ │ 0.7240 0.9060 0.1820 25.14% 813 coq-stdlib/MSets/MSetRBT.v.html │ │ 25.4350 25.6080 0.1730 0.68% 12 coq-fourcolor/theories/job535to541.v.html │ │ 0.0010 0.1740 0.1730 17300.00% 157 coq-metacoq-pcuic/pcuic/theories/Syntax/PCUICReflect.v.html │ │ 5.0570 5.2210 0.1640 3.24% 1258 coq-mathcomp-odd-order/theories/PFsection9.v.html │ │ 0.2490 0.4050 0.1560 62.65% 443 coq-stdlib/Reals/Abstract/ConstructiveLimits.v.html │ │ 2.4780 2.6160 0.1380 5.57% 509 coq-metacoq-pcuic/pcuic/theories/PCUICWfUniverses.v.html │ │ 0.5100 0.6430 0.1330 26.08% 808 coq-category-theory/Construction/Comma/Adjunction.v.html │ │ 7.1650 7.2920 0.1270 1.77% 583 coq-unimath/UniMath/CategoryTheory/Monoidal/Displayed/Symmetric.v.html │ │ 3.8590 3.9840 0.1250 3.24% 61 coq-fiat-crypto-with-bedrock/rupicola/src/Rupicola/Examples/L64X128.v.html │ │ 0.2170 0.3370 0.1200 55.30% 637 coq-stdlib/Reals/Abstract/ConstructiveSum.v.html │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 220.2730 213.8020 -6.4710 -2.94% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 241.8720 235.7000 -6.1720 -2.55% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 83.6260 80.0150 -3.6110 -4.32% 365 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 138.6460 135.0940 -3.5520 -2.56% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 69.2210 66.3670 -2.8540 -4.12% 368 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 257.9050 255.5070 -2.3980 -0.93% 1629 coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html │ │ 130.9410 128.6240 -2.3170 -1.77% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 222.1250 220.3510 -1.7740 -0.80% 103 coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html │ │ 39.9690 38.4870 -1.4820 -3.71% 224 coq-performance-tests-lite/PerformanceExperiments/rewrite_lift_lets_map.v.html │ │ 40.7940 39.3660 -1.4280 -3.50% 236 coq-rewriter-perf-SuperFast/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 40.7680 39.3620 -1.4060 -3.45% 236 coq-rewriter/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 101.4070 100.2060 -1.2010 -1.18% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 17.5580 16.7380 -0.8200 -4.67% 607 coq-mathcomp-odd-order/theories/PFsection9.v.html │ │ 46.3640 45.5920 -0.7720 -1.67% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 44.0130 43.2770 -0.7360 -1.67% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.7690 11.0580 -0.7110 -6.04% 2103 coq-mathcomp-ssreflect/mathcomp/ssreflect/order.v.html │ │ 27.7450 27.1270 -0.6180 -2.23% 68 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/VerifyDecode.v.html │ │ 37.7430 37.1490 -0.5940 -1.57% 3 coq-fiat-crypto-with-bedrock/src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.v.html │ │ 9.9040 9.3550 -0.5490 -5.54% 87 coq-engine-bench-lite/coq/PerformanceDemos/quadratic_reduction.v.html │ │ 13.3830 12.8600 -0.5230 -3.91% 22 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 37.2540 36.7320 -0.5220 -1.40% 139 coq-fiat-parsers/src/Parsers/Refinement/SharpenedJSON.v.html │ │ 13.4220 12.9010 -0.5210 -3.88% 17 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 22.0330 21.5500 -0.4830 -2.19% 376 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/compiler/src/compiler/FlatToRiscvMetric.v.html │ │ 20.3040 19.8390 -0.4650 -2.29% 214 coq-engine-bench-lite/coq/PerformanceDemos/one_step_reduction.v.html │ │ 20.0680 19.6170 -0.4510 -2.25% 325 coq-engine-bench-lite/coq/PerformanceDemos/one_step_reduction.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
Looks like the memory reduction of coq-engine-bench-lite is reproducible. Apart from that, the leading vo size reductions are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple and good enough for me.
@silene if you're happy with the current state of the PR I can merge it. |
When closures are nested inside closures, it might happen that several consecutive values from the environment of the outer closure are pushed onto the stack in order to create the inner closure. This opcode replaces a sequence of repeated "PUSH; ENVACC" of decreasing indices.