-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track unused constant arguments in Genlambda and erase them. #19014
Conversation
@coqbot bench |
🏁 Bench results:
🐢 Top 25 slow downs┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 95.8350 98.1720 2.3370 2.44% 376 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 19.6590 21.9630 2.3040 11.72% 2061 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/compiler/src/compiler/FlatToRiscvFunctions.v.html │ │ 46.4290 48.5100 2.0810 4.48% 558 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │ │ 46.7060 48.5860 1.8800 4.03% 558 coq-bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │ │ 3.7590 4.8640 1.1050 29.40% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 72.4010 73.4610 1.0600 1.46% 905 coq-unimath/UniMath/ModelCategories/Generated/LNWFSCocomplete.v.html │ │ 38.3090 39.2210 0.9120 2.38% 338 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 44.1800 44.9290 0.7490 1.70% 827 coq-vst/veric/binop_lemmas4.v.html │ │ 256.3580 257.0640 0.7060 0.28% 1629 coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html │ │ 2.3610 2.9980 0.6370 26.98% 487 coq-stdlib/Numbers/HexadecimalFacts.v.html │ │ 100.4930 101.1180 0.6250 0.62% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 2.6560 3.2640 0.6080 22.89% 34 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 2.8070 3.3790 0.5720 20.38% 490 coq-stdlib/Reals/Cauchy/ConstructiveCauchyRealsMult.v.html │ │ 30.6860 31.2520 0.5660 1.84% 147 coq-vst/veric/expr_lemmas4.v.html │ │ 67.1070 67.6310 0.5240 0.78% 27 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ToFancyWithCasts.v.html │ │ 24.6220 25.1420 0.5200 2.11% 374 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 24.6100 25.1140 0.5040 2.05% 375 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 23.0480 23.5210 0.4730 2.05% 49 coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/AffineProofs.v.html │ │ 32.9770 33.4480 0.4710 1.43% 1333 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/compiler/src/compiler/FlatToRiscvFunctions.v.html │ │ 4.7300 5.1690 0.4390 9.28% 237 coq-fiat-parsers/src/Parsers/ParserFromParserADT.v.html │ │ 180.9980 181.4340 0.4360 0.24% 233 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/DecodeByExtension.v.html │ │ 29.8030 30.2160 0.4130 1.39% 194 coq-vst/veric/expr_lemmas4.v.html │ │ 138.3620 138.7380 0.3760 0.27% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 25.6190 25.9720 0.3530 1.38% 454 coq-unimath/UniMath/SyntheticHomotopyTheory/Circle2.v.html │ │ 7.2000 7.5450 0.3450 4.79% 604 coq-unimath/UniMath/CategoryTheory/EnrichedCats/Colimits/Examples/StructureEnrichedColimits.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 260.1730 221.4150 -38.7580 -14.90% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 23.6380 0.1790 -23.4590 -99.24% 121 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 23.5700 0.1800 -23.3900 -99.24% 130 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 26.8950 13.4570 -13.4380 -49.96% 22 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 26.8760 13.4920 -13.3840 -49.80% 17 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 15.7680 3.9510 -11.8170 -74.94% 33 coq-fiat-crypto-with-bedrock/src/Curves/EdwardsMontgomery25519.v.html │ │ 12.2430 0.4340 -11.8090 -96.46% 130 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 12.0870 0.4230 -11.6640 -96.50% 121 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.3920 0.0740 -11.3180 -99.35% 178 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.2860 0.0080 -11.2780 -99.93% 179 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.5470 0.2800 -11.2670 -97.58% 181 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.4510 0.2190 -11.2320 -98.09% 166 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.2370 0.0060 -11.2310 -99.95% 162 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.2390 0.0400 -11.1990 -99.64% 161 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 21.0720 10.5410 -10.5310 -49.98% 25 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 8.8790 0.7270 -8.1520 -91.81% 78 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 10.4470 2.3200 -8.1270 -77.79% 78 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 15.3810 7.7370 -7.6440 -49.70% 99 coq-fiat-crypto-with-bedrock/src/Spec/Curve25519.v.html │ │ 50.4030 43.8540 -6.5490 -12.99% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 52.7390 46.4890 -6.2500 -11.85% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 13.9640 8.0260 -5.9380 -42.52% 12 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 16.0670 10.2420 -5.8250 -36.25% 25 coq-fiat-crypto-with-bedrock/src/Curves/EdwardsMontgomery25519.v.html │ │ 41.7570 36.8200 -4.9370 -11.82% 139 coq-fiat-parsers/src/Parsers/Refinement/SharpenedJSON.v.html │ │ 5.2310 1.2820 -3.9490 -75.49% 96 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 5.1280 1.1820 -3.9460 -76.95% 150 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
twenty largest absolute vosize reductions
There are also some vosize increases, 10 largest ones:
|
The increases are probably due to the fact we now have to store a mask for constants. I guess we could find a more compact representation, right now it's a list of booleans. But honestly I think that compared to the other numbers it's premature optimization. |
For safety, I think that this PR should go after #19015. Indeed on master the VM and native may generate slightly different lambda codes due to different inlining strategies, and this PR relies on the bound rel computation from the VM even for native. I don't think it actually matters, as native optimization is somewhat more aggressive than the VM one, so if a rel doesn't appear in the VM code it shouldn't appear either in the native one, but better safe than sorry. |
d2b9f85
to
65dde63
Compare
65dde63
to
5ecc6aa
Compare
This should be ready now. |
Just to be sure I understand. On the following code Definition comma {A B C} (f : A -> B) (g : B -> C) := fun x => g (f x).
Definition double {A} (f : A -> A) := comma f f.
Definition times4 := double (fun x => x + x). the bytecode for I feel we should go a bit further. (But perhaps this is something you already have in mind for a future pull request.) We should not even pass dummy arguments. We should create two versions of each symbol: one with only meaningful arguments, and the other one which accepts all the arguments, but passes only the meaningful ones to the first version. Then, when we encounter an application of the symbol, there are two possible cases. Either the symbol is applied to sufficiently many arguments (i.e., any extra argument would be meaningful, according to the mask), in which case the first version of the symbol would be invoked. Otherwise, the second symbol is invoked, and meaningless arguments are filled with a dummy value. This should reduce the size of the bytecode much further. The only case where it would increase is if there is a symbol with some meaningless arguments, but this symbol is never applied, in which a few opcodes would be used to create the closure of the useless version, but no opcode would ever be gained from its (non) application. |
Here it is, although you can already see the effect with Dump Lambda since the PR only touches the constr -> lambda phase. But indeed your analysis is correct.
I thought about that but decided against, at least for a start. This makes the compilation more involved, but in practice I doubt this would make much difference in the output code. Indeed, what is important for reduction is not so much the arity of closures but rather the fact that we pass huge type annotations that are completely irrelevant for the runtime behaviour. At most you'd get with your enhancement a couple of instructions less for the root irrelevant calls, but everything that is below an irrelevant type barrier doesn't matter. |
5ecc6aa
to
da3a17a
Compare
kernel/genlambda.ml
Outdated
let cb = lookup_constant kn env in | ||
begin match cb.const_body with | ||
| Primitive op -> lambda_of_prim env c op (lambda_of_args cache env sigma 0 args) | ||
| Def csubst -> (* TODO optimize if f is a proj and argument is known *) | ||
if cb.const_inline_code then lambda_of_app cache env sigma csubst args | ||
else | ||
mkLapp (Lconst (kn, u)) (lambda_of_args cache env sigma 0 args) | ||
(* Erase unused arguments *) | ||
let mask = Array.of_list mask in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should just store mask
as an array of boolean right from the start. Keeping it as a list and converting it again and again to an array is a waste of cycles and memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
We replace arguments that do not appear in the lambda body of constants by a dummy constant value. The rationale is that most arguments in symbolic code are type annotations that have no computational content, hence there is no point in generating them in the first place. This patch seems to have wonderful effects on degenerate examples from UniMath.
da3a17a
to
d9d473f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the mask has been trimmed and turned into an array, it could be interesting to rerun a bench. But it is not mandatory.
@coqbot bench |
Personally I think this will not change anything performance-wise. |
I am not really concerned about performance. I am more interested in knowing whether we still need a followup pull request that uses a bitvector instead of an array of booleans. |
Why would it matter? The mask is typically small, as it's bounded by the syntactic arity of the definition, and it is stored at toplevel in a single place for every constant. Even the process of creating it from the list at runtime for every constant found in the lambda code is probably cheap. |
It matters because some |
🏁 Bench results:
INFO: failed to install 🐢 Top 25 slow downs┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 94.5620 96.3910 1.8290 1.93% 999 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 5.4880 7.0710 1.5830 28.84% 196 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 76.2380 77.7920 1.5540 2.04% 365 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 94.7460 96.0670 1.3210 1.39% 968 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 3.7970 4.9380 1.1410 30.05% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 66.5220 67.5160 0.9940 1.49% 27 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ToFancyWithCasts.v.html │ │ 216.6140 217.5890 0.9750 0.45% 103 coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html │ │ 154.4160 155.3910 0.9750 0.63% 1190 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 181.2390 182.1060 0.8670 0.48% 233 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/DecodeByExtension.v.html │ │ 2.6510 3.4050 0.7540 28.44% 34 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 63.4400 64.1840 0.7440 1.17% 368 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 138.6420 139.2270 0.5850 0.42% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 24.3720 24.8360 0.4640 1.90% 85 coq-fiat-crypto-with-bedrock/src/Curves/Montgomery/AffineProofs.v.html │ │ 25.1660 25.5700 0.4040 1.61% 24 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/MultiRetSplit.v.html │ │ 101.1090 101.5020 0.3930 0.39% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 23.4380 23.8270 0.3890 1.66% 23 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/Arith.v.html │ │ 4.7410 5.1230 0.3820 8.06% 237 coq-fiat-parsers/src/Parsers/ParserFromParserADT.v.html │ │ 16.8240 17.1920 0.3680 2.19% 24 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/MulSplit.v.html │ │ 33.8250 34.1920 0.3670 1.08% 839 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 0.3340 0.6650 0.3310 99.10% 868 coq-stdlib/MSets/MSetRBT.v.html │ │ 242.5260 242.8410 0.3150 0.13% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 9.5990 9.8930 0.2940 3.06% 87 coq-engine-bench-lite/coq/PerformanceDemos/quadratic_reduction.v.html │ │ 17.6750 17.9580 0.2830 1.60% 32 coq-performance-tests-lite/src/pattern.v.html │ │ 12.6230 12.9040 0.2810 2.23% 1555 coq-fiat-crypto-with-bedrock/src/Util/FSets/FMapTrie.v.html │ │ 30.2910 30.5610 0.2700 0.89% 12 coq-fourcolor/theories/job165to189.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 268.2550 221.1870 -47.0680 -17.55% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 23.7330 0.1500 -23.5830 -99.37% 121 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 23.5890 0.1900 -23.3990 -99.19% 130 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 26.9430 13.5480 -13.3950 -49.72% 22 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 26.9310 13.5880 -13.3430 -49.55% 17 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 15.8110 3.9420 -11.8690 -75.07% 33 coq-fiat-crypto-with-bedrock/src/Curves/EdwardsMontgomery25519.v.html │ │ 12.2320 0.4750 -11.7570 -96.12% 130 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 12.0810 0.4970 -11.5840 -95.89% 121 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.3540 0.0080 -11.3460 -99.93% 179 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.3900 0.0780 -11.3120 -99.32% 178 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.2950 0.0060 -11.2890 -99.95% 162 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.3240 0.0450 -11.2790 -99.60% 161 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.5420 0.3090 -11.2330 -97.32% 181 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 11.4570 0.2380 -11.2190 -97.92% 166 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 21.1310 10.5970 -10.5340 -49.85% 25 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 10.4580 2.2810 -8.1770 -78.19% 78 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 8.8840 0.7280 -8.1560 -91.81% 78 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 15.3620 7.7100 -7.6520 -49.81% 99 coq-fiat-crypto-with-bedrock/src/Spec/Curve25519.v.html │ │ 53.0150 46.3100 -6.7050 -12.65% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 50.5470 44.1120 -6.4350 -12.73% 50 coq-fiat-crypto-with-bedrock/src/Bedrock/Field/Synthesis/Examples/p224_64_new.v.html │ │ 14.0070 8.0690 -5.9380 -42.39% 12 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 16.2090 10.2870 -5.9220 -36.54% 25 coq-fiat-crypto-with-bedrock/src/Curves/EdwardsMontgomery25519.v.html │ │ 41.9310 37.4040 -4.5270 -10.80% 139 coq-fiat-parsers/src/Parsers/Refinement/SharpenedJSON.v.html │ │ 5.1190 1.1740 -3.9450 -77.07% 150 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ │ 5.1610 1.2970 -3.8640 -74.87% 96 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/Field25519.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
Ten biggest increases with the last version:
|
Does that ring a bell?
|
Running the test suite as a separate package means we don't need special code to skip it when t fails, but it also means we get the test suite from master even though we're testing some other commit. So when a bug is newly fixed in master its test fails. |
@coqbot merge now |
Should this have had a changelog? It can be nice to advertise significant performance boosts. |
Maybe the VM enhancements should be mentioned in the release notes, but a single changelog seems a bit too much. |
This change does mean that Unset Guard Checking.
Fixpoint loop (x:unit) : unit := loop x.
Definition ignore {A} (x:A) := tt.
Eval vm_compute in ignore (loop tt). will behave differently (you can also replace the unsafe loop with a safe function which terminates after a long time) |
Repeat after me: CIC is call-by-name. |
We replace arguments that do not appear in the lambda body of constants by a dummy constant value. The rationale is that most arguments in symbolic code are type annotations that have no computational content, hence there is no point in generating them in the first place.
This patch seems to have wonderful effects on degenerate examples from UniMath.
Depends on #19015.