-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor the lambda compilation passes of VM and native #19015
Conversation
@coqbot bench |
🏁 Bench results:
🐢 Top 25 slow downs┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 255.9510 259.1060 3.1550 1.23% 1629 coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html │ │ 62.5600 64.9690 2.4090 3.85% 609 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 95.8370 97.5680 1.7310 1.81% 376 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 3.8200 5.0120 1.1920 31.20% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 180.7470 181.8840 1.1370 0.63% 233 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/DecodeByExtension.v.html │ │ 100.2940 101.2980 1.0040 1.00% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 16.7210 17.6920 0.9710 5.81% 607 coq-mathcomp-odd-order/theories/PFsection9.v.html │ │ 45.9960 46.9260 0.9300 2.02% 110 coq-bedrock2/bedrock2/src/bedrock2Examples/full_mul.v.html │ │ 19.6650 20.3790 0.7140 3.63% 2061 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/compiler/src/compiler/FlatToRiscvFunctions.v.html │ │ 63.1640 63.8690 0.7050 1.12% 609 coq-bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 132.4140 133.1150 0.7010 0.53% 22 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 38.2980 38.9830 0.6850 1.79% 338 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 2.6970 3.3720 0.6750 25.03% 34 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ArithWithCasts.v.html │ │ 46.2300 46.8870 0.6570 1.42% 926 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/GarageDoor.v.html │ │ 30.7910 31.4410 0.6500 2.11% 147 coq-vst/veric/expr_lemmas4.v.html │ │ 68.3920 69.0360 0.6440 0.94% 368 coq-mathcomp-odd-order/theories/PFsection4.v.html │ │ 29.9830 30.5080 0.5250 1.75% 194 coq-vst/veric/expr_lemmas4.v.html │ │ 46.0870 46.6090 0.5220 1.13% 110 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/full_mul.v.html │ │ 154.8150 155.3070 0.4920 0.32% 1190 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 23.5090 24.0000 0.4910 2.09% 49 coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/AffineProofs.v.html │ │ 11.2100 11.6990 0.4890 4.36% 2103 coq-mathcomp-ssreflect/mathcomp/ssreflect/order.v.html │ │ 21.2700 21.7340 0.4640 2.18% 376 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/compiler/src/compiler/FlatToRiscvMetric.v.html │ │ 24.5580 25.0000 0.4420 1.80% 374 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 39.5020 39.9260 0.4240 1.07% 835 coq-fiat-crypto-with-bedrock/src/Fancy/Compiler.v.html │ │ 24.5910 24.9900 0.3990 1.62% 375 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 220.2720 217.2370 -3.0350 -1.38% 103 coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html │ │ 261.1130 258.4160 -2.6970 -1.03% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 15.9140 14.3370 -1.5770 -9.91% 1204 coq-vst/floyd/Component.v.html │ │ 15.9900 14.4400 -1.5500 -9.69% 1218 coq-vst/floyd/Component.v.html │ │ 8.3490 7.3010 -1.0480 -12.55% 1216 coq-vst/floyd/Component.v.html │ │ 8.0770 7.1560 -0.9210 -11.40% 1504 coq-vst/floyd/Component.v.html │ │ 8.2330 7.3120 -0.9210 -11.19% 1217 coq-vst/floyd/Component.v.html │ │ 8.4810 7.5710 -0.9100 -10.73% 751 coq-vst/floyd/Component.v.html │ │ 8.2090 7.3720 -0.8370 -10.20% 1201 coq-vst/floyd/Component.v.html │ │ 67.1340 66.3320 -0.8020 -1.19% 27 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/ToFancyWithCasts.v.html │ │ 8.2310 7.4400 -0.7910 -9.61% 1203 coq-vst/floyd/Component.v.html │ │ 12.3220 11.5600 -0.7620 -6.18% 194 coq-fiat-crypto-with-bedrock/src/Fancy/Barrett256.v.html │ │ 7.9230 7.1780 -0.7450 -9.40% 778 coq-vst/floyd/Component.v.html │ │ 38.4730 37.7960 -0.6770 -1.76% 3 coq-fiat-crypto-with-bedrock/src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.v.html │ │ 3.7770 3.2570 -0.5200 -13.77% 490 coq-stdlib/Reals/Cauchy/ConstructiveCauchyRealsMult.v.html │ │ 17.9190 17.4250 -0.4940 -2.76% 32 coq-performance-tests-lite/src/pattern.v.html │ │ 242.3560 241.8760 -0.4800 -0.20% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │ │ 44.5640 44.1500 -0.4140 -0.93% 827 coq-vst/veric/binop_lemmas4.v.html │ │ 7.7940 7.4580 -0.3360 -4.31% 1501 coq-vst/floyd/Component.v.html │ │ 7.3990 7.0880 -0.3110 -4.20% 602 coq-unimath/UniMath/CategoryTheory/EnrichedCats/Limits/Examples/StructureEnrichedLimits.v.html │ │ 26.9220 26.6220 -0.3000 -1.11% 17 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 26.9400 26.6430 -0.2970 -1.10% 22 coq-fiat-crypto-with-bedrock/src/Spec/Test/X25519.v.html │ │ 42.1000 41.8050 -0.2950 -0.70% 139 coq-fiat-parsers/src/Parsers/Refinement/SharpenedJSON.v.html │ │ 7.1190 6.8440 -0.2750 -3.86% 604 coq-unimath/UniMath/CategoryTheory/EnrichedCats/Colimits/Examples/StructureEnrichedColimits.v.html │ │ 4.4870 4.2290 -0.2580 -5.75% 199 coq-fiat-crypto-with-bedrock/src/Fancy/Montgomery256.v.html │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
@@ -452,6 +459,11 @@ let rec remove_let subst lam = | |||
if def == def' && body == body' then lam else Llet(id,def',body') | |||
| _ -> map_lam_with_binders liftn remove_let subst lam | |||
|
|||
let optimize lam = | |||
let lam = simplify lam in | |||
let lam = remove_let (subs_id 0) lam in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can just remove this line if you prefer. My only preoccupation is to generate the same code in native and VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I stand by my original position in these bug reports, i.e. relying on let-bindings to order computation in Coq is a terrible idea that conflicts with the internal semantics of CIC. The only sane way to force computation order is by pattern matching, i.e. replace let x := t in u
with match t with Box x => u end
where Inductive box (A : Type) := Box : A -> box A
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also take a middle-ground mixing the current VM and native heuristic, by only let-substituting values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the values have already been substituted by simplify
itself. So, the remaining ones are just the ones for creating closures: Llam
, Lfix
, and Lcofix
. That would be fine, since moving them around cannot arbitrarily change the time complexity of an algorithm, not even when taking the garbage collector into account, if I am not mistaken.
And by the way, in my opinion, having the user replace let x := t in u
with match Box t with Box x => u end
is just insane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be fine
Patch incoming.
just insane
Beware, it's not an η expansion as you wrote, to indicate call-by-value you must return a Box type, so all strict primitives must have type Box A
for some A
, or unit
if they do not return anything. Indeed, let _ : unit := t in u
and match t with tt => u
are two very different things computationally speaking.
We use the criterion from the VM rather than the native one, because it is more reasonable. The code for the VM allowed additionally 63-bit integers, while the one for native allowed λ-abstractions and evars. Duplicating the latter, which may contain arbitrary subterms, does not sound like a great idea for code size.
It was always set to true.
This ensures that this optimization doesn't wreak havoc with strict effectful primitives such as array modification.
4cfa2b9
to
c9355b1
Compare
I checked and actually this patch doesn't change anything there for unimath. |
🏁 Bench results:
🐢 Top 25 slow downs┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SLOW DOWNS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 260.7810 268.9420 8.1610 3.13% 8 coq-neural-net-interp-computed-lite/theories/MaxOfTwoNumbersSimpler/Computed/AllLogits.v.html │ │ 96.6050 97.9480 1.3430 1.39% 376 coq-unimath/UniMath/ModelCategories/Generated/LNWFSMonoidalStructure.v.html │ │ 217.5900 218.9120 1.3220 0.61% 103 coq-fiat-crypto-with-bedrock/src/Arithmetic/BarrettReduction.v.html │ │ 62.4210 63.7350 1.3140 2.11% 609 coq-bedrock2/bedrock2/src/bedrock2Examples/lightbulb.v.html │ │ 30.2980 31.5630 1.2650 4.18% 12 coq-fourcolor/theories/job254to270.v.html │ │ 256.6540 257.8910 1.2370 0.48% 1629 coq-metacoq-pcuic/pcuic/theories/PCUICSR.v.html │ │ 26.9830 28.1280 1.1450 4.24% 35 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/MontgomeryLadderRISCV.v.html │ │ 25.9970 27.1300 1.1330 4.36% 12 coq-fourcolor/theories/job611to617.v.html │ │ 180.8380 181.9150 1.0770 0.60% 233 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/deps/riscv-coq/src/riscv/Proofs/DecodeByExtension.v.html │ │ 26.4840 27.5390 1.0550 3.98% 12 coq-fourcolor/theories/job287to290.v.html │ │ 27.8560 28.8700 1.0140 3.64% 12 coq-fourcolor/theories/job563to588.v.html │ │ 28.8260 29.8010 0.9750 3.38% 32 coq-fiat-crypto-with-bedrock/src/Bedrock/End2End/X25519/MontgomeryLadderRISCV.v.html │ │ 48.0400 48.9600 0.9200 1.92% 558 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │ │ 32.2980 33.2150 0.9170 2.84% 12 coq-fourcolor/theories/job439to465.v.html │ │ 24.9430 25.8240 0.8810 3.53% 12 coq-fourcolor/theories/job499to502.v.html │ │ 28.7350 29.6080 0.8730 3.04% 12 coq-fourcolor/theories/job323to383.v.html │ │ 28.6400 29.5070 0.8670 3.03% 12 coq-fourcolor/theories/job589to610.v.html │ │ 47.7880 48.6160 0.8280 1.73% 558 coq-bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │ │ 29.3920 30.2170 0.8250 2.81% 12 coq-fourcolor/theories/job165to189.v.html │ │ 29.3700 30.1730 0.8030 2.73% 12 coq-fourcolor/theories/job001to106.v.html │ │ 20.9240 21.7250 0.8010 3.83% 12 coq-fourcolor/theories/job542to545.v.html │ │ 40.8390 41.6140 0.7750 1.90% 236 coq-rewriter-perf-SuperFast/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 27.6510 28.4160 0.7650 2.77% 12 coq-fourcolor/theories/job107to164.v.html │ │ 24.6370 25.3960 0.7590 3.08% 12 coq-fourcolor/theories/job279to282.v.html │ │ 24.7760 25.5230 0.7470 3.02% 12 coq-fourcolor/theories/job535to541.v.html │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 🐇 Top 25 speed ups┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TOP 25 SPEED UPS │ │ │ │ OLD NEW DIFF %DIFF Ln FILE │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 101.8470 101.0540 -0.7930 -0.78% 20 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/NBE.v.html │ │ 95.6530 94.9980 -0.6550 -0.68% 999 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 23.2950 22.6420 -0.6530 -2.80% 1073 coq-metacoq-safechecker/safechecker/theories/PCUICSafeReduce.v.html │ │ 7.6800 7.0850 -0.5950 -7.75% 602 coq-unimath/UniMath/CategoryTheory/EnrichedCats/Limits/Examples/StructureEnrichedLimits.v.html │ │ 95.7050 95.1540 -0.5510 -0.58% 968 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │ │ 18.2810 17.7350 -0.5460 -2.99% 32 coq-performance-tests-lite/src/pattern.v.html │ │ 39.8980 39.4200 -0.4780 -1.20% 835 coq-fiat-crypto-with-bedrock/src/Fancy/Compiler.v.html │ │ 81.8700 81.4360 -0.4340 -0.53% 48 coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/AffineProofs.v.html │ │ 155.6210 155.2620 -0.3590 -0.23% 1190 coq-unimath/UniMath/CategoryTheory/GrothendieckConstruction/IsPullback.v.html │ │ 11.5290 11.1810 -0.3480 -3.02% 546 coq-metacoq-erasure/erasure/theories/ErasureFunctionProperties.v.html │ │ 12.5680 12.2300 -0.3380 -2.69% 409 coq-category-theory/Theory/Metacategory/ArrowsOnly.v.html │ │ 37.7090 37.4000 -0.3090 -0.82% 3 coq-fiat-crypto-with-bedrock/src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.v.html │ │ 4.9020 4.6000 -0.3020 -6.16% 446 coq-mathcomp-odd-order/theories/PFsection12.v.html │ │ 11.3210 11.0210 -0.3000 -2.65% 410 coq-verdi-raft/theories/RaftProofs/LeaderLogsLogMatchingProof.v.html │ │ 16.7610 16.4720 -0.2890 -1.72% 81 coq-fiat-crypto-with-bedrock/rupicola/src/Rupicola/Examples/Utf8/Utf8.v.html │ │ 9.9670 9.6810 -0.2860 -2.87% 418 coq-category-theory/Theory/Metacategory/ArrowsOnly.v.html │ │ 18.3590 18.0830 -0.2760 -1.50% 196 coq-unimath/UniMath/HomologicalAlgebra/KATriangulated.v.html │ │ 0.8520 0.5780 -0.2740 -32.16% 160 coq-stdlib/Numbers/HexadecimalNat.v.html │ │ 25.9490 25.6780 -0.2710 -1.04% 129 coq-fiat-crypto-with-bedrock/src/Curves/Weierstrass/Projective.v.html │ │ 17.1870 16.9170 -0.2700 -1.57% 24 coq-fiat-crypto-with-bedrock/src/Rewriter/Passes/MulSplit.v.html │ │ 18.2970 18.0410 -0.2560 -1.40% 708 coq-fiat-crypto-with-bedrock/src/Rewriter/RulesProofs.v.html │ │ 10.4530 10.2040 -0.2490 -2.38% 496 coq-rewriter-perf-SuperFast/src/Rewriter/Rewriter/Wf.v.html │ │ 20.7220 20.4820 -0.2400 -1.16% 2736 coq-metacoq-safechecker/safechecker/theories/PCUICTypeChecker.v.html │ │ 41.1330 40.9020 -0.2310 -0.56% 236 coq-rewriter/src/Rewriter/Rewriter/Examples/PerfTesting/LiftLetsMap.v.html │ │ 29.4470 29.2170 -0.2300 -0.78% 79 coq-rewriter-perf-SuperFast/src/Rewriter/Rewriter/Examples/PerfTesting/SieveOfEratosthenes.v.html │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
This optimization tweak seems to impact fourcolor to some extent, but I think that the diff is small enough that we can just accept it. Otherwise I think we can keep the previous heuristic. @silene ? |
I am a bit confused as to why there is a slowdown. The only explanation I can think of is that the code contains the following kind of construct: let v := foo x in
if b then bar v else baz and it was implicitly relying on the optimization to get instead if b then bar (foo x) else baz Unfortunately, I don't have time to investigate right now. So, unless someone comes with a suggestion soon, I propose to merge as is. |
I argue for merging as this, especially because it unlocks a tremendous reduction of the size of degenerate vo files in the dependent PR #19014. |
@coqbot merge now |
This is a series of cleanups that make the VM and native generate the same lambda code up to structured values. Apart for the simple cleanups, the VM pipeline is untouched. This changes the output of native compilation in the following way:
remove_let
heuristic, i.e. the bound variable must occur linearly outside of a closure.EDIT: the last commit also modifies the VM compilation by restricting the remove_let heuristic to syntactic values.