
Cranelift: Use a fixpoint loop to compute the best value for each eclass #7859

Merged
merged 7 commits into from Feb 5, 2024

Conversation

fitzgen
Member

@fitzgen fitzgen commented Feb 2, 2024

Fixes #7857

Member

@elliottt elliottt left a comment


This looks great!

cranelift/codegen/src/egraph/cost.rs
cranelift/codegen/src/egraph/elaborate.rs
cranelift/codegen/src/egraph/elaborate.rs
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Feb 2, 2024

for (value, def) in self.func.dfg.values_and_defs() {
// If the cost of this value is finite, then we've already found
// its final cost.
Member

Sorry for the drive-by from the sidelines; just a possible clarification request after thinking about cost updates during a long drive today:

It's not immediately obvious to me why this (once finite, then final) property holds; I'm curious what reasoning y'all have gone through on this and/or what you've observed? I think a node's cost can continue to decrease as we discover more finite costs (consider a union node: min(20, infinity) == 20 in the first pass, min(20, 10) == 10 in the second pass; then another node that uses that as an arg). Or is there an argument we can make for why this shouldn't happen in practice?

Member

This is a great point Chris! When @fitzgen and I were discussing the fixpoint change yesterday, we reasoned that it was okay to skip finite values because we were assuming two things:

  • We would change Cost addition to saturate to infinity() rather than to MAX_COST
  • Since we can't produce cycles, the fixpoint would cause every value's cost to eventually settle out to a finite value

As you pointed out, the flaw with this reasoning is that the handling of Union values will not behave this way, instead preferring finite values to infinite ones.

Since addition now saturates to infinity, which ensures that Result nodes don't appear finite until all their dependencies have been processed, what do you think about only computing the min if both arguments to a Union are finite? I think that change would make our use of the infinity() cost more concrete: it's a marker for values whose arguments have not all been processed yet.
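A minimal sketch of those two properties, using a hypothetical `Cost` newtype rather than the actual `cranelift-codegen` type (names and representation are illustrative only): addition saturates to infinity, and a union only takes the min once both of its arguments are finite:

```rust
/// Hypothetical stand-in for Cranelift's cost type; u32::MAX models
/// `infinity()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Cost(u32);

const INFINITY: Cost = Cost(u32::MAX);

impl Cost {
    fn is_finite(self) -> bool {
        self != INFINITY
    }

    /// Addition saturates to infinity, so a result node's cost stays
    /// infinite until all of its operands have finite costs.
    fn add(self, other: Cost) -> Cost {
        match self.0.checked_add(other.0) {
            Some(sum) if sum < u32::MAX => Cost(sum),
            _ => INFINITY,
        }
    }

    /// The proposed union rule: only take the min once both arguments
    /// are finite; otherwise keep the "not yet processed" marker.
    fn union(self, other: Cost) -> Cost {
        if self.is_finite() && other.is_finite() {
            self.min(other)
        } else {
            INFINITY
        }
    }
}
```

Here `INFINITY` is exactly the marker described above: a cost for values whose arguments have not all been processed yet.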

Member Author

In order for a union(a, b) to be finite but not in its final form, one of a or b would have to be finite and the other infinite, but the only way an operand value can still have an infinite cost when we compute the cost of the current value is if the operand value's index is larger than the current value's index. That cannot happen for union values, since they are only added to the DFG after their operands.

This is, however, a pretty subtle argument, so I'd be fine skipping this early-continue optimization. I'll land this PR without it, because that is pretty obviously correct, and if we want to experiment with different approaches to optimizing the loop from there, we can open follow up PRs.

Member

Good point Nick, sorry for muddying the waters there.

Member Author

Okay actually I was wrong, thanks Trevor for asking very pointed questions in private chat :-p

The union's operand values are always defined before the union, but if one of those operand values is a funky one whose own operands are out of order, then that operand could still be infinite by the time we get to the union, and the union's min would drop the infinite cost. That would be a finite cost that is potentially not in its final form, depending on the cost we still need to compute for the still-infinite operand.

So this "optimization" of early-continuing was not correct! Bullet dodged.

This aegraph code is all very subtle, and we should spend some time thinking about what we can do to make things more obviously correct, even if it is just adding additional debug asserts and comments. It shouldn't take 3.5 engineers who are all intimately familiar with Cranelift a full day to diagnose and fix this kind of bug, and still introduce subtle flaws in the fix.

Member Author

And for clarity: since we are doing the "full" fixpoint now, even if we "drop" an operand's infinite cost via min in one iteration of the loop, we will consider that operand's value again on the next iteration of the fixpoint, and eventually, as the fixpoint is reached, we will have the correct costs for everything.
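That full fixpoint can be sketched as follows; `Def`, the unit node cost, and the function names are simplified stand-ins for the real egraph data structures, not code from this PR:

```rust
enum Def {
    /// A plain node: cost is 1 plus the cost of its operands.
    Node { args: Vec<usize> },
    /// A union of two values: cost is the min of the two.
    Union(usize, usize),
}

const INF: u64 = u64::MAX;

fn saturating_add(a: u64, b: u64) -> u64 {
    if a == INF || b == INF {
        INF
    } else {
        a.saturating_add(b)
    }
}

/// Recompute every value's cost until a whole pass makes no changes.
/// Operands may be "forward references" (indices not less than the
/// value's own), so one pass is not enough in general; with no cycles,
/// every cost eventually settles to its correct (finite) value.
fn best_costs(defs: &[Def]) -> Vec<u64> {
    let mut cost = vec![INF; defs.len()];
    loop {
        let mut changed = false;
        for (i, def) in defs.iter().enumerate() {
            let new = match def {
                Def::Node { args } => args
                    .iter()
                    .fold(1, |acc, &a| saturating_add(acc, cost[a])),
                Def::Union(a, b) => cost[*a].min(cost[*b]),
            };
            if new != cost[i] {
                cost[i] = new;
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    cost
}
```

Even if the min over a union drops an operand's infinite cost on one pass, that operand is revisited on the next pass, so the loop only stops once every cost has stabilized.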

Member

Thanks @fitzgen and @elliottt (and @alexcrichton) for taking this on and sorry for not realizing this subtle case originally!

A further optimization (which I can take on when I'm back) occurred to me today: we could track whether we see any "forward references" (perhaps integrating this into the fixpoint loop itself, though it won't change between iterations), and exit the loop after one iteration if none exist. This is the common case, and it would avoid doing a second (no-changes) pass. The extra cost is totally fine for now IMHO (correctness first!).
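The forward-reference check proposed here can be a single linear scan (an illustrative helper, not code from the PR): values are defined in order, so a forward reference is simply an operand whose index is not below the index of the value that uses it:

```rust
/// `operands[i]` lists the operand indices of value `i`, in definition
/// order. Returns true if any value refers to itself or to a later value.
fn has_forward_reference(operands: &[Vec<usize>]) -> bool {
    operands
        .iter()
        .enumerate()
        .any(|(i, args)| args.iter().any(|&a| a >= i))
}
```

If this returns false, a single pass of the cost loop already reaches the fixpoint, so the second (no-changes) iteration can be skipped.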

I agree the code is pretty subtle; to some degree I think that's inherent to the problem, and it's already pretty comment-dense in many (not all!) areas, but I can also try to add some more top-level documentation on invariants and the like when I'm back. I'd like to try to do some more semi-formal proofs too, similar to MachBuffer's comments, to convince us that we don't have any more issues lurking (and to help understanding).

Member Author

Agreed, and not trying to point fingers or anything, just trying to improve the situation for everyone. I think something like #7856 would help a lot too.

@fitzgen fitzgen added this pull request to the merge queue Feb 2, 2024
@elliottt elliottt removed this pull request from the merge queue due to a manual request Feb 2, 2024
@fitzgen fitzgen added this pull request to the merge queue Feb 2, 2024
@fitzgen
Member Author

fitzgen commented Feb 2, 2024

(Re-adding to merge queue after misunderstanding regarding #7859 (comment))

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 2, 2024
@fitzgen fitzgen added this pull request to the merge queue Feb 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 5, 2024
@alexcrichton
Member

Given that the same riscv64 failure happened twice in a row, my guess is that it's probably a deterministic failure rather than a spurious failure. That may mean that a preexisting riscv64 lowering rule is buggy and this is starting to expose that. I'll note, though, that I haven't attempted to reproduce locally yet.

@alexcrichton
Member

Ah yes I can reproduce locally:

---- wasi_http_hash_all_with_override stdout ----
thread 'wasi_http_hash_all_with_override' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

---- wasi_http_double_echo stdout ----
thread 'wasi_http_double_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

---- wasi_http_hash_all stdout ----
thread 'wasi_http_hash_all' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- wasi_http_echo stdout ----
thread 'wasi_http_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

The lack of output on CI is due to rayon-rs/rayon#1066, I think; not that it's actually a bug in rayon, but an unfortunate consequence.

@afonso360
Contributor

I ran the fuzzgen-icache fuzzer to try to find a small reproducible example for the riscv64 bug, but it found a similar error for s390x:

test compile
set opt_level=speed
target s390x

function u1:0() -> f32x4 system_v {
    const0 = 0x00000000000000000000000000000000

block0:
    v27 = vconst.f32x4 const0
    v57 = fma v27, v27, v27  ; v27 = const0, v27 = const0, v27 = const0
    v58 = vconst.i32x4 const0
    v60 = vconst.f32x4 const0
    v61 = bitcast.f32x4 v58  ; v58 = const0
    v28 = bitselect v61, v60, v57  ; v60 = const0
    v62 = fma v28, v28, v28
    v63 = fcmp ne v62, v62
    v65 = vconst.f32x4 const0
    v66 = bitcast.f32x4 v63
    v29 = bitselect v66, v65, v62  ; v65 = const0
    v67 = fma v29, v29, v29
    v68 = fcmp ne v67, v67
    v70 = vconst.f32x4 const0
    v71 = bitcast.f32x4 v68
    v30 = bitselect v71, v70, v67  ; v70 = const0
    v72 = fma v30, v30, v30
    v73 = fcmp ne v72, v72
    v75 = vconst.f32x4 const0
    v76 = bitcast.f32x4 v73
    v31 = bitselect v76, v75, v72  ; v75 = const0
    v77 = fma v31, v31, v31
    v78 = fcmp ne v77, v77
    v80 = vconst.f32x4 const0
    v81 = bitcast.f32x4 v78
    v32 = bitselect v81, v80, v77  ; v80 = const0
    v82 = fma v32, v32, v32
    v83 = fcmp ne v82, v82
    v85 = vconst.f32x4 const0
    v86 = bitcast.f32x4 v83
    v33 = bitselect v86, v85, v82  ; v85 = const0
    v87 = fma v33, v33, v33
    v88 = fcmp ne v87, v87
    v90 = vconst.f32x4 const0
    v91 = bitcast.f32x4 v88
    v34 = bitselect v91, v90, v87  ; v90 = const0
    return v34
}

I'm still going to try to find a smaller one before trying to figure out which rule is causing issues.

@fitzgen
Member Author

fitzgen commented Feb 5, 2024

Thanks Afonso!

@afonso360
Contributor

Here's another case that it found, this one for AArch64.

Testcase
test compile
set opt_level=speed
target aarch64

function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail {
    sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail
    fn0 = colocated u2:0 sig0

block0(v0: f64x2, v1: f64x2):
    v2 = iconst.i8 0
    v3 = iconst.i16 0
    v4 = iconst.i32 0
    v5 = iconst.i64 0
    v6 = uextend.i128 v5  ; v5 = 0
    v7 = func_addr.i64 fn0
    return_call_indirect sig0, v7(v1, v1)

block1 cold:
    v62 = f64const 0.0
    v63 = splat.f64x2 v62  ; v62 = 0.0
    v9, v10 = call fn0(v63, v63)
    v11, v12 = call fn0(v10, v10)
    v13, v14 = call fn0(v12, v12)
    v15, v16 = call fn0(v14, v14)
    v17, v18 = call fn0(v16, v16)
    v19, v20 = call fn0(v18, v18)
    v21, v22 = call fn0(v20, v20)
    v23, v24 = call fn0(v22, v22)
    v25, v26 = call fn0(v24, v24)
    v27, v28 = call fn0(v26, v26)
    v29, v30 = call fn0(v28, v28)
    v31, v32 = call fn0(v30, v30)
    v33, v34 = call fn0(v32, v32)
    v35, v36 = call fn0(v34, v34)
    v37, v38 = call fn0(v36, v36)
    v39, v40 = call fn0(v38, v38)
    v41, v42 = call fn0(v40, v40)
    v43, v44 = call fn0(v42, v42)
    v45, v46 = call fn0(v44, v44)
    v47, v48 = call fn0(v46, v46)
    v49, v50 = call fn0(v48, v48)
    return v49, v49
}

This one is interesting to me because almost all of it is dead code, but if we minimize it, it no longer crashes 👀. The trace log states the following:

 TRACE cranelift_codegen::context              > About to optimize with egraph phase:
function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail {
    sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail
    fn0 = colocated u2:0 sig0

block0(v0: f64x2, v1: f64x2):
    v7 = func_addr.i64 fn0
    return_call_indirect sig0, v7(v1, v1)
}

So it does optimize away the dead code internally, but then still tries to elaborate some of the previously eliminated instructions. That doesn't make sense to me, but I haven't kept up with the inner workings of the egraphs code.

I'm not familiar enough with egraphs to be able to debug this, but if you need any help reworking one of the lowering rules, let me know!

Contributor

@jameysharp jameysharp left a comment


Looks good to me! Just one optional suggestion.

cranelift/codegen/src/egraph/cost.rs
@fitzgen fitzgen added this pull request to the merge queue Feb 5, 2024
Merged via the queue into bytecodealliance:main with commit 5b2ae83 Feb 5, 2024
19 checks passed
@fitzgen fitzgen deleted the egraph-cost-fix-point branch February 5, 2024 23:22
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Feb 6, 2024
…ass (bytecodealliance#7859)

* Cranelift: Use a fixpoint loop to compute the best value for each eclass

Fixes bytecodealliance#7857

* Remove fixpoint loop early-continue optimization

* Add document describing optimization rule invariants

* Make select optimizations use subsume

* Remove invalid debug assert

* Remove now-unused methods

* Add commutative adds to cost tests
fitzgen added a commit that referenced this pull request Feb 6, 2024
…ass (#7859) (#7878)

elliottt pushed a commit to elliottt/wasmtime that referenced this pull request Feb 7, 2024
…ass (bytecodealliance#7859)

elliottt added a commit that referenced this pull request Feb 7, 2024
* Guard recursion in `will_simplify_with_ireduce` (#7882)

Add a test to expose issues with unbounded recursion through `iadd`
during egraph rewrites, and bound the recursion of
`will_simplify_with_ireduce`.

Fixes #7874

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859)


* Add missing subsume uses in egraph rules (#7879)

* Fix a few egraph rules that needed `subsume`

There were a few rules that dropped value references from the LHS
without using subsume. I think they were probably benign as they
produced constant results, but this change is in the spirit of our
revised guidelines for egraph rules.

* Augment egraph rule guideline 2 to talk about constants

* Update release notes

---------

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
elliottt added a commit that referenced this pull request Feb 7, 2024
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Feb 12, 2024
This commit is born out of a fuzz bug on x64 that was discovered recently.
Today, on `main`, and in the 17.0.1 release Wasmtime will panic when compiling
this wasm module for x64:

    (module
      (func (result v128)
        i32.const 0
        i32x4.splat
        f64x2.convert_low_i32x4_u))

panicking with:

    thread '<unnamed>' panicked at /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cranelift-codegen-0.104.1/src/machinst/lower.rs:766:21:
    should be implemented in ISLE: inst = `v6 = fcvt_from_uint.f64x2 v13  ; v13 = const0`, type = `Some(types::F64X2)`
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bisection points to the "cause" of this regression as bytecodealliance#7859, which
more-or-less means that this has always been an issue and that PR just
happened to expose the issue. What's happening here is that egraph
optimizations are turning the IR into a form that the x64 backend can't
codegen. Namely there's no general purpose lowering of i64x2 being
converted to f64x2. The Wasm frontend never produces this but the
optimizations internally end up producing this.

Notably here the result of this function is constant and what's
happening is that a convert-of-a-splat is happening. In lieu of adding
the full general lowering to x64 (which is perhaps overdue since this is
the second or third time this panic has been triggered) I've opted to
add constant propagation optimizations for int-to-float conversions.
These are all based on the Rust `as` operator which has the same
semantics as Cranelift. This is enough to fix the issue here for the
time being.
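For reference, the commit message's claim is that Rust's `as` operator and Cranelift's int-to-float conversions share round-to-nearest semantics, so folding a conversion of a constant is as simple as this hypothetical helper (not the actual Cranelift code):

```rust
/// Constant-fold an unsigned 32-bit to f32 conversion using Rust's `as`,
/// which rounds to the nearest representable float.
fn fold_u32_to_f32(x: u32) -> f32 {
    x as f32
}
```

For example, `fold_u32_to_f32(0)` is exactly `0.0`, while `u32::MAX` (4294967295) is not representable in f32 and rounds to the nearest representable value, 4294967296.0.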
github-merge-queue bot pushed a commit that referenced this pull request Feb 12, 2024
Labels
cranelift Issues related to the Cranelift code generator
Development

Successfully merging this pull request may close these issues.

Block order and value number affects whether we get valid CLIF after optimizations
6 participants