Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

brmataptos · 2024-05-11T02:39:17Z

Description

Updated various test harnesses and re-ran pretty much all .move tests with MOVE_COMPILER_V2=true to generate .v2_exp files to see what V2 does today. Diffed outputs with V1 outputs and filed bugs for differences (fortunately few).

Tests were run manually. Perhaps there should be a script. Something like the following was used:

for value in false true; do 
UB=1 MOVE_COMPILER_V2=$value cargo test \
      -p "aptos-transactional-test-harness" \
      -p "bytecode-verifier-transactional-tests" \
      -p "move-async-vm" \
      -p "move-cli" \
      -p "move-model" \
      -p "move-package" \
      -p "move-prover-bytecode-pipeline" \
      -p "move-stackless-bytecode" \
      -p "move-to-yul" \
      -p "move-transactional-test-runner" \
      -p "move-unit-test"
      -p "move-vm-transactional-tests" \
      --no-fail-fast \
      >& test-$value.out
done

Note, did not yet set MOVE_COMPILER_BLOCK_V1 so some of these may be using V1 under the hood, but with the baseline outputs checked in we can see if things degrade when we force use of V2.

Type of Change

Which Components or Systems Does This Change Impact?

How Has This Been Tested?

See above.

Key Areas to Review

trunk-io · 2024-05-11T02:39:20Z

⏱️ 2h 45m total CI duration on this PR

Job	Cumulative Duration	Recent Runs
rust-move-unit-coverage	51m	⬜ 🟩 🟩
rust-targeted-unit-tests	30m	⬜ 🟩
execution-performance / single-node-performance	23m	🟩
rust-move-tests	20m	⬜ 🟩
run-tests-main-branch	13m	🟩 🟩 🟩
cli-e2e-tests / run-cli-tests	11m	🟥
rust-lints	11m	⬜ 🟥
general-lints	3m	🟩 🟩
check-dynamic-deps	2m	🟩 🟩
semgrep/ci	42s	🟩 🟩
file_change_determinator	20s	🟩 🟩
file_change_determinator	19s	🟩 🟩
permission-check	10s	🟩 🟩
permission-check	6s	🟩 🟩
permission-check	6s	🟩 🟩
permission-check	4s	🟩 🟩

_{settings ⋅ feedback ⋅ docs ⋅ learn more about trunk.io}

codecov · 2024-05-11T03:07:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.0%. Comparing base (327fc17) to head (a911281).
Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##             main   #13254     +/-   ##
=========================================
- Coverage    33.0%    33.0%   -0.1%     
=========================================
  Files        1760     1764      +4     
  Lines      338939   339033     +94     
=========================================
+ Hits       111976   111996     +20     
- Misses     226963   227037     +74

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wrwg

I'm not seeing what this PR brings us in new quality of testing since most features we have v2 tests for now are rather generic Move code which very likely are already covered by our regular tests.
Instead we are adding new technical debt by having the v1/v2 living side by side. When we remove v1, we have to pay for this.
Some of the v2 tests (e.g. bytecode verifier) should have the same .exp file and not a new one. If the difference is in things like definition index we should better redact this from the baseline instead of creating new v2_exp.

wrwg · 2024-05-14T19:19:07Z

third_party/move/evm/move-to-yul/tests/AccountStateMachine.v2_exp

@@ -0,0 +1,1138 @@
+/* =======================================


I think I said before we should keep the evm stuff alone. Why are we adding this additional complexity?

It was easy enough to add it. Just a line in a script.

wrwg · 2024-05-14T19:20:50Z

third_party/move/extensions/async/move-async-vm/tests/sources/AccountStateMachine.v2_exp

@@ -0,0 +1,55 @@
+publishing Actor


Same here. Once we switch to v2 compiler as the default, those things can be updated. This does not give us relevant new test coverage, most test cases here and in the EVM code are not specific to Move features, but to the tool.

wrwg · 2024-05-14T19:21:59Z

...ier/transactional-tests/tests/ability_field_requirements/fields_conditional_abilities.v2_exp

@@ -0,0 +1,37 @@
+processed 8 tasks


How does that differ from the simple .exp? There shouldn't be any difference. Is it the definition indices?

wrwg · 2024-05-14T20:02:36Z

third_party/move/move-command-line-common/src/testing.rs

@@ -11,6 +11,14 @@ pub const EXP_EXT: &str = "exp";
 /// Extension for expected output files compiled by v2
 pub const EXP_EXT_V2: &str = "v2_exp";

+pub fn get_compiler_exp_extension() -> &'static str {
+    if get_move_compiler_v2_from_env() {


Any tests which are dependent from env variables must be run in their own CI action and cannot run in parallel with other tests and different env vars. Its perhaps OK we do this for e2e tests (like we do already) but for unit tests it is not a good design. Where we have already different exp files (prover, compiler unit tests and transactional tests) we do not do this via env vars but via test configs.

brmataptos · 2024-05-15T00:31:56Z

I'm not seeing what this PR brings us in new quality of testing since most features we have v2 tests for now are rather generic Move code which very likely are already covered by our regular tests.

I've already been able to somewhat reproduce Kevin's bug using move-docgen tests, because of the different build environment provided. In other cases, the added coverage is small, but it is non-zero. We can also test which tools/paths accidentally still use V1 under the covers by using the MOVE_COMPILER_DISABLE_V1 env var.

Instead we are adding new technical debt by having the v1/v2 living side by side. When we remove v1, we have to pay for this.

Some of the v2 tests (e.g. bytecode verifier) should have the same .exp file and not a new one. If the difference is in things like definition index we should better redact this from the baseline instead of creating new v2_exp.

This gives us an easy way to get a bit more coverage to track compiler V2 correctness and avoiding regressions:

Check in this PR, after I
- (1) carefully examined diffs between corresponding .exp and .v2_exp files, and
- (2) filed bugs for any meaningful differences.
Occasionally or regularly (CI) run more tests with MOVE_COMPILER_V2 to see any changes and fix them.
Eventually also run with MOVE_COMPILER_DISABLE_V1 to verify that V1 is truly unused.

A few parallel output files which we can ignore (after merging this PR) unless they change is an insignificant debt to take on for the benefit, even if it is small. Trying more complex approaches to leverage the testing value of these other packages would involve a significant amount of wasted time or technical debt: Jumping through hoops to make outputs identical, or building complex redaction/comparison frameworks to try to enable V2 runs to use exactly the same .exp files as V1 is technical debt. In retrospect, the move-compiler-v2-transactional-tests comparison mechanism might have been better done this way: we would avoided wasteful duplicate runs of V1 on every test, and also simplified the testing code significantly.

The cost is a one-time comparison of each .v2_exp file with each .exp file, and then we can forget about v1.

We can rename .v2_exp to .exp when we get rid of V1 if it bothers you.

wrwg

Alright, lets land it.

rahxephon89 · 2024-05-18T21:17:36Z

Hi @brmataptos, sorry I don't look all code and new files in detail because there are so many. I am wondering how these tests are run against the compiler V2 on CI? Do we automatically run V1 and then V2 for all of them (like the prover test)?

brmataptos · 2024-05-19T04:58:20Z

Hi @brmataptos, sorry I don't look all code and new files in detail because there are so many. I am wondering how these tests are run against the compiler V2 on CI? Do we automatically run V1 and then V2 for all of them (like the prover test)?

I think these will have to be run as a separate CI job with MOVE_COMPILER_V2=true set in the environment, as (1) Wolfgang has observed that varying environment variables across tests causes races, and (2) these tests represent a collection of tools, and feeding a CLI flag to V2 in each case will be tricky. While the move-prover example isolates the multiple configurations better, it's a lot of work for each test framework to implement.

The problem is that these tests represent a number of different test frameworks, so doing something like what was done for the prover tests would require a lot of work of questionable long-term value. This is a quick hack that allows us to run tests in the following directories under V2 with minimal effort, gaining perhaps 637 files with very few changes to .rs files. It requires some scripting to manage this collection of tests, but it will be easy to rename and/or change the defaults once V2 becomes mainstream.

Note that this PR introduces test outputs for the following directories:

aptos-move/aptos-transactional-test-harness/tests/
third_party/move/evm/move-to-yul/tests/
third_party/move/extensions/async/move-async-vm/tests/
third_party/move/move-bytecode-verifier/transactional-tests/tests/    *** (only 8/317 are .move)
third_party/move/move-model/bytecode/tests/
third_party/move/move-model/tests/
third_party/move/move-prover/bytecode-pipeline/tests/
third_party/move/move-vm/transactional-tests/tests/                             *** (only 19/168 are .move)
third_party/move/testing-infra/transactional-test-runner/tests/
third_party/move/tools/move-cli/tests/
third_party/move/tools/move-package/tests/
third_party/move/tools/move-unit-test/tests/

The two directories marked with *** are mostly .mvir files, but have a few .move files that are compiled, so are included although most of the tests display nothing about move compilation, since including them costs nothing.

Of the resulting 835 .v2_expfiles in the repo, just 110 of them have any difference from the .exp files, with a selective (identical files are omitted) diff output of 1667 lines, attached here:
diff-out.txt

Of these files:

many tests have trivial output differences (such as message wording, or different redactions in prover output)
the aptos_test_harness files have some differences, partially addressed in PR add extended_checks to aptos-transactional-test-harness #13337; after one of these PRs lands I can file more targeted bugs from seeing the other.
./third_party/move/testing-infra/transactional-test-runner/tests/vm_test_harness/example.v2_exp:
- shows that V2 disallows a module publication update
- filed issue [Bug][move-compiler-v2] compiler-v2 fails vm_test_harness/example.move test #13255
./third_party/move/tools/move-cli/tests/build_tests/disassemble_script/args.v2_exp shows:
- V2 doesn't do very well at optimizing the code
- other move-cli tests show that V2's code modules are bigger than V1:
- added examples to [Bug][compiler-v2] Optimized stackless bytecode may not always yield optimized file format bytecode #12404 for further study
V2 rejects two tests because packages have different address aliases; this was allowed in old Move but not Aptos
- ./third_party/move/tools/move-package/tests/test_sources/compilation/multiple_deps_rename/Move.v2_exp
- ./third_party/move/tools/move-package/tests/test_sources/compilation/multiple_deps_rename_one/Move.v2_exp
- Filed Issue [Bug][move-compiler-v2] V2 causes move-package tests multiple_deps_rename and multiple_deps_rename_one to fail #13256
V2 runs out of stack space running one test cases, so there is no file ./third_party/move/move-bytecode-verifier/transactional-tests/tests/reference_safety/call_function_with_many_acquires.v2_exp
- Filed issue [Bug][move-compiler-v2] stack overflow in call_function_with_many_acquires.move #13250

rahxephon89

I am wondering whether we need to add a script to this PR so that after any changes to the compiler, we can use it to automatically update these exp files?

github-actions · 2024-05-21T22:43:34Z

✅ Forge suite `compat` success on `3ffe0986b5fe4acb76544ae7ae85d73b91a6a411` ==> `a911281848342c77b2a7867d845a1123b513c2f1`

Compatibility test results for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 (PR)
1. Check liveness of validators at old version: 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411
compatibility::simple-validator-upgrade::liveness-check : committed: 6569.8052104991575 txn/s, latency: 4714.147083665339 ms, (p50: 4800 ms, p90: 5400 ms, p99: 7800 ms), latency samples: 251000
2. Upgrading first Validator to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1661.1801874479156 txn/s, latency: 17172.990952839267 ms, (p50: 20500 ms, p90: 21900 ms, p99: 22300 ms), latency samples: 83120
3. Upgrading rest of first batch to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1846.4974067167332 txn/s, latency: 15463.347069002812 ms, (p50: 18100 ms, p90: 21600 ms, p99: 22300 ms), latency samples: 92460
4. upgrading second batch to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3581.664825748539 txn/s, latency: 8692.227602347351 ms, (p50: 9700 ms, p90: 12500 ms, p99: 12700 ms), latency samples: 143140
5. check swarm health
Compatibility test for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 passed
Test Ok

github-actions · 2024-05-21T22:45:18Z

✅ Forge suite `realistic_env_max_load` success on `a911281848342c77b2a7867d845a1123b513c2f1`

two traffics test: inner traffic : committed: 8367.635097875393 txn/s, latency: 4685.89117074681 ms, (p50: 4500 ms, p90: 5400 ms, p99: 10200 ms), latency samples: 3617520
two traffics test : committed: 99.93390505155581 txn/s, latency: 1870.4883720930231 ms, (p50: 1800 ms, p90: 2000 ms, p99: 6100 ms), latency samples: 1720
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.205, avg: 0.203", "QsPosToProposal: max: 0.222, avg: 0.203", "ConsensusProposalToOrdered: max: 0.439, avg: 0.411", "ConsensusOrderedToCommit: max: 0.389, avg: 0.373", "ConsensusProposalToCommit: max: 0.797, avg: 0.784"]
Max round gap was 1 [limit 4] at version 1635731. Max no progress secs was 4.647094 [limit 15] at version 1635731.
Test Ok

github-actions · 2024-05-21T22:47:30Z

✅ Forge suite `framework_upgrade` success on `3ffe0986b5fe4acb76544ae7ae85d73b91a6a411` ==> `a911281848342c77b2a7867d845a1123b513c2f1`

Compatibility test results for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 (PR)
Upgrade the nodes to version: a911281848342c77b2a7867d845a1123b513c2f1
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1142.7233635024284 txn/s, submitted: 1144.7526670287743 txn/s, failed submission: 2.0293035263460646 txn/s, expired: 2.0293035263460646 txn/s, latency: 2686.9668014996055 ms, (p50: 2300 ms, p90: 4500 ms, p99: 8400 ms), latency samples: 101360
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1104.1365200135954 txn/s, submitted: 1107.2237801055048 txn/s, failed submission: 3.087260091909394 txn/s, expired: 3.087260091909394 txn/s, latency: 2666.445116836429 ms, (p50: 2400 ms, p90: 4500 ms, p99: 6900 ms), latency samples: 100140
5. check swarm health
Compatibility test for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 passed
Upgrade the remaining nodes to version: a911281848342c77b2a7867d845a1123b513c2f1
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1167.354840857509 txn/s, submitted: 1170.1802257083139 txn/s, failed submission: 2.825384850804782 txn/s, expired: 2.825384850804782 txn/s, latency: 2748.6200988301734 ms, (p50: 2300 ms, p90: 4800 ms, p99: 7000 ms), latency samples: 99160
Test Ok

brmataptos requested review from wrwg, rahxephon89, vineethk and fEst1ck May 11, 2024 02:50

wrwg reviewed May 14, 2024

View reviewed changes

wrwg approved these changes May 17, 2024

View reviewed changes

brmataptos force-pushed the brm-issue-13216 branch from 3ca51d2 to 9bd22c2 Compare May 19, 2024 05:26

rahxephon89 approved these changes May 20, 2024

View reviewed changes

brmataptos requested review from banool, gregnazario and 0xmaayan as code owners May 20, 2024 08:42

brmataptos force-pushed the brm-issue-13216 branch 2 times, most recently from f112943 to 9e96207 Compare May 20, 2024 23:13

brmataptos enabled auto-merge (squash) May 20, 2024 23:29

This comment has been minimized.

Sign in to view

brmataptos added 2 commits May 21, 2024 15:15

add outputs for v2 for a bunch of tests

f6ab837

add more .v2_exp files

b81c301

brmataptos added 5 commits May 21, 2024 15:15

generate .v2_exp files from more tests

6ba3570

still more .v2_exp files

d8a00c4

fix an exp file which was modified in debugging

3121540

more rust lints

cb4f65e

add some .v2_exp file changes from rebase

a911281

brmataptos force-pushed the brm-issue-13216 branch from 9e96207 to a911281 Compare May 21, 2024 22:16

This comment has been minimized.

Sign in to view

brmataptos merged commit db482a4 into main May 21, 2024
53 of 54 checks passed

brmataptos deleted the brm-issue-13216 branch May 21, 2024 22:51

rahxephon89 mentioned this pull request May 30, 2024

[Feature Request][move-compiler-v2] Parallel .exp files for all tests to be able to regression-test Compilers V1 and V2 in parallel for a while #12966

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

brmataptos commented May 11, 2024

trunk-io bot commented May 11, 2024 •

edited

codecov bot commented May 11, 2024 •

edited

wrwg left a comment

wrwg May 14, 2024

brmataptos May 19, 2024

wrwg May 14, 2024

wrwg May 14, 2024

wrwg May 14, 2024 •

edited

brmataptos commented May 15, 2024

wrwg left a comment

rahxephon89 commented May 18, 2024

brmataptos commented May 19, 2024 •

edited

rahxephon89 left a comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented May 21, 2024

github-actions bot commented May 21, 2024

github-actions bot commented May 21, 2024

		@@ -0,0 +1,1138 @@
		/* =======================================

Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

Conversation

brmataptos commented May 11, 2024

Description

Type of Change

Which Components or Systems Does This Change Impact?

How Has This Been Tested?

Key Areas to Review

trunk-io bot commented May 11, 2024 • edited

codecov bot commented May 11, 2024 • edited

Codecov Report

wrwg left a comment

Choose a reason for hiding this comment

wrwg May 14, 2024

Choose a reason for hiding this comment

brmataptos May 19, 2024

Choose a reason for hiding this comment

wrwg May 14, 2024

Choose a reason for hiding this comment

wrwg May 14, 2024

Choose a reason for hiding this comment

wrwg May 14, 2024 • edited

Choose a reason for hiding this comment

brmataptos commented May 15, 2024

wrwg left a comment

Choose a reason for hiding this comment

rahxephon89 commented May 18, 2024

brmataptos commented May 19, 2024 • edited

rahxephon89 left a comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented May 21, 2024

✅ Forge suite compat success on 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1

github-actions bot commented May 21, 2024

✅ Forge suite realistic_env_max_load success on a911281848342c77b2a7867d845a1123b513c2f1

github-actions bot commented May 21, 2024

✅ Forge suite framework_upgrade success on 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1

trunk-io bot commented May 11, 2024 •

edited

codecov bot commented May 11, 2024 •

edited

wrwg May 14, 2024 •

edited

brmataptos commented May 19, 2024 •

edited

✅ Forge suite `compat` success on `3ffe0986b5fe4acb76544ae7ae85d73b91a6a411` ==> `a911281848342c77b2a7867d845a1123b513c2f1`

✅ Forge suite `realistic_env_max_load` success on `a911281848342c77b2a7867d845a1123b513c2f1`

✅ Forge suite `framework_upgrade` success on `3ffe0986b5fe4acb76544ae7ae85d73b91a6a411` ==> `a911281848342c77b2a7867d845a1123b513c2f1`