Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run MOVE_COMPILER_V2 with more tests and record outputs in .v2_exp files #13254

Merged
merged 7 commits into from
May 21, 2024

Conversation

brmataptos
Copy link
Contributor

Description

Updated various test harnesses and re-ran pretty much all .move tests with MOVE_COMPILER_V2=true to generate .v2_exp files to see what V2 does today. Diffed outputs with V1 outputs and filed bugs for differences (fortunately few).

Tests were run manually. Perhaps there should be a script. Something like the following was used:

for value in false true; do 
UB=1 MOVE_COMPILER_V2=$value cargo test \
      -p "aptos-transactional-test-harness" \
      -p "bytecode-verifier-transactional-tests" \
      -p "move-async-vm" \
      -p "move-cli" \
      -p "move-model" \
      -p "move-package" \
      -p "move-prover-bytecode-pipeline" \
      -p "move-stackless-bytecode" \
      -p "move-to-yul" \
      -p "move-transactional-test-runner" \
      -p "move-unit-test"
      -p "move-vm-transactional-tests" \
      --no-fail-fast \
      >& test-$value.out
done

Note, did not yet set MOVE_COMPILER_BLOCK_V1 so some of these may be using V1 under the hood, but with the baseline outputs checked in we can see if things degrade when we force use of V2.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

See above.

Key Areas to Review

Copy link

trunk-io bot commented May 11, 2024

Copy link

codecov bot commented May 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.0%. Comparing base (327fc17) to head (a911281).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##             main   #13254     +/-   ##
=========================================
- Coverage    33.0%    33.0%   -0.1%     
=========================================
  Files        1760     1764      +4     
  Lines      338939   339033     +94     
=========================================
+ Hits       111976   111996     +20     
- Misses     226963   227037     +74     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wrwg wrwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm not seeing what this PR brings us in new quality of testing since most features we have v2 tests for now are rather generic Move code which very likely are already covered by our regular tests.
  • Instead we are adding new technical debt by having the v1/v2 living side by side. When we remove v1, we have to pay for this.
  • Some of the v2 tests (e.g. bytecode verifier) should have the same .exp file and not a new one. If the difference is in things like definition index we should better redact this from the baseline instead of creating new v2_exp.

@@ -0,0 +1,1138 @@
/* =======================================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I said before we should keep the evm stuff alone. Why are we adding this additional complexity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was easy enough to add it. Just a line in a script.

@@ -0,0 +1,55 @@
publishing Actor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Once we switch to v2 compiler as the default, those things can be updated. This does not give us relevant new test coverage, most test cases here and in the EVM code are not specific to Move features, but to the tool.

@@ -0,0 +1,37 @@
processed 8 tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that differ from the simple .exp? There shouldn't be any difference. Is it the definition indices?

@@ -11,6 +11,14 @@ pub const EXP_EXT: &str = "exp";
/// Extension for expected output files compiled by v2
pub const EXP_EXT_V2: &str = "v2_exp";

pub fn get_compiler_exp_extension() -> &'static str {
if get_move_compiler_v2_from_env() {
Copy link
Contributor

@wrwg wrwg May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any tests which are dependent from env variables must be run in their own CI action and cannot run in parallel with other tests and different env vars. Its perhaps OK we do this for e2e tests (like we do already) but for unit tests it is not a good design. Where we have already different exp files (prover, compiler unit tests and transactional tests) we do not do this via env vars but via test configs.

@brmataptos
Copy link
Contributor Author

  • I'm not seeing what this PR brings us in new quality of testing since most features we have v2 tests for now are rather generic Move code which very likely are already covered by our regular tests.

I've already been able to somewhat reproduce Kevin's bug using move-docgen tests, because of the different build environment provided. In other cases, the added coverage is small, but it is non-zero. We can also test which tools/paths accidentally still use V1 under the covers by using the MOVE_COMPILER_DISABLE_V1 env var.

  • Instead we are adding new technical debt by having the v1/v2 living side by side. When we remove v1, we have to pay for this.
  • Some of the v2 tests (e.g. bytecode verifier) should have the same .exp file and not a new one. If the difference is in things like definition index we should better redact this from the baseline instead of creating new v2_exp.

This gives us an easy way to get a bit more coverage to track compiler V2 correctness and avoiding regressions:

  • Check in this PR, after I
    • (1) carefully examined diffs between corresponding .exp and .v2_exp files, and
    • (2) filed bugs for any meaningful differences.
  • Occasionally or regularly (CI) run more tests with MOVE_COMPILER_V2 to see any changes and fix them.
  • Eventually also run with MOVE_COMPILER_DISABLE_V1 to verify that V1 is truly unused.

A few parallel output files which we can ignore (after merging this PR) unless they change is an insignificant debt to take on for the benefit, even if it is small. Trying more complex approaches to leverage the testing value of these other packages would involve a significant amount of wasted time or technical debt: Jumping through hoops to make outputs identical, or building complex redaction/comparison frameworks to try to enable V2 runs to use exactly the same .exp files as V1 is technical debt. In retrospect, the move-compiler-v2-transactional-tests comparison mechanism might have been better done this way: we would avoided wasteful duplicate runs of V1 on every test, and also simplified the testing code significantly.

The cost is a one-time comparison of each .v2_exp file with each .exp file, and then we can forget about v1.

We can rename .v2_exp to .exp when we get rid of V1 if it bothers you.

Copy link
Contributor

@wrwg wrwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, lets land it.

@rahxephon89
Copy link
Contributor

Hi @brmataptos, sorry I don't look all code and new files in detail because there are so many. I am wondering how these tests are run against the compiler V2 on CI? Do we automatically run V1 and then V2 for all of them (like the prover test)?

@brmataptos
Copy link
Contributor Author

brmataptos commented May 19, 2024

Hi @brmataptos, sorry I don't look all code and new files in detail because there are so many. I am wondering how these tests are run against the compiler V2 on CI? Do we automatically run V1 and then V2 for all of them (like the prover test)?

I think these will have to be run as a separate CI job with MOVE_COMPILER_V2=true set in the environment, as (1) Wolfgang has observed that varying environment variables across tests causes races, and (2) these tests represent a collection of tools, and feeding a CLI flag to V2 in each case will be tricky. While the move-prover example isolates the multiple configurations better, it's a lot of work for each test framework to implement.

The problem is that these tests represent a number of different test frameworks, so doing something like what was done for the prover tests would require a lot of work of questionable long-term value. This is a quick hack that allows us to run tests in the following directories under V2 with minimal effort, gaining perhaps 637 files with very few changes to .rs files. It requires some scripting to manage this collection of tests, but it will be easy to rename and/or change the defaults once V2 becomes mainstream.

Note that this PR introduces test outputs for the following directories:

aptos-move/aptos-transactional-test-harness/tests/
third_party/move/evm/move-to-yul/tests/
third_party/move/extensions/async/move-async-vm/tests/
third_party/move/move-bytecode-verifier/transactional-tests/tests/    *** (only 8/317 are .move)
third_party/move/move-model/bytecode/tests/
third_party/move/move-model/tests/
third_party/move/move-prover/bytecode-pipeline/tests/
third_party/move/move-vm/transactional-tests/tests/                             *** (only 19/168 are .move)
third_party/move/testing-infra/transactional-test-runner/tests/
third_party/move/tools/move-cli/tests/
third_party/move/tools/move-package/tests/
third_party/move/tools/move-unit-test/tests/

The two directories marked with *** are mostly .mvir files, but have a few .move files that are compiled, so are included although most of the tests display nothing about move compilation, since including them costs nothing.

Of the resulting 835 .v2_expfiles in the repo, just 110 of them have any difference from the .exp files, with a selective (identical files are omitted) diff output of 1667 lines, attached here:
diff-out.txt

Of these files:

Copy link
Contributor

@rahxephon89 rahxephon89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we need to add a script to this PR so that after any changes to the compiler, we can use it to automatically update these exp files?

@brmataptos brmataptos force-pushed the brm-issue-13216 branch 2 times, most recently from f112943 to 9e96207 Compare May 20, 2024 23:13
@brmataptos brmataptos enabled auto-merge (squash) May 20, 2024 23:29

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1

Compatibility test results for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 (PR)
1. Check liveness of validators at old version: 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411
compatibility::simple-validator-upgrade::liveness-check : committed: 6569.8052104991575 txn/s, latency: 4714.147083665339 ms, (p50: 4800 ms, p90: 5400 ms, p99: 7800 ms), latency samples: 251000
2. Upgrading first Validator to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1661.1801874479156 txn/s, latency: 17172.990952839267 ms, (p50: 20500 ms, p90: 21900 ms, p99: 22300 ms), latency samples: 83120
3. Upgrading rest of first batch to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1846.4974067167332 txn/s, latency: 15463.347069002812 ms, (p50: 18100 ms, p90: 21600 ms, p99: 22300 ms), latency samples: 92460
4. upgrading second batch to new version: a911281848342c77b2a7867d845a1123b513c2f1
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3581.664825748539 txn/s, latency: 8692.227602347351 ms, (p50: 9700 ms, p90: 12500 ms, p99: 12700 ms), latency samples: 143140
5. check swarm health
Compatibility test for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on a911281848342c77b2a7867d845a1123b513c2f1

two traffics test: inner traffic : committed: 8367.635097875393 txn/s, latency: 4685.89117074681 ms, (p50: 4500 ms, p90: 5400 ms, p99: 10200 ms), latency samples: 3617520
two traffics test : committed: 99.93390505155581 txn/s, latency: 1870.4883720930231 ms, (p50: 1800 ms, p90: 2000 ms, p99: 6100 ms), latency samples: 1720
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.205, avg: 0.203", "QsPosToProposal: max: 0.222, avg: 0.203", "ConsensusProposalToOrdered: max: 0.439, avg: 0.411", "ConsensusOrderedToCommit: max: 0.389, avg: 0.373", "ConsensusProposalToCommit: max: 0.797, avg: 0.784"]
Max round gap was 1 [limit 4] at version 1635731. Max no progress secs was 4.647094 [limit 15] at version 1635731.
Test Ok

Copy link
Contributor

✅ Forge suite framework_upgrade success on 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1

Compatibility test results for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 (PR)
Upgrade the nodes to version: a911281848342c77b2a7867d845a1123b513c2f1
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1142.7233635024284 txn/s, submitted: 1144.7526670287743 txn/s, failed submission: 2.0293035263460646 txn/s, expired: 2.0293035263460646 txn/s, latency: 2686.9668014996055 ms, (p50: 2300 ms, p90: 4500 ms, p99: 8400 ms), latency samples: 101360
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1104.1365200135954 txn/s, submitted: 1107.2237801055048 txn/s, failed submission: 3.087260091909394 txn/s, expired: 3.087260091909394 txn/s, latency: 2666.445116836429 ms, (p50: 2400 ms, p90: 4500 ms, p99: 6900 ms), latency samples: 100140
5. check swarm health
Compatibility test for 3ffe0986b5fe4acb76544ae7ae85d73b91a6a411 ==> a911281848342c77b2a7867d845a1123b513c2f1 passed
Upgrade the remaining nodes to version: a911281848342c77b2a7867d845a1123b513c2f1
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1167.354840857509 txn/s, submitted: 1170.1802257083139 txn/s, failed submission: 2.825384850804782 txn/s, expired: 2.825384850804782 txn/s, latency: 2748.6200988301734 ms, (p50: 2300 ms, p90: 4800 ms, p99: 7000 ms), latency samples: 99160
Test Ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants