[Forknet] Database trimmer that preserves only flat state. #11310
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master   #11310      +/-   ##
    ==========================================
    - Coverage   70.98%   70.90%   -0.09%
    ==========================================
      Files         782      785       +3
      Lines      156205   157066     +861
      Branches   156205   157066     +861
    ==========================================
    + Hits       110889   111370     +481
    - Misses      40496    40875     +379
    - Partials     4820     4821       +1
@marcelo-gonzalez could you review this PR?
    for kv in iter {
        let (key, value) = kv.unwrap();
        (options.callback)(&key, Some(&value));
        if Instant::now() > min_time_to_split {
If min_time_to_split is way far in the future, it feels a little sad to me to be putting an Instant::now() and its associated syscall in this otherwise tight loop, no? I'm not sure what exactly it is we're trying to avoid/achieve here, but I wonder if there's something better, because my guess is that calling Instant::now() on every key/value returned by the iterator is actually a pretty nontrivial slowdown.
That's what I thought too, but given how fast the parallel iteration is, it doesn't seem to be a noticeable bottleneck.
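For reference, a common mitigation for the syscall-per-iteration concern is to amortize the clock read by only checking the deadline every few thousand items. This is a minimal, self-contained sketch with hypothetical names (`process_until`, `CHECK_INTERVAL`), not the PR's actual code:

```rust
use std::time::{Duration, Instant};

/// Hypothetical: check the wall clock only once per CHECK_INTERVAL items,
/// so the tight loop pays for at most one syscall per batch.
const CHECK_INTERVAL: usize = 4096;

fn process_until(deadline: Instant, items: impl Iterator<Item = u64>) -> u64 {
    let mut processed = 0u64;
    for (i, _item) in items.enumerate() {
        processed += 1;
        // ... the per-key/value callback would go here ...
        if i % CHECK_INTERVAL == 0 && Instant::now() > deadline {
            break; // time to consider splitting the range
        }
    }
    processed
}

fn main() {
    // A generous deadline: a small finite iterator runs to completion.
    let later = Instant::now() + Duration::from_secs(60);
    assert_eq!(process_until(later, 0..10_000), 10_000);

    // An already-expired deadline: the loop stops at the very first check.
    if let Some(past) = Instant::now().checked_sub(Duration::from_secs(1)) {
        assert_eq!(process_until(past, 0..1_000_000), 1);
    }
    println!("ok");
}
```

The trade-off is split latency: with an interval of 4096, a worker may overshoot the deadline by up to 4095 items before noticing.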
    if Instant::now() > min_time_to_split {
        if shared_state
            .num_threads_free
            .load(std::sync::atomic::Ordering::Relaxed)
Since it's just Relaxed, if you read it and it's greater than 0, I think there's no guarantee that there's at least 1 thread that is not currently in the loop doing some work, right? I bet it's not a big deal in practice, especially on x86, and there's also the shared_state.thread_work.lock() right after setting it, which is probably enough of a barrier. But in any case, Relaxed feels a little fragile to me for what's being done here.
There's no guarantee about anything in num_threads_free, and it doesn't matter: it's just a signal suggesting to the current workers that it's probably wise to split. Split or not, it doesn't affect correctness. Hence the use of Relaxed.
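To illustrate the point being debated: a Relaxed counter gives no cross-thread ordering on its own, but it is fine as an advisory hint, and any later synchronization point (a mutex lock, or join()) makes the increments visible. A minimal sketch under those assumptions (hypothetical `run_workers` helper, not the PR's code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical sketch: workers bump an advisory counter with Relaxed
/// ordering. While the threads run, readers may see a stale value, which
/// is acceptable for a "maybe split now" hint; correctness would rest on
/// the mutex-protected work queue, not on this counter.
fn run_workers(n: usize) -> usize {
    let free = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let free = Arc::clone(&free);
            thread::spawn(move || {
                // ... worker finishes its range ...
                free.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // join() establishes happens-before, so all Relaxed increments are
    // visible here even though they raced while the threads ran.
    free.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(run_workers(4), 4);
    println!("ok");
}
```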
                break;
            }
        }
    for byte in start[byte_not_ff] + 1..=0xff {
What if end is empty and start == [0xff]? Then, forgetting about what we want to do about splitting, this start[byte_not_ff] + 1 will panic with overflow. This triggers the panic:
diff --git a/core/store/src/parallel_iter.rs b/core/store/src/parallel_iter.rs
index eb7c6d7b9..ce3998c2c 100644
--- a/core/store/src/parallel_iter.rs
+++ b/core/store/src/parallel_iter.rs
@@ -489,6 +489,9 @@ mod tests {

     #[test]
     fn test_iteration_range_divide() {
+        let ranges = IterationRange::from_str("ff..").unwrap().divide();
+        println!("{}", ranges[0].to_string());
+
         let ranges = IterationRange::from_str("..").unwrap().divide();
         assert_eq!(ranges.len(), 257);
         assert_eq!(ranges[0].to_string(), "..00");
Haha... nice catch. Well I'm removing all this code so, phew. This code is so messy. I wonder if there's a good way to make it cleaner.
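Even though that code is being removed, the panic is a good illustration of why byte-increment logic in range splitting wants checked arithmetic. A hypothetical sketch (the `next_bytes` helper is invented for illustration, not the PR's API): checked_add returns None instead of panicking when the byte is already 0xff.

```rust
/// Hypothetical: compute the next prefix after incrementing the byte at
/// `idx`. Returns None when that byte is 0xff, where `start[idx] + 1`
/// would overflow and panic in a debug build.
fn next_bytes(start: &[u8], idx: usize) -> Option<Vec<u8>> {
    let next = start[idx].checked_add(1)?;
    let mut out = start[..idx].to_vec();
    out.push(next);
    Some(out)
}

fn main() {
    assert_eq!(next_bytes(&[0x41], 0), Some(vec![0x42]));
    // The case from the review: start == [0xff] no longer panics.
    assert_eq!(next_bytes(&[0xff], 0), None);
    println!("ok");
}
```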
    let mut update = store.store_update();
    let mut data = BTreeMap::new();
    for _ in 0..1000000 {
        let key_len = rand::random::<usize>() % 10;
could consider using a hardcoded seed in these rand calls so if it ever fails it will be easy to reproduce
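The benefit of a hardcoded seed is that a failing run regenerates exactly the same test data. In the PR this would mean seeding rand (e.g. StdRng::seed_from_u64); the sketch below substitutes a tiny SplitMix64 generator so it stays self-contained, which is an assumption for illustration, not the rand API:

```rust
/// Stand-in for a seeded RNG (e.g. rand's StdRng::seed_from_u64):
/// SplitMix64 is deterministic, so the same seed always yields the
/// same sequence of key lengths.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

fn main() {
    // Same hardcoded seed -> identical key lengths on every run,
    // so a test failure is trivially reproducible.
    let mut a = SplitMix64(42);
    let mut b = SplitMix64(42);
    for _ in 0..1000 {
        assert_eq!(a.next() as usize % 10, b.next() as usize % 10);
    }
    println!("ok");
}
```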
    StoreParallelIterator::lookup_keys(
        store.clone(),
        DBCol::State,
        keys,
If there are no inlined keys, we will hit the assert in WorkItem::new_lookup(). This can actually happen; I hit it on localnet.
    // Now that we've read all the non-inlined keys into memory, delete the State column, and
    // write back these values.
    let mut update = store.store_update();
    update.delete_all(DBCol::State);
So I wonder if it's worth adding a way to delete a column like this via drop_cf(), like we do for checkpoints, because otherwise the space won't be cleared up until a compaction happens at some point. Obviously not a big deal to implement that in this PR, but in the future it would probably be good.
Aggressive trimming is just a testing tool; it isn't meant to shrink the db size. But you're right, when it comes time to maybe actually delete most of the State column, then yeah, drop_cf would be awesome.
        .join(near_config.config.store.path.clone().unwrap_or_else(|| PathBuf::from("data")));
    let temp_store_home = store_path.join("temp-trimmed-db");
    if std::fs::metadata(&temp_store_home).is_ok() {
        panic!("Temp trimmed DB already exists; please delete temp-trimmed-db in the data directory first");
why not return an anyhow::Error instead of panicking?
Oh, sure.
    let (tx, rx) = std::sync::mpsc::sync_channel(1000000);
    let iter_handle = {
        let old = old.clone();
        std::thread::spawn(move || {
Not a big deal, but why do we need a new thread for this? This new thread will just be spawning other ones and then waiting in join(), right? Couldn't we just do the joining later, whenever we feel like it?
Well, both sides of the channel need to be operated on in parallel. So either this is in a thread, or the consumer side is in a thread.
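The constraint being described: with a bounded sync_channel, the producer blocks once the buffer fills, so producer and consumer must run concurrently on different threads or the pipeline deadlocks. A minimal sketch (hypothetical `pipe_sum` helper, not the PR's code):

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical: produce n items through a bounded channel on one thread
/// while the current thread drains it. If both sides ran on one thread,
/// the producer would block forever once the 1000-slot buffer filled.
fn pipe_sum(n: u64) -> u64 {
    let (tx, rx) = mpsc::sync_channel::<u64>(1000);
    let producer = thread::spawn(move || {
        for i in 0..n {
            tx.send(i).unwrap(); // blocks when the buffer is full
        }
        // tx is dropped here, closing the channel and ending rx.iter().
    });
    let sum: u64 = rx.iter().sum(); // consumer drains on this thread
    producer.join().unwrap();
    sum
}

fn main() {
    assert_eq!(pipe_sum(100_000), (0..100_000u64).sum());
    println!("ok");
}
```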
    StoreParallelIterator::lookup_keys(
        old,
        DBCol::State,
        non_inlined_keys,
Again, I hit the same assert here because there were no non-inlined keys. If you want to reproduce: I ran python3 tests/sanity/transactions.py with this change:
diff --git a/pytest/tests/sanity/transactions.py b/pytest/tests/sanity/transactions.py
index c628fbee0..a936e1e56 100755
--- a/pytest/tests/sanity/transactions.py
+++ b/pytest/tests/sanity/transactions.py
@@ -23,7 +23,7 @@ nodes = start_cluster(
num_shards=4,
config=None,
genesis_config_changes=[["min_gas_price",
- 0], ["max_inflation_rate", [0, 1]],
+ 0], ["use_production_config", True], ["max_inflation_rate", [0, 1]],
["epoch_length", 10],
["block_producer_kickout_threshold", 70]],
client_config_changes={
@@ -33,7 +33,8 @@ nodes = start_cluster(
"secs": 2,
"nanos": 0
}
- }
+ },
+ "tracked_shards": [0, 1, 2, 3]
},
1: {
"consensus": {
@@ -41,7 +42,8 @@ nodes = start_cluster(
"secs": 2,
"nanos": 0
}
- }
+ },
+ "tracked_shards": [0, 1, 2, 3]
},
2: {
"consensus": {
@@ -49,7 +51,8 @@ nodes = start_cluster(
"secs": 2,
"nanos": 0
}
- }
+ },
+ "tracked_shards": [0, 1, 2, 3]
},
3: {
"consensus": {
@@ -57,7 +60,8 @@ nodes = start_cluster(
"secs": 2,
"nanos": 0
}
- }
+ },
+ "tracked_shards": [0, 1, 2, 3]
},
4: {
"consensus": {
@@ -72,7 +76,7 @@ nodes = start_cluster(
started = time.time()
-act_to_val = [4, 4, 4, 4, 4]
+act_to_val = [0, 0, 0, 0, 0]
ctx = utils.TxContext(act_to_val, nodes)
@@ -97,8 +101,12 @@ cur_balances = ctx.get_balances()
assert cur_balances == ctx.expected_balances, "%s != %s" % (
cur_balances, ctx.expected_balances)
+killed_it = False
# we are done with the sanity test, now let's stress it
-for height, _ in utils.poll_blocks(nodes[4], timeout=TIMEOUT):
+for height, _ in utils.poll_blocks(nodes[0], timeout=TIMEOUT):
+ if height >= 60 and not killed_it:
+ nodes[4].kill()
+ killed_it=True
if ctx.get_balances() == ctx.expected_balances:
count = height - sent_height
logger.info(f'Balances caught up, took {count} blocks, moving on')
and then after that test finished, I ran:
$ python3 tests/mocknet/local_test_node.py local-test-setup --num-nodes 2 --neard-binary-path ~/nearcore/target/debug/neard --source-home-dir ~/.near/test0_finished --target-home-dir ~/.near/test0_finished --yes
$ python3 tests/mocknet/mirror.py --local-test new-test --epoch-length 200 --num-validators 2 --num-seats 100 --genesis-protocol-version 67 --yes
(If the new-test command fails with some timeout error, just try again; I don't know exactly why that happens.) But then if you look at the logs in ~/.near/local-mocknet/node0/stderr, you'll see that the fork-network command panicked.
So really the only comments that made me not click approve are the ones that actually cause panics; the others are just suggestions.
Thanks so much for the really diligent review @marcelo-gonzalez! After implementing parallel memtrie load, I think this parallel iterator can also be scrapped and replaced with the parallel memtrie load logic, because (1) it just uses rayon, so there are far fewer of these corner cases, performance problems, and panics; and (2) invoking part of that code is needed anyway, since supporting parallel memtrie load requires some additional State entries, and that code figures out exactly which state entries are needed. I'll do that as soon as I get a chance.
@marcelo-gonzalez Could you try #11420? This is the new implementation I talked about. I can't make a reviewable PR out of it yet, because the parallel loading PR is not merged and I need to clean up some of the changes that are no longer needed because of this. But in the meantime, if you have time, you could tell me whether it works :)
Closing this in favor of #11420 |
Closes #11280
Includes two new commands:
- neard database aggressive-trimming --obliterate-disk-trie performs the deletion of all State keys except those large values needed by flat storage. This is used to test that memtrie-only nodes can work.
- neard fork-network finalize now rebuilds the database by copying out FlatState as well as the State keys that are needed for large values, and then swapping in the new database, replacing the old.

@marcelo-gonzalez Not sure how to test the fork-network changes.