Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: cli,events: speed up backfill with temporary index #11953

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 2, 2024

From experience running this:

  1. Add a warning about running this against a node that's actively collecting events, you'll have trouble
  2. Use IMMEDIATE transactions so we know as soon as we start a transaction whether we can get an exclusive lock for a write or not so we can error more easily.
  3. Add an optional temporary index for our query which significantly speeds up this process (a lot).
  4. Run an optional vacuum after we're done.

Copy link
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions that are probably not feasible or worth the hassle given that this is a shed command.

Unfortunately I'm web3 native so SQL is all magic to me but I trust that what you're doing is fine.

@@ -32,6 +32,8 @@ var pragmas = []string{
"PRAGMA read_uncommitted = ON",
}

// Any changes to this schema should be matched for the `lotus-shed indexes backfill-events` command
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we share this schema with lotus-shed somehow or at least add a test that will break if this doesn't happen? I don't trust that this comment will do a very good job enforcing consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mainly the queries I'm concerned about; see #11723 where this becomes a problem and is why I thought to add this comment in there - the schema will change, then the queries will break in lotus-shed. We could put all the queries in index.go, but then we'd have to make them public to make them useful, and do we want to be stuck with additional public APIs just for this? We could, it just seems noisy for a command that basically nobody uses (I was running this on calibnet and it was painfully slow, yet nobody's complained since it was introduced as far as I know).

@@ -92,8 +103,12 @@ var backfillEventsCmd = &cli.Command{
return err
}

log.Infof(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be log.Warnf

Long shot -- does the API expose the state of event collection? It would be nice if we could just error here if the node is currently collecting events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and we'd end up with a race anyway, unless it was something like "give me a lock". It would be nice to do this all via RPC so we don't have this problem but I'm not sure an entirely new API just for this makes any sense.

Comment on lines +27 to +33
const (
// same as in chain/events/index.go
eventExists = `SELECT MAX(id) FROM event WHERE height=? AND tipset_key=? AND tipset_key_cid=? AND emitter_addr=? AND event_index=? AND message_cid=? AND message_index=?`
insertEvent = `INSERT OR IGNORE INTO event(height, tipset_key, tipset_key_cid, emitter_addr, event_index, message_cid, message_index, reverted) VALUES(?, ?, ?, ?, ?, ?, ?, ?)`
insertEntry = `INSERT OR IGNORE INTO event_entry(event_id, indexed, flags, key, codec, value) VALUES(?, ?, ?, ?, ?, ?)`
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this PR to pull these in, they're identical to chain/events/index.go, so it does suggest that maybe we should consider making them public over there. It would help keep things in sync, but would likely just result in this command failing the next time someone tries it and the parameters don't match. But we mostly have that problem already anyway.

@rvagg rvagg force-pushed the rvagg/backfill-improvements branch from 27402a8 to 6e571ab Compare May 6, 2024 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants