fix: make DB Buffer use the up to date schema #25001
Conversation
Alternatively we can put the schema behind a `RwLock` and update it that way instead, but I figured this would also be fine.
Eesh... that was a doozie, thanks for sticking it out. One suggestion in-line, but otherwise LGTM. I am sure @pauldix will want to review as well, though.
Ahhhh that's a tricky one. There's one concern I have with this fix. Now every buffer into a table will call into the catalog to grab the db schema. So that's grabbing a read lock and doing a hash table lookup to get the database.
Most of the time when users do writes, they write into many tables at the same time. So that means we'd do this hash lookup for the DB and grabbing of the lock for every table that gets written into (this can be in the hundreds or thousands). The schema gets updated a single time for this (and thus the catalog gets updated).
Ideally, the `buffer_table_batch` function would take the `db_schema` as an argument; that way the `buffer_writes` function on `OpenBufferSegment` could look up the schema a single time and then pass it down to all the calls to `buffer_table_batch`.
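To make the suggestion concrete, here is a minimal sketch of that shape. These are hypothetical, pared-down stand-ins for `Catalog`, `DatabaseSchema`, `buffer_writes`, and `buffer_table_batch` (the real signatures differ): the write path resolves the schema once per database and threads it through every per-table call, so hundreds of table batches share a single read lock and hash lookup.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical, pared-down stand-ins for the real catalog types.
pub struct DatabaseSchema {
    pub tables: Vec<String>,
}

pub struct Catalog {
    inner: RwLock<HashMap<String, Arc<DatabaseSchema>>>,
}

impl Catalog {
    pub fn new(dbs: HashMap<String, Arc<DatabaseSchema>>) -> Self {
        Self { inner: RwLock::new(dbs) }
    }

    // One read lock + one hash lookup.
    pub fn db_schema(&self, db: &str) -> Option<Arc<DatabaseSchema>> {
        self.inner.read().unwrap().get(db).cloned()
    }
}

// The suggested shape: resolve the schema once per write...
pub fn buffer_writes(catalog: &Catalog, db: &str, table_batches: &[&str]) -> usize {
    let Some(schema) = catalog.db_schema(db) else {
        return 0; // unknown database: drop the write
    };
    table_batches
        .iter()
        .filter(|table| buffer_table_batch(&schema, table))
        .count()
}

// ...and thread it down, so no per-table call touches the catalog lock.
fn buffer_table_batch(schema: &DatabaseSchema, table: &str) -> bool {
    schema.tables.iter().any(|t| t == table)
}
```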
Force-pushed from 4e4aa17 to d8708df
@pauldix I updated it to look up the schema once per db and pass that in so that each table can read it for that particular db. I think with that we should limit the number of times we take a lock on the catalog and still fix this issue.
Looks good, thanks!
Alternate Title: The DB Schema only ever has one table

This is a story of subtle bugs, gnashing of teeth, and hair pulling. Gather round as I tell you the tale of an Arc that pointed to an outdated schema.

In #24954 we introduced an Index for the database, as this will allow us to perform faster queries. When that code went in, so did this check:

```rust
if !self.table_buffers.contains_key(&table_name) {
    // TODO: this check shouldn't be necessary. If the table doesn't exist in the catalog
    // and we've gotten here, it means we're dropping a write.
    if let Some(table) = self.db_schema.get_table(&table_name) {
        self.table_buffers.insert(
            table_name.clone(),
            TableBuffer::new(segment_key.clone(), &table.index_columns()),
        );
    } else {
        return;
    }
}
```

Adding the `return` there let us continue on with our day and make the tests pass. However, just because those tests passed didn't mean the code was correct, as I would soon find out. With follow-up ticket #24955 created, we merged the changes and I began to debug the issue.

We assumed we were dropping a single write due to limits, because the limits test is what failed. What followed was a chase of a few days to prove that the limits weren't what was failing. The conclusion was that the limits weren't causing it, but the investigation did expose the fact that a database only ever had one table, which was weird.

I then began to dig into this a bit more. Why would there only be one table? We weren't just dropping one write; we were dropping all but *one* write, or so it seemed. Many printlns and hours later, it became clear that we actually *were* updating the schema! It existed in the Catalog, but not in the pointer to the schema in the `DatabaseBuffer` struct. So what gives? Well, we need to look at [another piece of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L540-L541).
In the `validate_or_insert_schema_and_partitions` function for the WriteBuffer we have this bit of code:

```rust
// The (potentially updated) DatabaseSchema to return to the caller.
let mut schema = Cow::Borrowed(schema);
```

as we pass in a reference to the schema in the catalog. However, when we [go a bit further down](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L565-L568) we see this code:

```rust
let schema = match schema {
    Cow::Owned(s) => Some(s),
    Cow::Borrowed(_) => None,
};
```

What this means is that if we make a change, we clone the original and update the clone; we *aren't* making a change to the original schema. When we go back up the call stack we get to [this bit of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L456-L460):

```rust
if let Some(schema) = result.schema.take() {
    debug!("replacing schema for {:?}", schema);
    catalog.replace_database(sequence, Arc::new(schema))?;
}
```

We are updating the catalog with the new schema, but how does that work?

```rust
inner.databases.insert(db.name.clone(), db);
```

Oh. Oh no. We're just overwriting it. Which means that the `DatabaseBuffer` has an Arc to the *old* schema, not the *new* one. Which means that the buffer will get the first copy of the schema with the first new table, but *none* of the other ones. The solution is to make the buffer hold a pointer to the Catalog instead, so that the `DatabaseBuffer` always sees the most up-to-date schema and only the Catalog manages the schema itself. This commit makes those changes.

This was a very subtle mutability/pointer bug, sitting at the intersection of perfectly valid borrow checking and the fact that *some* writes were still making it in, but luckily we caught it.
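The failure mode can be reproduced in miniature. The sketch below is not the influxdb code: `Schema`, `validate`, and `stale_arc_demo` are hypothetical stand-ins, with `validate` mirroring the Cow pattern described above. It shows that replacing the map entry leaves any previously cloned Arc pointing at the old allocation.

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, minimal stand-in for DatabaseSchema.
#[derive(Clone)]
pub struct Schema {
    pub tables: Vec<String>,
}

// Mirrors the Cow pattern: mutate a clone only if a change is needed,
// and return Some(updated) only when the Cow became Owned.
pub fn validate(schema: &Schema, new_table: &str) -> Option<Schema> {
    let mut cow = Cow::Borrowed(schema);
    if !cow.tables.iter().any(|t| t == new_table) {
        cow.to_mut().tables.push(new_table.to_string()); // first mutation clones
    }
    match cow {
        Cow::Owned(s) => Some(s),
        Cow::Borrowed(_) => None,
    }
}

// Returns (table count the catalog sees, table count the buffer's Arc sees).
pub fn stale_arc_demo() -> (usize, usize) {
    let mut catalog: HashMap<String, Arc<Schema>> = HashMap::new();
    catalog.insert(
        "db".to_string(),
        Arc::new(Schema { tables: vec!["t1".to_string()] }),
    );

    // The buffer keeps its own Arc to the schema at creation time...
    let buffer_schema = Arc::clone(&catalog["db"]);

    // ...then a write adds table "t2" and the map entry is replaced wholesale...
    if let Some(updated) = validate(&catalog["db"], "t2") {
        catalog.insert("db".to_string(), Arc::new(updated));
    }

    // ...but the buffer's Arc still points at the old allocation.
    (catalog["db"].tables.len(), buffer_schema.tables.len())
}
```

`stale_arc_demo()` returns `(2, 1)`: the catalog now knows about two tables while the buffer's cached Arc still sees only one.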
It does mean, though, that until this fix is in, changes between the Index PR and now should be considered subtly broken and shouldn't be used for anything beyond writing to a single table per DB.

TL;DR: we should ask the Catalog what the schema is, as it holds the up-to-date version.

Closes #24955
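The shape of the fix can be sketched like this, again with hypothetical simplified types rather than the real crate's: the buffer stores an `Arc<Catalog>` and a database name and resolves the schema on demand, so a `replace_database` becomes visible on the very next lookup.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical, minimal stand-ins; names only loosely follow the real crate.
#[derive(Clone)]
pub struct Schema {
    pub tables: Vec<String>,
}

pub struct Catalog {
    inner: RwLock<HashMap<String, Arc<Schema>>>,
}

impl Catalog {
    pub fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    // Only the Catalog ever swaps schemas.
    pub fn replace_database(&self, name: &str, schema: Arc<Schema>) {
        self.inner.write().unwrap().insert(name.to_string(), schema);
    }

    pub fn db_schema(&self, name: &str) -> Option<Arc<Schema>> {
        self.inner.read().unwrap().get(name).cloned()
    }
}

// The buffer no longer caches an Arc<Schema>; it holds the Catalog and asks it.
pub struct DatabaseBuffer {
    catalog: Arc<Catalog>,
    db_name: String,
}

impl DatabaseBuffer {
    pub fn new(catalog: Arc<Catalog>, db_name: &str) -> Self {
        Self { catalog, db_name: db_name.to_string() }
    }

    // Every call observes whatever schema the Catalog currently holds.
    pub fn schema(&self) -> Option<Arc<Schema>> {
        self.catalog.db_schema(&self.db_name)
    }
}
```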
Force-pushed from d8708df to 1dc1b9b