
Add padding before timestamp size record if it doesn't fit into a WAL block. #12614

Closed
wants to merge 3 commits

Conversation

andlr
Contributor

@andlr andlr commented May 3, 2024

If a timestamp size record doesn't fit into the current block, then without padding Writer::EmitPhysicalRecord fails on an assert in debug builds (either assert(block_offset_ + kHeaderSize + n <= kBlockSize); or assert(block_offset_ + kRecyclableHeaderSize + n <= kBlockSize);, depending on whether recycling log files is enabled). In release builds, the current block grows beyond 32K, block_offset_ gets reset on the next AddRecord, and all subsequent blocks are no longer aligned to the block size.
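
For orientation, here is a minimal standalone sketch of the check this change introduces; the condition mirrors the if (leftover < header_size_ + (int)encoded.size()) logic discussed in the review below, while the function name, parameters, and constant here are illustrative rather than the actual patch:

#include <cstddef>

// Illustration only: how many zero bytes must precede a timestamp size record
// so that the whole record (header + payload) lands inside one 32 KiB WAL block.
constexpr size_t kBlockSize = 32768;

size_t PaddingBeforeTsSzRecord(size_t block_offset, size_t header_size,
                               size_t payload_size) {
  const size_t leftover = kBlockSize - block_offset;
  if (leftover < header_size + payload_size) {
    // The record would not fit: zero-fill the rest of the block so the writer
    // can reset its block offset to 0 and emit the record block-aligned.
    return leftover;
  }
  return 0;  // Fits in the current block; no padding needed.
}

Unlike ordinary payloads, which AddRecord may fragment across blocks as multiple physical records, the ts_sz record is emitted as a single physical record, so its header plus payload has to fit within one block.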

@jowlyzhang jowlyzhang self-requested a review May 9, 2024 20:44
@andlr
Contributor Author

andlr commented May 13, 2024

Sorry, my bad. This time CI/CD should be ok

@jowlyzhang
Contributor

Hello @andlr, thanks for this fix! Just curious, did you hit this limitation in your DB, and how many column families does your DB have? I assume it will start to happen when the DB has a lot of column families; my rough calculation shows it takes about 5460 column families to encounter this.
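
(The ~5460 estimate presumably comes from the per-column-family entry size in the ts_sz record: a fixed 4-byte column family ID plus a 2-byte timestamp size, i.e. 6 bytes per entry, so a single 32768-byte block fits roughly 32768 / 6 ≈ 5461 entries before the record alone outgrows a block.)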

const size_t header_size = recyclable_log ? kRecyclableHeaderSize : kHeaderSize;
const size_t data_len = kBlockSize - 2 * header_size;

const auto first_str = BigString("foo", data_len);
Contributor

After first_str, there are kBlockSize - (kBlockSize - 2 * header_size) bytes left in the block, i.e. 2 * header_size:

  1. for the non-recyclable format, that's 2 * (4 + 2 + 1)
  2. for the recyclable format, that's 2 * (4 + 2 + 1 + 4)

In both cases, the remaining space in the block is sufficient to hold the ts_sz record, which is:

  1. for the non-recyclable format: 4 + 2 + 1 + 6
  2. for the recyclable format: 4 + 2 + 1 + 4 + 6

So it seems to me it won't invoke the logic added in this PR for: if (leftover < header_size_ + (int)encoded.size())

If a ts_sz record can spill over multiple blocks, I think we would need corresponding changes in log_reader.cc to handle the new logic added to log_writer.cc.

Maybe simply padding won't fully solve this issue, since it seems to me there are no other cases in log_writer/log_reader where one logical record spans multiple blocks as multiple physical records without each physical record having its own header.
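
For readers following the arithmetic, the numbers being added up match the log format constants (my reading of log_format.h, given the asserts quoted in the PR description):

  1. kHeaderSize = 4 (checksum) + 2 (length) + 1 (type) = 7 bytes
  2. kRecyclableHeaderSize = 4 + 2 + 1 + 4 (log number) = 11 bytes
  3. the trailing 6 is the ts_sz payload for a single column family: a 4-byte column family ID plus a 2-byte timestamp size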

Contributor

Never mind the comment about the ts_sz record spanning multiple blocks; I just realized this PR is not intended to target that.

Contributor Author

@andlr andlr May 13, 2024

> After first_str, there are kBlockSize - (kBlockSize - 2 * header_size) bytes left in the block, i.e. 2 * header_size:

Hm, not exactly: it's actually kBlockSize - (kBlockSize - header_size) == header_size, since Write(first_str) writes a header plus the data into the WAL, not just the data itself.

Without the fix, this test actually fails on the assert, as described in the PR description.
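
Spelling the arithmetic out: Write(first_str) consumes header_size + data_len = header_size + (kBlockSize - 2 * header_size) = kBlockSize - header_size bytes, leaving exactly header_size bytes in the block, which is less than the header_size + 6 the ts_sz record needs, so the padding path (and, without the fix, the assert) is hit.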

@jowlyzhang
Contributor

> Hello @andlr, thanks for this fix! Just curious, did you hit this limitation in your DB, and how many column families does your DB have? I assume it will start to happen when the DB has a lot of column families; my rough calculation shows it takes about 5460 column families to encounter this.

I just realized this fix is more for when a new column family is added and a new ts_sz record needs to be emitted. If it's to be emitted in a block that does not have enough space left, it will encounter this issue. The fix is intended to account for this case, rather than the ts_sz record itself being too big and spilling over multiple blocks.

@andlr
Contributor Author

andlr commented May 13, 2024

> I just realized this fix is more for when a new column family is added and a new ts_sz record needs to be emitted. If it's to be emitted in a block that does not have enough space left, it will encounter this issue. The fix is intended to account for this case, rather than the ts_sz record itself being too big and spilling over multiple blocks.

Yeah, I've started writing a comment about that 🙂

> Hello @andlr, thanks for this fix! Just curious, did you hit this limitation in your DB, and how many column families does your DB have? I assume it will start to happen when the DB has a lot of column families; my rough calculation shows it takes about 5460 column families to encounter this.

I didn't actually hit this in a running DB. I've been investigating a different WAL-related issue (I'll create a separate issue and PR for that one; it's a bit more complicated than this), and I noticed that WAL blocks might not always be aligned to 32K, which is how I found this (though, as it turned out, it's unrelated to the issue I was originally looking into).

Contributor

@jowlyzhang jowlyzhang left a comment

LGTM! Thanks for the fix @andlr

@facebook-github-bot
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@jowlyzhang merged this pull request in b9fc13d.

@andlr andlr deleted the ts-size-record-padding branch May 15, 2024 09:02