
File store compression #4004

Merged (1 commit into dev, Apr 5, 2023)

Conversation

neilalexander (Member)

This PR adds internal support for compression of message blocks in the file store. Message blocks are compressed once they reach their maximum configured block size and we have moved on to the next block, so as to avoid slowing down ongoing writes as far as possible.

Compressed blocks are prefixed with a small metadata preamble describing the compression algorithm (although at this point only S2 compression is supported), and compression takes place before encryption, so as to maximise the efficiency of the compression where a stream contains redundant or repeated information in the messages.

This PR does not provide any way to configure compression from the stream level yet (or indeed any other level exposed to clients or operators). It also does not go out of its way to compress old uncompressed blocks after turning on compression unless they are compacted or truncated. These omissions will be addressed in follow-up PRs.

New permutations have been added to the file store tests to also ensure test coverage when compression is enabled.

We should note that, when file store encryption is not in use, storage-level or filesystem-level deduplication or compression (such as those provided by some disk arrays or filesystems like ZFS) may end up being considerably more space-efficient than file store compression would be. Conversely, when file store encryption is enabled, compression at the file store level may achieve better ratios, since lower layers cannot usefully compress encrypted data.

Signed-off-by: Neil Twigg neil@nats.io

@neilalexander neilalexander requested a review from a team as a code owner March 30, 2023 17:08
@derekcollison (Member)

Will take a look in the AM, thanks!

If you want you can rebase on dev to get a clean travis run now.

@neilalexander neilalexander force-pushed the neil/fscompress2 branch 2 times, most recently from 1a200eb to dacc9bd Compare March 31, 2023 09:08
if mb != nil && fs.fcfg.Compression != NoCompression {
// We've now reached the end of this message block, if we want
// to compress blocks then now's the time to do it.
go mb.recompressOnDiskIfNeeded()
Member

Wonder if this is better lazily through sync go routine or when we load the block next?

server/filestore.go (outdated, resolved)
server/filestore.go (outdated, resolved)
// returned and the algorithm defaults to no compression.
return fmt.Errorf("failed to read existing metadata header: %w", err)
}
if meta.Algorithm == alg {
Member

If we think this is common should we just peek at this part before loading in the whole block?

Member Author

I've added a comment about this; it's not really supposed to be a common case, but the check means we won't end up rewriting blocks pointlessly if for some reason something does try to recompress with the same algorithm.

server/filestore.go Outdated Show resolved Hide resolved
server/filestore.go Outdated Show resolved Hide resolved
if err := tmpFD.Sync(); err != nil {
return fmt.Errorf("failed to sync temporary file: %w", err)
}
if err := tmpFD.Close(); err != nil {
Member

This should be a defer up above so that any exit will always close.

Member Author

I've made sure that errors result in clean-up, but I wanted to make sure the descriptor was closed before the rename.

server/filestore.go (outdated, resolved)
server/filestore.go (resolved)
}

func (alg StoreCompression) Compress(buf []byte) ([]byte, error) {
if len(buf) < checksumSize {
Member

Probably should be a separate const threshold.

Member Author

Since we need to preserve the checksum, the checksumSize seems like it makes sense, as we need to make sure the buffer is big enough to do that.

Member

What I mean is probably does not make sense if block is below 1-2MB or so.

Member Author

I wonder if that might be surprising behaviour. What threshold do you think makes most sense?

Member

1M? WDYT?

Member Author

Looking at the constants, maybe defaultSmallBlockSize / 2 makes most sense, which would be 0.5MB today. But I'm also aware that some of the unit tests deliberately set the block size to be small, and by adding a threshold like this we may accidentally lose test coverage of compression.

Member

Make your best call and we can get this merged in tomorrow.

Member Author

I think the best thing to do for now is to leave it as-is so that the test coverage is maximal. If we think that compressing small blocks is hurting performance we can always follow it up later with another PR.

@neilalexander neilalexander force-pushed the neil/fscompress2 branch 2 times, most recently from 1ac6e8a to 8a84e05 Compare April 3, 2023 15:24
@neilalexander neilalexander marked this pull request as draft April 3, 2023 16:08
@neilalexander neilalexander marked this pull request as ready for review April 4, 2023 18:09
Signed-off-by: Neil Twigg <neil@nats.io>
@derekcollison (Member) left a comment

LGTM

@derekcollison derekcollison merged commit fd14e75 into dev Apr 5, 2023
2 checks passed
@derekcollison derekcollison deleted the neil/fscompress2 branch April 5, 2023 19:42