File store compression #4004
Conversation
Will take a look in the AM, thanks! If you want, you can rebase on dev to get a clean Travis run now.
Force-pushed from 1a200eb to dacc9bd.
if mb != nil && fs.fcfg.Compression != NoCompression {
	// We've now reached the end of this message block, if we want
	// to compress blocks then now's the time to do it.
	go mb.recompressOnDiskIfNeeded()
Wonder if this is better done lazily through the sync goroutine, or when we next load the block?
	// returned and the algorithm defaults to no compression.
	return fmt.Errorf("failed to read existing metadata header: %w", err)
}
if meta.Algorithm == alg {
If we think this is common, should we just peek at this part before loading in the whole block?
I've added a comment about this. It's not really supposed to be a common case, but it means we won't end up rewriting blocks pointlessly if for some reason something does try to recompress with the same algorithm.
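For illustration, a minimal sketch of what peeking at just the preamble could look like, assuming a hypothetical one-byte algorithm marker at the very start of the block file (the PR's actual metadata header carries more than this):

package sketch

import (
	"io"
	"os"
)

// Hypothetical sketch, not the PR's code: read only the first byte of
// the block file to check the compression algorithm marker, rather
// than loading the whole block into memory first.
func peekAlgorithm(path string) (byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	var marker [1]byte
	if _, err := io.ReadFull(f, marker[:]); err != nil {
		// An empty or truncated file has no preamble; callers could
		// treat that as "no compression".
		return 0, err
	}
	return marker[0], nil
}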
if err := tmpFD.Sync(); err != nil {
	return fmt.Errorf("failed to sync temporary file: %w", err)
}
if err := tmpFD.Close(); err != nil {
This should be a defer up above so that any exit path will always close the file.
I've made sure that errors result in clean-up, but I wanted to make sure the descriptor was closed before the rename.
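For illustration, roughly the sequence being described, with hypothetical names (the PR's actual clean-up logic is more involved):

package sketch

import "os"

// Sketch only: write the recompressed block to a temporary file, sync
// it, close it, and only then rename it over the original, so readers
// never observe a partially written block.
func writeBlockAtomically(path string, data []byte) error {
	tmp := path + ".tmp" // hypothetical temporary file name
	tmpFD, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0640)
	if err != nil {
		return err
	}
	cleanup := func() { tmpFD.Close(); os.Remove(tmp) }
	if _, err := tmpFD.Write(data); err != nil {
		cleanup()
		return err
	}
	if err := tmpFD.Sync(); err != nil {
		cleanup()
		return err
	}
	// Close before the rename so no descriptor is left open on the
	// temporary name once it has been moved into place.
	if err := tmpFD.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}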
}

func (alg StoreCompression) Compress(buf []byte) ([]byte, error) {
	if len(buf) < checksumSize {
Probably should be a separate const threshold.
Since we need to preserve the checksum, checksumSize seems like it makes sense here, as we need to make sure the buffer is big enough to hold it.
What I mean is that it probably does not make sense if the block is below 1–2MB or so.
I wonder if that might be surprising behaviour. What threshold do you think makes the most sense?
1M? WDYT?
Looking at the constants, maybe defaultSmallBlockSize / 2 makes the most sense, which would be 0.5MB today. But I'm also aware that some of the unit tests deliberately set the block size to be small, and by adding a threshold like this, we may lose test coverage of compression by accident.
Make your best call and we can get this merged in tomorrow.
I think the best thing to do for now is to leave it as-is so that test coverage is maximal. If we find that compressing small blocks hurts performance, we can always follow up later with another PR.
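For context, a rough sketch of the shape being discussed, using the klauspost s2 package; the names and layout here are illustrative, and the PR's real Compress also writes the metadata preamble described in the description below:

package sketch

import (
	"errors"

	"github.com/klauspost/compress/s2"
)

const checksumSize = 8 // assumed: a block's last 8 bytes are its checksum

// Illustrative only: refuse buffers too small to even contain the
// trailing checksum, compress the payload, and carry the checksum
// across unchanged so it can still be verified later.
func compressBlock(buf []byte) ([]byte, error) {
	if len(buf) < checksumSize {
		return nil, errors.New("uncompressed buffer is too short")
	}
	payload := buf[:len(buf)-checksumSize]
	checksum := buf[len(buf)-checksumSize:]
	out := s2.Encode(nil, payload)
	return append(out, checksum...), nil
}

If a minimum-size threshold were added later, it would just be another early return ahead of the s2.Encode call.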
Force-pushed from 1ac6e8a to 8a84e05.
Force-pushed from 8a84e05 to 4b44db2.
Force-pushed from 4b44db2 to 193a69d.
LGTM
This PR adds internal support for compression of message blocks in the file store. Message blocks are compressed once they reach their maximum configured block size and we have moved on to the next block, so as to avoid slowing down ongoing writes as far as possible.
Compressed blocks are prefixed with a small amount of metadata describing the compression algorithm (although at this point only S2 compression is supported), and compression takes place before encryption, so as to maximise the efficiency of the compression where a stream contains redundant or repeated information in its messages.
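As a rough illustration of that layout (type and field names here are assumptions, not necessarily the PR's exact format):

package sketch

// Illustrative only: the algorithm identifier stored in the preamble.
type StoreCompression uint8

const (
	NoCompression StoreCompression = iota
	S2Compression
)

// Illustrative only: the metadata preamble written ahead of each
// compressed block, describing how to decompress what follows.
type CompressionInfo struct {
	Algorithm    StoreCompression // currently only S2, or none
	OriginalSize uint64           // size of the block before compression
}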
This PR does not yet provide any way to configure compression at the stream level (or indeed any other level exposed to clients or operators). It also does not go out of its way to compress old uncompressed blocks after compression is turned on, unless they are compacted or truncated. These omissions will be addressed in follow-up PRs.
New permutations have been added to the file store tests to also ensure test coverage when compression is enabled.
We should note that, when file store encryption is not in use, storage-level or filesystem-level deduplication or compression (such as those provided by some disk arrays or filesystems like ZFS) may end up being considerably more space-efficient than file store compression would. Conversely, when file store encryption is enabled, enabling compression at the file store level may achieve better ratios.
Signed-off-by: Neil Twigg neil@nats.io