Updates to the way meta indexing is handled for filestore. #4450

derekcollison · 2023-08-30T18:21:50Z

Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the ".idx" and ".fss" indexing files. When streams became very large this could have an impact on recovery time. Also, for encryption the fast path for determining if the indexing was current would require loading and decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per subject information maps (PSIM) and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record that was considered in the snapshot. On recovery the snapshot is read and processed, and any additional records past the point of the snapshot itself are processed. To this end, any non-system removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These are processed on recovery and our indexing layer knows to skip them.
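For readers less familiar with the pattern, here is a minimal sketch of snapshot-plus-replay recovery with delete tombstones. It is not the filestore code: the `record` and `snapshot` types, the `recoverState` function, and the `lastIdx` field are all hypothetical names made up for illustration. In particular, the sketch locates the resume point with a plain slice index, whereas the design described above identifies it by msg block and last record hash.

```go
package main

import "fmt"

// record is a hypothetical log entry: either a stored message or a
// delete tombstone for an earlier sequence.
type record struct {
	seq       uint64
	subject   string
	tombstone bool // true when this record marks seq as removed
}

// snapshot is a hypothetical stand-in for the full stream state snapshot.
type snapshot struct {
	lastIdx int               // index of the last log record the snapshot covers (assumption)
	msgs    map[uint64]string // live sequence -> subject
	psim    map[string]int    // per subject message counts
}

// recoverState starts from the snapshot and replays only the records
// written after it, applying tombstones as removals.
func recoverState(snap snapshot, wal []record) (map[uint64]string, map[string]int) {
	msgs := make(map[uint64]string, len(snap.msgs))
	for seq, subj := range snap.msgs {
		msgs[seq] = subj
	}
	psim := make(map[string]int, len(snap.psim))
	for subj, n := range snap.psim {
		psim[subj] = n
	}
	for _, r := range wal[snap.lastIdx+1:] {
		if r.tombstone {
			// A tombstone removes a message instead of adding one.
			if subj, ok := msgs[r.seq]; ok {
				delete(msgs, r.seq)
				psim[subj]--
				if psim[subj] == 0 {
					delete(psim, subj)
				}
			}
			continue
		}
		msgs[r.seq] = r.subject
		psim[r.subject]++
	}
	return msgs, psim
}

func main() {
	wal := []record{
		{seq: 1, subject: "orders.a"},
		{seq: 2, subject: "orders.b"},
		{seq: 3, subject: "orders.a"}, // written after the snapshot
		{seq: 1, tombstone: true},     // removal of seq 1, written after the snapshot
	}
	snap := snapshot{
		lastIdx: 1,
		msgs:    map[uint64]string{1: "orders.a", 2: "orders.b"},
		psim:    map[string]int{"orders.a": 1, "orders.b": 1},
	}
	msgs, psim := recoverState(snap, wal)
	fmt.Println(len(msgs), psim["orders.a"], psim["orders.b"]) // prints "2 1 1"
}
```

The point of the sketch is that recovery cost depends on the number of records written since the last snapshot rather than on total stream size, which is why every removal has to show up in the log as a tombstone.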

Changing to this method drastically improves startup and recovery times, and has simplified the code. Some performance benefits during normal operation have been seen as well.

Signed-off-by: Derek Collison <derek@nats.io>

neilalexander

LGTM, I really like the approach here.

Some items for 2.11 out of this:

Clean-up of accounting, quite a lot of duplicated code
Clean-up of key loading from disk

server/filestore.go

wallyqs

LGTM

derekcollison requested a review from a team as a code owner August 30, 2023 18:21

derekcollison force-pushed the fs-meta-updates branch from 748975d to 897efc8 Compare August 30, 2023 18:33

neilalexander approved these changes Aug 30, 2023

wallyqs approved these changes Aug 30, 2023

derekcollison force-pushed the fs-meta-updates branch from 897efc8 to d76f667 Compare August 30, 2023 22:36

derekcollison force-pushed the fs-meta-updates branch from d76f667 to adef828 Compare August 30, 2023 23:12

derekcollison merged commit b9b284d into dev Aug 30, 2023
2 checks passed

derekcollison deleted the fs-meta-updates branch August 30, 2023 23:49
