go: reader: make ordered iteration faster #1168
Conversation
cc @wkalt
Do you have any thought on simplifying this to,
and making the reuse of the message's data buffer implicit? So the iterator will allocate a new buffer if it needs, thus resizing for the subsequent call. I would imagine that is no different than what we are effectively telling the user to do here. Secondly, can you quantify the cost of the schema/channel lookups relative to the data allocations (and heap iterations IIRC) and share how much the relative contribution of each is hurting us? My recollection is that the main costs are in allocations and in the heap, rather than in the lookups. The lookups can also be converted to slice lookups pretty easily, either with a dynamically sized slice or (worst case) the full set of 2^16 which would only take 512kb. I would guess the cost of the lookups if that were done should go to approximately zero. Third, could you post comparisons of the iteration performance of the reader vs the lexer? IIRC the lexer is much faster, like maybe 5x on adrian's mcnasty.mcap file. I think the goal of the reader improvements should roughly be to match the performance of the lexer, so it would be useful to have that as a comparison. Fourth, have you investigated whether it is possible to bypass the heap completely in the case of a fully-ordered chunk (which can be known from inspection of the index)? My hunch previously was that some significant savings may come from here. Great work overall, will take a closer look/test it out shortly. |
@james-rms I'm not sure if it's useful at all, but this is the code I was working on when I was digging into this previously: https://github.com/wkalt/mcap/blob/task/golang-reader-perf-improvements/go/mcap/writer_test.go#L703

The test linked there will generate a sample file. This benchmark is the one I guess I was then using for comparisons: I don't remember unfortunately where I left things off, except that I broke the code and then didn't get it working again. It looks to me like the ideas that patch implements are: a) switch to a slice-based lookup for the channels/schemas, and b) allow the user to pre-allocate structures (I doubt preallocating schema/channel actually matters at all, since those are comparatively infrequent).

My overall reaction to this PR is that I suspect we can probably maintain something closer to a traditional "iterator" interface (modulo whatever is needed for message preallocation -- i.e. still returning our schema/channel pointers) while still getting all the performance improvements we want, so that in the 99% case where files are ordered, the lexer and the reader are doing identical work and are equally fast. I'm not wild about the interface changes required in this specific patch. I'm not really a fan of SchemaAndChannelForID - I think it just pushes the work our code did previously onto the user and probably results in the same number of aggregate lookups in actual use (possibly with the user maintaining their own duplicate mappings). If feasible, I would definitely prefer to keep returning those pointers and instead make this efficient by changing the backing structures to slices.
I guess my concrete thought is, can we make this the Next signature:
and get performance identical to the lexer, with that being the only required new interface? I suspect the answer is yes -- and if so, that seems like a better outcome.
A rough idea for an approach might be to first examine the chunk indexes to see if the chunks are ordered and if none overlap. 99%+ of the time, I expect, this will be the case. If so, no heap is necessary: the chunks are just consulted in order.

For each chunk, look at the message index to see if there is a disordering. If there isn't (99% of the case), stream-decompress the chunk directly, like the lexer does, meaning there is no need to even materialize the decompressed chunk. If there is a disordering, then sort the index (possibly using insertion sort if it matters -- it'll probably be almost-totally ordered already), decompress the chunk, and return messages in sorted order.

If there are overlaps between the chunks, then dispatch to the current iterator logic. I think it is probably still possible to beat the current iterator without a heap (generally not all chunks will overlap, so the strategy above can be used for most of them), but my assumption would be that overlapping chunks are so rare that leaving this unoptimized for now will be totally fine.
Regarding the signature - I've pushed an update which makes the API

**EDIT**: Previously this comment explained a bunch about how the cost of looking up the schema and channel was too great, but I found and removed a duplicate lookup, and the difference is so minor that I'm now convinced.
Also re: testing against the lexer: I have a rough estimate here:
@james-rms I think there is definitely value in returning the message pointer, because some users may prefer to allocate on each call, i.e. by passing in nil as the input pointer. This would be useful if they want to pass the messages through some kind of processing pipeline and aren't too concerned about maximal perf. I'm assuming that in a future version we'll want to deprecate and remove the old interface and push users toward this one.
Is the benchmark you posted for the lexer comparable to the other two posted earlier? That seems like a great basis for comparison.
Final minor suggestion - I think we should change "major" and "minor" in these benchmark names to "major_overlap" and "minor_overlap" for clarity. Anyway, final results from me:
Behavior is close on files with small overlap, and we still have a gap with heavy overlap as expected. Looks great! |
In golang, do all dependencies of a project have to use the same version? So if you make API-incompatible changes, then some dependency that also needs the mcap library will break if not updated to the latest? This is why golang uses the folder major versioning approach, IIRC? I have not done enough golang to say whether it's better to roll the major or adopt the
@defunctzombie I'm not sure I'm quite following - hopefully this will be helpful -
While it isn't an option for MCAP, numerous projects choose to ignore SIV (semantic import versioning, this suffixing stuff) for various reasons. The way to do this is to adopt a convention that isn't real semver, such as keeping a permanent 0 or 1 major version and communicating breaking changes in other ways. Foxglove has dependencies that take this approach as well. I don't recall which offhand (protobuf maybe?) but I expect you might find a few if you executed

The point of SIV is to allow consumers to use multiple major versions of a lib at the same time. That's not a requirement I've ever encountered, so to me personally it's an inconvenience to update paths, docs, etc. just for a version bump.
I think a good argument against rolling a major update into this PR is that, in order to avoid update churn, it would be best if the major were specifically planned so that any breaking changes we want to put in can be bundled together. Due in part to SIV, multiple major upgrades in a row are annoying for consumers.
Changelog
- Added a Message.PopulateFrom([]byte) error method to Message, which allows readers to avoid reallocating Message structs between messages.
- Added a MessageIterator.Next2(*Message) (*Schema, *Channel, *Message, error) method to the MessageIterator interface. This method allows the caller to re-use memory between message reads, and to avoid allocating a new Message on the heap for every message read.
- Docs
Description
This PR makes message iteration much faster. The included benchmark shows a pretty significant speedup:
Before

Note: the comparison benchmark based on main uses MessageIterator.Next() to read messages.

After
For the unindexed message iterator, all of the difference comes from:
For the indexed message iterator, we do all of the same things plus:
New API justification
The issues with Next(p []byte) (*Schema, *Channel, *Message, error) that caused me to explore alternatives are:
- p is used to store the message data, but if it isn't big enough, a new buffer is allocated for every message. A common idiom is to return the newly allocated buffer so that the caller can take ownership of it and re-use it, but the current API doesn't allow that.

The new function signature re-uses the message passed in, if one is passed in. If nil is passed, it creates a new Message.