Split a per-partition WriteRequest into multiple Kafka records if bigger than max allowed size #8077

pracucci · 2024-05-07T13:45:02Z

What this PR does

To be written...

Which issue(s) this PR fixes or relates to

N/A

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.


pstibrany

Gave this PR an early look, and it would work.

I wonder if it would have been easier to work at serialized-message level, splitting the message by fields with tag 1 (timeseries) and tag 3 (metadata) until they fill the size, while copying tag 2 (source) and 1000 (skip_label_name_validation) into each submessage.
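A minimal sketch of that idea, using only the standard library. `splitSerialized` is a hypothetical name, not code from this PR. It assumes the top-level fields use only the varint and length-delimited wire types, and it treats every field other than tags 1 and 3 as one to copy into each submessage.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Field numbers taken from the comment above; everything else is hypothetical.
const (
	tagTimeseries = 1
	tagMetadata   = 3
)

// splitSerialized cuts an encoded message at top-level field boundaries.
// Fields 1 and 3 are distributed across chunks; all other fields are copied
// into every chunk. A chunk exceeds maxSize only when a single field plus the
// shared fields is already bigger than maxSize.
func splitSerialized(msg []byte, maxSize int) ([][]byte, error) {
	var shared []byte     // raw bytes of the fields copied into every chunk
	var repeated [][]byte // raw bytes of each timeseries / metadata field

	for len(msg) > 0 {
		key, n := binary.Uvarint(msg)
		if n <= 0 {
			return nil, errors.New("invalid field key")
		}
		end := n
		switch key & 7 { // the low 3 bits hold the wire type
		case 0: // varint
			_, m := binary.Uvarint(msg[n:])
			if m <= 0 {
				return nil, errors.New("invalid varint")
			}
			end += m
		case 2: // length-delimited
			l, m := binary.Uvarint(msg[n:])
			if m <= 0 || l > uint64(len(msg)-n-m) {
				return nil, errors.New("invalid length")
			}
			end += m + int(l)
		default:
			return nil, errors.New("unsupported wire type")
		}
		if num := key >> 3; num == tagTimeseries || num == tagMetadata {
			repeated = append(repeated, msg[:end])
		} else {
			shared = append(shared, msg[:end]...)
		}
		msg = msg[end:]
	}

	var chunks [][]byte
	cur := append([]byte(nil), shared...)
	for _, f := range repeated {
		// Start a new chunk only if the current one already holds a repeated field.
		if len(cur)+len(f) > maxSize && len(cur) > len(shared) {
			chunks = append(chunks, cur)
			cur = append([]byte(nil), shared...)
		}
		cur = append(cur, f...)
	}
	return append(chunks, cur), nil
}
```

Each chunk is itself a valid encoding of the same message type, because protobuf does not require fields to appear in any particular order.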

pstibrany · 2024-05-07T14:16:28Z

pkg/mimirpb/custom.go

+	}
+
+	// We assume that different timeseries roughly have the same size (no huge outliers)
+	// so we preallocate the returned slice just adding 1 extra item (+2 because a +1 is to round up).


I could understand +1, but why +2 again?

+1 to round up, and +1 for an extra item. The round-up +1 doesn't guarantee space for 1 extra item (it depends on the remainder of the division), but the second +1 does. Does this answer your question?
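To make the arithmetic concrete, here is a small sketch. Both function names are hypothetical, and the exact expression in the PR may differ; this assumes the capacity is computed with integer division plus 2.

```go
package main

// estimatePartialReqs is an assumed form of the preallocation:
// integer division truncates, so +1 rounds up and another +1 leaves a spare slot.
func estimatePartialReqs(reqSize, maxSize int) int {
	return reqSize/maxSize + 2
}

// minPartialReqs is the exact ceiling, i.e. the number of partial requests
// needed if series packed perfectly into maxSize.
func minPartialReqs(reqSize, maxSize int) int {
	return (reqSize + maxSize - 1) / maxSize
}
```

With `maxSize = 16`, a 32-byte request needs at least 2 partial requests and gets capacity 4, while a 33-byte request needs at least 3 and also gets capacity 4. With only a single +1, the 33-byte case would have no spare slot.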

pstibrany · 2024-05-07T14:17:59Z

pkg/mimirpb/custom.go

+		return []*WriteRequest{partialReq}
+	}
+
+	// We assume that different timeseries roughly have the same size (no huge outliers)


Given that the size of each timeseries is dominated by its labels, I have doubts that this assumption holds.

In practice we split into 16MB partial requests. At this scale, the size of labels shouldn't matter much.

pstibrany · 2024-05-07T14:19:58Z

pkg/mimirpb/custom.go

+		if partialReqSize+seriesSize > maxSize && !partialReq.IsEmpty() {
+			// The current partial request is full (or close to be full), so we create a new one.
+			partialReqs = append(partialReqs, partialReq)
+			partialReq = newPartialReq(estimatedTimeseriesPerPartialReq)


Don't we need to reset partialReqSize here?

Damn yes. It was a bad bug. Fixed in 0379d8b
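For reference, a simplified sketch of the splitting loop with the reset in place. It models each series as just its size in bytes; `splitBySize` is a hypothetical stand-in, not the function from `pkg/mimirpb/custom.go`.

```go
package main

// splitBySize greedily groups series sizes into partial requests whose total
// size stays within maxSize (a single oversized series still gets its own group).
func splitBySize(seriesSizes []int, maxSize int) [][]int {
	var partialReqs [][]int
	var partialReq []int
	partialReqSize := 0

	for _, seriesSize := range seriesSizes {
		if partialReqSize+seriesSize > maxSize && len(partialReq) > 0 {
			// The current partial request is full, so start a new one.
			partialReqs = append(partialReqs, partialReq)
			partialReq = nil
			partialReqSize = 0 // without this reset, every later series triggers a flush
		}
		partialReq = append(partialReq, seriesSize)
		partialReqSize += seriesSize
	}
	if len(partialReq) > 0 {
		partialReqs = append(partialReqs, partialReq)
	}
	return partialReqs
}
```

With five series of 4 bytes each and `maxSize = 10`, this produces three groups: `[4 4]`, `[4 4]`, `[4]`. If the reset line is dropped, the running size never falls back below the limit, so every series after the first flush lands in its own partial request and the same input yields four groups.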


Split a per-partition WriteRequest into multiple Kafka records if big…

72cb323

…ger than max allowed size Signed-off-by: Marco Pracucci <marco@pracucci.com>

pstibrany reviewed May 7, 2024


Fix partialReqSize reset

0379d8b

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pstibrany mentioned this pull request May 22, 2024

Split write request at field boundary #8167

