
Reduce byte array allocations in StompEncoder #24694

Conversation

larsgrefer
Contributor

The use of an okio.Buffer instead of a ByteArrayOutputStream will reduce
the number of allocated byte arrays by 50% to 66%.

With the current ByteArrayOutputStream-based implementation, 2-3 byte arrays
are allocated: one when the stream is created, one when toByteArray() is called,
and possibly a third one when the stream must grow because the overhead (headers) exceeds 128 bytes.

With an okio.Buffer, only one byte array has to be allocated, when readByteArray() is called.
All other needed arrays will be taken from Okio's internal Segment Pool.
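
For illustration, a minimal sketch of what an okio.Buffer-based encoding path could look like, assuming Okio is on the classpath; the frame layout is simplified and the class/method names are purely illustrative, not the actual PR code:

```java
import java.util.Map;

import okio.Buffer;

// Minimal sketch only: simplified STOMP frame layout, illustrative names.
public final class OkioStompEncodingSketch {

	public static byte[] encode(String command, Map<String, String> headers, byte[] payload) {
		Buffer buffer = new Buffer();  // backing segments come from Okio's internal segment pool
		buffer.writeUtf8(command).writeByte('\n');
		for (Map.Entry<String, String> header : headers.entrySet()) {
			buffer.writeUtf8(header.getKey()).writeByte(':')
					.writeUtf8(header.getValue()).writeByte('\n');
		}
		buffer.writeByte('\n');
		buffer.write(payload);
		buffer.writeByte(0);  // STOMP frames are NUL-terminated
		return buffer.readByteArray();  // the single byte[] allocation per frame
	}
}
```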

The pattern of conditionally using classes when they are present on the classpath
is inspired by RestTemplate.
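
As a sketch of that pattern, the classpath check could look roughly like this, using Spring's ClassUtils the way RestTemplate detects optional HTTP libraries; the class name is hypothetical:

```java
import org.springframework.util.ClassUtils;

// Hypothetical sketch of the classpath-detection pattern; not the actual PR code.
public final class StompEncoderSupportSketch {

	private static final boolean OKIO_PRESENT = ClassUtils.isPresent(
			"okio.Buffer", StompEncoderSupportSketch.class.getClassLoader());

	public static boolean useOkio() {
		return OKIO_PRESENT;
	}
}
```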

I've chosen this older Okio version because it's the same version that would be pulled in by the already managed OkHttp version.

Context

While profiling our application, we noticed that a considerable number of byte arrays is created in org.springframework.messaging.simp.stomp.StompEncoder.encode:
(profiling screenshot: stomp-profiling, showing the allocation breakdown for StompEncoder.encode)

In other parts of our application, we have already had good experience using okio.Buffer instead of ByteArrayOutputStream.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Mar 13, 2020
@rstoyanchev rstoyanchev self-assigned this Mar 25, 2020
@rstoyanchev rstoyanchev added the in: messaging Issues in messaging modules (jms, messaging) label Mar 25, 2020
@rstoyanchev
Contributor

Good suggestion.

Couldn't we achieve that with java.nio.ByteBuffer? It's not too hard to expose it as an OutputStream, as in our DefaultDataBuffer, and at the end get the backing array from the ByteBuffer. It also doesn't require remembering to add an extra dependency.
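
A rough, hypothetical sketch of that idea (not the DefaultDataBuffer implementation) could look like this; note the sizing caveat in the comments, which is exactly what the follow-up comments below pick up on:

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Rough, hypothetical sketch of exposing a heap ByteBuffer as an OutputStream.
final class ByteBufferOutputStreamSketch extends OutputStream {

	private final ByteBuffer buffer;

	ByteBufferOutputStreamSketch(int capacity) {
		this.buffer = ByteBuffer.allocate(capacity);  // one backing byte[] on the heap
	}

	@Override
	public void write(int b) {
		this.buffer.put((byte) b);  // throws BufferOverflowException if the estimate was too small
	}

	@Override
	public void write(byte[] b, int off, int len) {
		this.buffer.put(b, off, len);
	}

	byte[] backingArray() {
		// Exact only if the size estimate was exact; otherwise a trimming copy is still
		// needed, which is the sizing concern raised below.
		return this.buffer.array();
	}
}
```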

@rstoyanchev rstoyanchev added the status: waiting-for-feedback We need additional information before we can continue label Mar 25, 2020
@larsgrefer
Contributor Author

I like the idea of not having to add/use an additional external dependency, but I'm not sure if a ByteBuffer would help here.

ByteBuffer.allocate() returns a java.nio.HeapByteBuffer which just seems to be a wrapper around a regular byte[].

Using ByteBuffer.allocateDirect() would return a java.nio.DirectByteBuffer, which seems to avoid byte[] by doing some Unsafe work outside the JVM heap, but the Javadoc states the following:

The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. [...] It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers [...].

Since ByteBuffers are fixed-size, we'd have the same allocation and growth problems as before.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 28, 2020
@rstoyanchev
Contributor

Yes, I meant a heap buffer. A byte[] is created initially, but subsequent writing to the ByteBuffer does not require a copy, and in the end you can get the underlying byte array. Those would account for the top two allocations (34.9% and 22.5%) from the breakdown above.

With an okio.Buffer, only one byte array has to be allocated, when readByteArray() is called.
All other needed arrays will be taken from Okio's internal Segment Pool.

This is saying there is still at least one allocation. I don't quite follow: what other arrays are needed here that would be taken from a pool?

@larsgrefer
Contributor Author

This is saying there is still at least one allocation. I don't quite follow: what other arrays are needed here that would be taken from a pool?

In our case (because our STOMP headers are bigger than 128 bytes), the current implementation allocates three byte arrays for each STOMP message we send:

  1. The initial internal buffer of the ByteArrayOutputStream (initialized as 128 + payload.length) (20.0%, 14414 samples)
  2. The grown internal buffer of the ByteArrayOutputStream, which is needed because we write more than payload.length + 128 bytes. (22.5%, 16233 samples)
  3. The result array which is created as subsequence copy of the internal buffer (34.9%, 25101 samples)

(The numbers don't add up correctly because the profiler I used doesn't track every single allocation as explained here: https://github.com/jvm-profiling-tools/async-profiler#allocation-profiling )

When sending 10000 messages, this makes 30000 byte arrays.

When using okio.Buffer, the first two arrays would be taken from a pool and only the result array would be allocated. That would make 10001-10002 byte arrays for 10000 messages.
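
To make the three allocations concrete, here is a small standalone sketch of the current ByteArrayOutputStream behaviour; the 300 bytes of headers and the payload content are made-up example values matching the scenario above:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Standalone illustration of the three allocations listed above.
public final class AllocationSketch {

	public static void main(String[] args) {
		byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

		// Array #1: the initial internal buffer, sized 128 + payload.length
		ByteArrayOutputStream baos = new ByteArrayOutputStream(128 + payload.length);

		// Writing more than 128 + payload.length bytes forces the buffer to grow:
		// Array #2: the enlarged internal buffer (the old contents are copied over)
		byte[] headers = new byte[300];
		baos.write(headers, 0, headers.length);
		baos.write(payload, 0, payload.length);

		// Array #3: toByteArray() returns a right-sized copy of the internal buffer
		byte[] frame = baos.toByteArray();
		System.out.println(frame.length);  // 305
	}
}
```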

A byte[] is created initially, but subsequent writing to the ByteBuffer does not require a copy, and in the end you can get the underlying byte array.

How would that work if the underlying byte array is too small or too big?
If it's too small, we'd have to allocate a second, bigger ByteBuffer and copy the contents from the old one to the new one (effectively step 2 mentioned above).
If it's too big, we'd need to allocate a correctly sized result array and copy "our" subsequence of the buffer array into the result array (effectively step 3 mentioned above).

@rstoyanchev
Contributor

rstoyanchev commented Mar 30, 2020

Thanks for elaborating. So we are doing a very poor job of estimating the required size.

We have all the input at hand, so we should be able to estimate more precisely by turning the Map<String, Object> into a Map<byte[], byte[]> upfront, and only then allocating the overall array. That should avoid allocations to expand.
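
A sketch of that "aggregate first, allocate once" idea could look roughly like this; it is illustrative only (single-value headers, simplified frame layout), not the actual change:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: single-value headers, simplified STOMP frame layout.
public final class ExactSizeEncodeSketch {

	public static byte[] encode(byte[] command, Map<String, String> headers, byte[] payload) {
		// Step 1: encode the headers to bytes upfront and add up the exact frame size.
		Map<byte[], byte[]> headerBytes = new LinkedHashMap<>();
		int headersLength = 0;
		for (Map.Entry<String, String> entry : headers.entrySet()) {
			byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
			byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
			headerBytes.put(key, value);
			headersLength += key.length + 1 + value.length + 1;  // "key:value\n"
		}

		// Step 2: allocate the result array once and fill it; no resizing, no trimming copy.
		byte[] frame = new byte[command.length + 1 + headersLength + 1 + payload.length + 1];
		int pos = 0;
		System.arraycopy(command, 0, frame, pos, command.length);
		pos += command.length;
		frame[pos++] = '\n';
		for (Map.Entry<byte[], byte[]> entry : headerBytes.entrySet()) {
			System.arraycopy(entry.getKey(), 0, frame, pos, entry.getKey().length);
			pos += entry.getKey().length;
			frame[pos++] = ':';
			System.arraycopy(entry.getValue(), 0, frame, pos, entry.getValue().length);
			pos += entry.getValue().length;
			frame[pos++] = '\n';
		}
		frame[pos++] = '\n';
		System.arraycopy(payload, 0, frame, pos, payload.length);
		pos += payload.length;
		frame[pos] = 0;  // NUL terminator
		return frame;
	}
}
```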

Come to think of it, we also have https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/util/FastByteArrayOutputStream.html which should be able to do the same.

@larsgrefer
Contributor Author

I did a quick check for our application: we'd need an initial buffer of approx. payload.length + 300 in order to avoid resizing.

FastByteArrayOutputStream seems to be similar to okio.Buffer, as both are linked lists of byte array segments. The main difference is that okio.Buffer uses fixed-size segments which are pooled, while FastByteArrayOutputStream uses dynamically sized segments which aren't pooled.
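
For comparison, a minimal FastByteArrayOutputStream usage sketch; the header and payload content are made up for the example:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.springframework.util.FastByteArrayOutputStream;

// Minimal usage sketch: writes append blocks to an internal linked list instead of
// resizing a single array, and toByteArrayUnsafe() avoids the final defensive copy
// where possible.
public final class FastBaosSketch {

	public static void main(String[] args) throws IOException {
		FastByteArrayOutputStream out = new FastByteArrayOutputStream(256);
		out.write("SEND\ndestination:/topic/test\n\n".getBytes(StandardCharsets.UTF_8));
		out.write("payload".getBytes(StandardCharsets.UTF_8));
		out.write(0);  // NUL terminator
		byte[] frame = out.toByteArrayUnsafe();
		System.out.println(frame.length);
	}
}
```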

@rstoyanchev
Contributor

rstoyanchev commented Mar 30, 2020

Something like this: rstoyanchev@fa220ff, basically just aggregating and deferring allocation until the end. That eliminates resizing.

I'm also wondering if we can avoid the array copy when taking byte[] from a String by iterating its chars and encoding them individually.
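
A sketch of that char-by-char idea, assuming ASCII-only header content; the helper below is hypothetical, not the actual code:

```java
// Hypothetical helper: writes an ASCII-only String straight into the target array,
// skipping the intermediate byte[] that String.getBytes() would allocate.
public final class AsciiWriteSketch {

	static int writeAscii(String value, byte[] target, int pos) {
		for (int i = 0; i < value.length(); i++) {
			target[pos++] = (byte) value.charAt(i);  // valid only for chars <= 0x7F
		}
		return pos;
	}
}
```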

@larsgrefer
Contributor Author

The commit you've linked looks very promising. I've left some inline comments.

@rstoyanchev rstoyanchev added type: enhancement A general enhancement and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged or decided on labels Mar 31, 2020
@rstoyanchev rstoyanchev changed the title Use okio.Buffer in StompEncoder if present Reduce byte array allocations in StompEncoder Mar 31, 2020
@rstoyanchev
Contributor

I've incorporated your feedback, thanks.

@larsgrefer
Contributor Author

Thanks for fixing this so fast 👍

Will this only be part of 5.3 M1 or also back-ported to 5.2.x?

@rstoyanchev
Contributor

This is in master and we haven't branched for 5.3 yet, so it will be in 5.2.6.

@rstoyanchev rstoyanchev added this to the 5.2.6 milestone Mar 31, 2020