Add a byte buffer pool for reusing bytes #292
Conversation
@julienschmidt
One of the TravisCI jobs failed when reporting code coverage to goveralls, and I have no access to restart that job; maybe you could retry it.
Any profiling to see if this makes a difference? Was it Rick Hudson who said that every use of a pool is a bug in the GC?
I've tested it on gin: benchmark code and the results are shown below:
Mem:
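For context, numbers like these come from Go's standard benchmark tooling. A minimal sketch against this repo's CleanPath (the sample path is made up; the actual gin benchmark linked above is elided here):

```go
package httprouter

import "testing"

func BenchmarkPathClean(b *testing.B) {
	b.ReportAllocs() // adds B/op and allocs/op columns to the output
	for i := 0; i < b.N; i++ {
		CleanPath("/abc/def/../ghi//jkl/./mno")
	}
}
```

Running it with `go test -bench=PathClean -benchmem` reports the allocation columns even without the ReportAllocs call.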
What is the difference from before the patch, or am I missing how to read the output?
bytebuffer/bytebufferpool.go (Outdated)
```go
var (
	// GetLen returns byte buffer with fixed length from the pool, exported from httprouter/bytebuffer.
	GetLen = func(n int) *bytebufferpool.ByteBuffer {
		b := bytebufferpool.Get()
```
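The hunk is cut off above. Purely as a guess at its shape (not the PR's actual code), a fixed-length getter on top of bytebufferpool's exported API could look like this; ByteBuffer exposes its backing slice as the field B:

```go
package bytebuffer

import "github.com/valyala/bytebufferpool"

// GetLen fetches a buffer from the default pool and sets its length to n,
// growing the backing slice only when the pooled capacity is too small.
func GetLen(n int) *bytebufferpool.ByteBuffer {
	b := bytebufferpool.Get()
	if cap(b.B) < n {
		b.B = make([]byte, n)
	} else {
		b.B = b.B[:n]
	}
	return b
}
```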
This uses the defaultPool of github.com/valyala/bytebufferpool. How about allowing the user to set and configure their own pool? They could have several packages that all use the pool.
> This uses the defaultPool of github.com/valyala/bytebufferpool. How about allowing the user to set and configure their own pool?
Actually the bytebufferpool package doesn't expose APIs for customizing the Pool, so there is nothing we can do. But I think all the configuration in the default pool has been tuned nearly perfectly by the author of bytebufferpool.
> They could have several packages that all use the pool.
I don't think several packages sharing one pool would be a problem; serving multiple callers is what a pool does, isn't it?
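For illustration, this is roughly what sharing the package-level default pool looks like through bytebufferpool's exported helpers (the buildPath function is invented for this sketch):

```go
package main

import "github.com/valyala/bytebufferpool"

func buildPath(segments ...string) string {
	b := bytebufferpool.Get() // every package importing bytebufferpool shares this default pool
	defer bytebufferpool.Put(b)
	for _, s := range segments {
		_, _ = b.WriteString("/")
		_, _ = b.WriteString(s)
	}
	return b.String() // String copies the bytes, so the buffer can safely go back to the pool
}
```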
@dahankzter
I don't have any more questions in particular. What does @julienschmidt think?
@julienschmidt
Waiting for the PR. See the Gin framework.
Two general thoughts:
With the disclaimer that I haven't looked at this PR and only quickly glanced over it: depending on exactly why you're making this PR, it may or may not make sense to use one or the other, but it's worth being aware that the two may not be interchangeable.
Indeed. But as I understand it, the aim of this PR is to avoid memory allocations, and I don't see how keeping a memory pool alive forever could be a good idea here. Assume we have a peak of 10k concurrent requests, while normally we serve fewer than a hundred concurrently. Do we really want to keep that much memory allocated if such a peak isn't likely to happen again any time soon? There is no magic bullet, but I think a sync.Pool is actually the much better tool in this case. But like I said above, avoiding the need for any sort of pool is most likely far better anyway.
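A minimal sketch of the sync.Pool alternative being argued for here (illustrative only): because a sync.Pool may drop cached entries at any garbage collection, buffers retained after a 10k-request spike are eventually reclaimed rather than pinned forever.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool caches buffers between uses; the runtime is free to discard
// idle entries during GC, so a traffic spike does not pin memory.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func withBuffer(work func(*bytes.Buffer)) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers keep their previous contents
	defer bufPool.Put(buf)
	work(buf)
}
```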
But it can't be done, because the lengths of paths are dynamic, so there are going to be allocations on the heap.
One can use e.g. a statically sized slice / array with capacity 128 on the stack, and fall back to a heap allocation only for longer inputs.
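A rough sketch of that suggestion (hypothetical code, not the eventual commit): a constant-capacity buffer that escape analysis can keep in the stack frame, with a heap fallback for long inputs.

```go
func clean(p string) string {
	const stackBufSize = 128

	// A make with constant capacity that never escapes the function
	// can be placed in the stack frame: zero heap allocations in the
	// common case.
	buf := make([]byte, 0, stackBufSize)
	if len(p) > stackBufSize {
		buf = make([]byte, 0, len(p)) // rare long input: heap fallback
	}

	buf = append(buf, p...) // stand-in for the real path-cleaning logic
	return string(buf)      // the conversion copies, so buf itself never escapes
}
```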
@julienschmidt
I just realized that reusing one slice would trigger a race condition, because the function can be called concurrently from multiple goroutines.
See 8222db1
Original:
With stack buf (8222db1):
With bytebufferpool:
The remaining 17 allocs/op should be for the returned strings.
I pushed the stack buffer commit. Feel free to implement further optimizations for inputs longer than 128 bytes.
If you're still wondering how this stack buffer works: if the size is static (and the buffer does not escape), the size is known before the function is called, thus during the function call some space for this buffer can be reserved in the stack frame instead of allocating it on the heap.
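One way to watch the compiler make this decision (a generic example, not this repo's code) is to ask for its escape-analysis report:

```go
// stack.go — inspect with: go build -gcflags=-m stack.go
package main

func stackOnly(p string) string {
	buf := make([]byte, 0, 128) // the report should say this make "does not escape"
	buf = append(buf, p...)
	return string(buf)
}

func main() { _ = stackOnly("/a/b/../c") }
```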
Got it, I will update this PR to handle input strings longer than 128 bytes.
One can actually push this concept a bit further and provide differently sized stack buffers via helper functions: 92dde9c (not in master)
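The rough shape of that idea, as a hypothetical sketch rather than the contents of 92dde9c: dispatch on input length to call sites whose buffer capacities are compile-time constants. As long as the shared helper does not let the buffer escape, those constant-capacity buffers remain eligible for stack allocation.

```go
func cleanDispatch(p string) string {
	switch {
	case len(p) <= 128:
		return cleanBuf(make([]byte, 0, 128), p)
	case len(p) <= 1024:
		return cleanBuf(make([]byte, 0, 1024), p)
	default:
		return cleanBuf(make([]byte, 0, len(p)), p) // very long input: heap
	}
}

// cleanBuf never leaks buf (it only appends to it and copies it into a
// string), so the callers' constant-capacity buffers can stay on the stack.
func cleanBuf(buf []byte, p string) string {
	buf = append(buf, p...) // stand-in for the real cleaning logic
	return string(buf)
}
```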
path.go (Outdated)
```diff
-	return string(buf[:w])
+	cPath := string(buf[:w])
+	putBack(buf)
```
You're also putting back the supposed stack buffer. Thus, it is marked as escaping and actually allocated on the heap.
That is the point; it seems there is no way to avoid heap allocations when using sync.Pool, so I guess we are abandoning sync.Pool?
I think you could avoid it with something like this:

```go
var poolBuf []byte
// ...
poolBuf = getLen(n + 1)
buf = poolBuf
// ...
if poolBuf != nil {
	putBack(poolBuf)
}
```
Most likely 92dde9c is the more efficient approach. But you're also almost done, so let's benchmark them both.
I don't think that is going to work, because the root cause of the escaping variable buf is sync.Pool.Put(interface{}), not putBack, so sync.Pool will always cause allocations on the heap.
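A minimal illustration of that point (hypothetical code): sync.Pool.Put takes an interface{}, so escape analysis must assume the slice leaks through it, forcing the buffer onto the heap.

```go
package main

import "sync"

var pool sync.Pool

func cleanPooled(p string) string {
	buf := make([]byte, 0, 128) // would stay on the stack on its own...
	buf = append(buf, p...)     // (real cleaning logic elided)
	s := string(buf)
	pool.Put(buf) // ...but Put(interface{}) makes buf escape to the heap
	return s
}

func main() { _ = cleanPooled("/a/b/../c") }
```

Storing a bare slice in a sync.Pool also allocates a fresh interface header on every Put; staticcheck flags this pattern as SA6002, which is why such pools usually hold pointers (e.g. *bytes.Buffer or *[]byte) instead.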
path_test.go (Outdated)
```diff
@@ -81,7 +81,7 @@ func TestPathCleanMallocs(t *testing.T) {
 	for _, test := range cleanTests {
 		allocs := testing.AllocsPerRun(100, func() { CleanPath(test.result) })
-		if allocs > 0 {
+		if allocs > 1 {
```
0 allocs was indeed correct.
I added a simple benchmark for long inputs: 8e4b52b
It seems like the idea 92dde9c is not worth pursuing further:

Single stack buf:
BenchmarkPathClean-4 801126 1480 ns/op 112 B/op 17 allocs/op
BenchmarkPathCleanLong-4 230 5122911 ns/op 3217488 B/op 4683 allocs/op

Helpers with differently sized stack bufs (92dde9c):
BenchmarkPathClean-4 738854 1616 ns/op 112 B/op 17 allocs/op
BenchmarkPathCleanLong-4 169 7185612 ns/op 4724305 B/op 6393 allocs/op
Yes, I guess sticking with the stack buffer approach might be the best for now.
Thanks for the proposal, which led to an improvement anyway!
Signed-off-by: Aaron Schlesinger <aaron@ecomaz.net>
Signed-off-by: Andy Pan <panjf2000@gmail.com>