
Add a byte buffer pool for reusing bytes #292

Closed

Conversation

@panjf2000 commented Dec 19, 2019

Signed-off-by: Andy Pan panjf2000@gmail.com

@panjf2000 (Author)

gin-gonic/gin#2179

@panjf2000 (Author)

@julienschmidt
PTAL

@panjf2000 (Author) commented Dec 19, 2019

One of the Travis CI jobs failed while reporting code coverage to goveralls, and I don't have access to restart that job; maybe you could retry it:

PASS
coverage: 100.0% of statements
ok  	github.com/julienschmidt/httprouter	0.006s
The command "go test -v -covermode=count -coverprofile=coverage.out" exited with 0.
0.15s$ go vet ./...
The command "go vet ./..." exited with 0.
0.03s$ test -z "$(gofmt -d -s . | tee /dev/stderr)"
The command "test -z "$(gofmt -d -s . | tee /dev/stderr)"" exited with 0.
0.87s$ $HOME/gopath/bin/goveralls  -coverprofile=coverage.out -service=travis-ci
Bad response status from coveralls: 422
{"message":"Couldn't find Travis Job 627272871 from https://api.travis-ci.org (Service name: travis-ci). ","error":true}
The command "$HOME/gopath/bin/goveralls  -coverprofile=coverage.out -service=travis-ci" exited with 1.

@dahankzter (Contributor)

Any profiling to see if this makes a difference? Was it Rick Hudson who said that every use of a pool is a bug in the GC?

@panjf2000 (Author)

I've tested it on gin (benchmark code); the results are shown below:
CPU:

      flat  flat%   sum%        cum   cum%
    1140ms 23.22% 23.22%     2500ms 50.92%  github.com/gin-gonic/gin.cleanPathRaw
    1010ms 20.57% 43.79%     1010ms 20.57%  runtime.pthread_cond_signal
     560ms 11.41% 55.19%     1010ms 20.57%  runtime.mallocgc
     340ms  6.92% 62.12%     1060ms 21.59%  github.com/gin-gonic/gin.cleanPath
     150ms  3.05% 65.17%      250ms  5.09%  github.com/gin-gonic/gin.bufApp
     150ms  3.05% 68.23%      530ms 10.79%  github.com/gin-gonic/gin.bufAppRaw
     150ms  3.05% 71.28%      150ms  3.05%  runtime.memmove
     130ms  2.65% 73.93%      620ms 12.63%  runtime.makeslice
     120ms  2.44% 76.37%      660ms 13.44%  runtime.slicebytetostring
     110ms  2.24% 78.62%      110ms  2.24%  runtime.acquirem

Mem:

      flat  flat%   sum%        cum   cum%
  343.01MB 38.99% 38.99%   578.01MB 65.70%  github.com/gin-gonic/gin.cleanPathRaw
  235.01MB 26.71% 65.70%   235.01MB 26.71%  github.com/gin-gonic/gin.bufAppRaw
  217.01MB 24.67% 90.37%   217.01MB 24.67%  github.com/gin-gonic/gin.glob..func4
   52.50MB  5.97% 96.34%    52.50MB  5.97%  github.com/gin-gonic/gin.cleanPath
      28MB  3.18% 99.52%   245.01MB 27.85%  github.com/gin-gonic/gin.cleanPathSyncPool
         0     0% 99.52%   578.01MB 65.70%  github.com/gin-gonic/gin.BenchmarkCleanPath
         0     0% 99.52%    52.50MB  5.97%  github.com/gin-gonic/gin.BenchmarkCleanPathBytesPool
         0     0% 99.52%   245.01MB 27.85%  github.com/gin-gonic/gin.BenchmarkCleanPathSyncPool
         0     0% 99.52%   875.52MB 99.52%  testing.(*B).launch
         0     0% 99.52%   875.52MB 99.52%  testing.(*B).runN

@dahankzter (Contributor)

What is the difference from before the patch, or am I missing how to read the output?

@panjf2000 (Author) commented Dec 20, 2019

cleanPathRaw is the current function and cleanPath is the one that uses the byte pool.
Memory usage has been reduced to about 15% of the original after using the byte pool.

var (
    // GetLen returns byte buffer with fixed length from the pool, exported from httprouter/bytebuffer.
    GetLen = func(n int) *bytebufferpool.ByteBuffer {
        b := bytebufferpool.Get()
Contributor:

This uses the defaultPool of github.com/valyala/bytebufferpool. How about allowing the user to set and configure their own pool? They could have several packages that all use the pool.

@panjf2000 (Author) Dec 20, 2019

This uses the defaultPool of github.com/valyala/bytebufferpool. How about allowing the user to set and configure their own pool?

Actually, the bytebufferpool package doesn't expose APIs for customizing the pool, so there is nothing we can do there; but I think the default pool's configuration has been tuned nearly perfectly by the author of bytebufferpool.

They could have several packages that all use the pool.

I don't think several packages sharing one pool would be a problem; that is what a pool is for (serving multiple callers), isn't it?

@panjf2000 (Author) commented Dec 21, 2019

@dahankzter
Are there any other questions about this PR?

@dahankzter (Contributor)

I don't have any more questions in particular.
I am ambivalent about pooling in general but perhaps it's a good thing in this case.

What does @julienschmidt think?

@panjf2000 (Author)

@julienschmidt
Hello, any conclusion?

@appleboy commented Jan 4, 2020

Waiting for the PR. See the Gin framework

@julienschmidt (Owner) commented Jan 4, 2020

Two general thoughts:

  1. It might be better to use a statically sized buffer on the stack (with some size between 64 and 256 bytes; I doubt that many inputs are longer than that) and avoid heap allocations in the common case.
  2. We're not going to start adding an external dependency for this. Just use a sync.Pool for cases where a large buffer (e.g. 1024 bytes) is required (see the sketch below). Anything larger than that should be allocated only when needed anyway.
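
To make point 2 concrete, here is a minimal sketch of a sync.Pool handing out fixed-size byte slices. The names (getBuf, putBuf) and the 1024-byte size are illustrative assumptions, not code from this PR:

import "sync"

// largeBufPool hands out reusable 1024-byte slices for inputs that don't fit
// the small stack buffer described in point 1.
var largeBufPool = sync.Pool{
    New: func() interface{} { return make([]byte, 1024) },
}

func getBuf() []byte { return largeBufPool.Get().([]byte) }

func putBuf(b []byte) {
    // Note: passing the slice through interface{} copies the slice header to
    // the heap, a caveat that comes up later in this thread.
    largeBufPool.Put(b)
}

A caller would use the stack buffer for short inputs and only fall back to getBuf/putBuf for the rare long ones.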

@SamWhited

We're not going to start adding a dependency for this. Just use a sync.Pool

With the disclaimer that I haven't looked at this PR and only quickly glanced over github.com/valyala/bytebufferpool, it appears these serve two different purposes.

sync.Pool is about garbage collection pressure and buffers in a pool may be GCed at any time.

The bytebufferpool package appears to keep the entire pool alive forever and is more about allocating long-lived buffers with some resistance to memory fragmentation.

Depending on exactly why you're making this PR, it may or may not make sense to use one or the other, but it's worth being aware that they may not be interchangeable.

@julienschmidt (Owner) commented Jan 4, 2020

Indeed. But as I understood it, the aim of this PR is to avoid memory allocations.
Reusing already allocated buffers from a sync.Pool fulfills the same purpose in this case.

But I don't see how keeping a memory pool alive forever could be a good idea here. Assume we have a peak of 10k concurrent requests, while normally we only serve less than a hundred concurrently. Do we really want to keep that much memory allocated if such a peak isn't likely to happen any time soon again? There is no magic bullet, but I think a sync.Pool is actually the much better tool in this case.

But like I said above, avoiding the need to use some sort of pool is most likely far better anyway.

@panjf2000 (Author) commented Jan 5, 2020

But I don't see how keeping a memory pool alive forever could be a good idea here.

bytebufferpool uses sync.Pool internally, so it doesn't keep the pool alive forever; don't worry about that.

It might be better to use a statically sized buffer on the stack.

But that can't be done, because the lengths of paths are dynamic, so it would end up as allocations on the heap anyway.

@julienschmidt

@julienschmidt (Owner) commented Jan 5, 2020

Adding bufferpool is not an option here. The potential gain, if any, simply doesn't justify it.

But that can't be done, because the lengths of paths are dynamic, so it would end up as allocations on the heap anyway.

One uses, e.g., a statically sized slice/array with capacity 128 on the stack; for any len(input) <= 128 one re-slices that existing slice, and for anything larger one either takes another statically sized slice (e.g. with capacity 1024) from a sync.Pool, or skips the pool and simply allocates the slice on the heap.
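
Purely as an illustration of the buffer selection described above (hypothetical names; not httprouter's actual code), the logic could look roughly like this:

func cleanPathSketch(p string) string {
    const stackBufSize = 128

    // Statically sized buffer: as long as it does not escape, the compiler
    // can keep it in this function's stack frame.
    buf := make([]byte, 0, stackBufSize)

    n := len(p) + 1 // worst case: a leading '/' may have to be prepended
    if n > stackBufSize {
        // Too long for the stack buffer: either take a larger, statically
        // sized buffer from a sync.Pool, or simply allocate on the heap.
        buf = make([]byte, 0, n)
    }

    // ... the actual path-cleaning logic would write into buf here ...
    buf = append(buf, p...)

    return string(buf)
}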

@panjf2000 (Author) commented Jan 5, 2020

@julienschmidt
I've got your points and am going to rewrite the code of this PR based on your suggestions; we'll see how that works and keep discussing it further. Thanks!

@panjf2000 (Author)

One uses, e.g., a statically sized slice/array with capacity 128 on the stack; for any len(input) <= 128 one re-slices that existing slice

I just realized that reusing one slice would trigger a race condition, because CleanPath() runs concurrently.

@julienschmidt (Owner) commented Jan 6, 2020

See 8222db1

Original:

BenchmarkPathClean-4   	  630038	      1893 ns/op	     416 B/op	      40 allocs/op

With stack buf (8222db1):

BenchmarkPathClean-4   	  731874	      1550 ns/op	     112 B/op	      17 allocs/op

With bytebufferpool:

BenchmarkPathClean-4   	  428262	      2728 ns/op	     112 B/op	      17 allocs/op

The remaining 17 allocs/op should be for the returned strings.

@julienschmidt (Owner)

I pushed the stack buffer commit. Feel free to implement further optimizations for inputs longer than 128 bytes.
The other interesting question is what the actual stack buffer size should be. 128 is just a random number I picked. What is the usual (maximum) length of a path?

@julienschmidt (Owner)

If you're still wondering how this stack buffer works: if the size is static (and the buffer does not escape), the size is known before the function is called, so space for this buffer can be reserved in the function's stack frame during the call instead of allocating it on the heap.
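
A small illustration of that point (hypothetical functions, not from this PR); go build -gcflags='-m' prints the compiler's escape-analysis decisions for each buffer:

// Stays on the stack: the size is a constant and buf never leaves the
// function; string(buf) copies the bytes into a new string.
func onStack(p string) string {
    buf := make([]byte, 0, 128)
    buf = append(buf, p...)
    return string(buf)
}

// Escapes to the heap: the buffer is returned (or stored somewhere, e.g.
// put into a sync.Pool), so it has to outlive this stack frame.
func escapes(p string) []byte {
    buf := make([]byte, 0, 128)
    buf = append(buf, p...)
    return buf
}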

@panjf2000 (Author) commented Jan 6, 2020

Got it; I will update this PR for the cases where the input strings are longer than 128 bytes.

@julienschmidt (Owner)

One can actually push this concept a bit further and provide differently sized stack buffers via helper functions: 92dde9c (not in master)
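
92dde9c isn't quoted here, but the idea of size-specific helpers could be sketched roughly like this (hypothetical names, assuming the compiler's escape analysis keeps the non-leaking buffers in the callers' stack frames):

// cleanPathInto stands in for the real cleaning logic; it writes into the
// caller-provided buffer and only returns a copy of the result.
func cleanPathInto(p string, buf []byte) string {
    buf = append(buf[:0], p...)
    return string(buf)
}

// Each helper declares a buffer with a different, statically known size, so a
// caller can pick the smallest one that fits its input.
func cleanPath128(p string) string  { return cleanPathInto(p, make([]byte, 0, 128)) }
func cleanPath1024(p string) string { return cleanPathInto(p, make([]byte, 0, 1024)) }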

path.go Outdated
-	return string(buf[:w])
+	cPath := string(buf[:w])
+	putBack(buf)
@julienschmidt (Owner)

You're also putting back the supposed stack buffer. Thus, it is marked as "escaping" and actually allocated on the heap.

@panjf2000 (Author)

That is the point; it seems there is no way to avoid allocations on the heap when using sync.Pool, so I guess we are abandoning sync.Pool?

@julienschmidt (Owner)

I think you could avoid it with something like this:

var poolBuf []byte

...

poolBuf = getLen(n + 1)
buf = poolBuf

...

if poolBuf != nil {
    putBack(poolBuf)
}

Most likely 92dde9c is the more efficient approach. But you're also almost done, so let's benchmark them both.

@panjf2000 (Author) Jan 6, 2020

I don't think that is going to work, because the root cause of buf "escaping" is sync.Pool.Put(interface{}), not putBuf, so sync.Pool will always cause allocations on the heap.
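
For context on that claim (a generic illustration, not this PR's code): storing a []byte in a sync.Pool goes through interface{}, which boxes the slice header on the heap and keeps the backing array reachable from the pool, so the buffer escapes. The usual workaround is to pool a pointer instead:

import "sync"

// Putting a []byte directly boxes the slice header in an interface{} and
// keeps the backing array alive in the pool, so the buffer escapes.
var slicePool sync.Pool

// Pooling a pointer instead means only the pointer is boxed on Put/Get.
var ptrPool = sync.Pool{
    New: func() interface{} { b := make([]byte, 0, 1024); return &b },
}

func recycleSlice(buf []byte) { slicePool.Put(buf) }
func recyclePtr(buf *[]byte)  { ptrPool.Put(buf) }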

path_test.go Outdated
@@ -81,7 +81,7 @@ func TestPathCleanMallocs(t *testing.T) {

	for _, test := range cleanTests {
		allocs := testing.AllocsPerRun(100, func() { CleanPath(test.result) })
-		if allocs > 0 {
+		if allocs > 1 {
@julienschmidt (Owner)

0 allocs was indeed correct.

@julienschmidt (Owner) commented Jan 6, 2020

I added a simple benchmark for long inputs: 8e4b52b

It seems like the idea in 92dde9c is not worth pursuing further:

Single stack buf:

BenchmarkPathClean-4          801126          1480 ns/op         112 B/op         17 allocs/op
BenchmarkPathCleanLong-4         230       5122911 ns/op     3217488 B/op       4683 allocs/op

Helpers with differently sized stack bufs (92dde9c):

BenchmarkPathClean-4          738854          1616 ns/op         112 B/op         17 allocs/op
BenchmarkPathCleanLong-4         169       7185612 ns/op     4724305 B/op       6393 allocs/op

@panjf2000 (Author) commented Jan 6, 2020 via email

@panjf2000 panjf2000 deleted the bytes-pool branch January 7, 2020 04:10
@julienschmidt (Owner)

Thanks for the proposal, which led to an improvement, anyway!

similark pushed a commit to similarweb/httprouter that referenced this pull request May 9, 2023
Signed-off-by: Aaron Schlesinger <aaron@ecomaz.net>