proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

dsnet · 2016-07-22T20:21:51Z

Currently, io.CopyBuffer uses WriteTo if the src supports it and ReadFrom if the dst supports it, presumably to avoid an allocation.

The problem is that there exist certain io.ReaderFrom that are not efficient 100% of the time. For example, consider the following implementation of ReadFrom:

type myWriter struct {
    w io.Writer
}

func (mw *myWriter) ReadFrom(src io.Reader) (int64, error) {
    if r, ok := mw.w.(io.ReaderFrom); ok {
        return r.ReadFrom(src)
    }
    return io.Copy(struct{ io.Writer }{mw}, src)
}

This is unfortunate as myWriter satisfies the io.ReaderFrom interface, but it's implementation is no better than io.Copy (always incurring an allocation) in some circumstances.

In the case of io.CopyBuffer, I would argue that the user's expectation is that the buffer passed in is always used. It has the detriment of an extra copy, but I believe the cost of that is much lower than the allocation that is already saved by the user providing the buffer. It main benefit is that it avoids any shenanigans that occur because of inefficient implementations of io.ReaderFrom and io.WriterTo.

Alternative is to do anonymous struct value hack:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

But, this hack is not intuitive to most people.

Filed on behalf of @JKJI

nhooyr · 2016-07-26T12:10:57Z

Why would someone implement io.ReaderFrom or io.WriterTo if they are inefficient? Seems really backwards to me.

I see what you mean with the buf. It doesn't make much sense to pass in a buffer when io.CopyBuffer will just end up calling io.ReaderFrom. That is unintuitive.

I don't think we can change this because of the Go 1 compatibility anyway.

romanl-g · 2016-07-26T15:43:57Z

ReaderFrom does not have to be horribly inefficient, but it can be inefficient enough for the CopyBuffer caller to care. In my case, this was severely reducing the file transfer rate but it might be totally fine for some other application. Someone probably added it for a different use case, without realizing the side effects on CopyBuffer.
Conceptually, it seems that ReaderFrom should have a buffer argument and ignore it if it is not needed or nil. Efficient buffer-less ReaderFrom implementation sounds like a rare beast to me.

nhooyr · 2016-07-26T19:23:49Z

Well we can't really change this now. Perhaps in Go 2?

dsnet · 2016-07-26T19:43:14Z

@nhooyr, how do you see this as a Go 1 compatibility issue? Whether io.CopyBuffer chooses to use ReadFrom or WriteTo or not does not alter the correctness of the function. It's primarily a question of performance.

Part of the problem is that io.ReaderFrom and io.WriterTo doesn't document anything about needing to be "efficient" (however that is defined), so there are many implementations of them that are sub-par compared to the case where io.CopyBuffer just used the user provided buffer only.

nhooyr · 2016-07-26T20:05:39Z

He said that io.ReaderFrom should have a buffer argument that should be ignored if it is not needed or nil. That cannot happen without breaking Go 1 Compatibility. Perhaps it should be better documented but the whole point of io.ReaderFrom is to not need a buffer. To write it directly like say sendfile(2).

Furthermore, a lot of code relies on theio.Copy functions using the io.ReaderFrom or io.WriterTo interfaces to optimize. Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we break this documented behaviour that go programs could rely on.

I definitely agree though, I find it unintuitive that io.CopyBuffer doesn't always use the passed buffer, but because io.CopyBuffer using io.ReaderFrom or io.WriterTo is documented behavior, it breaks the Go 1 compatibility promise.

tv42 · 2016-07-26T20:16:10Z

CopyBuffer is identical to Copy except that it stages through the provided buffer (if one is required) rather than allocating a temporary one.

dsnet · 2016-07-26T20:33:28Z

Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we may break this documented behaviour that go programs could rely on.

Fair argument.

romanl-g · 2016-07-26T21:50:08Z

@nhooyr, thanks for the clarification. Backwards compatibility argument makes sense, that's why I said "conceptually" ;)

dsnet · 2016-08-02T18:57:09Z

Just noting here that strings.Reader.WriteTo is another example of an implementation of io.WriterTo that is not efficient in all circumstances.

It relies on io.WriteString, which only calls WriteString if the underlying writer supports it. Otherwise, it causes a (potentially large) allocation to convert the string to a []byte.

Related to #13848, which is intended to help alleviate the issue.

tv42 · 2016-08-02T20:02:03Z

@dsnet Is this an actual bottleneck in something that can't be fixed without stdlib changes?

dsnet · 2016-08-02T20:07:51Z

My original post contained the hack to get around this issue:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

It's just not an intuitive hack to most people. Secondly, the need for it doesn't become obvious until a pprof indicates that io.CopyBuffer is spending significant amounts of time and allocations.

tv42 · 2016-08-02T20:12:50Z

@dsnet Was it ever an observed bottleneck in a real, otherwise well-written, system?

dsnet · 2016-08-02T20:28:02Z

Yes, #13848 was filed when I observed a large string was unnecessarily being copied.
This issue was filed on behalf of someone who noticed performance issues in an internal system.

nhooyr · 2016-10-11T08:08:23Z

Haha, I just got bitten by this. In tlsmuxd I was copying between two TCP connections with io.CopyBuffer. I expected it to use my buffer from my bufferPool but it turns out that *net.TCPConn has the ReadFrom method for sendfile. However, since it was reading from a net.Conn sendfile did not apply and it fell back to io.Copy. Consequently, my buffer was unused and there was a buffer allocation in io.Copy.

Luckily, I read *net.TCPConn's source one day so I realized my mistake but I've retracted my original view that it a compat issue because I think that this is extremely subtle, unintuitive and I cannot imagine anyone who passes in a buffer to io.CopyBuffer but does not expect it to be used. I understand that it was documented to behave just like io.Copy, but why would you pass in a buffer if you aren't sure it will be used? It just doesn't make any sense.

dsnet · 2016-10-17T23:11:35Z

Not performance related, but http://golang.org/cl/31174 is another case where the behavior of io.CopyBuffer was unexpected. The test relied upon copy being done in specific chunk sizes.

bcmills · 2018-01-02T17:38:28Z

Here's a different way to look at this issue, which generalizes to magic-method APIs in general:

For a magic-method API to work with wrapper types, the method's implementation must be able to indicate that the method is not actually supported.

io.ReaderFrom violates that principle: there is no standardized way for ReadFrom to return an error that means “fall back to Write instead”.

You could imagine retrofitting one, such as io.ErrNotSupported, but it would likely break callers on a massive scale. More likely, you'd need to rename the method to something like MaybeReadFrom so that callees can determine whether to return ErrNotSupported or press ahead with a less-efficient implementation.

dsnet · 2018-01-02T18:30:04Z

the method's implementation must be able to indicate that the method is not actually supported.

This is one of the driving usages of #21904.

Part of the problem with "not implemented" is that it's obvious for cases like io.Seeker, where you either support it or you don't. It's not so obvious what you're supposed to do, when you do implement it and your implementation happens to be less efficient than if your caller had just used Write instead.

ianlancetaylor · 2018-01-16T22:27:10Z

@bcmills I don't think this works in general. In the case discussed here, it sometimes makes sense for io.CopyBuffer to call the ReadFrom method, and sometimes it doesn't. It makes sense to use ReadFrom if ReadFrom does not require a buffer at all, as for example for (*net.TCPConn).ReadFrom on systems that support sendfile. It does not make sense for io.CopyBuffer to call ReadFrom if ReadFrom uses an internal buffer. But the fact that ReadFrom uses an internal buffer doesn't mean that it is not supported; it may be a completely appropriate use in some cases, just not for io.CopyBuffer with a larger buffer. So using a "not actually supported" mechanism would require specifying "not actually supported for this specific use" which is an N-x-N problem.

More generally, if you know the types involved, you will call the methods directly. If you don't know the types involved, there is no obvious general approach to know whether to use a magic method or not, short of actual measurement.

We wind up in this situation because there is existing code that uses io.Copy, and we can't or don't want to change that code. But we have a type that has a fast copy mechanism, and we want the existing code to use our fast mechanism. So we invent a magic method, put it on the type, and change io.Copy to use that magic method. In effect we are using a magic method as a communication mechanism across code that doesn't know about it.

It seems that the best way to use this technique effectively is for each magic method to mean exactly one thing. In this case we are using it for two things: io.Copy to copy with a generic buffer, and io.CopyBuffer to copy with a specific buffer. They should not both use the same magic method. With this approach a function like io.CopyBuffer would document clearly the set of magic methods that it supports. If multiple functions support the same magic method, we wind up with an N-x-N crossbar: trying to use the magic method for multiple different purposes, when we have to know which purpose is intended to know whether it makes sense.

On the other hand, if we have a magic method that is specific to io.CopyBuffer, then the code winds up having spooky action at a distance: methods are defined solely so that they can be detected at a completely different, seemingly unrelated, part of the code.

bcmills · 2018-03-02T06:39:37Z

I'm beginning to think that this issue calls into question the value of CopyBuffer entirely. If callers want an unconditional buffer, they can wrap the Reader in a bufio.Reader of appropriate size, which implements io.WriterTo and thus always wins the magic-method contest in io.Copy.

io.CopyBuffer only adds value if it is conditional, but the fact that it is conditional is what makes it so subtle to use.

ianlancetaylor · 2020-06-19T05:34:56Z

I want to note that in 1.15 this issue is going to apply to os.File, which acquired a ReadFrom method in https://golang.org/cl/229101.

gopherbot · 2020-06-19T05:45:36Z

Change https://golang.org/cl/238864 mentions this issue: doc/go1.15: mention consequence of os.File.ReadFrom

Now that we've added a os.File.ReadFrom method, io.CopyBuffer to a os.File will no longer use the provided buffer. For #16474 For #36817 For #37419 Change-Id: I79a3bf778ff93eab88e88dd9ecbb8c7ea101e868 Reviewed-on: https://go-review.googlesource.com/c/go/+/238864 Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

ianlancetaylor · 2022-11-16T00:22:58Z

Another report: #56742.

ianlancetaylor · 2023-02-06T00:31:45Z

A related proposal: #58331.

gopherbot · 2023-03-11T08:08:19Z

Change https://go.dev/cl/475575 mentions this issue: io: optimize copyBuffer to make use of the user-provided buffer for fallbacks

neild · 2024-04-26T17:17:14Z

#66988 is related: Adding *os.File.WriteTo with a fast path for sendfile/splice and a fallback to io.Copy results in us missing the fast path in *net.TCPConn.WriteTo.

neild · 2024-04-26T20:16:14Z

Another related proposal: #67074

dsnet changed the title ~~io: CopyBuffer should not perform io.ReaderFrom and io.WriterTo check~~ io: CopyBuffer should not perform ReaderFrom and WriterTo optimization Jul 22, 2016

dsnet added this to the Unplanned milestone Jul 26, 2016

dsnet added the v2 A language change or incompatible library change label Jul 26, 2016

dsnet added the Performance label Oct 4, 2016

brentp mentioned this issue Jan 2, 2017

Read full biogo/hts#56

Merged

rsc changed the title ~~io: CopyBuffer should not perform ReaderFrom and WriterTo optimization~~ io: CopyBuffer should avoid ReadFrom/WriteTo Jun 16, 2017

rsc changed the title ~~io: CopyBuffer should avoid ReadFrom/WriteTo~~ proposal: io: CopyBuffer should avoid ReadFrom/WriteTo Jun 17, 2017

gopherbot added the Proposal label Jun 17, 2017

dsnet mentioned this issue Sep 15, 2017

proposal: io: "unimplemented" error return for WriterTo et al. #21904

Closed

komuw mentioned this issue Oct 30, 2017

implement ReadFrom for efficiency komuw/meli#62

Closed

dsnet mentioned this issue Dec 31, 2017

bufio: Writer.ReadFrom doesn't always buffer #23289

Closed

bcmills mentioned this issue Jan 8, 2018

proposal: Go 2: default methods #23185

Closed

ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 23, 2018

bcmills mentioned this issue Sep 14, 2018

proposal: Go 2: interface adapter types #26757

Closed

bcmills mentioned this issue Dec 20, 2018

net/http: new HTTP client package #23707

Open

This was referenced May 28, 2019

io: CopyBuffer should document when buf is not used #32276

Closed

io: clearly document when CopyBuffer does not use the buffer #32306

Closed

ianlancetaylor mentioned this issue Jul 29, 2019

bufio: Reader cannot buf content greater than 32KB #33264

Closed

bcmills mentioned this issue Sep 23, 2020

proposal: net/http: automatically set Content-Length from Len method on Body #41182

Closed

mailsmail mentioned this issue Feb 9, 2021

S3 Select: missing data on multiple column when using MinIO S3 Gateway minio/minio#11413

Closed

costela mentioned this issue Oct 26, 2022

proposal: x/exp/mmap: implement io.WriterTo and io.ReadSeeker #56422

Open

aktau mentioned this issue Nov 15, 2022

archive/zip: make checksumReader implement io.WriterTo #56742

Open

ianlancetaylor mentioned this issue Feb 6, 2023

proposal: io: Add io.ReaderFromBuffer interface #58331

Open

gaby mentioned this issue Mar 19, 2023

🐛 [Bug]: c.Download returns 404 error when file is over 2gb gofiber/fiber#2377

Closed

3 tasks

costela mentioned this issue Jan 16, 2024

net/http: use Copy in ServeContent if CopyN not needed #65106

Open

edigaryev mentioned this issue Mar 19, 2024

targz: perform less allocations due to weird CopyBuffer logic cirruslabs/cirrus-ci-agent#347

Merged

neild mentioned this issue Apr 26, 2024

proposal: io: CopierTo and CopierFrom interfaces #67074

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

dsnet commented Jul 22, 2016 •

edited

nhooyr commented Jul 26, 2016 •

edited

romanl-g commented Jul 26, 2016

nhooyr commented Jul 26, 2016

dsnet commented Jul 26, 2016

nhooyr commented Jul 26, 2016 •

edited

tv42 commented Jul 26, 2016

dsnet commented Jul 26, 2016 •

edited

romanl-g commented Jul 26, 2016

dsnet commented Aug 2, 2016 •

edited

tv42 commented Aug 2, 2016

dsnet commented Aug 2, 2016

tv42 commented Aug 2, 2016

dsnet commented Aug 2, 2016

nhooyr commented Oct 11, 2016 •

edited

dsnet commented Oct 17, 2016 •

edited

bcmills commented Jan 2, 2018 •

edited

dsnet commented Jan 2, 2018

ianlancetaylor commented Jan 16, 2018

bcmills commented Mar 2, 2018

ianlancetaylor commented Jun 19, 2020

gopherbot commented Jun 19, 2020

ianlancetaylor commented Nov 16, 2022

ianlancetaylor commented Feb 6, 2023

gopherbot commented Mar 11, 2023

neild commented Apr 26, 2024

neild commented Apr 26, 2024

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474

Comments

dsnet commented Jul 22, 2016 • edited

nhooyr commented Jul 26, 2016 • edited

romanl-g commented Jul 26, 2016

nhooyr commented Jul 26, 2016

dsnet commented Jul 26, 2016

nhooyr commented Jul 26, 2016 • edited

tv42 commented Jul 26, 2016

dsnet commented Jul 26, 2016 • edited

romanl-g commented Jul 26, 2016

dsnet commented Aug 2, 2016 • edited

tv42 commented Aug 2, 2016

dsnet commented Aug 2, 2016

tv42 commented Aug 2, 2016

dsnet commented Aug 2, 2016

nhooyr commented Oct 11, 2016 • edited

dsnet commented Oct 17, 2016 • edited

bcmills commented Jan 2, 2018 • edited

dsnet commented Jan 2, 2018

ianlancetaylor commented Jan 16, 2018

bcmills commented Mar 2, 2018

ianlancetaylor commented Jun 19, 2020

gopherbot commented Jun 19, 2020

ianlancetaylor commented Nov 16, 2022

ianlancetaylor commented Feb 6, 2023

gopherbot commented Mar 11, 2023

neild commented Apr 26, 2024

neild commented Apr 26, 2024

dsnet commented Jul 22, 2016 •

edited

nhooyr commented Jul 26, 2016 •

edited

nhooyr commented Jul 26, 2016 •

edited

dsnet commented Jul 26, 2016 •

edited

dsnet commented Aug 2, 2016 •

edited

nhooyr commented Oct 11, 2016 •

edited

dsnet commented Oct 17, 2016 •

edited

bcmills commented Jan 2, 2018 •

edited