GODRIVER-2520 Remove deadline setters from gridfs #1427

prestonvasquez · 2023-10-16T20:26:58Z

Summary

Remove the deadline setters from gridfs.Bucket in favor of extending function signatures to include context.Context
Replace DownloadStream.SetReadDeadline with WithContext
Replace UploadStream.SetWriteDeadline with WithContext

Background & Motivation

The current API for many of the GridFS CRUD operations looks something like this:

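func (b *Bucket) Op() error {
	ctx := context.Background()
	if !b.writeDeadline.IsZero() {
		// Assign to the outer ctx instead of shadowing it with :=.
		var cancel context.CancelFunc
		ctx, cancel = context.WithDeadline(ctx, b.writeDeadline)
		defer cancel()
	}

	return b.OpContext(ctx)
}

func (b *Bucket) OpContext(ctx context.Context) error {
	// Core logic
	return nil
}

func (b *Bucket) SetWriteDeadline(t time.Time) { b.writeDeadline = t }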

The proposal of this ticket is to rework the logic to remove the setter and add a context to the Op() method:

func (b *Bucket) Op(ctx context.Context) error {
	// Core logic
	return nil
}

gridfs.UploadStream and gridfs.DownloadStream are an io.Writer and io.Reader respectively. Both allow context timeouts in their read/write methods, whose signatures cannot be extended without breaking compliance with io. This PR suggests renaming Set<> to WithContext for these structs to make them slightly more Go-idiomatic (e.g. http.Request.WithContext).

prestonvasquez · 2023-10-16T20:27:42Z

mongo/gridfs/download_stream.go

-	return nil
+// WithContext sets the context for the DownloadStream, allowing control over
+// the execution and behavior of operations associated with the stream.
+func (ds *DownloadStream) WithContext(ctx context.Context) {


An alternative to this would be to have a constructor that accepts a context for initializing a download stream.

Do we need this setter at all? The only ways to create a DownloadStream are using OpenDownloadStream or OpenDownloadStreamByName, which both accept a Context parameter as of this PR.

The context set by WithContext is specific to the read operation, which is independent of constructing a DownloadStream. For example, this:

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)

ds, err := bucket.OpenDownloadStreamByName(ctx, fileName) // could time out finding a file, etc
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)

p := make([]byte, len(fileData))
_, err = ds.Read(p)

has a different intent than this:

ds, err := bucket.OpenDownloadStreamByName(context.Background(), fileName)
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)

ds.WithContext(ctx) // specifically trying to add a context when reading a file

p := make([]byte, len(fileData))
_, err = ds.Read(p)

I see your point about how the Context in OpenDownloadStream is only used to get the file info, but is not used for the subsequent ops that read the file info from the database. I think it's worth considering how similar APIs behave.

The Go stdlib offers a few examples of how to apply timeouts to stream reader types that implement io.Reader. The patterns are similar for applying timeouts to stream writer types that implement io.Writer.

The Go net.Conn allows setting a read timeout via SetReadDeadline or SetDeadline.

conn, _ := net.Dial(...)
conn.SetReadDeadline(time.Now().Add(15 * time.Second))

// Will time out in 15 seconds
io.ReadAll(conn)

The Go http.Client allows setting a timeout that applies to the entire lifetime of any request, including dialing, reading headers, and reading the body.

client := &http.Client{
	Timeout: 15 * time.Second,
}
resp, err := client.Get(...)

// Will time out in 15 seconds.
io.ReadAll(resp.Body)

Thoughts:

Concerning using "read deadline" vs "context", all of the underlying APIs used by the GridFS code accept a Context (they're all just Go driver CRUD calls), so using a Context seems to be the best choice.

I think accepting a Context in OpenDownloadStream that is not used for actually downloading the file is confusing and would surprise most users. I recommend using the Context passed to OpenDownloadStream (and OpenDownloadStreamByName) as the Context on a DownloadStream.

If we want to allow users to override the Context used when actually downloading the file, we can add a SetContext method to DownloadStream. However, it's not immediately clear if that is necessary, so I'd recommend omitting it for now.

The suggestion of WithContext comes directly from the http package's Request.WithContext API, which uses the context set by this method in its I/O operations. I am also open to retaining the existing API which, as you note, is the pattern used in net.Conn. I would argue the existing pattern (SetReadDeadline) is unnecessarily asymmetric, as DownloadStream has no concept of Write, so WithContext or SetDeadline would be more concise.

I think accepting a Context in OpenDownloadStream that is not used for actually downloading the file is confusing and would surprise most users. I recommend using the Context passed to OpenDownloadStream (and OpenDownloadStreamByName) as the Context on a DownloadStream.

The context timeout starts ticking around when the DownloadStream is constructed, so the user will have to be judicious about how they set the context timeout and when they plan on reading from the stream. If we go this way, I agree with omitting a setter specific to setting context on the streaming types until it's more clear if there is a use case for it. However, in my opinion this makes the API for DownloadStream more difficult to use. What are your thoughts?

Another issue with going the constructor route is that if we ever needed to add a read-specific context timeout, then undoing the constructor propagation of context would be a breaking change.

For example, suppose a user is setting a context on the constructor to time out the find operation, i.e. the construction, and they have no intention of attempting to time out the I/O read. We would be tempted on the Go Driver team to add a WithContext method to DownloadStream to accommodate this case. However, we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read. This could be an awkward situation.

I think simply having something like SetReadDeadline is the correct approach to the Download/Upload Stream objects.

The GridFS API section of the CSOT spec actually describes the required behavior of the timeout param, which is basically "use the constructor context":

... all methods in the GridFS Bucket API MUST support the timeoutMS option. For methods that create streams (e.g. open_upload_stream), the option MUST cap the lifetime of the entire stream. ... Methods that interact with a user-provided stream (e.g. upload_from_stream) MUST use timeoutMS as the timeout for the entire upload/download operation.

Concerning the comment

we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read

If we use the Context passed into the constructor, adding a new WithContext method to a DownloadStream doesn't seem like it would create a breaking change in API behavior.

For example, consider downloading a file with a 30 second timeout:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

ds, _ := bucket.OpenDownloadStream(ctx, ...)
b, _ := io.ReadAll(ds)

Now consider opening a DownloadStream with a 30 second timeout, but reading the file document(s) with no timeout:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

ds, _ := bucket.OpenDownloadStream(ctx, ...)
ds.WithContext(context.Background())
b, _ := io.ReadAll(ds)

Are there examples where those timeouts would conflict?

It's still not clear that there is a use case for having different timeout behavior for different underlying operations during a GridFS upload/download, so I still recommend omitting it.

@matthewdale For this:

we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read

I agree that there wouldn't be a conflict if (1) we didn't revert the context on the constructor, and (2) (probably) the WithContext method returned a shallow copy of the DownloadStream. Consider this resolved.

I will update the code to include the requested changes, since it conforms to the specifications. But I also want to make it clear that my concern with this approach is that the context lifecycle begins at construction.

This issue is because we store context on the objects, which is an antipattern, and the documentation linked covers this exact case:

The caller’s lifetime is intermingled with a shared context, and the context is scoped to the lifetime where the Worker is created.

The docs also note that the only reason we should do this is for backwards-compatibility, which is not our issue in 2.x.

Unfortunately, if we want to time out the read operation, we have to do this. However, we can do it more modularly than at instantiation. WithContext gives us more control over what precisely a timeout affects.

Notes:

The http package's NewRequestWithContext also notes this:

For an outgoing client request, the context controls the entire lifetime of a request and its response: obtaining a connection, sending the request, and reading the response headers and body.

Agreed that we're basically using an antipattern, as described almost exactly in the "Storing context in structs leads to confusion" section of that "Contexts and structs" article. However, it's not significantly clearer if we provide a context via WithContext on a DownloadStream (I'd actually argue it's more confusing). It seems like we're designing an API to work around two problems:

(1) io.Reader and io.Writer don't include contexts, so they basically have to be side-loaded for types that implement those interfaces. See an interesting proposal for adding contexts to those interfaces here.

(2) The methods like OpenDownloadStream aren't an atomic "download a file" operation, but are currently the only way to accomplish downloading a file in the API described in the GridFS spec. That creates a conflict between Go Context best practices and the GridFS spec.

There's not much we can do about (1). However, we could separate the upload/download API into different methods, one supporting timeout and one not. For example, keep the existing methods with timeouts that only affect the initial operations but not the returned DownloadStream (i.e. there is no way to time out Read calls):

func (b *Bucket) OpenDownloadStream(ctx context.Context, fileID any) (*DownloadStream, error)

Then add additional methods for upload/download that apply the context to the entire operation.

func (b *Bucket) Download(ctx context.Context, dst io.WriterAt, fileID any) error

That deviates from the spec, but conforms more closely to Go Context best practices.

P.S. The Download method signature is inspired by the AWS SDK's S3 Download method. The io.WriterAt allows downloading multiple file chunks simultaneously. That's not something the GridFS spec covers, but that API would allow for optimization in the future.

From an offline conversation: The least surprising behavior is to make the Context apply to all I/O operations for a DownloadStream or UploadStream, which also matches the GridFS spec. The Download method I proposed above was intended to provide context and is really out of scope of this ticket, so can be ignored. Consider this comment resolved.

mongo/gridfs/bucket.go

matthewdale · 2023-10-20T01:44:51Z

blink1073 · 2023-10-30T12:01:48Z

@prestonvasquez can you please merge from master so we can see the API change report?

mongodb-drivers-pr-bot · 2023-11-01T20:07:41Z

API Change Report

./mongo/gridfs

incompatible changes

(*Bucket).Delete: changed from func(interface{}) error to func(context.Context, interface{}) error
(*Bucket).DeleteContext: removed
(*Bucket).DownloadToStream: changed from func(interface{}, io.Writer) (int64, error) to func(context.Context, interface{}, io.Writer) (int64, error)
(*Bucket).DownloadToStreamByName: changed from func(string, io.Writer, ...*./mongo/options.NameOptions) (int64, error) to func(context.Context, string, io.Writer, ...*./mongo/options.NameOptions) (int64, error)
(*Bucket).Drop: changed from func() error to func(context.Context) error
(*Bucket).DropContext: removed
(*Bucket).Find: changed from func(interface{}, ...*./mongo/options.GridFSFindOptions) (*./mongo.Cursor, error) to func(context.Context, interface{}, ...*./mongo/options.GridFSFindOptions) (*./mongo.Cursor, error)
(*Bucket).FindContext: removed
(*Bucket).OpenDownloadStream: changed from func(interface{}) (*DownloadStream, error) to func(context.Context, interface{}) (*DownloadStream, error)
(*Bucket).OpenDownloadStreamByName: changed from func(string, ...*./mongo/options.NameOptions) (*DownloadStream, error) to func(context.Context, string, ...*./mongo/options.NameOptions) (*DownloadStream, error)
(*Bucket).OpenUploadStream: changed from func(string, ...*./mongo/options.UploadOptions) (*UploadStream, error) to func(context.Context, string, ...*./mongo/options.UploadOptions) (*UploadStream, error)
(*Bucket).OpenUploadStreamWithID: changed from func(interface{}, string, ...*./mongo/options.UploadOptions) (*UploadStream, error) to func(context.Context, interface{}, string, ...*./mongo/options.UploadOptions) (*UploadStream, error)
(*Bucket).Rename: changed from func(interface{}, string) error to func(context.Context, interface{}, string) error
(*Bucket).RenameContext: removed
(*Bucket).SetReadDeadline: removed
(*Bucket).SetWriteDeadline: removed
(*Bucket).UploadFromStream: changed from func(string, io.Reader, ...*./mongo/options.UploadOptions) (./bson/primitive.ObjectID, error) to func(context.Context, string, io.Reader, ...*./mongo/options.UploadOptions) (./bson/primitive.ObjectID, error)
(*Bucket).UploadFromStreamWithID: changed from func(interface{}, string, io.Reader, ...*./mongo/options.UploadOptions) error to func(context.Context, interface{}, string, io.Reader, ...*./mongo/options.UploadOptions) error
(*DownloadStream).SetReadDeadline: removed
(*UploadStream).SetWriteDeadline: removed

matthewdale · 2023-11-02T16:51:56Z

mongo/gridfs/bucket.go

+	opts ...*options.FindOneOptions,
+) (*DownloadStream, error) {
+	result := b.filesColl.FindOne(ctx, filter, opts...)
+	if err := result.Err(); err != nil {


Optional: Explicitly checking the error here is unnecessary because the same error will be returned when Decode is called below. Consider handling both errors when calling Decode.

mongo/gridfs/bucket.go

blink1073

LGTM!

matthewdale

Looks good 👍

GODRIVER-2520 Remove deadline setters from gridfs

0483971

prestonvasquez requested a review from a team as a code owner October 16, 2023 20:26

prestonvasquez requested review from matthewdale and removed request for a team October 16, 2023 20:26

prestonvasquez had a problem deploying to api-report October 16, 2023 20:27 — with GitHub Actions Error

prestonvasquez commented Oct 16, 2023

GODRIVER-2520 Fix merge conflicts

e465fdf

prestonvasquez requested a review from blink1073 October 18, 2023 17:23

prestonvasquez had a problem deploying to api-report October 18, 2023 17:23 — with GitHub Actions Error

matthewdale reviewed Oct 20, 2023

prestonvasquez added 3 commits October 20, 2023 13:02

Merge branch 'master' into GODRIVER-2520

4666c20

GODRIVER-2520 Use FindOne instead of Find in bucket

0f9a137

GODRIVER-2520 Fix linting errors

14297b6

prestonvasquez requested a review from matthewdale October 20, 2023 23:17

prestonvasquez added 2 commits November 1, 2023 13:21

Merge branch 'master' into GODRIVER-2520

de470cf

GODRIVER-2520 Wrap gridfs examples at 80 characters

747c6da

matthewdale reviewed Nov 4, 2023

prestonvasquez added 2 commits November 6, 2023 11:26

Merge branch 'master' into GODRIVER-2520

c3d6096

GODRIVER-2520 Replace mongo.ErrNoDocuments

3753e94

prestonvasquez requested a review from matthewdale November 6, 2023 20:03

prestonvasquez added 6 commits November 6, 2023 13:05

GODRIVER-2520 Remove extra error check

9e9d59b

GODRIVER-2520 Add license to bucket test

62672ee

GODRIVER-2520 Merge master

67b8930

GODRIVER-2520 Remove WithContext

734b3d0

GODRIVER-2520 Resolve merge conflict overrides

d54e320

GODRIVER-2520 Resolve static analysis failures

b8f0f5a

blink1073 approved these changes Nov 14, 2023

matthewdale approved these changes Nov 14, 2023

prestonvasquez merged commit f93a990 into mongodb:master Nov 14, 2023
31 of 37 checks passed

prestonvasquez deleted the GODRIVER-2520 branch November 14, 2023 21:57

prestonvasquez commented Oct 16, 2023 (edited)

prestonvasquez Oct 16, 2023

matthewdale Oct 17, 2023

prestonvasquez Oct 18, 2023 (edited)

matthewdale Oct 20, 2023

prestonvasquez Oct 20, 2023 (edited)

prestonvasquez Oct 20, 2023

matthewdale Nov 10, 2023

prestonvasquez Nov 10, 2023 (edited)

matthewdale Nov 11, 2023 (edited)

matthewdale Nov 14, 2023 (edited)

matthewdale Oct 20, 2023

blink1073 commented Oct 30, 2023

mongodb-drivers-pr-bot bot commented Nov 1, 2023 (edited)

matthewdale Nov 2, 2023

blink1073 left a comment

matthewdale left a comment