GODRIVER-2520 Remove deadline setters from gridfs #1427

Merged: 15 commits merged into mongodb:master from the GODRIVER-2520 branch on Nov 14, 2023

@prestonvasquez (Collaborator) commented Oct 16, 2023

GODRIVER-2520

Summary

  • Remove the deadline setters from gridfs.Bucket in favor of extending function signatures to include context.Context
  • Replace DownloadStream.SetReadDeadline with WithContext
  • Replace UploadStream.SetWriteDeadline with WithContext

Background & Motivation

The current API for many of the GridFS CRUD operations looks something like this:

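func (b *Bucket) Op() error {
	ctx := context.Background()
	if !b.writeDeadline.IsZero() {
		// Assign to the outer ctx; ":=" here would shadow it inside the if block.
		var cancel context.CancelFunc
		ctx, cancel = context.WithDeadline(ctx, b.writeDeadline)
		defer cancel()
	}

	return b.OpContext(ctx)
}

func (b *Bucket) OpContext(ctx context.Context) error {
	// Core logic
	return nil
}

// SetWriteDeadline stores a deadline that Op applies on every call.
func (b *Bucket) SetWriteDeadline(t time.Time) error {
	b.writeDeadline = t
	return nil
}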

This ticket proposes reworking the logic to remove the setter and add a context parameter to the Op() method:

func (b *Bucket) Op(ctx context.Context) error {
	// Core logic, using ctx directly
	return nil
}

gridfs.UploadStream and gridfs.DownloadStream implement io.Writer and io.Reader, respectively. Both support timeouts in their Write/Read methods, but those signatures cannot be extended with a context without breaking the io interfaces. This PR suggests renaming SetReadDeadline/SetWriteDeadline on these structs to WithContext to make them slightly more Go-idiomatic (e.g. http.Request.WithContext).

return nil
// WithContext sets the context for the DownloadStream, allowing control over
// the execution and behavior of operations associated with the stream.
func (ds *DownloadStream) WithContext(ctx context.Context) {

Collaborator Author

An alternative to this would be to have a constructor that accepts a context for initializing a download stream.

Collaborator

Do we need this setter at all? The only ways to create a DownloadStream are using OpenDownloadStream or OpenDownloadStreamByName, which both accept a Context parameter as of this PR.

Collaborator Author

The context set by WithContext is specific to the read operation, which is independent of constructing a DownloadStream. For example, this:

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)

ds, err := bucket.OpenDownloadStreamByName(ctx, fileName) // could time out finding a file, etc
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)

p := make([]byte, len(fileData))
_, err = ds.Read(p)

has a different intent than this:

ds, err := bucket.OpenDownloadStreamByName(context.Background(), fileName) 
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)

ds.WithContext(ctx) // specifically trying to add a context when reading a file

p := make([]byte, len(fileData))
_, err = ds.Read(p)

Collaborator

I see your point that the Context in OpenDownloadStream is only used to get the file info and is not used for the subsequent operations that read the file chunks from the database. I think it's worth considering how similar APIs behave.

The Go stdlib offers a few examples of how to apply timeouts to stream reader types that implement io.Reader. The patterns are similar for applying timeouts to stream writer types that implement io.Writer.

The Go net.Conn allows setting a read timeout via SetReadDeadline or SetDeadline.

conn, _ := net.Dial(...)
conn.SetReadDeadline(time.Now().Add(15 * time.Second))

// Will time out in 15 seconds
io.ReadAll(conn)

The Go http.Client allows setting a timeout that applies to the entire lifetime of any request, including dialing, reading headers, and reading the body.

client := &http.Client{
    Timeout: 15 * time.Second,
}
resp, err := client.Get(...)
if err != nil {
    // handle error
}
defer resp.Body.Close()

// Will time out in 15 seconds.
io.ReadAll(resp.Body)

Thoughts:

  • Concerning using "read deadline" vs "context", all of the underlying APIs used by the GridFS code accept a Context (they're all just Go driver CRUD calls), so using a Context seems to be the best choice.
  • I think accepting a Context in OpenDownloadStream that is not used for actually downloading the file is confusing and would surprise most users. I recommend using the Context passed to OpenDownloadStream (and OpenDownloadStreamByName) as the Context on a DownloadStream.
  • If we want to allow users to override the Context used when actually downloading the file, we can add a SetContext method to DownloadStream. However, it's not immediately clear if that is necessary, so I'd recommend omitting it for now.

Collaborator Author

The suggestion of WithContext comes directly from the http package's Request.WithContext API, which uses the context set by that method in its I/O operations. I am also open to retaining the existing API which, as you note, is the pattern used in net.Conn. However, I would argue the existing name (SetReadDeadline) is unnecessarily asymmetric: DownloadStream has no concept of writing, so WithContext or SetDeadline is more concise.


I think accepting a Context in OpenDownloadStream that is not used for actually downloading the file is confusing and would surprise most users. I recommend using the Context passed to OpenDownloadStream (and OpenDownloadStreamByName) as the Context on a DownloadStream.

The context timeout starts ticking when the DownloadStream is constructed, so the user will have to be judicious about how they set the timeout and when they plan to read from the stream. If we go this way, I agree with omitting a context setter on the streaming types until there is a clear use case for it. However, in my opinion this makes the DownloadStream API more difficult to use. What are your thoughts?

Collaborator Author

Another issue with going the constructor route is that if we ever needed to add a read-specific context timeout, then undoing the constructor propagation of context would be a breaking change.

For example, suppose a user sets a context on the constructor to time out the find operation (i.e. the construction) and has no intention of timing out the I/O read. The Go Driver team would be tempted to add a WithContext method to DownloadStream to accommodate this case. However, we couldn't simply revert the context associated with the constructor, because that could break another user's logic that expects a timeout to be shared between construction and read. This could be an awkward situation.

I think simply having something like SetReadDeadline is the correct approach to the Download/Upload Stream objects.

Collaborator

The GridFS API section of the CSOT spec actually describes the required behavior of the timeout param, which is basically "use the constructor context":

... all methods in the GridFS Bucket API MUST support the timeoutMS option. For methods that create streams (e.g. open_upload_stream), the option MUST cap the lifetime of the entire stream. ... Methods that interact with a user-provided stream (e.g. upload_from_stream) MUST use timeoutMS as the timeout for the entire upload/download operation.

Concerning the comment

we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read

If we use the Context passed into the constructor, adding a new WithContext method to a DownloadStream doesn't seem like it would create a breaking change in API behavior.

For example, consider downloading a file with a 30 second timeout:

ctx, cancel := context.WithTimeout(context.Background(), 30 * time.Second)
defer cancel()
ds, _ := bucket.OpenDownloadStream(ctx, ...)
b, _ := io.ReadAll(ds)

Now consider opening a DownloadStream with a 30 second timeout, but reading the file document(s) with no timeout:

ctx, cancel := context.WithTimeout(context.Background(), 30 * time.Second)
defer cancel()
ds, _ := bucket.OpenDownloadStream(ctx, ...)
ds.WithContext(context.Background())
b, _ := io.ReadAll(ds)

Is there an example where those timeouts would conflict?

It's still not clear that there is a use case for having different timeout behavior for different underlying operations during a GridFS upload/download, so I still recommend omitting it.

Collaborator Author

@matthewdale For this:

we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read

I agree that there wouldn't be a conflict if (1) we didn't revert the context on the constructor, and (2) (probably) the WithContext method returned a shallow copy of the DownloadStream. Consider this resolved.

I will update the code to include the requested changes, since they conform to the specification. But I also want to make it clear that my concern with this approach is that the context lifecycle begins at construction.

This issue arises because we store the context on the objects, which is an antipattern; the linked documentation covers this exact case:

The caller’s lifetime is intermingled with a shared context, and the context is scoped to the lifetime where the Worker is created.

The docs also note that the only reason to do this is backward compatibility, which is not a concern for us in 2.x.

Unfortunately, if we want to time out the read operation, we have to do this. However, we can do it more modularly than at instantiation: WithContext gives us more control over precisely what a timeout affects.


Notes:

The http package's NewRequestWithContext documentation also notes this:

For an outgoing client request, the context controls the entire lifetime of a request and its response: obtaining a connection, sending the request, and reading the response headers and body.

Collaborator

Agreed that we're basically using an antipattern, as described almost exactly in the "Storing context in structs leads to confusion" section of that "Contexts and structs" article. However, it's not significantly clearer if we provide a context via WithContext on a DownloadStream (I'd actually argue it's more confusing). It seems like we're designing an API to work around two problems:

  1. io.Reader and io.Writer don't include contexts, so they basically have to be side-loaded for types that implement those interfaces. See an interesting proposal for adding contexts to those interfaces here.
  2. The methods like OpenDownloadStream aren't an atomic "download a file" operation, but are currently the only way to download a file in the API described in the GridFS spec. That creates a conflict between Go Context best practices and the GridFS spec.

There's not much we can do about (1). However, we could separate the upload/download API into different methods, one supporting timeout and one not. For example, keep the existing methods with timeouts that only affect the initial operations but not the returned DownloadStream (i.e. there is no way to time out Read calls):

func (b *Bucket) OpenDownloadStream(ctx context.Context, fileID any) (*DownloadStream, error)

Then add additional methods for upload/download that apply the context to the entire operation.

func (b *Bucket) Download(ctx context.Context, dst io.WriterAt, fileID any) error

That deviates from the spec, but conforms more closely to Go Context best practices.

P.S. The Download method signature is inspired by the AWS SDK's S3 Download method. The io.WriterAt allows downloading multiple file chunks simultaneously. That's not something the GridFS spec covers, but that API would allow for optimization in the future.

Collaborator

From an offline conversation: The least surprising behavior is to make the Context apply to all I/O operations for a DownloadStream or UploadStream, which also matches the GridFS spec. The Download method I proposed above was intended to provide context and is really out of scope of this ticket, so can be ignored. Consider this comment resolved.

Review comment on mongo/gridfs/bucket.go (outdated, resolved).

@blink1073 (Member)

@prestonvasquez can you please merge from master so we can see the API change report?

mongodb-drivers-pr-bot (bot) commented Nov 1, 2023

API Change Report

./mongo/gridfs

incompatible changes

(*Bucket).Delete: changed from func(interface{}) error to func(context.Context, interface{}) error
(*Bucket).DeleteContext: removed
(*Bucket).DownloadToStream: changed from func(interface{}, io.Writer) (int64, error) to func(context.Context, interface{}, io.Writer) (int64, error)
(*Bucket).DownloadToStreamByName: changed from func(string, io.Writer, ...*./mongo/options.NameOptions) (int64, error) to func(context.Context, string, io.Writer, ...*./mongo/options.NameOptions) (int64, error)
(*Bucket).Drop: changed from func() error to func(context.Context) error
(*Bucket).DropContext: removed
(*Bucket).Find: changed from func(interface{}, ...*./mongo/options.GridFSFindOptions) (*./mongo.Cursor, error) to func(context.Context, interface{}, ...*./mongo/options.GridFSFindOptions) (*./mongo.Cursor, error)
(*Bucket).FindContext: removed
(*Bucket).OpenDownloadStream: changed from func(interface{}) (*DownloadStream, error) to func(context.Context, interface{}) (*DownloadStream, error)
(*Bucket).OpenDownloadStreamByName: changed from func(string, ...*./mongo/options.NameOptions) (*DownloadStream, error) to func(context.Context, string, ...*./mongo/options.NameOptions) (*DownloadStream, error)
(*Bucket).OpenUploadStream: changed from func(string, ...*./mongo/options.UploadOptions) (*UploadStream, error) to func(context.Context, string, ...*./mongo/options.UploadOptions) (*UploadStream, error)
(*Bucket).OpenUploadStreamWithID: changed from func(interface{}, string, ...*./mongo/options.UploadOptions) (*UploadStream, error) to func(context.Context, interface{}, string, ...*./mongo/options.UploadOptions) (*UploadStream, error)
(*Bucket).Rename: changed from func(interface{}, string) error to func(context.Context, interface{}, string) error
(*Bucket).RenameContext: removed
(*Bucket).SetReadDeadline: removed
(*Bucket).SetWriteDeadline: removed
(*Bucket).UploadFromStream: changed from func(string, io.Reader, ...*./mongo/options.UploadOptions) (./bson/primitive.ObjectID, error) to func(context.Context, string, io.Reader, ...*./mongo/options.UploadOptions) (./bson/primitive.ObjectID, error)
(*Bucket).UploadFromStreamWithID: changed from func(interface{}, string, io.Reader, ...*./mongo/options.UploadOptions) error to func(context.Context, interface{}, string, io.Reader, ...*./mongo/options.UploadOptions) error
(*DownloadStream).SetReadDeadline: removed
(*UploadStream).SetWriteDeadline: removed

	opts ...*options.FindOneOptions,
) (*DownloadStream, error) {
	result := b.filesColl.FindOne(ctx, filter, opts...)
	if err := result.Err(); err != nil {

Collaborator

Optional: Explicitly checking the error here is unnecessary because the same error will be returned when Decode is called below. Consider handling both cases with the error returned by Decode.

Review comment on mongo/gridfs/bucket.go (outdated, resolved).

@blink1073 (Member) left a comment

LGTM!

@matthewdale (Collaborator) left a comment

Looks good 👍

@prestonvasquez merged commit f93a990 into mongodb:master on Nov 14, 2023 (31 of 37 checks passed).
@prestonvasquez deleted the GODRIVER-2520 branch on November 14, 2023 21:57.