GODRIVER-2348 Make CSOT feature-gated behavior the default #1515
base: master
Conversation
…of 90th percentile
…g connection establishment
…iver into GODRIVER-2762
Co-authored-by: Matt Dale <9760375+matthewdale@users.noreply.github.com>
API Change Report

./mongo
- incompatible changes: (*Cursor).SetMaxTime: removed
- compatible changes: (*Cursor).SetMaxAwaitTime: added

./mongo/options
- incompatible changes: (*AggregateOptions).SetMaxTime: removed

./mongo/writeconcern
- incompatible changes: WriteConcern.WTimeout: removed

./x/mongo/driver
- incompatible changes: (*BatchCursor).SetMaxTime: removed
- compatible changes: (*BatchCursor).SetMaxAwaitTime: added

./x/mongo/driver/connstring
- incompatible changes: ConnString.WTimeout: removed

./x/mongo/driver/operation
- incompatible changes: (*Aggregate).MaxTime: removed
- compatible changes: (*Hello).OmitMaxTimeMS: added

./x/mongo/driver/session
- incompatible changes: Client.CurrentMct: removed
- compatible changes: Client.CurrentWTimeout: added

./x/mongo/driver/topology
- incompatible changes: ConnectServer: changed from func(./mongo/address.Address, updateTopologyCallback, ./bson.ObjectID, ...ServerOption) (*Server, error) to func(./mongo/address.Address, updateTopologyCallback, ./bson.ObjectID, time.Duration, ...ServerOption) (*Server, error)
- compatible changes: (*Topology).GetServerSelectionTimeout: added
…s during connection establishment
x/mongo/driver/operation.go
Outdated
wtimeout := getWriteConcernTimeout(ctx, op)

typ, wcBSON, err := marshalBSONWriteConcern(*wc, wtimeout)
Does the CSOT spec say drivers should set `wTimeout` on a write concern based on `timeoutMS`? I thought it was only `maxTime` that drivers needed to set.
I interpret "[wTimeoutMS] option[s] MUST be deprecated in favor of timeoutMS" to mean that if a user sets a client-level timeout, then that value should be applied to `wTimeoutMS`. Otherwise, this condition holds when committing a transaction:
if the modified write concern does not include a wtimeout value, drivers MUST also apply wtimeout: 10000 to the write concern in order to avoid waiting forever (or until a socket timeout) if the majority write concern cannot be satisfied.
I interpreted the section you referenced as describing how `timeoutMS` replaces those other driver-side configuration values. The driver configuration values and database command fields are often identical, so there is some vagueness in the CSOT spec. In any case, setting `maxTimeMS` should have the same effect as `wTimeout` when the operation is blocked on satisfying the write concern, so setting `wTimeout` in commands seems unnecessary.

I think that section of the Transactions spec is probably vestigial, since the scenario it describes is fully covered by `timeoutMS` (i.e. if you set a client-side timeout, an operation will never wait forever).
@ShaneHarvey Do you have any thoughts on whether or not CSOT should be applied to `wTimeout`? This sort of mirrors the 1.x issue where the context deadline does not get applied to `maxTimeMS`.
I think that section of the Transactions spec is probably vestigial since the scenario it describes is fully covered by timeoutMS (i.e. if you set a client-side timeout, an operation will never wait forever).
@matthewdale I think I follow; are you suggesting that we retain this logic, but only if there is no CSOT?
func getWriteConcernTimeout(ctx context.Context, op Operation) time.Duration {
if _, ok := ctx.Deadline(); ok {
return 0
}
if op.Timeout != nil {
return 0
}
if op.Client == nil {
return 0
}
return op.Client.CurrentWTimeout
}
Do we need a new spec test that asserts wTimeout is not set and maxTimeMS is?
@ShaneHarvey That would be helpful. Should I open a ticket for that?
@matthewdale Concerning this:
I think that section of the Transactions spec is probably vestigial since the scenario it describes is fully covered by timeoutMS
What happens in the case where a user does not set a CSOT? We would be spec non-compliant and the transaction could wait indefinitely. Additionally, here is a list of legacy tests that we would need to skip, all of which check the default value of 10s for wtimeout during a transaction:
TestUnifiedSpecs/transactions/legacy/error-labels.json/add_RetryableWriteError_and_UnknownTransactionCommitResult_labels_to_connection_errors
TestUnifiedSpecs/transactions/legacy/error-labels.json/add_RetryableWriteError_and_UnknownTransactionCommitResult_labels_to_retryable_commit_errors
TestUnifiedSpecs/transactions/legacy/error-labels.json/add_RetryableWriteError_and_UnknownTransactionCommitResult_labels_to_writeConcernError_ShutdownInProgress
TestUnifiedSpecs/transactions/legacy/error-labels.json/add_UnknownTransactionCommitResult_label_to_writeConcernError_WriteConcernFailed
TestUnifiedSpecs/transactions/legacy/error-labels.json/add_UnknownTransactionCommitResult_label_to_writeConcernError_WriteConcernFailed_with_wtimeout
TestUnifiedSpecs/transactions/legacy/retryable-commit-errorLabels.json/commitTransaction_retries_once_with_RetryableWriteError_from_server
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_fails_after_two_errors
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_connection_error
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_NotWritablePrimary
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_NotPrimaryOrSecondary
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_NotPrimaryNoSecondaryOk
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_InterruptedDueToReplStateChange
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_InterruptedAtShutdown
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_PrimarySteppedDown
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_ShutdownInProgress
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_HostNotFound
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_HostUnreachable
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_SocketException
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_NetworkTimeout
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_WriteConcernError_InterruptedAtShutdown
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_WriteConcernError_InterruptedDueToReplStateChange
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_WriteConcernError_PrimarySteppedDown
TestUnifiedSpecs/transactions/legacy/retryable-commit.json/commitTransaction_succeeds_after_WriteConcernError_ShutdownInProgress
TestUnifiedSpecs/convenient-transactions/commit-retry.json/commitTransaction_succeeds_after_multiple_connection_errors
TestUnifiedSpecs/convenient-transactions/commit-retry.json/commit_is_retried_after_commitTransaction_UnknownTransactionCommitResult_(NotWritablePrimary)
TestUnifiedSpecs/convenient-transactions/commit-writeconcernerror.json/commitTransaction_is_retried_after_WriteConcernFailed_timeout_error
TestUnifiedSpecs/convenient-transactions/commit-writeconcernerror.json/commitTransaction_is_retried_after_WriteConcernFailed_non-timeout_error
@matthewdale This question is still unresolved.
Reading the referenced commitTransaction section of the Transactions spec and the referenced spec tests, we're talking specifically about the case when the driver retries `commitTransaction` and overrides some fields on the write concern.
In that case, there are two questions to answer:
1. If `timeoutMS` is not set at any level, should drivers include the default `wTimeout` value of 10,000 on retried `commitTransaction` commands?
2. If `timeoutMS` is set (at any level), should drivers include a `wTimeout` value on retried `commitTransaction` commands? If so, should the value be the same as `maxTimeMS`?
These are my proposed answers:
1. That seems like a good idea. It maintains the existing default behavior and doesn't seem to conflict with CSOT because there is no CSOT timeout specified.
2. In that case, `wTimeout` would be redundant since the command should include `maxTimeMS`. It seems safe to omit `wTimeout`.
TL;DR: I think we should include `wTimeout: 10000` on retried `commitTransaction` commands if `timeoutMS` is not set, and omit `wTimeout` if `timeoutMS` is set. That behavior should also satisfy all of the referenced spec tests.
Just to clarify, this was the proposed behavior before commit 00fb9da, which was made to illustrate the effects of removing this logic. Reverting that commit resolves this thread, correct?
Co-authored-by: Matt Dale <9760375+matthewdale@users.noreply.github.com>
mongo/change_stream.go
Outdated
// validChangeStreamTimeouts will return "false" if maxAwaitTimeMS is set,
// timeoutMS is set to a non-zero value, and maxAwaitTimeMS is greater than or
// equal to timeoutMS. Otherwise, the timeouts are valid.
func validChangeStreamTimeouts(ctx context.Context, maxAwaitTime, timeout *time.Duration) bool {
This validation does not make sense in the Go Driver. I believe it's based on this requirement from the CSOT spec:

Drivers MUST also apply the original timeoutMS value to each next call on the resulting cursor

However, the Go Driver requires passing a `Context` to each call to `Next` or `TryNext`, so the timeout passed to the initial `Watch` has no bearing on those future `getMore` call timeouts.

We should remove this validation. However, we may want to apply an analogous validation when calling `Next` or `TryNext` if the `Context` timeouts for those are less than `maxAwaitTimeMS`.
This validation is based on this requirement from the CSOT spec:

Drivers MUST error if maxAwaitTimeMS is set, timeoutMS is set to a non-zero value, and maxAwaitTimeMS is greater than or equal to timeoutMS.
Correct, that's the timeout validation that's implemented here. However, since the Go Driver doesn't do this:

Drivers MUST also apply the original timeoutMS value to each next call on the resulting cursor

we also shouldn't do this:

Drivers MUST error if maxAwaitTimeMS is set, timeoutMS is set to a non-zero value, and maxAwaitTimeMS is greater than or equal to timeoutMS.

because that validation only makes sense in drivers that use the timeout passed to `Find` as the timeout for `Next`. In the Go Driver, users pass separate timeouts to `Watch` and `Next`.

For example, the following code would cause a validation error even though there is no client-side timeout on `Next` (i.e. no conflict with `maxAwaitTimeMS`):
opts := options.ChangeStream().SetMaxAwaitTime(10 * time.Second)
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()
cs, err := coll.Watch(ctx, mongo.Pipeline{}, opts)
if err != nil {
	panic(err) // panics with "maxAwaitTimeMS cannot be greater than or equal to timeoutMS"
}
for cs.Next(context.Background()) {
	// ...
}
Additionally, the following code would not cause a validation error, even though the `Next` timeout is much lower than `maxAwaitTimeMS`:

opts := options.ChangeStream().SetMaxAwaitTime(10 * time.Second)
cs, err := coll.Watch(context.Background(), mongo.Pipeline{}, opts)
if err != nil {
	panic(err)
}
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()
for cs.Next(ctx) {
	// Next possibly times out before the server responds.
}
Drivers MUST also apply the original timeoutMS value to each next call on the resulting cursor
This is part of the proposal for this PR, see L712.
mongo/change_stream.go
Outdated
// Cancel the timeout-derived context at the end of executeOperation to avoid a context leak.
defer cancelFunc()
}
ctx, cancel = csot.WithTimeout(ctx, cs.client.timeout)
Should this be set before server selection and connection checkout? If not, it seems like the client-level `timeoutMS` will not be honored here during server selection and connection checkout.
The server selection and connection checkout use `connCtx`, which is a context whose timeout is the minimum of serverSelectionTimeoutMS and the context deadline. The logic for these timeouts is defined here: https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/client-side-operations-timeout.md#server-selection

In this case the timeout comes from the client option `ServerSelectionTimeout`.
The spec describes the computed timeout for server selection as `min(serverSelectionTimeoutMS, remaining timeoutMS)`. In the case of the Go Driver, "remaining timeoutMS" is either the `Client`-level timeout or the operation-level timeout (i.e. the Context timeout), whichever is more specific.

For example, if we don't apply the `Client`-level timeout, the following code could wait up to 10 seconds for server selection when starting a change stream, despite the 1-second `Client`-level timeout:
opts := options.Client().ApplyURI("...").SetTimeout(1 * time.Second)
client, _ := mongo.Connect(opts)
coll := client.Database("...").Collection("...")
coll.Watch(context.Background(), mongo.Pipeline{})
Edit:
To clarify, I think the timeout logic should be like:
func Watch(ctx context.Context, ...) (...) {
	// Apply the Client-level timeout. If the operation Context has a
	// deadline already, the Client-level timeout will be ignored.
	//
	// ctx is now "remaining timeoutMS"
	ctx, _ := csot.WithTimeout(ctx, cs.client.timeout)

	for {
		// ssCtx is now "min(serverSelectionTimeoutMS, remaining timeoutMS)"
		ssCtx, _ := csot.WithServerSelectionTimeout(
			ctx,
			cs.client.deployment.GetServerSelectionTimeout())
		server, _ := cs.client.deployment.SelectServer(ssCtx, ...)

		// Expect Connection to calculate
		// "min(connectTimeoutMS, remaining computedServerSelectionTimeout)"
		server.Connection(ssCtx)
	}
	// ...
}
@@ -367,13 +391,14 @@ AggregateExecuteLoop:
	// If error is retryable: subtract 1 from retries, redo server selection, checkout
	// a connection, and restart loop.
	retries--
-	server, err = cs.client.deployment.SelectServer(ctx, cs.selector)
+	server, err = cs.client.deployment.SelectServer(connCtx, cs.selector)
Should the server selection timeout apply to each server selection attempt or all server selection attempts while retrying a command?
"All" make sense to me since the calculation for selecting a server is based on min(serverSelectionTimeoutMS, remaining timeoutMS)
. The remaining time being what is left on the context, in this case. So even if this was done on an "each" basis the result would be the same.
Applying the server selection timeout to "all" seems like it would apply the calculation
min(remaining serverSelectionTimeoutMS, remaining timeoutMS)
instead of
min(serverSelectionTimeoutMS, remaining timeoutMS)
See my proposed timeout pseudocode in this comment and let me know if you agree.
case <-selectionState.timeoutChan:
	return nil, ServerSelectionError{Wrapped: ErrServerSelectionTimeout, Desc: current}
Is there any way we can determine whether the operation timeout or server selection timeout was reached without this separate block? I think we should keep this simplification to only use the Context timeout, but it would be nice to still be able to distinguish between the two timeout conditions.
If we wanted to support that behavior, it would likely need to be done through the context.
x/mongo/driver/topology/pool.go
Outdated
var cancel context.CancelFunc
connctx, cancel = context.WithTimeout(ctx, p.connectTimeout)
defer cancel()
This deferred cancel is in a `for` loop and won't run until `createConnections` returns (which is typically the lifetime of a `Client`). Instead we should call `cancel` after `conn.connect(connctx)` returns.
x/mongo/driver/topology/server.go
Outdated
s.cancelHeartbeatCheckOnce.Do(func() {
	s.cancelHeartbeatCheck <- struct{}{}
	s.heartbeatCheckCanceled = true
})
What happens the 2nd time that `cancelCheck` is called if this block is only run once? It seems like it won't cancel a heartbeat check anytime after the first one is canceled.
Great catch. I looked at various patterns to avoid keeping the context on a struct, including the `closedchan` pattern used by `cancelCtx` (which doesn't work because it assumes cancel can only be called once). I propose using a singleton to cancel the current check, reset with each iteration:
x/mongo/driver/topology/server.go
Outdated
// contextWithSocketTimeout will apply the appropriate socket timeout to the
// parent context for server monitoring.
func contextWithSocketTimeout(parent context.Context, srv *Server) (context.Context, context.CancelFunc) {
Optional: "Socket timeout" seems like a misnomer here (and above in `getSocketTimeout`) because we're applying an operation timeout to the heartbeat, not just a network socket timeout. Consider a different name (e.g. `withHeartbeatTimeout`).
Co-authored-by: Matt Dale <9760375+matthewdale@users.noreply.github.com>
GODRIVER-2348
Summary
This PR makes the CSOT feature-gated behavior the default by dropping support for the legacy timeout behavior.
Workflow Changes
Remove Contexts From Structs
Putting contexts on structs is an antipattern that makes unifying and creating context timeouts difficult, because the contexts must be maintained as state (e.g. how do you ensure you never overwrite a context while it is in use by different methods?). This PR proposes replacing this practice with the following pattern:
We still need some ability to cancel contexts passed to blocking operations from outside the goroutine they are initiated in, so some state must be maintained on the initiating object: in this case, a cancellation channel and a sync.Once object.
Unify Timeout Logic
This PR proposes renaming `csot.MakeTimeoutContext` to `csot.WithTimeout`. Additionally, this update removes the need to create blocks like the following in operation execution, watching a change stream, and performing CRUD operations in GridFS:

Rename maxTimeMS for cursor to maxAwaitTimeMS
There is still a need to maintain the `maxTimeMS` logic for tailable awaitData cursors. This PR proposes changing the code symbol from `MaxTime` to `MaxAwaitTime` to match the operation options (e.g. `FindOptions.MaxAwaitTime`) and the CSOT specifications. This logic has been deferred to the operation level via `Operation.DefaultWTimeout`.

CSOT-Specific Changes
Write Concern
This PR proposes removing the `WTimeout` field from the `WriteConcern` object. It is worth noting that there is still a use case for sending wire messages with the `wtimeout` option set, specifically while committing a transaction. The Go Driver organizes this logic by maintaining a copy of the transaction write concern on the client session. In this case, given a CSOT, the remainder of that time is used for the value of `wtimeout` rather than the 10000 ms default.

Server Selection
client-side-operations-timeout/client-side-operations-timeout.md#server-selection
The server selection portion of the specification requires a comparison between `timeoutMS` and `serverSelectionTimeoutMS`. It also requires that the remaining timeout be passed to any follow-up operation establishing or checking out a connection. This workflow proposes building the context in a way that is decoupled from `SelectServer` and `server.Connection`.

Because server selection is made through a `topology.Deployment` interface, there is no obvious way to extrapolate the `serverSelectionTimeoutMS` for use by `WithServerSelectionTimeout`. Practically, the only implementation of `topology.Deployment` that accesses the SSTO data from `options.ClientOptions` is `topology.Topology`, so we could assert that implementation before calculating the timeout. However, for extensibility the current proposal is to use an interface: this would extend the `topology.Deployment` interface with the additional function `GetServerSelectionTimeout()`.

Command Execution
client-side-operations-timeout/client-side-operations-timeout.md#command-execution
The original implementation of CSOT did not apply a context deadline to the `maxTimeMS` value (see here for the original logic). This PR proposes the following workflow to be spec compliant:

Change Streams
client-side-operations-timeout/client-side-operations-timeout.md#change-streams
GridFS API
client-side-operations-timeout/client-side-operations-timeout.md#gridfs-api
The specifications note that all methods in the GridFS bucket API must support the timeoutMS option. It's worth pointing out that all blocking operations within the scope of the database used to construct a GridFS bucket will inherit the client-level timeout value.
Additionally, the specifications require that the timeoutMS option cap the lifetime of the entire stream. The Go Driver currently only ensures this is true if a deadline is set on a context. If relying on the client-level timeout, this value is "refreshed" with every new blocking operation.
Sessions
client-side-operations-timeout/client-side-operations-timeout.md#convenient-transaction-api
We propose implementing the convenient transaction API requirements in GODRIVER-2367, since the context behavior for convenient transactions is still being debated.
Background & Motivation
`v1` required deprecating a set of legacy timeout options, gating their use behind CSOT to avoid making a backwards-breaking change. For `v2`, we can remove these legacy timeouts altogether.