-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add atomic job store transactions #3996
Conversation
Important Auto Review SkippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not (ab)use context as a way to pass optional parameters to function:
- https://dave.cheney.net/2017/01/26/context-is-for-cancelation
- https://pkg.go.dev/context#WithValue
-
Use context Values only for request-scoped data that transits processes and APIs, not for passing optional parameters to functions.
-
txContext, err := s.store.BeginTx(ctx) | ||
if err != nil { | ||
return fmt.Errorf("failed to begin transaction: %w", err) | ||
} | ||
|
||
defer func() { | ||
if err != nil { | ||
_ = txContext.Rollback() | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple ideas for replacing this:
- Return a closure here instead that handles the creation and update of resources within its scope, rolling back if any one operation fails and committing if they all succeed. Similar to what is done internally by the JobStore already.
- Create our own transaction type with the required methods (
CreateExecution
,UpdateExecution
andUpdateJobState
) present on the implementation. Something like the suggestion here would be appropriate: https://github.com/boltdb/bolt?tab=readme-ov-file#managing-transactions-manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear your concerns about the general recommendation of not passing values in context, but we need to think about the bigger picture, understand why it is an anti-pattern, whether it applies here, and the tradeoffs of insisting not using transactional contexts
First, the transactional context is telling operations that they are within a scope of a transaction and not to create their own. They are well scoped, short lived and things won't panic if we call operations with a regular context, as it means they are within their own transaction scope. They are not a way to pass optional parameters as you pointed, but to tell the scope we are in.
Second, it is the least intrusive change to our code base to introduce transactions only when needed. All existing reads and writes are still functional. This is a priority to keep in mind when we do changes given our team size, and higher priority items we need to tackle. We need to think how can we find simple solutions to complex problems
Third, the alternatives are not as simple as you might think and will require major refactoring:
- A closure means all our jobstore operations will require a transaction as parameter. This includes read operations as well to enable reads inside a write transaction by reusing the same transaction (1ff510f) . This means that even the most common
GetJob
API will require a transaction that must be manually started and rolled-back, or use a closure, which won't make much of difference for single operations. - A transaction type means all our operations, both writes and reads, will need to move from jobstore interface to this new transaction type. Again we would have to create this type first, and manually commit/rollback even for simple
GetJob
operations
Fourth, not only the suggestions require major refactoring today, I am not sure if they are on the right path as they will impact our developer experience
Fifth, jobstore is an internal type and not something that external users will need to interact with. We don't need to be as worried as how people will abuse it.
Lastly, if in the future we see the need to set explicit transactions in all our operations, then we can always do that change when needed. We are not taking a one way door decision now. I am not doing major refactors today, and will be smarter to delay major refactors when they are actually needed and justifies the impact on developer experience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a functional options pattern like the below using the option jobstore.WithBoltTransaction
func (s *StateUpdater) Process(ctx context.Context, plan *models.Plan) (err error) {
// ...
txContext, err := s.store.BeginTx()
if err != nil {
return fmt.Errorf("failed to begin transaction: %w", err)
}
defer func() {
if err != nil {
err = txContext.Rollback()
}
}()
// Create new executions
for _, exec := range plan.NewExecutions {
err = s.store.CreateExecution(ctx, *exec, plan.Event, jobstore.WithBoltTransaction(txContext))
if err != nil {
return err
}
}
// Update existing executions
for _, u := range plan.UpdatedExecutions {
err = s.store.UpdateExecution(ctx, jobstore.UpdateExecutionRequest{
ExecutionID: u.Execution.ID,
NewValues: models.Execution{
DesiredState: models.State[models.ExecutionDesiredStateType]{
StateType: u.DesiredState,
Message: u.Event.Message,
},
},
Condition: jobstore.UpdateExecutionCondition{
ExpectedRevision: u.Execution.Revision,
},
Event: u.Event,
}, jobstore.WithBoltTransaction(txContext))
if err != nil {
return err
}
}
// Update job state if necessary
if !plan.DesiredJobState.IsUndefined() {
err = s.store.UpdateJobState(ctx, jobstore.UpdateJobStateRequest{
JobID: plan.Job.ID,
NewState: plan.DesiredJobState,
Event: plan.Event,
Condition: jobstore.UpdateJobCondition{
ExpectedRevision: plan.Job.Revision,
},
}, jobstore.WithBoltTransaction(txContext))
if err != nil {
return err
}
}
return txContext.Commit()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still insisting that utilizing context for this purpose is acceptable and very understandable for all the reasons I've mentioned previously.
Using functional options for every database operation is an overkill for this purpose, confusing, and I don't see how they are improving things more than a scoped context value:
- Multiple database operations already accept sort of options today, such as
JobQuery
to set limits and offsets,GetExecutionsOptions
to include job,JobHistoryFilterOptions
to set start time and filter execution or job level events, andUpdateJobCondition
to set conditions. We are using struct types to define those parameters as they map easily to our HTTP APIs, clearer what each API expects, and we don't need a great level of flexibility for those calls. It will be very confusing to provide additional types of options to configure the calls and optionally allowing passing transactions - Functional patterns are best when wanting a great level of flexibility, but not as readable or obvious what options are available. We don't need this level of flexibility here and I don't foresee the need for more options in the future other than just the transaction. Again we are not taking one way door decision here as the jobstore is an internal library, and we can add functional options in the future if there is a need for more options
Lets not get stuck at anti-patterns without thinking about the bigger picture or the impact, which I've shared in my previous answer. This PR is blocking my progress for job queues. If you don't have any other comment, I would like us to move forward and we can always reevaluate when we need more flexible options or realize most of our calls span a single transaction
pkg/jobstore/boltdb/context.go
Outdated
go func() { | ||
<-innerCtx.Done() | ||
// If the transaction has not been committed when the context is done, rollback the transaction. | ||
// In boltdb transaction, db is nil if the transaction is already committed or rolled back. | ||
if txCtx.tx.DB() != nil { | ||
_ = txCtx.tx.Rollback() | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few concerns:
- Don't ignore the error if the rollback fails, at least log it.
- When
txContext.Commit()/Rollback()
is called we commit/rollback the transaction, then cancel the inner context, if the DB is not nil we then rollback the transaction. Since boltdb's transaction implementation closes the connection on commit or rollback this is fine - But we've tightly coupled the logic here to the internals of boltdb, which is unfortunate and unavoidable given this approach. - When the parent context is canceled we rollback the transaction, but depending on how this is used we may call
Commit
orRollback
again later which will result inErrTxClosed
errors. Again unfortunate but unavoidable.
Lets address 1 with a code change, and make 2-3 explicitly clear in comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,97 @@ | |||
//go:build unit || !integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth testing the case where a context contains more than a single transaction? Or will that never happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boltdb doesn't allow concurrent active transactions. If you attempt to begin a transaction while another one is still active, whether in the same context or note, it will block. I've added additional tests in b0c7060 for botldb jobstore to verify we are handling this gracefully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe boltdb supports concurrent read-only transactions: https://github.com/boltdb/bolt?tab=readme-ov-file#transactions
// always rollback the transaction if we did create it and we're returning an error | ||
defer func() { | ||
if !managed && err != nil { | ||
_ = tx.Rollback() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we return this error to the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to #3996 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will let you decide if https://github.com/bacalhau-project/bacalhau/pull/3996/files#r1605288596 is worth addressing.
This PR allows executing multiple job store write operations in a single atomic transaction. This done by optionally providing a transactional context to the operation, which will prevent automatic commit after each operation and delegates that control to the caller.
Usage Examples
A transaction per operation
This example is for the same behaviour we have today where each operation is committed automatically, which will continue to work
A single transaction for both operations
This is an example of the new behaviour where we can group multiple operations in a single atomic operation
Closes 3993