Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add atomic job store transactions #3996

Merged
merged 9 commits into from
May 17, 2024
Merged

add atomic job store transactions #3996

merged 9 commits into from
May 17, 2024

Conversation

wdbaruni
Copy link
Collaborator

This PR allows executing multiple job store write operations in a single atomic transaction. This done by optionally providing a transactional context to the operation, which will prevent automatic commit after each operation and delegates that control to the caller.

Usage Examples

A transaction per operation

This example is for the same behaviour we have today where each operation is committed automatically, which will continue to work

	ctx := context.Background()
	var err error

	// The job will be updated immediately if no error was returned
	if err = s.store.UpdateJobState(ctx, jobstore.UpdateJobStateRequest{}); err != nil {
		return err
	}

	// The execution will be updated immediately if no error was returned
	if err = s.store.UpdateExecution(ctx, jobstore.UpdateExecutionRequest{}); err != nil {
		return err
	}

A single transaction for both operations

This is an example of the new behaviour where we can group multiple operations in a single atomic operation

	var err error
	txCtx, err := s.store.BeginTx(context.Background())
	if err != nil {
		return err
	}

	// make sure we rollback the transaction if there is an error
	defer func() {
		if err != nil {
			_ = txCtx.Rollback()
		}
	}()

	// updating the job won't be committed yet
	// Note we are passing the transaction context txCtx instead of a regular context
	if err = s.store.UpdateJobState(txCtx, jobstore.UpdateJobStateRequest{}); err != nil {
		return err
	}

	// updating the execution won't be committed yet
	if err = s.store.UpdateExecution(txCtx, jobstore.UpdateExecutionRequest{}); err != nil {
		return err
	}

	// commit both operations
	return txCtx.Commit()

Closes 3993

Copy link

coderabbitai bot commented May 13, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@wdbaruni wdbaruni requested a review from frrist May 14, 2024 03:12
Copy link
Member

@frrist frrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not (ab)use context as a way to pass optional parameters to function:

Comment on lines +37 to +46
txContext, err := s.store.BeginTx(ctx)
if err != nil {
return fmt.Errorf("failed to begin transaction: %w", err)
}

defer func() {
if err != nil {
_ = txContext.Rollback()
}
}()
Copy link
Member

@frrist frrist May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple ideas for replacing this:

  • Return a closure here instead that handles the creation and update of resources within its scope, rolling back if any one operation fails and committing if they all succeed. Similar to what is done internally by the JobStore already.
  • Create our own transaction type with the required methods (CreateExecution, UpdateExecution and UpdateJobState) present on the implementation. Something like the suggestion here would be appropriate: https://github.com/boltdb/bolt?tab=readme-ov-file#managing-transactions-manually

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear your concerns about the general recommendation of not passing values in context, but we need to think about the bigger picture, understand why it is an anti-pattern, whether it applies here, and the tradeoffs of insisting not using transactional contexts

First, the transactional context is telling operations that they are within a scope of a transaction and not to create their own. They are well scoped, short lived and things won't panic if we call operations with a regular context, as it means they are within their own transaction scope. They are not a way to pass optional parameters as you pointed, but to tell the scope we are in.

Second, it is the least intrusive change to our code base to introduce transactions only when needed. All existing reads and writes are still functional. This is a priority to keep in mind when we do changes given our team size, and higher priority items we need to tackle. We need to think how can we find simple solutions to complex problems

Third, the alternatives are not as simple as you might think and will require major refactoring:

  • A closure means all our jobstore operations will require a transaction as parameter. This includes read operations as well to enable reads inside a write transaction by reusing the same transaction (1ff510f) . This means that even the most common GetJob API will require a transaction that must be manually started and rolled-back, or use a closure, which won't make much of difference for single operations.
  • A transaction type means all our operations, both writes and reads, will need to move from jobstore interface to this new transaction type. Again we would have to create this type first, and manually commit/rollback even for simple GetJob operations

Fourth, not only the suggestions require major refactoring today, I am not sure if they are on the right path as they will impact our developer experience

Fifth, jobstore is an internal type and not something that external users will need to interact with. We don't need to be as worried as how people will abuse it.

Lastly, if in the future we see the need to set explicit transactions in all our operations, then we can always do that change when needed. We are not taking a one way door decision now. I am not doing major refactors today, and will be smarter to delay major refactors when they are actually needed and justifies the impact on developer experience

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a functional options pattern like the below using the option jobstore.WithBoltTransaction

func (s *StateUpdater) Process(ctx context.Context, plan *models.Plan) (err error) {
	// ...

	txContext, err := s.store.BeginTx()
	if err != nil {
		return fmt.Errorf("failed to begin transaction: %w", err)
	}

	defer func() {
		if err != nil {
			err = txContext.Rollback()
		}
	}()

	// Create new executions
	for _, exec := range plan.NewExecutions {
		err = s.store.CreateExecution(ctx, *exec, plan.Event, jobstore.WithBoltTransaction(txContext))
		if err != nil {
			return err
		}
	}

	// Update existing executions
	for _, u := range plan.UpdatedExecutions {
		err = s.store.UpdateExecution(ctx, jobstore.UpdateExecutionRequest{
			ExecutionID: u.Execution.ID,
			NewValues: models.Execution{
				DesiredState: models.State[models.ExecutionDesiredStateType]{
					StateType: u.DesiredState,
					Message:   u.Event.Message,
				},
			},
			Condition: jobstore.UpdateExecutionCondition{
				ExpectedRevision: u.Execution.Revision,
			},
			Event: u.Event,
		}, jobstore.WithBoltTransaction(txContext))
		if err != nil {
			return err
		}
	}

	// Update job state if necessary
	if !plan.DesiredJobState.IsUndefined() {
		err = s.store.UpdateJobState(ctx, jobstore.UpdateJobStateRequest{
			JobID:    plan.Job.ID,
			NewState: plan.DesiredJobState,
			Event:    plan.Event,
			Condition: jobstore.UpdateJobCondition{
				ExpectedRevision: plan.Job.Revision,
			},
		}, jobstore.WithBoltTransaction(txContext))
		if err != nil {
			return err
		}
	}
	return txContext.Commit()
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still insisting that utilizing context for this purpose is acceptable and very understandable for all the reasons I've mentioned previously.

Using functional options for every database operation is an overkill for this purpose, confusing, and I don't see how they are improving things more than a scoped context value:

  1. Multiple database operations already accept sort of options today, such as JobQuery to set limits and offsets, GetExecutionsOptions to include job, JobHistoryFilterOptions to set start time and filter execution or job level events, and UpdateJobCondition to set conditions. We are using struct types to define those parameters as they map easily to our HTTP APIs, clearer what each API expects, and we don't need a great level of flexibility for those calls. It will be very confusing to provide additional types of options to configure the calls and optionally allowing passing transactions
  2. Functional patterns are best when wanting a great level of flexibility, but not as readable or obvious what options are available. We don't need this level of flexibility here and I don't foresee the need for more options in the future other than just the transaction. Again we are not taking one way door decision here as the jobstore is an internal library, and we can add functional options in the future if there is a need for more options

Lets not get stuck at anti-patterns without thinking about the bigger picture or the impact, which I've shared in my previous answer. This PR is blocking my progress for job queues. If you don't have any other comment, I would like us to move forward and we can always reevaluate when we need more flexible options or realize most of our calls span a single transaction

pkg/jobstore/boltdb/context.go Show resolved Hide resolved
Comment on lines 33 to 40
go func() {
<-innerCtx.Done()
// If the transaction has not been committed when the context is done, rollback the transaction.
// In boltdb transaction, db is nil if the transaction is already committed or rolled back.
if txCtx.tx.DB() != nil {
_ = txCtx.tx.Rollback()
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few concerns:

  1. Don't ignore the error if the rollback fails, at least log it.
  2. When txContext.Commit()/Rollback() is called we commit/rollback the transaction, then cancel the inner context, if the DB is not nil we then rollback the transaction. Since boltdb's transaction implementation closes the connection on commit or rollback this is fine - But we've tightly coupled the logic here to the internals of boltdb, which is unfortunate and unavoidable given this approach.
  3. When the parent context is canceled we rollback the transaction, but depending on how this is used we may call Commit or Rollback again later which will result in ErrTxClosed errors. Again unfortunate but unavoidable.

Lets address 1 with a code change, and make 2-3 explicitly clear in comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improved things here 2aa9913, but I've also realized boltdb transactions are not thread safe and fixed it here 99eec41

@@ -0,0 +1,97 @@
//go:build unit || !integration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth testing the case where a context contains more than a single transaction? Or will that never happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boltdb doesn't allow concurrent active transactions. If you attempt to begin a transaction while another one is still active, whether in the same context or note, it will block. I've added additional tests in b0c7060 for botldb jobstore to verify we are handling this gracefully

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe boltdb supports concurrent read-only transactions: https://github.com/boltdb/bolt?tab=readme-ov-file#transactions

// always rollback the transaction if we did create it and we're returning an error
defer func() {
if !managed && err != nil {
_ = tx.Rollback()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return this error to the caller?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to #3996 (comment)

pkg/jobstore/boltdb/store.go Show resolved Hide resolved
pkg/jobstore/boltdb/store.go Outdated Show resolved Hide resolved
pkg/jobstore/context.go Outdated Show resolved Hide resolved
Copy link
Member

@frrist frrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wdbaruni wdbaruni merged commit 72b378f into main May 17, 2024
12 checks passed
@wdbaruni wdbaruni deleted the queue-atomic-ops branch May 17, 2024 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Atomic job store transactions
2 participants