Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional sql: transaction has already been committed or rolled back errors #43

Closed
samsondav opened this issue Jul 15, 2021 · 10 comments · May be fixed by #64
Closed

Occasional sql: transaction has already been committed or rolled back errors #43

samsondav opened this issue Jul 15, 2021 · 10 comments · May be fixed by #64

Comments

@samsondav
Copy link

samsondav commented Jul 15, 2021

We are seeing transaction has already been committed or rolled back errors if we try to use the database "too quickly" after initializing it, with a cancel context.

Code looks like this:

func init() {
	dbURL := os.Getenv("DATABASE_URL")
	if dbURL == "" {
		log.Fatal("You must provide a DATABASE_URL environment variable")
	}

	txdb.Register("txdb", "postgres", dbURL)
}


func DBWithDefaultContext(db *gorm.DB, fc func(db *gorm.DB) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10 * time.Second)
	defer cancel()
	return fc(db.WithContext(ctx))
}

func Test_example(t *testing.T) {

db, err := sql.Open("txdb", uuid.NewV4().String())
require.NoError(t, err)
t.Cleanup(func() { assert.NoError(t, db.Close()) })
gormDB, err := gorm.Open(postgres.New(postgres.Config{
	Conn: sqlDB,
	DSN:  uuid.NewV4().String(),
}), &gorm.Config{})
require.NoError(t, err)

err = DBWithDefaultContext(gormDB, func(db *gorm.DB) error {
	return db.Order("created_at ASC, address ASC").Where("deleted_at IS NULL").Find(&keys).Error
})

err := db.Exec(`SELECT 1`).Error // ERROR: sql: transaction has already been committed or rolled back
}

Note that this error about "transaction already committed or rolled back" has nothing to do with postgres (actually the transaction never gets closed). The postgres logs look like this:

2021-07-15 10:05:52.719 BST [63393] LOG:  statement: ;
2021-07-15 10:05:52.720 BST [63393] LOG:  statement: BEGIN READ WRITE
2021-07-15 10:05:52.720 BST [63393] LOG:  statement: SELECT * FROM "keys" WHERE deleted_at IS NULL AND "keys"."deleted_at" IS NULL ORDER BY created_at ASC, address ASC
// ENDS HERE

It must be a timing issue because it can be resolved by inserting a sleep or gosched here:

db, err := sql.Open("txdb", uuid.NewV4().String())
require.NoError(t, err)
t.Cleanup(func() { assert.NoError(t, db.Close()) })
runtime.Gosched() // FIXES THE TEST
// or
time.Sleep(1 * time.Millisecond) // FIXES THE TEST
// ... rest of the test code

Note that I was running this test with a parallelism flag of 1 - single threaded mode.

UPDATE:

It seems that calling runtime.Gosched() only sometimes fixes it - not always. Calling db.Exec('SELECT 1') in place of that seems to be a more reliable fix.

@veqryn
Copy link

veqryn commented Nov 5, 2021

I am also getting this since v0.1.4.
runtime.Gosched and time.Sleep do not fix it for me
The only fix for me was to revert to v0.1.3

@l3pp4rd
Copy link
Member

l3pp4rd commented Dec 28, 2021

should be fixed by commit now.
the patch before introduced a chance for deadlock, but it also was necessary to prevent shared resource access. hope this will solve both issues. please reopen in case if you see any issue on v0.1.5

@l3pp4rd l3pp4rd closed this as completed Dec 28, 2021
@veqryn
Copy link

veqryn commented Jan 10, 2022

@l3pp4rd sorry but I am still getting the errors even when using v0.1.5. Using v0.1.3 shows no errors.
sql: transaction has already been committed or rolled back

@l3pp4rd l3pp4rd reopened this Jun 23, 2022
@jsteenb2
Copy link
Collaborator

jsteenb2 commented Feb 1, 2023

I'm also experiencing it on v0.1.5 also stuck with gorm in our project. Any update on this?

@jsteenb2
Copy link
Collaborator

jsteenb2 commented Feb 14, 2023

I think I've tracked this down. You can reproduce it by creating a db then issuing an ExecContext. After than first insert is completed, the context that begins the tx is still being used. That insert is over, and the context cancelation shouldnt' affect the db as a whole, but rather the exec. After the first ExecContext is completed successfully, cancel the context used for that exec, and create a new insert with new context and it fails with tx has been rolled back. I'm going to see if I can work this up a bit more. When I add this test to the db_test.go it fails as expected:

failing test
func TestShouldHandleStmtsWithoutContextPollution(t *testing.T) {
	t.Parallel()
	for _, driver := range drivers() {
		t.Run(driver, func(t *testing.T) {
			db, err := sql.Open(driver, "contextpollution")
			if err != nil {
				t.Fatalf(driver+": failed to open a connection, have you run 'make test'? err: %s", err)
			}
			defer db.Close()

			const insertSQL = "INSERT INTO users (username, email) VALUES (?, ?)"

			ctx1, cancel1 := context.WithCancel(context.Background())
			defer cancel1()

			_, err = db.ExecContext(ctx1, insertSQL, "first", "first@foo.com")
			if err != nil {
				t.Fatalf("unexpected error inserting user 1: %s", err)
			}
			cancel1()

			ctx2, cancel2 := context.WithCancel(context.Background())
			defer cancel2()

			_, err = db.ExecContext(ctx2, insertSQL, "second", "second@foo.com")
			if err != nil {
				t.Fatalf("unexpected error inserting user 2: %s", err)
			}
			cancel2()

			rows, err := db.QueryContext(context.Background(), "select username from users")
			if err != nil {
				t.Fatalf("unexpected error querying users: %s", err)
			}
			defer rows.Close()

			var users []string
			for rows.Next() {
				var user string
				err := rows.Scan(&user)
				if err != nil {
					t.Errorf("unexpected scan failure: %s", err)
					continue
				}
				users = append(users, user)
			}
			sort.Strings(users)

			wanted := []string{"first", "second"}

			if len(users) != 2 {
				t.Fatalf("invalid users received; want=%v\tgot=%v", wanted, users)
			}
			for i, want := range wanted {
				if got := users[i]; want != got {
					t.Errorf("invalid user; want=%s\tgot=%s", want, got)
				}
			}
		})
	}
}

@jsteenb2
Copy link
Collaborator

jsteenb2 commented Feb 14, 2023

submitted PR to address the context pollution issue. Its passing all tests and runs without a hitch locally as well. Fixes the issues I was seeing.

@veqryn, I'm curious if you'll get errors with the branch I just pushed to fix an issue with context handling. Its fixed both gorm and sqlx tests locally.

@jsteenb2
Copy link
Collaborator

now that PR #48 has merged, I think its safe to close this issue

@jsteenb2
Copy link
Collaborator

jsteenb2 commented Jul 4, 2023

@samsondav, I’m going to close this issue. It should be resolved now. Feel free to return if the issue resurfaces on v0.1.6 or later 👍

@jsteenb2 jsteenb2 closed this as completed Jul 4, 2023
@mgnsk
Copy link

mgnsk commented Jul 7, 2023

We're seeing this in our tests. Our testcases each have their own context that gets cancelled when the testcase finishes.

There is a race condition that the whole transaction may be canceled and the next testcase will use the canceled c.tx. Specifically, the select statement in that goroutine may observe either <-done which doesn't do anything or <-ctx.Done() which cancels rootCtx, triggering the error on the next operation.

go-txdb/db_go18.go

Lines 56 to 58 in 5bd9aad

case <-ctx.Done():
// operation was interrupted by context cancel, so we cancel parent as well
c.cancel()

As a temporary fix, we just removed the whole goroutine to prevent the global transaction being rolled back by a single operation's context.

@mgnsk
Copy link

mgnsk commented Jul 7, 2023

It must be a timing issue because it can be resolved by inserting a sleep or gosched here:

By inserting a delay before the context gets canceled gives the goroutine in beginTxOnce time to select the <-done case but it is not deterministic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants