
Txn Commit can result in transaction guarantee violation #6

Open · aarondwi opened this issue Jun 22, 2021 · 11 comments

@aarondwi commented Jun 22, 2021

Hi

Looking at the txn source, it seems that Commit does not hold the table lock for the entire apply duration, but instead acquires it per delete/update/insert. This means that a concurrent transaction might read a deleted value (because the delete is applied first) and yet not see the corresponding update. Even worse, the first transaction might delete a record while a second one tries to update it, resulting in a lost update. I haven't thought deeply about the per-column locks, but they probably have the same issues.

For disclosure, I have not tried to reproduce this, but a full lock is needed to ensure that only one transaction is updating until it finishes. This is what SQLite does. Databases with concurrent writers (e.g. Postgres/MySQL) use other protocols to handle this (one example is the xmin/xmax check in Postgres, or range/gap locks held until commit in MySQL).

The simplest solution is to just wrap the entire Commit call with a lock, at the cost of reduced concurrency; a minimal sketch of that is below.
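
For illustration only, here is roughly what that stop-gap could look like. Collection, Txn, commitMu and the apply* helpers are hypothetical stand-ins, not the actual kelindar/column types:

```go
package sketch

import "sync"

// Collection and Txn are illustrative stand-ins for the real types.
type Collection struct {
	commitMu sync.Mutex // serializes the entire apply phase of a commit
}

type Txn struct {
	owner *Collection
}

// Commit holds a single lock for the whole apply, so no concurrent
// transaction can observe a half-applied commit (e.g. the delete
// visible but the matching update not yet).
func (txn *Txn) Commit() {
	txn.owner.commitMu.Lock()
	defer txn.owner.commitMu.Unlock()

	txn.applyDeletes()
	txn.applyUpdates()
	txn.applyInserts()
}

// Placeholders for the real per-operation apply steps.
func (txn *Txn) applyDeletes() {}
func (txn *Txn) applyUpdates() {}
func (txn *Txn) applyInserts() {}
```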

What do you think?

@kelindar (Owner)

Thanks for the review, really appreciated. Yeah, I'm planning to do this, first by merging delete+update+insert together, acquiring the locks up front and releasing them at the end. Do you have any references on the protocols you mentioned? I was also thinking about sharding the locks: currently we have RWLocks per column, but it may be worth sharding them per row range (say, a lock per column for every 100K rows). Not quite sure yet what the best approach is.

@aarondwi (Author)

Some references from around the net:

  1. Postgres uses snapshot isolation semantics (pending transaction, committed, etc.), checking them against the xmin/xmax values of each row; a good blog post is here
  2. For serializability, Postgres needs yet another protocol on top of snapshots: Serializable Snapshot Isolation
  3. SQLite's semantics are snapshot isolation, but because only one writer is allowed, it has the same effect as serializable isolation. Explanation here
  4. The rest are from academic papers, such as Silo and MOCC. There are also some techniques for columnar databases, such as this one

About the sharding, it seems you could use ClickHouse's MergeTree, although ClickHouse doesn't guarantee any isolation at all. Still, this seems like a good tradeoff, optimizing for reads.

@aarondwi (Author)

It seems this behavior also affects the per-column lock. While reading a value, the code locks only the column, but the txn code, especially for delete and insert, doesn't do that.

@kelindar (Owner)

Great, thanks, I'll read through all of this over the weekend. Indeed, I'll also write some unit tests that exercise the different transaction isolation levels.

However, I wonder if we should drop column-level locks and instead create a sharded lock. I'm leaning towards a simple []sync.RWMutex slice for its simplicity, but with some padding to prevent false sharing. This way, we could acquire all of the necessary locks during the transaction and completely lock out those pages of rows when committing it. A sketch of such a padded, sharded lock is below.
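
A minimal sketch of that idea, assuming 64-byte cache lines; the names are illustrative, not actual kelindar/column code:

```go
package sketch

import "sync"

// paddedRWMutex pads each lock out to a 64-byte cache line so that
// locking one shard does not false-share with its neighbours
// (sync.RWMutex itself is 24 bytes on 64-bit platforms).
type paddedRWMutex struct {
	mu sync.RWMutex
	_  [64 - 24]byte
}

// shardedLock is a fixed, pre-allocated slice of padded RWMutexes.
type shardedLock struct {
	shards []paddedRWMutex
}

func newShardedLock(n int) *shardedLock {
	return &shardedLock{shards: make([]paddedRWMutex, n)}
}

func (s *shardedLock) RLock(i int)   { s.shards[i].mu.RLock() }
func (s *shardedLock) RUnlock(i int) { s.shards[i].mu.RUnlock() }
func (s *shardedLock) Lock(i int)    { s.shards[i].mu.Lock() }
func (s *shardedLock) Unlock(i int)  { s.shards[i].mu.Unlock() }
```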

I think we'll easily be able to guarantee read-committed isolation, but you would still have fuzzy and phantom reads: if we lock rows page by page, one transaction could commit a batch while another gets different values before and after its read. It's probably not a deal breaker, though. If you're doing read-then-write during a Range(), for example, we could optimistically lock pages; instead of collection.Query() we could have collection.View(), which initiates a transaction that does not allow updates, and collection.Update(), which does. View would only acquire read locks per page, while Update would acquire read-write locks instead.
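
A sketch of that split, reusing shardedLock from the sketch above; in reality the page set would be derived from the transaction's read/write set rather than passed in:

```go
package sketch

// LockedCollection is an illustrative stand-in for the collection.
type LockedCollection struct {
	locks *shardedLock // from the previous sketch
}

// View runs fn while holding only read locks on the touched pages.
// Pages must be acquired in a consistent order to avoid deadlock.
func (c *LockedCollection) View(pages []int, fn func() error) error {
	for _, p := range pages {
		c.locks.RLock(p)
	}
	defer func() {
		for _, p := range pages {
			c.locks.RUnlock(p)
		}
	}()
	return fn()
}

// Update runs fn while holding exclusive locks on the touched pages.
func (c *LockedCollection) Update(pages []int, fn func() error) error {
	for _, p := range pages {
		c.locks.Lock(p)
	}
	defer func() {
		for _, p := range pages {
			c.locks.Unlock(p)
		}
	}()
	return fn()
}
```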

@aarondwi (Author)

Separating read-only and read-write transactions would definitely make the API clearer about its guarantees; I totally support this.

As for the per-page (row shard?) locks, how would they handle inserts, since an insert might create a new page? But this would definitely help with the update/delete case.

@aarondwi (Author) commented Jun 22, 2021

On isolation levels: wouldn't typical read-focused databases benefit from snapshot isolation, since long-running and repeated queries would then read the same values? That would require something like per-row creation/deletion markers (à la Postgres xmin/xmax). But it would probably negate a lot of this repo's style of using fast bitmap checks, since those creation/deletion markers would undoubtedly have to be integers.
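
For reference, a minimal sketch of the Postgres-style visibility check being alluded to; real MVCC also has to check each transaction's commit status, which is omitted here:

```go
package sketch

// visible reports whether a row version should be seen by a
// transaction whose snapshot was taken at snapshotTxn. xmin is the
// id of the transaction that created the row, xmax the id of the
// one that deleted it (0 if the row is still live).
func visible(xmin, xmax, snapshotTxn uint64) bool {
	if xmin > snapshotTxn {
		return false // created after our snapshot was taken
	}
	if xmax != 0 && xmax <= snapshotTxn {
		return false // deleted before our snapshot was taken
	}
	return true
}
```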

@kelindar (Owner)

> As for the per-page (row shard?) locks, how would they handle inserts, since an insert might create a new page? But this would definitely help with the update/delete case.

It could be of fixed size: something like a pre-allocated slice, then use range reduction and modulo to figure out the shard index. For example, if you have a slice of 100 locks and want a new lock every 1K (1024) items, we can do index >> 10 % 100 to retrieve the lock id and lock the appropriate lock. For example:

index=500 shard=0
index=750 shard=0
index=1000 shard=0
index=1250 shard=1
index=1500 shard=1
...
index=102000 shard=99
index=102250 shard=99
index=102500 shard=0
index=102750 shard=0
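
In Go, that arithmetic is just (a sketch matching the table above):

```go
// shardFor maps a row index to a lock shard: a new lock every
// 1024 rows (index >> 10), wrapped around a fixed pool of 100
// locks (% 100).
func shardFor(index uint32) int {
	return int((index >> 10) % 100)
}
```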

> On isolation levels: wouldn't typical read-focused databases benefit from snapshot isolation, since long-running and repeated queries would then read the same values? That would require something like per-row creation/deletion markers (à la Postgres xmin/xmax). But it would probably negate a lot of this repo's style of using fast bitmap checks, since those creation/deletion markers would undoubtedly have to be integers.

Yeah, but first things first: implementing read committed properly, then moving on to snapshot and potentially serializable. Technically, we could let users choose their isolation level and simply provide a global RWLock that is acquired during View() and Update(), at the cost of concurrency, of course. I think MongoDB used to have a database-wide lock for a long time. In reality, users won't need to have a single collection and could still manually shard collections.

@aarondwi (Author)

> It could be of fixed size: something like a pre-allocated slice, then use range reduction and modulo to figure out the shard index

Ah okay, with a predefined maximum length, that many locks will be enough.

> Yeah, but first things first: implementing read committed properly, then moving on to snapshot and potentially serializable.

I take it this means the repo may support multiple isolation levels at once? Some databases choose to offer only one isolation level, because not everyone understands isolation levels well enough.

> In reality, users won't need to have a single collection and could still manually shard collections.

Definitely true

@kelindar (Owner)

Some work in progress #7

@xiegeo commented Dec 13, 2022

@kelindar Can this work be summarized in the readme to describe the current implementation? From the readme, the suggestion to use merge instead of reading and writing in a single transaction seems to imply that transactions are not atomic. But otherwise, the behaviour and performance of concurrent writes leave me guessing.

@kelindar (Owner)

I'll add this to the readme. TL;DR is that transactions are atomic, but the isolation level currently implemented is essentially read committed, which means that you have dirty/phantom reads in the system that you have to deal with. In order to update a particular row without reading it, and hence without being exposed to dirty/phantom reads, merge was introduced.
