Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: improve TestConcurrentReadAndWrite to verify read/write linerizability #455

Merged
merged 3 commits into from
Apr 14, 2023
Merged

test: improve TestConcurrentReadAndWrite to verify read/write linerizability #455

merged 3 commits into from
Apr 14, 2023

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Apr 7, 2023

Validate the linearizablity: each reading transaction should read the same value written by previous writing transaction on the same key.

@ahrtr
Copy link
Member Author

ahrtr commented Apr 7, 2023

cc @ptabor @serathius

concurrent_test.go Outdated Show resolved Hide resolved
concurrent_test.go Outdated Show resolved Hide resolved
concurrent_test.go Outdated Show resolved Hide resolved
concurrent_test.go Outdated Show resolved Hide resolved
concurrent_test.go Outdated Show resolved Hide resolved
return rs
}

func analyzeHistoryRecords(rs historyRecords) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a linearization validation as it doesn't account for operation execution time. It depends on Txid for order, making it very limited. For example it doesn't detect stale reads as View can just return old transation id, even though newer transaction was already committed.

I'm not sure what concurrency issues this approach is mean to find or can find any. Please let me know if you need help with integrating linearizability checker https://github.com/anishathalye/porcupine like in robustness tests.

Copy link
Member Author

@ahrtr ahrtr Apr 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on Txid is enough, because in bboltDB txid is strictly incremental. Any data written by a transaction should be visible to following reading transaction, otherwise it must be an issue. My understanding is that porcupine might generate false alarm, but in this test it will never generate false alarm unless it's a test bug. Also we are testing only local boltDB library instead of a distributed system, so I don't see any reason to use porcupine.

Note that there is no stale read, a read transaction should always see the same data during the transaction lifecycle, and it's exactly what repeatable reading means. Otherwise it must be an issue.

Copy link
Member Author

@ahrtr ahrtr Apr 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In summary, there are two points:

  1. A read transaction should always see the same data view during its lifecycle.
  2. Any data written by a writing transaction should be visible to any following reading transactions (with txid >= previous writing txid)

Also try to check whether it will result in data corruption under concurrent transaction operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mistaking serializability and linearizability. https://jepsen.io/consistency
Txid gives you serializability, but linearizability is real time property, not bound to any counter.

My understanding is that porcupine might generate false alarm, but in this test it will never generate false alarm unless it's a test bug

No, depending on history porcupine might take a long time to validate it, causing test to timeout. However, assuming that tests is correctly written and operation history doesn't include a lot of failed request it should not be a problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mistaking serializability and linearizability.

Note that in bboltDB, only one writing transaction is allowed at a time. So in practice serializability is the same thing as linearizability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note I am planning to add multiple writing transaction in followup PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename all *linearizable* to *serializable* when I start to support multiple writing transactions. thx for pointing this out.

Copy link
Member

@serathius serathius Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not be opposed to testing just serializability, but I think you need to add some additional validation to so it makes sense.

Linearizability bounds function execution by client observed time. Serializability no longer cares about time, but cares about the order. Operations observed by client should be bound by order. Meaning client should observe increase in transaction ID.

I know that you do validation on transaction ID, but not on the order observed by each client. I would recommend adding client side validation that transaction ID is always growing for each client, before we sort it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding client side validation that transaction ID is always growing for each client, before we sort it.

Agreed. It's a good idea. Actually we also see such issue (txid isn't somehow strictly incremental). see #402 (comment)

Let me do it in a separate PR. This PR is already too big.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, I mistook Serializable vs Sequential. Still validating sequential here would be useful and ok to be done in followup.

@ahrtr
Copy link
Member Author

ahrtr commented Apr 8, 2023

Addressed some your comments, and I will resolve other comments in followup PR, thx. @serathius PTAL

cc @ptabor as well.

…ablity

Signed-off-by: Benjamin Wang <wachao@vmware.com>
…ion history locally firstly

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr
Copy link
Member Author

ahrtr commented Apr 14, 2023

@ptabor @serathius PTAL, thx

concurrent_test.go Outdated Show resolved Hide resolved
concurrent_test.go Outdated Show resolved Hide resolved
t.Errorf("The history records are not linearizable:\n %v", err)
}

saveDataIfFailed(t, db, records)
Copy link
Member

@serathius serathius Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer it to ensure it is executed even on fatal or panic?

Copy link
Member Author

@ahrtr ahrtr Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If somehow panicking during test, then the records hasn't been collected yet. It needs big change. Let me think about it separately. thx

concurrent_test.go Outdated Show resolved Hide resolved
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of follow up, which is not to my liking.

Still, as whole the PR is an improvement. Please make sure to do followups as postponing work is a bad example for other contributors. We usually ask other contributors to do all cleanups in the PR.

@ahrtr
Copy link
Member Author

ahrtr commented Apr 14, 2023

Thank you!

BoltDB test (to reproduce the data corruption issue) is my top priority. Some followup is already my planned work.

@ahrtr ahrtr merged commit 96372de into etcd-io:master Apr 14, 2023
9 checks passed
@ahrtr
Copy link
Member Author

ahrtr commented Apr 14, 2023

do followups as postponing work is a bad example

Based on my previous experience, I am definitely not the blocker; instead, lack of active review is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants