
Test Race Conditions During Liquibase Locking #2327

Merged

Conversation

@schrieveslaach (Contributor) commented Jan 5, 2022

This commit adds an integration test for LB-2131's fix.

Description

#1901 tried to fix a race condition (see #1584). However, the fix was introduced by #2198 without any automated test. Therefore, this PR, which supersedes #1901, restores the test case to provide more quality assurance.

@schrieveslaach (Contributor Author)

@StevenMassaro, I noticed that your PR #2198 fixed the race condition that I was trying to resolve in #1901. #1901 contains a test case that runs a migration with multiple JVMs and I want to provide that to Liquibase. Do you have any thoughts?

@schrieveslaach force-pushed the test-race-condititon-during-init branch from 8ec98f6 to 045797e on January 5, 2022 12:38
// hubUpdater releases the lock temporarily. In this time span, another JVM instance might have
// acquired the database lock and applied further changesets. Reset the changelog service to
// prevent Liquibase from working with an outdated changelog.
changeLogService.reset();
@schrieveslaach (Contributor Author)

That is also a necessary change to fix #1584

@StevenMassaro (Contributor)

> @StevenMassaro, I noticed that your PR #2198 fixed the race condition that I was trying to resolve in #1901. #1901 contains a test case that runs a migration with multiple JVMs and I want to provide that to Liquibase. Do you have any thoughts?

I just committed a change to fix the compilation error. I first want to see if the test you wrote passes in the CI build. Certainly more test coverage is not a bad thing, and some coverage in the area you've written here is also necessary, but I do wonder if the approach feels somewhat heavy-handed. I want to discuss with my team and see if we have any existing tests which cover the area you've written here, and I'll get back to you.

@nvoxland (Contributor) commented Jan 5, 2022

I think the test seems good. It does spin up another process, which is heavyweight, but because we use singletons etc. within the JVM, just using another thread wouldn't really test what you're looking to do. We're in the process of re-doing how the integration test framework works, so there may be a better way to abstract some of that out down the road, but we can work with the code from where it is.
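
For a concrete picture of that approach, here is a minimal sketch of a JUnit test that drives a second JVM; the main class and helper names are hypothetical, not the ones used in this PR:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class ConcurrentUpdateTest {

    @Test
    void updatesFromTwoJvmsDoNotConflict() throws Exception {
        // Spawn a second JVM running the same update; an in-process thread
        // would share Liquibase's singletons and not exercise the race.
        Process other = new ProcessBuilder(
                System.getProperty("java.home") + "/bin/java",
                "-cp", System.getProperty("java.class.path"),
                "com.example.UpdateMain") // hypothetical main class that runs an update
                .inheritIO()
                .start();

        runUpdateInThisJvm(); // hypothetical helper performing the same update here

        assertEquals(0, other.waitFor(), "update in the second JVM failed");
    }

    private void runUpdateInThisJvm() {
        // hypothetical: configure Liquibase against the shared database and call update()
    }
}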

@StevenMassaro (Contributor)

We also do not have any existing test coverage for this pattern, so adding this test will help cover more use cases. Thanks @schrieveslaach !

@nvoxland (Contributor) commented Jan 5, 2022

Looking at the changeLogService.reset(); call you added, though: if the problem you were seeing was caused by the temporary unlock in HubUpdater, my worry is that handling it only in update() like you have doesn't address the similar problems it would cause in rollback etc., where we also call the hubUpdater.register() command.

I moved the reset call into HubUpdater as part of it re-acquiring the lock, since that seemed safer. BUT your new test was still passing for me even without the line in either place.

Based on why you wanted to add it, @schrieveslaach, is the new location good? Or did #2198 end up addressing the problem in a way that makes us not even need that reset() call anymore?

@nvoxland nvoxland added this to To Do in Conditioning++ via automation Jan 5, 2022
@nvoxland nvoxland moved this from To Do to In discussion in Conditioning++ Jan 5, 2022
@schrieveslaach (Contributor Author)

Thanks for your feedback, and I'm happy to help the project. Here are my responses:

> BUT your new test was still passing for me even without the line in either place.

Unfortunately, the test I wrote does not trigger all race conditions every time, due to the non-determinism of parallel code. 🤷 On my local machine, I can reproduce the issue with a microservice setup.

> Or did #2198 end up addressing the problem in a way that makes us not even need that reset() call anymore?

Unfortunately, no. Let me try to explain what I observe:

  1. I have multiple Dockerized JVMs that I start at the same time and all try to acquire the Liquibase lock on a fresh database.
  2. Each JVM calls checkLiquibaseTables before it tries to acquire the lock, so every JVM has a cached version of the applied changelogs in ChangeLogHistoryService.
  3. Because the database is empty, every JVM has an empty cache of applied changelogs.
  4. Now, one JVM applies the full changelog file and releases the lock.
  5. Another JVM acquires the lock and uses the cached list of applied changelogs (which is empty) to determine which entries have to be applied. In this case, it applies all entries of the changelog file and fails due to existing tables.

Therefore, I called changeLogService.reset(); in Liquibase, fixing the stale cache of ChangeLogHistoryService described in step 5.
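
To make step 5 concrete, here is a minimal sketch of where that reset sits. This is a simplified flow, not the actual Liquibase.update() source; the factory and service calls are Liquibase's public APIs:

import liquibase.changelog.ChangeLogHistoryService;
import liquibase.changelog.ChangeLogHistoryServiceFactory;
import liquibase.database.Database;
import liquibase.exception.LockException;
import liquibase.lockservice.LockService;
import liquibase.lockservice.LockServiceFactory;

class RaceConditionFixSketch {
    static void lockAndRefreshHistory(Database database) throws LockException {
        LockService lockService = LockServiceFactory.getInstance().getLockService(database);
        lockService.waitForLock(); // blocks until this JVM holds DATABASECHANGELOGLOCK

        // Another JVM may have applied changesets while we waited, so any
        // history cached earlier (e.g. by checkLiquibaseTables) is stale.
        ChangeLogHistoryService changeLogService =
                ChangeLogHistoryServiceFactory.getInstance().getChangeLogService(database);
        changeLogService.reset(); // discard the cached list of applied changesets
        // The history will be re-read from DATABASECHANGELOG on next use, so
        // only genuinely unapplied changesets get executed.
    }
}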

@nvoxland, @StevenMassaro, does lockService.reset(); do the same thing?

@nvoxland (Contributor)

@schrieveslaach lockService.reset() is a similar reset but for the LockService.

What should generally be happening is that we use the lock to ensure that your instance is the only one running. Once we have the lock ourselves, we read from the databasechangelog table to populate the ChangeLogHistoryService and then assume that nobody else will be messing with it besides us.

The problem is that we have code that does a temporary unlock when it's prompting the user, which then breaks that older assumption that the changelog history remains correct. We need to make sure that anytime we unlock, we also no longer trust the changelog service.
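
As an illustration of that invariant, a conceptual sketch (the method and promptUser are hypothetical, loosely modeled on the temporary-unlock flow discussed here; imports as in the sketch above):

void promptWithTemporaryUnlock(Database database) throws LockException {
    LockService lockService = LockServiceFactory.getInstance().getLockService(database);
    lockService.releaseLock(); // temporary unlock while prompting the user
    try {
        promptUser(); // hypothetical: other JVMs may update the database now
    } finally {
        lockService.waitForLock(); // re-acquire the lock
        // Do not trust history cached before the unlock; force a re-read
        // from DATABASECHANGELOG before using the changelog service again.
        ChangeLogHistoryServiceFactory.getInstance()
                .getChangeLogService(database).reset();
    }
}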

The extra commit I pushed to your fork moves the reset to be part of the logic that does the temporary unlock, in hopes of better handling the case. We have a refactoring of the lock+changelog services coming up soon, and having a more direct tie between them may be smart based on these bugs.

But until that, since you can reproduce it more readily on your system than the tests do, can you try the newest version of your fork to see if it's still working for you? Or if my function move broke your fix?

@schrieveslaach (Contributor Author)

@nvoxland, I tested the newest version and I could reproduce the race condition again. Your fix does not work in my scenario.

@kataggart kataggart moved this from In discussion to To Do in Conditioning++ Feb 1, 2022
@kataggart kataggart assigned nvoxland and unassigned StevenMassaro Feb 1, 2022
This commit adds an integration test for LB-2131's fix.

Additionally, the commit ensures that the changelog history cache will
be reset when the lock service has been reset. That ensures that
multiple JVMs can try to apply the changelog without stepping on each
other's toes.
@schrieveslaach (Contributor Author)

@nvoxland, @StevenMassaro, I updated the PR with a slight change. Based on @nvoxland's proposal, I tried to reset the changelog history when resetting the lock service. However, that does not seem to clear the history properly, because on my system the migration still crashes with multiple JVMs.

I've got the impression that my initial fix (resetting the changelog within Liquibase) is the only working solution.

What do you think?

@nvoxland (Contributor)

Thanks for the update @schrieveslaach. I'm trying out some options for improving testing in general around the lock service and then fixing a variety of bugs based on what those tests can show failing.

I'm going to wrap this PR into that effort over the next couple weeks.

@schrieveslaach (Contributor Author)

@nvoxland, thanks for the update. Please let me know if you have something to test.

@schrieveslaach (Contributor Author)

@nvoxland, is there any update on this topic?

@schrieveslaach (Contributor Author)

@nvoxland, ping. 😉

@nvoxland added the SafeToBuild and autocandidate labels Sep 6, 2022
@nvoxland (Contributor) commented Sep 6, 2022

Sorry for the slow response. I keep feeling like some of the larger test refactoring is coming soon, but then we keep finding other things to work on first...

Since it's taking long enough, I'll bring this in: it's a good test to be adding, and if/when we get to the larger integration test refactoring, it will be there for us to figure out how to incorporate into any new structures we have.

Things to be aware of:

  • The PR really just adds a test. The other changed code is just minor cleanup and the inclusion of a needed call to reset().

Things to worry about:

  • Nothing

@github-actions (bot) commented Sep 6, 2022

Unit Test Results

  4 644 files  ±0      4 644 suites  ±0     38m 37s ⏱️ +2m 12s
  4 641 tests  +15     4 419 ✔️ +12        222 💤 +3       0 ±0
  54 864 runs  +180    49 684 ✔️ +24      5 180 💤 +156   0 ±0

Results for commit 72579f9. Comparison against base commit 165c594.

♻️ This comment has been updated with latest results.

@nvoxland added and removed the SafeToBuild label Sep 10, 2022
@schrieveslaach (Contributor Author) commented Sep 13, 2022

@nvoxland, thanks for the heads up. Please keep in mind that the following change was also necessary to resolve a race condition:

// Reset the lockService in case other JVM instances have done things to the lock table since we had last locked it
lockService.reset();

Without these fixes, I wasn't able to spin up multiple services that all try to apply the database migration.

@nvoxland (Contributor)

True, @schrieveslaach. I had noticed that but forgot to include it in my notes; I've updated them to include it. Thanks!

@nvoxland added and removed the SafeToBuild label Sep 21, 2022
@XDelphiGrl (Contributor) left a comment


This PR addresses race conditions that occur when multiple Liquibase operations on the same database lock and unlock the DATABASECHANGELOGLOCK tracking table.

  • The fix resets the ChangeLogHistoryServiceFactory in the StandardLockService class to ensure the status of changesets is refreshed if another JVM process updated the DATABASECHANGELOG table while this process waited for the DATABASECHANGELOGLOCK table to be unlocked (see the sketch after this review).
  • A new integration test validates the locking use cases when three updates execute against the same database.
  • No additional testing required.

APPROVED
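
For reference, a hedged sketch of the shape of that fix (the subclass here is only for illustration; see StandardLockService in the Liquibase sources for the exact code):

import liquibase.changelog.ChangeLogHistoryServiceFactory;
import liquibase.lockservice.StandardLockService;

public class ResettingLockService extends StandardLockService {
    @Override
    public void reset() {
        super.reset(); // clears the lock service's own cached state
        // The key addition: discard cached changelog history together with
        // the lock state, so changeset status is re-read from
        // DATABASECHANGELOG rather than trusted from before the unlock.
        ChangeLogHistoryServiceFactory.getInstance().resetAll();
    }
}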

@nvoxland nvoxland merged commit f700115 into liquibase:master Sep 26, 2022
@nvoxland nvoxland added this to the 1NEXT milestone Sep 30, 2022
@tabbyf00

Thanks for your PR submission! We just finished reviewing and merging it into the 4.17.0 release on October 10, 2022. When you get a chance, could you please Star the Liquibase project? The star button is in the upper right corner of the screen.

Successfully merging this pull request may close the following issue:

Race condition and possible lock bypass if two nodes run on clean schema