Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition in ConcurrentReferenceHashMap #31008

Closed
davmac314 opened this issue Aug 8, 2023 · 6 comments
Closed

Avoid race condition in ConcurrentReferenceHashMap #31008

davmac314 opened this issue Aug 8, 2023 · 6 comments
Assignees
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Milestone

Comments

@davmac314
Copy link

davmac314 commented Aug 8, 2023

Affects: 6.0.11 (current HEAD) and probably earlier versions


The ConcurrentReferenceHashMap used by Spring has a race condition which can cause it to incorrectly return null for a get operation where they key is in the map.

In (inner class) Segment.getReference(...), line 500 (method starts at line 493), we see:

		// Use a local copy to protect against other threads writing
		Reference<K, V>[] references = this.references;
		int index = getIndex(hash, references);
		Reference<K, V> head = references[index];
		return findInChain(head, key, hash);

(This is called from ConcurrentWeakHashMap.getEntry(...), itself called by various methods including get(...)). At this point no lock is held. The comment is the clue: it takes a local reference to the existing references array (this.references) in order to protect against issues caused by other threads re-assigning this field.

Taking a local reference however doesn't protect against other threads writing to the same array. That is exactly what can happen, in certain (rare) circumstances, within the Segment.restructure(...) method (line 605+ below, method starts at line 580):

            // Either create a new table or reuse the existing one
            Entry<K, V>[] restructured =
                    (resizing ? createEntryArray(restructureSize) : this.entries); // <=== same array

            // Restructure
            for (int i = 0; i < this.entries.length; i++) {
                entry = this.entries[i];
                if (!resizing) {
                    restructured[i] = null;    // <=== write HERE!

Notice that the restructure uses the original array if it is not resizing (line marked with "same array" comment) and that it temporarily assigns each entry in turn with null ("write HERE!"). The entry is restored later on in the loop, but this moment when it is null is the problem, since it may cause Segment.getReference(...) (first method posted above), if being executed concurrently in another thread, to retrieve the null value when looking up the entry:

		Reference<K, V> head = references[index];   // <=== write affects this read!

This causes it to return null, as if the map does not contain an entry with the given key, when it fact it may do so.

In practice the issue may be very difficult to reproduce and I'm sure it is uncommon "in the wild" but I was able to reproduce the issue by inserting some sleep calls at key points within the ConcurrentReferenceHashMap code and executing operations in parallel from two different threads, while also ensuring that a key in the map became garbage collected before the call to the 2nd operation so that it would perform a restructure after the first (get) operation had taken its reference to this.references but before it retrieved the entry. This was based on an already slightly modified version of the class which we are using internally but I have inspected the version in Spring carefully and I'm fairly sure the same issue is present.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Aug 8, 2023
@bclozel
Copy link
Member

bclozel commented Nov 27, 2023

Sorry this got overlooked @davmac314

I agree with your assessment. There are other possible cases of race conditions, for example o.s.util.ConcurrentReferenceHashMap.Segment#doTask also updates array indices when adding entries. In that case getReference could return null, while the entry is being added.

I think this is unfortunate but in line with the general approach here: references come and go depending on restructure operations, memory being GC'd, etc. I've checked here that no NullPointerException should be thrown in this case and this should only return null when the entry is there. At worst this makes the cache less efficient. The alternative would be to add a guard and lock the read operation but this comes at a significant performance cost.

I'm closing this issue a result, let me know if you can think of a way to solve this particular issue without breaking the runtime model for this class.

Thanks for your contribution!

@bclozel bclozel closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2023
@bclozel bclozel added status: declined A suggestion or change that we don't feel we should currently apply in: core Issues in core modules (aop, beans, core, context, expression) and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Nov 27, 2023
@davmac314
Copy link
Author

Hi @bclozel ,

Having reported the issue I feel I've done my civic duty and if you choose to close the issue that is in your hands. I would suggest that at least the documentation (javadoc for the class) should be updated to reflect this known limitation, lest the class be used by others (either withing the Spring project our outside it) with the expectation that values that are definitely in the map will be able to looked up successfully, but that is up to you.

We fixed this in our modified version by, in the case of a restructure using the same array, simply modifying the linked list of entries in each array element "in place" rather than nulling the list first. There is no need to lock for the read operation. Unfortunately I cannot share the code which was written as part of my employment.

I looked at the doTask method which you suggested also had a race, but I do not see the problem there. Unless you mean that another thread could look up an item that is being added "right now" and not see it (null being returned); I don't think this qualifies as a problem race since there is always a window where the object "is currently being added" rather than "has successfully been added". In comparison, the bug I pointed out means that a single thread can add a k,v pair, look up k and get null as a result (due to interaction in another thread not involving k or v), then look up k a second time but on this occasion get v as a result.

I understand that for a cache this may not matter (other than being a minor performance issue) but once again I would strongly suggest that the class be documented as only being suitable for use as a cache if the issue isn't going to be fixed.

@bclozel
Copy link
Member

bclozel commented Nov 29, 2023

Thanks for your feedback @davmac314

This is an interesting tradeoff for the project:

  • we can attempt to fix a problem that nobody experienced in production and possible cause regressions (functional or performance). A PR would have helped since you have experience running with the fix in production, but apparently you're not allowed to contribute here.
  • Leave things as they are entirely, as documenting this very specific case is not really actionable for most people and we don't intend such classes to be used outside of the main use cases in Spring Framework (we don't intend to be a general caching library)

I can give it a shot still, would you be opened to reviewing my changes?

@davmac314
Copy link
Author

Hi @bclozel,
I would certainly be happy to review any changes relevant to this issue. It is unfortunate (and frustrating for me) that I can't simply give you the code that has already been written. Fortunately the changes required are small.

@bclozel
Copy link
Member

bclozel commented Dec 11, 2023

Hey @davmac314 - what do you think about this change: https://github.com/bclozel/spring-framework/tree/gh-31008

Thanks for your input!

@bclozel bclozel reopened this Dec 11, 2023
@bclozel bclozel self-assigned this Dec 11, 2023
@bclozel bclozel added type: enhancement A general enhancement status: waiting-for-feedback We need additional information before we can continue and removed status: declined A suggestion or change that we don't feel we should currently apply labels Dec 11, 2023
@bclozel bclozel added this to the 6.1.x milestone Dec 11, 2023
@davmac314
Copy link
Author

@bclozel looks good to me!

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 11, 2023
@bclozel bclozel removed the status: feedback-provided Feedback has been provided label Dec 12, 2023
@bclozel bclozel modified the milestones: 6.1.x, 6.1.2 Dec 12, 2023
@bclozel bclozel changed the title Race condition in org.springframework.util.ConcurrentReferenceHashMap Avoid race condition in ConcurrentReferenceHashMap Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants