Skip to content

Commit

Permalink
Fix a newcoming child being sometimes ignored
Browse files Browse the repository at this point in the history
The failure went like this:
* A child arrives.
* In the meantime, the parent enters `tryMakeCompletingSlowPath`
  and remembers the current list of handlers, which is an empty
  or a single-element one.
* The parent updates the state to the finishing one.
* The child enters the list.
* The parent traverses *an old list*, the one from before the
  child arrived. It sees no children in the empty/single-element
  list and forgets about the child.

Why, then, was it that this worked before?

It was because there was a guarantee that no new children are going
to be registered if two conditions are true:
* The root cause of the error is set to something,
* And the state is already "completing".

`tryMakeCompletingSlowPath` sets the state to completing, and
because it updates the state inside `synchronized`, there was a
guarantee that the child would see either the old state (and, if
it adds itself successfully, then `tryMakeCompletingSlowPath` will
retry) or the complete new one, with `isCompleting` and the error
set to something.

Additionally, `tryMakeCompletingSlowPath` is entered for empty or
single-element lists *almost only* when it's completing with an
error, so in effect, it was *almost always* guaranteed that new
children weren't going to be registered.

An exception is a tiny bug that used to be present, but wasn't
noticed by anyone because of how incredibly specific it was:
`tryMakeCompletingSlowPath` can also be entered for single-element
lists when completing without an error if that single element is
not a child. In this case, in extremely rare racy scenarios,
attaching a child could result in the child registering itself,
but the parent forgetting to wait for it.
  • Loading branch information
dkhalanskyjb committed Mar 15, 2024
1 parent d31725c commit e170b68
Showing 1 changed file with 2 additions and 5 deletions.
7 changes: 2 additions & 5 deletions kotlinx-coroutines-core/common/src/JobSupport.kt
Expand Up @@ -939,11 +939,11 @@ public open class JobSupport constructor(active: Boolean) : Job, ChildJob, Paren
// process cancelling notification here -- it cancels all the children _before_ we start to wait them (sic!!!)
notifyRootCause?.let { notifyCancelling(list, it) }
// now wait for children
val child = firstChild(state)
val child = list.nextChild()
if (child != null && tryWaitForChild(finishing, child, proposedUpdate))
return COMPLETING_WAITING_CHILDREN
list.close(LIST_CHILD_PERMISSION)
val anotherChild = firstChild(state)
val anotherChild = list.nextChild()
if (anotherChild != null && tryWaitForChild(finishing, anotherChild, proposedUpdate))
return COMPLETING_WAITING_CHILDREN
// otherwise -- we have not children left (all were already cancelled?)
Expand All @@ -953,9 +953,6 @@ public open class JobSupport constructor(active: Boolean) : Job, ChildJob, Paren
private val Any?.exceptionOrNull: Throwable?
get() = (this as? CompletedExceptionally)?.cause

private fun firstChild(state: Incomplete) =
state as? ChildHandleNode ?: state.list?.nextChild()

// return false when there is no more incomplete children to wait
// ## IMPORTANT INVARIANT: Only one thread can be concurrently invoking this method.
private tailrec fun tryWaitForChild(state: Finishing, child: ChildHandleNode, proposedUpdate: Any?): Boolean {
Expand Down

0 comments on commit e170b68

Please sign in to comment.