Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pause executions when terminating even when not quieting down #866

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jglick
Copy link
Member

@jglick jglick commented Mar 6, 2024

I observed a large stream of errors from a controller shutting down. This was part of a Kubernetes restart so the pod would have gotten a termination signal yet Jenkins is not “quieting down”.

Redacted examples of errors
…	WARNING	o.j.p.w.cps.CpsVmExecutorService#reportProblem: Unexpected exception in CPS VM thread: CpsFlowExecution[Owner[…]]
org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1@25af3d30. The executor service is shutting down
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
	at jenkins.util.InterceptingExecutorService.execute(InterceptingExecutorService.java:71)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.notifyNewHead(CpsThreadGroup.java:502)
	at org.jenkinsci.plugins.workflow.cps.FlowHead.setNewHead(FlowHead.java:158)
	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.addBodyEndFlowNode(CpsBodyExecution.java:414)
	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:345)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.throwException(ContinuationGroup.java:211)
	at com.cloudbees.groovy.cps.impl.CastBlock$ContinuationImpl.cast(CastBlock.java:67)
	at jdk.internal.reflect.GeneratedMethodAccessor677.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
	at jdk.internal.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
…	WARNING	o.j.p.w.cps.CpsBodyExecution$2#onFailure: could not cancel CpsStepContext[313:parallel]:Owner[…]
Also:   org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1@25af3d30. The executor service is shutting down
		at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
		at jenkins.util.InterceptingExecutorService.execute(InterceptingExecutorService.java:71)
		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.notifyNewHead(CpsThreadGroup.java:502)
		at org.jenkinsci.plugins.workflow.cps.FlowHead.setNewHead(FlowHead.java:158)
		at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.addBodyEndFlowNode(CpsBodyExecution.java:414)
		at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:345)
		at com.cloudbees.groovy.cps.impl.ContinuationGroup.throwException(ContinuationGroup.java:211)
		at com.cloudbees.groovy.cps.impl.CastBlock$ContinuationImpl.cast(CastBlock.java:67)
		at jdk.internal.reflect.GeneratedMethodAccessor677.invoke(Unknown Source)
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.base/java.lang.reflect.Method.invoke(Method.java:568)
		at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
		at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
		at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
		at jdk.internal.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.base/java.lang.reflect.Method.invoke(Method.java:568)
		at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
		at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
		at com.cloudbees.groovy.cps.Next.step(Next.java:83)
		at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
		at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
		at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
		at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
		at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
		at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
		at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
		at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
		at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
…	WARNING	o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[319:podTemplate]:Owner[…]
org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@3f381a40[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@6367fef2[Wrapped task = org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1@1e5feb9]]. The executor service is shutting down
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:123)
	at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:965)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:147)
	at org.jenkinsci.plugins.workflow.support.concurrent.DirectExecutor.execute(DirectExecutor.java:33)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1298)
	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:762)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:133)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:157)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:97)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:965)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.scheduleNextRun(CpsStepContext.java:381)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.completed(CpsStepContext.java:326)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onFailure(CpsStepContext.java:314)
	at org.jenkinsci.plugins.workflow.steps.StepExecution.stop(StepExecution.java:109)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.reportProblem(CpsVmExecutorService.java:72)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:99)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

From earlier log messages I know all this occurred after suspendAll() completed. Yet by looking at the line numbers you can see

being run, which is after the checks for a paused program. This seems wrong: if Jenkins is not just quieting down but actively terminating, and has in fact attained the EXECUTIONS_SUSPENDED milestone (#766), we would not expect CPS chunks to still be run. Apparently they do not run well (the executor service has already been shut down, hence these errors), but perhaps the process is still writing other metadata files even after checkpoint has been asked to save everything for the last time.

I could not figure out how to reproduce this situation in a test. The lack of pausing is reproduced easily in JenkinsSessionRule. RealJenkinsRule normally enters quiet-down mode which activates the Pausing (Preparing for shutdown) message, but even without jenkinsci/jenkins-test-harness#544 you can simply replace a call to stopJenkins with runRemotely on

/**
 * Calls {@link Jenkins#doExit} which is more like {@link WebAppMain#contextDestroyed} than {@link RealJenkinsRule#stopJenkins}
 * in that it does not call {@link Jenkins#doSafeExit} and thus {@link Jenkins#doQuietDown()}.
 */
private static void exit(JenkinsRule r) throws Throwable {
    r.jenkins.doExit(null, null);
}

What I do not know how to reproduce is the FatalRejectedExecutionExceptions shown above. What causes the program to trigger work during this time? Maybe I need to artificially slow down shutdown and then have some sh steps finish?

@jglick jglick added the bug label Mar 6, 2024
@jglick jglick requested a review from a team as a code owner March 6, 2024 18:42
@jglick jglick requested a review from dwnusbaum March 6, 2024 18:42
@jglick jglick marked this pull request as draft March 6, 2024 19:03
@jglick
Copy link
Member Author

jglick commented Mar 6, 2024

Managed to reproduce a stack trace, though not in fact involving CPS code:

Not really successful attempt at reproduction
WARNING	o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[3:semaphore]:Owner[p/1:p #1]
org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@68ab1a80[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@7fb9a08c[Wrapped task = org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1@4f086a9f]]. The executor service is shutting down
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
	at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:975)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:147)
	at org.jenkinsci.plugins.workflow.support.concurrent.DirectExecutor.execute(DirectExecutor.java:33)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:760)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:134)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:157)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:97)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.scheduleNextRun(CpsStepContext.java:381)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.completed(CpsStepContext.java:326)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onSuccess(CpsStepContext.java:318)
	at org.jenkinsci.plugins.workflow.test.steps.SemaphoreStep.success(SemaphoreStep.java:123)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecutionTest.resumeStepDuringShutdown(CpsFlowExecutionTest.java:435)

@jglick
Copy link
Member Author

jglick commented May 15, 2024

While restarting a controller running

while (true) {
    sleep 1
}

I did see

Stack trace
WARNING	o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[3312:sleep]:Owner[sleepy/1:sleepy #1]
org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@5e80b055[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@770436cf[Wrapped task = org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$$Lambda$2874/0x00007f3109567980@223047cf]]. The executor service is shutting down
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:123)
	at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:975)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:147)
	at org.jenkinsci.plugins.workflow.support.concurrent.DirectExecutor.execute(DirectExecutor.java:33)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1298)
	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:762)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:133)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:157)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:97)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.scheduleNextRun(CpsStepContext.java:381)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.completed(CpsStepContext.java:326)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onSuccess(CpsStepContext.java:318)
	at org.jenkinsci.plugins.workflow.steps.SleepStep$Execution.lambda$setupTimer$0(SleepStep.java:100)
	at …

but that seems expected.

@jglick
Copy link
Member Author

jglick commented May 23, 2024

Had a job running like

checkpoint 'x' // CloudBees CI
build job: JOB_NAME, wait: false

and pressed Ctrl-C on java -jar. Got

…
INFO	jenkins.model.Jenkins$16#onAttained: Started termination
…
WARNING	o.j.p.w.cps.CpsFlowExecution$8#onFailure
org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@1a496d0f[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@406f73a7[Wrapped task = org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$$Lambda$1032/0x00007fc07cece900@10cd21b1]]. The executor service is shutting down
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:116)
	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:123)
	at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:975)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:147)
	at org.jenkinsci.plugins.workflow.support.concurrent.DirectExecutor.execute(DirectExecutor.java:33)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1298)
	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:762)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:133)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:157)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:97)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:970)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1697)
	at …
WARNING	o.j.p.workflow.job.WorkflowRun#getListener
java.lang.IllegalStateException: trying to open a build log on x #746 after it has completed
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getListener(WorkflowRun.java:232)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.getListener(WorkflowRun.java:1024)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1708)
	at …
INFO	jenkins.model.Jenkins$16#onAttained: Completed termination
…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
1 participant