Creating placeholder flownodes because failed loading originals.
java.io.IOException: Tried to load head FlowNodes for execution Owner[continuous-integration/PR-5201/5:continuous-integration/PR-5201 #5] but FlowNode was not found in storage for head id:FlowNodeId 1:72
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.initializeStorage(CpsFlowExecution.java:679)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:716)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:701)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:560)
at hudson.model.RunMap.retrieve(RunMap.java:233)
at hudson.model.RunMap.retrieve(RunMap.java:61)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:650)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:632)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:530)
at hudson.model.RunMap.getById(RunMap.java:213)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:949)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:961)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:76)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:68)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:197)
at jenkins.model.Jenkins.&lt;init&gt;(Jenkins.java:1036)
at hudson.model.Hudson.&lt;init&gt;(Hudson.java:86)
at hudson.model.Hudson.&lt;init&gt;(Hudson.java:82)
at hudson.WebAppMain$3.run(WebAppMain.java:247)
GitHub has been notified of this commit’s build result
Finished: FAILURE
after approximately 16 hours of executors simply not being released after successful Maven builds
Has it ever worked before with three parallel stages?
When did it work the last time?
What has changed since then?
Does only your latest PR have this problem?
Have you tried setting a timeout at the stage level in the Jenkinsfile?
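For reference, a stage-level timeout in a declarative Jenkinsfile would look roughly like the sketch below (the stage name, agent label, timeout value, and Maven command are illustrative placeholders, not taken from the actual job):

    pipeline {
      agent none
      stages {
        stage('JDK 11 build') {           // placeholder stage name
          agent { label 'basic' }         // placeholder agent label
          options {
            // Abort this stage after 2 hours so a hung build cannot
            // hold its executor indefinitely.
            timeout(time: 2, unit: 'HOURS')
          }
          steps {
            sh 'mvn -B clean verify'      // placeholder build command
          }
        }
      }
    }

The idea is that a hung stage is aborted once the timeout expires instead of blocking its executor for hours.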
With regards to the error message, I found https://issues.jenkins.io/browse/JENKINS-55287, but it does not offer a solution. One possible cause mentioned there is OOM errors, but that's not the case here (I have checked the memory consumption).
it works only with 2 parallel stages (there are only 2 available executors for the JIPP, so that already uses the maximum number of executors)
the CI build job passed normally (with 2 parallel stages) before Friday, 18 November
Actually, no changes. The only change we made to the CI job (about 2-3 weeks ago) was raising the idle timeout to 30 HOURS, but since then the job worked fine for some time (until the mentioned Friday).
yes
yes (the first attempt was to set the idle timeout to 30 minutes, but that cut the build queue short: when many PRs needed to be built, the timeout killed them before any executors were released from previous builds, so we had to increase the timeout to 30 hours)
for now we have a new PR (PR-5204) which has exactly the same issue. Basically, no PRs can be built due to this issue. Could you please take a look at how to solve it?
we got some executors over the weekend (I presume it happened just randomly, without any configuration changes); however, today another PR is stuck due to the lack of executors: https://ci.eclipse.org/jersey/job/continuous-integration/job/PR-5208/ (the very first build for this PR was running but failed due to an internal checks failure; the other attempts failed because of the executors issue).
it does not release the executor after the build is done. In the log above, 2 builds (JDK 8 and JDK 11) are already done, but the related executors are not released, so there is no chance for the final part of the job (JDK 19) to start.
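For context, the three-way parallel layout described above would look roughly like the sketch below in a declarative Jenkinsfile (branch names follow the JDK versions mentioned; the agent label and build steps are placeholders). Each parallel branch occupies one executor, so with only 2 executors on the JIPP the JDK 19 branch can start only after one of the other branches releases its executor, which is exactly what is not happening here.

    pipeline {
      agent none
      stages {
        stage('Build') {
          parallel {
            stage('JDK 8') {
              agent { label 'basic' }        // placeholder label
              steps { sh 'mvn -B verify' }   // placeholder step
            }
            stage('JDK 11') {
              agent { label 'basic' }
              steps { sh 'mvn -B verify' }
            }
            stage('JDK 19') {
              // Needs a free executor; with 2 executors total, this branch
              // waits for one of the branches above to finish and release its slot.
              agent { label 'basic' }
              steps { sh 'mvn -B verify' }
            }
          }
        }
      }
    }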
thank you, now it looks like a new build (PR-5222) has got executors for 2 threads (JDK 8 and JDK 11), and for those threads the executors were successfully released after the mvn build was done. But for the third thread the build was not started (no available executors), and for another job (https://ci.eclipse.org/jersey/job/list-closed-stagings/) there are still no executors allocated. Could you please restart the whole JIPP again, or take a look at why there are no available executors now?
hm, this morning (Friday) PR-5224 failed in the middle of the run with the error:
ERROR: Cannot resume build because FlowNode 34 for FlowHead 1 could not be loaded. This is expected to happen when using the PERFORMANCE_OPTIMIZED durability setting and Jenkins is not shut down cleanly. Consider investigating to understand if Jenkins was not shut down cleanly or switching to the MAX_SURVIVABILITY durability setting which should prevent this issue in most cases.
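The error suggests switching the pipeline durability setting; in a declarative Jenkinsfile that change would look roughly like the sketch below (whether the pipeline is declarative and what its stages contain is assumed here; only the options block is the relevant part):

    pipeline {
      agent any                      // placeholder; the real agent is unknown
      options {
        // Persist FlowNode data more aggressively so an unclean Jenkins
        // restart is less likely to leave the build unable to resume.
        durabilityHint('MAX_SURVIVABILITY')
      }
      stages {
        stage('Build') {
          steps {
            sh 'mvn -B verify'       // placeholder build step
          }
        }
      }
    }

MAX_SURVIVABILITY trades some write performance for resilience, which is the trade-off the error message itself points at.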
Now the CI/CD job has already failed twice with the message:
Creating placeholder flownodes because failed loading originals. ERROR: Cannot resume build because FlowNode 72 for FlowHead 1 could not be loaded. This is expected to happen when using the PERFORMANCE_OPTIMIZED durability setting and Jenkins is not shut down cleanly. Consider investigating to understand if Jenkins was not shut down cleanly or switching to the MAX_SURVIVABILITY durability setting which should prevent this issue in most cases.
looks like the issue with FlowNodes/missing executors is back again - after 2 failures due to the FlowNodes issue (3.x branch, PR-5236), the job does not receive any free executors (another 3 desperate attempts to re-run the job for PR-5236).
could you please restart the JIPP (any other magic is appreciated as well)?
today the JIPP was restarted (probably it was an automatic restart; I noticed it only because I was going through the logs at that time), and shortly after that I pushed a commit to a pull request which triggered the build (PR-5322). It looks like we have run out of executors again. The build is still in the queue and nothing has happened for more than 2 hours. Could you please do something about that, like restarting the JIPP on purpose? Thanks a lot for the assistance.