Jenkins randomly stops and resumes
Summary
At random points the build suddenly pauses and only resumes after a while, which causes unpredictable timeouts.
Steps to reproduce
Run a GlassFish job, e.g. this one: https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-25517/
Today most runs failed at random steps:
- ant -version paused for more than 10 minutes. At this moment it is still stuck: https://ci.eclipse.org/glassfish/job/glassfish_build-and-test-using-jenkinsfile/job/PR-25517/48/execution/node/431/log/
- java -version took 39 seconds in the same build:
  09:44:19 + java -version
  09:44:58 openjdk version "17.0.15" 2025-04-15
  09:44:58 OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6)
  09:44:58 OpenJDK 64-Bit Server VM Temurin-17.0.15+6 (build 17.0.15+6, mixed mode, sharing)
  09:44:58 + ant -version
- In another job, domain creation got stuck. I tried to pause and resume the build; Jenkins reacted, but the step still did not move for more than 2 minutes:
  09:57:20 + /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-25517/glassfish7/glassfish/bin/asadmin --user anonymous --passwordfile /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-25517/appserver/tests/appserv-tests/temppwd create-domain --adminport 45707 --domainproperties jms.port=45708:domain.jmxPort=45709:orb.listener.port=45710:http.ssl.port=45711:orb.ssl.port=45714:orb.mutualauth.port=45715 --instanceport 45712 domain1
  Pausing
  Resuming
- Another step paused for almost 5 minutes when starting Maven:
  09:38:48 + mvn clean package -f /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-25517/appserver/tests/appserv-tests/lib/pom.xml -Pstaging
  09:43:43 [INFO] Scanning for projects...
- And again, another pause of about 3 minutes:
  09:57:20 + /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-25517/glassfish7/glassfish/bin/asadmin --user anonymous --passwordfile /home/jenkins/agent/workspace/_test-using-jenkinsfile_PR-25517/appserver/tests/appserv-tests/temppwd create-domain --adminport 45707 --domainproperties jms.port=45708:domain.jmxPort=45709:orb.listener.port=45710:http.ssl.port=45711:orb.ssl.port=45714:orb.mutualauth.port=45715 --instanceport 45712 domain1
  10:00:28 Using port 45707 for Admin.
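The pauses listed above show up as gaps between consecutive timestamps in the console log, so they can be found mechanically instead of by reading the log. Below is a minimal sketch, assuming every relevant log line starts with an `HH:MM:SS` timestamp as in the excerpts above. The function name `find_gaps` and the 30-second threshold are my own choices for illustration, not anything Jenkins provides.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log excerpts above: "HH:MM:SS <text>".
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def find_gaps(lines, threshold_seconds=30):
    """Return (gap_seconds, earlier_text, later_text) for every pair of
    consecutive timestamped lines that are at least threshold_seconds apart.
    Lines without a timestamp prefix (e.g. "Pausing") are skipped."""
    gaps = []
    previous = None  # (timestamp, text) of the last timestamped line seen
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        text = match.group(2)
        if previous is not None:
            delta = stamp - previous[0]
            if delta < timedelta(0):
                # Assumption: a negative difference means the log crossed midnight.
                delta += timedelta(days=1)
            if delta.total_seconds() >= threshold_seconds:
                gaps.append((int(delta.total_seconds()), previous[1], text))
        previous = (stamp, text)
    return gaps
```

Fed with a saved console log (for example `find_gaps(open("console.log"))`, where `console.log` is a hypothetical file name), it returns one tuple per pause, which makes it easier to compare how many pauses each build had and which steps they hit.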
What is the expected correct behavior?
The build should pass or fail in around 30 minutes and run without pauses, and especially without timeouts.
Priority
- Urgent
- High
- Medium
- Low
Severity
- Blocker
- Major
- Normal
- Low
Impact
We are falling behind schedule with releases and TCK updates, and there is a lot of work to do.
- Was there some change in the GlassFish project's resources or sponsorship?
- Could it be caused by some file system issues?
I have no idea what to do. I at least tried using urandom for generating the self-signed certificate, because the first thing that got stuck was keytool; however, the other pauses cannot be related to urandom.
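To help narrow down the file system question, a small probe could be run on the agent to time a synced file write and a trivial process start separately. This is only a sketch under my own assumptions: the helper names, the 1 MiB write size and the number of rounds are hypothetical, and the default command is the Python interpreter itself only so that the sketch runs anywhere. On the build agent one would pass something like `["java", "-version"]` and the workspace directory.

```python
import os
import subprocess
import sys
import tempfile
import time


def time_call(action):
    """Run action() and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    action()
    return time.monotonic() - start


def write_and_sync(directory, size=1024 * 1024):
    """Write a temporary file of `size` bytes in `directory` and fsync it.
    The file is deleted automatically when the handle is closed."""
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(b"\0" * size)
        handle.flush()
        os.fsync(handle.fileno())


def spawn(command):
    """Start a short-lived process and wait for it, discarding its output."""
    subprocess.run(command, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)


def probe(directory, command=(sys.executable, "--version"), rounds=5):
    """Return a list of (write_seconds, spawn_seconds), one pair per round."""
    results = []
    for _ in range(rounds):
        write_seconds = time_call(lambda: write_and_sync(directory))
        spawn_seconds = time_call(lambda: spawn(list(command)))
        results.append((write_seconds, spawn_seconds))
    return results
```

If the write times stay small while the process start occasionally takes tens of seconds, the file system is a less likely cause than CPU or scheduling on the agent; if both stall together, storage becomes the more plausible suspect. Neither outcome is conclusive on its own.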