Despite no other pods running, the resource quota blocks launching new build agents. Builds are piling up. This seems to be the same situation as in #4892 (closed).
At the time of triggering the job, no other pods were running. Thus, I'm confused why the "used" line shows any values different from 0. The "limited" line shows enough resources for the current Jenkins job to fit. AFAIK there haven't been any related changes to the Jenkinsfile recently. Also, according to https://api.eclipse.org/cbi/sponsorships, the upcoming reassignment of resource packs is not yet finished and there are still two packs assigned.
Can it also be related to our Windows agents? I noticed there is still the "old" agent up, which isn't used anymore. Seems this wasn't clearly communicated in the associated ticket.
Just to confirm: the current situation (stale pods) is not the same as in #4892 (closed)?
If nothing else works at the moment, lowering the resources in the Jenkinsfile would be OK for us as an intermediate solution; a sketch of what that could look like is below. I'd just like to make sure that we can really use all assigned resources correctly.
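For illustration, a minimal sketch of what lowering the requests could look like, assuming the agent pod is defined inline via the Kubernetes plugin's yaml block; the container name, build step, and the concrete CPU/memory values are placeholders, not our actual settings:

```groovy
pipeline {
  agent {
    kubernetes {
      yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: gt-gen-dev                       # placeholder container name
    image: xiaopanansys/gt-gen-dev:latest
    command: ["sleep"]
    args: ["infinity"]
    resources:
      requests:
        cpu: "2"                           # example values, lowered so the pod fits the remaining quota
        memory: "4Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
'''
    }
  }
  stages {
    stage('Build') {
      steps {
        container('gt-gen-dev') {
          sh 'echo build'                  // placeholder build step
        }
      }
    }
  }
}
```

Depending on how the quota is configured, it is typically the requests that count against it, so lowering those (and not only the limits) is what would make the pod schedulable again.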
> At the time of triggering the job, no other pods were running. Thus, I'm confused why the "used" line shows any values different from 0.
The job runs multiple pods in parallel. That's why "used" is > 0.
The "limited" line shows enough resources for the current Jenkins job to fit.
19 used + 16 requested = 35, which exceeds the limit of 30.
> Also, according to https://api.eclipse.org/cbi/sponsorships, the upcoming reassignment of resource packs is not yet finished and there are still two packs assigned.
The API had not been updated yet, but the resource packs had already been removed on the Jenkins level. The API has since been updated.
> Can it also be related to our Windows agents? I noticed there is still the "old" agent up, which isn't used anymore. Seems this wasn't clearly communicated in the associated ticket.
It's unrelated to the Windows agents, but I will clean up the old agent (b9qls-windows-10).
The actual issue was the reassignment of the resource packs to the SCM instance, which led to a lack of resources. Meanwhile, new requests have been created to add more resource packs to the openPASS instance. 10 resource packs have been added, so the build should work again.
I'm still not sure about that. The stages running in parallel are related to our Linux and Windows builds. Inside the Linux stage everything should run in sequence, but I have to look into that in more detail; see the sketch below.
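For reference, a minimal sketch of the structure I mean, assuming declarative parallel branches with sequential sub-stages on the Linux side; stage names, the Windows label, and the build commands are placeholders:

```groovy
pipeline {
  agent none
  stages {
    stage('Build') {
      parallel {
        stage('Linux') {
          agent {
            kubernetes {
              yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: gt-gen-dev                       # placeholder container name
    image: xiaopanansys/gt-gen-dev:latest
    command: ["sleep"]
    args: ["infinity"]
'''
            }
          }
          stages {                                   // sequential sub-stages, all on the same Linux pod
            stage('Compile') { steps { container('gt-gen-dev') { sh 'make' } } }        // placeholder command
            stage('Test')    { steps { container('gt-gen-dev') { sh 'make test' } } }   // placeholder command
          }
        }
        stage('Windows') {
          agent { label 'windows' }                  // placeholder label for the Windows agent
          steps { bat 'build.cmd' }                  // placeholder command
        }
      }
    }
  }
}
```

In this shape only the two top-level branches run concurrently, so there should be at most one Linux pod per build at any time.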
It looks like the pod cannot start within the specified timeout of 3 minutes.
That's probably because pulling xiaopanansys/gt-gen-dev:latest takes a long time. Maybe adding imagePullPolicy: IfNotPresent to the container definition would solve the issue.
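A minimal sketch of that suggestion, assuming the container is declared in the inline pod YAML of the Jenkinsfile (the container name is a placeholder); this fragment would be merged into the existing agent definition rather than added as-is:

```groovy
agent {
  kubernetes {
    yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: gt-gen-dev                       # placeholder container name
    image: xiaopanansys/gt-gen-dev:latest
    imagePullPolicy: IfNotPresent          # reuse an image already cached on the node instead of pulling every time
'''
  }
}
```

Note that IfNotPresent only helps on nodes that have pulled the image before, and it will keep using a stale :latest tag until the cached image is refreshed.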
I'm not sure if we can override the timeout, but can you add the following to your Jenkinsfile:
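(The snippet that followed here is not preserved in this thread; judging from the reply below, it was presumably an activeDeadlineSeconds entry in the pod spec, roughly like the following; the value and container name are placeholders:)

```groovy
agent {
  kubernetes {
    yaml '''
apiVersion: v1
kind: Pod
spec:
  activeDeadlineSeconds: 600               # presumed suggestion; example value
  containers:
  - name: gt-gen-dev                       # placeholder container name
    image: xiaopanansys/gt-gen-dev:latest
'''
  }
}
```

Note that in Kubernetes, activeDeadlineSeconds bounds how long a pod may run after it has started, so it would not by itself extend the 180-second start-up timeout.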
I added the activeDeadlineSeconds here, but unfortunately it has no effect; it's still timing out after 180s. Did I specify the setting correctly?
As we've had some quite large images recently, I checked the current one, which is < 1 GiB. I don't think that's a size to worry about, and a container should be able to start even within the 180s limit. Could this incident be related?
I noticed that the opSimulation Jenkinsfile was also missing this change related to timeout issues with mounting the persistent volume. I've added it now to the current test branch, and the pod startup was smooth in the latest run.