On the Platform JIPP there is already a pod template available with 6GB of RAM; its label is centos-7-6gb.
Let me ask it differently: if we created a "pod template" based on centos-8 and configured it to use 6GB, would that be accepted at all?
If yes, can you please outline the steps needed to do that?
If no, because we hit some restrictions on the infra/project side, could you also outline the steps needed to increase the limits/quotas?
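For reference, a minimal sketch of what such a pod template could look like on the Kubernetes side. All names here (the container name, the image) are hypothetical placeholders; the real template would have to follow whatever conventions the infra project uses:

```yaml
# Hypothetical pod template for a centos-8 based Jenkins agent with 6GB of RAM.
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: jnlp
    image: centos-8-agent:latest   # placeholder image name
    resources:
      requests:
        memory: "6Gi"
        cpu: "2"
      limits:
        memory: "6Gi"
        cpu: "2"
```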
So per resource pack we can actually use 8GB right now. @fgurr can we make the default pod templates use 8GB so they utilize a full resource pack (2 vCPU + 8GB RAM)?
The problem is that this test makes several assumptions and has bad properties (including exhausting ALL available memory of the JVM). That it did not cause a problem before is just luck, as you mention: any change can be the one that uses one byte too much, which at best makes something fail (e.g. Maven) or, worse, gets a process killed with a SIGKILL ...
There is a related warning shortly before the first OutOfMemoryError:
12:48:15.616 [INFO] Building jar: /home/jenkins/agent/workspace/eclipse.platform_PR-764/debug/org.eclipse.ui.externaltools/target/org.eclipse.ui.externaltools-3.6.200-SNAPSHOT-javadoc.jar
[299.932s][warning][gc,alloc] pool-237-thread-2: Retried waiting for GCLocker too often allocating 625002 words
I don't know what it means, but it may be the root cause.
Please note that it seems important to use these quite exhaustive options to see the problem; without running all the API checks, javadoc and the like, memory usage seems lower. I see in the system monitor that even if I limit Maven to 2GB, the JVM takes about 3.7GB while running the build, before the build ends because of the OOM.
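As a side note, ~3.7GB of process memory with a 2GB heap cap is plausible, because -Xmx only limits the Java heap; metaspace, the code cache, GC bookkeeping, thread stacks and native buffers come on top. A minimal sketch (the exact flag values are illustrative, not what the build currently uses) of capping the Maven JVM and surfacing the gc,alloc warnings quoted above:

```shell
# Cap the Maven JVM heap at 2GB and enable warning-level gc+alloc logging,
# the same tag set ("[gc,alloc]") as the GCLocker warnings in the build log.
# Total process RSS will still exceed -Xmx for the reasons given above.
export MAVEN_OPTS="-Xmx2g -Xlog:gc+alloc=warning"
echo "$MAVEN_OPTS"
```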
The JVM then prints out
[193,009s][warning][gc,alloc] pool-193-thread-7: Retried waiting for GCLocker too often allocating 78127 words
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/dump/java_pid1213.hprof ...
Heap dump file created [2328419420 bytes in 2,884 secs]
[195,894s][warning][gc,alloc] pool-193-thread-9: Retried waiting for GCLocker too often allocating 78127 words
[195,903s][warning][gc,alloc] pool-193-thread-12: Retried waiting for GCLocker too often allocating 78127 words
[195,903s][warning][gc,alloc] pool-193-thread-11: Retried waiting for GCLocker too often allocating 78127 words
In the dump I see two threads (which seems obvious, as I run with the -T2 option), each retaining 850MB of space that seems to originate in the JDT compiler's CharDeduplication (?) part.
The only thing I could think of is calling org.eclipse.jdt.internal.compiler.util.JRTUtil.reset(), as things seem to be retained when running with -Papi-check; without that, everything runs fine. So if you change a lot (like in this PR), API checks are called often, which makes things pile up in that map. I'll debug why we got so many entries there anyway, as I would expect only a few, but I'm seeing 14 items...
Everything results in a new cache entry and a new JrtFileSystem, and since we use classpath isolation we potentially have two of them (or more ...) if we use more threads.
I'm not sure if JDT can do better here (e.g. by sharing some cached state) or if PDE is using that method wrongly...
This would require some JVM parameters, -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath, and storing the output as a build artifact.
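A minimal sketch of how that could look for the build (the dump directory matches the /tmp/dump path seen in the log above, but where the file ultimately gets archived is exactly the open question):

```shell
# Ask the JVM to write a heap dump on OutOfMemoryError. The resulting
# java_pid<N>.hprof under /tmp/dump could then be archived as a build artifact.
mkdir -p /tmp/dump
export MAVEN_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dump"
echo "$MAVEN_OPTS"
```

Note that heap dumps can be multiple GB (the one above was ~2.3GB), which is why archiving them on the Jenkins master may be a concern.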
I know it can be done that way, but it might not be very nice for the Jenkins master, so maybe @fgurr has a better way (e.g. storing it on a dedicated network device).