Hotspot test serviceability/sa/ClhsdbCDSCore.java hangs on Ubuntu 16.04/x64 #3745

Open
zzambers opened this issue Sep 18, 2024 · 13 comments

zzambers commented Sep 18, 2024

I can see that this test hangs on Adoptium infra and gets killed on timeout (this seems reliable):
serviceability/sa/ClhsdbCDSCore.java

I see this both in the dev.openjdk run and when it is run in Grinder.

Output:

Starting ClhsdbCDSCore test
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xshare:dump -Xlog:cds,cds+hashtables -XX:SharedArchiveFile=./ArchiveForClhsdbCDSCore.jsa ]
[2024-09-17T11:46:52.145720Z] Gathering output for process 25719
[ELAPSED: 447 ms]
[logging stdout to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stdout]
[logging stderr to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stderr]
[STDERR]

[2024-09-17T11:46:52.603422Z] Waiting for completion for process 25719
[2024-09-17T11:46:52.603687Z] Waiting for completion finished for process 25719
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:SharedArchiveFile=ArchiveForClhsdbCDSCore.jsa -XX:+CreateCoredumpOnCrash -Xshare:auto -XX:+ProfileInterpreter --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -XX:-AlwaysPreTouch CrashApp ]
[2024-09-17T11:46:52.610596Z] Gathering output for process 25735
[2024-09-17T11:46:52.611510Z] Waiting for completion for process 25735
[2024-09-17T11:46:52.628039Z] Waiting for completion finished for process 25735
Run test with ulimit -c: unlimited
[2024-09-17T11:46:52.630845Z] Gathering output for process 25738
Timeout signalled after 19200 seconds

Notes:
I have tried to reproduce this locally and on our infra, both by invoking jtreg manually and through aqa-tests, but failed to reproduce it. Maybe it is an infra/environment issue? The test first intentionally crashes the VM using the Unsafe class to produce a core file. However, this hangs when run on Adoptium infra. Maybe something to do with the core dump settings? I don't know.
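Since the log stops right after "Run test with ulimit -c: unlimited", i.e. while the crashing child is expected to write its core file, one thing worth comparing between hanging and passing agents is the kernel's core-dump configuration. The sketch below is a hypothetical diagnostic (the class CorePatternCheck is made up, not part of the test suite). It reads /proc/sys/kernel/core_pattern and the "Max core file size" line of /proc/self/limits on Linux. It assumes, without confirming it for this issue, that a pattern starting with | (cores piped to a user-space helper, as Ubuntu often does with apport) is a difference worth looking for. Note that /proc/self/limits shows the limits of the java process running the check, not necessarily those of the jtreg child.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic: reports how the Linux kernel handles core dumps.
 * Assumption: a piped core_pattern is only a suspect, not a confirmed cause.
 */
public class CorePatternCheck {

    /** True if the pattern pipes core dumps to a helper program (leading '|'). */
    static boolean isPiped(String corePattern) {
        return corePattern.trim().startsWith("|");
    }

    /** Returns the trimmed "Max core file size" line from /proc/self/limits text, or null. */
    static String coreLimitLine(String limitsContent) {
        for (String line : limitsContent.split("\n")) {
            if (line.startsWith("Max core file size")) {
                return line.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path patternFile = Paths.get("/proc/sys/kernel/core_pattern");
        Path limitsFile = Paths.get("/proc/self/limits");
        if (!Files.isReadable(patternFile)) {
            System.out.println("core_pattern not available (not Linux?)");
            return;
        }
        String pattern = new String(Files.readAllBytes(patternFile), StandardCharsets.UTF_8).trim();
        System.out.println("core_pattern: " + pattern);
        System.out.println(isPiped(pattern)
                ? "Core dumps are piped to a helper program"
                : "Core dumps are written directly to a file");
        if (Files.isReadable(limitsFile)) {
            String limits = new String(Files.readAllBytes(limitsFile), StandardCharsets.UTF_8);
            String line = coreLimitLine(limits);
            System.out.println(line != null ? line : "Max core file size: not found");
        }
    }
}
```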

zzambers commented Sep 18, 2024

This could be related to JDK-8283410, but on Adoptium infra it seems to affect Linux (not Windows?).

sophia-guo commented Sep 23, 2024

@zzambers I ran ClhsdbCDSCore.java on a different agent and it passed: https://ci.adoptium.net/view/Test_grinder/job/Grinder/10970/ (the failed one is due to no test being selected). So it might be related to the infra, since you can't reproduce it in your environment. Could you please move it to the infra repo? Or I can move it if you agree.

zzambers commented:

@sophia-guo By moving, do you mean filing the same issue there and closing this one?

sophia-guo commented:

There is a "Transfer issue" link on the right side of the issue.

I'm not sure whether it's clickable for you, as it may depend on permissions. I'll just do it myself.

@sophia-guo sophia-guo transferred this issue from adoptium/aqa-tests Sep 25, 2024

sxa commented Oct 5, 2024

> @zzambers I did run it on a different agent ClhsdbCDSCore.java and it passed https://ci.adoptium.net/view/Test_grinder/job/Grinder/10970/ ( failed one is due to no test selected.) So it might be related with infra as you can't reproduce it on your environment. Could you please move it to infra repo? Or I can move it if you agree?

@sophia-guo Can you get a list of which machines/distributions it passes and fails on? Yours was run on RHEL. Both of zzambers' runs were on an old, out-of-support Ubuntu distribution (although neither was in a container). At the moment I'm not sure we have enough information to take action on this one in the infrastructure repo, since it's not clear what is needed to resolve it.


sxa commented Nov 22, 2024

There are recent dev.hotspot runs which look clean - was this test removed and is it still considered a problem?


sxa commented Nov 22, 2024

I tried kicking off some Grinders for testing (based on JDK 11, since that's what the dev.openjdk link in the description was pointing at), but got "15:21:35 Error: Cannot find file: /home/jenkins/workspace/Grinder/aqa-tests/TKG/../openjdk/openjdk-jdk/test/jdk/serviceability/sa/ClhsdbCDSCore.java", which suggests that this test may no longer be valid.

@sxa sxa moved this from Todo to Paused/Blocked in 2024 4Q Adoptium Plan Nov 22, 2024
@sxa sxa self-assigned this Nov 25, 2024
@sxa sxa added this to the 2024-11 (November) milestone Nov 25, 2024

sxa commented Nov 29, 2024

> There are recent dev.hotspot runs which look clean - was this test removed and is it still considered a problem?

ping @sophia-guo @zzambers - is this still a concern?


sophia-guo commented Dec 2, 2024

It's a hotspot test, so rerun it with hotspot_custom.


sxa commented Dec 2, 2024

> It's a hotspot tests. So rerun with hotspot_custom

I don't think I've ever looked at a test that needed that before. Thanks for the pointer. Is there any way I can tell from the name of the test which ones need to use hotspot_custom instead of jdk_custom?

Both of the Ubuntu ones look like they have a pass, although the overall result is UNSTABLE. Does this mean the test is just not valid for the _1 variant?

21:31:51  TEST TARGETS SUMMARY
21:31:51  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
21:31:51  PASSED test targets:
21:31:51  	hotspot_custom_0 - Test results: passed: 1 
21:31:51  
21:31:51  FAILED test targets:
21:31:51  	hotspot_custom_1


sophia-guo commented Dec 2, 2024

The only way to know whether it's a hotspot or a jdk test is to check the test path. If it's under https://github.com/openjdk/jdk11u-dev/tree/master/test/hotspot (JDK 11+) or https://github.com/openjdk/jdk8u-dev/tree/master/hotspot/test (JDK 8), then it's a hotspot test. If it's under https://github.com/openjdk/jdk11u-dev/tree/master/test/jdk (JDK 11+) or https://github.com/openjdk/jdk8u-dev/tree/master/jdk/test (JDK 8), then it's a jdk test.
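The path rule above can be written as a small lookup. This is a hypothetical helper (TestSuiteClassifier and targetFor are illustrative names, not part of aqa-tests or TKG). It takes a path relative to the root of the OpenJDK repository and returns the target name used in this thread; the prefixes are exactly the four locations listed above.

```java
/**
 * Hypothetical helper: picks the custom test target from a test's location
 * in the OpenJDK source tree (JDK 11+ and JDK 8 layouts).
 */
public class TestSuiteClassifier {

    static String targetFor(String repoRelativePath) {
        String p = repoRelativePath.replace('\\', '/');
        if (p.startsWith("test/hotspot/") || p.startsWith("hotspot/test/")) {
            return "hotspot_custom"; // JDK 11+ layout, or JDK 8 layout
        }
        if (p.startsWith("test/jdk/") || p.startsWith("jdk/test/")) {
            return "jdk_custom"; // JDK 11+ layout, or JDK 8 layout
        }
        throw new IllegalArgumentException("Not a hotspot or jdk test path: " + repoRelativePath);
    }

    public static void main(String[] args) {
        // The path from the log in the issue description.
        System.out.println(targetFor("test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java"));
    }
}
```

For ClhsdbCDSCore.java, whose path in the log is under test/hotspot/jtreg, this selects hotspot_custom, which is why the earlier attempt that looked under test/jdk could not find the file.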

hotspot_custom_1 can be ignored for this test: on JDK 14 and earlier, CDS only works when compressed oops are enabled (from JDK 15 onwards it works with compressed oops either enabled or disabled), so the test is skipped.
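The compressed-oops rule can be sketched the same way. The names below are illustrative only. The sketch assumes, going by this explanation, that hotspot_custom_0 runs with -XX:+UseCompressedOops (as in the log above) and hotspot_custom_1 runs without compressed oops; that mapping is an inference, not something confirmed in the thread.

```java
/**
 * Hypothetical helper encoding the rule above: JDK 14 and earlier need
 * compressed oops for CDS, JDK 15 and later do not.
 */
public class CdsCompressedOopsRule {

    static boolean cdsUsable(int jdkFeatureVersion, boolean useCompressedOops) {
        // With compressed oops on, CDS is usable; with them off, only on JDK 15+.
        return useCompressedOops || jdkFeatureVersion >= 15;
    }

    public static void main(String[] args) {
        // Assumption: _0 = -XX:+UseCompressedOops, _1 = -XX:-UseCompressedOops.
        int[] versions = {11, 17};
        for (int v : versions) {
            System.out.printf("JDK %d: _0 -> %s, _1 -> %s%n",
                    v,
                    cdsUsable(v, true) ? "runs" : "skipped",
                    cdsUsable(v, false) ? "runs" : "skipped");
        }
    }
}
```

For JDK 11, as in the Grinder runs here, the _1 variant comes out as skipped, matching the explanation that the failed hotspot_custom_1 target can be ignored.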

Member

sxa commented Dec 2, 2024

> * RHEL7: https://ci.adoptium.net/job/Grinder/11902/console - **_hang_**

That looks like it is running on Ubuntu 16.04, not RHEL7 😕

https://ci.adoptium.net/job/Grinder/11902/console

OK, so we can reproduce it, but only on certain OSs.
Some more:

| provider/OS | Grinder | Result |
| --- | --- | --- |
| ibmcloud-rhel6 | 11906 | |
| ibmcloud-rhel7 | 11905 | |
| docker-centos7 | 11907 | |
| aws-rhel8 | 11908 | |
| docker-ubi9 | 11909 | |
| docker-ubuntu2004 | 11910 | |

And a few Ubuntus on ppc64le:

| provider/OS | Grinder | Result |
| --- | --- | --- |
| osuosl-ubuntu1604 | 11914 | |
| osuosl-ubuntu1804 | 11915 | |
| osuosl-ubuntu2004 | 11916 | |

@sxa sxa changed the title Hotspot test serviceability/sa/ClhsdbCDSCore.java hangs on adoptium infra Hotspot test serviceability/sa/ClhsdbCDSCore.java hangs on Ubuntu 16.04/x64 Dec 3, 2024
sophia-guo commented:

Yes, on https://ci.adoptium.net/computer/test%2Dibmcloud%2Dubuntu1604%2Dx64%2D1/ it timed out and failed. I had just rerun the Grinder you mentioned in #3745 (comment) and thought it had been pinned to RHEL7. Anyway, the test times out on some OSs.

Projects
Status: Paused/Blocked
Development

No branches or pull requests

3 participants