vector issue on jdk17 win32 #3279

Open
sophia-guo opened this issue Jan 24, 2022 · 12 comments
Labels
triage required Issue needs deeper triage to determine which repo to move issue into

Comments

@sophia-guo
Contributor

jdk/incubator/vector/Short256VectorTests.java.Short256VectorTests
jdk/incubator/vector/ShortMaxVectorTests.java.ShortMaxVectorTests

Both tests failed the job with error:

09:03:43  test Short256VectorTests.divShort256VectorTestsMasked(short[-i * 5], short[cornerCaseValue(i)], mask[i % 2]): failure
09:03:43  java.lang.ArithmeticException: zero vector lane in dividend [32767, -32768, -32768, 32767, 1, 32767, -32768, -32768, 32767, 0, 32767, -32768, -32768, 32767, 1, 32767]
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.divZeroException(AbstractVector.java:494)
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.ShortVector.lanewiseTemplate(ShortVector.java:615)
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.Short256Vector.lanewise(Short256Vector.java:279)
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.Short256Vector.lanewise(Short256Vector.java:41)
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.ShortVector.lanewise(ShortVector.java:673)
09:03:43  	at jdk.incubator.vector/jdk.incubator.vector.ShortVector.div(ShortVector.java:1349)
09:03:43  	at Short256VectorTests.divShort256VectorTestsMasked(Short256VectorTests.java:1606)
09:03:43  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:03:43  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
09:03:43  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:03:43  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
09:03:43  	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
09:03:43  	at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
09:03:43  	at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
09:03:43  	at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)

https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.openjdk_x86-32_windows_testList_1/26/consoleFull
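For reference, here is a minimal scalar sketch of what a masked lanewise division is expected to do, as far as I understand the Vector API: lanes where the mask is set are divided, lanes where it is unset keep the first operand, and a zero divisor should only raise an exception in a set lane. The class and method names are hypothetical and this is not the JDK implementation or the test's own reference code. The vector printed in the exception looks like the divisor input (the corner-case values), so if its single zero sits in an unset lane, the exception would be unexpected under this model.

```java
import java.util.Arrays;

public class MaskedDivSketch {
    // Scalar model of masked lanewise division (an assumption, not JDK code):
    // set lanes are divided, unset lanes keep a[i]. A zero divisor is only an
    // error when it sits in a lane whose mask bit is set.
    static short[] maskedDiv(short[] a, short[] b, boolean[] mask) {
        short[] r = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            if (mask[i]) {
                if (b[i] == 0) {
                    throw new ArithmeticException("zero divisor in active lane " + i);
                }
                r[i] = (short) (a[i] / b[i]);
            } else {
                r[i] = a[i]; // masked-off lane: divisor is never consulted
            }
        }
        return r;
    }

    public static void main(String[] args) {
        short[] a = {10, 20, 30, 40};
        short[] b = {2, 0, 5, 0};                    // zeros only in unset lanes
        boolean[] mask = {true, false, true, false};
        System.out.println(Arrays.toString(maskedDiv(a, b, mask))); // [5, 20, 6, 40]
    }
}
```

Flipping the mask to `{false, true, false, true}` in this sketch puts the zeros in active lanes, and the method throws instead.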

@sophia-guo
Contributor Author

sophia-guo commented Jan 27, 2022

@smlambert
Contributor

smlambert commented Jan 28, 2022

Also running same test against a different vendor build (https://cdn.azul.com/zulu/bin/zulu17.32.13-ca-jdk17.0.2-win_i686.zip in https://ci.adoptopenjdk.net/job/Grinder/3343) to see how it behaves.

Edit: Grinder/3343 passes

@sxa
Member

sxa commented Feb 8, 2022

@sxa
Member

sxa commented Feb 22, 2022

@smlambert Can you take a look at the output from the above job and confirm my evaluation here? If so we should think about what we want to do as the next action. e.g. We could get someone to attempt a rebuild using an old version of the temurin-build scripts and see if that has the same behaviour.

@smlambert
Contributor

We have published win32 jdk17 as is, given the tests that have regressed are not mainstream usage and this investigation will take time.

Working with 2 builds for a detailed comparison:

  • base - the October win32 jdk17 release build - passes these Vector tests
  • test - a recent win32 jdk17 build using same tags as October release (need to confirm that) - fails these Vector tests

At a glance the builds are very similar (same size, believed to be built from the same source code at the same tags, running the same test material). Taking a closer look with dumpbin to determine how they differ. Will report findings here as part of this investigation.

@smlambert
Contributor

Things we know:

  • base build and test build were built from identical source... (only differences are a few git files):
G:\openjdk>diff  G:\openjdk\base\jdk-17.0.1+12 "g:\openjdk\sxa\jdk-17.0.1+12\release"
1,2c1
< IMPLEMENTOR="Eclipse Adoptium"
< IMPLEMENTOR_VERSION="Temurin-17.0.1+12"
---
> IMPLEMENTOR="Undefined Vendor"
9,10c8,9
< SOURCE=".:git:571f1238bb46"
< BUILD_SOURCE="git:732e6ff6"
---
> SOURCE=".:git:75240a5872a1"
> BUILD_SOURCE="git:1724ce13"
12c11
< SOURCE_REPO="https://github.com/adoptium/jdk17u.git"
---
> SOURCE_REPO="https://github.com/openjdk/jdk17u.git"

image (11)

  • Looked at each of the differences in BUILD_SOURCE, all appear to be benign (many relating to hotspot -> temurin renaming, some for SBOM & reproducible builds that would not affect a win32 build, update to the openssl version)

image (12)
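The release-file comparison above can also be done key by key rather than with a line diff, which makes added or missing keys easier to spot. The following is a rough sketch under stated assumptions: the class and method names are hypothetical, the `release` file is treated as simple `KEY="value"` lines, and the sample values are the ones from the diff output in this comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

public class ReleaseDiffSketch {
    // Parses KEY="value" lines into a map, stripping surrounding quotes.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // skip lines without a key
            String v = line.substring(eq + 1).trim();
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1);
            }
            m.put(line.substring(0, eq).trim(), v);
        }
        return m;
    }

    // Returns, in sorted order, every key whose value differs or is missing on one side.
    static List<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        TreeSet<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (!Objects.equals(a.get(k), b.get(k))) out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> base = parse(List.of(
            "IMPLEMENTOR=\"Eclipse Adoptium\"",
            "IMPLEMENTOR_VERSION=\"Temurin-17.0.1+12\"",
            "SOURCE=\".:git:571f1238bb46\"",
            "BUILD_SOURCE=\"git:732e6ff6\"",
            "SOURCE_REPO=\"https://github.com/adoptium/jdk17u.git\""));
        Map<String, String> test = parse(List.of(
            "IMPLEMENTOR=\"Undefined Vendor\"",
            "SOURCE=\".:git:75240a5872a1\"",
            "BUILD_SOURCE=\"git:1724ce13\"",
            "SOURCE_REPO=\"https://github.com/openjdk/jdk17u.git\""));
        System.out.println(differingKeys(base, test));
    }
}
```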

@sxa
Member

sxa commented Mar 2, 2022

The other thing that could have been different is which machine they were built on. If other avenues prove fruitless and we don't have the information about which machine they were built on, we could try building it on each build machine and running the test against each build ...

@smlambert
Contributor

Attaching dump files of the DLLs, which when diffed indicate some things are different (many of the api-ms ones are pretty close to identical, others hold more differences). My thought was to reduce the testcase to a standalone, then take a 'binary search' approach of swapping out half the DLLs, from a working binary to a failing binary, seeing how the testcase behaves, and narrowing it down that way. But ya, feel free to also pursue other approaches... (this is where the SSDF and SBOM info would be tremendously handy)...
dumps.zip
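The 'binary search' idea above could be sketched roughly as follows. This is only an illustration under a strong assumption, namely that exactly one DLL is responsible for the failure. The class name, the placeholder DLL names and the `failsWith` predicate are all hypothetical; in practice the predicate would copy the given DLLs from the failing build over a copy of the passing build and run the reduced testcase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DllBisectSketch {
    // Narrows a non-empty list of candidate DLLs down to one culprit, assuming
    // exactly one DLL causes the failure. failsWith must report whether the
    // testcase fails when only the given DLLs come from the failing build.
    static String findCulprit(List<String> dlls, Predicate<List<String>> failsWith) {
        List<String> candidates = new ArrayList<>(dlls);
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<String> firstHalf = candidates.subList(0, mid);
            List<String> secondHalf = candidates.subList(mid, candidates.size());
            // Keep whichever half still reproduces the failure.
            candidates = new ArrayList<>(failsWith.test(firstHalf) ? firstHalf : secondHalf);
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // Placeholder names; a real run would list the DLLs in the JDK's bin directory.
        List<String> dlls = List.of("a.dll", "b.dll", "c.dll", "d.dll", "e.dll");
        // Stand-in for "swap these DLLs in and run the testcase".
        Predicate<List<String>> failsWith = swapped -> swapped.contains("d.dll");
        System.out.println(findCulprit(dlls, failsWith)); // d.dll
    }
}
```

If the failure only shows up when several DLLs are swapped together, this simple halving would not converge on the right answer and a delta-debugging style search would be needed instead.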

@sxa
Member

sxa commented Mar 3, 2022

@smlambert
Contributor

smlambert commented Mar 16, 2022

Noting same tests fail on arm_linux (aarch32) noted in jdk17 triage & jdk18 triage, tracked under #2874.

@smlambert added the "triage required" label (Issue needs deeper triage to determine which repo to move issue into) and removed the "release triage" label on Jan 31, 2024
@smlambert
Contributor

@sophia-guo - adding the 'more triage required' tag as this is a candidate for further scrutiny.

@sxa
Member

sxa commented Jan 31, 2024

> Noting same tests fail on arm_linux (aarch32)

Interesting - in which case that smacks of a generic 32-bit issue, I feel. That said, going by this issue the tests started failing between 17.0.1 and 17.0.2, whereas the arm32 issue mentions crashes in 17+35.

Projects
Status: Todo
Development

No branches or pull requests

3 participants