

Test-extended.system-JDK8-win_x86 SharedClasses_Softmx_UpDown_0 crash vmState=0x00000000 MethodSorter.getDeclaredMethods #4201

Closed
pshipton opened this issue Jan 8, 2019 · 25 comments

Comments

@pshipton
Member

pshipton commented Jan 8, 2019

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86/126/

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=2E69BC6D ContextFlags=0001007f
Handler1=726064A0 Handler2=725383A0 InaccessibleReadAddress=278F34E0
EDI=1FCC2630 ESI=01A2F308 EAX=1FCC2630 EBX=017138A0
ECX=00000000 EDX=017138A0
EIP=2E69BC6D ESP=231F17AC EBP=230F0100 EFLAGS=00010206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=2E630000 Offset_in_DLL=0006bc6d
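In each of the win_x86 crash reports in this issue, Offset_in_DLL is simply ExceptionAddress minus Module_base_address. `Module=` is blank and `Compiled_method` names a JIT-compiled method, so the base is presumably that of a JIT code region rather than a named DLL; that is an inference, not something the log states. A small sketch that checks the arithmetic against the four reports:

```python
def offset_in_module(exception_address: int, module_base: int) -> int:
    """Return how far the faulting address lies past the reported module base."""
    if exception_address < module_base:
        raise ValueError("exception address lies below the module base")
    return exception_address - module_base


# (ExceptionAddress, Module_base_address, reported Offset_in_DLL),
# copied from the four win_x86 crash reports in this issue.
REPORTS = [
    (0x2E69BC6D, 0x2E630000, 0x0006BC6D),
    (0x22DFB88D, 0x22CC0000, 0x0013B88D),
    (0x2E2EA586, 0x2E190000, 0x0015A586),
    (0x2D2D1346, 0x2D170000, 0x00161346),
]

for exc, base, reported in REPORTS:
    computed = offset_in_module(exc, base)
    status = "matches" if computed == reported else "differs"
    print(f"0x{exc:08X} - 0x{base:08X} = 0x{computed:08X} ({status})")
```

Since the offset differs from run to run, it most likely reflects where the compiled body landed in that run rather than a fixed location, so on its own it is a weak fingerprint for matching crashes across runs.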

Compiled_method=org/junit/internal/MethodSorter.getDeclaredMethods(Ljava/lang/Class;)[Ljava/lang/reflect/Method;
Target=2_90_20190107_504 (Windows Server 2012 R2 6.3 build 9600)
CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
----------- Stack Backtrace -----------
(0x2E69BC6D)
(0x1FCC2288)
(0xF3080151)
(0xE5C001A2)
(0x22880151)
(0xF3081FCC)
(0xA18001A2)
(0xF3087223)
(0xF30801A2)
(0x794401A2)
(0x589F231D)
(0x181824C9)
(0xCD28231F)
(0xF308012F)
(0x228801A2)
(0x00011FCC)
(0xF308021E)
(0x261801A2)
(0x25981FCC)
(0x25601FCC)
(0x24F01FCC)
(0x1B0C1FCC)
(0x8149231D)
(0x183024C8)
(0x2598231F)
(0x25601FCC)
(0xF3081FCC)
(0x24F001A2)
(0x24F01FCC)
(0x1AFC1FCC)
(0x8113231D)
(0x184824C8)
(0xF308231F)
(0x24B001A2)
(0x24B01FCC)
(0x54E41FCC)
(0xFABE231D)
(0x185C24C8)
(0xF308231F)
(0x24B001A2)
(0x4F6C1FCC)
(0xF2D9231D)
(0x189824C8)
(0xF308231F)
(0x24B001A2)
(0x24B01FCC)
(0x18C11FCC)
(0x18D4231F)
(0xC6A0231F)
(0x637E2743)
@pshipton
Member Author

pshipton commented Jan 10, 2019

5x grinder passed
50x grinder https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/145/

@pshipton
Member Author

Didn't see the results before the grinder was removed. Trying again:
https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/158/

@pshipton
Member Author

50x grinder passed. I think I'll close this. We can reopen if the problem is seen again.

@pshipton
Member Author

pshipton commented Jan 30, 2019

Another occurrence https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86/149
SharedClassesWorkload_0

Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=22DFB88D ContextFlags=0001007f
Handler1=71BB63F0 Handler2=71AD83C0 InaccessibleReadAddress=231B3E68
EDI=20B7E518 ESI=0137E528 EAX=20B7E518 EBX=013EDFE0
ECX=0139A8F0 EDX=013EDFE0
EIP=22DFB88D ESP=22AC5A34 EBP=227BBE00 EFLAGS=00010202
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=22CC0000 Offset_in_DLL=0013b88d
Compiled_method=org/junit/internal/MethodSorter.getDeclaredMethods(Ljava/lang/Class;)[Ljava/lang/reflect/Method;
Target=2_90_20190129_605 (Windows Server 2012 R2 6.3 build 9600)
CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
----------- Stack Backtrace -----------
(0x22DFB88D)
(0x20B7E490)
(0xE4B80139)
(0xA8F020B7)
(0xE9800139)
(0xE500717D)
(0xE9B020B7)
(0xA8F0717D)
(0x01440139)
(0xC7FF22AD)
(0x5A9024E8)
(0xE14022AC)
(0x014420B7)
(0xC7F722AD)
(0x5A9024E8)
(0xE50022AC)
(0xA8F020B7)
(0xA8F00139)
(0xE1300139)
(0x7BE820B7)
(0xBC2C22AC)
(0x5AAC24E7)
(0xA8F022AC)
(0xE1300139)
(0xE13020B7)
(0xE12020B7)
(0xDCF420B7)
(0x942122AC)
(0x5AC424E8)
(0xA8F022AC)
(0xE1200139)
(0xE12020B7)
(0x8AFC20B7)
(0xEF2E22AC)
(0x5ADC24E7)
(0xA8F022AC)
(0xDE400139)
(0x8DF820B7)
(0xF6B522AC)
(0x5B0024E7)
(0xE9B022AC)
(0xA8F0717D)
(0xDE400139)
(0x9B5C20B7)
(0x0C5622AC)
(0x5B1824E8)

@pshipton pshipton reopened this Jan 30, 2019
@andrewcraik
Contributor

FYI @jdmpapin and @dsouzai: the mention of shared classes makes me wonder whether the new SVM will solve this.

@dsouzai
Contributor

dsouzai commented Jan 30, 2019

It's possible; however, if the issue is a missing relocation, the SVM wouldn't really help, since its main purpose is validation. Given that this is Windows, it's possible that the SVM was enabled for this compile (if it happened after startup).

@andrewcraik
Contributor

@dsouzai so is this something that should be looked at as part of the SVM migration then?

@dsouzai
Contributor

dsouzai commented Jan 30, 2019

Well, there are three possibilities here:

  1. Existing AOT bug that happens without the SVM
  2. Existing AOT bug that is now exposed by the SVM
  3. Bug caused by the SVM

The AOT bugs we've run into thus far have almost always been either existing bugs that we simply never hit before, or existing bugs now exposed by new code paths under AOT that the SVM exercises.

So I guess in terms of "something that should be looked at as part of the SVM migration then", I'd say it's worth noting that the SVM migration is likely going to expose existing bugs, but I wouldn't necessarily tie AOT bugs to the SVM.

@dsouzai dsouzai self-assigned this Jan 30, 2019
@dsouzai
Contributor

dsouzai commented Jan 30, 2019

I won't have time this week to look at this, but I'll take a look next week; I've assigned it to myself so it doesn't fly under the radar.

@pshipton
Member Author

pshipton commented Feb 7, 2019

@pshipton
Member Author

pshipton commented Feb 8, 2019

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-linux_x86-64_cmprssptrs/176

0000000001FF0900: Object neither in heap nor stack-allocated in thread load-1
0000000001FF0900:	O-Slot=00007F66AA0A77F8
0000000001FF0900:	O-Slot value=00000000019384C8
0000000001FF0900:	PC=00007F66D9331C2A
0000000001FF0900:	framesWalked=4
0000000001FF0900:	arg0EA=0000000002020668
0000000001FF0900:	walkSP=0000000002020538
0000000001FF0900:	literals=0000000000000010
0000000001FF0900:	jitInfo=00007F66D01643C8
0000000001FF0900:	method=0000000001D99F90 (jdk/internal/reflect/MethodAccessorGenerator.generate(Ljava/lang/Class;Ljava/lang/String;[Ljava/lang/Class;Ljava/lang/Class;[Ljava/lang/Class;IZZLjava/lang/Class;)Ljdk/internal/reflect/MagicAccessorImpl;) (JIT)
0000000001FF0900:	stack=000000000201A148-00000000020211C0
06:45:44.946 0x1ce0e00    j9mm.479    *   ** ASSERTION FAILED ** at ../../../../gc_glue_java/ScavengerRootScanner.hpp:105: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2019/02/08 02:45:44 - please wait.
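The assertion reports that the O-slot value 0x19384C8 is "neither in heap nor stack-allocated". The sketch below is a simplified model of that classification, assuming the validator accepts an object slot only if it is null, points into the heap, or points into the thread's stack range. It is not the real `MM_StackSlotValidator` logic. The stack range is taken from the `stack=` line in the log; the heap bounds are hypothetical, because the log does not print them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    low: int   # inclusive
    high: int  # exclusive

    def contains(self, addr: int) -> bool:
        return self.low <= addr < self.high


def classify_object_slot(value: int, heap: AddressRange, stack: AddressRange) -> str:
    """Simplified model (assumption): an O-slot must be null, a heap address,
    or an address inside the current thread's stack (stack-allocated object)."""
    if value == 0:
        return "null"
    if heap.contains(value):
        return "heap"
    if stack.contains(value):
        return "stack-allocated"
    return "neither in heap nor stack-allocated"


# Stack range from the "stack=" line of the assertion output.
stack = AddressRange(0x000000000201A148, 0x00000000020211C0)
# HYPOTHETICAL heap bounds; the real range is not shown in the log.
heap = AddressRange(0x0000000080000000, 0x0000000100000000)

print(classify_object_slot(0x00000000019384C8, heap, stack))
```

Under these assumptions the slot value falls below the start of the stack range and outside the assumed heap, which is the same verdict the assertion prints.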

@pshipton
Member Author

pshipton commented Feb 8, 2019

Note that last one is an assert and not a crash.

@dsouzai
Contributor

dsouzai commented Feb 8, 2019

Well, that's more reason to believe it's a missed relocation :). I didn't have time to look at this issue this week, but I'm definitely going to next week. Also, I'm really glad this isn't a Windows-only issue; I can try to reproduce it using a Linux job.

EDIT: There's a core!!

@pshipton
Member Author

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86/159
4.jvm1.stderr

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=2E2EA586 ContextFlags=0001007f
Handler1=723163F0 Handler2=722483C0 InaccessibleReadAddress=275B12C0
EDI=1E7419A0 ESI=0125ECF8 EAX=1E7419A0 EBX=0132D6D0
ECX=00000000 EDX=0132D6D0
EIP=2E2EA586 ESP=22B53AEC EBP=26D8EC00 EFLAGS=00010202
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=2E190000 Offset_in_DLL=0015a586

Compiled_method=org/junit/internal/MethodSorter.getDeclaredMethods(Ljava/lang/Class;)[Ljava/lang/reflect/Method;
Target=2_90_20190209_653 (Windows Server 2012 R2 6.3 build 9600)
CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)

@pshipton
Member Author

Yes, the jobs do save the core.

@dsouzai
Contributor

dsouzai commented Feb 13, 2019

This is the output from the assert:

0000000001FF0900: Object neither in heap nor stack-allocated in thread load-1
0000000001FF0900:       O-Slot=00007F66AA0A77F8
0000000001FF0900:       O-Slot value=00000000019384C8
0000000001FF0900:       PC=00007F66D9331C2A
0000000001FF0900:       framesWalked=4
0000000001FF0900:       arg0EA=0000000002020668
0000000001FF0900:       walkSP=0000000002020538
0000000001FF0900:       literals=0000000000000010
0000000001FF0900:       jitInfo=00007F66D01643C8
0000000001FF0900:       method=0000000001D99F90 (jdk/internal/reflect/MethodAccessorGenerator.generate(Ljava/lang/Class;Ljava/lang/String;[Ljava/lang/Class;Ljava/lang/Class;[Ljava/lang/Class;IZZLjava/lang/Class;)Ljdk/internal/reflect/MagicAccessorImpl;) (JIT)
0000000001FF0900:       stack=000000000201A148-00000000020211C0
06:45:44.946 0x1ce0e00    j9mm.479    *   ** ASSERTION FAILED ** at ../../../../gc_glue_java/ScavengerRootScanner.hpp:105: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))

From GDB:

(gdb) x/10i 0x00007F66D9331C2A-20
   0x7f66d9331c16:	mov    r13d,DWORD PTR [rbx+0x10]
   0x7f66d9331c1a:	nop    DWORD PTR [rax+rax*1+0x0]
   0x7f66d9331c1f:	movabs rbx,0x19384c8
   0x7f66d9331c29:	cmp    DWORD PTR [rbx*1+0x0],0x0
   0x7f66d9331c31:	jne    0x7f66d9333449
...

Looking at the stackslots:

<1ff0900> JIT frame: bp = 0x0000000002020618, pc = 0x00007F66D9331C2A, unwindSP = 0x0000000002020570, cp = 0x0000000001D992F0, arg0EA = 0x0000000002020668, jitInfo = 0x00007F66D01643C8
<1ff0900> 	Method: jdk/internal/reflect/MethodAccessorGenerator.generate(Ljava/lang/Class;Ljava/lang/String;[Ljava/lang/Class;Ljava/lang/Class;[Ljava/lang/Class;IZZLjava/lang/Class;)Ljdk/internal/reflect/MagicAccessorImpl; !j9method 0x0000000001D99F90
<1ff0900> 	Bytecode index = 63, inlineDepth = 0, PC offset = 0x0000000000001122
<1ff0900> 	stackMap=0x00007F66D01658CC, slots=I16(0x000A) parmBaseOffset=I16(0x0008), parmSlots=U16(0x000A), localBaseOffset=I16(0xFFB8)
<1ff0900> 	Described JIT args starting at 0x0000000002020620 for U16(0x000A) slots
...
<1ff0900> 		I-Slot: : t0[0x0000000002020610] = 0x0000000105706480
<1ff0900> 	JIT-RegisterMap = UDATA(0x0000000000000002)
<1ff0900> 	JIT-HighWordRegisterMap = UDATA(0x0000000000000000)
<1ff0900> 		JIT-RegisterMap-I-Slot[0x00007F66AA0A77F0] = UDATA(0x0000000001B8E4C8) (jit_rax)
<1ff0900> 		JIT-RegisterMap-O-Slot[0x00007F66AA0A77F8] = 0x00000000019384C8 (jit_rbx)
...

The bad O-slot comes from rbx, which got its value from

   0x7f66d9331c1f:	movabs rbx,0x19384c8
   0x7f66d9331c29:	cmp    DWORD PTR [rbx*1+0x0],0x0

This looks very similar to #4138 (comment); this is also a warm AOT'd body:

!trprint jittedbodyinfo 0x00007F66D05606F8
TR_PersistentJittedBodyInfo at 0x00007F66D05606F8
	int32_t                   _counter = 245
	TR_PersistentMethodInfo * _methodInfo = !trprint persistentmethodinfo 0x00007F66D0560748
	int32_t                   _startCount = 5937
	uint16_t                  _hotStartCountDelta = 0
	uint16_t                  _oldStartCountDelta = 1000
	flags16_t                 _flags = 0x0821
	int8_t                    _sampleIntervalCount = 5
	uint8_t                   _aggressiveRecompilationChances = 4
	TR_Hotness                _hotness = 2 (warm)
	uint8_t                   _numScorchingIntervals = 0
	bool                      _isInvalidated = 0
	Details of flags:
		HasLoops                  =1
		UsesPreexistence          =0
		DisableSampling           =0
		IsProfilingBody           =0
		IsAotedBody               =1
		SamplingRecomp            =0
		IsPushedForRecompilation  =0
		FastHotRecompilation      =0
		FastScorchingRecompilation=0
		UsesGCR                   =1

I believe that this bug is caused by the problem described in #4621. As such, it should be fixed by #4723.

@pshipton
Member Author

@pshipton
Member Author

Test passed. Doing a 5x grinder as well https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/271/

@pshipton
Member Author

5x grinder passed.

@pshipton
Member Author

This problem still occurs. The nightly system tests weren't running for a while, but they are running again now.

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86/176/

Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=2D2D1346 ContextFlags=0001007f
Handler1=72605E50 Handler2=725383D0 InaccessibleReadAddress=232702E0
EDI=200EF5C0 ESI=0177F1D0 EAX=200EF5C0 EBX=012A8360
ECX=00000000 EDX=012A8360
EIP=2D2D1346 ESP=2F49E108 EBP=27020200 EFLAGS=00010206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=2D170000 Offset_in_DLL=00161346

Compiled_method=org/junit/internal/MethodSorter.getDeclaredMethods(Ljava/lang/Class;)[Ljava/lang/reflect/Method;
Target=2_90_20190225_716 (Windows Server 2012 R2 6.3 build 9600)
CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)

@pshipton
Member Author

@dsouzai

@pshipton pshipton reopened this Feb 26, 2019
@dsouzai
Contributor

dsouzai commented Feb 26, 2019

Ah... this is 32-bit Windows, which means it's running AOT without the SVM, so #4723 would not fix #4621 here (that fix only applied to AOT with the SVM). Sigh.

@dsouzai
Contributor

dsouzai commented Feb 26, 2019

Opened #4892

@pshipton
Member Author

@pshipton
Member Author

I suppose this doesn't need to be fixed in the 0.13 release since it fails on Windows 32-bit, which isn't supported by Java 12. The 0.13 release only includes Java 12.

3 participants