'get_page_from_freelist' general protection fault #75

Open
xjtuwxg opened this issue Dec 18, 2018 · 10 comments

@xjtuwxg
Member

xjtuwxg commented Dec 18, 2018

After running NPB/BT (x86 homogeneous setting), the remote node waits for about 600s and generates a general protection fault.

I only modified NPB/BT slightly, so this issue might also occur with other, simpler benchmarks.

popcorn@x86-homogeneous-vm2:~$ [  218.404419] remote_worker_main: [689] for [707/0]
[  218.408095] remote_worker_main: [689] /home/popcorn/bt.W.x
[  218.410185]
[  218.410185] ####### MIGRATED - [690/1] from [707/0]
[  218.565663] ####### MIGRATE [690] to 0
[  218.568169] EXITED [690] remote / 0x40
[  223.368424]
[  223.368424] TERMINATE [689] with 0x0
[  223.372415] EXITED [689] remote worker / 0x0
[  812.277453] general protection fault: 0000 [#1] SMP NOPTI
[  812.279702] CPU: 0 PID: 692 Comm: cron Not tainted 4.20.0-rc7-popcorn+ #159
[  812.281198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  812.281198] RIP: 0010:get_page_from_freelist+0x1bb/0xea0
[  812.281198] Code: 84 da 06 00 00 48 8b 44 24 18 48 bb 00 01 00 00 00 00 ad de 48 83 c0 01 48 c1 e0 04 49 8b 04 03 48 8b 08 48 8b 50 08 49 89 c7 <48> 89 51 08 48 89 0a 48 89 18 48 bb 00 02 00 00 00 00 ad de 48 89
[  812.281198] RSP: 0000:ffffc900000fbc18 EFLAGS: 00010012
[  812.281198] RAX: ffffea000448e2a0 RBX: dead000000000100 RCX: dead000000000100
[  812.281198] RDX: dead000000000200 RSI: 0000000000000000 RDI: 00000000006200ca
[  812.281198] RBP: ffffc900000fbd80 R08: 0000000000000055 R09: 0000000000035412
[  812.281198] R10: 0000000000000000 R11: ffff88813fc22eb8 R12: 0000000000000000
[  812.281198] R13: ffffffff81cb2780 R14: 0000000000000020 R15: ffffea000448e2a0
[  812.281198] FS:  00007f8c081de800(0000) GS:ffff88813fc00000(0000) knlGS:0000000000000000
[  812.281198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  812.281198] CR2: 000000000060b330 CR3: 000000013a700000 CR4: 00000000000406f0
[  812.281198] Call Trace:
[  812.281198]  ? flush_tlb_mm_range+0xc1/0x120
[  812.281198]  ? release_pages+0x249/0x2e0
[  812.281198]  __alloc_pages_nodemask+0x129/0xde0
[  812.281198]  ? tlb_flush_mmu_free+0x31/0x50
[  812.281198]  ? cpumask_any_but+0x1f/0x40
[  812.281198]  ? flush_tlb_mm_range+0xc1/0x120
[  812.281198]  ? filemap_map_pages+0x17e/0x300
[  812.281198]  wp_page_copy+0x54/0x490
[  812.281198]  __handle_mm_fault+0x4dd/0xbd0
[  812.281198]  __do_page_fault+0x1c9/0x5c0
[  812.281198]  ? __put_user_4+0x1c/0x30
[  812.281198]  ? page_fault+0x8/0x30
[  812.281198]  page_fault+0x1e/0x30
[  812.281198] RIP: 0033:0x406c45
[  812.281198] Code: 8b 04 25 28 00 00 00 48 89 84 24 68 04 00 00 31 c0 85 ff 48 8b 3d fb 46 20 00 74 21 48 85 ff 0f 84 98 00 00 00 e8 eb b8 ff ff <48> c7 05 e0 46 20 00 00 00 00 00 eb 60 66 0f 1f 44 00 00 48 85 ff
[  812.281198] RSP: 002b:00007ffc169b0c40 EFLAGS: 00010206
[  812.281198] RAX: 0000000000000000 RBX: 000000000166da50 RCX: 00000000fbad000c
[  812.281198] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8c07b85620
[  812.281198] RBP: 000000000166dab0 R08: 000000000166c3b0 R09: 00007f8c081de800
[  812.281198] R10: 00007f8c081dead0 R11: 0000000000000206 R12: 0000000000403087
[  812.281198] R13: 00007ffc169b1230 R14: 0000000000000000 R15: 0000000000000000
[  812.281198] Modules linked in: msg_socket
[  812.281198] ---[ end trace 078cb2d26464700b ]---
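One possible reading of the register dump above: RBX and RCX hold dead000000000100 and RDX holds dead000000000200, and the faulting code bytes load the same two constants. These match the kernel's LIST_POISON1 and LIST_POISON2 values, which list_del() writes into the next and prev pointers of an entry it has just unlinked. If that reading is right, the fault may come from unlinking a page that had already been removed from a free list. The sketch below is only a triage aid under that assumption; the helper name `find_poisoned_registers` is hypothetical and not part of any Popcorn or kernel tooling.

```python
import re

# LIST_POISON1 / LIST_POISON2 as they appear in this trace. The faulting code
# bytes load the same two constants (48 bb 00 01 .. ad de, 48 bb 00 02 .. ad de).
LIST_POISON = {
    0xDEAD000000000100: "LIST_POISON1 (next pointer of a deleted list entry)",
    0xDEAD000000000200: "LIST_POISON2 (prev pointer of a deleted list entry)",
}

# Matches "RBX: dead000000000100" but not "RIP: 0010:..." or "CR2: ...".
REGISTER_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def find_poisoned_registers(oops_text):
    """Return {register: description} for registers holding a list-poison value.

    Only the first value seen for each register is used, so the kernel-mode
    dump wins over the user-mode dump printed later in the same oops.
    """
    first_seen = {}
    for name, value in REGISTER_RE.findall(oops_text):
        first_seen.setdefault(name, int(value, 16))
    return {
        name: LIST_POISON[value]
        for name, value in first_seen.items()
        if value in LIST_POISON
    }


if __name__ == "__main__":
    # Two register lines copied from the trace in this issue.
    sample = """\
[  812.281198] RAX: ffffea000448e2a0 RBX: dead000000000100 RCX: dead000000000100
[  812.281198] RDX: dead000000000200 RSI: 0000000000000000 RDI: 00000000006200ca
"""
    for reg, meaning in find_poisoned_registers(sample).items():
        print(f"{reg}: {meaning}")
```

Piping a saved dmesg capture through a helper like this is a quick way to check whether a later reproduction shows the same poison pattern or a different kind of corruption.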
@xjtuwxg changed the title from "get_page_from_freelist general protection fault" to "'get_page_from_freelist' general protection fault" on Dec 18, 2018
@bxatnarf
Collaborator

bxatnarf commented Jan 14, 2019

I am not seeing this issue, but I am seeing a deadlock when I run mt on homogeneous x86 (after it exits). I don't think it's related to this bug, so I will open a separate issue.

@bxatnarf
Collaborator

bxatnarf commented Jan 22, 2019

bt, when compiled "natively" (on one of the x86-64 popcorn nodes, merge branch), works fine for me. Is this on the master and/or merge branch? We need to figure out how to reproduce this.
edit: s/mt/bt/

@xjtuwxg
Member Author

xjtuwxg commented Jan 22, 2019

Hey @bxatnarf. I guess the benchmarks in popcorn-kern-lib should work fine now. Since I saw your reply under this issue, I just want to make sure we are talking about the same benchmark. I believe the failing benchmark I used was https://github.com/ssrg-vt/SNU_NPB-1.0.3/tree/make-popcorn-explicit/NPB3.3-SER-C-FLAT/bt.

I realize that I may not have added you to the repo. Here is the invitation link: https://github.com/ssrg-vt/SNU_NPB-1.0.3/invitations

@bxatnarf
Collaborator

Whoops, I spaced out and thought that bt was a typo in my previous comment, forgetting about the benchmarks, and changed it to mt. Fixed it (again). I think I'm getting a little lost in the alphabet soup that is the names of all of these test cases.

We don't want the kernel to be spitting out errors because there is something wrong with the userland binary. We should keep copies of the binaries that cause errors so we can be sure to address these problems. Of course, our first priority is to make sure all of the examples that work on the master branch also work on the merge branch. It would be helpful to publish a copy of the binary that caused these errors along with this (and any future) issue. If you still have a build of bt that generated this stack trace, can you post a link to it here?

@xjtuwxg
Member Author

xjtuwxg commented Jan 22, 2019

Please take a look at this NPB/BT binary; it has migration points inserted. I looked in my popcorn VM, and it seems to be the only BT binary there. Try running it about 10 times to see whether you can observe the issue.

https://github.com/ssrg-vt/popcorn-kernel/blob/arm64-mt/bt.W.x

@bxatnarf
Collaborator

I have noticed that when I have popcorn's debugging enabled, I'm more likely to hit a deadlock. @xjtuwxg, what popcorn config options do you have set when you get this GPF? Could you paste the output of grep POPCORN .config (from your build directory) here?

@xjtuwxg
Member Author

xjtuwxg commented Jan 28, 2019

I can't remember what the configuration was when I got this error message. Here is what I get from my current popcorn kernel configuration. Still, I think that if we can observe this bug, we should find out why it happens and fix it (although that might be hard).

$ grep POPCORN .config
CONFIG_ARCH_SUPPORTS_POPCORN=y
CONFIG_POPCORN=y
CONFIG_POPCORN_DEBUG=y
# CONFIG_POPCORN_DEBUG_PROCESS_SERVER is not set
# CONFIG_POPCORN_DEBUG_PAGE_SERVER is not set
# CONFIG_POPCORN_DEBUG_VMA_SERVER is not set
# CONFIG_POPCORN_DEBUG_VERBOSE is not set
CONFIG_POPCORN_CHECK_SANITY=y
CONFIG_POPCORN_REMOTE_INFO=y
# CONFIG_POPCORN_STAT is not set
CONFIG_POPCORN_KMSG=y
CONFIG_POPCORN_KMSG_SOCKET=m
# CONFIG_POPCORN_KMSG_TEST is not set
# CONFIG_POPCORN_DEBUG_MSG_LAYER is not set
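
Since the deadlock seems to depend on which debugging options are enabled, it may help to compare configurations between builds in a structured way rather than by eye. The following is a minimal sketch, assuming the standard Kconfig `.config` line formats (`NAME=value` and `# NAME is not set`); the function name `parse_popcorn_config` is hypothetical.

```python
import re

NOT_SET_RE = re.compile(r"# (CONFIG_\w+) is not set$")


def parse_popcorn_config(text, needle="POPCORN"):
    """Map each option containing `needle` to its value.

    Options written as "# CONFIG_X is not set" are reported as "n".
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = NOT_SET_RE.match(line)
        if match:
            name, value = match.group(1), "n"
        elif "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
        else:
            continue  # blank lines, comments, shell prompts
        if needle in name:
            options[name] = value
    return options


if __name__ == "__main__":
    # A few lines copied from the grep output in this issue.
    sample = """\
CONFIG_POPCORN=y
CONFIG_POPCORN_DEBUG=y
# CONFIG_POPCORN_DEBUG_VERBOSE is not set
CONFIG_POPCORN_KMSG_SOCKET=m
"""
    opts = parse_popcorn_config(sample)
    enabled_debug = [k for k, v in opts.items() if "DEBUG" in k and v != "n"]
    print("enabled debug options:", enabled_debug)
```

Running this on the `.config` of a build that deadlocks and one that does not, then diffing the two dictionaries, would show which options differ.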

@bxatnarf
Collaborator

@xjtuwxg I want to see if I can reproduce this on arm. Do you remember what changes you made to bt and what your build configuration for it is? Or do you have a copy of the equivalent arm binary? My tries thus far have only retriggered issue #80
I have, however, been able to reproduce this on x86 with the binary you posted in #75 (comment)

@xjtuwxg
Member Author

xjtuwxg commented Mar 27, 2019

Hi @bxatnarf, I can't remember what I changed in the code. It has been a long time. Maybe we should try to solve issue #80 if we cannot reproduce this issue.

@bxatnarf
Collaborator

bxatnarf commented Mar 27, 2019 via email

@jnarf added the bug label on Nov 14, 2019