Android-x86 on Marss - Pipeline deadlocked #35

Open
schfan opened this issue Sep 6, 2013 · 16 comments

Comments

@schfan

schfan commented Sep 6, 2013

I ran into a pipeline deadlock when running my Android-x86 image on Marss. The steps to reproduce the error are as follows.

Here is my marss and qemu version info:

$ git clone git://github.com/avadhpatel/marss.git
$ cd marss
$ git show --summary
commit 49fda4a45e5b29c7e05b9e456228a4d016831484
Merge: 6cf2d32 4ce18f7
Author: Brendan Fitzgerald <[email protected]>
Date: Tue Aug 20 10:30:37 2013 -0700

Merge pull request #34 from dramninjasUMD/master

Build # of cores string with preprocessor


$ scons c=1 debug=2
$ ./qemu/qemu-system-x86_64 -version
QEMU emulator version 0.14.1, Copyright (c) 2003-2008 Fabrice Bellard

And we can start the simulation:

(1) $ ./qemu/qemu-system-x86_64 -m 4096 -hda ../path-to-disk/android-64.img -usbdevice mouse -usbdevice keyboard

I am using a customized Android-x86 image (you can download it here: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img). Boot into the debug mode, which is non-graphical (during boot you need to type "exit" to continue booting). Here is the kernel info of this image:

# uname -a
Linux (none) 3.0.36-android-x86-eeepc+ #1 SMP PREEMPT Tue Aug 27 21:27:01 EDT 2013 x86_64 GNU/Linux

(2) I want to simulate this command:

# am start -a android.intent.action.Main -n com.android.calculator2/.Calculator

If there is a GUI, the Calculator app is launched after this command. Without graphics, nothing visible happens. So now we add start_sim before this command and try to simulate it.

Switch to the qemu terminal (Ctrl+Alt+2), and type
(qemu) simconfig -machine single_core

Then switch back to the Android terminal (Ctrl+Alt+1), type
# cd /data/marss/
# ./start_sim ; am start -a android.intent.action.Main -n com.android.calculator2/.Calculator ; ./kill_sim

(I compiled start_sim/kill_sim statically from the source code provided on the Marss website.)

(3) The simulation starts:
Switching to simulation

And in my original terminal I can see the Completed Cycles scrolling down...

After a while, the simulation gets stuck and my original terminal's output stops on this line:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243insns/sec: rip ffffffff81026c57

And then after a while qemu exits:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243
qemu-system-x86_64: ptlsim/build/core/ooo-core/ooo.cpp:929: bool ooo::OooCore::runcycle(void*): Assertion `0' failed.
Aborted

If we look at the code at ooo.cpp:929, we can see that the failure is still the "pipeline could be deadlocked" case, but that message was not printed to the terminal.
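
For anyone comparing runs, the progress lines quoted above can be reduced to a few numbers with a short script. This is only a sketch: it assumes the terminal output was saved to a text file and that the lines keep the `Completed N cycles, M commits` shape shown in this report. The function names are made up for illustration and are not part of Marss.

```python
import re

# Patterns based only on the progress lines quoted in this report.
PROGRESS = re.compile(r"Completed\s+(\d+)\s+cycles,\s+(\d+)\s+commits")
RIP = re.compile(r"\brip\s+([0-9a-fA-F]+)")


def summarize_progress(lines):
    """Return ([(cycles, commits), ...], last_rip_or_None) from log lines."""
    samples = []
    last_rip = None
    for line in lines:
        progress = PROGRESS.search(line)
        if progress:
            samples.append((int(progress.group(1)), int(progress.group(2))))
        rip = RIP.search(line)
        if rip:
            last_rip = int(rip.group(1), 16)
    return samples, last_rip


def commits_per_cycle(samples):
    """Commit rate between consecutive progress samples."""
    rates = []
    for (c0, n0), (c1, n1) in zip(samples, samples[1:]):
        if c1 > c0:  # skip repeated samples to avoid dividing by zero
            rates.append((n1 - n0) / (c1 - c0))
    return rates


if __name__ == "__main__":
    sample = [
        "Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483",
        "Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441",
        "Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243insns/sec: rip ffffffff81026c57",
    ]
    samples, last_rip = summarize_progress(sample)
    print("last sample:", samples[-1])
    print("last rip:", hex(last_rip) if last_rip is not None else None)
    print("commits per cycle:", commits_per_cycle(samples))
```

The last sample and the last reported rip give the point where visible progress stopped, which is a convenient starting place when searching the full simulator log.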

@dramninjasUMD
Contributor

Just out of curiosity, have you tried running other, simpler binaries in this disk image? Maybe something like ls?

@schfan
Author

schfan commented Sep 7, 2013

Yes, running simple things like ls is okay.

Thanks!
SF

@tj90241
Contributor

tj90241 commented Sep 9, 2013

The image doesn't work on any of my repositories (tried everything from qemu 0.14 to bleeding edge).

After SeaBIOS initializes, the following message appears:

Booting from Hard Disk...
Error 16

@fitzfitsahero
Collaborator

I got it to boot on the master branch. I'll spend some time looking at it.

@schfan
Author

schfan commented Sep 9, 2013

Thanks for your help!

By the way, I checked /proc/kallsyms, but no kernel symbol has an address
that exactly matches the virtual address that is shown repeatedly in the
log file.
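
A note on the lookup itself: /proc/kallsyms lists symbol start addresses, so an address in the middle of a function will not appear there verbatim. The usual approach is to take the closest symbol at or below the address. Below is a minimal sketch of that lookup, assuming the standard `address type name` line layout. The helper names and the two symbol names in the demo are invented for illustration and do not come from this kernel.

```python
import bisect


def parse_kallsyms(lines):
    """Parse 'address type name [module]' lines into a sorted (addr, name) list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def nearest_symbol(symbols, address):
    """Return (name, offset) of the last symbol starting at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return name, address - start


if __name__ == "__main__":
    # Hypothetical entries; real ones would come from reading /proc/kallsyms
    # on the guest.
    demo = [
        "ffffffff81026c00 T hypothetical_func_a",
        "ffffffff81026d00 T hypothetical_func_b",
    ]
    print(nearest_symbol(parse_kallsyms(demo), 0xFFFFFFFF81026C57))
```

The returned offset says how far into the symbol the address falls. A very large offset suggests the address is not really inside that symbol.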


@schfan schfan closed this as completed Sep 10, 2013
@schfan schfan reopened this Sep 10, 2013
@schfan
Author

schfan commented Sep 10, 2013

@tj90241 I noticed that the image can get corrupted during download, which leads to the "Booting from Hard Disk..." error. If that happens, please download it again! Thanks!

@tj90241
Contributor

tj90241 commented Sep 13, 2013

Redownloaded; it was a corrupted image, thanks. I'll look into it this weekend.

@schfan
Author

schfan commented Sep 13, 2013

@tj90241 Thanks Tyler!

I also noticed that, with Android-x86, networking works under qemu 1.2 but not under qemu 0.14. But I guess it doesn't matter for now.

@schfan
Author

schfan commented Sep 13, 2013

PS: If any of you are interested in building your own Android-x86 image, here is how to do that: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/13/making-an-android-x86-image-for-marss/ . Thanks!

@tj90241
Contributor

tj90241 commented Sep 13, 2013

Found the issue after looking quickly -- MARSS doesn't handle SMC properly. I'm surprised this bug hasn't arisen before now, but it makes sense that it's causing Java to tie up immediately as Java makes excessive use of SMC. Fortunately, it's not related to your image or anything -- thanks for the bug report.

@schfan
Author

schfan commented Sep 13, 2013

Hi Tyler,

It is great news! Thanks so much for your help!

Could you tell me how you found this issue? I am learning how to debug
in Marss. Also, now that the issue has been found, are there ways to fix
it? I'd like to help!

Thanks again!


@tj90241
Contributor

tj90241 commented Sep 14, 2013

Honestly, I guess most of it was just intuition. MARSS simulates almost everything perfectly -- as I said before, I have not seen single_core deadlock in ages! Given that knowledge, and that it is widely known that the JVM uses SMC, I then looked at the simulator and lo and behold, it was fairly evident that SMC is not being handled correctly (there are even some unimplemented functions lying around...).

@schfan
Author

schfan commented Sep 15, 2013

Thanks for finding the issue! Please excuse my limited knowledge in this area, but do you mean self-modifying code when you say SMC? If possible, could you please say more about the unimplemented functions you found?

I read the PTLSim manual (version 2007) and it mentioned how SMC is supported (page 31). But what exactly is causing the problem we have? Is it because Marss' "design eliminates forced invalidations when the kernel frees up a page containing code that's immediately overwritten with normal user data"?

I am just wondering what would be the best way to solve/work-around this issue, because running Android-x86 applications is crucial for my current research project. Although I can try looking for the specific functions in JVM and modify them to prevent Marss from crashing, it would be more convincing not to modify the guest OS. Do you think it's possible to fix the SMC related problem in Marss? If so, how long do you think it will take? If you can point out the necessary steps, I'd like to try working on it.

Many thanks!

@tj90241
Contributor

tj90241 commented Sep 16, 2013

Yes, I do mean self-modifying code when I abbreviate with SMC.

I did spend some time looking at it this weekend, but unfortunately the bug hasn't been as simple to repair as I had hoped. I have gotten the simulation to proceed further, but the guest either segfaults while running self-modifying code in simulation mode, or the pipeline just deadlocks (albeit at a later point in time than it did before the fixes).

The unimplemented function related to SMC is here:
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L984

It's also very confusing in many cases as to which SMC function is being called! See:
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L939
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L1779
(one function accepts a physical address, and another accepts a virtual address).

I'm also not certain that all of these functions ever get called, either...

I have also noticed that the mfnlo and mfnhi variables of the RIPVirtPhys class from PTLsim are always set to zero, rather than being set the way they are in PTLsim. These variables are often used by the simulator in the parts of the code that check for and handle SMC, so I tried to fix that part of the problem. I can send you a patch of what I currently have offline if you e-mail me directly.

AFAIK, SMC did work in PTLsim; my guess is that it broke sometime after it was merged with MARSS (though it could be that the bug was also in the original PTLsim and wasn't fixed when it got merged with MARSS).

Unfortunately, I'm not sure that there is a way around the bug; that is to say, I'm not certain whether or not you can simply modify the JVM to skirt around this issue. It's certainly possible to fix it; it's just going to be a difficult bug to properly track down and solve, in my mind. My next goal was to see if I could write a very small piece of SMC and try to reproduce the issue, so that the log is a more manageable size to read and the problem is easier to debug, but I ran out of time this weekend.
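
To make the "very small piece of SMC" idea concrete, here is a rough sketch of what such a test does: write a tiny function into a writable and executable page, call it, patch one byte of its code, and call it again. It is not the reproducer discussed in this thread and has not been run under MARSS. It assumes x86-64 Linux and a system that permits pages that are both writable and executable. A real reproducer for the simulator would more likely be a small statically linked native binary, since the Python interpreter itself executes a lot of unrelated code.

```python
import ctypes
import mmap
import platform

# x86-64 machine code for: mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def run_smc_demo():
    """Call a tiny function, patch its immediate in place, call it again.

    Returns (before, after), or None where the demo cannot run.
    """
    if platform.system() != "Linux" or platform.machine() not in ("x86_64", "AMD64"):
        return None  # the byte sequence above is only valid on x86-64
    try:
        page = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None  # e.g. a policy that forbids writable+executable pages

    page[: len(CODE)] = CODE
    address = ctypes.addressof(ctypes.c_char.from_buffer(page))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(address)

    before = func()   # runs the original code
    page[1] = 0x2B    # self-modification: change the immediate from 42 to 43
    after = func()    # runs the patched code from the same address
    return before, after


if __name__ == "__main__":
    print(run_smc_demo())
```

On real x86-64 hardware the second call should observe the patched instruction, so the pair returned is expected to be (42, 43). A simulator that mishandles SMC might instead keep executing a stale decoded copy of the old code.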

@schfan
Author

schfan commented Sep 19, 2013

Thanks so much for your help!

After seeing your comments, I first thought it was due to Android's JIT (just-in-time) execution mode. I turned it off system-wide, but that didn't help. Then I tested whether it is related to the Java virtual machine (Dalvik), and that seems to be the case.

(I need to point out that I tested Java in the Ubuntu disk image on Marss and it was okay.)

I added a Dalvik Executable file to the Android disk image, and simply executing it reproduces the error. I have updated the disk image file, so please download it again: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img .

Now if you boot the Android virtual machine, (switch to the qemu terminal and simconfig -machine single_core and then switch back to the Android terminal) type:

# su
# cd /data/marss/
# ./run_java.sh

the simulation will soon terminate because of the same pipeline deadlock issue.

@tj90241 I will email you directly regarding the patch file you have. Thanks!

PS: If you want to write your own java file and execute it in Android, here is how to do it: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/19/executing-dex-file-in-android/.

@schfan
Author

schfan commented Jan 9, 2014

Hi,

I am just wondering if anyone would still like to work on this issue. Thanks!
