
Job balance among cores #51

Open
sunez opened this issue Nov 18, 2016 · 9 comments

Comments

sunez commented Nov 18, 2016

Thank you for your efforts on this simulator; it is very useful. However, I found that jobs are not distributed evenly among cores. For example, when I run 4 threads on 4 cores, 1 or 2 cores often do not run my program at all. If I am lucky, all 4 cores are used, but that does not happen often. Could you tell me what I am missing, why this happens, and how to fix it?

One example is below. As you can see, ooo_0 did not run the user program at all.

user.base_machine.ooo_0_0.thread0.commit.ipc = 0
user.base_machine.ooo_1_1.thread0.commit.ipc = 0.634201
user.base_machine.ooo_2_2.thread0.commit.ipc = 0.648642
user.base_machine.ooo_3_3.thread0.commit.ipc = 0.639732
kernel.base_machine.ooo_0_0.thread0.commit.ipc = 0.0147105
kernel.base_machine.ooo_1_1.thread0.commit.ipc = 0.173759
kernel.base_machine.ooo_2_2.thread0.commit.ipc = 0.565157
kernel.base_machine.ooo_3_3.thread0.commit.ipc = 0.173546
total.base_machine.ooo_0_0.thread0.commit.ipc = 0.0147105
total.base_machine.ooo_1_1.thread0.commit.ipc = 0.502229
total.base_machine.ooo_2_2.thread0.commit.ipc = 0.641751
total.base_machine.ooo_3_3.thread0.commit.ipc = 0.503832


sunez commented Nov 18, 2016

By the way, if this is an internal MarssX86 bug, it is a really critical one for a simulator: any results obtained with a multi-core configuration (and what is not multi-core nowadays?) would be suspect. What I experienced is very unpredictable. Sometimes 2 cores run, sometimes 3. In the worst case, no core executes the user program at all, so the threads just sit there (not deadlocked) and never finish.

fitzfitsahero (Collaborator) commented:

I'm going to need way more information if you're looking for help.

What benchmark did you run? What configuration? For how long did you run the benchmark?

It is very possible that you never got out of the initialization phase.


sunez commented Nov 18, 2016

Thank you for your response. The benchmark is one I wrote myself: a tree data structure exercised with inserts and deletes. I built the simulator with 4 OOO cores (c=4) and DRAMSim2. L1 and L2 are private, and L3 is a shared cache on a split bus. At the beginning of the program it creates a checkpoint, and from that checkpoint the program runs for about 30 minutes (including 1-2 minutes of warm-up before MarssX86 simulation starts). Please let me know if you need more information.
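For reference, the benchmark brackets the measured region roughly as in the sketch below. This is a minimal sketch assuming the ptlcalls.h helper header that ships with the MarssX86 sources; warm_up() and run_tree_benchmark() are placeholders for my actual code.

```c
#include "ptlcalls.h"   /* helper header shipped with the MarssX86 sources */

static void warm_up(void)            { /* placeholder: 1-2 minutes of warm-up work */ }
static void run_tree_benchmark(void) { /* placeholder: tree inserts and deletes  */ }

int main(void) {
    warm_up();                 /* runs in fast emulation, before simulation starts */
    ptlcall_switch_to_sim();   /* hand control to the detailed timing model */
    run_tree_benchmark();      /* the region measured by MarssX86 */
    ptlcall_kill();            /* end the simulation and shut the VM down */
    return 0;
}
```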

By the way, how long does it take to get out of the initialization phase?

fitzfitsahero (Collaborator) commented:

Does your benchmark use pthreads, OpenMP, or some other threading library?


sunez commented Nov 18, 2016

Yes, it uses pthreads, and shared data is protected with a mutex. Could that be a problem? If so, could you recommend an alternative?
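The structure is essentially the standard pthread worker pattern, as in the minimal sketch below (the real benchmark operates on the tree; shared_ops stands in for the shared structure, but the locking discipline is the same):

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_ops = 0;            /* stands in for the shared tree */

static void *worker(void *arg) {
    long id = (long)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&tree_lock);    /* serialize access to shared data */
        shared_ops++;                      /* an insert/delete would happen here */
        pthread_mutex_unlock(&tree_lock);
    }
    printf("thread %ld done\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("total ops: %ld\n", shared_ops);
    return 0;
}
```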

fitzfitsahero (Collaborator) commented:

We have images with PARSEC and SPLASH; I would suggest you try those first.

I have run multicore simulations that effectively use all of the cores and get the expected IPCs.


sunez commented Nov 18, 2016

Then my question is: how many threads did you run in your tests? As far as I know, PARSEC creates quite a number of threads. Do you know the details?

fitzfitsahero (Collaborator) commented:

I run as many threads as I have cores.


sunez commented Nov 19, 2016

I just checked the posted PARSEC image, and it seems to use thread affinity. In MarssX86, if the threads are pinned with affinity, only one simulation runs successfully and all the others make no progress (although they appear to be running). I tried thread affinity in my benchmark previously and already ran into this issue; I used the affinity calls provided by pthreads, by the way (a sketch is below). Running a single simulation instance is not an option for me, because I have to run multiple simulations in parallel; otherwise it would take days to complete. Could you please look into this issue and suggest a fix?
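For completeness, the pinning I tried looks roughly like the sketch below (a minimal sketch assuming the GNU pthread_setaffinity_np extension; the core numbering and worker body are illustrative):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single core. */
static int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *arg) {
    int core = *(int *)arg;
    if (pin_to_core(core) != 0)
        fprintf(stderr, "failed to pin thread to core %d\n", core);
    /* ... benchmark work on this core ... */
    return NULL;
}

int main(void) {
    pthread_t t[4];
    int cores[4] = {0, 1, 2, 3};   /* one thread per core */
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, &cores[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```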
