Job balance among cores #51
Thank you for your efforts on this simulator; it is very useful. However, I have found that jobs are not distributed equally among the cores. For example, I run 4 threads on 4 cores, and 1 or 2 cores are often not used to run my programs. If I am lucky, all 4 cores are used, but that does not happen often. Could you tell me what I missed, why this happens, and how I can fix it?

One example is as follows; as you can see, ooo_0 did not run user programs at all:

user.base_machine.ooo_0_0.thread0.commit.ipc = 0
user.base_machine.ooo_1_1.thread0.commit.ipc = 0.634201
user.base_machine.ooo_2_2.thread0.commit.ipc = 0.648642
user.base_machine.ooo_3_3.thread0.commit.ipc = 0.639732
kernel.base_machine.ooo_0_0.thread0.commit.ipc = 0.0147105
kernel.base_machine.ooo_1_1.thread0.commit.ipc = 0.173759
kernel.base_machine.ooo_2_2.thread0.commit.ipc = 0.565157
kernel.base_machine.ooo_3_3.thread0.commit.ipc = 0.173546
total.base_machine.ooo_0_0.thread0.commit.ipc = 0.0147105
total.base_machine.ooo_1_1.thread0.commit.ipc = 0.502229
total.base_machine.ooo_2_2.thread0.commit.ipc = 0.641751
total.base_machine.ooo_3_3.thread0.commit.ipc = 0.503832

Comments
By the way, if this is a MarssX86-internal bug, it is a really critical one for a simulator: no results obtained with MarssX86 could be trusted as long as they used a multi-core configuration (and what is not multi-core nowadays?). What I experienced is very unpredictable. Sometimes 2 cores run, sometimes 3 cores run. In the worst case, it seems that none of the cores executes the user program, so the simulations just sit there (not deadlocked) but never finish.
I'm going to need way more information if you're looking for help. What benchmark did you run? What configuration? For how long did you run the benchmark? It is very possible that you never got out of the initialization phase.
Thank you for your response. The benchmark is one I wrote myself: a tree data structure exercised with inserts and deletes. I built the simulator with 4 OOO cores (c=4) and DRAMSim2; L1 and L2 are private, and L3 is a shared cache on a split bus. At the beginning of the program it creates a checkpoint, and from that checkpoint the program runs for about 30 minutes (including 1~2 minutes of warm-up before the MarssX86 simulation starts). Please let me know if you need more information. By the way, how long does it take to get out of the initialization phase?
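For context, checkpoint creation from inside the guest is typically done through MARSS's ptlcalls interface. Below is a minimal sketch of that pattern, not the actual benchmark: it assumes `ptlcalls.h` from the MARSS sources is on the include path, and the checkpoint name `tree_bench` and the phase functions are made up for illustration.

```c
/* Sketch: creating a checkpoint from inside the guest benchmark.
 * Assumes MARSS's ptlcalls.h is available; names are illustrative. */
#include <stdio.h>
#include "ptlcalls.h"

static void build_tree(void)   { /* allocate and populate the tree */ }
static void run_workload(void) { /* timed inserts and deletes */ }

int main(void)
{
    build_tree();   /* initialization: not worth simulating in detail */

    /* Save a checkpoint and shut down; later simulation runs resume
     * from this point, skipping the initialization phase entirely. */
    ptlcall_checkpoint_and_shutdown("tree_bench");

    run_workload(); /* executed when the checkpoint is resumed */
    return 0;
}
```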
Does your benchmark use pthreads, OpenMP, or some other threading library?
Yes, it uses pthreads, and a mutex is used whenever shared data is accessed. Can that be a problem? If it is, could you recommend any alternatives?
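The sharing pattern is essentially the following. This is a self-contained sketch of the idea, not the actual benchmark; the counter stands in for the shared tree, and all names are illustrative.

```c
/* Sketch: N worker threads mutating one shared structure
 * under a single pthread mutex. Build with: cc -pthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_ops;                 /* stands in for the shared tree */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);      /* serialize inserts/deletes */
        shared_ops++;                   /* mutate shared state */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("total ops: %ld\n", shared_ops);
    return 0;
}
```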
We have images with PARSEC and SPLASH; I would suggest you try them first. I have run multicore simulations that effectively use all of the cores and get the expected IPCs.
Then my question is: how many threads did you run in your tests? As far as I know, PARSEC creates quite a number of threads. Do you know the details?
I run as many threads as I have cores.
I just checked the posted PARSEC image, and it seems to use thread affinity. In MarssX86, if the threads use thread affinity, only one simulation instance runs successfully and all the others make no progress (although they look as if they do). I had already tried thread affinity in my benchmarks previously and ran into this same issue; I used the affinity facilities provided by pthreads, by the way (essentially the pattern sketched below). Running a single simulation instance is not a workable configuration for me either, because I have to run multiple simulations; otherwise it would take days to complete them. Could you please look into this issue and how to solve it?
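For reference, here is a minimal sketch of the kind of pthreads affinity setup referred to above, pinning each worker thread to its own core with the Linux-specific `pthread_setaffinity_np`. It is an illustration of the pattern, not the exact benchmark code.

```c
/* Sketch: pin each worker thread to one core via the Linux-specific
 * pthread affinity API. Build with: cc -pthread */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NTHREADS 4

static void *worker(void *arg)
{
    long core = (long)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)core, &set);           /* allow only this one core */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* ... benchmark work runs here, now bound to `core` ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```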