Use one struct for R4300 registers on all architectures #134

Open: wants to merge 3 commits into base: master

Nebuleon
Contributor

The New Dynarec/ARM redefines global variables in its own memory block in order to bring a lot of them together in memory for easy indexing. This PR does the same for all other R4300 cores on all architectures by moving a lot of often-used global variables from many files into one struct in r4300.c:

reg[32], hi, lo, g_cp0_regs[32], FCR0, FCR31, llbit, reg_cop1_simple[32], reg_cop1_double[32], reg_cop1_fgr_64[32], delay_slot, next_interupt, address, rdword, cpu_byte, cpu_hword, cpu_word, cpu_dword => g_state

The compiler can then address every member of that structure as an offset from a single base address. When emitting position-independent code, this is faster on ARM and MIPS; on x86 it may or may not be faster, depending on the processor microarchitecture.

In addition, I have aligned the new structure to 4 KiB so that it sits within a single page and accessing it incurs at most one TLB miss.
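For readers who want to picture the layout, here is a minimal sketch of such a structure. It is not the definition from this PR: the member types, the `_sketch` names and the use of C11 `_Alignas` (rather than a compiler-specific attribute) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the member types below are guesses for illustration,
 * not the definitions used in r4300.c. */
struct r4300_state_sketch {
    int64_t   reg[32];              /* general-purpose registers */
    int64_t   hi;                   /* multiplier unit result, high part */
    int64_t   lo;                   /* multiplier unit result, low part */
    uint32_t  cp0_regs[32];         /* Coprocessor 0 registers */
    uint32_t  fcr0;
    uint32_t  fcr31;
    uint32_t  llbit;
    float    *reg_cop1_simple[32];
    double   *reg_cop1_double[32];
    int64_t   reg_cop1_fgr_64[32];
    uint32_t  delay_slot;
    uint32_t  next_interupt;        /* spelling follows the existing identifier */
    uint32_t  address;              /* memory accessor parameters */
    uint64_t *rdword;
    uint8_t   cpu_byte;
    uint16_t  cpu_hword;
    uint32_t  cpu_word;
    uint64_t  cpu_dword;
};

/* The single-page argument only holds if the structure fits in 4 KiB. */
_Static_assert(sizeof(struct r4300_state_sketch) <= 4096,
               "state must fit in one 4 KiB page");

/* C11 alignment specifier, assumed here in place of a compiler attribute. */
static _Alignas(4096) struct r4300_state_sketch g_state_sketch;

/* Returns 1 if the object starts on a 4 KiB boundary. */
int state_sketch_is_page_aligned(void)
{
    return ((uintptr_t)&g_state_sketch % 4096u) == 0;
}

/* Every member is reachable as a constant offset from one base address. */
size_t state_sketch_offset_of_lo(void)
{
    return offsetof(struct r4300_state_sketch, lo);
}
```

With this layout, a compiler that holds the address of `g_state_sketch` in one register can reach `hi`, `lo` and the rest with immediate offsets instead of loading a separate address for each global.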

The new structure is hidden and recreated in the New Dynarec/ARM as usual, but the number of recreated variables is greatly reduced.

I expect the performance to increase in these cores:

  • Pure Interpreter: 2% to 10% faster, due to fewer reloads of global variable addresses, increased cache locality and reduced TLB misses. The gain depends on how heavily a game uses multiplier unit instructions, MFC0, MTC0, [D]MFC1, [D]MTC1, floating-point width conversion instructions and memory access instructions, each of which used to access more than one global variable and now accesses members of the structure.
  • Cached Interpreter: <2% faster, due to increased cache locality and reduced TLB misses. It loads operand pointers from a struct precomp_instr, so after compilation, there are no reloads of global variable addresses.
  • Hacktarux JIT/x86, x86-64: <2% faster, due to increased cache locality and reduced TLB misses. Better code could be generated to take advantage of the new structure, but I didn't change anything there.
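To illustrate the "fewer reloads of global variable addresses" point for the Pure Interpreter, here is a rough sketch of a MULT-like handler that touches three pieces of state through one base pointer. The structure, the function name and its signature are hypothetical and are not taken from the interpreter's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down state: only what this example needs. */
struct mult_state_sketch {
    int64_t reg[32];
    int64_t hi;
    int64_t lo;
};

/* MULT-like operation: 32-bit signed multiply, result split into hi and lo.
 * reg[], hi and lo are all reached from the single pointer `s`, so the
 * compiler needs one base address rather than three separate globals.
 * Note: narrowing to int32_t and right-shifting a negative value are
 * implementation-defined in C; this sketch assumes the common two's
 * complement, arithmetic-shift behaviour. */
void mult_sketch(struct mult_state_sketch *s, unsigned rs, unsigned rt)
{
    int64_t product = (int64_t)(int32_t)s->reg[rs]
                    * (int64_t)(int32_t)s->reg[rt];
    s->hi = product >> 32;                /* sign-extended upper 32 bits */
    s->lo = (int64_t)(int32_t)product;    /* sign-extended lower 32 bits */
}

/* Small helper for trying the sketch: multiplies a by b, returns hi. */
int64_t mult_sketch_hi(int32_t a, int32_t b)
{
    struct mult_state_sketch s = {0};
    s.reg[1] = a;
    s.reg[2] = b;
    mult_sketch(&s, 1, 2);
    return s.hi;
}
```

In position-independent code with separate globals, the same handler would typically need one address load per variable before it could read or write it.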

A possibly undesirable aspect of this change is that the Coprocessor 0-related variables are no longer in cp0.c, the FPU is no longer in cp1.c, and the global variables used as parameters to the memory accessor functions are no longer in memory.c.

Commit 3c9a3fc will definitely not build on ARM, and bbe2a26 should build, but I don't know if it works properly because I don't have an ARM device to test with.

The following global variables are now in a single contiguous structure
that is aligned to a page (4096 bytes) in compilers that support it:

reg[32], hi, lo, g_cp0_regs[32], FCR0, FCR31, llbit,
reg_cop1_simple[32], reg_cop1_double[32], reg_cop1_fgr_64[32],
delay_slot, next_interupt, address, rdword, cpu_byte, cpu_hword,
cpu_word, cpu_dword

On many architectures, the structure allows access to this data with an
offset from a base register containing the address of the structure.
Because it is aligned to a page, it requires only one TLB entry,
provided the compiler supports the alignment.

Notably missing from this commit are the modifications required for the
New Dynarec to work with these new structure offsets.
Much like before, there is conditional compilation to selectively have
the New Dynarec create a variable in its own memory space instead of C
putting it somewhere inconvenient.

Now, though, the New Dynarec is just adopting the contiguous structure
that is also used by the other cores, so it is no longer the only core
that has the advantage of a contiguous set of often-used variables.
@Narann
Member

Narann commented Oct 13, 2015

Interesting. I would wait for others to comment before merging such a PR. Thanks for the hard work!

@Nebuleon
Contributor Author

Of course; I even had a build failure, so it's possible that there are more errors lurking. Travis didn't get notified about commit 7bd0078 either, apparently.

2 participants