R5900: Improve the EE cache performance #12108
Draft
+289
−94
Description of Changes
Entry prefetching
I shrank a TLB entry from 48 bytes down to 16 bytes. Theoretically, on a lookup we would then prefetch up to 3 other TLB entries (since four now fit in a 64-byte cache line), which is nice because the hottest code looks up entries linearly.
Because I made the mistake of assuming this was any sort of bottleneck without checking, this change on its own actually slowed things down: we weren't memory bound here, and the precomputed entry values that were bloating the structure were actually beneficial.
This optimization combined with the ones below turned out to be an improvement, so it is present in this PR.
Common Subexpression Elimination
From
Into
Because of how hot this code is, I wanted to help the compiler and processor out a bit. Instead of repeatedly indexing into the array on every entry access, we take a reference to the entry at the top of the loop body.
This is a common pattern, so I was hoping to hit some sort of compiler heuristic, or at least access memory in a more cache-friendly way. It turns out it helps: I saw a general speed increase, and we were about 0.6% less memory bound.
Only Check Valid Entries
Instead of looking through every cache entry to see if a specific address should be cached, we can instead build a separate list of "cached entries" and only look through those.
This was the most significant optimization. It reduced the number of branches and increased branch prediction accuracy.
Overall I've seen a performance increase of around 20%.
Rationale behind Changes
I want to get more familiar with VTune profiling. The EE cache is also very slow.
Suggested Testing Steps
Test games that require the EE cache with this PR (ensure any patches we have for the game are disabled).
Run the EE cache and compare the speed to master.