-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Soft-Float] - Initial Interpreter Implementation of Ps2's floating point unit specification #12001
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting a contribution to PCSX2
As this is your first pull request, please be aware of the contributing guidelines.
Additionally, as per recent changes in GitHub Actions, your pull request will need to be approved by a maintainer before GitHub Actions can run against it. You can find more information about this change here.
Please be patient until this happens. In the meantime if you'd like to confirm the builds are passing, you have the option of opening a PR on your own fork, just make sure your fork's master branch is up to date!
Does this work on the recompilers or just interpreter? |
You should try reading the title. |
…oint unit specification. This Pull Request implements the first take ever on real Soft-Float support in PCSX2. This work is a combination or several efforts and researches done prior. Credits: - https://www.gregorygaines.com/blog/emulating-ps2-floating-point-nums-ieee-754-diffs-part-1/ - https://github.com/GitHubProUser67/MultiServer3/blob/main/BackendServices/CastleLibrary/EmotionEngine.Emulator/Ps2Float.cs - https://github.com/Goatman13/pcsx2/tree/accurate_int_add_sub - PCSX2 Team for their help and support in this massive journey. This pull request should be tested with every games requiring a clamping/rounding mode (cf: GameDatabase). Currently, this PR fixes on the interpreters: - PCSX2#354 - PCSX2#11507 - PCSX2#10519 - PCSX2#8068 - PCSX2#7642 - PCSX2#5257 This is important to note, that this implementation, while technically fixing Gran Turismo 4 and Klonoa 2, makes the games crash due to very high floats being passed in the emu code, and failing at some points later in the process. This has not yet been ironed-out. Other than that, this sets the floor for Soft-Float in PCSX2, a long awaited contribution.
96414bd
to
de047ea
Compare
I don’t know why my brain just skimmed over that. |
I ran a bunch of tests for "Test Drive Unlimited" AI during the demo scene after sitting idle on the menu. No combination of settings/interpreters seems to have any effect on behavior. There must be something else going on |
Shouldn't this help with #2990 as well? |
I remember seeing on the public dev channel that stuntman not longer had AI issues with this and not longer the car AI failed? it's been a while |
I tested the demo replays of Tokyo Xtreme Racer Zero (see issue #5597 ) and noticed that the car movement in interpreter mode is now closer to the movement in recompiler mode. comp.mp4 |
The game sends some super low floats to the Mul unit. On PS2, floats with exponent zero should return zero, but this is not the case in Mul, the multiplier can work with denormals internally. I love when undocumented stuff is used by some games for their 3D engine ^^.
The Tony Hawk case is fixed, the game uses an un-documented behaviour in it's 3D engine. The PS2 has no denormals support .... except in the Mul unit apparently. The behaviour is now emulated properly. |
@AmyRoxwell Can you provide a reference to this? There's no indication of it being fixed in #2990, and I assume the devs would have closed it if it were. In any event it involves pathing in Driver 3 as well. |
2024-11-10.14-28-32.mp4This also affects the Fatal Frame 1 issue. Meaning this + the current GameDB patch will end up being the ultimate fix |
More accurate approach to compare.
This pr's EE interpreter fixes #11636 's gamedb issue. |
Implements accurate SQRT options, also removes Tri-Ace hack, which isn't needed anymore on the interpreter.
I meant like, while using this PR, not that is has been fixed. Sorry if it was misunderstood. But if it's not mention on the PR maybe the thing it needs it's not here by this initial implementation. |
Driv3r seemed fine when it was tested, Stuntman NTSC is a lot better but still can "slightly" deviate. I suspect it is once again, the interpreter rounding/clamping values somewhere. |
@AmyRoxwell Ah, I understand you now. That's great news. @GitHubProUser67 Thanks, nice to see Driv3r is looking better. Would it make sense to list these games in your OP? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave it a quick glance.
Can you use u32
and int
(or even better, s32
) instead of the int32_t
and uint32_t
typedefs?
Same for u8
,u16
etc.
87f9578
to
d22f591
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had skimmed though this a little while ago and this was what I picked up on.
pcsx2/VUops.cpp
Outdated
VECTOR* tmp; | ||
VECTOR* dst; | ||
if (_Fd_ == 0) | ||
dst = &RDzero; | ||
else | ||
dst = &VU->VF[_Fd_]; | ||
|
||
if (_X) dst->i.x = VU_MACx_UPDATE(VU, vuDouble(VU->ACC.i.x) + ( vuDouble(VU->VF[_Fs_].i.x) * vuDouble(VU->VF[_Ft_].i.x))); else VU_MACx_CLEAR(VU); | ||
if (_Y) dst->i.y = VU_MACy_UPDATE(VU, vuDouble(VU->ACC.i.y) + ( vuDouble(VU->VF[_Fs_].i.y) * vuDouble(VU->VF[_Ft_].i.y))); else VU_MACy_CLEAR(VU); | ||
if (_Z) dst->i.z = VU_MACz_UPDATE(VU, vuDouble(VU->ACC.i.z) + ( vuDouble(VU->VF[_Fs_].i.z) * vuDouble(VU->VF[_Ft_].i.z))); else VU_MACz_CLEAR(VU); | ||
if (_W) dst->i.w = VU_MACw_UPDATE(VU, vuDouble(VU->ACC.i.w) + ( vuDouble(VU->VF[_Fs_].i.w) * vuDouble(VU->VF[_Ft_].i.w))); else VU_MACw_CLEAR(VU); | ||
tmp = &VU->TMP; | ||
if (_X) {tmp->i.x = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.x, VU->VF[_Ft_].i.x, 0); dst->i.x = VU_MACx_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.x, tmp->i.x, 0));} else VU_MACx_CLEAR(VU); | ||
if (_Y) {tmp->i.y = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.y, VU->VF[_Ft_].i.y, 0); dst->i.y = VU_MACy_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.y, tmp->i.y, 0));} else VU_MACy_CLEAR(VU); | ||
if (_Z) {tmp->i.z = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.z, VU->VF[_Ft_].i.z, 0); dst->i.z = VU_MACz_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.z, tmp->i.z, 0));} else VU_MACz_CLEAR(VU); | ||
if (_W) {tmp->i.w = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.w, VU->VF[_Ft_].i.w, 0); dst->i.w = VU_MACw_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.w, tmp->i.w, 0));} else VU_MACw_CLEAR(VU); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for using a global TMP vector?
This can just be using a locally defined one, right?
ditto for the other fused ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason behind the global vector is to allow for more generalised debugging when doing an FMA operation.
The use cases where it can be used, is when you don't know where you float went wrong, and it happened in an FMA instruction, so you could simply hook the global TMP and see all the values the FMA processed in between.
Ultimately you decide if it is worth or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not have it personally
The less global a variable, the less trouble it causes
36c0d7d
to
5c5febd
Compare
More in line with the PCSX2 code-base.
5c5febd
to
34753ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more general design, I think PS2Float should hold the float directly, with methods to extract the pieces, rather than storing the parts separately. This would save deconstructing and then reconstructing floats to do operations like abs and negate, which really just need to and
or xor
with some bits.
e.g.
class PS2Float
{
public:
u32 raw;
constexpr explicit PS2Float(u32 raw_): raw(raw_) {}
constexpr u32 Mantissa() const { return (raw & 0x7fffff) | 0x800000; }
constexpr u8 Exponent() const { return (raw >> 23) & 0xff; }
constexpr bool Sign() const { return raw >> 31; }
constexpr PS2Float Abs() const { return { raw & 0x7fffffff }; }
constexpr PS2Float Negate() const { return { raw ^ 0x80000000 }; }
};
(If we really want to let the compiler optimize well, we should implement add/sub/etc as static functions that take both lhs and rhs by value, then we can mark them with __attribute__((const))
and let the compiler assume they won't touch any global variables or anything, which might be important if we're hoping the compiler will hoist all those CHECK_VU_SOFT_ADDSUB
checks outside of the four if statements in the VUs, though I'm still kind of doubtful that it will)
pcsx2/Ps2Float.cpp
Outdated
s32 PS2Float::clz(s32 x) | ||
{ | ||
x |= x >> 1; | ||
x |= x >> 2; | ||
x |= x >> 4; | ||
x |= x >> 8; | ||
x |= x >> 16; | ||
|
||
return debruijn32[(u32)x * 0x8c0b2891u >> 26]; | ||
} | ||
|
||
s32 PS2Float::BitScanReverse8(s32 b) | ||
{ | ||
return msb[b]; | ||
} | ||
|
||
s32 PS2Float::GetMostSignificantBitPosition(u32 value) | ||
{ | ||
for (s32 i = 31; i >= 0; i--) | ||
{ | ||
if (((value >> i) & 1) != 0) | ||
return i; | ||
} | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have helpers for these that map to the associated cpu instructions
IIRC we're just using MS names, so int _BitScanReverse(unsigned long* const Index, const unsigned long Mask)
If you don't like the weird signature (e.g. using long instead of u32/u64), maybe update common/BitUtils.h
with a nicer one instead of defining one here.
This broke stuff on very high floats, 0x40 is an un-biased value that can't be changed from IEEE standard. We now 100% match the PS3's SPEs, but not the PS2 Div result (can be off by one bit). However, this is still way better than IEEE754.
#5070 fixed with accurate VU0 Mul/Div for COP2 (require EE interpreter like Dirge of Cerberus). |
Oh, i suspected it was something like this, nice. Here is a video 2024-11-22.22-36-21.mp4 |
#4546 fixed with accurate VU1 Mul/Div. :) |
eb19198
to
9b6327c
Compare
There are 3 code styles remaining, The removal of the Global VU tpm, PCSX2's own helpers in PS2Float and improvement of the PS2Float class to store the raw float. |
9b6327c
to
bb63630
Compare
Applies some recommendations from the review. The next batch will come later.
bb63630
to
8bc2ed9
Compare
At some point you changed back mantissa normalization constant for Div. From 0x39 to 0x40 which is used in IEEE754. |
I see you are well documented on the subject :) I will explain to you why this "experiment" failed. The current Div/Sqrt system mimic the PS3 SPEs (I have more experience with that system), and they uses a IEEE "like" divisor that allowed me to test what is closer to PS2 hardware. Their rounding normalization is exactly using IEEE value from the tests I did. The PS2 Div is a mess, they round to nearest by default, but their "rounding" is completely custom. So I took the PS3 hardware as a basis and tried to "lower" the rounding delta to see what is closer to PS2 hardware, 0x39 seemed fine, but in actual fact, that was a hack. Changing this value introduced a slight BIAS in the result, which can affect high floats values and makes them incorrect. 0X3F was still "too high" so to speak. Current Div/Sqrt system is exactly like a PS3 emulating the PS2 in therms of results, which means way closer (and fixes the VAST majority of games), but still one bit off in some edge cases. Add/Sub/Mul are PS2 perfect. |
Can’t argue with Sony’s own solution. Sounds like a great comment to document the solution. |
I would say the main issue came from the fact it was widely assumed the PS2 and PS3 shared the same "float" calculation behaviours, while in actual fact, they do not. The IBM floats, even if not compliant to IEEE are still very similar in action. The PS2 deviates slightly on some defined calculations. The Multiplication is flawed for instance, due to them using a booth and wallace approach (Method which cost a LOT LESS to manufacture) that was dropping bits in some cases, causing plenty of calculation deviation ( something * 1 NOT ALWAYS equal to something). |
That was the plan as far as I know. Probably someone underestimated how important 1:1 implementation was to keep compatibility. PS3 CELL implemented special Altivec/VMX mode to make PPE core vector floats compatible with SPU, but also with PS2 emulation in mind. Later it turned out that they really miscalculated importance of accurate floating point math in PS2 emulation. To the point that first fixes in ~2008-2010 were strictly per game, like "on this pc, add 0x01 to this reg, and get back execution to recompiler". Soft floats, including accurate DIV, came later, when new people were hired. :) In the end only VU1 is using real SPE, everything else use PPE VMX compatibility mode.
And that's indeed a great outcome. Nice work. :) |
Thank you Goatman13! The PCSX2 team also gets massive credits here for their help and support, of course the goal is to be PS2 perfect on this one. There is this one game, Mortal Combat Shaolin Monks which doesn't work fully yet on the Soft-Float div due to the game doing loads of Log2 on top of existing Log2 results, leading to it relying on rounding in booth directions... |
…eed-up simple calculations. This greatly improves performance while using Soft-Floats.
dca7e3c
to
745e474
Compare
BTW the reason macOS builds are failing is they have PCH off, which means you don't get a bunch of headers auto-included. In particular, you need an |
…s to their respectives place. I don't like the SIMD way of doing it, it can be slower and less practical to use (expensive casting).
Now the only code style remaining is the Global VU TMP. |
This Pull Request implements the first take ever on real Soft-Float support in PCSX2.
This work is a combination or several efforts and researches done prior.
Credits:
https://www.gregorygaines.com/blog/emulating-ps2-floating-point-nums-ieee-754-diffs-part-1/
https://github.com/assumenothing/Tommunism.SoftFloat/tree/main
https://github.com/GitHubProUser67/MultiServer3/blob/main/BackendServices/CastleLibrary/EmotionEngine.Emulator/Ps2Float.cs
https://github.com/Goatman13/pcsx2/tree/accurate_int_add_sub
PCSX2 Team for their help and support in this massive journey.
This pull request should be tested with every games requiring a clamping/rounding mode/float patches (cf: GameDatabase).
Currently, this PR fixes on the interpreters:
[BUG]: DJbox (Japan) - SCPS-15082 - 56275BE9 #5169
Ratchet & Clank 2 - Megaturret Doesn't Work Correctly #354
[BUG]: Tourist Trophy: License Tests are broken and not completeable, player is placed out of bounds #11507
[BUG]: Monster Hunter games: bounce calculation inaccuracy #10519
[BUG]: Opening demo desyncs. The Taxi 2 (SLPS-20478) #8068
[BUG]: Pride FC - Fighting Championship | Missing text and textures in Hardware and Software render modes. #7642
[BUG]: Jak & Daxter/Jak 3 -- Character slides on his own | Jak 2 -- Inclined Camera Drift #5257
This is important to note, that this implementation, while technically fixing Gran Turismo 4 and Klonoa 2, makes the games crash due to very high floats being passed in the emu code, and failing at some points later in the process. This has not yet been ironed-out.
Other than that, this sets the floor for Soft-Float in PCSX2, a long awaited contribution.