Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Math.Pow and Math.Exp produce inconsistent results on x64 #42747

Open
vsfeedback opened this issue Sep 25, 2020 · 8 comments
Open

Math.Pow and Math.Exp produce inconsistent results on x64 #42747

vsfeedback opened this issue Sep 25, 2020 · 8 comments

Comments

@vsfeedback
Copy link

This issue has been moved from a ticket on Developer Community.


Math.Pow and Math.Exp produce inconsistent results across different processor architectures, at least on x64 machines.

There are many mathematical applications where the trustworthiness of the application depends on heavy unit testing and consistency is key.

The results differ only on the last bit, so this does not effect the behavior of the application in a significant way but it breaks unit tests on a large scale across the organization.

I think that it would be very important to have such essential pieces of code behaving consistently.

I hope some dev will pick this up. Thanks in advance.

This report is a followup on https://developercommunity.visualstudio.com/content/problem/502519/mathpow-and-mathexp-produce-inconsistent-results.html?childToView=1194280 , which was closed due to inactivity where @Kuan Bartel wrote about this in detail:

I have found that Math.Pow and Math.Exp produce inconsistent results on different machines when compiled and run using x64. I have tested this on .Net 4.6.2, .Net 4.7.2, .Net Core 2.0 and .Net Core 3.0 (preview 3) (the latter two also compiled to native using Microsoft.DotNet.ILCompiler).

The differences are very small but in our application, a scientific modelling application, they compound over the course of the simulation and can significantly change results.

Examples of differences (BitConverter byte representation in brackets):

  • x = 378712769.7741194 (0x41B692B2C1C62CB0), y = 3.0 (0x4008000000000000), result1 = 5.4316258669543425E+25 (0x454676F5242A46D0), result2 = 5.431625866954343E+25 (0x454676F5242A46D1)
  • x = 4651580161.718456 (0x41F15417B01B7ECC) , y = 0.9518954814187696 (0x3FEE75ED833CEBDB), result1 = 1594208943.9010825 (0x41D7C16CABF9AB56), result2 = 1594208943.9010823 (0x41D7C16CABF9AB55)

Original Comments

geoff.davis on 9/22/2020, 11:47 PM:

I agree this is important to get resolved.

Kolumban Sandor on 9/22/2020, 11:49 PM:

@geoff.davis If you upvote the problem, hopefully it will get some traction.

Feedback Bot on 9/23/2020, 02:59 AM:

We have directed your feedback to the appropriate engineering team for further evaluation. The team will review the feedback and notify you about the next steps.


Original Solutions

(no solutions)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Numerics untriaged New issue has not been triaged by the area owner labels Sep 25, 2020
@ghost
Copy link

ghost commented Sep 25, 2020

Tagging subscribers to this area: @tannergooding, @pgovind, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@tarekgh
Copy link
Member

tarekgh commented Sep 25, 2020

#9001
#7364
#10798
#40956
#12139
#12895

@tannergooding
Copy link
Member

The underlying issue is caused by the the runtime relying on the C runtime implementation for these functions. That is Math/MathF.Exp and Math/MathF.Pow are effectively a call into exp/expf and pow/powf from the C standard library.

This means that we pick up any changes made to a given C runtime's math functions when we roll forward to a newer version of the compiler or standard library. Most often these changes are small fixes to help produce more correct results and is why you see these single bit differences between the MSVC implementation in VS2015 vs that in VS2017, etc. However, they are sometimes also perf changes and so a machine that supports hardware accelerated FusedMultiplyAdd may use a slightly different implementation (which is both faster and more correct).

This also extends further to differences between C runtime implementations and so the results between x86, x64, ARM32, and ARM64 can all differ as can the differences between Windows, Linux, and macOS (and any valid combination of the given OS/hardware sets).

Short of standardizing on a single implementation and using that on all platforms/architectures, this isn't something that can readily be fixed by the runtime or libraries team. Additionally, even if we were to standardize on a singular implementation, any bug fixes made may likewise introduce small inconsistencies between two given releases of .NET. Older already shipped versions of .NET (especially full framework) would also not get these fixes or improvements and would remain dependent on the C Runtime.

Standardizing on a single implementation is a fairly large undertaking and puts the burden on the .NET runtime/library owners to keep up to date, resolve bugs, and maintain/improve performance as needed which has to be weighed against any benefits it may provide.

CC. @jeffhandley, @jkotas as an FYI of another customer issue in this area.

@kolixx
Copy link

kolixx commented Sep 25, 2020

I reported the issue that was moved here. We are producing different types of simulation and data processing software and calculating exponents and powers is a pretty basic operation that have to be used frequently.

These codes are heavily unit tested before going into the code base and the reported issue causes that unit tests meticulously created on a dev machine might fail in the CI pipeline. It is very cumbersome to ban the use built in Pow and Exp functions in data science applications and the optionally available standardized solutions are way slower in performance.

Having consistent behavior across architectures and operating systems (even if consistency between .Net versions cannot be achieved) would save tons of time of difficult debugging.

@danmoseley
Copy link
Member

The implementations for Pow can be pretty daunting, eg https://github.com/amd/win-libm/blob/master/pow.asm which itself depends on exp and log.

@tannergooding
Copy link
Member

For reference, here is a C implementation from ARM (also MIT) that is a bit less daunting (but is in practice basically the same general code; just not in asm 😄): https://github.com/ARM-software/optimized-routines/blob/master/math/pow.c#L272-L373

@danmoseley
Copy link
Member

If there was a cross platform, widely available, high quality/maintained library, could we imagine we could do some form of "lightup" if it is found to be present? (As opposed to distributing it, which introduces other problems). That way the subset of folks that care most about consistency could arrange for it to be present.

Speaking hypothetically as perhaps we've established there is no such thing.

@kjbartel
Copy link

kjbartel commented Sep 28, 2020

I was the original original poster of the problem. Like @kolixx we write simulation software which uses pow and exponent throughout the simulation. For example we're simulating daily timesteps over hundred years runs. And there's talk about doing 10,000 year runs. Small differences can thus accumulate and significantly change the result. Obviously sensitivity analysis is needed when interpreting the results but we also must have reproducible results. That is where things were failing because our automated tests would produce different results on our developer machines versus our CI hardware.

As commented on the original issue, I ported Pow and Exp from OpenLibM (https://github.com/JuliaMath/openlibm) using unsafe methods. These implementations were then able to produce consistent results on x86, x64 and across different CPUs. Although significantly slower than the implementations provided by the CRT the difference is negligible when looking at the performance of the entire program. @danmosemsft Is that a suitable cross platform maintained library?

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Sep 30, 2020
@tannergooding tannergooding added this to the Future milestone Sep 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants