System.Math and System.MathF should be implemented in managed code, rather than as FCALLs to the C runtime #9001

Open
tannergooding opened this issue Sep 23, 2017 · 18 comments
Labels: area-Meta · design-discussion (Ongoing discussion about design without consensus) · tenet-compatibility (Incompatibility with previous versions or .NET Framework) · tenet-performance (Performance related issue)
Milestone: Future

Comments

@tannergooding
Member

As per the title, both System.Math and System.MathF should have most of their extern methods implemented in managed code rather than being FCALLs to the underlying C runtime.

This will ensure:

  • Consistency across operating systems and architectures
  • Implementations can be more readily updated without requiring changes in the runtime proper

Some of the functions (such as Abs, Ceiling, Floor, Round, and Sqrt) are simple enough that they can be implemented in managed code today while still maintaining their current performance characteristics.
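
For example, a minimal sketch of what a managed double-precision Abs could look like (illustrative only, using the usual clear-the-sign-bit approach; not the actual corlib code):

```csharp
using System;

public static class ManagedMathSketch
{
    // Clearing the sign bit yields the absolute value for every input,
    // including -0.0, the infinities, and NaN (whose payload is preserved).
    public static double Abs(double value)
    {
        const long SignMask = unchecked((long)0x8000000000000000);
        long bits = BitConverter.DoubleToInt64Bits(value);
        return BitConverter.Int64BitsToDouble(bits & ~SignMask);
    }
}
```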

Other functions (such as Cos, Sin, and Tan) will need to wait until the hardware intrinsics proposal is more widely available (since maintaining perf numbers will require an implementation to call said intrinsics).

@tannergooding
Member Author

FYI. @mellinoe

@jkotas
Member

jkotas commented Sep 24, 2017

I do not think that this is necessarily a good idea. It makes the runtime less portable. The C runtime implementations of these functions are a fine default implementation.

If we want to make the implementation better on some platforms, that's ok - but it should not be the requirement for all platforms.

@tannergooding
Member Author

@jkotas, how does it make the runtime "less portable"?

Currently, we are tied to a particular implementation of the C runtime for each platform. This:

  • leads to inconsistencies in behavior when intrinsics are not used
  • requires runtime updates/workarounds to resolve bugs
  • has led to additional overhead in the PAL layer to ensure other platforms "conform" to the Windows behavior
  • leads to significant perf differences between the OS platforms (when intrinsics are not being used)
  • etc

I would think providing a managed implementation makes it more portable since it means:

  • fewer inconsistencies when intrinsics are not used
  • updates can be made to just corlib (no changes to native code required)
    • This applies to bug fixes, IEEE spec compliances fixes, improvements, etc
  • no additional hacks, overhead, or conformance tests in the PAL layer
  • codegen/perf should be the same between OS platforms on the same hardware
  • etc

For all of these we should obviously ensure that codegen and perf remain on-par with what we have today.

Some of the functions, such as Abs (dotnet/coreclr#14156), are trivial to do this for (since their implementation is so simple).

Others (such as Floor, Ceiling, and Round) are slightly more complicated, but should still be doable with hardware intrinsics while remaining equally performant.
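
To make that concrete, here is a hedged sketch of how Floor might look with a hardware-intrinsic fast path plus a portable fallback (the intrinsic names are the ones that eventually shipped in System.Runtime.Intrinsics.X86; the fallback shown is illustrative, not the actual corlib algorithm):

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class ManagedMathSketch
{
    public static double Floor(double value)
    {
        if (Sse41.IsSupported)
        {
            // Single-instruction path: ROUNDSD with round-toward-negative-infinity.
            Vector128<double> vec = Vector128.CreateScalarUnsafe(value);
            return Sse41.RoundToNegativeInfinityScalar(vec).ToScalar();
        }

        return SoftwareFloor(value);
    }

    // Illustrative software fallback: truncate toward zero, then step
    // negative non-integral values down by one. NaN and the infinities
    // flow through Truncate unchanged.
    private static double SoftwareFloor(double value)
    {
        double truncated = System.Math.Truncate(value);
        return (value < truncated) ? truncated - 1.0 : truncated;
    }
}
```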

The remaining (such as Cos, Tan, and Sin) are known to require hardware intrinsics to complete this.

@jkotas
Member

jkotas commented Sep 24, 2017

hardware intrinsics

If porting to a new hardware platform requires implementing a ton of intrinsics in the CodeGen, you have added like one man-year to the porting cost. It is what makes the runtime less portable.

@tannergooding
Member Author

If porting to a new hardware platform requires implementing a ton of intrinsics in the CodeGen, you have added like one man-year to the porting cost.

It does not, strictly speaking, require this. It only likely requires this for producing the most performant code.

It is also, strictly speaking, entirely possible to set the "intrinsic" for these functions on new hardware architectures to be the CRT implementation by default (provided they are IEEE compliant), if the software fallback's performance is considered too poor.

That being said, we already have the case today, when using the CRT implementation, that the majority of the functions on some platforms perform considerably worse (330% slower in the worst case): https://github.com/dotnet/coreclr/issues/9373.

This proposal gives us a standard baseline of "correctness" in the software implementation, against which any hardware-specific improvements can readily be checked. It also allows us to check that the underlying CRT implementation (if it were to be used as the intrinsic) is compatible with our expectations.
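
As a sketch of what that checking could mean in practice, one could sweep a platform implementation and the managed baseline over the IEEE special values and compare results bit-for-bit (a hypothetical harness, not actual dotnet/coreclr test code):

```csharp
using System;

public static class BaselineCheckSketch
{
    // Hypothetical conformance sweep: verify that a platform-provided
    // function agrees bit-for-bit with the managed baseline, including
    // the sign of zero and NaN encodings.
    public static bool Agrees(Func<double, double> platform, Func<double, double> baseline)
    {
        double[] specials =
        {
            0.0, -0.0, double.Epsilon, 1.0, double.MaxValue,
            double.PositiveInfinity, double.NegativeInfinity, double.NaN
        };

        foreach (double x in specials)
        {
            if (BitConverter.DoubleToInt64Bits(platform(x)) !=
                BitConverter.DoubleToInt64Bits(baseline(x)))
            {
                return false;
            }
        }

        return true;
    }
}
```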

@AndyAyersMS
Member

A few places where this might unlock some perf:

  • On Windows x86 the JIT now uses SSE2 internally but x87 at the ABI boundaries, so there is extra overhead involved in calling into native math helpers. There is a custom ABI library that the C++ compiler uses to get around this (libm_sse2), but we would need to do work in the runtime and JIT to enable calling those library routines from managed code. So having managed implementations, especially for the simpler methods with short path lengths, could provide a nice perf boost on x86.

  • In the SysV x64 ABI there are no callee-saved XMM registers, so the inability of the JIT to inline math helpers leads to extra XMM spills/reloads.

@jkotas
Member

jkotas commented Sep 25, 2017

As I have said, I am fine with using C# implementation that depends on intrinsics on platforms where we have deeper codegen investments.

The best implementation of Abs for x64 is actually: `public static double Abs(double value) => Abs(value);`. This assumes that the Abs intrinsic will be force-expanded - being done as part of dotnet/coreclr#14020. This implementation both performs best and guarantees consistent behavior between the inlined cases and the method being called via reflection or via a delegate.
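
For context, a sketch of what that pattern looks like (note that IntrinsicAttribute is internal to System.Private.CoreLib, so this shape only exists inside the runtime itself):

```csharp
using System.Runtime.CompilerServices;

public static class AbsIntrinsicSketch
{
    // The JIT recognizes [Intrinsic] and expands the call into the native
    // instruction sequence (for Abs, a bitwise AND clearing the sign bit).
    // Because the expansion is forced, the recursive body never actually
    // runs; it guarantees that inlined calls, reflection calls, and
    // delegate invocations all observe the same behavior.
    [Intrinsic]
    public static double Abs(double value) => Abs(value);
}
```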

For bring-up of new platforms, or platforms with less codegen investment, the C runtime implementation is the default. It may not be as good, or it may have different behavior in corner cases - but it is good enough. For example, I do not ever want a software fallback for cos to be implemented in managed code. I always want to use the C runtime implementation for that.

@fanoI

fanoI commented Sep 26, 2017

Wouldn't having a managed version make porting easier? For example, suppose you want to port .NET Core to MIPS: you would not have to worry about porting Math / MathF initially and would have time to debug other issues; optimizing Math using hardware intrinsics or calls to the C runtime could be done afterwards.

Or am I missing something?

@jkotas
Member

jkotas commented Sep 26, 2017

For porting, you do not have to worry about the C runtime either (porting CoreCLR to a platform without a C runtime is a non-scenario).

And the C runtime functions will be better debugged and have better performance than a software-based fallback written in C# (on platforms without deeper codegen investment).

@migueldeicaza
Contributor

There are several problems with this proposal, but to me the major problem is that it moves the maintenance and fixing from the underlying platform to us, for what is a well-established, well-maintained, and universally understood API.

The stated goal of "consistency across operating systems and platforms" has been troublesome in the past for things like Unicode string collation. Mono emulated the Windows behavior, while .NET Core took the native approach - and I think that the approach .NET Core took of using libICU, instead of trying to emulate the Windows behavior, was the right one.

Porting to a new platform already requires an advanced libc, or an equivalent, to be available. Not only for the managed bridges: the runtimes themselves (both CoreCLR and Mono) consume abs, floor, and ceil, and functions like sqrt tend to be mapped to CPU instructions.

There is the performance issue as well, which Jan just touched on.

@tannergooding
Member Author

AMD has open-sourced a snapshot of the libm implementation that is currently used by Windows x64: https://github.com/amd/win-libm (plus a few improvements that haven't been picked up yet).

This, combined with other open source libm implementations such as the one from ARM: https://github.com/ARM-software/optimized-routines, should allow us to provide a portable implementation that is both IEEE compliant and deterministic.

We could do that by contributing back to the existing libraries and picking an implementation as the "de-facto" standard to pull in, or we could do that by porting the code to C#.

I would lean towards the former (picking an implementation as the "de-facto" standard), but I think the latter has some interesting possibilities as well. Not only would it interoperate better with .NET code (such as being GC-aware), it could also take advantage of some optimizations that standard C code doesn't have available (such as not needing to worry about floating-point exception handling or alternative rounding modes, since .NET doesn't currently support those).
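
For flavor, the computational core of most libm routines (after argument reduction and edge-case handling) is a polynomial in Horner form, which ports to C# almost mechanically; a sketch with placeholder coefficients (not the AMD or ARM values):

```csharp
public static class PolynomialKernelSketch
{
    // Horner-form evaluation of p(x) = C0 + C1*x + C2*x^2 + C3*x^3, the
    // shape most libm kernels reduce to. Real kernels use minimax-derived
    // coefficients and often higher degrees; these values are placeholders.
    public static double EvaluatePolynomial(double x)
    {
        const double C0 = 1.0, C1 = 0.5, C2 = 0.25, C3 = 0.125; // placeholders
        return C0 + x * (C1 + x * (C2 + x * C3));
    }
}
```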

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@tcwicks

tcwicks commented Mar 28, 2020

Just a comment after many weeks of frustration.
After many years, MathF is perhaps the single most frustrating and categorically worst implementation of anything within .NET. I cannot think of any way this could be messed up more than it already is. Even the NuGet packages could not have been messed up any more.
I guess I now know why Unity still maintains its parallel implementation, completely disregarding this mess.

@tannergooding
Member Author

@tcwicks, could you please explain what problems you are running into and what difficulties you are having with the design?

@kilngod

kilngod commented Jun 10, 2022

As someone interested in .NET being a first-class IEEE-standards high performance computing platform, I agree with @tannergooding: we should support a libm implementation on x64 and ARM. The "C runtime implementations", as stated by @jkotas, are not suitable for anything but substandard floating point performance. The C runtime is not good enough for today's machine learning and IEEE computing needs; it is little more than an all-else-fails compute fallback option. I have no idea why the .NET team spent all these years making an awesome high performance cross platform library and then wants to drop the ball on IEEE and machine learning compute.

How many high performance compute architectures outside of ARM and x64 do we need to support beyond the C runtime? Zero. Should we have a way for all vendors to add compute libraries? Yes! We only need high performance compute on these two platforms. Processor makers should be able to add compute libraries as plugins to .NET, rather than us having a narrow-minded approach to what is good enough.

@sfiruch

sfiruch commented May 8, 2024

Hoping to get some traction on this idea. The main reason is performance, as the libc implementations can be slow and may require register shuffling to call. Additionally, I'd love identical results on different platforms. I understand that only a subset of users has these priorities.

I think the ScalB "experiment" went well. The other regularly reported problematic functions are pow, exp, sin, and cos. I believe having those four in managed code would solve most problems.

@sfiruch

sfiruch commented Nov 22, 2024

Still struggling with bad performance of fmod, pow and cosf in my projects, like others (see #10798, #13336, #48776).

It seems that since .NET 9 the runtime already contains managed math implementations for Vector and Tensor, e.g. https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs,daa3022f57e1c08a. These implementations are vectorized C# translations of AOCL routines, and they perform well in my tests.

Given this precedent, I'd be happy to port scalar pow, cos, sin, exp, and fmod implementations from AOCL to Math. Would you be interested in such a PR?

@kilngod

kilngod commented Nov 23, 2024

Simon - as third-party devs we're in complete agreement with you, and we would be 100% interested in you porting these functions to managed code, as the C runtime is less than optimal. If the .NET team already wrote managed code for these functions, we are mystified why they did not update the scalar routines as well. We would likely want to back-port this to .NET 8, just for consistency... one can dream.

@tannergooding
Member Author

tannergooding commented Nov 23, 2024

If the .net team already wrote managed code for these functions, we are mystified why they did not update these routines to managed code

Vector and scalar algorithms differ a bit in how they handle various logic, due to the difference between processing many values simultaneously and a single value at a time.

In some cases there are also minor accuracy differences between the two: it is acceptable for net-new vectorized algorithms to have slightly higher amounts of error, given typical use cases and the fact that they are "net new" APIs. We can then improve that accuracy and/or performance over time for the less likely edge cases.

We would likely want to back port this .net 8, just for consistency... one can dream.

That's unlikely to happen for many reasons.

Given this precedent, I'd be happy port scalar pow, cos, sin, exp and fmod implementations from AOCL to Math. Would you be interested in such a PR?

Normally the answer would be: yes. However, I've already got much of the work done and there's not really any need for you to re-port things.

There's only so much work that I (and the broader team in general) can do (design, implement, review, document, etc.) in a given release, so not everything ends up in a single release. For .NET 9 the focus was primarily on providing vectorized versions of the "core" math APIs. In .NET 10 we plan to continue vectorizing the rest of the math APIs, and to at least get reviews up for ports of some of the scalar APIs (I just have to finish getting the pending community PRs reviewed/merged into .NET 10 first, before I queue up even more PRs and work; that way things don't get overly stale and stay manageable).

In the interim, as of .NET 9 developers already have access to accelerated and deterministic math by functionally doing Vector128.Sin(Vector128.CreateScalarUnsafe(value)).ToScalar(), for example. It is indeed a bit more verbose, but it gets the job done.
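
For example, a small wrapper (hypothetical name) keeps call sites readable while using exactly that pattern:

```csharp
using System.Runtime.Intrinsics;

public static class DeterministicMath
{
    // Hypothetical helpers wrapping the .NET 9 Vector128 math entry points
    // to obtain deterministic scalar results, per the workaround above.
    public static double Sin(double value) =>
        Vector128.Sin(Vector128.CreateScalarUnsafe(value)).ToScalar();

    public static double Cos(double value) =>
        Vector128.Cos(Vector128.CreateScalarUnsafe(value)).ToScalar();
}
```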
