First of all, congrats and thanks for the crate!

I'm currently using fasteval in a computer graphics project, and in that world f64 (double-precision) numbers are very rarely used: not only is the extra precision usually unnecessary, but it makes GPUs much slower (often by 2x), since GPUs are big SIMD/SIMT machines that can pack twice as many f32 operations per cycle as f64. I suspect the same holds for SIMD on CPUs.

This could also be an easier step towards full SIMD use within the crate, though I haven't implemented it as a proof of concept.

Currently I'm converting the f64 results to f32 before sending them to the GPU. Removing this inefficiency would be a nice optimization, and a good step towards the milestone of supporting arbitrary-precision numbers.
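The conversion step described above might look roughly like this (a minimal sketch; `to_gpu_buffer` is a hypothetical helper, not part of fasteval's API). Every evaluated value pays for a narrowing cast before upload, which is the per-frame overhead native f32 evaluation would remove:

```rust
// Hypothetical post-processing step: fasteval currently produces f64 values,
// but a typical GPU vertex/storage buffer wants f32, so each result is
// narrowed with an `as` cast before upload.
fn to_gpu_buffer(results: &[f64]) -> Vec<f32> {
    results.iter().map(|&v| v as f32).collect()
}

fn main() {
    // Values exactly representable in f32, so the cast is lossless here.
    let results = vec![1.5_f64, 2.25, 3.125];
    let gpu = to_gpu_buffer(&results);
    assert_eq!(gpu, vec![1.5_f32, 2.25, 3.125]);
}
```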
I've had a go at this recently, using num-traits to create a more generic representation of the float, so that both f32 and f64 are supported. Let me know if this is something you would consider pulling in, or what you think might be a better approach to tackle it. Here's the repo: https://github.com/adamsky/fasteval/tree/f32
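The shape of that generic-float approach might look something like this sketch. To stay dependency-free here, a hand-rolled `EvalFloat` trait stands in for `num_traits::Float` (the trait name and `eval_axpy` function are illustrative, not code from the linked branch):

```rust
use std::ops::{Add, Mul};

// Minimal stand-in for a num_traits::Float-style bound: whatever operations
// the evaluator needs, implemented for both precisions.
trait EvalFloat: Copy + Add<Output = Self> + Mul<Output = Self> {
    fn from_f64(v: f64) -> Self;
}

impl EvalFloat for f32 {
    fn from_f64(v: f64) -> Self { v as f32 }
}
impl EvalFloat for f64 {
    fn from_f64(v: f64) -> Self { v }
}

// An evaluator written against the trait compiles once per precision,
// so callers can pick f32 for GPU work and f64 where precision matters.
fn eval_axpy<T: EvalFloat>(a: T, x: T, y: T) -> T {
    a * x + y
}

fn main() {
    assert_eq!(eval_axpy(2.0_f32, 3.0_f32, 1.0_f32), 7.0_f32);
    assert_eq!(eval_axpy(2.0_f64, 3.0_f64, 1.0_f64), 7.0_f64);
}
```

Using num-traits instead of a local trait trades a small dependency for a much richer ready-made surface (`sqrt`, `powf`, constants, etc.), which is the trade-off the maintainer's review concern points at.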
Thanks Adam! I'll review the changes when I get some time. They look really nice. When I review, I will mainly be thinking about whether the same thing can be achieved without adding a dependency.