Optimized basic math functions
In C++, as in any other programming language, computations are performed using the available set of operators, the intrinsic functions implemented by the compiler, and the core mathematical functions provided by the standard C and C++ libraries or by some alternative implementation.
The header, HCCMath.h, provides alternatives to many of the core mathematical functions that are specified in the C and C++ standards. All of the functions can be constexpr evaluated, and several offer runtime performance benefits as well.
The functions are implemented in the Harlinn::Common::Core::Math namespace.
Unit Tests
Extensive unit tests, available here, strive to demonstrate the accuracy of the computations.
The performance of the functions is benchmarked using the Google benchmark library, and can be verified by building and executing BasicMathBenchmarks included in the Harlinn.Windows solution.
Benchmarks for a single inline function cannot be relied upon to accurately determine how well the function will perform in a real application. For release builds, the compiler and linker employ global optimization strategies, attempting to optimize the operations across all the compilation units. These strategies will often find optimization opportunities that are very hard to detect and implement manually, and the only way to really determine if one set of functions performs better than another is to try them out in a real, computationally intensive application.
It is, however, unlikely that a set of functions that performs worse than another, in a reasonable set of benchmarks, can outperform the other in a real application.
PBRTO a micro optimized raytracing app
PBRTO is a micro optimized version of PBRT-v4, under development as an example of how the functionality in HCCMath.h and HCCVectorMath.h can be used to optimize the performance of real, computationally intensive apps. It is now about 35 % faster than the release build of the original PBRT.
The functions were created to enable constexpr evaluation of mathematical expressions, since nothing improves runtime performance as much as making the compiler calculate the results at compile time.
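As a minimal illustration of the idea, the sketch below lets the compiler compute a small table of constants, so that the runtime code only reads finished values. It does not use HCCMath.h; Deg2RadSketch, MakeQuarterTurns, kQuarterTurns and QuarterTurn are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of HCCMath.h: converts degrees to radians.
constexpr double Deg2RadSketch( double degrees ) noexcept
{
    constexpr double pi = 3.14159265358979323846;
    return degrees * ( pi / 180.0 );
}

// Builds a table holding 0, 90, 180 and 270 degrees expressed in radians.
constexpr std::array<double, 4> MakeQuarterTurns( ) noexcept
{
    std::array<double, 4> result{};
    for ( std::size_t i = 0; i < result.size( ); ++i )
    {
        result[ i ] = Deg2RadSketch( 90.0 * static_cast<double>( i ) );
    }
    return result;
}

// The whole table is computed by the compiler.
constexpr auto kQuarterTurns = MakeQuarterTurns( );

// static_assert only accepts constant expressions, so this line compiles only
// if the value was available at compile time.
static_assert( kQuarterTurns[ 2 ] > 3.14159 && kQuarterTurns[ 2 ] < 3.1416 );

// Runtime code only reads the precomputed values.
inline double QuarterTurn( std::size_t index )
{
    assert( index < kQuarterTurns.size( ) );
    return kQuarterTurns[ index ];
}
```

Anything that can be expressed this way costs nothing at runtime, which is the motivation for making every function in the header constexpr capable.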
Much of the code is based on version 0.8.5 of the OpenLibm mathematical C library used by the Julia programming language.
The library does not include the OpenLibm floating point environment, and relies on the floating point environment provided by the Visual C++ runtime.
Some functions, like Sin, can only perform constexpr evaluation for a subset of the possible arguments. Sin has no problems with constexpr evaluation for \(\pm 20000^\circ\), but fails to constexpr evaluate Sin(1.7976931348623158e+308).
Implementation quality
The quality of the implementation is, since it is based on OpenLibm, very high. OpenLibm does many things very well, but sometimes the Visual C++ runtime, an intrinsic function, or other alternative implemented by the library, performs better. When this is the case, the library selects the implementation with the best runtime performance.
Rather extensive changes to the OpenLibm code were required to enable constexpr evaluation, and there are currently 535 unit tests helping to ensure the quality of the mathematical parts of the library.
Several of the tests execute the function under test 20 000 times using randomly generated values, while others try every possible value over the range of values most often used with the function.
The functions in the Math namespace that use the constexpr path implementation at runtime are thoroughly tested, and the tests try to determine the maximum deviation between the standard function and the corresponding function in the Math namespace. Deviation is calculated by the Deviation function below.
The value passed for the first argument is the expected result, usually calculated using the standard implementation, while the value calculated by the corresponding function in the Math namespace is passed as the second argument.
#include <cmath>
#include <limits>

inline double Deviation( double first, double second )
{
    // If both are NaN, the results don't deviate
    if ( std::isnan( first ) )
    {
        if ( std::isnan( second ) )
        {
            return 0.0;
        }
        return std::numeric_limits<double>::infinity( );
    }
    else if ( std::isnan( second ) )
    {
        // The second value is NaN, but not the first
        return std::numeric_limits<double>::infinity( );
    }
    if ( std::isinf( first ) )
    {
        if ( std::isinf( second ) )
        {
            if ( first > 0. && second > 0. )
            {
                // Both values are +infinity
                return 0;
            }
            else if ( first < 0. && second < 0. )
            {
                // Both values are -infinity
                return 0;
            }
            // Opposite signs
            return std::numeric_limits<double>::infinity( );
        }
        // only the first value is infinite
        return std::numeric_limits<double>::infinity( );
    }
    else if ( std::isinf( second ) )
    {
        // only the second value is infinite
        return std::numeric_limits<double>::infinity( );
    }
    // Avoid division by zero
    if ( first != 0.0 )
    {
        using std::abs;
        if ( first <= second )
        {
            return abs( second - first ) / abs( first );
        }
        return abs( first - second ) / abs( first );
    }
    // When second is very close to zero, the result is zero deviation
    constexpr double veryCloseToZero = 5e-323;
    auto absSecond = std::abs( second );
    if ( absSecond <= veryCloseToZero )
    {
        return 0.0;
    }
    // May still be very close to zero, but will cause the test to fail.
    return 1.0;
}
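A test that searches for the maximum deviation over randomly generated arguments could be structured roughly as follows. This is only a sketch under stated assumptions: RelativeDeviation is a simplified stand-in for Deviation that handles finite, non-zero expected values only, and MaxDeviation and CandidateExp are hypothetical names, not part of the library or its unit tests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Simplified stand-in for Deviation: the relative error, for finite and
// non-zero expected values only.
inline double RelativeDeviation( double expected, double actual )
{
    assert( std::isfinite( expected ) && expected != 0.0 );
    return std::abs( actual - expected ) / std::abs( expected );
}

// Calls both functions with count uniformly distributed arguments from
// [low, high) and returns the largest relative deviation that was seen.
template<typename Reference, typename Candidate>
double MaxDeviation( Reference reference, Candidate candidate,
                     double low, double high, std::size_t count = 20'000 )
{
    std::mt19937_64 generator( 42 ); // fixed seed keeps the run repeatable
    std::uniform_real_distribution<double> distribution( low, high );
    double maxDeviation = 0.0;
    for ( std::size_t i = 0; i < count; ++i )
    {
        const double x = distribution( generator );
        const double deviation = RelativeDeviation( reference( x ), candidate( x ) );
        if ( deviation > maxDeviation )
        {
            maxDeviation = deviation;
        }
    }
    return maxDeviation;
}

// Hypothetical function under test: exp( x ) computed as exp( x / 2 ) squared.
inline double CandidateExp( double x )
{
    const double half = std::exp( x / 2.0 );
    return half * half;
}
```

Passing std::exp (wrapped in a lambda) as the reference and CandidateExp as the candidate, with the range -9.0 to 9.0, yields a single number that a test can compare against an accepted tolerance.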
Exceptional performance
A few functions outperform the standard implementation spectacularly, like Exp, which outperforms std::exp by 1200 %. The two implementations return the same result for 2261694913 out of 2288746510 tested values, and when tested with double precision floating point argument values, uniformly distributed over the interval [-744.0, 710.0], the maximum deviation, 1.56426946755e-12, was obtained when passing -717.256469727 as the argument to the functions.
Using SIMD::Traits<T,N>
Some functions, like Hypot, use the SIMD::Traits<T,N> specializations to achieve excellent runtime performance.
template<typename T>
    requires IsFloatingPoint<T>
constexpr inline std::remove_cvref_t<T> Hypot( T x, T y, T z ) noexcept
{
    if ( std::is_constant_evaluated( ) )
    {
        // Nested two argument calls, so that z contributes to the result
        return Math::Internal::OpenLibM::FastHypot( Math::Internal::OpenLibM::FastHypot( x, y ), z );
    }
    using FloatT = std::remove_cvref_t<T>;
    using Traits = SIMD::Traits<FloatT, 3>;
    auto v = Traits::Set( z, y, x );
    v = Traits::Mul( v, v );
    v = Traits::HSum( v );
    v = Traits::Sqrt( v );
    return Traits::First( v );
}
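The sequence of Traits calls above (set, multiply, horizontal sum, square root, extract the first lane) maps naturally onto SSE intrinsics. The following sketch shows one way that sequence could look for float arguments. It is an assumption about the general technique, not the SIMD::Traits implementation, and Hypot3Sketch is a hypothetical name. A scalar fallback keeps it compilable on targets without SSE2.

```cpp
#include <cassert>
#include <cmath>
#if defined( __SSE2__ ) || defined( _M_X64 )
#include <immintrin.h>
#define HYPOT3_SKETCH_USE_SSE 1
#endif

// Hypothetical stand-in for the runtime path: sqrt( x*x + y*y + z*z ).
// The arguments are squared directly, without rescaling, so very large or
// very small inputs can overflow or underflow where std::hypot would not.
inline float Hypot3Sketch( float x, float y, float z ) noexcept
{
#ifdef HYPOT3_SKETCH_USE_SSE
    __m128 v = _mm_set_ps( 0.0f, z, y, x );   // lanes: [x, y, z, 0]
    v = _mm_mul_ps( v, v );                   // [x*x, y*y, z*z, 0]
    __m128 high = _mm_movehl_ps( v, v );      // [z*z, 0, z*z, 0]
    __m128 sum = _mm_add_ps( v, high );       // lane 0: x*x + z*z, lane 1: y*y
    __m128 lane1 = _mm_shuffle_ps( sum, sum, _MM_SHUFFLE( 1, 1, 1, 1 ) );
    sum = _mm_add_ss( sum, lane1 );           // lane 0: x*x + y*y + z*z
    return _mm_cvtss_f32( _mm_sqrt_ss( sum ) );
#else
    return std::sqrt( x * x + y * y + z * z ); // portable fallback
#endif
}

// 2, 3 and 6 form an exact case, since 4 + 9 + 36 = 49.
inline void Hypot3SketchExample( )
{
    assert( Hypot3Sketch( 2.0f, 3.0f, 6.0f ) == 7.0f );
}
```

The whole computation stays in one vector register, which is where the advantage over a general purpose, overflow-safe hypot routine comes from.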
Using the standard, and the internal, implementation at runtime.
The internal implementation performs about 40 % worse than std::tan for double precision floating point values, but Math::Internal::TanImpl beats std::tan for single precision floating point values by more than 60 %. Splitting the execution path between the internal implementation, for single precision floating point values, and std::tan, for double precision floating point values, provides the best solution:
template<typename T>
    requires IsFloatingPoint<T>
constexpr inline std::remove_cvref_t<T> Tan( T x ) noexcept
{
    using FloatT = std::remove_cvref_t<T>;
    if ( std::is_constant_evaluated( ) )
    {
        if constexpr ( std::is_same_v<FloatT, float> )
            return Math::Internal::OpenLibM::tanf( x );
        else
            return Math::Internal::OpenLibM::tan( x );
    }
    // Runtime: internal implementation for float, std::tan for double
    if constexpr ( std::is_same_v<FloatT, float> )
        return Math::Internal::OpenLibM::tanf( x );
    else
        return std::tan( x );
}
The BasicMathBenchmarks results, at the bottom of this page, show how the performance numbers were calculated. The runtime execution path for each function is selected, as shown, based on its performance in the benchmarks.
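The Improvement column in the results expresses how much longer the standard function takes, relative to the CPU time of the alternative. A small helper makes the formula explicit; ImprovementPercent is a hypothetical name used only for this illustration.

```cpp
#include <cassert>

// Relative improvement in percent, following the formula used in the
// Improvement column: ( ( standard - alternative ) / alternative ) * 100.
constexpr double ImprovementPercent( double standardCpuNs, double alternativeCpuNs ) noexcept
{
    assert( alternativeCpuNs > 0.0 );
    return ( ( standardCpuNs - alternativeCpuNs ) / alternativeCpuNs ) * 100.0;
}
```

With the CPU times listed for BenchmarkDoubleStdSqrt (20.5 ns) and BenchmarkDoubleSqrt (1.33 ns), ImprovementPercent( 20.5, 1.33 ) gives roughly 1441 %. A negative value, as for BenchmarkDoubleTan (9.49 ns against 5.93 ns for std::tan), means the alternative is slower.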
Basic operations
- Abs, which returns the absolute value \(|x|\) for x, is implemented for floating point types, signed integers and unsigned integers. Calls std::abs at runtime.
- FMod, which calculates the remainder of a floating point division operation, is implemented for floating point types. FMod outperforms std::fmod by approximately 60 % for double precision floating point values, and by 40 % for single precision floating point values.
- Max, which returns the greater of two values, is implemented for floating point types. Max […] at compile time, and at runtime it calls _mm_max_ss for single precision floating point values, and std::max for double precision floating point values. This improves the performance, on average, by 10 % for single precision floating point values. It varies between 2 % and 30 % for each run of the benchmarks.
- Min, which returns the lesser of two values, is implemented for floating point types. Min […] at compile time, and at runtime it calls _mm_min_ss for single precision floating point values, and std::min for double precision floating point values. This improves the performance, on average, by 7 % for single precision floating point values. It varies between -2 % and 20 % for each run of the benchmarks.
- IsSameValue checks for binary equality between two floating point values.
Exponential functions
- Exp returns e raised to the given power (\(e^x\)). Exp outperforms std::exp by approximately 1200 % for double precision floating point values, and by approximately 1000 % for single precision floating point values. The maximum detected deviation between Exp and std::exp is […] for single precision floating point values, and 2.18599e-16 for double precision floating point values, for argument values in the range -9 to […], tested with a uniform random distribution of 10'000 values.
- Log computes the natural, base e, logarithm (\(\ln x\)). Log outperforms std::log by approximately 260 % for double precision floating point values, and by approximately 400 % for single precision floating point values. The maximum detected deviation between Log and std::log is […] for single precision floating point values, and 2.04848e-16 for double precision floating point values, for argument values in the range 0 to […], tested with a uniform random distribution of 10'000 values.
- Log2 computes the base 2 logarithm of the given number (\(\log_2 x\)). Log2 outperforms std::log2 by approximately 270 % for double precision floating point values, and by approximately 250 % for single precision floating point values. The maximum detected deviation between Log2 and std::log2 is […] for single precision floating point values, and 2.1882e-16 for double precision floating point values, for argument values in the range 0 to […], tested with a uniform random distribution of 10'000 values.
- Log10 computes the common (base 10) logarithm (\(\log_{10} x\)). Log10 outperforms std::log10 by approximately 360 % for double precision floating point values, and by approximately 390 % for single precision floating point values. The maximum detected deviation between Log10 and std::log10 is […] for single precision floating point values, and 2.0154e-16 for double precision floating point values, for argument values in the range 0 to […], tested with a uniform random distribution of 10'000 values.
Power functions
- Sqrt computes the square root (\(\sqrt{x}\)). Sqrt […] at runtime. Sqrt outperforms std::sqrt by approximately 1400 % for double precision floating point values, and by approximately 1300 % for single precision floating point values.
- Hypot computes the square root of the sum of the squares of two or three numbers. The two argument version of Hypot outperforms std::hypot by approximately 270 % for double precision floating point values, and by approximately 230 % for single precision floating point values. The three argument version of Hypot outperforms std::hypot by approximately 190 % for double precision floating point values, and by approximately 320 % for single precision floating point values.
Trigonometric functions
Graphics intensive applications are highly sensitive to the performance of the trigonometric functions, especially for single precision floating point values.
- Sin computes the sine of its argument given in radians. Sin […] at runtime for both single and double precision values. The constexpr path for Sin outperforms std::sin by approximately 100 % for single precision floating point values, but performs worse when both the sine and the cosine are calculated for the same value. The maximum deviation between std::sin and the constexpr path for Sin is […] for single precision floating point values, and 2.22045e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to […].
- Cos computes the cosine of its argument given in radians. Cos […] at runtime for both single and double precision values. The constexpr path for Cos outperforms std::cos by approximately 110 % for single precision floating point values, but performs worse when both the sine and the cosine are calculated for the same value. The maximum deviation between std::cos and the constexpr path for Cos is […] for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to […].
- Tan computes the tangent of its argument given in radians. Tan outperforms std::tan by approximately 60 % for single precision floating point values. For double precision floating point values, Tan calls std::tan at runtime, with a consistent performance penalty of about 20 % compared to calling std::tan directly. The maximum deviation between std::tan and the constexpr path for Tan is […] for single precision floating point values, and 2.22045e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to […].
- ASin computes the arc sine of its argument. ASin outperforms std::asin by approximately 20 % for double precision floating point values, and by approximately 30 % for single precision floating point values. The maximum deviation between ASin and std::asin is […] for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -1.0 to […].
- ACos computes the arc cosine of its argument. ACos outperforms std::acos by approximately 30 % for double precision floating point values, and by approximately 50 % for single precision floating point values. The maximum deviation between ACos and std::acos is […] for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -1.0 to […].
- ATan computes the arc tangent of its argument. ATan outperforms std::atan by approximately 5 % for double precision floating point values, and by approximately 30 % for single precision floating point values. The maximum detected deviation between ATan and std::atan is […] for single precision floating point values, and 0.0 for double precision floating point values, tested with a uniform random distribution of 10'000 values in the range -10'000 to […].
- ATan2 computes the arc tangent of y/x from its two arguments, using their signs to determine the quadrant. ATan2 outperforms std::atan2 by approximately 20 % for double precision floating point values, and by approximately 40 % for single precision floating point values. The maximum detected deviation between ATan2 and std::atan2 is […] for single precision floating point values, and 0.0 for double precision floating point values, tested with a uniform random distribution of 10'000 values in the range -10'000 to […].
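Testing every single precision value in a range, as mentioned for several of the trigonometric functions, is feasible because adjacent float values can be enumerated with std::nextafter. The sketch below shows the general approach. ForEachFloat and MaxSinDifference are hypothetical names, and the comparison of std::sin in single and double precision merely stands in for a comparison against the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Visits every float in [first, last], in increasing order, and returns how
// many values were visited. visit is any callable taking a float.
template<typename Visitor>
std::uint64_t ForEachFloat( float first, float last, Visitor visit )
{
    assert( first <= last );
    std::uint64_t count = 0;
    for ( float x = first; ; x = std::nextafter( x, last ) )
    {
        visit( x );
        ++count;
        if ( x == last )
        {
            break;
        }
    }
    return count;
}

// Hypothetical use: the largest absolute difference between std::sin
// evaluated in single and in double precision over [first, last].
inline double MaxSinDifference( float first, float last )
{
    double maxDifference = 0.0;
    ForEachFloat( first, last, [ &maxDifference ]( float x )
    {
        const double single = static_cast<double>( std::sin( x ) );
        const double difference = std::abs( single - std::sin( static_cast<double>( x ) ) );
        if ( difference > maxDifference )
        {
            maxDifference = difference;
        }
    } );
    return maxDifference;
}
```

An exhaustive sweep like this is only practical for single precision. For double precision the number of representable values in any useful range is far too large, which is why random sampling is used there.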
Nearest integral value floating point operations
- Ceil computes the nearest integral value not less than the given value. Ceil […] at runtime.
- Floor computes the nearest integral value not greater than the given value. Floor […] at runtime.
- Trunc computes the nearest integral value not greater in magnitude than the given value. Trunc […] at runtime for single precision floating point numbers, and calls _mm_round_pd for double precision floating point numbers, improving performance by 320 %.
- Round computes the nearest integral value, rounding away from zero in halfway cases. Round […] at runtime for single precision floating point numbers, and calls _mm_round_pd for double precision floating point numbers, improving performance by 500 %.
Floating point manipulation functions
- FRExp decomposes a number into significand and base-2 exponent. FRExp outperforms std::frexp by approximately 450 % for double precision floating point values, and by approximately 550 % for single precision floating point values.
- ModF decomposes a number into integer and fractional parts. ModF outperforms std::modf by approximately 60 % for double precision floating point values, and by approximately 50 % for single precision floating point values.
- ScaleByN multiplies a number by FLT_RADIX raised to a power. ScaleByN outperforms the standard implementation by approximately 90 % for double precision floating point values, and by approximately 170 % for single precision floating point values.
- NextAfter returns the next representable floating-point value towards the given value. NextAfter outperforms std::nextafter by approximately 40 % for double precision floating point values, and by approximately 90 % for single precision floating point values.
- NextUp returns the smallest floating point number y of the same type as x such that x < y. If no such y exists, e.g. if x is Inf or NaN, it returns x. The standard C++ implementation is std::nextafter( x, std::numeric_limits<double>::infinity( ) ). NextUp outperforms it by approximately 1400 % for double precision floating point values, and by approximately 1300 % for single precision floating point values.
- NextDown returns the largest floating point number y of the same type as x such that y < x. If no such y exists, e.g. if x is -Inf or NaN, it returns x. The standard C++ implementation is std::nextafter( x, -std::numeric_limits<double>::infinity( ) ). NextDown outperforms it by approximately 210 % for double precision floating point values, and by approximately 330 % for single precision floating point values.
- CopySign copies the sign of a floating point value. CopySign outperforms std::copysign by approximately 300 % for double precision floating point values, and by approximately 10 % for single precision floating point values.
Classification and comparison
- IsNaN checks if the given number is NaN. IsNaN […] at runtime.
- IsInf checks if the given number is infinite. IsInf […] for double precision floating point values, and outperforms std::isinf for single precision floating point values by 40 %.
- SignBit checks if the given number is negative. SignBit outperforms std::signbit by approximately 50 % for both double and single precision floating point values.
Other computations
- Clamp: if the value is within [minimumValue, maximumValue], the function returns the value, otherwise it returns the nearest boundary. Clamp […] at runtime.
- Lerp computes the linear interpolation between a and b if the parameter t is inside [0, 1), and the linear extrapolation otherwise, i.e. the result of a + t * ( b - a ), accounting for floating point calculation imprecision. Lerp […] at runtime.
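Both operations are simple enough to write down directly. The sketch below uses the hypothetical names ClampSketch and LerpSketch. It implements the plain formula a + t * ( b - a ) only, without the additional care for floating point imprecision mentioned above.

```cpp
#include <cassert>

// Returns value if it lies within [minimumValue, maximumValue], otherwise
// the nearest boundary.
template<typename T>
constexpr T ClampSketch( T value, T minimumValue, T maximumValue ) noexcept
{
    assert( !( maximumValue < minimumValue ) );
    return value < minimumValue ? minimumValue
                                : ( maximumValue < value ? maximumValue : value );
}

// Plain a + t * ( b - a ): interpolation for t inside [0, 1), extrapolation
// for any other t.
template<typename T>
constexpr T LerpSketch( T a, T b, T t ) noexcept
{
    return a + t * ( b - a );
}

static_assert( ClampSketch( 12.0, 0.0, 10.0 ) == 10.0 );
static_assert( LerpSketch( 0.0, 10.0, 0.25 ) == 2.5 );  // interpolation
static_assert( LerpSketch( 0.0, 10.0, 1.5 ) == 15.0 );  // extrapolation
```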
BasicMathBenchmarks Results
Benchmark Time CPU Iterations Improvement
BenchmarkDoubleGenerator 2.37 ns 1.51 ns 640000000
BenchmarkFloatGenerator 1.80 ns 1.15 ns 640000000
BenchmarkDoubleIsSameValue 2.17 ns 1.57 ns 448000000
BenchmarkFloatIsSameValue 2.69 ns 1.95 ns 344615385
BenchmarkDoubleIsZero 2.64 ns 1.60 ns 497777778
BenchmarkFloatIsZero 1.90 ns 1.15 ns 746666667
BenchmarkDoubleIsNaN 2.63 ns 1.67 ns 373333333
BenchmarkDoubleStdIsNaN 1.70 ns 1.22 ns 640000000
BenchmarkFloatIsNaN 2.36 ns 1.57 ns 497777778
BenchmarkFloatStdIsNaN 1.89 ns 1.05 ns 746666667
BenchmarkDoubleSignum 2.13 ns 1.55 ns 373333333
BenchmarkDoubleNaiveSignum 1.97 ns 1.60 ns 448000000
BenchmarkFloatSignum 2.28 ns 2.01 ns 373333333
BenchmarkFloatNaiveSignum 2.05 ns 1.35 ns 497777778
BenchmarkDoubleDeg2Rad 1.72 ns 1.28 ns 560000000
BenchmarkFloatDeg2Rad 1.46 ns 1.15 ns 640000000
BenchmarkDoubleRad2Deg 1.67 ns 1.12 ns 640000000
BenchmarkFloatRad2Deg 1.51 ns 1.17 ns 746666667
BenchmarkDoubleNextAfter 6.40 ns 5.47 ns 100000000 (( 7.67 - 5.47)/5.47)*100 = 40.22 %
BenchmarkDoubleStdNextAfter 10.2 ns 7.67 ns 89600000
BenchmarkFloatNextAfter 4.42 ns 3.66 ns 179200000 (( 6.98 - 3.66)/3.66)*100 = 90.71 %
BenchmarkFloatStdNextAfter 10.9 ns 6.98 ns 89600000
BenchmarkDoubleInternalSqrt 35.2 ns 24.9 ns 21333333
BenchmarkDoubleSqrt 1.85 ns 1.33 ns 448000000 (( 20.5 - 1.33)/1.33)*100 = 1441.35 %
BenchmarkDoubleStdSqrt 25.8 ns 20.5 ns 37333333
BenchmarkFloatInternalSqrt 13.3 ns 10.2 ns 100000000
BenchmarkFloatSqrt 1.66 ns 1.26 ns 560000000 (( 18.0 - 1.26)/1.26)*100 = 1328.57 %
BenchmarkFloatStdSqrt 23.3 ns 18.0 ns 37333333
BenchmarkDoubleNextDown 2.56 ns 1.97 ns 373333333 (( 6.28 - 1.97)/1.97)*100 = 218.78 %
BenchmarkDoubleStdNextDown 10.6 ns 6.28 ns 89600000
BenchmarkFloatNextDown 2.39 ns 1.81 ns 560000000 (( 7.81 - 1.81)/1.81)*100 = 331.49 %
BenchmarkFloatStdNextDown 10.5 ns 7.81 ns 112000000
BenchmarkDoubleNextUp 2.95 ns 1.88 ns 373333333 (( 7.15 - 1.88)/1.88)*100 = 280.32 %
BenchmarkDoubleStdNextUp 9.68 ns 7.15 ns 89600000
BenchmarkFloatNextUp 2.67 ns 1.90 ns 560000000 (( 9.42 - 1.90)/1.90)*100 = 395.79 %
BenchmarkFloatStdNextUp 12.6 ns 9.42 ns 89600000
BenchmarkDoubleIsInf 1.78 ns 1.36 ns 448000000 (( 1.40 - 1.36)/1.36)*100 = 2.94 %
BenchmarkDoubleStdIsInf 2.00 ns 1.40 ns 448000000
BenchmarkFloatIsInf 1.43 ns 1.03 ns 1000000000 (( 1.48 - 1.03)/1.03)*100 = 43.69 %
BenchmarkFloatStdIsInf 1.88 ns 1.48 ns 896000000
BenchmarkDoubleInternalAbs 1.99 ns 1.51 ns 497777778
BenchmarkDoubleAbs 1.65 ns 1.20 ns 560000000 (( 1.29 - 1.20)/1.20)*100 = 7.5 %
BenchmarkDoubleStdAbs 1.73 ns 1.29 ns 448000000
BenchmarkFloatInternalAbs 2.21 ns 1.19 ns 448000000
BenchmarkFloatAbs 1.62 ns 1.22 ns 497777778 (( 1.26 - 1.22)/1.22)*100 = 3.28 %
BenchmarkFloatStdAbs 1.69 ns 1.26 ns 560000000
BenchmarkDoubleSignBit 1.72 ns 1.29 ns 896000000 (( 1.94 - 1.29)/1.29)*100 = 50.39 %
BenchmarkDoubleStdSignBit 2.94 ns 1.94 ns 298666667
BenchmarkFloatSignBit 1.92 ns 1.40 ns 448000000 (( 2.20 - 1.40)/1.40)*100 = 57.14 %
BenchmarkFloatStdSignBit 2.99 ns 2.20 ns 298666667
BenchmarkDoubleFRExp 3.26 ns 2.29 ns 320000000 (( 12.7 - 2.29)/2.29)*100 = 454.59 %
BenchmarkDoubleStdFRExp 16.9 ns 12.7 ns 89600000
BenchmarkFloatFRExp 3.09 ns 2.02 ns 448000000 (( 13.5 - 2.02)/2.02)*100 = 568.32 %
BenchmarkFloatStdFRExp 19.7 ns 13.5 ns 49777778
BenchmarkDoubleModF 3.67 ns 2.58 ns 344615385 (( 4.14 - 2.58)/2.58)*100 = 60.47 %
BenchmarkDoubleStdModF 6.09 ns 4.14 ns 248888889
BenchmarkFloatModF 3.51 ns 2.26 ns 373333333 (( 3.35 - 2.26)/2.26)*100 = 48.23 %
BenchmarkFloatStdModF 5.09 ns 3.35 ns 186666667
BenchmarkDoubleMin 1.70 ns 1.52 ns 576735632 (( 1.50 - 1.52)/1.52)*100 = -1.32 %
BenchmarkDoubleStdMin 1.78 ns 1.50 ns 407272727
BenchmarkFloatMin 1.80 ns 1.26 ns 448000000 (( 1.33 - 1.26)/1.26)*100 = 5.56 %
BenchmarkFloatStdMin 1.65 ns 1.33 ns 448000000
BenchmarkDoubleMax 1.61 ns 1.39 ns 640000000 (( 1.37 - 1.39)/1.39)*100 = -1.43 %
BenchmarkDoubleStdMax 1.72 ns 1.37 ns 593798817
BenchmarkFloatMax 1.68 ns 1.14 ns 560000000 (( 1.46 - 1.14)/1.14)*100 = 28.07 %
BenchmarkFloatStdMax 1.86 ns 1.46 ns 373333333
BenchmarkDoubleTrunc 1.72 ns 1.45 ns 344615385 (( 6.09 - 1.45)/1.45)*100 = 320 %
BenchmarkDoubleStdTrunc 8.13 ns 6.09 ns 100000000
BenchmarkFloatTrunc 1.66 ns 1.38 ns 407272727 (( 1.31 - 1.38)/1.38)*100 = -5.07 %
BenchmarkFloatStdTrunc 1.63 ns 1.31 ns 560000000
BenchmarkDoubleFloor 1.77 ns 1.33 ns 448000000 (( 1.09 - 1.33)/1.33)*100 = -18.04 %
BenchmarkDoubleStdFloor 1.49 ns 1.09 ns 560000000
BenchmarkFloatFloor 1.67 ns 1.31 ns 560000000 (( 1.16 - 1.31)/1.31)*100 = -11.45 %
BenchmarkFloatStdFloor 1.69 ns 1.16 ns 497777778
BenchmarkDoubleCeil 1.75 ns 1.23 ns 746666667 (( 1.40 - 1.23)/1.23)*100 = 13.82 %
BenchmarkDoubleStdCeil 1.87 ns 1.40 ns 560000000
BenchmarkFloatCeil 1.82 ns 1.24 ns 669013333 (( 1.27 - 1.24)/1.24)*100 = 2.42 %
BenchmarkFloatStdCeil 2.00 ns 1.27 ns 640000000
BenchmarkDoubleRound 1.78 ns 1.39 ns 640000000 (( 9.63 - 1.39)/1.39)*100 = 592.8 %
BenchmarkDoubleStdRound 11.9 ns 9.63 ns 74666667
BenchmarkFloatRound 1.72 ns 1.28 ns 560000000 (( 2.09 - 1.28)/1.28)*100 = 63.28 %
BenchmarkFloatStdRound 2.47 ns 2.09 ns 373333333
BenchmarkDoubleClamp 2.23 ns 1.46 ns 448000000 (( 1.67 - 1.46)/1.46)*100 = 14.38 %
BenchmarkDoubleStdClamp 2.04 ns 1.67 ns 746666667
BenchmarkFloatClamp 2.09 ns 1.59 ns 560000000 equal
BenchmarkFloatStdClamp 2.41 ns 1.59 ns 344615385
BenchmarkDoubleInternalLerpImpl 3.43 ns 2.46 ns 280000000 (( 8.23 - 2.46)/2.46)*100 = 234.55 %
BenchmarkDoubleLerp 8.67 ns 6.17 ns 111502223 (( 8.23 - 6.17)/6.17)*100 = 33.39 %
BenchmarkDoubleStdLerp 8.73 ns 8.23 ns 112000000
BenchmarkFloatInternalLerpImpl 3.24 ns 2.58 ns 224000000 (( 7.53 - 2.15)/2.15)*100 = 250.23 %
BenchmarkFloatLerp 8.15 ns 7.53 ns 112000000 (( 7.53 - 7.53)/7.53)*100 = 0 %
BenchmarkFloatStdLerp 8.67 ns 7.53 ns 112000000
BenchmarkDoubleCopySign 1.75 ns 1.26 ns 497777778 (( 5.16 - 1.26)/1.26)*100 = 309.52 %
BenchmarkDoubleStdCopySign 6.30 ns 5.16 ns 112000000
BenchmarkFloatCopySign 1.69 ns 1.23 ns 407272727 (( 1.36 - 1.23)/1.23)*100 = 10.57 %
BenchmarkFloatStdCopySign 1.77 ns 1.36 ns 448000000
BenchmarkDoubleScaleByN 4.29 ns 3.68 ns 203636364 (( 7.15 - 3.68)/3.68)*100 = 94.29 %
BenchmarkDoubleStdScaleByN 10.4 ns 7.15 ns 89600000
BenchmarkFloatScaleByN 3.37 ns 2.62 ns 280000000 (( 7.26 - 2.62)/2.62)*100 = 177.1 %
BenchmarkFloatStdScaleByN 10.1 ns 7.26 ns 92490323
BenchmarkDoubleFMod 9.71 ns 6.25 ns 100000000 (( 6.98 - 6.25)/6.25)*100 = 11.68 %
BenchmarkDoubleStdFMod 10.2 ns 6.98 ns 89600000
BenchmarkFloatFMod 10.2 ns 8.54 ns 64000000 (( 9.00 - 8.54)/8.54)*100 = 5.39 %
BenchmarkFloatStdFMod 10.5 ns 9.00 ns 74666667
BenchmarkDoubleInternalExpImpl 4.10 ns 3.31 ns 179200000
BenchmarkDoubleExp 4.38 ns 3.52 ns 248888889 (( 47.4 - 3.52)/3.52)*100 = 1246.59 %
BenchmarkDoubleStdExp 53.1 ns 47.4 ns 11200000
BenchmarkFloatInternalExpImpl 4.98 ns 3.81 ns 172307692
BenchmarkFloatExp 4.75 ns 3.72 ns 172307692 (( 40.8 - 3.72)/3.72)*100 = 996.77 %
BenchmarkFloatStdExp 53.0 ns 40.8 ns 14933333
BenchmarkDoubleInternalHypot 7.47 ns 5.16 ns 112000000
BenchmarkDoubleHypot 1.79 ns 1.54 ns 640000000 (( 5.72 - 1.54)/1.54)*100 = 271.42 %
BenchmarkDoubleStdHypot 6.90 ns 5.72 ns 112000000
BenchmarkFloatInternalHypot 6.35 ns 5.72 ns 112000000
BenchmarkFloatHypot 2.00 ns 1.51 ns 497777778 (( 5.00 - 1.51)/1.51)*100 = 231.13 %
BenchmarkFloatStdHypot 6.95 ns 5.00 ns 100000000
BenchmarkDoubleHypot3 3.28 ns 2.23 ns 280000000 (( 6.56 - 2.23)/2.23)*100 = 194.17 %
BenchmarkDoubleStdHypot3 7.20 ns 6.56 ns 112000000
BenchmarkFloatHypot3 2.04 ns 1.46 ns 448000000 (( 6.26 - 1.46)/1.46)*100 = 328.77 %
BenchmarkFloatStdHypot3 6.81 ns 6.26 ns 172307692
BenchmarkDoubleInternalLog 7.62 ns 5.87 ns 138416552
BenchmarkDoubleLog 7.05 ns 5.72 ns 112000000 (( 20.9 - 5.72)/5.72)*100 = 265.38 %
BenchmarkDoubleStdLog 29.8 ns 20.9 ns 34461538
BenchmarkFloatInternalLog 6.12 ns 5.16 ns 112000000
BenchmarkFloatLog 5.79 ns 4.74 ns 112000000 (( 23.4 - 4.74)/4.74)*100 = 393.67 %
BenchmarkFloatStdLog 29.3 ns 23.4 ns 28000000
BenchmarkDoubleInternalLog2 8.22 ns 6.00 ns 112000000
BenchmarkDoubleLog2 8.36 ns 6.63 ns 89600000 (( 24.7 - 6.63)/6.63)*100 = 272.55 %
BenchmarkDoubleStdLog2 35.2 ns 24.7 ns 37333333
BenchmarkFloatInternalLog2 6.04 ns 4.19 ns 149333333
BenchmarkFloatLog2 6.29 ns 5.62 ns 100000000 (( 20.1 - 5.62)/5.62)*100 = 257.65 %
BenchmarkFloatStdLog2 30.1 ns 20.1 ns 28000000
BenchmarkDoubleInternalLog10 8.37 ns 7.32 ns 89600000
BenchmarkDoubleLog10 8.27 ns 6.28 ns 112000000 (( 28.9 - 6.28)/6.28)*100 = 360.19 %
BenchmarkDoubleStdLog10 32.5 ns 28.9 ns 24888889
BenchmarkFloatInternalLog10 6.59 ns 4.96 ns 154482759
BenchmarkFloatLog10 6.31 ns 4.74 ns 112000000 (( 23.4 - 4.74)/4.74)*100 = 393.67 %
BenchmarkFloatStdLog10 29.9 ns 23.4 ns 32000000
BenchmarkDoubleInternalSin 10.7 ns 8.37 ns 74666667
BenchmarkDoubleSin 7.43 ns 5.45 ns 152048486 (( 6.70 - 5.45)/5.45)*100 = 22.94 %
BenchmarkDoubleStdSin 7.73 ns 6.70 ns 112000000
BenchmarkFloatInternalSin 4.76 ns 4.01 ns 179200000
BenchmarkFloatSin 4.81 ns 3.45 ns 248888889 (( 6.98 - 3.45)/3.45)*100 = 102.32 %
BenchmarkFloatStdSin 7.75 ns 6.98 ns 112000000
BenchmarkDoubleInternalCos 9.63 ns 7.97 ns 100000000
BenchmarkDoubleCos 8.62 ns 6.80 ns 89600000 (( 7.19 - 6.80)/6.80)*100 = 5.73 %
BenchmarkDoubleStdCos 8.53 ns 7.19 ns 100000000
BenchmarkFloatInternalCos 5.67 ns 4.17 ns 194782609
BenchmarkFloatCos 4.68 ns 3.59 ns 208849115 (( 7.66 - 3.59)/3.59)*100 = 113.37 %
BenchmarkFloatStdCos 8.58 ns 7.66 ns 100000000
BenchmarkDoubleInternalTan 17.2 ns 14.8 ns 56000000
BenchmarkDoubleTan 12.3 ns 9.49 ns 56000000 (( 5.93 - 9.49)/9.49)*100 = -37.51 %
BenchmarkDoubleStdTan 8.39 ns 5.93 ns 89600000
BenchmarkFloatInternalTan 5.79 ns 4.88 ns 112000000
BenchmarkFloatTan 5.73 ns 5.08 ns 160000000 (( 8.23 - 5.08)/5.08)*100 = 62.0 %
BenchmarkFloatStdTan 9.75 ns 8.23 ns 112000000
BenchmarkDoubleInternalATan 8.41 ns 6.25 ns 100000000
BenchmarkDoubleATan 8.17 ns 5.62 ns 100000000 (( 6.10 - 5.62)/5.62)*100 = 8.54 %
BenchmarkDoubleStdATan 8.59 ns 6.10 ns 89600000
BenchmarkFloatInternalATan 5.61 ns 3.48 ns 165925926
BenchmarkFloatATan 5.89 ns 4.00 ns 160000000 (( 5.31 - 4.00)/4.00)*100 = 32.75 %
BenchmarkFloatStdATan 7.01 ns 5.31 ns 100000000
BenchmarkDoubleInternalASin 8.99 ns 6.28 ns 89600000
BenchmarkDoubleASin 8.63 ns 5.78 ns 100000000 (( 6.98 - 5.78)/5.78)*100 = 20.76 %
BenchmarkDoubleStdASin 8.71 ns 6.98 ns 89600000
BenchmarkFloatInternalASin 4.69 ns 3.90 ns 172307692
BenchmarkFloatASin 4.91 ns 3.61 ns 194782609 (( 4.74 - 3.61)/3.61)*100 = 31.30 %
BenchmarkFloatStdASin 7.07 ns 4.74 ns 112000000
BenchmarkDoubleInternalACos 7.37 ns 6.25 ns 100000000
BenchmarkDoubleACos 7.15 ns 5.00 ns 100000000 (( 6.80 - 5.00)/5.00)*100 = 36.0 %
BenchmarkDoubleStdACos 8.74 ns 6.80 ns 89600000
BenchmarkFloatInternalACos 5.72 ns 5.16 ns 100000000
BenchmarkFloatACos 5.39 ns 3.72 ns 192984615 (( 5.86 - 3.72)/3.72)*100 = 57.53 %
BenchmarkFloatStdACos 6.93 ns 5.86 ns 112000000
BenchmarkDoubleInternalATan2 12.7 ns 10.8 ns 89600000
BenchmarkDoubleATan2 13.8 ns 12.2 ns 44800000 (( 15.1 - 12.2)/12.2)*100 = 23.77 %
BenchmarkDoubleStdATan2 20.9 ns 15.1 ns 37333333
BenchmarkFloatInternalATan2 10.4 ns 7.32 ns 74666667
BenchmarkFloatATan2 9.75 ns 7.95 ns 112000000 (( 11.6 - 7.95)/7.95)*100 = 45.91 %
BenchmarkFloatStdATan2 13.4 ns 11.6 ns 49777778