Optimized basic math functions
In C++, like any other programming language, computations are performed using the available set of operators, the intrinsic functions implemented by the compiler, and the set of core mathematical functions provided by the standard C and C++ libraries, or some alternative implementation.
The header, HCCMath.h, provides alternatives to many of the core mathematical functions specified in the C and C++ standards. All of the functions can be constexpr evaluated, and several offer runtime performance benefits as well. The functions are implemented in the Harlinn::Common::Core::Math namespace.
Unit Tests
Extensive unit tests, available here, strive to demonstrate the accuracy of the computations.
Benchmarks
The performance of the functions is benchmarked using the Google benchmark library, and can be verified by building and executing BasicMathBenchmarks included in the Harlinn.Windows solution.
Benchmarks for a single inline function cannot be relied upon to accurately determine how well the function will perform in a real application. For release builds, the compiler and linker employ global optimization strategies, attempting to optimize operations across all the compilation units. These strategies will often find optimization opportunities that are very hard to detect and implement manually, and the only way to really determine whether one set of functions performs better than another is to try them out in a real, computationally intensive, application.
It is, however, unlikely that a set of functions that performs worse than another in a reasonable set of benchmarks will outperform the other in a real application.
PBRTO, a micro-optimized raytracing app
PBRTO is a micro-optimized version of PBRT-v4, under development as an example of how the functionality in HCCMath.h, HCCSIMD.h and HCCVectorMath.h can be used to optimize the performance of real, computationally intensive, apps. It's now about 35 % faster than the release build of the original PBRT. more…
Background
The functions were created to enable constexpr evaluation of mathematical expressions, since nothing improves runtime performance as much as making the compiler calculate the results at compile time.
Much of the code is based on version 0.8.5 of the OpenLibm mathematical C library used by the Julia programming language.
The library does not include the OpenLibm floating point environment, and relies on the floating point environment provided by the Visual C++ runtime.
Some functions, like Sin, can only perform constexpr evaluation for a subset of the possible arguments. Sin has no problems with constexpr evaluation for \(\pm 20000^\circ\), but fails to constexpr evaluate Sin(1.7976931348623158e+308).
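For illustration, here is a minimal sketch of compile-time evaluation. It assumes that HCCMath.h can be included directly and that Deg2Rad and Sin take a single argument, like their standard counterparts; it is not taken from the library's tests.

#include <HCCMath.h>

using namespace Harlinn::Common::Core;

// Both values are computed entirely at compile time; 20000 degrees is
// well within the range the constexpr path handles.
constexpr double angle = Math::Deg2Rad( 20000.0 );
constexpr double sine = Math::Sin( angle );
static_assert( sine >= -1.0 && sine <= 1.0 );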
Implementation quality
Since it is based on OpenLibm, the quality of the implementation is very high. OpenLibm does many things very well, but sometimes the Visual C++ runtime, an intrinsic function, or another alternative implemented by the library performs better. When this is the case, the library selects the implementation with the best runtime performance.
Testing
Rather extensive changes to the OpenLibm code were required to enable constexpr evaluation, and there are currently 535 unit tests helping to ensure the quality of the mathematical parts of the library.
Several of the tests execute the function under test 20 000 times using randomly generated values, while others try every possible value over the range of values most often used with the function.
The functions in the Math namespace that use the constexpr path implementation at runtime are thoroughly tested, and the tests try to determine the maximum deviation between the standard function and the corresponding function in the Math namespace.
Deviation is calculated by the Deviation function below. The value passed for the first argument is the expected result, usually calculated using the standard implementation, while the value calculated by the corresponding function in the Math namespace is passed as the second argument.
#include <cmath>
#include <limits>

inline double Deviation( double first, double second )
{
    using std::abs;
    // If both are NaN, the results don't deviate
    if ( std::isnan( first ) )
    {
        if ( std::isnan( second ) )
        {
            return 0.0;
        }
        return std::numeric_limits<double>::infinity( );
    }
    else if ( std::isnan( second ) )
    {
        // The second value is NaN, but not the first
        return std::numeric_limits<double>::infinity( );
    }
    if ( std::isinf( first ) )
    {
        if ( std::isinf( second ) )
        {
            if ( first > 0. && second > 0. )
            {
                // Both values are +infinity
                return 0;
            }
            else if ( first < 0. && second < 0. )
            {
                // Both values are -infinity
                return 0;
            }
            // Opposite signs
            return std::numeric_limits<double>::infinity( );
        }
        // Only the first value is infinite
        return std::numeric_limits<double>::infinity( );
    }
    else if ( std::isinf( second ) )
    {
        // Only the second value is infinite
        return std::numeric_limits<double>::infinity( );
    }
    // Avoid division by zero
    if ( first != 0.0 )
    {
        if ( first <= second )
        {
            return abs( second - first ) / abs( first );
        }
        else
        {
            return abs( first - second ) / abs( first );
        }
    }
    else
    {
        // When second is very close to zero, the result is zero deviation
        constexpr double veryCloseToZero = 5e-323;
        auto absSecond = abs( second );
        if ( absSecond <= veryCloseToZero )
        {
            return 0.0;
        }
        // May still be very close to zero, but will cause the test to fail.
        return 1.0;
    }
}
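The tests themselves are not reproduced here, but a sketch of how they might use Deviation, assuming Math::Exp mirrors the signature of std::exp, looks roughly like this:

#include <HCCMath.h>

#include <algorithm>
#include <cmath>
#include <random>

using namespace Harlinn::Common::Core;

// Estimate the maximum deviation between std::exp and Math::Exp over a
// uniform random distribution of 10'000 values in the range -9 to 10.
inline double MaxExpDeviation( )
{
    std::mt19937_64 generator( 42 );
    std::uniform_real_distribution<double> distribution( -9.0, 10.0 );
    double maxDeviation = 0.0;
    for ( int i = 0; i < 10'000; i++ )
    {
        double argument = distribution( generator );
        maxDeviation = std::max( maxDeviation, Deviation( std::exp( argument ), Math::Exp( argument ) ) );
    }
    return maxDeviation;
}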
Exceptional performance
A few functions outperform the standard implementation spectacularly, like Exp, which outperforms std::exp by 1200 %.
The two implementations return the same result for 2261694913 out of 2288746510 cases, and when tested with double precision floating point argument values, uniformly distributed over the interval [-744.0, 710.0], the maximum deviation, 1.56426946755e-12, was obtained when passing -717.256469727 as the argument to the functions.
Using SIMD::Traits<T, N>
Some functions, like Hypot, use the SIMD::Traits<T, N> specializations to achieve excellent runtime performance.
template<typename T>
    requires IsFloatingPoint<T>
constexpr inline std::remove_cvref_t<T> Hypot( T x, T y, T z ) noexcept
{
    if ( std::is_constant_evaluated( ) )
    {
        // hypot( x, y, z ) = hypot( hypot( x, y ), z )
        return Math::Internal::OpenLibM::FastHypot( Math::Internal::OpenLibM::FastHypot( x, y ), z );
    }
    else
    {
        using FloatT = std::remove_cvref_t<T>;
        using Traits = SIMD::Traits<FloatT, 3>;
        auto v = Traits::Set( z, y, x );
        v = Traits::Mul( v, v );
        v = Traits::HSum( v );
        v = Traits::Sqrt( v );
        return Traits::First( v );
    }
}
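A hypothetical usage example, reusing the includes from the earlier sketches; the arguments form a Pythagorean quadruple, so the exact result is 3:

// sqrt(1*1 + 2*2 + 2*2) = sqrt(9) = 3.0
double diagonal = Math::Hypot( 1.0, 2.0, 2.0 );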
Using the standard, and the internal, implementation at runtime
Math::Internal::TanImpl performs about 40 % worse than std::tan for double precision floating point values, but Math::Internal::TanImpl beats std::tan for single precision floating point values by more than 60 %. Splitting the execution path, using Math::Internal::TanImpl for single precision floating point values and std::tan for double precision floating point values, provides the best solution:
template<typename T>
    requires IsFloatingPoint<T>
constexpr inline std::remove_cvref_t<T> Tan( T x ) noexcept
{
    using FloatT = std::remove_cvref_t<T>;
    if ( std::is_constant_evaluated( ) )
    {
        if constexpr ( std::is_same_v<FloatT, float> )
        {
            return Math::Internal::OpenLibM::tanf( x );
        }
        else
        {
            return Math::Internal::OpenLibM::tan( x );
        }
    }
    else
    {
        if constexpr ( std::is_same_v<FloatT, float> )
        {
            return Math::Internal::OpenLibM::tanf( x );
        }
        else
        {
            return std::tan( x );
        }
    }
}
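From the caller's point of view the dispatch is invisible; a brief sketch, assuming the includes from the earlier examples:

// Constant-folded in a constexpr context, forwarded to std::tan (for
// double precision values) when evaluated at runtime.
constexpr double compileTimeTangent = Math::Tan( 0.25 );

inline double RuntimeTangent( double x )
{
    return Math::Tan( x );
}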
The BasicMathBenchmarks results, at the bottom of this page, show how the performance numbers were calculated. The runtime execution path for each function is selected, as shown, based on its performance in the benchmarks.
Basic operations
- Abs, which returns the absolute value \(|x|\) for x, is implemented for floating point types, signed integers and unsigned integers. Calls std::abs at runtime.
- FMod, which calculates the remainder of a floating point division operation, is implemented for floating point types. FMod outperforms std::fmod by approximately 60 % for double precision floating point values, and by 40 % for single precision floating point values.
- Max, which returns the greater of two values, is implemented for floating point types. Max calls std::max at compile time, and at runtime it calls _mm_max_ss for single precision floating point values, and std::max for double precision floating point values. This improves the performance, on average, by 10 % for single precision floating point values. It varies between 2 % and 30 % for each run of the benchmarks.
- Min, which returns the lesser of two values, is implemented for floating point types. Min calls std::min at compile time, and at runtime it calls _mm_min_ss for single precision floating point values, and std::min for double precision floating point values. This improves the performance, on average, by 7 % for single precision floating point values. It varies between -2 % and 20 % for each run of the benchmarks.
- IsSameValue checks for binary equality between two floating point values.
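A brief usage sketch under the assumption that these functions accept the same arguments as their std counterparts; the values are only illustrative:

#include <limits>

// 7.5 = 3 * 2 + 1.5, so the remainder is exactly 1.5f.
float remainder = Math::FMod( 7.5f, 2.0f );
double larger = Math::Max( 2.0, 3.0 );    // 3.0
double smaller = Math::Min( 2.0, 3.0 );   // 2.0

// Binary equality treats two NaNs with the same bit pattern as equal,
// unlike operator==, which is always false for NaN.
constexpr double quietNaN = std::numeric_limits<double>::quiet_NaN( );
bool sameNaN = Math::IsSameValue( quietNaN, quietNaN );   // presumably true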
Exponential functions
- Exp returns e raised to the given power (\(e^x\)). Exp outperforms std::exp by approximately 1200 % for double precision floating point values, and by approximately 1000 % for single precision floating point values. The maximum detected deviation between std::exp and Exp is 1.18844e-07 for single precision floating point values, and 2.18599e-16 for double precision floating point values, for argument values in the range -9 to 10, tested with a uniform random distribution of 10'000 values.
- Log computes the natural, base e, logarithm (\(\ln x\)). Log outperforms std::log by approximately 260 % for double precision floating point values, and by approximately 400 % for single precision floating point values. The maximum detected deviation between std::log and Log is 1.18795e-07 for single precision floating point values, and 2.04848e-16 for double precision floating point values, for argument values in the range 0 to 100000, tested with a uniform random distribution of 10'000 values.
- Log2 computes the base 2 logarithm of the given number (\(\log_2 x\)). Log2 outperforms std::log2 by approximately 270 % for double precision floating point values, and by approximately 250 % for single precision floating point values. The maximum detected deviation between std::log2 and Log2 is 1.18288e-07 for single precision floating point values, and 2.1882e-16 for double precision floating point values, for argument values in the range 0 to 100000, tested with a uniform random distribution of 10'000 values.
- Log10 computes the common (base 10) logarithm (\(\log_{10} x\)). Log10 outperforms std::log10 by approximately 360 % for double precision floating point values, and by approximately 390 % for single precision floating point values. The maximum detected deviation between std::log10 and Log10 is 1.18216e-07 for single precision floating point values, and 2.0154e-16 for double precision floating point values, for argument values in the range 0 to 100000, tested with a uniform random distribution of 10'000 values.
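A small compile-time sanity check (a sketch, not part of the test suite), assuming Exp and Log can both be constexpr evaluated for these arguments:

// Exp and Log are inverses; allow a small tolerance since the constexpr
// path is not required to be correctly rounded.
constexpr double roundTrip = Math::Log( Math::Exp( 1.0 ) );
static_assert( roundTrip > 1.0 - 1e-14 && roundTrip < 1.0 + 1e-14 );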
Power functions
- Sqrt computes the square root (√x). Calls _mm_sqrt_pd or _mm_sqrt_ps at runtime. Sqrt outperforms std::sqrt by approximately 1400 % for double precision floating point values, and by approximately 1300 % for single precision floating point values.
- Hypot computes the square root of the sum of the squares of two or three numbers. The two argument version of Hypot outperforms std::hypot by approximately 270 % for double precision floating point values, and by approximately 230 % for single precision floating point values. The three argument version of Hypot outperforms std::hypot by approximately 190 % for double precision floating point values, and by approximately 320 % for single precision floating point values.
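A short sketch, assuming the constexpr path for Sqrt, like std::sqrt, is exact for perfect squares:

// 144 is a perfect square, so a correctly rounded square root is exact.
constexpr double root = Math::Sqrt( 144.0 );
static_assert( root == 12.0 );

// 3*3 + 4*4 + 12*12 = 169, so the result is exactly 13.
double spaceDiagonal = Math::Hypot( 3.0, 4.0, 12.0 );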
Trigonometric functions
Graphics intensive applications are highly sensitive to the performance of the trigonometric functions, especially for single precision floating point values.
- Sin computes the sine of its argument, given in radians. Sin calls std::sin at runtime for both single and double precision values. The constexpr path for Sin outperforms std::sin by approximately 100 % for single precision floating point values, but performs worse when both the sine and the cosine are calculated for the same value. The maximum deviation between std::sin and the constexpr path for Sin is 1.19182e-07 for single precision floating point values, and 2.22045e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to ((2*pi)+epsilon).
- Cos computes the cosine of its argument, given in radians. Cos calls std::cos at runtime for both single and double precision values. The constexpr path for Cos outperforms std::cos by approximately 110 % for single precision floating point values, but performs worse when both the sine and the cosine are calculated for the same value. The maximum deviation between std::cos and the constexpr path for Cos is 1.19187e-07 for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to ((2*pi)+epsilon).
- Tan computes the tangent of its argument, given in radians. Tan outperforms std::tan by approximately 60 % for single precision floating point values. Tan calls std::tan at runtime for double precision floating point values, with a consistent performance penalty of about 20 % compared to calling std::tan directly. The maximum deviation between std::tan and the constexpr path for Tan is 1.19209e-07 for single precision floating point values, and 2.22045e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -((2*pi)+epsilon) to ((2*pi)+epsilon).
- ASin computes the arc sine of its argument. ASin outperforms std::asin by approximately 20 % for double precision floating point values, and by approximately 30 % for single precision floating point values. The maximum deviation between std::asin and ASin is 2.27673e-07 for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -1.0 to 1.0.
- ACos computes the arc cosine of its argument. ACos outperforms std::acos by approximately 30 % for double precision floating point values, and by approximately 50 % for single precision floating point values. The maximum deviation between std::acos and ACos is 1.19209e-07 for single precision floating point values, and 2.22044e-16 for double precision floating point values, tested for all possible single precision floating point argument values in the range -1.0 to 1.0.
- ATan computes the arc tangent of its argument. ATan outperforms std::atan by approximately 5 % for double precision floating point values, and by approximately 30 % for single precision floating point values. The maximum detected deviation between std::atan and ATan is 0.0 for both single and double precision floating point values, tested with a uniform random distribution of 10'000 values in the range -10'000 to 10'000.
- ATan2 computes the arc tangent of y/x, using the signs of its two arguments to determine the correct quadrant. ATan2 outperforms std::atan2 by approximately 20 % for double precision floating point values, and by approximately 40 % for single precision floating point values. The maximum detected deviation between std::atan2 and ATan2 is 0.0 for both single and double precision floating point values, tested with a uniform random distribution of 10'000 values in the range -10'000 to 10'000.
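A sketch relating the functions above, assuming the usual radian conventions and the std::atan2 argument order:

// sin²x + cos²x should stay within a few ulps of 1.
inline double PythagoreanIdentity( double x )
{
    double s = Math::Sin( x );
    double c = Math::Cos( x );
    return s * s + c * c;
}

// Recover the angle of a 2D vector; ATan2 handles all four quadrants.
inline double Angle( double y, double x )
{
    return Math::ATan2( y, x );   // in (-pi, pi]
}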
Nearest integral value floating point operations
- Ceil computes the nearest integral value not less than the given value. Ceil calls __ceil or __ceilf at runtime.
- Floor computes the nearest integral value not greater than the given value. Floor calls __floor or __floorf at runtime.
- Trunc computes the nearest integral value not greater in magnitude than the given value. Trunc calls __truncf at runtime for single precision floating point numbers, and calls _mm_round_pd for double precision floating point numbers, improving performance by 320 %.
- Round computes the nearest integral value, rounding away from zero in halfway cases. Round calls __roundf at runtime for single precision floating point numbers, and calls _mm_round_pd for double precision floating point numbers, improving performance by 500 %.
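The four functions differ in how they treat negative and halfway values; a small sketch, following the definitions above:

// For x = -2.5:
double towardPositiveInfinity = Math::Ceil( -2.5 );    // -2.0
double towardNegativeInfinity = Math::Floor( -2.5 );   // -3.0
double towardZero = Math::Trunc( -2.5 );               // -2.0
double awayFromZeroOnHalfway = Math::Round( -2.5 );    // -3.0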
Floating point manipulation functions
- FRExp decomposes a number into a significand and a base-2 exponent. FRExp outperforms std::frexp by approximately 450 % for double precision floating point values, and by approximately 550 % for single precision floating point values.
- ModF decomposes a number into integer and fractional parts. ModF outperforms std::modf by approximately 60 % for double precision floating point values, and by approximately 50 % for single precision floating point values.
- ScaleByN multiplies a number by FLT_RADIX raised to a power. ScaleByN outperforms std::scalbn by approximately 90 % for double precision floating point values, and by approximately 170 % for single precision floating point values.
- NextAfter returns the next representable floating point value towards the given value. NextAfter outperforms std::nextafter by approximately 40 % for double precision floating point values, and by approximately 90 % for single precision floating point values.
- NextUp returns the smallest floating point number y of the same type as x such that x < y. If no such y exists, e.g. if x is Inf or NaN, it returns x. The standard C++ implementation is std::nextafter( x, std::numeric_limits<double>::infinity( ) ). NextUp outperforms std::nextafter by approximately 280 % for double precision floating point values, and by approximately 390 % for single precision floating point values.
- NextDown returns the largest floating point number y of the same type as x such that y < x. If no such y exists, e.g. if x is -Inf or NaN, it returns x. The standard C++ implementation is std::nextafter( x, -std::numeric_limits<double>::infinity( ) ). NextDown outperforms std::nextafter by approximately 210 % for double precision floating point values, and by approximately 330 % for single precision floating point values.
- CopySign copies the sign of a floating point value. CopySign outperforms std::copysign by approximately 300 % for double precision floating point values, and by approximately 10 % for single precision floating point values.
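A small sketch, assuming NextUp and NextDown take a single argument and FRExp mirrors the std::frexp signature:

// NextUp and NextDown bracket a value with its immediate neighbours.
double above = Math::NextUp( 1.0 );     // smallest double greater than 1.0
double below = Math::NextDown( 1.0 );   // largest double less than 1.0

// FRExp splits 6.0 into 0.75 * 2^3, mirroring std::frexp.
int exponent = 0;
double significand = Math::FRExp( 6.0, &exponent );   // 0.75 and 3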
Classification and comparison
- IsNaN checks if the given number is NaN. IsNaN calls std::isnan at runtime.
- IsInf checks if the given number is infinite. IsInf calls std::isinf for double precision floating point values, and outperforms std::isinf for single precision floating point values by 40 %.
- SignBit checks if the given number is negative. SignBit outperforms std::signbit by approximately 50 % for both double and single precision floating point values.
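A short sketch, assuming SignBit reports the sign bit exactly like std::signbit:

#include <limits>

// SignBit also reports the sign of negative zero, which a plain
// `x < 0.0` comparison cannot do.
bool negativeZero = Math::SignBit( -0.0 );                                   // true
bool infinite = Math::IsInf( std::numeric_limits<double>::infinity( ) );     // true
bool notANumber = Math::IsNaN( std::numeric_limits<double>::quiet_NaN( ) );  // true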
Other computations
- Clamp returns the value if it is within [minimumValue, maximumValue]; otherwise it returns the nearest boundary. Clamp calls std::clamp at runtime.
- Lerp computes the linear interpolation between a and b if the parameter t is inside [0, 1), and the linear extrapolation otherwise, i.e. the result of a + t * ( b - a ), accounting for floating point calculation imprecision. Lerp calls std::lerp at runtime.
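A brief sketch, assuming the same argument order as std::clamp and std::lerp:

double clamped = Math::Clamp( 1.5, 0.0, 1.0 );   // 1.0, the nearest boundary
double middle = Math::Lerp( 10.0, 20.0, 0.5 );   // 15.0, interpolation
double beyond = Math::Lerp( 10.0, 20.0, 1.5 );   // 25.0, linear extrapolation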
BasicMathBenchmarks Results
-------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations Improvement
-------------------------------------------------------------------------------------------
BenchmarkDoubleGenerator 2.37 ns 1.51 ns 640000000
BenchmarkFloatGenerator 1.80 ns 1.15 ns 640000000
BenchmarkDoubleIsSameValue 2.17 ns 1.57 ns 448000000
BenchmarkFloatIsSameValue 2.69 ns 1.95 ns 344615385
BenchmarkDoubleIsZero 2.64 ns 1.60 ns 497777778
BenchmarkFloatIsZero 1.90 ns 1.15 ns 746666667
BenchmarkDoubleIsNaN 2.63 ns 1.67 ns 373333333
BenchmarkDoubleStdIsNaN 1.70 ns 1.22 ns 640000000
BenchmarkFloatIsNaN 2.36 ns 1.57 ns 497777778
BenchmarkFloatStdIsNaN 1.89 ns 1.05 ns 746666667
BenchmarkDoubleSignum 2.13 ns 1.55 ns 373333333
BenchmarkDoubleNaiveSignum 1.97 ns 1.60 ns 448000000
BenchmarkFloatSignum 2.28 ns 2.01 ns 373333333
BenchmarkFloatNaiveSignum 2.05 ns 1.35 ns 497777778
BenchmarkDoubleDeg2Rad 1.72 ns 1.28 ns 560000000
BenchmarkFloatDeg2Rad 1.46 ns 1.15 ns 640000000
BenchmarkDoubleRad2Deg 1.67 ns 1.12 ns 640000000
BenchmarkFloatRad2Deg 1.51 ns 1.17 ns 746666667
BenchmarkDoubleNextAfter 6.40 ns 5.47 ns 100000000 (( 7.67 - 5.47)/5.47)*100 = 40.22 %
BenchmarkDoubleStdNextAfter 10.2 ns 7.67 ns 89600000
BenchmarkFloatNextAfter 4.42 ns 3.66 ns 179200000 (( 6.98 - 3.66)/3.66)*100 = 90.71 %
BenchmarkFloatStdNextAfter 10.9 ns 6.98 ns 89600000
BenchmarkDoubleInternalSqrt 35.2 ns 24.9 ns 21333333
BenchmarkDoubleSqrt 1.85 ns 1.33 ns 448000000 (( 20.5 - 1.33)/1.33)*100 = 1441.35 %
BenchmarkDoubleStdSqrt 25.8 ns 20.5 ns 37333333
BenchmarkFloatInternalSqrt 13.3 ns 10.2 ns 100000000
BenchmarkFloatSqrt 1.66 ns 1.26 ns 560000000 (( 18.0 - 1.26)/1.26)*100 = 1328.57 %
BenchmarkFloatStdSqrt 23.3 ns 18.0 ns 37333333
BenchmarkDoubleNextDown 2.56 ns 1.97 ns 373333333 (( 6.28 - 1.97)/1.97)*100 = 218.78 %
BenchmarkDoubleStdNextDown 10.6 ns 6.28 ns 89600000
BenchmarkFloatNextDown 2.39 ns 1.81 ns 560000000 (( 7.81 - 1.81)/1.81)*100 = 331.49 %
BenchmarkFloatStdNextDown 10.5 ns 7.81 ns 112000000
BenchmarkDoubleNextUp 2.95 ns 1.88 ns 373333333 (( 7.15 - 1.88)/1.88)*100 = 280.32 %
BenchmarkDoubleStdNextUp 9.68 ns 7.15 ns 89600000
BenchmarkFloatNextUp 2.67 ns 1.90 ns 560000000 (( 9.42 - 1.90)/1.90)*100 = 395.79 %
BenchmarkFloatStdNextUp 12.6 ns 9.42 ns 89600000
BenchmarkDoubleIsInf 1.78 ns 1.36 ns 448000000 (( 1.40 - 1.36)/1.36)*100 = 2.94 %
BenchmarkDoubleStdIsInf 2.00 ns 1.40 ns 448000000
BenchmarkFloatIsInf 1.43 ns 1.03 ns 1000000000 (( 1.48 - 1.03)/1.03)*100 = 43.69 %
BenchmarkFloatStdIsInf 1.88 ns 1.48 ns 896000000
BenchmarkDoubleInternalAbs 1.99 ns 1.51 ns 497777778
BenchmarkDoubleAbs 1.65 ns 1.20 ns 560000000 (( 1.29 - 1.20)/1.20)*100 = 7.5 %
BenchmarkDoubleStdAbs 1.73 ns 1.29 ns 448000000
BenchmarkFloatInternalAbs 2.21 ns 1.19 ns 448000000
BenchmarkFloatAbs 1.62 ns 1.22 ns 497777778 (( 1.26 - 1.22)/1.22)*100 = 3.28 %
BenchmarkFloatStdAbs 1.69 ns 1.26 ns 560000000
BenchmarkDoubleSignBit 1.72 ns 1.29 ns 896000000 (( 1.94 - 1.29)/1.29)*100 = 50.39 %
BenchmarkDoubleStdSignBit 2.94 ns 1.94 ns 298666667
BenchmarkFloatSignBit 1.92 ns 1.40 ns 448000000 (( 2.20 - 1.40)/1.40)*100 = 57.14 %
BenchmarkFloatStdSignBit 2.99 ns 2.20 ns 298666667
BenchmarkDoubleFRExp 3.26 ns 2.29 ns 320000000 (( 12.7 - 2.29)/2.29)*100 = 454.59 %
BenchmarkDoubleStdFRExp 16.9 ns 12.7 ns 89600000
BenchmarkFloatFRExp 3.09 ns 2.02 ns 448000000 (( 13.5 - 2.02)/2.02)*100 = 568.32 %
BenchmarkFloatStdFRExp 19.7 ns 13.5 ns 49777778
BenchmarkDoubleModF 3.67 ns 2.58 ns 344615385 (( 4.14 - 2.58)/2.58)*100 = 60.47 %
BenchmarkDoubleStdModF 6.09 ns 4.14 ns 248888889
BenchmarkFloatModF 3.51 ns 2.26 ns 373333333 (( 3.35 - 2.26)/2.26)*100 = 48.23 %
BenchmarkFloatStdModF 5.09 ns 3.35 ns 186666667
BenchmarkDoubleMin 1.70 ns 1.52 ns 576735632 (( 1.50 - 1.52)/1.52)*100 = -1.32 %
BenchmarkDoubleStdMin 1.78 ns 1.50 ns 407272727
BenchmarkFloatMin 1.80 ns 1.26 ns 448000000 (( 1.33 - 1.26)/1.26)*100 = 5.56 %
BenchmarkFloatStdMin 1.65 ns 1.33 ns 448000000
BenchmarkDoubleMax 1.61 ns 1.39 ns 640000000 (( 1.37 - 1.39)/1.39)*100 = -1.43 %
BenchmarkDoubleStdMax 1.72 ns 1.37 ns 593798817
BenchmarkFloatMax 1.68 ns 1.14 ns 560000000 (( 1.46 - 1.14)/1.14)*100 = 28.07 %
BenchmarkFloatStdMax 1.86 ns 1.46 ns 373333333
BenchmarkDoubleTrunc 1.72 ns 1.45 ns 344615385 (( 6.09 - 1.45)/1.45)*100 = 320 %
BenchmarkDoubleStdTrunc 8.13 ns 6.09 ns 100000000
BenchmarkFloatTrunc 1.66 ns 1.38 ns 407272727 (( 1.31 - 1.38)/1.38)*100 = -5.07 %
BenchmarkFloatStdTrunc 1.63 ns 1.31 ns 560000000
BenchmarkDoubleFloor 1.77 ns 1.33 ns 448000000 (( 1.09 - 1.33)/1.33)*100 = -18.04 %
BenchmarkDoubleStdFloor 1.49 ns 1.09 ns 560000000
BenchmarkFloatFloor 1.67 ns 1.31 ns 560000000 (( 1.16 - 1.31)/1.31)*100 = -11.45 %
BenchmarkFloatStdFloor 1.69 ns 1.16 ns 497777778
BenchmarkDoubleCeil 1.75 ns 1.23 ns 746666667 (( 1.40 - 1.23)/1.23)*100 = 13.82
BenchmarkDoubleStdCeil 1.87 ns 1.40 ns 560000000
BenchmarkFloatCeil 1.82 ns 1.24 ns 669013333 (( 1.27 - 1.24)/1.24)*100 = 2.42 %
BenchmarkFloatStdCeil 2.00 ns 1.27 ns 640000000
BenchmarkDoubleRound 1.78 ns 1.39 ns 640000000 (( 9.63 - 1.39)/1.39)*100 = 592.8 %
BenchmarkDoubleStdRound 11.9 ns 9.63 ns 74666667
BenchmarkFloatRound 1.72 ns 1.28 ns 560000000 (( 2.09 - 1.28)/1.28)*100 = 63.28 %
BenchmarkFloatStdRound 2.47 ns 2.09 ns 373333333
BenchmarkDoubleClamp 2.23 ns 1.46 ns 448000000 (( 1.67 - 1.46)/1.46)*100 = 14.38 %
BenchmarkDoubleStdClamp 2.04 ns 1.67 ns 746666667
BenchmarkFloatClamp 2.09 ns 1.59 ns 560000000 equal
BenchmarkFloatStdClamp 2.41 ns 1.59 ns 344615385
BenchmarkDoubleInternalLerpImpl 3.43 ns 2.46 ns 280000000 (( 8.23 - 2.46)/2.46)*100 = 234.55 %
BenchmarkDoubleLerp 8.67 ns 6.17 ns 111502223 (( 8.23 - 6.17)/6.17)*100 = 33.39 %
BenchmarkDoubleStdLerp 8.73 ns 8.23 ns 112000000
BenchmarkFloatInternalLerpImpl 3.24 ns 2.58 ns 224000000 (( 7.53 - 2.15)/2.15)*100 = 250.23 %
BenchmarkFloatLerp 8.15 ns 7.53 ns 112000000 (( 7.53 - 7.53)/7.53)*100 = 0 %
BenchmarkFloatStdLerp 8.67 ns 7.53 ns 112000000
BenchmarkDoubleCopySign 1.75 ns 1.26 ns 497777778 (( 5.16 - 1.26)/1.26)*100 = 309.52 %
BenchmarkDoubleStdCopySign 6.30 ns 5.16 ns 112000000
BenchmarkFloatCopySign 1.69 ns 1.23 ns 407272727 (( 1.36 - 1.23)/1.23)*100 = 10.57 %
BenchmarkFloatStdCopySign 1.77 ns 1.36 ns 448000000
BenchmarkDoubleScaleByN 4.29 ns 3.68 ns 203636364 (( 7.15 - 3.68)/3.68)*100 = 94.29 %
BenchmarkDoubleStdScaleByN 10.4 ns 7.15 ns 89600000
BenchmarkFloatScaleByN 3.37 ns 2.62 ns 280000000 (( 7.26 - 2.62)/2.62)*100 = 177.1 %
BenchmarkFloatStdScaleByN 10.1 ns 7.26 ns 92490323
BenchmarkDoubleFMod 9.71 ns 6.25 ns 100000000 (( 6.98 - 6.25)/6.25)*100 = 11.68 %
BenchmarkDoubleStdFMod 10.2 ns 6.98 ns 89600000
BenchmarkFloatFMod 10.2 ns 8.54 ns 64000000 (( 9.00 - 8.54)/8.54)*100 = 5.39 %
BenchmarkFloatStdFMod 10.5 ns 9.00 ns 74666667
BenchmarkDoubleInternalExpImpl 4.10 ns 3.31 ns 179200000
BenchmarkDoubleExp 4.38 ns 3.52 ns 248888889 (( 47.4 - 3.52)/3.52)*100 = 1246.59 %
BenchmarkDoubleStdExp 53.1 ns 47.4 ns 11200000
BenchmarkFloatInternalExpImpl 4.98 ns 3.81 ns 172307692
BenchmarkFloatExp 4.75 ns 3.72 ns 172307692 (( 40.8 - 3.72)/3.72)*100 = 996.77 %
BenchmarkFloatStdExp 53.0 ns 40.8 ns 14933333
BenchmarkDoubleInternalHypot 7.47 ns 5.16 ns 112000000
BenchmarkDoubleHypot 1.79 ns 1.54 ns 640000000 (( 5.72 - 1.54)/1.54)*100 = 271.42 %
BenchmarkDoubleStdHypot 6.90 ns 5.72 ns 112000000
BenchmarkFloatInternalHypot 6.35 ns 5.72 ns 112000000
BenchmarkFloatHypot 2.00 ns 1.51 ns 497777778 (( 5.00 - 1.51)/1.51)*100 = 231.13 %
BenchmarkFloatStdHypot 6.95 ns 5.00 ns 100000000
BenchmarkDoubleHypot3 3.28 ns 2.23 ns 280000000 (( 6.56 - 2.23)/2.23)*100 = 194.17 %
BenchmarkDoubleStdHypot3 7.20 ns 6.56 ns 112000000
BenchmarkFloatHypot3 2.04 ns 1.46 ns 448000000 (( 6.26 - 1.46)/1.46)*100 = 328.77 %
BenchmarkFloatStdHypot3 6.81 ns 6.26 ns 172307692
BenchmarkDoubleInternalLog 7.62 ns 5.87 ns 138416552
BenchmarkDoubleLog 7.05 ns 5.72 ns 112000000 (( 20.9 - 5.72)/5.72)*100 = 265.38 %
BenchmarkDoubleStdLog 29.8 ns 20.9 ns 34461538
BenchmarkFloatInternalLog 6.12 ns 5.16 ns 112000000
BenchmarkFloatLog 5.79 ns 4.74 ns 112000000 (( 23.4 - 4.74)/4.74)*100 = 393.67 %
BenchmarkFloatStdLog 29.3 ns 23.4 ns 28000000
BenchmarkDoubleInternalLog2 8.22 ns 6.00 ns 112000000
BenchmarkDoubleLog2 8.36 ns 6.63 ns 89600000 (( 24.7 - 6.63)/6.63)*100 = 272.55 %
BenchmarkDoubleStdLog2 35.2 ns 24.7 ns 37333333
BenchmarkFloatInternalLog2 6.04 ns 4.19 ns 149333333
BenchmarkFloatLog2 6.29 ns 5.62 ns 100000000 (( 20.1 - 5.62)/5.62)*100 = 257.65 %
BenchmarkFloatStdLog2 30.1 ns 20.1 ns 28000000
BenchmarkDoubleInternalLog10 8.37 ns 7.32 ns 89600000
BenchmarkDoubleLog10 8.27 ns 6.28 ns 112000000 (( 28.9 - 6.28)/6.28)*100 = 360.19 %
BenchmarkDoubleStdLog10 32.5 ns 28.9 ns 24888889
BenchmarkFloatInternalLog10 6.59 ns 4.96 ns 154482759
BenchmarkFloatLog10 6.31 ns 4.74 ns 112000000 (( 23.4 - 4.74)/4.74)*100 = 393.67 %
BenchmarkFloatStdLog10 29.9 ns 23.4 ns 32000000
BenchmarkDoubleInternalSin 10.7 ns 8.37 ns 74666667
BenchmarkDoubleSin 7.43 ns 5.45 ns 152048486 (( 6.70 - 5.45)/5.45)*100 = 22.94 %
BenchmarkDoubleStdSin 7.73 ns 6.70 ns 112000000
BenchmarkFloatInternalSin 4.76 ns 4.01 ns 179200000
BenchmarkFloatSin 4.81 ns 3.45 ns 248888889 (( 6.98 - 3.45)/3.45)*100 = 102.32 %
BenchmarkFloatStdSin 7.75 ns 6.98 ns 112000000
BenchmarkDoubleInternalCos 9.63 ns 7.97 ns 100000000
BenchmarkDoubleCos 8.62 ns 6.80 ns 89600000 (( 7.19 - 6.80)/6.80)*100 = 5.73 %
BenchmarkDoubleStdCos 8.53 ns 7.19 ns 100000000
BenchmarkFloatInternalCos 5.67 ns 4.17 ns 194782609
BenchmarkFloatCos 4.68 ns 3.59 ns 208849115 (( 7.66 - 3.59)/3.59)*100 = 113.37 %
BenchmarkFloatStdCos 8.58 ns 7.66 ns 100000000
BenchmarkDoubleInternalTan 17.2 ns 14.8 ns 56000000
BenchmarkDoubleTan 12.3 ns 9.49 ns 56000000 (( 5.93 - 9.49)/9.49)*100 = -37.51 %
BenchmarkDoubleStdTan 8.39 ns 5.93 ns 89600000
BenchmarkFloatInternalTan 5.79 ns 4.88 ns 112000000
BenchmarkFloatTan 5.73 ns 5.08 ns 160000000 (( 8.23 - 5.08)/5.08)*100 = 62.0 %
BenchmarkFloatStdTan 9.75 ns 8.23 ns 112000000
BenchmarkDoubleInternalATan 8.41 ns 6.25 ns 100000000
BenchmarkDoubleATan 8.17 ns 5.62 ns 100000000 (( 6.10 - 5.62)/5.62)*100 = 8.43 %
BenchmarkDoubleStdATan 8.59 ns 6.10 ns 89600000
BenchmarkFloatInternalATan 5.61 ns 3.48 ns 165925926
BenchmarkFloatATan 5.89 ns 4.00 ns 160000000 (( 5.31 - 4.00)/4.00)*100 = 32.75 %
BenchmarkFloatStdATan 7.01 ns 5.31 ns 100000000
BenchmarkDoubleInternalASin 8.99 ns 6.28 ns 89600000
BenchmarkDoubleASin 8.63 ns 5.78 ns 100000000 (( 6.98 - 5.78)/5.78)*100 = 20.76 %
BenchmarkDoubleStdASin 8.71 ns 6.98 ns 89600000
BenchmarkFloatInternalASin 4.69 ns 3.90 ns 172307692
BenchmarkFloatASin 4.91 ns 3.61 ns 194782609 (( 4.74 - 3.61)/3.61)*100 = 31.30 %
BenchmarkFloatStdASin 7.07 ns 4.74 ns 112000000
BenchmarkDoubleInternalACos 7.37 ns 6.25 ns 100000000
BenchmarkDoubleACos 7.15 ns 5.00 ns 100000000 (( 6.80 - 5.00)/5.00)*100 = 36.0 %
BenchmarkDoubleStdACos 8.74 ns 6.80 ns 89600000
BenchmarkFloatInternalACos 5.72 ns 5.16 ns 100000000
BenchmarkFloatACos 5.39 ns 3.72 ns 192984615 (( 5.86 - 3.72)/3.72)*100 = 57.53 %
BenchmarkFloatStdACos 6.93 ns 5.86 ns 112000000
BenchmarkDoubleInternalATan2 12.7 ns 10.8 ns 89600000
BenchmarkDoubleATan2 13.8 ns 12.2 ns 44800000 (( 15.1 - 12.2)/12.2)*100 = 23.77 %
BenchmarkDoubleStdATan2 20.9 ns 15.1 ns 37333333
BenchmarkFloatInternalATan2 10.4 ns 7.32 ns 74666667
BenchmarkFloatATan2 9.75 ns 7.95 ns 112000000 (( 11.6 - 7.95)/7.95)*100 = 45.91 %
BenchmarkFloatStdATan2 13.4 ns 11.6 ns 49777778