PBRTO: a micro-optimized ray tracing app
PBRTO is a micro-optimized version of PBRT-v4, under development as an example of how the functionality in HCCMath.h, HCCSIMD.h and HCCVectorMath.h can be used to optimize the performance of real, computationally intensive apps. It is now about 35 % faster than the release build of the original PBRT.
PBRT is a well-written piece of software, with a code base that is, for such a complex undertaking, easy to understand and modify. Analyzing the performance of the original code is both instructive and interesting, as many of PBRT’s mathematical functions perform very well.
Most of the solution is contained within the Harlinn.pbrto project. This project builds PBRT as a dynamic link library, which makes it easy to create benchmarks for the original PBRT code.
Harlinn.pbrto can be built with or without the functionality from HCCMath.h, HCCSIMD.h and HCCVectorMath.h. To build without it, make sure PBRT_USES_HCCMATH is not defined in pbrtodef.h. At this stage, it may not always build with the functionality from HCCMath.h, HCCSIMD.h and HCCVectorMath.h.
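Selecting between the two builds is ordinary conditional compilation. A minimal sketch of the pattern, assuming pbrtodef.h tests the macro with `#ifdef`; the `kUsesHccMath` constant is hypothetical and only makes the selected branch visible:

```cpp
// Illustrative excerpt in the style of pbrtodef.h; the real file may differ.
// With the next line commented out, PBRT_USES_HCCMATH is undefined and the
// original PBRT code paths are compiled.
// #define PBRT_USES_HCCMATH

#ifdef PBRT_USES_HCCMATH
constexpr bool kUsesHccMath = true;   // HCCMath.h, HCCSIMD.h and HCCVectorMath.h code paths
#else
constexpr bool kUsesHccMath = false;  // original PBRT code paths
#endif
```

With `#ifdef`, even `#define PBRT_USES_HCCMATH 0` counts as defined, so the define has to be removed or commented out, and the macro must not be passed on the compiler command line either.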
PBRT has its own set of optimized mathematical functions, and it’s only when we start comparing the vector and matrix operations that it becomes clear that significant performance improvements are possible.
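One plausible reason the 4×4 operations leave so much room for improvement is that each row of a 4×4 `float` matrix fits exactly in one 128-bit SSE register. A minimal sketch, assuming an x86-64 target and a hypothetical row-major `Matrix4x4` type (not HCCMath's or PBRT's): addition becomes four vector additions, one per row, and the standard `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` transposes the matrix with a few register shuffles.

```cpp
#include <xmmintrin.h>  // SSE intrinsics, including the _MM_TRANSPOSE4_PS macro

// Hypothetical row-major 4x4 matrix; each 16-byte row is 16-byte aligned.
struct Matrix4x4 {
    alignas(16) float m[4][4];
};

// Adds two matrices one row (four floats) at a time.
inline Matrix4x4 Add(const Matrix4x4& a, const Matrix4x4& b) {
    Matrix4x4 r;
    for (int row = 0; row < 4; ++row) {
        __m128 va = _mm_load_ps(a.m[row]);
        __m128 vb = _mm_load_ps(b.m[row]);
        _mm_store_ps(r.m[row], _mm_add_ps(va, vb));
    }
    return r;
}

// Loads the four rows into registers, shuffles them into columns, stores them back.
inline Matrix4x4 Transpose(const Matrix4x4& a) {
    __m128 r0 = _mm_load_ps(a.m[0]);
    __m128 r1 = _mm_load_ps(a.m[1]);
    __m128 r2 = _mm_load_ps(a.m[2]);
    __m128 r3 = _mm_load_ps(a.m[3]);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  // r0..r3 now hold the columns of a
    Matrix4x4 r;
    _mm_store_ps(r.m[0], r0);
    _mm_store_ps(r.m[1], r1);
    _mm_store_ps(r.m[2], r2);
    _mm_store_ps(r.m[3], r3);
    return r;
}
```

The aligned loads and stores rely on the 16-byte alignment that `alignas(16)` provides here; unaligned data would need `_mm_loadu_ps` and `_mm_storeu_ps` instead.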
Operations on \(4 \times 4\) matrices:
Operation | PBRT | Math | Improvement |
---|---|---|---|
Addition | 8.37 ns | 1.93 ns | 333 % |
Transpose | 5.00 ns | 1.95 ns | 156.4 % |
Scalar multiplication | 2.72 ns | 1.46 ns | 86.3 % |
Matrix multiplication | 26.1 ns | 4.10 ns | 536.6 % |
Determinant | 11.0 ns | 2.18 ns | 404.6 % |
Inverse | 126 ns | 12.3 ns | 924.4 % |
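The Improvement column agrees with the ratio of the PBRT time to the Math time, minus one, expressed as a percentage. For 4×4 addition:

\[
\text{Improvement} = \left(\frac{t_{\text{PBRT}}}{t_{\text{Math}}} - 1\right) \times 100\,\%,
\qquad
\left(\frac{8.37\ \text{ns}}{1.93\ \text{ns}} - 1\right) \times 100\,\% \approx 333.7\,\%
\]

A value of 924.4 % for the 4×4 inverse therefore means the Math version is about 10.2 times as fast, while a negative value, such as −24.2 % for the 2×2 determinant, means it is slower.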
Operations on \(3 \times 3\) matrices:
Operation | PBRT | Math | Improvement |
---|---|---|---|
Addition | 8.54 ns | 2.92 ns | 192.5 % |
Transpose | 6.10 ns | 2.83 ns | 115.5 % |
Scalar multiplication | 5.44 ns | 2.18 ns | 149.5 % |
Matrix multiplication | 15.5 ns | 5.16 ns | 200.4 % |
Determinant | 5.27 ns | 3.08 ns | 71.1 % |
Inverse | 18.6 ns | 4.43 ns | 319.9 % |
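A 3×3 determinant expands into three 2×2 minors, each a difference of products, and both libraries benchmark a DifferenceOfProducts function. A minimal sketch of the expansion, using Kahan's FMA-based difference of products to limit cancellation error; it is an illustration, not code taken from HCCMath or PBRT:

```cpp
#include <cmath>

// Computes a*b - c*d. The second fma recovers the rounding error of c*d,
// which is then added back (Kahan's algorithm).
inline float DifferenceOfProducts(float a, float b, float c, float d) {
    float cd = c * d;
    float dop = std::fma(a, b, -cd);  // a*b - cd with a single rounding
    float err = std::fma(-c, d, cd);  // cd - c*d: error introduced by rounding c*d
    return dop + err;
}

// Determinant of a row-major 3x3 matrix by cofactor expansion along the first row.
inline float Determinant3x3(const float m[3][3]) {
    float minor0 = DifferenceOfProducts(m[1][1], m[2][2], m[1][2], m[2][1]);
    float minor1 = DifferenceOfProducts(m[1][0], m[2][2], m[1][2], m[2][0]);
    float minor2 = DifferenceOfProducts(m[1][0], m[2][1], m[1][1], m[2][0]);
    return m[0][0] * minor0 - m[0][1] * minor1 + m[0][2] * minor2;
}
```

The two extra `fma` calls per minor trade a little speed for accuracy, which is worth keeping in mind when comparing determinant timings.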
Operations on \(2 \times 2\) matrices:
Operation | PBRT | Math | Improvement |
---|---|---|---|
Addition | 2.69 ns | 1.90 ns | 41.6 % |
Transpose | 3.85 ns | 1.37 ns | 181.0 % |
Scalar multiplication | 1.84 ns | 1.35 ns | 36.3 % |
Matrix multiplication | 4.52 ns | 1.88 ns | 140.4 % |
Determinant | 1.19 ns | 1.57 ns | -24.2 % |
Inverse | 41.7 ns | 1.77 ns | 2255.9 % |
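A 2×2 matrix has a closed-form inverse: one determinant, one reciprocal and four scaled entries. A minimal sketch with a hypothetical `Matrix2x2` type, not taken from HCCMath or PBRT, which gives a sense of how little work the operation requires:

```cpp
#include <optional>

// Hypothetical row-major 2x2 matrix.
struct Matrix2x2 {
    float m[2][2];
};

// Closed-form inverse: swap the diagonal entries, negate the off-diagonal
// entries and scale everything by 1/det. Only an exactly zero determinant
// is treated as singular; nearly singular input yields very large entries.
inline std::optional<Matrix2x2> Inverse(const Matrix2x2& a) {
    const float det = a.m[0][0] * a.m[1][1] - a.m[0][1] * a.m[1][0];
    if (det == 0.0f) {
        return std::nullopt;
    }
    const float invDet = 1.0f / det;
    return Matrix2x2{{{ a.m[1][1] * invDet, -a.m[0][1] * invDet},
                      {-a.m[1][0] * invDet,  a.m[0][0] * invDet}}};
}
```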
Complete benchmark results, where Time is the wall-clock time and CPU the CPU time per iteration:
Benchmark | Time | CPU | Iterations |
---|---|---|---|
BenchmarkDoubleGenerator | 1.63 ns | 0.952 ns | 640000000 |
BenchmarkFloatGenerator | 1.90 ns | 1.53 ns | 407272727 |
BenchmarkDoubleIsSameValue | 1.98 ns | 1.37 ns | 560000000 |
BenchmarkFloatIsSameValue | 1.43 ns | 1.12 ns | 640000000 |
BenchmarkDoubleIsZero | 1.65 ns | 1.29 ns | 640000000 |
BenchmarkFloatIsZero | 2.11 ns | 1.65 ns | 407272727 |
BenchmarkDoubleIsNaN | 1.45 ns | 1.10 ns | 640000000 |
BenchmarkDoubleOpenLibMIsNaN | 2.36 ns | 1.90 ns | 280000000 |
BenchmarkDoubleStdIsNaN | 1.38 ns | 1.07 ns | 640000000 |
BenchmarkDoublePbrtoIsNaN | 1.40 ns | 0.977 ns | 640000000 |
BenchmarkFloatIsNaN | 1.73 ns | 1.34 ns | 640000000 |
BenchmarkFloatOpenLibMIsNaN | 2.68 ns | 1.99 ns | 344615385 |
BenchmarkFloatStdIsNaN | 1.62 ns | 1.33 ns | 1000000000 |
BenchmarkFloatPbrtoIsNaN | 1.72 ns | 1.41 ns | 497777778 |
BenchmarkDoubleSignum | 2.08 ns | 1.60 ns | 448000000 |
BenchmarkDoubleOpenLibMSignbit | 2.10 ns | 1.79 ns | 497777778 |
BenchmarkDoubleNaiveSignum | 1.82 ns | 1.50 ns | 448000000 |
BenchmarkFloatSignum | 2.27 ns | 1.90 ns | 560000000 |
BenchmarkFloatOpenLibMSignbit | 1.81 ns | 1.56 ns | 640000000 |
BenchmarkFloatNaiveSignum | 2.13 ns | 1.67 ns | 560000000 |
BenchmarkDoubleDeg2Rad | 1.59 ns | 1.40 ns | 825942387 |
BenchmarkDoublePbrtRadians | 1.74 ns | 1.29 ns | 448000000 |
BenchmarkFloatDeg2Rad | 1.39 ns | 1.19 ns | 746666667 |
BenchmarkFloatPbrtRadians | 1.41 ns | 1.10 ns | 640000000 |
BenchmarkDoubleRad2Deg | 1.63 ns | 1.29 ns | 640000000 |
BenchmarkDoublePbrtDegrees | 1.48 ns | 1.22 ns | 640000000 |
BenchmarkFloatRad2Deg | 1.62 ns | 1.19 ns | 497777778 |
BenchmarkFloatPbrtDegrees | 1.38 ns | 1.12 ns | 448000000 |
BenchmarkDoubleNextAfter | 8.80 ns | 7.74 ns | 74666667 |
BenchmarkDoubleOpenLibMNextAfter | 8.83 ns | 7.53 ns | 112000000 |
BenchmarkDoubleStdNextAfter | 11.9 ns | 10.0 ns | 100000000 |
BenchmarkFloatNextAfter | 2.00 ns | 1.85 ns | 497777778 |
BenchmarkFloatOpenLibMNextAfter | 2.26 ns | 1.86 ns | 320000000 |
BenchmarkFloatStdNextAfter | 6.38 ns | 5.62 ns | 100000000 |
BenchmarkDoubleInternalOpenLibMSqrt | 33.1 ns | 23.1 ns | 26352941 |
BenchmarkDoubleSqrt | 1.86 ns | 1.34 ns | 560000000 |
BenchmarkDoubleStdSqrt | 28.7 ns | 22.2 ns | 37333333 |
BenchmarkDoublePbrtSqrt | 26.1 ns | 22.3 ns | 40727273 |
BenchmarkFloatInternalOpenLibMSqrt | 2.48 ns | 1.97 ns | 373333333 |
BenchmarkFloatSqrt | 1.35 ns | 1.26 ns | 497777778 |
BenchmarkFloatStdSqrt | 2.47 ns | 1.88 ns | 373333333 |
BenchmarkFloatPbrtSqrt | 2.20 ns | 1.99 ns | 407272727 |
BenchmarkDoubleNextDown | 2.24 ns | 1.84 ns | 407272727 |
BenchmarkDoubleStdNextDown | 12.0 ns | 10.7 ns | 112000000 |
BenchmarkFloatNextDown | 2.20 ns | 1.84 ns | 407272727 |
BenchmarkFloatStdNextDown | 6.06 ns | 5.31 ns | 100000000 |
BenchmarkFloatPbrtNextFloatDown | 2.73 ns | 2.43 ns | 263529412 |
BenchmarkDoubleNextUp | 4.24 ns | 3.26 ns | 172307692 |
BenchmarkDoubleStdNextUp | 10.9 ns | 9.63 ns | 74666667 |
BenchmarkFloatNextUp | 2.26 ns | 1.76 ns | 320000000 |
BenchmarkFloatStdNextUp | 6.84 ns | 5.78 ns | 100000000 |
BenchmarkFloatPbrtNextFloatUp | 3.03 ns | 1.95 ns | 344615385 |
BenchmarkDoubleIsInf | 1.64 ns | 1.20 ns | 560000000 |
BenchmarkDoubleOpenLibMIsInf | 2.17 ns | 1.69 ns | 407272727 |
BenchmarkDoubleStdIsInf | 1.71 ns | 1.30 ns | 407272727 |
BenchmarkFloatIsInf | 1.62 ns | 1.36 ns | 746666667 |
BenchmarkFloatOpenLibMIsInf | 2.27 ns | 1.71 ns | 448000000 |
BenchmarkFloatStdIsInf | 1.69 ns | 1.30 ns | 407272727 |
BenchmarkFloatPbrtIsInf | 1.55 ns | 1.41 ns | 497777778 |
BenchmarkDoubleInternalAbs | 1.82 ns | 1.46 ns | 407272727 |
BenchmarkDoubleAbs | 1.55 ns | 1.44 ns | 497777778 |
BenchmarkDoubleOpenLibMAbs | 1.40 ns | 1.13 ns | 746666667 |
BenchmarkDoubleStdAbs | 1.67 ns | 1.35 ns | 497777778 |
BenchmarkDoublePbrtAbs | 1.68 ns | 1.34 ns | 896000000 |
BenchmarkFloatInternalAbs | 2.21 ns | 2.02 ns | 448000000 |
BenchmarkFloatAbs | 1.31 ns | 0.949 ns | 560000000 |
BenchmarkFloatOpenLibMAbs | 2.13 ns | 1.71 ns | 448000000 |
BenchmarkFloatStdAbs | 1.26 ns | 1.06 ns | 560000000 |
BenchmarkFloatPbrtAbs | 1.33 ns | 1.03 ns | 640000000 |
BenchmarkDoubleSignBit | 1.68 ns | 1.29 ns | 497777778 |
BenchmarkDoubleOpenLibMSignBit | 2.20 ns | 1.76 ns | 407272727 |
BenchmarkDoubleStdSignBit | 2.66 ns | 2.22 ns | 344615385 |
BenchmarkFloatSignBit | 1.49 ns | 1.32 ns | 640000000 |
BenchmarkFloatOpenLibMSignBit | 1.64 ns | 1.20 ns | 560000000 |
BenchmarkFloatStdSignBit | 2.80 ns | 2.10 ns | 320000000 |
BenchmarkDoubleFRExp | 3.33 ns | 2.77 ns | 276833103 |
BenchmarkDoubleOpenLibMFRExp | 2.84 ns | 2.25 ns | 263529412 |
BenchmarkDoubleStdFRExp | 18.0 ns | 15.7 ns | 74666667 |
BenchmarkFloatFRExp | 2.00 ns | 1.73 ns | 497777778 |
BenchmarkFloatOpenLibMFRExp | 2.01 ns | 1.73 ns | 497777778 |
BenchmarkFloatStdFRExp | 12.9 ns | 11.0 ns | 64000000 |
BenchmarkDoubleModF | 2.66 ns | 1.83 ns | 298666667 |
BenchmarkDoubleOpenLibMModF | 2.67 ns | 2.36 ns | 344615385 |
BenchmarkDoubleStdModF | 3.24 ns | 2.93 ns | 224000000 |
BenchmarkFloatingPointDoubleModF | 2.53 ns | 2.38 ns | 407272727 |
BenchmarkFloatModF | 2.21 ns | 2.00 ns | 320000000 |
BenchmarkFloatOpenLibMModF | 2.31 ns | 1.90 ns | 344615385 |
BenchmarkFloatingPointFloatModF | 2.56 ns | 2.01 ns | 373333333 |
BenchmarkFloatStdModF | 4.17 ns | 3.45 ns | 203636364 |
BenchmarkDoubleMin | 1.89 ns | 1.64 ns | 448000000 |
BenchmarkDoubleOpenLibMMin | 2.46 ns | 2.05 ns | 320000000 |
BenchmarkDoubleStdMin | 2.10 ns | 1.51 ns | 373333333 |
BenchmarkFloatMin | 1.67 ns | 1.44 ns | 640000000 |
BenchmarkFloatOpenLibMMin | 2.34 ns | 1.85 ns | 448000000 |
BenchmarkFloatStdMin | 2.22 ns | 1.90 ns | 344615385 |
BenchmarkDoubleMax | 1.81 ns | 1.46 ns | 373333333 |
BenchmarkDoubleOpenLibMMax | 2.37 ns | 2.10 ns | 497777778 |
BenchmarkDoubleStdMax | 2.26 ns | 1.81 ns | 344615385 |
BenchmarkFloatMax | 1.69 ns | 1.48 ns | 560000000 |
BenchmarkFloatOpenLibMMax | 2.35 ns | 1.68 ns | 344615385 |
BenchmarkFloatStdMax | 2.06 ns | 1.61 ns | 320000000 |
BenchmarkDoubleTrunc | 1.65 ns | 1.28 ns | 560000000 |
BenchmarkDoubleOpenLibMTrunc | 2.59 ns | 2.35 ns | 298666667 |
BenchmarkFloatingPointDoubleTrunc | 1.92 ns | 1.68 ns | 344615385 |
BenchmarkDoubleStdTrunc | 4.62 ns | 3.37 ns | 213333333 |
BenchmarkFloatTrunc | 1.56 ns | 1.29 ns | 498024814 |
BenchmarkFloatOpenLibMTrunc | 2.25 ns | 2.00 ns | 320000000 |
BenchmarkFloatingPointFloatTrunc | 2.12 ns | 1.79 ns | 560000000 |
BenchmarkFloatStdTrunc | 1.75 ns | 1.59 ns | 560000000 |
BenchmarkDoubleFloor | 1.37 ns | 1.12 ns | 640000000 |
BenchmarkDoubleOpenLibMFloor | 3.41 ns | 2.67 ns | 298666667 |
BenchmarkFloatingPointDoubleFloor | 3.68 ns | 3.08 ns | 213333333 |
BenchmarkDoubleStdFloor | 1.47 ns | 1.23 ns | 560000000 |
BenchmarkFloatFloor | 1.47 ns | 1.17 ns | 560000000 |
BenchmarkFloatOpenLibMFloor | 2.95 ns | 2.07 ns | 248888889 |
BenchmarkFloatingPointFloatFloor | 2.94 ns | 2.37 ns | 263529412 |
BenchmarkFloatStdFloor | 1.47 ns | 1.31 ns | 560000000 |
BenchmarkDoubleCeil | 1.56 ns | 1.31 ns | 896000000 |
BenchmarkDoubleOpenLibMCeil | 2.73 ns | 2.16 ns | 448000000 |
BenchmarkFloatingPointDoubleCeil | 3.72 ns | 3.29 ns | 194782609 |
BenchmarkDoubleStdCeil | 1.49 ns | 1.12 ns | 448000000 |
BenchmarkFloatCeil | 1.75 ns | 1.19 ns | 448000000 |
BenchmarkFloatOpenLibMCeil | 2.17 ns | 1.45 ns | 344615385 |
BenchmarkFloatingPointFloatCeil | 1.94 ns | 1.70 ns | 497777778 |
BenchmarkFloatStdCeil | 1.51 ns | 1.32 ns | 746666667 |
BenchmarkDoubleRound | 1.57 ns | 1.39 ns | 640000000 |
BenchmarkDoubleOpenLibMRound | 4.65 ns | 3.84 ns | 154482759 |
BenchmarkFloatingPointDoubleRound | 3.25 ns | 2.39 ns | 235789474 |
BenchmarkDoubleStdRound | 7.44 ns | 6.14 ns | 112000000 |
BenchmarkFloatRound | 2.07 ns | 1.79 ns | 560000000 |
BenchmarkFloatOpenLibMRound | 2.25 ns | 1.91 ns | 497777778 |
BenchmarkFloatingPointFloatRound | 2.91 ns | 2.49 ns | 344615385 |
BenchmarkFloatStdRound | 2.20 ns | 1.63 ns | 373333333 |
BenchmarkDoubleClamp | 2.41 ns | 2.15 ns | 407272727 |
BenchmarkDoubleStdClamp | 1.95 ns | 1.61 ns | 320000000 |
BenchmarkFloatClamp | 2.06 ns | 1.57 ns | 497777778 |
BenchmarkFloatStdClamp | 2.09 ns | 1.69 ns | 407272727 |
BenchmarkFloatPbrtClamp | 1.94 ns | 1.57 ns | 448000000 |
BenchmarkDoubleInternalLerpImpl | 3.22 ns | 2.68 ns | 280000000 |
BenchmarkDoubleLerp | 4.40 ns | 3.72 ns | 172307692 |
BenchmarkDoubleStdLerp | 2.95 ns | 2.43 ns | 263529412 |
BenchmarkFloatLerp | 4.29 ns | 3.63 ns | 224000000 |
BenchmarkFloatInternalLerpImpl | 2.42 ns | 2.18 ns | 280000000 |
BenchmarkFloatStdLerp | 2.64 ns | 2.30 ns | 298666667 |
BenchmarkFloatPbrtLerp | 1.58 ns | 1.17 ns | 746666667 |
BenchmarkDoubleCopySign | 1.65 ns | 1.23 ns | 560000000 |
BenchmarkDoubleOpenLibMCopySign | 1.67 ns | 1.56 ns | 640000000 |
BenchmarkDoubleStdCopySign | 6.14 ns | 5.62 ns | 100000000 |
BenchmarkDoublePbrtCopySign | 6.25 ns | 5.16 ns | 100000000 |
BenchmarkFloatCopySign | 1.69 ns | 1.53 ns | 448000000 |
BenchmarkFloatOpenLibMCopySign | 2.17 ns | 1.60 ns | 448000000 |
BenchmarkFloatStdCopySign | 1.77 ns | 1.34 ns | 407272727 |
BenchmarkFloatPbrtCopySign | 1.84 ns | 1.72 ns | 373333333 |
BenchmarkDoubleScaleByN | 3.44 ns | 2.78 ns | 213333333 |
BenchmarkDoubleOpenLibMScaleByN | 4.48 ns | 3.75 ns | 179200000 |
BenchmarkDoubleStdScaleByN | 8.98 ns | 7.95 ns | 74666667 |
BenchmarkFloatScaleByN | 2.30 ns | 1.88 ns | 373333333 |
BenchmarkFloatOpenLibMScaleByN | 2.35 ns | 2.09 ns | 344615385 |
BenchmarkFloatStdScaleByN | 6.35 ns | 4.88 ns | 112000000 |
BenchmarkDoubleFMod | 108 ns | 97.7 ns | 8960000 |
BenchmarkDoubleOpenLibMFMod | 1376 ns | 1004 ns | 497778 |
BenchmarkDoubleStdFMod | 108 ns | 85.8 ns | 7466667 |
BenchmarkFloatFMod | 5.56 ns | 4.96 ns | 154482759 |
BenchmarkFloatOpenLibMFMod | 3.28 ns | 2.85 ns | 263529412 |
BenchmarkFloatStdFMod | 4.99 ns | 4.43 ns | 165925926 |
BenchmarkFloatPbrtFMod | 5.05 ns | 4.11 ns | 144516129 |
BenchmarkDoubleExp | 3.24 ns | 2.68 ns | 280000000 |
BenchmarkDoubleOpenLibMExp | 3.56 ns | 2.85 ns | 235789474 |
BenchmarkDoubleStdExp | 46.1 ns | 43.0 ns | 20363636 |
BenchmarkFloatExp | 3.22 ns | 2.67 ns | 263529412 |
BenchmarkFloatOpenLibMExp | 3.24 ns | 2.46 ns | 298666667 |
BenchmarkFloatStdExp | 5.74 ns | 4.83 ns | 161858065 |
BenchmarkFloatPbrtFastExp | 3.35 ns | 3.18 ns | 280000000 |
BenchmarkDoubleHypot | 2.18 ns | 1.57 ns | 407272727 |
BenchmarkDoubleOpenLibMHypot | 5.11 ns | 4.10 ns | 179200000 |
BenchmarkDoubleOpenLibMFastHypot | 4.69 ns | 3.53 ns | 194782609 |
BenchmarkDoubleStdHypot | 5.79 ns | 5.00 ns | 100000000 |
BenchmarkFloatHypot | 1.86 ns | 1.63 ns | 497777778 |
BenchmarkFloatOpenLibMHypot | 5.42 ns | 4.27 ns | 186666667 |
BenchmarkFloatOpenLibMFastHypot | 3.30 ns | 2.67 ns | 298666667 |
BenchmarkFloatStdHypot | 5.53 ns | 5.31 ns | 100000000 |
BenchmarkDoubleHypot3 | 3.13 ns | 2.67 ns | 298666667 |
BenchmarkDoubleStdHypot3 | 4.43 ns | 4.02 ns | 186666667 |
BenchmarkFloatHypot3 | 2.07 ns | 1.64 ns | 448000000 |
BenchmarkFloatStdHypot3 | 6.65 ns | 5.72 ns | 112000000 |
BenchmarkDoubleInternalLog | 6.79 ns | 6.14 ns | 112000000 |
BenchmarkDoubleLog | 6.92 ns | 6.42 ns | 112000000 |
BenchmarkDoubleOpenLibMLog | 7.02 ns | 6.00 ns | 112000000 |
BenchmarkDoubleStdLog | 31.4 ns | 24.6 ns | 29866667 |
BenchmarkFloatInternalLog | 2.56 ns | 1.80 ns | 407272727 |
BenchmarkFloatLog | 2.57 ns | 1.97 ns | 373333333 |
BenchmarkFloatOpenLibMLog | 2.65 ns | 2.34 ns | 373333333 |
BenchmarkFloatStdLog | 7.12 ns | 5.62 ns | 100000000 |
BenchmarkDoubleLog2 | 7.82 ns | 6.09 ns | 100000000 |
BenchmarkDoubleOpenLibMLog2 | 7.53 ns | 5.72 ns | 112000000 |
BenchmarkDoubleStdLog2 | 33.4 ns | 26.8 ns | 37333333 |
BenchmarkFloatLog2 | 2.46 ns | 2.34 ns | 320000000 |
BenchmarkFloatOpenLibMLog2 | 2.66 ns | 2.20 ns | 298666667 |
BenchmarkFloatStdLog2 | 7.14 ns | 5.78 ns | 100000000 |
BenchmarkDoubleLog10 | 8.40 ns | 6.70 ns | 112000000 |
BenchmarkDoubleOpenLibMLog10 | 7.76 ns | 7.03 ns | 100000000 |
BenchmarkDoubleStdLog10 | 34.0 ns | 32.0 ns | 24888889 |
BenchmarkFloatLog10 | 2.80 ns | 2.35 ns | 298666667 |
BenchmarkFloatOpenLibMLog10 | 3.01 ns | 2.79 ns | 224000000 |
BenchmarkFloatStdLog10 | 6.07 ns | 5.16 ns | 100000000 |
BenchmarkDoubleOpenLibMSin | 9.57 ns | 8.37 ns | 74666667 |
BenchmarkDoubleSin | 7.89 ns | 6.56 ns | 112000000 |
BenchmarkDoubleStdSin | 7.36 ns | 6.28 ns | 112000000 |
BenchmarkFloatOpenLibMSin | 4.58 ns | 3.60 ns | 186666667 |
BenchmarkFloatSin | 5.28 ns | 4.36 ns | 179200000 |
BenchmarkFloatStdSin | 10.1 ns | 8.30 ns | 64000000 |
BenchmarkDoubleOpenLibMCos | 9.16 ns | 6.87 ns | 95573333 |
BenchmarkDoubleCos | 7.30 ns | 6.56 ns | 112000000 |
BenchmarkDoubleStdCos | 7.86 ns | 6.70 ns | 112000000 |
BenchmarkFloatOpenLibMCos | 4.92 ns | 4.52 ns | 186666667 |
BenchmarkFloatCos | 4.65 ns | 3.77 ns | 186666667 |
BenchmarkFloatStdCos | 6.84 ns | 6.25 ns | 100000000 |
BenchmarkDoubleOpenLibMSinCos | 7.83 ns | 7.53 ns | 112000000 |
BenchmarkFloatOpenLibMSinCos | 5.80 ns | 4.65 ns | 144516129 |
BenchmarkFloatDirectXSinCos | 10.2 ns | 8.59 ns | 100000000 |
BenchmarkDoubleTan | 12.0 ns | 9.28 ns | 64000000 |
BenchmarkDoubleOpenLibMTan | 15.6 ns | 13.5 ns | 49777778 |
BenchmarkDoubleStdTan | 11.5 ns | 9.52 ns | 64000000 |
BenchmarkFloatTan | 5.75 ns | 5.31 ns | 100000000 |
BenchmarkFloatOpenLibMTan | 5.59 ns | 4.60 ns | 149333333 |
BenchmarkFloatStdTan | 7.37 ns | 6.88 ns | 100000000 |
BenchmarkDoubleATan | 8.35 ns | 7.34 ns | 100000000 |
BenchmarkDoubleOpenLibMATan | 8.67 ns | 8.02 ns | 89600000 |
BenchmarkDoubleStdATan | 8.60 ns | 7.34 ns | 100000000 |
BenchmarkFloatATan | 6.65 ns | 5.14 ns | 133802667 |
BenchmarkFloatOpenLibMATan | 6.36 ns | 5.47 ns | 100000000 |
BenchmarkFloatStdATan | 6.88 ns | 6.09 ns | 100000000 |
BenchmarkDoubleASin | 8.71 ns | 7.45 ns | 115347127 |
BenchmarkDoubleOpenLibMASin | 48.6 ns | 35.3 ns | 20363636 |
BenchmarkDoubleOpenLibMFastASin | 10.5 ns | 7.67 ns | 89600000 |
BenchmarkDoubleStdASin | 11.3 ns | 9.77 ns | 56000000 |
BenchmarkFloatASin | 5.46 ns | 4.29 ns | 149333333 |
BenchmarkFloatOpenLibMASin | 44.5 ns | 36.9 ns | 19478261 |
BenchmarkFloatOpenLibMFastASin | 6.00 ns | 4.73 ns | 148669630 |
BenchmarkFloatStdASin | 8.70 ns | 6.98 ns | 89600000 |
BenchmarkDoubleACos | 8.20 ns | 6.42 ns | 112000000 |
BenchmarkDoubleOpenLibMACos | 49.4 ns | 37.5 ns | 17920000 |
BenchmarkDoubleOpenLibMFastACos | 7.25 ns | 6.56 ns | 123891359 |
BenchmarkDoubleStdACos | 12.5 ns | 10.0 ns | 74666667 |
BenchmarkFloatACos | 5.71 ns | 4.49 ns | 160000000 |
BenchmarkFloatOpenLibMACos | 18.6 ns | 14.6 ns | 40727273 |
BenchmarkFloatOpenLibMFastACos | 5.90 ns | 5.30 ns | 112000000 |
BenchmarkFloatStdACos | 9.66 ns | 8.54 ns | 89600000 |
BenchmarkDoubleATan2 | 16.6 ns | 12.6 ns | 56000000 |
BenchmarkDoubleOpenLibMATan2 | 18.0 ns | 15.1 ns | 56000000 |
BenchmarkDoubleStdATan2 | 22.8 ns | 17.7 ns | 34461538 |
BenchmarkFloatATan2 | 11.3 ns | 8.54 ns | 64000000 |
BenchmarkFloatOpenLibMATan2 | 11.3 ns | 9.77 ns | 64000000 |
BenchmarkFloatStdATan2 | 17.6 ns | 13.2 ns | 49777778 |
BenchmarkFloatFMA | 2.08 ns | 1.55 ns | 494344828 |
BenchmarkPbrtFloatFMA | 2.07 ns | 1.51 ns | 560000000 |
BenchmarkFloatSinXOverX | 6.53 ns | 5.47 ns | 100000000 |
BenchmarkPbrtFloatSinXOverX | 8.90 ns | 7.32 ns | 74666667 |
BenchmarkFloatExpM1 | 18.3 ns | 15.3 ns | 56000000 |
BenchmarkFloatOpenLibMExpM1 | 12.2 ns | 10.5 ns | 112000000 |
BenchmarkFloatSinc | 6.95 ns | 5.44 ns | 112000000 |
BenchmarkPbrtFloatSinc | 9.74 ns | 8.79 ns | 74666667 |
BenchmarkFloatMod | 5.94 ns | 5.16 ns | 100000000 |
BenchmarkPbrtFloatMod | 6.52 ns | 5.58 ns | 112000000 |
BenchmarkFloatSmoothStep | 4.53 ns | 3.60 ns | 186666667 |
BenchmarkPbrtFloatSmoothStep | 3.60 ns | 3.31 ns | 235789474 |
BenchmarkFloatSafeSqrt | 1.84 ns | 1.60 ns | 497777778 |
BenchmarkPbrtFloatSafeSqrt | 3.18 ns | 2.51 ns | 280000000 |
BenchmarkFloatSqr | 1.87 ns | 1.60 ns | 448000000 |
BenchmarkPbrtFloatSqr | 2.05 ns | 1.61 ns | 320000000 |
BenchmarkFloatSafeASin | 9.45 ns | 7.11 ns | 74666667 |
BenchmarkPbrtFloatSafeASin | 10.4 ns | 9.07 ns | 89600000 |
BenchmarkFloatSafeACos | 6.92 ns | 5.62 ns | 100000000 |
BenchmarkPbrtFloatSafeACos | 10.1 ns | 8.79 ns | 64000000 |
BenchmarkFloatNextFloatUp | 2.80 ns | 2.49 ns | 344615385 |
BenchmarkPbrtFloatNextFloatUp | 3.73 ns | 3.39 ns | 248888889 |
BenchmarkFloatNextFloatDown | 2.71 ns | 2.25 ns | 263529412 |
BenchmarkPbrtFloatNextFloatDown | 3.08 ns | 2.69 ns | 203636364 |
BenchmarkFloatAddRoundUp | 3.26 ns | 2.49 ns | 263529412 |
BenchmarkPbrtFloatAddRoundUp | 3.69 ns | 3.09 ns | 298666667 |
BenchmarkFloatAddRoundDown | 2.69 ns | 2.12 ns | 280000000 |
BenchmarkPbrtFloatAddRoundDown | 3.56 ns | 2.93 ns | 298666667 |
BenchmarkFloatSubRoundUp | 2.57 ns | 2.03 ns | 407272727 |
BenchmarkPbrtFloatSubRoundUp | 4.37 ns | 2.99 ns | 203636364 |
BenchmarkFloatSubRoundDown | 2.42 ns | 1.92 ns | 448000000 |
BenchmarkPbrtFloatSubRoundDown | 4.40 ns | 3.61 ns | 194782609 |
BenchmarkFloatMulRoundUp | 3.03 ns | 2.44 ns | 320000000 |
BenchmarkPbrtFloatMulRoundUp | 3.37 ns | 2.90 ns | 172307692 |
BenchmarkFloatMulRoundDown | 2.90 ns | 2.27 ns | 344615385 |
BenchmarkPbrtFloatMulRoundDown | 3.00 ns | 2.55 ns | 263529412 |
BenchmarkFloatDivRoundUp | 2.97 ns | 2.49 ns | 263529412 |
BenchmarkPbrtFloatDivRoundUp | 4.31 ns | 2.69 ns | 185837037 |
BenchmarkFloatDivRoundDown | 2.25 ns | 1.81 ns | 344615385 |
BenchmarkPbrtFloatDivRoundDown | 3.70 ns | 3.14 ns | 248888889 |
BenchmarkFloatSqrtRoundUp | 2.94 ns | 2.68 ns | 280000000 |
BenchmarkPbrtFloatSqrtRoundUp | 3.85 ns | 3.20 ns | 248888889 |
BenchmarkFloatSqrtRoundDown | 3.17 ns | 2.62 ns | 298666667 |
BenchmarkPbrtFloatSqrtRoundDown | 3.59 ns | 2.95 ns | 248888889 |
BenchmarkFloatFMARoundUp | 3.36 ns | 2.67 ns | 263529412 |
BenchmarkPbrtFloatFMARoundUp | 3.70 ns | 2.70 ns | 248888889 |
BenchmarkFloatFMARoundDown | 2.70 ns | 2.10 ns | 320000000 |
BenchmarkPbrtFloatFMARoundDown | 3.75 ns | 3.07 ns | 280000000 |
BenchmarkFloatFastLog2 | 7.96 ns | 6.14 ns | 112000000 |
BenchmarkPbrtFloatLog2 | 7.09 ns | 5.56 ns | 154482759 |
BenchmarkFloatLog2Int | 3.68 ns | 3.13 ns | 194782609 |
BenchmarkPbrtFloatLog2Int | 3.99 ns | 3.08 ns | 263529412 |
BenchmarkFloatGaussian | 6.42 ns | 6.25 ns | 100000000 |
BenchmarkPbrtFloatGaussian | 6.98 ns | 6.14 ns | 112000000 |
BenchmarkFloatLogistic | 11.2 ns | 9.11 ns | 77193846 |
BenchmarkPbrtFloatLogistic | 13.5 ns | 10.3 ns | 64000000 |
BenchmarkFloatDifferenceOfProducts | 2.83 ns | 2.20 ns | 320000000 |
BenchmarkPbrtFloatDifferenceOfProducts | 3.08 ns | 2.54 ns | 320000000 |
BenchmarkFloatSumOfProducts | 3.04 ns | 2.51 ns | 280000000 |
BenchmarkPbrtFloatSumOfProducts | 3.10 ns | 2.62 ns | 280000000 |
BenchmarkFloatQuadratic | 3.51 ns | 3.01 ns | 280000000 |
BenchmarkFloatPbrtQuadratic | 3.62 ns | 2.78 ns | 213333333 |
BenchmarkFloatIntervalMultiply | 13.8 ns | 10.9 ns | 112000000 |
BenchmarkPbrtFloatIntervalMultiply | 49.4 ns | 44.3 ns | 16592593 |
BenchmarkFloatIntervalDivide | 16.1 ns | 14.0 ns | 44800000 |
BenchmarkPbrtFloatIntervalDivide | 49.3 ns | 37.5 ns | 17920000 |
BenchmarkFloatIntervalScalarMultiply | 6.66 ns | 4.46 ns | 112000000 |
BenchmarkPbrtFloatIntervalScalarMultiply | 6.25 ns | 5.58 ns | 112000000 |
BenchmarkFloatIntervalScalarDivision | 6.21 ns | 4.60 ns | 149333333 |
BenchmarkPbrtFloatIntervalScalarDivision | 7.24 ns | 5.00 ns | 100000000 |
BenchmarkFloatIntervalAddition | 4.86 ns | 4.29 ns | 149333333 |
BenchmarkPbrtFloatIntervalAddition | 6.43 ns | 5.98 ns | 172307692 |
BenchmarkFloatIntervalSubtraction | 4.89 ns | 4.46 ns | 185837037 |
BenchmarkPbrtFloatIntervalSubtraction | 7.06 ns | 5.31 ns | 100000000 |
BenchmarkFloatIntervalScalarAddition | 5.56 ns | 5.16 ns | 100000000 |
BenchmarkPbrtFloatIntervalScalarAddition | 8.58 ns | 7.34 ns | 100000000 |
BenchmarkFloatIntervalScalarSubtraction | 6.64 ns | 5.08 ns | 144516129 |
BenchmarkPbrtFloatIntervalScalarSubtraction | 9.17 ns | 7.81 ns | 100000000 |
BenchmarkFloatIntervalSqrt | 7.02 ns | 4.71 ns | 165925926 |
BenchmarkPbrtFloatIntervalSqrt | 10.1 ns | 8.72 ns | 89600000 |
BenchmarkFloatIntervalFMA | 17.7 ns | 13.3 ns | 44800000 |
BenchmarkPbrtFloatIntervalFMA | 52.1 ns | 50.0 ns | 10000000 |
BenchmarkFloatIntervalDifferenceOfProducts | 28.4 ns | 22.7 ns | 34461538 |
BenchmarkPbrtFloatIntervalDifferenceOfProducts | 95.4 ns | 73.2 ns | 7466667 |
BenchmarkFloatIntervalSumOfProducts | 25.7 ns | 22.5 ns | 32000000 |
BenchmarkPbrtFloatIntervalSumOfProducts | 96.3 ns | 87.9 ns | 7466667 |
BenchmarkFloatIntervalQuadratic | 35.6 ns | 28.4 ns | 20363636 |
BenchmarkPbrtFloatIntervalQuadratic | 101 ns | 79.7 ns | 10000000 |
BenchmarkFloatIntervalACos | 14.1 ns | 12.4 ns | 69208276 |
BenchmarkPbrtFloatIntervalACos | 25.0 ns | 18.0 ns | 37333333 |
BenchmarkFloatIntervalCos | 19.3 ns | 15.1 ns | 64000000 |
BenchmarkPbrtFloatIntervalCos | 25.7 ns | 21.8 ns | 34461538 |
BenchmarkFloatIntervalSin | 15.3 ns | 13.2 ns | 64000000 |
BenchmarkPbrtFloatIntervalSin | 29.8 ns | 26.5 ns | 22400000 |
BenchmarkXMQuaternionSlerp | 46.5 ns | 39.1 ns | 26352941 |
BenchmarkQuaternionSlerp | 45.9 ns | 32.6 ns | 17230769 |
BenchmarkPbrtQuaternionSlerp | 79.3 ns | 67.0 ns | 11200000 |
BenchmarkXMQuaternionMultiply | 6.73 ns | 5.78 ns | 100000000 |
BenchmarkQuaternionMultiply | 7.43 ns | 5.62 ns | 100000000 |
BenchmarkXMQuaternionRotationNormal | 10.0 ns | 7.85 ns | 89600000 |
BenchmarkQuaternionFromNormalizedAxisAndAngle | 9.83 ns | 7.11 ns | 74666667 |
BenchmarkPbrtSquareMatrix4x4Add | 12.8 ns | 9.77 ns | 64000000 |
BenchmarkSquareMatrix4x4Add | 3.07 ns | 1.90 ns | 263529412 |
BenchmarkPbrtSquareMatrix3x3Add | 11.7 ns | 9.03 ns | 64000000 |
BenchmarkSquareMatrix3x3Add | 4.54 ns | 3.99 ns | 203636364 |
BenchmarkPbrtSquareMatrix2x2Add | 4.17 ns | 3.00 ns | 213333333 |
BenchmarkSquareMatrix2x2Add | 2.70 ns | 2.25 ns | 320000000 |
BenchmarkSquareMatrix4x4Sub | 3.25 ns | 2.85 ns | 263529412 |
BenchmarkSquareMatrix3x3Sub | 4.96 ns | 4.25 ns | 213333333 |
BenchmarkPbrtSquareMatrix4x4Transpose | 9.45 ns | 7.81 ns | 100000000 |
BenchmarkSquareMatrix4x4Transpose | 3.04 ns | 2.62 ns | 280000000 |
BenchmarkPbrtSquareMatrix3x3Transpose | 9.69 ns | 8.09 ns | 112000000 |
BenchmarkSquareMatrix3x3Transpose | 4.49 ns | 3.63 ns | 172307692 |
BenchmarkPbrtSquareMatrix2x2Transpose | 5.64 ns | 5.06 ns | 154482759 |
BenchmarkSquareMatrix2x2Transpose | 2.07 ns | 1.79 ns | 280000000 |
BenchmarkPbrtSquareMatrix4x4ScalarMultiply | 3.88 ns | 2.79 ns | 179200000 |
BenchmarkSquareMatrix4x4ScalarMultiply | 1.89 ns | 1.54 ns | 497777778 |
BenchmarkPbrtSquareMatrix3x3ScalarMultiply | 7.10 ns | 5.86 ns | 112000000 |
BenchmarkSquareMatrix3x3ScalarMultiply | 3.34 ns | 2.62 ns | 298666667 |
BenchmarkPbrtSquareMatrix2x2ScalarMultiply | 2.44 ns | 2.01 ns | 248888889 |
BenchmarkSquareMatrix2x2ScalarMultiply | 2.07 ns | 1.72 ns | 373333333 |
BenchmarkPbrtSquareMatrix4x4Multiply | 37.8 ns | 29.3 ns | 22400000 |
BenchmarkSquareMatrix4x4Multiply | 7.87 ns | 6.14 ns | 112000000 |
BenchmarkPbrtSquareMatrix3x3Multiply | 24.7 ns | 20.0 ns | 32000000 |
BenchmarkSquareMatrix3x3Multiply | 7.15 ns | 6.56 ns | 100000000 |
BenchmarkPbrtSquareMatrix2x2Multiply | 6.52 ns | 5.47 ns | 100000000 |
BenchmarkSquareMatrix2x2Multiply | 3.27 ns | 2.79 ns | 263529412 |
BenchmarkPbrtSquareMatrix4x4Determinant | 14.6 ns | 11.2 ns | 64000000 |
BenchmarkXMMatrix4x4Determinant | 3.70 ns | 3.15 ns | 203636364 |
BenchmarkSquareMatrix4x4Determinant | 4.09 ns | 3.54 ns | 172307692 |
BenchmarkPbrtSquareMatrix3x3Determinant | 6.66 ns | 5.16 ns | 100000000 |
BenchmarkSquareMatrix3x3Determinant | 4.30 ns | 3.36 ns | 209066667 |
BenchmarkPbrtSquareMatrix2x2Determinant | 2.13 ns | 1.63 ns | 373333333 |
BenchmarkSquareMatrix2x2Determinant | 2.86 ns | 2.20 ns | 298666667 |
BenchmarkPbrtSquareMatrix4x4Inverse | 156 ns | 134 ns | 6400000 |
BenchmarkXMMatrix4x4Inverse | 18.8 ns | 13.8 ns | 37333333 |
BenchmarkSquareMatrix4x4Inverse | 19.5 ns | 16.7 ns | 37333333 |
BenchmarkPbrtSquareMatrix3x3Inverse | 31.2 ns | 27.3 ns | 19478261 |
BenchmarkSquareMatrix3x3Inverse | 7.04 ns | 5.94 ns | 100000000 |
BenchmarkPbrtSquareMatrix2x2Inverse | 48.2 ns | 36.0 ns | 18666667 |
BenchmarkSquareMatrix2x2Inverse | 3.49 ns | 2.78 ns | 213333333 |
BenchmarkPointRotation | 25.7 ns | 20.5 ns | 32000000 |
BenchmarkPointXMMatrixRotationRollPitchYaw | 26.7 ns | 20.9 ns | 29866667 |
BenchmarkPointRotationAxis | 16.1 ns | 14.0 ns | 56000000 |
BenchmarkPointXMMatrixRotationAxis | 22.7 ns | 18.0 ns | 37333333 |
BenchmarkVectorRotationAxis | 15.4 ns | 12.1 ns | 74666667 |
BenchmarkVectorXMMatrixRotationAxis | 21.4 ns | 18.1 ns | 44800000 |
BenchmarkNormalRotationAxis | 14.6 ns | 11.7 ns | 74666667 |
BenchmarkNormalXMMatrixRotationAxis | 22.1 ns | 17.2 ns | 34461538 |
BenchmarkPointTranslation | 6.08 ns | 5.32 ns | 179200000 |
BenchmarkPointXMMatrixTranslation | 7.48 ns | 6.00 ns | 112000000 |
BenchmarkVectorTranslation | 5.58 ns | 4.65 ns | 144516129 |
BenchmarkVectorXMMatrixTranslation | 6.81 ns | 5.31 ns | 100000000 |
BenchmarkNormalTranslation | 4.16 ns | 3.49 ns | 224000000 |
BenchmarkNormalXMMatrixTranslation | 4.53 ns | 3.77 ns | 194782609 |
BenchmarkPointScaling | 5.74 ns | 4.30 ns | 160000000 |
BenchmarkPointXMMatrixScaling | 5.31 ns | 4.35 ns | 172307692 |
BenchmarkVectorScaling | 5.85 ns | 4.19 ns | 179200000 |
BenchmarkVectorXMMatrixScaling | 5.88 ns | 4.43 ns | 165925926 |
BenchmarkNormalScaling | 5.28 ns | 4.67 ns | 167253333 |
BenchmarkNormalXMMatrixScaling | 5.52 ns | 4.35 ns | 154482759 |
BenchmarkPointTransformationMatrix | 4.59 ns | 3.84 ns | 191146667 |
BenchmarkPointXMMatrixTransformation | 4.91 ns | 3.93 ns | 186666667 |
BenchmarkVectorTransformationMatrix | 4.92 ns | 3.86 ns | 165925926 |
BenchmarkVectorXMMatrixTransformation | 4.12 ns | 3.11 ns | 235789474 |
BenchmarkNormalTransformationMatrix | 4.24 ns | 3.35 ns | 186666667 |
BenchmarkNormalXMMatrixTransformation | 4.19 ns | 3.15 ns | 203636364 |
BenchmarkPointAffineTransformationMatrix | 37.2 ns | 30.1 ns | 24888889 |
BenchmarkPointXMMatrixAffineTransformation | 40.9 ns | 35.9 ns | 21333333 |
BenchmarkVectorAffineTransformationMatrix | 38.8 ns | 29.3 ns | 22400000 |
BenchmarkVectorXMMatrixAffineTransformation | 39.2 ns | 32.2 ns | 20363636 |
BenchmarkNormalAffineTransformationMatrix | 39.1 ns | 31.1 ns | 23578947 |
BenchmarkNormalXMMatrixAffineTransformation | 41.2 ns | 33.7 ns | 19478261 |
BenchmarkPointLookTo | 20.9 ns | 16.7 ns | 44800000 |
BenchmarkPointXMMatrixLookToLH | 30.9 ns | 21.9 ns | 26352941 |
BenchmarkVectorLookTo | 20.3 ns | 17.6 ns | 40727273 |
BenchmarkVectorXMMatrixLookToLH | 30.7 ns | 22.5 ns | 32000000 |
BenchmarkNormalLookTo | 19.7 ns | 16.1 ns | 40727273 |
BenchmarkNormalXMMatrixLookToLH | 29.6 ns | 24.5 ns | 24888889 |
BenchmarkPointLookAt | 22.4 ns | 17.8 ns | 44800000 |
BenchmarkPointXMMatrixLookAtLH | 32.6 ns | 26.7 ns | 26352941 |
BenchmarkVectorLookAt | 22.4 ns | 19.2 ns | 44800000 |
BenchmarkVectorXMMatrixLookAtLH | 31.7 ns | 25.1 ns | 29866667 |
BenchmarkNormalLookAt | 21.0 ns | 19.2 ns | 40727273 |
BenchmarkNormalXMMatrixLookAtLH | 31.2 ns | 23.9 ns | 23578947 |
BenchmarkPointPerspectiveProjection | 3.79 ns | 3.21 ns | 194782609 |
BenchmarkPointXMMatrixPerspectiveLH | 4.50 ns | 3.37 ns | 194782609 |
BenchmarkVectorPerspectiveProjection | 3.62 ns | 2.65 ns | 235789474 |
BenchmarkVectorXMMatrixPerspectiveLH | 4.48 ns | 3.71 ns | 235789474 |
BenchmarkVectorPerspectiveProjection | 3.89 ns | 3.07 ns | 203636364 |
BenchmarkNormalXMMatrixPerspectiveLH | 4.82 ns | 3.37 ns | 213333333 |
BenchmarkPointPerspectiveFovProjection | 12.9 ns | 9.03 ns | 64000000 |
BenchmarkPointXMMatrixPerspectiveFovLH | 18.7 ns | 14.1 ns | 49777778 |
BenchmarkVectorPerspectiveFovProjection | 11.5 ns | 9.35 ns | 112000000 |
BenchmarkVectorXMMatrixPerspectiveFovLH | 18.0 ns | 13.1 ns | 56000000 |
BenchmarkNormalPerspectiveFovProjection | 11.6 ns | 9.59 ns | 89600000 |
BenchmarkNormalXMMatrixPerspectiveFovLH | 17.8 ns | 13.9 ns | 64000000 |
BenchmarkPointProject | 50.0 ns | 39.3 ns | 19478261 |
BenchmarkPointXMVector3Project | 63.0 ns | 47.1 ns | 17920000 |
BenchmarkVector2MultipleAdds | 5.24 ns | 3.92 ns | 179200000 |
BenchmarkVector2MultipleXMVectorAdd | 5.74 ns | 4.62 ns | 172307692 |
BenchmarkVector2MultipleOperations | 5.95 ns | 4.41 ns | 194782609 |
BenchmarkPBRTVector2fMultipleOperations | 7.40 ns | 6.09 ns | 100000000 |
BenchmarkMathFMA1 | 2.39 ns | 1.90 ns | 280000000 |
BenchmarkPbrtFMA1 | 2.51 ns | 1.88 ns | 407272727 |
BenchmarkMathFMA2 | 4.12 ns | 3.05 ns | 235789474 |
BenchmarkPbrtFMA2 | 3.49 ns | 2.83 ns | 298666667 |
BenchmarkMathFMA3 | 4.76 ns | 3.37 ns | 194782609 |
BenchmarkPbrtFMA3 | 4.14 ns | 3.35 ns | 186666667 |
BenchmarkMathQuaternionAdd | 4.82 ns | 3.34 ns | 154482759 |
BenchmarkPbrtQuaternionAdd | 5.25 ns | 4.14 ns | 165925926 |
BenchmarkMathPoint3Distance | 5.14 ns | 4.00 ns | 160000000 |
BenchmarkPbrtPoint3Distance | 5.04 ns | 3.49 ns | 179200000 |
BenchmarkMathPoint3DistanceSquared | 4.85 ns | 4.12 ns | 185837037 |
BenchmarkPbrtPoint3DistanceSquared | 4.39 ns | 3.92 ns | 179200000 |
BenchmarkMathVector3Cross | 3.58 ns | 2.62 ns | 298666667 |
BenchmarkMathPbrtVector3Cross | 3.37 ns | 2.88 ns | 298666667 |
BenchmarkMathVector4Cross | 3.33 ns | 2.57 ns | 280000000 |
BenchmarkMathVector3Dot | 3.98 ns | 3.08 ns | 213333333 |
BenchmarkPbrtVector3Dot | 3.78 ns | 3.08 ns | 248888889 |
BenchmarkMathVector3AngleBetween | 7.32 ns | 6.28 ns | 112000000 |
BenchmarkPbrtVector3AngleBetween | 9.09 ns | 5.94 ns | 100000000 |
BenchmarkXMVector3AngleBetweenVectors | 13.9 ns | 11.2 ns | 56000000 |
BenchmarkMathVector3LengthSquared | 3.73 ns | 3.20 ns | 263529412 |
BenchmarkPbrtVector3LengthSquared | 2.82 ns | 2.09 ns | 298666667 |
BenchmarkMathVector3Length | 3.47 ns | 3.02 ns | 263529412 |
BenchmarkPbrtVector3Length | 3.40 ns | 2.77 ns | 298666667 |
BenchmarkMathVector3Ceil | 2.91 ns | 2.22 ns | 344615385 |
BenchmarkPbrtVector3Ceil | 4.05 ns | 3.14 ns | 248888889 |
BenchmarkMathVector3Floor | 3.01 ns | 2.23 ns | 280000000 |
BenchmarkPbrtVector3Floor | 3.22 ns | 2.92 ns | 203636364 |
BenchmarkMathVector3Trunc | 2.96 ns | 2.35 ns | 298666667 |
BenchmarkMathVector3Round | 3.08 ns | 2.44 ns | 224000000 |
BenchmarkMathVector3Lerp | 4.75 ns | 3.54 ns | 172307692 |
BenchmarkPbrtVector3Lerp | 4.99 ns | 3.93 ns | 186666667 |
BenchmarkMathVector3Clamp | 4.27 ns | 3.63 ns | 172307692 |
BenchmarkMathVector3Sqrt | 2.76 ns | 2.29 ns | 320000000 |
BenchmarkMathVector3Sin | 11.6 ns | 10.1 ns | 89600000 |
BenchmarkMathVector3Cos | 12.1 ns | 9.21 ns | 74666667 |
BenchmarkMathVector3Tan | 11.8 ns | 8.65 ns | 112000000 |
BenchmarkMathVector3ASin | 8.61 ns | 6.70 ns | 74666667 |
BenchmarkMathVector3ACos | 9.77 ns | 7.11 ns | 112000000 |
BenchmarkMathVector3ATan | 9.38 ns | 7.15 ns | 89600000 |
BenchmarkMathVector3ATan2 | 21.6 ns | 14.9 ns | 64000000 |
BenchmarkMathVector3SinH | 16.5 ns | 11.9 ns | 49777778 |
BenchmarkMathVector3CosH | 14.7 ns | 11.7 ns | 56000000 |
BenchmarkMathVector3TanH | 15.4 ns | 11.1 ns | 74666667 |