PBRTO is a micro-optimized version of PBRT-v4. It is being developed as an example of how the functionality in HCCMath.h, HCCSIMD.h and HCCVectorMath.h can be used to optimize the performance of real, computationally intensive applications. It is currently about 35 % faster than a release build of the original PBRT.

PBRT is a well-written piece of software, with a code base that is, for such a complex undertaking, easy to understand and modify. Analyzing the performance of the original code is both instructive and interesting, as many of PBRT's mathematical functions perform very well.

Most of the solution is contained within the Harlinn.pbrto project. This project builds PBRT as a dynamic link library, which makes it easy to create benchmarks for the original PBRT code.

Harlinn.pbrto can be built with or without the functionality from HCCMath.h, HCCSIMD.h and HCCVectorMath.h. To build without it, make sure PBRT_USES_HCCMATH is not defined in pbrtodef.h. At this stage, the project may not always build with the functionality from HCCMath.h, HCCSIMD.h and HCCVectorMath.h.
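The switch in pbrtodef.h follows the usual conditional-compilation pattern. The sketch below is an assumption about how such a switch is typically consumed, not code from Harlinn.pbrto: the namespace `sketch` and the placeholder `HccSqrt` are hypothetical. It also shows why, if the code tests the macro with `#ifdef`, the macro has to be left undefined rather than set to 0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration of the PBRT_USES_HCCMATH switch from pbrtodef.h.
// Leave it undefined to build against PBRT's own math. With #ifdef, even
// "#define PBRT_USES_HCCMATH 0" would count as defined.
// #define PBRT_USES_HCCMATH

namespace sketch
{
#ifdef PBRT_USES_HCCMATH
    // Would forward to the HCCMath.h implementation.
    // HccSqrt is a placeholder name, not the real HCCMath API.
    inline float Sqrt( float x ) { return HccSqrt( x ); }
#else
    // Fallback used when PBRT_USES_HCCMATH is not defined.
    inline float Sqrt( float x )
    {
        assert( x >= 0.0f );
        return std::sqrt( x );
    }
#endif
}
```

Tokens inside a skipped `#ifdef` group are never compiled, so the placeholder call does not need to exist when the macro is undefined.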

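PBRT has its own set of optimized mathematical functions, and it is only when we start comparing the vector and matrix operations that it becomes clear that significant performance improvements are possible. In the tables below, PBRT is the time per operation for the original PBRT code, Math is the time for the implementation based on HCCMath.h, HCCSIMD.h and HCCVectorMath.h, and Improvement is (PBRT - Math) / Math.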

Operations on \(4 \times 4\) matrices:

Operation                   PBRT      Math   Improvement
Addition                 8.37 ns   1.93 ns       333.7 %
Transpose                5.00 ns   1.95 ns       156.4 %
Scalar multiplication    2.72 ns   1.46 ns        86.3 %
Matrix multiplication    26.1 ns   4.10 ns       536.6 %
Determinant              11.0 ns   2.18 ns       404.6 %
Inverse                   126 ns   12.3 ns       924.4 %
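A plausible contributor to the matrix multiplication gap is SIMD: with 4-wide SSE registers, each row of the result can be built from four broadcasts, four multiplies and three adds instead of sixteen scalar multiply-adds. The sketch below is a generic SSE illustration, not the HCCSIMD.h implementation; `MultiplyScalar` and `MultiplySse` are hypothetical names, and row-major `float[16]` storage on an x86/x64 target is assumed.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE intrinsics (x86/x64)

// Reference version: plain triple loop over row-major float[16] matrices.
// out must not alias a or b, because it is written while a and b are still read.
inline void MultiplyScalar( const float* a, const float* b, float* out )
{
    assert( out != a && out != b );
    for ( int i = 0; i < 4; ++i )
    {
        for ( int j = 0; j < 4; ++j )
        {
            float sum = 0.0f;
            for ( int k = 0; k < 4; ++k )
            {
                sum += a[i * 4 + k] * b[k * 4 + j];
            }
            out[i * 4 + j] = sum;
        }
    }
}

// SSE version: row i of the result is
// a[i][0]*b.row0 + a[i][1]*b.row1 + a[i][2]*b.row2 + a[i][3]*b.row3,
// computed four columns at a time.
inline void MultiplySse( const float* a, const float* b, float* out )
{
    // Load all rows of b up front.
    const __m128 b0 = _mm_loadu_ps( b );
    const __m128 b1 = _mm_loadu_ps( b + 4 );
    const __m128 b2 = _mm_loadu_ps( b + 8 );
    const __m128 b3 = _mm_loadu_ps( b + 12 );
    for ( int i = 0; i < 4; ++i )
    {
        const float* row = a + i * 4;
        // Broadcast each element of row i of a and scale the matching row of b.
        __m128 r = _mm_mul_ps( _mm_set1_ps( row[0] ), b0 );
        r = _mm_add_ps( r, _mm_mul_ps( _mm_set1_ps( row[1] ), b1 ) );
        r = _mm_add_ps( r, _mm_mul_ps( _mm_set1_ps( row[2] ), b2 ) );
        r = _mm_add_ps( r, _mm_mul_ps( _mm_set1_ps( row[3] ), b3 ) );
        _mm_storeu_ps( out + i * 4, r );
    }
}
```

Because all four rows of `b` are held in registers before anything is stored, and row i of `a` is fully read before row i of `out` is written, `MultiplySse` also works in place, which the scalar reference does not allow.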

Operations on \(3 \times 3\) matrices:

Operation                   PBRT      Math   Improvement
Addition                 8.54 ns   2.92 ns       192.5 %
Transpose                6.10 ns   2.83 ns       115.5 %
Scalar multiplication    5.44 ns   2.18 ns       149.5 %
Matrix multiplication    15.5 ns   5.16 ns       200.4 %
Determinant              5.27 ns   3.08 ns        71.1 %
Inverse                  18.6 ns   4.43 ns       319.9 %
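The Improvement figures are consistent with (PBRT - Math) / Math × 100, that is, how much longer the PBRT version takes relative to the Math version: 100 % means twice as fast, and a negative value means the Math version is slower. A small helper that reproduces the published numbers (the function names are illustrative only):

```cpp
#include <cassert>
#include <cmath>

// Relative speed-up of the Math implementation over PBRT, in percent.
inline double ImprovementPercent( double pbrtNs, double mathNs )
{
    assert( mathNs > 0.0 );
    return ( pbrtNs - mathNs ) / mathNs * 100.0;
}

// Rounds to one decimal place, the precision used in the tables.
inline double RoundTo1( double value )
{
    return std::round( value * 10.0 ) / 10.0;
}
```

For example, the \(3 \times 3\) matrix multiplication row gives (15.5 - 5.16) / 5.16 × 100 ≈ 200.4 %.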

Operations on \(2 \times 2\) matrices:

Operation                   PBRT      Math   Improvement
Addition                 2.69 ns   1.90 ns        41.6 %
Transpose                3.85 ns   1.37 ns       181.0 %
Scalar multiplication    1.84 ns   1.35 ns        36.3 %
Matrix multiplication    4.52 ns   1.88 ns       140.4 %
Determinant              1.19 ns   1.57 ns       -24.2 %
Inverse                  41.7 ns   1.77 ns      2255.9 %
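For a \(2 \times 2\) matrix the inverse has a simple closed form, \(\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\), which needs one division and no elimination loop. The sketch below illustrates that formula; it is not the HCCMath.h or PBRT code, and `Mat2`, `Determinant` and `Inverse` are hypothetical names assuming row-major storage.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Row-major 2x2 matrix: { a, b, c, d } represents [[a, b], [c, d]].
using Mat2 = std::array<float, 4>;

// ad - bc
inline float Determinant( const Mat2& m )
{
    return m[0] * m[3] - m[1] * m[2];
}

// Closed-form inverse: (1 / det) * [[d, -b], [-c, a]].
// Returns std::nullopt when the matrix is singular.
inline std::optional<Mat2> Inverse( const Mat2& m )
{
    const float det = Determinant( m );
    if ( det == 0.0f )
    {
        return std::nullopt;
    }
    const float invDet = 1.0f / det;
    return Mat2{ m[3] * invDet, -m[1] * invDet, -m[2] * invDet, m[0] * invDet };
}
```

Returning `std::optional` lets the caller handle singular matrices explicitly instead of receiving infinities.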
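Benchmark results for the scalar mathematical functions follow. Names containing Std, OpenLibM, Pbrt, and XM or DirectX refer to the C++ standard library, OpenLibM, PBRT, and DirectXMath counterparts respectively; Time is the wall-clock time and CPU the CPU time per iteration:

-----------------------------------------------------------------------------------------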
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
BenchmarkDoubleGenerator                             1.63 ns        0.952 ns    640000000
BenchmarkFloatGenerator                              1.90 ns         1.53 ns    407272727
BenchmarkDoubleIsSameValue                           1.98 ns         1.37 ns    560000000
BenchmarkFloatIsSameValue                            1.43 ns         1.12 ns    640000000
BenchmarkDoubleIsZero                                1.65 ns         1.29 ns    640000000
BenchmarkFloatIsZero                                 2.11 ns         1.65 ns    407272727
BenchmarkDoubleIsNaN                                 1.45 ns         1.10 ns    640000000
BenchmarkDoubleOpenLibMIsNaN                         2.36 ns         1.90 ns    280000000
BenchmarkDoubleStdIsNaN                              1.38 ns         1.07 ns    640000000
BenchmarkDoublePbrtoIsNaN                            1.40 ns        0.977 ns    640000000
BenchmarkFloatIsNaN                                  1.73 ns         1.34 ns    640000000
BenchmarkFloatOpenLibMIsNaN                          2.68 ns         1.99 ns    344615385
BenchmarkFloatStdIsNaN                               1.62 ns         1.33 ns   1000000000
BenchmarkFloatPbrtoIsNaN                             1.72 ns         1.41 ns    497777778
BenchmarkDoubleSignum                                2.08 ns         1.60 ns    448000000
BenchmarkDoubleOpenLibMSignbit                       2.10 ns         1.79 ns    497777778
BenchmarkDoubleNaiveSignum                           1.82 ns         1.50 ns    448000000
BenchmarkFloatSignum                                 2.27 ns         1.90 ns    560000000
BenchmarkFloatOpenLibMSignbit                        1.81 ns         1.56 ns    640000000
BenchmarkFloatNaiveSignum                            2.13 ns         1.67 ns    560000000
BenchmarkDoubleDeg2Rad                               1.59 ns         1.40 ns    825942387
BenchmarkDoublePbrtRadians                           1.74 ns         1.29 ns    448000000
BenchmarkFloatDeg2Rad                                1.39 ns         1.19 ns    746666667
BenchmarkFloatPbrtRadians                            1.41 ns         1.10 ns    640000000
BenchmarkDoubleRad2Deg                               1.63 ns         1.29 ns    640000000
BenchmarkDoublePbrtDegrees                           1.48 ns         1.22 ns    640000000
BenchmarkFloatRad2Deg                                1.62 ns         1.19 ns    497777778
BenchmarkFloatPbrtDegrees                            1.38 ns         1.12 ns    448000000
BenchmarkDoubleNextAfter                             8.80 ns         7.74 ns     74666667
BenchmarkDoubleOpenLibMNextAfter                     8.83 ns         7.53 ns    112000000
BenchmarkDoubleStdNextAfter                          11.9 ns         10.0 ns    100000000
BenchmarkFloatNextAfter                              2.00 ns         1.85 ns    497777778
BenchmarkFloatOpenLibMNextAfter                      2.26 ns         1.86 ns    320000000
BenchmarkFloatStdNextAfter                           6.38 ns         5.62 ns    100000000
BenchmarkDoubleInternalOpenLibMSqrt                  33.1 ns         23.1 ns     26352941
BenchmarkDoubleSqrt                                  1.86 ns         1.34 ns    560000000
BenchmarkDoubleStdSqrt                               28.7 ns         22.2 ns     37333333
BenchmarkDoublePbrtSqrt                              26.1 ns         22.3 ns     40727273
BenchmarkFloatInternalOpenLibMSqrt                   2.48 ns         1.97 ns    373333333
BenchmarkFloatSqrt                                   1.35 ns         1.26 ns    497777778
BenchmarkFloatStdSqrt                                2.47 ns         1.88 ns    373333333
BenchmarkFloatPbrtSqrt                               2.20 ns         1.99 ns    407272727
BenchmarkDoubleNextDown                              2.24 ns         1.84 ns    407272727
BenchmarkDoubleStdNextDown                           12.0 ns         10.7 ns    112000000
BenchmarkFloatNextDown                               2.20 ns         1.84 ns    407272727
BenchmarkFloatStdNextDown                            6.06 ns         5.31 ns    100000000
BenchmarkFloatPbrtNextFloatDown                      2.73 ns         2.43 ns    263529412
BenchmarkDoubleNextUp                                4.24 ns         3.26 ns    172307692
BenchmarkDoubleStdNextUp                             10.9 ns         9.63 ns     74666667
BenchmarkFloatNextUp                                 2.26 ns         1.76 ns    320000000
BenchmarkFloatStdNextUp                              6.84 ns         5.78 ns    100000000
BenchmarkFloatPbrtNextFloatUp                        3.03 ns         1.95 ns    344615385
BenchmarkDoubleIsInf                                 1.64 ns         1.20 ns    560000000
BenchmarkDoubleOpenLibMIsInf                         2.17 ns         1.69 ns    407272727
BenchmarkDoubleStdIsInf                              1.71 ns         1.30 ns    407272727
BenchmarkFloatIsInf                                  1.62 ns         1.36 ns    746666667
BenchmarkFloatOpenLibMIsInf                          2.27 ns         1.71 ns    448000000
BenchmarkFloatStdIsInf                               1.69 ns         1.30 ns    407272727
BenchmarkFloatPbrtIsInf                              1.55 ns         1.41 ns    497777778
BenchmarkDoubleInternalAbs                           1.82 ns         1.46 ns    407272727
BenchmarkDoubleAbs                                   1.55 ns         1.44 ns    497777778
BenchmarkDoubleOpenLibMAbs                           1.40 ns         1.13 ns    746666667
BenchmarkDoubleStdAbs                                1.67 ns         1.35 ns    497777778
BenchmarkDoublePbrtAbs                               1.68 ns         1.34 ns    896000000
BenchmarkFloatInternalAbs                            2.21 ns         2.02 ns    448000000
BenchmarkFloatAbs                                    1.31 ns        0.949 ns    560000000
BenchmarkFloatOpenLibMAbs                            2.13 ns         1.71 ns    448000000
BenchmarkFloatStdAbs                                 1.26 ns         1.06 ns    560000000
BenchmarkFloatPbrtAbs                                1.33 ns         1.03 ns    640000000
BenchmarkDoubleSignBit                               1.68 ns         1.29 ns    497777778
BenchmarkDoubleOpenLibMSignBit                       2.20 ns         1.76 ns    407272727
BenchmarkDoubleStdSignBit                            2.66 ns         2.22 ns    344615385
BenchmarkFloatSignBit                                1.49 ns         1.32 ns    640000000
BenchmarkFloatOpenLibMSignBit                        1.64 ns         1.20 ns    560000000
BenchmarkFloatStdSignBit                             2.80 ns         2.10 ns    320000000
BenchmarkDoubleFRExp                                 3.33 ns         2.77 ns    276833103
BenchmarkDoubleOpenLibMFRExp                         2.84 ns         2.25 ns    263529412
BenchmarkDoubleStdFRExp                              18.0 ns         15.7 ns     74666667
BenchmarkFloatFRExp                                  2.00 ns         1.73 ns    497777778
BenchmarkFloatOpenLibMFRExp                          2.01 ns         1.73 ns    497777778
BenchmarkFloatStdFRExp                               12.9 ns         11.0 ns     64000000
BenchmarkDoubleModF                                  2.66 ns         1.83 ns    298666667
BenchmarkDoubleOpenLibMModF                          2.67 ns         2.36 ns    344615385
BenchmarkDoubleStdModF                               3.24 ns         2.93 ns    224000000
BenchmarkFloatingPointDoubleModF                     2.53 ns         2.38 ns    407272727
BenchmarkFloatModF                                   2.21 ns         2.00 ns    320000000
BenchmarkFloatOpenLibMModF                           2.31 ns         1.90 ns    344615385
BenchmarkFloatingPointFloatModF                      2.56 ns         2.01 ns    373333333
BenchmarkFloatStdModF                                4.17 ns         3.45 ns    203636364
BenchmarkDoubleMin                                   1.89 ns         1.64 ns    448000000
BenchmarkDoubleOpenLibMMin                           2.46 ns         2.05 ns    320000000
BenchmarkDoubleStdMin                                2.10 ns         1.51 ns    373333333
BenchmarkFloatMin                                    1.67 ns         1.44 ns    640000000
BenchmarkFloatOpenLibMMin                            2.34 ns         1.85 ns    448000000
BenchmarkFloatStdMin                                 2.22 ns         1.90 ns    344615385
BenchmarkDoubleMax                                   1.81 ns         1.46 ns    373333333
BenchmarkDoubleOpenLibMMax                           2.37 ns         2.10 ns    497777778
BenchmarkDoubleStdMax                                2.26 ns         1.81 ns    344615385
BenchmarkFloatMax                                    1.69 ns         1.48 ns    560000000
BenchmarkFloatOpenLibMMax                            2.35 ns         1.68 ns    344615385
BenchmarkFloatStdMax                                 2.06 ns         1.61 ns    320000000
BenchmarkDoubleTrunc                                 1.65 ns         1.28 ns    560000000
BenchmarkDoubleOpenLibMTrunc                         2.59 ns         2.35 ns    298666667
BenchmarkFloatingPointDoubleTrunc                    1.92 ns         1.68 ns    344615385
BenchmarkDoubleStdTrunc                              4.62 ns         3.37 ns    213333333
BenchmarkFloatTrunc                                  1.56 ns         1.29 ns    498024814
BenchmarkFloatOpenLibMTrunc                          2.25 ns         2.00 ns    320000000
BenchmarkFloatingPointFloatTrunc                     2.12 ns         1.79 ns    560000000
BenchmarkFloatStdTrunc                               1.75 ns         1.59 ns    560000000
BenchmarkDoubleFloor                                 1.37 ns         1.12 ns    640000000
BenchmarkDoubleOpenLibMFloor                         3.41 ns         2.67 ns    298666667
BenchmarkFloatingPointDoubleFloor                    3.68 ns         3.08 ns    213333333
BenchmarkDoubleStdFloor                              1.47 ns         1.23 ns    560000000
BenchmarkFloatFloor                                  1.47 ns         1.17 ns    560000000
BenchmarkFloatOpenLibMFloor                          2.95 ns         2.07 ns    248888889
BenchmarkFloatingPointFloatFloor                     2.94 ns         2.37 ns    263529412
BenchmarkFloatStdFloor                               1.47 ns         1.31 ns    560000000
BenchmarkDoubleCeil                                  1.56 ns         1.31 ns    896000000
BenchmarkDoubleOpenLibMCeil                          2.73 ns         2.16 ns    448000000
BenchmarkFloatingPointDoubleCeil                     3.72 ns         3.29 ns    194782609
BenchmarkDoubleStdCeil                               1.49 ns         1.12 ns    448000000
BenchmarkFloatCeil                                   1.75 ns         1.19 ns    448000000
BenchmarkFloatOpenLibMCeil                           2.17 ns         1.45 ns    344615385
BenchmarkFloatingPointFloatCeil                      1.94 ns         1.70 ns    497777778
BenchmarkFloatStdCeil                                1.51 ns         1.32 ns    746666667
BenchmarkDoubleRound                                 1.57 ns         1.39 ns    640000000
BenchmarkDoubleOpenLibMRound                         4.65 ns         3.84 ns    154482759
BenchmarkFloatingPointDoubleRound                    3.25 ns         2.39 ns    235789474
BenchmarkDoubleStdRound                              7.44 ns         6.14 ns    112000000
BenchmarkFloatRound                                  2.07 ns         1.79 ns    560000000
BenchmarkFloatOpenLibMRound                          2.25 ns         1.91 ns    497777778
BenchmarkFloatingPointFloatRound                     2.91 ns         2.49 ns    344615385
BenchmarkFloatStdRound                               2.20 ns         1.63 ns    373333333
BenchmarkDoubleClamp                                 2.41 ns         2.15 ns    407272727
BenchmarkDoubleStdClamp                              1.95 ns         1.61 ns    320000000
BenchmarkFloatClamp                                  2.06 ns         1.57 ns    497777778
BenchmarkFloatStdClamp                               2.09 ns         1.69 ns    407272727
BenchmarkFloatPbrtClamp                              1.94 ns         1.57 ns    448000000
BenchmarkDoubleInternalLerpImpl                      3.22 ns         2.68 ns    280000000
BenchmarkDoubleLerp                                  4.40 ns         3.72 ns    172307692
BenchmarkDoubleStdLerp                               2.95 ns         2.43 ns    263529412
BenchmarkFloatLerp                                   4.29 ns         3.63 ns    224000000
BenchmarkFloatInternalLerpImpl                       2.42 ns         2.18 ns    280000000
BenchmarkFloatStdLerp                                2.64 ns         2.30 ns    298666667
BenchmarkFloatPbrtLerp                               1.58 ns         1.17 ns    746666667
BenchmarkDoubleCopySign                              1.65 ns         1.23 ns    560000000
BenchmarkDoubleOpenLibMCopySign                      1.67 ns         1.56 ns    640000000
BenchmarkDoubleStdCopySign                           6.14 ns         5.62 ns    100000000
BenchmarkDoublePbrtCopySign                          6.25 ns         5.16 ns    100000000
BenchmarkFloatCopySign                               1.69 ns         1.53 ns    448000000
BenchmarkFloatOpenLibMCopySign                       2.17 ns         1.60 ns    448000000
BenchmarkFloatStdCopySign                            1.77 ns         1.34 ns    407272727
BenchmarkFloatPbrtCopySign                           1.84 ns         1.72 ns    373333333
BenchmarkDoubleScaleByN                              3.44 ns         2.78 ns    213333333
BenchmarkDoubleOpenLibMScaleByN                      4.48 ns         3.75 ns    179200000
BenchmarkDoubleStdScaleByN                           8.98 ns         7.95 ns     74666667
BenchmarkFloatScaleByN                               2.30 ns         1.88 ns    373333333
BenchmarkFloatOpenLibMScaleByN                       2.35 ns         2.09 ns    344615385
BenchmarkFloatStdScaleByN                            6.35 ns         4.88 ns    112000000
BenchmarkDoubleFMod                                   108 ns         97.7 ns      8960000
BenchmarkDoubleOpenLibMFMod                          1376 ns         1004 ns       497778
BenchmarkDoubleStdFMod                                108 ns         85.8 ns      7466667
BenchmarkFloatFMod                                   5.56 ns         4.96 ns    154482759
BenchmarkFloatOpenLibMFMod                           3.28 ns         2.85 ns    263529412
BenchmarkFloatStdFMod                                4.99 ns         4.43 ns    165925926
BenchmarkFloatPbrtFMod                               5.05 ns         4.11 ns    144516129
BenchmarkDoubleExp                                   3.24 ns         2.68 ns    280000000
BenchmarkDoubleOpenLibMExp                           3.56 ns         2.85 ns    235789474
BenchmarkDoubleStdExp                                46.1 ns         43.0 ns     20363636
BenchmarkFloatExp                                    3.22 ns         2.67 ns    263529412
BenchmarkFloatOpenLibMExp                            3.24 ns         2.46 ns    298666667
BenchmarkFloatStdExp                                 5.74 ns         4.83 ns    161858065
BenchmarkFloatPbrtFastExp                            3.35 ns         3.18 ns    280000000
BenchmarkDoubleHypot                                 2.18 ns         1.57 ns    407272727
BenchmarkDoubleOpenLibMHypot                         5.11 ns         4.10 ns    179200000
BenchmarkDoubleOpenLibMFastHypot                     4.69 ns         3.53 ns    194782609
BenchmarkDoubleStdHypot                              5.79 ns         5.00 ns    100000000
BenchmarkFloatHypot                                  1.86 ns         1.63 ns    497777778
BenchmarkFloatOpenLibMHypot                          5.42 ns         4.27 ns    186666667
BenchmarkFloatOpenLibMFastHypot                      3.30 ns         2.67 ns    298666667
BenchmarkFloatStdHypot                               5.53 ns         5.31 ns    100000000
BenchmarkDoubleHypot3                                3.13 ns         2.67 ns    298666667
BenchmarkDoubleStdHypot3                             4.43 ns         4.02 ns    186666667
BenchmarkFloatHypot3                                 2.07 ns         1.64 ns    448000000
BenchmarkFloatStdHypot3                              6.65 ns         5.72 ns    112000000
BenchmarkDoubleInternalLog                           6.79 ns         6.14 ns    112000000
BenchmarkDoubleLog                                   6.92 ns         6.42 ns    112000000
BenchmarkDoubleOpenLibMLog                           7.02 ns         6.00 ns    112000000
BenchmarkDoubleStdLog                                31.4 ns         24.6 ns     29866667
BenchmarkFloatInternalLog                            2.56 ns         1.80 ns    407272727
BenchmarkFloatLog                                    2.57 ns         1.97 ns    373333333
BenchmarkFloatOpenLibMLog                            2.65 ns         2.34 ns    373333333
BenchmarkFloatStdLog                                 7.12 ns         5.62 ns    100000000
BenchmarkDoubleLog2                                  7.82 ns         6.09 ns    100000000
BenchmarkDoubleOpenLibMLog2                          7.53 ns         5.72 ns    112000000
BenchmarkDoubleStdLog2                               33.4 ns         26.8 ns     37333333
BenchmarkFloatLog2                                   2.46 ns         2.34 ns    320000000
BenchmarkFloatOpenLibMLog2                           2.66 ns         2.20 ns    298666667
BenchmarkFloatStdLog2                                7.14 ns         5.78 ns    100000000
BenchmarkDoubleLog10                                 8.40 ns         6.70 ns    112000000
BenchmarkDoubleOpenLibMLog10                         7.76 ns         7.03 ns    100000000
BenchmarkDoubleStdLog10                              34.0 ns         32.0 ns     24888889
BenchmarkFloatLog10                                  2.80 ns         2.35 ns    298666667
BenchmarkFloatOpenLibMLog10                          3.01 ns         2.79 ns    224000000
BenchmarkFloatStdLog10                               6.07 ns         5.16 ns    100000000
BenchmarkDoubleOpenLibMSin                           9.57 ns         8.37 ns     74666667
BenchmarkDoubleSin                                   7.89 ns         6.56 ns    112000000
BenchmarkDoubleStdSin                                7.36 ns         6.28 ns    112000000
BenchmarkFloatOpenLibMSin                            4.58 ns         3.60 ns    186666667
BenchmarkFloatSin                                    5.28 ns         4.36 ns    179200000
BenchmarkFloatStdSin                                 10.1 ns         8.30 ns     64000000
BenchmarkDoubleOpenLibMCos                           9.16 ns         6.87 ns     95573333
BenchmarkDoubleCos                                   7.30 ns         6.56 ns    112000000
BenchmarkDoubleStdCos                                7.86 ns         6.70 ns    112000000
BenchmarkFloatOpenLibMCos                            4.92 ns         4.52 ns    186666667
BenchmarkFloatCos                                    4.65 ns         3.77 ns    186666667
BenchmarkFloatStdCos                                 6.84 ns         6.25 ns    100000000
BenchmarkDoubleOpenLibMSinCos                        7.83 ns         7.53 ns    112000000
BenchmarkFloatOpenLibMSinCos                         5.80 ns         4.65 ns    144516129
BenchmarkFloatDirectXSinCos                          10.2 ns         8.59 ns    100000000
BenchmarkDoubleTan                                   12.0 ns         9.28 ns     64000000
BenchmarkDoubleOpenLibMTan                           15.6 ns         13.5 ns     49777778
BenchmarkDoubleStdTan                                11.5 ns         9.52 ns     64000000
BenchmarkFloatTan                                    5.75 ns         5.31 ns    100000000
BenchmarkFloatOpenLibMTan                            5.59 ns         4.60 ns    149333333
BenchmarkFloatStdTan                                 7.37 ns         6.88 ns    100000000
BenchmarkDoubleATan                                  8.35 ns         7.34 ns    100000000
BenchmarkDoubleOpenLibMATan                          8.67 ns         8.02 ns     89600000
BenchmarkDoubleStdATan                               8.60 ns         7.34 ns    100000000
BenchmarkFloatATan                                   6.65 ns         5.14 ns    133802667
BenchmarkFloatOpenLibMATan                           6.36 ns         5.47 ns    100000000
BenchmarkFloatStdATan                                6.88 ns         6.09 ns    100000000
BenchmarkDoubleASin                                  8.71 ns         7.45 ns    115347127
BenchmarkDoubleOpenLibMASin                          48.6 ns         35.3 ns     20363636
BenchmarkDoubleOpenLibMFastASin                      10.5 ns         7.67 ns     89600000
BenchmarkDoubleStdASin                               11.3 ns         9.77 ns     56000000
BenchmarkFloatASin                                   5.46 ns         4.29 ns    149333333
BenchmarkFloatOpenLibMASin                           44.5 ns         36.9 ns     19478261
BenchmarkFloatOpenLibMFastASin                       6.00 ns         4.73 ns    148669630
BenchmarkFloatStdASin                                8.70 ns         6.98 ns     89600000
BenchmarkDoubleACos                                  8.20 ns         6.42 ns    112000000
BenchmarkDoubleOpenLibMACos                          49.4 ns         37.5 ns     17920000
BenchmarkDoubleOpenLibMFastACos                      7.25 ns         6.56 ns    123891359
BenchmarkDoubleStdACos                               12.5 ns         10.0 ns     74666667
BenchmarkFloatACos                                   5.71 ns         4.49 ns    160000000
BenchmarkFloatOpenLibMACos                           18.6 ns         14.6 ns     40727273
BenchmarkFloatOpenLibMFastACos                       5.90 ns         5.30 ns    112000000
BenchmarkFloatStdACos                                9.66 ns         8.54 ns     89600000
BenchmarkDoubleATan2                                 16.6 ns         12.6 ns     56000000
BenchmarkDoubleOpenLibMATan2                         18.0 ns         15.1 ns     56000000
BenchmarkDoubleStdATan2                              22.8 ns         17.7 ns     34461538
BenchmarkFloatATan2                                  11.3 ns         8.54 ns     64000000
BenchmarkFloatOpenLibMATan2                          11.3 ns         9.77 ns     64000000
BenchmarkFloatStdATan2                               17.6 ns         13.2 ns     49777778
BenchmarkFloatFMA                                    2.08 ns         1.55 ns    494344828
BenchmarkPbrtFloatFMA                                2.07 ns         1.51 ns    560000000
BenchmarkFloatSinXOverX                              6.53 ns         5.47 ns    100000000
BenchmarkPbrtFloatSinXOverX                          8.90 ns         7.32 ns     74666667
BenchmarkFloatExpM1                                  18.3 ns         15.3 ns     56000000
BenchmarkFloatOpenLibMExpM1                          12.2 ns         10.5 ns    112000000
BenchmarkFloatSinc                                   6.95 ns         5.44 ns    112000000
BenchmarkPbrtFloatSinc                               9.74 ns         8.79 ns     74666667
BenchmarkFloatMod                                    5.94 ns         5.16 ns    100000000
BenchmarkPbrtFloatMod                                6.52 ns         5.58 ns    112000000
BenchmarkFloatSmoothStep                             4.53 ns         3.60 ns    186666667
BenchmarkPbrtFloatSmoothStep                         3.60 ns         3.31 ns    235789474
BenchmarkFloatSafeSqrt                               1.84 ns         1.60 ns    497777778
BenchmarkPbrtFloatSafeSqrt                           3.18 ns         2.51 ns    280000000
BenchmarkFloatSqr                                    1.87 ns         1.60 ns    448000000
BenchmarkPbrtFloatSqr                                2.05 ns         1.61 ns    320000000
BenchmarkFloatSafeASin                               9.45 ns         7.11 ns     74666667
BenchmarkPbrtFloatSafeASin                           10.4 ns         9.07 ns     89600000
BenchmarkFloatSafeACos                               6.92 ns         5.62 ns    100000000
BenchmarkPbrtFloatSafeACos                           10.1 ns         8.79 ns     64000000
BenchmarkFloatNextFloatUp                            2.80 ns         2.49 ns    344615385
BenchmarkPbrtFloatNextFloatUp                        3.73 ns         3.39 ns    248888889
BenchmarkFloatNextFloatDown                          2.71 ns         2.25 ns    263529412
BenchmarkPbrtFloatNextFloatDown                      3.08 ns         2.69 ns    203636364
BenchmarkFloatAddRoundUp                             3.26 ns         2.49 ns    263529412
BenchmarkPbrtFloatAddRoundUp                         3.69 ns         3.09 ns    298666667
BenchmarkFloatAddRoundDown                           2.69 ns         2.12 ns    280000000
BenchmarkPbrtFloatAddRoundDown                       3.56 ns         2.93 ns    298666667
BenchmarkFloatSubRoundUp                             2.57 ns         2.03 ns    407272727
BenchmarkPbrtFloatSubRoundUp                         4.37 ns         2.99 ns    203636364
BenchmarkFloatSubRoundDown                           2.42 ns         1.92 ns    448000000
BenchmarkPbrtFloatSubRoundDown                       4.40 ns         3.61 ns    194782609
BenchmarkFloatMulRoundUp                             3.03 ns         2.44 ns    320000000
BenchmarkPbrtFloatMulRoundUp                         3.37 ns         2.90 ns    172307692
BenchmarkFloatMulRoundDown                           2.90 ns         2.27 ns    344615385
BenchmarkPbrtFloatMulRoundDown                       3.00 ns         2.55 ns    263529412
BenchmarkFloatDivRoundUp                             2.97 ns         2.49 ns    263529412
BenchmarkPbrtFloatDivRoundUp                         4.31 ns         2.69 ns    185837037
BenchmarkFloatDivRoundDown                           2.25 ns         1.81 ns    344615385
BenchmarkPbrtFloatDivRoundDown                       3.70 ns         3.14 ns    248888889
BenchmarkFloatSqrtRoundUp                            2.94 ns         2.68 ns    280000000
BenchmarkPbrtFloatSqrtRoundUp                        3.85 ns         3.20 ns    248888889
BenchmarkFloatSqrtRoundDown                          3.17 ns         2.62 ns    298666667
BenchmarkPbrtFloatSqrtRoundDown                      3.59 ns         2.95 ns    248888889
BenchmarkFloatFMARoundUp                             3.36 ns         2.67 ns    263529412
BenchmarkPbrtFloatFMARoundUp                         3.70 ns         2.70 ns    248888889
BenchmarkFloatFMARoundDown                           2.70 ns         2.10 ns    320000000
BenchmarkPbrtFloatFMARoundDown                       3.75 ns         3.07 ns    280000000
BenchmarkFloatFastLog2                               7.96 ns         6.14 ns    112000000
BenchmarkPbrtFloatLog2                               7.09 ns         5.56 ns    154482759
BenchmarkFloatLog2Int                                3.68 ns         3.13 ns    194782609
BenchmarkPbrtFloatLog2Int                            3.99 ns         3.08 ns    263529412
BenchmarkFloatGaussian                               6.42 ns         6.25 ns    100000000
BenchmarkPbrtFloatGaussian                           6.98 ns         6.14 ns    112000000
BenchmarkFloatLogistic                               11.2 ns         9.11 ns     77193846
BenchmarkPbrtFloatLogistic                           13.5 ns         10.3 ns     64000000
BenchmarkFloatDifferenceOfProducts                   2.83 ns         2.20 ns    320000000
BenchmarkPbrtFloatDifferenceOfProducts               3.08 ns         2.54 ns    320000000
BenchmarkFloatSumOfProducts                          3.04 ns         2.51 ns    280000000
BenchmarkPbrtFloatSumOfProducts                      3.10 ns         2.62 ns    280000000
BenchmarkFloatQuadratic                              3.51 ns         3.01 ns    280000000
BenchmarkFloatPbrtQuadratic                          3.62 ns         2.78 ns    213333333
BenchmarkFloatIntervalMultiply                       13.8 ns         10.9 ns    112000000
BenchmarkPbrtFloatIntervalMultiply                   49.4 ns         44.3 ns     16592593
BenchmarkFloatIntervalDivide                         16.1 ns         14.0 ns     44800000
BenchmarkPbrtFloatIntervalDivide                     49.3 ns         37.5 ns     17920000
BenchmarkFloatIntervalScalarMultiply                 6.66 ns         4.46 ns    112000000
BenchmarkPbrtFloatIntervalScalarMultiply             6.25 ns         5.58 ns    112000000
BenchmarkFloatIntervalScalarDivision                 6.21 ns         4.60 ns    149333333
BenchmarkPbrtFloatIntervalScalarDivision             7.24 ns         5.00 ns    100000000
BenchmarkFloatIntervalAddition                       4.86 ns         4.29 ns    149333333
BenchmarkPbrtFloatIntervalAddition                   6.43 ns         5.98 ns    172307692
BenchmarkFloatIntervalSubtraction                    4.89 ns         4.46 ns    185837037
BenchmarkPbrtFloatIntervalSubtraction                7.06 ns         5.31 ns    100000000
BenchmarkFloatIntervalScalarAddition                 5.56 ns         5.16 ns    100000000
BenchmarkPbrtFloatIntervalScalarAddition             8.58 ns         7.34 ns    100000000
BenchmarkFloatIntervalScalarSubtraction              6.64 ns         5.08 ns    144516129
BenchmarkPbrtFloatIntervalScalarSubtraction          9.17 ns         7.81 ns    100000000
BenchmarkFloatIntervalSqrt                           7.02 ns         4.71 ns    165925926
BenchmarkPbrtFloatIntervalSqrt                       10.1 ns         8.72 ns     89600000
BenchmarkFloatIntervalFMA                            17.7 ns         13.3 ns     44800000
BenchmarkPbrtFloatIntervalFMA                        52.1 ns         50.0 ns     10000000
BenchmarkFloatIntervalDifferenceOfProducts           28.4 ns         22.7 ns     34461538
BenchmarkPbrtFloatIntervalDifferenceOfProducts       95.4 ns         73.2 ns      7466667
BenchmarkFloatIntervalSumOfProducts                  25.7 ns         22.5 ns     32000000
BenchmarkPbrtFloatIntervalSumOfProducts              96.3 ns         87.9 ns      7466667
BenchmarkFloatIntervalQuadratic                      35.6 ns         28.4 ns     20363636
BenchmarkPbrtFloatIntervalQuadratic                   101 ns         79.7 ns     10000000
BenchmarkFloatIntervalACos                           14.1 ns         12.4 ns     69208276
BenchmarkPbrtFloatIntervalACos                       25.0 ns         18.0 ns     37333333
BenchmarkFloatIntervalCos                            19.3 ns         15.1 ns     64000000
BenchmarkPbrtFloatIntervalCos                        25.7 ns         21.8 ns     34461538
BenchmarkFloatIntervalSin                            15.3 ns         13.2 ns     64000000
BenchmarkPbrtFloatIntervalSin                        29.8 ns         26.5 ns     22400000
BenchmarkXMQuaternionSlerp                           46.5 ns         39.1 ns     26352941
BenchmarkQuaternionSlerp                             45.9 ns         32.6 ns     17230769
BenchmarkPbrtQuaternionSlerp                         79.3 ns         67.0 ns     11200000
BenchmarkXMQuaternionMultiply                        6.73 ns         5.78 ns    100000000
BenchmarkQuaternionMultiply                          7.43 ns         5.62 ns    100000000
BenchmarkXMQuaternionRotationNormal                  10.0 ns         7.85 ns     89600000
BenchmarkQuaternionFromNormalizedAxisAndAngle        9.83 ns         7.11 ns     74666667
BenchmarkPbrtSquareMatrix4x4Add                      12.8 ns         9.77 ns     64000000
BenchmarkSquareMatrix4x4Add                          3.07 ns         1.90 ns    263529412
BenchmarkPbrtSquareMatrix3x3Add                      11.7 ns         9.03 ns     64000000
BenchmarkSquareMatrix3x3Add                          4.54 ns         3.99 ns    203636364
BenchmarkPbrtSquareMatrix2x2Add                      4.17 ns         3.00 ns    213333333
BenchmarkSquareMatrix2x2Add                          2.70 ns         2.25 ns    320000000
BenchmarkSquareMatrix4x4Sub                          3.25 ns         2.85 ns    263529412
BenchmarkSquareMatrix3x3Sub                          4.96 ns         4.25 ns    213333333
BenchmarkPbrtSquareMatrix4x4Transpose                9.45 ns         7.81 ns    100000000
BenchmarkSquareMatrix4x4Transpose                    3.04 ns         2.62 ns    280000000
BenchmarkPbrtSquareMatrix3x3Transpose                9.69 ns         8.09 ns    112000000
BenchmarkSquareMatrix3x3Transpose                    4.49 ns         3.63 ns    172307692
BenchmarkPbrtSquareMatrix2x2Transpose                5.64 ns         5.06 ns    154482759
BenchmarkSquareMatrix2x2Transpose                    2.07 ns         1.79 ns    280000000
BenchmarkPbrtSquareMatrix4x4ScalarMultiply           3.88 ns         2.79 ns    179200000
BenchmarkSquareMatrix4x4ScalarMultiply               1.89 ns         1.54 ns    497777778
BenchmarkPbrtSquareMatrix3x3ScalarMultiply           7.10 ns         5.86 ns    112000000
BenchmarkSquareMatrix3x3ScalarMultiply               3.34 ns         2.62 ns    298666667
BenchmarkPbrtSquareMatrix2x2ScalarMultiply           2.44 ns         2.01 ns    248888889
BenchmarkSquareMatrix2x2ScalarMultiply               2.07 ns         1.72 ns    373333333
BenchmarkPbrtSquareMatrix4x4Multiply                 37.8 ns         29.3 ns     22400000
BenchmarkSquareMatrix4x4Multiply                     7.87 ns         6.14 ns    112000000
BenchmarkPbrtSquareMatrix3x3Multiply                 24.7 ns         20.0 ns     32000000
BenchmarkSquareMatrix3x3Multiply                     7.15 ns         6.56 ns    100000000
BenchmarkPbrtSquareMatrix2x2Multiply                 6.52 ns         5.47 ns    100000000
BenchmarkSquareMatrix2x2Multiply                     3.27 ns         2.79 ns    263529412
BenchmarkPbrtSquareMatrix4x4Determinant              14.6 ns         11.2 ns     64000000
BenchmarkXMMatrix4x4Determinant                      3.70 ns         3.15 ns    203636364
BenchmarkSquareMatrix4x4Determinant                  4.09 ns         3.54 ns    172307692
BenchmarkPbrtSquareMatrix3x3Determinant              6.66 ns         5.16 ns    100000000
BenchmarkSquareMatrix3x3Determinant                  4.30 ns         3.36 ns    209066667
BenchmarkPbrtSquareMatrix2x2Determinant              2.13 ns         1.63 ns    373333333
BenchmarkSquareMatrix2x2Determinant                  2.86 ns         2.20 ns    298666667
BenchmarkPbrtSquareMatrix4x4Inverse                   156 ns          134 ns      6400000
BenchmarkXMMatrix4x4Inverse                          18.8 ns         13.8 ns     37333333
BenchmarkSquareMatrix4x4Inverse                      19.5 ns         16.7 ns     37333333
BenchmarkPbrtSquareMatrix3x3Inverse                  31.2 ns         27.3 ns     19478261
BenchmarkSquareMatrix3x3Inverse                      7.04 ns         5.94 ns    100000000
BenchmarkPbrtSquareMatrix2x2Inverse                  48.2 ns         36.0 ns     18666667
BenchmarkSquareMatrix2x2Inverse                      3.49 ns         2.78 ns    213333333
BenchmarkPointRotation                               25.7 ns         20.5 ns     32000000
BenchmarkPointXMMatrixRotationRollPitchYaw           26.7 ns         20.9 ns     29866667
BenchmarkPointRotationAxis                           16.1 ns         14.0 ns     56000000
BenchmarkPointXMMatrixRotationAxis                   22.7 ns         18.0 ns     37333333
BenchmarkVectorRotationAxis                          15.4 ns         12.1 ns     74666667
BenchmarkVectorXMMatrixRotationAxis                  21.4 ns         18.1 ns     44800000
BenchmarkNormalRotationAxis                          14.6 ns         11.7 ns     74666667
BenchmarkNormalXMMatrixRotationAxis                  22.1 ns         17.2 ns     34461538
BenchmarkPointTranslation                            6.08 ns         5.32 ns    179200000
BenchmarkPointXMMatrixTranslation                    7.48 ns         6.00 ns    112000000
BenchmarkVectorTranslation                           5.58 ns         4.65 ns    144516129
BenchmarkVectorXMMatrixTranslation                   6.81 ns         5.31 ns    100000000
BenchmarkNormalTranslation                           4.16 ns         3.49 ns    224000000
BenchmarkNormalXMMatrixTranslation                   4.53 ns         3.77 ns    194782609
BenchmarkPointScaling                                5.74 ns         4.30 ns    160000000
BenchmarkPointXMMatrixScaling                        5.31 ns         4.35 ns    172307692
BenchmarkVectorScaling                               5.85 ns         4.19 ns    179200000
BenchmarkVectorXMMatrixScaling                       5.88 ns         4.43 ns    165925926
BenchmarkNormalScaling                               5.28 ns         4.67 ns    167253333
BenchmarkNormalXMMatrixScaling                       5.52 ns         4.35 ns    154482759
BenchmarkPointTransformationMatrix                   4.59 ns         3.84 ns    191146667
BenchmarkPointXMMatrixTransformation                 4.91 ns         3.93 ns    186666667
BenchmarkVectorTransformationMatrix                  4.92 ns         3.86 ns    165925926
BenchmarkVectorXMMatrixTransformation                4.12 ns         3.11 ns    235789474
BenchmarkNormalTransformationMatrix                  4.24 ns         3.35 ns    186666667
BenchmarkNormalXMMatrixTransformation                4.19 ns         3.15 ns    203636364
BenchmarkPointAffineTransformationMatrix             37.2 ns         30.1 ns     24888889
BenchmarkPointXMMatrixAffineTransformation           40.9 ns         35.9 ns     21333333
BenchmarkVectorAffineTransformationMatrix            38.8 ns         29.3 ns     22400000
BenchmarkVectorXMMatrixAffineTransformation          39.2 ns         32.2 ns     20363636
BenchmarkNormalAffineTransformationMatrix            39.1 ns         31.1 ns     23578947
BenchmarkNormalXMMatrixAffineTransformation          41.2 ns         33.7 ns     19478261
BenchmarkPointLookTo                                 20.9 ns         16.7 ns     44800000
BenchmarkPointXMMatrixLookToLH                       30.9 ns         21.9 ns     26352941
BenchmarkVectorLookTo                                20.3 ns         17.6 ns     40727273
BenchmarkVectorXMMatrixLookToLH                      30.7 ns         22.5 ns     32000000
BenchmarkNormalLookTo                                19.7 ns         16.1 ns     40727273
BenchmarkNormalXMMatrixLookToLH                      29.6 ns         24.5 ns     24888889
BenchmarkPointLookAt                                 22.4 ns         17.8 ns     44800000
BenchmarkPointXMMatrixLookAtLH                       32.6 ns         26.7 ns     26352941
BenchmarkVectorLookAt                                22.4 ns         19.2 ns     44800000
BenchmarkVectorXMMatrixLookAtLH                      31.7 ns         25.1 ns     29866667
BenchmarkNormalLookAt                                21.0 ns         19.2 ns     40727273
BenchmarkNormalXMMatrixLookAtLH                      31.2 ns         23.9 ns     23578947
BenchmarkPointPerspectiveProjection                  3.79 ns         3.21 ns    194782609
BenchmarkPointXMMatrixPerspectiveLH                  4.50 ns         3.37 ns    194782609
BenchmarkVectorPerspectiveProjection                 3.62 ns         2.65 ns    235789474
BenchmarkVectorXMMatrixPerspectiveLH                 4.48 ns         3.71 ns    235789474
BenchmarkVectorPerspectiveProjection                 3.89 ns         3.07 ns    203636364
BenchmarkNormalXMMatrixPerspectiveLH                 4.82 ns         3.37 ns    213333333
BenchmarkPointPerspectiveFovProjection               12.9 ns         9.03 ns     64000000
BenchmarkPointXMMatrixPerspectiveFovLH               18.7 ns         14.1 ns     49777778
BenchmarkVectorPerspectiveFovProjection              11.5 ns         9.35 ns    112000000
BenchmarkVectorXMMatrixPerspectiveFovLH              18.0 ns         13.1 ns     56000000
BenchmarkNormalPerspectiveFovProjection              11.6 ns         9.59 ns     89600000
BenchmarkNormalXMMatrixPerspectiveFovLH              17.8 ns         13.9 ns     64000000
BenchmarkPointProject                                50.0 ns         39.3 ns     19478261
BenchmarkPointXMVector3Project                       63.0 ns         47.1 ns     17920000
BenchmarkVector2MultipleAdds                         5.24 ns         3.92 ns    179200000
BenchmarkVector2MultipleXMVectorAdd                  5.74 ns         4.62 ns    172307692
BenchmarkVector2MultipleOperations                   5.95 ns         4.41 ns    194782609
BenchmarkPBRTVector2fMultipleOperations              7.40 ns         6.09 ns    100000000
BenchmarkMathFMA1                                    2.39 ns         1.90 ns    280000000
BenchmarkPbrtFMA1                                    2.51 ns         1.88 ns    407272727
BenchmarkMathFMA2                                    4.12 ns         3.05 ns    235789474
BenchmarkPbrtFMA2                                    3.49 ns         2.83 ns    298666667
BenchmarkMathFMA3                                    4.76 ns         3.37 ns    194782609
BenchmarkPbrtFMA3                                    4.14 ns         3.35 ns    186666667
BenchmarkMathQuaternionAdd                           4.82 ns         3.34 ns    154482759
BenchmarkPbrtQuaternionAdd                           5.25 ns         4.14 ns    165925926
BenchmarkMathPoint3Distance                          5.14 ns         4.00 ns    160000000
BenchmarkPbrtPoint3Distance                          5.04 ns         3.49 ns    179200000
BenchmarkMathPoint3DistanceSquared                   4.85 ns         4.12 ns    185837037
BenchmarkPbrtPoint3DistanceSquared                   4.39 ns         3.92 ns    179200000
BenchmarkMathVector3Cross                            3.58 ns         2.62 ns    298666667
BenchmarkMathPbrtVector3Cross                        3.37 ns         2.88 ns    298666667
BenchmarkMathVector4Cross                            3.33 ns         2.57 ns    280000000
BenchmarkMathVector3Dot                              3.98 ns         3.08 ns    213333333
BenchmarkPbrtVector3Dot                              3.78 ns         3.08 ns    248888889
BenchmarkMathVector3AngleBetween                     7.32 ns         6.28 ns    112000000
BenchmarkPbrtVector3AngleBetween                     9.09 ns         5.94 ns    100000000
BenchmarkXMVector3AngleBetweenVectors                13.9 ns         11.2 ns     56000000
BenchmarkMathVector3LengthSquared                    3.73 ns         3.20 ns    263529412
BenchmarkPbrtVector3LengthSquared                    2.82 ns         2.09 ns    298666667
BenchmarkMathVector3Length                           3.47 ns         3.02 ns    263529412
BenchmarkPbrtVector3Length                           3.40 ns         2.77 ns    298666667
BenchmarkMathVector3Ceil                             2.91 ns         2.22 ns    344615385
BenchmarkPbrtVector3Ceil                             4.05 ns         3.14 ns    248888889
BenchmarkMathVector3Floor                            3.01 ns         2.23 ns    280000000
BenchmarkPbrtVector3Floor                            3.22 ns         2.92 ns    203636364
BenchmarkMathVector3Trunc                            2.96 ns         2.35 ns    298666667
BenchmarkMathVector3Round                            3.08 ns         2.44 ns    224000000
BenchmarkMathVector3Lerp                             4.75 ns         3.54 ns    172307692
BenchmarkPbrtVector3Lerp                             4.99 ns         3.93 ns    186666667
BenchmarkMathVector3Clamp                            4.27 ns         3.63 ns    172307692
BenchmarkMathVector3Sqrt                             2.76 ns         2.29 ns    320000000
BenchmarkMathVector3Sin                              11.6 ns         10.1 ns     89600000
BenchmarkMathVector3Cos                              12.1 ns         9.21 ns     74666667
BenchmarkMathVector3Tan                              11.8 ns         8.65 ns    112000000
BenchmarkMathVector3ASin                             8.61 ns         6.70 ns     74666667
BenchmarkMathVector3ACos                             9.77 ns         7.11 ns    112000000
BenchmarkMathVector3ATan                             9.38 ns         7.15 ns     89600000
BenchmarkMathVector3ATan2                            21.6 ns         14.9 ns     64000000
BenchmarkMathVector3SinH                             16.5 ns         11.9 ns     49777778
BenchmarkMathVector3CosH                             14.7 ns         11.7 ns     56000000
BenchmarkMathVector3TanH                             15.4 ns         11.1 ns     74666667