March 28, 2008
Optimizing Math-Intensive Applications with Fixed-Point ArithmeticSquare Roots
With the shift-and-add implementations of sines, cosines, exponentials, and logarithms, the runtime was much improved. However, another function was now taking a lot of the timethe square-root function. This had been naïvely implemented using a Newton-Raphson approximation, and was taking a very long time to converge in some cases. What was needed was a systematic method of calculating a square root that had a bounded iteration count.
Yet again, you can use powers of two to your benefit. If y=
If you start with a=2n, and z=x-a2, then you can iterate through the remaining powers of two less than n to find the result. In each iteration you try y=a+2m+r1, for decreasing powers of m. Now, you know that y2=a2+2*a*2m+22m+r2, and the ideal is that r1 (and thus r2) is zero, so y=
Multiply and Divide
After investing the effort in optimizing the complex operations, plain old multiply and divide were now top of the heap. These functions are more than just simple integer operations, because they need to take care of the fixed-point offset. Not only that, but the target platform is a 32-bit platform, so the compiler-supplied 64-bit integer operations are multiple instructions anyway, thus it is important to ensure that every instruction does something useful. For multiplication, this means splitting the top 32 bits and the bottom 32 bitsthere's no point doing unnecessary 32-bit times 64-bit multiplications just for completeness. Doing the split like this also lets the appropriate bit shifts be incorporated directly without having to worry about loss of precision.
Division is more complicated, as to get the final answer correct to the available precision, you need more than 64 bits in some cases. To circumvent this problem, when the numerator is more than the denominator, then the denominator is scaled up until it is at least as big as half the numerator. This then makes for division of values of similar size, so the shift-and-add method used for the actual calculation is most effective, and doesn't lose any precisionbasically, each bit is calculated in turn, starting with the most significant bit of the result.
Conclusion
Using fixed-point math gave a considerable performance boost to this application, and I anticipate that there are many other applications that could benefit from the techniques described here. It is important to check that the required range of values for an application can comfortably be represented within the chosen fixed-point representation.
|
|
|||||||||||||||||||||||||||||||
|
|
|
|