Fast Inverse Square Root
The article introduces an algorithm called Fast Inverse Square Root, which is used to quickly calculate the inverse square root of a floating point number.
Prologue
Fast Inverse Square Root is a rather famous algorithm, and the specific author is no longer traceable. Its prominence comes from its extensive use in the game Quake III, where it became well-known.
Why do we need the inverse square root instead of the square root itself? In computer graphics, many calculations involve the inverse square root rather than the square root. For example, the normalize()
operation for a vector includes the inverse square root. In computer graphics, each frame's rendering requires at least tens of thousands of calls to normalize. If we can accelerate the calculation of the inverse square root, the rendering speed will significantly improve.
The algorithm originated more than 20 years ago. Today, the algorithm has research value but not practical use since modern CPUs have faster and more accurate instructions. We can simply use 1/sqrt(y)
.
However, this doesn't hinder us from understanding the elegance of this algorithm. In a brief overview, it involves:
- IEEE 754 Standard
- Undefined behavior in the C language
- Newton's iteration method.
The focus here is on the second step—how to derive the magic number 0x5f3759df
. For more detailed information, you can refer to the source.
Algorithm
float Q_rsqrt(float number)
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
/* Core Code */
// 1. reinterpret_cast from float to int
i = *(long *)&y; // evil floating-point bit-level hacking
// 2. Estimate the inverse square root
i = 0x5f3759df - (i >> 1); // what the heck? (....why not 0x69696969?)
// reinterpret_cast from int to float
y = *(float *)&i;
// 3. Newton's method
y = y * (threehalfs - (x2 * y * y)); // 1st iteration
// y = y * (threehalfs - (x2 * y * y)); // 2nd iteration, this can be removed
return y;
}
The algorithm roughly consists of three parts:
reinterpret_cast
used for conversion between float and int because any compiler would otherwise prohibit bitwise operations on float types.- Estimation using the magic number
0x5f3759df
for the inverse square root (explained later). - One iteration of Newton's method (explained later).
Estimating the Inverse Square Root
In the IEEE 754 standard, a 32-bit floating-point number contains 1 sign bit, 8 exponent bits , and 23 mantissa bits . Any floating-point number can be represented as: .
To calculate 's inverse square root , the IEEE 754 form of is .
For simplicity, let's take the logarithm of to the base 2:
It's clear that is in the interval . In this interval, . If we add a constant , the approximation improves:
Using this approximation, we can further simplify:
Now, let's take the logarithm of :
Assuming the accurate value of is , with exponent and mantissa and respectively, we can take the logarithm of :
By equating the two logarithmic expressions, we get:
When is set to , the first term becomes 1597463011
in hexadecimal (0x5F3759e3
). If we replace the second term's with a right shift operation, the result is remarkably close to the line in the code:
i = 0x5f3759df - (i >> 1);
The magic number is slightly different, but with an appropriate value, it should yield exactly the same number.
Newton's Iteration Method
The update formula for Newton's iteration method is:
The precondition for Newton's iteration method is that the initial point should be near the function's zero. Since the estimation from i = 0x5f3759df - (i >> 1);
should have a small error, we can assume the precondition is satisfied.
Now, our problem is, given , we need to find such that , i.e., . Rearranging, we get . Applying Newton's iteration to gives the iteration formula:
This is translated into code as:
y = y * (threehalfs - (x2 * y * y));
References
[1] Is fast inverse square root still used?
[2] Fast Inverse Square Root — A Quake III Algorithm
[3] Relevant Thesis