Fast Inverse Square Root

    Magic Number - 0x5f3759df
    Dec 8, 2023

    The article introduces an algorithm called Fast Inverse Square Root, which is used to quickly calculate the inverse square root of a floating point number.

    Prologue

    Fast Inverse Square Root is a rather famous algorithm, and the specific author is no longer traceable. Its prominence comes from its extensive use in the game Quake III, where it became well-known.

    Why do we need the inverse square root instead of the square root itself? In computer graphics, many calculations involve the inverse square root rather than the square root. For example, the normalize() operation for a vector 1x2+y2+z2(x,y,z)\frac{1}{\sqrt{x^2 + y^2 + z^2}}(x,y,z) includes the inverse square root. In computer graphics, each frame's rendering requires at least tens of thousands of calls to normalize. If we can accelerate the calculation of the inverse square root, the rendering speed will significantly improve.

    The algorithm originated more than 20 years ago. Today, the algorithm has research value but not practical use since modern CPUs have faster and more accurate instructions. We can simply use 1/sqrt(y).

    However, this doesn't hinder us from understanding the elegance of this algorithm. In a brief overview, it involves:

    1. IEEE 754 Standard
    2. Undefined behavior in the C language
    3. Newton's iteration method.

    The focus here is on the second step—how to derive the magic number 0x5f3759df. For more detailed information, you can refer to the source.

    Algorithm

    float Q_rsqrt(float number)
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;
        x2 = number * 0.5F;
        y = number;
    
        /* Core Code */
        // 1. reinterpret_cast from float to int
        i = *(long *)&y; // evil floating-point bit-level hacking
        // 2. Estimate the inverse square root
        i = 0x5f3759df - (i >> 1); // what the heck? (....why not 0x69696969?)
        // reinterpret_cast from int to float
        y = *(float *)&i;
        // 3. Newton's method
        y = y * (threehalfs - (x2 * y * y)); // 1st iteration
        // y = y * (threehalfs - (x2 * y * y)); // 2nd iteration, this can be removed
    
        return y;
    }
    

    The algorithm roughly consists of three parts:

    • reinterpret_cast used for conversion between float and int because any compiler would otherwise prohibit bitwise operations on float types.
    • Estimation using the magic number 0x5f3759df for the inverse square root (explained later).
    • One iteration of Newton's method (explained later).

    Estimating the Inverse Square Root

    In the IEEE 754 standard, a 32-bit floating-point number contains 1 sign bit, 8 exponent bits EE, and 23 mantissa bits MM. Any floating-point number can be represented as: (1.M)×2E127=(1+M223)×2E127,M[0,223),E[0,256)(1.M) \times 2^{E - 127} = (1 + \frac{M}{2^{23}}) \times 2^{E - 127}, M \in[0, 2^{23} ), E \in [0,256).

    To calculate yy's inverse square root 1y=y12\frac{1}{\sqrt{y}} = y^{-\frac{1}{2}}, the IEEE 754 form of yy is y=(1+My223)×2Ey127y = (1 + \frac{M_y}{2^{23}}) \times 2^{E_y - 127}.

    For simplicity, let's take the logarithm of yy to the base 2:

    logy=log((1+My)×2Ey127)=log(1+My223)+Ey127\begin{align} \log y &= \log{((1 + M_y) \times 2^{E_y - 127})} \\ &= \log{(1+\frac{M_y}{2^{23}})} + E_y - 127 \end{align}

    It's clear that My223\frac{M_y}{2^{23}} is in the interval [0,1][0,1]. In this interval, log(1+x)x\log(1+x) \approx x. If we add a constant μ\mu, the approximation improves:

    Using this approximation, we can further simplify:

    logy=log(1+My223)+Ey127My223+μ+Ey127=1223(My+Ey×223)+μ127\begin{align} \log y &= \log{(1+\frac{M_y}{2^{23}})} + E_y - 127 \\ &\approx \frac{M_y}{2^{23}} + \mu + E_y - 127 \\ &= \frac{1}{2^{23}}(M_y + E_y \times 2^{23}) + \mu - 127 \end{align}

    Now, let's take the logarithm of y12y^{-\frac{1}{2}}:

    logy12=12logy12(1223(My+Ey×223)+μ127)\log y^{-\frac{1}{2}} = -\frac{1}{2} \log y \approx -\frac{1}{2}\left( \frac{1}{2^{23}}(M_y + E_y \times 2^{23}) + \mu - 127 \right)

    Assuming the accurate value of y12y^{-\frac{1}{2}} is AA, with exponent and mantissa EAE_A and MAM_A respectively, we can take the logarithm of AA:

    logA1223(MA+EA×223)+μ127\log A \approx \frac{1}{2^{23}}(M_A + E_A \times 2^{23}) + \mu - 127

    By equating the two logarithmic expressions, we get:

    MA+EA×223=223×32(127μ)12(My+Ey×223)\begin{align} M_A + E_A \times 2^{23} &= 2^{23} \times \frac{3}{2}(127 - \mu) - \frac{1}{2}(M_y + E_y \times 2^{23}) \end{align}

    When μ\mu is set to 0.04504618757916870117560.0450461875791687011756, the first term becomes 1597463011 in hexadecimal (0x5F3759e3). If we replace the second term's 12\frac{1}{2} with a right shift operation, the result is remarkably close to the line in the code:

    i = 0x5f3759df - (i >> 1);
    

    The magic number is slightly different, but with an appropriate μ\mu value, it should yield exactly the same number.

    Newton's Iteration Method

    The update formula for Newton's iteration method is:

    xt+1=xtf(xt)f(xt)x_{t+1} = x_t - \frac{f(x_t)}{f^\prime(x_t)}

    The precondition for Newton's iteration method is that the initial point should be near the function's zero. Since the estimation from i = 0x5f3759df - (i >> 1); should have a small error, we can assume the precondition is satisfied.

    Now, our problem is, given yy, we need to find xx such that 1y=x\frac{1}{\sqrt{y}} = x, i.e., 1x2=y\frac{1}{x^2} = y. Rearranging, we get f(x)=1x2y=0f(x) = \frac{1}{x^2} - y = 0. Applying Newton's iteration to f(x)f(x) gives the iteration formula:

    xt+1=xty2xt3x_{t+1} = x_t - \frac{y}{2}x_t^3

    This is translated into code as:

    y = y * (threehalfs - (x2 * y * y));
    

    References

    [1] Is fast inverse square root still used?

    [2] Fast Inverse Square Root — A Quake III Algorithm

    [3] Relevant Thesis