This post is an appendix to my previous post about `0x5f3759df`

. It probably won’t make sense if you haven’t read that post (and if you’ve already had enough of `0x5f3759df`

you may consider skipping this one altogether).

One point that bugged me about the derivation in the previous post was that while it explained the underlying mathematics it wasn’t very intuitive. I was basically saying “it works because of, umm, *math*“. So in this post I’ll go over the most abstract part again (which was actually the core of the argument). This time I’ll explain the intuition behind what’s going on. And there’ll be some more graphs. Lots of them.

The thing is, there is a perfectly intuitive explanation for why this works. The key is looking more closely at the float-to-int bit-cast operation. Let’s take that function,

intfloat_to_int(floatf) {return*(int*)&f; }

and draw it as a graph:

This is a graph of the floating-point numbers from around 1 to 10 bit-cast to integers. These are very large integers (*L* is 2^{23} so 128*L* is 1 073 741 824). The function has visible breaks at powers of two and is a straight line inbetween. This makes sense intuitively if you look at how you decode a floating-point value:

The mantissa increases linearly from 0 to 1; those are the straight lines. Once the mantissa reaches 1 the exponent is bumped by 1 and the mantissa starts over from 0. Now it’ll be counting twice as fast since the increased exponent gives you another factor of 2 multiplied onto the value. So after a break the straight line covers twice as many numbers on the *x* axis as before.

Doubling the distance covered on the *x* axis every time you move a fixed amount on the *y* axis – that sounds an awful lot like log_{2}. And if you plot `float_to_int`

and log_{2} (scaled appropriately) together it’s clear that one is an approximation of the other.

In other words we have that

This approximation is just a slightly different way to view the linear approximation we used in the previous post. You’ll notice that the log_{2} graph overshoots `float_to_int`

in most places, they only touch at powers of 2. For better accuracy we’ll want to move it down a tiny bit – you’ll already be familiar with this adjustment: it’s σ. But I’ll leave σ out of the following since it’s not strictly required, it only improves accuracy.

You can infer the approximation formula from the above formula alone but I’ll do it slightly differently to make it easier to illustrate what’s going on.

The blue line in the graph above, `float_to_int`

(*v*), is the `i`

in the original code and the input to our calculation. We want to do integer arithmetic on that value such we get an output that is approximately

the integer value that, if converted back to a float, gives us roughly the inverse square root of *x*. First, let’s plot the input and the desired output in a graph together:

The desired output is decreasing because it’s a reciprocal so the larger the input gets the smaller the output. They intersect at *v*=1 because the inverse square root of 1 is 1.

The first thing we’ll do to the input is to multiply it by -1 to make it decrease too:

Where before we had a very large positive value we now have is a very large negative one. So even though they look like they intersect they’re actually very far apart and on different axes: the left one for the desired output, the right one for our work-in-progress approximation.

The new value is decreasing but it’s doing it at a faster rate than the desired output. Twice as fast in fact. To match the rates we multiply it by 1/2:

Now the two graphs are decreasing at the same rate but they’re still very far apart: the target is still a very large positive value and the current approximation is a half as large negative value. So as the last step we’ll add 1.5*L**B* to the approximation, 0.5*L**B* to cancel out the -1/2 we multiplied it with a moment ago and 1*L**B* to bring it up to the level of the target:

And there it is: a bit of simple arithmetic later and we now have a value that matches the desired output:

The concrete value of the constant is

Again the approximation overshoots the target slightly the same way the log_{2} approximation did before. To improve accuracy you can subtract a small σ from *B*. That will give a slightly lower constant – for instance `0x5f3759df`

.

And with that I think it’s time to move onto something else. I promise that’s the last you’ll hear from me on the subject of that constant.

Pingback: 0x5f3759df » Hummus and Magnets

That does it. I’m printing this and taping it in at the back of my Hacker’s Delight.

After seeing this it did occur to me that there should have been a chapter about this kind of thing in Hacker’s Delight. I just checked and there is a short one about floating-point numbers but it’s pretty basic.

There’s some coverage of this in the upcoming 2nd Edition. See: http://www.hackersdelight.org/revisions.pdf (15–4)

Thanks for pointing that out, I wasn’t aware of the revisions document. It looks like there’s plenty of new material for a new edition.

As far as I can see his description is pretty similar to Lomont’s. And there is actually an error in there: he attributes the technique to Gary Tarolli and not Greg Walsh. I should probably send him a correction.

Pingback: Inverse Square Root II « Cute Code

Deriving a binary log in this way seems to go back to Mitchell in 1962, can’t find that paper online, but here’s a later paper from 1970:

http://degiorgi.math.hr/aaa_sem/Div/97-105.pdf

Thanks a bunch for the link, that is super relevant. I kind of expected it to go back a fair bit but 1962 – that’s early. I actually managed to find the paper by Mitchell I think, “Computer multiplication and division using binary logarithms”, but it’s behind the IEEE paywall so I couldn’t read it. Ugh.

It’s on scihub, btw.

DOI is 10.1109/TEC.1962.5219391

It’s been a few years in the making, but I wanted to let you know that the second-to-last equation you gave:

float2int(v^1/2) \approx 3/2 LB – 1/2 float2int(v)

can actually be generalized to

float2int(v^{p/q}) \approx (1 – p/q) LB + p/q * float2int(v)

and that the bias that 0x5f3759df applies to “center” the method actually works well for the generalized version as well (e.g. LB * bias_rate will generally center the p/q-th power)