Category Archives: C++

0x5f3759df (appendix)

This post is an appendix to my previous post about 0x5f3759df. It probably won’t make sense if you haven’t read that post (and if you’ve already had enough of  0x5f3759df you may consider skipping this one altogether).

One point that bugged me about the derivation in the previous post was that while it explained the underlying mathematics it wasn’t very intuitive. I was basically saying “it works because of, umm, math“. So in this post I’ll go over the most abstract part again (which was actually the core of the argument). This time I’ll explain the intuition behind what’s going on. And there’ll be some more graphs. Lots of them.

The thing is, there is a perfectly intuitive explanation for why this works. The key is looking more closely at the float-to-int bit-cast operation. Let’s take that function,

int float_to_int(float f) {
  return *(int*)&f;

and draw it as a graph:

This is a graph of the floating-point numbers from around 1 to 10 bit-cast to integers. These are very large integers (L is 223 so 128L is 1 073 741 824). The function has visible breaks at powers of two and is a straight line inbetween. This makes sense intuitively if you look at how you decode a floating-point value:

v = (1+m)2^e

The mantissa increases linearly from 0 to 1; those are the straight lines. Once the mantissa reaches 1 the exponent is bumped by 1 and the mantissa starts over from 0. Now it’ll be counting twice as fast since the increased exponent gives you another factor of 2 multiplied onto the value. So after a break the straight line covers twice as many numbers on the x axis as before.

Doubling the distance covered on the x axis every time you move a fixed amount on the y axis – that sounds an awful lot like log2. And if you plot float_to_int and log2 (scaled appropriately) together it’s clear that one is an approximation of the other.

In other words we have that

\mathtt{float\_to\_int}(v) \approx (\log_2(v)+B)L

This approximation is just a slightly different way to view the linear approximation we used in the previous post. You’ll notice that the log2 graph overshoots float_to_int in most places, they only touch at powers of 2. For better accuracy we’ll want to move it down a tiny bit – you’ll already be familiar with this adjustment: it’s σ. But I’ll leave σ out of the following since it’s not strictly required, it only improves accuracy.

You can infer the approximation formula from the above formula alone but I’ll do it slightly differently to make it easier to illustrate what’s going on.

The blue line in the graph above,  float_to_int(v), is the i in the original code and the input to our calculation. We want to do integer arithmetic on that value such we get an output that is approximately

\mathtt{float\_to\_int}(v^{-\frac 12})

the integer value that, if converted back to a float, gives us roughly the inverse square root of x. First, let’s plot the input and the desired output in a graph together:

The desired output is decreasing because it’s a reciprocal so the larger the input gets the smaller the output. They intersect at v=1 because the inverse square root of 1 is 1.

The first thing we’ll do to the input is to multiply it by -1 to make it decrease too:

Where before we had a very large positive value we now have is a very large negative one. So even though they look like they intersect they’re actually very far apart and on different axes: the left one for the desired output, the right one for our work-in-progress approximation.

The new value is decreasing but it’s doing it at a faster rate than the desired output. Twice as fast in fact. To match the rates we multiply it by 1/2:

Now the two graphs are decreasing at the same rate but they’re still very far apart: the target is still a very large positive value and the current approximation is a half as large negative value. So as the last step we’ll add 1.5LB to the approximation, 0.5LB to cancel out the -1/2 we multiplied it with a moment ago and 1LB to bring it up to the level of the target:

And there it is: a bit of simple arithmetic later and we now have a value that matches the desired output:

\mathtt{float\_to\_int}(v^{-\frac 12}) \approx \frac 32LB - \frac 12 \mathtt{float\_to\_int}(v)

The concrete value of the constant is

\frac 32LB = \mathtt{0x5f400000}

Again the approximation overshoots the target slightly the same way the log2 approximation did before. To improve accuracy you can subtract a small σ from B. That will give a slightly lower constant – for instance 0x5f3759df.

And with that I think it’s time to move onto something else. I promise that’s the last you’ll hear from me on the subject of that constant.


This post is about the magic constant 0x5f3759df and an extremely neat hack, fast inverse square root, which is where the constant comes from.

Meet the inverse square root hack:

float FastInvSqrt(float x) {
  float xhalf = 0.5f * x;
  int i = *(int*)&x;         // evil floating point bit level hacking
  i = 0x5f3759df - (i >> 1);  // what the fuck?
  x = *(float*)&i;
  x = x*(1.5f-(xhalf*x*x));
  return x;

What this code does is calculate, quickly, a good approximation for


It’s a fairly well-known function these days and first became so when it appeared in the source of Quake III Arena in 2005. It was originally attributed to John Carmack but turned out to have a long history before Quake going back through SGI and 3dfx to Ardent Computer in the mid 80s to the original author Greg Walsh. The concrete code above is an adapted version of the Quake code (that’s where the comments are from).

This post has a bit of fun with this hack. It describes how it works, how to generalize it to any power between -1 and 1, and sheds some new light on the math involved.

(It does contain a fair bit of math. You can think of the equations as notes – you don’t have to read them to get the gist of the post but you should if you want the full story and/or verify for yourself that what I’m saying is correct).


Why do you need to calculate the inverse of the square root – and need it so much that it’s worth implementing a crazy hack to make it fast? Because it’s part of a calculation you do all the time in 3D programming. In 3D graphics you use surface normals, 3-coordinate vectors of length 1, to express lighting and reflection. You use a lot of surface normals. And calculating them involves normalizing a lot of vectors. How do you normalize a vector? You find the length of the vector and then divide each of the coordinates with it. That is, you multiply each coordinate with


Calculating x^2+y^2+z^2 is relatively cheap. Finding the square root and dividing by it is expensive. Enter FastInvSqrt.

What does it do?

What does the function actually do to calculate its result? It has 4 main steps. First it reinterprets the bits of the floating-point input as an integer.

int i = *(int*)&x;         // evil floating point bit level hack

It takes the resulting value and does integer arithmetic on it which produces an approximation of the value we’re looking for:

i = 0x5f3759df - (i >> 1);  // what the fuck?

The result is not the approximation itself though, it is an integer which happens to be, if you reinterpret the bits as a floating point number, the approximation. So the code does the reverse of the conversion in step 1 to get back to floating point:

x = *(float*)&i;

And finally it runs a single iteration of Newton’s method to improve the approximation.

x = x*(1.5f-(xhalf*x*x));

This gives you an excellent approximation of the inverse square root of x. The last part, running Newton’s method, is relatively straightforward so I won’t spend more time on it. The key step is step 2: doing arithmetic on the raw floating-point number cast to an integer and getting a meaningful result back. That’s the part I’ll focus on.

What the fuck?

This section explains the math behind step 2. (The first part of the derivation below, up to the point of calculating the value of the constant, appears to have first been found by McEniry).

Before we can get to the juicy part I’ll just quickly run over how standard floating-point numbers are encoded. I’ll just go through the parts I need, for the full background wikipedia is your friend. A floating-point number has three parts: the sign, the exponent, and the mantissa. Here’s the bits of a single-precision (32-bit) one:

s e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m

The sign is the top bit, the exponent is the next 8 and the mantissa bottom 23. Since we’re going to be calculating the square root which is only defined for positive values I’m going to be assuming the sign is 0 from now on.

When viewing a floating-point number as just a bunch of bits the exponent and mantissa are just plain positive integers, nothing special about them. Let’s call them E and M (since we’ll be using them a lot). On the other hand, when we interpret the bits as a floating-point value we’ll view the mantissa as a value between 0 and 1, so all 0s means 0 and all 1s is a value very close to but slightly less than 1. And rather than use the exponent as a 8-bit unsigned integer we’ll subtract a bias, B, to make it a signed integer between -127 and 128. Let’s call the floating-point interpretation of those values e and m. In general I’ll follow McEniry and use upper-case letters for values that relate to the integer view and and lower-case for values that relate to the floating-point view.

Converting between the two views is straightforward:

m = \frac{M}{L}

e = E - B

For 32-bit floats L is 223 and B is 127. Given the values of e and m you calculate the floating-point number’s value like this:


and the value of the corresponding integer interpretation of the number is

M + LE

Now we have almost all the bits and pieces I need to explain the hack so I’ll get started and we’ll pick up the last few pieces along the way. The value we want to calculate, given some input x, is the inverse square root or

y = \frac{1}{\sqrt{x}} = x^{-\frac 12}

For reasons that will soon become clear we’ll start off by taking the base 2 logarithm on both sides:

\log_2 y = {-\frac 12}\log_2 x

Since the values we’re working with are actually floating-point we can replace x and y with their floating-point components:

\log_2 (1+m_y) + e_y = {-\frac 12}(\log_2 (1+m_x) + e_x)

Ugh, logarithms. They’re such a hassle. Luckily we can get rid of them quite easily but first we’ll have to take a short break and talk about how they work.

On both sides of this equation we have a term that looks like this,

\log_2(1 + v)

where v is between 0 and 1. It just so happens that for v between 0 and 1, this function is pretty close to a straight line:

ln2(1 + x) vs. x + sigma

Or, in equation form:

\log_2(1 + v) \approx v + \sigma

Where σ is a constant we choose. It’s not a perfect match but we can adjust σ to make it pretty close. Using this we can turn the exact equation above that involved logarithms into an approximate one that is linear, which is much easier to work with:

m_y + \sigma + e_y \approx {-\frac 12}(m_x + \sigma + e_x)

Now we’re getting somewhere! At this point it’s convenient to stop working with the floating-point representation and use the definitions above to substitute the integer view of the exponent and mantissa:

\frac{M_y}{L} + \sigma + E_y - B \approx {-\frac 12}(\frac{M_x}{L} + \sigma + E_x - B)

If we shuffle these terms around a few steps we’ll get something that looks very familiar (the details are tedious, feel free to skip):

\frac{M_y}{L} + E_y \approx {-\frac 12}(\frac{M_x}{L} + \sigma + E_x - B) - \sigma + B

\frac{M_y}{L} + E_y \approx {-\frac 12}(\frac{M_x}{L} + E_x) - \frac{3}{2}(\sigma - B)

M_y + LE_y \approx {\frac 32}L(B - \sigma) - {\frac 12}(M_x + LE_x)

After this last step something interesting has happened: among the clutter we now have the value of the integer representations on either side of the equation:

\mathbf{I_y} \approx {\frac 32}L(B - \sigma) - {\frac 12}\mathbf{I_x}

In other words the integer representation of y is some constant minus half the integer representation of x. Or, in C code:

i = K - (i >> 1);

for some K. Looks very familiar right?

Now what remains is to find the constant. We already know what B and L are but we don’t have σ yet. Remember, σ is the adjustment we used to get the best approximation of the logarithm, so we have some freedom in picking it. I’ll pick the one that was used to produce the original implementation, 0.0450465. Using this value you get:

{\frac 32}L(B - \sigma) = {\frac 32}2^{23}(127 - 0.0450465) = 1597463007

Want to guess what the hex representation of that value is? 0x5f3759df. (As it should be of course, since I picked σ to get that value.) So the constant is not a bit pattern as you might think from the fact that it’s written in hex, it’s the result of a normal calculation rounded to an integer.

But as Knuth would say: so far we’ve only proven that this should work, we haven’t tested it. To give a sense for how accurate the approximation is here is a plot of it along with the accurate inverse square root:

Graph of approximation vs. accurate value

This is for values between 1 and 100. It’s pretty spot on right? And it should be – it’s not just magic, as the derivation above shows, it’s a computation that just happens to use the somewhat exotic but completely well-defined and meaningful operation of bit-casting between float and integer.

But wait there’s more!

Looking at the derivation of this operation tells you something more than just the value of the constant though. You will notice that the derivation hardly depends on the concrete value of any of the terms – they’re just constants that get shuffled around. This means that if we change them the derivation still holds.

First off, the calculation doesn’t care what L and B are. They’re given by the floating-point representation. This means that we can do the same trick for 64- and 128-bit floating-point numbers if we want, all we have to do is recalculate the constant which it the only part that depends on them.

Secondly it doesn’t care which value we pick for σ. The σ that minimizes the difference between the logarithm and x+σ may not, and indeed does not, give the most accurate approximation. That’s a combination of floating-point rounding and because of the Newton step. Picking σ is an interesting subject in itself and is covered by McEniry and Lomont.

Finally, it doesn’t depend on -1/2. That is, the exponent here happens to be -1/2 but the derivation works just as well for any other exponent between -1 and 1. If we call the exponent (because e is taken) and do the same derivation with that instead of -1/2 we get:

\mathbf{I_y} \approx (1 - p)L(\sigma - B) + p\mathbf{I_x}

Let’s try a few exponents. First off p=0.5, the normal non-inverse square root:

\mathbf{I_y} \approx K_{\frac 12} + {\frac 12}\mathbf{I_x}

K_{\frac 12} = {\frac 12}L(B - \sigma) = {\frac 12}2^{23}(127 - 0.0450465) = \mathtt{0x1fbd1df5}

or in code form,

i = 0x1fbd1df5 + (i >> 1);

Does this work too? Sure does:

Graph of approximation vs. accurate

This may be a well-known method to approximate the square root but a cursory google and wikipedia search didn’t suggest that it was.

It even works with “odd” powers, like the cube root

\mathbf{I_y} \approx K_{\frac 13} + {\frac 13}\mathbf{I_x}

K_{\frac 13} = {\frac 23}L(B - \sigma) = {\frac 23}2^{23}(127 - 0.0450465) = \mathtt{0x2a517d3c}

which corresponds to:

i = (int) (0x2a517d3c + (0.333f * i));

Since this is an odd factor we can’t use shift instead of multiplication. Again the approximation is very close:

Graph of approximation vs. accurate

At this point you may have noticed that when changing the exponent we’re actually doing something pretty simple: just adjusting the constant by a linear factor and changing the factor that is multiplied onto the integer representation of the input. These are not expensive operations so it’s feasible to do them at runtime rather than pre-compute them. If we pre-multiply just the two other factors:

L(B - \sigma) = 2^{23}(127 - 0.0450465) = \mathtt{0x3f7a3bea}

we can calculate the value without knowing the exponent in advance:

i = (1 - p) * 0x3f7a3bea + (p * i);

If you shuffle the terms around a bit you can even save one of multiplications:

i = 0x3f7a3bea + p * (i - 0x3f7a3bea);

This gives you the “magic” part of fast exponentiation for any exponent between -1 and 1; the one piece we now need to get a fast exponentiation function that works for all exponents and is as accurate as the original inverse square root function is to generalize the Newton approximation step. I haven’t looked into that so that’s for another blog post (most likely for someone other than me).

The expression above contains a new “magical” constant,  0x3f7a3bea. But even if it’s in some sense “more magical” than the original constant, since that can be derived from it, it depends on an arbitrary choice of σ so it’s not universal in any way. I’ll call it Cσ and we’ll take a closer look at it in a second.

But first, one sanity check we can try with this formula is when p=0. For a p of zero the result should always be 1 since x0 is 1 independent of x. And indeed the second term falls away because it is multiplied by 0 and so we get simply:

i = 0x3f7a3bea;

Which is indeed constant – and interpreted as a floating-point value it’s 0.977477 also known as “almost 1” so the sanity check checks out. That tells us something else too: Cσ actually has a meaningful value when cast to a float. It’s 1; or very close to it.

That’s interesting. Let’s take a closer look. The integer representation of Cσ is

C_\sigma = L(B - \sigma) = LB - L\sigma

This is almost but not quite the shape of a floating-point number, the only problem is that we’re subtracting rather than adding the second term. That’s easy to fix though:

LB - L\sigma = LB - L + L - L\sigma = L(B - 1) + L(1 - \sigma)

Now it looks exactly like the integer representation of a floating-point number. To see which we’ll first determine the exponent and mantissa and then calculate the value, cσ. This is the exponent:

e_{c_\sigma} = (E_{C_\sigma} - B) = (B - 1 - B) = -1

and this is the mantissa:

m_{c_\sigma} = \frac{M_{C_\sigma}}{L} = \frac{L(1 - \sigma)}{L} = 1 - \sigma

So the floating-point value of the constant is (drumroll):

c_\sigma = (1 + m_{c_\sigma})2^{e_{c_\sigma}} = \frac{1 + 1 - \sigma}2 = 1 - \frac{\sigma}2

And indeed if you divide our original σ from earlier, 0.0450465, by 2 you get 0.02252325; subtract it from 1 you get 0.97747675 or our friend “almost 1” from a moment ago. That gives us a second way to view Cσ, as the integer representation of a floating-point number, and to calculate it in code:

float sigma = 0.0450465;
float c_sigma = 1 - (0.5f * sigma);
int C_sigma = *(*int)&c_sigma;

Note that for a fixed σ these are all constants and the compiler should be able to optimize this whole computation away. The result is 0x3f7a3beb – not exactly 0x3f7a3bea from before but just one bit away (the least significant one) which is to be expected for computations that involve floating-point numbers. Getting to the original constant, the title of this post, is a matter of multiplying the result by 1.5.

With that we’ve gotten close enough to the bottom to satisfy at least me that there is nothing magical going on here. For me the main lesson from this exercise is that bit-casting between integers and floats is not just a meaningless operation, it’s an exotic but very cheap numeric operation that can be useful in computations. And I expect there’s more uses of it out there waiting to be discovered.

(Update: I’ve added an appendix about the intuition behind the part where I shuffle around the floating-point terms and the integer terms appear.)


Many (most?) virtual machines for dynamically typed languages use tagged integers. On a 32-bit system the “natural” tagsize for a value is 2 bits since objects are (or can easily be) 4-byte aligned. This gives you the ability to distinguish between four different types of values each with 30 bits of data, or three different types where two have 30 bits and one has 31 bits of data. V8 uses the latter model: 31 bit integers, 30 bit object pointers and 30 bit “failure” pointers used for internal bookkeeping. On 64-bit systems the natural tagsize is 3 which allows for more types of tagged values, and inspired by this I’ve been playing around with tagging another kind of value: double precision floating-point numbers.

Whenever you want to tag a value you need to find space in the value for the tag. With objects it’s easy: they’re already aligned so the lowest few bits are automatically free:

oooooooooooooo..oooooooooooooo000 (untagged)
oooooooooooooo..oooooooooooooottt (tagged)

With integers no bits are free automatically so you only tag if the highest bits are the same, by shifting the lowest bits up and adding the tag in the lowest bits, for instance:

0000iiiiiiiiiiiii..iiiiiiiiiiiiii (untagged)
0iiiiiiiiiiiii..iiiiiiiiiiiiiittt (tagged)

The reason both the highest bits 0000 and 1111 work is that when you shift up by 3 there will be one bit of the original 4 left, and when shifting down again sign extension spreads that bit back out over all the top 4 bits.

With doubles it becomes harder to find free bits. A double is represented as one sign bit s, an 11-bit exponent e and a 52 bit fraction, f:

seeeeeeeeeeeffffffffff..fffffffff (untagged)

The value of the number is (-1)s2e1.f.

To get a feel for how this representation works I tried decomposing some different values into into their parts, here using single precision but the same pattern applies to double precision:

value sign exponent fraction
1.0 0 0 1
-1.0 1 0 1
2.0 0 1 1
0.5 0 -1 1
3.0 0 1 1.5
100.0 0 6 1.5625
10000.0 0 13 1.2207031
-10000.0 1 13 1.2207031
0.0001 0 -14 1.6384
0 0 -127 0.0 (denormalized)
NaN 0 128 1.5
Infinity 0 128 1.0

Looking at this table it’s clear that in an interval around ±1.0, starting close to 0.0 on the one side and stretching far out into the large numbers, the exponent stays fairly small. There are 11 bits available to the exponent, that’s a range from -1022 to 1023 since one exponent is used for special values, but for all values between 0.0001 and 10000 for instance its actual value only runs from -14 to 13.

Even though the high bits of the exponent are unused for many numbers you can’t just grab them for the tag. The 11 exponent bits don’t contain the exponent directly but its value plus a bias of 1023, apparently to make comparison of doubles easier. But after subtracting the bias from the exponent it is indeed just a matter of stealing its top three bits of the value (leaving the sign bit in place):

su000uuuuuuuffffffffff..fffffffff (untagged, bias subtracted)
suuuuuuuuffffffffff..fffffffffttt (tagged, bias subtracted)

Using this approach all doubles whose exponent is between -127 and 128 can be tagged. Since single precision numbers use 8 bits for the exponent all numbers between the numerically greatest and numerically smallest single precision numbers, positive and negative, can be represented as tagged doubles. One potential concern is that this encoding does not allow ±0.0, NaNs or ±∞ but I’m not too worried about that; it’s easy to handle 0.0 specially and it’s unclear how problematic the other values are.

What’s the performance of this? The short answer is: I don’t know, haven’t tried it. The long answer is: I can guess and I’m fairly certain that it’s worth it.

One reason it might not be worth it is if most doubles are not covered by this. To test this I instrumented v8 to test each time a heap number was allocated whether the value would fit as a tagged double. Furthermore I disabled some optimizations that avoid number allocation to make sure the allocation routines saw as many numbers as possible. The results after running the v8 benchmarks were encouraging:

| Name | Value |
| c:SmallDoubleZero | 16149 |
| c:SmallDoubleNaN | 2 |
| c:SmallDoubleInfinity | 32959 |
| c:SmallDoubleOther | 1290 |
| c:SmallDoubleHits | 628660 |

The number in c:SmallDoubleHits is the ones that fit in a tagged double: 92%. Furthermore, of the numbers that could not be represented all but 2% were zero or infinity. The v8 benchmark suite contains two floating-point heavy benchmarks: a cryptography benchmark and a raytracer. Interestingly, the cryptography benchmarks is responsible for almost all the ∞s and the raytracer is responsible for almost all the 0.0s. If a special case was added for 0.0 then 99.3% of all the doubles used in the raytracer would not require heap allocation.

Note: people often ask if we plan to add 64-bit support to v8. Don’t read this as an indication one way or the other. I’m just testing this using v8 because that’s the code base and benchmarks I know best.

Note that this is just a single data point, there’s no basis to conclude that the same holds across most applications. Note also that since v8 uses tagged integers there is some overlap between this an existing optimizations. But since all 32-bit integers can be represented as tagged doubles any nonzero numbers that are not counted because they were recognized as small integers would have counted as tagged doubles.

Another reason it might not be worth it could be that tagging and untagging is expensive; I doubt that it is considering how relatively few operations it takes:

static const uint64_t kBias = 0x3800000000000000ULL;
static const uint64_t kStolenBitsMask = 0x7000000000000000ULL;
static const uint64_t kLowerMask = 0xfffffffffffffffULL;
static const uint64_t kUpperMask = 0x8000000000000000ULL;
static const uint64_t kTagSize = 3;
static const uint64_t kTag = 0x3;

uint64_t double_bits(double value) {
return *reinterpret_cast(&value);

double make_double(uint64_t value) {
return *reinterpret_cast<double*>(&value);

bool fits_small_double(double value) {
return ((double_bits(value) - kBias) & kStolenBitsMask) == 0;

uint64_t tag_small_double(double value) {
uint64_t bits = double_bits(value) - kBias;
return (bits & kUpperMask)
| ((bits & kLowerMask) << kTagSize)
| kTag;

double untag_small_double(uint64_t bits) {
uint64_t result = (bits & kUpperMask)
| ((bits >> kTagSize) & kLowerMask);
return make_double(result + kBias);

I haven’t benchmarked this so I can only guess about the performance. I have no doubt that testing and tagging a double is faster overall than allocating a heap object but I suspect reading from memory could turn out to be cheaper than untagging. The overall performance also depends on how doubles are used: if most are only used briefly and then discarded then the possible cheaper reads don’t make up for the more expensive allocation. But I’m just guessing here.

In any case I think this is a promising strawman and it would be interesting to see how it performed in a real application. I also suspect there are more efficient ways of tagging/untagging the same class of doubles. The problem is actually a fairly simple one: reduce the highest bits from 01000, 11000, 01111 or 11111 to only two bits and be able to go back again to the same five bits.

Further reading: alternative approaches are here, which is essentially the same as this but has more special cases, and this which goes the other way and stores other values in the payload of NaN values.


Before I had used C++ I had quite a strong dislike for it. I remember wondering why anyone would voluntarily use C++? and being frustrated when reading interviews with Bjarne Stroustrup because he wouldn’t admit that C++, or at least parts of it, sucks.

Well, for the last few years I’ve actually been using C++ and have ended up changing my mind completely. As I expect most C++ programmers do I stay within a subset of the language, but that subset is now one of my favorite languages: not too complicated, powerful and concise. With the other languages I use regularly, Java, Python and JavaScript, I often find myself thinking “man, what were they thinking when they designed this feature?”. That practically never happens with C++. It can be complex and confusing but when taking the constraints and interactions with other language features into account I can’t think of a single instance where I’ve been able to come up with a better solution. I realize that is quite an arrogant way to put it but look at it this way: coming from an arrogant and opinionated person it is a huge compliment.

One of my favorite features is stack-allocated objects because they enable the resource acquisition is initialization pattern. Given a choice between doing resource handling in Java, using finalizers for objects and finally clauses for scopes, and C++ using RAII, I would choose C++ any day.

The weaknesses of finalizers are well-known so I won’t say more about them. The problem with finally clauses is closely related to the subject of making wrong code look wrong. Finally clauses divorce the code that releases a resource from the code that allocates it, which makes it much harder to see when you forget to clean up.

FileReader in = new FileReader(fileName);
try {
/* process file */
} finally {

Using a finally clause means having the code for cleaning up in on the other side of the code that uses it. You have to scan the whole method to convince yourself that in will indeed be closed. An even worse issue is the fact that the try part has a scope of its own that the finally clause can’t see. This means that anything you do after having allocated a resource will, by default, be hidden from the finally clause. Because of this the pattern above doesn’t scale if you want to open two files:

FileReader in = new FileReader(inFileName);
FileWriter out = new FileWriter(outFileName);
try {
/* process files */
} finally {

If the second constructor throws an exception, and you can’t be sure it doesn’t, then the finally clause is not yet in effect and so in won’t be cleaned up. You can’t declare the out variable in the try clause because then the finally clause can’t see it so what people often end up doing is:

FileReader in = null;
FileWriter out = null;
try {
in = new FileReader(inFileName);
out = new FileWriter(outFileName);
/* process files */
} finally {
if (in != null) in.close();
if (out != null) out.close();[1]

What we’re trying to do here is really very simple: we want to be sure that these files are closed before leaving the scope. With try/finally the solution is more complicated than the problem.

In C++ the destructor of stack-allocated objects give you a way to execute code exactly when you leave a scope. Here’s how you could express the same thing in C++:

own_resource in_resource = fopen(in_file_name, "r");
FILE* in = *in_resource;
if (in == NULL) return;
own_resource out_resource = fopen(out_file_name, "w");
FILE* out = *out_resource;
if (out == NULL) return;
/* process files */

The resources we acquire here are stored in a stack-allocated own_resource; the destructor of the own_resource class takes care of releasing the value stored in it. The language ensures that no matter how control leaves the current scope the own_resource destructors will be called first. Furthermore I can see right at the call to fopen that the result will be cleaned up. You don’t have to use this pattern long before any call that allocates a resource and does not store it in an own_resource (or an own_ptr or some other own_ object) looks very wrong.

How to manage resources is closely related to how to signal errors. Most new languages use some form of exceptions, even though they have acquired something of a bad name. There are many problems with exceptions (none of which can be solved with error codes by the way) but being able to specify how a resource should be cleaned up at the same time as you allocate it does solve many of those problems. Getting this right requires you to assume that any part of your code could throw an exception. It’s a slightly nicer version of preemption really: someone might interrupt you at any nontrivial operation but unlike preemption all you must be able to do is bail out, you don’t have to be able to continue running. That is doable. Furthermore it’s just a fact of life: whether you use exceptions or not you may have to bail out at any nontrivial operation, because of stack overflow, memory exhaustion, SIGINT, or whatever. The best way to defend yourself against that is to program defensively, and to write your code in a style where resource allocation operations stand out if they don’t immediately make provisions for deallocation. I don’t know a language that does that better than C++.

However, as much as I’ve been praising C++, their solution relies much too heavily on static typing for my taste. An interesting challenge is to design a way for flexible object-oriented RAII to work in dynamically typed languages. That is something I will be working on and I hope I’ll eventually be back here with a good solution.

1: As Bob points out in the comments there’s a bug here: if in.close() throws an exception then out will not be closed. It’s another illustration of how hard this stuff is to get right. Note that the C++ example is not affected by this issue: if the destructor for out (which is the first to be called) throws an exception the same rules apply with regards to in: the exception causes us to leave its scope so its destructor will be called.


I’ve recently gone from being indifferent to varargs in C++ to thinking that they should be avoided at all cost. It’s a collection of values (but not references or objects passed by value) that you don’t know the length or type of, that you can’t pass on directly to other variadic functions, and that can only be streamed over not accessed randomly. Bleh! In the case of printf-like function, which I expect is the most popular use of varargs, you add the information that is missing from the arguments themselves to the format string. This gives ample opportunity for mistakes, which some compilers will warn you of but which other’s won’t. For me, the conclusion has been that varargs are harmful and should not be used under any circumstances.

Of course, not having printf-like functions in C++ is a drag if you want to build strings so I’ve been trying to come up with something to replace it that isn’t too painful. What the remainder of this post is about is that you can actually have your cake and eat it too: you can have something that looks just like printf that doesn’t use varargs but takes a variable (but bounded) number of arguments where the arguments each carry their own type information and can be accessed randomly. Where you would write this using classic printf:

int position = ...;
const char *value = ...;
printf("Found %s at %i", value, position);

this is how it looks with dprintf (dynamically typed printf):

dprintf("Found % at %", args(value, position))

You’ll notice that dprintf is different from printf in that:

  • There is no type information in the format string; arguments carry their own type info. Alternatively you could allow type info in the format string and use the info carried by the arguments for checking.
  • The arguments are wrapped in a call to args. This is the price you have to pay syntactically.

To explain how this works I’ll take it one step at a time and start with how to automatically collect type information about arguments.

Type Tagging

When calling dprintf each argument is coupled with information about its type. This is done using implicit conversion and static overloading. If you pass an argument to a function in C++ and the types don’t match then C++ will look at the expected argument types and, if possible, implicitly convert the argument to the expected type. This implicit conversion can be used to collect type information:

enum TypeTag { INT, C_STR, STR, FLOAT, ... }

class TaggedValue: {
TaggedValue(int v) : tag(INT) { value.u_int = v; }
TaggedValue(const char* v) : tag(C_STR) { value.u_c_str = v; }
TaggedValue(const string &v) : tag(STR) { value.u_str = v; }
TypeTag tag;
union {
int u_int;
const char* u_c_str;
const string *u_str;
} value;

void call_me(const TaggedValue &value) {
printf("Type tag: %i\n", value.tag());

call_me(4); // Prints the value of INT
call_me("foo"); // Prints the value of C_STR

When calling call_me the arguments don’t match so C++ implicitly finds the int and const char* constructors of TaggedValue and calls them. Each constructor in TaggedValue does a minimal amount of processing to store the value and type info about the value. The important part here is that you don’t see this when you invoke call_me so as a user you don’t have to worry about how this works, all you know is that type info is somehow being collected.

This has the limitation that it only works for a fixed set of types, the types for which there is a constructor in TaggedValue — but on the other hand this is no different from the fixed set of formatting directives understood by printf. On the plus side it allows you to pass any value by reference and by value as long as TaggedValue knows about it in advance. The issue of automatic type tagging is actually an interesting one; in this case we only allow a fixed set of “primitive” types but there is no reason why tagging can’t work for array types or other even more complex types[1].

The next step is to allow a variable number of these tagged values. Enter the args function.

Variable argument count

This is the nastiest part of it all since we have to define an args function for every number of arguments we want to allow. This function will pack the arguments up in an object and return it to the dprintf call:

// Abstract superclass of the actual argument collection
class TaggedValueCollection {
TaggedValueCollection(int size) : size_(size) { }
int size() { return size_; }
virtual const TaggedValue &get(int i) = 0;
int size_;

template <int n>
class TaggedValueCollectionImpl : public TaggedValueCollection {
TaggedValueCollectionImpl(int size)
: TaggedValueCollection(size) { }
virtual const TaggedValue &get(int i) {
return *(elms_[i]);
const TaggedValue *elms_[n];

static TaggedValueCollectionImpl<3> args(const TaggedValue &v1,
const TaggedValue &v2, const TaggedValue &v3) {
TaggedValueCollectionImpl<3> result(3);
result.elms_[0] = &v1;
result.elms_[1] = &v2;
result.elms_[2] = &v3;
return result;

The TaggedValueCollectionImpl holds the arguments and provides access to them. The TaggedValueCollection superclass are there to allow methods to access the arguments without knowing how many there are. The number of arguments are part of the type of TaggedValueCollectionImpl so we can’t use that directly. The args function is the one that handles 3 arguments but they all look like this. It creates a new collection and stores its arguments in it. While it is bad to have to define a function for each number of arguments it is much better than varargs. Also, you only have to write these functions once, and then every function with printf-like behavior can use them.

Finally, the dprintf function looks like this:

void dprintf(string format, const TaggedValueCollection &elms) {
// Do the formatting

One advantage about having random access to the arguments is that the format string can access them randomly, they don’t have to be accessed in sequence. I’ve used the syntax $i to access the i‘th argument:

dprintf("Reverse order (2: $2, 1: $1, 0: $0)", args("a", "b", "c"))
// -> prints "Reverse order (2: c, 1: b, 0: a)"

Coupled with the fact that format strings don’t have to contain type info this gives a lot more flexibility in dealing with format strings. Honestly I’ve never had any use for this but I could imagine that it could be useful for instance in internationalization where different languages have different word order.

To sum up, the advantages of this scheme are:

  • Type information is automatically inferred from and stored with the arguments.
  • You can pass any type of value by reference or by value.
  • The argument collections (the const TaggedValueCollection &s) can be passed around straightforwardly. You only have to define one function, there is no need for functions like vprintf.
  • Format strings can access arguments in random order.

I’ve switched to using this in my hobby project and it works like a charm. You can see a full implementation of this here: string.h and (see string_buffer::printf and note that the classes have different names than I’ve used here). To see how it is being used see for instance.

A final note: if you find this kind of thing interesting (and since you’ve come this far I expect you do) you’ll probably enjoy enjoy Danvy’s Functional Unparsing.

1: To represent more complex types you just have to be a bit more creative with how you encode the type tag. A simple scheme is to to let the lowest nibble specify the outermost type (INT, C_STR, ARRAY, …) and then, if the outermost type is ARRAY let the 28 higher bits hold the type tag of the element type. For more complex types, like pair the 28 remaining bits can be divided into two 14-bit fields that hold the type tags for T1 and T2 respectively. For instance, pair<const char*, int**> would be represented as

bits 28-31 24-27 20-23 16-19 12-15 8-11 4-7 0-3
data c_str int array array pair

While this scheme is easy to encode and decode it can definitely be improved upon. For instance, there are many type tag values that are “wasted” because they don’t correspond to a legal type. Also, it allows you to represent some types that you probably don’t need very often while some types you are more likely to use cannot be represented. For instance, you can represent int****** but not pair<int, pair<int*, const char*>>. And of course, if you don’t have exactly 16 different cases you’re guaranteed to waste space in each nibble. But finding a better encoding is, as they say, the subject of future research.


Here’s an interesting tidbit for the code style fanatics out there. I recently found the source code of the original System 7 version of the bourne shell, sh, the father of bash. Wow. The code style is so awful that it’s almost a work of art. How bad is it? Well, I think a few lines from mac.h gives a pretty good impression:

#define IF    if(
#define THEN ){
#define ELSE } else {
#define ELIF } else if (
#define FI ;}

Here’s a bit of the resulting code (with pseudo-keywords highlighted):

FOR m=2*j-1; m/=2;
DO k=n-m;
FOR j=0; j DO FOR i=j; i>=0; i-=m
DO REG STRING *fromi; fromi = &from[i];
IF cf(fromi[m],fromi[0])>0
THEN break;

As they would say over at the daily WTF: My eyes! The goggles do nothing! It’s C code alright, but not C code as we know it. It’s known as Bournegol, since it’s inspired by Algol. By the way, IOCCC, the International Obfuscated C Code Contest, was started partly out of disgust with this code.