Monthly Archives: February 2009

Double

Many (most?) virtual machines for dynamically typed languages use tagged integers. On a 32-bit system the “natural” tag size for a value is 2 bits since objects are (or can easily be) 4-byte aligned. This gives you the ability to distinguish between four different types of values, each with 30 bits of data, or three different types where two have 30 bits and one has 31 bits of data. V8 uses the latter model: 31-bit integers, 30-bit object pointers and 30-bit “failure” pointers used for internal bookkeeping. On 64-bit systems the natural tag size is 3 bits, which allows for more types of tagged values, and inspired by this I’ve been playing around with tagging another kind of value: double precision floating-point numbers.

Whenever you want to tag a value you need to find space in the value for the tag. With objects it’s easy: they’re already aligned so the lowest few bits are automatically free:

oooooooooooooo..oooooooooooooo000 (untagged)
oooooooooooooo..oooooooooooooottt (tagged)
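As a concrete sketch (illustrative only, not v8’s actual scheme), tagging an aligned object pointer is just a matter of OR-ing a tag into the low bits and masking it off again; here with a 3-bit tag, 8-byte alignment and a made-up tag value:

#include <stdint.h>

// Illustrative only: a 3-bit tag in the low bits of an 8-byte aligned
// pointer. The tag value is made up for the example.
static const uintptr_t kObjectTag = 0x1;
static const uintptr_t kTagMask = 0x7;

uintptr_t tag_object(void* object) {
  // The three lowest bits are known to be zero because of alignment.
  return reinterpret_cast<uintptr_t>(object) | kObjectTag;
}

void* untag_object(uintptr_t value) {
  return reinterpret_cast<void*>(value & ~kTagMask);
}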

With integers no bits are free automatically, so a value can only be tagged if its highest bits are all the same; you shift the value up and put the tag in the freed-up lowest bits, for instance:

0000iiiiiiiiiiiii..iiiiiiiiiiiiii (untagged)
0iiiiiiiiiiiii..iiiiiiiiiiiiiittt (tagged)

The reason both the highest bits 0000 and 1111 work is that when you shift up by 3 there will be one bit of the original 4 left, and when shifting down again sign extension spreads that bit back out over all the top 4 bits.
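In code the tagging, the untagging and the “do the top bits allow it” check might look something like this sketch (again with a 3-bit tag and a made-up tag value; the point is just the shifts and the sign extension):

#include <stdint.h>

// Illustrative only: tagging a 64-bit integer with a 3-bit tag.
static const int kIntTagSize = 3;
static const int64_t kIntTag = 0x1;  // made-up tag value

// The value fits if its top four bits are all equal (all zeros or all
// ones), so that the one surviving bit can be spread back out again.
bool fits_tagged_int(int64_t value) {
  int64_t top = value >> 60;
  return top == 0 || top == -1;
}

int64_t tag_int(int64_t value) {
  // Shift up as unsigned to sidestep signed overflow, then add the tag.
  return static_cast<int64_t>(static_cast<uint64_t>(value) << kIntTagSize) | kIntTag;
}

int64_t untag_int(int64_t value) {
  // An arithmetic shift sign-extends, spreading the remaining top bit
  // back out over the bits that were shifted away.
  return value >> kIntTagSize;
}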

With doubles it becomes harder to find free bits. A double is represented as one sign bit s, an 11-bit exponent e and a 52-bit fraction f:

seeeeeeeeeeeffffffffff..fffffffff (untagged)

The value of the number is (-1)^s × 2^e × 1.f.

To get a feel for how this representation works I tried decomposing some different values into their parts, here using single precision but the same pattern applies to double precision:

value      sign  exponent  fraction
1.0        0        0      1
-1.0       1        0      1
2.0        0        1      1
0.5        0       -1      1
3.0        0        1      1.5
100.0      0        6      1.5625
10000.0    0       13      1.2207031
-10000.0   1       13      1.2207031
0.0001     0      -14      1.6384
0          0     -127      0.0 (denormalized)
NaN        0      128      1.5
Infinity   0      128      1.0

Looking at this table it’s clear that in an interval around ±1.0, starting close to 0.0 on the one side and stretching far out into the large numbers, the exponent stays fairly small. There are 11 bits available to the exponent, which gives a range from -1022 to 1023 since two exponent values are reserved for special cases, but for all values between 0.0001 and 10000, for instance, the actual exponent only runs from -14 to 13.
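A decomposition like the one in the table can be computed with frexp from the standard library; here’s a small sketch (not how the table was actually produced) that rescales frexp’s fraction from [0.5, 1) to the 1.f form used above, ignoring zero and the special values:

#include <math.h>
#include <stdio.h>

// Prints a value's sign, exponent and fraction in the 1.f form used in the
// table. frexp returns f' in [0.5, 1) with |value| = f' * 2^e', so we
// double the fraction and decrement the exponent to land in [1, 2).
void decompose(double value) {
  int sign = (value < 0) ? 1 : 0;
  int exponent;
  double fraction = frexp(fabs(value), &exponent);
  if (fraction != 0.0) {
    fraction *= 2.0;
    exponent -= 1;
  }
  printf("%g: sign=%d exponent=%d fraction=%g\n", value, sign, exponent, fraction);
}

int main() {
  decompose(100.0);     // sign=0 exponent=6 fraction=1.5625
  decompose(0.0001);    // sign=0 exponent=-14 fraction=1.6384
  decompose(-10000.0);  // sign=1 exponent=13 fraction=1.2207...
  return 0;
}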

Even though the high bits of the exponent are unused for many numbers you can’t just grab them for the tag. The 11 exponent bits don’t contain the exponent directly but its value plus a bias of 1023, apparently to make comparison of doubles easier. But after subtracting the bias it is indeed just a matter of stealing the top three bits of the exponent (leaving the sign bit in place):

s000uuuuuuuuffffffffff..fffffffff (untagged, bias subtracted)
suuuuuuuuffffffffff..fffffffffttt (tagged, bias subtracted)

Using this approach all doubles whose exponent is between -127 and 128 can be tagged. Since single precision numbers use 8 bits for the exponent, all numbers between the largest and the smallest (in magnitude) single precision numbers, positive and negative, can be represented as tagged doubles. One potential concern is that this encoding does not allow ±0.0, NaNs or ±∞, but I’m not too worried about that; it’s easy to handle 0.0 specially and it’s unclear how problematic the other values are.

What’s the performance of this? The short answer is: I don’t know, I haven’t tried it. The long answer is: I can guess, and I’m fairly certain that it’s worth it.

One reason it might not be worth it is if most doubles are not covered by this. To test this I instrumented v8 to check, each time a heap number was allocated, whether the value would fit in a tagged double. Furthermore I disabled some optimizations that avoid number allocation, to make sure the allocation routines saw as many numbers as possible. The results after running the v8 benchmarks were encouraging:

+----------------------------------------+-------------+
| Name                                   | Value       |
+----------------------------------------+-------------+
| c:SmallDoubleZero                      |       16149 |
| c:SmallDoubleNaN                       |           2 |
| c:SmallDoubleInfinity                  |       32959 |
| c:SmallDoubleOther                     |        1290 |
| c:SmallDoubleHits                      |      628660 |
+----------------------------------------+-------------+

The c:SmallDoubleHits counter is the number of values that fit in a tagged double: 628660 out of 679060, or roughly 92%. Furthermore, of the numbers that could not be represented, all but around 2% were zero or infinity. The v8 benchmark suite contains two floating-point heavy benchmarks: a cryptography benchmark and a raytracer. Interestingly, the cryptography benchmark is responsible for almost all the ∞s and the raytracer is responsible for almost all the 0.0s. If a special case were added for 0.0 then 99.3% of all the doubles used in the raytracer would not require heap allocation.

Note: people often ask if we plan to add 64-bit support to v8. Don’t read this as an indication one way or the other. I’m just testing this using v8 because that’s the code base and benchmarks I know best.

Note that this is just a single data point; there’s no basis to conclude that the same holds across most applications. Note also that since v8 uses tagged integers there is some overlap between this and existing optimizations. But since all 32-bit integers can be represented as tagged doubles, any nonzero numbers that were not counted because they were recognized as small integers would also have fit as tagged doubles.

Another reason it might not be worth it could be that tagging and untagging is expensive; I doubt that it is considering how relatively few operations it takes:

#include <stdint.h>

// Bias subtracted from a double's bits so that the three exponent bits we
// steal for the tag are zero for every value in the taggable range.
static const uint64_t kBias = 0x3800000000000000ULL;
// The three highest exponent bits; after subtracting the bias they must be
// zero for the value to fit in a tagged double.
static const uint64_t kStolenBitsMask = 0x7000000000000000ULL;
// Everything below the stolen bits: the remaining exponent and the fraction.
static const uint64_t kLowerMask = 0x0fffffffffffffffULL;
// The sign bit, which stays in place.
static const uint64_t kUpperMask = 0x8000000000000000ULL;
static const uint64_t kTagSize = 3;
static const uint64_t kTag = 0x3;

// Reinterprets the bits of a double as a 64-bit integer.
uint64_t double_bits(double value) {
  return *reinterpret_cast<uint64_t*>(&value);
}

// Reinterprets a 64-bit integer as the bits of a double.
double make_double(uint64_t value) {
  return *reinterpret_cast<double*>(&value);
}

// Returns true if the value can be represented as a tagged double.
bool fits_small_double(double value) {
  return ((double_bits(value) - kBias) & kStolenBitsMask) == 0;
}

// Tags a double: drop the (now zero) stolen bits by shifting the low part
// up and put the tag in the lowest bits. Assumes fits_small_double(value).
uint64_t tag_small_double(double value) {
  uint64_t bits = double_bits(value) - kBias;
  return (bits & kUpperMask)
       | ((bits & kLowerMask) << kTagSize)
       | kTag;
}

// Reverses tag_small_double: shift the payload back down, keep the sign
// bit in place and add the bias back in.
double untag_small_double(uint64_t bits) {
  uint64_t result = (bits & kUpperMask)
                  | ((bits >> kTagSize) & kLowerMask);
  return make_double(result + kBias);
}
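As a quick sanity check of the scheme (not from the original post; the test values are arbitrary and main is only for illustration), a round trip through tagging and untagging should be exact for values that fit:

#include <assert.h>
#include <stdio.h>

// Hypothetical round-trip check for the functions above; the test values
// are arbitrary.
int main() {
  double values[] = { 1.0, -1.0, 3.0, 100.0, 0.0001, -10000.0 };
  for (int i = 0; i < 6; i++) {
    assert(fits_small_double(values[i]));
    uint64_t tagged = tag_small_double(values[i]);
    assert((tagged & 0x7) == kTag);                   // tag sits in the low bits
    assert(untag_small_double(tagged) == values[i]);  // round trip is lossless
  }
  // Values outside the representable range are rejected.
  assert(!fits_small_double(0.0));
  assert(!fits_small_double(1e300));
  printf("all tagged double checks passed\n");
  return 0;
}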

I haven’t benchmarked this so I can only guess about the performance. I have no doubt that testing and tagging a double is faster overall than allocating a heap object, but I suspect reading from memory could turn out to be cheaper than untagging. The overall performance also depends on how doubles are used: if most are only used briefly and then discarded then the possibly cheaper reads don’t make up for the more expensive allocation. But I’m just guessing here.

In any case I think this is a promising strawman and it would be interesting to see how it performed in a real application. I also suspect there are more efficient ways of tagging and untagging the same class of doubles. The problem is actually a fairly simple one: reduce the five highest bits, which can only be 00111, 01000, 10111 or 11000, down to two bits, and be able to expand those two bits back into the original five.

Further reading: alternative approaches are here, which is essentially the same as this but with more special cases, and this, which goes the other way and stores other values in the payload of NaN values.

Finally

Before I had used C++ I had quite a strong dislike for it. I remember wondering why anyone would voluntarily use C++, and being frustrated when reading interviews with Bjarne Stroustrup because he wouldn’t admit that C++, or at least parts of it, sucks.

Well, for the last few years I’ve actually been using C++ and have ended up changing my mind completely. As I expect most C++ programmers do, I stay within a subset of the language, but that subset is now one of my favorite languages: not too complicated, powerful and concise. With the other languages I use regularly, Java, Python and JavaScript, I often find myself thinking “man, what were they thinking when they designed this feature?”. That practically never happens with C++. It can be complex and confusing, but when taking the constraints and interactions with other language features into account I can’t think of a single instance where I’ve been able to come up with a better solution. I realize that is quite an arrogant way to put it but look at it this way: coming from an arrogant and opinionated person it is a huge compliment.

One of my favorite features is stack-allocated objects, because they enable the “resource acquisition is initialization” (RAII) pattern. Given a choice between doing resource handling in Java, using finalizers for objects and finally clauses for scopes, and C++ using RAII, I would choose C++ any day.

The weaknesses of finalizers are well-known so I won’t say more about them. The problem with finally clauses is closely related to the subject of making wrong code look wrong. Finally clauses divorce the code that releases a resource from the code that allocates it, which makes it much harder to see when you forget to clean up.

FileReader in = new FileReader(fileName);
try {
  /* process file */
} finally {
  in.close();
}

Using a finally clause means that the code that closes in sits on the other side of the code that uses it. You have to scan the whole method to convince yourself that in will indeed be closed. An even worse issue is the fact that the try part has a scope of its own that the finally clause can’t see. This means that anything you do after having allocated a resource will, by default, be hidden from the finally clause. Because of this the pattern above doesn’t scale if you want to open two files:

FileReader in = new FileReader(inFileName);
FileWriter out = new FileWriter(outFileName);
try {
  /* process files */
} finally {
  in.close();
  out.close();
}

If the second constructor throws an exception, and you can’t be sure it doesn’t, then the finally clause is not yet in effect and so in won’t be cleaned up. You can’t declare the out variable in the try clause because then the finally clause can’t see it so what people often end up doing is:

FileReader in = null;
FileWriter out = null;
try {
  in = new FileReader(inFileName);
  out = new FileWriter(outFileName);
  /* process files */
} finally {
  if (in != null) in.close();
  if (out != null) out.close(); // [1]
}

What we’re trying to do here is really very simple: we want to be sure that these files are closed before leaving the scope. With try/finally the solution is more complicated than the problem.

In C++ the destructors of stack-allocated objects give you a way to execute code exactly when you leave a scope. Here’s how you could express the same thing in C++:

own_resource in_resource = fopen(in_file_name, "r");
FILE* in = *in_resource;
if (in == NULL) return;
own_resource out_resource = fopen(out_file_name, "w");
FILE* out = *out_resource;
if (out == NULL) return;
/* process files */

The resources we acquire here are stored in a stack-allocated own_resource; the destructor of the own_resource class takes care of releasing the value stored in it. The language ensures that no matter how control leaves the current scope the own_resource destructors will be called first. Furthermore I can see right at the call to fopen that the result will be cleaned up. You don’t have to use this pattern long before any call that allocates a resource and does not store it in an own_resource (or an own_ptr or some other own_ object) looks very wrong.
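The post doesn’t show own_resource itself, so the following is only a minimal sketch of what such a class might look like for FILE*; the constructor, operator* and the cleanup in the destructor are assumptions made to match the usage above:

#include <stdio.h>

// Minimal sketch (not the real class): owns a FILE* and closes it when the
// object goes out of scope, however control leaves the scope.
class own_resource {
 public:
  own_resource(FILE* file) : file_(file) { }  // implicit, so assigning from fopen works
  ~own_resource() { if (file_ != NULL) fclose(file_); }
  FILE* operator*() const { return file_; }
 private:
  FILE* file_;
};

A real implementation would also have to decide how copying should behave, for instance by disallowing it or by transferring ownership, which is glossed over here.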

How to manage resources is closely related to how to signal errors. Most new languages use some form of exceptions, even though they have acquired something of a bad name. There are many problems with exceptions (none of which can be solved with error codes by the way) but being able to specify how a resource should be cleaned up at the same time as you allocate it does solve many of those problems. Getting this right requires you to assume that any part of your code could throw an exception. It’s a slightly nicer version of preemption really: someone might interrupt you at any nontrivial operation but unlike preemption all you must be able to do is bail out, you don’t have to be able to continue running. That is doable. Furthermore it’s just a fact of life: whether you use exceptions or not you may have to bail out at any nontrivial operation, because of stack overflow, memory exhaustion, SIGINT, or whatever. The best way to defend yourself against that is to program defensively, and to write your code in a style where resource allocation operations stand out if they don’t immediately make provisions for deallocation. I don’t know a language that does that better than C++.

However, as much as I’ve been praising C++, its solution relies much too heavily on static typing for my taste. An interesting challenge is to design a way for flexible object-oriented RAII to work in dynamically typed languages. That is something I will be working on and I hope I’ll eventually be back here with a good solution.


1: As Bob points out in the comments there’s a bug here: if in.close() throws an exception then out will not be closed. It’s another illustration of how hard this stuff is to get right. Note that the C++ example is not affected by this issue: if the destructor for out (which is the first to be called) throws an exception the same rules apply with regards to in: the exception causes us to leave its scope so its destructor will be called.

Irregexp

The latest thing I’ve been working on has just been announced: irregexp. It’s a new regular expression engine for v8 that we’ve been working on for a few months and now finally it has made it onto the dev-channel release of Google Chrome.

For me an interesting aspect of the irregexp sub-project is that it has taken place completely out in the open. The trail of code and code reviews is out there from the day we created the branch, then known as regexp2000 (599), to when we merged back into the mainline (832), to when we became feature complete (1104), until finally irregexp could take over all regexp handling (1117). The openness aspect of the v8 project does occasionally freak me out a little bit: almost all the work I do and much of my work-related communication takes place in public and will, barring the cataclysmic destruction of the internet, stay public forever. But in this case I have no reservations; I like the idea very much.

Since it’s been so long since I’ve posted here I have a whole backlog of stuff I’d like to write about but haven’t gotten around to. So to get myself started again here’s what I’ll do: I promise that within a week I will have published a new post. There.