## 0x5f3759df (appendix)

This post is an appendix to my previous post about 0x5f3759df. It probably won’t make sense if you haven’t read that post (and if you’ve already had enough of  0x5f3759df you may consider skipping this one altogether).

One point that bugged me about the derivation in the previous post was that while it explained the underlying mathematics it wasn’t very intuitive. I was basically saying “it works because of, umm, math“. So in this post I’ll go over the most abstract part again (which was actually the core of the argument). This time I’ll explain the intuition behind what’s going on. And there’ll be some more graphs. Lots of them.

The thing is, there is a perfectly intuitive explanation for why this works. The key is looking more closely at the float-to-int bit-cast operation. Let’s take that function,

int float_to_int(float f) {
return *(int*)&f;
}

and draw it as a graph:

This is a graph of the floating-point numbers from around 1 to 10 bit-cast to integers. These are very large integers (L is 223 so 128L is 1 073 741 824). The function has visible breaks at powers of two and is a straight line inbetween. This makes sense intuitively if you look at how you decode a floating-point value:

$v = (1+m)2^e$

The mantissa increases linearly from 0 to 1; those are the straight lines. Once the mantissa reaches 1 the exponent is bumped by 1 and the mantissa starts over from 0. Now it’ll be counting twice as fast since the increased exponent gives you another factor of 2 multiplied onto the value. So after a break the straight line covers twice as many numbers on the x axis as before.

Doubling the distance covered on the x axis every time you move a fixed amount on the y axis – that sounds an awful lot like log2. And if you plot float_to_int and log2 (scaled appropriately) together it’s clear that one is an approximation of the other.

In other words we have that

$\mathtt{float\_to\_int}(v) \approx (\log_2(v)+B)L$

This approximation is just a slightly different way to view the linear approximation we used in the previous post. You’ll notice that the log2 graph overshoots float_to_int in most places, they only touch at powers of 2. For better accuracy we’ll want to move it down a tiny bit – you’ll already be familiar with this adjustment: it’s σ. But I’ll leave σ out of the following since it’s not strictly required, it only improves accuracy.

You can infer the approximation formula from the above formula alone but I’ll do it slightly differently to make it easier to illustrate what’s going on.

The blue line in the graph above,  float_to_int(v), is the i in the original code and the input to our calculation. We want to do integer arithmetic on that value such we get an output that is approximately

$\mathtt{float\_to\_int}(v^{-\frac 12})$

the integer value that, if converted back to a float, gives us roughly the inverse square root of x. First, let’s plot the input and the desired output in a graph together:

The desired output is decreasing because it’s a reciprocal so the larger the input gets the smaller the output. They intersect at v=1 because the inverse square root of 1 is 1.

The first thing we’ll do to the input is to multiply it by -1 to make it decrease too:

Where before we had a very large positive value we now have is a very large negative one. So even though they look like they intersect they’re actually very far apart and on different axes: the left one for the desired output, the right one for our work-in-progress approximation.

The new value is decreasing but it’s doing it at a faster rate than the desired output. Twice as fast in fact. To match the rates we multiply it by 1/2:

Now the two graphs are decreasing at the same rate but they’re still very far apart: the target is still a very large positive value and the current approximation is a half as large negative value. So as the last step we’ll add 1.5LB to the approximation, 0.5LB to cancel out the -1/2 we multiplied it with a moment ago and 1LB to bring it up to the level of the target:

And there it is: a bit of simple arithmetic later and we now have a value that matches the desired output:

$\mathtt{float\_to\_int}(v^{-\frac 12}) \approx \frac 32LB - \frac 12 \mathtt{float\_to\_int}(v)$

The concrete value of the constant is

$\frac 32LB = \mathtt{0x5f400000}$

Again the approximation overshoots the target slightly the same way the log2 approximation did before. To improve accuracy you can subtract a small σ from B. That will give a slightly lower constant – for instance 0x5f3759df.

And with that I think it’s time to move onto something else. I promise that’s the last you’ll hear from me on the subject of that constant.

## 0x5f3759df

This post is about the magic constant 0x5f3759df and an extremely neat hack, fast inverse square root, which is where the constant comes from.

Meet the inverse square root hack:

float FastInvSqrt(float x) {
float xhalf = 0.5f * x;
int i = *(int*)&x;         // evil floating point bit level hacking
i = 0x5f3759df - (i >> 1);  // what the fuck?
x = *(float*)&i;
x = x*(1.5f-(xhalf*x*x));
return x;
}

What this code does is calculate, quickly, a good approximation for

$\frac{1}{\sqrt{x}}$

It’s a fairly well-known function these days and first became so when it appeared in the source of Quake III Arena in 2005. It was originally attributed to John Carmack but turned out to have a long history before Quake going back through SGI and 3dfx to Ardent Computer in the mid 80s to the original author Greg Walsh. The concrete code above is an adapted version of the Quake code (that’s where the comments are from).

This post has a bit of fun with this hack. It describes how it works, how to generalize it to any power between -1 and 1, and sheds some new light on the math involved.

(It does contain a fair bit of math. You can think of the equations as notes – you don’t have to read them to get the gist of the post but you should if you want the full story and/or verify for yourself that what I’m saying is correct).

## Why?

Why do you need to calculate the inverse of the square root – and need it so much that it’s worth implementing a crazy hack to make it fast? Because it’s part of a calculation you do all the time in 3D programming. In 3D graphics you use surface normals, 3-coordinate vectors of length 1, to express lighting and reflection. You use a lot of surface normals. And calculating them involves normalizing a lot of vectors. How do you normalize a vector? You find the length of the vector and then divide each of the coordinates with it. That is, you multiply each coordinate with

$\frac{1}{\sqrt{x^2+y^2+z^2}}$

Calculating $x^2+y^2+z^2$ is relatively cheap. Finding the square root and dividing by it is expensive. Enter FastInvSqrt.

## What does it do?

What does the function actually do to calculate its result? It has 4 main steps. First it reinterprets the bits of the floating-point input as an integer.

int i = *(int*)&x;         // evil floating point bit level hack

It takes the resulting value and does integer arithmetic on it which produces an approximation of the value we’re looking for:

i = 0x5f3759df - (i >> 1);  // what the fuck?

The result is not the approximation itself though, it is an integer which happens to be, if you reinterpret the bits as a floating point number, the approximation. So the code does the reverse of the conversion in step 1 to get back to floating point:

x = *(float*)&i;

And finally it runs a single iteration of Newton’s method to improve the approximation.

x = x*(1.5f-(xhalf*x*x));

This gives you an excellent approximation of the inverse square root of x. The last part, running Newton’s method, is relatively straightforward so I won’t spend more time on it. The key step is step 2: doing arithmetic on the raw floating-point number cast to an integer and getting a meaningful result back. That’s the part I’ll focus on.

## What the fuck?

This section explains the math behind step 2. (The first part of the derivation below, up to the point of calculating the value of the constant, appears to have first been found by McEniry).

Before we can get to the juicy part I’ll just quickly run over how standard floating-point numbers are encoded. I’ll just go through the parts I need, for the full background wikipedia is your friend. A floating-point number has three parts: the sign, the exponent, and the mantissa. Here’s the bits of a single-precision (32-bit) one:

s e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m

The sign is the top bit, the exponent is the next 8 and the mantissa bottom 23. Since we’re going to be calculating the square root which is only defined for positive values I’m going to be assuming the sign is 0 from now on.

When viewing a floating-point number as just a bunch of bits the exponent and mantissa are just plain positive integers, nothing special about them. Let’s call them E and M (since we’ll be using them a lot). On the other hand, when we interpret the bits as a floating-point value we’ll view the mantissa as a value between 0 and 1, so all 0s means 0 and all 1s is a value very close to but slightly less than 1. And rather than use the exponent as a 8-bit unsigned integer we’ll subtract a bias, B, to make it a signed integer between -127 and 128. Let’s call the floating-point interpretation of those values e and m. In general I’ll follow McEniry and use upper-case letters for values that relate to the integer view and and lower-case for values that relate to the floating-point view.

Converting between the two views is straightforward:

$m = \frac{M}{L}$

$e = E - B$

For 32-bit floats L is 223 and B is 127. Given the values of e and m you calculate the floating-point number’s value like this:

$(1+m)2^e$

and the value of the corresponding integer interpretation of the number is

$M + LE$

Now we have almost all the bits and pieces I need to explain the hack so I’ll get started and we’ll pick up the last few pieces along the way. The value we want to calculate, given some input x, is the inverse square root or

$y = \frac{1}{\sqrt{x}} = x^{-\frac 12}$

For reasons that will soon become clear we’ll start off by taking the base 2 logarithm on both sides:

$\log_2 y = {-\frac 12}\log_2 x$

Since the values we’re working with are actually floating-point we can replace x and y with their floating-point components:

$\log_2 (1+m_y) + e_y = {-\frac 12}(\log_2 (1+m_x) + e_x)$

Ugh, logarithms. They’re such a hassle. Luckily we can get rid of them quite easily but first we’ll have to take a short break and talk about how they work.

On both sides of this equation we have a term that looks like this,

$\log_2(1 + v)$

where v is between 0 and 1. It just so happens that for v between 0 and 1, this function is pretty close to a straight line:

Or, in equation form:

$\log_2(1 + v) \approx v + \sigma$

Where σ is a constant we choose. It’s not a perfect match but we can adjust σ to make it pretty close. Using this we can turn the exact equation above that involved logarithms into an approximate one that is linear, which is much easier to work with:

$m_y + \sigma + e_y \approx {-\frac 12}(m_x + \sigma + e_x)$

Now we’re getting somewhere! At this point it’s convenient to stop working with the floating-point representation and use the definitions above to substitute the integer view of the exponent and mantissa:

$\frac{M_y}{L} + \sigma + E_y - B \approx {-\frac 12}(\frac{M_x}{L} + \sigma + E_x - B)$

If we shuffle these terms around a few steps we’ll get something that looks very familiar (the details are tedious, feel free to skip):

$\frac{M_y}{L} + E_y \approx {-\frac 12}(\frac{M_x}{L} + \sigma + E_x - B) - \sigma + B$

$\frac{M_y}{L} + E_y \approx {-\frac 12}(\frac{M_x}{L} + E_x) - \frac{3}{2}(\sigma - B)$

$M_y + LE_y \approx {\frac 32}L(B - \sigma) - {\frac 12}(M_x + LE_x)$

After this last step something interesting has happened: among the clutter we now have the value of the integer representations on either side of the equation:

$\mathbf{I_y} \approx {\frac 32}L(B - \sigma) - {\frac 12}\mathbf{I_x}$

In other words the integer representation of y is some constant minus half the integer representation of x. Or, in C code:

i = K - (i >> 1);

for some K. Looks very familiar right?

Now what remains is to find the constant. We already know what B and L are but we don’t have σ yet. Remember, σ is the adjustment we used to get the best approximation of the logarithm, so we have some freedom in picking it. I’ll pick the one that was used to produce the original implementation, 0.0450465. Using this value you get:

${\frac 32}L(B - \sigma) = {\frac 32}2^{23}(127 - 0.0450465) = 1597463007$

Want to guess what the hex representation of that value is? 0x5f3759df. (As it should be of course, since I picked σ to get that value.) So the constant is not a bit pattern as you might think from the fact that it’s written in hex, it’s the result of a normal calculation rounded to an integer.

But as Knuth would say: so far we’ve only proven that this should work, we haven’t tested it. To give a sense for how accurate the approximation is here is a plot of it along with the accurate inverse square root:

This is for values between 1 and 100. It’s pretty spot on right? And it should be – it’s not just magic, as the derivation above shows, it’s a computation that just happens to use the somewhat exotic but completely well-defined and meaningful operation of bit-casting between float and integer.

## But wait there’s more!

Looking at the derivation of this operation tells you something more than just the value of the constant though. You will notice that the derivation hardly depends on the concrete value of any of the terms – they’re just constants that get shuffled around. This means that if we change them the derivation still holds.

First off, the calculation doesn’t care what L and B are. They’re given by the floating-point representation. This means that we can do the same trick for 64- and 128-bit floating-point numbers if we want, all we have to do is recalculate the constant which it the only part that depends on them.

Secondly it doesn’t care which value we pick for σ. The σ that minimizes the difference between the logarithm and x+σ may not, and indeed does not, give the most accurate approximation. That’s a combination of floating-point rounding and because of the Newton step. Picking σ is an interesting subject in itself and is covered by McEniry and Lomont.

Finally, it doesn’t depend on -1/2. That is, the exponent here happens to be -1/2 but the derivation works just as well for any other exponent between -1 and 1. If we call the exponent (because e is taken) and do the same derivation with that instead of -1/2 we get:

$\mathbf{I_y} \approx (1 - p)L(B - \sigma) + p\mathbf{I_x}$

Let’s try a few exponents. First off p=0.5, the normal non-inverse square root:

$\mathbf{I_y} \approx K_{\frac 12} + {\frac 12}\mathbf{I_x}$

$K_{\frac 12} = {\frac 12}L(B - \sigma) = {\frac 12}2^{23}(127 - 0.0450465) = \mathtt{0x1fbd1df5}$

or in code form,

i = 0x1fbd1df5 + (i >> 1);

Does this work too? Sure does:

This may be a well-known method to approximate the square root but a cursory google and wikipedia search didn’t suggest that it was.

It even works with “odd” powers, like the cube root

$\mathbf{I_y} \approx K_{\frac 13} + {\frac 13}\mathbf{I_x}$

$K_{\frac 13} = {\frac 23}L(B - \sigma) = {\frac 23}2^{23}(127 - 0.0450465) = \mathtt{0x2a517d3c}$

which corresponds to:

i = (int) (0x2a517d3c + (0.333f * i));

Since this is an odd factor we can’t use shift instead of multiplication. Again the approximation is very close:

At this point you may have noticed that when changing the exponent we’re actually doing something pretty simple: just adjusting the constant by a linear factor and changing the factor that is multiplied onto the integer representation of the input. These are not expensive operations so it’s feasible to do them at runtime rather than pre-compute them. If we pre-multiply just the two other factors:

$L(B - \sigma) = 2^{23}(127 - 0.0450465) = \mathtt{0x3f7a3bea}$

we can calculate the value without knowing the exponent in advance:

i = (1 - p) * 0x3f7a3bea + (p * i);

If you shuffle the terms around a bit you can even save one of multiplications:

i = 0x3f7a3bea + p * (i - 0x3f7a3bea);

This gives you the “magic” part of fast exponentiation for any exponent between -1 and 1; the one piece we now need to get a fast exponentiation function that works for all exponents and is as accurate as the original inverse square root function is to generalize the Newton approximation step. I haven’t looked into that so that’s for another blog post (most likely for someone other than me).

The expression above contains a new “magical” constant,  0x3f7a3bea. But even if it’s in some sense “more magical” than the original constant, since that can be derived from it, it depends on an arbitrary choice of σ so it’s not universal in any way. I’ll call it Cσ and we’ll take a closer look at it in a second.

But first, one sanity check we can try with this formula is when p=0. For a p of zero the result should always be 1 since x0 is 1 independent of x. And indeed the second term falls away because it is multiplied by 0 and so we get simply:

i = 0x3f7a3bea;

Which is indeed constant – and interpreted as a floating-point value it’s 0.977477 also known as “almost 1” so the sanity check checks out. That tells us something else too: Cσ actually has a meaningful value when cast to a float. It’s 1; or very close to it.

That’s interesting. Let’s take a closer look. The integer representation of Cσ is

$C_\sigma = L(B - \sigma) = LB - L\sigma$

This is almost but not quite the shape of a floating-point number, the only problem is that we’re subtracting rather than adding the second term. That’s easy to fix though:

$LB - L\sigma = LB - L + L - L\sigma = L(B - 1) + L(1 - \sigma)$

Now it looks exactly like the integer representation of a floating-point number. To see which we’ll first determine the exponent and mantissa and then calculate the value, cσ. This is the exponent:

$e_{c_\sigma} = (E_{C_\sigma} - B) = (B - 1 - B) = -1$

and this is the mantissa:

$m_{c_\sigma} = \frac{M_{C_\sigma}}{L} = \frac{L(1 - \sigma)}{L} = 1 - \sigma$

So the floating-point value of the constant is (drumroll):

$c_\sigma = (1 + m_{c_\sigma})2^{e_{c_\sigma}} = \frac{1 + 1 - \sigma}2 = 1 - \frac{\sigma}2$

And indeed if you divide our original σ from earlier, 0.0450465, by 2 you get 0.02252325; subtract it from 1 you get 0.97747675 or our friend “almost 1” from a moment ago. That gives us a second way to view Cσ, as the integer representation of a floating-point number, and to calculate it in code:

float sigma = 0.0450465;
float c_sigma = 1 - (0.5f * sigma);
int C_sigma = *(*int)&c_sigma;

Note that for a fixed σ these are all constants and the compiler should be able to optimize this whole computation away. The result is 0x3f7a3beb – not exactly 0x3f7a3bea from before but just one bit away (the least significant one) which is to be expected for computations that involve floating-point numbers. Getting to the original constant, the title of this post, is a matter of multiplying the result by 1.5.

With that we’ve gotten close enough to the bottom to satisfy at least me that there is nothing magical going on here. For me the main lesson from this exercise is that bit-casting between integers and floats is not just a meaningless operation, it’s an exotic but very cheap numeric operation that can be useful in computations. And I expect there’s more uses of it out there waiting to be discovered.

(Update: I’ve added an appendix about the intuition behind the part where I shuffle around the floating-point terms and the integer terms appear.)

This post is about one of those messages you see every day of your life as a programmer, your editor telling telling you: this file has changed, would you like to reload or keep your changes? The problem with this particular message is that usually neither option is what you want. You’ve made changes to that file for a reason and you neither want to lose those changes or go out of sync with the file system. There’s a lot of new editors about these days (which is fantastic) but I’ve yet to see one make an attempt to solve this problem. This blog post gives some thoughts on how it could be solved.

## The problem

The underlying issue here is that editors assume they have exclusive access to all files you are editing. Some editors including eclipse and emacs consider you to be editing a file as soon as you’ve opened it, before you’ve even made any changes. And once you have unsaved changes I don’t know of any editors that are forgiving if the file changes on disk in the meantime.

This assumption, that an editor can have exclusive access to a file, doesn’t seem sound. Just because you have a file open in an editor doesn’t mean it can’t change. If you’re working on a busy code base there could be people editing the same files as you at any time. Usually this is controlled through version control so you have control over when you sync and can make sure you don’t have unsaved changes when you do, but this is something you have to manage yourself. You get no help from your editor. Worse is if you’re working on something time critical, like configuration files that control a live application. These kinds of files can be very busy and you’ll want to both stay in sync and be able to queue up and submit changes quickly. In both cases, but especially the latter, the revert/override choice is really unhelpful.

And these problems is normal old-fashioned use of source code. Besides the emergence of new editors people are also busy working on web-based IDEs. This further undermines the assumption that an editor can own a file. As soon as you’re on the web it becomes much easier to share and collaborate. It may be desirable to support more fine-grained real-time collaboration in addition to the more coarse-grained version control model we’re used to, where modifications come in larger self-contained changesets. Real-time collaboration is not appropriate in many cases, but there are cases where it makes sense and it definitely shouldn’t be the capabilities of the editor that decides whether it’s feasible. If you want to use it the editor should support it, on the desktop or on the web.

## Living source files

The good news is that this issue has long been solved in the context of collaborative online tools like google docs and google wave (I know there are loads of others but these are the ones I know best). These kinds of tools can’t assume that you have exclusive access to the document you’re editing, there can be any number of entities modifying a document at the same time. They also can’t assume that what you’re seeing is the current contents of the document. In fact in this setting the concept of an authoritative “current document” isn’t even well defined – as long as each participant keeps editing and their changes take a while to propagate the participants may all see slightly different versions of the document, none of which is more authoritative than the others. Only once everyone has stopped editing and everyone has caught up to each others changes does the document stabilize and you can talk about authoritative current contents of the file.

This model may sound exotic and not appropriate for editing local files, and it may not be. But the underlying assumptions of those models, that what you’re seeing may not reflect the current contents of the file and that those contents may change at any time, are good assumptions to make even for local file editors. The next part is about how to make an editor work under those assumptions.

## OT

How do online collaborative tools work then? The ones I’m familiar with use operational transformation, or OT. The way OT works is by changing the focus from the question what are the contents of this file? to what operations have been performed on the contents of this file? This slight change in viewpoint makes a huge difference.

In a (very simplified) non-OT model your interaction with a source file in an editor works something like this:

• When a file is opened, read the current contents into a buffer.
• Any changes the user makes are applied directly (destructively) to the buffer.
• When user asks for the file to be saved the current contents of the buffer are written directly into the file. If the file has changed those changes would be lost so throw up a revert/override dialog and let the user choose.

In the OT model the interaction would work something along these lines. (This is a very rough outline, a proper OT model would be much cleaner than this makes it sound):

• When a file is opened, read the current contents into a buffer.
• Any changes the user makes are stored as operations, essentially a diff against the original file contents.
• When the user asks the editor to save the file it does the following:
• Read the current contents of the file.
• Generate a three-way diff between the original contents, the current contents, and the pending operations.
• Apply the resulting diff to the current file contents.
• Save the results.

While this isn’t completely accurate it gives a sense of the basics: what you’re doing is not editing a file, though it looks just like it, you’re building a set of changes against the file contents from one point in time. The changes you build may be applied to the original file contents, or they may be applied to a different version if the file changes. This model is much more robust to the way people actually work with source code. And though you may not always want your editor to merge changes on the fly when you save wouldn’t it be nice if it would at least give you the option?

Beyond just saving this model has several other nice properties. When the system notices that a file has changed, even before you want to save it, it can update your buffer by doing essentially the same as when saving except that instead of applying your pending changes to the current file contents it would apply the changes between the old and current file contents to the buffer containing your pending changes. It also allows you to undo saving since saving is equivalent to applying a diff and you can undo that, even when there’s been subsequent changes, just by reverting the diff.

There are probably numerous other solutions to this issue. I’m most familiar with OT because I worked with when I was on the wave project and I know it works extremely well. (I wish I had a good place to point anyone interested in learning the basics but I don’t, only the wikipedia page listed above). My main point is that this issue should be solved somehow, and it’s a problem that’s been solved in other contexts. I think it’s about time those solutions make their way to desktop editors.

## Stress

Came across an article about information overload on google+, which was very apropos of my recent efforts to not be so stressed at work. I’ve made some changes to how I work over the last few months, some of them just tweaks actually, and I feel like they’ve improved my ability to focus and made work much less stressful. YMMV, obviously.

• I use virtual desktops to minimize distraction. I have three virtual desktops: one desktop with just what I’m working on right now, nothing else, one with email, code reviews, and other things that might distract me, and one with personal stuff like personal email. I can easily switch desktops and check emails, but the only thing that can impose on me are chat popups, and people don’t chat me a lot. I occasionally check for new important email or code reviews but I can choose when, typically while I’m waiting for something on my work desktop to complete.
• I keep the tasks I’m working on written down and categorized according to when I want to complete them. I pick one task from the stuff-I-should-do-soon pile and work just on that. When that pile is empty, or it’s been a while since the last time, I look over all my tasks, usually a quarter’s worth, and move some out of and some into the soon-pile. This means I don’t have to worry about all the things I’m not working on right now because I know I’ve considered what I could be working on and chosen what I’m doing right now as the most important. This has been very helpful in making me less stressed. This is apparently a well-known approach, I first saw it in this video about how to use Asana (Using Asana for Individual Task Management).
• I make an effort to only work 8 hours a day. This is a tough one because it sometimes makes me feel like I’m slacking off, especially because I come in early so I’m almost always the first to go home. However, there’s plenty of research suggesting that even though you may feel you’re getting more work done working 10 hours a day it’s an illusion and you’re actually doing less than if you worked 8 hours (see why crunch mode doesn’t work). At first I just trusted the research, against my own intuition, but at this point I’m convinced — I feel much more focused and efficient and I see no evidence that I’m being less productive. I will occasionally work more for a while but I always feel a “overwork hangover” afterwards where I’m tired, less focused, more likely to space out. But at this point I know that and can make a conscious choice about whether it’s worth it.

Some of these require a bit of motivation; the one really easy thing is to clear any distractions from your workspace, especially if like me you were silly enough to have a browser window there. It’s easy and it makes a big difference.

## Monitoring

One really useful programming tool I use whenever it’s available is exported stats monitoring. It’s one of those things that has no standard definition and which people reimplement over and over in different ad-hoc ways. This post is about a monitoring tool for JavaScript programs running in chrome I implemented recently, WebMon.

## WebMon

WebMon is a very simple tool conceptually; this image tells you all you need to know to use it:

It has two parts: a JavaScript library, webmon.js, and a chrome extension, WebMon. In your JavaScript code you export a stat by creating an instance of the webmon.Counter type which is provided by the library. During execution you update the stats as appropriate using the methods on the Counter instances. The WebMon chrome extension detects any page that uses the library and can show you a popup that shows the current value of each exported stat, updated continuously. You can also specify some simple computations on the data, for instance here the frame rate counter is configured to display the rate per second rather than the count. To see this in action try installing the chrome extension and going to this test page. The extension needs pretty broad permissions to see stats on any web page so for the paranoid you may want to skim the source code before installing it.

The motivation for implementing this came from a WebGL-based hobby project I’m working on. I needed to keep track of the frame rate and rendering CPU load and my initial implementation which displayed those in a div on the page was just not working out. WebMon gives you the same functionality with a minimum of complexity and performance impact on the JavaScript side since almost all the code is in the chrome extension. It also doesn’t require you to clutter your page with debug info.

That’s all there is to it. The rest of this post will be about exported stats in general.

## Monitoring

WebMon is just one example of the more general mechanism of exported stats. The general model is: a program exports a set of stats which it updates during execution. These stats are visible to some form of external processor/viewer which can record or at least see their value at any time. A simple example is chrome. If you run chrome with the --enable-stats-table command-line flag you can go to chrome://stats and see the current value of a number of chrome’s internal counters and timers. V8 has a similar mechanism but since v8 doesn’t have a UI that can easily display the stats I wrote a python script that continuously reads the counter values and displays them in a window. I used it all the time when working on the v8/chrome integration layer since it allowed me to see exactly how many dom node wrappers were live, exactly what had happened during the last garbage collection, etc.

Much of this you could also get by printing debug information and processing it either manually or, for more complex output, using separate programs. A place this works really well is dumping debug information after garbage collections in a virtual machine. Garbage collections happen rarely enough that it’s not a lot of overhead and you’re likely to want information about every collection, not just the last one. Compared to logging the advantage of using an exported stat is that updating it is cheaper — in the v8 implementation it’s just writing a single word in memory — and it scales well. If you log information about 100 different conditions your log becomes huge and difficult to process, and the I/O slows your program down. With exported stats the external processor or viewer can easily filter the stats for you and only show the ones you’re interested in. Also, the space for each export stat is constant, typically just a single word containing the stat and another few words of metainformation. Another advantage of exported stats is that they’re testable — you can easily read back and test the value of a counter whereas testing that the right thing is printed in a debug log is tricky.

Really I think each programming language should come with stats exporting and monitoring built in or as a canonical library. Or even better, since monitoring and viewing can (and should)  be a separate process there could be a single common monitoring tool with a standard protocol that would have bindings for each language.

Until that happens, if you’d like to know what is happening in your JavaScript program as it happens you should give WebMon a try.

## Gray

It turns out that Google’s generous supply of free snacks to employees does more than just keep us fat and happy, my efforts to keep my snack intake down led to some interesting math too. In this post I’ll have a bit of fun with Gray code, prove wikipedia wrong (shock!), and get healthier by eating fewer snacks.

It recently occurred to me that my snack counting system was in unary: one piece of sticky tack for one snack. As a self-respecting computer scientist obviously I find this unacceptable – unary is okay for some very limited uses but binary should really always be your first choice. Clearly I had to change to a binary system.

The problem with standard binary numbers though is that I don’t want to have to move to many dots around. It sucks when I’m having my fourth snack of the week that I have to move all 3 dots to go from 011 to 100. Luckily there is a solution to this: Gray code, which allows you to count as efficiently as normal binary but only flips one bit at a time.

To use Gray code though I needed an easy way to count from one number to the next. It’s fine to know that I only need to flip one bit but I also need to know which one to flip. Skimming the wikipedia page I found this recipe for counting:

[…] at step i find the bit position of the least significant ‘1’ in the binary representation of i – flip the bit at that position in the previous code to get the next code.

This is actually incorrect (more on that below) but I didn’t know that when I first read it. I just though “that sounds like a lot of work, surely it doesn’t have to be that difficult”. And it doesn’t if you look at how Gray code1 is constructed.

Gray code is constructed recursively. Given the sequence of Gray codes that count from 0 to n, for instance here are the values that count from 0 to 3:

00 01 11 10

you construct the set of numbers counting from 0 to 2n by adding a zero in front of the previous sequence, using that to count up to n, and replacing the 0 with a 1 and counting the previous sequence backwards to 0:

000 001 011 010 110 111 101 100

This is a really neat construction and makes it easy to prove inductively that they have some of their nice properties: that you always change exactly one bit, that you never hit the same value twice, etc.

This construction also means you can increment a Gray coded number recursively, using this procedure:

1. If the first digit is 0 and the remaining digits can be incremented, increment them.
2. If the first digit is 0 and the remaining digits are already at their largest possible value, flip the first digit.
3. If the first digit is 1 decrement the remaining digits.

The procedure for decrementing is the same, just inversed. This may sound complicated but can be phrased differently in a way that makes it easier to use in practice.

• You want to flip exactly one bit. Initially you want to flip a bit up, that is, from 0 to 1.
• You start from the far left, going right, and you must flip the least significant bit that can be flipped the way you want. You can’t flip a 0-bit down since it’s already down, and you can’t flip a 1-bit up.
• When you meet a 1 you change your mind about which way you want to flip, starting at the next bit. If you wanted to flip up before, going forward you’ll want to flip down and vice versa. But only from the next bit.

I call this the flip-flop2 algorithm because you keep changing your mind about which way you want to flip the bit. Here’s an example of how it works.  Say you want to increment this Gray coded number (it’s happens to be 115 but the concrete value is irrelevant):

01001010

You start out wanting to flip a bit up. The first bit is 0 so you can up it, which makes it a candidate for the bit to flip. We’ll mark it like this:

01001010

The arrow goes up because we want to flip the bit up, and it’s green because it’s possible to flip it. Moving left, we also want to flip the next bit up but it’s already a 1 so we can’t and we make that one red.

01001010

Moving left again, since the last bit we saw was a 1 we now change our minds and want to flip a bit down. The next two bits are both 0 so we can’t down those and mark them red too:

01001010
↓↓

The next bit is a 1 which we can down,

01001010
↓↓

and as usual we change our mind and now want to up. The next bit is a 0 which we can up

01001010
↓↓

And so on: the next bit we can’t up and the last bit we can’t down so both become red. The least significant bit we could flip the way we’d want is the third so that’s the one we’ll flip. The end result is this:

01001010 → 01001110
↓↓

In practice it’s easy to do. You just scan the number from left to right, keeping track of operation you currently want to do and what the last position was where you could do an operation. Here’s a sequence of the Gray numbers from 115 to 122 with the procedure arrows like above; the bold digit is the one that was changed in the previous step:

01001010 → 01001110 → 01001111 → 0100110
↓↓↓   ↓↓   ↓↓   ↓↓

01001100 → 01000100 → 01000101 → 01000111
↓↓   ↓↓   ↓↓   ↓↓

Besides being a neat way to increment a Gray code the flip-flip algorithm has a few interesting quirks that are worth taking a closer look at. One thing you may have noticed is that each arrow in the diagrams have two binary properties: they each have a direction, up or down, and they each have a color, red or green. If you read these properties as bits they form two different numbers: the direction value where up is 0 and down is 1, and the color value where green is 0 and red is 1. Here is an example:

01001010       Gray code
↓↓
01110011  115  Color value
00111001  57   Direction value

If you think the color value, 115, looks familiar that’s because is is: it’s the binary representation of the Gray code. I found that pretty surprising when I noticed it. If you look at the color and direction values next to each other it’s also clear that the direction is actually the color right-shifted once. Here are the the color and direction values spelled out for the sequence above:

01001010 → 01001110 → 01001111 → 0100110
↓↓↓   ↓↓   ↓↓   ↓↓
01110011   01110100   01110101   01110110
00111001   00111010   00111010   00111011

01001100 → 01000100 → 01000101 → 01000111
↓↓   ↓↓   ↓↓   ↓↓
01110111   01111000   01111001   01111010
00111011   00111100   00111100   00111101

The intuitive reason for this becomes clearer if you interpret the procedure slightly differently. In the original phrasing you want to either change a 0 to 1 or a 1 to 0, and you keep changing your mind about which you want to do. An alternative view is to say that you always want to make a 0 into a 1, but you change your mind about what is 0 and what is 1. After a 1 you now interpret a 0 in the input as a 1, and vice versa. That way, in the cases where you can make the change you want, that is where the arrow is green, the bit must have been what you considered 0. And in the cases where the arrow is red the bit must have been what you considered 1 at that point.

The view of flipping the interpretation of bits actually matches the way Gray code is constructed. To see this consider how you increment from 7 to 8 in Gray and standard binary. With standard binary you go from 0111 to 1000, flipping all four bits. In Gray you go from 0100 to 1100 which is in some sense equivalent except that when you flip the highest bit you don’t have to explicitly also flip the lower bits because they are changed implicitly by changing your interpretation of 0 and 1. Viewing the numbers and the algorithm this way makes it easier to understand intuitively that the bits where you can make the change you want must correspond to a 0 in the binary representation and the bits where you can’t must correspond to a 1. Under this view the direction value tracks your view: if a direction bit is 0 you view the bits normally, if it is 1 you view them in inverse.

But wait, there’s more. The simplest way to Gray encode a binary number n is using xor:

int gray(int n) {
return n ^ (n >> 1);
}

As we saw before the color value above is n and the direction is n right-shifted once. So in fact if you xor together the color and direction values you get the value you started with. And if you check with the examples above that is indeed the case. Intuitively, since the direction value determined how view view the bits, if there is a 0 in the direction value the corresponding bit in the Gray value is the same as in the binary value. If the direction value has a 1 you have to flip the bit in the Gray code to get the binary value. This is exactly how xor works, so xoring the direction value and the Gray value will give you the binary value. And, since xor is associative, xoring the direction and the binary value gives you the Gray value.

There is one point left to revisit which is the recipe given in the wikipedia article for incrementing a Grey code. Here it is again:

[…] at step i find the bit position of the least significant ‘1’ in the binary representation of i – flip the bit at that position in the previous code to get the next code.

We can now tie this together with the flip-flop algorithm. Since the standard representation of i is the same as the color value, what this is saying is: find the least significant 1-bit in the color value, which is the least significant red arrow, and flip it. This seems wrong – surely you’re supposed to flip the least significant green arrow? Let’s try their algorithm.

Take a random number, say the Gray code for 4, 110. The binary representation of 4 is 100. The least significant (and only) 1 is in the most significant bit position. Flipping that in the Gray code gives you 010. This is actually the code for 3. So what seemed wrong intuitively, flipping the red arrow, was indeed wrong.

Instead what the flip-flop algorithm suggests is to flip the least significant green arrow, or 0-bit in the standard representation. Let’s try that. The least significant 0 in 100 is at the least significant bit position. Flipping that in the Gray code gives 111, which is indeed the code for 5. So it works in the example, and it works in general. Someone should probably fix that wikipedia article.

1: Technically there’s more than one encoding that can be called Gray code but people usually mean binary-reflected Gray code and that’s what I mean too.

2: I have no doubt this algorithm is well known so the fact that I’m naming it doesn’t mean I’m claiming to have invented something new – it’s just convenient to have a name for it

## Tracing Promises

Ever so often I find myself having to write JavaScript programs that interact with the world through XMLHttpRequests. And every time the first thing I do before I actually get to send off requests is implement a small promise framework. Promises are just so much more powerful than callbacks that at this point I wouldn’t even consider writing raw callback-based asynchronous code.

One thing about promises though, especially if you program is just moderately complex, is that they make it much more difficult to follow the flow of your code. Everything happens asynchronously in short callbacks and when an error occurs it can be near impossible to piece together what happened. This post is about how to solve that problem.

## Promises in general

Before getting to the part about tracing them here are some examples of uses of promises, to set the stage for the kind of operations we’ll later want to trace. These examples use idioms from my own promise library but these considerations apply to any of the libraries out there, the concrete patterns may just be different.

One place I relied heavily on promises was in the implementation of the sputnik test web runner (which has now been subsumed by test262). The function for firing off an XHR using google’s closure library looked like this:

function makeRequest(path) {
var result = new Promise();
goog.net.XhrIo.send(path, function () {
result.fulfill(this.getResponse());
}, "GET", undefined, undefined, 0);
return result;
}

What this function does is return a promise object that gets fulfilled when the XHR, which is sent off at the same time, completes and invokes its callback. This may look like a whole lot of nothing – basically making a callback into a promise – but the advantage comes at the call site. Say you have two mirrors of the same data and want to send a request to both and use the first response you get back. With promises that is trivial:

// Fetch from both mirrors, use the first response.
var response = Promise.select(makeRequest(mirrorOne),
makeRequest(mirrorTwo));

The select operator takes any number of promises and returns a new promise that resolves as soon as any of the child promises resolve. Another use for select to set timeouts:

// Fetch path but time out after one second.
var response = Promise.select(makeRequest(path),
Promise.failAfter("Promise timed out", 1000));

// Alert successful response, log any errors in the console.
response
.onFailed(console.log.bind(console));

Here the response will resolve either to the result of the request, if it happens within one second, or else will fail with the string "Promise timed out" when the result of failAfter fails after one second.

To get the results of several concurrent promises you can use join. This code sends off two XHRs at once, waits until they’re both complete, and then alerts the concatenation of the results.

// Fetch pathOne and pathTwo in parallel.
var request = Promise.join(makeRequest(pathOne),
makeRequest(pathTwo));

// When done, concatenate the two responses.
var response = request.thenApply(function (dataOne, dataTwo) {
return dataOne + dataTwo;
});

The join operator takes a list of promises and returns a promise that resolves to the list of values of all the child promises once they’re been resolved. The thenApply operator on a promise returns a new promise whose value will be the result of applying the function to the value of the original promise.

Note that these examples are a lot more verbose than you would usually make such code. I wrote them like that to make it clearer what’s going on. Some operations would be packed into convenience methods, like setting a timeout, and you would generally use method chaining. With that the last example, with the addition of a three second timeout, would look like this:

var response = Promise
.join(makeRequest(pathOne), makeRequest(pathTwo))
.thenApply(function (a, b) { return a + b; })
.withTimeout(3000)
.onFailed(console.log.bind(console));

Packs quite a punch doesn’t it? This is what I mean when I say that promises are more powerful than callbacks; it’s not that one is more expressive than the other, they’re computationally equivalent, but in terms of conciseness and power of abstraction there’s just no competition. And this is just a small part of what you can do with promises. A good place to see the power of this style of programming in action is twitter’s finagle stack.

## The Errors! The Errors!

All’s well with this model as long as all operations succeed. However, as soon as operations start failing this is a mess. Consider for instance the case where you have numerous concurrent operations running that have timeouts. Errors are propagated through promises nicely so if you’ve set up a failure handler you’ll see the error, but it will look something like this:

Operation timed out
at promise.js:108:18

And that’s in the best case where error objects have stack traces. Since each step in the chain of promises is detached and runs in a separate turn of the event loop all you’ll see is that some timeout somewhere fired. There’s no telling which operation took too long. As soon as your program is just moderately complex this becomes a major issue. It’s like with normal synchronous programs if there were no stack traces. Stack traces are there for a reason: it sucks to debug without them.

The stack trace analogy is useful in understanding what information you’d want to have, if you could, when a promise fails. All these patterns of asynchronous operations correspond, when you’re only interested in error reporting, to simple call/return patterns we’re used to. For instance, the join operation is just an asynchronous version of taking a list of functions, calling each one, and returning a list of their results:

function syncJoin(funs) {
var result = [];
for (var i = 0; i ... funs.length; i++)
result.push(funs[i]());
return result;
}

In this case if calling any of the functions fails then syncJoin will fail and both calls will show up in the stack trace. The same thing holds for join on promises: if join fails because one of the child promises fail you want information about both. The same thing holds for select, for the same reason.

This all suggests that when a promise fails and propagates that failure through a number of other promises what you as a programmer need is information about each promise in that chain. Something like:

Operation timed out
at promise.js:108:18
--- triggering the failure of ---
at Function.withTimeout (promise.js:106:17)
at runDemo (demo.js:226:11)
at demo.html:7:168
--- triggering the failure of ---
at Function.select (promise.js:50:16)
at Function.withTimeout (promise.js:110:18)
at runDemo (demo.js:226:11)
at demo.html:7:168

This is a trace created using the promise trace library I wrote last week to trace down some issues with a chrome extension I was working on. A promise trace is made up of segments, each one a stack trace identifying a promise. The top segment is the error that caused the failure, in this case exactly the same timeout as before. The next one is the promise created by withTimeout that fails when the timeout is exceeded. The bottom segment is the select between the operation and the timeout, and going to demo.js:226 will tell you which operation it was.

The trace above is from chrome; here’s the result of the same failure but in firefox

Operation timed out
--- triggering the failure of ---
GenericTraceSegment()@trace.js:197
Promise()@promise.js:23
([object Object],100)@promise.js:106
runDemo()@demo.js:226
--- triggering the failure of ---
GenericTraceSegment()@trace.js:197
Promise()@file:promise.js:23
([object Object],[object Object])@promise.js:50
([object Object],100)@promise.js:110
runDemo()@demo.js:226

It’s a bit more cluttered since the firefox stack trace api doesn’t support stripping irrelevant top frames but both contain the relevant clues you need when debugging.

Both traces contain a fair amount of redundant information. For instance, the two bottom promises are created close to each other so the bottom of their stack traces are identical. To make the traces easier to read the promise trace library folds away overlapping stack traces. The actual trace that would be printed for the above error is

Operation timed out
at promise.js:108:18
--- triggering the failure of ---
at Function.withTimeout (promise.js:106:17)
at runDemo (demo.js:226:11)
at demo.html:7:168
--- triggering the failure of ---
at Function.select (promise.js:50:16)
at Function.withTimeout (promise.js:110:18)
at runDemo (demo.js:226:11)
(... rest of trace same as previous ...)

This is probably still more cluttered and redundant that it needs to be – in my experience for each segment you only need to know the top frame below the call into the promise library and the bottom frame that is different from the previous segment. But I’m not sure that’s true in all cases so for now this is as small as I’d like to make it.

## Getting promise traces

How does this api work? From a user’s perspective all you see is that on failure your callback is given two values: the error and a trace. You’d typically use it something like this:

myOperation.onFailed(function (error, trace) {
console.log(trace.toString());
});

This requires some support from the promise library but not a lot actually. Whenever a promise is created it has to create a PromiseTraceSegment value and when propagating a failure it should create and propagate a chain of PromiseTrace objects which collect the relevant segments. In my own promise library the tracing code makes up maybe ten lines.

## Performance

The way this is implemented, as is probably evident from the examples above, is by capturing a stack trace every time a promise is created (there’s also a flag to disable trace capturing altogether). The whole thing is actually pretty straightforward, the whole trace library is ~200 lines of which most are for formatting the output. Capturing stack traces is not super expensive generally but even relatively cheap operations add up if you do them often enough. To get a sense for  how expensive tracing is I wrote a simple ludicrously asynchronous implementation of the Fibonacci function:

function lazyFib(n) {
if (n == 0 || n == 1) {
return Promise.of(1);
} else {
return Promise.deferLazy(function () {
return Promise
.join(lazyFib(n - 2), lazyFib(n - 1))
.thenApply(function (a, b) { return a + b;});
});
}
}

I then tried running it with and without promise tracing. In chrome tracing makes it around 60% slower on average. In firefox the cost is only around 20% but then this program was a lot slower overall. I believe this is because in firefox the event queue runs at a slower rate by design. In safari you can enable tracing and it’s essentially free but in the version of safari I tried you can’t actually get a stack trace so the promise traces produced are useless. I believe stack trace support is coming to safari too though.

This example is the absolute worst case performance cost; nobody would write as promise-heavy a program as that example. To also get a sense for the impact on a more realistic program I wrote a larger benchmark, one that simulates an RPC system with a string repository that holds a large number of strings which have to be fetched asynchronously, a few at a time, and then processed by a chain of other asynchronous operations. It was still ridiculously asynchronous, much more than any realistic program, but at least not quite as trivial. This time the performance cost in chrome was 0.8% on average. Statistically significant but tiny.

## Conclusion

Debugging asynchronous JavaScript programs is difficult. However, adding support to libraries that implement asynchronous abstractions like promises is straightforward and the overhead of collecting traces is likely to be insignificant for any realistic JavaScript program.

## MyJS

The first large JavaScript program I wrote after ES5 came out was my mercury chrome extension. I was surprised at how big a difference to my code style some relatively small changes to the language makes. Before then I had followed the process that led to ES5 and knew about those features long before I tried them but it was only when I actually tried them I got as sense and feel for how they felt to use. I thought, wouldn’t it be useful if you could get that while developing the language rather than after?

So a few months ago I sat down to write a language experimentation framework for JavaScript, MyJS. What MyJS does is allow you to define a language extension, including new syntax and/or library functions. A source file (html or js) can then specify which extensions it uses and the parser and translation framework will plug extensions together as appropriate to parse your program and convert it to plain JavaScript. How it works is worth a blog post of its own but here I’ll outline how you define a language extension and use it.

## Delegates

You can try MyJS by going to this test page (I haven’t tested this in all browsers, it probably doesn’t work in all of them). The program that prints the output you see on that page uses the delegates extension, the operator ::, which returns a python-style bound method. This is what the program looks like:
// A Printer can print text to a document.
function Printer(prefix) {
this.prefix = prefix;
}

// Add this printer's prefix and print the given value.
Printer.prototype.print = function(value) {
var elm = document.createElement('div');
elm.innerText = this.prefix + value;
document.body.appendChild(elm);
};

var p = new Printer("Looky here: ");

[1, 2, 3].map(p::print);
(::print)(p, "Where?");
(The idea for the delegates operator is Erik Corry‘s) In this case the binary (“bound”) :: operator returns a function that, when called, calls p.print with the arguments it’s given. So (p::print)("foo") means exactly the same as p.print("foo"). The unary :: (“unbound”) operator returns a function that uses its first argument as the receiver and the rest as arguments. So (::print)(p, "bar") is the same as p.print("bar"). This is just a random example of an extension, the operator itself isn’t that important.

## Using a dialect

The way you use an extension is by using a MyJS specific script tag:

This code first includes the MyJS java library. It then includes a script called delegates.js, which defines the delegates extension. MyJS comes with a few language extensions built in which makes it easier to write new language extensions. Delegates uses one of those, myjs.Quote. Finally, once we’ve loaded the delegates extension the third script element can use the extended language. Easy!

## Defining an extension

How do you define a language extension? It has four parts: defining syntax trees, describing the concrete syntax, implementing the rules for translating the new syntax trees into plain JavaScript, and finally hooking it all into the framewrk. The full definition of the delegates extension is here. For the delegates extension we need two new syntax trees, one for bound and one for unbound delegates:

// Bound expression syntax tree node.
function BoundMethodExpression(atom, name) {
this.type = 'BoundMethodExpression';
this.atom = atom;
this.name = name;
}

// Unbound method syntax tree node.
function UnboundMethodExpression(name) {
this.type = 'UnboundMethodExpression';
this.name = name;
}

The syntax tree format is modelled on Mozilla’s Parser API and the parser for the standard language constructs produces parser api compatible syntax trees.

To produce these we need to hook into the grammar. The syntax is pretty straightforward:

LeftHandSideSuffix
::= "::" Identifier

PrimaryExpression
::= "::" Identifier

(A left hand side suffix is basically the right hand side of an operator expression). The way to express this in MyJS is to map it into calls to the grammar constructor library:

function getSyntax() {
// Suffix syntax tree builder helper.
function BoundMethodSuffix(name) {
this.name = name;
}

// Apply this suffix to a base expression.
BoundMethodSuffix.prototype.apply = function(atom) {
return new BoundMethodExpression(atom, this.name);
};

// Build the syntax
var f = myjs.factory;
var syntax = myjs.Syntax.create();

// LeftHandSideSuffix ::= "::" Identifier
syntax.getRule('LeftHandSideSuffix')
.setConstructor(BoundMethodSuffix);

// PrimaryExpression ::= "::" Identifier
syntax.getRule('PrimaryExpression')
.setConstructor(UnboundMethodExpression);

return syntax;
}

There’s a bit of a quirk here because a left hand suffix doesn’t correspond to a “full” syntax tree, it has the left hand side missing. So instead it returns a suffix object that will be called later by the parser with the left hand side as an argument and must return the full syntax tree.

The reason we don’t just build the syntax but define a function for doing it is that we only want to build it if the dialect is used. If it is just defined but never used it’s a waste of time to build the syntax.

This allows the framework to parse extended code and build syntax trees. The next part is to define how they are translated into plain JavaScript:

BoundMethodExpression.prototype.translate = function(context) {
return #Expression((function(temp) {
return temp.,(context.translate(this.name)).bind(temp);
})(,(context.translate(this.atom))));
};

UnboundMethodExpression.prototype.translate = function(context) {
return #Expression(function(recv, var_args) {
return recv.,(context.translate(this.name)).apply(recv,
Array.prototype.splice.call(arguments, 1));
});
};

This is where it gets a little bit tricky. This code uses a language extension, myjs.Quote, to plug together syntax trees. But while this is tricky the first time you see it is a lot better than having to build syntax trees by hand without quoting. The #Expression part means: the following code should be parsed as an expression but don’t run the code, return the syntax tree. The commas mean: this must be evaluated and the result, which is a syntax tree, must be spliced into the surrounding tree. If you know quasiquote from scheme that’s basically what it is. What this code does is say: if you have the syntax tree A::B translate it into something like

(A.B).bind(A)

except that will cause a to be executed twice so we use

(function (t) { return (t.B).bind(t); })(A)

and similarly for ::A. The recursive calls to translate are there to translate the subexpressions.

The last part is to hook it all into the framework. That’s done using this incantation:

myjs.registerFragment(new myjs.Fragment('demo.Delegates')
.setSyntaxProvider(getSyntax)
.registerType('BoundMethodExpression', BoundMethodExpression)
.registerType('UnboundMethodExpression', UnboundMethodExpression));

Here we give the extension a name, demo.Delegates, register the function that will return the syntax to use, and register the two new types of syntax trees. That’s what it takes to define an extension.

Fragments also allow you to set up install hooks, so if your extension needs to add library functions to the global object for instance you can specify a function that is given the global object as an argument and can install any functions and methods it needs.

## …six months later

I sort of ran out of steam on this project six months ago so while the “hard” part is there, the extensible parser framework, the basic language definition (all the standard language constructs are defined as if they were extensions), the translation framework, etc., there is still some work left to do before everything works – for instance, before you could define the harmony classes syntax. It would also be great if you could specify a dialect the same way you specify strict mode, within the script (“using demo.Delegates”).

I also played around with making this work with node.js but I couldn’t find the hooks in the module importing primitives that would allow me to intercept module loads and do the source translation.

The code lives on github.

## Documentation

After taking a long break over the winter I’ve started working on neutrino again. When I left off I was working on generating static data in the binary object files: basically, taking neutrino objects and constructing binary data in the generated executables. I know how my code generator should behave for this to work but changing it to do that has been tricky because it is a large amount of relatively complex code (generating binaries is a relatively complex process) and I’ve forgotten how it works exactly. The problem is that I didn’t do a good job of documenting it. So my step one is going to be documenting it. Or, it turns out, that actually will be step two because at this point neutrino doesn’t have a documentation format – so step one will be implementing one.

I’ve always felt that java’s JavaDoc comments were a good idea in principle but problematic in a lot of the particulars. The good part is that, unlike systems like doxygen and JsDoc, it’s a standard that comes with the language, and it provides all the primitives I want then documenting code. However…

JavaDoc builds on top of block comments. Block comments are for free-form, anything goes text. This means that it’s tricky if the language wants to impose rules on them, even if those rules might be useful. For instance, it’s legal to @link to a class or method that doesn’t exist.

Also, since they it is based on comments the connection between JavaDoc and the code it documents is weak. This, for instance, is meaningless but perfectly legal:

String x = /** Looky here! */ "foo";

Finally, JavaDoc builds directly on HTML which is fine if what you want to generate is HTML but problematic if you want to generate anything else. It is also difficult to read outside of a browser and I almost always read the documentation in the code, not in the generated output.

A small detail that’s always bugged me about them, also, is that they waste two lines of space:

/**
* Two lines of comment takes up three lines of code
* because of the start and end markers.
*/

This is a two-line comment, really, and it shouldn’t take up more than two lines in the program.

For all these reasons I’m not keen to base neutrino’s documentation format on JavaDoc and I’m not familiar with any other formats that solve all of these issue. So I’ve ended up starting basically from scratch, with some inspiration from a language I previously worked on, neptune.

Neutrino
I’ll start with the two last points first, the syntax. Here’s one example of what a piece of documentation looks like:

/| Moves the specified number of disks from the @from@
| peg to the @to@ peg.
|
| param(from): Which peg to move from.
| param(to): Which peg to move to.
\| param(disks): How many disks to move.

def this.move(from:, to:, disks:) {
...
}

I’d like the syntax to be such that you don’t need the start and end marker to require a separate line each. In other words, you need to be able to write the documentation on the same lines as those markers. So the way this works is: /| starts a comment and \| marks that that lines is the end of the block, but the whole line is included in the comment. These last markers can also occur alone, without a start marker, in which case it’s a single-line doc comment:

\| Creates a new hanoi instance.
@static def Hanoi.new() => ...

The first character on each line (above that would be the | character) are used to determine how to interpret each line. A | means that it is just text. A @ means that the text is code. For instance:

/| Moves the specified number of disks from the @from@
| peg to the @to@ peg. For instance, to execute the
| towers of hanoi with 15 disks do:
|
@ def hanoi := new Hanoi();
@ hanoi.move(from: 0, to: 2, disks: 15);
|
| param(from): Which peg to move from.
| param(to): Which peg to move to.
\| param(disks): How many disks to move.

def this.move(from:, to:, disks:) {
...
}

This way it’s as easy to identify and read code examples within the code as it is for a doc generator to recognize.

The second thing that makes these different from JavaDoc is that they’re not comments, they’re annotations and can only occur in the same positions as annotations. Documentation always belongs to a particular element and it is clear from the grammar which element that is. The code above is, in principle, equivalent to:

@doc(...) def this.move(from:, to:, disks:) {
...
}

This also means that documentation can stay attached to source elements at runtime and be accessed through reflection, like any other annotations.

Finally, the format used is not HTML but something very close to textile. This is, again, to make it easier to read from source code, and to make it easier to generate documentation in other formats than HTML. I remember we had a doc generator for neptune that generated a PDF file for a module, which worked really well (and looked, if I do say so myself, awesome).

I think this format has a lot of appealing properties and I’m actually eager to get started documenting the code generator.

## Keywords

I’m finally through the Big Cleanup (well, the first one, I’m sure there’ll be others) and have now arrived at the fun part: simplifying and generalizing based on the new cleaner infrastructure, and implementing completely new stuff. On the surface level I’ve made a few syntax changes to free up syntactic elements for some future constructs, including changing from -> to => for functions and : to in in for loops. Where before you would write

def this.size() -> this.elements.length;

def this.print_all() {
for (elm : this)
print(elm);
}

you must now write

def this.size() => this.elements.length;

def this.print_all() {
for (elm in this)
print(elm);
}

This is because I need the small arrow for delayed calls (such as process->run()) and I need the colon for keyword arguments. I have just completed a first shot at implementing keyword arguments which required a complete overhaul of how calls work. But now that they’re implemented they seem to be well worth the trouble.

To illustrate why keyword arguments are helpful consider this method from the implementation of towers of Hanoi:

def this.run(size) {
this.build(0, size);
this.move(0, 1, size);
this.moves;
}

Quick, what does this do? If the algorithm being implemented here isn’t clear in your when you read this it’s not obvious – you have to know the solution to infer what the various arguments are and what this does. Now consider this:

def this.run(size:) {
this.build(pile: 0, disks: size);
this.move(from: 0, to: 1, disks: size);
this.moves;
}

With keyword arguments you don’t have to guess how this works, the code tells you: there are piles with integer indices, we first build pile 0 with size elements and then and move that many disks from 0 to 1. Inferring this from the implementation without keywords is possible but requires you to think. Inferring it from the second requires no effort, it is clear from the code.

Keyword arguments are not a new invention, they are present in several popular languages. However, I would claim that they are used in an ad-hoc fashion, in part because languages like python give them to you for free. In python, any function, for instance

def run(self, size):
...

can be called with or without keyword arguments, in this case h.run(3) and h.run(size=3). This is good because it means that there’s no extra effort required to enable keyword arguments, but it’s bad because it means that you don’t have to put much thought into that part of your interface. There is also no established rule of thumb that I know of for when to use keywords in python — the google python style guide doesn’t even have an opinion, which really says something.

In neutrino, the intention is that the same way you make code more readable by using descriptive variable and method names, you should use keyword arguments. It’s just one of the mechanisms that can be used to make code self-documenting. The basic assumption is that keywords should be used whenever they aid code readability but left out if they are redundant, which is actually most of the time. Keywords are a public part of a method’s signature and should be chosen as carefully as the method’s name, since they are in some sense part of the name. Consequently, parameters that can be passed via keyword arguments are declared in one way (def foo(size:) => ...) and those that don’t in another (def foo(size) => …). This is for two reasons: to make it explicit to users that it may not be obvious what these arguments mean so you might want to consider passing them using keywords, and the make it clear to the implementor that people will depend on those keyword names so there are compatibility concerns if you change them.

An interesting part of this will be to learn to use this myself, which may lead me to change how this works. However, I’m sure I’ll hang on to the idea of keyword arguments in some form, for code readability there is just no replacement for a construct like this.