Making tea

This post is about the mathematics of making a good cup of tea. I drink a lot of tea and for a long time I would ignore the direction to use water below boiling, say 75° or 85°. Because – if you don’t have a boiler with a temperature setting how do you get water at less than 100°? Wait for it to cool? Come on!

Of course you can do it easily by adding cold water, you just have to figure out how much. While I was waiting for a delivery this morning I thought I’d look into that and it turned out to be a fun little practical application of the scientific method.

Theory 1: the average temperature

Let’s assume tap water is 22° and you add 500ml of it to the same amount of boiling water. My starting assumption was that the result will be 1l of water at the average of the two temperatures, 61°:

$t_r = \frac{100 \cdot 500 + 22 \cdot 500}{500 + 500}=61$

Put more generally, my theory was that if you mix water of two different temperatures then the temperature of the result will be the average over the two temperatures, where each temperature is weighted by the volume of water

$t_r = \frac{100 v_b + t_t v_t }{v_b + v_t}$

(Here $t_r$ is the resulting temperature of the water, $v_b$ is the amount of boiling water which will be 100°, $t_t$ is the temperature of tap water, and $v_t$ is the amount of tap water.)

If you measure the tap water temperature and know how much water your tea pot will hold you can solve for the amount of boiling and tap water to use to make any given water temperature. For instance, my tap water is 22° and my tea pot holds 700ml so the formula for how much to boil for a given temperature is,

$t_r = \frac{100 v_b + 22 (700 - v_b)}{700}$

$t_r = \frac{100 v_b + 22 \cdot 700 - 22 v_b}{700}$

$t_r = \frac{78 v_b + 22 \cdot 700}{700}$

$700 (t_r - 22) = 78 v_b$

$\frac{700}{78} (t_r - 22) = v_b$

So to get a pot full of 85° water, the amount of boiling water should be

$\frac{700}{78} (85 - 22) = 8.97 \cdot 63 = 565$

Boil 565ml, add the rest up to 700ml from the tap (that would be 135ml) and you’ve got 700ml of 85° water. That’s the theory.

Ok, so having solved the problem I went to make tea, my delivery having still not arrived (spoiler alert: it arrived while I was at the dentist, obviously). It turns out though that the theory doesn’t work. When I did it in practice it turned out that the water wasn’t 85°, it was 77°. Bummer. The theory is too simplistic.

Theory 2: the tea pot constant

The problem was fortunately obvious: I was pouring the water into a cold tea pot. The formula only says what happens to the water itself, it doesn’t take into account that it all ends up in a pot that’s initially at room temperature.

What I thought I’d do then was to model the effect of the cold pot as if it were extra tap water. How much exactly I don’t know, but it seems reasonable to expect that the cooling effect of a particular pot works basically the same way as if you’d added a bit of extra cold water. So we replace $v_t$ in the formula above, the amount of tap water, with $v_t + v_p$, the amount of tap water and some additional amount, $v_p$, to account for the pot,

$t_r = \frac{100 v_b + t_t (v_t + v_p) }{v_b + v_t + v_p}$

You have to determine $v_p$ experimentally, it is a property of a particular pot just like the volume. I determined it by pouring boiling water into the cool pot and then measuring the temperature; it had dropped to 90°. Using this I could find $v_p$,

$t_r = \frac{100 v_b + t_t (v_t + v_p) }{v_b + v_t + v_p}$

$90 = \frac{100 \cdot 700 + 22 (0 + v_p) }{700 + v_p}$

$90 \cdot 700 + 90 v_p = 100 \cdot 700 + 22 v_p$

$-10 \cdot 700 = -68 v_p$

$\frac{10 \cdot 700}{68} = v_p = 103$

In other words: the pot cools boiling water as much as adding 103ml of extra tap water.

Again, it’s just a theory that this makes sense, but it’s a theory we can test. Earlier, I mixed 565ml boiling and 135ml tap water and got 77° instead of the 85° I expected. What does the new theory predict will happen?

$t_r = \frac{100 v_b + t_t (v_t + v_p) }{v_b + v_t + v_p}$

$t_r = \frac{100 \cdot 565 + 22 \left( 135 + 103 \right) }{700 + 103} = 76.9$

That’s so close to the 77° that you’d almost think I fudged the numbers, but it’s really just a coincidence.

With this new and more plausible general formula we can plug in the values for my pot and water and get a specific formula for how much boiling and tap water to use to produce a particular temperature in my pot,

$t_r = \frac{100 v_b + 22 (700 - v_b + 103)}{700 + 103}$

$t_r = \frac{100 v_b + 22 \cdot 803 - 22 v_b}{803}$

$t_r = \frac{78 v_b + 22 \cdot 803}{803}$

$803 (t_r - 22) = 78 v_b$

$\frac{803}{78} (t_r - 22) = v_b$

It turns out to be the same as the one before except with 803 instead of 700. Using this formula, then, to get 85° water I need

$\frac{803}{78} (85 - 22) = 10.29 \cdot 63 = 648$

ml of boiling and the rest, 52ml, of tap. I tried this out and the result was… drumroll… 86°. So the formula seems to work. Yay!

Summary

So, to sum all this up, here’s how you apply the formulas in practice.

Measure how much water it takes to fill your pot, call that $v$. Boil that much water and pour it into the pot. Measure the temperature and call that $t_p$. Measure the temperature of your tap water, call that $t_t$. Plug them into this formula to get the tea pot constant for your tea pot, $v_p$

$v_p = v \frac{t_p - 100}{t_t - t_p}$

Then, given a temperature $t_r$, the amount of water to boil to get that temperature can be calculated by,

$v_b = \frac{(v+v_p) (t_r-t_t)}{100-t_t}$

And that’s it. You only have to calculate these once for a given pot so even though it’s a hassle, it’s a one time hassle. And it’s possible that most tea pots have approximately the same constant, in which case you can just assume that yours has the same one. But that’s for future research into the fertile field that is the science of tea.

Ironically, I got so caught up calculating the ideal amount of water that I never actually made tea, just water of different temperatures. I think I’ll go make some actual tea now.

Why I find IEEE 754 frustrating

Programming languages new (Java) and old (Fortran), and their compilers, still lack competent support for features of IEEE 754 so painstakingly provided by practically all hardware nowadays. […] Programmers seem unaware that IEEE 754 is a standard for their programming environment, not just for hardware.

William Kahan

This post is the result of me, a language design guy, trying and failing to figure out how to provide language bindings for IEEE 754 in a modern language. The TL;DR is,

• The basic model described in 754 is great, no complaints.
• The licensing terms of the spec document are deeply problematic.
• The language bindings are too specific and have an old-school procedural flavor that makes them difficult to apply to a modern language.
• It would be much more useful to know the motivation behind the design, that way each language can find a solution appropriate for them.

The rest of the post expands on these points. First off I’ll give some context about 754 and floating-point support in general.

754

I want to start out by saying that IEEE 754 has been enormously helpful for the world of computing. The fact that we have one ubiquitous and well-designed floating-point model across all major architectures, not multiple competing and less well designed models, is to a large extent thanks to 754. It’s been so successful that it’s easy to overlook what an achievement it was. Today it’s the water we swim in.

But 754 is two things, what I’ll call the model and the bindings. The model is a description of a binary floating-point representation and how basic operations on that representation should work. The bindings are a description of how a programming language should make this functionality available to programmers. The model is the part that’s been successful. The bindings, as Kahan, the driving force behing 754, admits, not so much.

I’ll go into more detail about why the bindings are problematic in a sec, it’s the main point of this post, but first I want to take a broader look at floating-point support in languages.

Floating-point is hard

I’m not really a numerical analysis guy at all, my interest is in language design (though as it happens the most popular blog post I’ve ever written, 0x5f3759df, is also about floating-point numbers). What interests me about floating-point (FP from now on) is that a couple of factors conspire to make it particularly difficult to get right in a language.

1. Because floats are so fundamental, as well as for performance reasons, it’s difficult to provide FP as a library. They’re almost always part of the core of the language, the part the language designers design themselves.
2. Numerical analysis is hard, it’s not something you just pick up if you’re not interested in it. Most language designers don’t have an interest in numerical analysis.
3. A language can get away with incomplete or incorrect FP support. To quote James Gosling, “95% of the folks out there are completely clueless about floating-point.”

The goal of the binding part of 754 is, I think, to address the second point. Something along the lines of,

Dear language designers,
We know you’re not at home in this domain and probably regret that you can’t avoid it completely. To help you out we’ve put together these guidelines, if you just follow them you’ll be fine.

Much love,
The numerical analysts

I think this is a great idea in principle. So what’s the problem?

Licensing

The most immediate problem is the licensing on the 754 document itself. Have you ever actually read it? I hadn’t until very recently. Do you ever see links from discussions around FP into the spec? I don’t, ever. There’s a reason for that and here it is,

The document is simply not freely available. This alone is a massive problem. Before you complain that people don’t implement your spec maybe stop charging them $88 to even read it? More insidiously, say I support 754 to the letter, how do I document it? What if I want to describe, in my language spec, under which circumstances a divideByZero is raised? This is what the spec says, The divideByZero exception shall be signaled if and only if an exact infinite result is defined for an operation on finite operands. The default result of divideByZero shall be an ∞ correctly signed according to the operation [… snip …] This sounds good to me as is. Can I copy it into my own spec or would that be copyright infringement? Can I write something myself that means the same? Or do I have to link to the spec and require my users to pay to read it? Even if I just say “see the spec”, might my complete implementation constitute infringement of a copyrighted structure, sequence, and organization? The spec itself says, Users of these documents should consult all applicable laws and regulations. […] Implementers of the standard are responsible for observing or referring to the applicable regulatory requirements. So no help there. In short, I don’t know how to create FP bindings that conform to 754 without exposing myself to a copyright infringement lawsuit in the US from IEEE. Sigh. My first suggestion is obvious, and should have been obvious all along: IEEE 754 should be made freely available. Maybe take some inspiration from ECMA-262? That’s not all though, there are issues with the contents of the spec too. The spec is too specific The main problem with how the bindings are specified is that they’re very specific. Here’s an example (my bold), This clause specifies five kinds of exceptions that shall be signaled when they arise; the signal invokes default or alternate handling for the signaled exception. For each kind of exception the implementation shall provide a corresponding status flag. What does it mean to provide a status flag? Is it like in JavaScript where the captures from the last regexp match is available through RegExp? $ /.(.)./.exec("abc")
> ["abc", "b"]
$RegExp.$1
> "b"

It looks like that’s what it means, especially if you consider functions like saveAllFlags,

Implementations shall provide the following non-computational operations that act upon multiple status flags collectively: [snip]

― flags saveAllFlags(void)
Returns a representation of the state of all status flags.

This may have made sense at the time of the original spec in 1985 but it’s mutable global state – something that has long been recognized as problematic and which languages are increasingly moving away from.

Here’s another example. It sounds very abstract but for simplicity you can think of “attribute” as roundingMode and “constant value” as a concrete rounding mode, for instance roundTowardsZero.

An attribute is logically associated with a program block to modify its numerical and exception semantics[…] For attribute specification, the implementation shall provide language-defined means, such as compiler directives, to specify a constant value for the attribute parameter for all standard operations in a block […].

Program block? Attribute specification? Nothing else in my language works this way. How do I even apply this in practice? If attributes are lexically scoped then that would mean rounding modes only apply to code textually within the scope – if you call out to a function its operations will be unaffected. If on the other hand attributes are dynamically scoped then it works with the functions you call – but dynamically scoped state of this flavor is really uncommon these days, most likely nothing else in your language behaves this way.

You get the picture. The requirements are very specific and some of them are almost impossible to apply to a modern language. Even with the best will in the world I don’t know how to follow them except by embedding a little mini-language just for FP that is completely different in flavor from the rest. Surely that’s not what anyone wants?

What do I want then?

As I said earlier, it’s not that I want to be left alone to do FP myself. I’m a language guy, almost certainly I won’t get this right without guidance. So what is it that I want?

I want the motivation, not some solution. Give me a suggested solution if you want but please also tell me what the problem is you’re trying to solve, why I’m supporting this in the first place. That way, if for some reason your suggested solution just isn’t feasible there’s the possibility that I can design a more appropriate solution to the same problem. That way at least there’ll be a solution, just not the suggested one.

For instance, it is my understanding that the rounding mode mechanism was motivated mainly by the desire to have a clean way to determine error bounds. (Note: as pointed out on HN and in the comments, the rest of this section is wrong. Something similar was indeed the motivation for the mechanism but not exactly this, and you don’t get the guarantees I claim you do. See the HN discussion for details.) It works like this. Say you have a nontrivial computation and you want to know not just a result but also the result’s accuracy. With the rounding mode model you’d do this: repeat the exact same computation three times, first with the default rounding, then round-positive (which is guaranteed to give a result no smaller than the exact result) and finally round-negative (which is guaranteed to give a result no larger). This will give you an approximate result and an interval that the exact result is guaranteed to be within. The attribute scope gymnastics are to ensure that you can replicate the computation exactly, the only difference being the rounding mode. It’s actually a pretty neat solution if your language can support it.

It’s also a use case that makes sense to me, it’s something I can work with and try to support in my own language. Actually, even if I can implement the suggested bindings exactly as they’re given in the spec the motivation is still useful background knowledge – maybe it’s a use case I’ll be extra careful to test.

I’m sure there are other uses for the binding modes as well. But even if I assume, wrongly, that there isn’t, that misapprehension is still an improvement over the current state where I have nothing to fall back to so I dismiss the whole rounding mode model outright.

(As a bitter side note, minutes from the 754-2008 revision meetings are available online and I expect they contain at least some of the motivation information I’m interested in. I don’t know though, I can’t search for the relevant parts because the site’s robots.txt prevents search engines from indexing them. Grr.)

In conclusion, I think the people who regret that languages don’t have proper support for 754 could do a lot to improve that situation. Specifically,

• Fix the licensing terms on the spec document.
• Describe the use cases you need support for, not just one very specific solution which is hard to apply to a modern language.
• As an absolute minimum, fix the robots.txt so the spec meeting minutes become searchable.

Update: removed sentence that assumed the spec’s concept of exceptions referred to try/except style exceptions. Don’t use the term “interval arithmetic” about determining error bounds.

Shift/Reduce Expression Parsing (by Douglas Gregor)

Once every few years I end up, for one reason or another, having to implement an operator precedence parser. Getting it right is a bit fiddly. I once found a great article about how you do it and tend to just follow that but a while ago that article disappeared.

Now, again, I find myself having to implement an operator precedence parser, in yet another language, and so I dug up the old article on archive.org and decided: since I find it super useful and it’s completely gone from the web, maybe I should just host a copy here. So here it is. It’s a design doc from the sugar parser library written by Douglas Gregor. So, to be clear, I didn’t write this doc I’m just reposting it here (with a few minimal formatting changes) because it’s good and that way it’s at least available somewhere.

Shift/Reduce Expression Parsing

Introduction

The Sugar expression parser is similar to the class of operator-precedence parsers, but has been extended to support common requirements when parsing expressions, such as function application, confix (grouping) operators, and operator name disambiguation. Additionally, Sugar is intended to be usable without any precompiling phase, making it ideal for rapid or on-the-fly construction of expression parsers.

Expressions

For the purposes of this document, an expression is a sequence of operators and operands, where operators fall into one of the following categories:

Type Arity Placement Examples
Prefix Unary Prior to operand Unary minus
Postfix Unary After operand Factorial
Infix Binary Between operands Addition, multiplication, and division
Confix Unary Surrounding operand Parentheses, half-open ranges
Function application Binary After first operand and surrounding second operand Mathemetical functions (sin(x)), array indexes(a[5])

The confix and function application operators are essentially split into their component parts, an open symbol and a close symbol, during the parsing phase. The “open” symbol will occur on the left-hand side and the “close” symbol will occur on the right-hand side.

Constructing the parser

The expression parser is a shift/reduce parser with zero lookahead that utilizes two separate stacks: one for operators and one for operands. Any operands in the input stream are immediately shifted onto the operator stack; operators are immediately shifted onto the operator stack only if the operator stack is empty. Otherwise, the following table determines the action of the parser depending on the type of the operator on top of the operator stack and on the type of the current operator token.

Current operator
Prefix Postfix Infix Confix Open Confix/Function Close Function Open End of Input
Top
of
Stack
Prefix shift precedence precedence shift reduce precedence reduce
Postfix - reduce reduce - reduce reduce reduce
Infix shift precedence precedence/associativity shift reduce precedence reduce
Confix Open shift shift shift shift shift shift reduce
Confix/Function Close reduce reduce reduce reduce reduce reduce reduce
Function Open shift shift shift shift shift shift reduce

Description of parsing actions

• A shift operation pushes the current operator token onto the operator stack.
• A reduce operation pops the operator token off the top of the operator stack, and then pops the appropriate number of operands from the operand stack. Then the operator is applied to the operand(s) and the result is pushed back on the operand stack. Reduction of confix operators and of function application requires popping two operators off the operator stack.
• A precedence operation compares determines the relative precedence of the operator on top of the operator stack (top) and the current operator (current).
• If top has a lower precedence than current, shift.
• If top has a higher precedence than current, reduce.
• A precedence/associativity operation first compares the precedence according to the precedence operation: if the precedence is equivalent, associativity is considered:
• If top associates left of current, reduce.
• If top associates right of current, shift.

Rejecting Invalid Expresions

Operator-precedence parsers are often not used because they accept invalid strings. The shift-reduce parser as specified above will consider the expressions x + x, + x x, and x x + equivalent, even though only the first form is correct. This weakness is easily remedied with the use of the following state machine to track what type of operator or operand is expected at any given point in time.

The state machine contains three states: the pre-operand state where we collect confix open and prefix operators while waiting for an operand, the post-operand state where we have received an operand and are applying postfix operators to it and closing confix operators or finishing function calls, and finally an error state that will be entered when an invalid parse is detected.

Disambiguation of Operator Names

Within many domains, certain operators are reused in different contexts. Several obvious examples are the unary and binary minus operators that use the same symbol ‘-‘, the absolute-value confix operator that uses the symbol ‘|’ as both its open and close symbol, and the ‘+’ operator for regular expressions that is both a postfix positive closure operator and an infix operator for specifying alternatives.

Disambiguation of operator names is in many cases directly related to the state machine used to identify invalid sequences. Given any operator name, we determine the set of operator types that it may belong to. We then intersect this with the set of operator types that are valid at our current state within the state machine to determine role(s) this operator may play in this context. Several cases are left ambiguous by this intersection. These cases are considered below with either a specific resolution or are considered impossible by this class of parser.

Disambiguation at this phase requires lookahead of one additional token, and is also based on the state machine. Disambiguation is possible when the possible meanings of the operator differ in the states that will result from their interpretation. For instance, if a given operator is both postfix and infix, the postfix interpretation would remain in the post-operand state whereas the infix interpretation would transfer to the pre-operand state. Looking ahead one symbol, we can determine if the next symbol would be valid in either state: if it is valid in only one of the resulting states, we can disambiguate the prior (non-lookahead) symbol to ensure that the appropriate state is reached so that the lookahead symbol will not result in a parse error.

• {prefix, confix open}: ambiguous (requires arbitrary lookahead).
• {postfix, infix}: single lookahead disambiguation based on state.
• {confix/function close, infix}: single lookahead disambiguation based on state.
• {function open, infix}: ambiguous (requires arbitrary lookahead).
• {postfix, confix/function close}: ambiguous (requires arbitrary lookahead).
• {postfix, function open}: single lookahead disambiguation based on state.
• {function open, function/confix close}: single lookahead disambiguation based on state.

Parsing Examples

Mathematical Expressions

Parse the expression x * |y+z| + -3^x^y using the standard mathematical rules for precedence and associativity.

State Operand Stack Operator Stack Token Token type Action
Pre x operand shift
Post x * infix operator shift
Pre x * | confix open or confix close disambiguate as confix open, shift
Pre x * (confix open |) y operand shift
Post x y * (confix open |) + infix or prefix operator disambiguate as infix, shift
Pre x y * (confix open |) + z operand shift
Post x y z * (confix open |) (infix +) | confix open or confix close disambiguate as close, reduce
Post x (y+z) * (confix open |) | confix open or confix close disambiguate as close, reduce
Post x (|y+z|) * + infix or prefix disambiguate as infix, compare precedence, reduce
Post (x * (|y+z|)) + infix or prefix disambiguate as infix, shift
Pre (x * (|y+z|)) (infix +) - infix or prefix disambiguate as prefix, shift
Pre (x * (|y+z|)) (infix +) (prefix -) 3 operand shift
Post (x * (|y+z|)) 3 (infix +) (prefix -) ^ infix compare precedence, shift
Pre (x * (|y+z|)) 3 (infix +) (prefix -) ^ x operand shift
Post (x * (|y+z|)) 3 x (infix +) (prefix -) ^ ^ infix compare precedence, compare associativity, shift
Pre (x * (|y+z|)) 3 x (infix +) (prefix -) ^ ^ y operand shift
Post (x * (|y+z|)) 3 x y (infix +) (prefix -) ^ ^ end end reduce
Post (x * (|y+z|)) 3 (x^y) (infix +) (prefix -) ^ end end reduce
Post (x * (|y+z|)) (3^(x^y)) (infix +) (prefix -) end end reduce
Post (x * (|y+z|)) (-(3^(x^y))) (infix +) end end reduce
Post ((x * (|y+z|)) + (-(3^(x^y)))) empty end end accept

Douglas Gregor

I’ve been reading Learn You a Haskell and got stuck on a tiny point at the beginning, in the section about ranges: you can’t have a range with more than one step value. This post is about what it might mean to allow multiple steps and how Charles Babbage might have had an opinion on this particular aspect of haskell’s design.

Here’s the whole paragraph from Learn You a Haskell,

To make a list containing all the natural numbers from 1 to 20, you just write [1..20]. [… snip …] Ranges are cool because you can also specify a step. What if we want all even numbers between 1 and 20? Or every third number between 1 and 20?

ghci> [2,4..20]
[2,4,6,8,10,12,14,16,18,20]
ghci> [3,6..20]
[3,6,9,12,15,18]

It’s simply a matter of separating the first two elements with a comma and then specifying what the upper limit is. While pretty smart, ranges with steps aren’t as smart as some people expect them to be. You can’t do [1,2,4,8,16..100] and expect to get all the powers of 2. Firstly because you can only specify one step. And secondly because some sequences that aren’t arithmetic are ambiguous if given only by a few of their first terms.

The bold highlight is mine. This makes sense – how can the language guess that what you have in mind when you write [1, 2, 4, 8, 16..] is the powers of 2?

On the other hand though, just as a fun exercise, if you were to allow multiple step values how would you define them? Even though this is a highly specialized and modern language design question I know Charles Babbage would have understood it and I suspect he would have had an opinion.

A while ago I wrote a post about Babbage’s difference engine. The difference engine is basically an adder that calculates successive values of a polynomial by doing simple addition, using a technique called divided differences. This is described in full detail in that post but here’s a summary. Take a sequence of values of values of a polynomial, for instance the first four squares,

$\begin{tabular}{lcl} f(0) & = & 0 \\ f(1) & = & 1 \\ f(2) & = & 4 \\ f(3) & = & 9 \\ & \vdots & \end{tabular}$

You can then calculate the next values using just addition by calculating the differences between the initial values. To get the differences for a sequence you subtract each pair of values,

$\begin{tabular}{l|l} 0 \\ & 1 \\ 1 \\ & 3 \\ 4 \\ & 5 \\ 9 \end{tabular}$

Do it once and you get the first differences. Do the same thing to the first differences and you get the second differences,

$\begin{tabular}{l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 \\ 4 && 2\\ & 5 \\ 9 \end{tabular}$

Do the same thing to the second differences and you get the third differences,

$\begin{tabular}{l|l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 && 0\\ 4 && 2\\ & 5 \\ 9 \end{tabular}$

In this case there’s just one and it’s 0 so we can ignore it. Once you have all the differences you can calculate the next value of the polynomial by adding them together from right to left,

$\begin{tabular}{l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 \\ 4 && 2 \\ & 5 \\ 9 \end{tabular} \quad\to\quad \begin{tabular}{l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 \\ 4 && 2 \\ & 5 & \tiny{+0} \\ 9 && \bf{2} \end{tabular} \quad\to\quad \begin{tabular}{l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 \\ 4 && 2 \\ & 5 \\ 9 & \tiny{+2} & \bf{2} \\ & \bf{7} \\ \end{tabular} \quad\to\quad \begin{tabular}{l|l|l} 0 \\ & 1 \\ 1 && 2\\ & 3 \\ 4 && 2 \\ & 5 \\ 9 && \bf{2} \\ \tiny{+7} & \bf{7} \\ \bf{16} \\ \end{tabular}$

There it is, 16, the next value in the sequence. At this point we can just keep going,

$\begin{tabular}{l|l|l} \multicolumn{1}{r}{}&&2 \\ & 7 & \tiny{+0}\\ 16 &\tiny{+2}& 2 \\ \tiny{+9} & 9 \\ \bf{25} \\ \end{tabular} \quad\to\quad \begin{tabular}{l|l|l} \multicolumn{1}{r}{}&&2 \\ & 9 & \tiny{+0}\\ 25 &\tiny{+2}& 2 \\ \tiny{+11} & 11 \\ \bf{36} \\ \end{tabular} \quad\to\quad \begin{tabular}{l|l|l} \multicolumn{1}{r}{}&&2 \\ & 11 & \tiny{+0}\\ 36 &\tiny{+2}& 2 \\ \tiny{+13} & 13 \\ \bf{49} \\ \end{tabular} \quad\to\quad\dots$

This technique works for any polynomial. The difference engine was given its name because it is a mechanical implementation of it.

Already you may suspect how this can be relevant to ranges in haskell. You can generalize ranges to n step values by using divided differences on the step values to generate the rest of the sequence. With two step values the result of this is a linear sequence, as you would expect (polyEnumFrom is my name for the generalized range function),

ghci> polyEnumFrom [0, 1]
[0, 1, 2, 3, 4, 5, 6, ...]
ghci> polyEnumFrom [4, 6]
[4, 6, 8, 10, 12, 14, 16, ...]
ghci> polyEnumFrom [10, 9]
[10, 9, 8, 7, 6, 5, 6, ...] 

Give it three elements, for instance the squares from before, and the result is a second-order polynomial,

ghci> polyEnumFrom [0, 1, 4]
[0, 1, 4, 9, 16, 25, 36, ...]

Similarly with the cubes,

ghci> polyEnumFrom [0, 1, 8, 27]
[0, 1, 8, 27, 64, 125, 216, ...]

There is nothing magical going on here to recognize the squares or cubes here, it’s a completely mechanical process that uses subtraction to get the differences and addition to then continue the sequence past the step values. It won’t give you the squares though, only polynomials. If you try you get

ghci> polyEnumFrom [1, 2, 4, 8, 16]
[1, 2, 4, 8, 16, 31, 57, ...]

For any list of 3 step values there is exactly one polynomial of order 2 or less that yields those values; an order-2 polynomial has 3 free variables so there’s no room for more than one solution. The result of a 3-step range is the sequence of values produced by that order-2 polynomial. In general, the result of a multi-step range with n steps is the sequence of values generated by the unique polynomial of order at most n-1 whose first n values are the steps. Conveniently, if the steps are integers the polynomial is guaranteed to only produce integers.

Babbage built the difference engine because it was really important then be able to produce tables of function values by calculating successive values of polynomials. I think he would have appreciated the power of a language where you can condense the work of generating an approximate table of sine values into a single line,

take 10 (polyEnumFrom [0, 0.0002909, 0.0005818, 0.0008727])

which yields,

0.0000000
0.0002909
0.0005818
0.0008727
0.0011636
0.0014545
0.0017454
...

(The four values given to polyEnumFrom are the first four sine values from the table.) Of course, Babbage did lose interest in the difference engine himself and preferred to work on the more general analytical engine. If he did see haskell he would probably quickly lose interest in ranges and move on to more general advanced features. But I digress.

Is a shorthand for sequences of polynomial values actually a useful feature in a language? I doubt it. But I like the idea that a corner of a modern language  intersects with Babbage and the difference engine. Maybe it would make sense as a little easter-egg tribute to Babbage, a difference engine hiding behind a familiar language construct. I quite like the idea of that.

If you want to play around with it I’ve put an implementation in a gist. For something that might seem fairly complex the implementation is simple – polyEnumFrom and its helpers is just 13 lines. Haskell is an elegant language.

This post is about giving a memorable name to every spot on the earth. It’s the fourth and probably last post in my series about z-quads, a spatial coordinate system. Previously I’ve written about their basic construction, how to determine which quads contain each other, and how to convert to and from geographic positions. This post describes a scheme for converting a quad to a memorable name and back again.

To the right are two examples. The left image is quad 10202 whose name is naga. The second is quad 167159511 whose name is naga hafy. As you might expect, naga hafy is contained within naga.

I can understand if at this point you’re thinking: what could possibly be the point of giving a nonsense name to every square on the surface of the earth?

What it’s really about is hacking your brain a little bit, encoding data using a pattern the brain is good at remembering instead of raw numbers which it isn’t. My original motivation was Ukendt Aarhus who takes pictures at inaccessible places in Århus and posts them on facebook, along with the position they were taken. I saw some of the pictures in real life and had to type a position like this

56.154382 / 10.20489

into google maps on my phone. My brain has absolutely no capacity for remembering numbers, even for a short while, so I have to do it a few digits at a time, looking back and forth. I’m much better at remembering words, even nonsense words. That is, I can hold more of this,

naga hazo yhes

in my head at once. Not the whole thing but certainly one word at a time, sometimes two. That’s the effect this naming scheme aims to use. Three words gives you a fairly precise location (a ~10m square) and if you need less, say to identify the location of a city, two words (a ~1km square) is typically enough. The town I grew up in, which is very small indeed, is uniquely identified by naga opuc.

How do you generate reasonably memorable names automatically? The trick I used here is the one I usually use: alternate between consonants and vowels. It doesn’t always work but is usually good enough. I call those salad words (because “salad” is one).

If you use just the English alphabet you have 6 vowels and 20 consonants. You can make 28800 four-letter salad words with those: two vowels (62) times two consonants (202) times 2 for whether the word starts with a consonant or a vowel. There are 21845 quads on the first 7 zoom levels so that will fit in one four-letter salad word.

The first step in generating the string representation of a quad, then, is to split it into chunks that each hold 7 zoom levels. So if you want to convert quad 171171338190 which is at zoom level 19 you divide it into two chunks at level 7 and one chunk at level 5

10202   9941   974

This can be done using the ancestor/descendency operations from earlier. You can then convert each chunk into a word separately.

Encoding a chunk

There are many ways you could mape a value between 0 and 21845 to a salad word but one property I’d like the result to have is that it should give some hint about which area it covers, very roughly at least. I ended up with the following decomposition, (and let’s use zagi as an example while I describe how it works).

The world is divided into two parts along the Greenwich meridian. To the west words are vowel-consonant-vowel-consonant, to the east they’re consonant-vowel-consonant-vowel. So zagi must be east of Greenwich.

Each half is divided into six roughly evenly sized parts which determines what the first vowel is. (Note: not the first letter, the first vowel, which can be either the first or second letter.) The first vowel in zagi is a so it must be in the northwestern part of the eastern hemisphere. Each of the six divisions are subdivided 20 ways which determines the first consonant. The first consonant in zagi is z which is way down in the bottom right corner. This puts the location almost exactly in the middle of the eastern hemisphere. In fact, zagi is the quad that covers New Delhi.

The last two letters are determined in a similar way, further subdividing the region:

The second vowel determines a further subdivision, the second consonant narrows the quad down fully.

Now, the point of this is not for it to be possible to look at the name of a quad and know where it is – though the word ordering and first vowel rule sticks in your head pretty easily. The property it does have though is that at every level you have similar words grouped together geographically. These four maps show all level-7 quads that start with o, then on, then one, then finally onel.

If you know where onel is then you can be sure that onem is somewhere nearby.

The basic scheme is an attempt to exploit that we’re better at remembering word-like data than raw numbers. The subdivision scheme is a way to take that further and put some more patterns into the data. The brain is great at recognizing patterns.

In my experience it serves its purpose well: if you’re working within a particular area you become somewhat familiar with the quads by name and it becomes easier to remember things about them. One example is that coincidentally, Arizona is covered by quads starting with az, which is easy to remember. That gives you an anchor that helps you locate other a-quads because all the second consonants are located in a predictable way relative to each other. If you know where one is, that gives you a sense of where the others are.

I didn’t plan for that particular rule to be possible, I just noticed that I had started using it. And that’s the point: the patterns aren’t intended to be used or recognized in any particular way that I’ve anticipated in advance. They’re just available, like the grid on graph paper. How you actually remember them, or if you do at all, only emerges when you use the system in practice.

Implementation

The actual encoding/decoding is pretty straightforward but it’s tedious so I won’t go through it in detail, it’s in the source (js/java) if you’re curious. The shapes of the irregular subdivisions are encoded as a couple of tables.

The one bit that isn’t straightforward is the scheme for encoding the zoom levels 0-6. Remember, we’re not just encoding one zoom level but 8 of them. The trick when encoding a quad at the higher levels, 0-6, is to take the quad’s midpoint and zoom it down to level 7, keeping a bit saying whether we zoomed or not. Then encode the first three letters as if it were a level 7 quad initially and finally use the zoomed/not-zoomed bit to determine the last consonant. This is why there are two consonants in some of the cells in the purple diagram for the last consonant above. This works because if you take the midpoint of a quad levels 0-z and find cell at level z+1 that contains it, no other midpoint from 0-z will land in that same cell. So it’s enough to know whether we zoomed or not, if we did then there is a unique ancestor which must be the one we zoomed from.

Four-letter words

Another wrinkle on this scheme is that some four-letter salad words are, well, rude. Words you’d probably rather avoid if you have the choice. Since we’re only using 21845 of the 28800 possible words it’s easy to find an unused word near any one you’d like to avoid and manually map, for instance, 6160 to anux rather than anus.

There are many four-letter salad words left with meaning, even if you remove the rude ones. Like tuba, epic, hobo, and pony. Only 70% of words correspond to a valid quad so 30% of the meaningful words will not be valid. Of the ones that do a lot of them of them will be somewhere uninteresting like the ocean or Antarctica. But some do cover interesting places: pony and sofa are in WA, Australia, atom is near Vancouver, toga is in Papua New Guinea.

The toy site I mentioned in the last post also allows you to play around with quad names. Here is pony tuba,

The URL is,

http://zquad.p5r.org?name=pony-tuba

Further exploration

There’s a lot more you can do with z-quads, both as mathematical objects and as data structures.

On the data structure side spatial data structures and quad trees are well understood and it’s unclear how much new, if anything, you can do with z-quads. In my experience though they’re a convenient handle to use to describe and address data within spatial data structures.

As mathematical objects they have some nice properties and if you want to prove properties about them – for instance that a quad can contain at most one midpoint of a higher-level quad – they’re friendly to induction proofs. You can also easily generalize them, for instance to any number of coordinates. If you want to represent regular cubes instead of squares all the same tools work, they just have to be tweaked a bit:

b3z = 1 + 8 + … + 8z-1 + 8z

ancestor3(q, n) = (qb3z) / 8n

The other way works too, with one dimension instead of two. That model gives you regular divisions of the unit interval.

This will probably be my last post about them though. It’s time to get back to coding with them, rather than writing about them.

This is the third post in a series about z-quads, a spatial coordinate system. The first one was about the basic construction, the second about how to determine which quads contain each other. This one is about conversions: how to convert positions to quads and back.

First, a quick reminder. A quad is a square(-ish) area on the surface of the earth. Most of the time we’ll use the unit square instead of the earth because it makes things simpler and you can go back and forth between lat/longs and unit coordinates just by scaling the value up and down. (There’s a section at the end about that.)

So, when I’m talking about encoding a point as a quad, what I mean more specifically is: given a point on the unit square (x, y) and a zoom level z, find the quad at zoom level z that contains the point. As it turns out, the zoom level doesn’t really matter so we’ll always find the most precise quad for a given point. If you need a different zoom level you can just have a step at the end that gets the precise quad’s ancestor at whatever level you want.

Precision

So far we’ve focused on quads as an abstract mathematical concept and there’s been no reason to limit how far down you can zoom. In practice though you’ll want to store quads in integers of finite size, typically 32-bit or 64-bit. That limits the precision.

It turns out that you can fit 15 zoom levels into 32 bits and 31 into 64. At zoom level 15 the size of a quad is around 1.5 km2. At zoom level 31 it’s around 3 cm2. So yeah, you typically don’t need more than 31 levels. A third case is if you want to store your quad as a double – JavaScript, I’m looking at you here. A double can hold all integers up to 53 bits, after which some can’t be represented. That corresponds to 26 zoom levels which is also plenty precise for most uses.

Encoding points

Okay, now we’re ready to look at encoding. What we’ll do is take a unit point (x, y) and find the level 31 quad that contains it. Since we’ve fixed the zoom level we’ll immediately switch to talking about the scalar rather than the quad. That makes everything simpler because within one level we’re dealing with a normal z-order curve and conversion to and from those are well understood.

The diagram on the right illustrates something we’ve seen before: that scalars are numbered such that the children at each level are contiguous. In this example, the first two green bits say which of the four 2×2 divisions we’re in. The next two say which 1×1 subdivision of that we’re in. It’s useful to think of this the same way as the order of significance within a normal integer: the further to the left a pair of bits is in a scalar the more significant it is in determining where the scalar is.

Now things will have to get really bit-fiddly for a moment. Let’s look at just a single 2×2 division. Each subdivision can be identified with an (x, y) coordinate where each ordinate is either 0 or 1, or alternatively with a scalar from 0 to 3. The way the scalars are numbered it happens that the least significant scalar bit gives you the x-coordinate and the most significant one gives you y:

(This is by design by the way, it’s why the z-shape for scalars is convenient). This means that given point x and y, each 0 or 1, you get the scalar they correspond to by doing

x + 2 y

What’s neat about that is that because of the way pairs of bits in the scalar increase in significance from right to left matches the significance of bits in an integer, you can do this for each bit of x and y at the same time, in parallel. To illustrate how this works I’ll take an example.

Let’s start with the point (2/5, 2/3) and find the level 5 scalar that contains it. First we’ll multiply both values by 32 (=25) and round the result to an integer,

x = (2/5) × 32 ≈ 12 = 01100b
y = (2/3) × 32 ≈ 21 = 10101b

Now we’ll perform x + 2 y for each bit in parallel like so,

Looks bizarre right, but it does work. We spread out the bits from each input coordinate so they can be matched up pairwise and then add them together. The result is the scalar we’re looking for because the order of significance of the bit-pairs in the scalar match the significance of the individual bits in each of the inputs.

Now, how can you tell that the result, scalar 626 on zoom level 5, is the right answer? This diagram suggests that it’s about right:

The bit-pairs in the scalar are [2, 1, 3, 0, 2]  so we first divide and take subdivision 2, then divide that and take 1, then divide that, and so on. Ultimately we end up at the green dot. In this case the last step would be to add the appropriate bias, b5, to get a proper quad and then we’re all done.

Because this conversion handles all the levels in parallel there’s little benefit to converting to a higher zoom level than the most precise, typically 31. If you want level 5 do the conversion to level 31 and then get the 26th ancestor of the result.

Most of the work in performing a conversion is spreading out the scaled 31-bit integer coordinates. One way to spread the bits is using parallel prefix which takes log2 n steps for an n-bit word,

  00000000abcdefgh
→ 0000abcd0000efgh
→ 00ab00cd00ef00gh
→ 0a0b0c0d0e0f0g0h

That’s just one way to do it, there are others.

The operation above gets you into quad-land and obviously you’ll want to be able to get out again. This turns out to just be a matter of running the encoding algorithm backwards. The main difference is that here we’ll stay on whatever zoom level the quad’s on instead of always going to level 31.

Given a quad q this is how you’d decode it:

1. Subtract the bias to get the quad’s scalar.
2. Mask out all the even-index bits to get the spread x value and the odd-index bits to get the spread y value. Shift the y value right by 1.
3. Pack the spread-x and -y values back together to get a zq-bit integer value for each. Packing can be similarly to spreading but in reverse.
4. Floating-point divide the integer values by 2zq. You now have your (x, y) coordinates between 0 and 1.

A quad is an area but the result here is a point, the top left corner of the quad. This is often what you’re interested in, for instance if you’re calculating is the full area of the quad. There you can just add the width and height of the quad (which are both 2-zq) to get the other corners. Alternatively you might be interested in the center of the quad instead. To get you can modify the last step of the algorithm to do,

Apply (2 v + 1) / 2zq+1 as a floating-point operation to each of the integer coordinates.

This is just a circuitous way to add 0.5 to v before dividing by 2zq but this way we can keep the value integer until the final division. If you want to minimize the loss of precision when round-tripping a point through a quad you should use the center since it’s the point with the smallest maximum distance to the points that belong to that quad.

So, now we can convert floating-point coordinates to quads and back again with ease. Well, relative ease. In the last part I’ll talk briefly about something I’ve been hand-waving until now: converting from lat/long to the unit square and back.

Latitude and longitude

When you encounter a lat/long it is typically expressed using the WGS 84 standard. The latitude, a value between -90 and 90, gives you the north-south position. The longitude, a value between -180 and 180, gives you east-west.

A typical world map will look something like the one on the right here with America to the left and Asia to the right. The top left corner, northwest of North America, is lat 90 and long -180. The bottom right, south of New Zealand, has lat -90 and long 180. You could convert those ranges to an (x, y) on the unit square in any number of ways but the way I’ve been doing it follows the z-order curve the same way I’ve been using in this post: North America is in division 0, Asia in division 1, South America in division 2, and Australia in division 3. The conversion that gives you that is,

x = (180 + long) / 360
y = (90 – lat) / 180

This puts (0, 0) in the top left corner and (1, 1) in the bottom right and assigns the quads in that z-shaped pattern.

There’s a lot to be said about map projections at this point. I limit myself to saying that while it’s convenient that there’s a simple mapping between the space most commonly used for geographic coordinates and the space of quads, the Mercator projection really is a wasteful way to map a sphere onto a square. For instance, a full third of the space of quads is spent covering the Southern Ocean and Antarctica even though they take up much less than a third of the surface of the globe. I’d be interested in any alternative mappings that make more proportional use of the space of quads.

Okay, so now there’s a way to convert between WGS 84 and quads. I suspect there’s a lot more you can do with z-quads but this, together with the operations from the two previous posts, what I needed for my application so we’re at to the end of my series of posts about them. Well, except for one final thing: string representation. I’ve always been unhappy with how unmemorable geographic positions are so I made up a conversion from quads to sort-of memorable strings which I’ll describe in the last post.

In the meantime I made a little toy site that allows you to convert between positions and quads at a given zoom level. Here is an example,

This is quad 167159423 at zoom level 14 which covers, as always, Århus. The URL is

http://zquad.p5r.org?latlng=56.1676,10.2062&zoom=14

which kind of explains itself. You can also do

http://zquad.p5r.org?address=San+Francisco&zoom=14

Finally to see where a particular quad is try

http://zquad.p5r.org?quad=15386

If you want to explore the space of quads you can play around with the parameters yourself.

This is the second post about z-quads, a spatial coordinate system. In the first post I outlined how they work and described how to jump between quads at different zoom levels. In this post I’ll look at how to determine whether one quad contains another, and how to find the common ancestor of two quads.

Containment

Given two quads q and s, how do we determine if q contains s? It’s pretty simple – there’s two cases to consider.

The one case is that the zoom level of q is higher than the zoom level of s. (Let’s call them zq and zs). If q is zoomed in more than s then it can’t contain s because it’s clearly smaller. So now let’s assume that q is at the same level or higher than s. If s is contained in q then that’s just another way of saying that q is s‘s ancestor. So if you get the ancestor of s that’s at the same level as q then you should get – q. In other words,

contains?(qs) = (zq ≤ zs) and (ancestor(szs – zq) = q)

Pretty simple right? By the way, now you’re seeing why I pointed out before that the ancestor operation is defined for n=0: that’s what you’re using if you do contains?(q, q), which obviously you’d expect to be true.

This is the first operation where the zoom level appears explicitly and I’ll get into how to determine that for a given quad in a second. But first I’ll complete containment and talk about finding the most specific common ancestor of two quads.

Common ancestor

Let’s say we have two quads, call them q and s again, and you want to find the most specific ancestor of the two. That is, the ancestor that contains both of them and where no child also contain them both – since that would make the child more specific. This is well-defined since any two quads have, if no other, at least the trivial ancestor 0 in common.

The first step is to normalize q and s to the same zoom level. We’ll assume that s is zoomed in at least as much as q – if it’s not you can always just have an initial step that swaps them around. We can zoom s out to zq and it won’t change what their common ancestor is because we know that is has to be at level zq or above for it to be an ancestor of q. Let’s use t to refer to the result of zooming s out to q‘s level. So now we have q and t which are both a zoom level zq.

To do the rest of the computation we need a bit more intuition about how the space of quad scalars is structured. As I’ve mentioned a few times at this point, because quads are numbered in z-order within one level all the descendants of a particular quad are contiguous, and so are the scalars. That means that you can split the binary representation of a scalar into 2-bit groups where each group corresponds to a subdivision of the whole space.

Here’s an illustration of how that works out for the scalars at zoom-2. You can think of the binary representation of the scalar, here for instance 9, as consisting of two groups: the top two bits indicate which 2×2 subdivision the quad belongs to, and the bottom two bits as the 1×1 subsubdivision within that one.

If you’re given two scalars, let’s take 9 and 11, and you write their binary representations together, then their common ancestor almost jumps out at you,

 9: 10 01
11: 10 11

You can tell from the top two bits that they’re both in the same subdivision but then they’re in different subsubdivisions. So the most specific common ancestor is the one that’s filled with 8s above.

This gives you a hint of how to find the most specific common ancestor more generally: you take the two scalars, then you see which 2-bit groups are different, and then you zoom out just enough that you discard all the groups that are different and keep the ones that are the same. You can find the highest bit where two numbers are different by xor’ing them together and taking the highest 1-bit in the result:

   9: 10 01
11: 10 11
xor: 00 10

Here the highest different bit is at 2 so they’re different at the lowest zoom level but then they’re the same. So their most specific common ancestor is the quad you get by zooming out one step from either of them. More generally, the formula for getting the most specific common ancestor of two quads q and t at the same zoom level z is,

msca(qtz) = ancestor(q, ffs((q – bz xor t – bz) + 1) / 2)

Here ffs is find-first-set which returns the index of the highest 1-bit in a word, 0 if there are none. The (x+1)/2 part is to get from a bit index to a zoom level. For instance if the highest set bit is 1 or 2, as in the example, (x+1)/2 will be 1.

At this point let’s just recap quickly. Given a quad you can get an ancestor, a descendant, and you can determine how it is related to any of its ancestors. Given two quads you can determine whether one contains the other, and you can find the most specific common ancestor. All in a constant, low, number of bit operations. There are more operations to come but already I think we’ve got a big enough library of operations to be useful.

I’ve glossed over a few things along the way though, particularly how to determine the zoom level for a given quad. That’s what I’ll talk about next. It’s kind of fiddly so if you’re okay with trusting that it can be calculated efficiently then you can just skip this part and go on to the part where I talk about how to convert from regular floating point coordinates to quads and back.

Otherwise, let’s go.

Zoom

Given a quad, what is its zoom level?

Since quads at higher zoom levels have higher numeric values one way to answer this is to just compare with the biases. If a value is between b4 and b5 then it must have zoom level 4, and so on. Since the biases are sorted you can even do binary search. But you can be a lot smarter than that.

Here’s an illustration of where the quads at zoom level 4 and 5 are located on the numbers line,

The zoom-4 quads start at 85 and end at 340, then the zoom-5 quads start at 341 and run until 1364 where zoom-6 starts. Each range of quads contains values from three different 2-power intervals: zoom-4 contains values from the 26 to the 28 interval, zoom-5 from 28 to 210, and so on. This suggests that determining the highest set one-bit in the quad is a good place to start.

As an example, say the highest one-bit is 8 or 9. That is, the quad is between 256 and 1024. If you look at the numbers line then you see that there are two possibilities: if it’s smaller than b5, 341, then it’s at level 4, otherwise it must be at level 5. Similarly, if the highest bit is at index 10 or 11, that is a value between 1024 and 4096, then the level is 5 or 6 depending on whether the quad is larger than b6, 1365.

There’s a general principle working here. Say highest set bit of a quad is at index k. The zoom level will be c=k/2 if the quad is is smaller than bc+1, otherwise it’s c+1.

As an illustration, let’s try this with one of the quads from the introduction, 171171340006. The highest 1-bit is at index 37 so c is 18. This means that that quad is at zoom level 18 or 19. The boundary between the two is b19 which is 91625968981. This is less than the quad’s value so the result must be 19.

Okay so this almost gives us a fast way to calculate the zoom level. The one thing that’s missing is: how to get an arbitrary zoom bias quickly – how do you know that b19 is 91625968981? There’s only a small number of bias values of course so you can just put them in a table, but you can also calculate them directly. The thing to notice is that the binary representation of a bias is simple:

b3: 10101 (= 21)
b4: 1010101 (= 85)
b5: 101010101 (= 341)

…and so on. They’re all alternating ones and zeroes. That’s because, as we saw before, they are made up of sums of powers of 4. That gives us a practical way to calculate them: take the binary number that consists of alternating 0s and 1s and mask out the bottom part of the appropriate size. So for b3 we mask out the bottom 5 bits, for b4 the bottom 7 and so on:

bz = 0x5555555555555555 & ((1 << 2 z) - 1)

I imagine there might be more efficient ways but this one works.

Calculating the zoom level isn’t particularly expensive but you need it for most operations so in my own code I typically compute it once early on and then keep it cached along with the quad itself (though it’s not something you would ever store or transmit along with the quad since it’s redundant). Even for some of the operations where you technically don’t need it it’s still useful to have it. For instance, you can get nth ancestor of a quad without knowing the zoom, but the operation is only well-defined if the quad’s zoom is n or greater so it’s nice to have the zoom around so you can make that assertion explicitly in the code.

I think that’s enough for one post. In the next part we’ll talk about how to do conversion between coordinates and quads.

This post is about a spatial coordinate system I’ve been playing around with, z-quads. Unlike most commonly used coordinate systems a z-quad doesn’t represent a point but a square with an area.

To the right are some examples of quads of increasing accuracy, gradually zooming in on Århus. (The quads are the red rectangles). The first one that covers northern Germany, some of England, and half of Denmark, is quad 637 which has zoom level 5. The one that covers all of Århus is 163241 which has zoom level 9. Going further down to zoom level 15 is 668638046 which covers part of the center of Århus, and finally at zoom level 19 is 171171340006 which covers just a couple of houses.

This should give a rough sense of how the system works. A quad is an integer, typically 64 bits, which represents some square area on the earth. Each quad has a zoom level associated with it and higher numbers have higher zoom levels. There’s a lot more to it than that but that’s the core of it.

Below I’ll describe how z-quads are constructed, which is pretty straightforward. Then I’ll look at how to operate on them: how to convert to and from them, zoom in and out, that kind of thing.

Divide, divide, divide some more

For a hobby project I’m working on I needed to represent a large set of geographic coordinates. They happen to represent positions of bus and train stops but that’s not super important. To the right is a diagram that illustrates what I had in mind when I started playing around with this. What I wanted really was a way to divide the world evenly into squares of different sizes that I could divide my positions into. (For simplicity I’ve scaled the world down to a 1×1 unit square). Each of the three colored squares here should have some unique integer id. When I say “divide the world evenly” I mean that they’re not arbitrary squares, they’re on the grid you get if you divide the whole square into in four equal parts, then divide each of those into four as well, and so on and so on.

It seemed intuitively reasonable that you’d want the entire space to be represented by 0.

This is zoom level 0. Zooming down one level, you might divide 0 into four parts and call them 1-4,

This is zoom level 1. Then you can keep going and do the same again to get zoom level 2,

At each new zoom level you add 4 times as many new quads as you had on the previous level so where level 2 has 16, level 3 will have 64 and level 4 will have 256.

This is how z-quads are constructed. Pretty simple right? Already, by the way, we know the id of the blue square from the beginning because it’s there at zoom level 2: it’s quad 14. And just as a reminder, at this point we’re playing around on the unit square but it’s just a matter of scaling the values back up to get coordinates on the earth. Here’s 14 again, scaled back up:

One thing you may notice if you look closely at how the quads at zoom level 2 are numbered is that they look of out of sequence. You might expect them to be numbered 5-8 on the top row, 9-12 on the next, and so on. Instead they’re numbered in zig-zag, like this:

So they zig-zag within each of the four parts of the big square, and also when going between the four parts. That’s because it turns out to be really convenient when you divide a quad to have all the quads in the subdivisions be contiguous. So the children of 1 are 5-8, of 2 are 9-12, and so on. We’ll end up relying on that constantly later on. (The numbering within each level it known as a Z-order or Moreton order curve).

Okay, so now I’ve postulated way to divide the unit square, by which I mean the surface of the earth, into regular parts. The next step is to figure out how to operate on them which is where it’ll become clear why this number is is particularly appealing.

Zoom bias

Given a quad there are many operations that you might conceivably want to perform. You might want to know which quad is its parent (that is, the quad that contains it at the zoom level above) or the parent’s parent, etc. Given two quads you might want to know whether one contains the other. Or you might want to find the most specific quad that contains both of them.

The first thing we’ll look at is how to get the parent of a quad. To explain that I need to introduce two contepts, zoom bias and a quad’s scalar. The zoom levels we’ve seen so far (and a few more) and the values of their quads are

• Zoom 0: 0
• Zoom 1: 1-4
• Zoom 2: 5-20
• Zoom 3: 21-84
• Zoom 4: 85-341

In general, at level z you have 4z quads, so at level 2 you have 42=16, the first one being 5. At level 3 you have 43=64, the first one being 21, and so on.

The value of the first quad for a given zoom level turns out to appear again and again when doing operations so I’ll give that a name: the zoom bias. When I talk about the zoom-3 bias, which I’ll typically write as b3, what I’m referring to is 21 since that is the value of the first quad at zoom level 3.

The scalar of a quad is its value minus the bias of its zoom level. Here are the biases and scalars for zoom levels 1 to 3:

Parents and children

Now we’re ready to look at how you get a quad’s parent. As mentioned, quads are numbered such that the subquads of each parent are contiguous. That obviously also holds for the scalars. That means that if you divide the scalar value by 4, the number of children in each parent, this happens:

You get the scalar of the parent. So given a quad q at zoom level z, the parent quad can be computed as

parent(q) = ((qbz) / 4) + bz-1

Unbias the quad to get the scalar, divide it by 4 to get the parent’s scalar, then add the parent’s zoom level bias to get back a quad. And in fact this can be simplified if you use the fact that the zoom biases are related like this,

bz+1 = 4bz + 1

I’ll justify this in a moment but for now you can just take it as a given. If you plug this in and simplify you get that

parent(q) = (q – 1) / 4

(By the way, except when stated explicitly all values are integers so division implicitly rounds down). What’s neat about the simplified version is that bz no longer appears so you can apply this without explicitly knowing the zoom level. It’s pretty easy to calculate the zoom level for a quad, I’ll go over that in a bit, but it’s even easier if you don’t have to.

How about going the other way? Given a quad, finding the value of one of its children? That’s the same procedure just going the other way: unbias the parent, multiply by 4, add the child’s index (a value from 0 to 3), rebias:

child(q, i) = 4(qbz) + i + bz+1

Again this can be simplified by using the relation between the zoom biases to

child(qi) = 4q + i + 1

If instead of the child’s index (from 0 to 3) you use its quad value (from 1 to 4) it becomes even simpler,

child(qi) = 4q + i

This is part of why they’re called z-quads: it’s from all the z-words. The numbering zig-zags and it’s easy to zoom in and out, that is, get between a quad and its parent and children. (If, by the way, you’ve ever used a d-ary heap you may have noticed that this is the same operation used to get the child node in a 4-ary heap).

Since you can calculate the value of a child quad given its parent it also means that you can calculate the value of a quad one zoom level at a time, starting from 0. For instance, in the original example with the colored squares the blue square was child 2 of child 3 of 0. That means the value is

child(child(0, 3), 2) = 4 (4 0 + 3) + 2 = 4 3 + 2 = 12 + 2 = 14

Which we already knew from before of course. This isn’t how you would typically convert between positions and quads in a program, there’s a smarter way to do that, but I’ve found it useful for pen-and-paper calculations.

Ancestors and descendants

Now we know how to go between a quad and its parent and children we can generalize it to going several steps. When going multiple steps, instead of talking about parents and children we’ll talk about ancestors and descendants. An ancestor of a quad is its parent or one of the parent’s ancestors. A descendant of a quad is one of its children or one of the children’s descendants.

It turns out that just as there’s a simple formula for getting a quad’s parent there is one for getting any of its ancestors:

ancestor(q, n) = (qbn) / 4n

Given a quad this gives you the ancestor n steps above it. The special case where n is 1 is the parent function from before, b1 being 1. Even though it might not seem sensible now it’s worth noting for later that this is also well-defined for n=0 where the result is q. So a quad is its own 0th ancestor.

It’s easy to prove that the ancestor function is correct inductively by using a second relation that holds between the bzs:

bz+1 = 4z + bz

I won’t do the proof here but maybe we should take a short break and talk about why these relations between the biases hold. The bias at a zoom level is the index of the first quad at that level. At level 0 there is 1 (=40) quad so b1, the next available index, is 1. At level 1 there are 4 (= 41) which start from 1 so the next bias is

b2 = 40 + 41 = b1 + 41 = 5

If we keep going, the next level adds 16 (= 42) quads in addition to the 5 we already have which makes the next bias

b3 = 40 + 41 + 42 = b2 + 42 = 21

You can see how each bias is a sum of successive powers of 4 which gives you the relation I just mentioned above, that to get the next bias you just add the next power of 4 to the previous bias.

b3 = b2 + 42

It also gives us the other relation from earlier,

b3 = 40 + 41 + 42 = 40 + 4(40 + 41) = 4 b2 + 1

Okay so back to ancestors and descendants. As with ancestors you can also get to arbitrarily deep descendants in one step,

descendant(q, c, n) = 4n q + c

This gives you the descendant of q which relates to q the same way quad c, which has zoom level n, relates to the entire space. This can again be proven by induction with the base case being the child operation from before. And again this is well-defined for n=0 where the result is q.

So now we can jump directly between a quad and the quads related to it at zoom levels above and below it. This will be the building blocks for many other operations. But before we go on to the next part I’ll round off the ancestor/descendant relation with the concept of descendancy.

descendancy(q, n) = ((qbn) mod 4n) + bn

The descendancy is a little less intuitive than the others, that’s also why I’m not spending more time on how it’s derived or proving it correct, but it turns out that it’s actually quite useful when you work with quads in practice. And it definitely feels like the model would be incomplete without it. Note again, as always, this is well-defined when n=0 where it yields 0 (as in, any quad spans the entirety of itself).

Now we have a nice connection between the three operations: for all q and n where the operations are well-defined it holds that,

q = descendant(ancestor(q, n), descendancy(q, n), n)

With these basic building blocks in place we’re ready to go on to some higher-level operations, the first ones will be the two containment operations. Given quads q and s, is s contained in q? And given two quads, determine the most specific ancestor that contains both of them. For those you’ll have to go on to the next post, z-quads: containment.

(Update: hacker news discussion)

(Update: added some references mentioned in the discussion: z-order, d-ary heaps. Remove part where I couldn’t find other non-point coordinate systems)

Classboxes

I’m not convinced classboxes actually work.

I was reading up on safer approaches to monkeypatching and while looking into refinements in ruby I reread the classboxes article by Bergel, Ducasse, and Wuyts. This time I had some concrete uses in mind so I tried applying their lookup algorithm to those, and the way they behaved I’m not sure would work well in practice. This post assumes you’re already familiar with the classboxes article, but maybe it’s been a while so here’s a quick reminder.

Monkeypatching, as used in practice in languages like ruby, means adding methods to existing classes such that they’re visible across the whole program. This is obviously dangerous. Classboxes is intended as a more well-behaved way of doing the same thing. The extension is locally scoped and only visible to the parts of the program that explicitly import it.

The way classboxes work, basically, is that instead of the method dictionary of a class being a mapping from the name of a method to its implementations, it becomes a mapping from the name and the classbox that defined it to the implementation. So you’re still actually monkeypatching the entire program, but because methods are now tagged with their classbox they can be considered or ignored as appropriate, based on where the method you’re looking for is being called. If you’re making a call in a particular class box, and you’ve imported some other classbox, directly or indirectly, then the lookup will consider methods from that other classbox. Otherwise not. (Well, sort of, more or less). Okay, so here’s a concrete example of how it works and maybe doesn’t work.

Imagine you have a simple core library in one classbox that, among other basic types, defines a string type: (Don’t get hung up on the syntax, it’s just pseudocode).

classbox core {

class String {
# Yields the i'th character of this string.
def this[i] => ...

# The number of characters in this string.
def this.length => ...
}

...

}

Let’s say you also have a simple collection library in another classbox that defines a simple array type,

classbox collection {

import core;

class Array {
# Yields the i'th element of this array.
def this[i] => ...

# The number of elements in this array.
def this.length => ...

# Call the given block for each element.
def this.for_each(block) => ...

# Return the concatenation of this and that.
def this.concat(that) {
def result := new Array();
for e in this do
for e in that do
return result;
}
}

...

}

So far so good, this is the basics. Now, it’s easy to imagine someone thinking: a string sure looks a lot like an array of characters, it just needs for_each and a concat method and then I can use my full set of array functions on strings as well. I know, I’ll make an extension:

classbox strings_are_arrays {

import core.String;

refine core.String {
# Call the given block for each character.
def this.for_each(block) => ...

...
}

}

Now we can use a string as if it were an array, for instance

classbox main {

import array;
import strings_are_arrays;

def main() {
def arr := ['f', 'o', 'o'];
def str := "bar";
print(arr.concat(str));
# prints ['f', 'o', 'o', 'b', 'a', 'r']
}

}

Neat right? This is how I would expect extension methods to work. Except, as far as I can tell, it doesn’t. This won’t print an array, it will error inside Array.concat.

Classbox method lookup uses the classbox that started the lookup to determine which methods are available. A method call in main will be able to see all the classboxes: itself, array and strings_are_arrays because it imports them, and core because strings_are_arrays imports it. So it can call String.for_each. It doesn’t call it directly though but indirectly through Array.concat. The classboxes visible from Array is collection because it’s in it and core because it imports it, not strings_are_arrays. So the attempt to call for_each on the string will fail.

Maybe this is a deliberate design choice. I suspect that design won’t work well in practice though. This way, extensions are second-class citizens because they disappear as soon as your object flows into code in another classbox. It also makes code sharing less effective: there may be a method that does what you want but you have to copy it into your own classbox for the extensions you need to be visible. It may also fail the other way round: someone passes you an object of a type you have extended and suddenly it changes behavior in a way neither you nor the caller could not have expected.

There’s of course also the chance that I’ve misunderstood the mechanism completely. I haven’t been able to find a full formal semantics so some of this is based on sort-of guesswork about how it works. But if this is how it does work I’m not convinced it’s a good idea.