I found an article at dusty decks about the original Fortran I compiler. This is interesting because the designers of the fortran compiler were the first to solve a whole bunch of interesting problems. Being a parser geek, I think their solution to operator precedence parsing especially neat.
The goal of operator precedence parsing is to take an expression such as
2 * x + y / 8
and group it according to a set of precedence rules such as “*
and /
bind tighter than +
“, in this case producing
((2 * x) + (y / 8))
Today this problem is well-understood and there is a bunch of algorithms for solving it; the algorithm used in tedir is shift/reduce expression parsing. The fortran guys, on the other hand, had to solve this problem from scratch.
The way they did it was to scan through the expression and expand it in various places with a sequence of parentheses. In the simple case for a language with only +
, -
, /
and *
, the algorithm would do the following:
- Insert
((
at the beginning of the expression and))
at the end - For each
+
and-
insert))+((
and))-((
respectively. - For each
*
and/
insert)*(
and)/(
respectively.
And that’s it! Once the expression has been expanded using these rules it will be correctly parenthesized. For instance
- 2*x+y/8
- ((2*x+y/8
- ((2)*(x+y/8
- ((2)*(x))+((y/8
- ((2)*(x))+((y)/(8
- ((2)*(x))+((y)/(8))
It might not be obvious that this works in all situations, and inserting unmatched parentheses should make any programmer feel queasy. But it does work (Knuth says so) and it generalizes nicely, you just have to insert additional parentheses. Of course, once you get above 5 levels of precedence you might want to investigate ways to optimize the algorithm to avoid inserting endless amounts of parentheses. And once you do that the algorithm starts looking a lot like more “modern” algorithms like shift/reduce parsing. Which was, coincidentally, invented just a few years later by Knuth who also wrote about this algorithm.