Combinators

Over on his blog, Gilad is having fun with parser combinators in smalltalk. Through the magic of smalltalk you can define a domain-specific language and then write parsers directly in the source code. For instance, this grammar


:= (|)*
| +
| "(" ")"

can be written in smalltalk as

expression
^ ((self letter), ([self letter] | [self digit]) star)
| (self digit) plus
| ((self delim: '('), [self expression], (self delim: ')'))

I won’t explain what it means, you should go read Gilad’s post for that.

Back before I wrote my master’s thesis (about dynamically extensible parsers) I actually wrote a simple prototype in squeak and, as you can see, smalltalk is fantastically well suited for this kind of stuff. So I thought, what the heck, maybe I’ll write a little parser combinator framework in squeak myself. So I did. Here’s a few thoughts about squeak and parser combinators.

Left recursion

Gilad mentions that you can’t express left recursive grammars directly using parser combinators. For instance, you can’t write

 :=  * 
| /
| +
| -
|

because must not occur on the far left side of a production in its own definition. But there are actually relatively few uses for left recursive productions and rather than restructuring the grammar to avoid left recursion you can add a few combinators to the framework that handle 95% of all uses for this. One place where you often see left recursive grammars is lists of elements separated by a token:

 ->  ","  | 

This pattern is used so often that it makes sense to introduce plus and star combinators that take an argument, a separator that must occur between terms:

exprList
^ (self expr) star: (self delim: ',')

The other and most complicated use of left recursion is operator definitions but that can also be expressed pretty directly in smalltalk with a dedicated combinator:

expr
^ (self ident) operators: [ :term |
(term * term).
(term / term).
(term + term).
(term - term).
]

This construct takes an “atomic” term, that’s the kind of term that can occur between the operators, and then defines the set of possible operators. The result is a parser that parses the left recursive expression grammar from before. You can also expand this pattern to allow precedence and associativity specifications:

expr
^ (self ident) operators: [ :term |
(term * term) precedence: 1.
(term / term) precedence: 1.
(term + term) precedence: 2.
(term - term) precedence: 2.
(term = term) precedence: 3; associate: #right.
]

It does take a bit of doesNotUnderstand magic to implement this but the result is a mechanism that kicks ass compared to having to restructure the grammar or define a nonterminal for each level in the precedence hierarchy. Also, it can be implemented very efficiently.

Squeak

It’s been a few months since I last used squeak and it seems like a lot has happened, especially on the UI side. I think squeak is an impressive system but in some areas it’s really an intensely poorly designed platform. That probably goes all the way back to the beginning of smalltalk.

The big overriding problem is, of course, that squeak insists on owning my source code. I’m used to keeping my source code in text files. I’m happy to let tools manage them for me like eclipse does but I want the raw text files somewhere so that I can create backups, put them under version control, etc. With squeak you don’t own your source code. When you enter a method it disappears into a giant soup of code. This is a problem for me in and of itself but it’s especially troubling if you’re unfortunate enough to crash the platform. That’s what happened to me: I happened to write a non-terminating method which froze the platform so that I had to put it down. That cost me all my code. No, sorry, that cost me its code. The worst part is that I know that smalltalkers think this ridiculous model is superior to allowing people to manage their own source code. Well, you’re wrong. It’s not that I like large source files, but I want to have some object somewhere outside the platform that contains my code and that I have control over. And yes I know about file out, that’s not what I’m talking about.

Another extremely frustrating issue is that squeak insists that you write correct code. In particular you’re not allowed to save a method that contains errors. I think it’s fine to notify me when I make an error. Sometimes during the process of writing code you may, say, refer to a class or variable before it has been defined or you may briefly have two variables with the same name. I don’t mind if the system tells me about that but squeak will insist that you change your code before you can save it. The code may contain errors not because it’s incorrect but because it’s incomplete. Squeak doesn’t care. I used another system like that once, the mjølner beta system, which was intensely disliked by many of us for that very reason.

This is just one instance of the platform treating you like you’re an idiot. Another instance is the messages. If you select the option to rename a method the option to cancel isn’t labelled cancel, no, it’s labeled Forget it — do nothing — sorry I asked. Give. Me. A. Break.

All in all, using squeak again was a mixed experience. As the parser combinator code shows smalltalk the language is immensely powerful and that part of it was really fun. But clearly the reason smalltalk hasn’t conquered the world is not just that back in the nineties, Sun convinced clueless pointy-haired bosses that they should use an inferior language like Java instead of smalltalk. It’s been a long time since I’ve used a programming language as frustrating as Squeak smalltalk. On the positive side, though, most of the problems are relatively superficial (except for the file thing) and if they ever decide to fix them I’ll be happy to return.

7 Responses to Combinators

Leave a Reply

Your email address will not be published. Required fields are marked *


*