# Combinators

Over on his blog, Gilad is having fun with parser combinators in smalltalk. Through the magic of smalltalk you can define a domain-specific language and then write parsers directly in the source code. For instance, this grammar

`<expr> := <letter> (<letter> | <digit>)* | <digit>+ | "(" <expr> ")"`

can be written in smalltalk as

`expression  ^ ((self letter), ([self letter] | [self digit]) star)  | (self digit) plus  | ((self delim: '('), [self expression], (self delim: ')'))`

I won’t explain what it means; you should go read Gilad’s post for that.

Back before I wrote my master’s thesis (about dynamically extensible parsers) I actually wrote a simple prototype in squeak and, as you can see, smalltalk is fantastically well suited for this kind of stuff. So I thought, what the heck, maybe I’ll write a little parser combinator framework in squeak myself. So I did. Here are a few thoughts about squeak and parser combinators.

### Left recursion

Gilad mentions that you can’t express left recursive grammars directly using parser combinators. For instance, you can’t write

`<expr> := <expr> * <expr> | <expr> / <expr> | <expr> + <expr> | <expr> - <expr> | <ident>`

because `<expr>` must not occur on the far left side of a production in its own definition. But there are actually relatively few uses for left recursive productions, and rather than restructuring the grammar to avoid left recursion you can add a few combinators to the framework that handle 95% of all uses for this. One place where you often see left recursive grammars is lists of elements separated by a token:

`<exprs> -> <exprs> "," <expr> | <expr>`

This pattern is used so often that it makes sense to introduce plus and star combinators that take an argument, a separator that must occur between terms:

`exprList  ^ (self expr) star: (self delim: ',')`
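For readers who don’t have a squeak image at hand, here is a minimal sketch of the same idea in Python rather than smalltalk. Everything in it is an assumption made for illustration: the names `token`, `plus_sep`, `star_sep` and `naive_exprs`, and the convention that a parser is a function from `(text, pos)` to `(value, new_pos)` or `None`. It is not the code from my framework or from Gilad’s; it only shows why the literal left recursive rule never terminates and how a separator-aware combinator sidesteps the problem with a loop.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed convention: a parser maps (text, position) to
# (value, new position), or None when it fails.
Result = Optional[Tuple[object, int]]
Parser = Callable[[str, int], Result]


def token(pattern: str) -> Parser:
    """Match a regular expression, skipping leading whitespace."""
    regex = re.compile(r"\s*(" + pattern + r")")

    def parse(text: str, pos: int) -> Result:
        m = regex.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def plus_sep(item: Parser, sep: Parser) -> Parser:
    """One or more items with a separator between each pair."""

    def parse(text: str, pos: int) -> Result:
        first = item(text, pos)
        if first is None:
            return None
        values = [first[0]]
        pos = first[1]
        while True:
            s = sep(text, pos)
            if s is None:
                break
            nxt = item(text, s[1])
            if nxt is None:
                break  # dangling separator: leave it unconsumed
            values.append(nxt[0])
            pos = nxt[1]
        return values, pos

    return parse


def star_sep(item: Parser, sep: Parser) -> Parser:
    """Zero or more items with a separator between each pair."""
    one_or_more = plus_sep(item, sep)

    def parse(text: str, pos: int) -> Result:
        result = one_or_more(text, pos)
        return result if result is not None else ([], pos)

    return parse


expr = token(r"[A-Za-z]\w*")
comma = token(",")
expr_list = star_sep(expr, comma)


def naive_exprs(text: str, pos: int) -> Result:
    """Literal transcription of the left recursive list rule.

    The first alternative calls itself before consuming any input,
    so Python gives up with a RecursionError.
    """
    head = naive_exprs(text, pos)
    if head is not None:
        s = comma(text, head[1])
        if s is not None:
            return expr(text, s[1])
    return expr(text, pos)
```

With these definitions `expr_list("a, b, c", 0)` yields the three identifiers and the end position 7, while calling `naive_exprs("a, b", 0)` raises `RecursionError`. Leaving a dangling separator unconsumed is a design choice of this sketch; a terminator-tolerant variant would consume it instead.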

The other and most complicated use of left recursion is operator definitions but that can also be expressed pretty directly in smalltalk with a dedicated combinator:

`expr  ^ (self ident) operators: [ :term |      (term * term).      (term / term).      (term + term).      (term - term).  ]`

This construct takes an “atomic” term, that’s the kind of term that can occur between the operators, and then defines the set of possible operators. The result is a parser that parses the left recursive expression grammar from before. You can also expand this pattern to allow precedence and associativity specifications:

`expr  ^ (self ident) operators: [ :term |      (term * term) precedence: 1.      (term / term) precedence: 1.      (term + term) precedence: 2.      (term - term) precedence: 2.      (term = term) precedence: 3; associate: #right.  ]`

It does take a bit of doesNotUnderstand magic to implement this but the result is a mechanism that kicks ass compared to having to restructure the grammar or define a nonterminal for each level in the precedence hierarchy. Also, it can be implemented very efficiently.
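To give a feel for how an operator combinator can be implemented efficiently, here is a hedged sketch in Python using precedence climbing. This is one common technique, not necessarily what the squeak version does, and it replaces the doesNotUnderstand trick with an explicit operator table. The names `OPERATORS`, `tokenize` and `parse_operators` are made up for this example, atoms are restricted to identifiers, and, as in the smalltalk snippet above, a lower precedence number binds tighter.

```python
import re

# Operator table mirroring the example above: operator -> (precedence, associativity).
# Assumption: a lower number binds tighter.
OPERATORS = {
    "*": (1, "left"),
    "/": (1, "left"),
    "+": (2, "left"),
    "-": (2, "left"),
    "=": (3, "right"),
}

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[*/+\-=])")


def tokenize(text):
    """Split the input into identifiers and single-character operators."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_operators(tokens, operators=OPERATORS):
    """Build a nested (op, left, right) tuple tree in a single pass."""
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in operators:
            raise SyntaxError("expected an identifier")
        tok = tokens[pos]
        pos += 1
        return tok

    def climb(max_prec):
        # Parse an expression using only operators whose precedence
        # number is at most max_prec.
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] in operators:
            op = tokens[pos]
            prec, assoc = operators[op]
            if prec > max_prec:
                break
            pos += 1
            # A right-associative operator lets its own level reappear on
            # the right; a left-associative one only allows tighter levels.
            right = climb(prec if assoc == "right" else prec - 1)
            left = (op, left, right)
        return left

    tree = climb(max(prec for prec, _ in operators.values()))
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree
```

Under these assumptions `parse_operators(tokenize("a - b - c"))` groups to the left as `("-", ("-", "a", "b"), "c")`, while `parse_operators(tokenize("a = b = c"))` groups to the right as `("=", "a", ("=", "b", "c"))`. Each token is visited once, so the whole thing runs in linear time without a nonterminal per precedence level.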

### Squeak

It’s been a few months since I last used squeak and it seems like a lot has happened, especially on the UI side. I think squeak is an impressive system but in some areas it’s really an intensely poorly designed platform. That probably goes all the way back to the beginning of smalltalk.

The big overriding problem is, of course, that squeak insists on owning my source code. I’m used to keeping my source code in text files. I’m happy to let tools manage them for me like eclipse does but I want the raw text files somewhere so that I can create backups, put them under version control, etc. With squeak you don’t own your source code. When you enter a method it disappears into a giant soup of code. This is a problem for me in and of itself but it’s especially troubling if you’re unfortunate enough to crash the platform. That’s what happened to me: I happened to write a non-terminating method which froze the platform so that I had to put it down. That cost me all my code. No, sorry, that cost me its code. The worst part is that I know that smalltalkers think this ridiculous model is superior to allowing people to manage their own source code. Well, you’re wrong. It’s not that I like large source files, but I want to have some object somewhere outside the platform that contains my code and that I have control over. And yes I know about file out, that’s not what I’m talking about.

Another extremely frustrating issue is that squeak insists that you write correct code. In particular you’re not allowed to save a method that contains errors. I think it’s fine to notify me when I make an error. Sometimes during the process of writing code you may, say, refer to a class or variable before it has been defined or you may briefly have two variables with the same name. I don’t mind if the system tells me about that but squeak will insist that you change your code before you can save it. The code may contain errors not because it’s incorrect but because it’s incomplete. Squeak doesn’t care. I used another system like that once, the mjølner beta system, which was intensely disliked by many of us for that very reason.

This is just one instance of the platform treating you like you’re an idiot. Another instance is the messages. If you select the option to rename a method the option to cancel isn’t labelled cancel, no, it’s labeled Forget it — do nothing — sorry I asked. Give. Me. A. Break.

All in all, using squeak again was a mixed experience. As the parser combinator code shows smalltalk the language is immensely powerful and that part of it was really fun. But clearly the reason smalltalk hasn’t conquered the world is not just that back in the nineties, Sun convinced clueless pointy-haired bosses that they should use an inferior language like Java instead of smalltalk. It’s been a long time since I’ve used a programming language as frustrating as Squeak smalltalk. On the positive side, though, most of the problems are relatively superficial (except for the file thing) and if they ever decide to fix them I’ll be happy to return.

### 7 Responses to Combinators

1. Hi Christian,

I didn’t know you had done parser combinators before. Indeed, as you say, I have operators for lists with separators (and optional terminators): things like starSeparatedBy: and plusSeparatedOrTerminatedBy: (and the other two obvious versions). I haven’t done operators yet because the need hasn’t arisen, but I’m sure it will.

I agree with a lot of your comments about Squeak. Actually, I don’t think the problems come from Smalltalk 80. The old stuff is damn good. A lot of the newer stuff is not of the same quality.

Of course, the nice thing is that you can fix almost any behavior that annoys you. For example, we’ve got a full syntax for Smalltalk classes and can save and read entire programs without the silly bang stuff (ok, that particular problem does come from Smalltalk 80), while still taking advantage of the full IDE in other ways.

In reality, few people depart from the conventions imposed by the platform, and I’m sure it is off putting to new users.

2. Squeak actually has a source code journal file with the name “[imageName].changes” which can be recovered in the event of any crash, using the “changes…” sub-menu of the desktop/world menu and then pick “recently logged changes…” under that to recover from, say, the last image snapshot. Squeak keeps every method version around using this journal. There are also file/CVS-friendly versioning systems like Monticello/DVS. Granted, the user interface and usability of Squeak are pretty poor, but the core functionality is not as fluffy and ill-considered as it might appear.

3. Brian: I did use the .changes file to reconstruct my code. I tried just applying all the changes sequentially but the system gave me an error message after just a few changes. Instead I ended up picking out the relevant changes and applying them manually. This was pretty tedious because I had more or less started over twice and renamed and reshuffled a lot of code that I had to go through in the change log. All in all it was not a happy experience.

I know the built-in versioning system is there, I haven’t tried it. But this sort of misses the point I was trying to make. What I want is not a versioning system as such, I want to have a free choice of what to do with my code. Maybe I’ll want to use Monticello but maybe I’ll want to store my code on sourceforge or google code hosting and then I’m in trouble. Or maybe I’ll want to do something with the code that the designers of squeak hadn’t anticipated. Squeak isolates itself from the outside world which makes a lot of stuff easier but which also means you have to do a lot of work just to be on par with the outside world, like writing your own VC system. I would much prefer a more open platform that integrated well with the outside world, like Eclipse.

4. Christian wrote: “What I want is not a versioning system as such, I want to have a free choice of what to do with my code.”

I don’t know Squeak and only ever used the change log as the absolute last resort. So when I recently tried to port some language shootout programs to Squeak I did the boring thing and filed out my changes just in case I messed something up.

Source code in files, quaint but doable – even in Smalltalk.

5. I think you really overstate the problems in Smalltalk. Maybe more options should be looked at for allowing code to get out of the system, but that should absolutely not be the default behavior. Having the code in the system is one of the things that give Smalltalk its power.

Your complaint about unsaved methods is because of the browser you are using. The system is not treating you as an idiot, it just wants to compile the method when you save it, so obviously the code has to be compile-able. If you don’t want this you can use an alternative browser that differentiates between save and compile.

6. Jason, having the code within the system is indeed powerful. That does not, however, mean that you can’t also have the code outside the system as files as well. Eclipse (and pretty much all other IDEs I can think of) use such a model. I hate to see my code, that I’ve spent a lot of time writing, disappearing into a soup of code that I have very little control over.

When I said that the system treated me as an idiot I was referring to the error messages and (I didn’t mention that in the post) the documentation written in first person. I’m sure they’re great for children but for a serious programmer working on serious software I don’t have any patience for the system trying to be “cute”.

You may be right that my criticism of not being able to save syntactically illegal code should be directed at the browser and not squeak as a whole. It doesn’t mean that the criticism doesn’t still stand. Whatever component defines this behavior it is a very poor design.

7. Jason, having the code within the system is indeed powerful. That does not, however, mean that you can’t also have the code outside the system as files as well.

True.

Eclipse (and pretty much all other IDEs I can think of) use such a model. I hate to see my code, that I’ve spent a lot of time writing, disappearing into a soup of code that I have very little control over.

Well, this is a matter of opinion. Personally, I don’t feel any less control over the image than over my OS, which could also fail and lose everything. I have only had one failure, at which I panicked at first. Then I found the changes and merged them back in easily enough and all was good again.

When I said that the system treated me as an idiot I was referring to the error messages and (I didn’t mention that in the post) the documentation written in first person. I’m sure they’re great for children but for a serious programmer working on serious software I don’t have any patience for the system trying to be “cute”.

Ah, ok. Fair enough. I’m not a fan of first person documentation either honestly.

You may be right that my criticism of not being able to save syntactically illegal code should be directed at the browser and not squeak as a whole. It doesn’t mean that the criticism doesn’t still stand. Whatever component defines this behavior it is a very poor design.

Again, this is opinion, not bad design. Some people switch the browser right away to allow them to save syntactically wrong code. Personally I don’t. If I need syntactically wrong code to stay in a method for documentation then I turn it into a comment. I like “accept” meaning “compile” because I don’t want to look around for where I forgot to compile something.

Also, the only thing you can’t save is code that simply won’t compile. You are free to reference things that don’t (yet) exist. It will pop up a box checking to see if you misspelled something, but if you didn’t you can just say OK and it goes on.