Monthly Archives: October 2006

Recursion

..and the winner in the category stupidest animal name is: the south american bat anoura caudifer, known in english as the tailed tailless bat. Well done.

I guess it’s just a matter of time before someone discovers the tailless tailed tailless bat.

Work

And now, a public service announcement. After more than three months of glorious unemployment I have now relapsed into a job. Indeed, a job with the very company that powers this blog: google.

In other, related, news: I will be in Mountain View for the next few weeks.

That is all.

APT

As I mentioned in my last post I’ve been playing around with database interfaces. While doing this, I’ve discovered something new: APT, the annotation processing tool for Java. I’ve known for a while that it existed but only now have I discovered what the big deal is. APT. Kicks. Ass!

In the database interface for Java I’m working on, you specify the structure of a database through annotated interfaces, essentially the same way things work in DLINQ. For instance, a database called test containing two tables would be specified like this:

public @Database("test") interface ITestDatabase extends IDatabase {
public @Table IDataSet public @Table IDataSet orders();
}

Using this interface you can open a connection to an actual relational database and fetch the contents either directly (by calling the persons or orders method) or you can specify more complex queries through something more “magical” which I’ll describe later.

The thing is that there’s a lot of ways you can get these interfaces wrong. For instance, you must use interfaces, not classes. Database interfaces must extends IDatabase. The methods that correspond to tables must return a IDataSet of some sort. And so on.

I considered this to be a problem until I discovered the annotation processor framework. An annotation processor is essentially a compiler plugin that is given access to parts of the syntax tree during compilation. That can be used for all sorts of interesting things, including issuing errors and warnings. So I’ve created an annotation processor that you can plug into your compiler and then it will check that all database-related interfaces are well-formed, and it works beautifully.

To top it all off, the compiler in eclipse now supports APT. If you just configure it correctly, which is no more difficult than using an external library, you get custom checking right in eclipse:

Of course, as with all power tools, you have show a little restraint and not issue all sorts of lame warnings just because you can…

…even if it it’s so easy the code almost writes itself:

for (MethodDeclaration method : decl.getMethods()) {
if (method.getSimpleName().equals("foo")) {
env.getMessager().printError(method.getPosition(),
"Enought with the 'foo' methods already!");
}
]

I know this tool has been around for a while. Why has it taken me so long to discover how useful it is — has there been some kind of hype that I’ve missed?

And yes, I know I’ve misspelled “enough”.

Databases

Has it really been a month since I last posted? It’s not that I’ve been particularly busy, in fact it’s been quite the opposite: after being unemployed for two months I’ve entered a state of profound laziness. It culminated last week: I did nothing but lie on my couch in an almost liquid state listening reading a song of ice and fire and listening to sigur rós. What a week that was.

But I’m slowly regaining my energy and have found a new area of interest: databases. Not databases a such, though, but support for databases in programming languages. I don’t really know anything about databases but happened to read a paper by William Cook and Ali Ibrahim, Integrating Programming Languages & Databases: What’s the Problem?.

Ideally, a programming language interface to a database should provide static type checking while still allowing queries to be parameterized a runtime, and must allow the database to optimize queries. Especially the issue of allowing queries to be optimized makes this problem difficult. Imagine that you’re given a huge database of employees and want to find all the ones called “Bob”. If you don’t care about performance you can just fetch the set of all employees from the database into your program and then select the ones with the right name:

for (IEmployee e : db.getEmployees()) {
if (e.name().equals("Bob"))
bobs.add(e);
}

If you do care the least about performance, though, you should ask the database to do all the work and just return the Bobs:

SELECT * FROM employees
WHERE name = 'Bob'

This is where things become difficult: you want to write ordinary code in your program that describes the query, like e.name().equals("Bob") but on the other hand you don’t want that code to be executed but instead sent in a query to the database.

Microsoft’s LINQ solves this by allowing code to be translated into data in the form of expression trees:

Func<int, bool> filter = (n => n < 2);
Exprint, bool>> filterCode = filter;
Console.WriteLine("{0}", filterCode.Body.Right);

Here we convert the lambda expression, (n => n < 2), into a piece of data, the second line, and then in the third line extract the Body field, n < 2, and finally the Right hand side of the body, the constant 2. When run, the program prints

ConstantExpression(2)

This solves the problem since you can now take executable code and disassemble it, which enables you to send it to the database to be executed instead of doing it yourself. But it this approach also has problems. First of all, it limits what you can write in a lambda expression since it must be possible to convert lambdas into expression trees. Also, it requires changes to the language and it feels like something that has been grafted onto the language without being generally useful, just to make things work in this special case.

So I've been playing around with a solution of my own in Java, one that doesn't require changes to the language. Instead it uses proxy objects to turn code into data. There's still plenty of problems to solve but I think it looks promising. Actually, today I ran my first query built this way, not in a database but on an ordinary collection. Here's the program:

IDataSet strs = dataSet("Burke", "Connor", "Frank", ...);
Querynew Query IString str = from(strs);
where(str.length().is(5));
select(str.toUpperCase());
}};
query.printTranscript(System.out);

Notice how the body of the query looks a whole lot like a SQL query, but is written in Java and fully type checked. And the important part is that the query body is not executed as such but turned into data. The last statement prints the disassembled query:

--- query ---
def $x0 = from(DataSet["Burke", "Connor", "Frank", ...])
where($x0.length().is(5))
select($x0.toUpperCase())
-------------

The query is not executed until it is asked to:

IDataSet result = query.perform();
System.out.println(result);

Only when you call perform is the body of the query executed. In this case the program prints

DataSet[BURKE, FRANK, DAVID]

Since the Query object contains a data representation of the code, it's a small problem to construct a corresponding SQL query to send to the database. As I said before there's still plenty of work left on this but I think solving the problem of converting code into data is a big step. If I make more progress I'll be back with an update. In any case I hope to post more often from now on.