Databases

Has it really been a month since I last posted? It’s not that I’ve been particularly busy, in fact it’s been quite the opposite: after being unemployed for two months I’ve entered a state of profound laziness. It culminated last week: I did nothing but lie on my couch in an almost liquid state listening reading a song of ice and fire and listening to sigur rós. What a week that was.

But I’m slowly regaining my energy and have found a new area of interest: databases. Not databases a such, though, but support for databases in programming languages. I don’t really know anything about databases but happened to read a paper by William Cook and Ali Ibrahim, Integrating Programming Languages & Databases: What’s the Problem?.

Ideally, a programming language interface to a database should provide static type checking while still allowing queries to be parameterized a runtime, and must allow the database to optimize queries. Especially the issue of allowing queries to be optimized makes this problem difficult. Imagine that you’re given a huge database of employees and want to find all the ones called “Bob”. If you don’t care about performance you can just fetch the set of all employees from the database into your program and then select the ones with the right name:

for (IEmployee e : db.getEmployees()) {
  if (e.name().equals("Bob"))
    bobs.add(e);
}

If you do care the least about performance, though, you should ask the database to do all the work and just return the Bobs:

SELECT * FROM employees
WHERE name = 'Bob'

This is where things become difficult: you want to write ordinary code in your program that describes the query, like e.name().equals("Bob") but on the other hand you don’t want that code to be executed but instead sent in a query to the database.

Microsoft’s LINQ solves this by allowing code to be translated into data in the form of expression trees:

Func<int, bool> filter = (n => n < 2);
Expr<Func<int, bool>> filterCode = filter;
Console.WriteLine("{0}", filterCode.Body.Right);

Here we convert the lambda expression, (n => n < 2), into a piece of data, the second line, and then in the third line extract the Body field, n < 2, and finally the Right hand side of the body, the constant 2. When run, the program prints

ConstantExpression(2)

This solves the problem since you can now take executable code and disassemble it, which enables you to send it to the database to be executed instead of doing it yourself. But it this approach also has problems. First of all, it limits what you can write in a lambda expression since it must be possible to convert lambdas into expression trees. Also, it requires changes to the language and it feels like something that has been grafted onto the language without being generally useful, just to make things work in this special case.

So I’ve been playing around with a solution of my own in Java, one that doesn’t require changes to the language. Instead it uses proxy objects to turn code into data. There’s still plenty of problems to solve but I think it looks promising. Actually, today I ran my first query built this way, not in a database but on an ordinary collection. Here’s the program:

IDataSet<IString> strs = dataSet("Burke", "Connor", "Frank", ...);
Query<IString> query = new Query<IString>() { {
  IString str = from(strs);
  where(str.length().is(5));
  select(str.toUpperCase());
} };
query.printTranscript(System.out);

Notice how the body of the query looks a whole lot like a SQL query, but is written in Java and fully type checked. And the important part is that the query body is not executed as such but turned into data. The last statement prints the disassembled query:

--- query ---
def $x0 = from(DataSet<IString>["Burke", "Connor", "Frank", ...])
where($x0.length().is(5))
select($x0.toUpperCase())
-------------

The query is not executed until it is asked to:

IDataSet<IString> result = query.perform();
System.out.println(result);

Only when you call perform is the body of the query executed. In this case the program prints

DataSet<IString>[BURKE, FRANK, DAVID]

Since the Query object contains a data representation of the code, it’s a small problem to construct a corresponding SQL query to send to the database. As I said before there’s still plenty of work left on this but I think solving the problem of converting code into data is a big step. If I make more progress I’ll be back with an update. In any case I hope to post more often from now on.