My pet project over the last few years is a parser library for Java called hadrian. I was lucky enough to be able to write my master’s thesis about this project, but that was neither the beginning nor the end of it. Since my thesis I’ve rewritten it completely, and now we’re using the new library in the neptune compiler.
One thing I’ve neglected, though, is documentation. I’ll try to write a few posts about it here at some point, to get started, but until then I’ve thrown together a demo applet which demonstrates some aspects of the library. The demo allows you to write a grammar and some input and then parses the input according to the grammar and shows the result as a syntax tree and XML. It also contains a few examples to play around with.
In itself, this demonstrates two things. Firstly, that hadrian allows you to parse input according to a grammar that wasn’t generated at compile time but constructed by the running program. Secondly, that the result of parsing the input is constructed by the framework, so you don’t have to specify your own ASTs. That’s only a small part of what you can do but it’s pretty cool in itself. If I do say so myself. Which I do.
The code that does the actual work of the applet is pretty simple. Here it is (I’ll explain what it does below):
    private void processInput(final String grammarString,
        final String inputString) {
      final ITextDocument grammarDoc = new StringDocument(grammarString); // 1
      final MessageMonitor monitor = new MessageMonitor();                // 2
      final HadrianParser parser =                                        // 3
          HadrianReader.getDefault().readParser(grammarDoc, monitor);
      if (processMessages(monitor)) return;                               // 4
      final ITextDocument inputDoc =
          new StringDocument(inputString);
      final INode node = parser.parse(inputDoc, monitor);                 // 5
      if (processMessages(monitor)) return;
      final DefaultTreeModel treeModel =
          new DefaultTreeModel(SyntaxTreeNode.create(node, null));
      syntaxTree.setModel(treeModel);                                     // 6
      syntaxXML.setText(TextUtils.toXML(node));                           // 7
    }
Here’s what’s going on in the code.
1. The string containing the grammar is wrapped in a text document. A text document is pretty similar to a string except that it keeps track of things like where the line breaks are. That can be very handy, for instance when reporting errors in a document.
2. We create a message monitor. A message monitor collects any messages that the system might generate while processing the grammar.
3. We read the grammar and construct a parser. If this goes well, readParser returns a parser we can use to parse the input. If something goes wrong it reports an error to the message monitor which we…
4. …check for and bail out if necessary.
5. We wrap the string containing the input and parse it using the parser we just successfully constructed from the grammar.
6. After checking that parsing went well we take the result, a syntax tree in the form of an INode, and wrap it in something that can be displayed in the UI.
7. Finally, we also convert the syntax tree to XML and plant the result in the appropriate text box.
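To make the idea behind the text document and the message monitor a bit more concrete, here’s a minimal sketch of how such a pair could work together. This is not hadrian’s code: SketchDocument, SketchMonitor and all of their methods are hypothetical names I made up for illustration, and the real StringDocument and MessageMonitor may well be built quite differently. The sketch only shows why remembering where lines start makes it cheap to turn a character offset into a line and column for an error message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text document; not hadrian's StringDocument.
final class SketchDocument {
  // Offsets at which each line starts; line 1 always starts at offset 0.
  private final List<Integer> lineStarts = new ArrayList<>();

  SketchDocument(String text) {
    lineStarts.add(0);
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) == '\n') lineStarts.add(i + 1);
    }
  }

  // Returns the 1-based number of the line containing the given offset,
  // found by binary search for the last line start that is <= offset.
  int lineOf(int offset) {
    int low = 0;
    int high = lineStarts.size() - 1;
    while (low < high) {
      int mid = (low + high + 1) >>> 1;
      if (lineStarts.get(mid) <= offset) low = mid;
      else high = mid - 1;
    }
    return low + 1;
  }

  // Returns the 1-based column of the given offset within its line.
  int columnOf(int offset) {
    return offset - lineStarts.get(lineOf(offset) - 1) + 1;
  }
}

// Hypothetical stand-in for a message monitor; it only collects error strings.
final class SketchMonitor {
  private final List<String> messages = new ArrayList<>();

  // Records an error, prefixed with the line and column of the offset.
  void error(SketchDocument doc, int offset, String text) {
    messages.add(doc.lineOf(offset) + ":" + doc.columnOf(offset) + ": " + text);
  }

  boolean hasErrors() {
    return !messages.isEmpty();
  }

  List<String> getMessages() {
    return messages;
  }
}

final class LineTrackingDemo {
  public static void main(String[] args) {
    SketchDocument doc = new SketchDocument("expr -> term\nterm -> ?");
    SketchMonitor monitor = new SketchMonitor();
    // Offset 21 is the '?' on the second line of the document.
    monitor.error(doc, 21, "unexpected character");
    // Same pattern as the applet: check the monitor, report, and bail out.
    if (monitor.hasErrors()) {
      for (String message : monitor.getMessages()) System.out.println(message);
    }
  }
}
```

The caller only deals in character offsets; the document does the translation to line and column when a message is recorded, which keeps the rest of the code free of line-counting logic.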
Grammars don’t have to be read from source files or strings; there’s also an API that allows programs to construct them directly. But it turns out to be pretty handy to be able to read them from source files.
If you want to take a look at how the applet is implemented, the source is available with the code in tedir-applet.jar. Hadrian itself lives at sourceforge, as a part of the tedir project.