MyJS
The first large JavaScript program I wrote after ES5 came out was my mercury Chrome extension. I was surprised at how big a difference some relatively small changes to the language made to my code style. I had followed the process that led to ES5 and knew about those features long before I tried them, but it was only when I actually used them that I got a sense of how they felt to use. I thought, wouldn’t it be useful if you could get that while developing the language rather than after?
So a few months ago I sat down to write a language experimentation framework for JavaScript, MyJS. MyJS lets you define a language extension, including new syntax and/or library functions. A source file (HTML or JS) can then specify which extensions it uses, and the parser and translation framework will plug the extensions together as appropriate to parse your program and convert it to plain JavaScript. How it works is worth a blog post of its own, but here I’ll outline how you define a language extension and use it.
Delegates
You can try MyJS by going to this test page (I haven’t tested this in all browsers, and it probably doesn’t work in all of them). The program that prints the output you see on that page uses the delegates extension, which adds the :: operator for creating Python-style bound methods. This is what the program looks like:
// A Printer can print text to a document.
function Printer(prefix) {
  this.prefix = prefix;
}
// Add this printer's prefix and print the given value.
Printer.prototype.print = function(value) {
  var elm = document.createElement('div');
  elm.innerText = this.prefix + value;
  document.body.appendChild(elm);
};
var p = new Printer("Looky here: ");
[1, 2, 3].map(p::print);
(::print)(p, "Where?");
(The idea for the delegates operator is Erik Corry’s.) In this case the binary (“bound”) :: operator returns a function that, when called, calls p.print with the arguments it’s given. So (p::print)("foo") means exactly the same as p.print("foo"). The unary (“unbound”) :: operator returns a function that uses its first argument as the receiver and the rest as arguments. So (::print)(p, "bar") is the same as p.print("bar"). This is just an arbitrary example of an extension; the operator itself isn’t that important.
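To make those semantics concrete without the new syntax, here is a minimal plain-JavaScript sketch. The helpers bound and unbound are hypothetical names used only for illustration and are not part of MyJS, and Collector is a DOM-free stand-in for Printer so the example runs outside a browser.

```javascript
// Hypothetical helper: roughly what (obj::name) evaluates to.
function bound(obj, name) {
  return obj[name].bind(obj);
}

// Hypothetical helper: roughly what (::name) evaluates to.
function unbound(name) {
  return function (recv) {
    // The first argument is the receiver, the rest are the call arguments.
    var args = Array.prototype.slice.call(arguments, 1);
    return recv[name].apply(recv, args);
  };
}

// Stand-in for Printer: collects lines in an array instead of the document.
function Collector(prefix) {
  this.prefix = prefix;
  this.lines = [];
}
Collector.prototype.print = function (value) {
  this.lines.push(this.prefix + value);
};

var c = new Collector("Looky here: ");
[1, 2, 3].map(bound(c, "print"));   // like [1, 2, 3].map(c::print)
unbound("print")(c, "Where?");      // like (::print)(c, "Where?")
```

After this runs, c.lines holds one entry per call, each with the prefix prepended.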
Using a dialect
You use an extension through a MyJS-specific script tag:
<script src="../out/myjs-web.js"></script>
<script type="text/myjs" src="delegates.js" dialect="myjs.Quote"></script>
This code first includes the MyJS JavaScript library. It then includes a script called delegates.js, which defines the delegates extension. MyJS comes with a few built-in language extensions that make it easier to write new language extensions. Delegates uses one of those, myjs.Quote. Finally, once we’ve loaded the delegates extension the third script element can use the extended language. Easy!
Defining an extension
How do you define a language extension? It has four parts: defining syntax trees, describing the concrete syntax, implementing the rules for translating the new syntax trees into plain JavaScript, and finally hooking it all into the framework. The full definition of the delegates extension is here. For the delegates extension we need two new syntax trees, one for bound and one for unbound delegates:
// Bound expression syntax tree node.
function BoundMethodExpression(atom, name) {
  this.type = 'BoundMethodExpression';
  this.atom = atom;
  this.name = name;
}
// Unbound method syntax tree node.
function UnboundMethodExpression(name) {
  this.type = 'UnboundMethodExpression';
  this.name = name;
}
The syntax tree format is modelled on Mozilla’s Parser API, and the parser for the standard language constructs produces Parser API-compatible syntax trees.
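As a rough illustration of what these trees might look like, the sketch below builds the nodes for p::print and ::print by hand. It assumes that the name field holds a Parser API style Identifier node, which seems likely since the grammar uses the Identifier nonterminal, but the post does not spell that out. The identifier helper is hypothetical.

```javascript
// Node constructors, repeated from the post so the example is self-contained.
function BoundMethodExpression(atom, name) {
  this.type = 'BoundMethodExpression';
  this.atom = atom;
  this.name = name;
}
function UnboundMethodExpression(name) {
  this.type = 'UnboundMethodExpression';
  this.name = name;
}

// Hypothetical helper: a Parser API style identifier node as a plain object.
function identifier(name) {
  return { type: 'Identifier', name: name };
}

// Assumed tree for the expression p::print.
var boundTree = new BoundMethodExpression(identifier('p'), identifier('print'));
// Assumed tree for the expression ::print.
var unboundTree = new UnboundMethodExpression(identifier('print'));

// Every node carries a type tag, so a tree can be inspected or serialized.
var serialized = JSON.stringify(boundTree);
```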
To produce these we need to hook into the grammar. The syntax is pretty straightforward:
LeftHandSideSuffix ::= "::" Identifier
PrimaryExpression ::= "::" Identifier
(A left hand side suffix is basically the right hand side of an operator expression.) The way to express this in MyJS is to map it into calls to the grammar constructor library:
function getSyntax() {
  // Suffix syntax tree builder helper.
  function BoundMethodSuffix(name) {
    this.name = name;
  }
  // Apply this suffix to a base expression.
  BoundMethodSuffix.prototype.apply = function(atom) {
    return new BoundMethodExpression(atom, this.name);
  };
  // Build the syntax
  var f = myjs.factory;
  var syntax = myjs.Syntax.create();
  // LeftHandSideSuffix ::= "::" Identifier
  syntax.getRule('LeftHandSideSuffix')
    .addProd(f.punct('::'), f.nonterm('Identifier'))
    .setConstructor(BoundMethodSuffix);
  // PrimaryExpression ::= "::" Identifier
  syntax.getRule('PrimaryExpression')
    .addProd(f.punct('::'), f.nonterm('Identifier'))
    .setConstructor(UnboundMethodExpression);
  return syntax;
}
There’s a bit of a quirk here because a left hand side suffix doesn’t correspond to a “full” syntax tree: it has the left hand side missing. So instead the rule produces a suffix object. The parser later calls that object’s apply method with the left hand side as an argument, and it must return the full syntax tree.
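The suffix mechanism can be sketched as follows. How the real parser drives suffixes is not shown in this post, so applySuffixes is a hypothetical stand-in for that step; the node and suffix constructors mirror the ones above.

```javascript
function BoundMethodExpression(atom, name) {
  this.type = 'BoundMethodExpression';
  this.atom = atom;
  this.name = name;
}

// A suffix only knows the part after the "::".
function BoundMethodSuffix(name) {
  this.name = name;
}
BoundMethodSuffix.prototype.apply = function (atom) {
  return new BoundMethodExpression(atom, this.name);
};

// Hypothetical parser step: fold the suffixes onto the base expression,
// left to right, so each suffix receives the tree built so far.
function applySuffixes(base, suffixes) {
  return suffixes.reduce(function (atom, suffix) {
    return suffix.apply(atom);
  }, base);
}

var base = { type: 'Identifier', name: 'a' };
// Roughly the tree for a::b::c, which nests as (a::b)::c.
var tree = applySuffixes(base, [
  new BoundMethodSuffix({ type: 'Identifier', name: 'b' }),
  new BoundMethodSuffix({ type: 'Identifier', name: 'c' })
]);
```

The outer node ends up with name c and an atom that is itself the bound expression a::b.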
The reason we don’t just build the syntax but define a function for doing it is that we only want to build it if the dialect is used. If it is just defined but never used it’s a waste of time to build the syntax.
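The idea of deferring the build can be sketched like this. LazySyntax is a hypothetical wrapper, not the framework’s actual mechanism, and the getSyntax here is a stand-in that returns a plain object and counts how often it is called.

```javascript
// Hypothetical wrapper: calls the provider at most once, on first use.
function LazySyntax(provider) {
  this.provider = provider;
  this.syntax = null;
  this.built = false;
}
LazySyntax.prototype.get = function () {
  if (!this.built) {
    this.syntax = this.provider();
    this.built = true;
  }
  return this.syntax;
};

// Stand-in provider that records how many times the syntax was built.
var buildCount = 0;
function getSyntax() {
  buildCount++;
  return { rules: ['LeftHandSideSuffix', 'PrimaryExpression'] };
}

var lazy = new LazySyntax(getSyntax);
var countBeforeUse = buildCount;  // registering alone builds nothing
var first = lazy.get();           // first use triggers the build
var second = lazy.get();          // later uses reuse the cached result
```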
This allows the framework to parse extended code and build syntax trees. The next part is to define how they are translated into plain JavaScript:
BoundMethodExpression.prototype.translate = function(context) {
  return #Expression((function(temp) {
    return temp.,(context.translate(this.name)).bind(temp);
  })(,(context.translate(this.atom))));
};
UnboundMethodExpression.prototype.translate = function(context) {
  return #Expression(function(recv, var_args) {
    return recv.,(context.translate(this.name)).apply(recv,
        Array.prototype.slice.call(arguments, 1));
  });
};
This is where it gets a little bit tricky. This code uses a language extension, myjs.Quote, to plug together syntax trees. While this is tricky the first time you see it, it is a lot better than having to build syntax trees by hand without quoting. The #Expression part means: the following code should be parsed as an expression, but don’t run the code, return the syntax tree. The commas mean: this must be evaluated and the result, which is a syntax tree, must be spliced into the surrounding tree. If you know quasiquote from Scheme, that’s basically what it is. What this code says is: if you have the syntax tree A::B, translate it into something like
(A.B).bind(A)
except that would cause A to be evaluated twice, so we use
(function (t) { return (t.B).bind(t); })(A)
and similarly for ::A. The recursive calls to translate are there to translate the subexpressions.
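The double evaluation is easy to see in plain JavaScript. The sketch below writes out by hand what the two expansions of makeReceiver()::print would look like, plus the expansion of ::print. The makeReceiver function and the evaluation counter are made up for the demonstration.

```javascript
// A receiver expression with a visible side effect: it counts evaluations.
var evaluations = 0;
function makeReceiver() {
  evaluations++;
  return {
    prefix: '> ',
    print: function (value) { return this.prefix + value; }
  };
}

// Naive expansion (A.B).bind(A): the receiver expression appears twice.
evaluations = 0;
var naive = (makeReceiver().print).bind(makeReceiver());
var naiveCount = evaluations;

// Expansion through a temporary: the receiver expression appears once.
evaluations = 0;
var safe = (function (t) { return (t.print).bind(t); })(makeReceiver());
var safeCount = evaluations;

// Hand-written expansion of ::print, taking the receiver as first argument.
var unboundPrint = function (recv, var_args) {
  return recv.print.apply(recv, Array.prototype.slice.call(arguments, 1));
};

var boundResult = safe('bound');
var unboundResult = unboundPrint(makeReceiver(), 'unbound');
```

Here naiveCount ends up as 2 and safeCount as 1, which is the reason for the extra function wrapper in the translation.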
The last part is to hook it all into the framework. That’s done using this incantation:
myjs.registerFragment(new myjs.Fragment('demo.Delegates')
  .setSyntaxProvider(getSyntax)
  .registerType('BoundMethodExpression', BoundMethodExpression)
  .registerType('UnboundMethodExpression', UnboundMethodExpression));
Here we give the extension a name, demo.Delegates, register the function that will return the syntax to use, and register the two new types of syntax trees. That’s what it takes to define an extension.
Fragments also allow you to set up install hooks. If your extension needs to add library functions to the global object, for instance, you can specify a function that is given the global object as an argument and can install any functions and methods it needs.
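An install hook of that kind might look like the sketch below. Only the hook function itself is shown, because the Fragment method used to register it does not appear in this post. The function name installDelegatesLibrary and the helper it installs, __bindMethod, are both hypothetical.

```javascript
// Hypothetical install hook: receives the global object and adds a helper
// that translated code could call at runtime.
function installDelegatesLibrary(global) {
  global.__bindMethod = function (recv, name) {
    return recv[name].bind(recv);
  };
}

// For demonstration, install into a plain object instead of the real global.
var fakeGlobal = {};
installDelegatesLibrary(fakeGlobal);

var counter = {
  count: 0,
  increment: function () { return ++this.count; }
};
var increment = fakeGlobal.__bindMethod(counter, 'increment');
increment();
increment();
```

Passing the global object in as an argument, rather than reaching for it directly, is what makes the hook easy to exercise against a stand-in object like this.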
…six months later
I sort of ran out of steam on this project six months ago. The “hard” parts are there: the extensible parser framework, the basic language definition (all the standard language constructs are defined as if they were extensions), the translation framework, etc. But there is still some work left to do before everything works, for instance before you could define the Harmony classes syntax. It would also be great if you could specify a dialect the same way you specify strict mode, within the script ("using demo.Delegates").
I also played around with making this work with node.js but I couldn’t find the hooks in the module importing primitives that would allow me to intercept module loads and do the source translation.
The code lives on GitHub.