File under: Development, throff, golang, internals, interpreter

blog/ Throff Interpreter Internals

The Throff execution model, explained.

Throff was designed to be as simple as possible to implement and understand, for multiple reasons.

The faster, and easier, the interpreter is to implement, the more progress I can make with it. And the simpler it is, the easier it is to program, which should make it a good choice for new users, who usually just want to get something working quickly.

However at some point, it becomes necessary to know the exact details of how the interpreter operates. So this page brings together all the reference material on the interpreter. If you don't understand how a certain piece of code works (or doesn't work), this page may help.

Design Goals

Throff has clear design goals. One was to evolve Forth as far as possible, to see what it would look like with lexical scopes and modern-day conveniences, like hashes.

The other goal was to take as much from functional and catenative languages as I could, without turning a simple task like

PRINTLN READLINE OPEN testfile.txt

into some nightmarish theoretical math problem. So the design restrictions are:

Immutable data structures. Never mutate data. Except when we think we can get away with it
Referential transparency. Variables can't change after you assign them.
Simple execution model. I'm too dumb to write a complicated one.

And some slightly more vague guidlines.

Remove clumsy idioms. If a programmer can describe the task clearly in one sentence, it should take one line of code
Design so it will be possible to get large speed-ups from JIT and memoisation
Remove the ability to write bugs, wherever possible
Support advanced debugging. And by advanced I mean helpful features, not "inspect memory"
Pure functions wherever possible

Lexical scoping

At the start of the parse, a scope is created. This becomes the parent, or root, lexical environment. During the parse, the input text is turned into a stream of TOKENs. Each token is given a reference to the scope. As the program runs, other scopes will be created as descendents of this scope.

The stream of TOKENs is pushed onto the code stack. At the same time, one scope reference per TOKEN is pushed onto the codeLex stack. This stack moves in sync with the code stack, and contains the execution environment for each token. For now, it contains references to the same scope that the TOKENs point to, but soon it will be holding dynamic scopes.

Execution starts, instructions are popped off the code stack, and scopes are popped off the codeLex stack. Each TOKEN is executed using the scope from the codeLex pop.

Eventually functions are created. When a function is created, it is given a reference to the current scope, which becomes the parent scope for that function. When the function is activated, a new scope is created, based on the parent lexical scope that it has a reference to. This new scope is the dynamic scope of the function (to support recursion of functions). The function pushes its component instructions onto the code stack, and a reference to the dynamic scope is pushed onto the codeLex stack for each instruction pushed onto the code stack. Then the execution proceeds as usual, popping instructions and scopes off the stacks and executing them.

All this is done so that when a function is created inside another function, it will hold a reference to the "outer" function's dynamic scope, which is what you would expect to happen.

There is no accompanying dataLex stack for the data stack - because data items do not have scopes (except for TOKEN, CODE and LAMBDA which manage their own scopes).

Program execution

Throff programs are written backwards. This isn't a style choice, it is a fundamental requirement of the interpreter, because underneath all the features, Throff is just Forth. In Forth, there is a strict execution order. Forth programs run left to right, top to bottom. In Forth, the function arguments go before the function.

1 1 ADD

2

These backwards functions (postfix notation) are one of the major complaints from people learning Forth - every other language puts the arguments after the function

ADD 1 1

2

So I reversed the order of the program. In Throff, arguments still come before the function, but now the program is read backwards. So functions look normal, but the entire program is now backwards.

So the program execution is very simple. The input stream is broken into tokens, separated by spaces. Each token is then fed directly into the code stack. The interpreter advances one step by popping an instruction off the code stack and executing it. The entire interpreter operation can be written simply:

E<sub>1</sub> = Step(E)

Or more completely

(code1, environment1, state1) = Step(code, environment, state)

Throff is heavily influenced by Felleison's work on CESK machines, although I had started work long before his paper.