Programming, Made Complicated

Joel Jan 28, 2026

It’s tough acting as a cooperative scheduler with so many things starved for cycles! I have hope that one day, I’ll be able to ask AI to “develop this thread in the way I would do, if I wasn’t preoccupied with these other things”. For some things, the joy is in the discovering, but most … Continue reading Current Research Threads I Can Think Of

Show full content

It’s tough acting as a cooperative scheduler with so many things starved for cycles! I have hope that one day, I’ll be able to ask AI to “develop this thread in the way I would do, if I wasn’t preoccupied with these other things”. For some things, the joy is in the discovering, but most of my threads are just things I want to exist regardless of who or what creates them.

Numbers are thread IDs in order of recall. Priority levels not indicated. Progress not indicated.

Smalltix
01 Compile ST to Bash etc
02 More of SBECrossMorph
03 Web Canvas
04 FUSE optimise obj-dirs
05 Uspace exec optimise processes
06 FUSE expose Squeak objects
Diagram Parsing
07 Forms of Syntax: text, binary, structs/types, visual
08 Generality / semantics of visual syntax
09 OMeta for Notations
10 Student project 1
Self-Raising Diagrams
11 GUI
12 Bootstrap editing
13 Enforce viz/interactive syntax
Constraints
14 Student project 2
15 Constraint notations
16 ALE on Haiku
Domain Modelling
17 MetaEdit+
18 OpenPonk
Late-Bound OO Semantics
19 OOPSLA paper
20 Grant Proposal
Kaitai formats
21 Self snapshot
22 Squeak image
31 Collab on filesystem-like substrates
Tutorials
24 Id1,2,3 with caching (writeup!)
25 Squeak By Example
26 Apparatus
27 Penrose
28 Bloom
29 ST Blue Book
Misc
30 Information/Specificity Theory
23 Academic visit

http://programmingmadecomplicated.wordpress.com/?p=9260

Extensions

Notes on “Chains of Meaning in STEPS”

Joel Nov 24, 2025

The “Chains of Meanings” tech report (Piumarta 2009) contains a nice application of the OMeta parsing DSL, or perhaps one of its precursors called IS. The task is to compile high-level abstract syntax (concretised as a Lisp Fibonnaci function) to x86 machine instructions. These tech reports, while immensely valuable, are seldom optimised for readability to … Continue reading Notes on “Chains of Meaning in STEPS”

Show full content

The “Chains of Meanings” tech report (Piumarta 2009) contains a nice application of the OMeta parsing DSL, or perhaps one of its precursors called IS. The task is to compile high-level abstract syntax (concretised as a Lisp Fibonnaci function) to x86 machine instructions. These tech reports, while immensely valuable, are seldom optimised for readability to outsiders (I get the impression they were often presenting to each other), so here is my commentary to aid interested parties whose minds work like mine.

Here’s the Fibonnaci function and its invocation:

(define nfibs (lambda (n)
  (if (< n 2) 1
              ( + 1 (nfibs (- n 1)) (nfibs (- n 2)) ))))
(print (nfibs 32))

It first gets compiled (see picture below) to this abstract accumulator/stack-machine code (comments in [square brackets]):

label 3
enter
  push 2 load-arg-named n less branch-false 1  [< n 2] 
  load-lit 1 branch 2
  label 1
  push 2 load-arg-named n sub push call nfibs push [nfibs (- n 2)]
  push 1 load-arg-named n sub push call nfibs add push [+ (nfibs (- n 1))]
  load-lit 1 add [+1]
  label 2
leave
main
  new-global nfibs load-lit 3 store-global nfibs [declare nfibs]
  push 32 call nfibs push call print [print (nfibs 32)]
exit

Note that this is just a flat list of string/numeric tokens in memory; all indentation is cosmetic. Also, I lied: this is my own simplified assembler that expands to the one in the report via the following rule. Expand push N into load-lit N push, and load-arg-named S into load-arg I where I is the argument index, and call S into load-global S call N where N is the number of arguments.

So, actually, this is the assembler in the report:

label 3
enter
  load-lit 2 push load-arg 0 [n] less branch-false 1  [< n 2] 
  load-lit 1 branch 2
  label 1
  load-lit 2 push load-arg 0 [n] sub push load-global nfibs call 1 push [nfibs (- n 2)]
  load-lit 1 push load-arg 0 [n] sub push load-global nfibs call 1 add push [+ (nfibs (- n 1))]
  load-lit 1 add [+1]
  label 2
leave
main
  new-global nfibs load-lit 3 store-global nfibs [declare nfibs]
  load-lit 32 push load-global nfibs call 1 push load-global print call 1 [print (nfibs 32)]
exit

Oops, I lied again, but not as egregiously this time: I renamed the assembler mnemonics according to my own preference. Rename load-lit back to load-long, push back to save, load/store-global back to load/store-var, and new-global back to long, and you’ll have the original code. Call it ABS-ASM, abstract assembler.

The Appendix defines the relevant OMeta syntax, which is mostly what one would expect from regex and grammar notational conventions. It’s based on Parsing Expression Grammars (PEGs), whose disjunction operator | represents ordered choice, resulting in nice unambiguous semantics. Since the system is all about parsing or matching patterns in general streams, not necessarily streams of characters, I’ll refer to the activity as matching named patterns.

Separate from match success/failure is the value of a (successful) match: during the text parsing stage, the “values” are structural objects, and during compilation to ABS-ASM, they’re lists of symbols/integers encoding a list of instructions. For the final stage, the match values are strings containing x86 AT&T assembler syntax. So during the compilation rules, I’ll refer to “match-compiling” named patterns instead.

Stage 2: text to S-expression trees

The parsing of the concrete Lisp characters to the structural objects is not very interesting, except for a few rules which demonstrate OMeta features. First example:

symbol = ( letter (letter | digit)* ) $$ :s _ -> :s

I read this as: to match a symbol: match a letter, then any further letters or digits; then intern ($$) that whole string as a symbol object; call that s; then match a token separator (_); then the value of the match is s.

Here, we see that normal parens () are used for grouping; the symbol intern operator $$ acts on the match value of the group (which will be a character list). We also see the colon : naming operator, which takes the value of its LHS and stores it in the variable on the RHS. This needn’t be a single letter, but it is in all the examples. The token separator _ isn’t a special operator; it’s just a match rule defined as _ = (blank | comment)*. Finally, the -> operator updates the current match value to its RHS, an “output expression”; if there are multiple such operators in sequence, the later ones will overwrite the prior ones. Within an output expression, the colon operator means something else: it dereferences the variable name to get its value.

number = digit+ $#10 :n _ -> :n

To match a number: match at least one digit; parse them as an integer in base 10 ($#10); call it n; match a toksep; then the value of the match is n.

list = "(" sexpr* :l _ ")" -> :l

To match a list, first match an open-paren. Then match any further S-expressions; call them l. Then, match a toksep and a close-paren. The value is then l (a list of values produced by S-expression matches).

Stage 3: prefix tree to postfix ABS-ASM

All the interesting stuff lies in “stage 3: prefix tree to postfix abstract code”, where parsing/matching occurs not on streams of characters, but on streams of objects (numbers, symbols, or lists from the previous stage).

intlit = &`intlit? .
name   = &`symbol? .
arity  = .* :x ->`(list-length x)

To match an integer literal (e.g. 3), first verify that the next object on the stream is an integer literal (i.e. call the predicate intlit? on it), and then consume the object (.); the value of the match is implicitly that object. (“intlit” is my renaming of the original “long”.)

Here we see the “match the object against a predicate without consuming it” operator &` and the “match and consume whatever the next object is” operator . .

Similarly, to match a name (e.g. load-lit), first verify that the next object is a symbol, then consume it, yielding the object as the match value.

arity gets used non-consumptively via the & prefix later when matching function calls. To match against the arity rule, consume (.) the rest of the objects in the local stream and call them x. Call list-length on that list of objects, and yield its value as the match value.

That last part is a variation on the output expression, ->`code, where code is executed to produce the match value. Skipping ahead for a moment, here’s where arity is used:

expr = ...
     | '( expr:f &arity:n args:a ) -> (::a ::f call :n)

To match-compile a function call expression, match a structure (i.e. object list?) and step inside the structure, matching on its contents. First, match-compile an expression; call its instructions f. Then match against arity without consuming anything, i.e. look ahead to determine the number of objects left in the structure; call that n. Then, match-compile the argument expressions; call their instructions a. Concatenate the arg instructions, the f instructions, a call instruction, and the value of n into a list and yield that as the compiled instructions for the call expression.

This demonstrates the “quoted parens” '() as distinguished from the grouping parens (). The quoted parens expect the next object on the stream to be a list and their contents match using this list as their substream. We see the “match against a rule without consuming” prefix & used to get the length of the forthcoming arguments without yet consuming or compiling them. We also see the most common form of output expression, in this case building a Lispy list into which the arity value is inserted (via :n). Before that, however, we have the splat operator :: used to splice the elements of the a list and f list into the list being constructed (in JavaScript it would be [...a, ...f, 'call', n]).

Moving back to the original order of presentation (not semantically relevant for the rules themselves, although the order of alternatives within a rule matters according to the whole PEG idea):

args = expr:e args:a -> (::a ::e push)
     | expr:e        -> (::e push)
     | -> ()

To match-compile args, first match-compile an expression and call it e. Then recurse; call that a. Yield the a instructions followed by the e instructions followed by a push (so as to push them to the stack in reverse order). If any of that failed, try the case of a single argument expression. Otherwise, yield an empty list.

params = ( name:h params:t | name:h ) -> `(arg-name h)

To match-declare the formal params of a lambda, match a name h and recurse if necessary. Afterwards, call arg-name on h to declare it and assign it to an argument index. “The rule is written so that parameters are declared from right to left”. (I imagine that the version of the rule that would declare left-to-right would be something like:

params = name:h -> `(arg-name h) ( params | )

That is: match a name h, register it, and then try and recurse or, failing that, accept end-of-stream.)

expr = intlit:x  -> (load-lit :x)
     | name:x &`(arg-index-of x):n -> (load-arg :n)
     | name:x    -> (load-global :x)
     | ...

To match-compile an expression, first match an integer literal and yield a load-lit instruction for it. Otherwise, match a name x and let n be its corresponding argument index (if x wasn’t previously declared as an arg in the params rule, this will fail up through the matches and try the next alternative); yield the load-arg instruction for n. Otherwise, match a name and yield its load-global instruction. (arg-index-of is confusingly called is-arg in the original, but it appears to return the index instead of a boolean…)

expr = ...
     | '( '< expr:x expr:y )      -> (::y push ::x push less)
     | '( '+ expr:x expr:y )      -> (::y push ::x push add)
     | '( '- expr:x expr:y )      -> (::y push ::x push sub)
     | '( 'define name:n expr:e ) -> (declare-global :n ::e store-global :n)

Match-compile a structure whose contents match the following: match a literal < symbol, and match-compile two expressions to x and y; emit the y instructions, add a push, then the x instructions, then a less. Analogously for the next two cases. Otherwise, match-compile a structure whose contents match the symbol define, followed by a name of value n and an expression compiled as e; yield the declare-global instruction for n, followed by the e instructions, followed by a store-global to n.

expr = ...
     | '( 'lambda '(params) expr*:b )
       -> (enter :::b leave):l
       ->`(label-subroutine l):n
       -> (load-label :n)

Match-compile a structure whose contents matches as follows. Match the symbol lambda; match a structure whose contents match params (declaring their names in the process); match any number of body expressions compiled to a list-of-lists b. Let l be the flattened body instructions :::b surrounded by enter and leave; call label-subroutine on that to allocate a new program label n, and finally yield the load-label instruction for that label.

This rule is notable for its sequence of multiple output -> operators and how they feed into each other. The first constructs a list using the flatten-and-splice operator ::: (turning a list-of-lists into a list) and then uses the naming operator : to pass it on. Evidently, the naming operator on the RHS of an output -> works as follows:

In -> :x it means “yield the value of x as the match value”
In -> (...):x it means “store the list in x and yield it as the match value”
In -> (... :x ...) it means “construct a list using the value of x and yield the list as the match value”
So I guess in -> (... :x ...):y we’d have both of these semantics simultaneously.

Subsequently, the list is fed into label-subroutine (originally save-lambda) whose returned label index gets passed on via another variable (and presumably overwrites the previous match value?). Finally, an instruction is yielded as the match value.

expr = ...
     | '( 'if expr:c expr:t expr:f )
       ->`(new-label):a ->`(new-label):b
       -> (          ::c branch-false :a
                     ::t branch :b
            label :a ::f
            label :b )

Match-compile a structure whose contents matches as follows. Match the symbol if, followed by three expressions compiled to c, t, and f. Let a be the result of calling new-label. Let b be the result of calling it again. Yield the condition instructions c, a branch-false to a, the true-branch instructions t, a branch to b, a label instruction marking a, the false-branch instructions f, and a label instruction marking b.

Stage 4: ABS-ASM to x86 asm source

The final transformation, of ABS-ASM to x86 assembler source, is pretty straightforward given the prior material. Instead of the previous “arrow” -> output expressions yielding an object, the “output string expressions” here all output a string to the final stream. For example:

instruction = 'label . :l `"L\#l:"

To compile an instruction, match the symbol label and consume the next object, calling it l. Emit an x86 assembler label consisting of the (assumed) integer value of l, prefixed by L.

instruction = ...
    | 'declare-global . :n `"        .data"
                           `"_V_\$n: .long 0"
                           `"        .text"

…Or, match declare-global, followed by an object called n; emit a long data init directive labelled by the mangled string value of n.

instruction = ...
    | 'load-arg . :n ->`(* 4 n):n `"movl (\#n)(%esi), %eax      ; eax := [esi+\#n]"

…Or, match load-arg, followed by an object n; multiply n by 4 and overwrite it; emit x86 for loading our accumulator (eax) from the appropriate location on our stack frame (based on esi). (I inserted an asm comment because I find AT&T syntax unreadable.)

Conclusion

This is all very cool. I believe it’s all inspired by an old meta-language called META II. See also: Rick Lindberg’s RLMeta implementation.

Of course, my motivation for studying this pertains to self-raising diagrams and the notational generalisation of OMeta. My own follow-up questions include:

What is the incremental/interactive/sculpting equivalent of these batch-mode stream transformations?
Remaining within textual batch-mode OMeta: given a graphical notation to compile to x86 assembler (or, more likely: JS), how might this approach be “lifted” to the graphical realm? Presumably all of the novelty would be in “stage 2”, i.e. parsing a diagram to prefix trees instead.
What about rising to a graphical (or structured) OMeta notation for the very same text parsing? E.g. with no need to escape or quote anything like in the plain-text version.
Why was OMeta succeeded by Ohm? Is Ohm just better, or are there tradeoffs? Which is the better fit to the notational lifting / incrementalisation I want to do?

http://programmingmadecomplicated.wordpress.com/?p=9213

Extensions

PAINT25 Invited Talk transcript: “Notational Freedom via Self-Raising Diagrams”

Joel Nov 4, 2025

(I gave two talks at SPLASH this year. Besides my Onward! Essays submission, I was kindly invited to open the PAINT workshop on Programming Abstractions, Interactive Notations and Tools. Because it was not recorded, and there was no associated paper, I’m posting a lightly edited script here. This is my first iteration of these ideas; … Continue reading PAINT25 Invited Talk transcript: “Notational Freedom via Self-Raising Diagrams”

Show full content

(I gave two talks at SPLASH this year. Besides my Onward! Essays submission, I was kindly invited to open the PAINT workshop on Programming Abstractions, Interactive Notations and Tools. Because it was not recorded, and there was no associated paper, I’m posting a lightly edited script here. This is my first iteration of these ideas; I already wish to re-work them and present again somewhere. The main patch to apply to this transcript is that “vector diagram parsing” is distinct from “self-raising diagrams”, though the latter depends on diagram parsing to get off the ground.)

Abstract: Some things are better drawn than coded. However, it takes a lot of work to build a custom editing interface for each new notation, discouraging experimentation. There has to be a better way than Greenspunning poor approximations of Adobe Illustrator over and over again. “Self-raising diagrams” are a promising escape from this trap. Just as source code—a static artefact—“raises itself” into a dynamic running program, a vector graphics diagram—taken to generalise source code—can similarly be parsed, interpreted, and “animated” into a GUI. Notational engineers can then focus on the notations themselves, and their semantics, having left the implementation of drawing interfaces to the experts (implementors of standard vector graphics editors). The tasks of normalising, interpreting, and transforming vector notations contain many relevant problems for the notational engineer—far more relevant than coding line rubberbanding for the umpteenth time!

In programming, we have a maxim of “use the right tool for the job”. There are three load-bearing concepts here: right, tool, and job. I think the ideal spirit of this maxim is captured by the following interpretations:

First, whether or not a tool is right for the job is best determined by the person who is going to use the tool. It would be best if each person is able to use the tool that they feel they’ll be most productive with. The judgement of what is “right” is subjective and idiosyncratic to the preferences and skills of the user.
The “tool” part has a range of interpretations; a tool could be an entire programming language, it could be a design approach, or it could be a way of representing information, such as a notation.
The “job” could be an entire project, but it doesn’t have to be: it could be a smaller component, a class, a function, a data structure, even perhaps a single line of code, or a single expression, or a single token in the code.

There’s also something to be said about the word “use”: if the tool we want to use already exists, then what is the cost of using it alongside all our other tools? Call that the cost of tool integration. And if the right tool for the job doesn’t exist, what is the cost of making it ourselves for the occasion? Call this the cost of tool production.

I find all of this most relevant for how we represent code and data in programming. There is a lot of room for improvement because very often we have to use one single syntax, or one single notation, to express everything. This could be called the “one size fits all” approach, which is the antithesis of “use the right tool for the job”, and it happens because the cost of integrating other notations is too high, let alone the cost of inventing ad-hoc notations for the problem at hand.

STEPS and Mood-Specific Languages

An important reason I’m doing research today is because when I was an undergraduate, I somehow discovered the work of the STEPS project, which utterly blew my mind. The goal of the project was to replicate the stack of functionality for personal computing in under 20,000 lines of code. I believe they were successful in this. And the way they did this was, in large part, by making copious use of domain-specific languages, or by using the right language for the job as much as possible.

(Related: Alan Kay’s talks on “T-Shirt Computing“; “Complex vs. Complicated“)

Of course, to support lots of domain-specific languages, they needed to somehow lower the cost of integration and cost of production for these languages. And this was the purpose of OMeta (later Ohm), a domain-specific language for pattern matching and implementing other domain-specific languages.

I was particularly taken by this quote from one of my favourite papers from the project (my emphasis):

Applying [internal evolution] locally provides scoped, domain-specific languages in which to express arbitrarily small parts of an application (these might be better called mood-specific languages). Implementing new syntax and semantics should be (and is) as simple as defining a new function or macro in a traditional language.

A simple example of such a “mood-specific language” might be where you’re doing graphics programming and you have a single line of code that does some vector algebra, but your language doesn’t support operator overloading ():

vec_a.mul(cos(ang_b/2))
     .add(vec_b.mul(cos(ang_a/2)))
     .add(vec_a.cross(vec_b))

In this case I would certainly prefer to be able to define a local syntax using infix operators and implicit multiplication, which would just compile to the verbose original:

cos(ang_b/2) vec_a + cos(ang_a/2) vec_b + vec_a × vec_b

(Related: the Unicode-heavy Nile stream processing language and the Gezira graphics stack built atop it, all part of STEPS.)

However, I think a much more instructive example of one of these languages can be found in the STEPS 2007 progress report. Whenever I need to convince someone of the practical motivation for these sorts of ideas, I’m always thinking that they said it best in the source material, and I couldn’t do any better by paraphrasing myself! So here it says (my emphasis):

Elevating syntax to a ‘first-class citizen’ of the programmer’s toolset suggests some unusually expressive alternatives to complex, repetitive, opaque and/or error-prone code. Network protocols are a perfect example of the clumsiness of traditional programming languages obfuscating the simplicity of the protocols and the internal structure of the packets they exchange. We thought it would be instructive to see just how transparent we could make a simple TCP/IP implementation.

Our first task is to describe the format of network packets. Perfectly good descriptions already exist in the various IETF (RFCs) in the form of “ASCII-art diagrams”. This form was probably chosen because the structure of a packet is immediately obvious just from glancing at the pictogram. For example:

If we teach our programming language to recognize pictograms as definitions of accessors for bit fields within structures, our program is the clearest of its own meaning.

Then they show an executable grammar in OMeta (or some precursor called IS) which can parse and interpret the ASCII art diagram for the packet header. And then:

We can now define accessors for the fields of an IP packet header simply by drawing its structure. The following looks like documentation, but it’s a valid program.

They subsequently re-use this domain-specific language to define the TCP packet format in a similar way.

Notational Freedom and Diagram Parsing

This was clearly a fantastic idea—use the right language for the job, no matter how small that job is—but it’s always bothered me that many jobs in programming don’t neatly fit into a language—because what we usually mean by “language”, at least in terms of what we interact with on the screen, is MONOSPACED TEXT, or syntax. So the last line of the “mood-specific languages” quote, about lowering the cost of production for ad-hoc languages—

Implementing new syntax and semantics should be (and is) as simple as defining a new function or macro in a traditional language.

—could be called syntactic freedom: the freedom to use the right syntax for the job. That’s not quite the same thing as the right TOOL for the job. And I’ve never been able to resist seeing this as a frustrating special case of a more general notational freedom, i.e. the freedom to use the right notation (including syntax) for the job:

Implementing a new notation (and its semantics) really ought to be as simple as defining a new function or macro. (Or, at the very least, it should be a lot simpler than it normally is.)

As brilliant as they are, those ASCII art diagrams always frustrated me, because what is ASCII art if not a CONCESSION that programmers make to the medium of plain text, whenever we want to draw a diagram! Why is this beautiful idea, of using the right notation for the job, limited to ASCII art? Why can’t we have the table, the real table made of real lines that this diagram is clearly approximating—why can’t we have that as the source code? What’s stopping us from parsing this instead?

The immediate answer is that parsing REAL graphics (rather than ASCII art) is not remotely an established thing the way that parsing text is. Where do you begin? How is this diagram encoded? Must we use Computer Vision or AI to recognise that these pixels are a line, while those pixels spell the word “sourceAddress”…?

Of course not! Vector Graphics exists and has existed for a long time, so no fancy computer vision techniques are necessary. As to the other questions about what it means to parse a vector diagram, I think these are very interesting questions that deserve to be answered practically—by trying to do it:

Ta-da! Now it has to be said that there’s no OMeta here; I just wrote a load of JavaScript. But there are some interesting patterns in the JavaScript. In a moment, I will come back to what’s going on here in more detail. But first, I will explain the general idea.

Self-Raising Diagrams via Diagram Parsing

Just as the set of all possible syntaxes is merely a subset of all possible notations, the set of all possible source code is a subset of all possible vector graphics diagrams—because diagrams can include text! And I know that, at least in the SVG format, you can programmatically extract strings from the text elements. So it’s interesting to consider vector diagrams as a strict generalisation of source code. And this is what leads me to the concept of “self-raising diagrams”.

I like to think of source code as self-raising text. Source code is this static artefact—a text file—which could be printed out on paper. Yet, with the aid of an interpreter, it raises itself into the dynamic world of a running program. Analogously, what we might call “generalised source code” exists as a static vector diagram, which could also be faithfully reproduced on paper. Yet, given an interpreter, it raises itself into a running program. A self-raising diagram. Except this time, because the source code is fully graphical, I think it’s actually easier for the running program to be graphical and interactive by default; more so than what you get with self-raising text.

For example, I drew this diagram in my go-to editor and exported it to SVG. The only change I made to the file afterwards was the insertion of <script> tags to load my infrastructure in the background; other than that, it’s an ordinary SVG file. It contains some circles, and some JavaScript code in red boxes. The red border means the code will be executed as the final step of processing the diagram (green boxes have a semantics of “ignore contents”, i.e. they are comments). The code sets up some event handlers so that, if any “circle” element is dragged, it will move with the mouse. Initially, they don’t move. But after I run the parser/processor, the code in the red boxes will have been executed, and the circles are draggable. I would probably regard this as the “Hello, World” of self-raising diagrams: the simplest one that does something interactive.

Of course, running code as a program isn’t the only thing we do with source code. We also compile source code into an executable binary (which you could say is interpreted by the hardware) or transpile it into source code in another language. Some languages like C, TeX, and Lisp have macros which expand into code before execution, and compilers make multiple passes to transform the code. All these concepts must also have interesting analogies in the realm of self-raising diagrams. Not that I know what they are yet (besides passes, to be elaborated shortly).

However, no matter whether we’re compiling or interpreting, the first step in all of these processes is parsing the source representation. In the early stage of this research so far, where I’ve only spent about 3 weeks of work on it, it is this first task of diagram parsing which I’ve had the most experience with. And—with apologies to Tratt—it’s very interesting to consider just how much of a solved problem text parsing is. Decades of computer science research and formalisation have been poured into parsing strings into trees. Yet if we look at a diagram like the “drag circles” demo, all of that appears to go out the window, because now there are some quite daunting differences from the world of text.

For one, we no longer have a one-dimensional stream to scan from beginning to end. Even worse, we no longer have a discrete space of characters; we’ve got a continuous space of floating-point coordinates. So if text parsing could be seen as a formalisation of what the eyes do when reading text, we’re now confronted with formalising what the eyes do when reading a two-dimensional diagram that might not have an obvious reading order.

In order to make some progress on this, I just tried brute-forcing several different types of diagrams. I wrote JavaScript to parse them, and I just tried to be reflective about the whole process; I tried to notice which patterns seem to recur, or what architecture seems to grow out of this experimentation. I identify 8 emergent principles that are very clear to me even after just 3 weeks and about four or five different diagrams. I’m tentatively naming them as follows:

Precise, Predictable Syntax/Semantics
The Document is the Ground Truth
Idempotent, Typed Passes
Visual Debug Annotations
SVG Normalisation / Jank Correction
De-spatialise / Graphify Early
Concrete Prototyping with Dev Tools
Local Scoping Mechanisms

I’ll run through these using an example. Take a look at this simple diagram format (on the left) called BoxGraph:

Just as text has a concrete syntax representing an abstract syntax tree or AST, BoxGraph can be thought of as one possible visual syntax for encoding an Abstract Data Graph or ADG. There are boxes, which can be named by text labels, and the boxes point to each other with labelled arrows. When we “abstract” something, we throw away information; and the information we’re using this syntax to represent is the topology and the names, not the distances or relative positioning or any “metric” qualities. So what it means to parse this diagram is to end up with some representation of this ADG somewhere, such as in JavaScript runtime data.

Precise, Predictable Syntax/Semantics

As a user of this visual syntax, I want to be able to rely on some basic spatial rules to know how the diagram is going to parse. I don’t want to do any guessing, and I don’t want any non-determinism. The fact that we’re using vector graphics instead of computer vision or AI is a big help here. However, I must still be careful to avoid non-determinism in things like iterating through SVG elements. In the parser, if I know that the iteration order might make a difference, then I should make sure to e.g. sort left-to-right by the x coordinate, just to be safe.

Of course, there’s also a “tree syntax” (distinct from the concrete XML syntax) of how this diagram is encoded as SVG XML elements. Different XML trees could map onto diagrams that look exactly the same. So we don’t want the parsing to depend on that either; the semantics have to be independent of properties that aren’t visible in the diagram.

(I’m confused in this graphical domain between saying “syntax” and “semantics”; I instinctively want to say “the semantics OF the syntax”, i.e: the meaning of a box is a node in the ADG, and the meaning of an arrow is a named mapping from node to node. I think this points to the fact that, when we have a long list of transformations, the line between “syntax” and “semantics” becomes somewhat arbitrary. There are other ways to regard this: perhaps syntax is concerned with Boolean evaluations of a structure (valid / invalid), as we see not just in formal languages but also in binary file formats and C structs; while what we ought to call “semantics” involves evaluating a structure to something else, or transforming it to similarly-shaped structures.)

Anyway, this BoxGraph syntax works according to the following rules, which are phrased in terms of spatial relationships:

If arrow A begins inside box B1 and ends on or inside box B2, then B1’s abstract node points to B2’s via A. Here, “on” means that the arrowhead is just touching the edge of the target box. (Empirically, Mathcha.io was outputting diagrams that look correct, but where the arrow endpoint is numerically one or two pixels short of the border, so I make this work using a parameter for tolerance or “epsilon”.)
Each arrow is labelled by the closest text label to its origin point.
Each box is labelled by its closest un-claimed text label, out to a certain max distance parameter. Past that, it’ll just be anonymous.

These rules are what I mean by precise/predictable syntax/semantics. Ideally, they would be written in some formal logic language as I’ve sketched, or even in a mood-specific notation of their own—but that’s a dream for the far future.

(My intent is for these rules to ideally apply “in parallel”, as declaratively as possible. In this BoxGraph case, I haven’t quite achieved this: there is a dependency in the fact that boxes get their labels from the “unclaimed” pool after the arrows take first priority. This could perhaps be rephrased in a purely parallel way, but since this talk is about using the right notation for the job, I used the conceptual presentation that was more intuitive to me.)

The Document Is The Ground Truth

The JavaScript code for parsing BoxGraph broke down into a lot of smaller “passes” over the diagram. These passes have dependencies on each other, and I actually programmed their dependencies by drawing another diagram…!

(function() {
  const arrow = function(origin, target) {
    let reqs = passReqs[origin];
    if (reqs === undefined) passReqs[origin] = reqs = [];
    reqs.push(target);
  }
  // Generated from boxGraph-deps.svg (labelGraph)
  arrow('annotateContainments', 'normalizeRects');
  arrow('annotateContainments', 'idLabels');
  arrow('annotateArrowConnections', 'normalizeRects');
  arrow('annotateArrowConnections', 'idArrows');
  arrow('generateJSOG', 'nameBoxesIfApplicable');
  arrow('nameBoxesIfApplicable', 'labelArrows');
  arrow('nameBoxesIfApplicable', 'normalizeRects');
  arrow('labelArrows', 'idArrows');
  arrow('generateJSOG', 'annotateArrowConnections');
  arrow('labelArrows', 'idLabels');
  arrow('nameBoxesIfApplicable', 'annotateContainments');
})()

The SVG in the slide shows the structure of the passes for parsing BoxGraph. Each pass makes some change to the SVG document tree, possibly invisible at the diagram level. For example, one of the first passes we need to perform involves recognising the arrows and assigning each one an ID in the DOM. These passes write their results back to the DOM tree: the document is the ground truth. The DOM “data attributes” are invaluable for this. In the slide, you can see that pass idArrows has given the arrow <g> an ID, a class is-arrow, and a data-origin-pt (offscreen, there’s also a data-target-pt).

(Recognising arrows here is tied to Mathcha’s output conventions; more on that later. Because BoxGraph doesn’t involve any lines that aren’t arrows, I cheat and don’t bother checking for the existence or shape of arrowheads. Instead, I just look for elements with class arrow-line, which Mathcha uses for all arrows/lines. I expect a <path> child element, whose d coordinates are in the right order for origin/target. This is not robust to e.g. drawing the arrows the wrong way round in Mathcha (setting the arrowhead on the origin point and deleting it on the target point), but this is not a very interesting objection; it could be, and ought to be, made robust to this in the future—so that whatever the user sees is processed in the way they expect.)

This principle is very convenient and it means that intermediate parses can be saved (and resumed later on) by taking the XML in the DOM Inspector and copying it to a file. This wouldn’t be nearly as easy for random JavaScript heap data structures, so I’m careful to use JavaScript data only as a cache of what’s in the tree (The Document Is The Ground Truth!).

(This leads to a pervasive need to sync JS objects with their source DOM attributes, which are stringly typed . Since I would much rather do arithmetic on JS arrays, rather than space-delimited vectors in strings, I can’t escape the need to sync these two representations. Currently this is done manually, but it would be good to automate it.)

Visual Debug Annotations

“The Document Is The Ground Truth” also means that I can get visual feedback, or debug annotations, on what a pass did or what it was thinking. For example, after running labelArrows and nameBoxesIfApplicable, I can see which names got attached to which arrows and boxes from these blue lines:

Concrete Prototyping with Dev Tools

I’m basically treating the browser dev tools as my development environment. When I want to write new passes or fix bugs, I have concrete example data to experiment with in the console before I commit to the JavaScript file.

(Additionally, I am heavily exploiting the fact that the browser understands XML, SVG, DOM, and handles all of that parsing for me when I open the SVG file. My JavaScript can easily use DOM APIs to query for elements, navigate the tree, and rewrite it. This would not be the case if using node.js with a mundane XML parsing library, while the alternative of a “headless” browser setup is beyond my knowledge at the moment.)

SVG Normalisation / Jank Correction

One thing that surprised me when I started out is that Mathcha actually outputs all shapes as SVG polylines or Bezier curves (all via <path>). This is despite the fact that SVG has specific elements like <rect>angle and <circle>…! These patterns of “rectangle” and “circle” are semantically meaningful in my diagrams. For this reason, the normalizeRects pass looks for all line loops that look like rectangles and replaces them with the equivalent <rect> element:

This here, by the way, is my “aspirational notation” for expressing such a transformation; the real thing is currently just JavaScript. There is a similar pass for circles.

Pass `normalizeCircles` not shown, but this is the ad-hoc visualisation I drew to help me craft it.

(As an intermediate step, I’ve made up this ad-hoc mood-specific syntax in a comment, which captures how I’d prefer to express the JS version of normalizeRects given the constraints of plain text:)

FOR ALL matchElt(i) (
  tag = 'path' ⟹ 'rect'
  class ⊇ { 'real', 'connection' }
  d = " Mlx,ty Lrx,ty Lrx,by Llx,by Z"
  let [x, y, w, h] = [lx, ty, rx - lx, by - ty]
  attributes {
    x ⟹ x, y ⟹ y, width ⟹ w, height ⟹ h, d ⟹ nil
  }
  id ⟹ 'r'+(i+1)
)

I call this “Jank Correction” in the case of Mathcha, because its avoidance of <rect>/<circle> does strike me as pretty janky. However, more generally, it seems like a pattern of normalising the different SVG that might get output from different editors. If I made the diagram in Inkscape instead, I’d apply the “normalise from Inkscape” preprocessing pass to get the SVG that all the later passes expect to see.

De-spatialise / Graphify Early

Usually, the first few passes that do real work are de-spatialising passes: they do all the computational geometry to reduce the “metric” properties of distance, positioning, containment, intersection etc. to discrete properties like “the origin box of the arrow” or “the label of this box”.

After de-spatialising, we’ve left the hardest parts behind and we’re just operating on graphs of DOM elements connected by their IDs. (Past this point, all the hard work—the visual work—is done; are we even “parsing” anymore? Just traversing a data structure like a compiler or interpreter.) This is why I also call this principle “graphify early”.

Idempotent, Typed Passes

There is a strong smell of functional programming and type theory concepts lurking in these passes! Each pass reads certain parts of the DOM tree as input and annotates the tree with its outputs, usually in a declarative way; sometimes even as simple as a map over nodes matching a CSS selector. It expects its input diagram to meet certain preconditions, and the changes it makes to the diagram might enable further passes that are compatible with it (or disable passes which are not). There is an implicit type system here—currently in the code and in my comments:

The passes are also supposed to be idempotent, so once you’ve applied a pass it should figure out that it’s already run and has no more work to do. Hence, the principle of idempotent typed passes. The idempotence seems like a disguised form of functional purity, so there’s definitely a system of typed pure unary functions waiting to be realised here…

Local Scoping Mechanisms

Finally, I want “Local Scoping Mechanisms” in order to treat certain regions of the diagram specially; maybe to hold comments, metadata, or executable code. Or even to contain an embedded diagram in a different notation, to be processed in its own way.

So far, I treat green boxes as comments to be ignored (by the way—notice how you can use real shapes and rich text in your comments! No more ASCII art required!), blue boxes as containing format metadata, and red boxes as code to execute. Since these features are common across many diagram formats, they force a common set of passes for ignoring comments and executing code before deferring to the more specific passes. You can see this common core in the pass graph for the labelGraph format; everything above the red line is shared across diagrams, while everything below it (apart from restoreComments) is syntax/format-specific.)

(“Green”, “blue” and “red” mean the specific RGB values of these colours that are convenient default colours in the Mathcha palette; this currently violates an unstated principle of “visual robustness” that should be filed with Precise/Predictable Semantics. I.e. two identical-looking diagrams (to the human eye) may have wildly different semantics. This is, obviously, improvable in the future. I would like to make the colours or even the shape patterns themselves configurable in the future; this is all reminiscent of conventions like “the top left pixel in your bitmap icon shall be treated as the transparency colour”.)

Why does Joel love parsing all of a sudden?

All of this emphasis on parsing will come as a great surprise to anyone who knows me or who has followed my work. I’ve previously harped on about “Explicit Structure“, i.e. what you get in a structured editor with no need for parsing and serialising. I’ve always regarded “escape” characters and SQL injection as unholy abominations that have no place in any sanely-designed information substrate. I’m supposed to hate all this stuff! So why would I be embracing it in this domain? Why not go for a “structured editing” of vector graphics notations?

This is a good question. We are, after all, still transforming static artefacts to static artefacts. We take a diagram on digital paper, and compile it to code on digital paper. There aren’t any moving parts or interactions; it’s not live. Even the “drag circles” self-raising diagram counts for this: the diagram had to be parsed before the static JavaScript code could be executed to set up invisible event handlers in the background. It’s interactive, but hardly live or in the spirit of my usual interests.

The word “notation” might have connotations of being a static thing, as if any notation could work just as well printed on paper. But this myth is laid to rest by the name of this very workshop, PAINT—Programming Abstractions and Interactive Notations, Tools, and Environments—it mentions interactive notations, and I think this is exactly the right definition. I think of notations as interactive interfaces (or if we want to be pedantic about it, possibly-interactive interfaces). Any combination of shapes and text that does anything in response to your input is a notation. So, a GUI is a notation. If what it happens to do is “nothing”, then it’s a static notation that will be faithfully represented if you print it out on paper.

So, to answer why I’m so enamoured with parsing all of a sudden: the problem with interactive notations, or making editors for them, is that the cost of production is too damn high!

To demonstrate this, I would like to return to another of my favourite papers from the STEPS project that has motivated a lot of my research. “Open, Reusable Object Models” describes an object model called Id, which has all these nice properties of minimality and flexibility and maximal late-binding that was just catnip to me as an undergaduate. It’s had me mesmerised for 8 years now. There is a sample implementation in C at the back, but in order to understand this system I had to look at the diagrams, and I even drew diagrams of my own. For me, the diagrams were a better tool for the job of understanding this system than the code. And in order to have a working prototype of the running system to play with, it felt like a waste to manually compile these diagrams back into code and to work in a command-line REPL. I really wanted the running system to be made out of these diagrams.

This was clearly an example of a domain-specific interface, or interactive notation, and it seemed like a worthy project for the first year of my PhD. So that’s exactly what I worked on, and I gave a talk about it entitled “What does it take to create with domain-appropriate tools?”. This question of “what does it take” was referring to the amount of work involved in implementing such a notation. In other words, it was about the cost of production of this notation.

(Well, it was supposed to be titled as a question, but I found out at the last minute that ACM publishing doesn’t allow question marks in titles, so I had to rephrase it. I’ve never forgiven them for this—many papers in many fields are titled with questions!)

What I ended up with was this rather … crude … notation of nested boxes, where each box has a name and an optional text box for JavaScript code, and you can draw arrows between the boxes. Now this basic implementation of the Id object model does work: I can send a message and play with the system to an extent. But the amount of labour that it took to get here in the technologies that I chose—namely SVG and JavaScript—was about five months of work! Even though the notation works for the thing I wanted, it’s very brittle and lacks the sort of conveniences that you really want in an interface for drawing and manipulating shapes; especially one that you spent five months on.

What sorts of conveniences are these? Well, let’s take a look at a vector graphics web app. In my case, Mathcha:

We have toolbars where we can adjust various parameters; colour pickers; we can select things and get point handles (I only implemented that for boxes, not for arrows or text labels!); we can drag things around; we have the all-important UNDO—so that when you inevitably mess up, you don’t have to kick yourself and refresh the page and start again from scratch.

If getting the bare essentials took me five months, how much longer would these features take? And do we really have to duplicate all this functionality every time we want some sort of interactive notation…?

We said previously how a diagram has a graphical syntax: some arrangements of shapes represent a valid abstract structure, and others don’t. A vector graphics editor is like a text editor: it doesn’t enforce these syntax rules, it just lets you draw whatever you want. And unlike a text editor, it can’t even give you helpful feedback by highlighting incorrect visual syntax!

Moreover, as soon as we want an interactive notation, we have additional constraints on what sorts of edits to the shapes are allowed. Some things can be moved and resized, but others can’t. Some things stick together when they’re moved. Some things automatically resize when you add items. Some shapes will appear and disappear in response to different inputs. This could be thought of as an “interactional syntax”. If you’re using external vector graphics software, most of the edits it’ll let you make will break these rules. So in order to create and edit shapes in a manner consistent with this interactional syntax, you have to invest in your own BESPOKE editor for the notation. There’s no way to get some off-the-shelf editor to respect the rules of your interactional syntax.

There is actually another reason you’re forced to go bespoke, even in the case of a static (non-interactive) notation, if you want any alternative to diagram parsing. The notation corresponds to an Abstract Data Graph in memory, and you want to do things with this data—like compile or interpret it. All that an off-the-shelf editor can do is save your diagram in some file format. It won’t let you access the nice parsed structures in its memory, because its memory is a heap of opaque binary blocks accessible only through debug APIs. (And even if you figure it out, the EULA probably makes it illegal to use them. Useless!)

So these were the forces that caused me to make my own bespoke editor in the first year of my PhD. As far as I can tell, this is the only way that you can get an equivalent of “structured editing” for custom notations, creating the structures interactively in a manner that obeys your particular interactional syntax instead of by parsing diagrams that you drew in a standard editor. And the benefit just wasn’t worth the cost for me to continue on that project. For five months of work I got this underwhelming, ugly interface that technically did the job while being very unforgiving to use or extend.

In contrast—when I started my three weeks of work on self-raising diagrams in August, around the second or third experiment I did was to re-create these diagrams for the Id object system and see if I could parse them. This was basically the BoxGraph format with a few extensions. And would you look at this:

I’d say that drawing this in Mathcha took me … 10 minutes. And writing the parser took me 3 fun afternoons. And yes, Mathcha will allow me to create invalid diagrams, or even diagrams with subtle bugs in them (e.g. from positioning), and then when I try to parse them, it’ll blow up because there’s no proper error handling. My first-year editor, to its credit, will not let me create certain invalid structures, but that’s not enough to offset the fact that I didn’t have the ability to duplicate Mathcha’s editing conveniences in my postgrad-tier JavaScript. So it’s really no contest from the editing or creation perspective.

(Attentive readers will note that there’s some sleight-of-hand here! I am comparing an interactive notation to a static notation, and it could well be just as difficult to make the diagrams dynamic; I admit this at the end. What I should say is something like this: I’m giving up on laborious interactive notations for the time being, and in exchange for this concession, I get truly rapid iteration on static notations via drawing and diagram parsing, like I’ve never experienced before. That’s the real reason I’m enamoured with it at the moment. A further subterfuge is that I’m comparing a complete set of Id objects and methods with a sketch of a JS-biased strain of Id without all of the methods contained in the diagram. However, I think the point still stands; it wouldn’t take much longer to draw these extra parts, nor to parse them.)

The Division-of-Labour Argument

This point about duplication of existing editor functionality is at the core of why I’m so won over by parsing diagrams. I’ve been greatly influenced over the years by Stephen Kell‘s warnings about silo-ization, in which different programming language ecosystems all need their own duplications of basic infrastructure like libraries and package managers and tooling, all implemented in The Language. People in one silo aren’t able to make use of all the work that people did in another silo.

I also think of a certain talk I attended at a previous SPLASH, “Relentless Repairability or Reckless Reuse“, on the same tension between having control and understanding of your tools, versus leveraging work that’s already been done by the outside world.

The talk features these two quotes that I felt were relevant here:

Google takes care of it. A lot of people are working on that.

In Chrome, graphics are fast and I can still inspect all the nodes of the document object model.

This dilemma seems to be the curse of the notational engineer, at least when we need to build an editor that respects our interactional syntax. As I discovered when I tried it, a bespoke editor must duplicate the work of supporting basic drawing operations and shapes found in a wide variety of existing tools—all just to let you add a few novel behaviours to the shapes at the end.

This seems wasteful and hubristic to me. I expect that the existing multitude of vector drawing workflows, and especially industrial tools like Adobe Illustrator, will have figured out the optimal GUIs for basic and advanced vector drawing operations. At the very least, I expect them to have been funded, staffed, and incentivised much better than myself or a grad student to achieve such a result; consider all the money poured into Illustrator and the feedback they receive from its users. I would much prefer to accept that they have done that job well, and take advantage of it.

Instead of spending ages creating a crude GUI for drawing shapes, which can make them dynamic, and then forcing others or myself to tolerate it in order to make dynamic graphics (knowing that it will lack basic amenities like undo, which I didn’t have time to implement etc)—why not allow the user to draw using whatever existing tool they find most convenient? After all, we enjoy this pluralism in the world of plain text. You can use any text editor you like to work with your source code, and you don’t have to roll your own text editor whenever you want to experiment with a new syntax. So let them use the right drawing tool for that job and then figure out a way to leverage their skills with that tool, in order to make the drawing dynamic.

In other words, embed the dynamic instructions or dynamic behaviours somehow using a special visual syntax. It’s much easier for the user to obey those rules than to use a poorly-working prototype.

If you are a notational engineer, you’re interested in the essential complexity of your notation. You want rapid iteration on different designs and features. Implementing line rubberbanding is not a good use of your time. Implementing selection rectangles, or shape libraries, or even UNDO is not a good use of your time.

[Adobe] takes care of it. A lot of people are working on that.

The Argument from Quality

There is also a strong argument from quality: say you want to make a dynamic notation or GUI which is truly beautiful or fancy; like the sort of thing you might see in a game. In Mathcha, I was able to draw this fancy titlebar for a window:

…but imagine what a skilled digital artist could do in a professional tool. If I want those beautiful graphics available programmatically in my go-to programming system (e.g. JavaScript, or even Squeak) it is absurd to think that I or a team of developers could ever replicate the professional editing features necessary to draw such a thing in-system. So I’m prepared to claim that for any sufficiently advanced notational project, I think it’s inevitable that the basic parts have to be flattened into a static diagram which is drawn externally, and then imported into the programming system, and then raised into a dynamic notation.

Confused…

Then again … I honestly can’t claim to be fully comfortable with my own argument here. I do believe in the power of diagram parsing as a way to experiment with static notations, because I have experienced precisely that. But for dynamic notations, the external editor can only hope to create the initial state of the notation. If the intended interaction patterns for the notation resemble editing operations—say, moving or resizing—then there’s nowhere else for these behaviours to live, other than inside the programming system that’s responsible for raising the diagram! Won’t I have to write the same code either way?

…Perhaps we can safely assume that the complexity of drawing the initial state will be far greater than the complexity of interacting with it? (e.g. I don’t need to allow the user to adjust minute shape control points, or to select colour gradients.)

I’ve lamented the shortcomings of my first-year editor for the Id object model, but at least it still functions as a live executable environment of the system modelled by the diagrams, which is what I wanted. Meanwhile, the diagrams are currently just parsed into JavaScript objects that can only simulate the system in JavaScript. In order to make the running diagrams resemble my old editor, will I need to implement line rubberbanding or arrow re-drawing after all…?

Conclusion

I suppose this is the most important question I want to see answered: do self-raising diagrams offer a nontrivial improvement for implementing interactive notations? And the other unanswered questions are:

What’s going on with the typed functional programming in the pass graphs? Can it be made explicit?
Can we take it meta, and define passes via diagram notations, compiling them to JS or interpreting them directly?
What’s the analogue of OMeta in the domain of graphical syntax? A graphical syntax for defining graphical syntaxes?

To conclude: I’ve greatly admired the STEPS work for its advances towards syntactic freedom, but I’ve always wondered what the generalisation to Notational Freedom would look like. Only then can we “Use The Right Tool For The Job” in a meaningful sense. Diagram Parsing, and potentially Self-Raising Diagrams, represent a promising step in this direction, with many interesting problems to solve. There are interesting parallels of standard programming concepts (parsing, compilation, interpretation, macro expansion) if we take diagrams to generalise source code. By leveraging existing tool infrastructure, common time-sinks are avoided in favour of rapid experimentation; however, this might only have value for static notations, and it remains to be seen if this can be made to work for substantially interactive notations. At only three weeks in … more research is definitely needed!

(Follow-up ideas and FAQs coming in a separate post.)

http://programmingmadecomplicated.wordpress.com/?p=9133

Extensions

Substrates ’25 Vision Statement

Joel Sep 6, 2025

The following is my vision statement submitted in advance of the Substrates Workshop held in June as part of <Programming> 25, alongside numerous other vision statements from attendees. It’s been adapted from the LaTeX with minor referencing edits. Summary Introduction The topic of this workshop is software substrates. How does a substrate differ from a … Continue reading Substrates ’25 Vision Statement

Show full content

Summary

What is (and isn’t) a substrate? Ontology of state with an authoring tool (PLs automate the authoring).
What values guide our research? Design well for humans first, optimise performance later. Ultra-convenience to better colonise the minds of interested parties.
What are the potential benefits of substrates? Common language to speak; solve coordination problem. Privately (or in small group) explore consequences of a design.
What can we learn from past attempts? Have a survival/replication plan; plenty of elegant organisms went extinct. Respect Worse-is-Better Darwinism.
What are the research problems? Just how many different substrates are there, really? (Modulo different jargon or byte encodings.) What is there, which is meaningfully distinct from the ubiquitous graph of key-value dictionaries? Perhaps substrates differ mainly by graphical interface rather than the underlying data model. Is there a Holy Grail / stem-cell meta-substrate, which we could easily adapt for any use case? Or would that be too good to be true?
What do we disagree about? Whether the extinctions of nice substrates like HyperCard were just unlucky, or robust to historical perturbations. If the latter, we should be careful not to make the same mistakes.
What research methodologies should we use? I regret not studying evolutionary biology. It seems highly relevant, so I’m catching up. Discover the genuine design tradeoffs rather than cheering “simplicity” etc. On the other hand, identify low-hanging fruit with an argument for why it’s not yet been picked.
What I’d like to see come out of this workshop: Interest or critique of new fields of study, esp software-convention-ology (and a better name?)
Details of my personal research vision: I’m with Kell; let’s break down the walls between all the existing silos and bridge all the superficially incompatible implementations of the key/value dictionary graph. A novel substrate is unlikely to make a dent, while the heap memory of every application is half-way to being a lingua franca — it just needs suitable annotations and de-duplication.

Introduction

The topic of this workshop is software substrates. How does a substrate differ from a programming system/language? Perhaps programming systems/languages are general-purpose and Turing-complete by default. Substrates, in contrast, can be expected to be specialised for specific domains (e.g. spreadsheets) or to not have a Turing machine lurking within them.

The term “substrate” seems downstream of the view of software as a material or medium. If it is so, then we can work with the material manually; I’m tempted to dismiss “programming” as a specialised field concerned with automating that work. We automate the work that we don’t want to do, but a medium seems like the domain of work that we do want to do ourselves. E.g: communicating or presenting something (limited by the audience’s speed of understanding), or creating / testing some ideas (limited by the author’s speed of understanding).

Software as a medium is so general that it is difficult to define in a helpful way. When in doubt, I always return to the conceptual model of analytical mechanics [SICM15]: a physical system has a state; a large system might have a state composed of many parameters, and it has as many subsystems as there are subsets of parameters. These parameters, typically real numbers, are treated as primitive: nobody cares to define a “decomposition of a real number”, even though possibilities exist (e.g. analysing each decimal digit or binary bit as a primitive parameter), because this is not useful in studying the natural world.

Given a specific state, we can imagine the same state but with a different value for one parameter. Then we can imagine the system jumping from the old state to the new state. There is a combinatorial explosion of such deltas, and if we accumulate them, we get logically conceivable evolution paths of the system. In natural science, we take observed empirical reality as our guidepost and ignore most of these logically possible paths in favour of the narrow range of paths that real systems actually take. The job of mechanics is to characterise which criteria nature uses to determine how systems evolve (e.g. the Principle of Least Action [LazyUniverse17]).

In software, we begin from essentially the same premise. However, unlike the study of nature, we have vastly more freedom in how the system will evolve over time. Hence Alan Kay’s view of the software medium as a meta-medium, the universal simulator: software can act like anything we want, at least relative to the capabilities of physical output devices to which it is connected. Our evaluation method for software is not whether the evolution happens in nature, but whether the evolution is useful or desired by a human being in a particular context. This subjectivity is inherent to the field; there are many people with many different goals in many different use cases. We can’t expect to circumscribe it.

Of course, one common way to respond to such complexity is to ignore it: to create software that is the way it is, and mandate that users1 adapt themselves to it. This is the path of least resistance in industry and in practices set by industry. However, I expect that we are all unimpressed with this state of affairs. Presumably, we seek substrates that are general and flexible enough to adapt to user desires, or seek specific substrates well-suited to certain domains — we see complexity of implementation as a worthy price to pay for ease of use.

Substrate = Primitives + Compositions + Editor + API

In software, as in mechanics, we can imagine that one of the primitive parameters gets changed to a new value. The delta from the old to the new state is a smallest possible “edit”, which could in principle be performed manually. Any programming “language” on top of this is just a way of automating these changes and making them conditional on other parts of the state. Thus I am inclined to see the ontology of state (“what is state made of?”) as primary, and the “language of evolution” as of secondary importance. Given an ontology of state, primitive edits to that state fall out quite easily, and an infinite variety of programming languages can be dreamt up which compile down to those primitive edits.

For example, in the “machine” view of digital material, state consists of one long array of machine words (or bytes, or bits). A primitive edit consists of changing one of these bytes. Machine instructions on the CPU then make these primitive edits automatically, and a great diversity of programming languages are built on top of them. However, the MS-DOS program DEBUG.COM also allowed a user to make these primitive changes manually, and GNU Poke possibly does the same thing for the modern Unix process.

This leads me to offer the following definition of a substrate: it is a set of primitives, and ways to compose them, which is effectively authorable, and which has an API (i.e. it is also authorable by computer programs).

“Effectively authorable” means: a user can make primitive edits to the substrate, evolving it from one valid state to another valid state. This is relevant when we cannot afford to think of the substrate as an abstract thing, but must instead implement it for real. In order to do that, we need to implement it atop some existing lower-level substrate. If an editing interface is available for the lower-level substrate, then there is a risk of edits that result in a state valid in the lower-level substrate, yet invalid in the higher-level one. I will illustrate this definition with the following examples.

Substrate Examples

Sketch of some hierarchies of substrates in software. Everything builds on top of bytes, but there is an early speciation into “plain text” substrates. Each layer represents some restriction of the variations permitted by the layer below.

“Plain text” is a low-level substrate sitting atop the even lower-level substrate of bytes. The rules of Unicode and UTF-8 constrain the allowable states of the system: some byte sequences may not correspond to a valid character. A hex editor thus allows the state of a text file to be changed to something that is not a text file, while a text editor’s edits are closed within the set of valid text files. Because text editors are ubiquitous, we can class “plain text” as effectively authorable, and hence a substrate. The primitives are characters and the composition mechanism is a single flat array.

Bytes and text are two root2 substrates on which all others are built by further restricting the valid states. Every additional constraint defines a further data format, which may fall short of being a substrate by not being effectively authorable. A programming language syntax restricts the text substrate to syntactically valid programs. Widely available editor feedback for syntax errors means that, e.g. Python syntax is a substrate. However, many syntactically valid programs are still semantically invalid programs; this suggests that from a semantic perspective, Python source code is not an effectively authorable substrate. (Perhaps no Turing-complete programming language is effectively authorable as so defined, because no editor could guarantee the absence of run-time errors. Is Haskell nearly a counterexample?)

A better example here is JSON: while editors can ensure that the file remains valid JSON, JSON is probably being used to encode some more constrained domain-specific structure with its own rules of validity (its “schema”). For example, the age field must always have a nonnegative integer value. By default, the JSON-aware text editor does not know what these rules are; in this case, the domain-specific data structures would not be effectively authorable and would not constitute a substrate.

Running parallel to the stack of text formats is the stack built on the non-textual (“binary”) base substrate. The substrate of raw bytes is effectively authorable (vacuously so, since all byte sequences are valid) via the hex editor, but each binary format does not necessarily come with an editor that respects its rules. Each binary format has its own “quantitative syntax” [Infra17] in terms of offsets and sizes, which restricts the set of valid states. There is a delicate relationship between different parts of the state, which will be broken by the insertion or deletion of a single byte; this contrasts heavily with text formats which are designed to be fairly robust to insertion/deletion. Yet in order to be useful, there clearly needs to be room for variation within the format: for example, the PNG format permits many different arrangements of coloured pixels. Because PNG editors are ubiquitous, PNG counts as a substrate.

The primitives of a binary substrate are individual bytes, machine words or fixed-size blocks; the composition mechanisms are arrays, nesting, and graph linkage, achieved by relative or absolute offsetting schemes. For example, a Squeak Smalltalk VM image is a binary file that contains an encoded object graph. This object graph is authorable using the Squeak VM.3 Further data structures and applications are created as subtrees and subgraphs, with their own validity rules. Whether or not these class as substrates depends on whether authoring tools are provided as part of the application within Squeak — the built-in Squeak inspection and debugging tools could be used to break these application structures (while still remaining valid Squeak object graphs), so such tools are not sufficient. Similarly, the heap of a Unix process is a graph of byte blocks containing pointers to each other, though it is not necessarily a structure observable as a file on disk.

The web page perhaps contains two substrates: the “structural” DOM and the “behavioural” JS object graph. The DOM is effectively authorable via the browser dev tools, while the JS object graph is crudely authorable via the console. However, the changes thus made are not persistent; this should probably be part of the definition of “effectively authorable”.

A contrast with HyperCard (something we can all surely agree was a substrate with good qualities) reveals further refinements to my definition: the web page is nothing like HyperCard because even its most basic elements (static text and graphics) cannot be directly manipulated in the page; edits must be made to its shadowy DOM tree instead. Alan Kay’s criticism of the web is that it didn’t bundle such an obvious authoring tool with the browser. A substrate must be writeable at a similar level of skill as it is readable.

(HyperCard may have been nice, but it was dependent on Apple and it went extinct. If this was due to a business decision made by Steve Jobs because of his unique personality, then we could consider HyperCard to have merely been unlucky, and suppose that something like it it could survive and thrive once again given an initial kick. On the other hand, if it vanished because of a sudden change in environment (e.g. competition with the Web) or due to some other robust historical fact, then this should give us caution. I would like to see explicit arguments on random chance vs. systematic factors on discussions of substrate extinction, instead of having them remain implicit assumptions. A respect for something like Darwinism is needed, as explored in Gabriel’s Worse-is-Better series [WIB90] and Kell’s Unix/Smalltalk comparison [Lurking18]; substrates that are more than research curiosities deserve some thought put into a survival-and-replication plan.)

Spreadsheets have numbers, text strings, and fill colours as their primitives. The composition mechanism is a flat 2D grid. There is also nesting via drawing thick borders around things, but this not so easy to manipulate programmatically; it’s more of a cue for the eyes. Perhaps the formula relations could be seen as a composition mechanism.

Normal applications have a failed substrate of bytes in the heap; there is no normal way to edit heap structures in a running process. If there were a format describing the heap structures (similar to what Kaitai Struct does for binary files; see also liballocs [liballocs15], which aims to improvise such a format from existing debug metadata) then one could conceivably feed it as a template into a hex editor, and edit valid structures into other valid structures. But in practice, the only read/write interface to heap data is the application’s user interface, many times indirected from it and heavily restricted in what it can do.

Mobile apps feel even more closed / locked-down than this. Something with very few degrees of freedom — e.g. a preferences pane in an app, with some sliders and drop-downs for specific high-level parameters of the user experience — may well be part of a useful tool for something, but it is not a substrate. A substrate is etymologically a stratum (layer) that sits sub (under) something, supporting a variety of combinations of its primitive elements.

Research Values

It might be good for morale to unite around words like “simplicity”, “elegance”, “human-friendliness” etc, but serious research needs a purely descriptive, non-moral account of concepts like these. If the things we want are so good, why do we not have them already? I would like to see explicit arguments (by no means exhaustive, just an outline) that yes, there is low-hanging fruit … and it has not been picked because X. As we move out to the Pareto frontier, i.e. the place where we have to make tradeoffs, I would like to see analysis of the inconvenient decisions that reality forces us to make.

That said, those of us in this workshop probably find certain tradeoffs untroubling. For example, I see mainstream practice as prioritising ease of implementation, at the cost of the user having to adapt himself to the software. This is partly because of the incentives and economics of industry, but also because we genuinely don’t know how to make open, convivial, malleable software — it’s a hard research problem. This means that making the opposite tradeoff — being willing to wrestle with difficult questions of design and implementation ourselves, over research timescales rather than business cycles — for the sake of openness / malleability, is a meaningful activity rather than a mere rallying flag.

Another example: malleability is often sacrificed for the sake of efficiency or performance. This may be necessary under the constraints of industry, but I expect we’re sympathetic to the opposite tradeoff: we’d prefer to prioritise human suitability first, on the assumption that anything can be optimised later by throwing enough engineering brainpower at the problem (see Self [Self89], for example). At Xerox PARC, they “designed for the hardware of tomorrow” — even if hardware performance stalls, I think it is still a safe assumption that any elegant design can be optimised with enough dedication.

However, those are the only two tradeoffs I can think of that my value system classes as “easy wins”.

One manifestation of “implementation effort” is in making it as easy as possible for an interested party to try out a substrate, understand its design, or contribute to it. I think it is worth considering ultra-convenience as an important value in today’s attention-saturating environment.

Ultra-convenience

I personally dislike installing things: if it doesn’t run in some web page, it doesn’t exist. I also dislike learning a system by trial and error: if there’s no readable or watchable outline of how the thing works, I have better things to do than take a random walk through the state space. Call me lazy, but it is probably a wise practical assumption that all attention will come from 2 dedicated true believers, a handful of their friends, and 10,000 easily distracted or busy people. This suggests value in studying memetics, or how ideas spread and mutate across people’s minds. Even “trivial inconveniences” [Trivial09] can have a disproportionate effect on how many people engage with your prototype. Therefore, consider optimising for ultra-convenience, or virality.

Before my time, magazines listed BASIC source code of simple games that one could type in, play, and modify. The modern equivalent might be where your audience can just copy some code into the JS console in a new tab. No install, no setup, works everywhere. I’m suggesting an extreme end of the tradeoff: be willing to work harder so as to require no “virtue” from the user at all.

I see programming as a way to test interesting ideas by forcing them to take on a precise, executable form. However, it doesn’t feel sensible to do this with with half-baked ideas. I prefer to discuss half-baked ideas with other people until they get to a form where the discussion can no longer help me; where the thing I need to find out cannot be proven or simulated by any person, and it can only be discovered through experiment. I prefer to read and write English prose with diagrams, or to watch an expert using a system they know well, than to prematurely write code that vastly fewer people will download. I admit, this is partly a problem of media: my wordpress.com blog template doesn’t easily provide for interactive JS demos, but a future Github site might. Nevertheless, this is a nontrivial technical investment that I have yet to make. I also suppose that if we did have a malleable “personal dynamic medium” that was as convenient as pen-and-paper or communicating with people, then I might do more with it instead.

Fragmentation and Convention-ology

There may be value in creating a new substrate, but what are the costs? If the substrate offers something genuinely novel, then it has clear value. If it is intended to reach only a small group of dedicated researchers, then this may be a sensible goal. For other purposes, we must be careful of the tendency in computing to duplicate essentially the same things over and over again in incompatible ways. We’ve surely all seen the “14 Competing Standards” XKCD. Stephen Kell introduced the term “fragmentation” [Lurking18] for a phenomenon that has precedent outside software: the “coordination problem” [Holy20].

The basic metaphor at work is that of a common language, and natural language is perhaps the oldest example. Geographically dispersed peoples, who lack contact via communication technology, must solve common human problems independently. However, many features of the environment are common to all of them: they certainly need to drink water and must have a word for it. A single tightly-bound tribe would need to converge on the same word (“water”, in English). The same mission is pursued by another far-away tribe, but there is no pressure for them to arrive at the same word as the first tribe. Even if they wanted to, they have no systematic means to do so because they are so distant. Because the space of possible words is so large, it is highly likely that they will settle on a completely different word (“mizu”, in Japanese). Repeat this process for the majority of words that people need.4 When the two groups later make contact, or if a third party is familiar with both of them, there will be a sense of artificial choice, or accidental complexity: which language should we use? Words are arbitrary conventions (barring exceptions like onomatopoeia), and collaboration between multiple agents is only possible on the basis of shared conventions.

This clashing of conventions occurs with many other things in the physical world: units of measure (such as weight, length, temperature, and money), electrical voltages and plug types, rail track gauges for trains, and whether cars drive on the left or the right. The software world appears to rely even more intensely on this dynamic; my best guess is because (a) computers demand vastly greater precision than fuzzy, ambiguous human perception, and because (b) we must often agree on numbers instead of names. Given two English speakers, it is often plausible that they will happen to use the same English word for a field name, but expecting them to independently hit on the same number (e.g. an enum index, a port, a byte offset) is a far poorer bet.

Alan Kay suggested that most advances in computing (which I will interpret as advances towards human-friendliness) were made on the back of late-binding technologies such as the MMU. I’m quite the cheerleader for late-binding, but I think that the victory of standards (often, de-facto) has been just as crucial. This is hard to notice, because we have no reason to think about the competing standards that predate our programming careers.

For example, I have grown up in a world of Unicode (and UTF-8 at that), so I have only rarely needed to convert from other encodings or diagnose bugs originating from mismatches. However, I understand that a lot of programmer hours used to be spent on such things, when there were many character encodings with no clear winner. I am grateful for the de-facto victory of Unicode and UTF-8, and I doubt that much of value was lost as a result. In an alternate history where some competitor to Unicode had won instead, I think I would be similarly grateful (unless it had major flaws incomparable to Unicode’s flaws). There cannot be that much virtue in any particular scheme for assigning human concepts to numbers; what matters is that we all use the same scheme.

Similarly, while I am aware of the flaws of QWERTY, I am grateful for the fact that there is one main keyboard layout that I need to be familiar with. The more I read about the history of the field, the more grateful I am to live under the winners of past battles: 8-bit bytes, the IBM PC architecture, BIOS, USB, the IP protocol, IEEE 754 floating-point, HTML5, WebGL, stdin/stdout, keyboard shortcuts, and so on.

I think the existence of multiple competing conventions is the most wasteful when:

Their domain is close to the machine, hence subject to objective engineering criteria rather than high-level human preferences or aesthetics.
They have equal size, power, market- or mind-share.

For (1), people can have different high-level preferences, beliefs, or goals. Disagreement and negotiation is legitimate on this basis. However, a clash of conventions is different: none of them have any particular claim to objective merit over the others. Each one is simply vying to become the standard that everybody uses. I would expect a group of competing low-level conventions to have long resolved the most legitimate engineering arguments, leaving merely different numbering schemes or something similar as the thing being fought over. As for (2), when one convention is much “bigger” (more standard) than the other, there is a clear winner and everyone can see it; its size functions as a natural attractor that could put an end to the war before too long. When there are many competitors with no decisive advantage, one is hesitant to commit to any of the technologies; N-1 out of the N will be the losers at some future point, which is not a promising investment.

I wonder, during that critical period before a decisive winner emerges, what forces end up pushing a convention past the critical point? What are the criteria of natural selection for conventions among human minds following their whims? In the past, governments have simply force-pushed a single standard (measures, money, driving side) on their citizens [SLAS20], but they still had to somehow choose one standard to force-push instead of the others that were floating around. In software, large companies and communities can succeed at forcing standards in a similar way. I suppose I am more interested in what happens “organically” before such an imposition occurs, and even before debate (since, after all, it is only a minority of each camp who debate each other, and even fewer who change their minds as a result): everybody follows their own incentives, a tipping point is reached, and then the losers have to adopt the emergent winner. What are the laws of such a dynamical system? I would like to see “convention-ology” studied (possibly with a better name) and for substrate enthusiasts to show that they are aware of the problems it presents.

(I’ve blathered enough about this; the online writer Gwern has a very detailed essay on this topic [Holy20] which might make a better jumping-off point for its study.)

Graphs of Dictionaries

It is remarkable how many encodings there are for the same basic recurring structure of “graph of key-value dictionaries”. C structs, Python objects (and dicts), JS objects, Smalltalk objects … they’re all the same basic conceptual interface, augmented on top with slightly different rules for name inheritance, encoded in very different arrangements of bytes. I will even include filesystem directories as another instance, albeit one with the official blessing of the operating system.

This represents a significant chunk of the fragmentation Kell observes under Unix [Lurking18]. By prescribing a common language (syscalls and the filesystem) only for “large objects” (files and processes), and remaining silent on how to structure or name the byte sequences inside, Unix ceded the software landscape to many nucleation sites for encoding conventions. Each of these grew independently and evolved its own take on the “graph of dictionaries” pattern, among other things (e.g. tooling and package management).

I would like to see substrates compared by how they differ from this basic common structure. Alternatively, it could be the case that most substrates share this underlying data model yet differ meaningfully in their authoring interfaces. Then again, I’m struggling to fit the spreadsheet into this model; it hardly seems like a graph of dictionaries internally or as a grid interface. Perhaps there are only three families of substrates: those based on dictionary graphs, those restricted to trees, or those based on N-dimensional arrays (flat character lists, vs. 2D grids, vs. 3D grids …)

Research Vision

To summarise what I suppose I’ve been hinting at throughout this piece, my current research interest is to follow Kell and bridge between those myriad fragmented conventions in the Unix hegemony of programming.

The Unixey substrate consists of a unified “large scope” (the directory tree, effectively authorable) and a million “small scopes” (binary and text file formats, binary formats of heap/stack data; not effectively authorable, or not entirely). Loader code parses binary data into directory-like trees. Parsers turn textual syntaxes into more of those trees; it’s all trees and graphs in the end. Kaitai Struct un-fragments the “target language” part for binary data (parse these bytes into C structs, a Python tree, a JS tree, etc). The ubiquitous substrate of the Filesystem, atop appropriate translation layers like Kaitai, could thus undo much of the fragmentation of binary file formats and binary heap data, in tandem with Kell’s liballocs [liballocs15]. In other words: expose the internal structure of “files” as just more directory trees [FAD18], and shrink the “file” concept to uncontroversially primitive data (individual numbers, names, booleans, and so on).

I’m currently working on Kaitai descriptions of the Squeak Smalltalk and Self VM image formats (see figure below). By eventually exposing their internal structure as a directory tree via FUSE, I’m optimistic that their embedded object graphs can be made browseable (and even authorable, to an extent) without relying on a running VM process. This approach could assimilate the “black boxes” of binary substrates (and even textual substrates, as unparsed character lists) under a single tree/graph view of everything.

Tree view of Self object slots in the Kaitai online IDE, parsed on demand from bytes in a Self image file. On the left is the root “lobby” object with its six slots. On the right is the long alphabetical list of all the slot names under `globals`. This is very good for me because I am daunted by the Self VM setup process.

Remember here that programmers are the users of programming systems; I remain skeptical that programmers and users have conflicting interests in general. If one’s job security depends on mastery of artificial difficulty, does one have an interest in keeping it difficult, or making it easier? ︎
Of course, binary is the true root and text is built on top of it. It’s still useful to think of them as two roots, because their descendants are disjoint in many ways. Text vs. binary data is more than a difference in alphabets; the big difference seems to be quantitative vs. qualitative syntax. ︎
Because Squeak is maintained for Windows, Mac and Linux and is easy to install, I see it as effectively authorable. This contrasts with Self, which seems quite complicated to set up and build/run; Self images are not effectively authorable for normal people. However, if I end up succeeding at exposing the Self object graph via a FUSE filesystem interface, then it will at least become effectively browseable; a step towards a substrate. ︎
Of course, people adapt to their local circumstances and their languages follow this. There are untranslatable terms for things that only exist in a particular culture. But this is a higher-order term; there are a great many concepts that are common to many communities: food, water, life, death, etc. ︎

References

[SICM15] Gerald Jay Sussman and Jack Wisdom. Structure and Interpretation of Classical Mechanics, 2015
[LazyUniverse17] Jennifer Coopersmith. The Lazy Universe: An Introduction to the Principle of Least Action. 2017
[Infra17] Christopher Kyle Hall. A New Human-Readability Infrastructure for Computing. 2017
[WIB90] Richard P Gabriel. Worse Is Better. 1990
[Lurking18] Stephen Kell. Unix, Plan 9 and the Lurking Smalltalk. 2018
[liballocs15] Stephen Kell. Towards a Dynamic Object Model within Unix Processes. 2015
[Self89] Craig Chambers, David Ungar, and Elgin Lee. An efficient implementation of SELF: a dynamically-typed object-oriented language based on prototypes. 1989
[Trivial09] Scott Alexander. Beware Trivial Inconveniences. 2009
[Holy20] Gwern Branwen. Technology Holy Wars are Coordination Problems. 2020.
[SLAS20] James C Scott. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. 2020
[FAD18] Raphael Wimmer. Files As Directories: Some Thoughts on Accessing Structured Data Within Files. 2018

http://programmingmadecomplicated.wordpress.com/?p=9108

Extensions

Ode to Id (Notes on COLA’s object model), Part N

Joel Mar 21, 2025

(Previously: Part 1, Part 2) (EDIT 2025-08-05: formalising all this stuff for a paper made me realise that Id’s multi-parent mechanism probably does NOT quite suffice to model Self! Will correct in future, for now, be warned…) Id only makes one assumption about existing objects, which is that each somehow has a binder object (“vtable”). … Continue reading Ode to Id (Notes on COLA’s object model), Part N

Show full content

(Previously: Part 1, Part 2)

(EDIT 2025-08-05: formalising all this stuff for a paper made me realise that Id’s multi-parent mechanism probably does NOT quite suffice to model Self! Will correct in future, for now, be warned…)

Id only makes one assumption about existing objects, which is that each somehow has a binder object (“vtable”). Exactly how this is represented is a matter of implementation: in the C version, it’s a pointer stored just before the object’s address; in my JS version, it’s a property. Id then makes this assumption mean something by including it in bind: to bind a selector in an object, a message is sent to its binder. The binder ends up being an analogue of a class, and multiple objects can share behaviour by having the same binder. Any methods in the binder are allowed to make assumptions about the structure of objects they will be invoked on. More specifically, any “lookup” methods in the binder are allowed to make further assumptions about the binders they will be invoked on. This suffices to model Smalltalk-like OOP.

In contrast, Self does away with the class/object distinction and consults the receiver object first, before moving on to its parents. As far as method lookup goes, this is identical to a class with multiple parents. Id provides a very elegant implementation of this in terms of merely adding new objects – let’s take a look.

Multiple parents in Id

The method vtable_lookup delegates the lookup message to the vtable’s parent if appropriate:

function vtable_lookup(self, selector) {
  const method = self.entries[selector];
  if (method) return method;
  else if (self.parent) return send(self.parent, 'lookup', selector);
}

Multiple parents can be supported as a list object that responds to lookup by sending it to each element until one returns a method. In other words, whether a vtable’s parent refers to a single object or multiple such objects is a late-bound property! No edits to vtable_lookup are necessary (besides ensuring that it delegates via a message send, rather than calling vtable_lookup directly) – all that is needed is to add the appropriate definitions to the system.

Self send semantics requires “resolver” access

It would be simple if Self was just the Smalltalk class level with multiple superclasses. I said that it is, insofar as method lookup is concerned, but this is not the general case in Self. Where Smalltalk objects can only share behaviour (via their classes), Self objects can also share mutable state. A Self “getter slot” mySlot maps a selector mySlot to an arbitrary object, not just a method. If the slot is mutable, it has a sister slot mySlot: mapping to the assignment primitive. Importantly, if we send this setter message mySlot: to an object, but find the slot in one of its parents, then we mutate the corresponding “getter slot” in the parent, not the original object. This is incompatible with the view of method invocation as having access to the private state of the receiver only: if the slot is nonexistent, or read-only, in the receiver to which we send mySlot:, then it is the private state of the resolver object (the one that actually answered the message) that needs mutating.

The left-arrow value is used in Self literature to mean the assignment primitive, while parent slots of an object end in the asterisk *. The receiver here has a single parent slot that is actually *called* `parent` for simplicity. Its `mySlot` is read-only because it doesn’t have the corresponding setter slot. Its parent has the setter, so it is the *resolver* object for the message, so its value of `mySlot` will change rather than the receiver’s.

Id 2.0: bind to closure containing a method

Piumarta & Warth model this in Id as follows. The simple model, discussed so far (which I will call Id 1.0) has vtables store methods that get directly invoked on the receiver. Their extended model (Id 2.0) wraps this method in a closure object, where in addition to the method, the closure stores an arbitrary data member. After binding a selector to a closure, its method is then invoked with access to that closure object along with the usual receiver and args. The closure parameter to the method is clearly serving as a proxy for the “resolver” vtable in which the closure was found.

Any system in Id 1.0 can be straightforwardly ported to Id 2.0, but the latter additionally lets us model the Self behaviour. The closure for a getter selector stores the value in its data field, while its method is just a universal getter method (which ignores the receiver and returns the closure’s data value). The closure for the corresponding setter selector points its data at the getter closure, allowing its method to be a universal setter method (which again ignores the receiver, and sets the getter closure’s data field to the argument).

Id 2.0 implementation of the above Self example.

Combined with the multiple-parent extension that works even in Id 1.0, this elegantly models Self almost completely. The class-like vtable has become the Selfy object. However, there remains a frustrating inelegance: we still cannot send messages to the Selfy vtable directly. We need to use an empty “dummy” instance of it.

This is a direct consequence of the definition of bind-via-send. No matter the selector, we always send “lookup” to the receiver’s binder:

β(o,s) = σ(B(o),λ,s)

Thus, when we naively send mySlot to the vtable that contains it, bind will send lookup to its vtable, which probably only contains the lookup method and no mySlot. To fix this, we could perhaps send “lookup” to the receiver by default … and only in the case of the lookup selector itself (lookup lookup) do we send to its binder:

β(o,λ) = σ(B(o),λ,λ)
β(o,s) = σ(o,λ,s)

This would allow us to use the vtable object directly as the “Selfy” receiver, but it would break the usual Smalltalk model. However, there is yet another way.

Selfy objects as self-binders

Given the definition of bind:

β(o,s) = σ(B(o),λ,s)

We can see that any self-binding object ω, for which B(ω) = ω, will bind a selector by sending lookup to itself. So we could just model Self objects as self-binders. Since a bind to such an object will eventually ground out in the object’s fixed-point lookup function Λ(ω), it is as if we have replaced a per-object binder B(ω) with a per-object lookup function Λ(ω)!

All Selfy objects will use the same Self lookup function, which starts at the object and searches its parents. But what about other self-binding objects?

Suppose that in fact all objects in the system are self-binding; we can easily recover the original Id object model. Simply give every object a new bind' as its lookup function; i.e. Λ(o) = β’, where β’ makes reference to a “real” binder object (now that the old one is useless, since B(o) = o for all objects):

β'(o,s) = σ'(B'(o),λ,s)

Clearly this is isomorphic to the original Id model. What this shows is that the original “Classy Id” model can be obtained from “Selfy Id” by setting all lookup functions to a bind_via_vtable function. Yet “Selfy Id” was obtained from “Classy Id” by making vtable (B) the identity function! They are clearly dual to each other somehow, neither being more fundamental because each can be derived as a “special case” of the other.

Essentially, Selfy Id mandates that each object have its own bind function. Classy Id mandates that each object have its own binder object.

This is my own flight of fancy. However, if there is a “proper” way to do Self in Id, it is probably the following…

Id 3.0: inheritance and delegation coexist

In his follow-up paper on “Efficient sideways composition” (2007), Piumarta extends Id 2.0 with a fully orthogonal delegation hierarchy consulted as a “plan B” by bind. The idea is that, should the receiver’s binder fail to return a closure for the selector, the receiver can delegate a new receiver by answering the delegate message, and begin the whole process again from there. The “receiver” from which the closure actually gets resolved is returned as the resolver object (“prototype” in the paper) alongside the closure. The closure’s method is then invoked with access to:

The closure, as in Id 2.0, acting as a proxy for the “resolver” vtable in the inheritance hierarchy which resolved the selector
The resolver / prototype object returned from bind, which acts as representative for the slice through the inheritance hierarchy in which the closure was found (which contains the resolver vtable) – this is called stateful_self and is used as the target of state mutations.
The original receiver to which the message was sent – this is called self and will be where sends to self begin, as in Id 2.0.
…and of course, the arguments.

Since we’ve added another message send (delegate) to the definition of bind, I’m wondering if there is another infinite regress we need to beware of. The code in the paper doesn’t contain a short-circuit clause like there was for vtable_lookup… but nor does it contain the short-circuit for vtable_lookup. Is it just a sketch or did Piumarta realise something I don’t?

I begin with β(o,s), suppose we don’t find the closure, so we send σ(o,δ), which requires binding δ, let’s suppose we don’t find a closure for that either… so we delegate for the “delegate” message…

β(o,s) = β(σ(o,δ),s)
σ(o,δ) = β(o,δ)(c,p,o,δ)
β(o,δ) = β(σ(o,δ),δ)
 = β(β(o,δ)(...),δ)
 = β(β(β(o,δ)(...),δ)(...),δ)
 = β(β(β(β(o,δ)(...),δ)(...),δ)(...),δ)
etc

… well, this would be a problem, if the paper didn’t specify that δ has a default closure in Object whose method just returns nil. So that means my premise that β(o,δ) doesn’t answer anything, never happens.

I digress from the main theme – what I want to know is, what implications does this have for modelling Self? It means that it can be modelled more directly than the Id 2.0 approach. stateful_self, aka the resolver prototype, is precisely the resolver object r in my Self diagram above. Ordinary Smalltalk getter/setter methods, which set instance state instead of data slots in vtable closures, can now implement Selfy “slots” whose data is stored in vtables’ instances. This would appear to obsolete the Id 2.0 approach of Selfy slots, leaving closures mainly as vehicles for mixed-mode execution (Fig. 19 in the original paper). Right? Yet that approach was so elegant. Can it coexist with the Id 3.0 approach for some other purpose, or is between-instance delegation meant to replace it?

As a final random note, P&W mention in the Related Work that Id’s vtables resemble the “map” optimisation in the Self VM, where similarly-shaped objects are stored in an optimised form and a class-like VM-internal “map” structure knows how slots map to memory offsets. There is even a “map map” analogous to the vtable vtable. I suspect that P&W must have been familiar with Self’s VM and got the vtable idea from maps, rather than convergent evolution. (Whereas, in my timeline, I discovered their work years ago, took my first steps with Smalltalk in just the last year, and have only just immersed myself in the sparsely-documented world of Self. See also: my ongoing Kaitai Struct formats for Self snapshot and Squeak Cog Spur VM image files.)

The difference there is that the “class” (map) concept is a hidden implementation optimisation, rather than something to worry about at the user level. Still, Id 3.0 perhaps fully realises this connection: prototypes with the same shape (clone families) defer to their “shape” (vtable) to respond to messages, but they can also delegate to each other, or differently-shaped objects entirely. Notably, this is via a late-bound message, unlike the parent pointer between vtables. Did they ever try making that a message too?

These mysteries await answers, but for the time being, my time has run out.

http://programmingmadecomplicated.wordpress.com/?p=8985

Extensions

Ode to Id (Notes on COLA’s object model), Part 2

Joel Feb 25, 2025

So far, we have this: Our as-yet undefined terms are VtableVT, vtable_lookup, and vtable (yes, I’m sticking to the paper’s name instead of vtablevt_lookup, I’m still not sure about that). We have already made the decision to define a “method” as a JS function taking the receiver as its first argument. Defining the vtable function … Continue reading Ode to Id (Notes on COLA’s object model), Part 2

Show full content

So far, we have this:

function send(obj, selector, ...args) {
  const method = bind(obj, selector);
  return method(obj, ...args);
}
 
function bind(obj, selector) {
  if (selector === 'lookup' && obj === VtableVT) return vtable_lookup;
  else return send(vtable(obj), 'lookup', selector);
}

Our as-yet undefined terms are VtableVT, vtable_lookup, and vtable (yes, I’m sticking to the paper’s name instead of vtablevt_lookup, I’m still not sure about that). We have already made the decision to define a “method” as a JS function taking the receiver as its first argument.

Defining the vtable function

Id is designed to be able to embrace and extend (and hopefully not extinguish) all existing data structures in your language. It gives all of them late-binding potential, if you want it. In C (i.e. portable assembly, with access to the raw memory that all PL runtimes use for their data structures), this extends to all data structures of all coexisting languages in the same process. Id/C places a vtable pointer in the word before a memory block; this way, a C++ object can receive messages via its “CppObject” vtable, a Python object via its “PythonObject” vtable, and so on; it’s a binary annotation that doesn’t touch their language-defined internals. Of course, if it’s not safe or desirable to use that implementation, others are conceivable: for example, a separate table mapping memory addresses to the vtable they use. (It is all eerily reminiscent of Kell’s liballocs, perhaps yet another example of Unix/Smalltalk convergent evolution, and this deserves looking into.)

Sadly, JS inherited Java’s design flaw of splitting off “primitive” types from objects. It is very close to a uniform “everything is a dynamic object” modality, but still manages to fall short. Id/JS allows us to re-homogenise and undo this design flaw, to a large extent. We will begin with the simple case: we’ll implement the vtable function via a property on objects that support it.

function vtable(o) {
  return o.vtable;
}

Easy enough. What determines whether something can have properties? Mostly typeof, but it’s a bit of a mess:

When typeof o is object, it’ll work – unless o is null! (Which is still reported as object…)
When it’s function, it’ll work.
When it’s number, boolean, string, or undefined, it’ll blow up.

With Id/JS, we can clean up this mess. We’ll have vtables Boolean, Number, String, Null, and Undefined, and organise them under a global path vtables.primitive:

vtables = { primitive: {} };

function vtable(o) {
  if (typeof o === 'object' && o !== null || typeof o === 'function') return o.vtable;
  else return vtables.primitive[typeof o];
}

We’ll initialise those vtables later. So far, we have a read-only vtable function. Because a JS function vtable() is less prematurely-committed than writing directly to obj.vtable = ..., we want to use the former for mutation. Note that we won’t be able to change the vtable of a primitive, but I don’t think that’s so bad since they’re immutable anyway.

// vtable(o) returns o's vtable, vtable(o,v) sets it to v
function vtable(o, new_vt) {
  let curr_vtable;
  // First, get the current value of the vtable function
  if (typeof o === 'object' && o !== null || typeof o === 'function') curr_vtable = o.vtable;
  else curr_vtable = vtables.primitive[typeof o];
  // Next, mutate it if applicable
  if (new_vt === undefined) return curr_vtable;
  if (typeof o === 'object') o.vtable = new_vt;
}

Defining vtable_lookup

VTableVT’s lookup method is going to define the default vtable structure. It’s nice and simple: just store methods in a wonderful JS dict object called entries (yes, we could use Map, this is for simplicity). For delegation between vtables, we’ll impose a parent property.

function vtable_lookup(self, symbol) {
  let method = self.entries[symbol];
  if (method !== undefined) return method;
  if (self.parent !== undefined)
    return send(self.parent, 'lookup', symbol);
}

Unlike with the vtable() function, I’m not too worried about committing to dot-notation here; I expect this private state, internal to the default vtable implementation, to only be used by the code of one or two methods. In contrast, vtable() could be everywhere.

Define VtableVT

Firstly, because VtableVT just means “the vtable for vtables”, I will refer to it in the code as vtables.Vtable or Vtable for short. Second, the Id object model specifies that there is a vtable called Object which is the root of the parent delegation chain. Even Vtable delegates to Object; that way, adding a method to Object really will give every single object that method. Yet Object, being a vtable, has Vtable as its vtable … meaning that they both presuppose each other. We’ll come back to that in a moment.

Initialised state of a minimal, simplified Id/JS. It sure would be nice to have this diagram be the source code of the system! With AI perception, a PNG could do, but with SVG I’m optimistic we don’t even need AI. See a future post called “Self-Raising Diagrams” … I tried something related in my first PhD year but it didn’t work as well as I wanted. (Edit 2025-08-07: early experiments are promising! See repo.)

As for methods, Id specifies a minimal default set for all vtables (hence they reside in Vtable). Unfortunately, I find them hard to understand until I rename them (beware that this may be because I’ve subtly misunderstood in some way!) so here they are:

vt newInstance (originally allocate) creates a new object whose vtable is vt. If you interpret vt as a class, then it creates an instance of it.

function vtable_newInstance(self, name) {
    let new_obj = { name };
    vtable(new_obj, self); // "The new object is an instance of me."
    return new_obj;
}

(Just for convenience, I add a name property for each object, passed as an extra param to the new* methods, to aid in debugging. However, I’ll sometimes omit this for simplicity.)

vt newDelegatingToMe (originally delegated) creates a new vtable of the same “type” as vt, which delegates to vt as its parent; in other words, it returns a child vtable. If vt is a class, then this creates a subclass; if vt is a clazz, then it creates a subclazz! (Is this true? It commits to the default Vtable fields, so I’m not sure about this.) Internally, it uses newInstance to get a blank object and then adds structure. Note that if invoked on undefined, it calls newInstance on undefined – this is there to simplify the later bootstrapping of Object and Vtable.

function vtable_newDelegatingToMe(self, name) {
    let myCreator = self === undefined? undefined : vtable(self);
    let child_vtable = vtable_newInstance(myCreator, name);
    child_vtable.parent = self;
    child_vtable.entries = {};
    return child_vtable;
}

vt addMethod name, method sets the entry in the vtable…

function vtable_addMethod(self, symbol, method) {
    self.entries[symbol] = method;
}

vt lookup name retrieves it, consulting parents if necessary. We’ve already seen it but I’ll repeat it here:

function vtable_lookup(self, symbol) {
    let method = self.entries[symbol];
    if (method !== undefined) return method;
    if (self.parent !== undefined)
        return send(self.parent, 'lookup', symbol);
}

So Vtable shall be a vtable with those four methods inside it. We could create it by directly invoking the vtable_newDelegatingToMe on Object; note that this is not the same as sending a message, since in order to send a message we still need to define Vtable (it is recognised in order to short-circuit bind). Of course, we first need to somehow define Object. We can invoke vtable_addMethod four times to add the methods (including vtable_addMethod itself). Finally, we can manually set its vtable pointer to itself. That would be the simple way… but would it be impressive?

Bootstrapping shenanigans

Instead of that, the paper does something far more interesting: it bootstraps the object model into existence by sending messages as early as possible. To understand how, it helps to presuppose the existence of the system, and then ask how it would re-construct itself entirely via messages. The following is how we’d like to express it:

Object := nil newDelegatingToMe.
Vtable := Object newDelegatingToMe.

Vtable addMethod 'lookup', vtable_lookup.
Vtable addMethod 'addMethod', vtable_addMethod.
Vtable addMethod 'newInstance', vtable_newInstance.
Vtable addMethod 'newDelegatingToMe', vtable_newDelegatingToMe.

On line 1, we create a new Vtable, and call it Object. nil is to be understood as a special value so that Object can be at the top of the parent chain; we wave our hands and assert that vtable(nil) = Vtable so that it will work as intended. (I was tempted to complicate this by explicitly using undefined whose vtable is Undefined, but then I’d have to create Undefined by this point, and it would stray too far from what’s in the paper.)

On line 2, we create a new “subclass” of Object. Object is-a Vtable, so the new thing is-a Vtable. In fact it is Vtable, which is an instance of itself In terms of time travel, this is precisely the Vtable that made the previous line work.

We then add the methods, in a very deliberate order. To see why this is the case, let’s list the features each line depends on:

Object := nil newDelegatingToMe. [send, bind, lookup, newDelegatingToMe, Vtable, nil]
Vtable := Object newDelegatingToMe. [send, bind, lookup, newDelegatingToMe, Object, Vtable]

Vtable addMethod 'lookup', vtable_lookup. [send, bind, lookup, addMethod, Vtable]
Vtable addMethod 'addMethod', vtable_addMethod. [send, bind, lookup, addMethod, Vtable]
Vtable addMethod 'newInstance', vtable_newInstance. [send, bind, lookup, addMethod, Vtable]
Vtable addMethod 'newDelegatingToMe', vtable_newDelegatingToMe. [send, bind, lookup, addMethod, Vtable]

Since every line is a message send, it requires send(), which requires bind(), which requires the lookup message to be installed in Vtable, which requires Vtable to exist. However, I do not believe Object is thereby required; knowing what we do about the system, as long as the previous requirements are met, we shouldn’t experience any delegation past Vtable to Object. (If we were sending a message that wasn’t understood by Vtable, then we’d hit Object – but in the presupposed system state, all of the messages are understood by Vtable.) What this means is that, in a world that doesn’t already contain the working system, message sending is only truly possible after lookup gets installed on line 4.

So line 5 is the first that can actually be implemented as a call to send() (yes, this means lines 4 and 5 can be swapped). Alas, this is a hollow victory, because the purpose of line 5 is to add the addMethod method … via the message addMethod. Its success depends on itself! So the sending of this particular message is only possible by line 6, whereafter we send it to add newInstance and newDelegatingToMe; sadly far too late for the first two lines to use them.

Listing it in reverse order, we can “compile” it down to something that doesn’t require time loops:

// Vtable addMethod 'newDelegatingToMe', vtable_newDelegatingToMe.
send(vtables.Vtable, 'addMethod', 'newDelegatingToMe', vtable_newDelegatingToMe);
// Vtable addMethod 'newInstance', vtable_newInstance.
send(vtables.Vtable, 'addMethod', 'newInstance', vtable_newInstance);
vtable_addMethod(vtables.Vtable, 'addMethod', vtable_addMethod);
vtable_addMethod(vtables.Vtable, 'lookup', vtable_lookup);

// Option  1: define Object first
vtable(vtables.Vtable, vtables.Vtable); // loop it round
vtable(vtables.Object, vtables.Vtable); // Object is-a Vtable
vtables.Vtable = vtable_newDelegatingToMe(vtables.Object, 'Vtable');
vtables.Object = vtable_newDelegatingToMe(undefined, 'Object');

// Option 2: define Vtable first
vtable(vtables.Object, vtables.Vtable); // Object is-a Vtable
vtables.Object = vtable_newDelegatingToMe(undefined, 'Object');
vtable(vtables.Vtable, vtables.Vtable) // loop it round
vtables.Vtable = vtable_newDelegatingToMe(undefined, 'Vtable');

Reading this bottom-to-top (I prefer Option 2), we have a bootstrapping sequence for the system. It’s not quite clear if there is an advantage beyond the aesthetic; we might be tempted to think it’s less fragile to changes in the implementation of send() etc, but I think that the resulting sequence was so heavily reliant on precisely what calls what in what order, that it would need to be re-compiled if any of the definitions changed.

As an alternative to all that, one could just snapshot the final state of Vtable and Object and initialise them without any method calls at all:

/*
Vtable = {vtable: Vtable, parent: Object, entries: {
    lookup: vtable_lookup,
    addMethod: vtable_addMethod,
    newInstance: vtable_newInstance,
    newDelegatingToMe: vtable_newDelegatingToMe
}}
Object = {vtable: Vtable}
*/

vtables.Vtable = {entries: {
    lookup: vtable_lookup,
    addMethod: vtable_addMethod,
    newInstance: vtable_newInstance,
    newDelegatingToMe: vtable_newDelegatingToMe
}}

vtable(vtables.Object, vtables.Vtable)
vtable(vtables.Vtable, vtables.Vtable)

However, this can be seen as a trace of the side-effects caused from the bootstrapped message-sending version invoking the methods as we defined them. So unfortunately, it’s completely fragile to changes in those code blocks.

Adding the primitive vtables

Now is the time to finally define the vtables.primitive dict we used in our definition of vtable():

vtables.primitive = {};
vtables.Primitive           = send(vtables.Object,    'newDelegatingToMe', 'Primitive');
vtables.primitive.boolean   = send(vtables.Primitive, 'newDelegatingToMe', 'Boolean');
vtables.primitive.number    = send(vtables.Primitive, 'newDelegatingToMe', 'Number');
vtables.primitive.string    = send(vtables.Primitive, 'newDelegatingToMe', 'String');
vtables.primitive.object    = send(vtables.Primitive, 'newDelegatingToMe', 'Null'); // remember, typeof null = 'object'
vtables.primitive.undefined = send(vtables.Primitive, 'newDelegatingToMe', 'Undefined');

I would apologise for also defining a capital-P Primitive vtable next to the primitive dict, but I do think it’s justified. The reason it’s important is because, although we add a name property to all objects created via newInstance, primitives won’t have this property. If we don’t have a Primitive superclass, and we want to give all objects a response to the log message, we’ll have to override it separately in each of the primitives’ vtables. With a Primitive superclass, we can just do this:

// Primitive addMethod 'log' { self => {console.log(self);} }.
send(vtables.Primitive, 'addMethod', 'log', self => {console.log(self);});
// Object addMethod 'log' { self => {console.log(self.name);} }.
send(   vtables.Object, 'addMethod', 'log', self => {console.log(self.name);});

Which means we get this:

send(3, 'log'); // 3
send(vtables.Object, 'log') // Object
send(undefined, 'log') // undefined
send(null, 'log') // null
send(vtables.Vtable, 'log') // Vtable

Voilà – 3 is now an Object, as it should be, with an open and user-extensible set of methods. (Sure, it’s still immutable, but I’m pretty sure it’s that way in Smalltalk too.) We have re-unified the JS object model with Id/JS!

The big problem is that it’s slow, owing to the enormous overhead of bind-via-send. Luckily, we can fix this by following the optimisations described in the paper (symbol interning and caching). This ethos, pioneered particularly by Self, is – sorry – simply the correct way to do programming system design: make it uniform, flexible and humane first, and then insist that smart engineers find a way to dynamically optimise it without sacrificing the user experience (not to mention the fact that the hardware of tomorrow is likely to outperform that of today). They managed to do that for Self and then most of the same guys went and did it for JS. What a pity to see the road that many other languages took; at least Id offers us hope to escape them, should we find ourselves in such misfortune.

http://programmingmadecomplicated.wordpress.com/?p=8973

Extensions

Ode to Id (Notes on COLA’s object model), Part 1

Joel Feb 18, 2025

Show full content

“Open, Reusable Object Models” (Piumarta & Warth 2006) is my most beloved paper of all time. I discovered STEPS half-way through my final year of undergrad and was immediately enthralled by the vision of programming presented for “COLAs“, half of which is described by this small tech report. I remember being impatient to finish my degree so I could study it further. Many of my research interests and much of my PhD are outgrowths of its themes, as I understand them. I’ve now had 7 years of occasionally returning to re-read it with fresh eyes, and I understand something new each time. In the past year, I’ve gained experience with real running Smalltalk (Squeak) and I’ve been learning about other approaches to late-bound OO (Self, Newspeak). On my most recent reading, I finally grokked stuff that wasn’t clear to me previously, which I’ll share here. First, an introduction:

Id is platform-relative

The Id object model is a way to bolt on late-bound object/messaging behaviours in any (non-esoteric) programming language. The model is relative to your chosen implementation platform: it defines a structural layer in terms of the platform’s own data structuring mechanisms, which is then used to define “sending a message” which will ultimately bottom out in the platform’s code blocks (usually its notion of a “procedure”). In other words, Id is wholly parasitic upon the platform’s mechanisms for expressing code. This is why it is just an “object model” instead of an OO language or system. The COLA design augments it into such a system via a Lisp-like code language, but I was always very weak on that part, so I stick to the object model.

Here, I’ll use JavaScript as the example implementation platform. Accordingly, we will build (my interpretation of) Id/JS.

Sending via binding

The goal is to end up with a Smalltalk-like message sending mechanism in JS. That way, everything can happen by sending messages to objects, where an object may respond to a message however it wants. All of your JS code gets partitioned into many small methods. When an object is sent a message, one of these methods is somehow selected and executed with privileged access to the object.

The root goal of Id is to define a JS function called send() to accomplish this. The basic idea is that a message sent to receiver object o with payload payload will reduce to calling js_function(o, payload), where js_function is a method that has been selected one way or another. Inside js_function, the first parameter which is bound to o would customarily be named self. (We can’t use this, because that’s reserved for JS’s native OO – which we’re ditching in favour of something that was designed properly )

Such a “selection” process is called binding: we use some information available at send-time, to obtain a method. This information should at least include the receiver o, since different objects are allowed to behave differently in response to the same message. As for the rest of the information used to select the method, we could tautologically call it the selector. In practice, this selector is just a string acting as a “message name”, but it is concievable that the selector could be a composite entity (to allow for Lispy multiple dispatch). Thus, binding takes a receiver object and a selector to a method, and constitutes the first half of a message send:

function send(obj, selector, payload) {
  const method = bind(obj, selector);
  return method(obj, payload);
}

Syntactic compromise & sugar

Permit me a momentary digression into syntactic considerations. Smalltalk uses the following “prepositional keyword arguments” notation for message sends:

myObject doSomethingWith: myArg1 and: myArg2 andAlso: myArg3.

The compiler extracts the following selector string from this call: doSomethingWith:and:andAlso:. It then generates bytecode to send this message to myObject with arguments myArg1, myArg2, and myArg3. Note that keyword arguments still have a canonical order in Smalltalk; reordering them would imply a different message selector (eg. doSomethingWith:andAlso:and:).

In JS, we can try and use dictionaries as an ad-hoc attempt at keyword arguments: send(myObject, 'doSomething', {_with: myArg1, and: myArg2, andAlso: myArg3}). However, this has many pitfalls (we can’t use the word and twice like we can in Smalltalk; we can’t use reserved words like with or in, which uglifies any workarounds) and relying on a canonical order for the dictionary is tricky. I don’t want to deal with all that, so we’ll compromise between the two for our supra-JS messaging layer. We will notate a message send in “Smalltalk-ish” notation like so:

myObject doSomethingWith myArg1, myArg2, myArg3.

This will correspond to the JavaScript

send(myObject, 'doSomethingWith', myArg1, myArg2, myArg3);

We accordingly refine our definition of send:

function send(obj, selector, ...args) {
  const method = bind(obj, selector);
  return method(obj, ...args);
}

For later convenience in reasoning about this, we will also define an “Id/Maths” version relying on currying and expression substitution instead of JS procedures:

σ(o,s,a) = β(o,s) (o,a)

(Send is σ, bind is β, and arguments are objectified into a single term a.)

Binding via sending

Across Smalltalk-like OO systems, a send decomposes into two steps: method lookup (or “binding”) and method execution (usually by creating a method “activation context” object). In Id/JS, we leverage the JS stack for method execution, so that’s already done. (“The stack” is really just an optimised early-bound version of the tree of method activations, anyway.)

However, in all OO systems I am aware of besides Id, bind() is a hardcoded VM primitive. Binding always works the same way across all objects. Id’s genius contribution is to define bind by means of a further message send: to bind a selector in a receiver object, we obtain the receiver’s Binder Object (my term; the paper calls it a “vtable”) and send a special “lookup” message to that:

function bind(obj, selector) {
  return send(binder(obj), 'lookup', selector);
}

Mathematically:

β(o,s) = σ(B(o),λ,s)

What this means is that there can be all sorts of different binding semantics (parent delegation, multiple parents, procedural generation…) coexisting in the same system, and available at the user level. All we need is to posit some minimal meta-structure on objects (namely, that each object has a Binder object) and now method binding itself is a late-bound response to a message!

Thus are send and bind mutually recursively defined; an elegant yin/yang of OOP, just as eval/apply (or if you prefer, normalise/reduce) are to Lisp.

There’s just one problem…

*record scratch*

…they cause a stack overflow.

Fixed-point fanciness

As written so far, this mutual definition of send and bind lacks any sort of base case, so the JS implementation very quickly spirals out of stack space. Let’s follow along in Id/Maths to see how the sequence proceeds:

Definition:
σ(o,s,a) = β(o,s) (o,a)
β(o,s) = σ(B(o),λ,s)

Sample send (eliding method invocations):
σ(o,s,a)
= β(o,s) ...
= σ(B(o),λ,s) ...
= β(B(o),λ) ...
= σ(B(B(o)),λ,λ) ...
= β(B(B(o)),λ) ...
= σ(B(B(B(o))),λ,λ) ...
= σ(B(B(B(B(o)))),λ,λ) ...
= σ(B⁵(o)),λ,λ) ... 
= σ(Bⁿ(o)),λ,λ) ...

We see that, from two “hops” of the B-function, we are forever sending lookup 'lookup' to further and further binder objects. It appears that we must do an infinite amount of binding before we can actually execute anything!

These sorts of problems are solved by closing the loop at some point: suppose that there exists a special object ω which serves as its own binder object: B(ω) = ω. In other words, it is a fixed point of the B-function. Further suppose that our original receiver o eventually reaches ω through its binder chain. Then our send/bind co-evaluation will eventually hit ω and settle, reducing between σ(ω,λ,λ) and β(ω,λ) forever. This means that, if we can just define this specific β(ω,λ) without referring to σ, we will have a base case and the entire thing will be finite.

Essentially, the special object/selector combination (ω,λ) forms a “free” parameter that must be additionally supplied to define the system, reminiscent of the “+ C” when you solve a differential equation. There is one of these special “fixed-point lookup methods” for every such self-binding object ω; let us call it Λ(ω). Our revised definition of bind is now:

β(o,λ) = Λ(o) whenever B(o) = o; otherwise,
β(o,s) = σ(B(o),λ,s)

function bind(obj, selector) {
  if (selector === 'lookup' && binder(obj) === obj) return fixed_point_lookup_method(obj);
  else return send(binder(obj), 'lookup', selector);
}

Id simplifications

The Id object model presented in the paper does not operate at quite such a general level; instead, its definition of bind assumes a single self-binding object called the “vtable vtable” (VtableVT) as the terminus of all B-chains (vtable-chains). Hence there is a single fixed-point lookup method vtable_lookup. However, Id’s bind does not merely return it; instead it returns the result of calling it:

function bind(obj, selector) {
  if (selector === 'lookup' && obj === VtableVT) return vtable_lookup(VtableVT, 'lookup');
  else return send(vtable(obj), 'lookup', selector);
}

I would have thought that returning vtable_lookup is simpler. True, in the fully-initialised Id system, VtableVT’s lookup entry does point to vtable_lookup, so the result should be the same. But is this another “axiom” that isn’t implied by the other rules? Abbreviating VtableVT by v, we have the following endless expansion:

σ(v,λ,λ) = β(v,λ)(v,λ) = σ(v,λ,λ)(v,λ) = β(v,λ)(v,λ)(v,λ) = ... = β(v,λ) (v,λ)+ = σ(v,λ,λ) (v,λ)*
β(v,λ) = β(v,λ)(v,λ)* = σ(v,λ,λ)(v,λ)*
So actually σ(v,λ,λ) = β(v,λ)(v,λ)* = σ(v,λ,λ)(v,λ)*
And since β(v,λ) = Λ(v):
σ(v,λ,λ) = Λ(v)(v,λ)* = σ(v,λ,λ)(v,λ)*
β(v,λ) = Λ(v)(v,λ)* = σ(v,λ,λ)(v,λ)*

The implementation of bind in the Id paper expresses β(v,λ) = Λ(v)(v,λ), but it seems to me that this is just a slower way of expressing β(v,λ) = Λ(v) with no gain in generality.

In our JS code, we’ll proceed with Id’s simplifications and its “vtable” terminology (lifted from C++’s “virtual table“). However, I find it interesting to be aware of the more general case, so I’ll stick to my more general terminology (binder) for mathematics and musings.

Classes and metaclasses

The intent of the vtables/binder objects is to act as the class (i.e. shared behaviour) of multiple objects which all bind selectors to the same methods.

If b = B(o1) = B(o2):
Let m = σ(b,λ,s)
σ(o1,s,a) = β(o1,s) (o1,a) = σ(b,λ,s) (o1,a) = m(o1, a)
σ(o2,s,a) = β(o2,s) (o2,a) = σ(b,λ,s) (o2,a) = m(o2, a)

The basic assumptions of the model are that the receiver has a binder object, and that this binder object responds to the “lookup” message. (This “lookup” message could have been called “bind”, but then things would get even more confusing than they already are…)

Things get more interesting when we follow the process another step. Suppose b1 and b2 are “classes”, while b3 and b4 are “clazzes”. A Class just queries a dictionary data structure for a method matching the selector. A Clazz procedurally generates its methods using the selector as a seed!

If B(b1) = B(b2) = Class
&  B(b3) = B(b4) = Clazz
σ(b1,λ,s) = β(b1,λ)(b1,s) = σ(B(b1),λ,λ)(b1,s) = σ(Class,λ,λ)(b1,s) = dict_lookup(b1,s)
σ(b2,λ,s) = β(b2,λ)(b2,s) = σ(B(b2),λ,λ)(b2,s) = σ(Class,λ,λ)(b2,s) = dict_lookup(b2,s)
σ(b3,λ,s) = β(b3,λ)(b3,s) = σ(B(b3),λ,λ)(b3,s) = σ(Clazz,λ,λ)(b3,s) = gen_method(b3,s)
σ(b4,λ,s) = β(b4,λ)(b4,s) = σ(B(b4),λ,λ)(b4,s) = σ(Clazz,λ,λ)(b4,s) = gen_method(b4,s)

β(b1,λ) is read as “bind ‘lookup’ for b1”. It can be seen as answering the question: “how does b1 lookup?” The answer is σ(Class,λ,λ): b1 is a Class, so it looks up like a Class. This “lookup like a Class” method is then applied to b1 to look up the original selector s. A similar story holds for b2. On the other hand, o3 and o4 are Clazzes, so they look up like a Clazz, and generate their instances’ methods from the selector using, uh, Perlin noise or something.

Class and Clazz could each be a self-binder, but they don’t have to be (note that if Clazz were a self-binder, I think its “lookup lookup” could be exempt from its weird procedural generation, because it would be an externally-supplied fixed point method). If we are to be consistent with Id, which posits only one self-binder (VtableVT), then it suffices to put both of them directly below it:

B(Class) = B(Clazz) = VtableVT
Let m = vtable_lookup(VtableVT,λ)
σ(Class,λ,λ) = β(Class,λ)(Class,λ) = σ(VtableVT,λ,λ)(Class,λ) = vtable_lookup(VtableVT,λ) (Class,λ) = m(Class,λ) = dict_lookup
σ(Clazz,λ,λ) = β(Clazz,λ)(Clazz,λ) = σ(VtableVT,λ,λ)(Clazz,λ) = vtable_lookup(VtableVT,λ) (Clazz,λ) = m(Clazz,λ) = gen_method

I think m can be interpreted as “lookup like a vtable”, since Class and Clazz are vtables. Meanwhile, despite what we saw under “Id simplifications”, it doesn’t actually seem necessary to require that vtable_lookup(VtableVT,λ) = vtable_lookup. I think vtable_lookup, i.e. the fixed-point lookup function for VtableVT, should actually be read as “how the VtableVT looks up”. So I’m tempted to rename vtable_lookup to vtablevt_lookup, and then m is an ordinary vtable_lookup. But I thought I proved that vtable_lookup(VtableVT,λ) = vtable_lookup by necessity? Why does it look like this won’t break anything? My head hurts. I’ll leave it there for now.

Until next time, here’s what we’ve arrived at:

function send(obj, selector, ...args) {
  const method = bind(obj, selector);
  return method(obj, ...args);
}

function bind(obj, selector) {
  if (selector === 'lookup' && obj === VtableVT) return vtablevt_lookup;
  else return send(vtable(obj), 'lookup', selector);
}

http://programmingmadecomplicated.wordpress.com/?p=8538

Extensions

People Whose Research Programmes I May Need To Understand

Joel Feb 10, 2025

They said that, after my PhD, I should be an expert in my narrow topic. I suppose that’s true, if we’re talking about precisely the topic embodied in the dissertation. However, from the additional literature suggested to me during the corrections process, and my freer scope since submitting, I’ve been humbled to discover just how … Continue reading People Whose Research Programmes I May Need To Understand

Show full content

The sheer amount of stuff I need to understand (and read multiple times), in order to make proper scholarly progress towards my goals, is formidable. Considering that my dissertation was supposed to be more than just a literature review, I simply didn’t have time, so I can be forgiven for limiting the scope. But now, it feels like a good opportunity to pursue the leads opened up from my corrections.

I emphasise the scholarly mode of progress, as opposed to the “hobbyist” (for want of a better word). In the years before my PhD, I was the hobbyist researcher, pursuing it in whatever time was left over from my undergraduate studies, or later my programmer job. Learning about all the prior work in this available time would have taken me years, so instead I focused on practical experimentation without too much concern that I was duplicating something someone else already did. However, during my PhD and now my postdoc, catching up with history became more feasible. It was cut short during my PhD due to having to develop my BootstrapLab prototype and actually write the dissertation. In my current life as a postdoc, it feels like I still don’t have enough time to catch up on everything relevant to my interests (such is the nature of “everything that has been done” over the last 45 years or so), and I still need to actually write papers and try out ideas. Still, I’m now able to do a decent job at discovering what’s been done and systematically building on it, instead of the un-situated work that I started with.

Here, I’ll list some of these “research programs”. Of course, most/all of these research programs don’t have existing names, and they tend to revolve around a single person’s ideas or leadership. So I think it makes the most sense to refer to a single person’s name which I associate with the research program – mentally append “et al” to be technically correct. In the course of writing this post, I’ve gone down some of these rabbit holes and not others, so some info will be informed while others will be speculation.

Can’t ignore Smalltalk any longer

There’s this meme that everything interesting in programming was already achieved by Lisps and Smalltalks through the 1980s. I always invoked it tongue-in-cheek, but towards the end of my dissertation writeup I was struck by how true it actually is, and shocked at the implications for the field. Whaa?

The first hurdle to actually trying out Smalltalk for the first time is isomorphic to trying out Linux: wait, there are all these “distros” and I have to choose one? With Linux, you really want to avoid wasting time installing an operating system that you later regret committing to, because it turns out that it doesn’t support your hardware or application or feature that you rely on. However, things aren’t so bad for installing Smalltalk as a “enclave OS” within the Omnix hegemony. Off the top of my head, I’ll compare Squeak to Ubuntu (de facto winner), Pharo to Red Hat (made for industry), and Cuis to Alpine (simplified/minimalist). I probably got that wrong but whatever.

Sometime in 2018, I decided to try Squeak. After installing, I faced the second beginner hurdle: what I call the Splat Effect. Imagine your grandma facing a computer for the first time, or being thrown into an aircraft cockpit with no training, or being presented with Bret Victor’s unlabeled microwave. You open your eyes and … SPLAT! “Here’s this novel environment. We’re not gonna give you any pointers. Try hitting random things to understand how it works”. Uh, no thanks, I have some paint drying that needs my attention. I promptly gave up.

Fast forward to last July, when I visited the Software Architecture Group at HPI – a research group that I consider to be the closest surviving embodiment of the themes of STEPS (and in the same city that hosted the two workshops on self-sustaining systems, regrettably when I was only a child, and in my phase of C++ worship at that…!) Hirschfeld et al. gave me a warm welcome and demoed some key systems I had never been able to break into: notably Lively Kernel, but also Squeak. This was exactly the push I needed. I returned very impressed and eager to follow Squeak By Example, the de facto tutorial that I somehow hadn’t found last time. Shortly afterwards, I initiated myself by plunging into the deep end: my first project was to port the source code of ThingLab II from 1989 Smalltalk to modern Squeak. The process of debugging and fixing in-system (and then carefully copying my changes to the big source file) was, in line with expectations, largely devoid of the accidental complexities I’d face in traditional programming systems, and was the best debugging experience I’d ever had.

So, to anyone similarly discouraged by the Splat Effect, I’d highly recommend the Squeak By Example book or Lawson English’s video tutorials (which can give you a feel for the system by just passively watching him; I have dreams of doing an updated video crash course, but who knows when I’ll get the time – and I’d want to be a bit more experienced first). However, to save you even the time of going through those, I will list the most important basics here (most of which are also my favourite features):

What savages call “the Terminal” is split into the Transcript (output) and the Workspace (input). Both are just editable text windows. Selecting text and right-clicking gives the critical operations, as follows…
Do it (Ctrl-D) executes the selected code. Debug it (Ctrl-Shift-D) opens a debugger window at the beginning of execution. Print it (Ctrl-P) does it and inserts the stringified result at the current location in the text window. Inspect it (Ctrl-I) does it and opens an object inspector window on the result. Browse it (Ctrl-B) does it and opens a class/method browser on the class of the result. Being able to do even one of these things to any selected text is a rarity in programming elsewhere – having all five of them is an entirely new plane of convenience and I never want to go back.
The Inspector lets you view and edit all the private instance variables of an object; there’s also a text field for executing code where self is bound to the object.
The Browser lists class categories (essentially a single-level packaging system, no hierarchies), followed by class names, followed by method groups (“protocols”) within the class, followed by methods in the group. The bottom text box contains class metadata / documentation / method code. It is editable; changes are committed on Ctrl-S. The browser has buttons to switch between “class” methods (the analogue of C++ static members) and “instance” methods.
To create a new class, you just send the message subclass to an existing class: e.g. Object subclass: #MyRectangle instanceVariableNames: ‘x y width height’ classVariableNames: ” poolDictionaries: ” category: ‘MyClasses’. Default placeholder code for this appears for you to edit when you de-select a class in the class browser. De-selecting a method similarly puts sample method code in the editor, which adds a method once committed via Ctrl-S … but I’m not sure how to create methods programmatically.
There are two GUI systems within Squeak. The modern “Morphic” was ported from Self, and has some very nice design features that approximate the idea that anything you can see is a “real object” that can be inspected and manipulated.
The killer feature of Morphic is the Halo, brought up around any GUI element by meta-clicking it (Alt-click on Windows). It provides options for transforming the morph (move, rescale, and rotate with the mouse), and inspecting its Smalltalk object. Morphic is the only system I know of where one can actually go directly from “the thing you see on the screen” to “the thing that directly caused it”. Why was this not copied by everybody?? (Probably because it relies on the reflective architecture of Smalltalk.)
ST80’s GUI system is referred to as MVC, i.e. Model View Controller – I always knew this as a design pattern, but it’s also the de facto name of the compatibility GUI system for running legacy code. Squeak contains the most important classes from MVC, but not all of them – as I discovered porting ThingLab II. MVC lacks the Morphic halo and other nice features. It also looks charmingly retro, yet with high-res antialiased fonts.
A “project” is an entire world, or “everything you see on the screen”. A project can only use one GUI system. The default project is Morphic, but you can create an MVC project, enter it, and simulate the old ST80 experience.
Both Morphic and MVC use software rendering! This makes the code for “drawing things”, including the GUI itself, nice and simple compared to what you have to do with OpenGL and the like. You don’t have to think about vertex buffers, or about how to translate your design vision into triangles and values interpolated across them. Squeak rightly prioritised usability and expressibility over premature optimisation here; the only downside is that I don’t see a good way to do 3D (but on that, maybe see Croquet?)

Other Smalltalks are also of interest to me:

Gilad Bracha’s Newspeak seeks to address problems of security and interoperability that may have prevented Smalltalk from spreading more widely. Newspeak can render itself in a given platform’s GUI system (instead of insisting on doing all of its own rendering and interaction in its own display buffer) and totally eschews global variables in favour of explicit dependency injection – thus supporting capability-based security and capability-restricted reflection via mirrors.
Ungar and Smith’s Self began only a few years after Smalltalk-80 and seems to be an experiment in extreme uniformity, parsimony, and purity beyond Smalltalk. In Smalltalk, everything is an object, including classes, but Self does away with classes in favour of prototype objects; inheritance becomes delegation and instantiation becomes cloning. Where Smalltalk has private variables and public messages, Self just has slots. Lexical scoped resolution of names is a pluggable delegated-lookup mechanism. Even method invocation is accomplished by treating the method object as a prototypical activation, cloning it, and evolving the activation object. Very elegant!
Tudor Gîrba’s Glamorous Toolkit is a very recent arrival … I know it provides for some mixing of text and graphics, having cited it as a possible example of Notational Freedom in my dissertation. I don’t know much else about it, but I’ll have to look into it one day.

Borning’s Constraints

Constraints are an essential part of the behaviour we want in a lot of software, and programming system support for constraints is a key step in reducing accidental complexity. Any time you’ve found yourself writing an imperative update() or tick() method, relayout() method, or event handlers, it is likely that you were Greenspunning a slice through a constraint solver. Imagine if that solver was implemented once, by smart people, who heavily optimised it? (There is an analogy with manual memory management vs. garbage collection here.)

Alan Borning, who I think of as “Mr. Constraints”, has a long trail of interesting research in this area. The stuff I know about is:

ThingLab I, a 1979 extension to Smalltalk acting as a sort of spiritual successor to Sutherland’s Sketchpad (1962 – but see also Sketchpad14). ThingLab I had both one-way dataflow constraints and continuous least-squares relaxation (like Sketchpad). It was geared towards physics simulations; demoed examples include a suspension bridge, electrical circuit, and C-F temperature converter. There is a runnable web version in a JS VM.
ThingLab II, a 1989 update geared towards constraint-based GUI construction! It ditched the constraint relaxation. I could find very little literature about this, and zero pictures or video – so I dug up the code archive and eventually got it working in Squeak 6.0!
Cassowary, an incremental linear inequality constraint solver optimised for real-time UI stuff. Meaning: they don’t have to throw away the entire simplex tableau and recompute from scratch every time you move the cursor; they cleverly manage to re-use most of it and make a small delta.
Babelsberg (Felgentreff, i.e. Mr. Constraints Jr.) explores adding constraints to imperative OO languages (that’s all I know right now, will update when I’ve read the thesis…)
Wallingford adds reactivity to Babelsberg or something like it. (Again, will update when I know more about it.)

Let’s also not forget what I think of as “discrete” constraint solving: SMT solvers, theorem provers, etc. (I’m also interested in any connection with the bidirectional programming work of Brian Hempel et al.)

The big problem that I see with constraints is that all the research has been poured into the solvers themselves, while the interfaces for working with constraints have remained mostly textual and programmatic. I would like to see innovation in the notations for drawing (especially graphical) constraints into existence, and for explaining / debugging their behaviour. Which brings us to…

Auckland Layout Editor and Haiku

I always like to think of individual constraints (e.g. Cassowary’s linear inequalities) as assembler instructions: they are what the computer can actually deal with, but they are too fine-grained and numerous to specify directly beyond small toy examples. As a human being, you want to package common patterns of constraints into re-usable components which “compile” down to the constraints. At the same time, you still want to have your analogue of C’s __asm block, in case the built-in patterns of constraints don’t cover your needs. Furthermore, if your constraints are all about graphical layout, then you really should be able to draw them into existence instead of counting pixels in your head. Thus I was very excited to see that this may have been achieved for constraint-based GUIs in the Auckland Layout Editor (ALE) research.

The paper and video are good, but in order to properly grok the approach (and port it to my own projects) I felt like I should study their implementation. Unfortunately, for whatever reason they’d chosen to target an OS I’d never heard of called Haiku. Apparently, in the late ’90s there was a legit competitor to Windows and Mac, called BeOS. It is alleged to have had various advantages over the other two (at the time) regarding stability and multimedia performance. However, it lost the battle of the marketplace*, and Haiku is the modern open-source successor. One of these days I’ll simply have to install a Haiku virtual machine and see what ALE’s all about.

(*) (Future post: The Fossil Record of Computing’s Mass Extinctions)

Myer’s GUIs

Pleasant / less painful methods of GUI implementation (I do not want to hear about design! Given a design, how do you get it working on da compooter!?) have been a long-standing interest of mine, but I had to shelve this program for other directions in my PhD. Brad Myers seems to be, basically, Mr. GUI (Implementation) with an impressive history of research in this area (including work on constraints in GUIs). Unfortunately, a lot of it is a victim of the Mass Extinction events (e.g. ancient C++ codebases with tiny windows made for an era of low-res displays). Curiously, it seems his group did a ton of research through the 80s to 1996, but then it kinda stopped? (At least on that specific theme.) Could this have something to do with RPG’s systems-to-languages paradigm shift…? Or did he and his team just solve the GUI implementation problem and moved on to the next thing? In which case, why did their techniques not spread everywhere? Even accounting for the 20 year time lag between research and industry, we’ve been due this stuff for at least a few years now.

http://programmingmadecomplicated.wordpress.com/?p=8057

Extensions

Notes on the “late binding” of Virtual Memory, etc.

Joel Jan 31, 2025

When a DLL/shared object is loaded into process virtual memory, it competes for space with the process’ code and data, so its base address is knowable only at load-time. The DLL has metadata (“relocations”) describing the address references in its code, and the loader uses this metadata to go through all these references and patch … Continue reading Notes on the “late binding” of Virtual Memory, etc.

Show full content

Back in the Old Days, before virtual memory, every program was like that DLL. They all had to share the same memory space and expect to need relocation when loaded into an available area. They even had to expect their own heap pointers to get relocated due to periodic heap compactions by the OS (in order to save on what little memory was available), in a manner reminiscent of what some garbage collectors do today.

(In a stroke of irony, Raymond Chen’s blog was internally moved around on Microsoft’s hosting and got new URLs, yet the internal hyperlinks still point to the old URLs and get 404’s when you try to follow them! They were never relocated to the new base address!)

With virtual memory, competition among programs for contiguous ranges of physical addresses was no more; what remained was competition among a program’s own data structures, and any dynamically loaded objects, for contiguous ranges of virtual addresses.

This was made possible by a hardware module (MMU) that maintains a per-process virtual/physical address mapping, and dynamically translates the virtual to the physical every time an instruction accesses virtual memory.

Clearly, the MMU is doing dynamically what the old program loader was doing at “load time”, in bulk: relocating the address references. And yet, even with the MMU, the new program loader and internal compacting garbage collector is still doing bulk relocation! I wonder if just one more hardware component could free us from that? Virtual virtual memory? Virtual2 memory?

A similar indirection layer is convergently evolved in software form as “object IDs” or “handles”. You want to relax commitment to memory location (so as to freely move objects in memory to achieve other goals) while preserving the client’s commitment to object identity.

Poll vs. Propagate

Consider an instruction somewhere in the process’ code. It has an address reference. Suppose for the sake of argument that we’re doing virtual address translation in software, i.e. with nontrivial overhead. If this instruction is executed a million times over the process’ run time, then it will incur a million times this performance overhead. Call this the “polling” strategy: the answer to the question “which physical address currently corresponds to this virtual address?” is polled on demand, every time.

On the other hand, if the translation is done to the instruction itself, by the loader – so that it now references the physical address and doesn’t go through the indirection – we pay a one-off cost for this relocation, and no per-access overhead. Call this the “propagate” strategy: the answer to the question is evaluated at load time, and then propagated to every site where it gets demanded.

What if there were a million such instructions? Then maybe the delay caused by the loader relocating the addresses would match the delay caused by the per-access overhead.

Also the instructions would’ve been committed under an underlying assumption (that virtual address V maps to physical address P). If that assumption were violated (the mapping got changed), the instructions would have to be re-relocated. Even with just one instruction, if the underlying virtual-physical mapping changed a million times during execution, then we’d again have matched the overhead of a per-access indirection.

From a performance perspective (with our implicit simplifiying assumptions, e.g. that one relocation operation costs the same as one dynamic lookup) it seems that whether “poll” or “propagate” incurs the least total overhead depends on certain assumptions.

Under polling, every change in the virtual-physical mapping incurs no cost, while every access incurs a cost.
Under propagation, every change in the mapping forces a relocation cost per access site, regardless of how often it actually executes.

Therefore:

If we expect changes in the mapping to be frequent, or if there are many access sites, or if we expect there to be very few actual accesses in the running program, then polling would have the lowest total overhead.
If we expect changes in the mapping to be rare, or if there are few access sites, or if we expect there to be many actual accesses, then propagation would have the lowest total overhead.

(I’m visualising one possible history of process execution, time going to the right. Down the vertical is a list of each access site for the same virtual address. As time proceeds, at most one access site is executing at one time: it’s a boolean graph, like a piano roll editor. We define a time-increasing count on this graph. For polling, we just increment on each execution, i.e. add up all the piano bars. For propagation, we begin by incrementing for each access site down the vertical, representing the propagation of the initial physical address. As time progresses, we repeat this every time the mapping changes. The mapping itself could perhaps constitute a third dimension or something. The formal solution to the optimisation problem would be related to the geometry of this discrete 3d space.)

An intelligent system might output machine code with polling, given the first set of assumptions, but as soon as it notices e.g. increasingly frequent changes to the mapping, it replaces it with relocatable machine code without polling (and then changes it back when the appropriate conditions change). This would be late-binding “whether to use polling or propagation”. As it happens, a human developer tends to make this decision based on profiling, but of course the computer will always have more context-specific information to work with at run time than a human being ever will.

Oh – and another assumption necessary to use propagation is that the metadata is available in the first place. Polling doesn’t need metadata, but propagation requires that there’s some way to iterate over all the access sites and enough knowledge of exactly how they can be fixed (e.g. different addressing modes and formats).

In the Old Days, the addresses inside the unloaded program-on-disk were the “virtual” addresses (either fully relative addresses, or committed under some preferred base address). The “virtual” address has always existed – it’s just that the transition from relocated programs to MMU hardware eliminated polling overhead and thus removed the need for relocation metadata. Even so, an isomorphic problem remains for DLLs in a “virtual” memory space… would it be practical for a process to have its own “page table”, so that its code and all shared modules in fact see their own address spaces? Not sure…

My example has been pretending that there is a software indirection layer alongside the software relocation engine. In real life, the hardware MMU handles the polling, so there is no polling overhead comparable to relocation overhead. However, I see the “poll” vs “propagate” duality elsewhere, without hardware support. If we rephrase the above in terms of object handles (instead of virtual addresses) and addresses (instead of physical addresses), perhaps as references in a data structure instead of machine instructions, then the same considerations apply. Having a “virtual name” (object handle) allows the memory manager to move objects around to save space or optimise for the cache or whatever. But going through the handle-to-pointer table on each access might be undesirable under certain usage patterns. On the other hand, the alternative of storing pointers and walking the object graph to rewrite them all might also be undesirable under other usage patterns.

Compilation can be seen the same way: compilation is the “propagate”, while interpretation is the “poll”, with regard to the language semantics. An interpreter visits each expression and polls how to transform/execute the expression at that moment. Thus, if there are meta-circular language constructs for changing these semantics, the “poll” will simply answer the new behaviour, and all will be well. Compilation “unrolls” this process under the assumption of a fixed semantics (which is a correct assumption for all normal programming languages). However, compilation could still work as long as the compiler knew which commands would change the language semantics, and output self-modifying code that would “propagate” the changes across itself each time it hit one of those points. Perhaps this would require bundling the compiler itself as a module within the compiled program. If we zoom out to the Unix programming system, then this is what we have, at least if we’re willing to perform the steps manually: if I change the language semantics by changing the compiler, then I recompile the program source under the new semantics. (Do Continuous Integration systems ever watch for changes to the compiler? I assume that language experimentation is so rare in those circles that they probably just watch for changes in the program source, assuming a constant compiler? And how does Smalltalk respond to changes to its own compiler methods?)

Another example is variable access, e.g. the resolution of boolean expressions in conditional expressions. Ordinarily, an if statement might poll/re-evaluate its condition each time it’s executed. However, if it is known (at a certain time) that the condition is true or false, this could be propagated as a constant to each if statement that references the expression/variable. Since if (true) and if (false) have trivial semantics, some re-wiring of the control flow graph could also occur. Of course, when the condition changes, violating the assumption under which this new code was committed, the code must be re-organised to be appropriate for the new value. Essentially, it’s self-modifying code as an alternative to variable lookups. More program slicing!

Another example might be rendering. The approach used in games, of re-rendering the entire scene many times per second, seems appropriate under the reasonable assumption that much of the view is probably changing constantly. By contrast, the traditional GUI rendering approach uses “dirty rectangles”, aiming to re-render only that part of the scene that has actually changed; appropriate considering that in a GUI, there is inertia in most pixels most of the time. A very intelligent system would make poll-vs-propagate an implementation detail, dynamically re-writing the code to use whichever was the most efficient in the circumstances.

A final example is input handling: I/O is where the term “polling” comes from. One may use an API to poll the state of a mouse or keyboard button. The “propagate” strategy seems related to the event-driven approach: when the answer to the question changes, you are given the opportunity to propagate the effects of this change throughout the program via event handlers.

Summary of the questions/answers:

In virtual memory, the question is “what is the physical address for this virtual address?”
In position-independent code, the question is “what is the absolute address for this offset?”
With object handles, the question is “what is the address for this object ID?”
With variable names, the question is “what is the value of this name?”
With method lookup, the question is “what method does this object use for this selector”?
With rendering, the question is “what is the colour of this pixel”?
With input, the question is “what is the state of this device”?

Does this help?

I want to translate my vague English intuition of “commitment vs. late-binding” or “static vs. dynamic” into more precise mechanisms. I am more interested in the boolean quality of whether I have to commit to a decision, or whether software will decide it for me – yet I ended up sidetracked into performance differences. Am I any closer to my goal?

Poll vs. propagate ought to be equivalent from a commitment perspective: when “polling” the answer to a query every time you want an answer, you aren’t committed to anything, but even when you’ve “propagated” a specific answer to every place you might ask for it, the new answer ought to just be re-propagated across them if the answer ever changes.

This fact, that they’re just different implementation strategies, relies quite clearly on revocable commitment in the propagation case: all the question-asking sites are committed under a shared assumption of what the answer is, but when the answer changes, the commitment needs undoing at the very least (followed by a return to polling, or a propagation of the new answer).

I suspect that the reason programmers like me prefer dynamic/late-bound environments is because, in practice, systems opting for “propagate” don’t bother to provide very good revoking mechanisms. Meanwhile, systems opting for “poll” don’t need to worry about that, even if they’re a little slower, so we find ourselves working at the right level of indirection all the time, without any worries.

The lack of ability to hot-swap code in a running Unix process (research like Ginseng/DSU notwithstanding; it’s research! Not commonplace! Not household!) must be a large part of this. In a “compiled” language, not only is the resulting binary committed under the language semantics, but it’s also committed under your source code, and so is any process loaded from it. While you’re interacting with the running process, these commitments are irrevocable. The only way you can revoke them is by killing the process, changing the source, and re-committing. There are two paths out of this: allow revocation of compiled commitments within the Unix process, or do as little as possible within the confines of the Unix process.

The latter path will be explored in my Smalltix project (binaries are methods; keep ’em small). The former path exists in the form of “dynamic” systems, such as Smalltalk-like VMs living under Unix. For example, the browser JS console allows me to redefine functions and variables (as long as I didn’t make the mistake of defining them as const). It doesn’t let me express a change to “line 12” (but this isn’t too bad) or even “line 3 of this function” (this is worse; I have to replace functions wholesale. Should I treat JS functions like Unix processes and keep ’em small?) And of course, the Squeak VM lets me modify Smalltalk methods (since they are just objects in the image) and presumably even lets me modify the Smalltalk compiler (from syntax to bytecode) or define my own syntax. What it doesn’t do is let me change the semantics of Smalltalk bytecode, or the inner workings of the VM – in these respects it is committed, and they have their own internal VM-compiling pipeline to create a new one, reminiscent of the way you can compile a new kernel under Unix. And that brings us to Unix itself, where we really aren’t irrevocably committed to much at all – just the kernel, and that’s between restarts!

I maintain that Squeak as a Unix process functions like Unix as an operating system. Kell was right, and I’ve been converging on this for a long time (I still have some disagreements but the basic point is beyond dispute). Squeak appears flexible and dynamic compared to other interactive applications distributed as Unix binaries. But it really shouldn’t be forced to live as one, and nor should other interactive applications.

Come on! Can I not say anything useful about late binding? What is it??

Perhaps “late binding” is just a synonym for “no irrevocable commitments”. If the question-answering mechanism is implemented as “polling”, then there are simply no commitments; if you supply a different answer, all relevant sites will see it next time they poll. The very fact that the sites poll for the answer means the answer is unknown to them, which means the code that uses the answer has probably been written to handle all the different answers it could get. On the other hand, if propagation is used, and you have the assurance of late binding, then the propagation must function as a mere “cache” of the answer to the question. Some procedure took the initial answer and propagated it to all demand sites. That same procedure is presumably set to trigger whenever the answer changes. Hopefully the procedure is set up to work no matter what the new answer is.

A Smalltalk method cache is a form of propagation replacing polling. But it doesn’t interfere with late binding “which method shall the receiver use for this message?” because the cache commitment is revocable; the cache is presumably invalidated whenever the answer to that question changes.

I speculate that early binding is just the existence of an irrevocable commitment within your scope, i.e. range of permitted actions. Many aspects of the Unix process are early-bound with respect to the process itself, because they can’t be revoked from within it (practically speaking). Perhaps they are still early-bound with respect to Unix, because you still normally can’t hot-swap code in a running process even from the outside. But the corresponding aspects of the binary are late-bound, with respect to typical Unix programming activities?

I hope this gets me closer. I’m going to bed, goodnight.

http://programmingmadecomplicated.wordpress.com/?p=8920

Extensions

Unix Commitments, Formalised (part 2)

Joel Jan 30, 2025

Commitment to a particular syntax/semantics Since my Smalltix epiphany last week I’ve been having something of an identity crisis. The thing I want is Notational Freedom, in partial fulfillment of the dictum that we can Use The Right Tool For The Job. I notice that something like OMeta is specifically designed to support Syntactic Freedom, … Continue reading Unix Commitments, Formalised (part 2)

Show full content

Commitment to a particular syntax/semantics

Since my Smalltix epiphany last week I’ve been having something of an identity crisis. The thing I want is Notational Freedom, in partial fulfillment of the dictum that we can Use The Right Tool For The Job. I notice that something like OMeta is specifically designed to support Syntactic Freedom, while ordinary programming languages/systems (e.g. Python) are not; their commitment to a single syntax can be seen as appropriate under an assumption like “Python syntax will be the most appropriate tool for all your jobs, at all scales within your codebase, and at all times”. This assumption is an extremely bold one to make, and is therefore highly implausible (what are the chances that, in a reasonably-sized codebase, Python will be the best way, given a free choice, to express every small behaviour and data structure in your program? It would be quite the coincidence!) So I’d conclude that new programming systems ought to all be designed with an OMeta-like frontend. The approach of normal programming languages is simply a mistake; all new languages should be designed to support user-supplied local syntaxes and semantics. Of course, this would mean they’re all just the same meta-system in different configurations, so there wouldn’t really be any new languages after that one.

I now realise that this was just a confusion of levels and replicates the problem one level deeper. If for some reason you were forced to use only one syntax and semantics in the meta-system, then you might have the same complaint: I can’t use the right tool for the job. In a vacuous sense, a single syntax/semantics is committed to a single syntax/semantics. Of course it is! And what is a language, essentially, but a syntax and semantics? What is Unix but a frankenstein’s OMeta?

Unix is a meta-system that already provides for syntactic (even Notational) freedom, in principle. Unix is to Python and C what the COLA meta-system is to the Python and C syntax/semantics modules within it. Unix is best at this in the “compilation” model: given N different languages, you can maybe figure out a way to compile each to object code and link them into a binary. It’s weakest in the “live” model: given a “running program”, there usually isn’t a way to make alterations to it using your preferred syntax and semantics for that syntax. However, under my hypothetical Smalltix mode of Unix programming, this might be less of a problem. Your “running program” is just a set of files and sporadically activating processes (method activations) evolving them. If you want to supply a new subprogram, write it in your favourite notation and compile it to a binary (or +x the source code and add a shebang). If you want to make a change to the state (files), you supply a subprogram to do so, in your favourite notation, and execute it (i.e. send that message to the state).

I guess this makes me fully sold on polyglot programming, in the more Unixey sense than the COLA/Smalltalkey sense: I want the latter but I think we already have 80% of the infrastructure in Unix. The barriers standing in the way are process/file overhead, inter-process communication, inter-heap language interop in the Kellian sense, a lack of Smalltalky polymorphic dispatch (“metasystems” in Kell’s Lurking Smalltalk paper), and two things I haven’t seen recognised: file-locked editors and one-way editors.

By “file-locked” editor I mean that text editors essentially treat a file as a screenful of information. You can have multiple files open, and switch between then via tabs, but if you want to see the contents of several files simultaneously you have to do manual splitting and manage the views yourself. If a program can be made of many parts (e.g. classes, procedures, even lines), each of which might be expressed in its own notation (let’s say syntax, for simplicity), then this can’t/shouldn’t be achieved by having all of the syntaxes in the same file: compilers/interpreters work under the assumption that their input file is entirely in their language. Instead, each mono-syntactic program unit should be in its own file, which can then be supplied to the appropriate compiler/interpreter. But in a normal text editor, this would mean a million different files, one for each class/procedure/line. The job of a program editor should be to collate these files back together into a screenful of information; e.g. if lines 1-10 are Python and stored in one file, and lines 11-60 are JS and stored in another, then you should see and edit a contiguous line range of 1-60 with the different syntaxes, but the existing Unix infrastructure of compilers/interpreters could continue to work with this under the hood.

(NB: I’m implicitly assuming a more Smalltixy “separate files” approach there, but if we’re talking about single-line files then there will probably be a lot of them! I imagine that mainstream programming culture would choose to fuse them all into one large “data file” and have this be the thing that your program editor evolves: essentially a custom polyglot file format.)

(Also, to fully realise COLA’s “Mood-Specific Languages” we’d need to support even a subexpression within a line having its own custom syntax. A one-line-per-file approach wouldn’t work for this.)

By “one-way” editor I mean to draw attention to how, say, a standard text editor doesn’t receive fine-grained feedback from a separate interpreter/compiler process to highlight your syntax errors or prohibit certain edits; if anything, it highlights your syntax via a custom plugin that essentially duplicates what the interpreter/compiler does, but inside the text editor process. Even worse: your standard vector graphics editor is not built to receive feedback from whatever is consuming your diagram; if you want to draw a diagram and render it into software as a beyond-textual “source code” (future post: Self-Raising Diagrams), then you’re stuck squinting at whatever errors the diagram parser gives you and adjusting your diagram via trial-and-error to satisfy it. I see these issues as stemming from the lack of GUI composability incentivised by the de-facto Unix “everything in the process” programming style.

Thinking about all this really got me stuck with regards to formalising the notion of “commitment to a particular syntax”. I guess that I no longer see it as a “bad” thing. Yet it still seems to be a fact that the C “programming system” (a text editor, plus gcc) really is committed to C syntax, while OMeta is not. I now have a feeling that formalising this commitment would overlap with formalising the difference between an individual syntax/semantics and a meta-system (Smalltalk, Unix, COLA, etc) that lets you use it alongside others.

Perhaps the real commitment I resent is the way the incentive vectors of programming strongly encourage you to commit to a single language “silo”, not the fact that that language is committed to a single syntax. I don’t envy the guy who has to formalise that, and I’m not sure I want to be him.

(Why formalise? So I can publish papers, of course. Perhaps mentally replace “formalise” with “make legible to the PL conference world”. But also to make my ideas precise enough to meaningfully verify/critique.)

Random notes

Suppose C’ which permits C syntax (default) or C’ syntax (within alt_syntax {…} blocks)

The cost of using C’ syntax is writing alt_syntax {…}

In ordinary C, the cost of using C’ syntax is the combined cost of obtaining, inspecting, and modifying the cc source to support this, and recompiling. Same as C’ cost for all other possible syntaxes

In C” which supports alt_syntax1, …, alt_syntax10 blocks for specific langs (a la HTML/JS/CSS), the cost of using syntax N is writing alt_syntaxN {…}. Analogous C cost, and C” cost for all other possible syntaxes

In C”’ which supports use_syntax "my_syntax" {…} plus defining define_syntax "my_syntax" {…}, cost of using your syntax is the cost of defining it plus cost of usage block

Same for semantics = basically OMeta/Ohm?

Commitment of C in the Unix PS refers to the fact that cc is built as a single binary (method frankendict)

Compare COLA with COAFLA … even a preprocessor that breaks a multisyntax source file into individual monosyntax files fed thru the correct compilers to produce object code, linked into a binary

Self-running interpreter binary: the binary is just an interpreter with the source code embedded in its data section (viz. old GameMaker, or by analogy with self-extracting archives)

http://programmingmadecomplicated.wordpress.com/?p=8907

Extensions