FORTH? Really!?
Imagine you have to generate the word that succeeds this colon: ___
What would you put in that blank space?
It’s easier when the question comes first.
But what if we structured things so that the blank had to be generated before its constituent parts? LLMs are wonderful, but I see too many people break problems down recursively, top-down, the way humans do. Instead, I posit that FORTH and other concatenative/associative languages may be a better fit for transformer architectures. Concatenate, not integrate. Agree on the stack state.
I set out to test whether this could be true.
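To make "agree on the stack state" concrete, here is a minimal Python sketch (my own illustration, Python only because it is easy to run; the point is the stack discipline, not the language) of a postfix evaluator: every new token is interpreted against the current stack alone, so extending the program is pure concatenation and nothing already emitted ever needs revisiting.

```python
# Minimal postfix evaluator: each token is interpreted against the current
# stack, so extending the program is just appending tokens.
def eval_postfix(tokens):
    stack = []
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack

# "3 4 + 2 *" computes (3 + 4) * 2; appending more tokens later never
# invalidates what has already been evaluated.
print(eval_postfix("3 4 + 2 *".split()))  # [14]
```

Contrast that with prefix or infix forms, where a token emitted early constrains structure that has not been generated yet.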
Sideways Passing Join.
Imagine you had this program:
```
A SCAN [foo > 5] FILTER
B SCAN [foo < 5] FILTER
BUILD
PROBE
```
This performs a natural join on A and B’s shared identifiers.
Because of the associativity of these languages you can always make local edits. For example, a sed-like transformation could replace any BUILD PROBE sequence, wherever it appears, with the following to perform a sideways-information-passing join:
```
DUP STATS SWAP BUILD
[PUSHDOWN] DIP PROBE
```
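To show how local that edit is, here is a hypothetical peephole pass in Python over the token stream; the operator names simply mirror the snippets above and are not from any real query engine.

```python
# Hypothetical peephole rewrite over the token stream: splice the
# sideways-information-passing sequence in wherever BUILD PROBE appears.
SIP_JOIN = ["DUP", "STATS", "SWAP", "BUILD", "[PUSHDOWN]", "DIP", "PROBE"]

def rewrite_sip(tokens):
    out, i = [], 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["BUILD", "PROBE"]:
            out.extend(SIP_JOIN)  # local splice; everything else is untouched
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

program = "A SCAN [foo > 5] FILTER B SCAN [foo < 5] FILTER BUILD PROBE".split()
print(" ".join(rewrite_sip(program)))
```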
This same associative property lets us divide a program into "what has been generated in-context" and "what remains to be generated." We shuffle tokens across one at a time, extending the context and consuming what remains to be generated.
I have a hunch that transformations of finite automata over subsequences of the text can be used to write optimization passes for the database layer.
A phrase from Manfred von Thun goes, “syntactic concatenation is semantic composition.”
A Benchmark
I set out to benchmark what models can do in this regard. Does the order of terms matter to an attention-based transformer? The experiment is simple: I ask the model to construct a tree over numbers and measure whether the tree conforms to the instructions. I used parity: each interior node records whether the sum of the numbers beneath it is even or odd. Prefix notation therefore has to commit to the overall answer before it generates the sub-answers; postfix notation works bottom-up, producing sub-answers before it answers anything further.
If you think about how you answer, “What is the next token,” you’ll see where I’m going.
Setup
Given: A sequence of numbers. Construct: A prefix or postfix parity tree.
What is a parity tree? An unbalanced, left- or right-skewed binary tree whose leaves are numbers and whose interior nodes record the parity of the sum of their transitive children.
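To make the two target formats concrete, here is a small Python sketch of how such trees and their serializations could be built; the right skew, the "even"/"odd" labels, and the exact token layout are my assumptions about the setup, not the benchmark's actual harness.

```python
# Right-skewed parity tree over a sequence of numbers: each interior node
# carries the parity ("even"/"odd") of the sum of all leaves beneath it.
def parity_tree(nums):
    if len(nums) == 1:
        return nums[0]
    label = "even" if sum(nums) % 2 == 0 else "odd"
    return (label, nums[0], parity_tree(nums[1:]))  # leaf on the left, subtree on the right

def prefix(node):
    if isinstance(node, int):
        return [str(node)]
    label, left, right = node
    return [label] + prefix(left) + prefix(right)    # parity first, then the evidence

def postfix(node):
    if isinstance(node, int):
        return [str(node)]
    label, left, right = node
    return postfix(left) + postfix(right) + [label]  # evidence first, then the parity

tree = parity_tree([3, 1, 4, 1])
print(" ".join(prefix(tree)))   # odd 3 even 1 odd 4 1
print(" ".join(postfix(tree)))  # 3 1 4 1 odd even odd
```

In the prefix string the root parity appears before any number has been written down; in the postfix string it appears last, after all of its evidence.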
Results
I ran four configurations, one per row below, across Opus and Haiku with thinking on and off (Sonnet gave results I need to understand better before I publish them). Thinking consistently outperforms non-thinking, Opus consistently outperforms Haiku, and postfix consistently outperforms prefix.
| Model | Thinking | Postfix Acc | Prefix Acc | Both Correct | Postfix Only | Prefix Only | Both Wrong |
|---|---|---|---|---|---|---|---|
| Haiku | Yes | 88.3% | 36.7% | 110 | 155 | 0 | 35 |
| Haiku | No | 6.7% | 4.3% | 9 | 11 | 4 | 276 |
| Opus | Yes | 98.3% | 81.3% | 243 | 52 | 1 | 4 |
| Opus | No | 50.0% | 9.7% | 28 | 122 | 1 | 149 |
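Reading the table: the count columns sum to 300 per row, so the accuracy columns appear to be derived as (Both Correct + Postfix Only) / 300 and (Both Correct + Prefix Only) / 300. For Haiku with thinking that works out to (110 + 155) / 300 ≈ 88.3% and (110 + 0) / 300 ≈ 36.7%, matching the table.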
All makes sense in the world.