Agents in 2025 #0
The Internet was built on human-readable, text-based protocols. Even HTML and CSS are text. A picture is worth one thousand words; or, according to OpenAI, anywhere from two to six thousand well-reasoned output tokens.
Why do I focus on text? Because of AI. Sure, we have multi-modal models, but the best generative models for instruction operate on text. How would one instruct a vision model? Interpretive dance?
Maybe I’m off base. Maybe it’ll be speech.
Speech? Text?
A linear flow of thoughts.
A pidgin language will emerge to allow human and machine to communicate freely.
The speakers of the creole to follow have yet to be born; for their sakes I hope we get it right.
Email as we practice it today is broken and barbaric.
Once upon a time, email was civil, well-structured text. Plaintext. No formatting, no images, no HTML; just text. People, especially on mailing lists, tended to post their replies at the bottom. I advocate bringing this practice back because it is easier to read for human and machine alike.
When you bottom-post, you do the reader a courtesy by replying inline or after the main context.
Context.
Without context, words are significantly harder to understand and interpret.
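To make bottom-posting concrete, here is a minimal sketch in Python. The helper name `bottom_post`, the attribution line, and the `> ` quote prefix are my assumptions, borrowed from common mailing-list habit rather than from any particular mail client.

```python
def bottom_post(original: str, reply: str, author: str = "Alice") -> str:
    """Quote the original message line by line, then put the reply underneath."""
    # Prefix every quoted line with "> "; blank lines get a bare ">".
    quoted = "\n".join(
        f"> {line}" if line else ">" for line in original.splitlines()
    )
    # Context first, reply last: the reader meets the question before the answer.
    return f"{author} wrote:\n{quoted}\n\n{reply}\n"


message = bottom_post(
    "Lunch at noon?\n\nBring the slides.",
    "Noon works.",
    author="Alice",
)
print(message)
```

The reply lands after the text it answers, so a human or a model reading top to bottom gets the context before the response.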
I’m building a framework for agents.
The impedance mismatch between me and machine is the most ambiguous problem to solve.
A linear flow of thoughts.
A sequence of actions by way of example.
A set of distinct policies.
A bank of instructions.
Quality of retrieval matters here.
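One way to picture those four pieces, plus retrieval over the instruction bank, is the sketch below. Everything in it is hypothetical: `AgentContext`, its field names, and the naive word-overlap scoring in `retrieve` are stand-ins for illustration, not the framework's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for what an agent is given to work with."""
    thoughts: list[str] = field(default_factory=list)         # a linear flow of thoughts
    example_actions: list[str] = field(default_factory=list)  # a sequence of actions, by way of example
    policies: set[str] = field(default_factory=set)           # a set of distinct policies
    instructions: list[str] = field(default_factory=list)     # a bank of instructions


def retrieve(bank: list[str], query: str, k: int = 2) -> list[str]:
    """Return up to k instructions sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(text.lower().split())), text) for text in bank]
    # Drop non-matches, then sort best first; the sort is stable, so ties keep bank order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [text for _, text in ranked[:k]]


ctx = AgentContext(
    instructions=[
        "reply below the quoted text",
        "quote only the relevant context",
        "send plain text email",
    ]
)
hits = retrieve(ctx.instructions, "where should the reply go in plain text")
print(hits)
```

Word overlap is about the crudest scoring there is. Swap in something better and the same bank returns different instructions, which is one reason the quality of retrieval matters.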