Verifiable agent workflows and software surfaces

Agents that can inspect state, verify a build is actually moving, and modify the software surface itself.

Developer productivity grounded in usefulness:

All of the token counting is great, but what we should really be able to get now is a meaningful metric for developer productivity. Just run a nano model over every commit and ask it to rate the code's usefulness against your team's intended goal, with some grounding in the codebase.
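A minimal sketch of that loop, under stated assumptions: `ask_model` is a hypothetical callable you supply (it wraps whatever small model you use and returns its text reply), and the 1 to 5 scale, the prompt wording, and the helper names are illustrative choices, not anything from the original post. Only `git show` is a real command here.

```python
import re
import subprocess
from statistics import mean
from typing import Callable, Iterable, List


def commit_diff(sha: str) -> str:
    """Return the patch for one commit (the empty --format= drops the header)."""
    result = subprocess.run(
        ["git", "show", "--patch", "--format=", sha],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_prompt(goal: str, context: str, diff: str) -> str:
    """Assemble the rating request: team goal, codebase grounding, then the diff."""
    return (
        f"Team goal:\n{goal}\n\n"
        f"Codebase context:\n{context}\n\n"
        f"Commit diff:\n{diff}\n\n"
        "Rate how useful this commit is toward the goal on a scale of 1 to 5. "
        "Reply with the number first."
    )


def parse_score(reply: str) -> int:
    """Pull the first standalone digit 1-5 out of the model's reply."""
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"no 1-5 score found in reply: {reply!r}")
    return int(match.group(1))


def rate_commits(
    diffs: Iterable[str],
    goal: str,
    context: str,
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> List[int]:
    """Score each diff independently with the supplied model callable."""
    return [parse_score(ask_model(build_prompt(goal, context, d))) for d in diffs]
```

In use, you would feed it `commit_diff(sha)` for each commit in a range and aggregate however suits the team, for example `mean(scores)` per week. Treat the numbers as a rough signal; the rating is only as good as the goal statement and context you give the model.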

Verification via state:

Always fun when you notice Codex being clever in a way you don't expect. In a session today, it was running a slow build process and got annoyed (don't we all). Before making a change, it checked that progress was actually happening, and it did so not by reading the logs but by checking CPU usage.
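The post doesn't say which command Codex ran, so the following is only a sketch of the general idea: sample a process's cumulative CPU time twice and see whether it grew. It assumes Linux (it reads `/proc/<pid>/stat`), and the function names, the one-second interval, and the 0.05-second threshold are all illustrative.

```python
import os
import time
from typing import Callable


def cpu_seconds(pid: int) -> float:
    """Cumulative user + system CPU time of one process, from Linux /proc."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' and index the remaining fields from field 3.
    fields = stat.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def is_making_progress(
    sample: Callable[[], float],
    interval: float = 1.0,          # assumed sampling window, in seconds
    min_cpu_seconds: float = 0.05,  # assumed threshold for "doing work"
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """True if the sampled CPU time grew by at least the threshold."""
    before = sample()
    sleep(interval)
    after = sample()
    return (after - before) >= min_cpu_seconds
```

For a build with a known PID you would call `is_making_progress(lambda: cpu_seconds(build_pid))`, where `build_pid` is whatever process you are watching. Two caveats: this counts only that one process, so a build that fans out to child processes needs the children sampled too, and a build that is blocked on I/O can be making progress while using almost no CPU.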

Codex in DOOM:

I put Codex into DOOM using the Codex app server. Codex modified the actual game files and the game engine so that the terminal interface renders natively in the game and Codex can engage with the game itself.

Saturday, 4 April 2026