Vibe coding vs agentic engineering

Draw a line between low-stakes, hands-off experimentation and professional software work. Vibe coding is fast, lightly supervised exploration; agentic engineering is the disciplined use of agents to build production software with review, tests, and responsibility.

Blast radius determines process

The right workflow depends less on whether AI is involved and more on who can get hurt. If only you bear the downside, looser methods are acceptable. Once other users, systems, or third parties can be harmed, professional safeguards become mandatory.

The dark factory pattern

Automation eventually shifts the job from writing code to specifying outcomes and enforcing quality without line-by-line authorship. What replaces direct inspection when production is automated?

Synthetic QA swarms and simulator environments

If humans stop being the primary verification surface, create synthetic users and simulated environments that hammer the product continuously. The pattern is behavior-level validation through simulation.
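One minimal sketch of this pattern: seeded synthetic "users" take random actions against a system under test and check behavior-level invariants on every step. The `Cart` class and all names here are illustrative stand-ins, not a real harness.

```python
import random

class Cart:
    """Toy system under test; a real swarm would target the product's API."""
    def __init__(self):
        self.items = {}

    def add(self, sku, qty):
        self.items[sku] = self.items.get(sku, 0) + qty

    def remove(self, sku):
        self.items.pop(sku, None)

    def total_units(self):
        return sum(self.items.values())

def synthetic_user(seed, steps=200):
    """One simulated session: random actions plus invariant checks."""
    rng = random.Random(seed)  # seeded, so failures are reproducible
    cart = Cart()
    for _ in range(steps):
        if rng.random() < 0.7:
            cart.add(rng.choice("ABC"), rng.randint(1, 3))
        else:
            cart.remove(rng.choice("ABC"))
        # Behavior-level invariant: totals never go negative.
        assert cart.total_units() >= 0
    return cart.total_units()

# A "swarm" is just many seeded sessions, run continuously in practice.
results = [synthetic_user(seed) for seed in range(50)]
```

Because each session is seeded, any invariant violation can be replayed exactly, which is what makes simulator-based QA debuggable rather than flaky.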

Bottlenecks shift when implementation gets cheap

Once code generation speeds up, the scarce resource moves elsewhere: framing the problem, testing the idea, judging tradeoffs, and redesigning team processes. Where is the bottleneck now that implementation is cheap?

Prototype portfolio, not prototype singular

Cheap implementation changes product work from "pick one direction and commit" to "spin up several plausible directions and compare them." Explore multiple candidate shapes early because the cost of parallel prototyping has collapsed.

Human evaluation beats simulated-user evaluation for desirability

AI can help generate and even exercise prototypes, but it is not a credible substitute for real human usability feedback when the question is "which version should we actually choose?" Keep humans in the loop for taste, desirability, and lived experience.

Use AI for the first two-thirds of brainstorming

Models are strong at exhausting the obvious space quickly. The valuable human work begins when ideas are combined, reframed, and judged. Let AI clear the obvious ground, then use human taste to push into interesting combinations.

Metaphor remixing improves idea quality

One way to escape obvious ideas is to force alternate frames and metaphors. Ask for ideas from adjacent or even weirdly unrelated domains to trigger more original ones.

Invest in agency; use AI to amplify ambition

The lasting human advantage is not raw output but agency: deciding what problems matter, what to pursue, and how to use new tools to become more capable. Use AI as an amplifier of skill and ambition, not as an excuse to become passive.

Code is cheap, so spend the dividend on quality

When code becomes abundant, the goal should not be "produce more code." It should be "use lower implementation cost to produce better software." That means quality, maintainability, and future extension matter more, not less.

Proof of usage is the new trust signal

Tests, docs, and polish are no longer strong enough signals on their own because AI makes them cheap to generate. The more durable credibility signal is sustained real-world use, which offers a broader basis for evaluating software maturity in an AI-rich world.

Externalized reusable memory compounds over time

Store tools, experiments, and prior research in systems that agents can later search and recombine. Build an external memory layer of code and notes that future agents can consult instead of starting from zero.
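The smallest possible version of such a memory layer is notes with tags and a keyword search an agent can call before starting work. `remember` and `recall` are hypothetical names for illustration; a real system would likely add embeddings or full-text indexing.

```python
# Illustrative external memory layer: store notes, search them later.
memory = []

def remember(title, body, tags):
    """Persist a note so future agents can find and reuse it."""
    memory.append({"title": title, "body": body, "tags": set(tags)})

def recall(query):
    """Return notes whose title, body, or tags mention the query term."""
    q = query.lower()
    return [n for n in memory
            if q in n["title"].lower()
            or q in n["body"].lower()
            or any(q in t for t in n["tags"])]

remember("retry backoff experiment",
         "Exponential backoff with jitter beat fixed delays in load tests.",
         ["networking", "reliability"])
remember("pdf parsing tool",
         "Wrapper script around pdftotext for extracting report text.",
         ["tooling"])

hits = recall("backoff")  # prior research surfaces instead of being redone
```

The compounding effect comes from `recall` being cheap enough that agents check it by default, so every experiment written down saves a future one.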

Red/green TDD is a high-leverage agent prompt

Agents need executable feedback. Have them write the test first, watch it fail, implement the change, and watch it pass. Use tight executable feedback loops to steer agent work.
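The loop can be sketched in a few lines: the test exists before the implementation, a stub proves the test can fail (red), and only then is the real code written (green). `slugify` is a made-up target function for illustration.

```python
def run_test(impl):
    """The test, written before any implementation exists."""
    try:
        assert impl("Hello World") == "hello-world"
        return "green"
    except Exception:
        return "red"

# Step 1 (red): a stub must fail the test, proving the test has teeth.
stub = lambda s: s
first = run_test(stub)      # "red"

# Step 2 (green): implement until the same unchanged test passes.
def slugify(s):
    return s.strip().lower().replace(" ", "-")

second = run_test(slugify)  # "green"
```

Seeing the test fail first matters for agents specifically: it rules out the failure mode where the agent "passes" by writing a test that was vacuous all along.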

Cheap tests change the test-economics tradeoff

When agents can write and maintain large amounts of boilerplate, previously expensive test suites become more affordable. Some practices once rejected as too costly may now be worth revisiting because maintenance cost has changed.

Thin templates steer agents better than long prose

Agents lock onto existing patterns in code with very little prompting. A tiny starter template with one test, one style, and one clear structure steers them more reliably than long prose instructions.
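A thin template might be nothing more than a file map like the one below: one module establishing the style, one test establishing the verification pattern. The layout and names are assumptions for illustration, not a prescribed structure.

```python
# Illustrative "thin template": the skeleton an agent extends by imitation.
TEMPLATE = {
    "src/greetings.py": (
        '"""One module, one style: small typed pure functions."""\n'
        "def greet(name: str) -> str:\n"
        '    return f"Hello, {name}!"\n'
    ),
    "tests/test_greetings.py": (
        "from src.greetings import greet\n"
        "\n"
        "def test_greet():\n"
        '    assert greet("Ada") == "Hello, Ada!"\n'
    ),
}

def scaffold():
    """Return a fresh copy of the file map for an agent to build on."""
    return dict(TEMPLATE)
```

The point is economy: each new feature the agent adds tends to mirror the one existing module-plus-test pair, so the template does the steering that a long style guide would otherwise have to.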

Bounded-autonomy sandboxing

High-autonomy agent modes are only acceptable when the blast radius is intentionally constrained. Relax permissions inside environments where failure is tolerable, not on the operator's most sensitive surfaces.
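A minimal policy sketch, assuming a path-based sandbox: actions inside an allowlisted scratch area are auto-approved, everything else escalates to a human. The paths and return values are illustrative; real policies would also weigh the action type and network access.

```python
# Hypothetical bounded-autonomy policy: full speed inside the sandbox,
# human review everywhere else.
SANDBOX = "/tmp/agent-scratch/"

def authorize(action, path):
    """Decide whether an agent action runs unattended or needs a human."""
    if path.startswith(SANDBOX):
        return "auto-approve"   # failure here is tolerable by construction
    return "require-human"      # sensitive surface: keep the human gate

scratch = authorize("write", "/tmp/agent-scratch/notes.txt")
secrets = authorize("write", "/home/user/.ssh/id_rsa")
```

The design choice is that autonomy is granted by the environment, not by trust in the agent: the sandbox boundary, not the model's judgment, is what limits the blast radius.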

The lethal trifecta

An agent becomes dangerously exploitable when three conditions coexist: access to private information, exposure to malicious instructions, and a path to exfiltrate the data.
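Because the danger comes from the conjunction, a configuration check is just a subset test: flag the setup only when all three legs are present. The capability names below are illustrative labels, not a real agent framework's API.

```python
# The three conditions that together make an agent exploitable.
TRIFECTA = {"private_data", "untrusted_input", "external_egress"}

def lethal_trifecta(capabilities):
    """True only when every one of the three conditions coexists."""
    return TRIFECTA <= set(capabilities)

# All three legs present: exploitable.
risky = lethal_trifecta({"private_data", "untrusted_input", "external_egress"})

# Removing any single leg breaks the attack chain.
safe = lethal_trifecta({"private_data", "untrusted_input"})
```

The practical consequence: mitigation does not require eliminating every capability, only guaranteeing that no deployment ever holds all three at once.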

97% is a failing grade in high-risk agent security

Near-perfect filters are still unacceptable when the residual failure mode is catastrophic. Probabilities must be judged against consequence, not just headline accuracy improvements.
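The arithmetic makes the point concrete. Assuming, for illustration, independent attack attempts against a filter that blocks 97% of them, the chance of at least one breach compounds quickly with volume:

```python
def breach_probability(block_rate, attempts):
    """P(at least one attack slips through) over independent attempts."""
    return 1 - block_rate ** attempts

single = breach_probability(0.97, 1)      # 0.03 per attempt
hundred = breach_probability(0.97, 100)   # roughly 0.95 over 100 attempts
```

Attackers control the number of attempts, so a per-attempt success rate of 3% becomes near-certain compromise at scale, which is why consequence, not headline accuracy, has to drive the acceptance threshold.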

Normalization of deviance

Repeatedly surviving unsafe behavior creates false confidence and institutional drift. Uneventful near-misses are not evidence of safety.