Autonomous Workflows, Not Agent Theatre
Most AI workflow projects fail because they try to be magical. The ones that survive are mostly deterministic plumbing with a small, well-placed model.
System summary
Field note arguing that an agent is not an architecture: real autonomy comes from a clear state machine, one guarded model call per decision, a fast human override and honest observability, not from fragile multi-agent demos.
An agent is not an architecture.
It is a component, and usually a small one. The current fashion treats the model as the centre of the system: a clever planner that decides what to do, calls itself recursively, spawns sub-agents and somehow converges on the right answer. It looks impressive in a demo and it produces a particular feeling in the room, the feeling that something intelligent is happening. That feeling is the product being sold. It is rarely the system being built.
A workflow system is autonomous when it can run for a week without anyone tapping a button. That is the only definition that survives contact with production. It does not become more autonomous by adding more agents. It usually becomes less autonomous, because every additional model is one more thing that can drift, hallucinate, stall or quietly cost three times what it did last month. Autonomy comes from structure, not from intelligence. The systems that keep running are mostly deterministic plumbing with a small, well-placed model doing the one thing a model is actually good at.
The state machine carries the work
The first thing that survives production is a clear state machine. Not a diagram of agents talking to each other, but an explicit map of steps, inputs, outputs, retries and a dead-letter path for the work that cannot complete. Every unit of work is in a known state at all times. You can stop the system, restart it, and it picks up where it left off because the state lives in a database, not in a model's context window.
This sounds boring, and it is. It is also the difference between a workflow you can operate and a workflow you have to babysit. When a run fails at three in the morning, the on-call engineer does not want to reconstruct what a chain of agents was thinking. They want to see which step failed, what its inputs were, and whether a retry is safe. A state machine answers those questions directly. An agent loop answers them with a transcript you have to read like a detective.
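A minimal sketch of that idea in Python, assuming SQLite as the state store and a hypothetical three-step pipeline. The names `open_store`, `enqueue`, `advance`, `MAX_ATTEMPTS` and the step list are illustrative, not a prescribed design. Each call to `advance` runs one step for one unit and writes the outcome back. A process that dies and restarts simply reads the stored state and continues. A unit that keeps failing is parked in `dead_letter` rather than retried forever.

```python
import sqlite3

# Hypothetical linear pipeline: a unit in state S runs handlers[S] and,
# on success, moves to the next state in this list.
STEPS = ["received", "extracted", "validated", "done"]
TERMINAL = {"done", "dead_letter"}
MAX_ATTEMPTS = 3  # assumed retry budget per step before dead-lettering


def open_store(path=":memory:"):
    """The state lives in a table, not in a model's context window."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS units (
               id TEXT PRIMARY KEY,
               state TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               last_error TEXT)"""
    )
    return conn


def enqueue(conn, unit_id):
    conn.execute("INSERT INTO units (id, state) VALUES (?, ?)", (unit_id, STEPS[0]))
    conn.commit()


def advance(conn, unit_id, handlers):
    """Run the handler for the unit's current state and persist the outcome."""
    state, attempts = conn.execute(
        "SELECT state, attempts FROM units WHERE id = ?", (unit_id,)
    ).fetchone()
    if state in TERMINAL:
        return state
    try:
        handlers[state](unit_id)
    except Exception as exc:
        attempts += 1
        # Retry in place until the budget is spent, then park in the dead-letter path.
        new_state = "dead_letter" if attempts >= MAX_ATTEMPTS else state
        conn.execute(
            "UPDATE units SET state = ?, attempts = ?, last_error = ? WHERE id = ?",
            (new_state, attempts, repr(exc), unit_id),
        )
    else:
        new_state = STEPS[STEPS.index(state) + 1]
        conn.execute(
            "UPDATE units SET state = ?, attempts = 0, last_error = NULL WHERE id = ?",
            (new_state, unit_id),
        )
    conn.commit()
    return new_state
```

In a real deployment `advance` would be driven by a worker loop or scheduler. The sketch also assumes handlers are idempotent, so that re-running a step after a crash between the handler and the commit is safe.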
The model, where it appears at all, lives inside a step. It does not own the flow. It takes a well-defined input, produces a well-defined output, and the surrounding machinery decides what happens next. The flow is the architecture. The model is a tenant.
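To make "the model is a tenant" concrete, here is a sketch in which a model-backed classifier is just another step function and a plain transition table decides where the unit goes next. `call_model` stands in for whatever client you use and is hypothetical, as are the labels and step names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class StepResult:
    ok: bool
    output: dict = field(default_factory=dict)


# (current step, succeeded?) -> next step. Routing lives here, in code, not in the model.
TRANSITIONS: Dict[Tuple[str, bool], str] = {
    ("classify", True): "extract",
    ("classify", False): "human_review",
    ("extract", True): "done",
    ("extract", False): "human_review",
}


def run_step(state: str, step: Callable[[dict], StepResult], payload: dict) -> Tuple[str, dict]:
    """Execute one step; the transition table, not the step, picks what happens next."""
    result = step(payload)
    return TRANSITIONS[(state, result.ok)], result.output


def make_classify_step(call_model: Callable[[str], str]) -> Callable[[dict], StepResult]:
    """Wrap a model call as an ordinary step: text in, label out, nothing else."""
    def classify(payload: dict) -> StepResult:
        label = call_model(payload["text"]).strip().lower()
        if label in {"invoice", "contract"}:
            return StepResult(True, {**payload, "label": label})
        # Anything outside the expected labels is a failed step, not a new route.
        return StepResult(False, payload)
    return classify
```

The classifier never sees `TRANSITIONS` and cannot choose its successor; swapping the model out changes one step, not the flow.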
One model call per decision
The second thing that survives is discipline about where the model is allowed to decide. The useful pattern is one model call per decision, with a deterministic fallback when confidence is low. A decision is a point where judgement is genuinely required: classifying a messy document, extracting a field from unstructured text, choosing between a small set of routes when the rules cannot fully express the ambiguity. Those are real jobs for a model.
Everything around the decision should be code. Validation is code. Routing is code. Retries are code. The model is asked a narrow question, its answer is checked against a schema, and if the answer is malformed or low-confidence, the system falls back to a safe default or routes to a human. The mistake is letting the model decide and act in the same breath, with no gate between the suggestion and the consequence. A model that can both choose and execute is not an assistant. It is an unsupervised operator with no audit trail.
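A sketch of one guarded decision, under stated assumptions: the model is asked to reply with JSON containing a `route` and a `confidence`, the label set and the 0.8 threshold are placeholders, and `call_model` is a hypothetical callable wrapping your model client. How confidence is obtained (self-reported, token probabilities, a separate scorer) is a design choice this sketch does not settle. The function makes exactly one call, validates the reply, and returns a `Decision` rather than acting on it. Anything malformed, out of vocabulary or under the threshold falls back to `human_review`.

```python
import json
from dataclasses import dataclass
from typing import Callable

ALLOWED_ROUTES = {"invoice", "contract", "other"}  # hypothetical label set
MIN_CONFIDENCE = 0.8  # assumed threshold; tune per decision


@dataclass(frozen=True)
class Decision:
    route: str
    source: str  # "model" or "fallback"
    raw: str  # exact model output, kept for the audit trail


def decide_route(text: str, call_model: Callable[[str], str]) -> Decision:
    """One model call, checked against a schema, with a deterministic fallback."""
    prompt = (
        "Classify the document as one of "
        + ", ".join(sorted(ALLOWED_ROUTES))
        + '. Reply as JSON: {"route": ..., "confidence": 0..1}\n\n'
        + text
    )
    raw = call_model(prompt)  # exactly one call: no retries, no "think harder" loop
    try:
        data = json.loads(raw)
        route = data["route"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return Decision("human_review", "fallback", raw)  # malformed reply
    if not isinstance(route, str) or route not in ALLOWED_ROUTES:
        return Decision("human_review", "fallback", raw)  # outside the schema
    if not MIN_CONFIDENCE <= confidence <= 1.0:
        return Decision("human_review", "fallback", raw)  # low or nonsensical confidence
    return Decision(route, "model", raw)
```

The step that consumes a `Decision` is ordinary code, so the gate between suggestion and consequence can be reviewed and tested without a model in the loop.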
Recursive agents that call themselves to "think harder" are the opposite of this discipline. Each call multiplies cost and variance while making the decision boundary impossible to inspect. If you cannot point to the single moment where the system chose, you cannot explain the outcome when it is wrong, and in operational software the outcome is eventually wrong.
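One way to make that moment inspectable, sketched here as an assumption rather than a recommendation of a specific tool: append every decision to a JSON Lines log with the unit, the step, a hash of the model input, the raw output and the route that was chosen. `record_decision` and `decisions_for` are hypothetical helpers.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_decision(log_path, unit_id, step, model_input, raw_output, chosen_route):
    """Append one decision as a JSON line; the log is only ever appended to."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "unit_id": unit_id,
        "step": step,
        # Hash rather than store the input, in case it is large or sensitive.
        "input_sha256": hashlib.sha256(model_input.encode("utf-8")).hexdigest(),
        "raw_output": raw_output,
        "chosen_route": chosen_route,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def decisions_for(log_path, unit_id):
    """Answer "where did the system choose, and what did it see?" for one unit."""
    with open(log_path, encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if e["unit_id"] == unit_id]
```

With one call per decision there is one entry per decision, so explaining a wrong outcome starts from a single line rather than a transcript.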
A human override faster than a restart
The third thing that survives is a human override path that is faster than restarting the run. Every autonomous system needs a door a person can open. The question is whether that door is designed in from the start or bolted on after the first incident.
A good override lets an operator inspect a stuck or suspicious unit of work, correct it, and push it back into the flow at the right state, without rerunning everything that already succeeded. A bad override is "kill the job and start over", which means every manual intervention destroys hours of completed work and quietly trains the team to never intervene. When intervention is expensive, people stop doing it, and the system drifts unsupervised until it breaks loudly.
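A sketch of such an override against a SQLite store, assuming each unit row holds its current state and a JSON payload. `override`, the `overrides` audit table and the set of resumable states are hypothetical. The operator's correction and its audit record are written in one transaction, the unit re-enters the flow at the chosen state with its retry count reset, and no other unit is touched.

```python
import json
import sqlite3

# Hypothetical states an operator may send a unit back to.
RESUMABLE_STATES = {"received", "extracted", "validated"}


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS units (
            id TEXT PRIMARY KEY,
            state TEXT NOT NULL,
            payload TEXT NOT NULL,
            attempts INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE IF NOT EXISTS overrides (
            unit_id TEXT NOT NULL,
            operator TEXT NOT NULL,
            from_state TEXT NOT NULL,
            to_state TEXT NOT NULL,
            note TEXT,
            at TEXT DEFAULT CURRENT_TIMESTAMP);
        """
    )
    return conn


def override(conn, unit_id, operator, new_payload, resume_state, note=""):
    """Correct one unit and push it back into the flow at resume_state."""
    if resume_state not in RESUMABLE_STATES:
        raise ValueError(f"cannot resume at {resume_state!r}")
    row = conn.execute("SELECT state FROM units WHERE id = ?", (unit_id,)).fetchone()
    if row is None:
        raise KeyError(unit_id)
    # One transaction: the fix and its audit record commit together or not at all.
    with conn:
        conn.execute(
            "UPDATE units SET state = ?, payload = ?, attempts = 0 WHERE id = ?",
            (resume_state, json.dumps(new_payload), unit_id),
        )
        conn.execute(
            "INSERT INTO overrides (unit_id, operator, from_state, to_state, note) "
            "VALUES (?, ?, ?, ?, ?)",
            (unit_id, operator, row[0], resume_state, note),
        )
```

Because the unit resumes at `validated`, the steps it already passed are not re-run, and the `overrides` table records who intervened and why.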
The override is not an admission that the automation failed. It is the thing that makes the automation safe to trust. A system you can correct in thirty seconds is one you can leave running. A system you can only restart is one you have to watch.
Observability before the customer notices
The fourth thing that survives is observability that tells you which step is slow or failing before the customer does. This is not a dashboard with a model's chain-of-thought printed prettily on a screen. It is concrete operational signal: queue depth per step, latency per step, failure rate per step, the size of the dead-letter queue, and the cost of model calls over time.
Most agent systems are observable only as narrative. You can read what the agent said it was doing. You cannot easily answer how many units are stuck, where the backlog is forming, or which step started getting slower last Tuesday. Narrative is not telemetry. When the system is processing thousands of items, you need to see the shape of the work, not the monologue of a planner.
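A sketch of computing those per-step signals from step-completion events, assuming each event records the step, an outcome, the latency and any model cost. It also assumes that queue depth comes from counting units per state in the store, so the dead-letter path appears as its own row with its size as the queue depth. `StepEvent`, `step_metrics` and the outcome vocabulary are hypothetical; p95 uses the nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEvent:
    step: str
    outcome: str  # hypothetical vocabulary: "ok", "failed" or "dead_letter"
    latency_s: float
    model_cost: float = 0.0  # zero for steps that make no model call


def nearest_rank(values, q):
    """Nearest-rank percentile, e.g. q=0.95 for p95."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def step_metrics(events, queued):
    """Per-step operational signal; `queued` maps state -> units waiting there."""
    by_step = defaultdict(list)
    for e in events:
        by_step[e.step].append(e)
    report = {}
    for step in sorted(set(by_step) | set(queued)):
        evs = by_step.get(step, [])
        latencies = [e.latency_s for e in evs]
        failures = sum(1 for e in evs if e.outcome != "ok")
        report[step] = {
            "queue_depth": queued.get(step, 0),
            "p95_latency_s": nearest_rank(latencies, 0.95) if latencies else None,
            "failure_rate": failures / len(evs) if evs else 0.0,
            "model_cost": round(sum(e.model_cost for e in evs), 6),
        }
    return report
```

Run over a sliding window (say, the last hour) and compared with the previous window, the same numbers show which step started slowing down and whether model spend is creeping up.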
What the theatre hides
Everything else is theatre. Pretty diagrams of agents collaborating, recursive planners arguing with themselves, autonomous orchestrators that need a human to babysit them through every non-trivial run. We have stripped enough of these systems for parts to be confident about what they hide.
They hide that nobody owns the output when it is wrong. The demo never shows the case where the agent confidently does the wrong thing and no human notices for a week. They hide cost, because a planner that calls the model ten times to make one decision feels intelligent right up until the invoice arrives. They hide fragility, because a system held together by prompt instructions has no real failure boundaries; it just degrades in ways that are hard to predict and harder to reproduce. And they hide the absence of a data model, because as long as the state lives in conversation, nobody has to do the unglamorous work of deciding what the records actually are.
The theatre is seductive because it compresses a hard engineering problem into a clever-looking demo. The demo cares about the happy path. Production cares about the run that fails halfway, the document that does not parse, the third-party API that times out, the operator who needs to fix one record without burning the batch. None of that is visible when the model is improvising in front of an audience.
Use the model like what it is
This is not an argument against AI in workflows. It is an argument for putting the model in its place. The model earns its keep on exactly the parts that need judgement and resist clean rules. Use it there, narrowly, behind a schema, with a fallback and a human door. Let the deterministic system do everything else, because everything else is where reliability actually lives.
The systems that run for a week without anyone touching them are not the clever ones. They are the ones with a clear state machine, a single guarded decision point, a fast override and honest telemetry. The intelligence is a feature of one step, not the shape of the whole.
The model is the most expensive and least predictable part of your stack. Use it like one.