Most “Agents” Are Workflows That Got Ideas

Part 2 of a series on what actually goes into production agentic systems.


Part 1 – Most “AI Agents” in Production Are Demos With Extra Steps

The word “agent” has stopped meaning anything specific. A form wizard with an LLM call is an agent. A five-step pipeline with a classification step is an agent. A chatbot that returns a JSON object is an agent. The label has done so much work for so many different systems that the word no longer tells you anything about the shape of what’s underneath.

This matters because the shape determines the economics. A workflow and an agent are different things architecturally, operationally, and financially. Calling a workflow an agent doesn’t change the bill. It does change the team’s expectations, their debugging strategy, and the amount of autonomy they accidentally hand to an LLM that didn’t earn it.

[Hero image: workflow vs. agent shapes side by side]

Two systems doing nominally similar work. The one on the left is a graph you wrote. The one on the right is a graph the LLM writes at runtime. Pick the wrong shape and you pay for it every day in production.

Who Holds the Control Flow

A workflow is a system where you, the engineer, decide what happens next. The control flow is written in code. The LLM is a step, sometimes a critical step, but it doesn’t decide the route. Extract fields from this email. Classify this ticket. Summarize this document. The next step is already written before the LLM opens its mouth.

An agent is a system where the LLM decides what happens next. You give it a goal and a set of tools, and the model picks which tool to call, in what order, until it decides it’s done. The control flow lives in the model’s head, not in your code. This is what “agentic” actually means. Not “uses an LLM,” not “calls tools,” but “the LLM holds the loop.”

Almost everything shipped in the last two years that was called an agent was actually a workflow. The LLM answered a question, maybe called a tool, maybe sat in the middle of a three-step chain, but the team wrote the orchestration. The LLM did not decide the order. That is a workflow. Calling it an agent doesn’t help you. It just obscures what you built.

If you can draw the system on a whiteboard as boxes with arrows, and the arrows are fixed rather than “it depends what the LLM says,” you have a workflow. Use the word.

Let’s Say You’re Building Two Systems

Take the email-processing system from the last article. An email comes in. You extract structured fields. You classify the type. You pull historical context from the CRM. You draft a reply or route it to a human. You write back to the CRM.

That’s a workflow. Every step is predictable. The sequence is the same for every email. Some steps use an LLM, others don’t. If one step fails, you know exactly where. If latency is high, you know which call to optimize. If the result is wrong, you trace through a known graph. It’s five LLM calls at most, and none of them have to decide what the next call is.
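The fixed shape is easy to see in code. Here is a minimal sketch of that pipeline; every step function is a stub standing in for a real LLM or CRM call (all names are hypothetical), because the point is only that the sequence lives in your code, not in the model:

```python
# Sketch of the email pipeline. Each step is a stub for a real
# LLM or CRM call (hypothetical names); the sequence is fixed in
# code before any model opens its mouth.

def extract_fields(email: str) -> dict:
    # stand-in for an LLM structured-extraction call
    return {"sender": "alice@example.com", "subject": email[:60]}

def classify(fields: dict) -> str:
    # stand-in for an LLM classification call
    return "needs_human" if "refund" in fields["subject"] else "reply"

def fetch_crm_context(fields: dict) -> str:
    # plain API call, no LLM involved
    return f"history for {fields['sender']}"

def draft_reply(email: str, context: str) -> str:
    # stand-in for an LLM generation call
    return f"drafted reply using: {context}"

def process_email(email: str) -> str:
    fields = extract_fields(email)        # step 1: extract
    label = classify(fields)              # step 2: classify
    context = fetch_crm_context(fields)   # step 3: CRM lookup
    if label == "needs_human":
        return "routed to human"          # step 4a: hand off
    return draft_reply(email, context)    # step 4b: draft
```

If any step fails, the stack trace tells you exactly which box in the graph broke, which is the property the agent version gives up.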

Now imagine a different system. A customer-success lead walks up and asks: “Tell me everything relevant to this client’s escalation: why it happened, what signals we missed, what pattern runs through it.” That is not a workflow. You don’t know ahead of time whether the answer lives in emails, in support tickets, in a call transcript from last month, in a usage graph, or in all of them. You don’t know whether one retrieval is enough, or whether a finding in one place will change what you want to look at next. The next action depends on what the current action surfaced.

That’s an agent. An LLM, a set of retrieval and search tools, and the loop belongs to the model. You cannot write the branching by hand because the branching is not knowable at design time.
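In code, the agent shape is a loop whose body the model fills in at runtime. A minimal sketch, where `call_llm` and the tool registry are hypothetical stand-ins rather than any real client API:

```python
# Minimal agent loop: the model picks the next tool each iteration
# and decides when it's done. `call_llm` is a hypothetical stand-in
# for a model client that returns either a tool call or an answer.

def run_agent(goal, tools, call_llm, max_iters=25):
    history = [("user", goal)]
    for _ in range(max_iters):
        action = call_llm(history)              # model chooses the route
        if action["type"] == "finish":
            return action["answer"]             # model decides it's done
        result = tools[action["tool"]](action["args"])
        history.append(("tool", str(result)))   # result shapes the next choice
    return "stopped: iteration cap reached"     # safety valve you still write
```

Notice that the only control flow you wrote is the cap. Everything else, which tool, in what order, when to stop, belongs to the model.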

These two systems are not on a spectrum. They are different shapes. The email workflow can have five LLM calls and still not be an agent. The investigation system might make five calls on one query and twenty-three on the next, and that is the point. You don’t know in advance.

The 10x Cost of Unnecessary Agency

Giving an LLM the control flow is expensive. Every iteration of the loop is another model call. Every model call is another round-trip of the full context plus the message history plus the tool descriptions plus the last result. Tokens compound.

A workflow that would take four LLM calls at two thousand tokens each, call it eight thousand tokens, can become an “agent” that does the same work in twelve to twenty loop iterations at six to ten thousand tokens apiece. That’s not 2x. That’s an order of magnitude, sometimes more. Latency grows with it. Every loop iteration is a network round-trip, model inference, and tool execution. A workflow finishes in three seconds. The “agent” version of the same task takes forty.
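The back-of-envelope arithmetic is worth writing down. Using the illustrative figures above (round numbers, not benchmarks):

```python
# Token comparison using the illustrative figures from the text.
workflow_tokens = 4 * 2_000      # four calls at ~2k tokens each

agent_low  = 12 * 6_000          # 12 loop iterations at ~6k tokens
agent_high = 20 * 10_000         # 20 loop iterations at ~10k tokens

print(workflow_tokens)                  # 8000
print(agent_low / workflow_tokens)      # 9.0  -> roughly an order of magnitude
print(agent_high / workflow_tokens)     # 25.0 -> "sometimes more"
```

And that ratio is per request, every request, for the life of the system.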

The cost story is bad. The reliability story is worse.

A workflow has enumerable failure modes. Step three can fail. The classifier can return a wrong label. The CRM can reject a write. Each of those is a known case you can log, alert on, and retry with a fixed strategy. An agent can fail in ways you did not anticipate, because the loop it followed is not a loop you wrote. It can call the same tool four times in a row and get stuck. It can decide a task is done when it isn’t. It can decide a task needs more work when it is already correct. These are not bugs in the ordinary sense. They’re probabilistic routing failures in a system where the routing lives in the model.
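You can't enumerate those failures, but you can bound them. A minimal sketch of one guardrail for the "same tool four times in a row" case, assuming the loop keeps a history of `(tool_name, args)` tuples (a hypothetical shape, not any library's API):

```python
# Detect the stuck-loop failure described above: the model issuing
# the identical tool call `window` times in a row. `tool_calls` is
# a list of (tool_name, args) tuples from the loop's history.

def is_stuck(tool_calls, window=4):
    tail = tool_calls[-window:]
    return len(tail) == window and len(set(tail)) == 1
```

Paired with a hard iteration cap, a check like this turns an unbounded failure mode into a logged, alertable event, which is the treatment every failure in the workflow version gets for free.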

The tell that a team has made this trade badly is almost always the same. They have a list of steps. They could have written the orchestration in a hundred lines of code. But writing orchestration code feels like plumbing. Deterministic, unglamorous, old-fashioned. So they define the steps as “tools,” write a two-thousand-token system prompt that describes the logic in prose, hand the LLM the tools, and hope it calls them in the right order.

It usually does, for the first fifty test cases. Then a forwarded thread comes in, or an auto-reply, or an inline image attachment, and the LLM skips a step, or calls the wrong tool, or decides the task doesn’t need the CRM write after all. You now have a workflow with nondeterministic execution that nobody can reason about, at ten times the cost.

Teams that wrap an agent around work that didn’t need one pay in three places every week: the token bill, the latency budget, and the on-call rotation.

When Agency Is Actually Earned

There are cases where an agent is the right shape. They share one property: the branching is genuinely unknowable at design time.

Research, investigation, and retrieval-heavy tasks qualify. So do tasks where the stopping condition is “when you have enough” rather than “after step five.” Long-horizon planning tasks where the first action surfaces information that changes the next action qualify. Certain kinds of code-generation and debugging tasks qualify, because the error a compiler returns is information you could not have predicted.

What unites these is that you cannot write the graph. If you could, you would, and it would be cheaper and more reliable than an agent. The agent is the shape you reach for when the graph doesn’t exist.

The heuristic I’ve watched senior practitioners use is: can you whiteboard the happy path? If yes, it’s a workflow. If the whiteboard has “and then it depends on what we find” as a node, you might be looking at an agent, or you might be looking at a workflow with a branching step you haven’t articulated yet. Force yourself to articulate it before jumping.

Default to workflow. Upgrade to agent only when the workflow version would require a branching factor you can’t enumerate. That is the test.

The Part Nobody Mentions

“Agent” has become a word that makes work sound more interesting than it is.

A team that ships a workflow with an LLM in it has shipped a workflow. A team that ships “an agent” is doing AI. The difference is pure framing, and framing has career consequences. Nobody gets promoted for shipping a well-built workflow, no matter how well it works. People do get promoted for shipping agents, even when the agent is a worse version of the same thing.

This is why the anti-pattern keeps spreading. The technical argument against unnecessary agency is clean: more expensive, less reliable, harder to debug. The organizational argument for unnecessary agency is stronger: more visible, more fundable, more promotable. Left alone, the organizational pressure wins.

The senior move is to push back. Ship the workflow. Be explicit that it’s a workflow. Save the agent framing for the systems that actually need it. You’ll ship faster, pay less, debug less, and when you do ship an agent, people will believe you when you say you needed one.

The Flowchart Test

If you can draw it as a flowchart, it isn’t an agent. And if it isn’t an agent, don’t call it one. The label costs you in ways the demo won’t show but the bill will.

A large part of being a senior engineer around LLMs, honestly, is talking teams out of agents they don’t need. A well-structured workflow with a model call in the right place will beat a hastily assembled agent on every metric that matters: latency, cost, reliability, debuggability, team sanity.

Save autonomy for the problems that demand it. Everything else is a flowchart.

Coming Up in This Series

Next up: Product & ROI: is agentic even the right shape? Before the workflow-vs-agent question there’s an earlier one most teams skip: is an LLM in the loop the right call at all? Sometimes the answer is rules. Sometimes it’s classical ML. Sometimes it’s a form. Figuring out when the AI-shaped answer is actually the right answer is the most underrated skill in this field.


If this resonated and you’re building production AI systems, follow along. The series covers the 21 things I think senior AI engineers and architects need to reason about: RAG pipelines, tool design, security, evaluation, cost, and the operational patterns that separate demos from systems you can actually run.
