Spec-Driven Development Isn't Waterfall Unless You're Using It That Way

Last summer, while I was vibe-coding my own scorecard app, I kept running into dead ends. Not dramatic ones — the annoying kind. I would get something working, then realize the path was wrong. Or the behavior was close but not quite right. Or I had built a capability that looked useful in isolation but did not actually help the overall product move forward.

So I started thinking more up front. I used plan mode more. And eventually I started using spec-driven development more formally. Which raises the obvious question: did I just give up agility and go back to waterfall?

I do not think so. But I do think spec-driven development can very easily become waterfall if you use it the wrong way — and that’s worth unpacking, because the trap is subtle.

The old reflex shows up again

The industry has been here before.

Sketchnote: The pendulum swing from vibe coding dead ends and burned tokens toward plan mode. The problem is not thinking up front, but pretending thinking up front removes uncertainty.

Complexity rises, surprises pile up, delivery gets late, and people get frustrated. The natural reaction is predictable: “Next time, let’s figure everything out up front.” That instinct is understandable. It is also where a lot of waterfall-style phase-gate thinking came from.

Sketchnote: The old waterfall instinct. Complexity rises, surprises pile up, and controls tighten into requirements, design, build, test, and deploy phases.

When work feels uncertain, leaders try to reduce surprise by adding more up-front analysis, more review gates, more detailed plans, and more confidence theater. But when the uncertainty is about what is useful, feasible, desirable, safe, or valuable, writing more up front does not eliminate it. You can spec it all day and still get surprised. That was true with human teams, and it is just as true with AI coding agents.

AI changes the loop, not the need for learning

AI does change something important. A coding agent can burn down some feasibility risk on its own. If you give it a clear enough target, enough context, and a good enough definition of done, it can try an approach, hit a problem, inspect the error, adjust, and try again. That agentic loop is genuinely useful.

Sketchnote: What AI changes. A coding agent can burn down feasibility risk through an agentic loop with a clear target, build and test, learn from results, and adjust or ask.

It means some of the “how do we make this work?” learning can happen inside the coding loop. In a way, it feels a little like the interaction between a product owner and a Scrum team: you describe what you are trying to achieve, the team works through implementation reality, and it comes back when it learns something important, hits a constraint, or needs a decision. AI coding agents are not Scrum teams, but they can sense and respond to implementation surprises in a similar way.

Spec-driven development is designed for flow, not waterfall. The spec is meant to be a concrete target function for an adaptive agentic loop. The danger is when we treat the spec as a way to remove the need for adaptation — especially when we combine it with larger batch sizes. Trying to figure out an entire spec up front for a month’s worth of human-agent interaction is, of course, ill-advised when there’s any chance of surprises along the way.

Back to the essence of spec-driven development

Spec-driven development helps because it forces me to slow down just enough to think through and specify:

What am I actually trying to accomplish?
What should this enable?
What constraints matter?
What does good look like?
What assumptions am I making?
Where do I need the agent to decide, and where do I need it to ask?

This gives the coding agent better context and reduces avoidable churn. But the more I leaned on it, the more it revealed two traps.

Trap 1: More spec is not always better

There is a point of diminishing returns on specification, and with AI coding agents that point can show up pretty early. At first, better intent helps. Clearer outcomes help. Better constraints, concrete examples, acceptance criteria, a rough plan — all of it helps. But eventually, more specification stops creating better output. Sometimes it just burns time and tokens. Sometimes it constrains the agent so tightly that it cannot usefully respond to what it learns while implementing. Sometimes it makes you feel safer without reducing the real risk.

This is the same dynamic we’ve always had with human coders. There is a difference between giving a good developer useful context and telling them how to code every step — the first creates alignment, the second wastes talent. AI coding agents are not people, but the same economic idea applies: you want enough guardrails for agency, not so many that you remove the agent’s ability to adapt.

Sketchnote: More spec has diminishing returns. At some point, more specification stops improving output. Specs are a scaffold for learning, not a contract for certainty.

Trap 2: Working software is not the same as a valuable outcome

The second trap is more important: even when the AI builds exactly what you asked for, you can still miss the outcome. That should sound familiar, because agile teams learned this the hard way too. Working software matters, but a team can produce working software that customers do not use, do not trust, do not understand, or do not care about.

AI coding agents have the same problem. They can help you create working software, but they cannot magically know whether your customers really want it, whether the workflow will change, whether your internal users will trust the output, or whether the thing you built will move a leading indicator that matters. Unless the learning loop includes real customer behavior, real user feedback, real operational signal, or real business impact, the agent is still making assumptions — possibly very efficient assumptions, but assumptions nonetheless.

That is where spec-driven development has to stay connected to product discovery. Otherwise, you just get better at building the wrong thing.

Sketchnote: Working software is not value. Even when an agent builds what you asked for, you still need a feedback loop to know whether behavior changed, value improved, or the work moved.

So is spec-driven development waterfall?

It depends on how you use it. A fully fleshed-out spec with a horizon measured in minutes or hours (or in X tokens) makes total sense. The problem is what I’m starting to see in the trenches: teams using specs as a team or group collaboration mechanism, with a horizon of days or weeks.

Don’t get me wrong — I see tons of value in some form of spec-driven behavior at that horizon and altitude. But it needs to be a higher-altitude spec. It shouldn’t try to anticipate everything up front. At one of my clients, for example, we are using an outcome-oriented, product-led version of spec-driven development at the portfolio level. The specs we aim to create there are only remote cousins of the specs that make sense during a coding session. There’s still a structured human-agent collaboration, and it’s genuinely useful — it helps everyone orient to outcomes, think about leap-of-faith assumptions, decide whether discovery is needed, and figure out the learning plan. But the “plan” stage of SDD at that level produces a list of potential specs, features, or experiments, not detailed stories and tasks.

A better way to use spec-driven development

Effective spec-driven development is not “big planning up front.” It is enough thinking up front to make learning cheaper — appropriate to the altitude and horizon you’re working at, and focused on outcomes, assumptions, and leading indicators. It also means treating the inventory and flow of specs as something to visualize, manage, measure, and improve over time — not just through spec and implement, but end to end, all the way to outcomes and learning.

Here’s what that looks like in practice.

Sketchnote: A better spec-driven approach uses the spec to expose outcomes, assumptions, the learning plan, feedback touchpoints, and validation path.

1. Specify outcomes, not just outputs

Do not only describe the feature, screen, workflow, or capability. Describe what you want it to make possible. What behavior should change? What decision should get easier? What painful work should get lighter? What customer or user outcome should improve, and what business signal would tell you this actually mattered? If the spec only describes the output, the agent can succeed while the product fails.

2. Flush out assumptions, not just requirements

Requirements say what the thing should do. Assumptions say what must be true for the thing to matter. That distinction is critical, and it’s easy to lose. Use the coding agent to help surface the assumptions hiding in the work — about users, workflow, data, the business, usability, adoption, technical feasibility, and risk — and then decide which ones need learning before, during, or after implementation. The goal is not to remove all assumptions. The goal is to stop hiding them inside the spec.
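One way to keep assumptions from hiding inside the spec is to give them their own slot next to the requirements. The following is only a sketch of that idea: the class names, fields, and the example spec are all invented for illustration, not taken from any particular spec-driven development tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class LearnWhen(Enum):
    BEFORE = "before implementation"
    DURING = "during implementation"
    AFTER = "after implementation"


@dataclass
class Assumption:
    statement: str          # what must be true for the work to matter
    area: str               # e.g. "users", "workflow", "data", "adoption"
    learn_when: LearnWhen   # when we plan to test it
    validated: bool = False


@dataclass
class Spec:
    outcome: str                                   # the change we want to see
    requirements: list[str] = field(default_factory=list)
    assumptions: list[Assumption] = field(default_factory=list)

    def blocking_assumptions(self) -> list[Assumption]:
        """Unvalidated assumptions we decided to test before building."""
        return [
            a for a in self.assumptions
            if a.learn_when is LearnWhen.BEFORE and not a.validated
        ]


# Hypothetical example spec; the product and its details are made up.
spec = Spec(
    outcome="Support agents resolve billing tickets without escalating",
    requirements=["Show invoice history in the ticket view"],
    assumptions=[
        Assumption("Escalations happen because agents cannot see invoices",
                   "workflow", LearnWhen.BEFORE),
        Assumption("Invoice history loads fast enough to use mid-call",
                   "technical feasibility", LearnWhen.DURING),
    ],
)

for a in spec.blocking_assumptions():
    print(f"Test first ({a.area}): {a.statement}")
```

The point of the structure is not the code itself. It is that an assumption with no planned learning moment becomes visible as a gap, instead of sitting quietly inside a requirement.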

3. Decide whether this work needs discovery

Not every change needs deep discovery. Some work is obvious enough, some is mostly technical cleanup, and some is a small improvement with low downside. Just build it. But some work is a bet, and for bets the spec should include the learning approach. Can we test the idea before building it? Can we prototype, or run a fake-door test, or put a thin slice in front of real users? Can we use leading indicators to steer, or learn from support tickets, sales calls, workflow data, or customer behavior? If the work needs discovery and the spec ignores that, you are setting up a beautiful implementation of an unvalidated bet.

4. Think about options

AI makes it tempting to try everything — spin up five agents, build five approaches, pick the best. Sometimes that is smart. Set-based design can make sense when options are cheap enough and the uncertainty is high enough. But it still has a cost. Even if tokens are cheap, management attention is not, and reviewing five approaches, integrating the best ideas, managing conflicts, and deciding what to keep is real work. Sometimes one focused approach with a fast feedback point is better; sometimes multiple options are worth it. The spec should make that choice explicit. Don’t unleash a swarm because it sounds cool — use options when the economics make sense.
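A back-of-the-envelope model can make that tradeoff concrete. This is a toy sketch: the cost formula and every number in it are assumptions chosen for illustration, not measurements from a real project.

```python
def option_cost(n_options: int, tokens_usd: float, review_hours: float,
                hourly_rate_usd: float, integration_hours: float = 0.0) -> float:
    """Total cost of running n_options approaches in parallel (toy model)."""
    build = n_options * tokens_usd
    review = n_options * review_hours * hourly_rate_usd
    # Integration work only exists when there is more than one option to merge.
    integrate = integration_hours * hourly_rate_usd if n_options > 1 else 0.0
    return build + review + integrate


# Made-up inputs: $5 of tokens and one hour of review per approach.
single = option_cost(1, tokens_usd=5, review_hours=1, hourly_rate_usd=100)
swarm = option_cost(5, tokens_usd=5, review_hours=1, hourly_rate_usd=100,
                    integration_hours=3)
print(single, swarm)
```

With these invented inputs, the five-agent run costs 825 against 105 for the single approach, and only 25 of that 825 is tokens. The rest is human review and integration time, which is why the swarm has to buy a meaningfully better result to be worth it.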

5. Task the agent with the right feedback points

The coding agent should not disappear into a cave and come back with a finished product if the risk profile does not support that. Ask it to task out the work with appropriate checkpoints. For low-risk work, that may be a straightforward build. For higher-risk work, it may mean validating the architecture before implementation, producing a first thin slice, stopping after a spike, asking for a product decision before continuing, creating testable acceptance criteria, or surfacing tradeoffs rather than guessing. The point is not to micromanage the agent. The point is to match autonomy to risk.
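Matching autonomy to risk can be as simple as a lookup from risk level to the checkpoints the agent must stop at. The sketch below assumes three risk levels and a hand-picked set of checkpoints per level. Both the levels and the mapping are hypothetical, and the task names are invented.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping from risk level to required stopping points.
CHECKPOINTS = {
    Risk.LOW: [],
    Risk.MEDIUM: [
        "deliver a first thin slice",
        "surface tradeoffs instead of guessing",
    ],
    Risk.HIGH: [
        "validate the architecture before implementation",
        "stop after the spike",
        "ask for a product decision before continuing",
        "deliver a first thin slice",
    ],
}


def tasking_instructions(task: str, risk: Risk) -> str:
    """Build the tasking text handed to the agent for a given risk level."""
    stops = CHECKPOINTS[risk]
    if not stops:
        return f"{task}\nBuild straight through; report when done."
    lines = [task, "Pause and report at each checkpoint:"]
    lines += [f"{i}. {stop}" for i, stop in enumerate(stops, start=1)]
    return "\n".join(lines)


print(tasking_instructions("Add CSV export", Risk.LOW))
print(tasking_instructions("Replace the sync engine", Risk.HIGH))
```

Low-risk work gets no checkpoints at all, which is deliberate: adding stops where the risk does not justify them is just micromanagement with extra steps.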

6. Follow through after working software

This is the step many people skip. Once the agent builds the thing, the real product work is not done. Put it in front of real users and watch what happens. Measure something that matters. Listen for confusion, look for workarounds, and ask the harder questions: did the workflow actually change? Would the user miss this if it disappeared? Did the business signal move? This is where spec-driven development either becomes part of an agile learning loop or degenerates into a more disciplined feature factory.

AI coding agents are not product agents

There is a reason we call them AI coding agents — not product agents, not outcome agents, not judgment agents. At least not yet.

Sketchnote: Coding agents are not outcome agents. AI improves coding throughput and quality, while human judgment still has to aim and steer the work.

They are getting very good at taking context, constraints, and intent and turning them into working software. That is a big deal. But we still have to provide the taste, judgment, context, and connection to real outcomes. We decide what is worth building. We understand the users and customers, notice the friction, and define what good means. We design the learning loop, and we make the tradeoffs when the agent discovers that reality is messier than the spec.

Spec-driven development is powerful when it helps us do that work. It becomes waterfall when it helps us avoid it. That is the whole difference. The point is not to write bigger specs — it is to write specs that create better steering.

If AI has made your team faster at building, the next question is whether you have become better at choosing, validating, and learning. That is where the value is.