
Control, Context, Correctness: How Spec-Driven Development Actually Works

George Wilde
Principal Consultant, Front End & DevOps Competency Lead

This article was initially published on Medium.

In my previous post, I argued that AI coding tools aren’t delivering on their promise, not because the tools are inherently flawed, but because most teams are using them without a clear methodology. I introduced Spec-Driven Development (SDD) as the approach that’s working for us.

This post is the practical one. I’m going to break down what SDD actually looks like, and the three principles that make it work: Control, Context, and Correctness.

But first, let me be clear about what this isn’t.

This Is Not Vibe Coding

You’ve probably seen it. Someone opens ChatGPT or Cursor, types “build me a login page”, gets some code back, pastes it into their project, tweaks it until it sort of works, and calls it done. That’s vibe coding. You’re vibing with the AI, hoping for the best.

Vibe coding is fine for throwaway prototypes. It’s not fine for production software. The code might work today and break tomorrow. There’s no way to validate it systematically. When requirements change, you’re back to vibing.

Spec-Driven Development is the opposite. You’re not asking AI to guess what you want. You’re giving it an unambiguous specification and asking it to implement that specification. The difference sounds subtle, but it’s fundamental.

The Four-Step Cycle

At its core, SDD is a cycle with four steps:

    • Gather business requirements — Understand what the user actually needs. This isn’t new; it’s what good teams have always done. The difference is that you’re going to turn these requirements into something precise enough for an AI to implement.
    • Create a specification — This is where the magic happens. A specification defines exactly what “done” looks like. It includes acceptance criteria, edge cases, error handling, and enough technical context for the implementation to be unambiguous (a sketch of what this can look like follows below).
    • AI coding and testing — The AI implements the spec. But crucially, you’re not just accepting whatever it produces. The AI also writes tests against the spec. Multiple approaches can be tried and compared.
    • Deliver — Present the completed work. Because it was built against a spec, you can demonstrate that it meets the requirements. No hand-waving, no “it mostly works”.

Then repeat. The spec evolves as you learn more. The implementation stays aligned with the spec.
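The specification step is the one people find hardest to picture, so here is a minimal, hypothetical sketch of how acceptance criteria for a small feature could be captured in a structured form. The feature, names, and rules are illustrative only, not taken from a real project, and real specs are usually richer documents; the point is that every line states an observable behaviour that can later be checked.

```python
# Hypothetical spec excerpt for a "password reset" feature, expressed as data
# so that each acceptance criterion can later be mapped one-to-one onto a test.
# All names and rules here are illustrative, not taken from a real project.

PASSWORD_RESET_SPEC = {
    "feature": "Password reset via email",
    "acceptance_criteria": [
        "A reset email is sent only for addresses that belong to an existing account",
        "The reset token expires after 30 minutes",
        "A used token cannot be redeemed a second time",
    ],
    "edge_cases": [
        "Unknown email address: respond exactly as for a known one (no account enumeration)",
        "Expired token: show an 'expired link' page and offer to resend",
    ],
    "error_handling": [
        "Email provider outage: queue the message and retry with backoff",
    ],
}
```

Whether the spec lives in Markdown, an issue tracker, or a data file like this matters less than the property it demonstrates: nothing in it depends on intuition to interpret.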

The Three Cs

Making this cycle work requires getting three things right. We call them the three Cs: Control, Context, and Correctness.

Control: Keep The Agents Aligned

AI agents are powerful, but they’re not infallible. Left unchecked, they’ll drift. They’ll make assumptions. They’ll “improve” things that didn’t need improving. They’ll go down rabbit holes.

Control means keeping the AI focused on the task at hand.

In practice, this looks like clear boundaries on what the agent should and shouldn’t touch, regular checkpoints where a human reviews progress, and the ability to roll back when things go wrong. It means working in small batches — a principle that Google’s DORA research identified as critical for AI-assisted development. As they put it: “This discipline counteracts the risk of AI generating large, unstable changes.”
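What those boundaries look like varies by tool, but the idea can be sketched independently of any particular agent framework. The check below is a hypothetical guardrail, not a real tool's configuration: an agent's proposed change set is rejected if it touches files outside the agreed scope or grows past small-batch size. The paths and limits are assumptions chosen for illustration.

```python
# Hypothetical guardrail: reject an agent's proposed change set if it strays
# outside the agreed scope or is too large to review as one small batch.
# The paths and limits are illustrative assumptions, not a real tool's config.

ALLOWED_PREFIXES = ("src/auth/", "tests/auth/")  # scope agreed for this task
MAX_CHANGED_FILES = 10                           # small-batch limit on files
MAX_CHANGED_LINES = 400                          # small-batch limit on lines

def within_bounds(changed_files: dict[str, int]) -> tuple[bool, str]:
    """changed_files maps each repo-relative path to its number of changed lines."""
    for path in changed_files:
        if not path.startswith(ALLOWED_PREFIXES):
            return False, f"out-of-scope file touched: {path}"
    if len(changed_files) > MAX_CHANGED_FILES:
        return False, "too many files changed for one batch"
    if sum(changed_files.values()) > MAX_CHANGED_LINES:
        return False, "change set too large; split the work"
    return True, "within bounds"
```

Anything a check like this rejects goes back to the agent with a narrower brief; anything it accepts still passes through a human checkpoint before it lands.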

The temptation is to give AI agents more autonomy. “Just build the whole feature.” In our experience, that’s where things go wrong. Tighter control, paradoxically, leads to faster delivery because you spend less time fixing mistakes.

Context: Right Information At The Right Time

AI doesn’t know your codebase. It doesn’t know your team’s conventions. It doesn’t know why that weird function exists or what happens if you change it.

Context is about giving the AI the information it needs to do good work.

This is harder than it sounds. Too little context, and the AI makes bad assumptions. Too much context, and you blow through token limits or confuse the model with irrelevant information. The skill is in curating the right context for each task.

The DORA AI Capabilities Model calls this out explicitly. Two of their seven foundational capabilities are “healthy data ecosystems” and “AI-accessible internal data”. As they note: “Connecting AI to your internal documentation and codebases moves it from a generic assistant to a specialized expert.”

In SDD, context comes from the specification itself, from the relevant parts of the codebase, from architectural decision records, and from examples of similar implementations. Getting this right is one of the things that separates teams seeing 10% gains from teams seeing 400% gains.
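As a rough illustration of that curation, the sketch below assembles a context package for one task from the sources just listed, in priority order, under a token budget. The token estimate is a crude stand-in for a real tokenizer, and the names in the commented usage are hypothetical; the details vary by model and tooling.

```python
# Illustrative context assembly: include sources in priority order until the
# token budget is spent, skipping lower-priority material rather than cutting
# into the spec. estimate_tokens is a rough heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude rule of thumb: roughly four characters per token

def build_context(sources: list[tuple[str, str]], budget: int = 40_000) -> str:
    """sources is a list of (label, content) pairs, highest priority first."""
    parts, used = [], 0
    for label, content in sources:
        cost = estimate_tokens(content)
        if used + cost > budget:
            continue  # drop lower-priority material instead of truncating the spec
        parts.append(f"## {label}\n{content}")
        used += cost
    return "\n\n".join(parts)

# Example usage (names are hypothetical; priority mirrors the list above):
# build_context([
#     ("Specification", spec_text),
#     ("Relevant code", source_of_module_being_changed),
#     ("Architecture decision record", adr_text),
#     ("Similar implementation", example_source),
# ])
```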

Correctness: Validate Against The Spec

Here’s the uncomfortable truth about AI-generated code: it often looks right but isn’t. It passes a quick review. It seems to work. Then it breaks in production because of an edge case nobody tested.

Correctness means validating that the implementation truly meets the specification.

This is where having a proper spec pays off. If you’ve defined what “done” looks like, you can test against it. If you’re vibe coding, you can only test against your intuition — and your intuition is probably wrong about edge cases.

In practice, correctness involves automated tests generated from the spec, multiple implementation attempts that can be compared, human review of the code and test coverage, and validation against the original business requirements.
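To make the first of those items concrete, here is a hedged pytest-style sketch that maps two acceptance criteria from the hypothetical spec excerpt earlier onto checks. request_password_reset and redeem_token are assumed interfaces invented for the example, not code from a real project; the useful property is that each test names the criterion it validates, so coverage maps straight back to the spec.

```python
# Hypothetical tests derived from the acceptance criteria in the earlier spec
# excerpt. request_password_reset and redeem_token are assumed interfaces;
# each test states which criterion it validates so gaps are easy to spot.

from myapp.auth import request_password_reset, redeem_token  # assumed module

def test_unknown_email_gets_the_same_response_as_a_known_one():
    # Criterion: no account enumeration for unknown addresses.
    known = request_password_reset("alice@example.com")
    unknown = request_password_reset("nobody@example.com")
    assert known.message == unknown.message

def test_used_token_cannot_be_redeemed_twice():
    # Criterion: a used token cannot be redeemed a second time.
    token = request_password_reset("alice@example.com").token
    assert redeem_token(token).ok
    assert not redeem_token(token).ok
```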

The DORA research calls this a “user-centric focus” — making sure you’re building the right thing, not just building something fast. It’s easy to be productive in the wrong direction. The spec keeps you honest.

What Goes Wrong Without Each C

I find it helpful to think about what happens when you’re missing one of these:

  • Without control, AI agents drift. They make changes you didn’t ask for. They “refactor” working code. They add dependencies you don’t want. You end up spending more time reviewing and reverting than you saved.
  • Without context, AI produces generic code that doesn’t fit your codebase. It ignores your conventions. It duplicates existing functionality. It makes architectural decisions that conflict with your existing patterns.
  • Without correctness, you ship bugs. The code looks right, but it handles edge cases wrong. Tests pass, but don’t actually cover the requirements. You find out in production that the implementation doesn’t match what the business needed.

The Hurdles (and how to address them)

I won’t pretend this is easy. There are real challenges to making SDD work.

  • Spec quality varies. A bad spec leads to bad implementations. Teams need to learn how to write specs that are precise enough to be useful but not so detailed that they take longer to write than the code would.
  • Context curation is a skill. Knowing what context to include — and what to leave out — takes practice. Too much context is almost as bad as too little.
  • It requires discipline. The temptation to skip the spec and just start coding is real, especially for “simple” changes. But simple changes have a habit of becoming complex, and then you’re back to vibe coding.
  • No single tool does everything. When we started with SDD, we evaluated what was available and found plenty of good options — but no single tool covered the full picture. That’s the nature of a fast-moving space: tools like GitHub’s Speckit and approaches like the BMAD-Method are genuinely useful, but they each solve part of the problem.

What we’ve found is that the real value isn’t in any one tool. It’s in the ecosystem around it: custom agents tuned to specific tasks, prompts refined through hundreds of iterations, scripts that handle the boring bits, and workflows that tie it all together. We take the best of the off-the-shelf solutions and combine them with custom components, then tailor the whole stack to each client and project. A migration project needs different tooling than a greenfield build. A team familiar with AI needs different guardrails than one just getting started.

This isn’t a product you can buy. It’s an approach you adopt — and the tooling evolves with you.

Why It’s Worth It

Despite the hurdles, the results speak for themselves.

We’re seeing 400%+ velocity improvements on real projects. A migration that was estimated at two years is on track for six months. Feature work that would have taken weeks is shipping in days.

But the gains aren’t just velocity. The code is better tested because tests are generated from the spec. It’s better documented because the spec is the documentation. There’s less tribal knowledge because the thinking is captured in the specs, not in someone’s head.

Developers actually prefer working this way. They’re solving interesting problems — defining what should be built, reviewing implementations, handling the genuinely tricky bits — instead of writing boilerplate.

The Research Backs This Up

The pattern we’ve seen — methodology mattering more than tools — shows up consistently in the research.

McKinsey’s controlled study found that while experienced developers saw 50% time reductions on certain tasks, junior developers without foundational training actually took 7–10% longer with AI tools. Same tools, opposite results. The difference was preparation and approach.

Microsoft Research found it takes approximately 11 weeks for teams to fully realize productivity gains from AI tools. More striking: organizations with clear AI advocacy and structured support see 7x higher daily usage rates than those without. It’s not enough to provide the tools. You have to provide the methodology.

And Gartner’s Developer Experience Assessment Survey found that despite vendor claims of 30–50% productivity gains, 42% of engineering staff report gains of only 1–10%, with another 12% reporting no gains at all. The gap between promise and reality is explained almost entirely by how teams adopt the tools, not which tools they adopt.

The conclusion across all of this research is consistent: AI amplifies what’s already there. Good processes become great. Weak processes become worse. The three Cs — Control, Context, Correctness — are our attempt to codify what “good processes” actually mean in practice.

Getting Started

If you want to try this approach, start small. Pick a well-defined feature. Write a proper spec before you write any code. Give the AI that spec and the relevant context. Validate the output against the spec, not against your gut.

You’ll probably be slower on the first few attempts. That’s normal. You’re building a new skill. But once the approach clicks, you’ll wonder how you ever worked any other way.

In the next post, I look at where this is heading: multi-agent orchestration, parallel implementation teams, and what it looks like when you scale this approach across an organization.
