
Why AI Coding Tools Aren’t Delivering On Their Promise (and what actually works)

George Wilde
Principal Consultant, Front End & DevOps Competency Lead

This article was initially published on Medium.

In July 2025, researchers at METR published a randomized controlled trial looking at how AI tools affected developer productivity. The result? Developers using AI were 19% slower than those working without it. But here’s the kicker: those same developers believed they were 20% faster.

That’s a 39-percentage-point gap between perception and reality. And it explains why so many organizations are disappointed with their AI investments.

I’ve spent the last 18 months helping teams figure out how to actually get value from AI-assisted development. The short version: the tools aren’t the problem. The approach is.

The Uncomfortable Truth About AI Adoption

Most teams I talk to are in one of three places with AI coding tools:

  • “We’ve rolled out Copilot” — Licenses have been distributed, maybe there was a lunch-and-learn, and now everyone’s using it… sort of. Some developers love it, some ignore it, and nobody’s really sure if it’s helping. The research suggests it probably isn’t. The METR study was a proper randomized controlled trial with experienced open-source developers. They weren’t faster. They just thought they were.
  • “We’re building our own approach” — Better results when it works, but it’s a massive time sink. The landscape changes constantly. New models every few months, new tools, new techniques. Most teams plateau because they can’t keep up while also doing their actual job.
  • “We’ve given up measuring” — The most honest position, really. It’s hard to measure developer productivity at the best of times. Adding AI to the mix just makes it murkier.

Why The Same Tools Produce Different Results

Here’s something that took us a while to figure out: two teams using identical tools can see completely opposite outcomes.

Google’s DORA Report found that every 25% increase in AI tool adoption correlated with a 1.5% decrease in delivery throughput. Meanwhile, other studies show a 26% improvement in completed tasks. Same tools. Different results.

The difference isn’t the tools. It’s maturity. We’ve started thinking about this as a four-phase journey:

  • Phase 1: AI Enabled — “Here’s Copilot, crack on.” Mixed results, maybe slower.
  • Phase 2: AI Assisted — Training, prompts, best practices. The developer still leads, but they’re using AI more effectively. 10–30% improvement.
  • Phase 3: AI Agentic — Using tools like GitHub Copilot or Claude Code to do larger chunks of work. The AI is doing more, but it still needs a lot of guidance. Better, but inconsistent.
  • Phase 4: AI Orchestrated — A complete system. Specifications, validation, and multiple agents working together. This is where the big gains are, but most teams never get here.

The problem is that most organizations are stuck in Phase 1, wondering why the magic isn’t happening.

The Real Problems (they’re not what you think)

After working on this for over a year, I’ve noticed three things that consistently trip teams up:

  • Context is everything, and AI has none. Your codebase has history, patterns, decisions that made sense at the time. AI doesn’t know any of that unless you tell it. And “telling it” turns out to be harder than it sounds (one simple pattern for doing it is sketched after this list).
  • AI is confidently wrong. It will write code that looks right, passes a cursory review, and breaks in production. The METR study noted that developers spent significant time reviewing and fixing AI-generated code — time that often exceeded what it would have taken to write it themselves.
  • Developers don’t know what to delegate. This is the subtle one. Knowing when to use AI and when to just write the code yourself is a skill. It takes time to develop, and most organizations don’t give people that time.
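
To make “telling it” concrete, here’s a minimal sketch of one common pattern: prepend standing project context (conventions, architecture notes, past decisions) to every request, rather than hoping the model infers them. The file names and prompt shape are illustrative assumptions, not a prescription for any particular tool.

```python
from pathlib import Path

# Illustrative only: the file names and prompt shape are assumptions,
# not a prescription for any particular tool.
CONTEXT_FILES = [
    "docs/architecture.md",   # how the system hangs together
    "docs/conventions.md",    # naming, error handling, testing norms
    "docs/decisions.md",      # past decisions that still constrain new code
]

def build_prompt(task: str) -> str:
    """Bundle standing project context with the task, so the model
    isn't left guessing at history it was never given."""
    context = "\n\n".join(
        f"## {path}\n{Path(path).read_text()}" for path in CONTEXT_FILES
    )
    return f"Project context:\n{context}\n\nTask:\n{task}"
```

The hard part isn’t the plumbing. It’s writing context files that are accurate, current, and short enough that the model actually uses them.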

What Actually Works

The teams seeing real gains — 200%, 400%, sometimes more — aren’t just using better tools. They’re using a different approach entirely.

We call it Spec-Driven Development (SDD), and the core idea is simple: you don’t ask AI to write code. You ask it to implement a specification.

The difference sounds small but it’s fundamental. A specification is unambiguous. It defines what “done” looks like. It gives you something to validate against.

Without a spec, you’re “vibe coding” — asking AI to do something and hoping the output is roughly what you wanted. With a spec, you’re giving AI clear constraints and success criteria.
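
To make the contrast concrete, here’s a minimal sketch of what “implement this specification” can look like in practice: the spec pins down behavior and edge cases up front, and the acceptance tests are the gate the AI-written code has to pass. Everything in it (the function, the module, the cases) is a hypothetical example, not taken from a client program.

```python
import pytest

# The spec the AI is asked to implement. "Done" is defined mechanically:
# the acceptance tests below all pass. The function, module name, and
# cases are hypothetical examples.
SPEC = """
Function: slugify(title: str) -> str
- Lowercase the input.
- Replace each run of non-alphanumeric characters with a single hyphen.
- Strip leading and trailing hyphens.
- Raise ValueError if the result would be empty.
"""

# Hypothetical module containing the AI-generated implementation of SPEC.
from slugs import slugify


def test_basic_title():
    assert slugify("Hello, World!") == "hello-world"


def test_collapses_runs_and_trims():
    assert slugify("  Spec--Driven   Development ") == "spec-driven-development"


def test_rejects_input_with_no_usable_characters():
    with pytest.raises(ValueError):
        slugify("!!!")
```

The framework doesn’t matter. What matters is that the spec and the checks exist before the AI writes a line, so “looks right” gets replaced by “passes or it doesn’t”.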

I’ll go deeper on the specifics in my next post. For now, the key insight is this:

You can give everyone in your organization access to the best AI coding tools available and still get mediocre results. The tools are necessary but not sufficient. The organizations seeing transformational gains have invested in the approach, not just the tooling. They’ve figured out how to give AI the right context, how to validate its output, and how to keep humans in control of the things that matter.

The Honest Take

If you’re disappointed with the ROI on your AI coding tools, you’re not alone. The research says most teams are in the same boat.

The good news is that the gains are real. We’re seeing them on actual client programs. A complex system migration I am actively involved in, which was estimated at two years without using an SDD approach, is on track for completion in six months. Feature work that would have taken weeks is shipping in days.

But those gains don’t come from the tools. They come from having a mature approach to using them.

In the next post, I break down what the approach actually looks like in practice.
