Clemens Adolphs on the Role of Agility in Getting AI Investments Right

How to Apply Agile Principles to AI Investments and Initiatives

AI initiatives have a well-known place where they go to die: the proof-of-concept stage. A team builds an impressive demo, everyone is excited for a week, and then it gathers dust because nobody designed for what happens after the demo. In this Scaling with Agility conversation, Clemens Adolphs (co-founder of AIce Labs, which helps enterprises get AI initiatives done end to end) and I dug into why that happens and what actually prevents it. The short version: AI work carries more uncertainty than standard software, across feasibility, viability, and desirability all at once, and most organizations still manage it as a fixed-scope project with a deadline. That produces a Gantt chart with the labels filed off, and a team optimizing for story points instead of outcomes.

The fix is to treat AI initiatives as genuine product work. That means taking internal market validation seriously — your colleagues have a choice, and you can mandate that they use a tool but not that they embrace it. It means using cheap prototypes (vibe-coded throwaways, wizard-of-oz setups) to buy down risk before committing. And it means steering with outcome-based leading indicators rather than velocity, so the stakeholder sees traction — the needle moving — instead of activity. Underneath all of it is a principle Clemens and I both kept returning to: agile gives you principles, and you derive practices from principles in context. Porting the practices that worked for standard software directly onto AI work is exactly how you get agile theater.

Where AI projects actually fail

Clemens came to this work from quantum computing, of all places, and the through-line of his career turns out to be the unglamorous part of advanced technology — making sure the thing keeps working after the demo. He described the failure mode he built a company to prevent:

“We help enterprises get AI initiatives done right from end to end, and not just have everything die in the proof-of-concept stage and gather dust there.” — Clemens Adolphs, co-founder, AIce Labs

I wanted to get specific about where in an initiative the failure happens, because “AI projects fail” is too vague to act on. There are a few distinct risks, and they fail differently. There is feasibility — can it even work? There is viability — is the ongoing cost of the AI tooling worth the time saved or revenue gained? And there is desirability — if we build it, will people actually use it? Clemens’s answer was that it is usually all of them, in different mixes, but he pushed hardest on a risk that gets underestimated:

“With everything you ever build, there is that market risk — will anybody actually want it once you build it. And market risk doesn’t necessarily mean you’re building a consumer-facing product. It could just be an internal tool that you roll out and you want your people to use, but they don’t. There’s too much friction — they’d rather copy-paste from ChatGPT than click through your five layers of internal whatever.” — Clemens Adolphs

That is the part I want leaders to sit with. The instinct with AI is to obsess over feasibility — can the model do the thing — and to treat desirability as obvious. An AI-native CRM where salespeople never have to do manual data entry? Everyone will love that. Except, as Clemens put it, “I’ve seen so often that even these obvious assumptions eventually fall flat.” The product risk on AI work is higher than on standard software precisely because the ambition is higher, and the feasibility question is genuinely open in a way it usually is not when you are building, say, a CRM with a different workflow. With AI you often do not know whether it will do what you want until you try it in some fashion. So you have two hard risks stacked — can it work, and will they embrace it — and most teams only plan for one.

Take the internal market seriously

The phrase Clemens and I both kept using is internal market, and I think it is the most useful reframe in the whole conversation. When you build a tool for colleagues, you are not shipping into a captive audience. You are shipping into a market where people have options, and one of those options is to ignore you.

“You can mandate that they use it, but you can’t mandate that they embrace it. And then they’ll find their workarounds and back channels. And that should tell you something.” — Clemens Adolphs

If your shiny new AI tool is not getting used, that is not a rollout problem to be solved with more mandates and more training emails. It is product feedback. The right response is the one a product team would have to a bad NPS score or a flat usage chart: go do user interviews, look at where people drop off, and treat the workarounds as data. Clemens and I ran the pirate metrics — awareness, activation, retention, referral, revenue — straight through the internal lens. Are people even aware of the tool? Do they activate? Once activated, do they keep using it? The revenue line is the interesting one internally, because colleagues do not pay money — but they do pay with time and attention, and willingness to invest those is a real signal.
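One way to run the funnel through the internal lens is to count how many colleagues reach each stage and look for the biggest stage-to-stage drop. The sketch below assumes you can get those counts from usage logs or a short survey. The function names and the example numbers are hypothetical. Revenue is left out because, internally, it is paid in time and attention rather than counted as a headcount.

```python
from typing import List, Tuple

Stage = Tuple[str, int]  # (stage name, number of colleagues who reached it)


def funnel_conversion(stages: List[Stage]) -> List[Tuple[str, str, float]]:
    """Conversion rate between each pair of adjacent funnel stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty upstream stage instead of dividing by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


def biggest_drop_off(stages: List[Stage]) -> Tuple[str, str, float]:
    """The adjacent pair with the lowest conversion: a good place to start user interviews."""
    return min(funnel_conversion(stages), key=lambda step: step[2])


if __name__ == "__main__":
    # Hypothetical counts for an internal AI tool, for illustration only.
    internal_tool = [
        ("awareness", 200),
        ("activation", 120),
        ("retention", 30),
        ("referral", 12),
    ]
    for src, dst, rate in funnel_conversion(internal_tool):
        print(f"{src} -> {dst}: {rate:.0%}")
    print("Biggest drop-off:", biggest_drop_off(internal_tool))
```

With these made-up numbers the weakest step is activation to retention: people try the tool and then stop. That is exactly where the workarounds and back channels are worth studying.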

The sharpest version of this test is the product-market-fit survey question, turned inward: how disappointed would you be if we took this away? If we removed the AI capability from your CRM tomorrow, would you fight to keep it, or would you quietly be relieved? I pointed out that this question holds up an uncomfortable mirror to internal consultants of every kind, including agile coaches and AI consultants. Are you a keeper for the teams you work with? Would they fight for your time, or grab the first opportunity to drop you when the neighboring function’s budget gets cut? It is a tough mirror, but it is far better to look into it early than to be surprised.
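The “how disappointed would you be” question is usually asked with three answer options (very, somewhat, or not disappointed), and the score is the share of respondents who answer “very disappointed.” A commonly cited rule of thumb from startup practice treats roughly 40% as a sign of product-market fit. That threshold does not come from this conversation, and whether it transfers to an internal tool is an assumption. Here is a minimal sketch with hypothetical function names:

```python
from collections import Counter
from typing import Iterable

VERY = "very disappointed"


def very_disappointed_share(responses: Iterable[str]) -> float:
    """Fraction of answers that are 'very disappointed' (ignores case and whitespace)."""
    normalized = [r.strip().lower() for r in responses]
    if not normalized:
        return 0.0
    return Counter(normalized)[VERY] / len(normalized)


def is_keeper(responses: Iterable[str], threshold: float = 0.40) -> bool:
    """Assumed threshold: the commonly cited 40% rule of thumb, not a law."""
    return very_disappointed_share(responses) >= threshold


if __name__ == "__main__":
    # Hypothetical internal survey results, for illustration only.
    survey = (
        ["Very disappointed"] * 3
        + ["Somewhat disappointed"] * 4
        + ["Not disappointed"] * 3
    )
    print(f"{very_disappointed_share(survey):.0%}")  # 30%
    print("keeper" if is_keeper(survey) else "not (yet) a keeper")
```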

Prototype your way down the risk curve

The good news is that the cost of buying down these risks has collapsed. The AI ecosystem — vibe coding and adjacent tooling — makes it cheap to create preliminary versions of things that genuinely work, even if they will never be production code. I think of these as the new generation of the concierge MVP or the wizard-of-oz prototype: instead of a human quietly working behind the screen to fake the product, you have ChatGPT or Claude plus some prompts and glue, and you can prove or kill an idea very cheaply.

Clemens agreed and added the part that matters most about why this works — it is not only about testing technical feasibility:

“It empowers the person with the idea, or removes some friction. You have the idea, you just slap something together really quickly, and you don’t have to go through the slog of ‘now I have to spec it out and get my devs and we have to build even the smallest throwaway prototype.’” — Clemens Adolphs

That friction removal is the real unlock. The faster the person with the idea can get to a working approximation, the faster the learning loop turns — assuming you have a learning loop to begin with, which is the quiet caveat that runs under this whole conversation. One especially low-risk move Clemens described: improve along a dimension that already has proven internal demand. If a team already pays for a tool because it saves them five minutes, you have low market risk on a tool that saves them an hour. Improving an existing, adopted behavior is far safer than asking people to adopt a brand-new way of working.

The AI-waterfall anti-pattern

When I asked Clemens for the anti-patterns — the early signs a project is heading somewhere bad — he named one cleanly. It is the AI-waterfall pattern, and it is so common precisely because it looks responsible:

“You decide ahead of time that this is going to be a six-month project, because you want to show it to your stakeholders, the board, the CEO, an industry conference. And because you also know exactly what you want, you map that all into user stories, and you say, six months divided by this is that many sprints, so you need to put this many stories into these sprints.” — Clemens Adolphs

As Clemens put it, you now have “a Gantt chart without calling it a Gantt chart.” And once you are there, the agile practices stop meaning anything. Why estimate the work, when the deadline is already fixed and the scope is already mapped? Why track velocity, when the stakeholder has already told you which stories must be done and when? The ceremonies continue, but they have been hollowed out, which is exactly the definition of agile theater.

I pushed back on Clemens here, because I think the steel-man matters. If I am the business owner funding this work, “no estimates, no velocity” sounds like I am being handed a black box. I have a real need to know where this is going. That need is legitimate, and it is the same need I heard from a CTO just the day before we recorded this conversation, whose CEO was frustrated by a lack of delivery ownership. The answer is not to dismiss the stakeholder’s need for transparency. It is to meet that need with something better than a Gantt chart.

Velocity is output. Traction is the needle moving.

What Clemens and I converged on is steering by outcome-based leading indicators. You do know things at the start of an AI initiative — you know the outcome you need, you know which use cases are the hard points and which are softer, you can say how many of those use cases are working at any given moment, and you can form a view of how that coverage should spread over the timeline. That gives the stakeholder a real signal: are we tracking in the right direction? It also, in my experience, defends against scope creep — without a clear outcome signal, it is tempting for everyone to keep bolting on “while we’re at it” use cases, well past the point of diminishing returns, because nobody is asking whether the base outcome at two months beats the embellished version at three.
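Here is a minimal sketch of such an outcome-based leading indicator. It assumes each use case has some acceptance check that says whether it currently works, and that the team has agreed on a planned coverage curve expressed as a few checkpoints. The `UseCase` type, the checkpoint format, and the example use cases are hypothetical illustrations, not a prescribed tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UseCase:
    name: str
    hard: bool     # one of the "hard points" identified up front
    working: bool  # does it currently pass its acceptance check?


def coverage(use_cases: List[UseCase]) -> float:
    """Fraction of use cases that currently work."""
    if not use_cases:
        return 0.0
    return sum(uc.working for uc in use_cases) / len(use_cases)


def planned_coverage(plan: Dict[int, float], week: int) -> float:
    """Target from the latest checkpoint at or before `week`; 0.0 before the first one."""
    reached = [w for w in plan if w <= week]
    return plan[max(reached)] if reached else 0.0


def traction_report(use_cases: List[UseCase], plan: Dict[int, float], week: int) -> dict:
    hard = [uc for uc in use_cases if uc.hard]
    actual = coverage(use_cases)
    target = planned_coverage(plan, week)
    return {
        "week": week,
        "coverage": actual,
        "target": target,
        "hard_points_working": sum(uc.working for uc in hard),
        "hard_points_total": len(hard),
        "on_track": actual >= target,
    }


if __name__ == "__main__":
    # Hypothetical use cases for an AI-assisted CRM, for illustration only.
    use_cases = [
        UseCase("summarize call notes", hard=False, working=True),
        UseCase("draft follow-up email", hard=False, working=False),
        UseCase("extract deal size", hard=True, working=False),
        UseCase("log next steps to CRM", hard=True, working=True),
    ]
    # Planned checkpoints: week -> fraction of use cases expected to work by then.
    plan = {2: 0.25, 4: 0.5, 8: 0.75, 12: 1.0}
    print(traction_report(use_cases, plan, week=5))
```

Because coverage is a ratio, bolting on a “while we’re at it” use case that does not work yet lowers the number immediately. In this sketch, that makes the scope-creep trade-off visible in the same report the stakeholder already reads.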

This is the distinction I keep coming back to: velocity versus traction. Clemens set it up perfectly by noting that a stakeholder is far more impressed by a demo showing “last time this couldn’t do it, now it can” than by a report saying “we completed 37 story points.” Velocity is output: working stuff, produced at some rate. Traction is the needle moving toward where you actually want to be. You can have impressive velocity and zero traction. The job is to structure the work so you can show evidence on the real leading indicator every couple of weeks. That is harder than tracking velocity. It is also the only thing that actually answers the stakeholder’s question.

Principles travel; practices do not

The deepest point in the conversation was about why agile theater happens in the first place, and it is not about bad intentions. Clemens framed it as principles versus practices. Agile gives you principles. You apply those principles to a specific context, and from that you derive practices that make sense in that context. The mistake is taking the practices — codified, off-the-shelf — and porting them to a different context where the principle still holds but the practice no longer fits.

His example was test-driven development. For standard software, TDD is excellent (small unit tests, red-green-refactor), and he would not work any other way. But for an AI system, how do you even write a unit test? “If I ask the GPT API to do this, it will do exactly that” is not a test you can write. The principle of working in small steps, verifying each one, and integrating toward the behavior you want absolutely still applies. The practice has to be reinvented for the context. The same is true of estimates and burndowns: they came from a real context where business owners legitimately needed to know if things were on track. Buying the codified practice off the shelf, instead of going back to the principle and deriving the right practice for AI work, is how you end up with theater.
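What the reinvented practice looks like will depend on the system. One common pattern is to swap exact-output unit tests for a small evaluation set: a handful of representative inputs, property checks that tolerate variation in wording, and a pass-rate gate instead of all-or-nothing green. The sketch below is an illustration under assumptions, not a method described in the conversation. `stub_extract_company` is a hypothetical stand-in for a real model call (a crude string heuristic so the example runs offline), and the cases and the 0.8 gate are made up.

```python
from typing import Callable, Dict, List

Case = Dict[str, str]  # {"input": ..., "expected": ...}
Check = Callable[[Case, str], bool]


def stub_extract_company(note: str) -> str:
    """Hypothetical stand-in for a model call: returns the word after 'call with'."""
    marker = "call with "
    idx = note.lower().find(marker)
    if idx == -1:
        return ""
    words = note[idx + len(marker):].split()
    return words[0].strip(".,") if words else ""


def non_empty(case: Case, output: str) -> bool:
    return bool(output.strip())


def grounded_in_input(case: Case, output: str) -> bool:
    # The answer should come from the note, not be invented.
    return output.lower() in case["input"].lower()


def matches_expected(case: Case, output: str) -> bool:
    # Tolerate casing and surrounding whitespace rather than demanding an exact string.
    return output.strip().lower() == case["expected"].strip().lower()


def eval_pass_rate(generate: Callable[[str], str], cases: List[Case], checks: List[Check]) -> float:
    """Share of cases whose output passes every check."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        output = generate(case["input"])  # one call per case: real models are slow and non-deterministic
        if all(check(case, output) for check in checks):
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # Hypothetical evaluation cases, for illustration only.
    cases = [
        {"input": "Had a call with Acme about the renewal.", "expected": "Acme"},
        {"input": "Quick call with Globex, they want a demo.", "expected": "Globex"},
        {"input": "Initech emailed asking for pricing.", "expected": "Initech"},
    ]
    rate = eval_pass_rate(stub_extract_company, cases, [non_empty, grounded_in_input, matches_expected])
    print(f"pass rate: {rate:.0%}")  # 67%
    gate = 0.8  # hypothetical threshold
    print("gate passed" if rate >= gate else "gate failed: keep iterating")
```

The third note fails because the stub only handles one phrasing, which is the kind of gap an evaluation set surfaces early. Each small step then becomes “add a case, make it pass without breaking the others,” which keeps a red-green rhythm even though no single output is fixed in advance.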

Clemens caught me trying to write a punchy line — “these projects are the place to be agile, not to do agile” — and gently corrected it: there is no place to do agile, really. The principles apply with full force to AI initiatives. The practices have to be earned in context. That is the discipline that keeps AI investments from dying in the proof-of-concept graveyard.

Watch the conversation

This article is based on my Scaling with Agility conversation with Clemens Adolphs, co-founder of AIce Labs. The full episode goes deeper on engagement models for AI delivery, the Swiss-cheese model of AI capability, and where user stories went wrong.

AI initiatives do not die because the models are weak. They die because they are run as fixed-scope projects in a domain that is anything but fixed. Treat them as product work — real discovery, real internal market validation, real outcome steering — and the proof of concept stops being a graveyard.