AI Agents Can Now Run Toward Goals — Are Yours Worth Running Toward?
Claude's new /goal feature lets AI agents keep working until a completion condition is met. Every canonical example they show is about output: tests pass, code compiles, backlog empties. What's conspicuously absent is outcome. Here's why that gap matters and what it will take to close it.
TL;DR
Claude’s new /goal feature lets you set a completion condition and have the AI keep working across turns until the condition holds — a lightweight autonomous loop without you having to prompt each step. It is a genuinely useful step toward higher-agency AI. But look at the canonical examples: migrate an API until every call site compiles and tests pass, implement a design doc until all acceptance criteria hold, split a large file, empty a labeled issue backlog. Every single one is output-oriented. Nothing in the example list asks whether a feature was actually adopted, whether a page works well for visitors, or whether a presentation landed with the audience. The feature is real; the goals people will default to are the wrong unit.
The bottleneck this reveals is not about the feature itself. It is about observability. For an AI agent to chase outcome-oriented goals, it has to be able to close the feedback loop — to ask whether value was actually delivered, not just whether the task was technically complete. That feedback loop does not yet exist for most of the things that matter most. Building it is the next frontier, and it is where the real work of moving from AI activity to impact will happen.
A New Loop in Claude
Anthropic recently shipped a capability called completion goals: you set a target condition with /goal, and Claude keeps working across turns until the condition is met. After each turn, a lightweight model checks whether the condition holds. If it does not, Claude starts another turn instead of returning control to you. No more nudging, re-prompting, or babysitting a multi-step sequence.
This is meaningfully different from a one-shot prompt. It is closer to delegating a problem to someone and telling them not to come back until it is done. In my own workflow I have been building something similar manually — a Ralph loop, a recurring automation that keeps Claude working on a thread until a stopping condition is reached. The /goal feature moves that pattern into the harness itself, which makes it accessible to anyone without custom scripting.
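To make the shape of that loop concrete, here is a minimal sketch in Python. It is not Anthropic's implementation and it does not call any real Claude API. The names run_until_goal, do_turn, and condition_holds are hypothetical, and the max_turns cap is my own assumption, added so the toy loop cannot run forever.

```python
from typing import Callable, Tuple


def run_until_goal(
    do_turn: Callable[[int], str],
    condition_holds: Callable[[str], bool],
    max_turns: int = 20,  # assumed safety cap, not part of the described feature
) -> Tuple[bool, int]:
    """Take turns until the checker reports that the condition holds."""
    for turn in range(1, max_turns + 1):
        result = do_turn(turn)        # one turn of agent work
        if condition_holds(result):   # separate check after every turn
            return True, turn
    return False, max_turns


# Toy stand-ins: each turn fixes one of three failing tests.
def fake_turn(turn: int) -> str:
    failing = max(0, 3 - turn)
    return f"{failing} tests failing"


def fake_checker(result: str) -> bool:
    return result.startswith("0 ")


done, turns = run_until_goal(fake_turn, fake_checker)
print(done, turns)
```

Note that the checker here only ever sees what the agent produced. That is exactly why output-oriented conditions are the easy case.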
The mechanics are sound. The interesting question is what people will use this for.
What the Examples Say About Us
When Anthropic introduces a feature like this, the canonical examples they choose are telling. Here is what they offered: migrating a model to a new API until every call site compiles and tests pass; implementing a design doc until all acceptance criteria hold; splitting a large file into focused modules as a tech debt improvement; working through a labeled issue backlog until the queue is empty.
All technical. All verifiable by the agent itself. All output.
That list is not wrong — those are real, useful things to automate. But look at what is not on the list. There is no example of “keep working on this landing page until it converts well.” No “keep iterating on this feature until users adopt it.” No “keep refining this presentation until the audience understands it.” No “keep improving this content until it earns meaningful reach.” Nothing that asks whether the thing we built actually mattered to the people it was supposed to serve.
Those absences are not accidental. They reflect a real constraint: AI agents can close the loop on technical correctness far more easily than they can close the loop on human value. Whether tests pass is observable by a machine. Whether people use a feature, whether a page works for real visitors, whether a talk lands — those require a fundamentally different kind of signal.
Output Doesn’t Create Impact
This is the core tension. Output is easy to measure inside the system. Outcome lives outside it, in the behavior and experience of the people you were trying to help.
When you set a completion goal around a technical criterion, the agent has clear stopping conditions it can evaluate autonomously and reliably. When you set one around an outcome, you immediately run into the question: how would the agent observe whether that condition holds? The agent can write the code. It cannot measure whether the code moved the metric you care about. It can publish the blog post. It cannot tell you whether anyone read it, thought differently as a result, or took a meaningful next step.
This is not a knock on /goal. It is a precise description of where the real leverage is. Automating output production is valuable. But the constraint for most organizations trying to get real value from AI is not that they lack the ability to produce more output. It is that the output they produce often fails to convert into the outcomes they care about, and they have no clear signal on why or how to change that.
The New Bottleneck: Observability
If you want to give AI agents outcome-oriented goals, you need to solve a prior problem: how does the agent know whether the outcome was reached? This means instrumentation. It means closing the feedback loop between what AI produces and whether that production moved the needle. It means building the observability layer that lets a completion condition like “users adopted this feature” or “this content performs” actually be evaluated, not just assumed.
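As an illustration of what "actually evaluated, not just assumed" could look like, here is a rough sketch of an adoption condition backed by telemetry. Everything in it is hypothetical: the AdoptionSnapshot shape, the target_rate threshold, and the min_sample cutoff are assumptions for the example, not a real analytics API. The point is the third verdict: when the signal is missing, the honest answer is "unobservable", not "done".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MET = "met"
    NOT_MET = "not met"
    UNOBSERVABLE = "unobservable"  # no usable signal, so the loop cannot close


@dataclass
class AdoptionSnapshot:
    """Hypothetical telemetry summary for one feature."""
    active_users: int
    users_who_used_feature: int


def adoption_condition(
    snapshot: Optional[AdoptionSnapshot],
    target_rate: float,
    min_sample: int = 100,  # assumed minimum before the rate is trusted
) -> Verdict:
    """Evaluate 'users adopted this feature' from observed data only."""
    if snapshot is None or snapshot.active_users < min_sample:
        return Verdict.UNOBSERVABLE
    rate = snapshot.users_who_used_feature / snapshot.active_users
    return Verdict.MET if rate >= target_rate else Verdict.NOT_MET


print(adoption_condition(None, 0.3))
print(adoption_condition(AdoptionSnapshot(500, 200), 0.3))
```

An agent given a condition like this can only stop legitimately on MET. An UNOBSERVABLE verdict is a prompt to build instrumentation or to hand the check to a human.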
Most organizations do not have that instrumentation today — not for AI outputs, and often not for human outputs either. We track task completion, story points, PRs merged, tickets closed. We are much weaker on adoption rates, usage patterns, business metric movement, and the causal chain between what we built and what changed. AI acceleration makes this gap more expensive, not less. When the rate of output production increases dramatically, the cost of producing things that nobody uses or that fail to achieve their purpose scales with it.
The work of building new observability is not glamorous. It does not feel as exciting as shipping a /goal feature. But it is what separates the organizations that will use agentic AI to drive real impact from those that will use it to drive impressive-looking activity.
What Will Have to Be True
For outcome-oriented goals to become the norm in agentic AI, a few things need to happen. First, people need to learn how to frame outcome-oriented conditions effectively. Most of us default to output because that is what we have historically tracked and because outputs are easier to specify. Writing a good outcome condition requires clarity about what change you want to see in the world, not just what artifact you want produced.
Second, the feedback loop needs to be closeable by the agent. That means investing in the observability layer — instrumentation, telemetry, measurable leading indicators of value — so that a completion condition based on adoption or quality or business impact can actually be evaluated, not just claimed. For some domains this is a relatively short road; for others it is a multi-year engineering and organizational investment.
Third, the tooling needs to evolve to handle the parts of outcomes that still require human judgment or physical-world observation. Image generation, usability testing, real user feedback, business metric validation — these require either agent-accessible tools or explicit human checkpoints in the loop.
The organizations that get ahead of this will not just be faster at producing output. They will be able to aim that speed at the right targets and know when they have hit them.
Start Asking the Question Now
You do not need to wait for full observability infrastructure to start reorienting your AI goals. The first move is to ask, for every goal you set: is this a completion condition for an output, or for an outcome? If it is output, is that output reliably connected to the outcome you actually care about, and do you have enough signal to know when it is not?
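If it helps to make the question routine, it can be written down as a small checklist. The sketch below is one possible encoding, with hypothetical names (GoalReview, gaps) and wording of my own. It simply walks the same questions: output or outcome, is the output tied to a named outcome, and is there a signal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GoalReview:
    """Hypothetical record of the answers for one goal."""
    condition: str
    is_outcome: bool
    linked_outcome: Optional[str] = None  # only relevant for output goals
    has_signal: bool = False


def gaps(review: GoalReview) -> List[str]:
    """Return the gaps the review questions surface for this goal."""
    found: List[str] = []
    if review.is_outcome:
        if not review.has_signal:
            found.append("no signal to evaluate the outcome condition")
        return found
    if review.linked_outcome is None:
        found.append("output goal is not tied to a named outcome")
    elif not review.has_signal:
        found.append("no signal to tell when the output fails to produce the outcome")
    return found


print(gaps(GoalReview("all tests pass", is_outcome=False)))
```

An empty list does not prove the goal is a good one. It only means the obvious gaps have been looked at.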
That question will quickly surface the gaps. It will show you where you are measuring task completion and calling it progress. It will point toward the observability investments worth making. And it will make visible the distinction between AI as an accelerant for activity and AI as a driver of actual impact.
The /goal feature is a real step forward. Use it. But the highest-leverage move is not to chase more efficient loops around output-oriented goals. It is to build the feedback systems that let you aim agentic AI at what actually matters.
What are you doing with completion goals? Are yours output-oriented or outcome-oriented — and what would it take to close the loop on a real outcome?
The gap between what AI produces and whether it mattered is the next frontier. Build for that one.
About Yuval Yeret
Yuval is a rare practitioner who has shaped the agility path of dozens of organizations and influenced the frameworks used across the industry. He helps product and technology leaders move from agile theater to evidence-informed, outcome-oriented delivery that creates better value sooner, safer, and happier.