Going from Outputs to Outcomes

AI multiplies output, but it's not multiplying outcomes. How do we close the gap?

Miguel Dias presenting Outputs to Outcomes at Merantix Berlin

The five problems nobody stopped talking about.

Across the panel, the talks, and the coffee-line conversations at Prompt 2026. Ranked loosely by how often they surfaced.

Most cited

Output is multiplying. Outcomes aren’t.

An engineer shipped five microservices with AI instead of one. Nobody pays per microservice. How do we know we’re building the right things?

The exception that proved the rule: Intercom was the only talk that tied engineering effort to a single business metric — support cases resolved with the user’s problem actually solved.

Recurring

Picking what to build.

Getting the why and the what before the how. Working closer with customers to decide what actually matters.

Recurring

Burnout and overload.

Context switching is up. More agents, more concurrent work. One CEO prescribed longer hours. Others pushed back.

Recurring

Harness engineering (the weeds of agents, PRs, CD).

Everyone is building a version of this: orchestration, guardrails, feature flags, evals, the loop that keeps agents shipping safely. Different names, same problem.

Open problem

Quality is degrading.

More code, more bugs. Speakers named agents addressing quality in production as an unsolved problem.

First of all: what will your CTO ask at the end of the quarter?

Three-month window. The questions you’ll get — and what you now have answers for.

“Show me the AI ROI.”

More code ≠ more revenue. What’s the conversion mechanism we’re building?

“Are we delivering on our commitments?”

Can we ship what we promised to our biggest customers, on time? If not — where is it stuck, and why?

“_______________________?”

What’s your CTO actually asking? The right row beats both.

Our best guess at your CTO’s quarterly questions. The last row is there for the question we haven’t guessed.

Output vs Outcome Economics

Local optimisation lives on Outputs. CTO targets are Outcomes.

Output economics

code shipped
PRs merged
deploys / week
dev velocity

Outcome economics

revenue-bearing features
bad bets killed early
end-to-end TTM
quality (fewer incidents, faster recovery)

▸ This deck moves the conversation to the right.

Example scenario, B2C

Feature hit rate — what % of your bets paid off?

~700 engineers × ~€100k fully-loaded ≈ ~€70M / yr
~100 features shipped per year
~5–10 generate significant revenue
Lead time idea-to-production: ??? — Why’s this the single most outcome-predicting metric?

Now — what if 5–10 became 15?

5–10 extra winners × €500k–1.5M each

≈ €2.5M – €15M / yr upside

Same team. Same budget. That’s the prize this deck is chasing.
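The arithmetic behind this scenario can be checked with a short sketch. The numbers are the approximate figures from the slide; the function and constant names are illustrative assumptions, not part of any existing tool.

```python
# Figures from the example scenario; ranges are kept as (low, high) tuples.
ENGINEERS = 700
COST_PER_ENGINEER_EUR = 100_000              # fully loaded, per year
FEATURES_PER_YEAR = 100
WINNERS_TODAY = (5, 10)                      # features generating significant revenue
WINNERS_TARGET = 15
VALUE_PER_WINNER_EUR = (500_000, 1_500_000)  # yearly value of one winning feature


def annual_cost(engineers: int, cost_each: int) -> int:
    """Yearly engineering cost in EUR."""
    return engineers * cost_each


def hit_rate(winners: int, shipped: int) -> float:
    """Share of shipped features that generate significant revenue."""
    return winners / shipped


def upside_range(today, target, value_each):
    """Return (low, high) yearly upside in EUR from raising winners to `target`."""
    extra_low = target - max(today)    # 15 - 10 = 5 extra winners
    extra_high = target - min(today)   # 15 - 5 = 10 extra winners
    return extra_low * min(value_each), extra_high * max(value_each)


cost = annual_cost(ENGINEERS, COST_PER_ENGINEER_EUR)
low, high = upside_range(WINNERS_TODAY, WINNERS_TARGET, VALUE_PER_WINNER_EUR)

print(f"annual cost: EUR {cost:,}")
print(f"hit rate today: {hit_rate(WINNERS_TODAY[0], FEATURES_PER_YEAR):.0%}"
      f" to {hit_rate(WINNERS_TODAY[1], FEATURES_PER_YEAR):.0%}")
print(f"hit rate target: {hit_rate(WINNERS_TARGET, FEATURES_PER_YEAR):.0%}")
print(f"upside: EUR {low:,} to EUR {high:,} per year")
```

Swapping in your own headcount, feature count, and value per winner shows how sensitive the upside is to each assumption.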

Three ways to move feature hit rate.

What % of your bets paid off? What can we do to increase it?

Hire more capacity.

coordination tax goes up
Brooks ramp: adding people slows the team down before it speeds it up
onboarding dip on the team carrying them

Same capacity — raise hit-rate and time to market.

moves the outcomes
more money coming in, same capacity

Something else?

massive restructuring, team redesign…

— left open. Tell us what’s on the table.

The third option is the one we don’t want to assume. Reorgs and restructuring often show up at growth-stage companies but rarely make the slide. Worth naming what’s actually in scope before this conversation closes.

What it takes to increase feature hit-rate:

Same capacity, raise hit-rate and throughput = more revenue-bearing features. Two things need to move at the same time.

Lever 1: Faster time to market

faster product discovery to experiment results
engineering enables product leaders to experiment faster
Look at: lead time from idea to production

Lever 2: Quality

more code = more quality problems
no way around it except improving the system
Look at: incidents, time to recover, rework eating capacity

Both are measurable. Both respond to the same intervention: seeing the value stream clearly and managing it together.

Example — how a 2-week cadence injects waste.

Lever 1 in action.

One pattern. Yours may be different — mapping the value stream is how you find out which structural waste is actually costing you. Local dev speed is not the bottleneck; the cadence between phases is a cross-stream constraint.

Same stream — released from the cadence.

Same items, no sprint trap. Below the Gantt: where the measurement hooks would attach to a generic delivery stack.

Start by talking to teams about what data matters and how to collect it (this will change over time and is never finished). You’ll likely need to build custom aggregations on top of the tools you already have. Improve them continuously. Start small. Big-bang adoption fails.
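As a starting point for such a custom aggregation, here is a minimal sketch of a lead-time summary. The records, field names, and dates are hypothetical; in practice the timestamps would be exported from whatever planning and deployment tools you already use.

```python
from datetime import date
from statistics import median

# Hypothetical work items: when the idea was logged and when it reached production.
records = [
    {"item": "A", "idea": date(2026, 1, 5), "production": date(2026, 2, 16)},
    {"item": "B", "idea": date(2026, 1, 12), "production": date(2026, 2, 2)},
    {"item": "C", "idea": date(2026, 1, 19), "production": date(2026, 3, 30)},
]


def lead_time_days(record: dict) -> int:
    """Calendar days from idea to production for one work item."""
    return (record["production"] - record["idea"]).days


def summarize(items: list) -> dict:
    """Small summary a team could review weekly."""
    days = sorted(lead_time_days(r) for r in items)
    return {"count": len(days), "median_days": median(days), "max_days": days[-1]}


summary = summarize(records)
print(summary)
```

The median is used rather than the mean so that one very slow item does not hide how the typical item flows; that is a design choice for this sketch, not a rule.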

Example — where quality degrades.

Lever 2 in action. How more AI-generated code, without system-level feedback, leads to more incidents and harder recovery.

This is a system problem. You can’t fix it by reviewing harder. You fix it by making the system visible: where defects enter, how long they take to find, what the rework costs. That’s a value stream conversation.
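To make those quantities concrete, the sketch below computes a mean time to recover and a rework share. All incident timestamps and hour counts are invented for illustration, and the names are assumptions rather than any tool's API.

```python
from datetime import datetime

# Hypothetical production incidents: when each was detected and when service recovered.
incidents = [
    {"detected": datetime(2026, 3, 2, 9, 0), "recovered": datetime(2026, 3, 2, 11, 30)},
    {"detected": datetime(2026, 3, 9, 14, 0), "recovered": datetime(2026, 3, 9, 15, 0)},
    {"detected": datetime(2026, 3, 20, 8, 0), "recovered": datetime(2026, 3, 20, 12, 0)},
]


def mean_time_to_recover_hours(items: list) -> float:
    """Average hours between detection and recovery."""
    total_seconds = sum(
        (i["recovered"] - i["detected"]).total_seconds() for i in items
    )
    return total_seconds / len(items) / 3600


def rework_share(rework_hours: float, total_hours: float) -> float:
    """Fraction of team capacity spent redoing work instead of building features."""
    return rework_hours / total_hours


mttr = mean_time_to_recover_hours(incidents)
share = rework_share(rework_hours=120, total_hours=400)  # assumed monthly figures

print(f"incidents: {len(incidents)}")
print(f"mean time to recover: {mttr:.1f} h")
print(f"rework share of capacity: {share:.0%}")
```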

A B2B example: where both levers show up.

A real company. Complex release involving six teams in sequence. Touch time per stage is a few hours. Cycle time per stage is weeks. That gap is the waste.

Why

No production-like test environments or test data.

No continuous testing — testing happens at the END, not throughout.

Big-batch release train: high error surface, impossible to isolate which change caused the defect.

Last-minute features pushed by business → product → devs → shipped untested.

Technical debt dragging every team down.

Total lead time: 100 days

Total work time: < 1 day

Activity ratio: < 1%
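The activity ratio is work time divided by lead time. A minimal sketch using the upper bound from this example (one day of work in a 100-day lead time); the function name is an assumption for illustration:

```python
def activity_ratio(work_days: float, lead_days: float) -> float:
    """Share of the total lead time in which someone is actually working on the item."""
    if lead_days <= 0:
        raise ValueError("lead_days must be positive")
    return work_days / lead_days


LEAD_DAYS = 100   # total lead time from the example
WORK_DAYS = 1     # upper bound: the example reports less than one day of work

ratio = activity_ratio(WORK_DAYS, LEAD_DAYS)
waiting_days = LEAD_DAYS - WORK_DAYS

print(f"activity ratio: at most {ratio:.0%}")
print(f"waiting: at least {waiting_days} of {LEAD_DAYS} days")
```

Because the real work time is below one day, the true ratio is below the 1% this prints, and the waiting time is correspondingly higher.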

Business complains: “development and testing take too long.” “Deployments always go wrong.” But we are all causing the problems we complain about.

Manage the value stream — as a team.

Why we invite all these teams to talk to each other.

Each team optimizing locally was hurting the whole company — despite seemingly helping themselves.

Business saw

How last-minute feature pressure caused the quality problems they complained about.

Product saw

How skipping testing created the rework that made “development takes too long” true.

Engineering saw

How the lack of tech investment (test envs, test data, continuous testing) was the root cause of customer churn.

Everyone saw

One day of work. One hundred days of waiting. The constraint isn’t speed — it’s the system.

Lever 1: Faster TTM

faster feedback
kill bad bets earlier
team time goes to features, not firefighting

Lever 2: Better quality

fewer production incidents
less rework eating capacity
happier, returning customers
necessary tech investments become visible + justified

Siloed visibility: each team thinks they’re doing fine.
End-to-end visibility: teams pursuing local goals are hurting each other and the company.

Value stream metrics + talking to your teams show you where the problems are — and help teams find the solutions. I’m not a product coach. You need to get great at product discovery. But the value stream shows you where to look.

From conversation to receipts.

The path from a VSM conversation with your teams to the numbers your CTO is asking about.

VSM conversation: with the teams doing the work. Handoffs, waits, rework, named together.

→

Flow metrics: cycle time, batch, hit-rate. Leading indicators that change weekly, not yearly.

→

Revenue + hit-rate: % features generating revenue. Lagging, 12-month window. The number your CTO already cares about.

The Steer

Flow metrics aren’t a dashboard you admire. They’re how you kill bad bets earlier, ship the good ones faster, and walk into the quarterly review with the receipts.

Two phases to instrument first. Let’s pick them before our next session.

What working together looks like.

Two modes. A 90-day shape. How outcomes get measured. No price on this slide — that’s a conversation.

Mode 1

Workshop: knowledge work · days

remap one stream with your team
agree the hooks together
define what gets measured

Mode 2

Embedded: outsourced work · weeks

~3 days/week in your standups
instrument the hooks across your stack
run the loop with one team · handoff at the end

90-day shape (illustrative)

Week 1

Workshop

remap one stream · agree hooks · define what gets measured

Weeks 2–12

Embedded

instrument hooks across your delivery stack · run the loop with one team

Week 13

Handoff

metrics live in your tools, owned by your team

Outcomes measured by:

Lead time idea-to-production (leading)
Quality: incident rate, time to recovery (leading)
% features generating revenue (lagging, 12-month window)
The band of hooks itself (the artifact you keep)

No price on the slide. The conversation about scope, sequence, and commercials lives in the call.

The loop that turns output into outcome.

Closing figure — the mechanism slide 02 promised, drawn. Step 4 is the bridge.

Value stream map · idea → production → learning → idea 

Output is steps 1–3. Outcome is step 4. The loop closes at step 5.