Undercover at an exclusive AI conference for Leaders in Tech in Berlin.
Independent VSM consultant
16 Apr 2026
A day of listening at an AI engineering conference. Five problems people kept naming, three patterns I'd push back on — and the real question underneath, which isn't only mine: I keep hearing the same signal from colleagues far more experienced than me — technology consultants, agile and Kanban coaches, DevOps experts. What are they buying? For which problems? And why, when the answer looks obvious, is the cheque still so hard to get?
Across the panel, the talks, and the coffee-line conversations. Ranked loosely by how often they surfaced.
Output is multiplying. Outcomes aren't.
An engineer shipped five microservices with AI instead of one. Nobody pays per microservice. How do we know we're building the right things?
The exception that proved the rule: Intercom was the only talk that tied engineering effort to a single business metric — support cases resolved with the user's problem actually solved.
Picking what to build.
Getting the why and the what before the how. Working closer with customers to decide what actually matters.
Burnout and overload.
Context switching is up. More agents, more concurrent work. One CEO prescribed longer hours. Others pushed back.
Harness engineering (the weeds of agents, PRs, CD).
Everyone is building a version of this: orchestration, guardrails, feature flags, evals, the loop that keeps agents shipping safely. Different names, same problem.
Quality is degrading.
More code, more bugs. Speakers named agents that address quality in production as an unsolved problem.
Everyone agreed outcomes matter more than output.
Nobody said how.
My take, not consensus. I'd love to be wrong on any of them.
"More concurrent agents means more throughput."
Little's Law says the opposite. More work in progress, more context switching — longer lead times, longer feedback loops, worse customer outcomes. A concrete test: ask your team how many things each engineer has in flight right now. If the answer is more than two, your lead times are longer than they need to be.
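Little's Law makes the arithmetic concrete. A minimal sketch with hypothetical numbers (mine, not from any talk):

```python
def lead_time_days(wip_items: float, throughput_per_day: float) -> float:
    """Little's Law: average lead time = WIP / throughput."""
    return wip_items / throughput_per_day

# Same team, same throughput -- only work in progress changes:
print(lead_time_days(2, 1))  # 1 item finished/day, 2 in flight -> 2.0 days
print(lead_time_days(6, 1))  # 1 item finished/day, 6 in flight -> 6.0 days
```

Adding concurrent agents raises `wip_items` immediately; it only raises `throughput_per_day` if the delivery pipeline keeps pace. Until it does, lead times get longer.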
"We need to work longer hours to compete with the US."
This one actually gets under my skin. Everyone in the room agrees we should favour outcomes over output. When push comes to shove, we keep valuing output — and staying vague on how we'd even move from one to the other. "Work longer hours" assumes more hours produce better outcomes. They don't. It's the same output-over-outcome trap. The better question: what feedback loops would tell us we're working on the right thing in the first place?
The exciting bet of 2026 — and what it actually means underneath the buzzword. My take, informed by Dave Mangot and Bryan Finster.
Harness engineering = continuous delivery + the agent loop that keeps agents running the software delivery cycle.
— My working definition
The exciting part isn't building features. It's building the loop — and the continuous delivery primitives that let agents run it safely. Get this right and you get more features with fewer developers, faster time to market, and (if you invest in the infrastructure) genuinely good quality.
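To make "the loop" less abstract, here is a minimal sketch of the shape I mean. The names and gates are mine, not any vendor's API; assume `propose` wraps an agent generating a candidate change:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Change:
    description: str
    passes_evals: bool   # stand-in for running the real test/eval suite
    behind_flag: bool    # shipped dark behind a feature flag

def agent_loop(propose: Callable[[], Change], max_attempts: int = 3) -> Optional[Change]:
    """Harness sketch: every agent-proposed change must clear the same
    gates a human commit would before becoming a production candidate."""
    for _ in range(max_attempts):
        change = propose()
        if not change.passes_evals:
            continue  # reject: evals are the safety net, not human review
        if not change.behind_flag:
            continue  # reject: no dark launch, no deploy
        return change
    return None
```

The point isn't the ten lines. It's that the gates — evals, flags, the pipeline underneath — exist and are trusted enough to run unattended.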
Supporting pillars (table stakes).
Platform engineering. Self-serve environments, evals, CI. Without this, individual productivity gains don't translate into organisational throughput.
Good DevOps practices. Trunk-based development, small batches, fast feedback, blameless incident response — the boring stuff.
Continuous delivery. Every commit is a candidate for production. If the pipeline isn't safe, the agent loop amplifies your dysfunction.
I've run my VSM business for a year. VSM addresses every one of the five problems above. People nod. Nobody buys. That might not be a discovery problem so much as a buying problem.
People don't buy what hurts. They buy what their company already values enough to fund — what they can get a budget line for without a fight.
— A hypothesis, held loosely
If people see the problems, agree VSM solves them, and still don't buy — the pattern fits one of three explanations. The goal of discovery now is to figure out which.
Category mismatch. They'd buy "something" but not "that thing." VSM doesn't fit their mental model of what they purchase.
Budget mismatch. They believe in it, but their org doesn't value it enough to fund it. There's no internal budget line it naturally lands in.
Pain mismatch. They complain the way people complain about the weather. The hurt isn't excruciating enough to fund.
The same five problems, re-examined through a buying lens instead of a pain lens.
I made this table sitting in the back of the last talk. It's uncomfortable. The pains closest to my heart are the ones with no procurement category. The pains with real budget are exactly the ones I used to solve as a DevOps engineer — not as a VSM consultant. Harness engineering sits squarely in the first column.
| Pain | Do they buy for this? | Who sells it today |
|---|---|---|
| Output → outcomes | Rarely | Product coaches. Marty Cagan books. Hard to monetise. |
| Picking what to build | Indirectly | They buy "product strategy" or hire a CPO instead. |
| Burnout / too busy | No | Nobody buys their way out of burnout. They hire headcount. |
| Harness engineering | Yes | Thoughtworks. Platform engineering hires. Harness. CircleCI. Now: agentic-CD vendors. |
| Quality / more bugs | Yes | Testing vendors. SRE contractors. QA platforms. |
VSM is a meta-solution. People don't buy "find your bottleneck." They buy "migrate from manual Kubernetes to Terraform" or "product management training". The object-level thing is what goes on the purchase order. The meta-level thing can be what I deliver inside it.
Holding the pivot lightly. The discovery questions below are designed to falsify, not confirm.
Is this a positioning problem I can reframe my way out of — or is it a product problem I have to rebuild my way out of? And underneath both: is harness engineering something companies will buy as a service, or something every serious team builds in-house?
"Are you buying harness engineering as a service, or building it in-house — and who owns it?"
The load-bearing question. If nobody buys it, I'm not investing in building the capability. If they do, I want to know who signs the cheque.

"If you were going to solve this, what would you Google?"

Tests whether VSM is in anyone's category. If zero people say anything close, I have a language problem.

"Last time you brought in outside help for something like this — what was it called on the invoice?"

Literally tells me the line item that gets approved. The procurement-shaped answer.

"Is this a tool problem, a training problem, or a people-in-the-room problem?"

If they say "tool," VSM is dead on arrival. If "people-in-the-room," I'm alive.

"If I showed up Monday and fixed one thing, what would it be?"

The answer is the product I should be selling — not the one I want to be selling.

"What's your CTO asking you to prove about the AI spend, and what are you actually able to show them?"

The gap between ask and answer is where flow measurement consulting lives.

The most useful exchange of the day didn't happen on a stage. A Nordic engineering leader named the problem "moving from outputs to outcomes" and asked me what I'd do about it.
I gave her the answer I keep coming back to in my own work:
Align on a common goal that has clear benefits for both the organisation and the customer.
Align on how we're delivering that outcome — and on what's currently preventing us from delivering it.

Make the preventing forces visible, then remove the most impactful one. Rinse and repeat.

This is where flow measurement earns its keep: the gap between the outcome we said we wanted and the work we're actually doing.
Concrete next steps out of the day. The headline one sits at the bottom — everything else feeds it.
Follow up with the Nordic attendee to continue the outputs-to-outcomes conversation.
Test the three-step framing against someone who already named the problem out loud.

Research the companies and follow up with the people I met. Understand their problems, the metrics they actually track, and any events they're running next.

Turn warm hallway contact into structured discovery.

Run a lean-coffee-style focus group on these findings with my network plus a few event attendees.

Cheaper than ten 1:1s. Surfaces disagreement faster.

Collect industry signal: DORA 2025, McKinsey on developer productivity, Bryan Finster on harness engineering and agentic CD, Dave Mangot on agentic proficiency as the new SaaS premium.

Ammunition for the outputs-vs-outcomes conversation. Links in Further reading below.

Visit the community / incubator the growth guy mentioned. Also the crypto community he flagged.

Different rooms, possibly different problems, possibly different budgets.

Scan open roles at event sponsors and the companies whose speakers I heard.

Reveals where they're actually investing — often a better signal than what they say on stage.

Explore Merantix itself — open positions, partnerships, other angles worth a conversation.

They hosted. They have reach. Worth understanding what they're actually building.

Keep doing sales. Keep talking to people. Keep diagnosing their problems. Keep evolving the service offering until it matches something they'll actually fund.

The only one that matters. Everything above is in service of this.

Threads to pull on while I test the hypotheses above. Held lightly — I haven't digested all of them yet.
The clearest current framing of why harness engineering is a premium bet, not a feature. Mangot names four pillars underneath it — quality internal platforms, a healthy data ecosystem, AI-accessible internal data, and strong version control — and argues agentic proficiency is what now commands the private-equity SaaS multiple. Complementary to Finster's CD canon: Finster tells you how the pipeline has to behave; Mangot tells you why the market is starting to price it in.
Analysis of over 28 million CI workflows. The headline: teams are writing dramatically more code, but fewer changes are reaching production. AI is accelerating development while exposing the delivery bottleneck — validation, integration, and recovery aren't keeping pace. A small group of top performers turn higher change volume into faster, more reliable releases; most teams fall further behind. The empirical backbone for the harness-engineering argument.
Three decades of CI/CD experience, distilled into a minimum-viable definition — and, more recently, applied directly to agentic workflows. The clearest articulation I know of what "harness engineering" is actually rediscovering.
- Clarity Was Always the Bottleneck — on why AI didn't create the output-vs-outcome problem, it just made it impossible to hide.
- Agentic Architecture Patterns — single responsibility, explicit interfaces, and separation of concerns applied to skills, agents, and commands.
- minimumcd.org and bryanfinster.com — the underlying canon.
Nearly 5,000 responses. Headline finding: AI amplifies the organisational system you already have — good or bad. Useful ammo for the outputs-vs-outcomes conversation.
The productivity-gains framing that's driving CTO expectations right now. Worth reading so I know what the ask from above actually looks like — and where the measurement gap opens up.
425 responses across 37 countries, from a friend's survey. Flags strategy being sidelined by reactivity and discovery being under-prioritised — the same gap I keep hitting in sales calls.