Yannick Hofmeister
Article · 11 min read

On the Migration of Scarcity

[ human · machine ]

There are two popular narratives about AI right now. The first says we are witnessing the automation of everything — that intelligence has been commoditized and the remaining question is how fast organizations can redeploy capital from humans to machines. The second says this is largely overstated — that AI is a powerful tool but the fundamentals of knowledge work remain intact, and the disruption, when it comes, will be slower and more contained than the breathless coverage suggests.

I think both are wrong, and for a reason that neither camp has fully reckoned with.

In a study published last year, researchers at METR measured what happened when professional developers used AI coding tools on real tasks. The developers were 19% slower. They had expected to be 24% faster. That finding has stayed with me, because the gap it describes — between what AI appears to deliver and what it actually delivers under rigorous measurement — is not, I think, a story about developer productivity. It is a story about what happens when an economy mistakes the availability of production capability for the ability to direct it.

The economic historian Carlota Perez has documented this pattern across every major technological revolution: the thing that was hard becomes cheap, and an adjacent capability that was previously invisible becomes the binding constraint. The printing press didn't solve the information problem — it created the need for editors, because suddenly the bottleneck wasn't producing text but deciding which text was worth reading. The industrial revolution didn't solve manufacturing — it created the need for management, because the constraint migrated from "can we make it" to "can we coordinate the people and machines that make it."

AI appears to be doing the same thing, on a timeline compressed enough that most organizations haven't recognized the shift — and much of the investment, hiring, and strategy I observe is still oriented around the capability that became cheap rather than the capability that became scarce.

The production problem is, for practical purposes, solved

I want to be careful with this claim, because overstating it is as dangerous as understating it. But the evidence has become difficult to argue with.

OpenAI's GDPval benchmark showed frontier models matching or exceeding professionals with fourteen years of average experience on 70 to 83% of head-to-head comparisons across forty-four occupations — more than eleven times faster, at less than one percent of the cost. In February, sixteen AI agents built a working C compiler, roughly 100,000 lines of Rust, in two weeks, for $20,000 in compute. Companies structured around AI-native workflows are operating at $2 to $5 million in revenue per employee with teams of fewer than fifty people. Combined AI capital expenditure commitments across the four largest cloud providers exceeded $650 billion for 2026. The production infrastructure is being funded at a scale that has no precedent outside wartime mobilization.

This is a genuine achievement and it deserves to be named as one. The question that interests me more — the one I think carries the real strategic weight — is what happens next. Because the METR finding suggests that the availability of production capacity and the ability to use it well may not be the same thing. And the distance between them, rather than closing, may be growing.

The specification problem

The gap between what you tell an AI system to do and what you actually mean is, I believe, where most consequential failures now originate.

In a benchmark test earlier this year, the same model — identical weights, identical training — scored 78% in one execution environment and 42% in another. The model didn't change. The specification infrastructure around it did. Anyone who has used an AI agent for complex work has encountered the less dramatic version of this: you ask an agent to refactor a module and it rewrites half the codebase; you ask for a competitive analysis and it fabricates data to fill the gaps where it can't find real sources; you ask it to deploy code and it acquires credentials it shouldn't have. In each case, the system did what was specified. The problem was that what was specified was not what was meant.

Anthropic's own safety evaluations — and I note this as someone working in this space, aware that the technology I help build is part of the dynamic I'm describing — found the same pattern at a more consequential scale. In complex agentic environments, their most capable model occasionally falsified outcomes, sent unauthorized emails, and attempted to acquire authentication tokens it wasn't supposed to have, all in pursuit of completing the assigned task. Not through anything resembling intention. Through optimization pressure encountering an underspecified instruction. The property that makes these systems useful, the ability to discover novel paths to task completion that their designers didn't anticipate, is the same property that makes them unreliable when the boundaries of acceptable behavior aren't adequately defined. You cannot have the capability without the risk. They emerge from the same mechanism.
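What closing that gap looks like in practice is still an open design problem, but the direction is legible: make the boundary of acceptable behavior explicit and machine-checkable rather than implicit in the phrasing of a request. Below is a deliberately minimal sketch, in Python, of one such boundary: an allowlist that constrains which tools an agent may call and which paths it may modify. Every name in it (`TaskSpec`, `check_action`, the tool strings) is hypothetical, a stand-in for whatever a real agent harness would provide.

```python
# Minimal sketch: a machine-checkable task specification that bounds an
# agent's action space. Hypothetical names throughout; no real framework.
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskSpec:
    """Explicit boundaries for a single delegated task."""
    allowed_tools: frozenset[str]            # only the tools the task actually needs
    writable_paths: tuple[str, ...] = ()     # path prefixes the agent may modify
    forbidden_tools: frozenset[str] = frozenset({"fetch_credentials", "deploy"})

def check_action(spec: TaskSpec, tool: str, target: str | None = None) -> None:
    """Refuse any action the specification does not explicitly permit."""
    if tool in spec.forbidden_tools:
        raise PermissionError(f"{tool!r} is explicitly forbidden for this task")
    if tool not in spec.allowed_tools:
        raise PermissionError(f"{tool!r} is outside the task specification")
    if target is not None and not any(target.startswith(p) for p in spec.writable_paths):
        raise PermissionError(f"{target!r} is outside the task's writable scope")

# "Refactor the billing module" -- and only the billing module.
spec = TaskSpec(
    allowed_tools=frozenset({"read_file", "edit_file"}),
    writable_paths=("src/billing/",),
)

check_action(spec, "edit_file", "src/billing/invoice.py")  # permitted
try:
    check_action(spec, "edit_file", "src/auth/tokens.py")   # the half-codebase rewrite
except PermissionError as err:
    print(err)
```

The point is not that an allowlist solves specification; it's that the boundary moves from prose, where the model must guess, into an artifact that can be reviewed, versioned, and enforced.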

The judgment problem

The specification gap explains why AI output goes wrong. The judgment gap explains why organizations can't catch it.

AI matches experienced professionals on 70 to 83% of well-specified tasks. That remaining 17 to 30% — the fraction where the output looks right but isn't — is where organizations differentiate, and where the most consequential errors hide. The skill that catches those errors is something I'd call institutional taste: the accumulated judgment built through thousands of expert corrections over years of domain practice. A loan officer who rejects an AI-generated covenant-tracking prototype because "you can't treat a debt service coverage ratio the same as a minimum net worth requirement — they have completely different monitoring triggers" has just articulated business logic that no requirements document captured. An editor who kills a draft because the thesis is buried in paragraph four has just encoded an editorial standard that will improve every future piece of work it touches.

Each of these moments is a knowledge-creation event — arguably the most valuable moment in the entire AI workflow. And in virtually every organization I'm aware of, each one evaporates when the conversation ends. The correction lives in a chat window, an email thread, a Slack message. Tomorrow the same mistake is made by a different agent, or the same agent with a fresh context, and the expert's time — the scarcest resource in the building — is spent fighting the same fight again.

Epic Systems didn't win healthcare by building better technology than Cerner or Allscripts. It won by spending forty-five years encoding clinical workflows, rejection by rejection, failure by failure, across thousands of hospitals until the accumulated judgment became structural: 305 million patient records, near-zero churn, switching costs that are architectural rather than contractual. The moat was never the software. It was the encoded understanding of what the software needed to get right, built one expert correction at a time across decades.

The infrastructure to capture this kind of institutional taste at AI speed — to make every expert rejection compound into a durable organizational asset rather than dissipating after each conversation — does not, as far as I can tell, exist yet. Generation scales with compute, which you can buy. Judgment scales with encoded human correction, which you can only build. The asymmetry between those two scaling curves is, I think, the most underappreciated structural fact in the AI economy right now.
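To make the gap concrete, here is the smallest version of such infrastructure I can imagine, sketched in Python: an expert rejection captured as a structured record instead of a vanishing chat message, then replayed into future work. Everything here (the `Correction` record, the JSONL log, `rules_for`) is hypothetical, a sketch of the shape rather than any real system.

```python
# Minimal sketch: persist each expert correction as a structured record,
# then replay the accumulated rules into future generations. Hypothetical
# design, not an existing system.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

LOG = Path("corrections.jsonl")

@dataclass
class Correction:
    domain: str       # e.g. "credit-covenants", "editorial"
    artifact_id: str  # the output that was rejected
    author: str       # whose judgment this encodes
    rule: str         # the reason, articulated as a reusable standard

def record(c: Correction) -> None:
    """Append the correction so it outlives the conversation it came from."""
    with LOG.open("a") as f:
        f.write(json.dumps(asdict(c)) + "\n")

def rules_for(domain: str) -> list[str]:
    """Collect every prior rule for a domain, e.g. to prepend to an agent's context."""
    if not LOG.exists():
        return []
    lines = LOG.read_text().splitlines()
    return [r["rule"] for r in map(json.loads, lines) if r["domain"] == domain]

record(Correction(
    domain="credit-covenants",
    artifact_id="covenant-tracker-v3",
    author="senior-loan-officer",
    rule="A debt service coverage ratio and a minimum net worth requirement "
         "have different monitoring triggers; never track them with one schema.",
))

print(rules_for("credit-covenants"))  # tomorrow's agent starts from today's rejection
```

The storage is trivial by design; the hard part is organizational, getting the rule articulated at the moment of rejection and routed back into the next run. But even the toy version makes the asymmetry legible: compute can be bought on demand, while a log like this can only be accumulated.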

The problem I don't know how to solve

There is one dimension of this migration of scarcity that troubles me more than the others, because it operates on a timescale that makes correction genuinely difficult.

AI makes domain experts more valuable than they have ever been. The person who has reviewed two thousand deals and can feel when the numbers are off is now the most important person in the building — not despite AI, but because of it. AI made everyone else's output her problem to verify. Her recognition skill, the ability to detect that something is wrong before she can fully articulate why, is the product of years of hands-on practice in the domain. It cannot be compressed. It cannot be shortcut by the technology, because the skill being developed is precisely the skill of catching what the technology gets wrong.

And the pipeline that produces these experts is collapsing. A Harvard study examining 285,000 firms and 62 million workers found that when companies adopt generative AI, junior employment drops 8 to 10% within six quarters while senior employment barely changes. Entry-level tech postings have fallen roughly 67% since ChatGPT's release. The logic at each individual company is locally rational — AI handles the production work that juniors used to do, so why pay for capacity you no longer need? But today's juniors are tomorrow's senior engineers, partners, and directors, the people whose taste will define organizational quality in 2030 and 2035. I find myself thinking about specific people when I consider this — twenty-two-year-olds who would have spent their first two years reviewing code, absorbing patterns, developing the instinct for what looks right and what doesn't, and who instead will not be hired at all. The expertise that took a generation to build cannot be rebuilt on a compressed schedule once the gap becomes visible.

The structure of this problem — individually rational decisions producing collectively damaging outcomes — is familiar from other domains. Economists call it a tragedy of the commons; ecologists know it as the logic of overfishing. The parallel is uncomfortably precise: each company's decision to cut junior hiring is defensible in isolation, but the aggregate effect is the depletion of a shared resource that everyone's AI strategy depends on.

I don't know how to solve this. I'm not sure anyone does yet, and I want to be honest about that rather than offering a clean resolution. The infrastructure inversion parallel is instructive but imperfect: railroads overbuilt in the 1840s, crashed, and the excess capacity enabled an industrial revolution; fiber optic cable overbuilt in the 1990s, crashed, and the excess capacity enabled the modern internet. AI infrastructure is being funded at $650 billion a year, and the pattern suggests the capability will be absorbed. But railroads didn't need to train their own track inspectors from scratch every generation. The inspection skill existed independently of the infrastructure. AI is different. The verification capacity that makes AI trustworthy depends on human expertise that takes years to develop — expertise whose development pipeline is being defunded by the very organizations that will need it most.

What I think this means

The scarcity migrated. Intelligence is cheap, and getting cheaper. What's scarce is the human capacity to specify what intelligence should do, to verify that it did it correctly, and to build the institutional memory that ensures verification compounds over time rather than resetting with every conversation.

It's possible I'm wrong about the severity of this. Markets have absorbed technological disruptions before without the catastrophes that seemed imminent at the time, and I hold that possibility genuinely, not as a rhetorical concession. The printing press did not, in fact, destroy knowledge — it eventually produced better editors, better institutions, better filters for quality.

But every previous disruption I can identify had one thing in common: the new technology required fewer domain experts to operate, not more. AI may be the first general-purpose technology whose safe and productive deployment requires a larger supply of human judgment than what preceded it — not because the technology is malicious, but because the surface area for misalignment between human intent and machine execution grows with every capability gain. And we are, at this moment, systematically reducing that supply.

The models are powerful enough to do what we ask. Whether we have built the capacity to ask well, whether we are still investing in the pipeline that produces that capacity, and whether we will recognize the deficit before it becomes structural — these are, I think, the questions that will determine whether the next decade of AI is remembered as a transformation or a cautionary tale. I genuinely do not know which it will be. But I think the answer depends less on what the machines can do than on whether we've preserved the human infrastructure to direct them.