Generalist tacit
Reading a room, sensing the real need, weighing competing goods. A literacy-level builder can hold this, because it is a general human skill. The structure handles it for free.
The model's value is staked on how much expert judgment encodes. Treated as a fixed global fraction, that number is unanswerable, and the worry is paralyzing: what if most of what an expert knows simply can't be written down? Three reframes make it tractable, and for development they mostly dissolve it.
01 - The frontier
The codifiable part of every role is the part that is already rule-shaped: the design system, the lint set, the architectural principle, the platform's best practices.
For coding this is most of the surface - you can encode a UX expert's designs, development best practices, platform conventions - because so much of the craft has already been written as a rule by someone.
What resists is the two judgments sitting on top of the rules: when to break one, and how to weigh two that conflict. This is the machine sorts, the human weighs line, restated as a fact about knowledge rather than about governance.
A judgment stays tacit precisely until someone can write it as a check. The moment they can, it becomes a rule and leaves the tacit set. So "tacit" is not a fixed body of mystical know-how - it is the not-yet-codified frontier, and that frontier moves.
A judgment stays tacit precisely until someone can write it as a check. Tacit is the not-yet-codified frontier, not a fixed body of know-how.
02 - A rate, not a fraction
If tacit is a frontier, it is a flow, not a stock - so the live quantity is a rate, not a fraction.
A self-learning twin genuinely raises the floor: it escalates, watches the human decide, and folds the new rule back in. That loop really does move the line.
But it converges to a ceiling below 100%, because its mechanism is the rule of three: it can only learn what recurs. The tacit residual is, by construction, the part that does not recur - the novel reframing, the once-off call - and that frontier regenerates as the world moves. So the loop drives the recurring residual toward zero and leaves the non-recurring one roughly where it was.
That is the treadmill from Proxies Are Incomplete, stated as a property of learning instead of competition. The right question is therefore not "what fraction is codifiable" but the rate of codification against the rate at which new frontier appears.
The question was never what fraction is codifiable. It is the rate of codification against the rate at which new frontier appears.
The bet: development's un-encodable residual is small, because the recurring part codifies and the tacit work splits across two humans already in the room.
Falsified if: a thin twin's escalation rate stays high and flat - too much of the real work never reduces to a rule.
03 - Identifiability
There is a second ceiling, and it is sharper: identifiability. A pairing of situation to decision underdetermines the reasoning behind it.
Many value functions fit the cases already seen and diverge on the ones not yet seen - exactly where you cannot check until it has cost something.
The expert often cannot adjudicate either, because "tacit" means they cannot state the rule. So correction stays per-instance and never closes into a form. And corrections skew toward visible errors: the confident-wrong outputs nobody flags never enter the signal - the same silent-sensor failure as The Pain Loop.
The honest consequence is not to assert a number but to measure one. Deploy a thin twin and read its escalation rate as the empirical measure of how much this domain actually codifies, and whether the loop is closing or stalling. Measure the fraction; stop asserting it.
Ship the thinnest twin that can do anything useful and let it escalate freely. Its escalation rate is the domain's codifiability, read off reality instead of argued in the abstract. A rate that falls over time means the loop is closing - the recurring residual is being absorbed. A rate that plateaus high means you've hit the non-recurring frontier, and no amount of further encoding will move it. Either way you now have a number, and it came from deployment, not from a meeting.
04 - Relocated, not captured
The strongest defense is specific to how development is shaped. The hardest tacit move in development - hearing the real problem under the stated request - is not assigned to a twin at all. It stays with the builder, live, sitting with the requester. The other hard tacit move - weighing a solution into a coherent whole - stays with the builder by design.
So the two dominant un-codifiable functions are pre-seated in humans who were already present. We never asked the twins to be the tacit part; we took the specialists' rules and left the judgment with the builder. The codifiability of the specialist twins therefore matters far less than a generic codifiability panic implies.
The model does not capture the tacit work. It relocates it onto humans who were already in the room.
The model works wherever the un-codifiable judgment is naturally split across roles that already exist. Development is favorable because it ships with a clean pre-existing split - problem-owner versus solution-owner - and that split lands almost exactly on the codifiable/tacit boundary, distributed across two humans who are both present for independent reasons.
Where the worry bites is the opposite shape: one expert who must both perceive the problem and weigh the solution, with no second human to take half the load - a therapist, a litigator forming case strategy. There every tacit function collapses onto one seat, and the twin must either capture it or escalate everything.
The model works wherever the un-codifiable judgment is naturally split across roles that already exist.
05 - The split
The residual itself splits in two, and the halves behave differently.
Reading a room, sensing the real need, weighing competing goods. A literacy-level builder can hold this, because it is a general human skill. The structure handles it for free.
The security expert who senses a vulnerability class they cannot fully state, the data modeler who knows a schema will hurt in two years, the performance engineer who smells the bottleneck. It can't be relocated onto a generalist, and it can't be ruled into a twin - if it could be a check, it would already be one.
If a domain's trained-perception edge fires rarely, the model holds and the twin bought almost everything. If it fires every other feature, the specialist's calendar is back in the loop and the twin bought little.
The generalist tacit load is handled cleanly and for free; the trained-perception load is handled only to the degree that it is rare. That rate is the thing to measure, not assume.