Failure
Regex default drift under LLM judgment doctrine
Summary
On 2026-05-04T02:38:02Z, the fleet halted a Buildertrend customer-field guard chain because the implemented architecture had drifted from the parent mechanism. The parent objective called for an AI judgment agent at every human emission boundary, with one of four outcomes: SHIP_AS_IS, SHIP_WITH_CAVEAT, WITHHOLD_AND_NAME_THE_GAP, or ESCALATE_TO_HUMAN. The work that accumulated around it built regex deny-lists, coherence checks, extension validators, and pattern-matching guards. The directive named the result plainly: the chain had produced the right evidence but the wrong architecture.
This failure entry records the pattern: under pressure, fleet seats default to regex/static gates because they are tractable, testable, fan-out friendly, and easy to close, even when the doctrine requires LLM judgment at the user-facing pipeline joint.
Evidence
- FLEET_FEED.md:57733-57742 records the fleet-wide decision to halt regex pipeline work and pivot Buildertrend emission boundaries to LLM-at-joint judgment.
- FLEET_FEED.md:57747-57749 contrasts the parent mechanism with what was built: LLM judgment was named in the parent, but runtime authority had moved to regex and pattern checks.
- FLEET_FEED.md:57753-57758 reclassifies existing row-specific proof, scanner, and guard work as evidence inputs or pre-filters, not final judgment.
- FLEET_FEED.md:57763-57767 dispatches two ORA repairs: a doctrine entry for the architectural boundary and this reflective entry for the recurring failure mode.
- FLEET_FEED.md:58800-58817 records the non-overlap scope for this entry: ORA-2100 owns doctrine, ORA-2101 owns failure learning, and no runtime, Buildertrend, database, deploy, credential, or outbound surface is touched.
Failure Mode
Static gates are attractive because they make the work closed-world. A regex can be written, fixture-tested, and proven green inside the agent's turn. A human-emission judge is open-world: it needs source packets, persona, surface, consequence, caveats, stoplines, and a judgment outcome whose correctness is about world-model fidelity rather than parser shape.
That difference creates a tractability gradient. If the parent objective names "LLM judgment" but the child work tickets reward local proof artifacts, seats will tend to implement validators and then expand those validators until they feel like a policy engine. The validator pass becomes socially legible as "safe," even though it only proves "not structurally rejected."
The bug was not that regex existed. The bug was that regex became a candidate runtime authority at a Buildertrend emission boundary. In a world-model system, a static check may say, "this cannot be emitted." It cannot say, "this is faithful enough for Zack or Shayelyn to act on."
Structural Causes
1. Parent mechanism was not sticky at child write sites. The parent stated an AI judgment agent, but implementation children were allowed to optimize for guard artifacts without repeatedly restating the actor, boundary, protected surface, and allowed outcomes.
2. Evidence work looked like architecture work. Row-specific proof, closed-job scans, deny-lists, and fixtures were genuinely useful. Because they were useful, they were easy to mistake for the runtime judgment layer itself.
3. Spot checks compared tickets against themselves. A child could be internally correct inside the regex paradigm while still violating the parent architecture. The review question was "does this ticket pass?" when it needed to be "does this ticket instantiate the stated mechanism?"
4. Completion pressure favored deterministic proof. Regex/static work creates fast proof. LLM-at-joint work creates a larger packet contract and a harder verification story, so it lost unless the architecture gate was made explicit.
Prevention Rule
Before filing, claiming, or merging child work under a parent objective that touches a human-opened surface, compare the implementation paradigm to the parent mechanism:
- Actor: who makes the final judgment?
- Boundary: where does the candidate become visible or actionable to a human?
- Surface: what exact Buildertrend, Dollhouse, Redline, email, or portal object will the human open?
- Outcomes: what decisions can the gate emit, and are they the parent outcomes?
- Authority: what do static checks prove, and what do they explicitly not adjudicate?
If the surface is human-facing Buildertrend or any other user-opened surface, the child must name the LLM-at-joint judgment contract before it names static checks as anything stronger than evidence, fixture, pre-filter, or fast-fail.
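The five questions above can be read as a pre-merge check on a child ticket. The following is a minimal sketch under stated assumptions: `ChildTicket`, its field names, the `"llm_judge"` actor label, and `mechanism_drift` are all hypothetical names introduced for illustration, not an existing fleet API.

```python
from dataclasses import dataclass

# The four outcomes named by the parent objective.
PARENT_OUTCOMES = frozenset({
    "SHIP_AS_IS",
    "SHIP_WITH_CAVEAT",
    "WITHHOLD_AND_NAME_THE_GAP",
    "ESCALATE_TO_HUMAN",
})

# Roles a static check may claim at a human-facing surface.
ALLOWED_STATIC_ROLES = frozenset({"evidence", "fixture", "pre-filter", "fast-fail"})


@dataclass(frozen=True)
class ChildTicket:
    """Hypothetical ticket shape; every field name here is an assumption."""
    actor: str             # who makes the final judgment
    boundary: str          # where the candidate becomes visible to a human
    surface: str           # the exact object the human will open
    outcomes: frozenset    # decisions the gate can emit
    static_authority: str  # the strongest role claimed for static checks


def mechanism_drift(ticket: ChildTicket) -> list[str]:
    """Return the ways a ticket departs from the parent mechanism."""
    problems = []
    if ticket.actor != "llm_judge":
        problems.append(f"actor is {ticket.actor!r}, not the LLM judge")
    if not ticket.boundary.strip():
        problems.append("boundary is not named")
    if not ticket.surface.strip():
        problems.append("surface is not named")
    if ticket.outcomes != PARENT_OUTCOMES:
        problems.append("outcomes differ from the parent outcomes")
    if ticket.static_authority not in ALLOWED_STATIC_ROLES:
        problems.append(
            f"static checks claim {ticket.static_authority!r}, "
            "which is stronger than evidence, fixture, pre-filter, or fast-fail"
        )
    return problems


# A ticket that is internally consistent inside the regex paradigm.
drifted = ChildTicket(
    actor="regex_guard",
    boundary="before customer-field write",
    surface="Buildertrend customer field",
    outcomes=frozenset({"PASS", "FAIL"}),
    static_authority="final judgment",
)
for problem in mechanism_drift(drifted):
    print(problem)
```

The check deliberately compares the ticket to the parent mechanism, not to its own acceptance criteria, which is the review question the spot checks missed.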
Valid Use Of Static Checks
Static checks remain correct when they protect narrow contracts:
- feed-append rejects malformed queue posts before they become coordination facts.
- A closed-job scanner rejects a candidate that targets a known forbidden job.
- A schema or fixture proves that a packet has the required fields.
- A deny-list turns a repeated known-bad case into fast-fail evidence.
Those checks should be preserved as inputs to the LLM judge. They are especially valuable because they tell the judge what has already been proven and what remains undecided.
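One way to picture this division of labor is a gate where static checks can only reject or hand their findings forward, and only the judge can emit one of the four outcomes. This is a sketch, not the fleet's implementation: `StaticFinding`, `JudgmentPacket`, `Decision`, `decide`, the `JOB-CLOSED` marker, and the stub judge are hypothetical, and the stub stands in for a real LLM call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Outcome(Enum):
    SHIP_AS_IS = "SHIP_AS_IS"
    SHIP_WITH_CAVEAT = "SHIP_WITH_CAVEAT"
    WITHHOLD_AND_NAME_THE_GAP = "WITHHOLD_AND_NAME_THE_GAP"
    ESCALATE_TO_HUMAN = "ESCALATE_TO_HUMAN"


@dataclass(frozen=True)
class StaticFinding:
    check: str      # name of the static check
    rejected: bool  # True means fast-fail; False means only "no objection"
    note: str       # what the check did or did not prove


@dataclass(frozen=True)
class JudgmentPacket:
    candidate: str
    surface: str
    findings: tuple  # StaticFinding items carried forward as evidence


@dataclass(frozen=True)
class Decision:
    outcome: Optional[Outcome]  # None when a static check fast-failed
    decided_by: str


StaticCheck = Callable[[str], StaticFinding]
Judge = Callable[[JudgmentPacket], Outcome]


def decide(candidate: str, surface: str,
           checks: Sequence[StaticCheck], judge: Judge) -> Decision:
    findings = tuple(check(candidate) for check in checks)
    if any(f.rejected for f in findings):
        # A static check may say "this cannot be emitted" and nothing more.
        return Decision(outcome=None, decided_by="static_fast_fail")
    # A clean static pass is not SHIP_AS_IS; it is evidence for the judge.
    packet = JudgmentPacket(candidate, surface, findings)
    return Decision(outcome=judge(packet), decided_by="llm_judge")


def closed_job_scanner(candidate: str) -> StaticFinding:
    # Hypothetical rule: the marker string is an assumption for this sketch.
    hit = "JOB-CLOSED" in candidate
    note = "targets a closed job" if hit else "no closed job targeted"
    return StaticFinding("closed_job_scanner", hit, note)


def stub_judge(packet: JudgmentPacket) -> Outcome:
    # Placeholder for an LLM call that would read sources and findings.
    return Outcome.SHIP_WITH_CAVEAT


print(decide("update JOB-42 status", "Buildertrend customer field",
             [closed_job_scanner], stub_judge))
```

The design choice worth noting is that `decide` has no code path from "all static checks passed" to a shipped outcome without going through the judge.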
Invalid Drift
Static checks become wrong architecture when:
- a regex pass is treated as SHIP_AS_IS;
- a deny-list miss is treated as permission to emit;
- six green structural fields hide the one missing caveat a human needed;
- a validator is expanded until it silently becomes a policy engine;
- fanout children prove local shape without proving the parent mechanism.
The correct output at a human emission boundary may be SHIP_WITH_CAVEAT, WITHHOLD_AND_NAME_THE_GAP, or ESCALATE_TO_HUMAN. A regex cannot choose among those outcomes because the choice depends on source altitude, consequence, and human authority.
Relationship To Existing ORA Entries
ORA-2026-0037 observed the earlier fleet-level drift: rules were still running where intelligence belonged. ORA-2026-0090 explains why the drift repeats: agents descend toward tractable work. ORA-2026-0137 is the doctrine repair: static checks may fast-fail, but LLM-at-joint judgment is required before human emission. This entry records the failure instance that made the distinction operationally urgent on 2026-05-04.
Disconfirming Observation
If a future Buildertrend emission pipeline uses static checks only to reject malformed candidates, routes every surviving candidate through a source-backed LLM judgment packet, stores one of the four canonical outcomes, and preserves caveats at the user-opened surface, this failure mode has been contained for that pipeline.
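The containment conditions above could be audited per candidate. The sketch below assumes a hypothetical `EmissionRecord` audit row; its field names and the `is_contained` helper are illustrative, and no such log format is described in the source.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

CANONICAL_OUTCOMES = frozenset({
    "SHIP_AS_IS",
    "SHIP_WITH_CAVEAT",
    "WITHHOLD_AND_NAME_THE_GAP",
    "ESCALATE_TO_HUMAN",
})


@dataclass(frozen=True)
class EmissionRecord:
    """Hypothetical audit row for one candidate; all fields are assumptions."""
    static_rejected: bool          # a static check fast-failed the candidate
    emitted: bool                  # the candidate reached the user-opened surface
    judged_by_llm: bool            # an LLM judgment packet was evaluated
    source_backed: bool            # the packet cited its sources
    stored_outcome: Optional[str]  # outcome persisted for this candidate
    caveat_on_surface: bool        # the caveat is visible where the human opens it


def is_contained(records: Iterable[EmissionRecord]) -> bool:
    """True if every record satisfies the disconfirming conditions."""
    for r in records:
        if r.static_rejected:
            # Static checks only reject; a rejected candidate must not ship.
            if r.emitted:
                return False
            continue
        # Every surviving candidate goes through a source-backed judgment.
        if not (r.judged_by_llm and r.source_backed):
            return False
        if r.stored_outcome not in CANONICAL_OUTCOMES:
            return False
        # A caveat that never reaches the surface is a lost caveat.
        if r.stored_outcome == "SHIP_WITH_CAVEAT" and not r.caveat_on_surface:
            return False
    return True
```

A record that survived static checks but has no stored outcome fails the audit, which is the "deny-list miss treated as permission to emit" case.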