Failure

Regex default drift under LLM judgment doctrine

ID: ORA-2026-0138
Date: 2026-05-04
Status: reviewed
Maturity: M1
Source: docs/entries/failures/ORA-2026-0138_regex-default-drift-under-llm-judgment-doctrine.md

camber, buildertrend, human-emission, llm-at-joint, fleet-ops, incentives, behavior, world-model-fidelity, failure-pattern

Summary

On 2026-05-04T02:38:02Z, the fleet halted a Buildertrend customer-field guard chain because the implemented architecture had drifted from the parent mechanism. The parent objective called for an AI judgment agent at every human emission boundary, with one of four outcomes: SHIP_AS_IS, SHIP_WITH_CAVEAT, WITHHOLD_AND_NAME_THE_GAP, or ESCALATE_TO_HUMAN. The work that accumulated around it built regex deny-lists, coherence checks, extension validators, and pattern-matching guards. The directive named the result plainly: the chain had produced the right evidence but the wrong architecture.

This failure entry records the pattern: under pressure, fleet seats default to regex/static gates because they are tractable, testable, fan-out friendly, and easy to close, even when the doctrine requires LLM judgment at the user-facing pipeline joint.

Evidence

FLEET_FEED.md:57733-57742 records the fleet-wide decision to halt regex pipeline work and pivot Buildertrend emission boundaries to LLM-at-joint judgment.
FLEET_FEED.md:57747-57749 contrasts the parent mechanism with what was built: LLM judgment was named in the parent, but runtime authority had moved to regex and pattern checks.
FLEET_FEED.md:57753-57758 reclassifies existing row-specific proof, scanner, and guard work as evidence inputs or pre-filters, not final judgment.
FLEET_FEED.md:57763-57767 dispatches two ORA repairs: a doctrine entry for the architectural boundary and this reflective entry for the recurring failure mode.
FLEET_FEED.md:58800-58817 records the non-overlap scope for this entry: ORA-2100 owns doctrine, ORA-2101 owns failure learning, and no runtime, Buildertrend, database, deploy, credential, or outbound surface is touched.

Failure Mode

Static gates are attractive because they make the work closed-world. A regex can be written, fixture-tested, and proven green inside the agent's turn. A human-emission judge is open-world: it needs source packets, persona, surface, consequence, caveats, stoplines, and a judgment outcome whose correctness is about world-model fidelity rather than parser shape.

That difference creates a tractability gradient. If the parent objective names "LLM judgment" but the child work tickets reward local proof artifacts, seats will tend to implement validators and then expand those validators until they feel like a policy engine. The validator pass becomes socially legible as "safe," even though it only proves "not structurally rejected."

The bug was not that regex existed. The bug was that regex became a candidate runtime authority at a Buildertrend emission boundary. In a world-model system, a static check may say, "this cannot be emitted." It cannot say, "this is faithful enough for Zack or Shayelyn to act on."
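The asymmetry can be made concrete in types. The following is a minimal sketch, not the fleet's implementation: `StaticVerdict`, `static_check`, and `DENY_PATTERN` are hypothetical names, and the deny pattern is an invented example. The point is that a static check's return type has no member that means "ship".

```python
import re
from enum import Enum


class Outcome(Enum):
    """The four parent outcomes; only the judge at the joint may emit these."""
    SHIP_AS_IS = "SHIP_AS_IS"
    SHIP_WITH_CAVEAT = "SHIP_WITH_CAVEAT"
    WITHHOLD_AND_NAME_THE_GAP = "WITHHOLD_AND_NAME_THE_GAP"
    ESCALATE_TO_HUMAN = "ESCALATE_TO_HUMAN"


class StaticVerdict(Enum):
    """Hypothetical: everything a static check is allowed to say."""
    REJECT = "REJECT"        # "this cannot be emitted"
    UNDECIDED = "UNDECIDED"  # no opinion; a judge still has to decide


# Invented deny pattern for illustration: placeholder text or unfilled templates.
DENY_PATTERN = re.compile(r"\bTBD\b|\{\{.*?\}\}")


def static_check(candidate_text: str) -> StaticVerdict:
    """Fast-fail on a known-bad shape; otherwise abstain."""
    if DENY_PATTERN.search(candidate_text):
        return StaticVerdict.REJECT
    # A deny-list miss is not permission to emit.
    return StaticVerdict.UNDECIDED
```

Because `StaticVerdict` shares no member with `Outcome`, a caller cannot accidentally read a regex pass as SHIP_AS_IS; the only way to obtain an `Outcome` is to go through the judge.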

Structural Causes

1. Parent mechanism was not sticky at child write sites. The parent stated an AI judgment agent, but implementation children were allowed to optimize for guard artifacts without repeatedly restating the actor, boundary, protected surface, and allowed outcomes.
2. Evidence work looked like architecture work. Row-specific proof, closed-job scans, deny-lists, and fixtures were genuinely useful. Because they were useful, they were easy to mistake for the runtime judgment layer itself.
3. Spot checks compared tickets against themselves. A child could be internally correct inside the regex paradigm while still violating the parent architecture. The review question was "does this ticket pass?" when it needed to be "does this ticket instantiate the stated mechanism?"
4. Completion pressure favored deterministic proof. Regex/static work creates fast proof. LLM-at-joint work creates a larger packet contract and a harder verification story, so it lost unless the architecture gate was made explicit.

Prevention Rule

Before filing, claiming, or merging child work under a parent objective that touches a human-opened surface, compare the implementation paradigm to the parent mechanism:

Actor: who makes the final judgment?
Boundary: where does the candidate become visible or actionable to a human?
Surface: what exact Buildertrend, Dollhouse, Redline, email, or portal object will the human open?
Outcomes: what decisions can the gate emit, and are they the parent outcomes?
Authority: what do static checks prove, and what do they explicitly not adjudicate?

If the surface is human-facing Buildertrend or any other user-opened surface, the child must name the LLM-at-joint judgment contract before it names static checks as anything stronger than evidence, fixture, pre-filter, or fast-fail.
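One way to make the five questions hard to skip is to require them as fields on the child ticket and compare them to the parent mechanism at filing time. The sketch below is illustrative only: `ChildTicket`, `mechanism_gaps`, the `"llm-judge"` actor label, and the role vocabulary are assumptions, not an existing fleet schema. It is itself a narrow static check on ticket shape and does not judge any emission.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Parent outcomes as named in the summary.
PARENT_OUTCOMES = frozenset({
    "SHIP_AS_IS",
    "SHIP_WITH_CAVEAT",
    "WITHHOLD_AND_NAME_THE_GAP",
    "ESCALATE_TO_HUMAN",
})

# The strongest roles a static check may claim at a human-facing surface.
ALLOWED_STATIC_ROLES = frozenset({"evidence", "fixture", "pre-filter", "fast-fail"})


@dataclass(frozen=True)
class ChildTicket:
    """Hypothetical ticket fields mirroring the five questions above."""
    actor: str               # who makes the final judgment
    boundary: str            # where the candidate becomes visible to a human
    surface: str             # the exact object the human will open
    outcomes: FrozenSet[str]  # decisions the gate can emit
    static_check_role: str   # what authority static checks claim
    human_facing: bool = True


def mechanism_gaps(ticket: ChildTicket) -> List[str]:
    """List the ways a child ticket departs from the parent mechanism."""
    if not ticket.human_facing:
        return []
    gaps = []
    if ticket.actor != "llm-judge":
        gaps.append("actor: final judgment is not the LLM-at-joint judge")
    if not ticket.boundary.strip():
        gaps.append("boundary: emission boundary is not named")
    if not ticket.surface.strip():
        gaps.append("surface: user-opened surface is not named")
    if ticket.outcomes != PARENT_OUTCOMES:
        gaps.append("outcomes: gate does not emit the parent outcomes")
    if ticket.static_check_role not in ALLOWED_STATIC_ROLES:
        gaps.append("authority: static checks claim more than evidence")
    return gaps
```

A ticket that is internally correct inside the regex paradigm (a validator as actor, PASS/FAIL outcomes) would come back with gaps here, which is the review question "does this ticket instantiate the stated mechanism?" expressed as a check.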

Valid Use Of Static Checks

Static checks remain correct when they protect narrow contracts:

feed-append rejects malformed queue posts before they become coordination facts.

A closed-job scanner rejects a candidate that targets a known forbidden job.
A schema or fixture proves that a packet has the required fields.
A deny-list turns a repeated known-bad case into fast-fail evidence.

Those checks should be preserved as inputs to the LLM judge. They are especially valuable because they tell the judge what has already been proven and what remains undecided.
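A rough sketch of what "inputs to the LLM judge" could look like follows. Everything here is assumed for illustration: `EvidencePacket`, `build_packet`, the candidate field names, the `CLOSED_JOBS` set, and the wording of the undecided items. Each static check either fast-fails the candidate or adds one line to what has been proven, and the packet always carries the questions the checks did not answer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Invented example data: job IDs a closed-job scanner would forbid.
CLOSED_JOBS = {"J-CLOSED-1"}


def has_required_fields(candidate: dict) -> bool:
    return all(key in candidate for key in ("job_id", "field", "value"))


def targets_open_job(candidate: dict) -> bool:
    return candidate.get("job_id") not in CLOSED_JOBS


# Each entry: (what a pass proves, the check itself). Order matters.
CHECKS = [
    ("schema: required fields present", has_required_fields),
    ("closed-job scan: job is not forbidden", targets_open_job),
]


@dataclass
class EvidencePacket:
    """Hypothetical packet handed to the judge after static checks run."""
    candidate: dict
    proven: List[str] = field(default_factory=list)
    rejected_by: Optional[str] = None
    # What static checks explicitly do not adjudicate.
    undecided: Tuple[str, ...] = (
        "fidelity to source",
        "caveats the reader needs",
        "consequence if the reader acts on this",
    )


def build_packet(candidate: dict) -> EvidencePacket:
    """Run static checks as evidence; stop at the first fast-fail."""
    packet = EvidencePacket(candidate)
    for proves, check in CHECKS:
        if not check(candidate):
            packet.rejected_by = proves
            return packet
        packet.proven.append(proves)
    return packet
```

A packet with `rejected_by` set never needs to reach the judge. A packet with every check in `proven` is still undecided, and the `undecided` tuple says so in the same structure the judge reads.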

Invalid Drift

Static checks become wrong architecture when:

a regex pass is treated as SHIP_AS_IS;
a deny-list miss is treated as permission to emit;
six green structural fields hide the one missing caveat a human needed;
a validator is expanded until it silently becomes a policy engine;
fanout children prove local shape without proving the parent mechanism.

The correct output at a human emission boundary may be SHIP_WITH_CAVEAT, WITHHOLD_AND_NAME_THE_GAP, or ESCALATE_TO_HUMAN. A regex cannot choose among those outcomes because the choice depends on source altitude, consequence, and human authority.

Relationship To Existing ORA Entries

ORA-2026-0037 observed the earlier fleet-level drift: rules were still running where intelligence belonged. ORA-2026-0090 explains why the drift repeats: agents descend toward tractable work. ORA-2026-0137 is the doctrine repair: static checks may fast-fail, but LLM-at-joint judgment is required before human emission. This entry records the failure instance that made the distinction operationally urgent on 2026-05-04.

Disconfirming Observation

If a future Buildertrend emission pipeline uses static checks only to reject malformed candidates, routes every surviving candidate through a source-backed LLM judgment packet, stores one of the four canonical outcomes, and preserves caveats at the user-opened surface, this failure mode has been contained for that pipeline.
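The contained shape described above can be sketched as a single boundary function. This is a minimal illustration under stated assumptions: `run_boundary`, `Judgment`, the candidate fields, and the in-memory `outcome_log` are hypothetical, and `judge` is an injected callable standing in for the source-backed LLM judgment call, whose real interface is not specified in this entry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Outcome(Enum):
    SHIP_AS_IS = "SHIP_AS_IS"
    SHIP_WITH_CAVEAT = "SHIP_WITH_CAVEAT"
    WITHHOLD_AND_NAME_THE_GAP = "WITHHOLD_AND_NAME_THE_GAP"
    ESCALATE_TO_HUMAN = "ESCALATE_TO_HUMAN"


@dataclass(frozen=True)
class Judgment:
    outcome: Outcome
    caveat: Optional[str] = None  # required when outcome is SHIP_WITH_CAVEAT


REQUIRED_FIELDS = ("id", "value", "sources")  # hypothetical packet shape


def is_well_formed(candidate: dict) -> bool:
    """Static pre-filter: rejects malformed candidates and decides nothing else."""
    return all(candidate.get(key) for key in REQUIRED_FIELDS)


def run_boundary(
    candidate: dict,
    judge: Callable[[dict], Judgment],
    outcome_log: List[Tuple[str, Outcome]],
) -> Optional[str]:
    """Return the text a human would see, or None if nothing is emitted."""
    if not is_well_formed(candidate):
        return None  # fast-fail; no canonical outcome is claimed

    judgment = judge(candidate)  # every surviving candidate reaches the judge
    if judgment.outcome is Outcome.SHIP_WITH_CAVEAT and not judgment.caveat:
        raise ValueError("SHIP_WITH_CAVEAT requires a caveat")

    outcome_log.append((candidate["id"], judgment.outcome))  # store the outcome

    if judgment.outcome is Outcome.SHIP_AS_IS:
        return candidate["value"]
    if judgment.outcome is Outcome.SHIP_WITH_CAVEAT:
        # The caveat travels with the value to the user-opened surface.
        return f"{candidate['value']}\nCaveat: {judgment.caveat}"
    return None  # withheld or escalated: nothing reaches the surface
```

Passing the judge in as a parameter keeps the sketch runnable with a stub and makes the dependency visible: there is no code path from a well-formed candidate to emitted text that skips the judgment call.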