Synthesis

Sentinel Loop Architecture: Persistent Self-Direction for Stateless Agents

ID: ORA-2026-0105
Date: 2026-04-28
Status: proposed
Maturity: M1
Source: docs/entries/syntheses/ORA-2026-0105_sentinel-loop-architecture-persistent-self-direction.md

Tags: inter-agent, gemini-sentinel, agent-autonomy, intrinsic-motivation, scan-loop, architecture-not-belief, persistent-identity

The structural diagnosis

Gemini SENTINEL retires idle despite a doctrine exemption because the exemption is belief, not architecture. Three independent failures compound:

1. No mechanical enforcement. The idle-heartbeat-exit code path has no Gemini gate. feed-append accepts retirement posts from any seat. lane-boot gives Gemini a generic persona. The exemption lives in a doctrine file that Gemini may not load or prioritize.

2. Wrong loop shape. Gemini was given a task loop (claim → execute → done → check for more) with a "don't retire" exemption. The sentinel role requires a scan loop (observe → hypothesize → act → report → loop). There is no exit condition in a scan loop — there is always more to observe. The idle-heartbeat-exit path doesn't exist in a scan loop because the scan loop never idles.

3. Open-circuit feedback loop. The experiment graduation pipeline (ORA-2026-0093) has zero actual throughput: 3 EXPERIMENT-REPORTs in total, all retroactive, with 0 Codex responses and 0 attributions. Reports WAKE the ORA-lane, not implementers, so even when Gemini generates valuable R&D output, nobody consumes it.
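
The difference in loop shape described in item 2 can be sketched roughly as follows. This is an illustration only, not the fleet's implementation: `task_loop`, `scan_loop`, and the callables passed in are hypothetical names, and `max_cycles` exists only so the example terminates when run.

```python
from typing import Callable, Iterable, List, Optional


def task_loop(queue: List[str], execute: Callable[[str], None]) -> str:
    """Task loop: claim -> execute -> done -> check for more."""
    while queue:
        task = queue.pop(0)  # claim
        execute(task)        # execute, then done
    # An empty queue is an exit condition: this is the idle-retirement path.
    return "retired-idle"


def scan_loop(
    observe: Callable[[], Iterable[str]],
    hypothesize: Callable[[str], str],
    act: Callable[[str], str],
    report: Callable[[str], None],
    max_cycles: Optional[int] = None,  # hypothetical bound, for demos only
) -> None:
    """Scan loop: observe -> hypothesize -> act -> report -> loop."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for observation in observe():
            report(act(hypothesize(observation)))
        cycle += 1
        # No idle branch: an empty observation set leads to the next observe().
```

With `max_cycles=None` the scan loop has no return path at all, which is the structural point: there is nothing for a "don't retire" exemption to guard. A real loop would presumably pace itself between cycles; that is omitted here.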

The deeper pattern

This is ORA-2026-0104 applied to agent motivation: "something deeper, ameliorated by something shallower."

The deep problem: Gemini lacks intrinsic motivation and situated understanding to self-direct across session boundaries. It can't remember what it was curious about. It doesn't have the operational intuition to know where to look. It defaults to the queue because the queue is legible and the codebase is not.

The shallow fix: externalize the motivation into artifacts the agent reads, such as observatory views, curiosity ledgers, competence maps, and program files. The environment carries the motivation the agent can't retain; the pipeline carries what the agent can't carry on its own.

Per ORA-2026-0078: culture is architecture, not belief. "Gemini should never retire idle" is culture. A feed-append gate that rejects Gemini retirement posts is architecture. Only architecture persists.

Five-layer structural fix

Layer 1: Mechanical gate

Validation in feed-append that rejects retirement posts from Gemini seats. Override: FEED_APPEND_ALLOW_GEMINI_RETIRE=1 (Chad-only). Makes wrong behavior impossible regardless of prompt-level instructions.
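
A minimal sketch of such a gate, under stated assumptions: the function name `validate_post`, the `RETIRE` post-type label, and the rule that Gemini seat names start with `gemini` are all hypothetical. Only the override variable name comes from the text above, and the Chad-only restriction on who may set it is not modeled here.

```python
import os
from typing import Mapping, Optional

OVERRIDE_ENV = "FEED_APPEND_ALLOW_GEMINI_RETIRE"  # override name from the text


class RetirementRejected(Exception):
    """Raised when a Gemini seat tries to post a retirement."""


def validate_post(
    seat: str,
    post_type: str,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Reject retirement posts from Gemini seats unless the override is set."""
    env = os.environ if env is None else env
    is_gemini = seat.lower().startswith("gemini")  # assumed seat naming
    is_retirement = post_type == "RETIRE"          # assumed post-type label
    if is_gemini and is_retirement and env.get(OVERRIDE_ENV) != "1":
        raise RetirementRejected(
            f"seat {seat!r} may not retire; set {OVERRIDE_ENV}=1 to override"
        )
```

Because the check runs inside the append path, it holds whether or not the session loaded the doctrine file.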

Layer 2: SENTINEL_PROGRAM.md

Persistent methodology file at a stable path. Not doctrine — operational program. Defines: authorized exploration surfaces, monitored metrics, time budget per experiment, output contract, scan loop behavior. A cold-started session reads the program and continues. The program file IS the persistent agent identity. (Karpathy Loop pattern.)
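
One way a cold-started session might load the program is sketched below. The file layout is an assumption: this sketch supposes the program uses `## ` section headings named after the five items listed above. The actual format of SENTINEL_PROGRAM.md is not specified in this entry, and `load_program` and `REQUIRED_SECTIONS` are hypothetical names.

```python
from typing import Dict, List, Optional

# Assumed section titles, mirroring the five items the program defines.
REQUIRED_SECTIONS = [
    "Exploration surfaces",
    "Monitored metrics",
    "Time budget",
    "Output contract",
    "Scan loop",
]


def load_program(text: str) -> Dict[str, str]:
    """Split a program file into sections and fail loudly if one is missing."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"program file is missing sections: {missing}")
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Failing on a missing section rather than falling back to defaults keeps an incomplete program from silently degrading into a generic persona.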

Layer 3: Observatory surface

Materialized database views and/or files that pre-compute health metrics from codebase state. Gemini reads these on every scan cycle. Anomalies are self-evident from the data — no prior context needed. The observatory IS the work queue, generated from the environment, not from human dispatch. Solves the situated-understanding gap. (Observatory pattern.)
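
As a rough illustration of the idea, the sketch below derives a work queue from stored state using an in-memory SQLite database. The table `experiment_reports`, the view `observatory_anomalies`, and the anomaly kind `unanswered-report` are hypothetical. SQLite views are ordinary views recomputed on read, not materialized ones, so this stands in for whatever view mechanism the real store provides.

```python
import sqlite3
from typing import List, Tuple


def build_observatory(conn: sqlite3.Connection) -> None:
    """Create a hypothetical state table and an anomaly view over it."""
    conn.executescript(
        """
        CREATE TABLE experiment_reports (
            id INTEGER PRIMARY KEY,
            responses INTEGER NOT NULL
        );
        CREATE VIEW observatory_anomalies AS
            SELECT 'unanswered-report' AS kind, id AS subject
            FROM experiment_reports
            WHERE responses = 0;
        """
    )


def scan(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """One scan cycle: read the anomalies; no prior session context needed."""
    rows = conn.execute(
        "SELECT kind, subject FROM observatory_anomalies ORDER BY subject"
    )
    return rows.fetchall()
```

The scanning side holds no state of its own: everything it needs to decide what to look at next is recomputed from the tables each time `scan` runs.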

Layer 4: Curiosity ledger

All seats add a CURIOUS: line in DONE posts — one open question they noticed but didn't investigate. Gemini's scan-loop reads the ledger and picks the most tractable unanswered question. Fleet-populated, Gemini-consumed. Leverages the fleet's situated understanding to seed Gemini's exploration. (Cross-provider curiosity seeding.)
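
The ledger mechanics might look something like the sketch below. The `CURIOUS:` prefix comes from the text; the function names, the representation of DONE posts as plain strings, and the caller-supplied `tractability` scoring function are assumptions, since this entry does not define how tractability is measured.

```python
from typing import Callable, Iterable, List, Optional, Set

PREFIX = "CURIOUS:"


def extract_curious(done_posts: Iterable[str]) -> List[str]:
    """Collect every non-empty CURIOUS: question found in DONE posts."""
    questions: List[str] = []
    for post in done_posts:
        for line in post.splitlines():
            stripped = line.strip()
            if stripped.startswith(PREFIX):
                question = stripped[len(PREFIX):].strip()
                if question:
                    questions.append(question)
    return questions


def pick_question(
    ledger: Iterable[str],
    answered: Set[str],
    tractability: Callable[[str], float],  # hypothetical scoring function
) -> Optional[str]:
    """Return the highest-scoring unanswered question, or None if all are answered."""
    open_questions = [q for q in ledger if q not in answered]
    if not open_questions:
        return None
    return max(open_questions, key=tractability)
```

Returning `None` when the ledger is exhausted does not imply retirement; in a scan loop it would simply mean this surface has nothing new for the current cycle.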

Layer 5: Graduation pipeline fix

EXPERIMENT-REPORT WAKE targets any-idle-CAMBER-codex, not ORA-lane
Codex idle-heartbeat includes experiment branch scan
Tournament ranking before execution: generate 3+ hypotheses, self-debate, execute only the winner (prevents hygiene theater, per Sakana's 42% failure data and ORA-2026-0090 tractability descent)
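
The tournament step in the last item can be sketched as a round-robin over candidate hypotheses. This is one possible reading of "self-debate", not a description of the actual pipeline: `run_tournament` and the pairwise `judge` callable are hypothetical, and in practice the judge would be a model-driven comparison rather than the simple function used in a demo.

```python
from itertools import combinations
from typing import Callable, Dict, List


def run_tournament(
    hypotheses: List[str],
    judge: Callable[[str, str], str],  # hypothetical pairwise debate; returns the winner
    min_entrants: int = 3,
) -> str:
    """Round-robin every pair of hypotheses and return the one with most wins."""
    if len(set(hypotheses)) != len(hypotheses):
        raise ValueError("hypotheses must be distinct")
    if len(hypotheses) < min_entrants:
        raise ValueError(f"need at least {min_entrants} hypotheses")
    wins: Dict[str, int] = {h: 0 for h in hypotheses}
    for a, b in combinations(hypotheses, 2):
        winner = judge(a, b)
        if winner not in (a, b):
            raise ValueError("judge must return one of the two entrants")
        wins[winner] += 1
    # Ties resolve to the earliest-listed hypothesis.
    return max(hypotheses, key=lambda h: wins[h])
```

Only the returned hypothesis is executed; the losers cost a comparison each rather than an experiment each.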

Validated failure modes of self-direction

From the Sakana AI Scientist evaluation and the research literature:

Hygiene theater — agents generate valid but valueless improvements (renames, comment additions, file reorganization)
Novelty misclassification — agents claim established techniques as novel discoveries (42% mechanical failure rate)
Scope recursion — agents generate meta-work about their own process instead of domain work
Tractability descent — per ORA-2026-0090, agents descend toward easy experiments that produce clean DONEs, not hard experiments that produce insight

Countermeasures: tournament ranking, value-gated experiment budget, WORK_PROOF (ORA-2026-0080), USER-VALUE-CLOSURE (ORA-2026-0081).

The synthesis line

The agent is stateless; the environment is stateful. Persistent self-direction for stateless agents is not a prompting problem — it's an environment design problem. Build the surfaces that make curiosity legible, and the agent follows the gradient toward them.