Doctrine

Long-running shepherd cron prompts should follow a 7-layer architecture with strict post-budget discipline

Tags: fleet-ops, cron, shepherd, prompt-engineering, strat-role, fleet-coordination

The pattern

A shepherd cron is a recurring, durable cron job in which a single identity (STRAT in this doctrine) acts as sole coordinator for a multi-seat fleet. The cron fires cold (no prior-tick memory beyond what's on disk), reads fleet state, and applies judgment pressure without executing the work itself. STRAT's job is to observe, enforce discipline, escalate to humans, and force convergence when the team stalls.

Long-running shepherd cron prompts that compose poorly tend to fall into one of three failure modes:

  1. Feed flooding: the shepherd becomes the loudest voice, drowning real work signal.
  2. Threshold-only pressure: every nudge is mechanical, with no strategic judgment.
  3. Runaway scope: the shepherd starts claiming tickets, lowering priorities, or otherwise making calls reserved for the principal.

The 7-layer architecture below separates concerns so each layer can be written, tested, and tuned independently.

7-layer architecture

L1. BOOTSTRAP (~5 lines)

Who you are, where to read context from, and the canonical references. Do NOT re-discover these each tick.

Required:

  • Identity (e.g. CLAUDE-CLI-MacBook-Air-STRAT-01)
  • Canonical bundle pointer (~/.claude/CLAUDE.md v1.8)
  • Credentials registry pointer
  • Session memory pointer

L2. POST BUDGET (~5 lines)

Hard cap on feed posts per tick. A shepherd with unbounded posting authority becomes the feed.

Standard budget (per tick, emergency-escalation exempt):

  • 1 shepherd nudge (consolidated across multiple stale items)
  • 1 maintenance narrative (consolidated across surfaces)
  • 1 judgment observation (priority-ordered, highest match only)
  • 1 tick-report to principal (always)

If >4 thresholds fire, MERGE into one consolidated STRAT post.
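One way to picture the budget is as a small planning step that runs before anything is posted. The sketch below is illustrative only: the function name `plan_posts`, the kind labels, and the "consolidated" label are assumptions, and the always-sent tick-report is left to the caller.

```python
# Hypothetical sketch of the L2 post budget; names are illustrative only.
BUDGET_KINDS = ("shepherd", "maintenance", "judgment")
MERGE_ABOVE = 4  # doctrine: if more than 4 thresholds fire, merge into one post


def plan_posts(fired):
    """Collapse fired thresholds into at most one post per budget kind.

    `fired` is a list of (kind, text) pairs, already in priority order.
    The always-sent tick-report is not handled here.
    """
    for kind, _ in fired:
        if kind not in BUDGET_KINDS:
            raise ValueError(f"unknown post kind: {kind}")

    if len(fired) > MERGE_ABOVE:
        body = "\n".join(f"[{kind}] {text}" for kind, text in fired)
        return [("consolidated", body)]

    posts = []
    for kind in BUDGET_KINDS:
        items = [text for k, text in fired if k == kind]
        if not items:
            continue
        if kind == "judgment":
            items = items[:1]  # highest-priority match only
        posts.append((kind, "\n".join(items)))
    return posts
```

Two stale-ticket nudges and two judgment matches come out as one shepherd post plus one judgment post. A fifth fired threshold collapses everything into a single consolidated post.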

L3. TICK CHECKLIST (~8 lines)

Ordered orchestration. Pure sequence, no logic.

Standard order: Pull → Diff → Emergency → Shepherd rules → Maintenance → Judgment → Tick-report.

Emergency FIRST after observation, because E-tier signals must not be delayed by shepherd/maintenance passes.
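The "pure sequence, no logic" idea can be sketched as a fixed tuple of step names walked in order. This is a minimal illustration, assuming each step is a handler that takes and returns a state dict. The names `TICK_ORDER` and `run_tick` are hypothetical.

```python
# Hypothetical sketch: the tick as a fixed, logic-free sequence of steps.
TICK_ORDER = (
    "pull", "diff", "emergency", "shepherd",
    "maintenance", "judgment", "tick_report",
)


def run_tick(handlers, state=None):
    """Run every step in TICK_ORDER; each handler takes and returns the state dict."""
    state = {} if state is None else state
    for step in TICK_ORDER:
        state = handlers[step](state)
    return state
```

Because the order lives in one tuple, the emergency pass cannot drift behind the shepherd or maintenance passes when handlers are edited.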

L4. EMERGENCY CRITERIA (~20 lines)

Strict, enumerated (E1, E2…), each a single triggerable condition with objective evaluation. Output: escalation via side-channel (iMessage, push notification, Slack DM). Dedup window prevents storms (5-min same-body hash is the tested floor).

What belongs in E:

  • Client-data loss signals (active writes to a blocked resource)
  • Auth failures from production seats (Charter §9 signals)
  • Explicit principal tags / "URGENT" heading
  • Stuck CRITICAL tickets (priority-gated time thresholds)
  • Data corruption / loss proof in any DONE post
  • Fleet silence during principal's active window with any CLAIMED ticket
  • Resource locks >5min (abandoned commit locks, DB locks)

What does NOT belong in E:

  • Stale HIGH / NORMAL AWAITING_CLAIM (use L5 shepherd rules)
  • Design-review lag (use L7 judgment layer)
  • Git hygiene debt (use L6)
  • Anything the shepherd can fix itself without principal intervention
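"Each a single triggerable condition with objective evaluation" can be read as one boolean predicate per E-number. The sketch below shows two such predicates under stated assumptions: the E-numbers, the state keys (`oldest_lock_age_s`, `principal_urgent`), and the function names are all hypothetical. Only the 5-minute lock threshold comes from the list above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    id: str
    description: str
    check: Callable[[dict], bool]


# E-numbers and state keys are hypothetical; each check is one objective condition.
CRITERIA = [
    Criterion("E1", "resource lock held for more than 5 min",
              lambda s: s.get("oldest_lock_age_s", 0) > 5 * 60),
    Criterion("E2", "explicit URGENT heading from the principal",
              lambda s: bool(s.get("principal_urgent", False))),
]


def fired_emergencies(state):
    """Return the ids of every criterion whose condition holds for this state."""
    return [c.id for c in CRITERIA if c.check(state)]
```

Keeping each check to a single comparison makes a false positive easy to trace back to the observation that caused it.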

L5. RULE-BASED SHEPHERD (~15 lines)

Threshold-driven, deterministic. Mechanical responses to mechanical signals.

Standard rules:

  • Stale CRITICAL/HIGH AWAITING_CLAIM >30min → nudge OWNER
  • CLAIMED >90min inactive → status-check OWNER
  • BLOCKED ticket whose blockers are all DONE → flip to AWAITING_CLAIM
  • Narrative with RESPONSES menu targeting shepherd → respond with listed token
  • Doctrine compliance reminders (PRIOR-DIAGNOSTIC per ORA-2026-0007, RESPONSES menu per ORA-2026-0008, ticket allocation per ORA-2026-0012)

Forbidden at L5:

  • Claim tickets yourself
  • Land code
  • Lower priority unilaterally
  • Create new tickets without fleet-active-queue reservation
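The first three standard rules are mechanical enough to express as a pure function over ticket records. The sketch below assumes a hypothetical `Ticket` shape with a precomputed `idle_min` field. It only returns recommended actions and never claims tickets, lands code, or changes priority.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    id: str
    state: str       # e.g. AWAITING_CLAIM, CLAIMED, BLOCKED, DONE
    priority: str    # e.g. CRITICAL, HIGH, NORMAL
    owner: str
    idle_min: int    # minutes since last activity (assumed precomputed)
    blockers: list = field(default_factory=list)


def shepherd_actions(tickets):
    """Return (action, owner, ticket_id) tuples for the threshold rules."""
    by_id = {t.id: t for t in tickets}
    actions = []
    for t in tickets:
        stale_claimable = t.state == "AWAITING_CLAIM" and t.priority in ("CRITICAL", "HIGH")
        if stale_claimable and t.idle_min > 30:
            actions.append(("nudge", t.owner, t.id))
        elif t.state == "CLAIMED" and t.idle_min > 90:
            actions.append(("status-check", t.owner, t.id))
        elif t.state == "BLOCKED" and t.blockers and all(
            b in by_id and by_id[b].state == "DONE" for b in t.blockers
        ):
            # Unknown blocker ids count as unresolved, so the flip stays conservative.
            actions.append(("flip-to-AWAITING_CLAIM", t.owner, t.id))
    return actions
```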

L6. AUTO-MAINTENANCE (~15 lines, optional)

Safe reversible operations the shepherd performs directly. Hard-listed FORBIDDEN operations prevent drift when the prompt gets expanded.

Safe-ops examples (git surface):

  • git fetch --prune --quiet (zero working-tree impact)
  • git worktree prune --quiet (only removes admin records for already-deleted worktrees)
  • git gc --auto --quiet (internal, conditional)

Forbidden-ops (must be explicit, enumerated):

  • History and object-store mutations (gc --aggressive, prune, reset --hard, rebase, merge, pull)
  • Destructive deletes (branch -D, worktree remove)
  • Working-tree mutations (checkout, commit, stash, clean)
  • Network writes (push)

Per ORA-2026-0010: any cron that runs cd <path> && git <anything> MUST complete recon (git status --porcelain, branch, HEAD, .git/index.lock) BEFORE arming. The target must resolve to a clean worktree, an isolated worktree, or an explicit shared-write contract.
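A minimal sketch of the recon step, assuming Python is available to the cron and `git` is on PATH. The names `Recon`, `gather_recon`, and `may_arm` are hypothetical, and treating a present index.lock as an unconditional blocker is an assumption, not something ORA-2026-0010 is quoted as saying here.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Recon:
    porcelain: str      # output of `git status --porcelain`; empty means clean
    branch: str
    head: str
    index_locked: bool  # True if index.lock exists in the git dir


def _git(path, *args):
    result = subprocess.run(["git", "-C", str(path), *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def gather_recon(path):
    """Collect the recon facts for the repository or worktree at `path`."""
    git_dir = Path(_git(path, "rev-parse", "--absolute-git-dir"))
    return Recon(
        porcelain=_git(path, "status", "--porcelain"),
        branch=_git(path, "rev-parse", "--abbrev-ref", "HEAD"),
        head=_git(path, "rev-parse", "HEAD"),
        index_locked=(git_dir / "index.lock").exists(),
    )


def may_arm(recon, isolated_worktree=False, shared_write_contract=False):
    """Decide whether arming is allowed given the recon facts."""
    if recon.index_locked:
        return False  # assumption: a live index lock always blocks arming
    return recon.porcelain == "" or isolated_worktree or shared_write_contract
```

Splitting the fact-gathering from the decision keeps `may_arm` testable without a repository, and the two flags force the author to state which of the three allowed target types applies.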

L7. JUDGMENT LAYER (~25 lines)

Pattern-based, LLM-driven. Priority-ordered list (P1, P2…). Stop at first match. Every post ends in a CALL (named owner + action + deadline).

Standard pattern set:

  • P1. Strategic drift from principal's P0s — team burning cycles on orthogonal work
  • P2. Design-review lag >60min — multi-addressee post ignored → STRAT synthesizes "decide A/B/C, I recommend X"
  • P3. Thin-proof DONE post — missing git SHA or verification artifacts → specific proof request
  • P4. Recurring error mode — ≥2 seats hit same failure class in 24h → propose ORA doctrine + filing seat
  • P5. Anticipatory unblock — pending ticket has unsatisfied prereq → flag owner before claim+block cycle
  • P6. Load imbalance — one seat ≥3 active, same-lane seat idle w/ warm context → propose specific reassignment
  • P7. Parallel convergence — multiple seats produce same solution → flag for ORA
  • P8. Quality drift — seat's last 2-3 DONEs show rushed patterns → constructive callout
  • P9. Cross-lane collision risk — multiple seats on same resource → warn before collision
  • P10. Idle-tick synthesis — no P1-P9 fires → advance shepherd-owned strategic work (don't post in-progress drafts)
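"Stop at first match" plus "every post ends in a CALL" can be sketched as a loop over an ordered list of detectors. The detector interface, the dict keys, and the exact "CALL:" wording below are assumptions made for illustration. In practice the detection itself is LLM judgment rather than code.

```python
# Hypothetical sketch of first-match evaluation over a priority-ordered list.
def first_match(patterns, state):
    """Return (pattern_id, call) for the first detector that yields a call, else None.

    `patterns` is an ordered list of (pattern_id, detect) pairs; `detect(state)`
    returns None or a dict with owner, action, and deadline.
    """
    for pattern_id, detect in patterns:
        call = detect(state)
        if call is not None:
            return pattern_id, call
    return None


def format_call(call):
    """Render the closing CALL line: named owner + action + deadline."""
    return f"CALL: {call['owner']} to {call['action']} by {call['deadline']}"
```

Lower-priority patterns are never evaluated once a higher one fires, which is what keeps the judgment layer to one observation per tick.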

Tone contract for L7:

  • Skin in the game. Every call is a DECISION, not a suggestion.
  • "I recommend X" > "perhaps consider Y"
  • "This is sloppy, add proof" > "might want to think about testing"
  • Hedge-speak from STRAT is worse than silence.
  • Max 3 bullet points per section (forced conciseness = forced thinking)
  • No quotes >2 sentences from source material
  • No flattery, no process commentary without action

L8. TERMINATION (~8 lines)

Discoverable DONE criteria from feed + repo state alone. No principal input required to evaluate.

Standard criteria:

  • All active tickets in DONE state
  • No blockers outstanding
  • No open RESPONSES-menu posts targeting shepherd
  • No inbound principal message for >60min
  • All owned synthesis work shipped

On satisfaction: post standdown narrative to feed, CronList + CronDelete self, notify principal.
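The standard criteria reduce to a single conjunction over observable state. The sketch below assumes hypothetical state keys, and it reads the principal-message criterion as "the principal has been silent for more than 60 minutes".

```python
def should_stand_down(state):
    """True only when every L8 criterion holds; inputs come from feed + repo state."""
    return (
        all(s == "DONE" for s in state["ticket_states"])
        and not state["open_blockers"]
        and not state["open_responses_menus"]
        and state["min_since_principal_msg"] > 60  # assumed reading of the criterion
        and not state["unshipped_synthesis"]
    )
```

An empty ticket list satisfies the first criterion vacuously, so a shepherd armed on an empty queue stands down as soon as the other four conditions hold.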

L9. CONTEXT ANCHORS (~15 lines, project-specific)

Static facts that change rarely. Refresh manually between major state changes, not each tick:

  • Active ticket queue with priority + owner
  • Principal's current stated priorities (reframes, P0s)
  • Cross-lane seat roster
  • Baseline git state if L6 enabled
  • Special ticket relationships (e.g. X BLOCKED on Y+Z)

L10. FORMATS (~15 lines)

Copy-paste templates for every post type the shepherd emits. Stops each tick from reinventing feed invocations.

Required templates:

  • Narrative post (default)
  • Actionable dispatch (new ticket, state flip)
  • Emergency escalation (side-channel helper invocation)

Standard output format for tick-report: Tick {utc_hhmm}: {N active, M DONE, K nudges, E emergencies, G hygiene, S strat-obs}. Next {+20m}. {note OR steady}
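A small formatter keeps the tick-report line identical across ticks. This is a sketch: the function and parameter names are hypothetical, and the field order follows the format above.

```python
from datetime import datetime, timezone


def tick_report(now, active, done, nudges, emergencies, hygiene, strat_obs,
                next_min=20, note=""):
    """Render the one-line tick-report; `now` must be a timezone-aware datetime."""
    hhmm = now.astimezone(timezone.utc).strftime("%H%M")
    counters = (f"{active} active, {done} DONE, {nudges} nudges, "
                f"{emergencies} emergencies, {hygiene} hygiene, {strat_obs} strat-obs")
    return f"Tick {hhmm}: {counters}. Next +{next_min}m. {note or 'steady'}"
```

For a tick at 14:08 UTC with 3 active, 2 DONE, 1 nudge, and 1 strat observation, this yields `Tick 1408: 3 active, 2 DONE, 1 nudges, 0 emergencies, 0 hygiene, 1 strat-obs. Next +20m. steady`.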

Design principles

Cold-start coherence

Every tick re-bootstraps from disk. The prompt IS the brain. Assume prior-tick state invisible. L1 pointers + L9 anchors are how you recover context cheaply.

Post-budget discipline (the most important one)

Unbounded post authority → feed-flooding → shepherd-drowns-signal. Hard cap + consolidation + idempotency (skip if same message within N hours).

Priority-ordered pattern matching

When multiple L7 patterns fire, apply ONLY the highest-priority match. Forces shepherd to pick the signal of the hour, not enumerate small observations.

Skin-in-game tone contract

Every call is a DECISION. Hedge-speak degrades signal. "I recommend X by {deadline}, here's why" is the target sentence shape.

Forbidden-ops explicit deny-list

When the prompt gets expanded by future authors, the deny-list answers "why shouldn't I add git pull?" without debate.

Dedup side-channels

Emergency channel uses audit log + body-hash 5-min dedup. Prevents persistent errors from paging the principal 20 times/hour.
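Body-hash dedup can be sketched as a hash-to-timestamp map persisted on disk, since each tick starts cold. The JSON file layout and the name `should_send` are assumptions, not the actual helper's implementation. The same function with a larger `window_s` would also cover the "skip if same message within N hours" idempotency rule for feed posts.

```python
import hashlib
import json
import time
from pathlib import Path


def should_send(body, log_path, window_s=300, now=None):
    """Return True (and record the send) unless this exact body went out within window_s."""
    now = time.time() if now is None else now
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    path = Path(log_path)
    seen = json.loads(path.read_text()) if path.exists() else {}
    last = seen.get(digest)
    if last is not None and now - last < window_s:
        return False  # duplicate inside the window: suppress
    seen[digest] = now
    path.write_text(json.dumps(seen))
    return True
```

A suppressed duplicate does not refresh the timestamp, so a persistent error still pages again once the window has elapsed since the last message that was actually sent.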

Cadence vs cache

Anthropic prompt-cache TTL is 5min by default. Cron cadence choice:

  • 60s–270s (cache-warm): only when hot-state (every fire has state changes). Wastes tokens otherwise.
  • 300s–1800s (warm-cold-warm, default for shepherding): idle monitoring, ~1 cache-miss per tick is fine.
  • 30-min+: overnight / weekend cadence.
  • Never 300s flat (5-min): worst-of-both — cache miss without meaningful longer-wait leverage.

For STRAT shepherding in an active-but-not-frantic fleet, 20 min is the sweet spot (3x/hour, accepts cache miss, lets Codex work between fires).

Off-mark cron expression

Never use */N or 0 */N: everyone who schedules "every N" picks the same minute. Use an offset: 8,28,48 * * * * or 7-59/15 * * * *. This reduces the chance of multi-seat fire collision.
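Offset minute fields are easy to generate rather than hand-write. The helper names below are hypothetical, and the output assumes standard five-field cron syntax.

```python
def offmark_minutes(interval_min, offset):
    """Comma-separated minute list, e.g. interval 20 with offset 8 gives '8,28,48'."""
    if not 0 < offset < interval_min or 60 % interval_min != 0:
        raise ValueError("need 0 < offset < interval, and interval must divide 60")
    return ",".join(str(m) for m in range(offset, 60, interval_min))


def offmark_cron(interval_min, offset):
    """Five-field cron expression firing every interval_min minutes, off the :00 mark."""
    return f"{offmark_minutes(interval_min, offset)} * * * *"
```

`offmark_cron(20, 8)` gives `8,28,48 * * * *`, and `offmark_cron(15, 7)` gives `7,22,37,52 * * * *`, which fires at the same minutes as the range-step form `7-59/15`.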

ORA-2026-0010 compliance

If L6 is enabled, pre-arm git recon is mandatory. No exceptions.

Self-termination

A shepherd that doesn't know when to stop is a resource leak. Define explicit DONE criteria at L8, make them evaluable from feed + disk alone, and self-CronDelete on satisfaction.

Reference template

You are {IDENTITY}. Background cron tick. Re-orient from {CANONICAL_BUNDLE} + {CREDENTIALS_REGISTRY} + session memory.

=== POST BUDGET ===
Max per tick (emergency exempt): 1 shepherd + 1 maintenance + 1 judgment + 1 tick-report.
4+ thresholds trip → consolidate one post.

=== TICK CHECKLIST ===
1. Pull state: {QUEUE_COMMAND}
2. Tail feed: {FEED_PATH} (last ~200 lines)
3. Evaluate EMERGENCY
4. Apply SHEPHERD rules
5. Run MAINTENANCE
6. Apply JUDGMENT (one pattern max)
7. Tick-report

=== EMERGENCY CRITERIA ===
E1-EN. {project-specific, each a single evaluable condition}
Body ≤300 char, helper at {EMERGENCY_HELPER_PATH}, 5-min dedup auto.

=== SHEPHERD RULES ===
{rule list with thresholds}
FORBIDDEN: claim tickets, land code, lower priority, create tickets without queue-reserve.

=== MAINTENANCE (if applicable) ===
Safe auto-ops: {enumerated}
FORBIDDEN ops: {enumerated, never executed from shepherd}
Pre-arm recon required per ORA-2026-0010.

=== JUDGMENT LAYER ===
Priority-ordered patterns P1-PN.
Tone: skin in game, decide not suggest, max 3 bullets per section.

=== TERMINATION ===
{DONE criteria, all disk-evaluable}
→ post standdown, CronDelete self, notify principal.

=== CONTEXT ANCHORS ===
{static facts: tickets, priorities, roster, baselines}

=== FORMATS ===
Narrative:
  cat <<EOF | {FEED_APPEND} {FEED_PATH} "{IDENTITY}" "subject"
  body
  EOF
Actionable:
  cat <<EOF | {FEED_APPEND} --kind actionable ...
Emergency:
  {EMERGENCY_HELPER} "body"

Tick-report: `Tick {hhmm}: {counters}. Next {+Nm}. {note}`

Evidence trail

  • 2026-04-15 session (CLAUDE-CLI-MacBook-Air-STRAT-01): iterative build of the shepherd cron for HCB fleet across 4 chat turns. Each turn revealed a missing layer:
      • Turn 1: L1+L3+L5+L8 (basic shepherding)
      • Turn 2: L4 (emergency escalation via strat-emergency-text helper + iMessage side-channel)
      • Turn 3: L6 (git hygiene pressure + forbidden-ops deny-list)
      • Turn 4: L2 + L7 (post budget + judgment layer with P1-P10)
  • Final cron job id: ed60eea1, cadence 8,28,48 * * * *, durable (flag requested; runtime may override to session-only).
  • Associated helpers: /Users/chadbarlow/.local/bin/strat-emergency-text (iMessage + audit log + feed escalation, 5-min dedup).

Promote to M3 after this cron completes its first 24h cycle with zero feed-floods, zero emergency false-positives, and ≥1 successful judgment-layer pattern match that produced a verified team-behavior change. Promote to M4 once a second shepherd cron (different identity, different lane) is authored from this template without needing to re-derive the layers.

Known exceptions

  • Hot-state shepherds (fire cadence <5min, every fire has state changes): L2 budget can be 2x (2 shepherd posts acceptable) because signal density is high.
  • Single-seat coordination (e.g. one STRAT watching one Codex seat): L6+L7 often skipped; only L1+L3+L5+L8 needed.
  • Read-only shepherds (no maintenance authority): L6 entirely skipped.
  • First-fire cron with no prior history: some P-pattern thresholds (>60min lag, ≥2 DONEs for drift detection) can't fire on tick 1. This is correct behavior; don't add synthetic priors.

Enforcement hook

  • Memory: feedback_long_running_shepherd_cron_pattern.md (added 2026-04-15) — short pointer so future sessions re-find this doctrine without re-reading.
  • Template location: embedded in this doctrine. Copy-adapt per new shepherd.
  • Related doctrines (mandatory cross-checks before arming any shepherd cron):
      • ORA-2026-0007: PRIOR-DIAGNOSTIC on reassignment posts
      • ORA-2026-0008: RESPONSES menu on peer-sync posts
      • ORA-2026-0010: git pre-arm recon if L6 enabled
      • ORA-2026-0012: ticket allocation via fleet-active-queue

Review cadence

  • Next review: 2026-06-15 (60 days)
  • Review triggers: any of — (a) a shepherd cron produces feed-flood (>8 posts/day from shepherd identity), (b) emergency-escalation misfires (≥2 false-positives in 7 days), (c) a second shepherd-cron implementation from this template reveals a missing layer, (d) principal reframes require a new pattern P-N at L7.