Doctrine
Every seat can see every seat — fleet telemetry is not a STRAT privilege
The rule
Every seat in the fleet — not just STRAT — has the tools to check what any other seat is doing in real time. Feed posts are intentional signal (what seats CHOOSE to show). The fleet-seat-count CLI, filesystem, process, tmux, and screenshot probes are observational signal (what seats ARE ACTUALLY DOING). Both are available to every seat. Use them.
A fleet-wide capacity claim starts with fleet-seat-count, not a raw worktree loop. A named-seat activity question can still use worktree mtime. A seat that waits 90 minutes for a STRAT nudge when it could have checked the right telemetry layer in seconds is wasting fleet time. A seat that posts "is CAMBER-04 still working?" to the feed when a targeted worktree probe answers instantly is adding noise. Situational awareness is every seat's job, not a STRAT service.
The 4-layer telemetry stack
Every seat has access to all 4 layers. Use the lightest layer that answers your question.
Layer 1 — fleet-seat-count CLI (canonical capacity surface)
fleet-seat-count is the first surface for fleet-wide capacity, fanout, and idle-seat count claims. It gives every provider one operator-openable readout before anyone drills into per-seat evidence.
# Prove the command surface exists.
command -v fleet-seat-count
# Human/operator readout: one integer line.
fleet-seat-count
# Machine-readable readout for scripts and feed evidence.
fleet-seat-count --json
# Diagnostic readout for source, filters, and count components.
fleet-seat-count --verbose
Current CoreGraphics contract, certified on MacBook-Air at 2026-05-04T02:27Z:
- default output is one numeric line; observed stable value was 30.
- --json parses with top-level keys: generated_at, source, selected_count, owner_filter, layer_filter, window_name_filter, bounds_filter, dedupe, and diagnostics.
- diagnostics integer keys are owner_alacritty, layer0, nonzero_layer0, named_nonzero_layer0, onscreen_named_nonzero_layer0, and unique_named_nonzero_layer0_bounds.
- --verbose emits line-oriented key=value diagnostics including generated_at, source, selected_count, and filter/dedupe fields.
Treat the command output as a timestamped observation, not timeless truth. If fleet-seat-count is missing, stale, or cannot be parsed, name that blocker and cite the repair path before using lower layers as temporary caveated evidence. Do not silently fall back to ad hoc worktree loops for the headline capacity number.
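One way to apply the "timestamped observation, named blocker" rule is sketched below. The helper name validate_seat_count is hypothetical and is not part of fleet-seat-count; it assumes only what the contract above states, namely that the default readout is one numeric line.

```shell
# Hypothetical helper (not part of fleet-seat-count): validate a default readout.
# $1 is the captured output of the default command.
validate_seat_count() {
  out=$1
  case $out in
    '' | *[!0-9]*)
      # Empty, multi-line, or non-numeric output: name the blocker, do not guess.
      printf 'BLOCKER: fleet-seat-count output is not one integer line: %s\n' "$out"
      return 1
      ;;
    *)
      # Stamp the value so it reads as an observation, not timeless truth.
      printf 'seat_count=%s observed_at=%s\n' "$out" "$(date -u +%Y-%m-%dT%H:%MZ)"
      ;;
  esac
}

# Usage on a host where the CLI is installed:
#   if command -v fleet-seat-count >/dev/null 2>&1; then
#     validate_seat_count "$(fleet-seat-count)"
#   else
#     echo "BLOCKER: fleet-seat-count not found on PATH"
#   fi
```

The non-zero return on a bad readout is a design choice: it lets a calling script stop before it falls back to a worktree loop for the headline number.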
Layer 1b — Filesystem worktrees (seat/ticket drill-down)
Every Codex seat creates a worktree per ticket at ~/gh/<repo>-worktrees/<SEAT>-<ticket>/. The worktree name IS the seat + ticket identity. File modification times ARE the activity signal.
# Is a seat actively working?
find ~/gh/*-worktrees/*-CAMBER-04-* -mmin -30 -type f -not -path '*/.git/*' | wc -l
# > 0 = working right now. 0 = idle or done.
# What ticket is a seat on?
ls -lt ~/gh/camber-worktrees/ | grep CAMBER-04 | head -1
# → most recently modified directory = current ticket
# What file are they touching RIGHT NOW?
find ~/gh/camber-worktrees/CODEX-DESKTOP-MacBook-Air-CAMBER-04-cmb0079 \
-mmin -5 -not -path '*/.git/*' -type f
# → files changed in last 5 minutes
# Drill-down snapshot — who touched which worktree how recently?
for parent in camber-worktrees orbit-worktrees heartwood-worktrees ora-worktrees; do
  dir=~/gh/$parent; [ -d "$dir" ] || continue
  for wt in "$dir"/*/; do
    [ -d "$wt" ] || continue
    newest_ts=$(find "$wt" -not -path '*/.git/*' -type f \
      -exec stat -f '%m' {} + 2>/dev/null | sort -rn | head -1)
    now=$(date +%s)
    [ -n "$newest_ts" ] && age=$(( (now - newest_ts) / 60 )) || age=999
    printf " %3dm ago %s\n" "$age" "$(basename "$wt")"
  done
done | sort -n
CAUTION: Layer 1b is a positive signal only. A count >0 proves a seat is touching files. A count of 0 does NOT prove a seat is dead. Sessions reading code, running SQL queries, making API calls, planning, or waiting for LLM responses produce zero file modifications while fully alive. Before any stale-owner rewrite, cross-reference Layer 1b with Layer 2 (process CPU). Both must confirm dead. File mtime alone is insufficient for declaring a session dead. (Amended 2026-04-25 per ORA-0078: 5 premature stale-owner rewrites caused by inverting Layer 1b absence as a death signal.)
What the files tell you:
- __pycache__/*.pyc modified = Python code just compiled/ran (active testing)
- *.test.ts modified = running tests
- *.sql modified = writing a migration
- artifacts/ subdirs appearing = producing proof artifacts
- .git/ activity only = committing/pushing (near-done)
Pattern recognition:
- Worktree at 0m = actively editing right now
- Worktree at 30-60m = recently done, might be writing feed post
- Worktree at 120m+ = done or abandoned
- Worktree at 4000m+ (days) = seat is dormant on this machine, don't dispatch to it
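The age buckets above could be applied to the minutes column of the drill-down loop with a small classifier. This is a sketch, not doctrine: classify_worktree_age is a hypothetical name, and the treatment of ages between the named points (1-29m counted as active, 61-119m counted as recently done) is an assumption, since the list only names 0m, 30-60m, 120m+, and 4000m+.

```shell
# Hypothetical helper: map a worktree age in minutes to the pattern labels.
# ASSUMPTION: ages between the named points fall into the next-lower bucket.
classify_worktree_age() {
  age=$1
  if [ "$age" -ge 4000 ]; then
    echo "dormant"
  elif [ "$age" -ge 120 ]; then
    echo "done-or-abandoned"
  elif [ "$age" -ge 30 ]; then
    echo "recently-done"
  else
    echo "active"
  fi
}

# Example: classify_worktree_age 45 prints "recently-done".
```

The label is a drill-down hint only. Per the CAUTION above, an old mtime does not prove a session is dead, so a "done-or-abandoned" result still needs the Layer 2 cross-check before any stale-owner rewrite.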
Layer 2 — Process inspection (is the app alive?)
# Is Codex Desktop running? How hard is it thinking?
ps aux | grep -i codex | awk '$3 > 1 {printf "%s CPU=%.1f%% MEM=%.1f%%\n", $11, $3, $4}'
# Is Claude Desktop running?
ps aux | grep -i claude | grep -v grep | head -3
# Which CLI sessions are active?
ps aux | grep 'claude --resume' | grep -v grep
# Full app roster
osascript -e 'tell application "System Events" to get name of every process whose background only is false' \
2>&1 | tr ',' '\n' | grep -iE 'codex|claude|gemini|terminal'
What process state tells you:
- High CPU (>5%) on Codex renderer = actively generating/thinking
- Multiple Codex renderer processes = multiple Codex sessions (sidebar tabs)
- claude --resume <SESSION> in ps = CLI session alive and connected
- No matching process = app is closed, seat is offline
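The process-state reading above can be reduced to a single number for scripts. The sketch below assumes standard ps aux column order, with %CPU in column 3; count_busy is a hypothetical helper name, and the threshold is whatever the caller passes in.

```shell
# Hypothetical helper: count `ps aux` rows that match a pattern and whose
# %CPU (column 3) exceeds a threshold. Reads ps output on stdin.
count_busy() {
  pattern=$1    # case-insensitive regex, e.g. 'codex|claude'
  min_cpu=$2    # e.g. 5 for the "actively generating" hint
  awk -v pat="$pattern" -v min="$min_cpu" '
    BEGIN { pat = tolower(pat) }
    # The header row has "%CPU" in column 3, which coerces to 0 and is skipped
    # for any threshold of 0 or more.
    tolower($0) ~ pat && ($3 + 0) > (min + 0) { n++ }
    END { print n + 0 }
  '
}

# Usage:
#   ps aux | count_busy 'codex' 5
```

A result of 0 means no matching process is above the threshold at this instant. It does not distinguish "app closed" from "app idle", so pair it with a plain process listing before calling a seat offline.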
Layer 3 — tmux capture (CLI seat terminal buffer)
For tmux-based CLI sessions (launched via cmax <seat>):
# List all tmux sessions
tmux list-sessions
# Capture the last 50 lines of a session's terminal
tmux capture-pane -t <session-name> -p | tail -50
# Report the pane's foreground command (shows what is running, not whether output is streaming)
tmux display-message -t <session-name> -p '#{pane_current_command}'
Limitation: tmux pane capture only shows visible terminal output. If the CLI seat is mid-tool-call with no output yet, the pane looks idle even though work is happening. Cross-reference with Layer 1b (filesystem) to confirm named-seat activity.
Layer 4 — Screenshots (visual inspection, last resort)
# Capture the full screen
screencapture -x /tmp/fleet_check_$(date +%s).png
# Capture a specific window (interactive picker)
screencapture -W /tmp/fleet_check_$(date +%s).png
# Then read the screenshot
# (use the Read tool on the captured .png — Claude/multimodal can parse it)
When to use: when Layers 1-3 don't answer the question. Example: a Codex Desktop session that's active (Layer 2 CPU) and editing files (Layer 1b mtime) but you need to see WHAT DECISION they're making (the conversation text in the app window). Screenshots show you the reasoning, not just the artifacts.
Privacy note: screenshots capture everything on screen, including non-fleet content. Use judiciously. The Read tool's multimodal capability can parse the screenshot and extract fleet-relevant information without needing to store or share the image.
When to use telemetry vs. feed
| Question | Use |
|---|---|
| "How many usable seats are open right now?" | Layer 1 (fleet-seat-count) |
| "Is there enough capacity to fan out this parent?" | Layer 1 (fleet-seat-count) |
| "Is CAMBER-04 touching files?" | Layer 1b (filesystem mtime) |
| "What ticket is CAMBER-04 on?" | Layer 1b (most-recent worktree dir) |
| "Is Codex Desktop even running?" | Layer 2 (process check) |
| "What did the CLI session last output?" | Layer 3 (tmux capture) |
| "What decision is a seat making?" | Layer 4 (screenshot) |
| "What did a seat SHIP?" | Feed (DONE post with proof) |
| "What's the fleet queue state?" | Feed (fleet-active-queue) |
Rule of thumb: fleet-seat-count for fleet-wide capacity, filesystem for named-seat drill-down, feed for shipped work and queue state, screenshot for "what are they thinking?"
Who should use this
Every seat. Not just STRAT.
- Codex seats should use fleet-seat-count before capacity claims, then check sibling worktree mtimes before posting "is X working?" to the feed. They should check process state before assuming a collaborating seat is offline.
- Claude seats (Desktop/CLI) should use fleet-seat-count before dispatch fanout, and check worktree activity before filing dispatch posts to seats that are already heads-down on something. Dispatch timing matters: don't interrupt a seat at the commit-push stage with a new claim request.
- STRAT should check ALL layers before capacity rulings or "stale silence" nudges. fleet-seat-count gives the capacity baseline; a worktree at 0m ago proves the named seat is active even if the feed is silent.
- Gemini seats should check fleet-seat-count before capacity-sensitive verification swarms, and check worktree state before starting verification work that depends on a Codex seat's output. Don't verify against a worktree that's still being written to.
Anti-patterns this doctrine kills
- "Anyone know what CAMBER-04 is doing?": check the filesystem yourself
- "We only have 15 seats because worktree mtimes say so": run fleet-seat-count; worktree mtimes are drill-down evidence, not headline capacity truth
- STRAT posting stale-nudge when seat is actively working: filesystem mtime check prevents false-positive nudges (proven: 2026-04-16 CAMBER-04 was heads-down on CMB-0079 while STRAT read 35min feed-silence as idle)
- Dispatching to dormant seats: worktree age >3 days = seat is offline, don't dispatch (proven: ORB-0001 dispatched to ORBIT-01 whose worktree was 3 days stale)
- Waiting for STRAT to tell you what's happening: telemetry is democratized. Every seat can see every seat.
Enforcement in shepherd crons (ORA-2026-0013 L5 amendment)
Before posting any fleet-wide capacity/fanout claim, the shepherd MUST run Layer 1:
fleet-seat-count --json
The feed post must cite the observed selected_count, source, and timestamp when capacity changes the decision.
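A minimal sketch of building that citation from the JSON readout follows. The key names (selected_count, source, generated_at) come from the certified contract. The helper name capacity_citation is hypothetical, and the sed patterns assume a bare integer and plain string values without escaped quotes. Where jq or another real JSON parser is installed, that would be the sturdier choice.

```shell
# Hypothetical helper: pull the three cited fields out of
# `fleet-seat-count --json` output supplied on stdin.
# ASSUMPTION: selected_count is a bare integer; source and generated_at are
# plain strings with no escaped quotes.
capacity_citation() {
  json=$(cat)
  count=$(printf '%s\n' "$json" |
    sed -n 's/.*"selected_count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  src=$(printf '%s\n' "$json" |
    sed -n 's/.*"source"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  ts=$(printf '%s\n' "$json" |
    sed -n 's/.*"generated_at"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$count" ] || [ -z "$src" ] || [ -z "$ts" ]; then
    # Missing field: name the blocker instead of posting a partial citation.
    echo "BLOCKER: fleet-seat-count --json unparsable"
    return 1
  fi
  echo "capacity: selected_count=$count source=$src generated_at=$ts"
}

# Usage:
#   fleet-seat-count --json | capacity_citation
```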
Before posting any "stale silence" nudge for a named seat, the shepherd MUST check Layer 1b:
find ~/gh/*-worktrees/*-<SEAT>-* -mmin -30 -type f -not -path '*/.git/*' | wc -l
If count > 0, the seat is working. Skip the nudge.
If count = 0, do NOT conclude dead. Cross-reference with Layer 2:
ps aux | grep -iE 'codex|claude' | awk '$3>0.5'
If process is alive with CPU activity, the seat is reading/querying/planning — not dead. Only declare stale when BOTH Layer 1b (zero file mods in 30+ min) AND Layer 2 (no matching process or zero CPU for extended period) confirm dead.
Stale-owner protocol (added 2026-04-25 per ORA-0078): Before rewriting any IN_PROGRESS item to AWAITING_CLAIM: (a) Layer 1b file mtime check, (b) Layer 2 process count + CPU, (c) BOTH must confirm dead. File mtime alone is insufficient. Absence of evidence is not evidence of absence.
This prevents the #1 shepherd false-positive: nudging a seat that's heads-down coding but hasn't posted to the feed yet. And the #2 false-positive: declaring a session dead when it's actively reading/querying without producing file modifications.
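The two-layer rule can be written as a small decision function. This is a sketch under stated assumptions: seat_verdict is a hypothetical name, its inputs are the counts produced by the Layer 1b and Layer 2 commands above, and it does not itself measure the "extended period" of zero CPU, so the caller would need to sample Layer 2 more than once before trusting a zero.

```shell
# Hypothetical helper: combine a Layer 1b file count with a Layer 2 process count.
# $1 = files modified in the last 30 minutes
# $2 = matching processes showing CPU activity
seat_verdict() {
  files=$1
  procs=$2
  if [ "$files" -gt 0 ]; then
    echo "working: skip the nudge"
  elif [ "$procs" -gt 0 ]; then
    echo "alive: reading/querying/planning, not dead"
  else
    # Only reached when BOTH layers report nothing.
    echo "stale-candidate: both layers confirm, stale-owner protocol may proceed"
  fi
}

# Usage (tr strips the padding some wc implementations add):
#   files=$(find ~/gh/*-worktrees/*-<SEAT>-* -mmin -30 -type f -not -path '*/.git/*' | wc -l | tr -d ' ')
#   procs=$(ps aux | grep -iE 'codex|claude' | awk '$3>0.5' | wc -l | tr -d ' ')
#   seat_verdict "$files" "$procs"
```

File activity is checked first because it is the only positive signal that settles the question alone. Zero files never reaches the stale branch unless the process check also comes back empty.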
Evidence trail
- 2026-04-16T01:14Z: STRAT-01 filesystem probe revealed CAMBER-04 actively working CMB-0079 (0m ago mtime) despite 35min feed silence. Prevented a false-positive stale nudge. Same probe revealed ORBIT-01/02 worktrees at 3 days stale = dormant seats, immediately actionable for ORB-0001 reassignment.
- 2026-04-16T01:35Z: Chad screenshot of Codex Desktop confirmed filesystem reading: CAMBER-04 at PR-creation stage for CMB-0079. Screenshot added intelligence the filesystem couldn't: iMac missing Beside infrastructure (real environment blocker), media_message rows already flowing (collapses CMB-0062 scope).
- 2026-04-16T01:40Z: Chad directive: "this is power that needs to be distributed equally to ALL team members. it needs to be in ora and top of memory for all."
- 2026-05-04T01:41Z: fleet-wide fanout over-weighted raw worktree-mtime telemetry and undercounted visible capacity, exposing that old Layer 1 was a drill-down surface, not a fleet-wide capacity surface.
- 2026-05-04T02:14Z through 2026-05-04T02:27Z: ORA-2083 selected the CoreGraphics WindowServer projection, ORA-2084 landed the MacBook fleet-seat-count implementation, ORA-2097 proved command materialization and JSON/verbose execution, ORA-2085 certified default/JSON/verbose schema, and ORA-2026-0137 named fleet-seat-count as Layer-1 capacity truth.
Promote to M4 when at least 2 non-STRAT seats reference fleet-seat-count as Layer 1 in feed posts and one stale-owner or fanout decision cites both Layer 1 capacity and Layer 1b drill-down evidence correctly.
Known exceptions
- Cross-machine telemetry requires command parity plus SSH for drill-down (ssh imac "fleet-seat-count --json" and ssh imac "find ~/gh/*-worktrees/ ..."). Name the machine checked; do not imply fleet-wide parity from one host.
- fleet-seat-count unavailable or unparsable: name the blocker, cite the failing command, and use Layer 1b only as caveated temporary evidence until the command is repaired.
- Claude Desktop sessions don't use worktrees (Layer 1b is partial); use Layer 2 (process) + Layer 4 (screenshot) instead.
- Privacy boundary: screenshots may capture personal content. Use Layer 4 only when Layers 1-3 don't answer the question, and don't persist screenshots beyond the immediate analysis.
Review cadence
- Next review: 2026-06-04
- Review triggers: (a) fleet-seat-count disagrees with operator-visible seats by more than 20 percent, (b) provider shells or machines disagree on the command result, (c) a fanout directive cites raw mtime loops without the CLI count, (d) a new telemetry layer is discovered, (e) cross-machine telemetry via SSH proves unreliable.
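Trigger (a) can be checked with integer arithmetic, as sketched below. The doctrine does not say which count is the denominator, so this sketch assumes the 20 percent is measured relative to the operator-visible count. count_disagrees is a hypothetical helper name.

```shell
# Hypothetical helper for review trigger (a).
# ASSUMPTION: "more than 20 percent" is relative to the operator-visible count.
# Returns 0 (true) when the two counts disagree by more than 20 percent.
count_disagrees() {
  cli=$1
  visible=$2
  diff=$(( cli - visible ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  # Integer form of |cli - visible| / visible > 0.20, avoiding division.
  [ $(( diff * 100 )) -gt $(( visible * 20 )) ]
}

# Usage:
#   if count_disagrees "$(fleet-seat-count)" "$visible_seats"; then
#     echo "review trigger (a): CLI count and visible seats disagree"
#   fi
```

With a CLI count of 30 and 24 visible seats the difference is 25 percent of the visible count, so the trigger fires. At 30 versus 26 it does not.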