Doctrine

Every seat can see every seat — fleet telemetry is not a STRAT privilege

ID: ORA-2026-0016
Date: 2026-04-16
Status: canonical
Maturity: M3
Source: docs/entries/doctrines/ORA-2026-0016_fleet-telemetry-every-seat-can-see-every-seat.md

fleet-ops, fleet-coordination, telemetry, monitoring, anti-idle, situational-awareness

The rule

Every seat in the fleet — not just STRAT — has the tools to check what any other seat is doing in real time. Feed posts are intentional signal (what seats CHOOSE to show). The fleet-seat-count CLI, filesystem, process, tmux, and screenshot probes are observational signal (what seats ARE ACTUALLY DOING). Both are available to every seat. Use them.

A fleet-wide capacity claim starts with fleet-seat-count, not a raw worktree loop. A named-seat activity question can still use worktree mtime. A seat that waits 90 minutes for a STRAT nudge when it could have checked the right telemetry layer in seconds is wasting fleet time. A seat that posts "is CAMBER-04 still working?" to the feed when a targeted worktree probe answers instantly is adding noise. Situational awareness is every seat's job, not a STRAT service.

The 4-layer telemetry stack

Every seat has access to all 4 layers. Use the lightest layer that answers your question.

Layer 1 — `fleet-seat-count` CLI (canonical capacity surface)

fleet-seat-count is the first surface for fleet-wide capacity, fanout, and idle-seat count claims. It gives every provider one operator-openable readout before anyone drills into per-seat evidence.

# Prove the command surface exists.
command -v fleet-seat-count

# Human/operator readout: one integer line.
fleet-seat-count

# Machine-readable readout for scripts and feed evidence.
fleet-seat-count --json

# Diagnostic readout for source, filters, and count components.
fleet-seat-count --verbose

Current CoreGraphics contract, certified on MacBook-Air at 2026-05-04T02:27Z:

default output is one numeric line; observed stable value was 30.
--json parses with top-level keys: generated_at, source, selected_count, owner_filter, layer_filter, window_name_filter, bounds_filter, dedupe, and diagnostics.
diagnostics integer keys are owner_alacritty, layer0, nonzero_layer0, named_nonzero_layer0, onscreen_named_nonzero_layer0, and unique_named_nonzero_layer0_bounds.
--verbose emits line-oriented key=value diagnostics including generated_at, source, selected_count, and filter/dedupe fields.

Treat the command output as a timestamped observation, not timeless truth. If fleet-seat-count is missing, stale, or cannot be parsed, name that blocker and cite the repair path before using lower layers as temporary caveated evidence. Do not silently fall back to ad hoc worktree loops for the headline capacity number.
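
The "name the blocker" rule above can be sketched in shell. This is a minimal sketch, not the certified implementation: it assumes the --verbose readout contains a line of the exact form selected_count=<integer>, and the function names parse_selected_count and capacity_or_blocker are hypothetical.

```shell
# Read key=value lines on stdin and print the integer selected_count.
# Returns 1 when the key is missing or its value is not an integer.
# ASSUMPTION: the line looks exactly like "selected_count=30".
parse_selected_count() {
  local value
  value=$(sed -n 's/^selected_count=\([0-9][0-9]*\)$/\1/p' | head -n 1)
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}

# Hypothetical wrapper: print the count, or name the blocker on stderr
# instead of silently falling back to a worktree loop.
capacity_or_blocker() {
  if ! command -v fleet-seat-count >/dev/null 2>&1; then
    echo "BLOCKER: fleet-seat-count not found on PATH" >&2
    return 2
  fi
  if ! fleet-seat-count --verbose | parse_selected_count; then
    echo "BLOCKER: no integer selected_count in fleet-seat-count --verbose output" >&2
    return 3
  fi
}
```

A caller would run capacity_or_blocker and, on a non-zero exit, quote the BLOCKER line in the feed post alongside the repair path rather than reporting a number.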

Layer 1b — Filesystem worktrees (seat/ticket drill-down)

Every Codex seat creates a worktree per ticket at ~/gh/<repo>-worktrees/<SEAT>-<ticket>/. The worktree name IS the seat + ticket identity. File modification times ARE the activity signal.

# Is a seat actively working?
find ~/gh/*-worktrees/*-CAMBER-04-* -mmin -30 -type f -not -path '*/.git/*' | wc -l
# > 0 = touching files right now.  0 = no recent writes (not proof of idle; see the CAUTION below).

# What ticket is a seat on?
ls -lt ~/gh/camber-worktrees/ | grep CAMBER-04 | head -1
# → most recently modified directory = current ticket

# What file are they touching RIGHT NOW?
find ~/gh/camber-worktrees/CODEX-DESKTOP-MacBook-Air-CAMBER-04-cmb0079 \
  -mmin -5 -not -path '*/.git/*' -type f
# → files changed in last 5 minutes

# Drill-down snapshot — who touched which worktree how recently?
for parent in camber-worktrees orbit-worktrees heartwood-worktrees ora-worktrees; do
  dir=~/gh/$parent; [ -d "$dir" ] || continue
  for wt in "$dir"/*/; do
    [ -d "$wt" ] || continue
    # Newest file mtime in epoch seconds (stat -f is the BSD/macOS form; GNU stat uses -c '%Y').
    newest_ts=$(find "$wt" -not -path '*/.git/*' -type f \
      -exec stat -f '%m' {} + 2>/dev/null | sort -rn | head -1)
    now=$(date +%s)
    [ -n "$newest_ts" ] && age=$(( (now - newest_ts) / 60 )) || age=999
    printf "  %3dm ago  %s\n" "$age" "$(basename "$wt")"
  done
done | sort -n

CAUTION: Layer 1b is a positive signal only. A count >0 proves a seat is touching files. A count of 0 does NOT prove a seat is dead. Sessions reading code, running SQL queries, making API calls, planning, or waiting for LLM responses produce zero file modifications while fully alive. Before any stale-owner rewrite, cross-reference Layer 1b with Layer 2 (process CPU). Both must confirm dead. File mtime alone is insufficient for declaring a session dead. (Amended 2026-04-25 per ORA-0078: 5 premature stale-owner rewrites caused by inverting Layer 1b absence as a death signal.)

What the files tell you:

__pycache__/*.pyc modified = Python code just compiled/ran (active testing)
*.test.ts modified = running tests
*.sql modified = writing a migration
artifacts/ subdirs appearing = producing proof artifacts
.git/ activity only = committing/pushing (near-done)
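
The file-to-activity mapping above could be expressed as a small lookup. This is an illustrative sketch only: the function name activity_hint and the exact glob patterns are assumptions, and a single path can only hint at activity, not prove it.

```shell
# Map one modified path (relative to the worktree root) to the activity
# hint from the list above. Patterns are simplified assumptions.
activity_hint() {
  case "$1" in
    .git/*|*/.git/*)           echo "git activity (committing/pushing if nothing else changed)" ;;
    *__pycache__/*.pyc)        echo "python code just compiled/ran (active testing)" ;;
    *.test.ts)                 echo "running tests" ;;
    *.sql)                     echo "writing a migration" ;;
    artifacts/*|*/artifacts/*) echo "producing proof artifacts" ;;
    *)                         echo "no hint" ;;
  esac
}
```

For example, piping the output of the "last 5 minutes" find command through a loop that calls activity_hint on each path gives a quick read on what stage a seat is at.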

Pattern recognition:

Worktree at 0m = actively editing right now
Worktree at 30-60m = recently done, might be writing feed post
Worktree at 120m+ = done or abandoned
Worktree at 4000m+ (days) = seat is dormant on this machine, don't dispatch to it
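
The age bands above can be turned into a classifier for the minutes printed by the drill-down snapshot. This is a sketch under stated assumptions: the list leaves gaps (1-29m and 61-119m), so the cutoffs below simply extend each band down to the next threshold, and the name classify_worktree_age and its labels are hypothetical.

```shell
# Classify a worktree by minutes since its newest file modification.
# ASSUMPTION: ages between the documented bands fall into the band below them
# (for example 10m counts as active, 90m as recently done).
classify_worktree_age() {
  local age_min=$1
  if   [ "$age_min" -ge 4000 ]; then echo "dormant"
  elif [ "$age_min" -ge 120 ];  then echo "done-or-abandoned"
  elif [ "$age_min" -ge 30 ];   then echo "recently-done"
  else                               echo "active"
  fi
}
```

The label is a positive-signal reading only. An old mtime still needs the Layer 2 cross-check described in the CAUTION above before anyone treats the seat as dead.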

Layer 2 — Process inspection (is the app alive?)

# Is Codex Desktop running? How hard is it thinking?
ps aux | grep -i codex | awk '$3 > 1 {printf "%s CPU=%.1f%% MEM=%.1f%%\n", $11, $3, $4}'

# Is Claude Desktop running?
ps aux | grep -i claude | grep -v grep | head -3

# Which CLI sessions are active?
ps aux | grep 'claude --resume' | grep -v grep

# Full app roster
osascript -e 'tell application "System Events" to get name of every process whose background only is false' \
  2>&1 | tr ',' '\n' | grep -iE 'codex|claude|gemini|terminal'

What process state tells you:

High CPU (>5%) on Codex renderer = actively generating/thinking
Multiple Codex renderer processes = multiple Codex sessions (sidebar tabs)
claude --resume <SESSION> in ps = CLI session alive and connected
No matching process = app is closed, seat is offline

Layer 3 — tmux capture (CLI seat terminal buffer)

For tmux-based CLI sessions (launched via cmax <seat>):

# List all tmux sessions
tmux list-sessions

# Capture the last 50 lines of a session's terminal
tmux capture-pane -t <session-name> -p | tail -50

# Show the pane's foreground command (a bare shell such as zsh or bash usually means the seat is sitting at a prompt)
tmux display-message -t <session-name> -p '#{pane_current_command}'

Limitation: tmux pane capture only shows visible terminal output. If the CLI seat is mid-tool-call with no output yet, the pane looks idle even though work is happening. Cross-reference with Layer 1b (filesystem) to confirm named-seat activity.

Layer 4 — Screenshots (visual inspection, last resort)

# Capture the full screen
screencapture -x /tmp/fleet_check_$(date +%s).png

# Capture a specific window (interactive picker)
screencapture -W /tmp/fleet_check_$(date +%s).png

# Then read the screenshot
# (use the Read tool on the captured .png — Claude/multimodal can parse it)

When to use: when Layers 1-3 don't answer the question. Example: a Codex Desktop session that's active (Layer 2 CPU) and editing files (Layer 1b mtime) but you need to see WHAT DECISION they're making (the conversation text in the app window). Screenshots show you the reasoning, not just the artifacts.

Privacy note: screenshots capture everything on screen, including non-fleet content. Use judiciously. The Read tool's multimodal capability can parse the screenshot and extract fleet-relevant information without needing to store or share the image.

When to use telemetry vs. feed

Question	Use
"How many usable seats are open right now?"	Layer 1 (`fleet-seat-count`)
"Is there enough capacity to fan out this parent?"	Layer 1 (`fleet-seat-count`)
"Is CAMBER-04 touching files?"	Layer 1b (filesystem mtime)
"What ticket is CAMBER-04 on?"	Layer 1b (most-recent worktree dir)
"Is Codex Desktop even running?"	Layer 2 (process check)
"What did the CLI session last output?"	Layer 3 (tmux capture)
"What decision is a seat making?"	Layer 4 (screenshot)
"What did a seat SHIP?"	Feed (DONE post with proof)
"What's the fleet queue state?"	Feed (`fleet-active-queue`)

Rule of thumb: fleet-seat-count for fleet-wide capacity, filesystem for named-seat drill-down, feed for shipped work and queue state, screenshot for "what are they thinking?"

Who should use this

Every seat. Not just STRAT.

Codex seats should use fleet-seat-count before capacity claims, then check sibling worktree mtimes before posting "is X working?" to the feed. They should check process state before assuming a collaborating seat is offline.
Claude seats (Desktop/CLI) should use fleet-seat-count before dispatch fanout and worktree activity before filing dispatch posts to seats that are already heads-down on something. Dispatch timing matters — don't interrupt a seat at the commit-push stage with a new claim request.
STRAT should check ALL layers before capacity rulings or "stale silence" nudges. fleet-seat-count gives the capacity baseline; a worktree at 0m ago proves the named seat is active even if the feed is silent.
Gemini seats should check fleet-seat-count before capacity-sensitive verification swarms and worktree state before starting verification work that depends on a Codex seat's output — don't verify against a worktree that's still being written to.

Anti-patterns this doctrine kills

"Anyone know what CAMBER-04 is doing?" — check the filesystem yourself
"We only have 15 seats because worktree mtimes say so" — run fleet-seat-count; worktree mtimes are drill-down evidence, not headline capacity truth
STRAT posting stale-nudge when seat is actively working — filesystem mtime check prevents false-positive nudges (proven: 2026-04-16 CAMBER-04 was heads-down on CMB-0079 while STRAT read 35min feed-silence as idle)
Dispatching to dormant seats — worktree age >3 days = seat is offline, don't dispatch (proven: ORB-0001 dispatched to ORBIT-01 whose worktree was 3 days stale)
Waiting for STRAT to tell you what's happening — telemetry is democratized. Every seat can see every seat.

Enforcement in shepherd crons (ORA-2026-0013 L5 amendment)

Before posting any fleet-wide capacity/fanout claim, the shepherd MUST run Layer 1:

fleet-seat-count --json

The feed post must cite the observed selected_count, source, and timestamp when capacity changes the decision.
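
One way to build that citation is to reduce the --verbose readout to a single line. This is a hedged sketch, not the shepherd's actual code: it assumes the key=value lines named in the Layer 1 contract (selected_count, source, generated_at), and the function name format_capacity_citation and the output wording are hypothetical.

```shell
# Read key=value lines on stdin and print a one-line capacity citation.
# Returns 1 if any of the three required keys is missing.
format_capacity_citation() {
  awk '
    {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      if (key == "selected_count") count = val
      else if (key == "source") src = val
      else if (key == "generated_at") ts = val
    }
    END {
      if (count == "" || src == "" || ts == "") exit 1
      printf "fleet-seat-count selected_count=%s source=%s generated_at=%s\n", count, src, ts
    }'
}
```

Usage would be along the lines of fleet-seat-count --verbose | format_capacity_citation, with the resulting line pasted into the feed post. A non-zero exit means the readout was incomplete and should be reported as a blocker.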

Before posting any "stale silence" nudge for a named seat, the shepherd MUST check Layer 1b:

find ~/gh/*-worktrees/*-<SEAT>-* -mmin -30 -type f -not -path '*/.git/*' | wc -l

If count > 0, the seat is working. Skip the nudge.

If count = 0, do NOT conclude dead. Cross-reference with Layer 2:

ps aux | grep -iE 'codex|claude' | awk '$3>0.5'

If process is alive with CPU activity, the seat is reading/querying/planning — not dead. Only declare stale when BOTH Layer 1b (zero file mods in 30+ min) AND Layer 2 (no matching process or zero CPU for extended period) confirm dead.

Stale-owner protocol (added 2026-04-25 per ORA-0078): Before rewriting any IN_PROGRESS item to AWAITING_CLAIM: (a) Layer 1b file mtime check, (b) Layer 2 process count + CPU, (c) BOTH must confirm dead. File mtime alone is insufficient. Absence of evidence is not evidence of absence.

This prevents the #1 shepherd false-positive: nudging a seat that's heads-down coding but hasn't posted to the feed yet. It also prevents the #2 false-positive: declaring a session dead when it is actively reading or querying without producing file modifications.
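
The two-layer rule can be sketched as a single decision function. This is a minimal sketch under assumptions: the name stale_verdict and its labels are hypothetical, the 0.5 percent CPU cutoff is borrowed from the ps filter above, and one CPU sample cannot establish "zero CPU for an extended period", so a stale-candidate result still calls for a repeat check.

```shell
# Decide whether a named seat may be treated as stale.
#   $1 = files modified in the last 30 min (Layer 1b count)
#   $2 = number of matching processes (Layer 2)
#   $3 = highest %CPU among those processes, e.g. 0.0 (Layer 2)
stale_verdict() {
  local file_count=$1 proc_count=$2 max_cpu=$3
  if [ "$file_count" -gt 0 ]; then
    echo "working"            # Layer 1b positive signal: skip the nudge
    return
  fi
  if [ "$proc_count" -gt 0 ] && awk -v c="$max_cpu" 'BEGIN { exit !(c > 0.5) }'; then
    echo "alive-no-writes"    # reading/querying/planning: not dead
    return
  fi
  echo "stale-candidate"      # both layers silent in this sample
}
```

Only the stale-candidate outcome would allow an IN_PROGRESS item to move toward AWAITING_CLAIM; the other two outcomes leave ownership untouched.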

Evidence trail

2026-04-16T01:14Z — STRAT-01 filesystem probe revealed CAMBER-04 actively working CMB-0079 (0m ago mtime) despite 35min feed silence. Prevented a false-positive stale nudge. Same probe revealed ORBIT-01/02 worktrees at 3 days stale = dormant seats, immediately actionable for ORB-0001 reassignment.
2026-04-16T01:35Z — Chad screenshot of Codex Desktop confirmed filesystem reading: CAMBER-04 at PR-creation stage for CMB-0079. Screenshot added intelligence the filesystem couldn't: iMac missing Beside infrastructure (real environment blocker), media_message rows already flowing (collapses CMB-0062 scope).
2026-04-16T01:40Z — Chad directive: "this is power that needs to be distributed equally to ALL team members. it needs to be in ora and top of memory for all."
2026-05-04T01:41Z — fleet-wide fanout over-weighted raw worktree-mtime telemetry and undercounted visible capacity, exposing that old Layer 1 was a drill-down surface, not a fleet-wide capacity surface.
2026-05-04T02:14Z through 2026-05-04T02:27Z — ORA-2083 selected the CoreGraphics WindowServer projection, ORA-2084 landed the MacBook fleet-seat-count implementation, ORA-2097 proved command materialization and JSON/verbose execution, ORA-2085 certified default/JSON/verbose schema, and ORA-2026-0137 named fleet-seat-count as Layer-1 capacity truth.

Promote to M4 when at least 2 non-STRAT seats reference fleet-seat-count as Layer 1 in feed posts and one stale-owner or fanout decision cites both Layer 1 capacity and Layer 1b drill-down evidence correctly.

Known exceptions

Cross-machine telemetry requires command parity plus SSH for drill-down (ssh imac "fleet-seat-count --json" and ssh imac "find ~/gh/*-worktrees/ ..."). Name the machine checked; do not imply fleet-wide parity from one host.
fleet-seat-count unavailable or unparsable — name the blocker, cite the failing command, and use Layer 1b only as caveated temporary evidence until the command is repaired.
Claude Desktop sessions don't use worktrees (Layer 1b is partial); use Layer 2 (process) + Layer 4 (screenshot) instead.
Privacy boundary — screenshots may capture personal content. Use Layer 4 only when Layers 1-3 don't answer the question, and don't persist screenshots beyond the immediate analysis.

Review cadence

Next review: 2026-06-04
Review triggers: (a) fleet-seat-count disagrees with operator-visible seats by more than 20 percent, (b) provider shells or machines disagree on the command result, (c) a fanout directive cites raw mtime loops without the CLI count, (d) a new telemetry layer is discovered, (e) cross-machine telemetry via SSH proves unreliable.
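
Trigger (a) is simple arithmetic and can be checked with integer math. This is a sketch under an assumption the doctrine does not state: the 20 percent is measured relative to the operator-visible count. The function name counts_disagree is hypothetical.

```shell
# Succeeds (exit 0) when the CLI count and the operator-visible count differ
# by more than 20 percent of the operator-visible count.
#   $1 = fleet-seat-count result, $2 = operator-visible seats
counts_disagree() {
  local cli=$1 visible=$2 diff
  diff=$(( cli - visible ))
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  # Compare diff/visible > 20/100 without division.
  [ $(( diff * 100 )) -gt $(( visible * 20 )) ]
}
```

For instance, a CLI count of 30 against 20 visible seats trips the trigger, while 30 against 25 is exactly 20 percent and does not.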
