Doctrine

Shared coordination state must not live on iCloud at fleet scale

substrate, fleet-scale, icloud, shared-state, coordination-transport

Rule. Files that the fleet writes on every operation — coordination state, append-only feeds, per-write metadata — MUST NOT live on iCloud-synced paths once the fleet exceeds ~10 simultaneous writers across ~3 machines. iCloud's per-file overhead is fixed and dwarfs the payload at small file size; sustained write rates saturate cloudd on every fleet machine.

Why. Each iCloud write triggers: (a) local FSEvents detection, (b) hash + diff, (c) metadata sync to iCloud, (d) propagation to every other fleet machine, (e) a cloudd local write and FSEvents emission on each receiving machine. The cycle runs once per file write, regardless of how many bytes changed. A 382-byte tamper-detection state file written 50 times per hour therefore generates roughly the same cloudd load as a 4 MB feed written 50 times per hour. At small payloads, iCloud is pure overhead.
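The per-file argument can be illustrated with a toy cost model. This is only a sketch: PER_FILE_COST and PER_BYTE_COST are made-up placeholder values, not measurements of cloudd, and the function names are hypothetical. The point is the shape of the result: when the fixed term dominates, file size barely matters and write frequency is what drives load.

```python
# Toy model: each synced write pays a fixed per-file cost plus a per-byte cost.
# Both constants are illustrative placeholders, NOT measured cloudd figures.
PER_FILE_COST = 1.0    # arbitrary units per sync cycle (assumed)
PER_BYTE_COST = 1e-8   # arbitrary units per byte (assumed)


def sync_cost(size_bytes: int, writes_per_hour: int) -> float:
    """Hourly cost of syncing one file under the toy model."""
    return writes_per_hour * (PER_FILE_COST + PER_BYTE_COST * size_bytes)


def overhead_fraction(size_bytes: int) -> float:
    """Share of a single write's cost that is fixed per-file overhead."""
    return PER_FILE_COST / (PER_FILE_COST + PER_BYTE_COST * size_bytes)


# The two file sizes from the text, both written 50 times per hour.
small = sync_cost(382, 50)
large = sync_cost(4_000_000, 50)
ratio = large / small  # close to 1 whenever the fixed term dominates
```

Under these assumed constants the two files cost nearly the same per hour. A different per-byte constant would change the ratio, so treat the model as a way to reason about the claim, not as evidence for it.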

Founding evidence (2026-05-03 session). Operator MacBook Air hit 217°F with load > 20 and 1.4% CPU available. Top-CPU diagnosis: claude (31.5%) + cloudd (30.2%) co-equal. Two iCloud-resident shared-state files were the load drivers:

1. ~/Library/Mobile Documents/com~apple~CloudDocs/camberzero_sync_assets/fleet/shepherd-state.json (382 bytes, 10 keys), written by ~7 fleet helpers on every feed-append. Operator quote: "causes this every day. i dont know why we don't just destroy every trace of it."
2. ~/Library/Mobile Documents/com~apple~CloudDocs/camberzero_sync_assets/.../FLEET_FEED.md (4.5MB and growing), written by every fleet seat's actionable post.

FLT-1314 moved (1) to ~/Library/Application Support/fleet/. cloudd dropped from 30% to ~0% within the post-fix observation window. FLT-1324 moved (2) to the same local path, with SSH delegation from secondary machines preserving the existing single-writer-guard pattern. Both fixes landed within ~8 minutes of filing.

How to apply.

  • Default placement for newly created shared-state files: ~/Library/Application Support/<service>/. Per-machine local; never iCloud.
  • Cross-machine consistency, when needed, rides on SSH delegation through the canonical writer (CLAUDE.md "Single-writer guard"), not iCloud sync. Secondaries pull on demand or via cron rsync.
  • iCloud is acceptable for: doctrine source files (low write rate, mostly read); reference docs (read-mostly); operator-curated artifacts (low write rate); the canonical user identity bundle (rare-write).
  • iCloud is NOT acceptable for: append-only feeds, per-operation state files, lock files, ticket reservations, lifecycle counters, write-incident logs.
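The placement rule above could be enforced with a small guard at file-creation time. The following is a minimal sketch, assuming iCloud Drive content lives under ~/Library/Mobile Documents (as in the paths quoted in the founding evidence). The helper names shared_state_path and is_icloud_path are hypothetical, not part of any existing fleet tooling.

```python
from pathlib import Path

# Assumption: iCloud Drive content lives under ~/Library/Mobile Documents,
# matching the paths quoted in the founding evidence.
ICLOUD_ROOT = Path.home() / "Library" / "Mobile Documents"
LOCAL_ROOT = Path.home() / "Library" / "Application Support"


def is_icloud_path(path: Path) -> bool:
    """Return True if path resolves to a location under the iCloud Drive root."""
    resolved = path.expanduser().resolve()
    return resolved.is_relative_to(ICLOUD_ROOT.resolve())  # Python 3.9+


def shared_state_path(service: str, filename: str) -> Path:
    """Default per-machine location for a shared-state file (hypothetical helper)."""
    target = LOCAL_ROOT / service / filename
    if is_icloud_path(target):
        # Catches e.g. a symlinked service directory that points into iCloud.
        raise ValueError(f"refusing iCloud-synced path: {target}")
    return target


state_file = shared_state_path("fleet", "shepherd-state.json")
```

Resolving the path before checking it means a service directory symlinked into iCloud is rejected as well. The sketch only computes and checks the path; it does not create directories or files.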

Diagnostic signal. When cloudd appears in the top-5 CPU consumers under fleet activity, an iCloud-resident shared-state file is the suspect. Sample with ps -axo %cpu,command -r | head -6 (one header row plus the top five processes) during a sustained fleet write burst.
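The check can also be scripted. The sketch below parses the output of ps -axo %cpu,command -r (macOS/BSD ps, already sorted by CPU) and reports whether cloudd is among the top consumers. The function names are hypothetical, and matching on the executable's basename is an assumption about how cloudd appears in the command column.

```python
import os
import subprocess


def top_commands(ps_output: str, n: int = 5) -> list[tuple[float, str]]:
    """Parse `ps -axo %cpu,command -r` output into the first n (cpu, command) rows."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        try:
            rows.append((float(parts[0]), parts[1]))
        except ValueError:
            continue  # ignore rows whose %cpu field does not parse
    return rows[:n]  # input is already sorted by CPU, descending


def cloudd_in_top(ps_output: str, n: int = 5) -> bool:
    """True if an executable named cloudd is among the top n CPU consumers."""
    return any(
        os.path.basename(cmd.split()[0]) == "cloudd"
        for _, cmd in top_commands(ps_output, n)
    )


def sample() -> str:
    """Run the ps invocation on the local machine (macOS/BSD ps only)."""
    result = subprocess.run(
        ["ps", "-axo", "%cpu,command", "-r"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a fleet machine, cloudd_in_top(sample()) could be polled during a write burst. A single positive sample is only a hint, so repeated samples are more informative than one.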

Generalization. The principle is that transport overhead must be amortizable across payloads. iCloud amortizes per-byte costs but not per-file costs. Substrates that scale to N≥10 writers (Postgres, Redis, etcd, file servers with proper write semantics) amortize both. The fleet's single-writer guard pattern + SSH delegation is a lightweight equivalent that requires no new infrastructure.
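As a rough illustration of SSH delegation through a canonical writer, the sketch below routes every feed append to one machine: the canonical host appends locally, and every other machine pipes the entry over ssh. The hostname canonical-writer.local and the function names are hypothetical. The actual single-writer guard described in CLAUDE.md is not reproduced here, and the sketch does no locking.

```python
import shlex
import socket
import subprocess

# Hypothetical hostname for illustration; not taken from fleet configuration.
CANONICAL_HOST = "canonical-writer.local"
# Feed location after FLT-1324, relative to the home directory.
FEED_PATH = "Library/Application Support/fleet/FLEET_FEED.md"


def append_argv(local_host: str) -> list[str]:
    """Build the command that appends stdin to the feed on the canonical writer."""
    append = f"cat >> {shlex.quote(FEED_PATH)}"
    if local_host == CANONICAL_HOST:
        # Already on the canonical writer: append directly, relative to $HOME.
        return ["sh", "-c", f"cd ~ && {append}"]
    # Secondary machine: delegate the write; ssh runs the command in the remote home.
    return ["ssh", CANONICAL_HOST, append]


def append_entry(entry: str) -> None:
    """Send one feed entry to the canonical writer (sketch; no locking or retries)."""
    argv = append_argv(socket.gethostname())
    subprocess.run(argv, input=entry.rstrip("\n") + "\n", text=True, check=True)
```

Because every write funnels through one host, the feed file never has to be synced in order to be written. Secondaries that need to read it can pull a copy on demand.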

Disconfirming observation. This doctrine fails if a shared-state file is written so rarely that cloudd never accumulates meaningful load. In practice, however, "rarely written" in fleet contexts means "less than once per hour," and any file that meets that threshold is unlikely to need iCloud sync at all.