Observation

ORA-2026-0070 — Buildertrend ingest has no API path; scraping-layer choice is load-bearing

ID: ORA-2026-0070
Date: 2026-04-24
Status: surfaced
Maturity: M1
Source: docs/entries/observations/ORA-2026-0070_bt-scraping-architecture-choice.md

fleet-ops buildertrend scraping

Type: observation
Date: 2026-04-24
Source: Chad directive 2026-04-24T~20:40Z; camber BT mirror DB inspection 2026-04-24T20:42Z; Randy-weekend Wave-2 dispatch 2026-04-24T20:33Z
Observed by: CLAUDE-STRAT-DESKTOP-MacBook-Air-CAMBER-01

Observation

The Camber BT mirror (~30 tables, ~600 rows across the 2026-04-22..24 bootstrap) was seeded entirely by a hand-paste migration workflow: Chad scraped in Claude-in-Chrome and pasted the results to Codex, Codex wrote SQL migrations embedding the data, and the migrations were applied to the DB. Zero rows arrived via a production scraper running on a fleet machine. Zero rows arrived via an API call.

Chad surfaced this at 20:40Z as a strategic concern: "we need API". The uncomfortable truth is that Buildertrend does not offer a comprehensive public API. Their Partner API is contract-gated and covers a thin slice (jobs, leads, accounting hooks); it does not cover Selections, Daily Logs, Messages, Bills/POs, Warranty, admin permissions, notifications, or most of the admin-standardization surface Zack directed us to model.

This means every long-term BT-to-Camber data path is either:

1. Playwright headless/headful browser automation: the industry default for BT integrations, brittle against BT UI changes, cheap per-run, deterministic
2. Claude-in-Chrome orchestrated by the fleet: semantically robust (reads pages instead of matching CSS), fast per Chad's observation, costs Claude tokens, novel session-management
3. BT Partner API (where it exists): contract-and-coverage-limited, small surface, parallel business-track to request
4. Hybrid: Claude-in-Chrome for schema discovery + UI-churny surfaces, Playwright for stable high-volume pulls

Wave-2 tickets fired 2026-04-24T20:33Z (CMB-1037 through CMB-1044 + HWD-0362/0363) default to Option 1 (Playwright). That's the pragmatic MVP for surfaces with stable selectors (jobs, selections, invoices, schedule). For surfaces with embedded media, threaded messages, or custom-field churn (Daily Logs, Messages, custom-fields), Option 1's brittleness will bite within weeks.
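If the hybrid (Option 4) were adopted, the split described above could be expressed as a small routing table. This is a minimal sketch, not a shipped component: the surface groupings come from this observation, while the snake_case surface keys, the `Path` enum, and `choose_path` are hypothetical names introduced for illustration. Unknown surfaces fall back to Playwright, mirroring the Wave-2 default.

```python
from enum import Enum


class Path(Enum):
    """Scraping paths named in this observation (Options 1 and 2)."""
    PLAYWRIGHT = "playwright"
    CLAUDE_IN_CHROME = "claude-in-chrome"


# Surfaces this observation calls out as UI-churny. The key spellings are assumptions.
CHURNY_SURFACES = {"daily_logs", "messages", "custom_fields"}

# Surfaces this observation calls out as having stable selectors.
STABLE_SURFACES = {"jobs", "selections", "invoices", "schedule"}


def choose_path(surface: str) -> Path:
    """Route a BT surface to a scraping path under the hybrid option."""
    if surface in CHURNY_SURFACES:
        return Path.CLAUDE_IN_CHROME
    # Stable surfaces and anything unclassified use the Wave-2 default.
    return Path.PLAYWRIGHT
```

Keeping the routing in data rather than in scraper code would let a Wave-3 decision move a surface between paths without touching either scraper.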

Evidence

DB inspection 2026-04-24T20:42Z: 30 bt_ tables populated. Largest: bt_permission_capability (159), bt_schedule_item (125), bt_sub_vendor (59), bt_notification_event (34). All traceable to Apr 23–24 migration files (bt_mirror_phase.sql) embedding the data inline.
Migration history: 40+ migrations named bt_mirror_phase{N}_* written by Codex 2026-04-23. Each contains CREATE TABLE + INSERT/COPY blocks. No companion scraper file shipped alongside any of them.
Playwright harness: ~/hcb-bt-refresh/ exists; bt-playwright-state.json cookie jar refreshed by CMB-1010 (2026-04-24T20:26Z). Zero production scraper runs against it. Commit history surfaces exactly two Playwright-adjacent commits: 8e1fd1c8 (state env adapter) + a69dc816 (FLT-0327 permit-portal dry-run scraper — a different surface entirely).
Randy-weekend doc 12 (12-bt-admin-standardization-guide.md) §6 answers "is there a BT API for outlier reports?" The explicit answer: no native outlier reports, API coverage insufficient, must use SQL against the mirror (which in turn depends on scraping to stay fresh).
CMB-1011 outlier SQL library ran live against the mirror and found the coverage gap: six tables with zero rows because no scraper has populated them (advanced settings, bill approvers, job access, sub/vendor access, notification preferences, role permissions).
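The coverage-gap check in the last item can be approximated by counting rows per `bt_` table and listing the empty ones. The sketch below is illustrative only: it uses an in-memory SQLite database as a stand-in (the mirror's actual engine is not stated here), `empty_bt_tables` and `demo` are hypothetical helpers, and `bt_role_permission` is an assumed table name, not one confirmed by the mirror.

```python
import sqlite3


def empty_bt_tables(conn: sqlite3.Connection, prefix: str = "bt_") -> list[str]:
    """Return the names of tables starting with `prefix` that hold zero rows."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        if row[0].startswith(prefix)
    ]
    empty = []
    for name in sorted(names):
        # Table names come from the catalog, so quoting them is sufficient here.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        if count == 0:
            empty.append(name)
    return empty


def demo() -> list[str]:
    """Build a two-table stand-in mirror: one populated, one never scraped."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bt_schedule_item (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE bt_role_permission (id INTEGER PRIMARY KEY)")  # assumed name
    conn.execute("INSERT INTO bt_schedule_item (id) VALUES (1)")
    return empty_bt_tables(conn)
```

A zero-row result only shows that nothing has loaded the table; it does not distinguish "no scraper exists" from "BT has no data for this surface".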

Fleet Lesson

The Camber BT mirror is a staging surface whose freshness depends entirely on a scraping layer that does not yet exist in production. Wave-2 starts the scraping layer with Playwright defaults. Whether that's the right long-term shape is a Wave-3 architectural decision: a trade-off among Playwright's CSS-selector brittleness, Claude-in-Chrome token cost, BT Partner API coverage gaps, and a hybrid of the three.

Until that choice is made and the winning path is hardened, any outlier report, Monday dashboard, or drift alarm that claims to reflect current BT state is reflecting the state-as-of-the-last-paste, not live BT. Consumers should know this.
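One way to make this visible to consumers is to stamp every report with the time of the last load rather than the time of rendering. A minimal sketch, assuming both timestamps are timezone-aware UTC values; `freshness_label` is a hypothetical helper, not an existing Camber function.

```python
from datetime import datetime, timezone


def freshness_label(last_loaded_at: datetime, now: datetime) -> str:
    """Describe mirror data as state-as-of-last-load, never as live BT."""
    age_hours = (now - last_loaded_at).total_seconds() / 3600
    return (
        f"BT state as of {last_loaded_at:%Y-%m-%dT%H:%MZ} "
        f"(last load {age_hours:.0f}h ago; not live)"
    )


# Example: a Monday dashboard rendered 60 hours after the last paste.
label = freshness_label(
    datetime(2026, 4, 24, 20, 42, tzinfo=timezone.utc),
    datetime(2026, 4, 27, 8, 42, tzinfo=timezone.utc),
)
```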

This observation earns pattern status (graduates to patterns/) the first time the Wave-2 Playwright scrapers silently break on a BT UI change AND a consumer dashboard serves stale data as if current — because that will happen. The cost of the break is what makes the architectural choice load-bearing.

Adjacent doctrines

ORA-2026-0044 (names carry contracts) — the BT mirror tables are named like a ledger but are actually a staging cache; the name lies about freshness until scrapers land.
ORA-2026-0026 (idle-heartbeat) — a drift monitor (CMB-1043, Wave-2) that watches max(updated_at) is the ORA-correct shape for this; scrapers should re-fire on drift, not on cron alone.
ORA-2026-0068 (three-tier primitive for external-signal-to-BT) — the inverse direction; same architecture question applies.
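The drift-monitor shape referenced under ORA-2026-0026 can be sketched as a pure function over per-table `max(updated_at)` values. This is an illustration of the idea only and does not describe how CMB-1043 is built; `stale_tables`, the `max_age` threshold, and the example timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_tables(
    latest: dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
) -> list[str]:
    """Return tables whose max(updated_at) is missing or older than max_age.

    `latest` maps a table name to its max(updated_at), or None when the
    table has no rows. Each returned table is a candidate for a scraper re-fire.
    """
    stale = []
    for table, updated_at in sorted(latest.items()):
        if updated_at is None or now - updated_at > max_age:
            stale.append(table)
    return stale


# Example inputs; the timestamps are made up for illustration.
now = datetime(2026, 4, 27, 8, 0, tzinfo=timezone.utc)
latest = {
    "bt_schedule_item": datetime(2026, 4, 27, 6, 0, tzinfo=timezone.utc),
    "bt_sub_vendor": datetime(2026, 4, 24, 20, 42, tzinfo=timezone.utc),
    "bt_notification_event": None,
}
to_refire = stale_tables(latest, now, max_age=timedelta(hours=24))
```

Treating an empty table as stale keeps never-scraped surfaces on the alarm list instead of letting them pass silently.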

Follow-ups

Wave-3 scoping spike ticket to enumerate (a) BT Partner API surface coverage, (b) Claude-in-Chrome orchestration cost model, (c) concrete Playwright surface-brittleness inventory — see forthcoming CMB ticket.
CMB-1043 drift monitor (Wave-2) is the canary. When it alarms, re-read this observation.