Observation
Vendor Shadow Database Pattern
Context
The overnight Buildertrend mirror chain in camber (CMB-0230 through CMB-0246) built a typed local mirror of vendor-owned data we do not control: roles, internal users, jobs, contacts, vendors, schedules, to-dos, daily logs, and activity-feed catalogs.
That is not an isolated move. The fleet already uses the same basic posture in other places:
- Buildertrend photo scraping and private-asset proof flows
- Beside direct data surfaces mirrored into local truth systems
- now a structured Buildertrend shadow mirror inside camber DB
This observation names the recurring pattern.
Observation
We repeatedly build vendor-shadow databases: locally typed, queryable mirrors of third-party systems whose schemas, availability, and semantics we do not own.
Common traits across these shadows:
1. Source truth starts outside our database boundary.
2. We ingest only a subset of the vendor surface at first.
3. Landing usually happens in slices:
   - schema first
   - then small proof-backed seeds
   - then later sync or backfill paths
4. Raw vendor semantics often drift or arrive incomplete:
   - hidden jobs outside the visible seed
   - activity users already present in one scrape but described as missing in another
   - row-truth beating packet prose during acceptance
5. The mirror becomes more useful than the vendor UI for joins, proofs, and downstream automation once enough typed surfaces land.
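Trait 4 comes down to one rule: when a packet's prose disagrees with the rows, the rows win. Below is a minimal sketch of that acceptance check. It assumes a packet's "missing" claim and each scrape can be reduced to plain sets of IDs. The function name `reconcile_claims` and the sample IDs are hypothetical illustrations, not part of the camber codebase.

```python
from collections.abc import Iterable


def reconcile_claims(
    claimed_missing: Iterable[str],
    scrapes: Iterable[Iterable[str]],
) -> dict[str, set[str]]:
    """Check a packet's "these IDs are missing" claim against row truth.

    `claimed_missing` is what the packet prose says is absent; `scrapes`
    are the ID sets actually observed in one or more scrapes (or mirror
    rows). Row truth wins: an ID seen in any scrape counts as present.
    """
    claimed = set(claimed_missing)
    present: set[str] = set()
    for scrape in scrapes:
        present.update(scrape)
    return {
        # Claimed missing and absent from every scrape.
        "actually_missing": claimed - present,
        # Claimed missing, but at least one scrape already has it.
        "present_despite_claim": claimed & present,
    }


# Illustrative IDs only (hypothetical, not real Buildertrend user IDs).
result = reconcile_claims(
    claimed_missing={"user-2", "user-4"},
    scrapes=[{"user-1", "user-2"}, {"user-3"}],
)
print(result["actually_missing"])       # {'user-4'}
print(result["present_despite_claim"])  # {'user-2'}
```

Taking the union across scrapes is deliberate. It catches the "already present in one scrape but described as missing in another" case without trusting any single scrape's framing.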
Why It Helps
Vendor-shadow databases buy us:
- deterministic local joins across vendor entities
- auditable proofs and replayable verification queries
- a place to preserve vendor-specific semantics without forcing them into Heartwood client truth or Orbit coordination truth
- the ability to stage sync work after schema work instead of coupling both on day one
The Buildertrend overnight chain showed that clearly: landing schema and proof-backed seeds first let us verify 49 tables and 2 views before any cookie-auth sync function existed.
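A "replayable verification query" can be as small as asking the database catalog which tables and views exist and diffing that against the expected set. The sketch below uses SQLite's `sqlite_master` as a stand-in for camber-db. If camber-db is PostgreSQL, the equivalent check would read `information_schema.tables` and `information_schema.views`. The function name and the two-table, one-view schema are hypothetical.

```python
import sqlite3
from collections.abc import Iterable


def verify_schema(
    conn: sqlite3.Connection,
    expected_tables: Iterable[str],
    expected_views: Iterable[str],
) -> dict[str, object]:
    """Replayable check that the expected tables and views exist."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    tables = {name for kind, name in rows if kind == "table"}
    views = {name for kind, name in rows if kind == "view"}
    return {
        "missing_tables": sorted(set(expected_tables) - tables),
        "missing_views": sorted(set(expected_views) - views),
        "table_count": len(tables),
        "view_count": len(views),
    }


# Hypothetical miniature mirror standing in for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bt_jobs (job_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE bt_contacts (contact_id TEXT PRIMARY KEY, job_id TEXT);
    CREATE VIEW bt_job_contacts AS
        SELECT j.job_id, j.name, c.contact_id
        FROM bt_jobs j JOIN bt_contacts c ON c.job_id = j.job_id;
    """
)
report = verify_schema(conn, ["bt_jobs", "bt_contacts"], ["bt_job_contacts"])
print(report)
# {'missing_tables': [], 'missing_views': [], 'table_count': 2, 'view_count': 1}
```

Because the check only reads the catalog, it can be rerun after every slice lands. No sync function is needed for it to pass.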
Trade-Offs
The pattern also has recurring costs:
- freshness gap: the mirror is only as current as the latest scrape or sync
- schema drift risk: vendor UI/API changes can silently invalidate our local assumptions
- sync fragility: cookie auth, hidden filters, rolling windows, and partial payloads make ingestors brittle
- semantic mismatch: vendor fields often mean less, more, or something different from what their names suggest (action_entity_id, weather_station_latlng, visually true vs proof-backed qb_synced)
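The freshness gap is easier to live with when it is reported rather than discovered. The following is a minimal sketch, assuming each mirrored table records a last-sync timestamp, or none if it was only seeded. The table names, the one-day threshold, and `freshness_report` itself are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_report(
    last_synced: dict[str, Optional[datetime]],
    max_age: timedelta,
    now: datetime,
) -> dict[str, str]:
    """Label each mirrored table as fresh, stale, or never synced.

    A table with no sync timestamp was only ever seeded, so it is
    reported separately instead of being silently treated as current.
    """
    report: dict[str, str] = {}
    for table, synced_at in sorted(last_synced.items()):
        if synced_at is None:
            report[table] = "never synced (seed only)"
        elif now - synced_at > max_age:
            report[table] = f"stale by {now - synced_at - max_age}"
        else:
            report[table] = "fresh"
    return report


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)  # illustrative clock
report = freshness_report(
    {
        "bt_jobs": now - timedelta(hours=2),
        "bt_daily_logs": now - timedelta(days=3),
        "bt_schedules": None,
    },
    max_age=timedelta(days=1),
    now=now,
)
for table, status in report.items():
    print(table, "->", status)
```

Passing `now` in explicitly keeps the check deterministic and replayable, in line with the proof-first posture.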
Three-Domain Separation Implication
Vendor-shadow databases belong in the process/intelligence layers, not in client truth.
For Buildertrend:
- the shadow mirror belongs in camber-db because it is process data
- client-facing truth should consume curated outputs, not raw vendor shadows
- coordination about scrape gaps and sync posture belongs in ORA / fleet feed, not in the client data domain
That separation matters because vendor data is often operationally useful long before it is clean enough to present as truth.
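One way to enforce "curated outputs, not raw vendor shadows" inside the database is a view that exposes only proof-backed rows. The sketch below reuses the document's qb_synced example. It assumes a hypothetical shadow table that keeps the visual signal and the proof reference as separate columns. All table, view, and column names here are illustrative.

```python
import sqlite3

# Hypothetical shadow table plus a curated view; names are illustrative.
SCHEMA = """
CREATE TABLE bt_shadow_records (
    record_id        TEXT PRIMARY KEY,
    label            TEXT,
    qb_synced_visual INTEGER,  -- what the vendor UI appears to show (0/1)
    qb_synced_proof  TEXT      -- evidence reference; NULL until proved
);

-- The only surface client-facing code is allowed to read.
CREATE VIEW client_synced_records AS
SELECT record_id, label
FROM bt_shadow_records
WHERE qb_synced_proof IS NOT NULL;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO bt_shadow_records VALUES (?, ?, ?, ?)",
    [
        ("r1", "visually synced only", 1, None),
        ("r2", "proof-backed", 1, "proof:example-ref"),
    ],
)
curated = [
    row[0]
    for row in conn.execute(
        "SELECT record_id FROM client_synced_records ORDER BY record_id"
    )
]
print(curated)  # ['r2']
```

The shadow table keeps the ambiguous visual signal for process work. The view is the narrow, vetted contract that client truth consumes.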
Suggested Pattern Name
vendor-shadow database
The name fits better than “cache” or “replica” because:
- it is not guaranteed fresh enough to be a cache
- it is not complete enough to be a replica
- it intentionally preserves only the parts of the vendor system we have surfaced and proved
Countermeasure Pattern
When building a vendor-shadow database, the overnight BT chain suggests a safer default posture:
1. Land schema first.
2. Seed only proof-backed rows.
3. Keep ambiguous source fields explicit instead of normalizing them away.
4. Add smoke tests before claiming the mirror “done.”
5. Split fresh-scrape dependencies into blocked follow-ons instead of faking completeness.
That sequence turns vendor ingestion from a monolith into a chain of small, honest proofs.
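The five-step posture can be sketched end to end in a few dozen lines. This is a hypothetical illustration using an in-memory SQLite database, not the actual camber-db migration or seed code. The tables `bt_jobs` and `bt_activity`, the `proof_ref` and `action_entity_kind` columns, and the `SeedReport` record are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, field

# Step 1: land schema first. Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE bt_jobs (
    job_id    TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    proof_ref TEXT NOT NULL          -- where this row was observed
);
CREATE TABLE bt_activity (
    activity_id        TEXT PRIMARY KEY,
    action_entity_id   TEXT,         -- Step 3: kept raw, not normalized
    action_entity_kind TEXT NOT NULL DEFAULT 'unknown'
);
"""


@dataclass
class SeedReport:
    inserted: int = 0
    skipped: list[str] = field(default_factory=list)
    blocked_follow_ons: list[str] = field(default_factory=list)


def seed_jobs(conn: sqlite3.Connection, rows: list[dict]) -> SeedReport:
    """Step 2: insert only rows that carry a proof reference."""
    report = SeedReport()
    for row in rows:
        if not row.get("proof_ref"):
            report.skipped.append(row["job_id"])
            continue
        conn.execute(
            "INSERT INTO bt_jobs (job_id, name, proof_ref) VALUES (?, ?, ?)",
            (row["job_id"], row["name"], row["proof_ref"]),
        )
        report.inserted += 1
    if report.skipped:
        # Step 5: unproven rows become a blocked follow-on, not a guess.
        report.blocked_follow_ons.append(
            f"fresh scrape needed for: {', '.join(report.skipped)}"
        )
    return report


def smoke_test(conn: sqlite3.Connection) -> list[str]:
    """Step 4: cheap checks that must pass before calling the mirror done."""
    failures = []
    total = conn.execute("SELECT COUNT(*) FROM bt_jobs").fetchone()[0]
    if total == 0:
        failures.append("bt_jobs is empty")
    blank = conn.execute(
        "SELECT COUNT(*) FROM bt_jobs WHERE trim(proof_ref) = ''"
    ).fetchone()[0]
    if blank:
        failures.append(f"{blank} bt_jobs row(s) have a blank proof_ref")
    return failures


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
report = seed_jobs(conn, [
    {"job_id": "J1", "name": "Example job A", "proof_ref": "scrape:example-1"},
    {"job_id": "J2", "name": "Example job B", "proof_ref": None},
])
print(report.inserted)            # 1
print(report.blocked_follow_ons)  # ['fresh scrape needed for: J2']
print(smoke_test(conn))           # []
```

The seed never invents the missing row. It reports J2 as blocked work, so the mirror's claimed coverage stays equal to its proved coverage.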