Observation

Vendor Shadow Database Pattern

world-model vendor-shadow data-modeling

Context

The overnight Buildertrend mirror chain in camber (CMB-0230 through CMB-0246) built a typed local mirror of vendor-owned data we do not control: roles, internal users, jobs, contacts, vendors, schedules, to-dos, daily logs, and activity-feed catalogs.

That is not an isolated move. The fleet already uses the same basic posture in other places:

  • Buildertrend photo scraping and private-asset proof flows
  • Beside direct data surfaces mirrored into local truth systems
  • now a structured Buildertrend shadow mirror inside camber DB

This observation names the recurring pattern.

Observation

We repeatedly build vendor-shadow databases: locally typed, queryable mirrors of third-party systems whose schemas, availability, and semantics we do not own.

Common traits across these shadows:

1. Source truth starts outside our database boundary.
2. We ingest only a subset of the vendor surface at first.
3. Landing usually happens in slices:

  • schema first
  • then small proof-backed seeds
  • then later sync or backfill paths

4. Raw vendor semantics often drift or arrive incomplete:

  • hidden jobs outside the visible seed
  • activity users already present in one scrape but described as missing in another
  • row-truth beating packet prose during acceptance

5. The mirror becomes more useful than the vendor UI for joins, proofs, and downstream automation once enough typed surfaces land.

Why It Helps

Vendor-shadow databases buy us:

  • deterministic local joins across vendor entities
  • auditable proofs and replayable verification queries
  • a place to preserve vendor-specific semantics without forcing them into Heartwood client truth or Orbit coordination truth
  • the ability to stage sync work after schema work instead of coupling both on day one

The Buildertrend overnight chain showed that clearly: landing schema and proof-backed seeds first let us verify 49 tables and 2 views before any cookie-auth sync function existed.
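A schema inventory of this kind can be checked with a single catalog query. The sketch below is illustrative only: it uses an in-memory SQLite database as a stand-in for camber DB (whose actual engine is not stated here), and the table names, view name, and the `count_schema_objects` helper are all hypothetical.

```python
import sqlite3


def count_schema_objects(conn: sqlite3.Connection) -> dict[str, int]:
    """Count user-defined tables and views in a SQLite database."""
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "GROUP BY type"
    ).fetchall()
    counts = {"table": 0, "view": 0}
    counts.update(dict(rows))
    return counts


# Hypothetical miniature schema: two mirror tables and one view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bt_jobs (bt_job_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE bt_contacts (bt_contact_id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE VIEW bt_job_names AS SELECT bt_job_id, name FROM bt_jobs;
""")

schema_counts = count_schema_objects(conn)
```

Comparing the returned counts against the expected inventory gives a replayable check that does not depend on any sync path existing yet.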

Trade-Offs

The pattern also has recurring costs:

  • freshness gap: the mirror is only as current as the latest scrape or sync
  • schema drift risk: vendor UI/API changes can silently invalidate our local assumptions
  • sync fragility: cookie auth, hidden filters, rolling windows, and partial payloads make ingestors brittle
  • semantic mismatch: vendor fields often mean less, more, or something different than their names suggest (action_entity_id, weather_station_latlng, visually true vs proof-backed qb_synced)
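
The freshness gap in particular can be made explicit rather than left implicit. The following is a minimal sketch, assuming each mirrored surface records the timestamp of its latest scrape or sync; the `is_stale` helper and the 24-hour threshold are assumptions for illustration, not an existing camber function or policy.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_synced_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the latest sync is older than the allowed age."""
    return now - last_synced_at > max_age


# Illustrative timestamps only.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(hours=24)

recent_is_stale = is_stale(datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc), now, max_age)
old_is_stale = is_stale(datetime(2024, 12, 30, 6, 0, tzinfo=timezone.utc), now, max_age)
```

Downstream consumers could use a check like this to refuse, or at least flag, reads from a mirror that has fallen behind.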

Three-Domain Separation Implication

Vendor-shadow databases belong in the process/intelligence layers, not in client truth.

For Buildertrend:

  • the shadow mirror belongs in camber-db because it is process data
  • client-facing truth should consume curated outputs, not raw vendor shadows
  • coordination about scrape gaps and sync posture belongs in ORA / fleet feed, not in the client data domain

That separation matters because vendor data is often operationally useful long before it is clean enough to present as truth.

Suggested Pattern Name

vendor-shadow database

The name fits better than “cache” or “replica” because:

  • it is not guaranteed fresh enough to be a cache
  • it is not complete enough to be a replica
  • it intentionally preserves only the parts of the vendor system we have surfaced and proved

Countermeasure Pattern

When building a vendor-shadow database, the overnight BT chain suggests a safer default posture:

1. Land schema first.
2. Seed only proof-backed rows.
3. Keep ambiguous source fields explicit instead of normalizing them away.
4. Add smoke tests before claiming the mirror “done.”
5. Split fresh-scrape dependencies into blocked follow-ons instead of faking completeness.

That sequence turns vendor ingestion from a monolith into a chain of small, honest proofs.
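
The first four steps can be sketched in miniature. This is not the actual camber migration: it assumes an in-memory SQLite stand-in, and the `bt_jobs` table, the `qb_synced_raw` and `proof_ref` columns, the seed rows, and the `smoke_test` helper are all hypothetical. Step 5 is a planning decision and is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: land schema first. Table and column names are hypothetical.
conn.executescript("""
    CREATE TABLE bt_jobs (
        bt_job_id     INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        -- Step 3: store the vendor value verbatim; NULL means not yet proved.
        qb_synced_raw TEXT,
        -- Step 2: every row must cite the evidence that backs it.
        proof_ref     TEXT NOT NULL CHECK (length(proof_ref) > 0)
    );
""")

INSERT_SQL = (
    "INSERT INTO bt_jobs (bt_job_id, name, qb_synced_raw, proof_ref) "
    "VALUES (?, ?, ?, ?)"
)

# Step 2: seed only rows that carry a proof reference (illustrative data).
conn.executemany(INSERT_SQL, [
    (101, "Example Job A", "true", "scrape-0001"),
    (102, "Example Job B", None, "scrape-0001"),
])

# A row with no proof reference is rejected by the schema itself.
try:
    conn.execute(INSERT_SQL, (103, "Unproved Job", "true", None))
    unproved_rejected = False
except sqlite3.IntegrityError:
    unproved_rejected = True


def smoke_test(db: sqlite3.Connection) -> list[str]:
    """Step 4: return a list of failures; an empty list means the slice passes."""
    failures = []
    total = db.execute("SELECT COUNT(*) FROM bt_jobs").fetchone()[0]
    if total == 0:
        failures.append("bt_jobs has no seeded rows")
    duplicates = db.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT name FROM bt_jobs GROUP BY name HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    if duplicates:
        failures.append("duplicate job names in seed")
    return failures


failures = smoke_test(conn)
```

Putting the proof requirement in the schema, rather than in the ingestor, means a later sync path cannot quietly land unproved rows.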