Observation

Vendor Shadow Database Pattern

ID: ORA-2026-0039
Date: 2026-04-23
Status: unknown
Maturity: M2
Source: docs/entries/observations/ORA-2026-0039_vendor-shadow-database-pattern.md

Tags: world-model, vendor-shadow, data-modeling

Context

The overnight Buildertrend mirror chain in camber (CMB-0230 through CMB-0246) built a typed local mirror of vendor-owned data we do not control: roles, internal users, jobs, contacts, vendors, schedules, to-dos, daily logs, and activity-feed catalogs.

That is not an isolated move. The fleet already uses the same basic posture in other places:

Buildertrend photo scraping and private-asset proof flows
Beside direct data surfaces mirrored into local truth systems
and now a structured Buildertrend shadow mirror inside the camber DB

This observation names the recurring pattern.

Observation

We repeatedly build vendor-shadow databases: locally typed, queryable mirrors of third-party systems whose schemas, availability, and semantics we do not own.

Common traits across these shadows:

1. Source truth starts outside our database boundary.
2. We ingest only a subset of the vendor surface at first.
3. Landing usually happens in slices:

schema first
then small proof-backed seeds
then later sync or backfill paths

4. Raw vendor semantics often drift or arrive incomplete:

hidden jobs outside the visible seed
activity users already present in one scrape but described as missing in another
row-truth beating packet prose during acceptance

5. The mirror becomes more useful than the vendor UI for joins, proofs, and downstream automation once enough typed surfaces land.
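
The slice order in trait 3 can be sketched in a few lines. This is a minimal illustration, not the camber implementation: it uses an in-memory SQLite database, and the table `bt_job`, the columns `proof_ref` and `scraped_at`, and the helper names are all hypothetical.

```python
import sqlite3

# Slice 1: schema only. Table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE bt_job (
    bt_job_id  INTEGER PRIMARY KEY,  -- vendor-assigned id, not ours
    name       TEXT NOT NULL,
    proof_ref  TEXT NOT NULL,        -- pointer to the scrape artifact backing the row
    scraped_at TEXT NOT NULL         -- ISO-8601 timestamp of that scrape
);
"""


def land_schema(conn: sqlite3.Connection) -> None:
    """Create the typed mirror tables before any data exists."""
    conn.executescript(SCHEMA)


def seed_proof_backed(conn: sqlite3.Connection, rows: list) -> list:
    """Slice 2: insert only rows that carry a proof reference; return the rest."""
    rejected = []
    for row in rows:
        if not row.get("proof_ref"):
            rejected.append(row)  # held back for a later sync or backfill slice
            continue
        conn.execute(
            "INSERT INTO bt_job (bt_job_id, name, proof_ref, scraped_at) "
            "VALUES (:bt_job_id, :name, :proof_ref, :scraped_at)",
            row,
        )
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
land_schema(conn)
rejected = seed_proof_backed(conn, [
    {"bt_job_id": 1, "name": "Job A", "proof_ref": "scrape-001.json",
     "scraped_at": "2026-04-22T03:00:00Z"},
    {"bt_job_id": 2, "name": "Job B", "proof_ref": None,
     "scraped_at": "2026-04-22T03:00:00Z"},
])
```

Rows without a proof reference are returned rather than dropped, so a later slice can pick them up once a fresh scrape backs them.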

Why It Helps

Vendor-shadow databases buy us:

deterministic local joins across vendor entities
auditable proofs and replayable verification queries
a place to preserve vendor-specific semantics without forcing them into Heartwood client truth or Orbit coordination truth
the ability to stage sync work after schema work instead of coupling both on day one

The Buildertrend overnight chain showed that clearly: landing schema and proof-backed seeds first let us verify 49 tables and 2 views before any cookie-auth sync function existed.
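
A replayable verification query of that kind might look like the sketch below. It assumes SQLite and its `sqlite_master` catalog purely for illustration; the engine behind the camber DB is not stated here, and the tables `bt_job` and `bt_todo`, the view `bt_job_todo`, and the helper `count_catalog_objects` are hypothetical.

```python
import sqlite3


def count_catalog_objects(conn: sqlite3.Connection) -> dict:
    """Count user-defined tables and views by reading the SQLite catalog."""
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "GROUP BY type"
    ).fetchall()
    counts = {"table": 0, "view": 0}
    counts.update(dict(rows))
    return counts


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bt_job (bt_job_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bt_todo (
    bt_todo_id INTEGER PRIMARY KEY,
    bt_job_id  INTEGER NOT NULL REFERENCES bt_job,
    title      TEXT NOT NULL
);
-- A view gives downstream consumers a deterministic local join.
CREATE VIEW bt_job_todo AS
    SELECT j.name AS job_name, t.title AS todo_title
    FROM bt_todo t JOIN bt_job j USING (bt_job_id);
INSERT INTO bt_job VALUES (1, 'Job A');
INSERT INTO bt_todo VALUES (10, 1, 'Order trim');
""")

counts = count_catalog_objects(conn)
joined = conn.execute("SELECT job_name, todo_title FROM bt_job_todo").fetchall()
```

Because the check reads only the catalog and the landed rows, it can be rerun at any time without touching the vendor.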

Trade-Offs

The pattern also has recurring costs:

freshness gap: the mirror is only as current as the latest scrape or sync
schema drift risk: vendor UI/API changes can silently invalidate our local assumptions
sync fragility: cookie auth, hidden filters, rolling windows, and partial payloads make ingestors brittle
semantic mismatch: vendor fields often mean less, more, or something different than their names suggest (action_entity_id, weather_station_latlng, visually true vs proof-backed qb_synced)
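
The freshness gap is the easiest of these costs to make explicit in code. The sketch below assumes each mirrored surface records when it was last scraped or synced; the function names and the 24-hour threshold are illustrative assumptions, not an agreed policy.

```python
from datetime import datetime, timedelta, timezone


def staleness(last_synced_at: datetime, now: datetime) -> timedelta:
    """How far the mirror lags behind the moment of the query."""
    return now - last_synced_at


def is_fresh_enough(last_synced_at: datetime, now: datetime,
                    max_age: timedelta) -> bool:
    """True when the latest scrape or sync is within the tolerated age."""
    return staleness(last_synced_at, now) <= max_age


# Hypothetical timestamps: last overnight scrape vs. a query the next morning.
last_synced_at = datetime(2026, 4, 22, 3, 0, tzinfo=timezone.utc)
now = datetime(2026, 4, 23, 9, 0, tzinfo=timezone.utc)

age = staleness(last_synced_at, now)
fresh = is_fresh_enough(last_synced_at, now, max_age=timedelta(hours=24))
```

Consumers can then decide per use case whether a 30-hour-old mirror is acceptable, instead of treating the shadow as implicitly current.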

Three-Domain Separation Implication

Vendor-shadow databases belong in the process/intelligence layers, not in client truth.

For Buildertrend:

the shadow mirror belongs in camber-db because it is process data
client-facing truth should consume curated outputs, not raw vendor shadows
coordination about scrape gaps and sync posture belongs in ORA / fleet feed, not in the client data domain

That separation matters because vendor data is often operationally useful long before it is clean enough to present as truth.

Suggested Pattern Name

vendor-shadow database

The name fits better than “cache” or “replica” because:

it is not guaranteed fresh enough to be a cache
it is not complete enough to be a replica
it intentionally preserves only the parts of the vendor system we have surfaced and proved

Countermeasure Pattern

When building a vendor-shadow database, the overnight BT chain suggests a safer default posture:

1. Land schema first.
2. Seed only proof-backed rows.
3. Keep ambiguous source fields explicit instead of normalizing them away.
4. Add smoke tests before claiming the mirror “done.”
5. Split fresh-scrape dependencies into blocked follow-ons instead of faking completeness.

That sequence turns vendor ingestion from a monolith into a chain of small, honest proofs.
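
Steps 3 and 4 can be illustrated with a small sketch. It keeps the ambiguous `qb_synced` signal as two separate columns rather than one normalized boolean, then runs a smoke test against the landed schema. The table `bt_record`, the column names `qb_synced_visual` and `qb_synced_proof_ref`, and both helper functions are hypothetical; SQLite is assumed only to keep the example self-contained.

```python
import sqlite3

# Step 3: keep the ambiguity visible. Column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE bt_record (
    bt_record_id        INTEGER PRIMARY KEY,
    qb_synced_visual    INTEGER,  -- what the vendor UI showed (1/0); NULL if not scraped
    qb_synced_proof_ref TEXT      -- artifact proving the sync; NULL if unproven
);
"""


def unproven_syncs(conn: sqlite3.Connection) -> list:
    """Ids the vendor UI shows as synced but for which we hold no proof."""
    rows = conn.execute(
        "SELECT bt_record_id FROM bt_record "
        "WHERE qb_synced_visual = 1 AND qb_synced_proof_ref IS NULL "
        "ORDER BY bt_record_id"
    )
    return [row[0] for row in rows]


def smoke_test(conn: sqlite3.Connection, expected_tables: set) -> list:
    """Step 4: return failure messages; an empty list means the check passed."""
    present = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    return [f"missing table: {name}" for name in sorted(expected_tables - present)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO bt_record VALUES (?, ?, ?)",
    [(1, 1, "scrape-001.json"), (2, 1, None), (3, None, None)],
)

unproven = unproven_syncs(conn)
failures = smoke_test(conn, {"bt_record", "bt_job"})
```

Here the smoke test reports the table that has not landed yet, which is the kind of gap step 5 would turn into a blocked follow-on rather than a silent omission.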