Syncing Staging Databases Before Production Switch
Problem Statement
Just before a cutover, staging and production databases drift apart: schema changes land on one side, new rows accumulate on the other, and a naive copy introduces foreign-key violations or silent data truncation. The failures are rarely loud β a STRICT_TRANS_TABLES mismatch silently truncates an over-length string, a timezone difference shifts every TIMESTAMP by an hour, and an import that disables foreign-key checks leaves orphaned rows that only surface when a customer hits the broken join. If you flip DNS before the data is provably consistent, live users land on a half-migrated dataset, and because writes are now happening on the new host, unwinding it means reconciling two diverging copies rather than restoring one. This page sits under Staging to Production Sync and covers how to produce a transaction-consistent snapshot, catch schema drift before it corrupts the import, and prove row and checksum parity before any routing change β turning βlooks fineβ into an objective pass/fail gate.
When to Use This Approach
- You are about to switch production traffic and need the target database to match the source at a known point in time.
- Schema has changed on either side and you must catch drift before it corrupts an import.
- You run MySQL/InnoDB or PostgreSQL and can take a logical, transaction-consistent dump without locking writes.
- You need an objective pass/fail integrity signal (row counts plus checksums) rather than a visual spot-check.
- You want a pre-sync snapshot retained for fast rollback if validation fails.
- The dataset is small enough to copy logically within the cutover window, or you can run replication to keep lag near zero up to the flip.
Step-by-Step Instructions
1. Diff Schema and Map Constraints
Extract schema-only dumps from both environments and diff them so structural drift surfaces before any data moves. Resolve mismatched column types, missing indexes, and constraint differences first; a column that exists on staging but not production, or a narrower VARCHAR on the target, will either reject the import or truncate data on the way in. Audit sql_mode parity at the same time, because a target running stricter modes than the source can reject rows the source happily stored.
# Schema-only dumps, then diff to expose structural drift
mysqldump --no-data -u root -p production_db > prod_schema.sql # structure only
mysqldump --no-data -u root -p staging_db > staging_schema.sql
diff prod_schema.sql staging_schema.sql # PostgreSQL: pg_dump --schema-only
2. Capture a Consistent Snapshot
Use a logical export with --single-transaction to capture a consistent InnoDB view without blocking writes, and pipe it straight to the target to avoid double disk I/O. Coordinate the timing with Staging to Production Sync so the snapshot aligns with the cutover window.
# Consistent, non-locking dump piped into the target database
SNAPSHOT="/tmp/prod_sync_$(date +%s).sql"
mysqldump -u root -p \
--single-transaction --routines --triggers --events \
--hex-blob --set-gtid-purged=OFF \
production_db > "$SNAPSHOT" # consistent InnoDB snapshot, writes stay live
mysql -u root -p --max-allowed-packet=1G staging_db < "$SNAPSHOT"
3. Monitor Replication Lag
If you replicate rather than dump-and-load, hold the cutover until lag is effectively zero so the target is not behind the source at flip time.
# MySQL 8.0+: read Seconds_Behind_Source from replica status
mysql -u root -p -e 'SHOW REPLICA STATUS\G' | grep Seconds_Behind_Source
# PostgreSQL: inspect streaming replication state
psql -c 'SELECT client_addr, state, replay_lag FROM pg_stat_replication;'
4. Verify Integrity Before Any DNS Change
Compare row counts and table checksums across both environments. Row counts catch missing data; checksums catch corrupted data that row counts miss, such as a timezone-shifted timestamp or a re-encoded blob. Both must match before you touch routing; align the final DNS step with DNS Propagation Tracking so the record change only fires after the integrity gate is green.
# Per-table checksum on both environments; values must be identical
mysql -u root -p -e 'CHECKSUM TABLE orders, customers;' production_db
mysql -u root -p -e 'CHECKSUM TABLE orders, customers;' staging_db
# Large datasets: chunked, lock-light comparison
pt-table-checksum --replicate=checksums.checksums --databases=production_db
Worked Example
A SaaS team cuts app.example.com over to a new database host. The schema diff in step 1 reveals staging has an extra customers.locale column added during a feature branch but never deployed to production; they drop it to restore parity. They then run the step-2 snapshot at 02:10 UTC, importing into the new host in 9 minutes.
Validation shows orders with 1,482,905 rows on both sides, but CHECKSUM TABLE disagrees:
# production_db
# orders 3920183746
# staging_db
# orders 1184552097 <- mismatch despite equal row counts
pt-table-checksum isolates the divergence to 14 rows with TIMESTAMP values shifted by one hour β a timezone mismatch (SET time_zone differed between hosts). They re-import those rows with --default-time-zone='+00:00', re-run the checksum until both sides read 3920183746, and only then proceed to DNS. The integrity gate prevented cutting over onto silently corrupted timestamps.
Verification
Run the integrity gate and a live health probe before declaring the database ready.
# 1. Row-count parity on a critical table
mysql -u root -p -e 'SELECT COUNT(*) FROM orders;' production_db
mysql -u root -p -e 'SELECT COUNT(*) FROM orders;' staging_db
# 2. Checksum parity (must be identical, see step 4)
# 3. Application health endpoint returns 200 against the new database
curl -s -o /dev/null -w '%{http_code}\n' https://app.example.com/health
If integrity fails, restore from the retained pre-sync snapshot rather than cutting over; the thresholds that govern that decision live in Rollback Trigger Thresholds.
FAQ
How do I handle active user sessions during the database sync?
Externalise session storage to Redis or Memcached before the cutover window so a database swap does not invalidate live sessions, and use --single-transaction to capture a consistent snapshot without locking active writes. Keep session cookies domain-agnostic during the transition so they survive the host change.
How do I verify data integrity post-import without impacting performance?
Run CHECKSUM TABLE table_name; on both environments and compare outputs for small to mid-size tables. For large datasets use pt-table-checksum from Percona Toolkit, which samples rows in chunks to avoid full table scans and excessive lock contention.
What triggers an automatic rollback during the sync process? Roll back if replication lag exceeds 10 seconds at flip time, HTTP 5xx error rates surpass 2% for 5 consecutive minutes after cutover, or any checksum or foreign-key validation fails post-import. Restore the pre-sync snapshot and revert DNS before re-attempting.
Related
- Staging to Production Sync
- Implementing Blue-Green Deployments for Site Migrations
- DNS Propagation Tracking
- Rollback Trigger Thresholds
β Back to Staging to Production Sync