DNS Rollback Procedures

Context

When a cutover fails, reverting DNS is the bluntest and most reliable way to send all traffic back to the legacy origin, but how fast it works is decided hours earlier by the TTL you set before the migration. This guide covers reverting A, AAAA, and CNAME records cleanly, the cache behavior that bounds your recovery window, and why a low pre-cutover TTL is the difference between a 2-minute and a 24-hour rollback. It sits under Migration Rollback Playbooks, which owns the trigger and decision-authority side of the reversal.

Figure: DNS Rollback Timeline. Apply revert (UPSERT legacy IP) → Propagate (≤ 1 TTL cycle) → Full recovery (dig confirms legacy). TTL 60s means minutes to recover; TTL 86400s means up to a day. Lower the TTL before cutover, not after the fault.
The reverted record reaches resolvers within one TTL cycle; the TTL chosen before cutover sets the entire recovery budget.

Pre-flight Checks

Confirm the legacy targets and TTL state before you need them, so reversion is a single staged change.

  • Snapshot current and legacy A/AAAA/CNAME values and store the legacy change set ready to apply.
  • Confirm the TTL was lowered before cutover, following TTL Optimization Strategies, and verify the effective TTL with dig rather than trusting the zone file.
  • Check for provider or registrar TTL floors (minimum allowed TTLs) that may override your value and extend the window.
  • Verify API credentials (CF_TOKEN, AWS Route 53 IAM permissions) with a read-only API call, such as listing the zone's records, before the change window.

DNS Rollback Readiness Checklist:

  • dig +noall +answer
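A minimal sketch of the effective-TTL check, assuming bash and dig are available. The function names, the example hostname, and the 300-second limit are illustrative assumptions, not values from this guide. A recursive resolver reports the remaining TTL of its cached copy, so any value above the limit means a long-TTL entry is still cached there; passing an authoritative nameserver as the "resolver" reports the configured TTL instead.

```shell
# Print the largest TTL found in `dig +noall +answer` output read from stdin.
# Answer lines look like: "www.example.com. 60 IN A 203.0.113.10"
max_answer_ttl() {
  awk 'BEGIN { max = 0 } $2 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0 } END { print max }'
}

# check_effective_ttl HOST LIMIT RESOLVER...   (hypothetical helper)
# Reports the TTL each resolver hands out; returns non-zero if any exceeds LIMIT.
check_effective_ttl() {
  local host="$1" limit="$2"; shift 2
  local r ttl rc=0
  for r in "$@"; do
    ttl=$(dig @"$r" "$host" A +noall +answer | max_answer_ttl)
    echo "$r $host ttl=$ttl"
    if [ "$ttl" -gt "$limit" ]; then
      echo "WARN: $r still caches $host for ${ttl}s (> ${limit}s)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_effective_ttl www.example.com 300 1.1.1.1 8.8.8.8 9.9.9.9`. Run it well before cutover, since a failure here means the rollback window is still long.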

Execution Steps

Revert records, increment the zone serial, and verify propagation across multiple resolvers.

1. Apply the Reverted Records

Push the staged legacy A/AAAA/CNAME values with UPSERT so the change is idempotent: it creates the record if missing and overwrites it otherwise, so reapplying it is harmless. Update apex, www, and any service subdomains in one batch to avoid split routing; on Route 53, a single change batch is applied all-or-nothing. The full command-level walkthrough with verification is in Reverting DNS Records During a Failed Cutover.
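One way to stage the change set ahead of time is to generate the dns-rollback.json file that the Route 53 command in Configs / Commands consumes. This is a sketch assuming A records only; the helper name, hostnames, IP, and TTL are hypothetical placeholders. AAAA and CNAME records would need their own entries (a CNAME's Value is a hostname, not an IP).

```shell
# make_change_batch IP TTL NAME...   (hypothetical helper)
# Prints a Route 53 change batch that UPSERTs an A record for every NAME,
# so apex, www, and service subdomains flip together in one request.
make_change_batch() {
  local ip="$1" ttl="$2"; shift 2
  local name sep=""
  printf '{\n  "Comment": "DNS rollback to legacy origin",\n  "Changes": ['
  for name in "$@"; do
    # One UPSERT entry per hostname; entries after the first are comma-separated.
    printf '%s\n    {"Action": "UPSERT", "ResourceRecordSet": {"Name": "%s", "Type": "A", "TTL": %d, "ResourceRecords": [{"Value": "%s"}]}}' \
      "$sep" "$name" "$ttl" "$ip"
    sep=","
  done
  printf '\n  ]\n}\n'
}
```

Usage: `make_change_batch 203.0.113.10 60 example.com. www.example.com. > dns-rollback.json`, then commit the file alongside the runbook so the revert is a single command.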

2. Force a Zone Serial Increment

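Bump the SOA serial on the primary nameserver so secondaries pull the reverted zone, either on NOTIFY or at their next refresh. A reverted record that never reaches the secondaries leaves a fraction of resolvers on the broken target. Confirm every NS host reports the new serial before trusting propagation. Managed providers such as Route 53 replicate changes across their own nameservers, so this step matters mainly for self-hosted primary/secondary setups.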
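A sketch of the serial check, assuming bash and dig, and a self-hosted zone whose NS records point at your own primary and secondaries (managed providers may not expose a meaningful serial). `dig +short SOA` prints "mname rname serial refresh retry expire minimum", so the serial is the third field. Both helper names are hypothetical.

```shell
# collect_serials ZONE -> prints "nameserver serial" for every NS of ZONE.
# A nameserver that does not answer prints an empty serial.
collect_serials() {
  local zone="$1" ns
  for ns in $(dig +short NS "$zone"); do
    echo "$ns $(dig @"$ns" +short SOA "$zone" | awk '{ print $3 }')"
  done
}

# serials_consistent: reads "nameserver serial" lines on stdin.
# Exit 0 if all serials match, 1 on any mismatch, 2 if there was no input.
serials_consistent() {
  awk 'NR == 1 { first = $2 } $2 != first { bad = 1 }
       END { if (NR == 0) exit 2; exit (bad ? 1 : 0) }'
}
```

Usage: `collect_serials example.com | tee /dev/stderr | serials_consistent && echo "all NS in sync"`. Printing the pairs as well as the verdict makes the lagging secondary easy to spot.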

3. Wait Out the TTL Cycle

Start the clock at the moment of reversion; resolvers holding the broken record release it only after their cached TTL expires. This is why the pre-cutover TTL governs everything; coordinate the wait with the cache mechanics in DNS Propagation Tracking. Do not declare recovery until at least one full TTL of the broken record has elapsed.
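The waiting rule reduces to simple arithmetic. In this sketch (helper names are hypothetical) OLD_TTL is the TTL the broken record was served with, since that is what resolvers cached; the TTL on the reverted record does not shorten the wait. The 2x figure matches the propagation-stall trigger under Rollback Triggers.

```shell
# recovery_times REVERT_EPOCH OLD_TTL
# Prints the earliest time recovery may be declared (one full TTL after the
# revert) and the point at which a still-stale resolver counts as a stall (2x TTL).
recovery_times() {
  local revert="$1" ttl="$2"
  echo "declare_after=$(( revert + ttl )) stall_after=$(( revert + 2 * ttl ))"
}

# wait_until EPOCH: sleep until the given Unix time; returns at once if it has passed.
wait_until() {
  local target="$1" now
  now=$(date +%s)
  if [ "$now" -lt "$target" ]; then
    sleep $(( target - now ))
  fi
}
```

Usage: record `REVERT_AT=$(date +%s)` when the UPSERT is applied, then `recovery_times "$REVERT_AT" 60`. With a TTL of 60 the wait is one minute; with 86400 it is a full day, which is the whole argument for lowering the TTL before cutover.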

4. Verify Across Global Resolvers

Query 1.1.1.1, 8.8.8.8, and 9.9.9.9 plus regional resolvers to confirm the legacy IP is returned everywhere. A single outlier usually means a resolver-enforced minimum TTL or a stale secondary. Cross-check that the restored origin returns HTTP 200 before standing down, coordinating with Migration Rollback Playbooks for the recovery broadcast.
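A sketch of the global check, assuming bash, dig, and curl, and a single legacy IP (203.0.113.10 is the placeholder used in Configs / Commands). It takes the last line of `dig +short`, which is the final A record after any CNAME chain, so hosts with several A records would need a set comparison instead. All helper names are hypothetical.

```shell
# query_resolvers HOST RESOLVER... -> prints "resolver answer" per resolver.
query_resolvers() {
  local host="$1" r; shift
  for r in "$@"; do
    echo "$r $(dig @"$r" "$host" A +short | tail -n 1)"
  done
}

# find_outliers EXPECTED_IP: reads "resolver answer" lines on stdin, prints each
# resolver whose answer differs, and exits non-zero if there was any outlier.
find_outliers() {
  awk -v want="$1" '$2 != want { print $1 " returned " ($2 == "" ? "nothing" : $2); bad = 1 }
                    END { exit (bad ? 1 : 0) }'
}

# http_status URL -> prints the HTTP status code (000 if the connection failed).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

Usage: `query_resolvers www.example.com 1.1.1.1 8.8.8.8 9.9.9.9 | find_outliers 203.0.113.10 && [ "$(http_status https://www.example.com/)" = 200 ] && echo "safe to stand down"`. Any printed outlier points to a resolver TTL floor or a stale secondary.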

Configs / Commands

AWS Route 53 — UPSERT records back to the legacy origin:

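# Apply the staged legacy values; UPSERT overwrites whatever is live
CHANGE_ID=$(aws route53 change-resource-record-sets \
  --hosted-zone-id Z123456ABCDEFG \
  --change-batch file://dns-rollback.json \
  --query 'ChangeInfo.Id' --output text)
# Block until the change reports INSYNC, then confirm the status and verify with dig
aws route53 wait resource-record-sets-changed --id "$CHANGE_ID"
aws route53 get-change --id "$CHANGE_ID" --query 'ChangeInfo.Status' --output text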

Cloudflare API — revert apex and www in two PATCH calls:

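# Restore legacy A records. APEX_REC/WWW_REC are Cloudflare record IDs, ZONE is the zone ID.
# TTL 60 keeps any follow-up change fast; resolvers still drop the broken record only when its old TTL expires.
for rec in "$APEX_REC:203.0.113.10" "$WWW_REC:203.0.113.10"; do
  id="${rec%%:*}"; ip="${rec##*:}"   # record ID before the colon, IPv4 address after it
  curl -sS --fail -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE/dns_records/$id" \
    -H "Authorization: Bearer $CF_TOKEN" -H "Content-Type: application/json" \
    --data "{\"type\":\"A\",\"content\":\"$ip\",\"ttl\":60}" || echo "PATCH failed for record $id" >&2
done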

Verification — confirm the reverted record across resolvers with dig:

# Returned IP should equal the legacy origin on every resolver
for r in 1.1.1.1 8.8.8.8 9.9.9.9; do
  echo "$r:"; dig @$r www.example.com A +noall +answer
done

Validation

Prove the revert took effect globally and the legacy origin is serving traffic before declaring recovery.

  • dig @1.1.1.1 www.example.com +short returns the legacy IP, and dig @1.1.1.1 www.example.com +noall +answer shows the TTL counting down on repeat queries (+short hides the TTL).
  • curl -sI https://www.example.com | head -1 returns HTTP 200 from the legacy origin.
  • dig +trace www.example.com confirms authoritative delegation and consistent NS records after the revert.
  • Authoritative query volume rises briefly, confirming resolvers are re-querying at the lowered TTL.

Rollback Triggers

These conditions force or extend the DNS reversion during a cutover.

  • Origin failure: legacy IP must be restored if the new origin returns 5xx above 2% sustained for 5 minutes.
  • Propagation stall: if any major resolver still returns the broken IP after 2x the TTL, force a serial bump and re-push.
  • Resolver TTL override: if ISP resolvers ignore the low TTL, route via CDN to a static legacy origin IP during the wait.
  • Secondary desync: roll back and re-sync immediately if authoritative secondaries report mismatched SOA serials.
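The origin-failure trigger can be evaluated mechanically. This sketch assumes a hypothetical input format of one line per minute, "total_requests error_5xx", oldest first, exported from whatever metrics system you use; the function name is also hypothetical. It fires only when the most recent WINDOW minutes are all above the threshold, so a single bad minute does not trigger a rollback.

```shell
# origin_failure_trigger [THRESHOLD_PCT] [WINDOW_MIN]   (defaults: 2% for 5 minutes)
# Exit 0 (trigger) when the last WINDOW per-minute samples all exceed
# THRESHOLD_PCT percent 5xx; exit 1 otherwise.
origin_failure_trigger() {
  awk -v pct="${1:-2}" -v win="${2:-5}" '
    {
      rate = ($1 > 0) ? 100 * $2 / $1 : 0       # 5xx percentage for this minute
      streak = (rate > pct) ? streak + 1 : 0    # consecutive minutes above threshold
    }
    END { exit (streak >= win ? 0 : 1) }'
}
```

Usage: `origin_failure_trigger 2 5 < minute_stats.txt && echo "ROLLBACK: revert DNS to legacy"`, where minute_stats.txt is a placeholder for your exported samples.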

FAQ

How long does a DNS rollback take to fully propagate? It takes up to one full TTL cycle from the moment you apply the revert. With a pre-cutover TTL of 60 seconds, most resolvers adopt the legacy record within a few minutes; with a TTL still at 86400 seconds, some clients stay on the broken target for nearly a day. The TTL set before migration is the single biggest factor.

Can I shorten the window by lowering the TTL during the incident? No. Lowering the TTL now only affects future cache entries; resolvers that already cached the record at the old TTL hold it until that value expires. This is precisely why TTL must be lowered before cutover, not in response to the fault.

Should I revert AAAA and CNAME records too, or just A? Revert every record type that changed for the affected hostnames. Leaving a stale AAAA record sends IPv6 clients to the broken origin even after the A record is fixed, producing intermittent failures that are hard to diagnose.

Do I need to purge anything besides DNS during a DNS rollback? Yes — purge the CDN edge cache for the affected hosts. DNS controls which origin resolvers reach, but a CDN may still serve cached objects generated by the broken origin. Trigger a cache purge via the provider API alongside the DNS revert.
