Reverting DNS Records During a Failed Cutover
Problem Statement
The cutover is live, the new origin is failing, and every minute on the broken target costs traffic and revenue. You need to revert the A, AAAA, and CNAME records to the legacy origin immediately, then confirm the change has reached resolvers worldwide before standing down. This is the execution-level companion to DNS Rollback Procedures, focused purely on the commands and TTL-aware timing of the reversion itself.
When to Use This Approach
- The new origin returns sustained 5xx errors and a redirect-layer swap will not fix an origin fault.
- DNS was pointed at a new IP during cutover and you have the legacy values staged.
- Your pre-cutover TTL is low (60–300 s), so reversion can take effect in minutes.
- Edge or load-balancer health checks cannot fail over fast enough on their own.
- You need a clean, verifiable return to the known-good origin rather than a forward patch.
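You can check the TTL criterion above from the command line before choosing this path. The following is a minimal sketch that assumes a POSIX shell with dig installed. The function names are hypothetical, and the 300-second default ceiling is simply the top of the range above. Point it at one of the zone's authoritative nameservers, because a recursive resolver reports the remaining cached TTL, which can be lower than the configured value.

```shell
# Hypothetical helpers: decide whether a record's TTL is low enough for a DNS
# revert to take effect within minutes.

# Print the TTL (second field) of the first line of `dig +noall +answer` output.
answer_ttl() {
  printf '%s\n' "$1" | awk 'NF >= 5 { print $2; exit }'
}

# Succeed if TTL is a number no greater than CEILING (default 300 seconds).
ttl_allows_fast_revert() {
  local ttl="$1" ceiling="${2:-300}"
  case "$ttl" in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric TTL: cannot judge, treat as unsafe
  esac
  [ "$ttl" -le "$ceiling" ]
}

# Query NAMESERVER for HOST's A record and report the verdict (needs network access).
# Usage: check_revert_ttl HOST NAMESERVER
check_revert_ttl() {
  local host="$1" nameserver="$2" answer ttl
  answer=$(dig @"$nameserver" "$host" A +noall +answer)
  ttl=$(answer_ttl "$answer")
  if ttl_allows_fast_revert "$ttl"; then
    echo "OK: $host TTL is ${ttl}s"
  else
    echo "WARN: $host TTL is ${ttl:-unknown}s; a revert may take longer to reach clients"
    return 1
  fi
}
```

For example, check_revert_ttl www.example.com ns1.provider.com prints the configured TTL and returns non-zero if it is above 300 seconds.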
Step-by-Step Instructions
1. Apply the Staged Legacy Records
Push the legacy A/AAAA/CNAME values with an idempotent UPSERT so re-running the command is safe under pressure.
# Route 53: UPSERT all changed records back to legacy in one batch
aws route53 change-resource-record-sets \
--hosted-zone-id Z123456ABCDEFG \
--change-batch file://dns-rollback.json # contains UPSERT for apex, www, AAAA, CNAME
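The batch file is plain JSON in Route 53's change-batch format. The sketch below shows one way to generate it and then wait for the change to finish. It assumes AWS CLI v2 and uses the legacy values from the worked example below (www.example.com, 203.0.113.10, TTL 60). write_rollback_batch and apply_rollback_batch are hypothetical helper names. The wait step relies on aws route53 wait resource-record-sets-changed, which polls the change until Route 53 reports it as INSYNC.

```shell
# Hypothetical helper: print a Route 53 change batch that UPSERTs each
# NAME TYPE VALUE triple back to its legacy value with a shared TTL.
# Usage: write_rollback_batch TTL NAME TYPE VALUE [NAME TYPE VALUE ...]
write_rollback_batch() {
  local ttl="$1" sep=""
  shift
  printf '{\n  "Comment": "Revert cutover to legacy origin",\n  "Changes": ['
  while [ "$#" -ge 3 ]; do
    printf '%s\n    {"Action": "UPSERT", "ResourceRecordSet": {"Name": "%s", "Type": "%s", "TTL": %s, "ResourceRecords": [{"Value": "%s"}]}}' \
      "$sep" "$1" "$2" "$ttl" "$3"
    sep=","
    shift 3
  done
  printf '\n  ]\n}\n'
}

# Hypothetical helper: submit the batch, then block until Route 53 reports the
# change as INSYNC (needs the AWS CLI and credentials).
apply_rollback_batch() {
  local zone_id="$1" batch_file="$2" change_id
  change_id=$(aws route53 change-resource-record-sets \
    --hosted-zone-id "$zone_id" \
    --change-batch "file://$batch_file" \
    --query 'ChangeInfo.Id' --output text) || return 1
  aws route53 wait resource-record-sets-changed --id "$change_id"
}
```

For the worked example, run write_rollback_batch 60 www.example.com A 203.0.113.10 > dns-rollback.json, then apply_rollback_batch Z123456ABCDEFG dns-rollback.json. The generator only handles simple records with literal values. Route 53 alias records at the apex use an AliasTarget instead of ResourceRecords and would need a different entry.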
2. Increment the Zone Serial and Confirm Propagation to Secondaries
Bump the SOA serial so authoritative secondaries pull the reverted zone. Until every authoritative server serves the reverted record, resolvers that happen to query a lagging server still receive the broken answer.
# Confirm every authoritative NS reports the same, new SOA serial
for ns in ns1.provider.com ns2.provider.com; do
printf '%s ' "$ns"; dig @"$ns" example.com SOA +short | awk '{print $3}' # third field is the serial; it must match across all nameservers
done
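The serial comparison can return a pass/fail result instead of relying on someone to read the output. This is a sketch that assumes dig and a POSIX shell, and the helper names are hypothetical. A nameserver that returns no SOA answer counts as a failure rather than being skipped.

```shell
# Hypothetical helpers for the serial check above.

# Print the serial (third field) from `dig SOA +short` output.
soa_serial() {
  printf '%s\n' "$1" | awk 'NF >= 7 { print $3; exit }'
}

# Succeed only if at least one serial is given and all serials are identical.
serials_match() {
  [ "$#" -gt 0 ] || return 1
  local first="$1" s
  [ -n "$first" ] || return 1
  for s in "$@"; do
    [ "$s" = "$first" ] || return 1
  done
}

# Query each nameserver for ZONE's SOA and compare serials (needs network access).
# Usage: check_zone_serials ZONE NAMESERVER...
check_zone_serials() {
  local zone="$1" ns serial serials=""
  shift
  for ns in "$@"; do
    serial=$(soa_serial "$(dig @"$ns" "$zone" SOA +short)")
    echo "$ns: ${serial:-no SOA answer}"
    [ -n "$serial" ] || return 1   # a silent nameserver is a failure, not a pass
    serials="$serials $serial"
  done
  # Serials are plain integers, so splitting them into separate arguments is safe.
  serials_match $serials
}
```

check_zone_serials example.com ns1.provider.com ns2.provider.com returns zero only when both nameservers answer with the same serial. It confirms that the servers agree, but it does not confirm that the serial advanced. Compare against the pre-revert value separately.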
3. Verify the Revert Across Global Resolvers With dig
Query multiple public and regional resolvers; the answer must be the legacy IP everywhere, with a TTL that counts down on repeat queries.
# The returned IP must equal the legacy origin on every resolver
for r in 1.1.1.1 8.8.8.8 9.9.9.9 208.67.222.222; do
echo "== $r =="; dig @$r www.example.com A +noall +answer
done
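The pass rule for this step has two parts: the legacy IP on every resolver, and a TTL that counts down. Both can be written as small predicates over the +noall +answer output. This is a minimal sketch; the helper names are hypothetical, and the addresses come from the worked example below.

```shell
# Hypothetical helpers for judging `dig +noall +answer` output.

# Print the address of every A/AAAA record in the answer section, one per line
# (CNAME lines are skipped because field 4 is the record type).
answer_addresses() {
  printf '%s\n' "$1" | awk '$4 == "A" || $4 == "AAAA" { print $5 }'
}

# Succeed if the answer has at least one address and every address is LEGACY_IP.
answer_is_legacy() {
  local answer="$1" legacy_ip="$2" addrs
  addrs=$(answer_addresses "$answer")
  [ -n "$addrs" ] || return 1
  ! printf '%s\n' "$addrs" | grep -qvxF "$legacy_ip"
}

# Succeed if the second TTL is lower than the first, i.e. the resolver is
# serving a cached copy that is ageing normally.
ttl_counting_down() {
  [ "$2" -lt "$1" ]
}

# Check every resolver once and print a line per resolver (needs network access).
# Usage: verify_resolvers HOST LEGACY_IP RESOLVER...
verify_resolvers() {
  local host="$1" legacy_ip="$2" r rc=0
  shift 2
  for r in "$@"; do
    if answer_is_legacy "$(dig @"$r" "$host" A +noall +answer)" "$legacy_ip"; then
      echo "ok   $r"
    else
      echo "FAIL $r"; rc=1
    fi
  done
  return "$rc"
}
```

For example: verify_resolvers www.example.com 203.0.113.10 1.1.1.1 8.8.8.8 9.9.9.9 208.67.222.222. To check the countdown, query the same resolver twice a few seconds apart and pass both TTLs to ttl_counting_down. Large anycast resolvers can route repeat queries to different caches, so one non-decreasing pair does not by itself prove a problem.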
4. Confirm the Legacy Origin Is Serving and Purge the Edge
Once resolvers return the legacy IP, confirm the origin answers HTTP 200 and purge any CDN cache built from the broken origin.
# Confirm legacy origin health, then purge edge so no broken objects remain
curl -sI https://www.example.com | head -1 # expect HTTP/2 200
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
-H "Authorization: Bearer $CF_TOKEN" -H "Content-Type: application/json" \
--data '{"purge_everything":true}'
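Before purging, it can help to confirm the legacy origin directly, independent of any resolver cache, and then to check that the purge call actually succeeded. This is a sketch that assumes curl and the worked example's legacy IP. curl's --resolve option pins the hostname to a given address for a single request, and the Cloudflare API reports the outcome in a top-level "success" field. The function names are hypothetical, and $ZONE and $CF_TOKEN are the same variables used above.

```shell
# Hypothetical helpers for step 4.

# Print the HTTP status code the legacy origin returns, bypassing DNS by pinning
# HOST:443 to LEGACY_IP for this request only (needs network access).
legacy_status() {
  local host="$1" legacy_ip="$2"
  curl -s -o /dev/null -w '%{http_code}' \
    --resolve "$host:443:$legacy_ip" "https://$host/"
}

# Succeed if a Cloudflare API response body reports "success": true.
purge_succeeded() {
  printf '%s\n' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Purge only after the pinned legacy check returns 200 (needs network, $ZONE, $CF_TOKEN).
purge_if_legacy_healthy() {
  local host="$1" legacy_ip="$2" status body
  status=$(legacy_status "$host" "$legacy_ip")
  [ "$status" = 200 ] || { echo "legacy origin returned $status; not purging"; return 1; }
  body=$(curl -s -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
    -H "Authorization: Bearer $CF_TOKEN" -H "Content-Type: application/json" \
    --data '{"purge_everything":true}')
  purge_succeeded "$body"
}
```

For the worked example, purge_if_legacy_healthy www.example.com 203.0.113.10 purges only if the pinned request returns 200. Pinning with --resolve tests the origin itself, so a 200 here does not prove that resolvers have switched. Step 3 still covers that.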
Worked Example
At 02:14 UTC, www.example.com was cut over from the legacy origin 203.0.113.10 to a new origin 198.51.100.20. Within four minutes the 5xx rate crossed 6%, breaching the abort threshold. The pre-cutover TTL had been lowered to 60 seconds two days earlier.
Before reversion, dig @1.1.1.1 www.example.com +short returned 198.51.100.20, and curl -sI showed HTTP/2 503. The on-call engineer ran the Route 53 UPSERT at 02:19. The change reached INSYNC in 38 seconds, and the SOA serial advanced from 2026061901 to 2026061902 on both nameservers. By 02:21 all four resolvers returned 203.0.113.10 with a decrementing TTL, and curl -sI https://www.example.com returned HTTP/2 200. A full edge purge cleared the cached 503 responses. Total recovery was 7 minutes, from the 02:14 cutover to verified legacy serving at 02:21. That time was bounded by the 60-second TTL set ahead of time, as covered in TTL Optimization Strategies.
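The abort decision in this example comes down to arithmetic on request counts. This is a minimal sketch that assumes your monitoring can export 5xx and total request counts for the observation window. The helper names are hypothetical, and the 6% default is only this example's threshold.

```shell
# Hypothetical helper: succeed if the 5xx share of requests meets or exceeds the
# abort threshold in percent (default 6, from the worked example).
# awk handles the floating-point comparison; zero total requests never breaches.
# Usage: breaches_abort_threshold ERRORS_5XX TOTAL_REQUESTS [THRESHOLD_PCT]
breaches_abort_threshold() {
  awk -v e="$1" -v t="$2" -v p="${3:-6}" \
    'BEGIN { if (t <= 0) exit 1; exit !((e / t) * 100 >= p) }'
}

# Print the 5xx rate as a percentage to one decimal place, for incident notes.
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.1f\n", (e / t) * 100 }'
}
```

With 61 errors out of 1,000 requests, breaches_abort_threshold 61 1000 succeeds and error_rate_pct 61 1000 prints 6.1.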
Verification
- dig @1.1.1.1 www.example.com +short returns the legacy IP, and repeated dig @1.1.1.1 www.example.com +noall +answer queries show the TTL decreasing, confirming the resolver is caching the reverted record. (+short alone does not display the TTL.)
- dig +trace www.example.com shows consistent authoritative delegation and matching NS records after the revert.
- Track adoption across global points of presence using DNS Propagation Tracking until no resolver returns the broken IP.
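The last check can be scripted as a polling loop that stops once no resolver returns the broken IP. This is a sketch that assumes dig and a POSIX shell. The helper names are hypothetical, and the resolver list, round limit, and interval are illustrative values.

```shell
# Hypothetical helpers: poll resolvers until none still return the broken IP.

# Read "RESOLVER ADDRESS" lines on stdin; print each resolver still returning BROKEN_IP.
resolvers_on_broken_ip() {
  awk -v broken="$1" '$2 == broken { print $1 }' | sort -u
}

# Query each resolver once and emit "RESOLVER ADDRESS" lines (needs network access).
# A resolver that does not answer emits no address and is therefore not reported.
sample_resolvers() {
  local host="$1" r ip
  shift
  for r in "$@"; do
    for ip in $(dig @"$r" "$host" A +short); do
      echo "$r $ip"
    done
  done
}

# Repeat every INTERVAL seconds, up to ROUNDS times, until no resolver returns BROKEN_IP.
# Usage: wait_for_revert HOST BROKEN_IP ROUNDS INTERVAL RESOLVER...
wait_for_revert() {
  local host="$1" broken="$2" rounds="$3" interval="$4" i=1 stuck
  shift 4
  while [ "$i" -le "$rounds" ]; do
    stuck=$(sample_resolvers "$host" "$@" | resolvers_on_broken_ip "$broken")
    if [ -z "$stuck" ]; then
      echo "round $i: no resolver returns $broken"
      return 0
    fi
    echo "round $i: still on $broken:" $stuck
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

For the worked example: wait_for_revert www.example.com 198.51.100.20 10 30 1.1.1.1 8.8.8.8 9.9.9.9 208.67.222.222.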
FAQ
How quickly will the reverted record take effect? Within roughly one TTL of the moment the change is live on every authoritative server. With a 60-second pre-cutover TTL, most resolvers serve the legacy IP within two to three minutes; with a high TTL still in place, some clients remain on the broken origin until that cache expires.
Why must I increment the SOA serial? Authoritative secondaries only pull a zone update when the serial advances. If you revert the record on the primary but the serial does not change, secondaries keep answering with the broken IP and strand a fraction of traffic. Their scheduled refresh will not help either, because a refresh compares serials and finds nothing new.
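The serials in the worked example follow the common YYYYMMDDnn convention: the date plus a two-digit revision. The sketch below computes the next serial under that assumption. The function name is hypothetical, and zones that use plain counters or Unix timestamps need a different rule.

```shell
# Hypothetical helper: print the next YYYYMMDDnn serial after CURRENT.
# TODAY defaults to the current UTC date as YYYYMMDD.
# Usage: next_soa_serial CURRENT [TODAY]
next_soa_serial() {
  local current="$1" today="${2:-$(date -u +%Y%m%d)}"
  local candidate="${today}01"
  # Same day, or a serial already ahead of today: bump by one so it never goes backwards.
  if [ "$current" -ge "$candidate" ]; then
    echo $((current + 1))
  else
    echo "$candidate"
  fi
}
```

next_soa_serial 2026061901 20260619 prints 2026061902, which matches the bump in the worked example.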
Do I need to revert AAAA records as well as A records? Yes. If you only fix the A record, IPv6-capable clients following a stale AAAA record still reach the broken origin, producing intermittent failures that look random. Always revert every record type that changed for the affected host.
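Checking both address families in one pass makes a forgotten AAAA record visible. This is a minimal sketch that assumes dig, and the helper names are hypothetical. This section does not give the legacy IPv6 address, so 2001:db8::10 below is a documentation-range placeholder.

```shell
# Hypothetical helpers: compare a host's A and AAAA answers against the legacy
# addresses, so a stale AAAA record cannot hide behind a fixed A record.

# Succeed if ANSWERS (one address per line) contains only EXPECTED.
# An empty EXPECTED means the legacy host had no such record: expect no answers.
only_expected_address() {
  if [ -z "$2" ]; then
    [ -z "$1" ]
    return
  fi
  [ -n "$1" ] || return 1
  ! printf '%s\n' "$1" | grep -qvxF "$2"
}

# Check both record types against one resolver (needs network access).
# Usage: check_dual_stack_revert HOST LEGACY_V4 LEGACY_V6 [RESOLVER]
check_dual_stack_revert() {
  local host="$1" resolver="${4:-1.1.1.1}" rc=0 type expected addrs
  for type in A AAAA; do
    if [ "$type" = A ]; then expected="$2"; else expected="$3"; fi
    # Drop CNAME targets (they end in a dot) so only addresses are compared.
    addrs=$(dig @"$resolver" "$host" "$type" +short | grep -v '\.$' || true)
    if only_expected_address "$addrs" "$expected"; then
      echo "$type ok"
    else
      echo "$type MISMATCH: ${addrs:-no records}"; rc=1
    fi
  done
  return "$rc"
}
```

check_dual_stack_revert www.example.com 203.0.113.10 2001:db8::10 1.1.1.1 reports each family separately. If the legacy origin had no AAAA record, pass an empty string as the IPv6 argument.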
Related
- DNS Rollback Procedures
- TTL Optimization Strategies
- DNS Propagation Tracking
- Migration Rollback Playbooks