Using Weighted DNS for Gradual Traffic Migration
Problem Statement
A hard DNS flip sends 100% of traffic to a brand-new origin in one move, so any latent capacity, configuration, or data-layer issue hits every user at once. Weighted DNS lets you publish two records for the same hostname and split resolution by a configurable ratio, so you can send 5%, then 25%, then 100% of traffic to the new origin while watching real metrics. This page, part of Zero-Downtime Cutover Plans, shows how to run that gradual, canary-style cutover with Route 53 weighted routing or an equivalent provider.
When to Use This Approach
- The new origin is functionally identical to the old one for the same hostname and you can run both in parallel.
- You want to limit blast radius by exposing a small percentage of real traffic first.
- You need a rollback that is a single weight change rather than a re-publish of the original record.
- Your provider supports weighted routing (Route 53 weighted records, or equivalents on Cloudflare load balancing, NS1, or Azure Traffic Manager).
- You have lowered TTL ahead of the ramp (see How to Lower DNS TTL Before Migration) so weight changes take effect quickly.
Step-by-Step Instructions
1. Lower TTL So Weight Changes Take Effect Fast
Weighted routing only ramps as quickly as resolvers re-query, which is governed by TTL. Set a short TTL (60–120 seconds) on both weighted records before you start so each weight adjustment propagates within a couple of minutes.
# Confirm the short TTL is live before ramping
dig @1.1.1.1 example.com A +noall +answer
# example.com. 60 IN A 198.51.100.20
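To turn that visual check into a gate, a small helper can read the dig answer lines and fail when any TTL exceeds your ceiling. This is a minimal sketch: the function name ttl_ok is hypothetical, and the 120-second ceiling simply reuses the upper end of the range suggested above. Keep in mind that a recursive resolver reports the remaining cache lifetime, so the value you see may be lower than the configured TTL.

```shell
# ttl_ok MAX_TTL  (hypothetical helper, not part of dig or the AWS CLI)
# Reads `dig +noall +answer` lines on stdin. Succeeds only if at least one
# answer line was seen and every TTL (second column) is <= MAX_TTL.
ttl_ok() {
  awk -v max="$1" '
    NF >= 5 { seen = 1; if (($2 + 0) > (max + 0)) bad = 1 }
    END     { exit ((seen && !bad) ? 0 : 1) }'
}

# Example with a captured answer line; swap in a live dig pipeline when ramping
printf 'example.com. 60 IN A 198.51.100.20\n' | ttl_ok 120 && echo "TTL is short enough to start ramping"
```

Against a live resolver the same check would be: dig @1.1.1.1 example.com A +noall +answer | ttl_ok 120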
2. Create the Two Weighted Records
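Publish two records with the same name and type but different SetIdentifier values and weights. Start with the new origin at a small weight so it receives only a canary slice of traffic. If example.com already has a simple (non-weighted) A record, Route 53 rejects a weighted record with the same name and type, so delete the simple record and create the weighted pair in the same change batch.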
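Before publishing, it can help to check what share each weight implies. Route 53 answers each query from a record with probability equal to that record's weight divided by the sum of the weights in the group, so the weights do not have to add up to 100. The sketch below only does that arithmetic; weight_shares is a hypothetical helper name.

```shell
# weight_shares W1 W2 ...  (hypothetical helper)
# Prints the expected share of DNS responses for each weight in one group,
# using share = weight / sum of weights.
weight_shares() {
  printf '%s\n' "$@" | awk '
    { w[NR] = $1 + 0; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        if (total > 0) printf "weight %d -> %.1f%%\n", w[i], 100 * w[i] / total
        else           printf "weight %d -> equal share (all weights are 0)\n", w[i]
      }
    }'
}

weight_shares 95 5   # the canary split used in this step
weight_shares 19 1   # same ratio expressed with smaller weights
```

Both calls describe the same 95/5 split, which is why only the ratio between the two weights matters.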
# Old origin — weight 95
aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
"Changes": [{ "Action": "UPSERT", "ResourceRecordSet": {
"Name": "example.com.", "Type": "A", "TTL": 60,
"SetIdentifier": "old-origin", "Weight": 95,
"ResourceRecords": [{"Value": "198.51.100.20"}] }}]}'
# New origin — weight 5 (the canary)
aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
"Changes": [{ "Action": "UPSERT", "ResourceRecordSet": {
"Name": "example.com.", "Type": "A", "TTL": 60,
"SetIdentifier": "new-origin", "Weight": 5,
"ResourceRecords": [{"Value": "203.0.113.10"}] }}]}'
3. Validate the Canary Slice
With ~5% of resolutions hitting the new origin, watch new-origin error rates, latency, and application logs against the old-origin baseline. Hold at this weight long enough to cover real traffic patterns before ramping further.
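# Sample resolution repeatedly to confirm the ~95/5 split is observable.
# A recursive resolver can serve a cached answer for up to the TTL, so treat a short burst as a rough check.
for i in $(seq 1 20); do dig @8.8.8.8 example.com A +short; done | sort | uniq -c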
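Raw counts are easier to compare against the configured weights once they are converted to percentages. The following is a minimal sketch that post-processes the sort | uniq -c output; split_pct is a hypothetical name, and the sample input reuses the 19/1 result from the worked example on this page.

```shell
# split_pct  (hypothetical helper)
# Reads `sort | uniq -c` output (count, then address) on stdin and prints
# each address with its share of the sampled answers.
split_pct() {
  awk '
    NF >= 2 { count[$2] += $1; total += $1 }
    END     { for (ip in count) printf "%s %.1f%%\n", ip, 100 * count[ip] / total }' | sort
}

# Example with captured counts; in practice, pipe the dig loop into split_pct
printf '  19 198.51.100.20\n   1 203.0.113.10\n' | split_pct
```

In a live check the pipeline would end with: ... | sort | uniq -c | split_pct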
4. Ramp the Weight Upward in Stages
If the canary stays healthy, increase the new-origin weight in stages, for example 5 → 25 → 50 → 100, pausing at each stage to validate. Update both records in the same change batch each time so the ratio is explicit and the two weights change together.
# Stage to 25% new origin: old weight 75, new weight 25
aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
"Changes": [
{ "Action": "UPSERT", "ResourceRecordSet": { "Name": "example.com.", "Type": "A",
"TTL": 60, "SetIdentifier": "old-origin", "Weight": 75,
"ResourceRecords": [{"Value": "198.51.100.20"}] }},
{ "Action": "UPSERT", "ResourceRecordSet": { "Name": "example.com.", "Type": "A",
"TTL": 60, "SetIdentifier": "new-origin", "Weight": 25,
"ResourceRecords": [{"Value": "203.0.113.10"}] }}
]}'
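Hand-editing two weights per stage is easy to get wrong, so one option is to generate the change batch from a single percentage. This is a sketch under stated assumptions: ramp_batch is a hypothetical helper, and the record name, TTL, SetIdentifier values, and addresses are copied from the examples on this page rather than discovered from your zone.

```shell
# ramp_batch NEW_PCT  (hypothetical helper)
# Prints a Route 53 change batch that gives the new origin weight NEW_PCT
# and the old origin the remainder (100 - NEW_PCT).
ramp_batch() {
  rb_new="$1"
  case "$rb_new" in
    ''|*[!0-9]*|0?*) echo "usage: ramp_batch <0-100>" >&2; return 1 ;;
  esac
  [ "$rb_new" -le 100 ] || { echo "percentage must be 0-100" >&2; return 1; }
  rb_old=$((100 - rb_new))
  cat <<EOF
{"Changes": [
  {"Action": "UPSERT", "ResourceRecordSet": {"Name": "example.com.", "Type": "A",
    "TTL": 60, "SetIdentifier": "old-origin", "Weight": ${rb_old},
    "ResourceRecords": [{"Value": "198.51.100.20"}]}},
  {"Action": "UPSERT", "ResourceRecordSet": {"Name": "example.com.", "Type": "A",
    "TTL": 60, "SetIdentifier": "new-origin", "Weight": ${rb_new},
    "ResourceRecords": [{"Value": "203.0.113.10"}]}}
]}
EOF
}

# Print the batch for the 50% stage so it can be reviewed before applying
ramp_batch 50
```

Once the output looks right, it can be passed straight to the CLI: aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch "$(ramp_batch 50)"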
5. Complete or Roll Back the Cutover
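To finish, set the old-origin weight to 0 and keep the new-origin weight positive so all traffic resolves to the new origin. To roll back during the ramp, do the inverse: drop the new-origin weight to 0 while the old-origin weight stays positive. Do not leave both weights at 0, because Route 53 then answers from every record in the group with equal probability; once the old origin is at 0, a rollback must also restore its weight. A weighted ramp is a per-host strategy and pairs well with the environment-swap pattern in Implementing Blue-Green Deployments for Site Migrations.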
# Finish: old origin to 0, new origin carries everything
aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
"Changes": [
{ "Action": "UPSERT", "ResourceRecordSet": { "Name": "example.com.", "Type": "A",
"TTL": 60, "SetIdentifier": "old-origin", "Weight": 0,
"ResourceRecords": [{"Value": "198.51.100.20"}] }},
{ "Action": "UPSERT", "ResourceRecordSet": { "Name": "example.com.", "Type": "A",
"TTL": 60, "SetIdentifier": "new-origin", "Weight": 100,
"ResourceRecords": [{"Value": "203.0.113.10"}] }}
]}'
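Because a pair of zero weights splits traffic evenly instead of draining either side, a pre-flight check before each change can catch that mistake. The sketch below assumes a two-record group; weights_ok is a hypothetical helper, and the 0 to 255 bound is Route 53's allowed weight range.

```shell
# weights_ok OLD_WEIGHT NEW_WEIGHT  (hypothetical pre-flight check)
# Succeeds only if both weights are integers from 0 to 255 and at least one
# is positive. With every weight at 0, Route 53 answers from all records with
# equal probability, which is neither a cutover nor a rollback.
weights_ok() {
  for wk_w in "$1" "$2"; do
    case "$wk_w" in
      ''|*[!0-9]*) echo "weight must be a non-negative integer: '$wk_w'" >&2; return 1 ;;
    esac
    [ "$wk_w" -le 255 ] || { echo "weight above 255: $wk_w" >&2; return 1; }
  done
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "both weights are 0: traffic would split evenly" >&2
    return 1
  fi
}

weights_ok 0 100 && echo "cutover weights look valid"
weights_ok 100 0 && echo "rollback weights look valid"
```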
Worked Example
A retailer migrates example.com from 198.51.100.20 to a new platform at 203.0.113.10. TTL is pre-lowered to 60s.
Day 1, canary at weight 5. Sampling 8.8.8.8 twenty times shows the split is live:
$ for i in $(seq 1 20); do dig @8.8.8.8 example.com A +short; done | sort | uniq -c
19 198.51.100.20
1 203.0.113.10
New-origin error rate sits at 0.1%, matching the old origin, so the team ramps to 25%, then 50% over the next two days, validating at each stage. On day 4 they cut old-origin weight to 0:
$ for i in $(seq 1 20); do dig @8.8.8.8 example.com A +short; done | sort | uniq -c
20 203.0.113.10
All sampled resolutions now return the new origin. Had the error rate spiked during the ramp, setting the new-origin weight to 0 would have sent new resolutions back to the old origin within about one TTL window.
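The promote-or-revert call the team makes at each stage can be written down as an explicit rule, which keeps the decision consistent between stages. This is a sketch only: ramp_decision is a hypothetical helper, the error rates would come from your own monitoring, and the 0.05 percentage-point allowance is an assumed threshold rather than a recommendation.

```shell
# ramp_decision NEW_ERR_PCT BASELINE_ERR_PCT MAX_DELTA_PCT  (hypothetical gate)
# Prints "promote" when the new origin's error rate exceeds the old-origin
# baseline by no more than MAX_DELTA_PCT percentage points, else "rollback".
ramp_decision() {
  awk -v n="$1" -v b="$2" -v d="$3" 'BEGIN {
    if ((n + 0) - (b + 0) <= (d + 0)) print "promote"; else print "rollback"
  }'
}

# Day 1 of the example: both origins at 0.1%, assumed allowance of 0.05 points
ramp_decision 0.1 0.1 0.05
```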
Verification
# Sampled distribution should track the configured weights
for i in $(seq 1 50); do dig @8.8.8.8 example.com A +short; done | sort | uniq -c
# Confirm the weighted set itself is published correctly at the provider
aws route53 list-resource-record-sets --hosted-zone-id Z123EXAMPLE \
--query "ResourceRecordSets[?Name=='example.com.']"
Each ramp stage passes when the sampled resolution ratio approximates the configured weights and new-origin error rate and latency stay within your thresholds; the cutover is complete when 100% of samples return the new origin and the old-origin weight is 0.
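The "approximates the configured weights" criterion can be made concrete with a tolerance. The helper below is a minimal sketch: share_within is a hypothetical name, the tolerance is yours to choose, and the sample input reuses the 19/1 counts from the worked example.

```shell
# share_within ADDRESS EXPECTED_PCT TOLERANCE_PCT  (hypothetical check)
# Reads `sort | uniq -c` output on stdin. Succeeds when ADDRESS's observed
# share is within TOLERANCE_PCT percentage points of EXPECTED_PCT.
share_within() {
  awk -v ip="$1" -v want="$2" -v tol="$3" '
    NF >= 2 { total += $1; if ($2 == ip) hits += $1 }
    END {
      if (total == 0) exit 1
      diff = 100 * hits / total - want
      if (diff < 0) diff = -diff
      exit ((diff <= tol + 0) ? 0 : 1)
    }'
}

# Canary stage: expect about 5% on the new origin, allow 5 points either way
printf '  19 198.51.100.20\n   1 203.0.113.10\n' | share_within 203.0.113.10 5 5 \
  && echo "observed split is within tolerance"
```

A non-zero exit status means the sampled share is outside the tolerance, or that no answers were read at all.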
FAQ
Why don’t I see exactly the weight ratio in my dig samples? Weighted routing is probabilistic per query and influenced by resolver caching, so a small sample drifts from the configured ratio. Sample 50–100 times across multiple resolvers and the observed distribution will converge on the weights; very short windows are noisy by nature.
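How fast can I roll back a weighted cutover? During the ramp, rollback is one change: set the new-origin weight to 0 and leave the old-origin weight positive. After the final stage, when the old origin is already at 0, restore its weight in the same change so the group is never left with all weights at 0. Because you lowered TTL before starting, resolvers re-query within that TTL window, typically one to two minutes, so a bad stage drains quickly and without re-publishing the original A record from scratch.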
Can I use weighted DNS without Route 53? Yes. Cloudflare load balancing, NS1, and Azure Traffic Manager all offer weighted or proportional routing built on the same principle: two targets for one hostname, ramped by ratio. The commands differ, but the staged 5 → 25 → 50 → 100 ramp and the lowered-TTL prerequisite are the same.
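On the sampling question above, a rough noise estimate shows why 20 queries are too few to judge a 5% canary. The sketch below uses the binomial standard error, 100 * sqrt(p * (1 - p) / n), and assumes every sample is an independent draw, which resolver caching violates in practice, so treat the result as a lower bound on the noise. The name sample_noise is hypothetical.

```shell
# sample_noise SHARE_PCT SAMPLES  (hypothetical helper)
# Prints one standard error, in percentage points, for a share measured from
# SAMPLES independent draws: 100 * sqrt(p * (1 - p) / n).
sample_noise() {
  awk -v pct="$1" -v n="$2" 'BEGIN {
    if (n <= 0) exit 1
    p = pct / 100
    printf "%.1f\n", 100 * sqrt(p * (1 - p) / n)
  }'
}

sample_noise 5 20    # 5% canary measured with 20 queries
sample_noise 5 100   # same canary measured with 100 queries
```

With 20 queries the standard error is about as large as the 5% share itself, while 100 queries roughly halves it, which is consistent with the 50 to 100 sample guidance.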
Related
- Zero-Downtime Cutover Plans
- Implementing Blue-Green Deployments for Site Migrations
- How to Lower DNS TTL Before Migration