Pre-Migration Auditing & Risk Assessment

Four parallel audit workstreams feed one decision gate; a no-go routes failed checks back into remediation before any DNS change.

Executive Summary

A successful migration is decided before a single record is changed. This playbook standardises the pre-launch diagnostic process so engineering, SEO, QA, and leadership operate from one verifiable picture of the legacy estate. It captures a technical baseline, quantifies organic and revenue value per URL, scores the probability and impact of each failure mode, and locks in the sign-offs that gate the DNS flip. The outcome is a documented go/no-go decision backed by hard thresholds rather than optimism — every claim of readiness traces back to a number you can re-measure after launch. Run this 60–90 days ahead of the planned cutover and treat its checklists as release gates, not advisory notes.

Prerequisites

Read access to legacy production, staging, and any pre-production environments (origin IPs, not just CDN edge).
Enterprise crawler licences with JavaScript rendering: Screaming Frog SEO Spider, Sitebulb, or JetOctopus.
Google Analytics 4 admin and Google Search Console owner permissions on the legacy property; BigQuery export enabled if available.
30–90 days of raw server access logs (Nginx/Apache) and CDN logs for the same window.
DNS provider control-panel or API access (records, current TTLs, zone export).
CDN cache-invalidation and Page/Cache Rule permissions (Cloudflare, Fastly, Akamai).
A cross-functional contact directory with named owners for Engineering, SEO, Content, QA, and a single accountable launch lead.
A version-controlled repository to store every artefact: crawl exports, redirect maps, the risk matrix, and the go/no-go record.

Step-by-Step Execution

1. Establish the Crawl Baseline

Capture a complete, unaltered snapshot of the legacy estate as the reference point every later validation is diffed against. Run a full render-enabled crawl plus a log-driven URL list so JavaScript-only links and orphaned-but-trafficked paths are both covered, then archive status codes, canonical tags, and internal link counts. Drive this through Crawl Baseline Generation and store the exports in version control with the crawl date and source environment recorded.
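The "diffed against" step can be sketched as a small comparison function. This is a minimal illustration, assuming each crawl export has already been loaded into a Map keyed by URL path; the record shape ({ status, canonical }) and the sample paths are hypothetical, not the export format of any particular crawler.

```javascript
// Illustrative only: the record shape { status, canonical } and the sample
// paths are assumptions, not a real crawler export format.
function diffAgainstBaseline(baseline, current) {
  const changes = [];
  for (const [url, before] of baseline) {
    const after = current.get(url);
    if (!after) {
      // URL was in the frozen baseline but is absent from the later crawl.
      changes.push({ url, field: 'presence', before: 'crawled', after: 'missing' });
      continue;
    }
    for (const field of ['status', 'canonical']) {
      if (before[field] !== after[field]) {
        changes.push({ url, field, before: before[field], after: after[field] });
      }
    }
  }
  return changes;
}

const baselineCrawl = new Map([
  ['/shoes', { status: 200, canonical: '/shoes' }],
  ['/sale', { status: 200, canonical: '/sale' }],
]);
const stagingCrawl = new Map([
  ['/shoes', { status: 200, canonical: '/footwear' }],
]);
console.log(diffAgainstBaseline(baselineCrawl, stagingCrawl));
```

Keeping the diff per field, rather than a single changed/unchanged flag, makes it easier to route each difference to the workstream that owns it.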

2. Quantify Traffic and Revenue Value per URL

Not all URLs deserve equal migration effort; rank them by what they earn. Join the crawl inventory to GA4 sessions, conversions, and assisted revenue, and to Search Console clicks and impressions, so redirect priority follows business value rather than alphabetical order. Use Traffic & Conversion Mapping to tier the inventory and flag the top revenue-driving paths that must never 404 or chain.
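One way to picture the join and tiering is the sketch below. It assumes the GA4 and Search Console figures have already been merged into a Map keyed by URL path; the field names (revenue, clicks) and the tier cut-offs (top 20%, next 30%, remainder) are illustrative assumptions, not values prescribed by this playbook.

```javascript
// Hypothetical input shapes: the field names and tier cut-offs below are
// assumptions chosen for illustration.
function tierByValue(crawlUrls, metricsByUrl) {
  const rows = crawlUrls.map((url) => {
    const m = metricsByUrl.get(url) || {};
    return { url, revenue: m.revenue || 0, clicks: m.clicks || 0 };
  });
  // Rank by revenue, using Search Console clicks as the tie-breaker.
  rows.sort((a, b) => b.revenue - a.revenue || b.clicks - a.clicks);
  // Tier 1 = top quintile of ranked URLs, tier 2 = next 30%, tier 3 = the rest.
  return rows.map((row, i) => {
    const rankPct = (i + 1) / rows.length;
    const tier = rankPct <= 0.2 ? 1 : rankPct <= 0.5 ? 2 : 3;
    return { ...row, tier };
  });
}

const sampleMetrics = new Map([
  ['/checkout', { revenue: 900, clicks: 10 }],
  ['/shoes', { revenue: 100, clicks: 80 }],
]);
console.log(tierByValue(['/about', '/shoes', '/checkout'], sampleMetrics));
```

URLs with no recorded metrics default to zero and fall into the lowest tier, so they are still present in the inventory rather than silently dropped.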

3. Score Migration Risk

Convert known weaknesses into a ranked, owned remediation list. Score each failure mode — redirect-chain depth, canonical drift, template parity gaps, DNS propagation delay, CDN cache poisoning — on likelihood and business impact, then sort by the product. Apply Risk Assessment Frameworks to produce a weighted matrix, and tie each high-score item to a sprint ticket and an owner before the build freezes.
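The likelihood-times-impact ranking can be sketched as follows. The failure modes are the ones named above, but the 1 to 5 scores, the threshold of 12, and the ticket IDs are hypothetical values chosen only to make the example concrete.

```javascript
// Scores (1-5), the threshold, and the ticket IDs are illustrative assumptions.
function rankRisks(risks) {
  return risks
    .map((r) => ({ ...r, score: r.likelihood * r.impact }))
    .sort((a, b) => b.score - a.score);
}

// High-score items that still lack an owner or a sprint ticket.
function unownedHighRisks(ranked, threshold) {
  return ranked.filter((r) => r.score >= threshold && !(r.owner && r.ticket));
}

const ranked = rankRisks([
  { name: 'redirect-chain depth', likelihood: 4, impact: 5, owner: 'eng', ticket: 'MIG-1' },
  { name: 'canonical drift', likelihood: 3, impact: 4 },
  { name: 'template parity gaps', likelihood: 2, impact: 3 },
  { name: 'DNS propagation delay', likelihood: 2, impact: 4 },
  { name: 'CDN cache poisoning', likelihood: 1, impact: 5 },
]);
console.log(ranked.map((r) => `${r.score} ${r.name}`));
console.log(unownedHighRisks(ranked, 12).map((r) => r.name));
```

An empty result from unownedHighRisks is a reasonable precondition for the build freeze, since it means every item at or above the threshold has both an owner and a ticket.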

4. Align Stakeholders and Set Gates

A migration fails on coordination as often as on code. Publish the timeline, the escalation matrix, and the explicit approval checkpoints so no environment switches without documented sign-off. Formalise this through Stakeholder Communication Plans, schedule the cutover window inside a historical low-traffic period, and confirm every gate owner has acknowledged their veto.
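Picking the low-traffic window can be done from historical hourly averages. The sketch below assumes a 24-element array of average sessions per hour (index 0 = midnight); the sample figures and the three-hour window length are made up for illustration.

```javascript
// hourlySessions: average sessions per hour of day. The sample numbers and the
// window length are hypothetical.
function quietestWindow(hourlySessions, windowHours) {
  let best = { startHour: 0, sessions: Infinity };
  for (let start = 0; start < hourlySessions.length; start++) {
    let total = 0;
    for (let k = 0; k < windowHours; k++) {
      // Modulo lets a window wrap past midnight.
      total += hourlySessions[(start + k) % hourlySessions.length];
    }
    if (total < best.sessions) best = { startHour: start, sessions: total };
  }
  return best;
}

const hourly = [120, 80, 40, 30, 35, 60, 150, 300, 450, 500, 520, 540,
                560, 550, 530, 510, 480, 460, 420, 380, 320, 260, 200, 160];
console.log(quietestWindow(hourly, 3));
```

The result is only a starting point for scheduling; the window still has to fit the availability of every gate owner.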

5. Prepare Redirect and Routing Inputs

Hand the downstream routing work a clean, prioritised input set rather than a raw crawl dump. Export the value-tiered URL inventory as a redirect source map, marking 1:1 targets for high-value paths and category fallbacks for retired ones. Feed this directly into URL Mapping & Redirect Architecture so rule generation starts from validated, deduplicated mappings.
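A rough sketch of the source-map export is shown below. It assumes each inventory item carries a path, a value tier, and an optional explicit target; the trailing-slash normalisation and the "parent directory" category fallback are illustrative assumptions, and a real mapping would follow the rules agreed for the new site.

```javascript
// Assumption: "/shoes/" and "/shoes" are the same source; adjust if the
// legacy site treats them differently.
function normalisePath(path) {
  return path.length > 1 && path.endsWith('/') ? path.slice(0, -1) : path;
}

// Assumption: a retired URL falls back to its parent directory.
function categoryOf(path) {
  return path.slice(0, path.lastIndexOf('/')) || '/';
}

function buildRedirectMap(inventory) {
  const map = new Map();
  const missing = [];       // tier-1 paths that still lack a 1:1 target
  const seen = new Set();
  for (const item of inventory) {
    const source = normalisePath(item.url);
    if (seen.has(source)) continue;   // dedupe: first entry wins
    seen.add(source);
    if (item.target) {
      map.set(source, { target: item.target, kind: '1:1' });
    } else if (item.tier === 1) {
      missing.push(source);           // never auto-fallback a high-value path
    } else {
      map.set(source, { target: categoryOf(source), kind: 'fallback' });
    }
  }
  return { map, missing };
}

const { map, missing } = buildRedirectMap([
  { url: '/shoes/red-boot/', tier: 1, target: '/footwear/red-boot' },
  { url: '/shoes/old-boot', tier: 3 },
  { url: '/sale', tier: 1 },
]);
console.log([...map.entries()], missing);
```

Returning the unmapped high-value paths separately, instead of giving them a fallback, keeps them visible as blockers until someone assigns an explicit target.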

6. Run Pre-Launch Validation and Lock the Gate

Verify readiness against the baseline and record a single go/no-go. Run staging diffs for status codes, canonical inheritance, robots and hreflang directives, and analytics continuity; any threshold breach is a no-go that routes back to the relevant workstream. Only when all gate owners sign off does the cutover proceed.
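The gate logic can be expressed as a small function that returns a decision plus the reasons for it. This sketch covers only URL-count drift and sign-offs, assuming the 5% tolerance and the 18432-URL baseline used in the manifest in Technical Configs; the function names, the sign-off object shape, and the staging count are hypothetical.

```javascript
function driftPct(baselineCount, stagingCount) {
  return (Math.abs(stagingCount - baselineCount) / baselineCount) * 100;
}

// signoffs: { ownerName: true | false }. The shape is an assumption for this sketch.
function evaluateGate({ baselineCount, stagingCount, tolerancePct, signoffs }) {
  const failures = [];
  const drift = driftPct(baselineCount, stagingCount);
  if (drift > tolerancePct) {
    failures.push(`URL count drift ${drift.toFixed(1)}% exceeds ${tolerancePct}%`);
  }
  for (const [owner, signed] of Object.entries(signoffs)) {
    if (!signed) failures.push(`missing sign-off: ${owner}`);
  }
  // Any single failure is a no-go, regardless of how many checks passed.
  return { decision: failures.length === 0 ? 'go' : 'no-go', failures };
}

console.log(evaluateGate({
  baselineCount: 18432,
  stagingCount: 17900,
  tolerancePct: 5,
  signoffs: { seo: true, engineering: true, product: false },
}));
```

Recording the failures array alongside the decision gives the go/no-go record the traceability the playbook asks for: each no-go names the check that caused it.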

Technical Configs

YAML — migration readiness manifest (version-controlled, single source of truth):

# migration.yml — committed to the migration repo and reviewed at the gate
baseline:
  crawl_tool: screaming-frog            # JavaScript rendering (headless Chrome) required
  url_count_legacy: 18432               # frozen baseline count
  drift_tolerance_pct: 5                # no-go if staging deviates beyond this
dns:
  current_ttl_seconds: 3600
  target_ttl_seconds: 300               # lowered 48h pre-cutover
  ttl_reduction_window: 48h
gates:
  staging_qa_signoff: required          # SEO + Eng + Product
  rollback_snapshot_verified: required

Bash — join crawl export to server logs to find orphaned high-traffic URLs:

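# Find URLs that appear in 30-day access logs but NOT in the crawl export.
# crawl_urls.txt holds one URL path per line; access.log is assumed to be in
# combined log format, where the request path is field 7.
export LC_ALL=C   # keep sort and comm on the same collation order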
comm -13 <(sort -u crawl_urls.txt) \
         <(awk '{print $7}' access.log | sort -u) \
  > orphaned_urls.txt   # these need explicit redirect decisions
wc -l orphaned_urls.txt

Bash — capture the authoritative TTL before lowering it (rollback evidence):

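# Query an authoritative nameserver directly: a recursive resolver reports the
# remaining cached TTL, not the configured value that rollback must restore.
ns=$(dig +short NS production-domain.com | head -n1)
dig @"$ns" production-domain.com SOA +noall +answer        # zone serial + SOA minimum (negative-cache TTL)
dig @"$ns" production-domain.com A +noall +answer +ttlid   # configured record TTLs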

JavaScript — Cloudflare Worker to noindex staging without robots.txt drift:

// Block staging indexation at the edge; survives DB syncs that overwrite robots.txt
export default {
  async fetch(request, env) {
    const res = await fetch(request);          // proxy origin first
    const out = new Response(res.body, res);
    out.headers.set('X-Robots-Tag', 'noindex, nofollow'); // hard exclusion
    return out;
  }
}

Validation & Rollback

Enforce the gate against measured numbers, not impressions. Every item below maps to a re-runnable check.

Pre-Launch Validation Checklist:

Staging URL count is within 5% of the frozen baseline (grep -o '<loc>' sitemap.xml | wc -l vs baseline export; grep -c counts lines, so it undercounts a minified sitemap)
Every top-quintile-revenue URL resolves to a single-hop 301 with the correct Location header (curl -sIL)
Canonical tags on migrated templates point to the new domain, not the legacy host
GA4 measurement IDs and conversion events fire on staging (Tag Assistant clean)
X-Robots-Tag: noindex absent on production paths, present on staging
Structured data passes the Rich Results Test on representative templates
Verified, restorable snapshot of database and DNS zone exists off the production volume
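The single-hop 301 check in the list above can be automated by parsing saved curl -sIL output. The sketch below is one possible approach; the function names and the sample hosts are hypothetical, and it assumes the output contains one header block per response, separated by blank lines.

```javascript
// Parse the header blocks printed by `curl -sIL <url>` into a list of hops.
function parseHops(curlOutput) {
  return curlOutput
    .split(/\r?\n\r?\n/)
    .map((block) => block.trim())
    .filter((block) => block.startsWith('HTTP/'))
    .map((block) => {
      const lines = block.split(/\r?\n/);
      const status = Number(lines[0].split(' ')[1]);   // "HTTP/1.1 301 ..." -> 301
      const loc = lines.find((line) => /^location:/i.test(line));
      return { status, location: loc ? loc.slice(loc.indexOf(':') + 1).trim() : null };
    });
}

// Exactly one 301 pointing at the expected target, followed by a 200.
function isSingleHop301(hops, expectedTarget) {
  return hops.length === 2 &&
    hops[0].status === 301 &&
    hops[0].location === expectedTarget &&
    hops[1].status === 200;
}

const sample =
  'HTTP/1.1 301 Moved Permanently\r\nLocation: https://new.example/shoes\r\n\r\n' +
  'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n';
console.log(isSingleHop301(parseHops(sample), 'https://new.example/shoes'));
```

A chain (301 to 301 to 200) or a temporary redirect (302) both fail this check, which matches the "single-hop 301" wording of the checklist item.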

Common Pitfalls:

Skipping JavaScript rendering, so client-side links never enter the baseline
Prioritising redirects alphabetically instead of by measured revenue value
Letting a database sync overwrite robots.txt and de-noindex staging
Defining rollback thresholds after launch instead of at the risk-scoring stage
Treating CDN edge responses as origin truth during the baseline crawl

Rollback Protocol:

Trigger if organic sessions drop >15% versus baseline sustained over 24h, OR 5xx error rate exceeds 5% on core templates, OR a critical conversion path returns non-200.
Restore the pre-migration DNS zone from the captured export and confirm authoritative responses.
Force CDN cache invalidation across all edge nodes; verify HTTP 200 on legacy paths.
Restore the database snapshot if content or routing state diverged.
Record the failure mode, update the risk matrix, and re-run the affected workstream before re-attempting.
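The trigger conditions above translate directly into a check that monitoring can run on a schedule. This is a minimal sketch using the thresholds stated in the protocol (15% sustained over 24h, 5% 5xx rate, non-200 conversion path); the input field names are assumptions, and the metrics themselves would come from whatever analytics and logging stack is in place.

```javascript
// Field names are hypothetical; thresholds mirror the rollback protocol above.
function rollbackTriggers({ baselineSessions, currentSessions, hoursSustained,
                            errorRate5xxPct, conversionPathStatus }) {
  const reasons = [];
  const dropPct = ((baselineSessions - currentSessions) / baselineSessions) * 100;
  if (dropPct > 15 && hoursSustained >= 24) {
    reasons.push(`organic sessions down ${dropPct.toFixed(1)}% for ${hoursSustained}h`);
  }
  if (errorRate5xxPct > 5) {
    reasons.push(`5xx rate ${errorRate5xxPct}% on core templates`);
  }
  if (conversionPathStatus !== 200) {
    reasons.push(`conversion path returned ${conversionPathStatus}`);
  }
  return reasons;   // any entry means: start the rollback protocol
}

console.log(rollbackTriggers({
  baselineSessions: 1000,
  currentSessions: 800,
  hoursSustained: 24,
  errorRate5xxPct: 1.2,
  conversionPathStatus: 200,
}));
```

Because the three conditions are joined by OR in the protocol, the function collects every reason that fires rather than stopping at the first, which gives the incident record a complete picture.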

FAQ

How far in advance should pre-migration auditing begin? Initiate 60–90 days before the planned cutover. That window covers baseline capture, value mapping, risk scoring, cross-functional alignment, and at least one full QA cycle without compressing critical-path dependencies.

What is the minimum indexation parity threshold before launch? Target 98%+ parity for high-value commercial URLs. Below 95% indicates unresolved canonicalisation, renderability, or redirect-chain issues that will degrade organic traffic after the flip — treat it as a no-go.

How do we handle dynamic URL parameters during the audit? Classify each parameter by function (tracking, sorting, session, pagination) using log analysis and crawl data, cross-checked against the Search Console page indexing report (the legacy URL Parameters tool was retired in 2022), then set canonical rules and crawler directives so faceted and session variants do not waste crawl budget or create duplicates.
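As an illustration of parameter classification, the sketch below maps parameter names to a function and derives a canonical URL. The parameter lists and the policy (strip tracking, session, and sorting parameters; keep pagination and anything unclassified) are assumptions for the example, and a real audit would build the lists from the site's own logs.

```javascript
// Hypothetical classification table; populate it from log analysis.
const PARAM_CLASSES = {
  tracking: ['utm_source', 'utm_medium', 'utm_campaign', 'gclid', 'fbclid'],
  session: ['sessionid', 'sid'],
  sorting: ['sort', 'order'],
  pagination: ['page'],
};

function classifyParam(name) {
  const key = name.toLowerCase();
  for (const [cls, names] of Object.entries(PARAM_CLASSES)) {
    if (names.includes(key)) return cls;
  }
  return 'unknown';   // needs a human decision before any rule is set
}

// Assumed policy: drop tracking, session, and sorting variants from the canonical.
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const name of [...url.searchParams.keys()]) {
    const cls = classifyParam(name);
    if (cls === 'tracking' || cls === 'session' || cls === 'sorting') {
      url.searchParams.delete(name);
    }
  }
  return url.toString();
}

console.log(canonicalUrl('https://example.com/shoes?utm_source=x&sort=price&page=2&color=red'));
```

Leaving unknown parameters in place is deliberately conservative: stripping a parameter that actually changes page content would merge distinct pages under one canonical.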

When should rollback triggers be defined? During the risk-scoring stage, before staging QA. Fix explicit numeric thresholds — for example >15% organic drop, >5% 5xx rate, or any broken core conversion path — so the cutover team executes rather than debates during an incident.

Who owns the go/no-go decision? A single accountable launch lead records it, but it requires documented sign-off from SEO, Engineering, and Product gate owners. Any unmet validation threshold is an automatic no-go regardless of schedule pressure.
