Forecasting Organic Traffic Loss from URL Changes

Problem Statement

“How much traffic will we lose?” is the first question leadership asks about a URL change, and “some” is not an acceptable answer. Every URL change carries a temporary reindexing cost while search engines rediscover, recrawl, and recredit the new addresses, and the size of that cost depends on redirect quality, how much the URL structure changes, and the authority of the pages affected. A forecast turns the gut-feel into a quantified range with explicit risk bands, so the business can decide whether to proceed, phase, or delay — and so a normal post-launch dip is recognised as the model playing out rather than a disaster. The forecast is a planning instrument and an expectation-setter in equal measure: its band tells you the realistic downside, and its recovery curve tells everyone how long to hold their nerve before the numbers come back. This page is part of Traffic & Conversion Mapping; start there for the wider mapping workflow.

Traffic dips at cutover and recovers; the band between the low- and high-risk curves is the forecast range.

When to Use This Approach

A URL structure change, domain move, or replatform is proposed and leadership needs a quantified impact.
You have at least 12 months of historical organic traffic, enough to model decay and see a full seasonal cycle.
The migration could plausibly be phased or delayed, so the forecast informs a real decision.
You need to set expectations for the post-launch dip so a normal recovery is not mistaken for failure.
Redirect quality and URL-change magnitude vary across the site and you want per-segment risk.

Step-by-Step Instructions

1. Establish the Baseline and Seasonality

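Pull at least a year of daily organic sessions so that annual peaks and troughs are visible and the forecast is not anchored to one of them. A 28-day moving average over that trailing 12-month series removes day-of-week cycles and short-term noise, which gives a clean level to apply loss factors against. It does not remove annual seasonality on its own, so if cutover lands near a seasonal peak, compare against the same weeks of the previous year rather than the most recent smoothed level.

# Baseline organic sessions with a 28-day moving average to smooth weekly cycles and noise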
import pandas as pd
s = pd.read_csv('organic_daily.csv', parse_dates=['date']).set_index('date')['sessions']
baseline = s.rolling(28, min_periods=7).mean()       # smoothed baseline
weekly = baseline.resample('W').mean()               # weekly granularity for the forecast
weekly.to_csv('baseline_weekly.csv')
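Because a moving average does not account for the time of year, the baseline for the weeks after cutover may need a seasonal shape of its own. The sketch below is one way to get it, under stated assumptions: it takes last year's values for the same weeks and scales them by the recent year-over-year trend. The function name `forward_baseline`, the 52-week lag, and the 8-week trend window are illustrative choices, not part of the workflow above.

```python
def forward_baseline(weekly_values, horizon_weeks=12, trend_weeks=8):
    """Project a weekly baseline for the weeks after cutover.

    weekly_values: list of weekly sessions, oldest first, ending the week before cutover.
    Returns a list of horizon_weeks projected values (hypothetical helper).
    """
    n = len(weekly_values)
    if n < 52 + trend_weeks:
        raise ValueError("need at least 52 + trend_weeks weeks of history")
    if not 0 < horizon_weeks <= 52:
        raise ValueError("horizon_weeks must be between 1 and 52")

    # Year-over-year trend: the latest weeks against the same weeks one year earlier
    recent = sum(weekly_values[n - trend_weeks:]) / trend_weeks
    year_ago = sum(weekly_values[n - 52 - trend_weeks:n - 52]) / trend_weeks
    yoy = recent / year_ago

    # Last year's values for the forecast weeks, scaled to this year's level
    shape = weekly_values[n - 52:n - 52 + horizon_weeks]
    return [v * yoy for v in shape]
```

With the series from the snippet above, a call might look like `forward_baseline(weekly.dropna().tolist())`. The result can then stand in for the baseline that the loss factors are applied to.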

2. Assign Loss Factors by Risk Band

Different changes carry different expected losses. Bucket each URL segment into a risk band based on redirect quality and how much its URL changes, then attach a low, expected, and high loss factor to each band. Calibrate the factors against your own history where possible: if a prior section move dropped 12% for six weeks, that is a far better anchor than an industry rule of thumb. Where you have no history, keep the bands conservative and wide, because an honestly uncertain range beats a falsely precise point estimate that leadership will hold you to.

# Risk bands -> (low, expected, high) initial traffic-loss factors
BANDS = {
    'green':  (0.00, 0.05, 0.10),   # 1:1 301, URL barely changes
    'amber':  (0.05, 0.15, 0.25),   # structure changes, redirects ok
    'red':    (0.15, 0.30, 0.50),   # consolidations, weak/missing redirects
}
def loss(band, scenario):   # scenario in {0:low,1:expected,2:high}
    return BANDS[band][scenario]
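The bucketing rule can also be written down so it is applied the same way to every segment. The sketch below shows one hypothetical rule. Its inputs are assumptions: `redirect_coverage` is the share of old URLs with a 1:1 301, and `similarity` is a 0 to 1 score of how much of the old path survives. Every threshold is a placeholder to tune against your own history, not a value from this page.

```python
from difflib import SequenceMatcher

def path_similarity(old_path: str, new_path: str) -> float:
    """Rough 0..1 similarity between two URL paths (1.0 means identical)."""
    return SequenceMatcher(None, old_path, new_path).ratio()

def assign_band(redirect_coverage: float, similarity: float, consolidated: bool = False) -> str:
    """Map a segment to 'green', 'amber' or 'red' using illustrative thresholds."""
    if consolidated or redirect_coverage < 0.90:
        return 'red'      # many-to-one mappings or weak/missing redirects
    if redirect_coverage >= 0.99 and similarity >= 0.80:
        return 'green'    # 1:1 301s, URL barely changes
    return 'amber'        # structure changes, redirects ok
```

For a segment, `similarity` could be the average of `path_similarity` over its old and new URL pairs. The returned band name is what gets passed to `loss` to look up the factor.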

3. Model the Dip and Recovery Curve

Reindexing recovers gradually, so model the dip as an initial drop that decays back toward baseline over several weeks. An exponential recovery with a per-band half-life captures the typical shape: a sharp initial loss as the new URLs are rediscovered, then a tapering climb as authority transfers through the redirects. Use a longer half-life for red-band segments, because weak redirects and large structural changes recover more slowly, and allow that the worst segments may settle at a permanently lower plateau rather than returning fully to baseline — the model should be able to express incomplete recovery, not just a delayed one.

# Project weekly traffic = baseline * (1 - initial_loss * decay^week)
import numpy as np
def project(baseline_w, initial_loss, half_life_weeks=4):
    decay = 0.5 ** (1 / half_life_weeks)             # weekly recovery factor
    weeks = np.arange(len(baseline_w))
    retained = 1 - initial_loss * (decay ** weeks)   # fraction of baseline retained
    return baseline_w.values * retained
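The function above always returns to baseline eventually. To express the incomplete recovery described for the worst segments, one option is to split the initial loss into a recoverable part that decays and a permanent part that does not. This is a minimal sketch under that assumption; `project_with_plateau` and its `permanent_loss` parameter are hypothetical names introduced here.

```python
import numpy as np

def project_with_plateau(baseline_w, initial_loss, permanent_loss=0.0, half_life_weeks=4):
    """Weekly projection where only the recoverable part of the loss decays.

    baseline_w: array-like of weekly baseline sessions, week 0 = cutover week.
    initial_loss: fraction lost in week 0.
    permanent_loss: fraction never regained (the plateau sits this far below baseline).
    """
    if not 0.0 <= permanent_loss <= initial_loss <= 1.0:
        raise ValueError("need 0 <= permanent_loss <= initial_loss <= 1")
    base = np.asarray(baseline_w, dtype=float)
    decay = 0.5 ** (1 / half_life_weeks)                # weekly recovery factor
    weeks = np.arange(len(base))
    recoverable = (initial_loss - permanent_loss) * decay ** weeks
    retained = 1 - permanent_loss - recoverable         # fraction of baseline retained
    return base * retained
```

With `permanent_loss=0.0` this reduces to the same curve as `project`, so green and amber segments can keep the default while red segments carry a non-zero plateau.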

4. Aggregate to a Site-Level Range

Sum each segment’s low, expected, and high projections to get a site-level forecast band. Report the cumulative sessions lost over the recovery window, not just the worst single week, because that total is what stakeholders weigh against the benefit of migrating. Present three numbers and the assumptions behind each, never a single figure, so the conversation is about risk tolerance rather than a false promise. Convert sessions to revenue using the segment’s conversion rate and average order value where you can, because leadership reasons in money, and a band expressed as “between £40k and £190k of deferred revenue over six weeks” lands far harder than one expressed in sessions.

# Cumulative sessions lost over the recovery window, per scenario
def cumulative_loss(baseline_w, projected):
    return float((baseline_w.values - projected).clip(min=0).sum())   # total lost sessions
# Run for scenarios 0/1/2 per segment, then sum across segments for the site band
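The roll-up across segments, including the conversion to revenue, can be sketched as follows. The segment dictionary layout (`baseline_w`, `band`, `conversion_rate`, `aov`, optional `half_life_weeks`) is an assumption made for illustration. The band factors repeat the illustrative values from step 2 so the block runs on its own.

```python
import numpy as np

# Same illustrative (low, expected, high) factors as the risk-band table in step 2
BANDS = {'green': (0.00, 0.05, 0.10), 'amber': (0.05, 0.15, 0.25), 'red': (0.15, 0.30, 0.50)}

def lost_sessions(baseline_w, initial_loss, half_life_weeks=4):
    """Total sessions lost over the window for one segment and one scenario."""
    base = np.asarray(baseline_w, dtype=float)
    decay = 0.5 ** (1 / half_life_weeks)
    lost = base * initial_loss * decay ** np.arange(len(base))
    return float(lost.sum())

def site_band(segments):
    """Return {scenario: (sessions_lost, revenue_lost)} summed over all segments."""
    out = {}
    for i, name in enumerate(('low', 'expected', 'high')):
        sessions = revenue = 0.0
        for seg in segments:
            lost = lost_sessions(seg['baseline_w'], BANDS[seg['band']][i],
                                 seg.get('half_life_weeks', 4))
            sessions += lost
            # Revenue = lost sessions x segment conversion rate x average order value
            revenue += lost * seg['conversion_rate'] * seg['aov']
        out[name] = (sessions, revenue)
    return out
```

Converting per segment rather than at site level matters when conversion rates differ: a small red segment with a high order value can dominate the revenue band even if it is a minor share of sessions.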

Worked Example

A retailer migrating oldshop.example.com segments its traffic: 60% sits in green (clean 1:1 301s, near-identical URLs), 30% in amber (category restructure with good redirects), and 10% in red (a tag consolidation with partial redirects). With a baseline of 1,000,000 weekly organic sessions and a 4-week half-life, the model projects an expected first-week dip of about 11% blended (0.6 × 5% + 0.3 × 15% + 0.1 × 30% = 10.5%), recovering to within about 4% of baseline by week six.

Summed over a six-week recovery window, with the same 4-week half-life applied to every band, the expected scenario loses roughly 430,000 cumulative sessions, with a high-risk scenario near 750,000 and a low-risk scenario near 120,000. Leadership uses the band to decide to phase the red segment separately: by migrating the tag consolidation a fortnight after the main move, the team shrinks the blended first-week dip and avoids stacking the riskiest change on top of everything else.
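The arithmetic behind the example can be checked in a few lines. This sketch makes two simplifying assumptions that the example does not spell out: every band shares the 4-week half-life, and the recovery window is six weeks (weeks 0 to 5). The names `MIX`, `WINDOW` and `blended_loss` are introduced here for illustration.

```python
import numpy as np

MIX = {'green': 0.60, 'amber': 0.30, 'red': 0.10}      # traffic share per band
BANDS = {'green': (0.00, 0.05, 0.10), 'amber': (0.05, 0.15, 0.25), 'red': (0.15, 0.30, 0.50)}
BASELINE = 1_000_000      # weekly organic sessions
HALF_LIFE = 4             # weeks, applied to every band (a simplification)
WINDOW = 6                # weeks 0..5, an assumed recovery window

decay = 0.5 ** (1 / HALF_LIFE)
curve = decay ** np.arange(WINDOW)     # share of the initial loss still present each week

def blended_loss(scenario):
    """Traffic-weighted initial loss for scenario 0 (low), 1 (expected) or 2 (high)."""
    return sum(MIX[b] * BANDS[b][scenario] for b in MIX)

for scenario, name in enumerate(('low', 'expected', 'high')):
    first_week = blended_loss(scenario)
    cumulative = BASELINE * first_week * curve.sum()
    print(f"{name:>8}: first-week dip {first_week:.1%}, cumulative loss ~{cumulative:,.0f} sessions")
```

Changing `WINDOW`, or giving the red band a longer half-life, moves the totals materially, which is why the window and the half-lives belong in the stated assumptions next to the three numbers.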

The forecast also reframes the post-launch review. When actual traffic lands at week one down 10%, the team compares it against the modelled 11% expected dip and confidently reports “tracking to plan” instead of triggering a needless rollback. The band did its job not by being exactly right, but by bracketing reality and giving everyone a reference to measure against. The segment risk bands come straight from the revenue segmentation work, the redirect quality from mapping legacy traffic to new URL structures, and the forecast rolls up into the Pre-Migration Auditing & Risk Assessment record.

Verification

Confirm the model is calibrated and the band is sensible.

# Baseline must cover ≥12 months for seasonality to be modelled
python -c "import pandas as pd; d=pd.read_csv('organic_daily.csv', parse_dates=['date']); print(d.date.min(), d.date.max(), (d.date.max()-d.date.min()).days, 'days')"

# Sanity-check: expected dip should sit between low and high scenarios
python -c "from model import run; lo,ex,hi=run(); print(lo<ex<hi)"

# Back-test against a prior migration if one exists (actual vs predicted)
python backtest.py prior_migration.csv
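This page does not show what `backtest.py` contains. One minimal shape for the comparison it implies is sketched below, assuming you already have actual weekly sessions from the prior migration and the three projected paths for the same weeks. The function name and the returned keys are hypothetical.

```python
def backtest(actual, low, expected, high):
    """Compare actual weekly sessions against the three projected paths.

    All arguments are equal-length sequences of weekly sessions after cutover,
    with actual values greater than zero. The low-risk path carries the most
    traffic and the high-risk path the least.
    """
    n = len(actual)
    if n == 0 or not (n == len(low) == len(expected) == len(high)):
        raise ValueError("all series must be non-empty and the same length")

    # Share of weeks where the actual fell inside the forecast band
    inside = sum(1 for a, lo, hi in zip(actual, low, high) if hi <= a <= lo)

    # Mean absolute percentage error of the expected path
    mape = sum(abs(a - e) / a for a, e in zip(actual, expected)) / n

    # Which path the actuals tracked most closely, by total absolute error
    errors = {name: sum(abs(a - p) for a, p in zip(actual, path))
              for name, path in (('low', low), ('expected', expected), ('high', high))}
    return {'coverage': inside / n,
            'mape_expected': mape,
            'nearest_path': min(errors, key=errors.get)}
```

Low coverage suggests the bands were too narrow. A nearest path that is consistently `high` suggests the loss factors need raising before the next forecast.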

Watch for these failures: forecasting off un-smoothed data so a seasonal peak inflates the baseline; using one loss factor for the whole site instead of per-band; and reporting only the worst week, which makes the dip look as deep as possible while saying nothing about how long it lasts or what it costs in total. Avoid false precision in the other direction too: a model quoted to four significant figures invites stakeholders to treat the midpoint as a guarantee, so round to a sensible range and lead with the band. Revisit the forecast after launch with actuals. Comparing modelled against real recovery validates the model for next time, and it tells you early whether you are tracking to the low, expected, or high path so mitigation can escalate before the window closes.

FAQ

How accurate can a traffic-loss forecast really be? It is a range, not a point. Calibrated against historical decay and prior migrations, a well-segmented forecast typically brackets the actual outcome within its low-to-high band; the value is in the band and the assumptions, not a single deceptively precise number.

What drives a URL into the high-risk band? Weak or missing redirects, large URL-structure changes, page consolidations that map many old URLs to one, and high-authority pages whose equity is hard to transfer. Any of these widens the expected dip and lengthens recovery, which is why per-segment banding beats a site-wide average.

How long should I expect recovery to take? Most sites recover the bulk of organic traffic within four to eight weeks if redirects are clean. That is modelled here as a half-life of around four weeks, meaning half of the initial loss is regained by week four and three quarters by week eight. Red-band segments with poor redirects can take months or never fully recover, which is exactly the case the forecast is meant to surface before launch.
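Under the exponential recovery used on this page, the time for a dip to shrink to a chosen residual follows directly from the half-life: weeks = half-life × log2(initial loss / residual loss). A small helper, with the hypothetical name `weeks_to_recover`, makes that concrete. It assumes full recovery is possible, so it does not apply to segments that settle at a permanent plateau.

```python
import math

def weeks_to_recover(initial_loss, residual_loss, half_life_weeks=4.0):
    """Weeks until the dip shrinks from initial_loss to residual_loss."""
    if not 0 < residual_loss <= initial_loss:
        raise ValueError("need 0 < residual_loss <= initial_loss")
    return half_life_weeks * math.log2(initial_loss / residual_loss)
```

For example, `weeks_to_recover(0.30, 0.15)` gives one half-life, and halving the residual again adds one more.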
