Multiple Regression Model
The multiple regression model is the most straightforward way to predict a store’s revenue from measurable site characteristics. It’s a statistical workhorse: you feed it data from existing stores, and it tells you what drives sales.
The Formula
Y = a + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄
Where:
- Y = predicted annual turnover
- X₁ = store size (sq ft)
- X₂ = competition index (nearby competing stores)
- X₃ = market size (population within catchment)
- X₄ = affluence index (household income)
- a, b₁…b₄ = coefficients fitted from existing store data
“Multiple regression is used to combine different attributes into the same regression equation… The model is usually built up sequentially (‘stepwise’), starting with the most important variable.” — Birkin & Clarke, Retail Geography, Ch. 7
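As a minimal sketch of how the formula is evaluated, the function below multiplies each site attribute by its coefficient and adds the intercept. The function name, the object layout, and every number in the example are hypothetical placeholders, not fitted values from any real store network.

```javascript
// Evaluate Y = a + b1*X1 + b2*X2 + b3*X3 + b4*X4 for one site.
// `coefficients` and `site` are illustrative shapes, not an API from the source.
function predictTurnover(coefficients, site) {
  const { a, b } = coefficients; // a: intercept, b: slopes [b1, b2, b3, b4]
  const x = [
    site.storeSize,        // X1: store size (sq ft)
    site.competitionIndex, // X2: competition index
    site.marketSize,       // X3: population within catchment
    site.affluenceIndex,   // X4: affluence index
  ];
  return x.reduce((sum, xi, i) => sum + b[i] * xi, a);
}

// Made-up coefficients and site attributes, purely to show the arithmetic.
const exampleCoefficients = { a: 1000, b: [10, -50, 0.5, 2] };
const exampleSite = { storeSize: 2000, competitionIndex: 4, marketSize: 30000, affluenceIndex: 100 };
console.log(predictTurnover(exampleCoefficients, exampleSite)); // 1000 + 20000 - 200 + 15000 + 200 = 36000
```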
Data Requirements
| Dataset | What We Use |
|---|---|
| FEHD Restaurant Licences | Competitor count as independent variable |
| Rental Indices (RVD) | Rent cost as independent variable |
| Census Population | Population density as independent variable |
| Household Income | Area spending power as independent variable |
| Employment by District | Worker density predicting lunch revenue |
| Consumer Price Index | Inflation-adjust historical revenue data |
| Restaurant Receipts | Revenue benchmarks from census data |
| Building Information | Building usage mix as predictor (commercial ratio → lunch crowd) |
How It Works
- Collect data from existing stores: turnover, size, competition, market size, affluence
- Run stepwise regression — each variable enters the model in order of explanatory power
- Read the coefficients — they tell you which factors matter most
- Apply to new sites — plug in the new location’s attributes, get a turnover prediction
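The four steps above can be sketched in plain JavaScript. This is a simplified forward-selection variant: it assumes "stepwise" means adding, at each step, the variable that raises R² the most, which may differ from the entry criterion Birkin & Clarke use. The function names (`solve`, `fitOls`, `forwardStepwise`) and the toy data are hypothetical.

```javascript
// Solve A x = b by Gaussian elimination with partial pivoting.
function solve(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = col + 1; r < n; r++) {
      const factor = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= factor * M[col][c];
    }
  }
  const x = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * x[c];
    x[r] = s / M[r][r];
  }
  return x;
}

// Ordinary least squares with an intercept, via the normal equations.
// columns: array of predictor arrays; y: observed turnover per store.
function fitOls(columns, y) {
  const X = y.map((_, i) => [1, ...columns.map((col) => col[i])]);
  const k = X[0].length;
  const XtX = Array.from({ length: k }, (_, p) =>
    Array.from({ length: k }, (_, q) => X.reduce((s, row) => s + row[p] * row[q], 0)));
  const Xty = Array.from({ length: k }, (_, p) =>
    X.reduce((s, row, i) => s + row[p] * y[i], 0));
  const beta = solve(XtX, Xty); // [intercept, slope1, slope2, ...]
  const mean = y.reduce((s, v) => s + v, 0) / y.length;
  let ssRes = 0;
  let ssTot = 0;
  X.forEach((row, i) => {
    const fitted = row.reduce((s, v, j) => s + v * beta[j], 0);
    ssRes += (y[i] - fitted) ** 2;
    ssTot += (y[i] - mean) ** 2;
  });
  return { beta, rSquared: 1 - ssRes / ssTot };
}

// Forward selection: at each step, add the variable that raises R² the most.
function forwardStepwise(predictors, y) {
  const remaining = Object.keys(predictors);
  const chosen = [];
  const steps = [];
  let previousR2 = 0;
  while (remaining.length > 0) {
    let best = null;
    for (const name of remaining) {
      const fit = fitOls([...chosen, name].map((key) => predictors[key]), y);
      if (best === null || fit.rSquared > best.fit.rSquared) best = { name, fit };
    }
    chosen.push(best.name);
    remaining.splice(remaining.indexOf(best.name), 1);
    steps.push({
      variable: best.name,
      contribution: best.fit.rSquared - previousR2, // gain in R² from this variable
      rSquared: best.fit.rSquared,
      beta: best.fit.beta, // intercept, then slopes in entry order
    });
    previousR2 = best.fit.rSquared;
  }
  return steps;
}

// Toy data, invented so that turnover = 100 + 5*marketSize + 2*storeSize exactly.
const toyPredictors = {
  storeSize: [3, 1, 4, 1, 5, 9],
  marketSize: [10, 20, 30, 40, 50, 60],
};
const toyTurnover = toyPredictors.marketSize.map(
  (m, i) => 100 + 5 * m + 2 * toyPredictors.storeSize[i]);
const toySteps = forwardStepwise(toyPredictors, toyTurnover);
console.log(toySteps.map((s) => `${s.variable}: +${s.contribution.toFixed(3)}`));
```

In this toy data market size enters first because it explains most of the variation on its own, and store size accounts for the remainder. The `beta` from the last step holds the coefficients you would plug in for a new site. A production version would likely add a stopping rule (for example a minimum R² gain) and use a numerically sturdier solver than the normal equations.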
Real Example: Birkin & Clarke’s Store Network
From the book (Table 7.3), a UK retailer’s regression output:
| Variable | Slope | Contribution to Variance |
|---|---|---|
| Market size | 5.8 | 0.33 |
| Store size | 125 | 0.20 |
| Competition | 116,448 | 0.18 |
| Income | 17,956 | 0.09 |
| Total R² | | 0.80 |
Market size alone explains 33% of turnover variation. Store size adds 20%. Together with competition and income, the model explains 80% of all variance.
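The running total can be checked with a few lines of arithmetic. The snippet below only re-adds the "Contribution to Variance" column from the table; the variable names are illustrative.

```javascript
// Contributions to variance as listed in the table, in the order shown.
const contributions = [
  { variable: "Market size", contribution: 0.33 },
  { variable: "Store size", contribution: 0.20 },
  { variable: "Competition", contribution: 0.18 },
  { variable: "Income", contribution: 0.09 },
];

// Accumulate R² step by step, rounding to two decimals to hide float noise.
let runningTotal = 0;
const cumulativeR2 = contributions.map(({ variable, contribution }) => {
  runningTotal += contribution;
  return { variable, cumulative: Number(runningTotal.toFixed(2)) };
});
console.log(cumulativeR2); // cumulative values: 0.33, 0.53, 0.71, 0.8
```

The remaining 20% of variance is not explained by these four variables.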
Strengths & Limitations
Strengths:
- Simple, transparent, explainable to stakeholders
- Quick to compute — runs in milliseconds
- Good for portfolio benchmarking (which stores underperform their attributes?)
Limitations:
- Requires training data (actual revenue from comparable stores)
- Assumes linear relationships (diminishing returns not captured)
- Doesn’t model consumer choice — treats each site independently
- Can’t simulate “what-if” scenarios (new competitor, road closure)
When to Use It
The regression model is best as a first filter, screening many potential sites down to a shortlist. For the shortlist, you’d graduate to the Spatial Interaction Model or LLM Agent Simulation to model actual consumer behaviour.
“Two important applications of the model are performance assessment and evaluation of potential.” — Birkin & Clarke, Ch. 7
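Performance assessment can be sketched as a residual ranking: compare each store's actual turnover with what the model predicts from its attributes. The function name `rankByResidual`, the field names, and all figures below are hypothetical; the predicted values would come from a fitted model.

```javascript
// Rank stores by residual (actual minus predicted turnover), worst first.
// A negative residual flags a store that underperforms what its attributes suggest.
function rankByResidual(stores) {
  return stores
    .map((s) => ({ ...s, residual: s.actual - s.predicted, ratio: s.actual / s.predicted }))
    .sort((p, q) => p.residual - q.residual);
}

// Invented portfolio for illustration only.
const portfolio = [
  { name: "Store A", actual: 900000, predicted: 1000000 },
  { name: "Store B", actual: 1200000, predicted: 1100000 },
  { name: "Store C", actual: 700000, predicted: 950000 },
];
const ranked = rankByResidual(portfolio);
console.log(ranked.map((s) => `${s.name}: ${s.residual}`));
```

Here Store C sits at the top of the list with the largest shortfall, while Store B outperforms its prediction. Whether a shortfall reflects management, local factors, or a missing variable is a question the regression itself cannot answer.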
Implementation Notes
Current Implementation (2026-03-25)
Current formula:
Y = (a + b1×area + b2×compIdx + b3×market + b4×transport%) × priceMult
Coefficients:
- a = 20000 (base; every restaurant has some baseline revenue)
- b1 = 120 (area: 500 sqft → +60K contribution)
- b2 = -80 (competition drag: compIdx 100 → −8K)
- b3 = 0.3 (market size, capped at 100K; max contribution = +30K)
- b4 = 30000 (transport: full score = +30K)
Competition index: compIdx = min(100, round(100 × log(1 + competitors) / log(1001))). The log(1001) denominator means the index reaches 100 only at roughly 1,000 competitors (slightly fewer once rounding is applied).
Price multipliers: Budget 0.7, Mid 1.0, Premium 1.5, High-end 2.2.
Revenue floor: Math.max(30000, predicted) — no restaurant prediction goes below HK$30K/month.
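Put together, the current formula, competition index, price multipliers, and floor might look like the sketch below. It is an illustration under stated assumptions, not the production code: it assumes `transport` is a score between 0 and 1, that the floor is applied after the price multiplier, and that the result is in HK$ per month. The function, constant, and tier key names are hypothetical.

```javascript
// Coefficients and settings as listed in the implementation notes.
const COEFFICIENTS = { a: 20000, b1: 120, b2: -80, b3: 0.3, b4: 30000 };
const PRICE_MULTIPLIERS = { budget: 0.7, mid: 1.0, premium: 1.5, highEnd: 2.2 };
const MARKET_CAP = 100000;   // market size is capped at 100K
const REVENUE_FLOOR = 30000; // HK$ per month

// compIdx = min(100, round(100 * log(1 + competitors) / log(1001)))
function competitionIndex(competitors) {
  return Math.min(100, Math.round((100 * Math.log(1 + competitors)) / Math.log(1001)));
}

function predictMonthlyRevenue({ area, competitors, market, transport, priceTier }) {
  const { a, b1, b2, b3, b4 } = COEFFICIENTS;
  const base =
    a +
    b1 * area +                            // area in sq ft
    b2 * competitionIndex(competitors) +   // competition drag
    b3 * Math.min(market, MARKET_CAP) +    // capped market size
    b4 * transport;                        // assumption: transport is a 0..1 score
  // Assumption: the floor is applied after the price multiplier.
  return Math.max(REVENUE_FLOOR, base * PRICE_MULTIPLIERS[priceTier]);
}

// Hypothetical sites, chosen to exercise the cap and the floor.
const strongSite = predictMonthlyRevenue({
  area: 500, competitors: 1000, market: 200000, transport: 1, priceTier: "mid" });
const weakSite = predictMonthlyRevenue({
  area: 50, competitors: 1000, market: 0, transport: 0, priceTier: "budget" });
console.log(strongSite, weakSite);
```

The first site works out to about HK$132K (20K + 60K − 8K + 30K + 30K, times 1.0), with the market contribution held at the cap. The second would compute to HK$12.6K and is lifted to the HK$30K floor.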
Changelog
| Date | Change | Why |
|---|---|---|
| 2026-03-25 | Competition index denominator changed from log(81) to log(1001) | log(81) saturated at ~81 competitors; urban locations all hit 100 and became indistinguishable |
| 2026-03-24 | b2 (competition drag) changed from -200 to -80 | At b2=-200, competitionIndex ~100 subtracted 20K from prediction, driving everything to the floor |
| 2026-03-24 | Revenue floor lowered from Math.max(500000, ...) to Math.max(30000, ...) (via 50K intermediate) | 500K floor meant all predictions were identical regardless of inputs |
| 2026-03-24 | b3 (market size) changed to 0.3 with 100K cap | Uncapped market contribution was dominating predictions in dense districts |
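The effect of the 2026-03-25 denominator change can be illustrated by computing the index both ways. The helper `competitionIndexWith` and its `saturationCount` parameter are hypothetical, introduced only to compare the two settings side by side.

```javascript
// Competition index with a configurable saturation point.
// saturationCount = 80 reproduces the old log(81) denominator,
// saturationCount = 1000 the current log(1001) one.
function competitionIndexWith(saturationCount, competitors) {
  const raw = (100 * Math.log(1 + competitors)) / Math.log(1 + saturationCount);
  return Math.min(100, Math.round(raw));
}

// Compare old and current settings for a few competitor counts.
const comparison = [20, 80, 200, 500].map((competitors) => ({
  competitors,
  oldIndex: competitionIndexWith(80, competitors),
  currentIndex: competitionIndexWith(1000, competitors),
}));
console.log(comparison);
```

With the old denominator, 80, 200, and 500 competitors all score 100, so dense urban sites look identical. The current denominator separates them (64, 77, and 90 respectively).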
Source
📖 Birkin, M. & Clarke, G. (2023). Retail Geography. Chapter 7: Store Performance Modelling.