Small TPU Census Zones (211 zones)
Source
Census and Statistics Department (CSD / censtatd) + Planning Department (PlanD)
- Census 2021 data: https://www.census2021.gov.hk/doc/STPUG_21C.zip
- STPU boundaries: Esri ArcGIS FeatureServer (mirrored from GeoData Store, closed Aug 2023)
- Boundary FeatureServer: https://services3.arcgis.com/6j1KwZfY2fZrfNMR/arcgis/rest/services/Hong_Kong_Population_Distribution_by_Sex_by_Small_TPU_in_2021/FeatureServer/0
Format
CSV + GeoJSON. One-off (Census 2021). No live API.
Why STPU, not Districts?
Hong Kong has 18 districts, but within a district, population density can vary wildly. Tuen Mun has both high-rise estates (50K+/km²) and country parks (near zero). District-level averages smooth this out and produce garbage estimates.
Small Tertiary Planning Units split those 18 districts into 211 zones, each roughly uniform in character. This gives roughly 12x as many zones (211 vs 18) at the cost of only 51KB of data.
| Granularity | Zones | Avg Population | Spatial Precision |
|---|---|---|---|
| District | 18 | ~415,000 | ~5-15 km |
| Large TPU | 72 | ~104,000 | ~2-5 km |
| Small TPU | 211 | ~35,400 | ~0.5-2 km |
| DCCA | 452 | ~16,500 | ~0.3-1 km |
Schema / Fields (Centroid JSON)
Each zone in stpu_centroids.json:
| Field | Type | Example | Description |
|---|---|---|---|
| `id` | string | "111" | Zone identifier (matches Census CSV column 1; the boundary GeoJSON uses the same ID with an "S" suffix) |
| `cx` | float | 114.15108 | Centroid longitude (WGS84) |
| `cy` | float | 22.28485 | Centroid latitude (WGS84) |
| `pop` | integer | 60366 | Total usual resident population |
| `hh` | integer | 22668 | Number of domestic households |
| `hhSize` | float | 2.6 | Average household size |
| `income` | integer \| null | 40480 | Median monthly household income (HKD). Null for zones with suppressed data |
| `activeIncome` | integer \| null | 52850 | Median monthly income of economically active households (HKD) |
| `floorSqm` | integer \| null | 40 | Median domestic floor area (m²) |
| `workPop` | integer \| null | 30188 | Working population |
| `ageU15` | integer \| null | 7184 | Population aged under 15 |
| `age2544` | integer \| null | 17520 | Population aged 25-44 |
| `age4564` | integer \| null | 18640 | Population aged 45-64 |
| `age65` | integer \| null | 11690 | Population aged 65+ |
| `areaSqkm` | float | 3.12 | Zone area in km² |
Derived from Census 2021 CSV (STPUG_21C.CSV, 200+ columns) — key column mappings: col 3=total_pop, col 125=domestic_households, col 132=avg_hh_size, col 148=median_active_income, col 200=median_floor_area.
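As a rough illustration, the field table and the column mappings above could be written down as a TypeScript type plus a small row-mapping helper. This is a sketch under assumptions, not the actual build script: the names `StpuZone`, `CSV_COLUMNS`, `parseNullableNumber`, and `zoneStatsFromCsvRow` are hypothetical, the column numbers are assumed to be 1-based, and suppressed cells are assumed to arrive as blank strings.

```typescript
// Shape of one entry in stpu_centroids.json, following the field table above.
interface StpuZone {
  id: string;
  cx: number; // centroid longitude (WGS84)
  cy: number; // centroid latitude (WGS84)
  pop: number;
  hh: number;
  hhSize: number;
  income: number | null;
  activeIncome: number | null;
  floorSqm: number | null;
  workPop: number | null;
  ageU15: number | null;
  age2544: number | null;
  age4564: number | null;
  age65: number | null;
  areaSqkm: number;
}

// Example values copied from the "Example" column of the field table.
const exampleZone: StpuZone = {
  id: "111", cx: 114.15108, cy: 22.28485, pop: 60366, hh: 22668, hhSize: 2.6,
  income: 40480, activeIncome: 52850, floorSqm: 40, workPop: 30188,
  ageU15: 7184, age2544: 17520, age4564: 18640, age65: 11690, areaSqkm: 3.12,
};

// Column numbers quoted in the text. ASSUMPTION: they are 1-based.
const CSV_COLUMNS = {
  pop: 3,
  hh: 125,
  hhSize: 132,
  activeIncome: 148,
  floorSqm: 200,
} as const;

type MappedField = keyof typeof CSV_COLUMNS;

// ASSUMPTION: suppressed or missing cells are blank and should become null.
function parseNullableNumber(cell: string | undefined): number | null {
  if (cell === undefined) return null;
  const trimmed = cell.trim();
  if (trimmed === "") return null;
  const value = Number(trimmed.replace(/,/g, "")); // tolerate "60,366"
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: pull the five mapped statistics out of one CSV row.
function zoneStatsFromCsvRow(row: string[]): Record<MappedField, number | null> {
  const cell = (col: number): number | null => parseNullableNumber(row[col - 1]);
  return {
    pop: cell(CSV_COLUMNS.pop),
    hh: cell(CSV_COLUMNS.hh),
    hhSize: cell(CSV_COLUMNS.hhSize),
    activeIncome: cell(CSV_COLUMNS.activeIncome),
    floorSqm: cell(CSV_COLUMNS.floorSqm),
  };
}
```

Keeping the column numbers in one constant makes it easy to correct them if the 1-based assumption turns out to be wrong.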
How It Works in the API
The backend (functions/api/analyze.ts) uses a centroid-based radius approach:
- For a query point (lat, lng), calculate haversine distance to all 211 zone centroids
- Aggregate population in cumulative rings: 0-400m, 0-800m, 0-2km, 0-5km
- Compute the population-weighted average of zone median incomes across zones within 2km
- Derive local density from the nearest zone: pop / areaSqkm
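The four steps above can be sketched end to end as follows. This is an illustrative reconstruction, not the real `findNearbyPopulation()`: the names `ZoneCentroid`, `haversineMeters`, and `summarizeCatchment` are hypothetical, the Earth radius is the usual mean-radius approximation, and zones with null income are simply skipped here rather than given a fallback value.

```typescript
// Minimal zone shape for this sketch; field names follow stpu_centroids.json.
interface ZoneCentroid {
  cx: number; // centroid longitude (WGS84)
  cy: number; // centroid latitude (WGS84)
  pop: number;
  areaSqkm: number;
  income: number | null;
}

interface MeasuredZone {
  zone: ZoneCentroid;
  distM: number;
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius, a common approximation
const RING_RADII_M = [400, 800, 2000, 5000];

// Great-circle distance in meters between two lat/lng points (haversine formula).
function haversineMeters(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(lat2 - lat1) / 2);
  const sinLng = Math.sin(toRad(lng2 - lng1) / 2);
  const a = sinLat * sinLat + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * sinLng * sinLng;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Hypothetical summary function covering the four steps in the list above.
function summarizeCatchment(lat: number, lng: number, zones: ZoneCentroid[]) {
  // Step 1: distance from the query point to every zone centroid.
  const measured: MeasuredZone[] = zones.map((zone) => ({
    zone,
    distM: haversineMeters(lat, lng, zone.cy, zone.cx),
  }));

  // Step 2: cumulative rings; each ring counts every centroid within maxM.
  const rings = RING_RADII_M.map((maxM) => ({
    maxM,
    pop: measured.filter((m) => m.distM <= maxM).reduce((sum, m) => sum + m.zone.pop, 0),
  }));

  // Step 3: population-weighted average of zone median incomes within 2 km.
  // ASSUMPTION: zones with null income are skipped in this sketch.
  let weightedSum = 0;
  let weightPop = 0;
  for (const m of measured) {
    if (m.distM <= 2000 && m.zone.income !== null) {
      weightedSum += m.zone.pop * m.zone.income;
      weightPop += m.zone.pop;
    }
  }

  // Step 4: local density from the single nearest centroid.
  const nearest = measured.reduce<MeasuredZone | null>(
    (best, cur) => (best === null || cur.distM < best.distM ? cur : best),
    null,
  );

  return {
    rings,
    weightedIncome: weightPop > 0 ? weightedSum / weightPop : null,
    localDensity:
      nearest !== null && nearest.zone.areaSqkm > 0
        ? nearest.zone.pop / nearest.zone.areaSqkm
        : null,
  };
}
```

Because a zone is counted only when its centroid falls inside a ring, a large zone whose centroid sits just outside the radius contributes nothing, which is the trade-off of the centroid-based approach.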
```ts
// Simplified from findNearbyPopulation()
const rings = [
  { maxM: 400, pop: sumPopWithinMeters(400) },
  { maxM: 800, pop: sumPopWithinMeters(800) },
  { maxM: 2000, pop: sumPopWithinMeters(2000) },
  { maxM: 5000, pop: sumPopWithinMeters(5000) },
];
const localDensity = nearestZone.pop / nearestZone.areaSqkm;
const weightedIncome = popWeightedAvg(zonesWithin2km, 'income');
```
Sample Zones
| Zone | Area | Pop | Density | Income | Character |
|---|---|---|---|---|---|
| 111 | Sheung Wan | 60,366 | 19,348/km² | HK$40,480 | Dense urban, mixed commercial-residential |
| 301 | Mong Kok core | 92,180 | 51,230/km² | HK$25,200 | Extreme density, budget-mid market |
| 602 | Sai Kung Town | 8,440 | 1,690/km² | HK$42,100 | Suburban, weekend destination |
| 203 | The Peak / Mid-Levels | 3,620 | 560/km² | HK$163,000 | Ultra-low density, ultra-high income |
Pipeline (How the Data Was Built)
- Census CSV: Downloaded STPUG_21C.zip from censtatd (212 zones × 200+ columns)
- Esri boundaries: Paginated download from ArcGIS FeatureServer (3 batches of 100, 211 MultiPolygon features)
- Matching: Zone IDs mapped via suffix rule (GeoJSON "111S" → CSV "111")
- Centroid extraction: Approximated each zone's centroid as the average of its boundary coordinates
- Merged output: stpu_centroids.json (51KB) with centroid + all census stats per zone
- Integration: Inlined as STPU_ZONES constant in analyze.ts
Full geometry available at /tmp/hk-census/stpu_combined.geojson (17.4MB) for offline analysis.
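The pagination, ID matching, and centroid steps of the pipeline might look roughly like the helpers below. This is a sketch under assumptions: all function names are hypothetical, the query string uses the standard ArcGIS REST `query` parameters (`where`, `outFields`, `f`, `resultOffset`, `resultRecordCount`) without knowing which ones the original download actually sent, and the centroid helper drops each ring's closing vertex, which the original script may or may not have done.

```typescript
// Build one page of a FeatureServer layer query.
// ASSUMPTION: standard ArcGIS REST query parameters; the real request is undocumented.
function buildPageQueryUrl(layerUrl: string, offset: number, pageSize: number): string {
  const params: Array<[string, string]> = [
    ["where", "1=1"],
    ["outFields", "*"],
    ["f", "geojson"],
    ["resultOffset", String(offset)],
    ["resultRecordCount", String(pageSize)],
  ];
  const query = params.map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join("&");
  return `${layerUrl}/query?${query}`;
}

// Offsets needed to cover all features, e.g. 211 features in pages of 100.
function pageOffsets(totalFeatures: number, pageSize: number): number[] {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  const offsets: number[] = [];
  for (let offset = 0; offset < totalFeatures; offset += pageSize) {
    offsets.push(offset);
  }
  return offsets;
}

// Suffix rule from the pipeline: GeoJSON "111S" maps to CSV "111".
function csvIdFromGeoJsonId(geoJsonId: string): string {
  return geoJsonId.replace(/S$/, "");
}

type Position = number[]; // [longitude, latitude]
type MultiPolygonCoords = Position[][][]; // polygons -> rings -> positions

// Average of boundary vertices. This is NOT an area-weighted centroid, so it
// drifts toward stretches of boundary that are digitized with more points.
function vertexAverageCentroid(coords: MultiPolygonCoords): { cx: number; cy: number } | null {
  let sumX = 0;
  let sumY = 0;
  let count = 0;
  for (const polygon of coords) {
    for (const ring of polygon) {
      // GeoJSON rings repeat their first vertex at the end; skip that duplicate.
      const vertices = ring.slice(0, Math.max(0, ring.length - 1));
      for (const [x, y] of vertices) {
        if (x === undefined || y === undefined) continue;
        sumX += x;
        sumY += y;
        count += 1;
      }
    }
  }
  return count > 0 ? { cx: sumX / count, cy: sumY / count } : null;
}
```

For 211 features and a page size of 100, `pageOffsets` yields three offsets, which lines up with the three batches mentioned in the pipeline.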
Used By
| Model | How |
|---|---|
| Gravity Model | Real zone populations for exclusive radius rings instead of density × π×r² estimation |
| Catchment Analysis | STPU population for 5-min and 10-min walk zones |
| Regression Model | Local density and income as independent variables |
| Geodemographics | Zone-level income + age distribution for segment classification |
| Microsimulation | Seeds synthetic agents from actual zone demographics |
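The Gravity Model row refers to exclusive rings, while the API aggregates cumulative ones (0-400m, 0-800m, and so on). One plausible way to bridge the two, shown here as a sketch with hypothetical names (`CumulativeRing`, `ExclusiveRing`, `toExclusiveRings`), is to subtract each ring's population from the next larger one. Whether the gravity model does exactly this is an assumption.

```typescript
interface CumulativeRing {
  maxM: number; // outer radius in meters
  pop: number; // population within 0..maxM
}

interface ExclusiveRing {
  minM: number; // inner radius in meters
  maxM: number; // outer radius in meters
  pop: number; // population within minM..maxM only
}

// Convert cumulative rings into non-overlapping bands by differencing.
function toExclusiveRings(cumulative: CumulativeRing[]): ExclusiveRing[] {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...cumulative].sort((a, b) => a.maxM - b.maxM);
  let prevMaxM = 0;
  let prevPop = 0;
  return sorted.map((ring) => {
    const band: ExclusiveRing = {
      minM: prevMaxM,
      maxM: ring.maxM,
      pop: ring.pop - prevPop,
    };
    prevMaxM = ring.maxM;
    prevPop = ring.pop;
    return band;
  });
}
```

The band populations always sum back to the outermost cumulative ring, which makes a convenient sanity check.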
Notes / Gotchas
- Data vintage: Census 2021, now 5 years old. Population shifts (new NT housing estates, post-COVID expat changes) are NOT reflected. Treat as best-available baseline, not current truth
- COVID census: 2021 figures reflect pandemic conditions (lower population in expat-heavy areas)
- Null income: Some zones have suppressed income data (small sample size). Backend falls back to district median
- Daytime vs nighttime: Census counts residents at their usual address, not where they work. Central’s daytime population is 5-10x its resident population. This is the single biggest limitation for CBD locations: a restaurant in Central serving lunch crowds can see 5-10x the foot traffic that census data suggests
- 0-400m ring in CBD: Central and Wan Chai often show pop=0 in the innermost ring because STPU zone centroids are spaced 400m+ apart in commercial areas with few residents. This is expected — CBD locations rely on commuter inflow (gravity model), not local residents
- No building-level data yet: STPU zones average ~35K people. Future work: disaggregate to building level using footprints and floor counts from Lands Department GeoInfo Map API
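To make the null-income note concrete, here is one way the district-median fallback could be folded into a population-weighted income average. It is a sketch only: `IncomeZone` and `popWeightedIncome` are hypothetical names, and passing the district median in as a parameter is an assumption about how the backend obtains it.

```typescript
interface IncomeZone {
  pop: number;
  income: number | null; // null when the census suppressed the figure
}

// Population-weighted average income, substituting the district median
// for zones whose own income is suppressed.
function popWeightedIncome(zones: IncomeZone[], districtMedian: number | null): number | null {
  let weightedSum = 0;
  let totalPop = 0;
  for (const zone of zones) {
    const income = zone.income ?? districtMedian;
    // Skip zones with no usable income or no residents.
    if (income === null || zone.pop <= 0) continue;
    weightedSum += zone.pop * income;
    totalPop += zone.pop;
  }
  return totalPop > 0 ? weightedSum / totalPop : null;
}
```

Returning null when nothing usable remains lets the caller distinguish "no data" from a genuine income of zero.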