insurance-spatial

Spatial tools for UK insurance pricing in one install: BYM2 territory ratemaking and spatially weighted conformal prediction intervals.

Blog post: BYM2 Spatial Smoothing for Territory Ratemaking

The Problems

Territory pricing in UK personal lines is broken in predictable ways.

The standard approach — a GLM with postcode sector as a categorical predictor — creates 11,200 separate territory parameters for motor, most of them estimated from a handful of claims. Adjacent sectors can differ by 30–40% on sparse data not because the underlying risk differs but because the estimates are noisy. Standard practice is to band sectors into 6–20 groups using k-means on historical loss ratios. This is ad hoc, discards information, and creates artificial discontinuities at band boundaries.

The academically-grounded alternative is the BYM2 model (Besag-York-Mollié, Riebler et al. 2016): a Bayesian hierarchical model that borrows strength across neighbouring postcode sectors, quantifies how much geographic variation is genuinely spatial vs. idiosyncratic noise, and produces territory relativities with proper uncertainty estimates.

Prediction intervals from a pricing model are nationally calibrated but geographically broken. Standard conformal prediction gives you a guarantee that 90% of risks are covered nationwide — but the coverage can be 70% in inner London and 98% in rural Somerset. That geographic miscalibration is a conduct risk under Consumer Duty.

The fix is spatially weighted conformal prediction: calibration non-conformity scores are weighted by geographic proximity to each test point, so intervals in Taunton reflect error behaviour in the South West, not the national average.

What's in this package

Two sub-systems, one install:

BYM2 territory ratemaking (insurance_spatial top-level):

Build adjacency matrices from GeoJSON polygon files or grids
Fit BYM2 Poisson models via PyMC v5
Test spatial autocorrelation with Moran's I
Extract territory relativities with credibility intervals, ready as GLM offsets
MCMC convergence diagnostics

Spatially weighted conformal prediction (insurance_spatial.conformal):

Geographically calibrated prediction intervals for any sklearn-compatible model
Gaussian, Epanechnikov, and uniform spatial kernels
Cross-validated bandwidth selection using spatial blocking
Tweedie Pearson non-conformity scores (recommended for GLM/GBM pricing models)
MACG (Mean Absolute Coverage Gap) spatial diagnostic
FCA Consumer Duty table: coverage by geographic region

Installation

uv add insurance-spatial

With optional geo dependencies (shapefiles and spatial weights):

uv add "insurance-spatial[geo]"

With faster MCMC sampler:

uv add "insurance-spatial[nutpie]"

💬 Questions or feedback? Start a Discussion. Found it useful? A ⭐ helps others find it.

BYM2 Quick Start

import numpy as np
from insurance_spatial import build_grid_adjacency, BYM2Model
from insurance_spatial.diagnostics import moran_i

# 1. Build adjacency for a synthetic 10x10 grid of territories.
#    In production, use build_grid_adjacency() from a real postcode-sector grid
#    or AdjacencyMatrix.from_geojson() with ONSPD boundary files.
adj = build_grid_adjacency(10, 10, connectivity="queen")
N = len(adj.areas)  # 100 territories
print(f"Territories: {N}, scaling factor: {adj.scaling_factor:.3f}")

# 2. Synthetic territory data — 100 sectors, true claim rate ~7%.
#    Replace with your actual observed claims and earned exposure per sector.
rng = np.random.default_rng(42)
exposure = rng.uniform(200, 2_000, size=N)          # policy-years per sector
true_log_rate = rng.normal(-2.66, 0.35, size=N)     # spatial variation around 7%
claims = rng.poisson(np.exp(true_log_rate) * exposure)

# 3. Test for spatial autocorrelation before fitting.
#    Run BYM2 only when Moran's I is significant — otherwise simpler
#    credibility weighting suffices and MCMC runtime is wasted.
log_oe = np.log((claims / exposure) / np.mean(claims / exposure) + 1e-8)
test = moran_i(log_oe, adj, n_permutations=999)
print(test.interpretation)

# 4. Fit BYM2 model (requires PyMC — uv add pymc)
model = BYM2Model(adjacency=adj, draws=1000, chains=4)
result = model.fit(
    claims=claims,      # np.ndarray, shape (N,)
    exposure=exposure,  # np.ndarray, shape (N,) — earned policy-years
)

# 5. Check convergence
diag = result.diagnostics()
print(f"Max R-hat: {diag.convergence.max_rhat:.3f}")    # want < 1.01
print(f"Min ESS:   {diag.convergence.min_ess_bulk:.0f}") # want > 400

# 6. Extract territory relativities ready for use as GLM offsets
rels = result.territory_relativities(credibility_interval=0.95)
# area | relativity | lower | upper | ln_offset
# Pass ln_offset as a fixed offset in your downstream Poisson GLM

Spatial Conformal Prediction

Standard conformal prediction guarantees 90% coverage nationally. It does not guarantee 90% coverage in every postcode district. This matters because the FCA expects you to demonstrate fair outcomes across geography, and systematic under-coverage in deprived areas or rural postcodes creates conduct risk.

The spatially weighted conformal predictor wraps your existing fitted model and produces geographically calibrated intervals:

from insurance_spatial.conformal import SpatialConformalPredictor, SpatialCoverageReport


import numpy as np
import polars as pl
from sklearn.dummy import DummyRegressor

# Synthetic UK motor data: 2,000 calibration + 500 test policies with lat/lon
# (UK mainland bounding box: lat 50–58, lon -5–2)
rng = np.random.default_rng(0)
n_cal, n_test = 2_000, 500
lat_cal = rng.uniform(50.5, 57.5, size=n_cal)
lon_cal = rng.uniform(-4.5, 1.5, size=n_cal)
lat_test = rng.uniform(50.5, 57.5, size=n_test)
lon_test = rng.uniform(-4.5, 1.5, size=n_test)
X_cal = rng.standard_normal((n_cal, 5))   # five rating factors
X_test = rng.standard_normal((n_test, 5))
y_cal = rng.gamma(1.5, scale=800, size=n_cal)   # claim severity
# Fit a simple placeholder model (replace with your real CatBoost/LightGBM)
fitted_lgbm = DummyRegressor(strategy='mean').fit(X_cal, y_cal)

# Wrap your fitted GBM or GLM
scp = SpatialConformalPredictor(
    model=fitted_lgbm,
    nonconformity='pearson_tweedie',  # recommended for burning cost models
    tweedie_power=1.5,
    bandwidth_km=20.0,  # or None to select automatically via CV
)

# Calibrate on a held-out set
cal_result = scp.calibrate(
    X_cal, y_cal,
    lat=lat_cal, lon=lon_cal,  # or postcodes=['SW1A 2AA', ...]
)
print(f"Bandwidth: {cal_result.bandwidth_km} km")

# Generate prediction intervals
intervals = scp.predict_interval(
    X_test, lat=lat_test, lon=lon_test, alpha=0.10  # 90% intervals
)
print(intervals.lower[:5], intervals.upper[:5])

# Diagnose spatial coverage quality
report = SpatialCoverageReport(scp)
# Reuse test set as validation; replace with a separate validation split in production
y_test = rng.gamma(1.5, scale=800, size=n_test)   # actual observed values for coverage check
X_val, y_val = X_test, y_test
lat_val, lon_val = lat_test, lon_test
result = report.evaluate(X_val, y_val, lat=lat_val, lon=lon_val)
print(f"MACG: {result.macg:.4f}")  # lower = more spatially uniform coverage

# FCA Consumer Duty table
table = report.fca_consumer_duty_table(region_labels=None)  # pass a list of region names if available
print(table.filter(pl.col('flag') == 'REVIEW'))

Using UK postcodes instead of coordinates:

from insurance_spatial.conformal import PostcodeGeocoder

gc = PostcodeGeocoder()
lat_cal, lon_cal = gc.geocode(postcodes_cal)
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)

# Or pass postcodes directly
scp.calibrate(X_cal, y_cal, postcodes=postcodes_cal)

Non-conformity score choice

The score is the key design decision. For pricing models:

Score	When to use
`pearson_tweedie` (default)	GLM/GBM with Tweedie objective (burning cost, severity)
`pearson`	Poisson frequency models
`absolute`	Baseline only — ignores heteroscedasticity
`scaled_absolute`	Two-model approach with a separate spread model

The Tweedie Pearson score |y - yhat| / yhat^(p/2) variance-stabilises the residuals before weighting, so the spatial kernel is not confounded by the model's own heteroscedasticity.

Bandwidth selection

If you do not supply bandwidth_km, it is selected via spatial blocking cross-validation. The CV minimises MACG (Mean Absolute Coverage Gap) across a spatial grid, which directly measures what matters — geographic coverage consistency.

# Auto bandwidth selection
scp = SpatialConformalPredictor(model=fitted_model, bandwidth_km=None)
result = scp.calibrate(
    X_cal, y_cal, lat=lat_cal, lon=lon_cal,
    cv_candidates_km=[5.0, 10.0, 20.0, 30.0, 50.0],
    cv_folds=5,
)
print(f"CV-selected bandwidth: {result.bandwidth_km} km")

The BYM2 model

The model for area i:

y_i ~ Poisson(mu_i)
log(mu_i) = log(E_i) + alpha + X_i @ beta + b_i
b_i = sigma * (sqrt(rho / s) * phi_i + sqrt(1-rho) * theta_i)

phi ~ ICAR(W)           # structured spatial component
theta ~ Normal(0, 1)    # unstructured IID component
sigma ~ HalfNormal(1)   # total territory SD
rho ~ Beta(0.5, 0.5)    # proportion attributable to spatial structure

s is the BYM2 scaling factor — the geometric mean of the marginal variances of the ICAR precision matrix. It ensures phi has unit variance, so rho and sigma are interpretable regardless of the graph topology.

Why the rho parameter matters. After fitting, the posterior of rho tells you directly how much of the residual geographic variation is spatially smooth. If rho → 1, nearby sectors genuinely tend to have similar risk; BYM2 smoothing is adding real information. If rho → 0, territory variation is area-specific noise; the data do not support spatial smoothing and you are better off with simpler credibility weighting.

Recommended pipeline

The library supports two use patterns:

Integrated: pass raw claims and exposure per sector. The model captures all geographic variation.

Two-stage (recommended for production): fit your main GLM or GBM without territory, compute sector-level O/E ratios, then pass those to BYM2. This keeps the spatial model decoupled and easier to explain:

# Stage 1: base GLM without territory
# ...compute expected claims per sector from base model...

# Stage 2: spatial model on residual O/E
result = model.fit(
    claims=sector_observed_claims,
    exposure=sector_expected_claims,  # <-- E_i is the base model's fitted value
)

The two-stage approach also means the territory factor is auditable independently of the main rating model — useful for regulatory filings.

UK data sources

To get started with real UK territory data:

Data	Source	Use
Postcode sector boundaries	Derived from OS CodePoint Open (free) via Voronoi	Adjacency construction
ONSPD	ONS Open Geography Portal	Postcode → sector/LSOA lookup
Index of Multiple Deprivation	MHCLG (gov.uk)	Covariates
Vehicle crime by LSOA	data.police.uk	Covariates
Flood risk by postcode	Environment Agency (data.gov.uk)	Home insurance covariates

See the demo notebook for a full synthetic example and comments on each data source.

Computational notes

For N=11,200 UK postcode sectors, the ICAR model is feasible — the pairwise difference formulation is O(N·K) where K≈6 mean neighbours. Published benchmarks suggest ~20–30 minutes for 4 chains × 1,000 draws on modern hardware.

BYM2 scaling factor: O(N^3) warning. compute_bym2_scaling_factor() uses dense eigendecomposition and is O(N^3). This is fine for the toy grids in the quickstart and for regional models up to N≈3,000 areas. For full UK postcode sector models (N≈9,500), precompute the scaling factor once and pass it as a cached value: AdjacencyMatrix(_scaling_factor=cached_value). The scaling factor depends only on the adjacency graph, not the claims data — recompute it only when the geography is updated, not on every model run.

The scaling factor computation (adj.scaling_factor) is a one-off per graph topology; cache it between runs.

nutpie is recommended for production: uv add nutpie. It uses a Rust NUTS implementation and is typically 2–5x faster than PyMC's default sampler for models of this type.

Benchmark results

Full benchmark script: benchmarks/run_benchmark_databricks.py — runs on Databricks Free Edition, installs its own dependencies, self-contained.

Setup: 100 postcode sectors (10x10 grid) with a known urban/rural risk gradient and a cluster effect (high-risk corner). Exposure is heterogeneous: urban sectors have more policies, rural ones are thin. Three methods compared against the true data-generating process.

Three methods:

Naive geographic mean — raw observed frequency per territory. No credibility, no pooling.
Poisson GLM with territory dummies — the UK industry standard. One dummy per sector, IRLS estimation, no regularisation. For thin areas, MLE = raw rate, so this is identical to naive mean on those sectors.
BYM2 spatial model — Bayesian ICAR model that borrows from neighbours in proportion to the estimated spatial fraction rho.

Results on thin territories (<30 policy-years) — the case that matters:

Method	RMSE (all areas)	RMSE (thin)	RMSE (thick)	Moran's I (residuals)
Naive mean	baseline	baseline	baseline	significant (p<0.05)
GLM dummies	same as naive	same as naive	marginal gain	significant (p<0.05)
BYM2	lower	substantially lower	on par with raw	not significant

Typical thin-area RMSE improvement: 20–40% vs naive mean, depending on how spatially correlated the true risk surface is. On this DGP (urban/rural gradient + cluster), pre-fit Moran's I is significant (p<0.05), confirming spatial smoothing is warranted. Post-BYM2, residual Moran's I drops to not-significant — the model has absorbed the spatial structure.

Honest caveats:

GLM dummies and naive mean give identical results for thin sectors. The GLM adds no value on thin areas unless regularisation is added (ridge, Firth) — which is not standard practice.
On thick sectors (>=100 policy-years), all three methods converge. BYM2 earns its 3–5 minute MCMC cost only for thin-portfolio work.
If Moran's I is not significant pre-fit, use simpler credibility weighting. The moran_i() function is the first thing to run, not BYM2.

Run it yourself:

# Paste into a Databricks Python notebook cell-by-cell, or run as a job
databricks workspace import benchmarks/run_benchmark_databricks.py /Workspace/your-path/spatial_benchmark.py

Performance

Benchmarked on a synthetic 12x12 grid of postcode sectors (144 areas) with known spatially autocorrelated true rates and heterogeneous exposure. Full script: benchmarks/run_benchmark.py.

Three approaches compared: raw observed frequency, quintile banding (5 bands by raw O/E), and BYM2 spatial smoothing (from the larger benchmarks/benchmark.py with PyMC).

Metric	Raw rates	Quintile banding	BYM2 (full benchmark)
MSE vs true rates (overall)	0.001724	0.001055	lowest
MSE vs true rates (thin areas, <30 py)	0.004048	0.001555	lowest
MSE vs true rates (thick areas, >=100 py)	0.000480	0.000504	matches thick raw
Moran's I on residuals	high	moderate	near zero
Fit time	instant	instant	3-8 min (MCMC)

On this DGP the Moran's I test returned p=0.21 — not significant at p<0.05 — which correctly indicates that spatial smoothing adds limited value here. The real-world situation where BYM2 excels is when Moran's I is significant (p<0.05), the dataset has genuine geographic clustering (urban/rural gradient, flood risk, theft hotspots), and thin postcode sectors have erratic raw rates that neighbours can correct. Run benchmarks/benchmark.py on a Databricks cluster (with PyMC installed) for the full MCMC comparison.

The diagnostic value of the moran_i() test is itself the primary output of this step: it tells you whether running BYM2 will add information or just slow down the analysis. If Moran's I p>0.10, use simpler credibility weighting.

When to use: UK personal lines territory pricing where postcode sectors have heterogeneous exposure, genuine spatial gradients in risk, and where band discontinuities at district boundaries create conduct risk under Consumer Duty.

When NOT to use: When Moran's I is not significant. Also when the rho posterior is near zero after fitting (the model itself will tell you spatial smoothing is not supported by the data).

Related libraries

Library	Why it's relevant
insurance-multilevel	Broker and scheme random effects — the same credibility-weighting logic applied to group factors instead of territory
insurance-credibility	Bühlmann-Straub closed-form credibility — simpler alternative when spatial correlation is not the primary concern
shap-relativities	Extract the base model's implicit territory effect before passing O/E ratios to BYM2
insurance-causal	Test whether a postcode factor is a genuine risk driver or a proxy for a protected characteristic
insurance-demand	Conversion and retention modelling — territory is a key feature in demand models

All Burning Cost libraries →

References

Riebler, A., Sørbye, S.H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145–1165.
Gschlössl, S., Schelldorfer, J., & Schnaus, M. (2019). Spatial statistical modelling of insurance risk. Scandinavian Actuarial Journal.
Besag, J., York, J., & Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(1), 1–40.
Hjort, N. L., Jullum, M., & Loland, A. (2025). Uncertainty quantification in automated valuation models with spatially weighted conformal prediction. IJDSA (Springer). doi:10.1007/s41060-025-00862-4
Tibshirani, R. J., Barber, R. F., Candes, E. J., & Ramdas, A. (2019). Conformal prediction under covariate shift. NeurIPS 2019.

Related Libraries

Library	What it does
bayesian-pricing	Hierarchical Bayesian models for thin rating cells — the same partial-pooling logic applied to non-geographic grouping factors
insurance-glm-tools	GLM tooling including R2VF factor merging — use to band territory factors after BYM2 produces the relativities

Licence

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
notebooks		notebooks
src/insurance_spatial		src/insurance_spatial
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
run_databricks_pytest.py		run_databricks_pytest.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

insurance-spatial

The Problems

What's in this package

Installation

BYM2 Quick Start

Spatial Conformal Prediction

Non-conformity score choice

Bandwidth selection

The BYM2 model

Recommended pipeline

UK data sources

Computational notes

Benchmark results

Performance

Related libraries

References

Related Libraries

Licence

About

Uh oh!

Releases 3

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

insurance-spatial

The Problems

What's in this package

Installation

BYM2 Quick Start

Spatial Conformal Prediction

Non-conformity score choice

Bandwidth selection

The BYM2 model

Recommended pipeline

UK data sources

Computational notes

Benchmark results

Performance

Related libraries

References

Related Libraries

Licence

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages