Multilevel Regression & Post-Stratification for Public Opinion Polling
SALK
The Challenge
Public opinion polling is deceptively hard. The topline numbers look straightforward until you start asking about specific groups: younger voters in rural areas, minority communities in smaller cities, and the other demographic slices that pollsters care most about but have the least data on.
SALK, an Estonian civic data consultancy, was running into exactly this problem. Their survey data was rich in aggregate but thin where it mattered most. Traditional approaches either produced wildly unstable estimates for smaller groups or simply gave up, returning nothing useful. For electoral analysis, this is a critical gap — the demographics hardest to poll are often the ones that swing outcomes.
They needed a way to say something reliable about every group, even when the raw data for any single group was sparse.
Our Approach
Borrowing strength across groups
The core insight was to stop treating each demographic slice as an island. Instead of estimating each group independently (and getting noisy results), we built a model that lets groups learn from each other — if 25-to-34-year-olds in one region behave similarly to those in neighboring regions, the model captures that shared structure and uses it to fill gaps where data is thin.
This "partial pooling" approach sits between two extremes: ignoring group differences entirely (too blunt) and treating every group as completely independent (too noisy). The model learns the balance from the data, pulling sparse groups toward the overall pattern while still letting well-sampled groups speak for themselves.
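To make the shrinkage idea concrete, here is a minimal sketch in plain Python. It is not the model built for SALK: it uses a fixed, hand-picked pooling strength (the hypothetical `prior_strength` parameter), whereas a full multilevel model would estimate the amount of pooling from the data. The counts are made up for illustration.

```python
def partial_pool(yes_counts, sample_sizes, prior_strength=20.0):
    """Shrink each group's raw proportion toward the overall proportion.

    prior_strength is a hypothetical, hand-picked constant that acts like
    that many extra "pseudo-respondents" answering at the overall rate.
    A real multilevel model would learn this from the data instead.
    """
    overall = sum(yes_counts) / sum(sample_sizes)
    return [
        (yes + prior_strength * overall) / (n + prior_strength)
        for yes, n in zip(yes_counts, sample_sizes)
    ]


# Made-up survey counts for three groups: one sparse, two well sampled.
yes_counts = [3, 240, 55]
sample_sizes = [4, 500, 100]

overall = sum(yes_counts) / sum(sample_sizes)
raw = [yes / n for yes, n in zip(yes_counts, sample_sizes)]
pooled = partial_pool(yes_counts, sample_sizes)

for r, p, n in zip(raw, pooled, sample_sizes):
    print(f"n={n:4d}  raw={r:.3f}  pooled={p:.3f}")
```

The group with only four respondents moves most of the way toward the overall rate, while the group with 500 respondents barely moves. That is the behavior described above: sparse groups borrow strength, well-sampled groups speak for themselves.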
We also layered in geographic and temporal patterns, so the model could pick up on spatial trends and shifts over time rather than treating each poll as a snapshot in isolation. The final estimates were then post-stratified, that is, weighted by the known demographic composition of the population, so the results reflected the actual makeup of the electorate.
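The calibration step can be sketched as a population-weighted average of per-cell estimates. The cells, estimates, and population counts below are invented for illustration, and `poststratify` is a hypothetical helper, not part of any library or of the SALK project code.

```python
def poststratify(cell_estimates, cell_population, keep=lambda cell: True):
    """Population-weighted average of cell-level estimates.

    cell_estimates: dict mapping a demographic cell to its modeled estimate.
    cell_population: dict mapping the same cells to known population counts.
    keep: optional predicate selecting which cells to aggregate over.
    """
    cells = [c for c in cell_estimates if keep(c)]
    total = sum(cell_population[c] for c in cells)
    weighted = sum(cell_estimates[c] * cell_population[c] for c in cells)
    return weighted / total


# Made-up modeled support per (age group, area) cell.
cell_estimates = {
    ("18-34", "urban"): 0.60,
    ("18-34", "rural"): 0.50,
    ("35+", "urban"): 0.45,
    ("35+", "rural"): 0.35,
}

# Made-up census-style counts for the same cells.
cell_population = {
    ("18-34", "urban"): 300_000,
    ("18-34", "rural"): 100_000,
    ("35+", "urban"): 400_000,
    ("35+", "rural"): 200_000,
}

national = poststratify(cell_estimates, cell_population)
rural = poststratify(cell_estimates, cell_population, keep=lambda c: c[1] == "rural")

print(f"national: {national:.3f}")
print(f"rural only: {rural:.3f}")
```

Because the weights come from the population rather than from the sample, a group that is under-represented among survey respondents still counts in proportion to its real size. The same function gives an estimate for any slice by restricting which cells are included.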
To make the results accessible beyond the data team, we built an interactive dashboard that let SALK's analysts and their clients explore estimates across any combination of demographics and regions.
Results
The model produced stable, credible estimates for every demographic group — including the sparse ones that had previously been unreliable or unmeasurable. SALK could now confidently report on segments that traditional methods had written off as "insufficient sample size."
Beyond the technical outcome, the collaboration reshaped how SALK approached polling analysis. The modeling framework became a lasting part of their toolkit, and the interactive dashboard gave their clients a way to explore the data that felt intuitive rather than statistical.
“We wanted to be able to draw some big conclusions out of a big set of data. So, that's why we came to PyMC Labs for help. It was a very successful collaboration. I've had many, many consultants working with me in the past, and I think this is by far the most successful collaboration that I've seen.”
PyMC Labs Team
- Thomas Wiecki
- Alexandre Andorra