Bayesian A/B Testing at Scale (100M+ Observations) — Large Video Streaming Service

PyMC, pymc-experimental

The Challenge

The client ran A/B tests with millions of observations, sometimes tens of millions. Their Bayesian testing framework worked well at modest scale but slowed to a crawl as data grew. Every sampling step had to touch every single observation, so at ten million rows an analysis either took impractically long or was simply not run at all.

Our Approach

The insight was that you don't actually need to evaluate every individual observation. If you summarize the data into a few hundred representative bins first, the model sees essentially the same statistical picture but does dramatically less work per sampling step. Computation scales with the number of bins — a fixed, small number you choose — rather than with the number of observations, which can grow without bound. We validated that the summarized approach produces estimates nearly identical to the full-data version, then contributed the technique back to the open-source ecosystem.
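The core of the idea can be illustrated with a minimal sketch. It is not the client's code or the contributed open-source implementation; it assumes a simple normal likelihood with equal-width bins, and the helper names (`bin_data`, `full_loglik`, `binned_loglik`) are hypothetical. The full log-likelihood loops over every observation, while the binned version loops over bin centers weighted by their counts.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def bin_data(values, n_bins):
    """Summarize values into equal-width bins; return (centers, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, counts


def full_loglik(values, mu, sigma):
    """Cost grows with the number of observations."""
    return sum(normal_logpdf(v, mu, sigma) for v in values)


def binned_loglik(centers, counts, mu, sigma):
    """Cost grows with the number of bins, regardless of data size."""
    return sum(
        c * normal_logpdf(x, mu, sigma)
        for x, c in zip(centers, counts)
        if c
    )


# Simulated metric for one test arm (illustrative values only).
rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(100_000)]

centers, counts = bin_data(data, n_bins=200)

full = full_loglik(data, mu=10.0, sigma=2.0)
approx = binned_loglik(centers, counts, mu=10.0, sigma=2.0)
rel_err = abs(full - approx) / abs(full)

print(f"full:    {full:.2f}")
print(f"binned:  {approx:.2f}")
print(f"rel err: {rel_err:.2e}")
```

The binning pass runs once, before sampling. After that, each evaluation of the binned log-likelihood touches 200 numbers instead of 100,000, which is why the per-step cost stops growing with the dataset.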

Results

At 500,000 observations, an analysis that took 75 seconds on the full data ran in 13 seconds with the summarized data, nearly six times faster.

The real payoff came at scale: a test with 100 million observations, previously impractical to run at all, finished in 22 seconds on an ordinary desktop. No meaningful loss of accuracy, no specialized hardware. Just a smarter representation of the same data.
