Time-Varying Marketing Mix Modeling & A/B Testing at Scale — HelloFresh

PyMC, PyMC-Marketing, JAX, NumPyro, HSGP, ArviZ

The Challenge

HelloFresh acquires roughly 140,000 new customers per week in the US alone, spending heavily across channels and markets to do it. But the cost of acquiring each customer kept shifting — quarter to quarter, market to market, channel to channel — and their existing models could only tell them what they were spending, not why the economics kept moving.

That alone would have been enough to tackle. But two infrastructure bottlenecks made the problem worse. Every time an analyst wanted to explore a different model configuration or answer a quick business question, they had to wait 20 minutes for a single model run. And the company's A/B testing pipeline — processing thousands of concurrent experiments across global markets — took five to six hours to churn through its overnight batch. By morning, yesterday's decisions were already stale.

Our Approach

Why acquisition costs change over time

We started with the hardest question: why does acquisition cost change over time? Rather than treating marketing effectiveness as a fixed number, we built a model that lets it evolve — capturing how each channel's impact shifts with seasons, competitive dynamics, and market maturity. The model pools information across markets and channels, so even smaller markets benefit from what we learn in larger ones. We also incorporated results from HelloFresh's own lift tests to keep the model honest.
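To make the idea of an evolving effectiveness concrete, here is a minimal NumPy sketch of a time-varying channel multiplier built from a Hilbert-space Gaussian process (HSGP) approximation with a squared-exponential kernel. It is a forward simulation from a prior, not the fitted HelloFresh model: the kernel choice, the hyperparameter values, the "shared weights plus market deviations" form of pooling, and every variable name are assumptions made for illustration.

```python
import numpy as np


def hsgp_basis(t_centred, n_basis, boundary):
    """Laplacian eigenfunctions on [-boundary, boundary], evaluated at t_centred."""
    j = np.arange(1, n_basis + 1)
    sqrt_eig = j * np.pi / (2.0 * boundary)  # square roots of the eigenvalues
    phi = np.sqrt(1.0 / boundary) * np.sin(sqrt_eig * (t_centred[:, None] + boundary))
    return phi, sqrt_eig


def sqexp_spectral_density(omega, amplitude, lengthscale):
    """Spectral density of a one-dimensional squared-exponential kernel."""
    return (
        amplitude**2
        * np.sqrt(2.0 * np.pi)
        * lengthscale
        * np.exp(-0.5 * (lengthscale * omega) ** 2)
    )


rng = np.random.default_rng(0)
n_weeks, n_markets, n_basis = 104, 3, 20  # hypothetical sizes

t = np.arange(n_weeks, dtype=float)
t_centred = t - t.mean()
boundary = 1.5 * np.abs(t_centred).max()  # domain slightly wider than the data

phi, sqrt_eig = hsgp_basis(t_centred, n_basis, boundary)
scale = np.sqrt(sqexp_spectral_density(sqrt_eig, amplitude=0.3, lengthscale=12.0))

# Assumed pooling structure: each market's basis weights are the shared
# weights plus a small market-specific deviation.
shared_w = rng.normal(size=n_basis)
market_w = shared_w[:, None] + 0.3 * rng.normal(size=(n_basis, n_markets))

latent = phi @ (scale[:, None] * market_w)  # shape (n_weeks, n_markets)
multiplier = np.exp(latent)                 # positive, smoothly varying over time

base_effect = np.array([0.8, 0.5, 0.3])     # hypothetical per-market baseline
spend = rng.gamma(2.0, 50.0, size=(n_weeks, n_markets))
contribution = base_effect * multiplier * spend
```

In a real model the basis weights would be given priors and inferred from data, for example in PyMC or NumPyro; the sketch only shows how a small set of basis functions can turn a fixed coefficient into a smooth curve that shares structure across markets.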

Getting model runtimes under control

The 20-minute runtime was a separate engineering problem. We reworked the model's internal calculations — simplifying the math where we could, swapping in more numerically stable formulations, and rewriting the most expensive operations to run efficiently at scale. That brought each run down to about two minutes with sharper predictions to boot.
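The case study does not say which operations were rewritten, so the following is only an illustration of the general pattern. It uses geometric adstock, a carryover transform common in marketing mix models, as an assumed example: a nested Python loop is replaced by a single convolution that gives the same numbers. The function names and parameter values are hypothetical.

```python
import numpy as np


def adstock_loop(spend, alpha, max_lag):
    """Reference implementation: explicit loop over weeks and lags."""
    out = np.zeros(len(spend), dtype=float)
    for t in range(len(spend)):
        for lag in range(min(max_lag, t) + 1):
            out[t] += alpha**lag * spend[t - lag]
    return out


def adstock_vectorized(spend, alpha, max_lag):
    """Same transform expressed as one convolution with geometric weights."""
    weights = alpha ** np.arange(max_lag + 1)
    # Full convolution, truncated to the original length so that week t
    # only sees spend from weeks t, t-1, ..., t-max_lag.
    return np.convolve(spend, weights)[: len(spend)]


rng = np.random.default_rng(1)
spend = rng.gamma(2.0, 50.0, size=156)  # three years of simulated weekly spend

slow = adstock_loop(spend, alpha=0.6, max_lag=8)
fast = adstock_vectorized(spend, alpha=0.6, max_lag=8)
```

Both versions compute the same carryover series; the vectorized form hands the inner loop to compiled code, which is the kind of change that tends to matter when the transform is evaluated thousands of times during sampling.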

Rethinking the testing pipeline

For the testing pipeline, the fix was architectural. Instead of processing thousands of experiments one at a time, we restructured the system to evaluate the entire test inventory in a single pass. What used to take all night now finishes before the first morning coffee.
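A minimal sketch of the "single pass" idea, assuming each experiment can be summarized by conversion counts and sample sizes for two arms. The two-proportion z-test used here is a stand-in chosen for brevity; the case study does not say which statistical method the pipeline applies, and the function name and inputs are hypothetical.

```python
import numpy as np


def batch_two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistics for many A/B tests at once; each argument is one array
    with one entry per experiment."""
    conv_a, n_a, conv_b, n_b = (
        np.asarray(x, dtype=float) for x in (conv_a, n_a, conv_b, n_b)
    )
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    # Experiments with zero standard error (no variation) get z = 0.
    return np.divide(p_b - p_a, se, out=np.zeros_like(se), where=se > 0)


# Simulated inventory of 5,000 experiments, evaluated with array operations only.
rng = np.random.default_rng(2)
n_tests = 5_000
n_a = rng.integers(500, 5_000, size=n_tests)
n_b = rng.integers(500, 5_000, size=n_tests)
conv_a = rng.binomial(n_a, 0.10)
conv_b = rng.binomial(n_b, 0.11)

z = batch_two_proportion_z(conv_a, n_a, conv_b, n_b)
significant = np.abs(z) > 1.96  # two-sided threshold at the 5% level
```

The point of the design is that the per-experiment loop disappears: the whole inventory is a handful of array operations, so the cost grows with the amount of data rather than with per-experiment overhead.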

Results

The measurement model now runs in about two minutes instead of twenty, which changed how analysts work — they iterate on questions in real time rather than queuing up overnight jobs.

Prediction accuracy improved meaningfully too, with variance dropping by 60%.

The testing pipeline went from a five-to-six-hour overnight batch to a six-minute run, turning A/B test results into something the team can act on the same day. Most importantly, HelloFresh can now see how each channel contributes to shifts in acquisition cost over time, giving them the visibility they need to reallocate spend with confidence. The partnership has since expanded into brand measurement and now spans 15 markets across multiple product lines.

PyMC Labs Team

Luca Fiaschi
Niall Oulton
Bill Engels
Thomas Wiecki
Benjamin Vincent
