Bayesian State Space Modeling for CAR-NK Cell Therapy Manufacturing — Takeda

PyMC · JAX · NumPyro · MLflow · Databricks

The Challenge

CAR-NK cell therapy is a promising cancer treatment, but manufacturing it is more like cultivating a living system than running a factory line. Each batch starts with cells from a human donor and progresses through roughly 28 days of expansion in bioreactors, quality testing, and preparation. Every donor is different, every batch behaves a little differently, and the whole process is expensive enough that a failed batch represents a serious loss.

Takeda needed to understand this pipeline at three different time horizons. Looking backward: what actually happened during a completed batch, and why did it turn out the way it did? In real time: if we have measurements from the first few days, can we predict whether this batch will meet quality standards at the end — early enough to intervene? And looking forward: before we even start a new batch, what settings give us the best chance of hitting our targets?

What made all of this hard was the data: only 17 donor samples to learn from. With so few batches, a purely data-driven approach would leave predictions too uncertain to act on, so the model needed to encode real biological knowledge about how cell populations grow and change.

Our Approach

Tracking cell populations through manufacturing

We built a model that tracks cell populations stage by stage through the manufacturing process, capturing how total cell counts, viable cell counts, and key markers evolve from one phase to the next. The model encodes what's known about cell growth dynamics while learning donor-specific behavior — some donors' cells expand faster, some plateau earlier — by sharing information across the small pool of available samples.
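To make the idea concrete, here is a minimal generative sketch of a state space model with donor-level variation. It is an illustration under stated assumptions, not the model delivered in this project: the logistic growth form, the function names, and every numeric value (starting count, capacity, growth rates, noise levels) are hypothetical. Only the 28-day horizon and the 17 donors come from the text above.

```python
import math
import random


def simulate_batch(growth_rate, n_days=28, x0=math.log(1e6), capacity=1e9,
                   process_sd=0.05, obs_sd=0.1, rng=None):
    """Simulate latent and observed log cell counts for one batch.

    All default values are hypothetical placeholders.
    """
    if rng is None:
        rng = random.Random(0)
    latent, observed = [], []
    x = x0
    for _ in range(n_days):
        latent.append(x)
        # Observation equation: a noisy measurement of the latent log count.
        observed.append(x + rng.gauss(0.0, obs_sd))
        # State equation: logistic growth in log space plus process noise.
        x = x + growth_rate * (1.0 - math.exp(x) / capacity) + rng.gauss(0.0, process_sd)
    return latent, observed


def simulate_donors(n_donors=17, mean_rate=0.4, rate_sd=0.08, seed=1):
    """Draw one growth rate per donor from a shared population distribution."""
    rng = random.Random(seed)
    # Partial pooling: donors differ, but their rates share a common mean and spread.
    rates = [rng.gauss(mean_rate, rate_sd) for _ in range(n_donors)]
    batches = [simulate_batch(rate, rng=rng) for rate in rates]
    return rates, batches


rates, batches = simulate_donors()
```

In a Bayesian fit, the population mean and spread of the growth rates would be inferred jointly with the per-donor rates, which is what lets a small pool of donors inform each other.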

Three ways to use the same model

The same model serves all three use cases through different conditioning strategies. For retrospective analysis, it incorporates all measurements from a completed batch. For real-time monitoring, it uses only the measurements available so far and projects forward. For prospective planning, it generates predictions before any new batch data exists, using only the planned manufacturing parameters. We implemented efficient updating so that new measurements can be incorporated quickly without rerunning the full analysis from scratch.
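The three conditioning modes can be sketched with a deliberately simple stand-in: a scalar linear-Gaussian state space model filtered with a Kalman recursion. This is an assumption chosen for illustration, not the method used in the project, and the function names, prior, growth rate, and noise variances are all hypothetical. The point is that one model yields all three use cases depending on how many measurements are supplied.

```python
def predict(mean, var, growth, process_var):
    """Propagate the belief one day forward: x_{t+1} = x_t + growth + noise."""
    return mean + growth, var + process_var


def update(mean, var, y, obs_var):
    """Condition the belief on one new measurement y = x + noise."""
    gain = var / (var + obs_var)
    return mean + gain * (y - mean), (1.0 - gain) * var


def forecast(observations, n_days, prior=(13.8, 1.0), growth=0.3,
             process_var=0.01, obs_var=0.04):
    """Belief (mean, variance) about the final-day state.

    `observations` holds measurements for the first len(observations) days;
    days without a measurement are projected forward by the dynamics alone.
    All default values are hypothetical placeholders.
    """
    mean, var = prior
    for day in range(n_days):
        if day < len(observations):
            mean, var = update(mean, var, observations[day], obs_var)
        if day < n_days - 1:
            mean, var = predict(mean, var, growth, process_var)
    return mean, var


# Hypothetical measurements of log cell count, one per day.
measurements = [13.8 + 0.3 * day for day in range(28)]

prospective = forecast([], 28)                 # no batch data yet
real_time = forecast(measurements[:6], 28)     # first six days observed
retrospective = forecast(measurements, 28)     # completed batch
```

Because each update only needs the current mean and variance, a new measurement can be folded in with one `update` call instead of reprocessing the batch from day one. In this sketch the forecast variance shrinks from the prospective case to the real-time case to the retrospective case, as more of the batch is observed.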

The system was built to integrate with Takeda's existing data infrastructure, with full experiment tracking and versioning so that every model run is reproducible and auditable — important in a regulated manufacturing environment.

Results

We delivered working models covering the full 28-day manufacturing pipeline. The predictive framework can generate useful forecasts from as early as day six of a batch — early enough to flag potential problems and adjust course before significant time and materials are committed. The system was deployed into Takeda's data platform with documentation designed for their internal teams to operate and extend.

Despite working with just 17 donors, the model produces calibrated uncertainty estimates, meaning it's honest about what it knows and what it doesn't — which in a clinical manufacturing context is arguably more valuable than point predictions.
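One common way to check calibration is to compare the nominal level of a predictive interval with how often held-out outcomes actually fall inside it. The sketch below shows that check under stated assumptions: the function names are hypothetical, the index-based quantile rule is a simplification, and nothing here reproduces the validation actually run for this project.

```python
def central_interval(samples, level=0.9):
    """Central predictive interval from posterior predictive draws."""
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[round((1.0 - level) / 2.0 * (n - 1))]
    upper = ordered[round((1.0 + level) / 2.0 * (n - 1))]
    return lower, upper


def empirical_coverage(predictive_samples, outcomes, level=0.9):
    """Fraction of observed outcomes that land inside their predictive interval."""
    hits = 0
    for samples, outcome in zip(predictive_samples, outcomes):
        lower, upper = central_interval(samples, level)
        hits += lower <= outcome <= upper
    return hits / len(outcomes)
```

For a well-calibrated model, a 90% interval should contain roughly 90% of held-out outcomes. With only 17 donors that empirical fraction is itself noisy, so it is best read as a sanity check rather than a precise score.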

PyMC Labs Team

Eric Ma
Maxim
Adrian
Virgile
Junpeng
Aziz
Thomas Wiecki
