Spatial Gaussian Process Modeling for Agricultural Treatment Effects
Indigo Ag
The Challenge
Indigo Ag sells microbial seed treatments that are supposed to boost crop yields. Proving that they actually work — and by how much — is harder than it sounds. Agricultural fields aren't laboratories. The plot next to a drainage ditch will always out-yield the one on a clay hardpan, regardless of what you put on the seeds. This spatial variation is enormous, and it confounds everything. A naive comparison of treated versus untreated plots might just be measuring soil quality differences.
On top of that, bad growing seasons produce fields with zero or near-zero yields, and those failures break the assumptions of standard statistical models. Indigo needed results they could defend to skeptical agronomists across multiple geographies, crops, and seasons.
Our Approach
Separating treatment signal from field noise
The core idea was to decompose field yields into the pieces that matter: the treatment effect (the signal Indigo cares about), the spatial pattern in the field (the confound), and everything else. We modeled the spatial structure explicitly with a Gaussian process over the geographic coordinates of each plot, which lets the model learn that nearby plots tend to produce similar yields regardless of treatment. Once that spatial pattern is accounted for, what remains is a cleaner estimate of what the treatment actually did.
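To make the decomposition concrete, here is a minimal NumPy sketch on synthetic data. It is an illustration, not the production model: the squared-exponential kernel, its amplitude and lengthscale, the noise level, and the true effect are all assumed values, and the kernel parameters are treated as known, whereas a full Bayesian model would estimate them alongside the treatment effect. The sketch compares a naive treated-versus-untreated difference with a generalized least squares (GLS) estimate that accounts for the spatial covariance.

```python
import numpy as np

def sq_exp_kernel(coords, amplitude, lengthscale):
    """Squared-exponential covariance between all pairs of plot coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    return amplitude**2 * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))             # synthetic plot locations
treated = rng.integers(0, 2, size=n).astype(float)   # random treatment assignment
true_effect = 0.5                                    # assumed, for illustration only

# Covariance of the non-treatment part of yield: spatial pattern + plot-level noise.
K = sq_exp_kernel(coords, amplitude=2.0, lengthscale=2.0)
noise_sd = 0.5
Sigma = K + noise_sd**2 * np.eye(n)

# Simulate yields: baseline + treatment effect + spatially correlated residual.
residual = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
y = 5.0 + true_effect * treated + residual

# Naive estimate: difference in group means, ignoring location.
naive_effect = y[treated == 1].mean() - y[treated == 0].mean()

# GLS estimate: the same regression, weighted by the spatial covariance.
X = np.column_stack([np.ones(n), treated])
Sigma_inv_X = np.linalg.solve(Sigma, X)
gls_beta = np.linalg.solve(X.T @ Sigma_inv_X, Sigma_inv_X.T @ y)
gls_effect = gls_beta[1]

# Sampling standard deviation of each estimator under the assumed covariance.
XtX_inv = np.linalg.inv(X.T @ X)
var_naive = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
var_gls = np.linalg.inv(X.T @ Sigma_inv_X)
naive_sd = np.sqrt(var_naive[1, 1])
gls_sd = np.sqrt(var_gls[1, 1])

print(f"naive: {naive_effect:.3f} (sd {naive_sd:.3f})")
print(f"GLS:   {gls_effect:.3f} (sd {gls_sd:.3f})")
```

Because nearby plots share most of their spatial variation, the GLS estimator effectively compares treated and untreated plots that sit close together, which is why its standard deviation cannot exceed that of the naive difference when the covariance is correctly specified.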
We also tackled the zero-yield problem head-on. In bad years, a real fraction of fields simply fail because of drought, flood, or pests. Standard models assume every field produces something, so they cannot represent outright failures alongside normal harvests. We used a mixture approach: one component for fields that effectively produced nothing, and another for fields that grew a crop.
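One common way to write such a mixture is a hurdle model: a point mass at zero for failed fields and a continuous distribution for fields that produced a crop. The sketch below assumes exact zeros for failures and a log-normal distribution for positive yields. Both are illustrative choices on synthetic data, not a description of the model used in the project, and the function names are hypothetical.

```python
import numpy as np

def hurdle_lognormal_loglik(y, p_fail, mu, sigma):
    """Log-likelihood of a two-part model: P(y = 0) = p_fail,
    otherwise y ~ LogNormal(mu, sigma)."""
    y = np.asarray(y, dtype=float)
    failed = y == 0
    log_pos = np.log(y[~failed])
    # Failed fields contribute log(p_fail) each.
    ll_fail = failed.sum() * np.log(p_fail)
    # Growing fields contribute log(1 - p_fail) plus the log-normal log-density.
    ll_grow = np.sum(
        np.log1p(-p_fail)
        - log_pos
        - np.log(sigma)
        - 0.5 * np.log(2 * np.pi)
        - 0.5 * ((log_pos - mu) / sigma) ** 2
    )
    return ll_fail + ll_grow

def fit_hurdle_lognormal(y):
    """Closed-form maximum likelihood estimates for the two-part model."""
    y = np.asarray(y, dtype=float)
    failed = y == 0
    log_pos = np.log(y[~failed])
    return failed.mean(), log_pos.mean(), log_pos.std()

# Synthetic season: roughly 15% of fields fail, the rest have log-normal yields.
rng = np.random.default_rng(42)
n_fields = 500
is_failed = rng.random(n_fields) < 0.15
yields = np.where(is_failed, 0.0, rng.lognormal(mean=1.5, sigma=0.3, size=n_fields))

p_hat, mu_hat, sigma_hat = fit_hurdle_lognormal(yields)
print(f"failure rate: {p_hat:.3f}, mu: {mu_hat:.3f}, sigma: {sigma_hat:.3f}")
```

The two parts separate cleanly: the failure probability depends only on how many fields produced nothing, and the log-normal parameters depend only on the fields that grew a crop. In a regression setting, treatment and weather could enter either part, so a treatment could be modeled as changing the chance of failure, the yield given success, or both.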
The whole thing was built hierarchically, sharing information across farms, regions, and crop types while letting each level have its own patterns. Weather data went in as a predictor of baseline yield. We validated everything against held-out growing seasons to make sure the model wasn't just fitting noise.
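The held-out validation step can be sketched as a leave-one-season-out loop: fit on all seasons but one, then score predictions on the season the model never saw. Everything below is an assumption for illustration. The data are synthetic, `rainfall` stands in for whatever weather covariates were used, and a plain least-squares fit stands in for the hierarchical model.

```python
import numpy as np

def leave_one_season_out(seasons):
    """Yield (held_out_season, train_indices, test_indices) for each season."""
    seasons = np.asarray(seasons)
    for held_out in np.unique(seasons):
        test_mask = seasons == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Synthetic data: three seasons, one standardized weather covariate.
rng = np.random.default_rng(1)
n = 300
seasons = rng.choice([0, 1, 2], size=n)   # season labels (hypothetical)
rainfall = rng.normal(0.0, 1.0, size=n)   # hypothetical weather predictor
yield_obs = 4.0 + 0.8 * rainfall + rng.normal(0.0, 0.5, size=n)

X = np.column_stack([np.ones(n), rainfall])
rmse_by_season = {}
for season, train_idx, test_idx in leave_one_season_out(seasons):
    # Fit baseline yield on the training seasons only.
    coef, *_ = np.linalg.lstsq(X[train_idx], yield_obs[train_idx], rcond=None)
    # Score on the season that was held out.
    pred = X[test_idx] @ coef
    rmse = np.sqrt(np.mean((yield_obs[test_idx] - pred) ** 2))
    rmse_by_season[int(season)] = float(rmse)

print(rmse_by_season)
```

Splitting by season rather than by random rows matters because plots within a season share weather and management conditions. A random split would leak that shared information into the test set and overstate how well the model generalizes to a new year.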
Results
The spatial modeling worked. By explicitly capturing field-level variation, the treatment effect estimates became dramatically cleaner, no longer contaminated by the accident of which plots happened to sit on better soil. The mixture approach for low-yield scenarios improved model fit substantially and made the effect estimates more credible in exactly the conditions where they're hardest to measure. Indigo could now point to treatment effects that were isolated from spatial confounds and validated against growing seasons the model hadn't seen.
We worked with their team across multiple growing seasons, refining the models as new data came in.
“Additional expertise was helpful to get the model to the finish line and into production.”
PyMC Labs Team
- Thomas Wiecki
- Bill Engels
- Niall Oulton
- Carlos Trujillo