Synthetic Consumer Research via Semantic Similarity Rating
Colgate-Palmolive
The Challenge
Consumer research panels are slow, expensive, and getting harder to recruit for. A single product concept test can take weeks and cost tens of thousands of dollars. For a company like Colgate that tests hundreds of product concepts a year, this bottleneck shapes what questions even get asked.
Large language models seem like an obvious shortcut: just ask the AI to rate products. But it doesn't work. When you ask a language model to score a product concept on a 1-to-5 scale, the results cluster around the middle of the scale, exhibit systematic biases, and produce distributions that look nothing like what real consumers generate. The ratings are too uniform, too polite, and too detached from how actual people respond to toothpaste concepts. Colgate needed something that could produce statistically valid consumer insights at a fraction of the time and cost, not just plausible-sounding numbers.
Our Approach
Changing what we ask the AI to do
The breakthrough was changing what we ask the AI to do. Instead of requesting numerical ratings — which language models are bad at — we ask AI personas to respond to product concepts in their own words, the way a person would describe their reaction in conversation. Each persona is grounded in specific demographic attributes so the simulated population reflects real consumer diversity.
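One way to picture the persona step is a prompt template filled in from a demographic specification. The template wording and attribute names below are illustrative assumptions, not the prompts used in the actual study:

```python
# Hypothetical persona prompt template. The exact wording and the set of
# demographic attributes used in the published method are assumptions here;
# the point is only that each simulated respondent is conditioned on
# concrete demographics before seeing the product concept.
PERSONA_TEMPLATE = (
    "You are a {age}-year-old {gender} living in {region}, "
    "with a household income of about {income}. "
    "In one or two sentences, describe in your own words how you "
    "would react to this product concept:\n\n{concept}"
)

def persona_prompt(demographics: dict, concept: str) -> str:
    """Build one persona-grounded prompt from a demographic record."""
    return PERSONA_TEMPLATE.format(concept=concept, **demographics)

prompt = persona_prompt(
    {"age": 34, "gender": "woman", "region": "the US Midwest",
     "income": "$60,000"},
    "A whitening toothpaste with enzyme-based stain removal.",
)
```

Sampling many such demographic records, in proportions matching the target market, is what makes the simulated panel reflect real consumer diversity rather than a single "average" respondent.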
The natural language responses then get translated into numerical ratings through a second step: we measure how semantically similar each response is to carefully calibrated reference statements that represent each point on the rating scale. A response that sounds like enthusiasm maps to a high score; lukewarm language maps to the middle. This two-step process sidesteps the direct rating problem entirely.
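The translation step can be sketched in a few lines. This is a minimal toy version: the reference-statement wording is an assumption, the bag-of-words `embed` function is a stand-in for a real sentence embedding model (which is needed in practice, e.g. to handle negation), and mapping similarities to a probability distribution via a softmax is one plausible scheme rather than the exact published calibration:

```python
import numpy as np

# Reference statements anchoring each point on a 1-5 purchase-intent
# scale. Illustrative wording; the calibrated statements are assumptions.
REFERENCES = {
    1: "I would definitely not buy this product",
    2: "I probably would not buy this product",
    3: "I might or might not buy this product",
    4: "I would probably buy this product",
    5: "I would definitely buy this product",
}

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding over the reference vocabulary.
    A real implementation would use a sentence embedding model."""
    vocab = sorted({w for s in REFERENCES.values() for w in s.lower().split()})
    return np.array([text.lower().split().count(w) for w in vocab], float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rating_distribution(response: str, temperature: float = 0.1) -> dict:
    """Map one free-text response to a probability distribution over the
    1-5 scale: softmax over similarities to the reference statements."""
    sims = np.array([cosine(embed(response), embed(ref))
                     for ref in REFERENCES.values()])
    probs = np.exp(sims / temperature)
    probs /= probs.sum()
    return dict(zip(REFERENCES, probs))

dist = rating_distribution("I would definitely buy this product")
expected_rating = sum(k * p for k, p in dist.items())
```

Keeping the full distribution per response, rather than collapsing to a single score, is what lets the method reproduce the shape of human rating distributions, not just their averages.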
Validation
We validated the method rigorously against 57 real consumer surveys encompassing 9,300 actual human responses, comparing not just average scores but full rating distributions and product ranking accuracy. The methodology was published as a peer-reviewed paper and released as open-source software.
Results
The method matched human product rankings 90% of the time across all 57 validation surveys, with distributional similarity above 85%. It also showed less positivity bias than traditional human panels: respondents in live surveys tend to give politely inflated ratings, and the synthetic approach partially corrects for that. What used to take weeks of panel recruitment and fieldwork now runs in under 24 hours.
The open-source release has gained traction in the research community, with over 130 GitHub stars and a peer-reviewed publication.
“At Colgate-Palmolive, we really value the relationship we've built with PyMC Labs. They continue to deliver truly unmatched quality work on the hardest and most cutting edge problems we encounter. Their blend of deep Bayesian expertise, GenAI, and domain knowledge makes them an essential partner for delivering innovative, practical, and impactful solutions.”
PyMC Labs Team
- Benjamin F. Maier
- Ulf Aslak
- Luca Fiaschi
- Nina Rismal
- Kemble Fletcher
- Christian Luhmann
- Robbie Dow
- Thomas Wiecki