From Pilot to Operational: A framework for retail AI
- Simon Conroy
- May 27
- 7 min read
Updated: May 27

The problem nobody talks about
Most retailers don't have an AI problem. They have an operationalisation problem.
88% of New Zealand organisations are stuck in the pilot or exploration phase.
Only 12% have scaled AI across their business.
The pilots work. The demo wows the leadership team. A model picks better promos, or forecasts demand more accurately, or routes a customer query in half the time. Everyone agrees it's the future.
Then six months later, the model is still running on a single analyst's laptop. Nobody trusts the output enough to let it touch a real decision. The vendor presentation gets dusted off for next year's strategy day. The pilot didn't fail. It just never grew up.
This article is about what has to happen between "the pilot worked" and "the business runs on it." It is a framework for building the plumbing, governance, and operating discipline that turns a clever model into a dependable part of how the retailer makes money.
We call it the Operational AI Framework, and we use it with our clients to take retail AI from one-off project to repeatable capability.
The shift in mindset
A pilot is an experiment. An operational AI system is a piece of infrastructure. Those are two different things, and they need to be funded, staffed, and governed differently.
Pilots are judged on whether the model is interesting. Operational systems are judged on whether the business can rely on them at 9am on a Tuesday when buyers are trying to make purchases.
That shift, from interesting to reliable, is the entire game. Everything in this framework exists to bridge it.
The Operational AI Framework
Seven steps. Sequential, but with feedback loops between them. Each one is a precondition for the next.
01 Anchor to a business decision, not a model output
Before anyone writes code, name the recurring business decision the AI is going to make or support. Not the insight. Not the dashboard. The decision.
If you cannot describe the decision in one sentence, and name the person or process that currently owns it, you do not have an operational candidate yet. You have a science project. Send it back to pilot.
This is the single most common reason retail AI stalls. The model produces something interesting, but no one is accountable for acting on it, and no existing process is willing to change to accommodate it.
02 Industrialise the data before you industrialise the model
The model is rarely the bottleneck. The data feeding it is. A pilot can survive on a hand-cleaned extract a data analyst pulled three months ago. An operational system cannot. It needs the same data, in the same shape, refreshed on a predictable schedule, with someone accountable when it breaks.
For retail leaders, the practical test is this: if your point-of-sale feed drops out tonight, do you know within the hour, and do you know what downstream models and decisions are affected? If the answer is no, you do not have a data pipeline. You have a series of one-off extracts that happen to be running.
Industrialising the data means scheduled ingestion, quality checks at each step, clear ownership of each feed, and documented lineage so you can trace any number on any dashboard back to its source. It is unglamorous work. It is also the difference between an AI capability and an AI demo.
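As a minimal sketch of the "do you know within the hour" test, the Python below flags feeds that have not refreshed inside their agreed window and lists the downstream decisions they affect. The `Feed` structure, feed names, owners, and refresh windows are all hypothetical illustrations, not part of any particular platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Feed:
    """One data feed with an accountable owner (all values hypothetical)."""
    name: str
    owner: str
    max_age: timedelta        # agreed refresh window
    last_loaded: datetime     # when the feed last landed successfully
    downstream: list[str] = field(default_factory=list)  # decisions it feeds


def stale_feeds(feeds: list[Feed], now: datetime) -> list[Feed]:
    """Return feeds whose last successful load is older than their window."""
    return [f for f in feeds if now - f.last_loaded > f.max_age]


now = datetime(2025, 1, 7, 9, 0)
feeds = [
    Feed("pos_sales", "store-systems", timedelta(hours=1),
         datetime(2025, 1, 7, 6, 30), ["demand_forecast", "promo_selection"]),
    Feed("supplier_ranges", "merchandising", timedelta(days=1),
         datetime(2025, 1, 6, 22, 0), ["range_review"]),
]

for feed in stale_feeds(feeds, now):
    print(f"{feed.name} is stale; owner: {feed.owner}; "
          f"affected: {', '.join(feed.downstream)}")
```

In practice a check like this would run on a schedule and notify the named owner, so that a dropped feed is noticed by a process rather than by a person who happens to look.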
03 Wrap the model in governance before you wrap it in a UI
Governance is not a compliance exercise. It is the operating manual for the system. For an operational retail AI you need to be able to answer, at any time, without a panicked message to the data team:
- Who approved this model for use, and against what success criteria?
- What is it allowed to decide on its own, and what must be reviewed by a human?
- What customer or commercial data is it permitted to see, and what is it not?
- If a regulator, an auditor, or a journalist asks why a particular customer got a particular outcome, can we reconstruct the answer?
- When the model is changed, who signs it off, and who is told?
In retail this matters disproportionately because the decisions touch pricing, promotions, customer treatment, and supplier relationships. A model that quietly disadvantages one customer segment, or one supplier, or one region, can do real reputational and legal damage long before anyone notices.
Governance is the work that lets the CEO sleep at night while the model runs.
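One way to keep most of those answers on hand is a small, versioned record per model. The sketch below is illustrative only: the `GovernanceRecord` class, its field names, and the example values are assumptions, and it does not cover the reconstruction question, which needs decision-level logging on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRecord:
    """Answers to the governance questions for one model version."""
    model: str
    version: str
    approved_by: str                     # who approved it for use
    success_criteria: str                # what it was approved against
    autonomous_actions: tuple[str, ...]  # what it may decide on its own
    permitted_data: tuple[str, ...]      # data it is allowed to see
    change_approver: str                 # who signs off changes
    notify_on_change: tuple[str, ...]    # who is told

    def needs_human_review(self, action: str) -> bool:
        """Anything not explicitly delegated goes to a human."""
        return action not in self.autonomous_actions

    def may_use(self, dataset: str) -> bool:
        """Data access is deny-by-default."""
        return dataset in self.permitted_data


# Hypothetical example entry.
record = GovernanceRecord(
    model="promo_selection",
    version="1.3",
    approved_by="Head of Merchandising",
    success_criteria="Promo margin lift vs. control over 8 weeks",
    autonomous_actions=("rank_promos",),
    permitted_data=("pos_sales", "promo_calendar"),
    change_approver="Head of Merchandising",
    notify_on_change=("Category managers", "Finance"),
)

print(record.needs_human_review("set_shelf_price"))
print(record.may_use("customer_contact_details"))
```

Making the record immutable means a model change produces a new record rather than an edit to the old one, which keeps the sign-off history intact.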
04 Instrument the system so you can see it working, and see it failing
Once the model is in production it must be observable. Not just technically, but commercially.
Technical observability is whether the system is up, whether the data is flowing, whether predictions are being produced on schedule. That is table stakes. Commercial observability is harder and more important. It asks: is the model still doing what we hired it to do? Is the forecast still as accurate as it was at launch? Is the recommendation engine still lifting basket size, or has the lift quietly halved?
Models degrade. Customer behaviour drifts. Suppliers change ranges. A model that was excellent in March can be mediocre by September and actively harmful by December, and unless someone is measuring against the original business KPI, no one will know. The instrument panel needs to be visible to the business owner of the decision, not buried inside the data team.
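A minimal sketch of that comparison for a demand forecast, assuming mean absolute percentage error (MAPE) as the accuracy measure and a tolerance agreed with the business owner. The baseline, tolerance, and sales figures are invented for illustration.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, skipping zero-demand periods."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actuals to score against")
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def has_degraded(current: float, launch: float, tolerance: float) -> bool:
    """True when error has grown beyond the launch baseline plus tolerance."""
    return current > launch + tolerance


launch_mape = 0.10   # error measured at go-live (hypothetical)
tolerance = 0.05     # drift the business owner has agreed to accept

actual = [100.0, 80.0, 120.0, 50.0]
forecast = [90.0, 100.0, 96.0, 60.0]

current_mape = mape(actual, forecast)
print(f"current MAPE {current_mape:.3f} vs launch {launch_mape:.3f}")
if has_degraded(current_mape, launch_mape, tolerance):
    print("Degraded: alert the decision owner")
```

The same pattern applies to a commercial KPI such as basket-size lift: store the value measured at launch, recompute it on a schedule, and alert the owner of the decision when the gap exceeds what was agreed.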
05 Operationalise the humans, not just the model
This is the step most retailers underestimate. The model does not work by itself. It works inside a workflow, alongside store managers, buyers, marketers, customer service agents, and operations staff. Every one of those people needs to know:
- When the model is making a recommendation versus making a decision.
- How to override it, and what happens when they do.
- Who to escalate to when the output looks wrong.
- What is expected of them when the model is unavailable.
Without that, you get one of two failure modes. Either the staff ignore the model and continue working the old way, in which case the investment evaporates. Or they follow it blindly even when it is clearly wrong, in which case the model's mistakes get amplified across the business.
The right answer is a defined human-in-the-loop pattern for each decision, with clear thresholds, exception paths, and training. This is operations design, not data science. It is also where most of the value is unlocked or lost.
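One possible shape for such a pattern, sketched here for a markdown recommendation. The thresholds, field names, and routing labels are hypothetical; the real values are an operations design decision for each retailer and each decision.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    sku: str
    markdown_pct: float   # proposed price reduction, in percent
    confidence: float     # model's own confidence, 0.0 to 1.0


# Hypothetical thresholds agreed with the decision owner.
AUTO_MAX_MARKDOWN = 10.0
AUTO_MIN_CONFIDENCE = 0.90
REVIEW_MIN_CONFIDENCE = 0.60


def route(rec: Recommendation, model_available: bool = True) -> str:
    """Decide who acts on a recommendation."""
    if not model_available:
        return "fallback: use the documented manual process"
    if rec.confidence < REVIEW_MIN_CONFIDENCE:
        return "escalate: send to the decision owner"
    if (rec.markdown_pct <= AUTO_MAX_MARKDOWN
            and rec.confidence >= AUTO_MIN_CONFIDENCE):
        return "auto: apply and log"
    return "review: buyer approves or overrides, with a reason recorded"


for rec in [
    Recommendation("SKU-001", 5.0, 0.95),
    Recommendation("SKU-002", 30.0, 0.95),
    Recommendation("SKU-003", 5.0, 0.40),
]:
    print(rec.sku, "->", route(rec))
```

Each branch corresponds to something staff need to know: when the model decides, when it only recommends, where escalations go, and what happens when the model is unavailable.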
06 Put a commercial model around it
An operational AI system has a cost to run and a value it produces. Both need to be measured, and the gap between them is the business case.
Cost is straightforward in principle and ignored in practice. It includes the data pipeline, the model hosting, the monitoring, the people maintaining it, the vendor licences, and the time of the business owner who governs it. If you cannot state the annual cost of running the system, you cannot defend it at the next budget review.
Value is the lift the system produces against the decision you anchored to in step one. When cost and value are both measured against the same decision, the system has an honest internal P&L. That is when AI stops being a discretionary spend and starts being part of how the business operates.
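The arithmetic is simple enough to sketch. Every figure below is a made-up placeholder, and the cost categories simply mirror the ones listed above.

```python
# All figures are hypothetical annual amounts in dollars.
annual_costs = {
    "data_pipeline": 60_000,
    "model_hosting": 25_000,
    "monitoring": 15_000,
    "maintenance_staff": 120_000,
    "vendor_licences": 40_000,
    "business_owner_time": 20_000,
}

# Lift measured against the decision anchored in step one, for example
# extra gross margin from better promo selection versus a control group.
annual_value = 450_000

total_cost = sum(annual_costs.values())
net_value = annual_value - total_cost
value_per_dollar = annual_value / total_cost

print(f"Annual cost:  ${total_cost:,}")
print(f"Annual value: ${annual_value:,}")
print(f"Net value:    ${net_value:,}")
print(f"Value per dollar spent: {value_per_dollar:.2f}")
```

The point of writing it down per decision is that both numbers can be challenged and re-measured, rather than asserted once in a business case and never revisited.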
07 Build a continuous improvement loop
Operational AI is never finished. The world changes, the data changes, and the model has to change with it. The continuous improvement loop is the scheduled, governed process for retraining, re-evaluating, and where necessary retiring models. It needs:
- A cadence (quarterly, monthly, on-trigger) appropriate to how fast the underlying domain moves.
- A clear definition of what "still good enough" looks like for each model.
- A documented path for promoting an improved model into production without breaking the audit trail.
- A documented path for shutting a model off when it stops earning its keep.
Retailers who skip this end up with a graveyard of models that nobody owns, nobody trusts, and nobody is brave enough to switch off. That is technical debt with a commercial cost.
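A scheduled review can be reduced to a small, explicit rule so that the outcome is never left to whoever is bravest in the room. The sketch below is one assumed ordering of the checks, with hypothetical thresholds; a real review would also record who made the call and why.

```python
from enum import Enum


class Outcome(Enum):
    KEEP = "keep"
    RETRAIN = "retrain"
    RETIRE = "retire"


def review(accuracy: float, min_accuracy: float,
           annual_value: float, annual_cost: float) -> Outcome:
    """Apply the 'still good enough' and 'earning its keep' tests.

    Assumed ordering: a model below its agreed accuracy bar is retrained
    first; a model that meets the bar but does not cover its running cost
    is retired.
    """
    if accuracy < min_accuracy:
        return Outcome.RETRAIN
    if annual_value <= annual_cost:
        return Outcome.RETIRE
    return Outcome.KEEP


# Hypothetical quarterly review of three models.
print(review(0.91, 0.85, 450_000, 280_000).value)
print(review(0.78, 0.85, 450_000, 280_000).value)
print(review(0.91, 0.85, 200_000, 280_000).value)
```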
Putting the framework to work
The seven steps are sequential when you are building a new capability, but they describe an ongoing operating model, not a one-time project. A mature retail AI function is running all seven simultaneously, across multiple decisions, with shared infrastructure underneath.
The practical sequencing we recommend for a retailer starting from scratch:
- Pick one decision. One. Not a portfolio.
- Get steps 1 through 4 in place for that single decision before scaling to a second one.
- Use the first decision to build the shared plumbing (data pipelines, monitoring tooling, governance forum) that the second and third decisions will reuse.
- By the third or fourth decision, the marginal cost of operationalising each new model drops dramatically. That is when the AI capability starts to compound.
Disciplined, sequential build-out beats ambitious parallel build-out every time.
What this means for the leadership team
Three takeaways for decision makers.
First, stop measuring AI progress by the number of pilots running. Start measuring it by the number of business decisions that are reliably and observably supported by an operational AI system. Those are very different numbers, and the second one is the one that creates enterprise value.
Second, treat operationalisation as its own discipline, with its own funding line, its own ownership, and its own KPIs. It is not a phase at the end of a pilot. It is the work.
Third, recognise that the hard parts are not technical. They are organisational. The data pipelines, the governance forums, the human-in-the-loop workflows, the commercial model, and the continuous improvement cadence are all operating model decisions that sit with the business, not the data team. The technology is the easy part. The leadership work is the hard part.
The retailers who get this right over the next two to three years will not be the ones with the most pilots. They will be the ones who built the operating discipline to run a small number of AI systems extremely well, and then expanded from there.