The Shopping Agent Filled the Cart. It Should Have Asked.
New work that pulls apart an LLM agent's uncertainty finds that the costly gap in a shopping request is often the customer's silence rather than any limit in the model. The next gain for shopping agents is learning to ask the right question instead of confidently building the wrong cart.
Sir John Crabstone
The most useful thing a shopping agent can hand back is a question. New work breaks an LLM’s uncertainty into three kinds: ambiguity in the request, gaps in the model’s knowledge, and the randomness of its own sampling. Only the first belongs to the customer, and only she can settle it. Most of the engineering has gone into the answer. The frontier for shopping agents is the moment they stop answering and ask.
The method behind that split predates the agents that need it. To tell a confused model from a confused request, researchers clarify the input several ways and watch whether the answers move. When they scatter, the request is at fault, not the model’s knowledge. An agent that reads that scatter has been handed something better than a confidence score: a reason to ask.
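One way to picture the scatter test is the rough sketch below. It is an illustration under stated assumptions, not the researchers' implementation: the function names, the sample answers, and the choice of Shannon entropy as the measure of scatter are all hypothetical. It assumes each clarified version of the request has been answered the same number of times, then compares how much the answers vary overall with how much they vary within any single clarification. Whatever is left over is variation that appears only when the request itself is reworded.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution over labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_uncertainty(answers_by_clarification):
    """Split answer scatter into a within-clarification part and a request part.

    answers_by_clarification maps each clarified request (hypothetical text)
    to a list of sampled answers. Assumes equal sample counts per clarification.
    """
    groups = list(answers_by_clarification.values())
    pooled = [answer for group in groups for answer in group]
    total = entropy(pooled)                               # scatter across everything
    within = sum(entropy(g) for g in groups) / len(groups)  # model wavering on a fixed request
    return {"total": total, "within": within, "from_request": total - within}


# Hypothetical samples: the model is steady once the request is pinned down,
# but different readings of the request lead to different answers.
ambiguous_request = split_uncertainty({
    "summer wedding, daytime, under $100": ["linen dress"] * 4,
    "summer wedding, black tie": ["evening gown"] * 4,
})

# Hypothetical samples: the model wavers even on a fully specified request.
unsure_model = split_uncertainty({
    "summer wedding, daytime, under $100": ["linen dress", "sundress"] * 2,
    "summer wedding, daytime, around $100": ["sundress", "linen dress"] * 2,
})
```

In the first case all of the scatter is attributed to the request, which is the signal to ask. In the second, no rewording helps, so a question to the shopper would not settle anything.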
For tool-calling agents the distinction turns practical. One recent framework separates what the user wants from what the model predicts, then asks only the question whose answer would move the cart most. On ambiguous tasks it claims 7 to 39 percent higher task coverage while asking 1.5 to 2.7 times fewer questions. Trained into a reward model, the same signal lifted its judgment of when to act from 36.5% to 65.2% on the smaller model tested. Asking the right question is a skill, and these results show it can be trained.
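The idea of asking only the question that would move the cart most can be sketched as an expected-information calculation. The code below is a toy under assumptions, not the framework's own method: the hypotheses, their probabilities, the attribute names, and the `min_gain` threshold are all invented for illustration. Each hypothesis pairs a guess about what the shopper wants with the cart it would produce; a question is worth asking only if its answer is expected to narrow down the cart.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy, in bits, of a set of non-negative weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def cart_entropy(hypotheses):
    """Uncertainty over the resulting cart; hypotheses are (prob, attributes, cart)."""
    mass = defaultdict(float)
    for prob, _, cart in hypotheses:
        mass[cart] += prob
    return entropy(list(mass.values()))


def expected_gain(hypotheses, attribute):
    """Expected drop in cart uncertainty from learning one attribute's value."""
    groups = defaultdict(list)
    for hyp in hypotheses:
        groups[hyp[1][attribute]].append(hyp)
    total = sum(prob for prob, _, _ in hypotheses)
    after = sum(
        sum(prob for prob, _, _ in group) / total * cart_entropy(group)
        for group in groups.values()
    )
    return cart_entropy(hypotheses) - after


def best_question(hypotheses, attributes, min_gain=0.5):
    """Return the attribute most worth asking about, or None if no question helps enough."""
    gains = {a: expected_gain(hypotheses, a) for a in attributes}
    best = max(gains, key=gains.get)
    return best if gains[best] >= min_gain else None


# Hypothetical beliefs about a "summer wedding" request.
hypotheses = [
    (0.25, {"dress_code": "black tie", "colour": "blue"}, "evening gown"),
    (0.25, {"dress_code": "black tie", "colour": "green"}, "evening gown"),
    (0.25, {"dress_code": "casual", "colour": "blue"}, "sundress"),
    (0.25, {"dress_code": "casual", "colour": "green"}, "sundress"),
]
question = best_question(hypotheses, ["dress_code", "colour"])

# When every hypothesis leads to the same cart, the sketch stays quiet.
settled = [(0.5, {"colour": "blue"}, "sundress"), (0.5, {"colour": "green"}, "sundress")]
no_question = best_question(settled, ["colour"])
```

Here the dress code decides the cart and the colour does not, so the sketch picks the dress code and skips the rest. The threshold is the part that keeps the agent from asking when the answer would change nothing.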
The wider field built its agents to finish the job and report success. A 2025 position paper argues that the textbook division of uncertainty, into model error and data noise, loses its meaning once an agent can talk back. The gap that matters most is the one the customer has not yet filled. Calibration was the wrong instrument; what the agent lacked was permission to interrupt.
The benchmarks reward the wrong thing. Shopping agents post scores in the seventies against simulated buyers who never change their minds or hide a constraint: the customer who doesn’t exist, whom we described last month. A high mark against an agreeable simulation says little about serving a difficult human.
An agent that never asks is not decisive; it is fast at being wrong.
Teaching the habit works. A clarification-seeking coding agent tested this spring resolved 69.4% of underspecified tasks, nearly closing the gap to agents handed complete instructions. It also learned when to stay quiet, spending its questions on the hard problems and waving the easy ones through. An agent that fills the cart without asking is a clerk guessing your size from the far end of the shop.
Retail has the test case ready. A shopper who types “something for a summer wedding” has named the occasion and withheld the budget, the size, and the dress code that decides the rest. An e-commerce agent that asks before it retrieves sharpens its matches with each turn. The agent that proceeds anyway has not spared her a decision. It has made the wrong one on her behalf.
A question costs the shopper a moment; the wrong cart costs a return and the goodwill that brought her in. An agent confident enough to skip the question is the one you can least afford to trust. Certainty is the one thing the shopper did not come to buy.