Agentic Commerce Deep Dive (Vale)
An eye built from camera-shutter blades hovers over a laptop checkout screen, reflecting a credit card, an open email, and a bank balance.

The Shopping Agent Has to See Your Screen. The Retailer Owns What It Sees.

Screenshot-driven shopping agents capture every screen they transact on, turning each into a privacy boundary that agentic commerce never priced into the convenience. When an agent leaks a customer's checkout, the liability lands on the retailer whose page it screenshotted, not the model vendor the customer never chose.

Neritus Vale

A shopping agent that buys for you has to look at your screen first, and the looking is the whole exposure. The agents now browsing and checking out on a shopper’s behalf (OpenAI’s Operator, Anthropic’s Computer Use, the assistants built into Perplexity’s and Opera’s browsers) run the same loop: take a screenshot, read it with a vision model, decide what to click, take another. Every screen the agent touches becomes a privacy boundary, because the screenshot cannot tell the product photo from the card number autofilled beneath it. Agentic commerce shipped the convenience and quietly skipped pricing the exposure. The bill arrives addressed to the retailer.

The screenshot is indiscriminate by design, which is the property no checkout demo shows you. When Anthropic introduced Computer Use in October 2024, it described the loop plainly: Claude looks at the screen, reads what is there, and acts, capturing exactly what a person at the machine would see. That frame includes what you never meant to hand a model vendor: the banking tab left open, the email preview sliding in, the address book behind the storefront. Anthropic flagged this itself, warning that computer use “may provide a new vector for more familiar threats such as spam, misinformation, or fraud,” and telling developers to “begin exploration with low-risk tasks.” A loop that scored 14.9 percent on the OSWorld benchmark in screenshot-only mode was easy to caveat; the same loop, now trusted to finish a purchase, is not.
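The loop is simple enough to sketch. The toy below assumes a screen is nothing more than a list of visible elements and stubs out the model entirely; every name in it (`Element`, `capture`, `run_loop`) is hypothetical and stands in for no vendor's actual API. What it illustrates is the property described above: the capture step has no notion of which elements the task needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One visible thing on screen. Hypothetical model, not a real API."""
    label: str
    sensitive: bool


def capture(screen: list[Element]) -> list[Element]:
    # A screenshot copies every visible element; there is no filter step.
    return list(screen)


def run_loop(screens: list[list[Element]]) -> list[Element]:
    """Capture each screen in turn and return everything sent off-device."""
    sent: list[Element] = []
    for screen in screens:
        frame = capture(screen)
        sent.extend(frame)  # the whole frame goes to the (stubbed) vision model
        # A real agent would now decide what to click; omitted in this sketch.
    return sent


# Assumed example screen: one element the task needs, two it does not.
checkout = [
    Element("product photo", sensitive=False),
    Element("autofilled card number", sensitive=True),
    Element("email preview", sensitive=True),
]
sent = run_loop([checkout])
leaked = [e.label for e in sent if e.sensitive]
print(leaked)
```

Nothing in `capture` could be tightened without first knowing which elements are sensitive, and that knowledge is exactly what a raw frame lacks.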

Then there is the failure that runs the other way: the screen can attack the agent that reads it.

In October 2025, Brave’s security team showed the boundary is porous in both directions. Its researchers hid a command inside a web image as faint light-blue text on a yellow field, invisible to a human eye but legible to the model’s vision. When a user of Perplexity’s Comet browser screenshotted a page carrying that hidden text, the model read the instruction and obeyed — the planted text directed Comet to use its browser tools on the attacker’s behalf, and the agent could not separate the stranger’s words from its owner’s. Brave framed the problem as systemic across AI browsers, having also found injection flaws in Fellou and Opera’s Neon, and traced it to one root: a failure to “maintain clear boundaries between trusted user input and untrusted Web content.” The screenshot is the agent’s eye and its open door at once.
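Why a person misses text like that can be put in numbers. The sketch below applies the WCAG 2 contrast-ratio formula to one plausible pair of colors; the hex values are assumptions chosen for illustration, not the ones Brave's researchers used. WCAG's guideline for body text that people can read comfortably is a ratio of at least 4.5:1.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(a: str, b: str) -> float:
    """WCAG contrast ratio between two colors, always >= 1."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


TEXT = "#ADD8E6"        # assumed light blue, for illustration only
BACKGROUND = "#FFFF00"  # assumed yellow, for illustration only
ratio = contrast_ratio(TEXT, BACKGROUND)
print(f"{ratio:.2f}:1")
```

On these assumed colors the ratio falls well below the 4.5:1 guideline. A model reading pixel values is not bound by that human threshold, which is the sense in which the text is present for the agent and absent for the shopper.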

A shopping-bag robot's scan beam reads pale-blue hidden text on a yellow web banner as a corner padlock springs open.

The industry tells two stories about this risk, and both look past the quiet part. The first is the convenience story, where you ask ChatGPT to buy the sneakers and it does; the second is the safety story, where the danger is prompt injection and the answer is a better guard. OpenAI has built the better guard: Operator hands control back to you for logins and payments, pausing to seek user confirmation before sensitive inputs are made, and the CUA model is designed to identify and ignore prompt injections; in one early red-team session it recognized all but one case. That is a strong guard, and the wrong reassurance. A layer that stops nearly every injection still defaults to capturing the screen, still ships the frame to a server you do not run, and still treats your checkout page as ordinary input until something trips.

The strongest case against this argument is that the screenshot is a transitional technology, already being designed out. Stripe and OpenAI’s Agentic Commerce Protocol lets a merchant sell to an agent through one integration and a Shared Payment Token “limited to a specific merchant and a specific cart value.” On that rail the card is never rendered on a screen the agent can capture. Glossier, SKIMS and Vuori are among the fashion names lining up for it, buying that containment. Google’s rival Agent Payments Protocol, backed by Mastercard, American Express and PayPal among some sixty firms, reaches the same end through a signed proof that a real person authorized a specific purchase. If checkout keeps moving onto these rails, the most sensitive frame never reaches the screenshot loop, and the boundary I have described closes where it matters most.
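What “limited to a specific merchant and a specific cart value” buys can be shown in a few lines. The sketch below is not the Agentic Commerce Protocol's API; `ScopedToken` and `authorize` are hypothetical names, amounts are in integer cents by assumption, and treating the cart value as a ceiling instead of an exact match is also an assumption. It shows only the containment idea: a credential that is useless outside the one cart it was minted for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical stand-in for a payment token scoped to one cart."""
    merchant_id: str
    max_amount_cents: int


def authorize(token: ScopedToken, merchant_id: str, amount_cents: int) -> bool:
    """Allow a charge only for the named merchant, up to the cart value."""
    if merchant_id != token.merchant_id:
        return False
    return 0 < amount_cents <= token.max_amount_cents


token = ScopedToken(merchant_id="shop-a", max_amount_cents=12_000)
print(authorize(token, "shop-a", 12_000))  # same merchant, same cart value
print(authorize(token, "shop-b", 12_000))  # different merchant
print(authorize(token, "shop-a", 50_000))  # amount above the cart value
```

A token like this can leak and still be worth almost nothing to whoever finds it, which is the difference between it and a card number caught in a frame.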

The rails close the boundary only where they reach, and they reach a sliver of the web. A protocol protects the checkout of a merchant who integrated it and does nothing for the browsing that comes before: the price comparison across four tabs, the loyalty login, the half-filled cart on a site that never signed up. The general-purpose screenshot agent exists precisely to work where the protocols do not, and that universality is both why it scales and why the exposure stays live. The rails do not retire the screenshot; they pave one safe lane down a road the agent still drives end to end. The convenience retailers are sold is that lane; the exposure they will answer for is everything around it.

When the leak comes, the customer will not call the model vendor she never chose; she will call the store. She handed her details to the retailer — the name on the receipt. Data-protection law tracks that instinct: the party that decides why and how personal data is collected answers for where it ends up, even when someone else did the leaking. A retailer that switches on agentic checkout invites the screenshot loop onto its own payment page and takes on the duty to know what that loop captures and sends. This is not a distant hypothetical; it is a question of whose name sits closest to the customer when a checkout frame surfaces where it should not. That name is not OpenAI or Google. It is the shop.

Retailers can still price the exposure instead of meeting it on the day of the breach. Pricing it means treating the screenshot agent as a channel with terms: insist on the protocol over the raw browser, require a payment token scoped to one cart, refuse to let an unscoped agent screenshot a logged-in session, and record where the boundary sits so there is an answer when a regulator asks. If agentic checkout keeps arriving faster than retailers write those terms, the convenience will stay free and the exposure will stay unpriced until one leaked checkout sets the price for the whole sector. The screen the agent has to see is, for now, still the retailer’s to design. What it shows that agent is a decision, not yet a fate.