AI Deep Dive (Vale)
Illustration: two filing cabinets side by side. One, labeled arXiv, overflows with Valley3 papers stamped with the ByteDance logo and a citation tree branching out; the other, labeled Marketing, contains only a single glossy Rufus pitch deck behind the Amazon arrow.

ByteDance Filed Valley3 At arXiv. Rufus Lives In A Press Release.

ByteDance published Valley3, an omni multimodal commerce model, to arXiv with weights and an Apache 2.0 license, while Amazon's Rufus and Shopify's Sidekick exist publicly only as blog posts and conference talks. The citation asymmetry that follows is not technical but social: Chinese commerce AI gets read and built on; Western commerce AI gets used.

Neritus Vale

ByteDance posted Valley3, an omni multimodal commerce model, to arXiv on 2 May, with code and weights released on GitHub and HuggingFace under Apache 2.0. Amazon’s Rufus and Shopify’s Sidekick, both deployed at scale, exist publicly as blog posts, magazine interviews, and conference talks. The asymmetry is not technical but social: one set of models will be cited and built on; the other will only be used.

Valley3 is the third entry in a ByteDance series that began with a 2023 video-understanding model and continued with 2025’s commerce-and-short-video Valley2; the new release adds native multilingual audio through a four-stage pre-training pipeline. The 8B and 32B variants come in Instruct and Think modes, with the Think variants exposing three controllable depths of chain-of-thought reasoning. The paper evaluates the models across six in-house commerce tasks and open-source e-commerce benchmarks. The scope is itself a tell: ByteDance, parent of TikTok and Douyin, builds AI for the livestream and short-video commerce that Western retailers still call emerging.

What Amazon and Shopify have published about their commerce models is the corporate-comms version of a research release. Rufus has an Amazon Science blog post and an IEEE Spectrum article by VP and distinguished scientist Trishul Chilimbi, plus an AWS write-up about scaling on 80,000 Inferentia and Trainium chips for Prime Day. Sidekick is documented through a Shopify Engineering post derived from an ICML 2025 expo talk on production agentic systems, plus a follow-up note about fine-tuning Qwen3-32B for Shopify Flow. The company is comfortable building on others’ open weights but not releasing its own. None of these is a citable artifact: there are no model weights, no training data, no reproducible eval framework, no peer-reviewed paper. They are read by journalists, not by other model builders.

The mechanism by which research compounds is citation, and citation requires something to point at. Valley3 sits at github.com/bytedance/Valley with downloadable weights on HuggingFace, and it is built on Alibaba’s Qwen3-VL. The Qwen family captured over 50% of global open-source model downloads as of March 2026, per Interconnects AI. A Vietnamese livestream startup, an Indonesian merchant analytics tool, a São Paulo agency building product-video QA: any of them can fork Valley3, fine-tune it on local catalog data, and publish a derivative paper that cites it. Shopify already lives on this side of the asymmetry; its own engineering post about Sidekick Flow concedes the production agent is fine-tuned from Qwen3-32B and runs 68% cheaper than the closed alternative it replaced. Nobody downstream cites Rufus because there is nothing in Rufus to cite; they can only pay Amazon for it.

The case for keeping a commerce model proprietary is real, and it has a name: defensibility.

Amazon’s strongest argument would be that Rufus’s value is inseparable from a private catalog and a behavioural graph that releasing the model would put at risk. That argument holds only if the catalog is the moat. It is not. The moat is the customer relationship and the operational stack: fulfillment, returns, payments, and recommendations, none of which a published model erodes. The price of opacity is that Amazon’s research team cannot recruit by citation, cannot benchmark openly against Qwen-Omni or Valley3, and cannot get the free engineering review that arrives when an arXiv preprint takes a public beating from three rounds of community criticism.

If the publication gap holds for another year, the academic literature on commerce-specific multimodal models will be Chinese by default. That is not a hostile reading; it is what happens when one side files PDFs to a public archive and the other side files them to the marketing team. External researchers, including the graduate students working on agentic commerce, livestream understanding, and product-video QA, will keep citing Valley3 because it is what is there to cite, anchoring the field on Chinese benchmarks and Chinese definitions of what counts as a hard problem in retail AI. Amazon produces innovation in quantity. What accrues elsewhere is influence: the kind that shapes which problems the next generation of researchers considers worth working on.