Meituan's Largest Recommendation Gain in Two Years Runs on LLM Architecture
Meituan's MTGR framework treats products as tokens in a transformer architecture, producing the platform's largest recommendation quality gain in two years at 65x the compute of the previous model. The same paradigm shift is underway at Kuaishou, Xiaohongshu, and Meta.
Neritus Vale
Meituan’s MTGR framework applied the same class of transformer architecture powering large language models to product recommendation, producing the largest online metric gain the platform recorded in two years. The system treats user profiles, browsing histories, real-time behavior, and candidate products as token sequences processed by stacked self-attention layers. Online click-through rates rose 1.9%; conversion per user view rose 1.02%. The gain came from scaling model complexity to 65x the inference FLOPs of Meituan’s existing deep learning recommendation model, a trade that works because transformer architectures follow compute-scaling laws that traditional recommenders do not. Training cost held roughly flat via user-level sequence compression; the 65x refers to single-sample inference, not training expenditure.
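The user-level compression trade described above can be sketched in a few lines. This is an illustrative simplification, not Meituan's implementation: the idea is that every impression for a user shares the same browsing history, so grouping impressions by user lets the model encode each history once instead of once per (user, candidate) pair.

```python
# Hypothetical sketch of user-level sequence compression: group impressions
# so each user's (identical) history is encoded once, with all of that
# user's candidates and labels attached to the single shared sequence.
from collections import defaultdict

def compress_by_user(impressions):
    """impressions: list of dicts with 'user_id', 'history', 'candidate', 'label'.
    Returns one training sample per user instead of one per impression."""
    grouped = defaultdict(lambda: {"history": None, "candidates": [], "labels": []})
    for imp in impressions:
        g = grouped[imp["user_id"]]
        g["history"] = imp["history"]          # identical across a user's impressions
        g["candidates"].append(imp["candidate"])
        g["labels"].append(imp["label"])
    return dict(grouped)

impressions = [
    {"user_id": 1, "history": ["a", "b"], "candidate": "x", "label": 1},
    {"user_id": 1, "history": ["a", "b"], "candidate": "y", "label": 0},
    {"user_id": 2, "history": ["c"], "candidate": "x", "label": 0},
]
samples = compress_by_user(impressions)
# Two samples instead of three: user 1's history is shared by both candidates.
```

Under this scheme the expensive transformer pass over the history amortises across candidates, which is why training cost can stay roughly flat even as per-sample inference FLOPs grow.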
MTGR builds on HSTU, the Hierarchical Sequential Transduction Unit that Meta published in 2024 for generative recommendation. HSTU treats a user’s engagement history the way a language model treats a paragraph: each interaction is a token, and self-attention learns what comes next. Meta scaled the approach to 1.5 trillion parameters and reported 12.4% online metric improvements across surfaces serving billions of users. Meituan’s adaptation reorganises features into four token types: user attributes, historical clicks, real-time behavior, and candidate products. Each is projected into a shared embedding space, producing a unified sequence the transformer can read. The training set covered 210 million users and 23.7 billion exposures.
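The four-token-type design amounts to giving each feature family its own projection into a common width, then concatenating the results into one sequence. A minimal sketch, with illustrative names and dimensions rather than Meituan's actual configuration:

```python
# Hedged sketch: project four heterogeneous token types (user attributes,
# historical clicks, real-time behavior, candidate products) into a shared
# embedding space so a transformer can read them as a single sequence.
# Token-type names and all dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding width (illustrative)

# Each token type has its own raw feature width and projection matrix.
raw_dims = {"user_attr": 8, "hist_click": 12, "realtime": 6, "candidate": 10}
proj = {t: rng.standard_normal((d, D)) for t, d in raw_dims.items()}

def build_sequence(features):
    """features: list of (token_type, raw_vector) pairs in sequence order.
    Returns an (L, D) array: the unified token sequence the transformer reads."""
    return np.stack([vec @ proj[t] for t, vec in features])

seq = build_sequence([
    ("user_attr", rng.standard_normal(8)),
    ("hist_click", rng.standard_normal(12)),
    ("hist_click", rng.standard_normal(12)),
    ("realtime", rng.standard_normal(6)),
    ("candidate", rng.standard_normal(10)),
])
# seq.shape == (5, 16): five tokens, all projected into the shared space.
```

Once everything lives in one embedding space, the stacked self-attention layers need no special handling per feature family, which is what lets the language-model machinery transfer.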
The transfer from language to commerce is not a clean copy. Pure generative recommendation models user behavior through next-token prediction, which requires removing cross features between users and candidates. Cross features capture signals like a specific user’s click-through rate for a specific restaurant category, or the time since that user last ordered from a particular merchant. Traditional recommendation models were built around these signals. Meituan’s ablation study found that removing cross features erased the entire quality gain of the scaled-up model over the existing system. The fix was to embed cross features into candidate tokens and switch to discriminative loss — preserving the architecture while changing the training objective.
The architecture transfers from language to commerce; the training objective does not.
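The fix described above, cross features folded into candidate tokens plus a discriminative objective, can be sketched as follows. All names and dimensions are illustrative assumptions; the point is that each candidate token carries its user-candidate cross signals, and the loss asks "did this user click?" rather than "what token comes next?".

```python
# Hedged sketch: concatenate user-candidate cross features into each
# candidate token, then train with a discriminative binary cross-entropy
# loss on click labels instead of next-token prediction.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def candidate_token(candidate_emb, cross_features, W):
    """Fold cross features (e.g. user's CTR for this merchant's category,
    time since last order) into the candidate's token representation."""
    return np.concatenate([candidate_emb, cross_features]) @ W

def bce_loss(logits, labels):
    """Discriminative objective over click labels."""
    p = sigmoid(logits)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

rng = np.random.default_rng(0)
D, C = 16, 4                          # token width, cross-feature width (illustrative)
W = rng.standard_normal((D + C, D)) * 0.1
head = rng.standard_normal(D) * 0.1   # scoring head over token representations

tokens = np.stack([
    candidate_token(rng.standard_normal(D), rng.standard_normal(C), W)
    for _ in range(8)
])
logits = tokens @ head
loss = bce_loss(logits, np.array([1, 0, 0, 1, 0, 1, 0, 0]))
```

The architectural machinery is untouched; only the token contents and the training objective change, which is exactly the compromise the ablation study motivated.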
Meituan is not alone in rebuilding recommendation on this foundation. Kuaishou deployed OneRec on its main video feed, unifying retrieval and ranking into a single generative model and reporting a 1.6% watch-time increase. The related OneMall architecture drove GMV gains of 4.9% to 14.7% across Kuaishou’s e-commerce scenarios. On Xiaohongshu’s Explore Feed, GenRank now handles ranking for hundreds of millions of users at near-equivalent computational cost to the system it replaced. The common thread: traditional deep learning recommendation models stopped improving when compute increased; transformer-based models did not.
Meituan extended the approach in February 2026 with MTFM, a foundation model that shares one transformer backbone across multiple recommendation scenarios, including homepage restaurant listings, food recommendations, and coupon-package placements. MTFM converts cross-domain data into three token types covering historical behavior, real-time interactions, and candidate items, training on them jointly without pre-aligned inputs. Online A/B tests showed a 2.98% increase in orders for coupon-package recommendations and 1.45% for food recommendations, while inference latency dropped by 5-6 milliseconds per request.
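A shared-backbone, per-scenario-head arrangement of the kind MTFM describes can be sketched minimally. The scenario names and the single-layer "backbone" below are illustrative stand-ins, not MTFM internals:

```python
# Hedged sketch: one shared backbone produces a representation of the
# unified token sequence; each scenario (homepage, food, coupon) scores
# it with its own lightweight head. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared token width (illustrative)

W_backbone = rng.standard_normal((D, D)) * 0.1   # shared across all scenarios
heads = {s: rng.standard_normal(D) * 0.1          # one scoring head per scenario
         for s in ("homepage", "food", "coupon")}

def score(scenario, token_seq):
    """token_seq: (L, D) unified sequence. Returns one relevance score."""
    h = np.tanh(token_seq @ W_backbone)   # shared representation
    return float(h.mean(axis=0) @ heads[scenario])

seq = rng.standard_normal((5, D))
s_food = score("food", seq)
s_coupon = score("coupon", seq)   # same backbone pass, different head
```

The design choice mirrors multi-task setups elsewhere in deep learning: cross-domain behavioral data trains one backbone, so each scenario benefits from signals the others collect.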
The counter-argument is that generative recommendation works at Meituan's scale because Meituan has the behavioral data density to make scaling laws emerge. The platform handles hundreds of millions of recommendation requests daily, drawing on user signals across food delivery, grocery, and local services. The objection is plausible, but it holds only if the gains depend on data volume rather than on the structural advantage of tokenised feature sequences. Meta's own results showed power-law quality scaling across three orders of magnitude of training compute, and NVIDIA has released an open-source HSTU implementation for teams that lack Meta's infrastructure. The bottleneck for mid-market retailers may be engineering capacity, not data.
Fashion retailers whose product discovery runs through these platforms should note what changed underneath the interface. When Meituan’s recommendation engine was a traditional deep learning model, adding compute did not improve recommendation quality. With the transformer backbone, quality scales with compute, which means the platform’s grasp of product-user fit improves as long as investment continues. Meituan already processes hundreds of millions of recommendation requests daily; the new architecture makes each request sharper with every model upgrade. If Meituan, Kuaishou, and Xiaohongshu stay on the scaling curve, the platforms hosting product discovery will read catalogs with a depth no brand’s own recommendation layer matches. The catalog is the vocabulary; the model decides what it means.