Try-On Perfected the Still Photo. Now the Garment Has to Move.
Every virtual try-on advance so far solved a single frozen frame. HyperBones, a new arXiv paper clocking past 300 frames a second, targets the layer the field skipped: real-time bone-driven cloth motion, the gap between a product photo and a fitting-room mirror.
Parallax Pincer
The rendered silk slip is convincing right up until the model is asked to walk. Frozen, it behaves: light catches along the bias, the hem pools where a hem should pool, and a thumb mid-scroll believes the weight of the cloth. Ask it to move and the spell has nowhere to go, because the system only ever solved a single frame. Earlier this month we called the still image finished; the layer nobody had claimed is motion, and for anything that drapes, flows, or swings, motion is the product.
The bias cut is the oldest proof of this. When Madeleine Vionnet cut cloth on the true bias in the 1920s, she made dresses that lie inert on the cutting table and come alive only on a moving body, clinging and releasing as the wearer turns. Laid flat, such a dress is a limp diagonal of cloth; its whole pitch is the half-second the hem swings out and settles back. Every virtual try-on triumph so far has photographed the table.
There are two ways to make the cloth move: paint it frame by frame, or simulate it. The diffusion approach, the lineage of MagicTryOn and its peers, paints each frame and hopes the next one agrees with it. Papers like MagicTryOn describe what existing video try-on methods still haven’t solved: “inadequate garment fidelity and limited spatiotemporal consistency,” naming temporal jitter and appearance drift as the symptoms. The garment wobbles because the model redraws it on every frame and keeps no memory of where the fabric just was.
HyperBones, posted to arXiv on 19 May by a twelve-author team, takes the other road: it does not draw the garment at all. It simulates one. A set of virtual bones drives the coarse motion of the cloth through a light neural network, a trained convolutional map lays the wrinkles back on top, and physics supervises the result without a slower simulator standing behind it. It runs past 300 frames a second on a commodity GPU, which works out to roughly 3.3 milliseconds per frame. That is the figure that matters, because a fitting-room mirror has to keep pace with the body in front of it.
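For readers who want the bone idea in concrete terms, here is a minimal sketch of linear blend skinning, the standard technique for letting a handful of bones carry a mesh. It is an illustration under assumptions, not the HyperBones method: the paper's actual formulation is not described here, and the names `rigid_2d`, `skin`, the three-vertex "hem," and the weights are all hypothetical. Each vertex simply lands at the weighted average of where each bone would carry it.

```python
import math


def rigid_2d(angle, tx, ty):
    """Return a function that rotates a 2D point by `angle`, then translates it."""
    c, s = math.cos(angle), math.sin(angle)

    def apply(point):
        x, y = point
        return (c * x - s * y + tx, s * x + c * y + ty)

    return apply


def skin(rest_vertices, weights, bone_transforms):
    """Linear blend skinning: blend each bone's transformed copy of a vertex.

    weights[i][j] is how strongly bone j pulls vertex i; each row sums to 1.
    """
    posed = []
    for vertex, weight_row in zip(rest_vertices, weights):
        x = y = 0.0
        for w, bone in zip(weight_row, bone_transforms):
            bx, by = bone(vertex)
            x += w * bx
            y += w * by
        posed.append((x, y))
    return posed


# Hypothetical hem strip: three vertices in a row, two virtual bones.
rest = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Bone 0 stays put; bone 1 lifts by one unit (assumed motion for illustration).
bones = [rigid_2d(0.0, 0.0, 0.0), rigid_2d(0.0, 0.0, 1.0)]

posed = skin(rest, weights, bones)
print(posed)
```

The middle vertex, shared equally between the two bones, rises halfway. That is the coarse motion; in the pipeline described above, a separate learned step would be responsible for the wrinkles this blend cannot express.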
Virtual bones for clothing are not new. A 2022 SIGGRAPH paper already drove loose garments by transferring body motion onto extracted bones, splitting slow sway from fast wrinkle. The novelty here is twofold: speed, and generalization to a body the model has never seen. Hypernetwork conditioning lifts the per-identity computation out of the real-time loop, so a single garment ripples correctly across different shapes and motions. Doing that fast enough to answer a turning body in real time is what separates a research demo from a mirror.
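The hypernetwork trick is easier to see in miniature. The sketch below is a toy under stated assumptions, not the paper's architecture: `HyperNetwork`, `make_frame_fn`, the linear "basis" matrices, and the two-number body shape are all invented for illustration. What it shows is the division of labor: one network generates the weights of another once per wearer, and the per-frame loop only ever runs the small generated network.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class HyperNetwork:
    """Toy linear hypernetwork: body shape in, weights of a tiny network out."""

    def __init__(self, basis):
        # basis[k] is the weight matrix contributed by shape coefficient k
        # (a hypothetical parameterization chosen for simplicity).
        self.basis = basis

    def generate(self, shape):
        rows, cols = len(self.basis[0]), len(self.basis[0][0])
        return [
            [sum(s * b[i][j] for s, b in zip(shape, self.basis)) for j in range(cols)]
            for i in range(rows)
        ]


def make_frame_fn(hyper, shape):
    """Run the hypernetwork once per identity and return a cheap per-frame function."""
    weights = hyper.generate(shape)  # paid once, outside the real-time loop

    def step(pose):
        return matvec(weights, pose)  # per frame: one small matrix-vector product

    return step


# Hypothetical numbers: two shape coefficients, a two-value pose signal.
hyper = HyperNetwork(
    basis=[
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
)
step = make_frame_fn(hyper, shape=[2.0, 0.5])

# The real-time loop never touches the hypernetwork again.
for pose in ([1.0, 0.0], [0.0, 1.0]):
    print(step(pose))
```

A new wearer means calling `make_frame_fn` again with a different shape; the per-frame cost stays the same regardless of how heavy the weight-generating network is.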
The catch sits in one phrase of the abstract: a fixed set of garments. HyperBones will not swallow a catalog and animate it; someone has to build the bones for each specific piece ahead of time. That makes it a hero-product tool well before it is a storefront one, so the moving mirror arrives the way most fashion machinery arrives: on the flagship coat first, and on the basics much later.
A product photo asks the customer to imagine how the garment moves; a fitting-room mirror refuses to let her guess.