AI & Retail Technology Deep Dive (Vale)
[Image: A nautilus shell beside a dissolving 3D point cloud of a clothed human figure with wireframe anatomy visible underneath]

A Point Cloud Can Fit a Clothed Body. Almost No Retailer Has One.

ETCH, an ICCV 2025 highlight, fits parametric body models to clothed-human point clouds with up to 69.5 percent less error than prior methods. Its training set draws on 47 real subjects captured in research labs, exposing the infrastructure gap that separates the algorithm from any commercial virtual try-on deployment.

Neritus Vale

ETCH, an ICCV 2025 highlight from researchers including Michael J. Black at the Max Planck Institute, solves the hardest algorithmic problem in 3D virtual try-on: fitting a parametric body model to a point cloud of a clothed human. Its composable training pipeline stitches together two real-world scan corpora and one synthetic dataset into 94,501 samples. On the leading benchmark, it cuts body-fitting error by up to 69.5 percent over prior methods and improves shape accuracy by an average of 49.9 percent. The algorithm works. The scanning infrastructure it needs barely exists outside the research labs that produced it.

Virtual try-on’s central difficulty was never draping a digital garment onto a known body. It was inferring the body beneath the garment when all you have is a 3D scan of someone dressed. Prior optimization-based approaches chained multi-stage pipelines that collapsed when the initial pose estimate was wrong; learning-based methods generalized poorly from tight sportswear to loose overcoats. ETCH introduces a new representation, tightness vectors: displacement fields from each point on the clothing surface inward to the corresponding point on the body underneath. Because those vectors are locally SE(3)-equivariant, they remain stable across poses; a loose overcoat in a lunge and a fitted shirt standing upright receive the same geometric treatment.
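To make the representation concrete, here is a minimal numerical sketch, not ETCH’s code: a unit-sphere "body" wearing a uniform clothing shell 0.02 units outside it. The displace_to_body helper, the inward directions, and the uniform tightness magnitudes are illustrative stand-ins for the fields ETCH’s network predicts from a real scan.

```python
import numpy as np

def displace_to_body(cloth_points, directions, magnitudes):
    """Map each clothing-surface point to its estimated body-surface point.

    cloth_points: (N, 3) scan points on the outer clothing surface.
    directions:   (N, 3) unit vectors pointing inward toward the body.
    magnitudes:   (N,)   per-point tightness, i.e. clothing-to-body distance.
    """
    return cloth_points + directions * magnitudes[:, None]

# Toy scene: a unit-sphere body under a clothing shell 0.02 units outside it.
rng = np.random.default_rng(0)
outward = rng.normal(size=(1000, 3))
outward /= np.linalg.norm(outward, axis=1, keepdims=True)
cloth = outward * 1.02                 # outer clothing surface
inward = -outward                      # inward direction per point
tightness = np.full(1000, 0.02)        # uniform clothing-to-body distance

# Perfect tightness vectors recover the body surface exactly.
body = displace_to_body(cloth, inward, tightness)
print(np.abs(np.linalg.norm(body, axis=1) - 1.0).max())   # ~1e-16

# The equivariance property in miniature: rotating the scan and the vector
# field by the same R rotates the recovered body by R; nothing else changes.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
body_rotated = displace_to_body(cloth @ R.T, inward @ R.T, tightness)
print(np.abs(body_rotated - body @ R.T).max())            # ~1e-16
```

The method earns its keep by predicting those directions and magnitudes from the scan alone; once they exist, recovering points on the body surface is the simple addition above, and a parametric body model can then be fitted to the displaced points.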

The method’s edge is its composable training set. CAPE, from the Max Planck Institute, provides 15 subjects in fitted garments across 140,000+ frames captured at 60 fps. 4D-Dress, from ETH Zurich, adds 32 subjects in loose and layered clothing across 78,000 textured scans, each frame requiring semi-automatic parsing to segment garment from body. A generative tranche of synthetic humans fills gaps in body-type and pose coverage. Each real scan carries what no commercial scanner produces: a registered ground-truth body shape under the clothing, aligned frame by frame with the outer surface. The total real corpus amounts to 47 subjects, all captured in dedicated academic facilities.
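What "composable" means in practice is that three very different sources are normalized into one record layout: an outer scan frame paired with the registered body underneath it. A minimal sketch of such a record follows, with field names and shapes assumed for exposition rather than taken from ETCH’s release.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Sample:
    source: str               # "CAPE", "4D-Dress", or "synthetic"
    cloth_points: np.ndarray  # (N, 3) clothed-surface scan for one frame
    body_pose: np.ndarray     # registered body pose for the same frame
    body_shape: np.ndarray    # per-subject shape parameters (e.g. SMPL betas)

def merge(*corpora):
    """Concatenate per-source sample lists into one training set."""
    return [sample for corpus in corpora for sample in corpus]

# Toy composition with placeholder arrays, not real scan data. SMPL-style
# models use a 72-dim pose (24 joints x 3) and 10 shape coefficients.
def stub(source):
    return Sample(source, np.zeros((2048, 3)), np.zeros(72), np.zeros(10))

training_set = merge([stub("CAPE")], [stub("4D-Dress")], [stub("synthetic")])
print(len(training_set), sorted({s.source for s in training_set}))
```

The body_pose and body_shape fields are exactly what the commercial scanners described below cannot supply: without a registered ground-truth body per frame, there is nothing for the tightness vectors to point at during training.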

Forty-seven subjects is a research breakthrough and a production non-starter.

The data ETCH consumes does not resemble anything a retailer collects. TC2, the largest body-scanner manufacturer, has more than 1,000 units deployed worldwide, but these capture a clothed surface for measurement extraction, not the volumetric geometry needed to reconstruct the body underneath. The equipment that produced CAPE and 4D-Dress sits in research facilities behind academic access agreements, not on shop floors. Bloomingdale’s trialed in-store body scanning and withdrew the hardware when costs made mainstream deployment impractical. Meshcapade, whose co-founder Michael J. Black is also a co-author of ETCH, offers a production pipeline that works from a single photo, bypassing the dense scan entirely.

Synthetic data is the obvious bridge. For the infrastructure thesis to fail, synthetic generation would need to close the domain gap with real scan data entirely; that gap remains. Consumer depth sensors are no substitute: they capture the clothing surface, not the registered body shape underneath. Fitting SMPL-X, the full-body model with articulated hands and face, would require scan coverage that no existing corpus provides at commercial scale. The binding constraint is the scanning infrastructure, not the neural architecture.

The virtual try-on sector is valued at $15.18 billion in 2025, with projections reaching $48 billion by 2030. Smart mirror and kiosk systems accounted for 43.86 percent of 2024 revenue. These tools lift conversion rates by up to 40 percent over mobile-only AR, but they cannot model fit, drape, or how a body and a garment interact in three dimensions. If retailers want virtual try-on that understands what a body looks like under a dress, they will need to fund the kind of scanning infrastructure that produced these 47 training subjects. The bottleneck has moved from algorithm to capture.