What is LiDAR? And why measure clothes with it?
LiDAR is a laser-based depth sensor that Apple brought to the iPhone with the iPhone 12 Pro in 2020. It bounces invisible infrared pulses off the scene and times their return to build a millimeter-scale 3D model of whatever it sees. The original use cases were AR depth and indoor scanning. Garment measurement was not on Apple's roadmap.
We thought it should be. The math works: a flat garment captured at a known camera angle and distance gives you everything you need to extract chest, waist, sleeve, inseam, and 13 other measurements in a single shot. The hard part isn't the math. It's running it in under a second on a phone in someone's hand.
The 0.92-second target
When we set out to build Size AI, the goal was sub-second capture. Anything slower and the user reaches for a tape measure instead, which costs them 12-15 minutes per garment. Sub-second changes the math: at 12-15 minutes each, a reseller measuring 200 items a week gets back 40 to 50 hours.
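As a quick check on that claim, the sketch below multiplies the 12-15 minutes per garment quoted above by a batch of 200 garments. The function name is illustrative, and the one-second capture time is a rounded-up stand-in for the sub-second target.

```python
def tape_measure_hours(garments: int, minutes_per_garment: float) -> float:
    """Hours spent measuring `garments` items by hand."""
    return garments * minutes_per_garment / 60.0


# Manual measurement at the low and high end of the quoted range.
low = tape_measure_hours(200, 12)
high = tape_measure_hours(200, 15)

# Same batch at an assumed one second per capture, converted to hours.
capture_hours = 200 * 1.0 / 3600.0

print(f"manual: {low:.0f}-{high:.0f} h, capture: {capture_hours:.2f} h")
# prints "manual: 40-50 h, capture: 0.06 h"
```

Capture time is a rounding error next to the manual figure, so almost all of the 40 to 50 hours is recovered.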
The current 95th-percentile capture latency is 0.91 seconds on iPhone 14 Pro, 1.04 seconds on iPhone 12 Pro (the oldest supported device). Median is 0.78s.
How the pipeline works
The end-to-end pipeline runs four stages and uses seven ML models. By default, all of it executes on-device: nothing leaves the phone unless a power user opts into the optional cloud-backed model variants.
Stage 1: Capture
The user lays the garment flat, points the camera straight down so it sits at 90° to the surface, and taps. We grab a synchronized RGB frame and LiDAR depth frame at full resolution. Total time: ~80ms.
Stage 2: Garment segmentation
A custom semantic-segmentation model isolates the garment from the surface. We trained this on 1.2M garments across the categories Size AI ships today (90+ types). The model is 4.1MB after quantization, runs on Core ML, and finishes in ~120ms on the Neural Engine.
Stage 3: Keypoint detection
A second model places measurement keypoints: shoulder seams, hem, sleeve cuffs, waistband, and the rest. There are 17 keypoints per garment and the model has to handle the full taxonomy of garment types. We use a shared backbone with type-specific heads.
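To make the "shared backbone, type-specific heads" idea concrete, here is a toy sketch of the control flow only. It is not the production model: the pooling "backbone", the random linear heads, `FEATURE_DIM`, and the two garment-type names are all hypothetical stand-ins. The one detail taken from the text is the 17 keypoints per garment.

```python
import random
from typing import Callable, Dict, List, Tuple

NUM_KEYPOINTS = 17  # keypoints per garment, per the text
FEATURE_DIM = 8     # hypothetical feature size, chosen only for the demo

Keypoints = List[Tuple[float, float]]
Head = Callable[[List[float]], Keypoints]


def backbone(image: List[float]) -> List[float]:
    """Stand-in for the shared feature extractor; runs once per capture."""
    # A real backbone is a CNN. Here we just average the input into bins.
    step = max(1, len(image) // FEATURE_DIM)
    return [sum(image[i * step:(i + 1) * step]) / step
            for i in range(FEATURE_DIM)]


def make_head(seed: int) -> Head:
    """Build one type-specific head: a linear map to 17 (x, y) pairs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]
               for _ in range(NUM_KEYPOINTS * 2)]

    def head(features: List[float]) -> Keypoints:
        flat = [sum(w * f for w, f in zip(row, features)) for row in weights]
        return list(zip(flat[0::2], flat[1::2]))

    return head


# One head per garment type; every head consumes the same backbone features.
HEADS: Dict[str, Head] = {"t_shirt": make_head(1), "jeans": make_head(2)}


def detect_keypoints(image: List[float], garment_type: str) -> Keypoints:
    features = backbone(image)            # shared work
    return HEADS[garment_type](features)  # type-specific work


image = [i / 63 for i in range(64)]  # dummy 64-value "image"
print(len(detect_keypoints(image, "jeans")))  # prints 17
```

The appeal of this layout is that supporting a new garment type means adding a small head, while the expensive feature extraction is shared by every type.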
The cost of sub-second on-device inference isn't accuracy. It's the model architecture work to get there.
This is the stage where most of our optimization effort lived. Keypoint detection at this scale is expensive. We rewrote the head to use depthwise-separable convolutions, dropped the input resolution from 512 to 384, and cut runtime by 40% with no measurable drop in accuracy.
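For a rough sense of why those two changes help, the sketch below counts multiply-accumulate operations (MACs) for a single 3x3 convolution layer. The channel counts are hypothetical, and the layer is assumed to run at the full input resolution with stride 1, which a real head would not do. Read it as a per-layer illustration of the technique, not a reconstruction of our head.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """MACs for a dense k x k convolution at stride 1, same padding."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """MACs for a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes the channels
    return depthwise + pointwise


C_IN, C_OUT = 64, 64  # hypothetical channel counts

base = standard_conv_macs(512, 512, C_IN, C_OUT)
sep = separable_conv_macs(512, 512, C_IN, C_OUT)
small = separable_conv_macs(384, 384, C_IN, C_OUT)

print(f"separable / standard: {sep / base:.3f}")    # 1/C_OUT + 1/9 = 0.127
print(f"384 vs 512 input:     {small / sep:.4f}")   # (384/512)^2 = 0.5625
```

These per-layer ratios are larger than the 40% end-to-end saving because only part of the network changed and runtime on real hardware is not purely MAC-bound.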
Stage 4: Measurement extraction
With keypoints in 3D space (RGB + depth fused), measurements fall out as Euclidean distances along the garment's local geometry. The math is straightforward; the hard part was the previous three stages.
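A minimal sketch of that last step, assuming a standard pinhole camera model: each keypoint's pixel coordinates and depth are back-projected to a 3D point, and a measurement is the straight-line distance between two such points. The intrinsics (`FX`, `FY`, `CX`, `CY`), the pixel coordinates, and the 0.60 m depth are made-up values for illustration. On a real device the intrinsics come from the camera calibration.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def back_project(u: float, v: float, depth_m: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Pinhole back-projection: pixel (u, v) at depth_m metres to camera-space XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def measure_cm(p: Point3D, q: Point3D) -> float:
    """Euclidean distance between two 3D keypoints, in centimetres."""
    return math.dist(p, q) * 100.0


# Hypothetical intrinsics (in pixels) and two keypoints on a flat garment.
FX = FY = 1500.0
CX, CY = 960.0, 720.0

left = back_project(460.0, 700.0, 0.60, FX, FY, CX, CY)
right = back_project(1460.0, 700.0, 0.60, FX, FY, CX, CY)

print(f"{measure_cm(left, right):.1f} cm")  # prints "40.0 cm"
```

A measurement that follows a curve, such as a sleeve, could be approximated by summing `measure_cm` over consecutive keypoints along it.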
Why on-device, not cloud
Cloud inference would let us use a bigger model. But three things made on-device the right call:
- Privacy. Garments belong to the user. We don't want to ship images to our servers.
- Latency. Round-trip to a server runs 200-800ms before inference even starts. Sub-second budgets evaporate.
- Offline-first. Resellers work in storage units, basements, garages, places without good wifi. The app has to work without it.
The cost is that the model has to fit in mobile memory and run on the Neural Engine. That constraint forced architecture choices we wouldn't have made otherwise. In the end the model is better for it.
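The latency argument is easy to put in numbers. The sketch below subtracts the 200-800ms round trip quoted above from a one-second budget and compares what is left with the 0.78s median we measure on-device. The function and constant names are illustrative, and server-side inference time is deliberately left out because we have no figure for it.

```python
BUDGET_MS = 1000           # sub-second target
MEDIAN_ON_DEVICE_MS = 780  # median on-device capture latency from the text


def inference_budget_ms(round_trip_ms: int, budget_ms: int = BUDGET_MS) -> int:
    """Time left for server-side inference after the network round trip."""
    return budget_ms - round_trip_ms


for rtt in (200, 800):
    left = inference_budget_ms(rtt)
    print(f"round trip {rtt} ms -> {left} ms left for cloud inference")

print(f"on-device median, no network at all: {MEDIAN_ON_DEVICE_MS} ms")
```

At the slow end of the range a cloud model would have to finish the whole pipeline in 200ms just to tie the one-second budget.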
What we'd do differently
Three things, in hindsight:
- Train on real-world flat-lays earlier. Our first dataset was studio-lit garments on white surfaces. Real users shoot on hardwood floors with shadows. We overfit to the studio look for two months before catching it.
- Ship a worse v1. We held v1 for accuracy targets we couldn't validate without users. Shipping at v0.7 would have given us labeled flat-lay data three months earlier.
- Spend more on the keypoint backbone. The shared backbone we use today is a custom ResNet variant. A bigger investment in the backbone (DINOv2-style self-supervised pretraining) would have paid off across every garment type.
Where it goes next
The next milestone is sub-500ms. We have a draft architecture that fuses segmentation and keypoint detection into a single forward pass. Internal benchmarks put it at 0.43s on iPhone 14 Pro. We're working through the long-tail accuracy regressions now.
For the full setup (flat surface, lighting, framing, post-capture verification), see the Capture Guide. Most of the accuracy gap between users today is technique, not the model. The pipeline ships in the Size AI iOS app, and brands or marketplaces that want to integrate the same measurement stack can contact the SDK / API team.




