
Depth Anything 3 proves most 3D vision research has been overengineering the problem. A vanilla DINOv2 transformer plus depth-ray pairs beats the previous SOTA by 44% on pose and 25% on geometry. One approach delivers SOTA monocular depth, multi-view geometry, pose estimation, and novel view synthesis 🤯 Huge implications for computer vision across 3D reconstruction (photogrammetry and Gaussian splatting), visual effects, and robotics. Trained on public academic data, with models and code already live!

