Ahsen Khaliq’s Post


ML @ Hugging Face

Depth Anything V2 paper page: https://buff.ly/4bZ6lo7

This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices:

1) replacing all labeled real images with synthetic images,
2) scaling up the capacity of our teacher model, and
3) teaching student models via the bridge of large-scale pseudo-labeled real images.

Compared with the latest models built on Stable Diffusion, our models are significantly more efficient (more than 10x faster) and more accurate. We offer models of different scales (ranging from 25M to 1.3B params) to support extensive scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.
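For anyone who wants to try the released checkpoints, here is a minimal sketch using the Hugging Face transformers depth-estimation pipeline. The checkpoint id below is an assumption based on how the V1 weights are named on the Hub; check the paper page or the model hub for the exact identifiers of the V2 releases.

```python
from transformers import pipeline
from PIL import Image
import requests

# Assumed Hub id for the small (~25M-param) V2 checkpoint -- verify on the model hub.
depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
)

# Any RGB image works; this is a standard COCO sample used in examples.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

result = depth_estimator(image)
# The pipeline returns "predicted_depth" (a tensor) and "depth" (a PIL image
# of the relative depth map, rescaled for visualization).
result["depth"].save("depth.png")
```

The same call should work for the larger checkpoints by swapping the model id; the metric-depth fine-tuned variants are released separately.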

Nikolas Engelhard

Expert in Robotics, Computer Vision, Automation

2mo

This looks extremely impressive. I just wonder what the 'right' interpretation for the drawings should be. At 0:17 we have the scenario with the display on the back of the truck, and V2 now recognizes this as a flat surface (no idea how...), but at the end (e.g. 0:02) we have the painting with the horse, and now the model returns depth. So what's the right interpretation of a screen or a painting? If we now put the horse painting on a wall, should it then be recognized as a flat surface (like the truck display), but if we zoom in and only see the horse, are we suddenly in 3D?

Amir Livne

Research Engineer @ Verily (Google Life Sciences)

2mo

That's awesome! I wonder, how would it perform on a photo of a poster or a tv screen, showing some landscape image, for example? Can it infer this as a flat surface?

On-device, real-time depth and segmentation on mobile devices is the future foundation of XR - can't wait for v4

Moar fake pixels, MOAR! Cool stuff, probably useful for something at some point. Focus shifting, maybe? Not sure. Definitely amazing work by the authors here.


Details matter! It's like going from jpeg compression level 1 to 10, neat results.

Praveenkumar Rajendran

Computer Vision Research Engineer | M.S. in Research | MLOps | GenAI

2mo

Depth Anything V1 is at CVPR next week, and now an early release of V2? Great!

Tuan Tong

AI • LLMs • AGI

2mo

Congrats. I am a fan of the V1. Appreciate your work!

Raymond Lo 👓🤖

AI Software Evangelist at Intel (Global Lead)

2mo

v2 wow


Awesome work! Thank you so much

Florent Weltzer

Digital Project manager 🚀 / Passionate Geek 🎮 / Amateur Woodworker 🪓

2mo

Very impressive details with the V2!
