From the course: Giving Computers Vision

The evolution of deep learning

- [Interviewer] So then it's off to the races. Now we're going to do deep learning on everything, or what happens next?
- [Interviewee] Yes, what happens next is 2015. There are three teams that beat human visual systems.
- [Interviewer] Wow.
- [Interviewee] Better than 95%. They were, I think, Google, Microsoft, and Baidu.
- [Interviewer] Oh yeah.
- [Interviewee] Baidu, they did a little bit of cheating (both laughing) and so they were disqualified, but still, you know, it was not-
- [Interviewer] Impressive still.
- [Interviewee] It was impressive that three teams beat-
- [Interviewer] And by that you said above 95%?
- [Interviewee] Yeah.
- [Interviewer] Yeah, so that means that if you showed me five images that I would guess, well, it depends, I mean, depending on what the images are, I may or may not, but it would be as good as I am. And I think, you know, to bring this home to somebody who doesn't have a clue what we're talking about: on my Android phone, or even on an iPhone, with Google Photos or Google Camera, I can point it at a thing like a plant and it can tell me exactly what plant it is. It doesn't always work, but it's far better than I am, unless I happen to be an arborist or an expert in that thing, right?
- [Interviewee] Well, in this case, when they compared against human vision, the person actually spent a few hours learning the various categories.
- [Interviewer] Oh really? So they even had prep.
- [Interviewee] Yeah, they had prep.
- [Interviewer] It wasn't just, here's a bunch of images.
- [Interviewee] No, no, no, no, they had prep.
- [Interviewer] And they still didn't do as good?
- [Interviewee] They did not do as good.
- [Interviewer] Wow.
- [Interviewee] Because, you know, a cat can look like a dog sometimes, right, and they had different breeds of dogs, and different breeds can look very similar sometimes. Right, so it's not very easy for humans, even if you have the training. And then there are fine-grained visual recognition challenges also, where, you know, two breeds of birds may look very similar to each other. The computer can tell them apart, but humans have a very tough time.
- [Interviewer] So again, our mutual friend, Adam Geitgey, I think is how you say his last name, had a funny blog post on this, did you see it? Where you had Will Ferrell and Chad Smith, the drummer from Red Hot Chili Peppers.
- [Interviewee] Okay, I see. (both laughing)
- [Interviewer] They look identical. And there's even a funny thing where they go on Jimmy Kimmel and they do a drum-off, and again they're wearing the same clothes and everything. He used that as his example, where for facial recognition it was very easy for it to tell them apart. A human looking at it can tell, because we know distinctly who these people are, because they're celebrities, but it is striking how similar they are, so it's kind of surprising the computer gets it right.
- [Interviewee] In fact, Yoshua Bengio, you know, one of the... His brother, his name is Samy Bengio, and I think they are twins.
- [Interviewer] Oh wow.
- [Interviewee] They look identical. They both work in machine learning. (interviewer laughing) And so it's, you know, it's a very funny thing.
- [Interviewer] Can't even tell them apart, right?
- [Interviewee] Yeah, it's hard to tell them apart.
- [Interviewer] So we're off to the races, deep learning is taking over the world, and it's now, even to this point today, very prevalent in our lives.
- [Interviewee] Yes, and that challenge, by the way, has been modified. They stopped it, like, this is getting ridiculous now.
- [Interviewer] 99% or something, yeah.
- [Interviewee] So now they stopped that challenge, but now there are other competitions, right. Where it's, you know, fine-grained visual recognition, it's harder and harder-
- [Interviewer] Which breed? Which animal? Or whatever, yeah.
- [Interviewee] And then initially it was an image classification problem, right, you had an image and for the entire image you would output a single tag, right, whether this is a dog. But then there are other problems in computer vision. So it's about the granularity of information, right. Image classification is one such problem. Then you have object detection, where you say that, oh, it is a dog and look, it is inside this bounding box. So that's object detection. So with deep learning it was an obvious next step that, you know, it would be applied to this also, and then all the object detection (both laughing) algorithms-
- [Interviewer] So, all deep learning.
- [Interviewee] Are all deep learning based now, right. Then there's an even higher level of granularity, where you want to say, oh, this is a dog, and this is the outline, these are all the pixels that belong to the dog, right. Not just a bounding box, right, but the exact pixels, the outline of the dog. And that is called image segmentation. And very soon, you know, this is happening by 2015 already, in all these algorithms, deep learning is basically taking over. And the reason is that the visual cues the neural network was learning are the same visual cues that help it do image classification, object detection, segmentation, et cetera. It has learned something fundamental about the image, right. So now for image segmentation also, you know, all the state-of-the-art algorithms are deep learning based. Then there is something called pose estimation, where you find key points on the person, right. You might have seen these, you know, demos where you have a person walking and a skeleton of the person is overlaid.
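The "granularity of information" point in the conversation can be made concrete with a small sketch. The Python below is only an illustration, not code from the course or from any vision library: the class names, the toy 4x6 mask, and the keypoint names are all assumptions made up for this example. It shows what each task outputs, and that a finer-grained answer (a pixel mask) contains enough information to recover the coarser ones (a bounding box, then a single tag), but not the other way around.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) as inclusive pixel indices (an assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class Classification:
    """Image classification: one tag for the entire image."""
    label: str


@dataclass
class Detection:
    """Object detection: a tag plus the bounding box it sits inside."""
    label: str
    box: Box


@dataclass
class Segmentation:
    """Image segmentation: a tag plus the exact pixels that belong to the object."""
    label: str
    mask: List[List[int]]  # mask[y][x] is 1 where the pixel belongs to the object


@dataclass
class Pose:
    """Pose estimation: named key points, each an (x, y) pixel position."""
    keypoints: Dict[str, Tuple[int, int]]


def mask_to_box(mask: List[List[int]]) -> Box:
    """Return the tightest box that encloses every pixel set in the mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("mask contains no object pixels")
    return (min(xs), min(ys), max(xs), max(ys))


def to_detection(seg: Segmentation) -> Detection:
    """Coarsen a segmentation into a detection by dropping the pixel outline."""
    return Detection(seg.label, mask_to_box(seg.mask))


def to_classification(det: Detection) -> Classification:
    """Coarsen a detection into a classification by dropping the location."""
    return Classification(det.label)


# Toy "dog" occupying 7 pixels of a 4x6 image (hypothetical data).
seg = Segmentation(
    label="dog",
    mask=[
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 0],
    ],
)
det = to_detection(seg)
cls = to_classification(det)
pose = Pose(keypoints={"nose": (3, 1), "tail": (1, 2)})  # hypothetical key points

print(cls)
print(det)
print(sum(map(sum, seg.mask)), "object pixels")
print(pose)
```

Going the other direction is not possible from the data alone: a box does not say which pixels inside it belong to the dog, and a tag does not say where the dog is. That is why each finer-grained task needs a model that predicts more than the one before it.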
