Something doesn't seem right. If we are this good at image recognition, how come we are still using captchas?
Are we so good at image recognition that it can identify a young girl, a bunch of bananas, a guitar, etc. just by training on a set of only a few thousand images? What's the catch?
On one hand, Google says its self-driving car can't understand all the situations on the road, while on the other this algorithm can identify so many details in an image, like a strong AI would. It feels very strange.
As I understand it, reCAPTCHA now relies heavily on metadata to determine if someone is a bot. Probably stuff like their IP address, browser information, geographic location, internet speed. Then it predicts how likely they are to be a spam bot and sends them a much harder CAPTCHA if so. And they change the interface to throw off bots.
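To make the idea concrete, here is a rough sketch of how such a metadata-based score could work. This is not how reCAPTCHA is actually implemented; the signal names, weights, and threshold are all made up for illustration, and a real system would learn them from traffic data.

```python
import math

# Hypothetical signals and weights, invented for this example only.
# Positive weights push the score toward "bot".
WEIGHTS = {
    "suspicious_ip": 2.5,
    "unusual_browser": 2.0,
    "unexpected_location": 1.0,
    "datacenter_speed": 1.5,
}
BIAS = -2.0  # assumed prior: most visitors are not bots


def bot_probability(signals):
    """Combine boolean signals into a probability with a logistic function."""
    score = BIAS + sum(WEIGHTS[name] for name, present in signals.items() if present)
    return 1.0 / (1.0 + math.exp(-score))


def choose_challenge(signals, threshold=0.5):
    """Send a harder CAPTCHA when the visitor looks likely to be a bot."""
    return "hard" if bot_probability(signals) >= threshold else "easy"


visitor = {"suspicious_ip": True, "unusual_browser": True}
print(choose_challenge(visitor), round(bot_probability(visitor), 3))
```

The point is only that the image puzzle becomes one input among several: a visitor with no suspicious signals gets the easy challenge, and the hard one is reserved for traffic that already looks automated.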
The average person who just wants to automate filling out your website form is still blocked, so it's not useless.
There is some recent research that suggests you can make images which are very hard for neural networks to identify, but still easy for humans.
Here is an example: http://i.imgur.com/K6AQRkV.png The digits on the right are just slightly changed to be harder for NNs to recognize.
For comparison, this is the amount of random noise needed to have the same effect as their method: http://i.imgur.com/Asnf2L8.png
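A toy sketch of why a targeted change can be so much smaller than random noise. This is not the method from the linked images; it is a simple gradient-sign step on a made-up linear classifier, with random weights standing in for a trained model. Every name and number here is an assumption chosen for illustration.

```python
import random


def score(w, b, x):
    """Linear classifier: a positive score means 'recognized'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def adversarial(w, x, eps):
    """Shift every pixel by eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is w,
    so the worst-case step is -eps * sign(w). Pixel values are not clipped
    to a valid range here, to keep the toy simple.
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


def random_noise(x, eps, rng):
    """Shift every pixel by eps in a random direction (same per-pixel budget)."""
    return [xi + eps * rng.choice((-1, 1)) for xi in x]


rng = random.Random(0)
n = 784  # e.g. a 28x28 digit image, flattened
w = [rng.uniform(-1, 1) for _ in range(n)]  # stand-in for trained weights
x = [rng.random() for _ in range(n)]        # stand-in for an image
b = 10.0 - score(w, 0.0, x)                 # set the clean score to about 10
eps = 0.05

print("clean      :", score(w, b, x))
print("adversarial:", score(w, b, adversarial(w, x, eps)))
print("random     :", score(w, b, random_noise(x, eps, rng)))
```

With the same per-pixel budget, the targeted step lowers the score by eps times the sum of |w|, which grows with the number of pixels, while random shifts mostly cancel each other out. That is the intuition behind needing far more random noise to get the same effect.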
Captchas largely fight the lowest common denominator, by making those who don't care enough (or don't have the knowledge) to work around them just go elsewhere, so that you can invest your human resources in fighting the more sophisticated attackers that actually target you.
They are reasonably successful because spammers have enough other targets that not many see it as worth the extra effort (and clock cycles) to break them, not because most of them are particularly hard to beat any more.
Breaking captchas also has an incentive that has been attractive enough to create three entire industries: spam, malware, and security products to deal with spam and malware.
These images are probably cherry-picked as the most successful recognitions. We need more data before we can say whether it can actually recognize young girls in images confidently, or just sometimes gets "lucky". That's why they say that not all traffic situations can be understood. The algorithms aren't perfect.
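To put a number on "we need more data", here is a small sketch using the Wilson score interval for a success rate. The counts are hypothetical and not taken from any published results; the function name is my own.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: a handful of showcase images vs. a large test set.
print(wilson_interval(5, 5))
print(wilson_interval(950, 1000))
```

Five correct captions out of five still leaves a very wide range of plausible accuracies, whereas a large test set pins the rate down tightly. A few impressive examples alone can't distinguish a reliable system from a lucky one.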