Good writeup -- and one of the main reminders for me is this:
People throw around words like "revolution" for the current deep-learning push. But it's worth remembering that the fundamental concepts of neural networks have been around for decades. The current explosion is due to breakthroughs in scalability and implementation through GPUs and the like, not any sort of fundamental algorithmic paradigm shift.
This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
In computer vision at least, deep learning has been a revolution. More than half of what I knew in the field became obsolete almost overnight (it took about a year or two, I would say), and a lot of tasks received an immediate boost in terms of performance.
Yes, neural networks have been here for a while, gradually improving, but they were simply non-existent in many fields where they are now the favored solution.
There WAS a big fundamental paradigm shift in the algorithms. Many people argue that they should not be called "neural networks" but rather "differentiable function networks". DL is not your dad's neural network, even if it looks superficially similar.
The shift is that now, if you can express your problem in terms of minimization of a continuous function, there is a whole new zoo of generic algorithms that are likely to perform well and that may benefit from throwing more compute resources at them.
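To make that concrete, here is a minimal sketch in PyTorch (purely an illustration; the function being minimized is made up): once the problem is written as a differentiable expression, autograd plus any generic optimizer does the rest.

    import torch

    # Any differentiable expression can be minimized this way; this one is
    # invented purely for illustration.
    x = torch.tensor([5.0, -3.0], requires_grad=True)
    opt = torch.optim.Adam([x], lr=0.1)

    for step in range(500):
        loss = (x[0] - 1.0) ** 2 + torch.sin(x[1]) ** 2
        opt.zero_grad()
        loss.backward()  # autograd computes the gradient of the whole expression
        opt.step()       # a generic optimizer does the minimization

    print(x)  # approaches a minimizer, roughly (1.0, k*pi)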
Sure, it uses transistors in the end, but revolutions do not necessarily mean a shift in hardware technology. And, by the way, if we one day switch from transistors to things like opto-thingies, and it brings a measly 10x boost in performance, it won't be on par with the DL revolution we are witnessing.
It is indeed pretty amazing in CV. In a field nearly as old as CS itself, I'd say at least 75% of the existing techniques were made obsolete in a span of only a few years.
You could have started your PhD in 2006, made professor in 2012, and nearly everything you had learned would have been _completely_ different.
I got my diploma in 2003. I actually got lucky: I found a client who needed a computer vision specialist to add fancy features to their DL framework, so I could train myself in this new direction.
Yea... if 10x compute made all the difference, we would see elastic cloud compute doing things 10x as well ;) As always, these new applications work in tandem with hardware and software advances; it seems odd to point to one or the other.
Can you elaborate on what in CV became obsolete overnight? I took a survey course in CV but I haven't kept up. Do you still do facial detection, object recognition, camera calibration, and image stitching the same way as in 2012? Or has it changed because the processing has gotten faster and the results are near real-time?
Hand-crafted features were at the root of many detectors. They still are for some applications, but for most of them, a few CNN layers manage to learn far better and very counter-intuitive detectors.
Facial detection/recognition was based on features. This is not my specialty, so I don't know if DL got better there too, as those features were pretty advanced; but if it isn't there yet, I am sure it is just a matter of time.
I can see image stitching benefiting from a deep-learned filter pass too.
Camera calibration is pretty much a solved problem by now, I don't think DL adds a lot to it.
Like I said, not everything became obsolete, but around 50% of the field was taken over by DL algorithms where, before that, hand-crafted algorithms usually had vastly superior performance.
Just to confirm, for facial recognition/detection: modern DNN algorithms outperform the 'classic' methods that took decades of continuous improvement...
I don't think the revolution was about hardware improvement. I did some neural network research (and published a few papers) in the 1990s and switched to other research disciplines afterwards. So, I'm not really familiar with the recent developments. But to my knowledge, there was indeed a revolution in neural network research. It was about how to train a DEEP neural network.
Traditionally, neural networks were trained by backpropagation, but many implementations had only one or two hidden layers, because training a neural network with many layers (it wasn't called a "deep" NN back then) was not only hard but often led to poorer results. Hochreiter identified the reason in his 1991 thesis: the vanishing gradient problem. Now the culprit was identified, but the solution had yet to be found.
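To see why gradients vanish, here is a toy numpy sketch (all the numbers are invented for illustration): each sigmoid layer multiplies the backpropagated gradient by w * sigma'(a), and since sigma' is at most 0.25, the product shrinks exponentially with depth.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    grad = 1.0
    a = 0.5  # a typical pre-activation value
    for layer in range(30):         # a 30-layer stack of sigmoid units
        w = rng.normal()            # a random weight of order 1
        s = sigmoid(a)
        grad *= w * s * (1.0 - s)   # chain rule: sigmoid'(a) <= 0.25
    print(abs(grad))                # vanishingly small after 30 layers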
My impression is that there weren't any breakthroughs until several years later. Since I'd left that field, I don't know what exactly these breakthroughs were. Apparently, the invention of LSTM networks, CNNs, and the replacement of sigmoids by ReLUs were some important contributions. But anyway, the revolution was more about algorithmic improvement than the use of GPUs.
The things that have most improved training neural networks since you left were:

1. Smart (Xavier/He) initialization

2. ReLU activations

3. Batch normalization

4. Residual connections
The GPUs and dataset size were definitely very important though.
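A quick numpy sketch of points 1 and 2 above (layer width and depth are arbitrary): with He initialization, activation magnitudes stay stable through a deep stack of ReLU layers instead of exploding or dying out.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 512                                  # layer width, arbitrary
    x = rng.normal(size=(1000, n))           # a batch of inputs

    for layer in range(50):                  # a 50-layer ReLU stack
        W = rng.normal(scale=np.sqrt(2.0 / n), size=(n, n))  # He initialization
        x = np.maximum(0.0, x @ W)           # ReLU activation

    print(x.std())  # stays on the order of 1 across all 50 layers

With a naive scale (e.g. 1/sqrt(n), which ignores the ReLU correction), the activations shrink toward zero over the same depth.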
> not any sort of fundamental algorithmic paradigm shift
I think I cannot agree with this. There have been a lot of improvements to the algorithms to solve problems, and the pace has sped up thanks to GPUs. You cannot just take a neural network from 15 years ago, make it bigger, and expect it to work with modern GPUs; it is not going to work at all. Moreover, new techniques have appeared to solve other types of problems.
I am talking about things like batch normalization, ReLUs, LSTMs, or GANs. Yes, neural networks still use gradient descent, but there are people working on other algorithms now, and they seem to work; they are just less efficient.
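For what it's worth, batch normalization itself is only a few lines; here is a minimal numpy sketch of the training-time forward pass (gamma and beta are the learned scale and shift, the sizes are made up):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        mean = x.mean(axis=0)                    # per-feature mean over the batch
        var = x.var(axis=0)                      # per-feature variance
        x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature
        return gamma * x_hat + beta              # learned rescale and shift

    x = np.random.default_rng(0).normal(loc=3.0, scale=10.0, size=(64, 8))
    y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
    print(y.mean(axis=0).round(2), y.std(axis=0).round(2))  # ~0 and ~1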
> This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
This claim has exactly the same problem as before. You could also say evolution has done nothing, because the same principles that are in people were already there with the dinosaurs and even with the first cells. We are just a lot more cells than before.
Assuming you're asking specifically about the history of DL/ML, I can recommend this (4-part) blog series: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-.... It includes references for the relevant publications that identify problems (e.g., exploding gradients) and their solutions.
I have read a lot of papers, and usually you end up at the original one if you follow the references. But you have to check papers in specific areas; I am not aware of any good document where everything is in one place.
There was a revolution when they started using backpropagation to optimize the gradient search.
It's also why I don't agree with calling them "neural" anything, because there is no proof brains learn using backpropagation.
I feel like the current direction has thrown away all the neurobiology and focuses too much on the mathematics.
The GP implies that was not tried before: "There was a revolution when they started using backpropagation to optimize the gradient search".
"Back-propagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. Hochreiter's diploma thesis of 1991[1][2] formally identified the reason for this failure in the "vanishing gradient problem", which not only affects many-layered feedforward networks,[3] but also recurrent networks."
The best way to advance AI is probably to make the hardware faster, especially now that Moore's Law is in danger of going away. The people doing AI research generally seem to be fumbling around in the dark, but you can at least be certain that better hardware would make things easier.
I am not in the AI space at all, but I am under the impression that Python is the most used language for it. If workload is becoming an issue, wouldn't the low-hanging fruit be a more performance-oriented language?
With how fast things are evolving, developing something like an ASIC ($$$) for this might be outdated before it even hits release, no?
The heavy lifting done in neural networks is all offloaded to C++ code and the GPU. Very little of the computation time is spent in Python.
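An easy way to convince yourself (illustrative only; exact timings vary wildly by machine): the same dot product as an interpreted Python loop versus a numpy call that dispatches to compiled BLAS code.

    import time
    import numpy as np

    n = 1_000_000
    a, b = [1.0] * n, [2.0] * n

    t0 = time.perf_counter()
    s = sum(x * y for x, y in zip(a, b))  # interpreted Python loop
    t1 = time.perf_counter()

    av, bv = np.array(a), np.array(b)
    t2 = time.perf_counter()
    sv = av @ bv                          # dispatched to compiled native code
    t3 = time.perf_counter()

    print(f"python loop: {t1 - t0:.4f}s  numpy: {t3 - t2:.4f}s")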
ASICs will definitely be very helpful, and there's currently a bit of a rush to develop them. Google's TPUs might be one of the first efforts, but several other companies and startups are looking to have offerings too.
> Google's TPUs might be one of the first efforts, but several other companies and startups are looking to have offerings too.
NVidia's Volta series includes tensor cores as well [1]. So far, I think they've only released the datacenter version, which is available on EC2 p3 instances [2].