
Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".



>Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

The comparison is of 3-bit neuromorphic synaptic ops against FP8 Pascal ops. That factor matters (it means the neuromorphic ops are less useful), but it turns out to be dwarfed by the answer to your second question:

> Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".

This is rather difficult to justify in this margin. But the idea is that proposals like the one above (50 Tops) tend to be optimistic about the efficiency of the raw compute ops, while saying very little about the costs of communication (reading from memory, transmitting along wires, storing in registers, using buses, etc.). It turns out that if you don't have good ways to reduce those costs directly (and there are some, such as swapping registers for SRAMs, but nothing like the 100x speedup from analog computing), you have to change the ratio of ops per bit*mm of communication per second instead. There are lots of easy ways to do that (e.g. just spin your ops over and over on the same data), but the real question is how to get useful intelligence out of your compute when it is data starved. That is an open question, and sadly very few people are working on it, compared to, say, low-bit-precision neural nets. But I predict this sentiment will change over the next few years.
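To make the "spin your ops over the same data" point concrete, here is a toy arithmetic-intensity calculation (mine, not from any chip spec in this thread) for a dense matmul, the canonical way to raise the ops-per-bit ratio:

```python
# Toy arithmetic-intensity calculation: how many ops a dense n x n
# matmul performs per matrix entry it touches. The counts are exact,
# but they are not tied to any particular chip discussed here.

def matmul_ops_per_value(n):
    # C = A @ B does n**3 multiplies and n**3 adds (2*n**3 ops total)
    # while touching the 3*n**2 entries of A, B, and C.
    return (2 * n ** 3) / (3 * n ** 2)

for n in (16, 256, 4096):
    print(n, matmul_ops_per_value(n))
```

Reuse grows as 2n/3, so large matmuls keep the compute busy per bit fetched; once the data stream can't keep up with the ops, the chip is data starved in exactly the sense above.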

Edit for below: no one is suggesting 50 Tops/W hardware running AlexNet software, to my knowledge (though I'd love to hear what they are proposing to run at that efficiency). Nvidia, among others, is squeezing efficiency out of current software for CV applications, but this comes at the cost of generality (it's unlikely the communication tradeoffs they're making on that chip will make sense for generic AI research), and further improvements will rely on broader software changes, especially ones revolving around reduced communication. There are a lot of interesting ways to reduce communication without sacrificing performance, such as using smaller matrix sizes, which would reverse the state-of-the-art trends.


Regarding your first answer, it sounds like you're doing an apples-to-oranges comparison here. What are those "synaptic ops"? The Xavier board is announced to be capable of 30 Tops (INT8) at 30 W, so even if your neuromorphic chip does 100 Tops at 20 W, assuming for a second those ops are equivalent to INT3 operations, that makes the two very similar in efficiency.
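For what it's worth, the raw arithmetic behind that comparison (normalizing ops by bit width is a crude assumption on my part, not an established equivalence):

```python
# Raw efficiency numbers from the two claims in this thread. The
# bit-width normalization below is a crude illustrative assumption,
# not a real equivalence between INT8 and INT3 ops.

def tops_per_watt(tops, watts):
    return tops / watts

xavier = tops_per_watt(30, 30)   # 30 Tops INT8 at 30 W -> 1.0 Tops/W
neuro = tops_per_watt(100, 20)   # 100 Tops "synaptic" at 20 W -> 5.0 Tops/W

# Crudely weight each op by its bit width:
print(xavier * 8)   # 8.0 INT8 "bit-Tops"/W
print(neuro * 3)    # 15.0 INT3 "bit-Tops"/W
```

Depending on whether you weight ops linearly or quadratically in bit width (multiplier energy scales closer to quadratically), the two land within a small factor of each other either way, which is the point of the comparison.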

And you still haven't answered my second question: what is the reason the future neuromorphic chips won't be able to run current neural net architectures?

I'm not even sure what you are talking about at the end of your comment. The 50 Tops/W figure was promised for an analog chip, designed to run modern DL algorithms. Sounds pretty reasonable, and I don't see how your arguments apply to it. Are you saying we can't build an analog chip for DL? Why does it have to be data starved?


Our hardware can run AlexNet...


In an integrated system at 50 Tops/W? How are you going to even access memory at less than 20 fJ per op? Like, you're specifically trying to hide the catch here. If we were to take you at face value, we'd also have to believe that Nvidia is working on an energy-optimized system that is 50x worse for no good reason.

For reference, reading 1 bit from a very small 1.5 kbit SRAM, which is much cheaper than the register files in a GPU, costs more than 25 fJ per bit.
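As a sanity check on those two numbers (the 25 fJ/bit SRAM read cost is the figure quoted above; the 5 fJ raw-compute cost per op is my illustrative assumption):

```python
# Sanity check on the 50 Tops/W claim. The 25 fJ/bit SRAM read cost
# comes from the comment above; the 5 fJ/op raw-compute cost is an
# illustrative assumption, not a measured figure.

def fj_per_op_budget(tops_per_watt):
    # 1 W sustained at N Tera-ops/s leaves 1e3/N femtojoules per op.
    return 1e3 / tops_per_watt

def min_ops_per_bit(tops_per_watt, e_bit_fj, e_op_fj):
    # Every fetched bit must be reused by at least this many ops for
    # total energy (compute + fetch) to stay inside the budget.
    budget = fj_per_op_budget(tops_per_watt)
    assert budget > e_op_fj, "raw compute alone blows the budget"
    return e_bit_fj / (budget - e_op_fj)

print(fj_per_op_budget(50))        # 20.0 fJ per op for *everything*
print(min_ops_per_bit(50, 25, 5))  # ~1.67 ops per SRAM bit fetched
```

So at 50 Tops/W, every SRAM bit fetched must already feed multiple ops before DRAM, wires, or buses are even counted, which is the substance of the objection.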


So this is locked up in "secret sauce". But as a hint, the analog aspect can be exploited.


Look, it sounds like you're implying compute co-located with storage in the analog properties of your system (which is exactly what a synaptic weight is, btw), on top of using extremely low bit precision. So explicitly calling your system totally non-neuromorphic is a little deceptive. Even then, I find the idea that you're going to be running the AlexNet communication pattern to pass information around your system a little strange. If you're doing anything like passing digitized inputs through a fixed analog convolution, then you're not going to beat the SRAM limit, which means you must instead have in mind keeping the data analog at all times, passing it through ever longer analog pipelines. Even if you get this working, I'm quite skeptical that by the time you have a complete system, you'll have reduced communication costs by even half the reduction you achieve in computation costs on a log scale. It's of course possible that I'm wrong there (and my entire argument hinges on the hypothesis that computation costs will fall faster than communication costs - which is true for CMOS but may be less true for optics), but this is really the only projection on which we disagree. If I'm right, then regardless of whether you can hit 50 Tops (or any value) on AlexNet, you'd be foolish not to reoptimize the architecture to reduce the communication/compute ratio anyway.


Oh, I see what you meant now. Yes, when processing large amounts of data (e.g. HD video) on an analog chip, the DRAM-to-SRAM transfer can potentially be a significant fraction of the overall energy consumption. However, if that becomes a bottleneck, you can grab the analog input signal directly (e.g. the current from a CCD), which will reduce the communication costs dramatically (I don't have the numbers, but I believe Carver Mead built something called a "Silicon Retina" in the 80s, so you can look it up).

Power consumption is not the only reason to switch to analog. Density and speed are just as important for AI applications.


I should clarify: once data enters the chip, we provide 50 Tops/W. The transfer from DRAM is not included.



