Hot Chips 31 Live Blogs: Princeton In-Memory Compute Embedded CPU
by Dr. Ian Cutress on August 19, 2019 2:30 PM EST
02:16PM EDT - Princeton is presenting its own solution for in-memory compute this year at Hot Chips.
02:31PM EDT - Energy efficiency is critical, but programmability is often complex
02:32PM EDT - Compute is often only 10% of the total instruction energy
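A rough sketch of why that 10% figure matters, using an Amdahl-style energy model. The function name and the 10x reduction factor are hypothetical and chosen only for illustration; the 10% compute share is the figure quoted in the talk.

```python
def energy_gain(compute_fraction, compute_reduction=1.0, other_reduction=1.0):
    """Overall energy gain when the compute and non-compute shares of an
    instruction's energy are each reduced by the given factors."""
    remaining = (compute_fraction / compute_reduction
                 + (1.0 - compute_fraction) / other_reduction)
    return 1.0 / remaining

# Making compute itself free only removes the 10% share.
gain_compute_only = energy_gain(0.10, compute_reduction=float("inf"))

# Hypothetical: cutting everything else (mostly data movement) by 10x instead.
gain_other_10x = energy_gain(0.10, other_reduction=10.0)
```

Under these assumptions, eliminating compute energy entirely yields only about a 1.1x gain, while attacking the other 90% pays off far more.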
02:32PM EDT - Compute is getting faster, but programmability hasn't. We've hit a memory wall
02:33PM EDT - This focus is on embedded SRAM
02:33PM EDT - Models can be large, but many are on the smaller side, up to 50M parameters
02:34PM EDT - Data movement is fundamental. Cannot be eliminated, but amortized
02:34PM EDT - End up with reuse of data and specialized memory-compute integrated architectures, like TPU with systolic arrays
02:35PM EDT - In-Memory computing does something similar but more aggressive
02:35PM EDT - IMC = in memory computing
02:36PM EDT - At a fundamental level, IMC can reduce voltage SNR in exchange for energy/throughput
02:37PM EDT - Solution is to use analog circuits. Problem is that transistors have non-linear properties
02:37PM EDT - A number of IMC designs have been produced in academia, with some test chips
02:38PM EDT - One issue with IMC is that, despite 10x energy efficiency, memory density is lower
02:38PM EDT - In order to reduce non-linearity and variation of analog circuits, need advanced process technologies with tighter tolerances
02:39PM EDT - Move to charge-domain computation based on capacitors. End up with 8T bit-cell
02:41PM EDT - Can measure image recognition in micro-joules per image
02:41PM EDT - Most ML compute is GEMM, where IMC can help
02:42PM EDT - 590 KB IMC with SiFive CPU sample chip
02:42PM EDT - Compute-In-Memory Unit (CIMU)
02:42PM EDT - 32-bit external architecture built into standard memory interface
02:42PM EDT - low power 8-bit ADC
02:43PM EDT - bit scalability from 1-8 bits
02:44PM EDT - 8-bit ADC helps with energy overhead
02:44PM EDT - saves energy/area vs 16-bit
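One way to picture bit scalability with a fixed-width ADC is to split multi-bit operands into 1-bit planes, sum each 1-bit by 1-bit column, quantize that sum, and recombine with shifts. The sketch below is an assumption-laden illustration, not the chip's actual datapath: the function names, the unsigned operands, and the idealized ADC model are all hypothetical.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` lists of 0/1 values, LSB first."""
    return [[(v >> b) & 1 for v in values] for b in range(bits)]

def adc(count, max_count, adc_bits=8):
    """Idealized ADC: quantize a column sum in [0, max_count] to 2**adc_bits - 1 steps."""
    levels = (1 << adc_bits) - 1
    if max_count <= levels:
        return count  # enough codes to resolve every possible sum exactly
    code = round(count * levels / max_count)
    return code * max_count / levels

def bit_scalable_dot(weights, activations, w_bits, a_bits, adc_bits=8):
    """Dot product built from 1-bit column sums plus digital shift-and-add."""
    n = len(weights)
    total = 0
    for i, w_plane in enumerate(bit_planes(weights, w_bits)):
        for j, a_plane in enumerate(bit_planes(activations, a_bits)):
            # 1-bit x 1-bit products accumulated along one column.
            column_sum = sum(w & a for w, a in zip(w_plane, a_plane))
            # Weight the quantized column sum by its binary significance.
            total += adc(column_sum, n, adc_bits) * (1 << (i + j))
    return total

result = bit_scalable_dot([3, 1, 2, 0], [1, 2, 3, 1], w_bits=2, a_bits=2)
```

With only four rows the 8-bit ADC resolves every column sum exactly, so `result` matches the ordinary integer dot product; with more rows than ADC codes, the quantization step introduces error.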
02:47PM EDT - OK I'm lost on this talk. It's very academic
02:48PM EDT - Test chip built on 65nm, 8.5 mm²
02:48PM EDT - 1b efficiency was 400 TOPS/W
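To put that figure in per-operation terms, a quick conversion from TOPS/W to energy per operation (the helper name is hypothetical; the only input is the 400 figure from the talk):

```python
def joules_per_op(tops_per_watt):
    """TOPS/W is tera-operations per joule, so invert to get joules per operation."""
    return 1.0 / (tops_per_watt * 1e12)

# Energy per 1b operation in femtojoules at the quoted efficiency.
energy_fj = joules_per_op(400) * 1e15
```

That works out to roughly 2.5 femtojoules per 1b operation.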
02:49PM EDT - energy efficiency scales like digital, while maintaining analog precision
02:50PM EDT - 23 images/sec at 4b, 176 images/sec at 1b
02:50PM EDT - Developed a prototype kit
02:51PM EDT - 4b activations and weights: 92.4% accurate, 105.2 microjoules total, 23 images/sec
02:52PM EDT - 1b activations and weights, 89.3% accurate, 5.31 microjoules total, 176 images/sec
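A back-of-the-envelope check on those two results, assuming the quoted microjoule totals are per image and the image rates are sustained (both are assumptions, and the helper name is hypothetical):

```python
def average_power_mw(energy_uj_per_image, images_per_sec):
    """Average power in milliwatts: energy per image times image rate."""
    return energy_uj_per_image * images_per_sec / 1000.0

power_4b = average_power_mw(105.2, 23)   # 4b activations and weights
power_1b = average_power_mw(5.31, 176)   # 1b activations and weights

# How much more energy each image costs at 4b than at 1b.
energy_ratio = 105.2 / 5.31
```

Under those assumptions the chip averages about 2.4 mW at 4b and about 0.93 mW at 1b, with the 4b network spending roughly 20x more energy per image.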
02:52PM EDT - Software SDK
02:53PM EDT - Libraries are available with CPU-fallback
02:54PM EDT - Q&A Time
02:54PM EDT - No Qs.
Next talk is Intel Optane!
3 Comments
ballsystemlord - Tuesday, August 20, 2019 - link
Finally, it's my chance to be the first to reply! :-)
abufrejoval - Wednesday, August 21, 2019 - link
I am confused by the combination of "SRAM" and "capacitance based" computing. SRAM IMHO means 4-6 transistors per bit and is about the worst density, while here I thought they were using (decaying) capacitances in DRAM bit trenches to store (and compute on?) NN data in "analog" form: While that sounds crazy, density would be great.
But I guess the target context is about the smart shirt button, which infers that vis-à-vis your boss it should really be closed and now tries to inform you about that via your on-body-network.
Bulat Ziganshin - Thursday, August 22, 2019 - link
I think they mean SRAM as static RAM that doesn't need to be recharged like DRAM