Huawei Shows Unannounced Kirin 970 at IFA 2017: Dedicated Neural Processing Unit
by Ian Cutress on September 1, 2017 8:20 AM EST. Posted in: SoCs, Arm, Huawei, Trade Shows, Cortex A53, Mali, Cortex A73, IFA 2017, Kirin 970, NPU
A surprise at this year’s IFA is the previously unannounced Kirin 970 SoC appearing on the show floor. Normally Huawei announces a new SoC with plenty of press details, and we were expecting perhaps some hints about what is next from Huawei (it’s usually around this time of year), but this time the company put it straight onto the show floor without any fanfare (or any notice). Cue my surprise when I saw it…
The headline Huawei seems to want to promote is the addition of dedicated neural network silicon inside the Kirin 970, dubbed the Neural Processing Unit (NPU). The NPU is rated at 1.92 TFLOPs of FP16, which for reference is about 3x what the Kirin 960's GPU alone can do on paper (~0.6 TFLOPs FP16). To put this in practical terms, Huawei says that in internal testing the NPU can recognize 2005 images per minute, compared to 97 images per minute without the NPU (presumably on the CPU), using the Kirin Thundersoft software (likely a future brand name). Depending on the implementation and power use, I would expect Huawei to leverage the NPU as much as possible in upcoming designs.
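For a quick sense of scale, the ratios behind those claims can be worked out directly. This is a minimal sketch that only reuses the figures quoted above (Huawei's vendor numbers, not our measurements); the constant and function names are our own, chosen for illustration.

```python
# Ratios implied by the quoted figures (vendor claims, not measurements).
NPU_FP16_TFLOPS = 1.92          # Kirin 970 NPU, Huawei's rating
KIRIN960_GPU_FP16_TFLOPS = 0.6  # approximate on-paper Kirin 960 GPU figure
NPU_IMAGES_PER_MIN = 2005       # Huawei internal test, with the NPU
CPU_IMAGES_PER_MIN = 97         # same test without the NPU (presumably CPU)


def speedup(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


print(f"Peak FP16 vs Kirin 960 GPU: {speedup(NPU_FP16_TFLOPS, KIRIN960_GPU_FP16_TFLOPS):.1f}x")  # 3.2x
print(f"Images/minute vs no NPU: {speedup(NPU_IMAGES_PER_MIN, CPU_IMAGES_PER_MIN):.1f}x")  # 20.7x
```

Note that the image-recognition ratio (about 20x) is a different kind of figure from the peak-FLOPs ratio: it compares against the CPU rather than the GPU, and reflects a specific workload rather than peak throughput.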
HiSilicon High-End Kirin SoC Lineup
SoC | Kirin 970 | Kirin 960 | Kirin 950/955 |
CPU | 4x A73 @ 2.40 GHz + 4x A53 @ 1.80 GHz | 4x A73 @ 2.36 GHz + 4x A53 @ 1.84 GHz | 4x A72 @ 2.30/2.52 GHz + 4x A53 @ 1.81 GHz |
GPU | ARM Mali-G72MP12 @ ? MHz | ARM Mali-G71MP8 @ 1037 MHz | ARM Mali-T880MP4 @ 900 MHz |
LPDDR4 Memory | ? | 2x 32-bit LPDDR4 @ 1866 MHz, 29.9 GB/s | 2x 32-bit LPDDR4 @ 1333 MHz, 21.3 GB/s |
Interconnect | ? | ARM CCI-550 | ARM CCI-400 |
Storage | ? | UFS 2.1 | eMMC 5.0 |
ISP/Camera | Dual ISP | Dual 14-bit ISP (Improved) | Dual 14-bit ISP, 940 MP/s |
Encode/Decode | 2160p60 Decode, 2160p30 Encode | 2160p30 HEVC & H.264 Decode & Encode, 2160p60 HEVC Decode | 1080p H.264 Decode & Encode, 2160p30 HEVC Decode |
Integrated Modem | Kirin 970 Integrated LTE (Category 18): DL = 1200 Mbps, 4x20MHz CA, 128-QAM | Kirin 960 Integrated LTE (Category 12/13): DL = 600 Mbps, 4x20MHz CA, 64-QAM; UL = 150 Mbps, 2x20MHz CA, 64-QAM | Balong Integrated LTE (Category 6): DL = 300 Mbps, 2x20MHz CA, 64-QAM; UL = 50 Mbps, 1x20MHz CA, 16-QAM |
Sensor Hub | ? | i6 | i5 |
NPU | Yes | No | No |
Mfc. Process | TSMC 10nm | TSMC 16nm FFC | TSMC 16nm FF+ |
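The memory bandwidth figures in the table follow from bus width and clock. Below is a minimal sketch assuming the standard peak-bandwidth formula for a double-data-rate interface (two transfers per clock, no efficiency losses); the function name is our own.

```python
def lpddr4_peak_bandwidth_gbs(channels: int, bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR-style memory interface."""
    transfers_per_s = clock_mhz * 1e6 * 2                 # double data rate: 2 transfers per clock
    bytes_per_transfer = channels * bus_width_bits / 8    # total bus width in bytes
    return transfers_per_s * bytes_per_transfer / 1e9


# Kirin 960 and Kirin 950/955 rows from the table above
print(f"{lpddr4_peak_bandwidth_gbs(2, 32, 1866):.1f} GB/s")  # 29.9 GB/s
print(f"{lpddr4_peak_bandwidth_gbs(2, 32, 1333):.1f} GB/s")  # 21.3 GB/s
```

Since the Kirin 970's memory configuration was not disclosed, its cell stays as "?"; the same formula would apply once the bus width and clock are known.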
Other details for the Kirin 970 show improvements over the Kirin 960. First is the move to TSMC’s 10nm process, from 16nm FFC. The Kirin 960 launched a few months before other high-end smartphone SoCs built on 10nm reached the shelves, so with this move Huawei is catching up to its competitors. The core configuration is the same as the 960, with four ARM Cortex A73 cores and four ARM Cortex A53 cores, this time clocked at 2.4 GHz and 1.8 GHz respectively. The integrated graphics is ARM's newest Mali-G72, announced alongside the Cortex A75/A55 earlier this year, here in an MP12 configuration. The GPU frequency was not listed.
Other sticker features include a dual ISP for motion detection and low-light enhancement, HDR10 support with 4K60 decoding and 4K30 encoding, and an LTE Category 18 modem, which Huawei states is good for 1.2 Gbps downloads. I’d assume that this is 4x carrier aggregation with 128-QAM. The Kirin 970 will also ship with an embedded Security Engine, supporting TEE and inSE.
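One way to sanity-check peak LTE download figures is a back-of-envelope model. We assume roughly 75 Mbps per MIMO layer per 20 MHz carrier at 64-QAM (Category 4's 150 Mbps over 2x2 MIMO), scaled linearly by bits per symbol. This is a simplification that ignores overhead differences between configurations, but it reproduces the Kirin 950 and Kirin 960 rows of the table. The carrier/MIMO combinations below are illustrative, not confirmed Kirin 970 configurations.

```python
# Simplified model (an assumption): ~75 Mbps per MIMO layer per 20 MHz carrier
# at 64-QAM, scaled linearly by modulation bits per symbol.
BASE_MBPS_PER_LAYER = 75.0
BITS_PER_SYMBOL = {"16-QAM": 4, "64-QAM": 6, "128-QAM": 7, "256-QAM": 8}


def peak_downlink_mbps(layers_per_carrier: list[int], modulation: str) -> float:
    """layers_per_carrier holds the MIMO layer count of each aggregated 20 MHz carrier."""
    return (sum(layers_per_carrier) * BASE_MBPS_PER_LAYER
            * BITS_PER_SYMBOL[modulation] / BITS_PER_SYMBOL["64-QAM"])


configs = [
    ("2CA, 2x2 MIMO, 64-QAM (Kirin 950 row)", [2, 2], "64-QAM"),
    ("4CA, 2x2 MIMO, 64-QAM (Kirin 960 row)", [2, 2, 2, 2], "64-QAM"),
    ("4CA, 2x2 MIMO, 128-QAM", [2, 2, 2, 2], "128-QAM"),
    ("4CA, 4x4+4x4+2x2+2x2 MIMO, 256-QAM", [4, 4, 2, 2], "256-QAM"),
]
for label, layers, mod in configs:
    print(f"{label}: {peak_downlink_mbps(layers, mod):.0f} Mbps")
```

Under this model, four 2x2 carriers at 128-QAM land near 700 Mbps, while 12 layers at 256-QAM reach 1200 Mbps, so the exact configuration behind Huawei's 1.2 Gbps figure remains an open question.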
Huawei’s final claims for the NPU are 25x the performance of a CPU at 50x the energy efficiency, and the company is marketing it under a new HiAI (HiSilicon AI) name.
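Taken together, the two headline multipliers imply something about power draw. This is a minimal sketch assuming "energy efficiency" means performance per watt (Huawei did not define the term); the function name is hypothetical.

```python
def implied_ratios(perf_ratio: float, efficiency_ratio: float) -> tuple[float, float]:
    """Derive NPU-vs-CPU power and energy ratios, assuming efficiency = performance / power."""
    power_ratio = perf_ratio / efficiency_ratio   # NPU power draw relative to the CPU
    energy_per_task_ratio = 1 / efficiency_ratio  # energy to finish the same workload
    return power_ratio, energy_per_task_ratio


power, energy = implied_ratios(25, 50)
print(f"NPU power vs CPU: {power:.2f}x; energy per task: {energy:.2f}x")  # 0.50x, 0.02x
```

If that definition holds, the NPU would run the workload 25x faster while drawing roughly half the CPU's power, using about 1/50th of the energy per task.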
Richard Yu, CEO of Huawei’s Consumer Business Group, has a keynote later this week, and we also have some meetings with Huawei, so I’m going to probe for details. The only smartphones with the Kirin 970 on the show floor were generic models hooked up to development boards. Judging by launches in previous years, any devices coming to market (such as a successor to the Mate 9) are likely a few weeks away.
11 Comments
jjj - Friday, September 1, 2017
The NPU should be in collaboration with Cambricon Technologies.
VeixES - Friday, September 1, 2017
That 1200Mbps LTE Cat18 is probably not from 4CA with DL-128QAM. More likely the maximum performance is achieved while doing 4CA with 2 of the carriers on 4x4 MIMO (vs 2x2 MIMO on the other 2) using DL-256QAM, or with 3CA and all carriers on 4x4 MIMO with DL-256QAM. So 12 data streams total with either option.
The upcoming Qualcomm X20 modem will also be capable of 12 streams.
levizx - Friday, September 1, 2017
Considering 128QAM is not defined in 3GPP Rel.13, I'd say the chart is most definitely wrong. You are probably right about the 12-stream 256QAM, but it could also be some sort of 5x20MHz combination or 2x10+4x20.
peevee - Friday, September 1, 2017
No A75/A55, not interested.
Santoval - Friday, September 1, 2017
Still too early, it was just announced. Expect the first shipments in Q4, most likely December '17. Q1 2018 for reasonably high volume.
levizx - Friday, September 1, 2017
You won't see A75/A55 until next year. And there won't be any material difference since, by ARM's own admission, there won't be any efficiency gain going from A73 to A75. So long as SoCs are still thermal/power limited, there's no point in upgrading if it negatively affects time-to-market.
peevee - Wednesday, September 6, 2017
I am more interested in performance gains from A55. After all, these are all thermally limited environments. Speculative OoO just wastes power.
Tigran - Friday, September 1, 2017
Can't open pics in the gallery.
Ryan Smith - Friday, September 1, 2017
Thanks! Fixed.
Tigran - Friday, September 1, 2017
Thanks, it's OK now.