Qualcomm Demos 48-Core Centriq 2400 Server SoC in Action, Begins Sampling
by Anton Shilov on December 16, 2016 6:00 PM EST

Qualcomm this month demonstrated its 48-core Centriq 2400 SoC in action and announced that it had started to sample its first server processor with select customers. The live showcase is an important milestone for the SoC because it proves that the part is functional and on track for commercialization in the second half of next year.
Qualcomm announced plans to enter the server market more than two years ago, in November 2014, but the first rumors about the company's intention to develop server CPUs emerged long before that. In fact, as one of the largest designers of ARM-based SoCs for mobile devices, Qualcomm was well prepared to move beyond smartphones and tablets. However, while it is not easy to develop a custom ARMv8 processor core and build a server-grade SoC, building an ecosystem around such a chip is even more complicated in a world where ARM-based servers are used only in isolated cases. From the very start, Qualcomm has been serious not only about the processors themselves but also about the ecosystem and third-party support (Facebook was one of the first companies to back Qualcomm's server efforts). In 2015, Qualcomm teamed up with Xilinx and Mellanox to ensure that its server SoCs are compatible with FPGA-based accelerators and data-center connectivity solutions (the fruits of this partnership will likely emerge in 2018 at the earliest). It then released a development platform featuring its custom 24-core ARMv8 SoC, which it made available to customers and various partners among ISVs, IHVs, and so on. Earlier this year, the company co-founded the CCIX consortium to standardize a coherent interconnect for special-purpose accelerators in data centers and to make certain that its processors can support them. Given all the evangelization and preparation work that Qualcomm has disclosed so far, it is evident that the company is very serious about its server business.
From the hardware standpoint, Qualcomm's initial server platform will rely on the company's Centriq 2400-series family of microprocessors, which will be made using a 10 nm FinFET fabrication process in the second half of next year. Qualcomm does not name the exact manufacturing technology, but the timeframe points to either Samsung's performance-optimized 10LPP or TSMC's CLN10FF (keep in mind that TSMC has a lot of experience fabbing large chips, and a 48-core SoC is not going to be small). The key element of the Centriq 2400 will be Qualcomm's custom ARMv8-compliant 64-bit core, code-named Falkor. Qualcomm has yet to disclose details about Falkor, but the important point is that this core was purpose-built for data-center applications, which means it should be faster than the cores inside the company's mobile SoCs when running appropriate workloads. Qualcomm currently keeps the specifics of its cores under wraps, but it is logical to expect the developer to raise the frequency potential of the Falkor cores (versus its mobile cores), add L3 cache support, and make other tweaks to maximize performance.

The SoCs support neither simultaneous multi-threading nor multi-socket configurations, so boxes based on the Centriq 2400 series will be single-socket machines able to handle up to 48 threads, one per core. The core count is an obvious promotional point that Qualcomm is going to use against competing offerings, and the company will naturally capitalize on the fact that it takes two Intel multi-core CPUs to offer the same number of physical cores. Another advantage of the Centriq over rivals could be the integration of various I/O components (storage, network, basic graphics, etc.) that today typically require a PCH or other external chips, but that is something the company has yet to confirm.
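To illustrate what this single-socket, one-thread-per-core topology looks like to software, here is a minimal sketch in Python (Linux-only; the expected values of 48 cores and 1 socket simply restate the figures above, not measurements):

```python
import os

# Logical CPUs this process may run on; with no SMT, as on Centriq 2400,
# this equals the number of physical cores (48 on a fully enabled part).
cpus = os.sched_getaffinity(0)
print(f"Schedulable CPUs: {len(cpus)}")

# Count sockets via Linux sysfs topology; a Centriq 2400 box should report 1.
packages = set()
for cpu in cpus:
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/physical_package_id"
    with open(path) as f:
        packages.add(f.read().strip())
print(f"Physical packages (sockets): {len(packages)}")
```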
From the platform point of view, Qualcomm follows ARM's guidelines for servers, which is why machines running the Centriq 2400-series SoCs will comply with ARM's Server Base System Architecture (SBSA) and Server Base Boot Requirements (SBBR). The former is not a mandatory specification, but it defines an architecture that developers of OSes, hypervisors, software, and firmware can rely on. As a result, servers compliant with the SBSA promise to support more software and hardware components out of the box, an important consideration for high-volume products. Apart from giant cloud companies like Amazon, Facebook, Google, and Microsoft, which develop their own software (and which are evaluating Centriq CPUs), Qualcomm is targeting traditional server OEMs like Quanta and Wiwynn (a subsidiary of Wistron) with the Centriq, and for these companies software compatibility matters a lot. That said, Qualcomm's primary targets are the large cloud companies; traditional server makers do not have Centriq samples yet.
During the presentation, Qualcomm demonstrated Centriq 2400-based 1U 1P servers running Apache Spark, Hadoop on Linux, and Java: a typical set of server software. No performance numbers were shared, and the company did not open up the boxes so as not to disclose any further information about the CPUs (e.g., the number of DDR memory channels, the type of cooling, supported storage options, etc.).
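Qualcomm did not detail the demo configurations, but a minimal Spark job of the sort typically shown in such demos might look like the following sketch (a generic word count, not Qualcomm's actual demo; the HDFS input path is hypothetical):

```python
from pyspark.sql import SparkSession

# local[*] spreads tasks across every available core, so on a Centriq
# 2400-based box a job like this would fan out across all 48 cores.
spark = (SparkSession.builder
         .appName("wordcount-demo")
         .master("local[*]")
         .getOrCreate())

# Classic word count: an embarrassingly parallel workload that rewards
# core count more than single-thread speed.
lines = spark.sparkContext.textFile("hdfs:///demo/input.txt")  # hypothetical path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, n in counts.take(10):
    print(word, n)

spark.stop()
```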
Qualcomm intends to start selling its Centriq 2400-series processors in the second half of next year. Since it typically takes server platform developers about a year to polish their designs before they can ship, it would normally make sense to expect Centriq 2400-based machines to emerge in late 2017 at the earliest. But because Qualcomm wants to address operators of cloud data centers first, and companies like Facebook and Google develop and build their own servers, these customers do not have to extensively test the platform across different applications; they only have to make sure the chips can run their software stacks.
As for the server world outside of the cloud giants, it remains to be seen whether the broader industry will bite on Qualcomm's server platform, given the lukewarm welcome for ARMv8 servers in general. In these markets, performance, compatibility, and longevity are all critical factors in adopting a new platform.
Related Reading:
- Evaluating Futuremark's Servermark VDI on the Supermicro SYS-5028D-TN4T
- New GIGABYTE Server Motherboards Show Xeon D Round 2
- AMD Exits Dense Microserver Business, Ends SeaMicro Brand
Source: Qualcomm
88 Comments
witeken - Friday, December 16, 2016
"and it is naturally going to capitalize on the fact that it takes two Intel multi-core CPUs to offer the same amount of physical cores."These chips will compete against Skylake-EP, which will be launched mid-17, which will have up to 32 cores, so at best Qualcomm has 1.5x as many. But core count on its own is just as worthless as frequency. Performance also depends on the architecture.
prisonerX - Friday, December 16, 2016
Actually you're entirely wrong. Because maximum frequencies have long since plateaued and there is an upper limit on how much heat (i.e., power) you can dissipate from a square millimeter of silicon, going forward core count and the attendant low-power architectures will win. Have a look at graphics cards for the future.

You're right when you say "performance also depends on the architecture," but you have the wrong performance in mind. Power consumption is the key performance metric because it determines density, and therefore overall throughput. Single-thread performance as an absolute measure is meaningless in this scenario; it's all relative to how much space and power it takes to deliver it.
Intel's legacy architecture can't compete in this respect. You'll note that frequency decreases in Intel processors as core count increases. Intel lost on mobile because they can't compete at very low power, and they'll also lose in the high-core-count future because they've wedded themselves to their old and inefficient architecture.
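A crude back-of-envelope sketch of that density argument, with entirely made-up numbers (the per-core wattage and performance figures below are illustrative assumptions, not measurements of any real chips):

```python
# Under a fixed power budget, compare two hypothetical designs:
# fewer fast cores vs. many slower, lower-power cores.
RACK_BUDGET_W = 10_000  # hypothetical per-rack power budget

designs = {
    # name: (watts per core, relative per-core performance), made-up numbers
    "few fast cores":  (8.0, 1.0),
    "many slow cores": (2.5, 0.5),
}

for name, (watts_per_core, perf_per_core) in designs.items():
    cores = RACK_BUDGET_W / watts_per_core   # density is power-limited
    throughput = cores * perf_per_core
    print(f"{name}: {cores:.0f} cores, aggregate throughput {throughput:.0f}")

# With these assumptions the low-power design delivers 2000 units of
# throughput vs. 1250, despite each core being only half as fast: the
# commenter's point about perf/watt determining overall throughput.
```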
witeken - Friday, December 16, 2016
"Intel's legacy architecture can't compete in this respect. You'll note that frequency decreases in Intel processors are core count increases. Intel lost on mobile becuase they can't compete on low, low power, and they'll also lose in the high core count future becuase they've wedded themselves to their old and inefficient architecture."I recently read some quote that people who are *sure* someone is wrong, are usually actually wrong themselves. And that is here entirely the case. ISA has little to zero impact. Process technology is so many order of magnitude more important.
Please read following article from AT: The x86 Power Myth Busted: In-Depth Clover Trail Power Analysis: http://www.anandtech.com/show/6529/busting-the-x86...
Krysto - Friday, December 16, 2016
That article was SO SO wrong. I don't know whether he did it on purpose (Anand was showing himself to be a pretty big Intel fanboi at the time) or whether he simply missed it, but it's strange that he would've missed something so obvious.

Here's the thing: Anand compared a 32 nm planar process for the ARM chips with Intel's 22 nm 3D Tri-Gate process. Not only was 22 nm vs. 32 nm an entire process generation ahead on its own, but so was Tri-Gate vs. planar (Intel essentially jumped two process generations in terms of performance gains when it moved from 32 nm planar to 22 nm Tri-Gate).
So Intel was not one but TWO process generations ahead with Atom compared to the equivalent ARM chips, and yet it could still BARELY compete with them. Yet Anand's conclusion was "x86 myth busted"?!

If anything, it proves the myth was NEVER busted, given that the ARM chips could still hold their own despite being two process generations behind.
Another very important point: that Atom chip was still significantly more expensive than its ARM competitors. That's why Intel had to lose $1 billion a quarter subsidizing it to make it remotely attractive to mobile device makers, and why it eventually licensed the Atom design to Rockchip and Spreadtrum, thinking they could build it more cheaply. But it was always going to be a losing strategy, because by the time those Rockchip Atom chips came out at 28 nm planar, ARM chips were already at 20 nm. And as explained above, Atom was only competitive when it was two process nodes ahead. So Intel never had a chance with Atom.
witeken - Friday, December 16, 2016
Lol, have you actually looked at the article? As of 2012, Intel did not have mobile FinFET devices, which makes your entire rant obsolete.

And if I may add another truism: people who invent facts can argue for anything (and it doesn't matter whether this is out of ignorance, i.e., not having enough information). So I will simply ignore all your falsities.
witeken - Friday, December 16, 2016
Anyway. I would say the burden of proof is on your side. Please scientifically explain why ARM is more power efficient. If you can't, that could be because you are wrong.

The world does not run on magic.
name99 - Saturday, December 17, 2016
The issues are not the technical ones being thrown around. The issues that matter are:

(a) Design and validation time. We know it took 7 years to design the Nehalem class of CPUs, and this appears to have stretched to eight years with Kaby Lake. Meanwhile, we know it took Samsung 3 years to design its fully custom Exynos, so Apple and QC are probably on a similar (perhaps four-year) schedule. Obviously a server is more complicated (look at how Intel's server parts ship two years after their base cores), and we don't know how much of that extra validation would affect QC. But everything points to QC and others having an easier job, and being able to respond faster to market changes and possible new design ideas.

(b) Prices. Intel finances its operation by extracting the most money from its most captive buyers. Once QC and others ship chips at an acceptable performance level, Intel can certainly respond with lower prices, as it has with Xeon-D; the problem is that doing so throws Intel's whole financial structure into chaos.
Gondalf - Sunday, December 18, 2016
Nah my friend, a brand-new x86 CPU takes about 3 years to develop, much like an ARM CPU/SoC; nothing is different with the software tools Intel has today. The extra year Intel spends in development is because Intel cores are fully validated for the server space and ready for 24/7 utilization without electromigration failures for at least 10 years.

Please get some proof before posting on Anandtech.
Wilco1 - Sunday, December 18, 2016
If developing new x86 CPUs is so easy, then explain how it is possible that a small company like ARM releases new microarchitectures much faster? There have been just 3 Atom microarchitectures in 8 years, despite Intel spending $10+ billion on mobile. ARM has produced more than 10 different A-class cores since 2008 for a tiny fraction of Intel's cost.

Please get a clue before posting on Anandtech.
Gondalf - Sunday, December 18, 2016
Those are only cores, not cores implemented in an SoC; the heavy cost of SoC implementation sits on the licensees' balance sheets. As for the costs to ARM and Intel of developing the architecture alone, without the hard work of implementing it on a real chip... we have no real figures, but surely they are comparable at a certain level of complexity and debugging. Decoders are standard building blocks in Intel/AMD software anyway.

About Atom, it was Intel's choice to stay on a single architecture for a long time and refine the uncore instead; you know the Intel way of thinking. Pretty strange to see you making these arguments... pretty strange indeed.