AMD Beema/Mullins Architecture & Performance Preview
by Anand Lal Shimpi on April 29, 2014 12:00 AM EST
New Turbo Boost
With power in perspective, let’s talk about performance and the lineup. It always made little sense that despite a very competitive microarchitecture, Jaguar both consumed more power and performed worse than Intel’s Silvermont. It turns out that’s more a function of the limited time AMD’s Jaguar team had to bring the design to market. As the basis not only for AMD’s own entry level APUs but also the semi-custom SoC bids for consoles from Microsoft and Sony, Jaguar had to be done quickly. With Puma+ and its associated SoC designs, AMD could focus more on driving power down and introducing new features, one of which happens to be a very intelligent clock boosting scheme analogous to Intel’s Turbo Boost.
While the bulk of Kabini and Temash silicon ran up to a set maximum frequency, Beema and Mullins SoCs can take advantage of available thermal headroom to increase their maximum frequency for a limited period of time. If we look at the tables below we’ll see this in action:
Mullins vs. Temash - Frequency Gains
Model | TDP | Max CPU Frequency | Temash Equivalent | Temash Equivalent (TDP) | Temash Max CPU Frequency | Max Frequency Increase from Mullins
A10 Micro-6700T | 4.5W | 2.2GHz | A6-1450 | 8W | 1.4GHz | 57%
A4 Micro-6400T | 4.5W | 1.6GHz | A4-1250 | 9W | 1.0GHz | 60%
E1 Micro-6200T | 3.95W | 1.4GHz | A4-1200 | 3.9W | 1.0GHz | 40%
AMD no longer reports max non-turbo frequency, unfortunately following in Intel’s footsteps (as well as the rest of the mobile players), but you can assume that they are mostly unchanged from Kabini/Temash. Beema and Mullins can now turbo up to much higher frequencies. In the case of Mullins in particular, since it’s so thermally constrained, the potential upside for frequency scaling is huge.
Beema vs. Kabini - Frequency Gains
Model | TDP | Max CPU Frequency | Kabini Equivalent | Kabini Equivalent (TDP) | Kabini Max CPU Frequency | Max Frequency Increase from Beema
A6-6310 | 15W | 2.4GHz | A6-5200 | 25W | 2.0GHz | 20%
A4-6210 | 15W | 1.8GHz | A4-5000 | 15W | 1.5GHz | 20%
E2-6110 | 15W | 1.5GHz | E2-3000/E1-2500 | 15W | 1.65GHz/1.4GHz | -10%/7%
E1-6010 | 10W | 1.35GHz | E1-2100 | 9W | 1.0GHz | 35%
The frequency gains aren't limited to the CPU; the 128 GCN cores can also run at higher speeds with Beema and Mullins:
Mullins vs. Temash - GPU Frequency Gains
Model | TDP | Max GPU Frequency | Temash Equivalent | Temash Equivalent (TDP) | Temash Max GPU Frequency | Max GPU Frequency Increase from Mullins
A10 Micro-6700T | 4.5W | 500MHz | A6-1450 | 8W | 400MHz | 25%
A4 Micro-6400T | 4.5W | 350MHz | A4-1250 | 9W | 300MHz | 16%
E1 Micro-6200T | 3.95W | 300MHz | A4-1200 | 3.9W | 225MHz | 33%
Beema vs. Kabini - GPU Frequency Gains
Model | TDP | Max GPU Frequency | Kabini Equivalent | Kabini Equivalent (TDP) | Kabini Max GPU Frequency | Max GPU Frequency Increase from Beema
A6-6310 | 15W | 800MHz | A6-5200 | 25W | 600MHz | 33%
A4-6210 | 15W | 600MHz | A4-5000 | 15W | 500MHz | 20%
E2-6110 | 15W | 500MHz | E2-3000/E1-2500 | 15W | 450/400MHz | 11%/25%
E1-6010 | 10W | 350MHz | E1-2100 | 9W | 300MHz | 16%
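For readers who want to check the gain columns, here is a minimal Python sketch of the arithmetic: each percentage is just the new part's max clock divided by the old part's, minus one. The clocks are copied from the CPU tables above; the names `CPU_PAIRS` and `percent_gain` are made up for this illustration. The published figures appear to be rounded or truncated, so small differences are expected (the E2-6110 vs. E2-3000 comparison, for example, works out to roughly -9% rather than the listed -10%).

```python
# (new part, new max GHz, old part, old max GHz), copied from the CPU tables above
CPU_PAIRS = [
    ("A10 Micro-6700T", 2.2, "A6-1450", 1.4),
    ("A4 Micro-6400T", 1.6, "A4-1250", 1.0),
    ("E1 Micro-6200T", 1.4, "A4-1200", 1.0),
    ("A6-6310", 2.4, "A6-5200", 2.0),
    ("A4-6210", 1.8, "A4-5000", 1.5),
    ("E2-6110", 1.5, "E2-3000", 1.65),
    ("E2-6110", 1.5, "E1-2500", 1.4),
    ("E1-6010", 1.35, "E1-2100", 1.0),
]


def percent_gain(new, old):
    """Relative increase of `new` over `old`, in percent (negative means a drop)."""
    return (new / old - 1.0) * 100.0


for new_part, new_ghz, old_part, old_ghz in CPU_PAIRS:
    gain = percent_gain(new_ghz, old_ghz)
    print(f"{new_part} vs. {old_part}: {gain:+.1f}%")
```

The same function works for the GPU tables if you pass the clocks in MHz, since only the ratio matters.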
How can AMD hit significantly higher frequencies without a substantial architecture change or new process node? By raising the max thermal operating point of the silicon. Similar to what Intel discovered in architecting its Bay Trail silicon, AMD realized that in ultra portable form factors it would run into a chassis temperature limit before it ever reached the maximum operating temperature of its silicon.
Previously, once the silicon temperature hit 60C, AMD would cap max CPU/GPU frequency. However, what really matters isn’t if the silicon is running warm but rather if the chassis is running too warm. With Beema and Mullins, AMD increases the silicon temperature limit to around 100C (still within physical limits) and instead relies on the surface temperature of the device to determine when to throttle back the CPU/GPU. In AMD’s own words, this allows the SoC to run at a much higher frequency for up to several minutes before having to scale back down. As long as the physical limits of the die aren’t exceeded, the design remains just as safe as before, but you get better performance.
The real trick is that AMD is able to enable this new chassis temperature governed boost (called Skin Temperature Aware Power Management - STAPM) without requiring any additional sensors or hardware from the OEM. What AMD does instead is give the OEM tools to properly map SoC temperature to chassis skin temperature. My guess is the OEM runs a set workload, measuring external chassis temperature all while correlating that data with SoC temperature. This mapping will vary on a device by device basis, and obviously won’t be as accurate as having a thermal sensor on the chassis itself, but it’s good enough to get the job done.
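To make the idea concrete, here is a rough Python sketch of what such a mapping could look like. It is purely illustrative and not AMD's implementation: it assumes a simple straight-line relationship between SoC and skin temperature, fitted by least squares to a handful of made-up calibration readings, and the 42C skin limit is an invented figure. Only the roughly 100C die limit comes from the description above. All names (`fit_linear`, `allow_boost`, `CAL_SOC_C`, `CAL_SKIN_C`) are hypothetical.

```python
# Hypothetical OEM calibration run: SoC temperature vs. measured chassis skin
# temperature, both in degrees C. These numbers are invented for illustration.
CAL_SOC_C = [40.0, 60.0, 80.0, 100.0]
CAL_SKIN_C = [30.0, 35.0, 40.0, 45.0]


def fit_linear(soc_temps, skin_temps):
    """Least-squares fit of skin = slope * soc + offset."""
    n = len(soc_temps)
    mean_x = sum(soc_temps) / n
    mean_y = sum(skin_temps) / n
    sxx = sum((x - mean_x) ** 2 for x in soc_temps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(soc_temps, skin_temps))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def allow_boost(soc_temp_c, slope, offset, skin_limit_c=42.0, die_limit_c=100.0):
    """Permit boost while the die is under its limit and the *estimated*
    skin temperature is under the (assumed) chassis comfort limit."""
    estimated_skin_c = slope * soc_temp_c + offset
    return soc_temp_c < die_limit_c and estimated_skin_c < skin_limit_c


slope, offset = fit_linear(CAL_SOC_C, CAL_SKIN_C)
for soc_temp in (55.0, 70.0, 95.0):
    print(soc_temp, allow_boost(soc_temp, slope, offset))
```

With this toy calibration, a 70C die still maps to a comfortable chassis, so boost stays on well past the old 60C cap. A real mapping would also have to account for how slowly the chassis heats up relative to the die, which a static line like this ignores.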
AMD claims it’s intelligent about when to boost. The updated power management unit looks at the response to frequency scaling of a given workload and will only boost when the workload will actually benefit from being boosted. This evaluation happens at the hardware instruction level and not at the OS/software layer.
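AMD didn't detail how the power management unit makes that call, but the underlying idea can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the hardware's algorithm: it defines a hypothetical "frequency sensitivity" as the fractional performance gain divided by the fractional clock gain, and boosts only when that ratio clears an invented threshold of 0.5. The function names and all sample numbers are made up.

```python
def frequency_sensitivity(perf_base, perf_boost, freq_base, freq_boost):
    """Fraction of a clock increase that shows up as extra performance.

    Close to 1.0 means the workload scales with frequency (compute bound);
    close to 0.0 means it is waiting on something else, such as memory.
    """
    perf_gain = perf_boost / perf_base - 1.0
    freq_gain = freq_boost / freq_base - 1.0
    return perf_gain / freq_gain


def should_boost(sensitivity, threshold=0.5):
    """Boost only if the workload benefits enough (threshold is an assumption)."""
    return sensitivity >= threshold


# Two invented workloads sampled at 1.0GHz and 1.6GHz.
compute_bound = frequency_sensitivity(100.0, 155.0, 1.0, 1.6)
memory_bound = frequency_sensitivity(100.0, 106.0, 1.0, 1.6)

print(f"compute bound: {compute_bound:.2f}, boost={should_boost(compute_bound)}")
print(f"memory bound:  {memory_bound:.2f}, boost={should_boost(memory_bound)}")
```

The point of a check like this is that burning thermal headroom on a workload that is stalled on memory buys almost nothing, so the headroom is better saved for code that actually speeds up with clocks.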
The Lineup
With the exception of compressing the Kabini family into four parts instead of five, AMD kept the same number of SKUs as last year, though with updated specs for Beema and Mullins:
AMD Mullins vs. Temash APUs
Model | Radeon Brand | SDP | TDP | CPU Cores | CPU Clock Speed (Max) | L2 Cache | Radeon Cores | GPU Clock Speed (Max) | DDR3 Speed (Max)
A10 Micro-6700T | R6 | 2.8W | 4.5W | 4 | 2.2GHz | 2MB | 128 | 500MHz | 1333
A4 Micro-6400T | R3 | 2.8W | 4.5W | 4 | 1.6GHz | 2MB | 128 | 350MHz | 1333
E1 Micro-6200T | R2 | 2.8W | 3.95W | 2 | 1.4GHz | 1MB | 128 | 300MHz | 1066
A6-1450 | HD 8250 | - | 8W | 4 | 1.4GHz | 2MB | 128 | 400MHz | 1066
A4-1250 | HD 8210 | - | 9W | 2 | 1.0GHz | 1MB | 128 | 300MHz | 1333
A4-1200 | HD 8180 | - | 3.9W | 2 | 1.0GHz | 1MB | 128 | 225MHz | 1066
The Mullins parts get a Micro prefix in front of their model number, implying the SoC's tablet-friendliness. AMD also supplies both TDP and Scenario Design Power (SDP) values for Mullins SoCs, similar to what Intel does with Bay Trail. The latter uses more tablet-like workloads (read: lighter weight) while determining SoC power.
With the exception of the entry level E1 Micro-6200T, TDPs go down substantially with Mullins vs. Temash. Cache sizes and GPU core count remain unchanged, but CPU frequencies and max supported DRAM frequency go up in many cases.
AMD Beema vs. Kabini APUs
Model | Radeon Brand | SDP | TDP | CPU Cores | CPU Clock Speed (Max) | L2 Cache | Radeon Cores | GPU Clock Speed (Max) | DDR3 Speed (Max)
A6-6310 | R4 | - | 15W | 4 | 2.4GHz | 2MB | 128 | 800MHz | 1866
A4-6210 | R3 | - | 15W | 4 | 1.8GHz | 2MB | 128 | 600MHz | 1600
E2-6110 | R2 | - | 15W | 4 | 1.5GHz | 2MB | 128 | 500MHz | 1600
E1-6010 | R2 | - | 10W | 2 | 1.35GHz | 1MB | 128 | 350MHz | 1333
A6-5200 | HD 8400 | - | 25W | 4 | 2.0GHz | 2MB | 128 | 600MHz | 1600
A4-5000 | HD 8330 | - | 15W | 4 | 1.5GHz | 2MB | 128 | 500MHz | 1600
E2-3000 | HD 8280 | - | 15W | 2 | 1.65GHz | 1MB | 128 | 450MHz | 1600
E1-2500 | HD 8240 | - | 15W | 2 | 1.4GHz | 1MB | 128 | 400MHz | 1333
E1-2100 | HD 8210 | - | 9W | 2 | 1.0GHz | 1MB | 128 | 300MHz | 1333
Beema sees the end of the lone 25W TDP for Kabini; everything is now at 15W or less. The lowest end Beema carries a slightly higher TDP than the entry level Kabini, but otherwise there's more performance at the same TDP across the board. Beema parts don't come with an SDP rating as they're designed for use in more traditional ultrathin notebook PC form factors (presumably running more traditional, read: heavier, workloads).
TrustZone
In 2012 AMD announced that it had signed a license agreement with ARM. Although we’ve since seen AMD announce ARM based Opteron silicon, back then the only official commitment was to ship an x86 SoC in 2013 with an integrated ARM Cortex A5 for TrustZone execution. AMD needed a hardware security platform on its SoCs to remain competitive, and it didn’t have one of its own (Intel’s TXT is proprietary and not a part of what’s licensed to AMD) so ARM’s TrustZone technology was an easy target. To support TrustZone you need an ARM core, and thus AMD committed to integrating a Cortex A5 as a dedicated security processor on some of its 2013 APUs.
Indeed, both Kabini and Temash had a Cortex A5 on die; it was simply never enabled due to time constraints. With Beema and Mullins the core is fully functional in what AMD is calling its Platform Security Processor (PSP). AMD will likely publish guidelines on how developers can access and use the PSP, and I’d also expect to see it make its way into other AMD APUs moving forward.
82 Comments
name99 - Tuesday, April 29, 2014 - link
"I’d expect a similar die size to Kabini/Temash. It’s interesting to note that these SoCs have a transistor count somewhere south of Apple’s A7."
Isn't this something of an apple's to oranges comparison?
This AMD SOC is basically CPU+GPU+memory controller.
A7 is all that plus secure storage, ISP, h264 encoder/decoder (the genuine low power deal, not some "hardware assisted" frankenstein that runs the CPU and GPU [together, both at high power] to do the job) along with god knows what else --- flash controller? fingerprint recognition cell?
mczak - Tuesday, April 29, 2014 - link
Kabini / Temash also full custom hw video encode/decode (all gcn based chips do), though if you want some hybrid mode is still available, so that should be pretty comparable. Flash controller and the like, too. Yes no ISP, but OTOH there's quite a lot of stuff the A7 won't do too (like 2xsata, the 4x1 and 1x4 pcie 2.0 connectivity, 2xUSB 3.0, high-speed i/o isn't exactly cheap). Anyway, the transistor count and die size is comparable after all (based on the official numbers, Kabini is slightly larger, but the a7 has slightly more transistors, though there's both different methods to count transistors and measure die size, not to mention they come out of different fabs), and it shouldn't be a surprise.
lmcd - Friday, May 2, 2014 - link
AMD should try partnering with Broadcom (as Broadcom has no real SoCs for smartphones).
200380051 - Tuesday, April 29, 2014 - link
I am eager to see how Mantle-enabled games will perform on these Mullins tablets. It seems a good fit from a technical standpoint. It might just push the PC gaming sphere to dig into tablet space. This in turn directly expands the market of game studios.
Also, I wonder if AMD's mobile lineup is to be the first product they'll roll out on Samsung's 14nm FINFET process. The process will be available starting 2015, as per their agreement. Its up to AMD to cook us a shrinked revision of these chips in a timely fashion.
Things are getting interesting.
MartinT - Wednesday, April 30, 2014 - link
It seems to me that performance numbers for these parts don't tell even half the story without the accompanying power readings, considering the 'use whatever power until the chassis burns the user' approach of AMD's turbo implementation.
kirilmatt - Wednesday, April 30, 2014 - link
How AMD did this is amazing. Imagine if this was released instead of kabini/temash. This destroys Bay Trail. I only hope that it gets released soon so it doesn't have to compete with Intel's 14nm SoCs. Anyways, good job AMD!
R3MF - Wednesday, April 30, 2014 - link
Ubuntu tablet please...
purerice - Wednesday, April 30, 2014 - link
Would one way to test the "non-turbo" performance be to loop some test 100 times and see the performance decrease over time? Considering the turbo would decrease as the CPU/APU heats up we could see the performance difference and also how long you really get "turbo" turned on for.
azazel1024 - Thursday, May 1, 2014 - link
I am impressed, but I am curious as to both why Bay Trail beats it in the PCMark testing by a fair margin, but not in individual CPU benchmarks. If that is thermal limits...well, I will say that a lot of tablet workloads are very short term. Windows tablet workloads (at least mine)...not so much.
Enough of what I do would likely hit those thermal constraints and at least in my testing, my T100 doesn't clock down even under very prolonged workloads, like 15+ minutes of converting RAW to JPEG images. Or long gaming, like an hour or two of KSP.
That and I have concerns about that idle and low power use. Seems to be pretty good under higher load and performance seems to be there (with caveat/concern)...but idle and low power could be an issue. According to those AMD specs, the APU itself is using darn near 2w of power streaming 1080p. Based on my math, my T100 TOTAL uses around 2.4w of power when streaming 1080p (around 13hrs of run time, 31whr battery). I assume that the display, wifi, signal processor, memory, etc, etc are consuming more than .6w of power.
Having a much bigger battery or much shorter run time could be a big sticking point for a lot of tablet users (I know I'd have an issue if my 6-7hrs gaming/10hrs normal use/13hrs video turned in to more like 3hrs gaming/6hrs normal use/8hrs video).
FITCamaro - Wednesday, May 7, 2014 - link
My next tablet will likely be a Windows 8.1 tablet. I'd love the high end AMD CPU tested here even if it doesn't do as well on power as Baytrail but bests it in GPU performance. Would be nice to be able to do better light mobile gaming.