Power Consumption, Thermals, and Noise

Whenever we’ve tested big processors in the past, especially those designed for very high power consumption tasks, they’ve all come with pre-prepared test systems. For the 28-core Intel Xeon W-3175X, rated at 255 W, Intel sent a whole system with a 500 W Asetek liquid cooler, as well as a second liquid cooler in case we were overclocking. When I tested the Core i9-9990XE, a 14-core 5 GHz ‘auction-only’ processor, the system integrator shipped a full 1U server with a custom liquid cooler to deal with the 400 W+ thermals.

With the Armari Magnetar X64T, the custom liquid cooling setup is an integral part of the system, needed to achieve the high overclocked frequencies that the company promises.

Armari calls the solution its FWLv2, ‘designed to support a fully unlocked Threadripper 3990X at maximum PBO with all 64 cores sustained up to 4.1 GHz’. The solution consists of a custom monoblock, created in partnership with EKWB, designed specifically to fit both the CPU and the VRM on the ASRock motherboard. There are also additional convective heatsinks on the VRM, which benefit from the airflow through the chassis. This cooling loop hooks up to an EKWB Coolstream 420x45mm triple radiator, three EK-Vardar 140ER EVO fans, and a high-performance cooling pump with a custom performance profile. Armari claims a 3x better flow rate than the best all-in-one liquid cooling solution on the market, with 200% better cooling performance and a lower noise profile (at a given power) due to the chassis design.
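As a rough illustration of why that flow rate claim matters for a 600 W-plus CPU, here is a back-of-envelope sketch of how much heat a water loop can carry away for a given flow rate and coolant temperature rise, using Q = ṁ·c·ΔT. The flow-rate and ΔT figures below are my own illustrative assumptions, not Armari’s published specs.

```python
# Back-of-envelope: heat a water loop can carry away as a function of flow
# rate, using Q = m_dot * c_p * dT. Flow rates and the 5 C coolant rise are
# illustrative assumptions, not Armari's published figures.

WATER_DENSITY_KG_PER_L = 0.998      # at ~20 C
WATER_CP_J_PER_KG_K = 4186          # specific heat of water

def loop_capacity_watts(flow_l_per_min: float, delta_t_c: float) -> float:
    """Heat (W) removed by the coolant for a given flow rate and coolant
    temperature rise across the CPU/VRM block."""
    m_dot = flow_l_per_min / 60.0 * WATER_DENSITY_KG_PER_L   # kg/s
    return m_dot * WATER_CP_J_PER_KG_K * delta_t_c

# A typical AIO pump manages on the order of ~1 L/min; a high-flow custom
# loop might be roughly 3x that, per Armari's claim.
for flow in (1.0, 3.0):
    print(f"{flow:.0f} L/min, 5 C coolant rise -> {loop_capacity_watts(flow, 5):.0f} W")
```

At the same coolant temperature rise, tripling the flow rate roughly triples the heat the loop can move, which is the headroom a sustained 640 W load needs.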

The system also includes two internal 140mm 1500 RPM Noctua fans, for additional airflow over the add-in components, and an angled 80mm low-noise SanAce fan mounted specifically for the memory and the VRM area of the motherboard.

As mentioned on the first page, the 420x45mm radiator is mounted on a swing arm inside the custom chassis, which makes it very easy to open the side of the case and perform maintenance. The chassis is a mix of aluminum on the inside and a steel frame, and weighs 18 kg / 39.7 lbs, but has handles on the top that hide inside the case, making it easy to move while keeping the exterior flush with the design. To be honest, this is a very nice chassis – it’s big, but given what it has to cool and the workstation element of it all, it is more than suitable. Externally, there are no RGB LEDs – just a simple light on the top for the power/reset buttons, and blue accents at the front.

As you can probably see inside, there’s no aesthetic to pander to, especially when these systems are only meant to be opened for maintenance. The standard Armari 3-year warranty for the UK (1st year RTB, 2nd/3rd year parts + labor) includes a free full-system checkup and coolant replacement during that period.

With all that said, an overclocked 3990X is a bit of a beast, both in power consumption and cooling requirements. Armari told us going into this review that we would likely see a range of power consumption figures depending on the workload, especially when it comes to sustained compute codes.

Our usual look at power consumption starts with our y-cruncher test, which deals solely in integers:

For this test, the CPU was over 400 W from start to finish, and the peak power was 505 W. CPU temperature averaged 70ºC and peaked at 82ºC, with the average CPU frequency at 4002 MHz.

For a less dense workload that involves a mixture of math, we turn to our Agisoft test. This converts 2D images into 3D models, and consists of four algorithmic stages – some fully multi-threaded, and others that are more serially coded.

The bulk of the test was done at around 270 W, with a single peak at 375 W. CPU temperatures never broke 50ºC.

Where we saw the real power was in our 3DPMavx floating point math test. This uses AVX2 like y-cruncher, but in a much denser calculation.

This test runs a loop for 10 seconds, then idles for 10 seconds, hence the up-and-down pattern. There are six different loops, each having a different effect on power based on its instruction density. The test then repeats – I’ve cut the graph at 300 seconds to get a clear view.
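For readers unfamiliar with that cadence, a minimal sketch of the duty cycle looks something like the following – the busy-loop kernel here is just a stand-in, not the actual 3DPM particle-movement math.

```python
# Minimal sketch of a 3DPM-style duty cycle: six compute kernels, each run
# flat-out on all threads for ~10 seconds, with ~10 seconds of idle between
# them. The kernel is a stand-in busy loop, not the real AVX2 particle math.
import math
import time
from multiprocessing import Pool, cpu_count

def busy_kernel(seconds: float) -> int:
    """Spin on floating-point math until the time budget expires."""
    end = time.time() + seconds
    x, iters = 0.0, 0
    while time.time() < end:
        x += math.sin(iters) * math.cos(iters)
        iters += 1
    return iters

if __name__ == "__main__":
    n_threads = cpu_count()              # 128 on a 3990X
    with Pool(n_threads) as pool:
        for loop in range(6):            # six different algorithms in the real test
            pool.map(busy_kernel, [10.0] * n_threads)   # ~10 s of full load
            time.sleep(10.0)             # ~10 s of idle -> the saw-tooth power trace
```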

The peak power is 640 W – there is no rest when the workload is this heavy, and the CPU very quickly idles down to 70 W where possible. The peak temperature with a workload this heavy, even in small 10 second bursts, was 89ºC. Depending on the exact nature of the instructions, we saw sustained all-core frequencies at 3925 MHz to 4025 MHz.

As another angle on this story, I ran a script to plot power consumption during the AIDA64 integer stress test, cycling from 0 threads loaded up to all 128 threads loaded, with two minutes of load followed by two minutes of idle at each step.
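For the curious, the sweep looks roughly like the sketch below. It is a simplified reconstruction rather than the exact script used here: read_package_power() is a placeholder for whatever power logging you have available, and stress-ng stands in for AIDA64’s integer stress test.

```python
# Sketch of the thread-scaling sweep described above: step the loaded thread
# count from 0 up to 128, with two minutes of load followed by two minutes of
# idle at each step, logging package power once per second throughout.
# Assumptions: stress-ng stands in for AIDA64, and read_package_power() must
# be wired to a real sensor - this is a reconstruction, not the exact script.
import csv
import subprocess
import time

def read_package_power() -> float:
    """Placeholder: replace with a real sensor read (e.g. an HWiNFO/AIDA64
    log export, OS power counters, or a wall meter with a logging interface)."""
    return 0.0

def log_power(writer, label: str, seconds: int) -> None:
    end = time.time() + seconds
    while time.time() < end:
        writer.writerow([time.time(), label, read_package_power()])
        time.sleep(1)

with open("power_sweep.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for threads in range(0, 129):                  # 0 .. 128 threads on a 3990X
        if threads:
            load = subprocess.Popen(["stress-ng", "--cpu", str(threads),
                                     "--cpu-method", "int64", "-t", "120"])
        log_power(writer, f"{threads} threads, load", 120)   # two minutes loaded
        if threads:
            load.wait()
        log_power(writer, f"{threads} threads, idle", 120)   # two minutes idle
# 129 steps at four minutes each is ~8.6 hours, hence running it overnight.
```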

The power consumption peaks at 450 W, as with the previous integer test, and we can see a slow steady rise from 106 W with one thread loaded up to 64 threads loaded. In this instance, AIDA64 is clever enough to split threads onto separate cores. In a normal 3990X scenario, we’re going to be seeing about 3.2 W per core at full load – in this instance, we’re approaching 6 W per core. For floating point math, where we see those 640 W peaks, it’s closer to 9 W per core. Bearing in mind that some of the consumer Ryzen Zen 2 processors are similarly running 9-12 W per core at full load, this is a bit wild.
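To show where those per-core numbers come from: if we assume the ~70 W idle figure seen earlier is mostly uncore/SoC power and subtract it before dividing across the 64 cores, the arithmetic works out as below. This is my reconstruction of the sums, not a formal measurement methodology.

```python
# Rough per-core power reconstruction, assuming the ~70 W idle package power
# noted above is uncore/SoC and gets subtracted before dividing by 64 cores.
CORES = 64
IDLE_UNCORE_W = 70          # approximate package power at idle (see above)

for label, package_w in [("integer load (AIDA64)", 450),
                         ("floating point load (3DPMavx)", 640)]:
    per_core = (package_w - IDLE_UNCORE_W) / CORES
    print(f"{label}: ~{per_core:.1f} W per core")
# -> roughly 5.9 W/core for integer and 8.9 W/core for FP, in line with the
#    ~6 W and ~9 W figures quoted above.
```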

Now, given that just 10 seconds at 640 W already pushed the CPU to 89ºC, the next question is what happens to the system when that load is sustained. Depending on the use case, some software might focus on INT workloads while others prefer FP. I attached a wall power meter to the system, fired up an 8K Blender render, and left the system running for over 10 minutes.

As you can see from the video, the CPU starts at around 4050 MHz and slowly drops over time to about 3850 MHz to keep everything in check. During this time, the system averaged about 900 W at the wall, with a ~935 W peak. The temperature didn’t go above 92ºC, and with an audio meter at a distance of 1 ft, I measured 45-49 dB (compared to 36 dB at idle).

To pile even more on, I turned to software that could load up the overclocked processor as well as the Quadro RTX 6000 in the system. TheaRender is our benchmark of choice here – it’s a physically based global illumination renderer that measures how many samples per pixel it can calculate in a given time, and it works on the CPU and GPU simultaneously.

This pushed the system hard. The benchmark can take 20 minutes or more, and the wall meter peaked at 1167 W for the full system. This was the one and only time I heard the cooling fans kick into high gear, at 52-55 dB. Thermals on the CPU were measured at 96ºC, which seems to be a pseudo-ceiling. Even so, the processor was still running at 3850 MHz all-core.

I was running the system in a 100 sq ft room, and pushing 1000 W for a long time without adequate airflow will warm the room up. I left my benchmark scripts running overnight, particularly the per-thread power sweep, and the next morning the room was noticeably warm. Users planning to run sustained workloads in a small office should take note: even though the system is much quieter than other workstation-class systems I’ve tested, placing it in an air-conditioned environment away from the work desk might be preferable, if that option exists.
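For a sense of scale, here is a back-of-envelope estimate of how quickly ~1 kW warms a 100 sq ft room; the ceiling height and zero-ventilation assumptions are mine, so treat it as an upper bound rather than a prediction.

```python
# Back-of-envelope for the room-heating effect: ~1 kW dissipated into a sealed
# 100 sq ft room. The 8 ft ceiling and no-ventilation assumptions are mine;
# real walls and airflow soak up most of the heat, so this is an upper bound.
ROOM_AREA_SQFT = 100
CEILING_FT = 8
AIR_DENSITY = 1.2            # kg/m^3
AIR_CP = 1005                # J/(kg*K)
POWER_W = 1000

volume_m3 = ROOM_AREA_SQFT * CEILING_FT * (0.3048 ** 3)   # ~22.7 m^3
air_mass = volume_m3 * AIR_DENSITY                         # ~27 kg of air
rate_c_per_min = POWER_W * 60 / (air_mass * AIR_CP)
print(f"~{rate_c_per_min:.1f} C/min warm-up with zero heat loss")
# 1000 W is also roughly 3,400 BTU/hr - a small space heater running non-stop.
```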

On stability: throughout all of our testing, there wasn't a single hint of instability. Speaking with Armari, the company put this down to AMD's own internal DVFS when implementing a high-level PBO-based overclock. Because this system was built from the ground up to accommodate this power, and because AMD's voltage and frequency tracking always ensures a stable system (as long as the temperature and BIOS are managed), along with Armari's custom tweaks, the company says it can build 10, 20, or 50 systems in a row and not experience any issues. All systems are also pre-tested extensively with customer-like workloads before shipping.

Comments

  • KillgoreTrout - Wednesday, September 9, 2020 - link

    Intelol
  • close - Wednesday, September 9, 2020 - link

    This shows some awesome performance, but the tradeoff is the limited memory capacity. If you don't need that, great. If you do, then Threadripper is not the best option.
  • twotwotwo - Wednesday, September 9, 2020 - link

    Hmm, so you're saying AnandTech needs a 3995WX or 2x7742 workstation sample? :)
  • close - Wednesday, September 9, 2020 - link

    A stack of them even :). Thing is memory support doesn't make for a more interesting review, doesn't really change any of the bars there. It's a tick box "supports up to 2TB of RAM".

    Memory support is one of the things that makes an otherwise absurdly expensive workstation like the Mac Pro attractive (that, and the fact that for whoever needs to stay within that ecosystem the licenses alone probably cost more than a stack of Pros).
  • oleyska - Wednesday, September 9, 2020 - link

    https://www.lenovo.com/no/no/thinkstation-p620

    will probably be able to help.
  • close - Wednesday, September 9, 2020 - link

    The P620 supports up to 512GB of RAM. Generally OK and probably delivers on every other aspect, but for those few that need 1.5-2TB of RAM it still wouldn't cut it. For that, the go-to is usually a Xeon, or EPYC more recently.
  • schujj07 - Wednesday, September 9, 2020 - link

    Remember that Threadripper Pro supports 2TB of RAM in an 8 channel setup. While getting 2TB/socket isn't cheap, it is a possibility.
  • rbanffy - Thursday, September 10, 2020 - link

    I wonder about the impact of the 8-channel config on single-threaded workloads. The 256MB of L3 is already quite ample, to the point I'm unsure how diminished the returns are at that point.
  • sjerra - Monday, September 28, 2020 - link

    This is my biggest concern, and it's rarely considered or studied in reviews: design space exploration.
    CAE over many design variations. Hundreds of design variations calculated as much as possible in parallel over the available cores (one core per variation, but each grabbing a slice of the memory). I've tested this on a 7960xe, purposely running it on dual channel and quad channel memory. On dual channel memory, at 12 parallel calculations (so 6 cores/channel) I measured a 46% increase in the calculation time per sample. In quad channel, at 12 parallel calculations (so 3 cores/channel) I already measured a 30% reduction per calculation. (Can anyone explain the worse results for quad channel?)
    Either way, it leads me to conclude that 64 cores with 4-channel memory for this type of workload is a big no-go. Something to keep in mind. I'm now spec'ing a dual-processor workstation with two lower core count processors and fully populated memory channels (either EPYC (2x32c, 16 channels) or Xeon (2x24c, 12 channels), still deciding).
  • sjerra - Monday, September 28, 2020 - link

    Edit: 30% increase of course.
