AMD Threadripper Pro Review: An Upgrade Over Regular Threadripper?
by Dr. Ian Cutress on July 14, 2021 9:00 AM EST
Posted in: CPUs, AMD, ThreadRipper, Threadripper Pro, 3995WX
Since the launch of AMD’s Threadripper Pro platform, the prospect of seeing what eight channels of memory bring to compute over the regular quad-channel Threadripper has been an intriguing one. Threadripper Pro is effectively a faster version of AMD’s EPYC, limited to single-CPU workstation use, but with a full 280 W TDP to match the frequencies of the standard Threadripper line. There is a 37% price premium from Threadripper to Threadripper Pro, which buys ECC memory support, double the PCIe lanes, and double the memory bandwidth. In this review, we’re comparing every member of both platforms that is commercially available.
Threadripper Pro: Born of Need
When AMD embarked upon its journey with the new Ryzen portfolio, the delineation of where each product sat in the traditional market was not always entirely clear. The first generation Ryzen was earmarked for standard consumers; however, the top of the line Ryzen 7 1800X, with eight cores, competed against Intel’s high-end desktop market. The Zen 2-based portfolio saw the mainstream Ryzen go to 16 cores, pushing past Intel’s best 18-core HEDT processor at the time in most tests. That Zen 2-based Ryzen 9 3950X was still classified as a ‘mainstream platform’ processor, as it only had 24 PCIe lanes and dual-channel memory, sufficient for mainstream users but not enough for workstation markets. These mainstream processors were also limited to a 105 W TDP.
At the other end of the scale was AMD EPYC, with the first generation EPYC 7601 having 32 cores and the second generation EPYC 7742 having 64 cores, at up to 225 W TDP. These share the same LGA4094 socket, have eight channels of memory, full ECC support, and 128 PCIe lanes (first PCIe 3.0, then PCIe 4.0), with dual-socket support. For workstation users interested in EPYC, AMD launched single socket ‘P’ versions. These offered the same features at around 200 W TDP, losing some performance to the regular non-P versions.
AMD then launched Threadripper, a high-end desktop version of EPYC that went all the way up to 280 W for peak frequency and performance. Threadripper sat above Ryzen with 64 PCIe lanes and quad-channel memory, enabling mainstream users that wanted a bit more to get a bit more. However, workstation users noted that while 280 W was great, Threadripper lacked official ECC memory support, and its reduced memory channel count and reduced PCIe lane count compared to EPYC sometimes stopped it from being adopted.
So enter Threadripper Pro, which sits between Threadripper and EPYC, and in this instance, very much more on the EPYC side. Threadripper Pro has almost all the features of AMD’s EPYC platform, but in a 280 W thermal envelope. It has eight channels of memory support, all 128 PCIe 4.0 lanes, and can support ECC. The only downsides compared to EPYC are that it can only be used in single socket systems, and that the peak memory support is halved (from 4 TB to 2 TB). Threadripper Pro also comes at a small price premium.
AMD Comparison
AnandTech | Ryzen | Threadripper | Threadripper Pro | Enterprise EPYC |
Cores | 6-16 | 32-64 | 12-64 | 16-64 |
Architecture | Zen 3 | Zen 2 | Zen 2 | Zen 3 |
1P Flagship | R9 5950X | TR 3990X | TR Pro 3995WX | EPYC 7713P |
MSRP | $799 | $3990 | $5490 | $5010 |
TDP | 105 W | 280 W | 280 W | 225 W |
Base Freq | 3400 MHz | 2900 MHz | 2700 MHz | 2000 MHz |
Turbo Freq | 4900 MHz | 4300 MHz | 4200 MHz | 3675 MHz |
Socket | AM4 | sTRX40 | sTRX4: WRX80 | SP3 |
L3 Cache | 64 MB | 256 MB | 256 MB | 256 MB |
DRAM | 2 x DDR4-3200 | 4 x DDR4-3200 | 8 x DDR4-3200 | 8 x DDR4-3200 |
DRAM Capacity | 128 GB | 256 GB | 2 TB, ECC | 4 TB, ECC |
PCIe | 4.0 x20 + chipset | 4.0 x56 + chipset | 4.0 x120 + chipset | 4.0 x128 |
Pro Features | No | No | Yes | Yes |
One of the biggest pulls for Threadripper and Threadripper Pro has been any market that typically uses high-speed workstations and can scale its workloads. According to a local OEM we spoke to, the demand for Threadripper and Threadripper Pro from the visual effects industry has been off the charts, with these companies ripping out their old infrastructure and replacing it with AMD. This has also been spurred by the recent pandemic, where these studios want to keep the expensive hardware onsite and allow their artists to work from home via remote access.
Threadripper Pro CPUs: Four Models, Three at Retail
When TR Pro launched in 2020, the processors were a Lenovo exclusive for the P620 workstation. The deal between Lenovo and AMD was not disclosed, however it would appear that the exclusivity deal ran for six months, from September to February, with the processors being made available at retail on March 2nd.
During that time, we were sampled one of these workstations for review, and it still remains one of the best modular systems I’ve ever tested:
Lenovo ThinkStation P620 Review: A Vehicle for Threadripper Pro
AMD’s first Threadripper Pro platform has four processors in it, ranging from 12 cores to 64 cores, mimicking their equivalents in Threadripper 3000 and EPYC 77x2 but at 280W.
AMD Ryzen Threadripper Pro
AnandTech | Cores / Threads | Base Freq (MHz) | Turbo Freq (MHz) | Chiplets | L3 Cache | TDP | Price SEP |
3995WX | 64 / 128 | 2700 | 4200 | 8 + 1 | 256 MB | 280 W | $5490 |
3975WX | 32 / 64 | 3500 | 4200 | 4 + 1 | 128 MB | 280 W | $2750 |
3955WX | 16 / 32 | 3900 | 4300 | 2 + 1 | 64 MB | 280 W | $1150 |
3945WX | 12 / 24 | 4000 | 4300 | 2 + 1 | 64 MB | 280 W | OEM |
Sitting at the top is the 64-core Threadripper Pro 3995WX, with a 2.7 GHz base frequency and a 4.2 GHz turbo frequency. This processor is the only one in the family to have all 256 MB of L3 cache, as it has all eight chiplets fully active. The $5490 price is a full 37.5% increase over the Threadripper 3990X at $3990.
AMD 64-Core Zen 2 Comparison
AnandTech | Threadripper 3990X | Threadripper Pro 3995WX | EPYC 7702P |
MSRP | $3990 | $5490 | $4425 |
TDP | 280 W | 280 W | 200 W |
Base Freq | 2900 MHz | 2700 MHz | 2000 MHz |
Turbo Freq | 4300 MHz | 4200 MHz | 3350 MHz |
L3 Cache | 256 MB | 256 MB | 256 MB |
DRAM | 4 x DDR4-3200 | 8 x DDR4-3200 | 8 x DDR4-3200 |
DRAM Capacity | 256 GB | 2 TB, ECC | 4 TB, ECC |
PCIe | 4.0 x56 + chipset | 4.0 x120 + chipset | 4.0 x128 |
Pro Features | No | Yes | Yes |
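The price gap between these three 64-core parts can be checked with a few lines of arithmetic. This is only a sketch using the MSRPs listed in the table above; the helper name `premium_pct` is made up for illustration.

```python
def premium_pct(base_price: float, price: float) -> float:
    """Percentage by which `price` exceeds `base_price`."""
    return (price - base_price) / base_price * 100.0


# MSRPs in USD, taken from the 64-core comparison table.
MSRP = {
    "Threadripper 3990X": 3990,
    "Threadripper Pro 3995WX": 5490,
    "EPYC 7702P": 4425,
}

pro_price = MSRP["Threadripper Pro 3995WX"]
for name, price in MSRP.items():
    if name == "Threadripper Pro 3995WX":
        continue
    # Positive values mean the 3995WX costs more than the part it is compared to.
    print(f"3995WX vs {name}: {premium_pct(price, pro_price):+.1f}%")
```

The first figure lands in line with the roughly 37% premium quoted for Threadripper Pro over regular Threadripper, while the gap to the single-socket EPYC is noticeably smaller.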
Middle of the line is the 32-core Threadripper Pro 3975WX, with a 3.5 GHz base frequency and a 4.2 GHz turbo frequency. AMD decided to make this processor use four chiplets with all eight cores on each chiplet, leading to 128 MB of L3 cache total. At $2750, it is also 37.5% more expensive than the equivalent 32-core Threadripper 3970X.
AMD 32-Core Zen 2 Comparison
AnandTech | Threadripper 3970X | Threadripper Pro 3975WX | EPYC 7502P |
MSRP | $1999 | $2750 | $2300 |
TDP | 280 W | 280 W | 180 W |
Base Freq | 3700 MHz | 3500 MHz | 2500 MHz |
Turbo Freq | 4500 MHz | 4200 MHz | 3350 MHz |
L3 Cache | 128 MB | 128 MB | 128 MB |
DRAM | 4 x DDR4-3200 | 8 x DDR4-3200 | 8 x DDR4-3200 |
DRAM Capacity | 256 GB | 2 TB, ECC | 4 TB, ECC |
PCIe | 4.0 x56 + chipset | 4.0 x120 + chipset | 4.0 x128 |
Pro Features | No | Yes | Yes |
The following two processors have no Threadripper equivalents, but also represent a slightly different scenario that we’ll explore in this review. Both the 3955WX and 3945WX, despite being part of the big Threadripper Pro family, only use two chiplets in their design: eight cores per chiplet for the 3955WX and six cores per chiplet for the 3945WX. This means these processors only have 64 MB of L3 cache, making them broadly similar to the Ryzen 9 3950X and Ryzen 9 3900X, except that the IO die means there are eight channels of memory and 128 PCIe lanes here.
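The chiplet arithmetic behind the lineup can be written out as a short sketch. The 32 MB of L3 per chiplet is inferred from the tables in this review (256 MB across eight chiplets), and the class and field names are purely illustrative.

```python
from dataclasses import dataclass

# Inferred from the tables: 256 MB of L3 across eight compute chiplets.
L3_PER_CHIPLET_MB = 32


@dataclass(frozen=True)
class TRPro:
    name: str
    chiplets: int           # compute chiplets only; the IO die is the "+ 1"
    cores_per_chiplet: int  # active cores on each compute chiplet

    @property
    def cores(self) -> int:
        return self.chiplets * self.cores_per_chiplet

    @property
    def threads(self) -> int:
        return self.cores * 2  # two threads per core

    @property
    def l3_mb(self) -> int:
        # L3 scales with chiplet count, not with the number of active cores.
        return self.chiplets * L3_PER_CHIPLET_MB


LINEUP = [
    TRPro("3995WX", chiplets=8, cores_per_chiplet=8),
    TRPro("3975WX", chiplets=4, cores_per_chiplet=8),
    TRPro("3955WX", chiplets=2, cores_per_chiplet=8),
    TRPro("3945WX", chiplets=2, cores_per_chiplet=6),
]

for cpu in LINEUP:
    print(f"{cpu.name}: {cpu.cores}C/{cpu.threads}T, {cpu.l3_mb} MB L3")
```

This makes it easy to see why the 12-core and 16-core parts end up with the same 64 MB of L3: both use two chiplets, and only the number of active cores per chiplet differs.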
AMD 16-Core Zen 2/3 Comparison
AnandTech | Ryzen 9 3950X | Threadripper Pro 3955WX | Ryzen 9 5950X |
MSRP | $749 | $1150 | $799 |
TDP | 105 W | 280 W | 105 W |
Base Freq | 3500 MHz | 3900 MHz | 3400 MHz |
Turbo Freq | 4700 MHz | 4300 MHz | 4900 MHz |
L3 Cache | 64 MB | 64 MB | 64 MB |
DRAM | 2 x DDR4-3200 | 8 x DDR4-3200 | 2 x DDR4-3200 |
DRAM Capacity | 128 GB | 2 TB, ECC | 128 GB |
PCIe | 4.0 x20 + chipset | 4.0 x120 + chipset | 4.0 x20 + chipset |
Pro Features | No | Yes | No |
Motherboard Cost | -- | +++ | -- |
The 3955WX has a higher base frequency, but the 3950X has the higher turbo frequency. The 3950X is also cheaper, and motherboards are cheaper! It might be worth partitioning these out into a separate comparison review.
The final Threadripper Pro processor, the 3945WX, does not have a price, because AMD is not making it available at retail. This part is for selected OEM customers only, it seems; perhaps the limited substrate resources in the market right now make it unappealing to produce too many of these. Hard to say.
Motherboards: Beware!
Despite being based on the same LGA4094 socket as both Threadripper and EPYC, Threadripper Pro has its own unique WRX80 platform that has to be used instead. Only select vendors seem to have access/licenses to make WRX80 motherboards, and your main options are:
- ASUS Pro WS WRX80E-SAGE SE WiFi ($1000)
- Supermicro M12SWA-TF (~$750)
- GIGABYTE WRX80 SU8-IPMI ($790)
All three boards use a transposed LGA4094 socket, eight DDR4 memory slots, and 6-7 PCIe 4.0 slots.
Though beware! There is an option of finding an old/refurbished Lenovo P620 motherboard, but it is worth noting that Lenovo is exercising an AMD feature for OEMs: processors used in that Lenovo motherboard will be locked to Lenovo forever. This is part of AMD’s guaranteed supply chain process, allowing OEMs to hard lock processors to certain vendors for the end-to-end supply chain security that specific customers request. If you might ever want to break down your system to upgrade and sell off parts, a Lenovo TR Pro system is not recommended unless you buy and sell it as a whole.
This Review
The main goal of this review is to test all of the Threadripper Pro 3000 hardware and compare against the equivalent Threadripper 3000 to get a sense of how much performance is gained by the increased memory bandwidth, or lost due to the slight core frequency differences. We are also including Intel’s best HEDT/workstation processor for comparison, the W-3175X, as well as the top consumer-grade processors on the market. All systems are tested at JEDEC specifications.
Test Setup
AMD TR Pro | 3995WX, 3975WX, 3955WX | ASUS Pro WS WRX80E-SAGE SE WiFi | BIOS 0405 | IceGiant Thermosiphon | Kingston 8x16 GB DDR4-3200 ECC |
AMD TR | TR 3990X, TR 3970X, TR 3960X | ASRock TRX40 Taichi | BIOS P1.70 | IceGiant Thermosiphon | ADATA 4x32 GB DDR4-3200 |
AMD Ryzen | R9 5950X | GIGABYTE X570 I Aorus Pro | BIOS F31L | Noctua NH-U12S | ADATA 4x32 GB DDR4-3200 |
Intel Core | i9-11900K | ASUS Maximus XIII Hero | BIOS 0703 | Thermalright TRUE Copper* | ADATA 4x32 GB DDR4-3200 |
Intel Xeon | Xeon W-3175X | ASUS ROG Dominus Extreme | BIOS 0601 | Asetek 690LX-PN | DDR4-2666 ECC |
GPU | Sapphire RX 460 2GB (CPU Tests) |
PSU | Various (inc. Corsair AX860i) |
SSD | Crucial MX500 2TB |
*Silverstone SST-FHP141-VF 173 CFM fans also used. Nice and loud.
Many thanks to Kingston for supplying a full set of KSM32RD8/16MEI - 16x16 GB of DDR4-3200 ECC RDIMMs for enterprise testing in systems like Threadripper Pro.
As part of this review, we are also showcasing the 64-core processors in 128T mode as well as 64T mode. This is being done to showcase how some processors can get better performance by having better memory bandwidth per thread: one of the issues with these high core count processors is the limited amount of memory bandwidth each thread can access. Also, some operating systems (such as Windows) struggle above 64 threads due to the use of processor groups.
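To put rough numbers on the bandwidth-per-thread argument, the sketch below computes the theoretical peak for quad-channel and eight-channel DDR4-3200. It assumes the standard 64-bit (8-byte) DDR4 channel width and ignores all real-world overheads, so these are ceilings rather than measured results; the function names are illustrative only.

```python
BYTES_PER_TRANSFER = 8  # assumption: one 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int, mts: int = 3200) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mts * BYTES_PER_TRANSFER / 1000


def per_thread_gbs(channels: int, threads: int, mts: int = 3200) -> float:
    """Peak bandwidth split evenly across all active threads."""
    return peak_bandwidth_gbs(channels, mts) / threads


for platform, channels in (("Threadripper", 4), ("Threadripper Pro", 8)):
    for threads in (128, 64):
        share = per_thread_gbs(channels, threads)
        print(f"{platform}, {threads}T: {share:.1f} GB/s per thread "
              f"of {peak_bandwidth_gbs(channels):.1f} GB/s total")
```

On these assumptions, a 64-core Threadripper running in 64T mode has the same theoretical share per thread as a Threadripper Pro running all 128 threads, which is part of why both modes are worth testing.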
98 Comments
Mikewind Dale - Wednesday, July 14, 2021
I have a ThreadRipper Pro 3955WX, and I discovered something interesting about the memory bandwidth.
Originally, I bought 4x64 GB ECC RDIMM because I thought 256 GB might be enough, and I wanted to leave some empty RAM slots to populate with 128 GB RDIMMs if those ever became cost-effective. (Right now, 128 GB RDIMMs are about triple the price of 64 GB.)
CPU-Z and AIDA64 reported "quad" channel memory, and AIDA64's memory benchmarks showed reasonable memory performance.
But I discovered that 256 GB wasn't enough for my application, so I bought 2 more 64 GB RDIMMs.
At this point, I had 6 DIMMs populated. CPU-Z and AIDA64 both reported "hexa" channel memory, but AIDA64's memory benchmarks showed that my memory performance was about 2/3 that of a Ryzen.
So I bought 2 more RDIMMs again, for a total of 8. Now, my memory benchmark in AIDA64 is much closer to expected.
So the moral of the story is: you can populate 4 DIMMs, or you can populate 8, but don't dare populate 6. Populating precisely 6 DIMMs will absolutely cripple your memory performance, whereas 4 DIMMs still have acceptable performance.
kobblestown - Wednesday, July 14, 2021
The 3955 probably has only 2 CCDs and is therefore limited to 4 DDR channels throughput. It seems that each IF link has the throughput of 2 DDR channels and this makes sense.
You should keep in mind that the IO die has in effect 4 dual channel controllers and you may have populated them suboptimally. If you have two dual channel controllers fully populated and two half populated (instead of a third fully populated and the fourth one staying empty) you'll have skewed results. Also, there was some noise about Milan working better with 6 channel configurations so it may be something specific to Rome chips.
Rudde - Wednesday, July 14, 2021
Server providers had requested 6 channel memory support for server processors and that was implemented in Milan.
McFig - Wednesday, July 14, 2021
What kobblestown is suggesting is that maybe Mikewind Dale could have gotten the 6 RDIMMs working by moving one of them so that each pair is fully populated.
Mikewind Dale - Wednesday, July 14, 2021
McFig, there are only 8 slots, so I'm not sure how I could have moved the 6 DIMMs among the 8 slots to ensure that each pair is populated.
1_rick - Wednesday, July 14, 2021
He probably means "each of 3 pairs fully populated".
DougMcC - Wednesday, July 14, 2021
I think the question is whether 3/3 is better than 4/2
kobblestown - Friday, July 16, 2021
Heya! Sorry for the nebulous formulation. In terms of the number of DIMMs per memory controller, I suggest having 2+2+2+0 instead of 2+1+2+1. One needs to figure out what this means for any particular MB. But as DougMcC suggests, that would probably mean having 4 DIMMs on one side of the CPU and 2 on the other, rather than having 3 DIMMs on each side. The latter is bound to be suboptimal. Whether the former offers an improvement is something that I would be very interested to know but could be that Rome has some shortcoming in this area which is addressed in Milan.
Again, dual CCD configurations are limited to 4 channel bandwidth but it's still worth it to have all channels populated so you don't get bitten by badly handled asymmetry and the IO does not fight (too much) with the cores for the bandwidth.
kobblestown - Friday, July 16, 2021
BTW, one should also check the memory interleaving options in the UEFI. Maybe the way the IO die aggregates the memory channels can be tweaked to achieve the expected performance even with 6 DIMMs. Or maybe that's only achievable with Milan.
Mikewind Dale - Friday, July 16, 2021
Ahhh, I see what you mean. Thanks. Well, I have 8 DIMMs now, and I don't want to mess with my system any more. Maybe Anandtech can test this.