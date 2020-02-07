Today AMD launches the Ryzen Threadripper 3990X, a 64-core CPU with a 2.9 GHz base clock, a 4.3 GHz boost clock, and 256 MB of L3 cache. The original plan was to publish a full CPU review today with a brief discussion of the overclocking project, followed by a detailed article on the overclocking results on Monday.

Unfortunately, a family emergency pulled me away from the keyboard, and that plan has to change. I have most of the data I need for the review, but it will take some time to pull it all together. Instead of burying you in charts and graphs, I’m going to talk a little about this CPU, what I’ve seen, and what I think it means.

Here’s a little taste to whet your appetite:

As of February 6, 2020, this is the world’s highest single-socket Cinebench R20 score, according to HWBot, and the fourth-highest Cinebench R20 score overall. I achieved it with an Asus Zenith II Extreme motherboard and AMD’s Ryzen Threadripper 3990X, with all 64 cores locked at 4 GHz. That works out to an aggregate 256 GHz, or 0.256 THz. Obviously, 64 cores at 4 GHz don’t perform like a hypothetical 0.256 THz single-core CPU would, but the chip is still remarkably fast overall.
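Why doesn’t 64 × 4 GHz behave like one 256 GHz core? A standard way to illustrate it is Amdahl’s law, which is a textbook model swapped in here for illustration, not something taken from the benchmark itself. The sketch below assumes a hypothetical single core with the same IPC as one Zen 2 core and ignores memory bandwidth, clock, and power effects. A 256 GHz core would speed up every workload 64-fold. Sixty-four cores only come close when almost all of the work runs in parallel.

```python
# Hypothetical illustration: aggregate clock arithmetic plus Amdahl's law.
CORES = 64
ALL_CORE_GHZ = 4.0


def aggregate_clock_ghz(cores: int, ghz: float) -> float:
    """Sum of per-core clocks: the '64 x 4 GHz = 256 GHz' figure."""
    return cores * ghz


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    total = aggregate_clock_ghz(CORES, ALL_CORE_GHZ)
    print(f"{total:.0f} GHz = {total / 1000:.3f} THz")  # 256 GHz = 0.256 THz
    # A single 256 GHz core (assumed same IPC) would be 64x faster than one 4 GHz
    # core on every workload; 64 cores only approach that when nearly all work is parallel.
    for p in (0.90, 0.99, 0.999):
        print(f"parallel fraction {p}: {amdahl_speedup(p, CORES):.1f}x")
```

Even a workload that is 99 percent parallel falls well short of a 64x speedup in this model. That is why aggregate clock is a fun number, not a performance metric.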

If all goes well, I’ll break this record and log the results over the weekend, then discuss the overclocking project on Monday. I will say this much: the power and current challenges of running a high all-core clock across this many cores are formidable. I don’t yet know the highest clock I can hold in stable operation, and I hope to answer that question this weekend.

For this teaser, let’s talk about the 3990X, its performance, and its positioning in stock configurations.

What the 3990X brings to the table

The first and most important thing to know about the 3990X is that it is not a CPU for everyone. The vast majority of applications are not designed to scale this far, and Windows itself doesn’t scale that well either. Microsoft’s support for more than 64 threads in Windows is awkward at best.

Since Windows 7 and Windows Server 2008 R2, Microsoft has handled systems with more than 64 threads in a specific way: by dividing them into processor groups. Each group contains up to 64 logical processors, and Windows treats an SMT thread and a physical core identically for this purpose. Windows uses locality information to keep each physical core and its SMT sibling in the same processor group whenever possible. The catch is that, by default, a process runs within a single processor group, so an application can use only 64 of the 3990X’s 128 threads, or 50 percent. (Bitsum has more information on this topic.) There are ways to work around this; for example, applications can implement their own schedulers that take better advantage of a CPU with a high core count.
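To make the processor-group arithmetic concrete, here is a simplified Python model, not real Windows code. Real applications work with Win32 calls such as `GetActiveProcessorGroupCount` and `SetThreadGroupAffinity`. The model assumes Windows fills groups with whole physical cores so SMT siblings stay together, and that a non-group-aware process stays inside one group. `build_groups`, `default_thread_limit`, and `spread_workers` are hypothetical helpers, and the round-robin placement stands in for the kind of application-level scheduler described above.

```python
from dataclasses import dataclass, field

MAX_GROUP_SIZE = 64  # Windows limit: at most 64 logical processors per group


@dataclass
class ProcessorGroup:
    index: int
    # Logical processors as (physical_core, smt_thread) pairs.
    logical: list = field(default_factory=list)


def build_groups(physical_cores: int, threads_per_core: int) -> list:
    """Simplified model: fill groups with whole cores so SMT siblings stay together."""
    groups = [ProcessorGroup(0)]
    for core in range(physical_cores):
        siblings = [(core, t) for t in range(threads_per_core)]
        if len(groups[-1].logical) + len(siblings) > MAX_GROUP_SIZE:
            groups.append(ProcessorGroup(len(groups)))
        groups[-1].logical.extend(siblings)
    return groups


def default_thread_limit(groups: list) -> int:
    """Upper bound for a process that is not group-aware: it is confined to one group."""
    return max(len(g.logical) for g in groups)


def spread_workers(groups: list, workers: int) -> dict:
    """Hypothetical group-aware scheduler: deal workers round-robin across groups."""
    placement = {g.index: 0 for g in groups}
    for w in range(workers):
        placement[groups[w % len(groups)].index] += 1
    return placement


if __name__ == "__main__":
    groups = build_groups(physical_cores=64, threads_per_core=2)  # 3990X, SMT on
    total = sum(len(g.logical) for g in groups)
    print(f"{len(groups)} groups, {total} logical processors")
    print(f"non-group-aware process: {default_thread_limit(groups)} of {total} threads")
    print(f"group-aware placement of {total} workers: {spread_workers(groups, total)}")
```

With SMT enabled, the model splits the 3990X into two groups of 64 logical processors. A default process reaches only one of them, while a group-aware scheduler can place 64 workers in each.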

As a result, Linux often scales better than Windows on the 3990X. Techgage’s Rob Williams has done a lot of Linux testing, and I recommend his article if you want a concrete comparison of scaling on the two operating systems.

Under Windows, the 3990X shows significant performance improvements over the 3970X in several areas. Rendering is by far the category that benefits most: depending on the application, several rendering engines run 1.3x to 1.6x faster on the 3990X. For this review, I purchased access to Blender Cloud to test some of the professional-quality scenes it provides. The more than 30 render tests I’ve run in Blender alone confirm that users of this application can look forward to a lot of scaling, although the exact amount depends on the type of scene. We’ll also compare the 3990X and 3970X when multiple workloads run at the same time.
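One way to read the 1.3x to 1.6x range is as scaling efficiency. The 3990X has twice the 3970X’s cores, so a perfect result would be 2x. The sketch below is only a rough ratio. It ignores the 3990X’s lower base clock (2.9 GHz vs. 3.7 GHz on the 3970X) and any memory or power limits. `scaling_efficiency` is a hypothetical helper.

```python
# Core-count ratio between the 3990X (64 cores) and 3970X (32 cores).
CORE_RATIO = 64 / 32


def scaling_efficiency(speedup: float, core_ratio: float = CORE_RATIO) -> float:
    """Share of the ideal (core-count-proportional) speedup actually achieved."""
    return speedup / core_ratio


if __name__ == "__main__":
    # The 1.3x-1.6x range quoted for rendering engines in the text.
    for speedup in (1.3, 1.6):
        print(f"{speedup}x -> {scaling_efficiency(speedup):.0%} of ideal 2x scaling")
```

In other words, the rendering results correspond to roughly 65 to 80 percent of ideal core-count scaling.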

Because the Windows scheduler doesn’t give an application more than 64 threads by default, there are a number of cases where disabling SMT on the 3990X improves performance: with SMT off, all 64 physical cores fit into a single processor group. We’ll examine these cases and discuss whether the 3990X, run as a 64C/64T chip, makes sense compared to the 3970X. We’re also including performance figures for Intel’s Cascade Lake-X Core i9-10980XE, not because Intel competes directly with the 3990X with this chip, but because it’s important to provide the most representative numbers we can, and the 10980XE is Intel’s current offering at the $1,000 price point. There are some tests where the 10980XE comes out ahead despite having only 18 cores. With a chip this expensive, I wanted to explore every nook and cranny of its performance.
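Here is a simplified model of why turning SMT off can help. It assumes that processor groups hold whole cores, with SMT siblings kept together, and that a non-group-aware process stays in one group of at most 64 logical processors. Under those assumptions, SMT halves the number of physical cores such a process can reach on the 3990X. The 3970X fits in one group either way. `cores_visible_by_default` is a hypothetical helper, not a Windows API.

```python
MAX_GROUP_SIZE = 64  # logical processors per Windows processor group


def cores_visible_by_default(physical_cores: int, smt: bool) -> int:
    """Physical cores a non-group-aware process can reach, assuming groups hold
    whole cores (SMT siblings kept together) and the process stays in one group."""
    threads_per_core = 2 if smt else 1
    cores_per_group = MAX_GROUP_SIZE // threads_per_core
    return min(physical_cores, cores_per_group)


if __name__ == "__main__":
    for name, cores in (("3990X", 64), ("3970X", 32)):
        for smt in (True, False):
            visible = cores_visible_by_default(cores, smt)
            print(f"{name}, SMT {'on' if smt else 'off'}: {visible} of {cores} cores")
```

In this model the 3990X exposes 32 of its 64 cores to a default process with SMT on, and all 64 with SMT off.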

One reason this review is taking a little longer is that I’m working with benchmarks we haven’t used before, including Agisoft Metashape, Pix4D, DaVinci Resolve, and Maya 2020 (with a CPU stress benchmark developed by Antonio Bosi), as well as a lot of Blender. There are applications where the 3990X proves its worth (at least if you work in this professional market), and, yes, tests that show you’re really better off with a 3970X.

We’ll also have more overclocked benchmark results and, if I get my way, a few new high scores to chase. It will be worth the wait.

Initial conclusions

I’m going to hold some of my thoughts back for the full review, but here’s what I will say: the 3990X is a very exciting CPU, even if it isn’t a chip most people will find worth buying.

Testing this chip reminded me of the times we spent waiting for operating systems and applications to catch up with new CPU features.

The first iteration of Hyper-Threading only worked properly if you had Windows XP SP1 (itself fairly new at the time) or Windows 2000 SP4 installed. We waited for applications to add SSE2 support for the Pentium 4. We waited for 64-bit Windows and native 64-bit applications, just as we once waited for 32-bit applications and operating system support. Now, thanks to the 3990X, we’re waiting for Microsoft to improve how Windows handles CPUs with high core counts.

The difference between AMD and Intel

AMD isn’t the first company to run into this problem with Windows; Intel systems with more than 64 threads hit the same limit. However, Intel has kept its core counts much lower and its price per core much higher. The Xeon W series, which is aimed at workstations, scales up to 28 cores in a single socket but doesn’t support dual-socket configurations. I checked prices at Dell: a dual Xeon Gold 6252 workstation (24C/48T, 2.1 GHz base, 3.7 GHz turbo) starts at $10,138. The same system with Xeon Bronze CPUs starts at $1,579. That’s an upgrade fee of $8,559 for two CPUs that together offer only 75 percent of the Threadripper 3990X’s core count, at more than twice the 3990X’s $3,990 launch price.
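The pricing comparison above reduces to a few lines of arithmetic. The Dell prices come from the text. The 3990X’s $3,990 launch list price is an added input not stated in this paragraph’s figures. The per-core line is a rough comparison, since it sets a CPU upgrade fee against a full CPU price.

```python
# Prices quoted in the text (Dell configurations, USD).
DUAL_GOLD_6252 = 10_138
DUAL_BRONZE = 1_579
XEON_CORES = 2 * 24          # two 24-core Xeon Gold 6252 CPUs

# Added input: the 3990X's $3,990 launch list price.
TR_3990X_PRICE = 3_990
TR_3990X_CORES = 64

upgrade_fee = DUAL_GOLD_6252 - DUAL_BRONZE
print(f"Upgrade fee: ${upgrade_fee:,}")                             # $8,559
print(f"Core count vs. 3990X: {XEON_CORES / TR_3990X_CORES:.0%}")   # 75%
print(f"Fee vs. 3990X price: {upgrade_fee / TR_3990X_PRICE:.2f}x")
print(f"Per core: ${upgrade_fee / XEON_CORES:.0f} (Xeon upgrade) "
      f"vs. ${TR_3990X_PRICE / TR_3990X_CORES:.0f} (3990X)")
```

Even counting only the upgrade fee, each Xeon core costs roughly three times as much as a 3990X core.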

Lower prices like these should push high-core-count CPUs into more professional markets, which in turn should encourage Microsoft and Linux developers to support them better.

Because the workstation market doesn’t respond to core count alone, we’ll also examine some cases where Cascade Lake remains the better option. Applications that don’t scale well with core count sometimes run significantly better on Intel hardware. I’ll tell you up front that Cascade Lake wins a couple of tests against the 3990X. That’s why it’s important to understand this CPU’s strengths and weaknesses before you buy one.

I’m sorry I didn’t finish the full review in time for you to read it this morning. I hope what I’ve outlined here, in my version of “Coming Soon,” gives you something to look forward to, along with a good overview of my thoughts on this CPU, even if I need a few more days to finish the project.

