Memory Bandwidth vs Installed DIMMs with Intel® Xeon® Scalable
With the release of Intel® Xeon® Scalable Family Processors (formerly known as 'Skylake' and, more recently, 'Cascade Lake'), many of our customers have asked how the number of memory modules populated per processor socket affects overall memory bandwidth.
Simply put, the latest generation processors have 6 memory channels per socket, so for best performance all 6 channels should be populated. Using fewer than the 6 available leaves some channels idle and, much like unused lanes on a motorway, stops the system from reaching its full potential bandwidth.
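To put rough numbers on the motorway analogy, here is a minimal sketch of how theoretical peak bandwidth per socket scales with the number of populated channels. It assumes DDR4-2666 modules (2666 MT/s) and the standard 64-bit (8-byte) DDR4 data bus per channel; the function name and default values are our own illustration, not figures from Intel's guide, and measured bandwidth will always be lower than this theoretical peak.

```python
def peak_bandwidth_gbs(channels_populated, mt_per_s=2666,
                       bytes_per_transfer=8, channels_available=6):
    """Theoretical peak bandwidth per socket in GB/s (decimal).

    Assumptions (not from the article): DDR4-2666 and an 8-byte data
    bus per channel. Real-world results will be lower.
    """
    if not 1 <= channels_populated <= channels_available:
        raise ValueError("channels_populated must be between 1 and 6")
    # MT/s multiplied by bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return channels_populated * mt_per_s * bytes_per_transfer / 1000


for channels in range(1, 7):
    print(f"{channels} channel(s): "
          f"{peak_bandwidth_gbs(channels):.1f} GB/s theoretical peak")
```

Each idle channel removes one sixth of the theoretical peak, which is why populating every channel matters.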
Also worth considering is that many customers' applications call for binary memory capacities, for example 8, 16, 32, 64, 128 or 256GB, as this is the norm. However, these capacities cannot be split evenly across 6 or 12 modules. The capacities that work best with 6 channels are 24, 48, 96, 192 and 384GB, which all divide neatly by 6 or 12 and so allow every channel to be populated.
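The arithmetic is easy to check for yourself. The short sketch below lists which combinations of module count and module size add up to a given capacity; the function name and the list of module sizes (4GB to 128GB) are assumptions chosen for illustration, so adjust them to the modules actually available for your platform.

```python
def dimm_options(capacity_gb, dimm_counts=(6, 12),
                 dimm_sizes_gb=(4, 8, 16, 32, 64, 128)):
    """Return (count, size_gb) pairs of identical DIMMs that total capacity_gb.

    dimm_counts defaults to 6 or 12 modules, i.e. one or two per channel
    on a 6-channel processor. The module sizes are an assumed list.
    """
    return [(count, size)
            for count in dimm_counts
            for size in dimm_sizes_gb
            if count * size == capacity_gb]


for capacity in (64, 96, 128, 192, 256, 384):
    options = dimm_options(capacity)
    if options:
        print(f"{capacity}GB: {options}")
    else:
        print(f"{capacity}GB: no even split across 6 or 12 modules")
```

Binary capacities such as 64GB or 128GB return no options because a power of two is never divisible by 3, whereas 96GB can be built from 6 x 16GB or 12 x 8GB.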
Intel® has provided us with an extensive guide on memory channel population which explains in detail how the bandwidth varies, which we have summarised below. This summary should give you a better understanding of the impact of populating between 2 and 24 modules on a two-processor system.
The X-axis shows the number of DIMMs installed, and the graph line shows the memory bandwidth in GB/s.
What is evident from the graph is that as you add more modules, bandwidth improves steadily until it reaches the optimum level at 12 DIMMs (6 per processor, 1 per channel).
Between 12 and 22 DIMMs the memory population is unbalanced, which results in lower performance than with 12 DIMMs. The bandwidth becomes somewhat unstable and non-deterministic, and in some cases can be significantly lower than even the figures recorded in this chart.
As a result, certain configurations are not recommended. In particular, 5, 7, 9, 10 or 11 DIMMs per processor (which equates to 10, 14, 18, 20 and 22 DIMMs in a dual-processor configuration like the one above) do not provide reliable bandwidth. The results can be as low as 20GB/s or as high as 100GB/s per processor (between 40GB/s and 200GB/s for a dual-processor config) and will vary from request to request, which is not good for high-performance applications.
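As a quick sanity check before ordering, the guidance above can be captured in a few lines of code. This is only a sketch of the rules as summarised in this article, with a function name and category labels of our own; it is not an official Intel tool and does not replace the full population guide.

```python
# DIMM counts per processor that this article lists as not recommended.
NOT_RECOMMENDED_PER_CPU = {5, 7, 9, 10, 11}
CHANNELS_PER_CPU = 6


def classify_population(dimms_per_cpu):
    """Classify a per-processor DIMM count using the article's guidance."""
    if not 1 <= dimms_per_cpu <= 2 * CHANNELS_PER_CPU:
        raise ValueError("expected between 1 and 12 DIMMs per processor")
    if dimms_per_cpu in NOT_RECOMMENDED_PER_CPU:
        return "not recommended: unreliable bandwidth"
    if dimms_per_cpu % CHANNELS_PER_CPU == 0:
        return "balanced: every channel equally populated"
    if dimms_per_cpu < CHANNELS_PER_CPU:
        return "some channels idle: reduced bandwidth"
    return "unbalanced: lower bandwidth than 6 per processor"


for n in range(1, 13):
    # A dual-processor system holds twice the per-processor count.
    print(f"{n:2d} per CPU ({2 * n:2d} total): {classify_population(n)}")
```

Only 6 and 12 DIMMs per processor (12 and 24 in a dual-processor system) come out as balanced, which matches the recommendation to use every channel.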
Based on our own tests, we recommend that unless absolutely necessary, you stick to the population guideline of using every channel where possible. In cases where that cannot be achieved, it’s recommended you consult with our team of engineers and support staff who will help you find the best configuration for your application.
In the meantime, find out more about the Intel® Xeon® Scalable Family and our range of Intel® solutions here.