The hardware trend is to increase the number of CPU ( physical or logical) not to increase the CPU frequency speed. All the CPU require access the memory through the system bus ( memory controller) . However, the more CPU needs to go through the system bus, the more contention would happen. One solution is to increase the speed of the system bus. NUMA is born to solve the solution in different way.
NUMA is group the CPU and memory to the Node ( NUMA node). Each NUMA node can have its set of the CPUs , system bus and memory. The basic hardware is has at least 2 NUMA nodes but recently hardware trend can have more than 2 nodes as well.
Foreign Memory Access
The CPU 0 in the NUMA node 0 can access memory in NUMA node 1. This is call “Foreign memory access”. Access foreign memory would be slower than access the memory in the local NUMA node.
coreinfo ( from sysinternal ) would show you the cost like the screenshot below.
MAXDOP - max degree of parallelism
The general guide line for MAXDOP is not set more than the physical cores not logical cores ( in Hyper-threading env).
If NUMA is supported in Hardware, the MAXDOP should not set more than the number of the physical cores in single NUMA node. This is to avoid the expensive foreign memory access.
How to fake the NUMA for testing
Please see here
There are many DMV views we can use to check the NUMA