The new Intel Nehalem Xeon CPUs, which are being introduced in countless one- and two-socket servers and workstations, have already generated a lot of heat.

Back when AMD's Opteron was ruling the performance roost, Intel was busy gluing two separate cores onto a single die and calling it a dual-core CPU. Memory bandwidth lagged due to the central off-die memory controller, and while the overall performance of the processor was acceptable, it lacked the NUMA (Non-Uniform Memory Access) punch that was the Opteron's claim to fame.

Nehalem is based on a NUMA architecture, much like the Opteron, and its performance is miles ahead of anything else Intel has released to date. We're impressed.

Inside Nehalem

The Nehalem chips (Xeon 3500 series for single-socket and Xeon 5500 series for two-socket systems) feature a quad-core layout with 731 million transistors, 256KB of L2 cache per core, 8MB of shared L3 cache, deeper and faster caching, and better branch prediction. In essence, Nehalem blends the strengths of Intel's legacy Xeon processors with a fundamental architectural change: the incorporation of NUMA.

With NUMA, each CPU has its own integrated memory controller, which ties DIMM ranks to a specific CPU. In the Nehalem architecture, each CPU drives three channels of DDR3 RAM. The CPUs communicate with each other and with the I/O hub over QuickPath links that run at 6.4GT (gigatransfers) per second, or 25.6GBps per link. Depending on how the DIMMs are populated, the DDR3 memory runs at 800MHz, 1,066MHz, or 1,333MHz.
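For readers who want to check the numbers, here is a back-of-the-envelope sketch of where the 25.6GBps-per-link figure comes from. It assumes, as is the case for QuickPath, that each link moves 2 bytes per transfer in each direction. For comparison, it also computes peak DDR3 bandwidth for one CPU, assuming three 64-bit (8-byte) channels. These are theoretical peaks, not measured results.

```python
# Back-of-the-envelope peak bandwidth figures (theoretical, not measured).

def qpi_link_gbps(gigatransfers_per_s: float = 6.4,
                  bytes_per_transfer: int = 2,
                  directions: int = 2) -> float:
    """Peak QuickPath link bandwidth in GB/s, counting both directions."""
    return gigatransfers_per_s * bytes_per_transfer * directions


def ddr3_socket_gbps(megatransfers_per_s: int = 1333,
                     channels: int = 3,
                     bytes_per_transfer: int = 8) -> float:
    """Peak DDR3 bandwidth for one CPU in GB/s (decimal units)."""
    return megatransfers_per_s * bytes_per_transfer * channels / 1000


print(f"QuickPath link: {qpi_link_gbps():.1f} GB/s")
for speed in (1333, 1066, 800):
    print(f"DDR3 at {speed}: {ddr3_socket_gbps(speed):.1f} GB/s per CPU")
```

The same arithmetic shows why memory speed matters. Dropping from 1,333MHz to 800MHz cuts the theoretical per-CPU memory bandwidth by roughly 40 percent.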

If each memory channel is populated with a single RDIMM (Registered DIMM), the highest speed of 1,333MHz is possible. As more RDIMMs are added to those channels, the speed drops to 1,066MHz or 800MHz. However, with 4GB RDIMMs, a dual-socket system can run 24GB of RAM at 1,333MHz using only six RDIMMs. Using the Tylersburg chip set, it's possible to bring the RAM total up to 144GB (72GB per CPU, using 8GB RDIMMs) running at 800MHz.
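The capacity-versus-speed trade-off can be sketched as a small calculation. This is a simplified model that assumes three DDR3 channels per CPU and a memory speed that depends only on the number of RDIMMs per channel. Real platforms also factor in DIMM rank count, so treat the speed table as an assumption. The 144GB case assumes 8GB RDIMMs.

```python
# Simplified model of Nehalem-EP memory population, for illustration only.
# Assumptions: three DDR3 channels per CPU, and a speed that depends only on
# RDIMMs per channel (real platforms also consider DIMM rank count).
SPEED_MHZ_BY_DIMMS_PER_CHANNEL = {1: 1333, 2: 1066, 3: 800}


def memory_config(sockets: int, dimms_per_channel: int, dimm_gb: int,
                  channels_per_cpu: int = 3) -> tuple[int, int, int]:
    """Return (total RDIMMs, total GB, memory speed in MHz)."""
    if dimms_per_channel not in SPEED_MHZ_BY_DIMMS_PER_CHANNEL:
        raise ValueError("expected 1 to 3 RDIMMs per channel")
    dimms = sockets * channels_per_cpu * dimms_per_channel
    return dimms, dimms * dimm_gb, SPEED_MHZ_BY_DIMMS_PER_CHANNEL[dimms_per_channel]


# The two dual-socket configurations described above.
print(memory_config(sockets=2, dimms_per_channel=1, dimm_gb=4))  # (6, 24, 1333)
print(memory_config(sockets=2, dimms_per_channel=3, dimm_gb=8))  # (18, 144, 800)
```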

There's more to Nehalem than just NUMA, however. A raft of supporting players also enters the mix, including updated Virtualization Technology extensions (notably Extended Page Tables) to assist in virtualization use cases; support for DDR3 memory, which can provide double the data rate of DDR2; and SSE4.2 instructions, a relatively minor update aimed at accelerating text processing.

The significantly increased memory bandwidth is the major update, along with the advent of QuickPath, the new processor interconnect that replaces the aging front-side bus. The smaller additions are nonetheless welcome and round out the package.

One of these new features is dubbed Turbo mode. You might recall the days of Intel 8088 PCs running at 4.77MHz, or at 8MHz or more with the "Turbo" switch enabled. This isn't quite the same thing.

The Turbo feature in Nehalem allows the CPU cores to burst to higher clock rates when the load requires it. Turbo adds what Intel calls "bins," each representing a 133MHz step above a core's base clock, allowing individual cores to, in essence, overclock themselves on an as-needed basis.
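The bin arithmetic is simple enough to show directly. This sketch uses the article's rounded 133MHz bin size; the underlying base clock is actually 133.33MHz. The 2,933MHz base frequency is a hypothetical example, not a specific processor model.

```python
# Each Turbo bin is one 133MHz step above the base clock (figure from the article).
BIN_MHZ = 133


def turbo_frequency_mhz(base_mhz: int, bins: int) -> int:
    """Clock speed of a core that has been granted `bins` Turbo bins."""
    if bins < 0:
        raise ValueError("bins cannot be negative")
    return base_mhz + bins * BIN_MHZ


# Hypothetical 2,933MHz part granted 0 to 3 bins.
for bins in range(4):
    print(bins, turbo_frequency_mhz(2933, bins))
```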

Turbo sounds slightly gimmicky, but it can help single-threaded and lightly threaded workloads, because the largest boosts are available only when a subset of the physical cores is active. For instance, with only one or two cores busy, those cores might be able to claim three additional bins, but with several threads running concurrently on all four cores, each core might get only a single bin.

All of this is dependent on the thermal and power health of the CPU at the time and is dynamically adjusted.
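Putting the pieces together, the following is a minimal model of how active-core count and thermal and power headroom could combine to set a core's clock. The bin table is hypothetical. It follows the article's example of three bins for one or two active cores and one bin with all four busy, while the three-core value is an assumption. Real values vary by processor model, and the hardware re-evaluates them continuously rather than via a static lookup.

```python
# Hypothetical Turbo bin table: the article's example says one or two active
# cores might get three bins and four busy cores a single bin each; the
# three-core entry is an assumption, and real values vary by processor model.
HYPOTHETICAL_BINS_BY_ACTIVE_CORES = {1: 3, 2: 3, 3: 2, 4: 1}
BIN_MHZ = 133  # rounded bin size used in the article


def available_bins(active_cores: int, within_thermal_and_power_limits: bool) -> int:
    """Turbo bins each active core may use under this simplified model."""
    if active_cores not in HYPOTHETICAL_BINS_BY_ACTIVE_CORES:
        raise ValueError("expected 1 to 4 active cores")
    if not within_thermal_and_power_limits:
        return 0  # no headroom: cores stay at the base clock
    return HYPOTHETICAL_BINS_BY_ACTIVE_CORES[active_cores]


def core_frequency_mhz(base_mhz: int, active_cores: int, within_limits: bool) -> int:
    """Per-core clock for a given load and thermal/power state."""
    return base_mhz + BIN_MHZ * available_bins(active_cores, within_limits)


for cores in (1, 2, 3, 4):
    print(cores, core_frequency_mhz(2933, cores, within_limits=True))
print("no headroom:", core_frequency_mhz(2933, 1, within_limits=False))
```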
