China has regained the crown for the fastest supercomputer on the planet, according to the semiannual Top500 list, which claims that the Milky Way-2 supercomputer has doubled the performance of the previous leader, the American "Titan" supercomputer, in just six months.
Milky Way-2, also known as "Tianhe-2," clusters together more than 32,000 Intel Xeon microprocessors as well as more than 48,000 Intel Xeon Phi chips, the server equivalent of a graphics coprocessor. All told, the two groups of chips deliver 33.86 petaflops of performance, about double that of Titan, housed at the Oak Ridge National Laboratory in Tennessee. A "flop" is shorthand for one floating point operation per second, one of the basic metrics of a computer's performance; a petaflop is a thousand trillion floating point operations per second.
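To put the prefix in perspective, the short Python sketch below converts the petaflop figure quoted above into raw operations per second. The constant and function names are illustrative and do not come from the Top500 list.

```python
# One petaflop is 10**15 floating point operations per second.
PETA = 10**15

def petaflops_to_flops(petaflops: float) -> float:
    """Convert a petaflop rating into floating point operations per second."""
    return petaflops * PETA

# 33.86 is the Linpack figure quoted for Milky Way-2 in this article.
milky_way_2 = petaflops_to_flops(33.86)
print(f"{milky_way_2:.4e} operations per second")  # prints 3.3860e+16 operations per second
```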
High powered computing hot rods
Taken as an abstract measurement, Milky Way-2's high-water mark isn't that significant. But high performance computers (HPCs) are used for a variety of simulations, from long-term predictive models of earthquakes and tests of how a prototype automobile will perform to forecasts of the impact of climate change and assessments of the destructive power of a nuclear weapon. Generally speaking, the additional performance of a supercomputer allows more finely detailed calculations, such as modeling individual particles of air as they pass over a windshield.
In this sense, HPCs are the Formula One versions of the more prosaic and power-efficient servers driving cloud services at Apple, Google, Microsoft, and others. While they're generally owned by governments and research organizations, corporations are also beginning to invest, such as French oil conglomerate Total's investment in a 2.3 petaflop supercomputer to deduce the best locations to drill for oil.
"To compete, you must compute," said Rajeeb Hadra, vice president of the datacenter and connected systems group at Intel, last week.
"Essentially, high performance computing is becoming the digital laboratory," Hadra added.
The Top500 list is compiled by a group of researchers led by Hans Meuer, which began ranking the 500 fastest commercially available systems in 1993. The list, published each June and November, ranks systems using a single benchmark, Linpack, which has come under fire from some quarters for being out of date. Nevertheless, its proponents say, it represents the best yardstick for comparing performance over time.
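For readers curious what a Linpack-style measurement involves: the benchmark times how long a machine takes to solve a large, dense system of linear equations, then divides a nominal operation count by the elapsed time. The Python sketch below is a toy illustration of that idea, not the benchmark code used for the Top500. The function names are hypothetical, the matrix is tiny, and the operation count keeps only the leading (2/3)n^3 term, which is an approximation.

```python
import random
import time

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs are preserved
    b = b[:]
    for k in range(n):
        # Pick the row at or below k with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_style_rate(n, seed=0):
    """Time one dense solve; return (estimated flops per second, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    nominal_ops = (2.0 / 3.0) * n**3  # leading-order estimate only
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return nominal_ops / elapsed, residual

rate, residual = linpack_style_rate(60)
print(f"about {rate / 1e6:.1f} megaflops, max residual {residual:.2e}")
```

Interpreted Python on a 60-by-60 matrix will report a rate many orders of magnitude below the petaflop figures on the list; the point is only the shape of the calculation, in which nearly all of the work is dense arithmetic.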
The U.S. remains the dominant supercomputer power, with 253 of the 500 systems on the list. With 65 systems on the list, China ranks second, ahead of Japan, the U.K., France and Germany. Tianhe-2 was originally expected in 2015, analysts said.
Coming in the third spot is the DOE Lawrence Livermore National Laboratory's Sequoia, an IBM BlueGene/Q system that achieved 17.17 petaflops on the Linpack benchmark using 1.5 million cores. Overall, four IBM BlueGene systems made the top 10 list.
Other interesting tidbits about this edition of the Top500: Twenty-six systems on the list now deliver a petaflop or more, up from 23 six months ago. Eighty-eight percent of the systems use processors with six or more cores, and 80 percent of the systems use Intel processors. The total combined performance of all 500 systems is 223 petaflops, up from 162 petaflops six months ago.
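As a rough illustration of how quickly the list is growing, the sketch below works out the six-month percentage changes implied by the figures above. The helper name is hypothetical; the inputs are the numbers quoted in this article.

```python
def percent_growth(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0

# Combined performance of all 500 systems, in petaflops.
total_growth = percent_growth(162, 223)
# Number of systems delivering a petaflop or more.
petaflop_systems_growth = percent_growth(23, 26)

print(f"combined performance: up {total_growth:.1f} percent")        # up 37.7 percent
print(f"petaflop systems: up {petaflop_systems_growth:.1f} percent")  # up 13.0 percent
```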
New Intel chips on the way
Intel's Xeon is the most widely used processor on the list; it powered about 75 percent of the systems in November, and its share rose slightly in the June list. Intel's next challenge is to entice supercomputer designers to add so-called accelerators, or GPUs, the chips normally used to render graphics on a PC or notebook.
Inside a supercomputer, however, specialized versions of those same chips can be programmed to perform the same functions quickly, over and over again: the kind of repetitive calculations that sit at the heart of a simulation. These optimized chips, like the Intel Xeon Phi, the Nvidia "Kepler" cores and others, can significantly improve performance for little additional electrical power, the companies say.
The problem is that those coprocessors have traditionally required a different programming model than the general CPUs require, which has alienated some. Today, users are more accustomed to using "pools" of storage and microprocessors which software ropes together in the aggregate.
Intel's latest Xeon Phi chips use what Hadra called a "neo-heterogeneous" model, where the architectures of the chips may be slightly different, but programmers can use a single software tool to program them. That model is being used in the Milky Way-2, Hadra said.
Intel on Monday unveiled five new Xeon Phi coprocessors: the 7120P and the 7120X for the highest of the HPCs; the 3120P and 3120A for midrange systems; and the new 5120D for high-density form factors.
Hadra also said that the "Knights Landing" architecture underpinning the Xeon Phi would eventually be manufactured in the cutting-edge 14-nm process. Intel did not say exactly when the 14-nm version would be released, only that the process will be in volume production by then. It will be available both as a coprocessor and as a standalone CPU, much like a Xeon processor, he said.
Is the Top500 the right list?
Although the Top500 has been the granddaddy of supercomputer rankings, some have questioned whether that list really represents "true" supercomputer performance. Bill Kramer, deputy project director for the "Blue Waters" supercomputer at the National Center for Supercomputing Applications at the University of Illinois, has said previously that the Linpack benchmark used to measure the performance of the systems on the list is both out of date and essentially deceptive.
"The TOP500 list has now exceeded its usefulness and actually is causing behavior and decisions that are not helpful to the high-performance computing environment and our users," Kramer said last November.
Other lists, such as the Green500, rank supercomputers by power efficiency, trying to highlight real-world systems that don't break the bank. An assessment of the industry by The New York Times last year found that the world's data centers consume about 30 billion watts, roughly the output of 30 nuclear power plants.
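Power efficiency of this kind is usually expressed as performance per watt. The sketch below shows the arithmetic under stated assumptions: the 10-petaflop, 5-megawatt system is hypothetical, chosen only to keep the numbers round, and neither figure comes from the Green500 list. The function name is likewise illustrative.

```python
PETA = 10**15
MEGA = 10**6

def megaflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Return sustained performance per watt, in megaflops per watt."""
    flops = petaflops * PETA   # operations per second
    watts = megawatts * MEGA   # power draw in watts
    return flops / watts / MEGA

# Hypothetical system: 10 petaflops sustained while drawing 5 megawatts.
efficiency = megaflops_per_watt(10.0, 5.0)
print(f"{efficiency:.0f} megaflops per watt")  # prints 2000 megaflops per watt
```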
Others have tried to develop a new benchmark for measuring performance altogether. A new high performance conjugate gradient (HPCG) benchmark published jointly by Michael A. Heroux of Sandia National Laboratories in New Mexico and Jack Dongarra of the University of Tennessee will try to push past the Top500's Linpack benchmark with a metric that better stresses the whole system.
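The conjugate gradient method that gives the benchmark its name is an iterative way of solving a system of equations whose matrix is mostly zeros. The sketch below is a minimal, unpreconditioned version run on a small one-dimensional test problem; it is not the HPCG reference code, and the function names and test matrix are assumptions made for illustration. Each iteration does little arithmetic per value fetched from memory, which is one reason such a method can exercise a machine differently than dense matrix math.

```python
def laplacian_matvec(x):
    """Multiply a 1-D Laplacian (2 on the diagonal, -1 beside it) by x."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = b[:]            # residual b - A x, with x starting at zero
    p = r[:]            # first search direction
    rs_old = dot(r, r)
    for iteration in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, iteration
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # Next direction: new residual plus a multiple of the old direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

# Build a right-hand side whose exact solution is all ones, then recover it.
b = laplacian_matvec([1.0] * 8)
solution, iterations = conjugate_gradient(laplacian_matvec, b)
print(f"converged in {iterations} iterations")
```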
According to a white paper seen by PCWorld, the Linpack benchmark only really stresses the accelerator chips, not the general CPUs. (More details on the Tianhe-2, plus images used by this story, are available in a report by Dongarra.)
"In fact, we have reached a point where designing a system for good [Linpack] performance can actually lead to design choices that are wrong for the real application mix, or add unnecessary components or complexity to the system," the paper argues.
For the time being, however, Intel's Hadra argued that Linpack was necessary.
"Our view is that some ability to talk about how systems perform and relate to each other in terms of capability has been important for technical reasons and will be important in the future, in spite of what could be perceived as obvious shortcomings of a single number, such as Linpack," Hadra said. "It is the thing we have today, and is relevant today, and needs to be looked at going forward so that is consistent with the original vision" of its creators, he said.
Additional reporting by Joab Jackson, IDG News Service.