When particles collide in the Large Hadron Collider (LHC) at CERN, the millions of sensors recording what happens generate around 1 petabyte of data per second.
That amount of data is far too large even for CERN's worldwide grid of computing and storage facilities to handle, according to the organization's IT communications systems group leader. So, they don't.
The LHC is the world's largest particle accelerator, crashing particles into each other at 99.999 percent of the speed of light. Data is measured at four points along its 27-kilometer length. The measuring points, known as ATLAS, LHCb, CMS and ALICE, each hold their own experiments and their own data collector with millions of sensors.
It's unfeasible to store all the data generated, but it's not all needed for the experiments anyway, so CERN simply deletes a large chunk of it, Jean-Michel Jouanigot said late last month at the CERN Computer Center.
Each of the four data collectors has its own event filter computer farm that picks the must-have data out from the mass. "The goal is to try not to drop anything interesting," Jouanigot said. What's left is sent to the main computer center over an optical fiber network.
The experiments don't all produce the same volumes of filtered data. ATLAS produces up to 320 megabytes per second (MBps), followed by CMS with 220 MBps. The data from ALICE amounts to 100 MBps, and LHCb produces 50 MBps. Those are the rates when the LHC is colliding protons with one another, but "the accelerator has two modes of running," Jouanigot said. The second mode collides lead ions, which are much heavier particles. In that mode the four experiments produce more data, about 1.25 gigabytes per second (GBps) in total.
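To put those figures side by side, the following is a minimal sketch in Python that adds up the proton-mode rates quoted above and compares them with the lead-ion total and with the roughly 1 petabyte per second the sensors generate before filtering. The variable names are illustrative, and decimal units (1 GB = 1,000 MB, 1 PB = 10^15 bytes) are an assumption, since the article does not say which convention CERN uses.

```python
# Filtered data rates in proton-proton mode, in megabytes per second,
# as quoted in the article ("up to" figures).
PROTON_RATES_MBPS = {"ATLAS": 320, "CMS": 220, "ALICE": 100, "LHCb": 50}

# Combined rate quoted for lead-ion mode (1.25 GBps), expressed in MBps.
# Assumes decimal units: 1 GB = 1,000 MB.
LEAD_ION_TOTAL_MBPS = 1250

# Unfiltered sensor output, "around 1 petabyte per second".
# Assumes a decimal petabyte (10**15 bytes).
RAW_RATE_BYTES_PER_S = 1e15

# Sum of the four experiments in proton mode.
proton_total = sum(PROTON_RATES_MBPS.values())

# How much more data lead-ion mode produces than proton mode.
lead_to_proton_ratio = LEAD_ION_TOTAL_MBPS / proton_total

# Rough factor by which the event filters shrink the raw stream.
reduction_factor = RAW_RATE_BYTES_PER_S / (proton_total * 1e6)

print(f"Proton mode total: {proton_total} MBps")
print(f"Lead-ion mode is about {lead_to_proton_ratio:.2f}x the proton total")
print(f"Filtering keeps roughly 1 byte in {reduction_factor:,.0f}")
```

On these numbers the four experiments together send up to 690 MBps to the computer center in proton mode, and the event filters discard all but roughly one byte in every 1.4 million.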
After filtering, CERN has up to 25 petabytes of data to store each year. Most of it is stored on tape, which reduces an energy bill that is "probably only second to the annual cost for personnel," Jouanigot said.
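For a sense of scale, the yearly figure can be converted into an average sustained rate. This is a back-of-the-envelope sketch only: it assumes a decimal petabyte and a 365-day year of continuous recording, neither of which the article states, and the accelerator does not in practice run all year.

```python
# Yearly storage requirement quoted in the article ("up to 25 petabytes").
# Assumes a decimal petabyte (10**15 bytes).
YEARLY_BYTES = 25 * 10**15

# Assumes a 365-day year with data arriving continuously.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Average rate in megabytes per second (1 MB = 10**6 bytes).
average_mbps = YEARLY_BYTES / SECONDS_PER_YEAR / 10**6

print(f"Average rate: {average_mbps:.1f} MBps")
```

Under those assumptions, 25 petabytes a year works out to an average of a little under 800 MBps.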
The CERN data center has a tape capacity of 34 petabytes, with 45,000 cartridges and 160 drives. Jouanigot uses tape drives from IBM and StorageTek (Oracle) because they make the biggest tape drives available. To get a good deal on the tape robots, Jouanigot said, he plays the companies against each other.
CERN also stores data on disks: the computer center has a raw disk capacity of 45.3 petabytes across 53,728 disks.
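The storage figures above imply some rough averages, sketched here in Python. The per-cartridge and per-disk values are simple divisions of the quoted totals, not specifications of any actual hardware, and decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) are assumed.

```python
# Unit sizes; decimal units are an assumption.
PB = 10**15
GB = 10**9

# Figures quoted in the article.
tape_capacity = 34 * PB
cartridges = 45_000
disk_capacity = 45.3 * PB
disks = 53_728
yearly_intake = 25 * PB  # "up to 25 petabytes" per year

# Average capacity per tape cartridge and per disk, in gigabytes.
avg_cartridge_gb = tape_capacity / cartridges / GB
avg_disk_gb = disk_capacity / disks / GB

# Years of filtered data the tape capacity would hold at the quoted
# maximum intake, if nothing else were stored on it.
years_of_tape = tape_capacity / yearly_intake

print(f"Average per cartridge: {avg_cartridge_gb:.1f} GB")
print(f"Average per disk: {avg_disk_gb:.1f} GB")
print(f"Tape capacity covers about {years_of_tape:.2f} years of intake")
```

On these numbers the averages come to roughly 756 GB per cartridge and 843 GB per disk, and the quoted tape capacity would cover a bit more than a year of data at the maximum yearly volume.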
Analyzing all that data takes the equivalent of around 100,000 of the fastest PC processors, but CERN's computer center can only provide around 20 percent of that computing capacity.
For the rest, the data is distributed to computing centers all over the world via the Worldwide LHC Computing Grid (WLCG). The data is transported partly over CERN's own fiber network, "and we are also renting fibers to other places in Europe, the USA or the Asia-Pacific to connect our center to theirs."
CERN divides the computing grid into layers it calls "tiers." The CERN Computer Center is Tier 0 and functions as a central hub for all the data. From there the data travels to Tier 1, a ring of 11 data centers: two in the U.S.; one each in France, Italy, the Netherlands, Germany, Spain, the U.K., Canada and Taiwan; and a distributed data center in the Nordic countries. Those centers process, analyze and store the raw data, opening it up for Tier 2, which consists of roughly 160 centers used by scientists to access and process data.
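The tier layout described above can be written down as a small data structure. This is an illustrative sketch, not a representation of any WLCG software: the dictionary keys and role descriptions are made up for the example, and the Tier 2 count uses the article's "roughly 160" as if it were exact.

```python
# Tier 1 data centers per country or region, as listed in the article.
TIER1_SITES = {
    "U.S.": 2,
    "France": 1,
    "Italy": 1,
    "Netherlands": 1,
    "Germany": 1,
    "Spain": 1,
    "U.K.": 1,
    "Canada": 1,
    "Taiwan": 1,
    "Nordic countries (distributed)": 1,
}

# The three tiers, with the role each plays according to the article.
TIERS = [
    {"tier": 0, "role": "central hub for all the data", "centers": 1},
    {"tier": 1, "role": "process, analyze and store the raw data",
     "centers": sum(TIER1_SITES.values())},
    # "roughly 160" in the article; treated as exact here for illustration.
    {"tier": 2, "role": "scientists access and process data", "centers": 160},
]

total_centers = sum(t["centers"] for t in TIERS)

for t in TIERS:
    print(f"Tier {t['tier']}: {t['centers']} center(s), {t['role']}")
print(f"Approximate total: {total_centers} centers")
```

Summing the per-country list confirms the 11 Tier 1 centers the article mentions.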
While the grid consists of many different data centers around the world, it is made to look like one system for the user. To do that, CERN uses middleware to link all the hardware in the grid and present it as a single massive virtual resource.
All the middleware is open source and influenced by the Globus Toolkit for building computing grids. Data centers in Europe and Asia use a version of gLite middleware, while the Nordic countries use ARC software. In the U.S., the Virtual Data Toolkit provided by the Open Science Grid is used. Today, CERN aims to run between 500,000 and 1 million tasks on the grid each day, and it plans to steadily increase that number as computing resources and new technologies become available.
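The daily task target translates into a per-second rate as follows. This is a minimal sketch; the function name is illustrative, and it assumes tasks are spread evenly across a 24-hour day, which real workloads are unlikely to be.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_second(tasks_per_day):
    """Average task start rate, assuming an even spread over 24 hours."""
    return tasks_per_day / SECONDS_PER_DAY


# The range quoted in the article: 500,000 to 1 million tasks per day.
low = tasks_per_second(500_000)
high = tasks_per_second(1_000_000)

print(f"Between {low:.1f} and {high:.1f} tasks per second on average")
```

Spread evenly, the target amounts to somewhere between about 6 and 12 tasks starting on the grid every second.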
The LHC's data processing system has run smoothly since the beginning -- unlike the collider itself, which after starting up for the first time immediately blew out a large section of the accelerator due to a bad weld in the helium cooling system.
"We are strictly following our plans and are extremely proud that the system is working like a Swiss clock," Jouanigot said.