Big data technology is vital for analysing the information produced by the Large Hadron Collider (LHC), which generated 30 terabytes of data this year, says Sverre Jarp, CTO of the European Organization for Nuclear Research's (CERN) Openlab.
Speaking at the Big Data Warehousing and Business Intelligence 2012 conference in Sydney this week, Jarp told delegates that physics researchers need to measure electrons and other elementary particles inside the LHC near Geneva, Switzerland.
"These particles fly at practically the speed of light in the LHC so you need several metres in order to study them," he said. "When these collide, they give tremendous energy to the secondary particles that come out."
CERN Openlab uses a mix of tape and disk technology to store this large amount of research data.
"Today, the evolution of big data has been such that we can put one terabyte of data on one physical disk or tape cartridge," he said.
"We are safely in the domain of petabytes and we are moving to exabytes in the future."
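To put those units in perspective, here is a rough sketch of how many one-terabyte disks or cartridges a petabyte and an exabyte would occupy. It assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and ignores redundancy and spare capacity; the function name is purely illustrative and does not describe CERN's actual storage layout.

```python
# Decimal storage units, expressed in terabytes (an assumption; binary units differ slightly).
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000


def cartridges_needed(total_tb: int, capacity_tb: int = 1) -> int:
    """Return how many disks or cartridges of `capacity_tb` hold `total_tb` (rounded up)."""
    return -(-total_tb // capacity_tb)  # ceiling division on integers


if __name__ == "__main__":
    print("1 PB on 1 TB media:", cartridges_needed(TB_PER_PB))
    print("1 EB on 1 TB media:", cartridges_needed(TB_PER_EB))
```

At one terabyte per cartridge, a single exabyte would fill a million of them, which is why the move from petabytes to exabytes is a change in kind rather than degree.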
When asked why the LHC generates so much data, he explained that each particle detector has millions of sensors but the data they sense is "very unstructured."
"A particle may have passed by a sensor in the LHC and this happens at the incredible speed of 40 megahertz or 40 million times per second."
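As a back-of-the-envelope illustration of what 40 megahertz means, the sketch below converts that rate into the time between successive readings and the number of readings in a day of continuous running. Continuous operation is an assumption made for the arithmetic only; the article does not say how many hours the LHC runs.

```python
RATE_HZ = 40_000_000          # 40 MHz, the rate quoted by Jarp
NS_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 86_400


def interval_ns(rate_hz: int) -> float:
    """Time between successive readings, in nanoseconds."""
    return NS_PER_SECOND / rate_hz


def readings_per_day(rate_hz: int) -> int:
    """Readings in 24 hours, assuming uninterrupted running."""
    return rate_hz * SECONDS_PER_DAY


if __name__ == "__main__":
    print("Interval between readings (ns):", interval_ns(RATE_HZ))
    print("Readings per day:", readings_per_day(RATE_HZ))
```

At that rate a sensor has 25 nanoseconds between readings, and a full day of running would amount to roughly 3.5 trillion readings per sensor channel.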
Despite using a mix of disk and tape data storage technology, CERN Openlab experiences disk failures every day.
"We have a team walking around the centre exchanging bad disks for good ones and hoping the storage technology we use is good enough for keeping everything alive," he said.
Jarp advised CIOs and IT managers to get their unstructured data into a structured form as quickly as possible.
"Big data management and analytics require a solid organisational structure at all levels," he said.
"A change in corporate culture is also required. Our community started preparing for big data more than a decade before real physics data arrived."
Jarp estimated that the LHC will run for another 15 to 20 years, generating exabytes of data that will need to be stored.
IDG Communications is an official media partner for the Big Data Warehousing and Business Intelligence 2012 conference.