Both Deutsche Bank and HMRC are struggling to find a way to unravel data from legacy systems to allow integration with newly created big data systems based on Hadoop technology.
Zhiwei Jiang, global head of accounting and finance IT at Deutsche Bank, was speaking this week at a Cloudera roundtable discussion on big data. He said that the bank has embarked on a project to analyse large amounts of unstructured data, but has yet to work out how to make the Hadoop system work with its legacy IBM mainframes and Oracle databases.
"We have been working with Cloudera since the beginning of last year, where for the next two years I am on a mission to collect as much data as possible into a data reservoir," said Jiang.
Deutsche Bank is collecting data from the front end (trading data), the middle (operations data) and the back end (finance data). However, Jiang was keen to highlight the challenges faced by a traditional banking IT system.
"At the end of the day we still have a huge installation of IBM mainframes and hundreds of millions of pounds of investment with Oracle. What do we do with that? We have 46 data warehouses, which all have terabytes and petabytes of storage, where there is 90 percent overlap of data. What do we do with that?" he said.
"Nobody has the skills to unravel the old technology. I've dedicated my career to making this Cloudera project work, but if it doesn't work I'll probably be out of a job."
He added: "It's very hard to unravel all these data warehouses that have been built over the last 20 to 30 years. We need to extract the data out, streamline it, build the traceability and lineage - it's very expensive to do."
Richard Brown, BIM GSL programme leader at Capgemini, who was also at the event, said that he was aware of similar difficulties facing HM Revenue and Customs (HMRC), where the government department is looking to use big data to fight tax avoidance and detect fraud. Capgemini is the lead on HMRC's ASPIRE IT services contract, which covers a significant share of the department's IT operations.
"The problem isn't solved at HMRC. The analytics at the moment is running on the older technology. I think in most instances we are seeing companies sitting the Hadoop technology alongside existing systems," said Brown.
"With a new environment organisations can explore some new subject areas that they haven't looked at before. People haven't really got to the next phase of understanding how to migrate the old environments across."
He added: "Virtually all of the Hadoop installations we are seeing are organisations with new business problems, or new opportunities they have identified - using new datasets they can play with. That challenge is linking it back into the existing information sets."
Jiang also said that he isn't yet sure what Deutsche Bank is looking for in the data it is collecting, but he is confident that it will provide important insight.
"I think if the underlying data is relational and you do traditional business intelligence, you know what you are looking for. If your underlying data store is big, unstructured, raw data, you will be able to find something that you don't know what you are looking for," said Jiang.
"It will provide a high level of pure intelligence."
He is also convinced that once Deutsche Bank's systems begin carrying out intelligent big data analytics, much of the bank's other data processing will become less significant.
"If you take every little bit of data in, it will give you something that you didn't know you were looking for. That's what I'm interested in. I would argue that with any bank 80 percent of the computing is a waste of time," he said.
"If you think about what is being processed, what they are actually doing is just moving data around. With that the data gets worse and worse as you go, and then lots of subsequent people are hired in India to try and improve data quality."
He added: "But, if you have a correct way of looking at data from a data point of view, these efforts become completely meaningless and time wasting."