Computerworld UK ran an interesting article on how Deutsche Bank and HMRC are struggling to integrate Hadoop systems with legacy infrastructure. This is a very real problem for very large enterprises with significant investments in mainframes, Teradata, grids, MPP, EDW, whatever. From the article:
Zhiwei Jiang, global head of accounting and finance IT at Deutsche Bank, was speaking this week at a Cloudera roundtable discussion on big data. He said that the bank has embarked on a project to analyse large amounts of unstructured data, but is yet to understand how to make the Hadoop system work with legacy IBM mainframes and Oracle databases.
“We have been working with Cloudera since the beginning of last year, where for the next two years I am on a mission to collect as much data as possible into a data reservoir,” said Jiang.
I want to make two points.
First, I don’t think this particular issue applies to most corporate IT. In fact, from my perspective, there is no holdup with large corporations jumping into big data. Most are already there. Why? Because marketing organizations have credit cards. They hire a data architect, spin up a cloud instance, and are off and running. Call it Rogue IT, but it’s working for them. They’re getting good results. They are performing analytics on data that was previously cost-prohibitive, and it’s making them better. They are not waiting around for corporate IT and governance to decide where data can go and who will enforce policies. Just like BYOD, they are moving forward, and they’ll ask forgiveness later.
Second, as for very large corporations integrating the old and the new, it's smart to look at leveraging existing data sets. For the firms referenced in the article, if analytic system integration is a requirement, this is a very real problem. Integration, or at the very least data sharing, is not an easy technical problem. That said, unless you have compliance or governance constraints, my personal take on the whole adoption slowdown is "Don't do it." If integration is driven purely by a desire to leverage existing multi-million-dollar investments, it may not be cost-effective. Commodity computing resources are incredibly cheap, and the software is virtually free. Copy the data and move on. Leveraging existing infrastructure is great, but moving the data into NoSQL clusters and extending capabilities on these newer platforms will likely save money.

The catch is that compliance, security and corporate governance of these systems, and of the data they will house, are not well understood. Worse, extending security and corporate governance may not be feasible on most NoSQL platforms.