I had been planning to post on the recent announcement of the planned merger between Hortonworks and Cloudera, as there are a number of trends I’ve been witnessing with the adoption of Hadoop clusters, and this merger reflects them in a nutshell. But while catching up on my reading I ran across Mathew Lodge’s recent article in VentureBeat, titled “Cloudera and Hortonworks merger means Hadoop’s influence is declining.” It’s a really good post. I can confirm we see the same lack of interest in deployment of Hadoop to the cloud, the same use of S3 as a storage medium when Hadoop is used atop Infrastructure as a Service (IaaS), and the same developer-driven selection of whatever platform is easiest to use and deploy on. All in all it’s an article I wish I’d written, as he did a great job capturing most of the areas I wanted to cover. And there are some humorous bits like “Ironically, there has been no Cloud Era for Cloudera.” Check it out – it’s worth your time.
But there are a couple of other areas I still want to cover.
It is rare to see someone install Hadoop into a public IaaS account. Customers (now) choose a cloud-native variant and let the vendor handle all the patching and hide many of the infrastructure pieces from them. And they gain the option of spinning down the cluster when not in use, making it much more efficient. Couple that with all the work to set up Hadoop yourself, and it’s an easy decision. I was somewhat surprised to learn that things like AWS’s Elastic MapReduce (EMR) are not always chosen as the repository, but DynamoDB is surprisingly popular – which makes sense, given its powerful query features, indexing, and ability to offer the best of relational and big data capabilities. Most public IaaS vendors offer so many database variants that it is easy to mix and match several of them to support applications, further reducing demand for classic Hadoop installations.
One area continuing to drive Hadoop adoption is on-premises data collection and data lakes for logs. The most cited driver is the need to keep Splunk costs under control. It takes effort to divert some content to Hadoop instead of sending everything to the Splunk collectors – but data can be collected and held at drastically lower cost. And you need not sacrifice analytics. For organizations collecting every log entry, this is a win. We also see Hadoop adopted by Security Operations Centers, running side by side with other platforms. Part of the need is to fill gaps around what their SIEM keeps, part is to keep costs down, and part is to easily support deployment of custom security intelligence applications by non-developers.
Another aspect not covered in any of the articles I have found so far is that Cloudera and Hortonworks both have deep catalogs of security capabilities. Together they are dominant. As firms use large “data lakes” to hold all sorts of sensitive data inside Hadoop, this will be a win for firms running Hadoop in-house: identity management, encryption, monitoring, and a whole bunch of other great stuff. Big data is not the security issue it was five years ago. Hortonworks and Cloudera have a lot to do with that; their combined capabilities and enterprise deployment experience make them a powerful choice to help firms manage and maintain existing infrastructure. That is all my way of saying that some of their negative press is unwarranted, given the profitable avenues ahead.
The idea that growth in the Hadoop segment has been slowing is not new. AWS has been the largest seller of Hadoop-based data platforms, by revenue and by customer, for several years. The cloud is genuinely an existential threat to all the commercial Hadoop vendors – and comparable big data databases – if they continue to sell in the same way. The recent acceleration of cloud adoption simply makes it more apparent that Cloudera and Hortonworks are competing for a shrinking share of IT budgets. But it makes sense for them to band together and make the most of their expertise in enterprise Hadoop deployments, and the merger should help with tooling and management software for cloud migrations. If Kubernetes is any indication, there are huge areas for improvement in tooling and services beyond what cloud vendors provide.