October 25, 2018 - Securosis

Cloudera and Hortonworks Merge

Adrian Lane October 25, 2018 No Comments

I had been planning to post on the recent announcement of the planned merger between Hortonworks and Cloudera, as there are a number of trends I’ve been witnessing with the adoption of Hadoop clusters, and this merger reflects them in a nutshell. But catching up on my reading I ran across Mathew Lodge’s recent article in VentureBeat titled Cloudera and Hortonworks merger means Hadoop’s influence is declining. It’s a really good post. I can confirm we see the same lack of interest in deployment of Hadoop to the cloud, the same use of S3 as a storage medium when Hadoop is used atop Infrasrtucture as a Service (IaaS), and the same developer-driven selection of whatever platform is easiest to use and deploy on. All in all it’s an article I wish I’d written, as he did a great job capturing most of the areas I wanted to cover. And there are some humorous bits like “Ironically, there has been no Cloud Era for Cloudera.” Check it out – it’s worth your time. But there are a couple other areas I still want to cover. It is rare to see someone install Hadoop into a public IaaS account. Customers (now) choose a cloud native variant and let the vendor handle all the patching and hide much of the infrastructure pieces from them. And they gain the option of spinning down the cluster when not in use, making it much more efficient. Couple that with all the work to set up Hadoop yourself, and it’s an easy decision. I was somewhat surprised to learn that things like AWS’s Elastic Map Reduce (EMR) are not always chosen as repository, but Dynamo is surprisingly popular – which makes sense, given its powerful query features, indexing, and ability to offer the best of relational and big data capabilities. Most public IaaS vendors offer so many database variants that it is easy to mix and match multiple variants to support applications, further reducing demand for classic Hadoop installations. One area continuing to drive Hadoop adoption is on-premise data collection and data lakes for logs. The most cited driver is the need to keep Splunk costs under control. It takes effort to divert some content to Hadoop instead of sending everything to the Splunk collectors – but data can be collected and held at drastically lower cost. And you need not sacrifice analytics. For organizations collecting every log entry, this is a win. We also see Hadoop adopted by Security Operations Centers, running side by side with other platforms. Part of the need is to fill gaps around what their SIEM keeps, part is to keep costs down, and part is to easily support deployment of custom security intelligence applications by non-developers. Another aspect not covered in any of the articles I have found so far is that Cloudera and Hortonworks both have deep catalogs of security capabilities. Together they are dominant. As firms use large “data lakes” to hold all sorts of sensitive data inside Hadoop, this will be a win for firms running Hadoop in-house. Identity management, encryption, monitoring, and a whole bunch of other great stuff. Big data is not the security issue it was 5 years ago. Hortonworks and Cloudera have a lot to do with that; their combined capabilities and enterprise deployment experience make them a powerful choice to help firms manage and maintain existing infrastructure. That is all my way of saving that some of their negative press is unwarranted, given the profitable avenues ahead. The idea that growth in the Hadoop segment appears to have been slowing is not new. AWS has been the largest seller of Hadoop-based data platforms, by revenue and by customer, for several years. The cloud is genuinely an existential threat to all the commercial Hadoop vendors – and comparable big data databases – if they continue to sell in the same way. The recent acceleration of cloud adoption simply makes it more apparent that Cloudera and Hortonworks are competing for a shrinking share of IT budgets. But it makes sense to band together and make the most of their expertise in enterprise Hadoop deployments, and should help with tooling and management software for cloud migrations. If Kubernetes is any indication, there are huge areas for improvement in tooling and services beyond what cloud vendors provide. Share:

Read Post

Building a Multi-cloud Logging Strategy: Introduction

Adrian Lane October 25, 2018 No Comments

Logging and monitoring for cloud infrastructure has become the top topic we are asked about lately. Even general conversations about moving applications to the cloud always seem to end with clients asking how to ‘do’ logging and monitoring of cloud infrastructure. Logs are key to security and compliance, and moving into cloud services – where you do not actually control the infrastructure – makes logs even more important for operations, risk, and security teams. But these questions make perfect sense – logging in and across cloud infrastructure is complicated, offering technical challenges and huge potential cost overruns if implemented poorly. The road to cloud is littered with the charred remains of many who have attempted to create multi-cloud logging for their respective employers. But cloud services are very different – structurally and operationally – than on-premise systems. The data is different; you do not necessarily have the same event sources, and the data is often different or incomplete, so existing reports and analytics may not work the same. Cloud services are ephemeral so you can’t count on a server “being there” when you go looking for it, and IP addresses are unreliable identifiers. Networks may appear to behave the same, but they are software defined, so you cannot tap into them the same way as on-premise, nor make sense of the packets even if you could. How you detect and respond to attacks differs, leveraging automation to be as agile as your infrastructure. Some logs capture every API call; while their granularity of information is great, the volume of information is substantial. And finally, the skills gap of people who understand cloud is absent at many companies, so they ‘lift and shift’ what they do today into their cloud service, and are then forced to refactor the deployment in the future. One aspect that surprised all of us here at Securosis is the adoption of multi-cloud; we do not simply mean some Software as a Service (SaaS) along with a single Infrastructure as a Service (IaaS) provider – instead firms are choosing multiple IaaS vendors and deploying different applications to each. Sometimes this is a “best of breed” approach, but far more often the selection of multiple vendors is driven by fear of getting locked in with a single vendor. This makes logging and monitoring even more difficult, as collection across IaaS providers and on-premise all vary in capabilities, events, and integration points. Further complicating the matter is the fact that existing Security Information and Event Management (SIEM) vendors, as well as some security analytics vendors, are behind the cloud adoption curve. Some because their cloud deployment models are no different than what they offer for on-premise, making integration with cloud services awkward. Some because their solutions rely on traditional network approaches which don’t work with software defined networks. Still others employ pricing models which, when hooked into highly verbose cloud log sources, cost customers small fortunes. We will demonstrate some of these pricing models later in this paper. Here are some common questions: What data or logs do I need? Server/network/container/app/API/storage/etc.? How do I get them turned on? How do I move them off the sources? How do I get data back to my SIEM? Can my existing SIEM handle these logs, in terms of both different schema and volume & rate? Should I use log aggregators and send everything back to my analytics platform? At what point during my transition to cloud does this change? How do I capture packets and where do I put them? These questions, and many others, are telling because they come from trying to fit cloud events into existing/on-premise tools and processes. It’s not that they are wrong, but they highlight an effort to map new data into old and familiar systems. Instead you need to rethink your logging and monitoring approach. The questions firms should be asking include: What should my logging architecture look like now and how should it change? How do I handle multiple accounts across multiple providers? What cloud native sources should I leverage? How do I keep my costs manageable? Storage can be incredibly cheap and plentiful in the cloud, but what is the pricing model for various services which ingest and analyze the data I’m sending them? What should I send to my existing data analytics tools? My SIEM? How do I adjust what I monitor for cloud security? Batch or real-time streams? Or both? How do I adjust analytics for cloud? You need to take a fresh look at logging and monitoring, and adapt both IT and security workflows to fit cloud services – especially if you’re transitioning to cloud from an on-premise environment and will be running a hybrid environment during the transition… which may be several years from initial project kick-off. Today we launch a new series on Building a Multi-cloud Logging Strategy. Over the next few weeks, Gal Shpantzer and I (Adrian Lane) will dig into the following topics to discuss what we see when helping firms migrate to cloud. And there is a lot to cover. Our tentative outline is as follows: Barriers to Success: This post will discuss some reasons traditional approaches do not work, and areas where you might lack visibility. Cloud Logging Architectures: We discuss anti-patterns and more productive approaches to logging. We will offer recommendations on reference architectures to help with multi-cloud, as well as centralized management. Native Logging Features: We’ll discuss what sorts of logs you can expect to receive from the various types of cloud services, what you may not receive in a shared responsibility service, the different data sources firms have come to expect, and how to get them. We will also provide practical notes on logging in GCP, Azure, and AWS. We will help you navigate their native offerings, as well as the capabilities of PaaS/SaaS vendors. BYO Logging: Where and how to fill gaps with third-party tools, or building them into applications and service you deploy in the cloud. Cloud or On-premise Management? We will discuss tradeoffs between moving log management into the cloud, keeping these activities on-premise, and using a

Read Post

Research

Cloudera and Hortonworks Merge

Building a Multi-cloud Logging Strategy: Introduction

Research

Firestarter: Multicloud Deployment Structures and Blast Radius

Firestarter: So you want to multicloud?

Firestarter: 2019: Insert Winter is Coming Meme Here

Firestarter: re:Invent Security Review

Firestarter: Hardware Hacks and Lift and Pray

Sign Up for Our Newsletter

Contact

About

Quick Links