Cloudera and Hortonworks Merge

I had been planning to post on the recent announcement of the planned merger between Hortonworks and Cloudera, as there are a number of trends I’ve been witnessing with the adoption of Hadoop clusters, and this merger reflects them in a nutshell. But catching up on my reading I ran across Mathew Lodge’s recent article in VentureBeat titled Cloudera and Hortonworks merger means Hadoop’s influence is declining. It’s a really good post. I can confirm we see the same lack of interest in deployment of Hadoop to the cloud, the same use of S3 as a storage medium when Hadoop is used atop Infrasrtucture as a Service (IaaS), and the same developer-driven selection of whatever platform is easiest to use and deploy on. All in all it’s an article I wish I’d written, as he did a great job capturing most of the areas I wanted to cover. And there are some humorous bits like “Ironically, there has been no Cloud Era for Cloudera.” Check it out – it’s worth your time. But there are a couple other areas I still want to cover. It is rare to see someone install Hadoop into a public IaaS account. Customers (now) choose a cloud native variant and let the vendor handle all the patching and hide much of the infrastructure pieces from them. And they gain the option of spinning down the cluster when not in use, making it much more efficient. Couple that with all the work to set up Hadoop yourself, and it’s an easy decision. I was somewhat surprised to learn that things like AWS’s Elastic Map Reduce (EMR) are not always chosen as repository, but Dynamo is surprisingly popular – which makes sense, given its powerful query features, indexing, and ability to offer the best of relational and big data capabilities. Most public IaaS vendors offer so many database variants that it is easy to mix and match multiple variants to support applications, further reducing demand for classic Hadoop installations. One area continuing to drive Hadoop adoption is on-premise data collection and data lakes for logs. The most cited driver is the need to keep Splunk costs under control. It takes effort to divert some content to Hadoop instead of sending everything to the Splunk collectors – but data can be collected and held at drastically lower cost. And you need not sacrifice analytics. For organizations collecting every log entry, this is a win. We also see Hadoop adopted by Security Operations Centers, running side by side with other platforms. Part of the need is to fill gaps around what their SIEM keeps, part is to keep costs down, and part is to easily support deployment of custom security intelligence applications by non-developers. Another aspect not covered in any of the articles I have found so far is that Cloudera and Hortonworks both have deep catalogs of security capabilities. Together they are dominant. As firms use large “data lakes” to hold all sorts of sensitive data inside Hadoop, this will be a win for firms running Hadoop in-house. Identity management, encryption, monitoring, and a whole bunch of other great stuff. Big data is not the security issue it was 5 years ago. Hortonworks and Cloudera have a lot to do with that; their combined capabilities and enterprise deployment experience make them a powerful choice to help firms manage and maintain existing infrastructure. That is all my way of saving that some of their negative press is unwarranted, given the profitable avenues ahead. The idea that growth in the Hadoop segment appears to have been slowing is not new. AWS has been the largest seller of Hadoop-based data platforms, by revenue and by customer, for several years. The cloud is genuinely an existential threat to all the commercial Hadoop vendors – and comparable big data databases – if they continue to sell in the same way. The recent acceleration of cloud adoption simply makes it more apparent that Cloudera and Hortonworks are competing for a shrinking share of IT budgets. But it makes sense to band together and make the most of their expertise in enterprise Hadoop deployments, and should help with tooling and management software for cloud migrations. If Kubernetes is any indication, there are huge areas for improvement in tooling and services beyond what cloud vendors provide. Share:

Read Post

Building a Multi-cloud Logging Strategy: Introduction

Logging and monitoring for cloud infrastructure has become the top topic we are asked about lately. Even general conversations about moving applications to the cloud always seem to end with clients asking how to ‘do’ logging and monitoring of cloud infrastructure. Logs are key to security and compliance, and moving into cloud services – where you do not actually control the infrastructure – makes logs even more important for operations, risk, and security teams. But these questions make perfect sense – logging in and across cloud infrastructure is complicated, offering technical challenges and huge potential cost overruns if implemented poorly. The road to cloud is littered with the charred remains of many who have attempted to create multi-cloud logging for their respective employers. But cloud services are very different – structurally and operationally – than on-premise systems. The data is different; you do not necessarily have the same event sources, and the data is often different or incomplete, so existing reports and analytics may not work the same. Cloud services are ephemeral so you can’t count on a server “being there” when you go looking for it, and IP addresses are unreliable identifiers. Networks may appear to behave the same, but they are software defined, so you cannot tap into them the same way as on-premise, nor make sense of the packets even if you could. How you detect and respond to attacks differs, leveraging automation to be as agile as your infrastructure. Some logs capture every API call; while their granularity of information is great, the volume of information is substantial. And finally, the skills gap of people who understand cloud is absent at many companies, so they ‘lift and shift’ what they do today into their cloud service, and are then forced to refactor the deployment in the future. One aspect that surprised all of us here at Securosis is the adoption of multi-cloud; we do not simply mean some Software as a Service (SaaS) along with a single Infrastructure as a Service (IaaS) provider – instead firms are choosing multiple IaaS vendors and deploying different applications to each. Sometimes this is a “best of breed” approach, but far more often the selection of multiple vendors is driven by fear of getting locked in with a single vendor. This makes logging and monitoring even more difficult, as collection across IaaS providers and on-premise all vary in capabilities, events, and integration points. Further complicating the matter is the fact that existing Security Information and Event Management (SIEM) vendors, as well as some security analytics vendors, are behind the cloud adoption curve. Some because their cloud deployment models are no different than what they offer for on-premise, making integration with cloud services awkward. Some because their solutions rely on traditional network approaches which don’t work with software defined networks. Still others employ pricing models which, when hooked into highly verbose cloud log sources, cost customers small fortunes. We will demonstrate some of these pricing models later in this paper. Here are some common questions: What data or logs do I need? Server/network/container/app/API/storage/etc.? How do I get them turned on? How do I move them off the sources? How do I get data back to my SIEM? Can my existing SIEM handle these logs, in terms of both different schema and volume & rate? Should I use log aggregators and send everything back to my analytics platform? At what point during my transition to cloud does this change? How do I capture packets and where do I put them? These questions, and many others, are telling because they come from trying to fit cloud events into existing/on-premise tools and processes. It’s not that they are wrong, but they highlight an effort to map new data into old and familiar systems. Instead you need to rethink your logging and monitoring approach. The questions firms should be asking include: What should my logging architecture look like now and how should it change? How do I handle multiple accounts across multiple providers? What cloud native sources should I leverage? How do I keep my costs manageable? Storage can be incredibly cheap and plentiful in the cloud, but what is the pricing model for various services which ingest and analyze the data I’m sending them? What should I send to my existing data analytics tools? My SIEM? How do I adjust what I monitor for cloud security? Batch or real-time streams? Or both? How do I adjust analytics for cloud? You need to take a fresh look at logging and monitoring, and adapt both IT and security workflows to fit cloud services – especially if you’re transitioning to cloud from an on-premise environment and will be running a hybrid environment during the transition… which may be several years from initial project kick-off. Today we launch a new series on Building a Multi-cloud Logging Strategy. Over the next few weeks, Gal Shpantzer and I (Adrian Lane) will dig into the following topics to discuss what we see when helping firms migrate to cloud. And there is a lot to cover. Our tentative outline is as follows: Barriers to Success: This post will discuss some reasons traditional approaches do not work, and areas where you might lack visibility. Cloud Logging Architectures: We discuss anti-patterns and more productive approaches to logging. We will offer recommendations on reference architectures to help with multi-cloud, as well as centralized management. Native Logging Features: We’ll discuss what sorts of logs you can expect to receive from the various types of cloud services, what you may not receive in a shared responsibility service, the different data sources firms have come to expect, and how to get them. We will also provide practical notes on logging in GCP, Azure, and AWS. We will help you navigate their native offerings, as well as the capabilities of PaaS/SaaS vendors. BYO Logging: Where and how to fill gaps with third-party tools, or building them into applications and service you deploy in the cloud. Cloud or On-premise Management? We will discuss tradeoffs between moving log management into the cloud, keeping these activities on-premise, and using a

Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.