Securosis

Research

Cloudera and Hortonworks Merge

I had been planning to post on the recent announcement of the planned merger between Hortonworks and Cloudera, as there are a number of trends I’ve been witnessing with the adoption of Hadoop clusters, and this merger reflects them in a nutshell. But catching up on my reading I ran across Mathew Lodge’s recent article in VentureBeat titled Cloudera and Hortonworks merger means Hadoop’s influence is declining. It’s a really good post. I can confirm we see the same lack of interest in deployment of Hadoop to the cloud, the same use of S3 as a storage medium when Hadoop is used atop Infrasrtucture as a Service (IaaS), and the same developer-driven selection of whatever platform is easiest to use and deploy on. All in all it’s an article I wish I’d written, as he did a great job capturing most of the areas I wanted to cover. And there are some humorous bits like “Ironically, there has been no Cloud Era for Cloudera.” Check it out – it’s worth your time. But there are a couple other areas I still want to cover. It is rare to see someone install Hadoop into a public IaaS account. Customers (now) choose a cloud native variant and let the vendor handle all the patching and hide much of the infrastructure pieces from them. And they gain the option of spinning down the cluster when not in use, making it much more efficient. Couple that with all the work to set up Hadoop yourself, and it’s an easy decision. I was somewhat surprised to learn that things like AWS’s Elastic Map Reduce (EMR) are not always chosen as repository, but Dynamo is surprisingly popular – which makes sense, given its powerful query features, indexing, and ability to offer the best of relational and big data capabilities. Most public IaaS vendors offer so many database variants that it is easy to mix and match multiple variants to support applications, further reducing demand for classic Hadoop installations. One area continuing to drive Hadoop adoption is on-premise data collection and data lakes for logs. The most cited driver is the need to keep Splunk costs under control. It takes effort to divert some content to Hadoop instead of sending everything to the Splunk collectors – but data can be collected and held at drastically lower cost. And you need not sacrifice analytics. For organizations collecting every log entry, this is a win. We also see Hadoop adopted by Security Operations Centers, running side by side with other platforms. Part of the need is to fill gaps around what their SIEM keeps, part is to keep costs down, and part is to easily support deployment of custom security intelligence applications by non-developers. Another aspect not covered in any of the articles I have found so far is that Cloudera and Hortonworks both have deep catalogs of security capabilities. Together they are dominant. As firms use large “data lakes” to hold all sorts of sensitive data inside Hadoop, this will be a win for firms running Hadoop in-house. Identity management, encryption, monitoring, and a whole bunch of other great stuff. Big data is not the security issue it was 5 years ago. Hortonworks and Cloudera have a lot to do with that; their combined capabilities and enterprise deployment experience make them a powerful choice to help firms manage and maintain existing infrastructure. That is all my way of saving that some of their negative press is unwarranted, given the profitable avenues ahead. The idea that growth in the Hadoop segment appears to have been slowing is not new. AWS has been the largest seller of Hadoop-based data platforms, by revenue and by customer, for several years. The cloud is genuinely an existential threat to all the commercial Hadoop vendors – and comparable big data databases – if they continue to sell in the same way. The recent acceleration of cloud adoption simply makes it more apparent that Cloudera and Hortonworks are competing for a shrinking share of IT budgets. But it makes sense to band together and make the most of their expertise in enterprise Hadoop deployments, and should help with tooling and management software for cloud migrations. If Kubernetes is any indication, there are huge areas for improvement in tooling and services beyond what cloud vendors provide. Share:

Share:
Read Post

Building a Multi-cloud Logging Strategy: Introduction

Logging and monitoring for cloud infrastructure has become the top topic we are asked about lately. Even general conversations about moving applications to the cloud always seem to end with clients asking how to ‘do’ logging and monitoring of cloud infrastructure. Logs are key to security and compliance, and moving into cloud services – where you do not actually control the infrastructure – makes logs even more important for operations, risk, and security teams. But these questions make perfect sense – logging in and across cloud infrastructure is complicated, offering technical challenges and huge potential cost overruns if implemented poorly. The road to cloud is littered with the charred remains of many who have attempted to create multi-cloud logging for their respective employers. But cloud services are very different – structurally and operationally – than on-premise systems. The data is different; you do not necessarily have the same event sources, and the data is often different or incomplete, so existing reports and analytics may not work the same. Cloud services are ephemeral so you can’t count on a server “being there” when you go looking for it, and IP addresses are unreliable identifiers. Networks may appear to behave the same, but they are software defined, so you cannot tap into them the same way as on-premise, nor make sense of the packets even if you could. How you detect and respond to attacks differs, leveraging automation to be as agile as your infrastructure. Some logs capture every API call; while their granularity of information is great, the volume of information is substantial. And finally, the skills gap of people who understand cloud is absent at many companies, so they ‘lift and shift’ what they do today into their cloud service, and are then forced to refactor the deployment in the future. One aspect that surprised all of us here at Securosis is the adoption of multi-cloud; we do not simply mean some Software as a Service (SaaS) along with a single Infrastructure as a Service (IaaS) provider – instead firms are choosing multiple IaaS vendors and deploying different applications to each. Sometimes this is a “best of breed” approach, but far more often the selection of multiple vendors is driven by fear of getting locked in with a single vendor. This makes logging and monitoring even more difficult, as collection across IaaS providers and on-premise all vary in capabilities, events, and integration points. Further complicating the matter is the fact that existing Security Information and Event Management (SIEM) vendors, as well as some security analytics vendors, are behind the cloud adoption curve. Some because their cloud deployment models are no different than what they offer for on-premise, making integration with cloud services awkward. Some because their solutions rely on traditional network approaches which don’t work with software defined networks. Still others employ pricing models which, when hooked into highly verbose cloud log sources, cost customers small fortunes. We will demonstrate some of these pricing models later in this paper. Here are some common questions: What data or logs do I need? Server/network/container/app/API/storage/etc.? How do I get them turned on? How do I move them off the sources? How do I get data back to my SIEM? Can my existing SIEM handle these logs, in terms of both different schema and volume & rate? Should I use log aggregators and send everything back to my analytics platform? At what point during my transition to cloud does this change? How do I capture packets and where do I put them? These questions, and many others, are telling because they come from trying to fit cloud events into existing/on-premise tools and processes. It’s not that they are wrong, but they highlight an effort to map new data into old and familiar systems. Instead you need to rethink your logging and monitoring approach. The questions firms should be asking include: What should my logging architecture look like now and how should it change? How do I handle multiple accounts across multiple providers? What cloud native sources should I leverage? How do I keep my costs manageable? Storage can be incredibly cheap and plentiful in the cloud, but what is the pricing model for various services which ingest and analyze the data I’m sending them? What should I send to my existing data analytics tools? My SIEM? How do I adjust what I monitor for cloud security? Batch or real-time streams? Or both? How do I adjust analytics for cloud? You need to take a fresh look at logging and monitoring, and adapt both IT and security workflows to fit cloud services – especially if you’re transitioning to cloud from an on-premise environment and will be running a hybrid environment during the transition… which may be several years from initial project kick-off. Today we launch a new series on Building a Multi-cloud Logging Strategy. Over the next few weeks, Gal Shpantzer and I (Adrian Lane) will dig into the following topics to discuss what we see when helping firms migrate to cloud. And there is a lot to cover. Our tentative outline is as follows: Barriers to Success: This post will discuss some reasons traditional approaches do not work, and areas where you might lack visibility. Cloud Logging Architectures: We discuss anti-patterns and more productive approaches to logging. We will offer recommendations on reference architectures to help with multi-cloud, as well as centralized management. Native Logging Features: We’ll discuss what sorts of logs you can expect to receive from the various types of cloud services, what you may not receive in a shared responsibility service, the different data sources firms have come to expect, and how to get them. We will also provide practical notes on logging in GCP, Azure, and AWS. We will help you navigate their native offerings, as well as the capabilities of PaaS/SaaS vendors. BYO Logging: Where and how to fill gaps with third-party tools, or building them into applications and service you deploy in the cloud. Cloud or On-premise Management? We will discuss tradeoffs between moving log management into the cloud, keeping these activities on-premise, and using a

Share:
Read Post

DisruptOps: Quick and Dirty: Building an S3 Guardrail with Config

Disrupt:Ops: Quick and Dirty: Building an S3 Guardrail with Config In How S3 Buckets Become Public, and the Fastest Way to Find Yours we reviewed the myriad ways S3 buckets become public and where to look for them. Today I’ll show the easiest way to continuously monitor for public buckets using AWS Config. The good news is this is pretty easy to set up; the bad news is you need to configure it separately in every region in every account. Read the full post at DisruptOps Share:

Share:
Read Post

Introducing Data Guardrails and Behavioral Analytics: Understand the Mission

After over 25 years of the modern IT security industry, breaches still happen at an alarming rate. Yes, that’s fairly obvious but still disappointing, given the billions spent every year in efforts to remedy the situation. Over the past decade the mainstays of security controls have undergone the next generation treatment – initially firewalls and more recently endpoint security. New analytical techniques have been mustered to examine infrastructure logs in more sophisticated fashion. But the industry seems to keep missing the point. The objective of nearly every hacking campaign is (still) to steal data. So why focus on better infrastructure security controls and better analytics of said infrastructure? Mostly because data security is hard. The harder the task, the less likely overwhelmed organizations will have the fortitude to make necessary changes. To be clear, we totally understand the need to live to fight another day. That’s the security person’s ethos, as it must be. There are devices to clean up, incidents to respond to, reports to write, and new architectures to figure out. The idea of tackling something nebulous like data security, with no obvious solution, can remain a bridge too far. Or is it? The time has come to revisit data security, and to utilize many of the new techniques pioneered for infrastructure to address the insider threat where it appears: attacking data. So our new series, Protecting What Matters: Introducing Data Guardrails and Behavioral Analytics, will introduce some new practices and highlight new approaches to protecting data. Before we get started, let’s send a shout-out to Box for agreeing to license this content when we finish up this series. Without clients like Box, who understand the need for forward-looking research to tell you where things are going, not reports telling you where they’ve been, we wouldn’t be able to produce research like this. Understanding Insider Risk While security professionals like to throw around the term “insider threat”, it’s often nebulously defined. In reality it includes multiple categories, including external threats which leverage insider access. We believe to truly address a risk you first need to understand it (call us crazy). To break down the first level of the insider threat, let’s consider its typical risk categories: Accidental Misuse: In this scenario the insider doesn’t do anything malicious, but makes a mistake which results in data loss. For example a customer service rep could respond to an email sent by a customer which includes private account info. It’s not like the rep is trying to violate policy, but they didn’t take the time to look at the message and clear out any private data. Tricked into Unwanted Actions: Employees are human, and can be duped into doing the wrong thing. Phishing is a great example. Or providing access to a folder based on a call from someone impersonating an employee. Again, this isn’t malicious, but it can still cause a breach. Malicious Misuse: Sometimes you need to deal with the reality of a malicious insider intentionally stealing data. In the first two categories the person isn’t trying to mask their behavior. In this scenario they are deliberately obfuscating, which that means you need different tactics to detect and prevent the activity. Account Takeover: This category reflects the fact that once an external adversary has presence on a device, they become an ‘insider’; with a compromised device and account, they have access to critical data. We need to consider these categories in the context of adversaries so you can properly align your security architecture. So who are the main adversaries trying to access your stuff? Some coarse-grained categories follows: unsophisticated (using widely available tools), organized crime, competitors, state-sponsored, and finally actual insiders. Once you have figured out your most likely adversary and their typical tactics, you can design a set of controls to effectively protect your data. For example an organized crime faction looks to access data related to banking or personal information for identity theft. But a competitor is more likely looking for product plans or pricing strategies. You can (and should) design your data protection strategy with these likely adversaries in mind, to help prioritize what to protect and how. Now that you understand your adversaries and can infer their primary tactics, you have a better understanding of their mission. Then you can select a data security architecture to minimize risk, and optimally prevent any data loss. But that requires us to use different tactics than would normally be considered data security. A New Way to Look at Data Security If you surveyed security professionals and asked what data security means to them, they’d likely say either encryption or Data Loss Prevention (DLP). When all you have is a hammer, everything looks like a nail, and for a long time those two have been the hammers available to us. Of course the fact that we want to expand our perspective a bit doesn’t mean DLP and encryption no longer have any roles to play in data protection. Of course they do. But we can supplement them with some new tactics. Data Guardrails: We have defined Guardrails as a means to enforce best practices without slowing down or impacting typical operations. Typically used within the context of cloud security (like, er, DisruptOps), a data guardrail enables data to be used in certain ways while blocking unauthorized usage. To bust out an old network security term, you can think of guardrails as like “default-deny” for data. You define the set of acceptable practices, and don’t allow anything else. Data Behavioral Analytics: Many of you have heard of UBA (User Behavioral Analytics), where all user activity is profiled, and you then look for anomalous activities which could indicate one of the insider risk categories above. What if you turned UBA inside-out and focused on the data? Using similar analytics you could profile the usage of all the data in your environment, and then look for abnormal patterns which warrant investigation. We’ll call this DataBA because your database administrators might be a little peeved if we horned in on their job title. Our next post will dig farther into these new concepts of

Share:
Read Post

DisruptOps: How S3 Buckets Become Public, and the Fastest Way to Find Yours

How S3 Buckets Become Public, and the Fastest Way to Find Yours In What Security Managers Need to Know About Amazon S3 Exposures we mentioned that one of the reasons finding public S3 buckets is so darn difficult is because there are multiple, overlapping mechanisms in place that determine the ultimate amount of S3 access. To be honest, there’s a chance I don’t even know all the edge cases but this list should cover the vast majority of situations. Read the full post at DisruptOps Share:

Share:
Read Post

DisruptOps: Why Everyone Automates in Cloud

Why Everyone Automates in Cloud If you see me speaking about cloud it’s pretty much guaranteed I’ll eventually say: Cloud security starts with architecture and ends with automation. I’m nothing if not repetitive. This isn’t just a quip, it’s based on working heavily in cloud for nearly a decade with organizations of all size. The one consistency I see over and over is that once organizations hit a certain scale they start automating their operations. And every year that line is earlier and earlier in their cloud journey. I know it because first I lived it, then I watched every single organization I worked with, talked with, or generally glanced at, go down the same path. Read the full post at DisruptOps Share:

Share:
Read Post

DisruptOps: (DevSec)Ops vs. Dev(SecOps)

(DevSec)Ops vs. Dev(SecOps) I just got back from the Boston DevOps Days. I really enjoy hanging around DevOps and cloud people. The energy of these conferences is great, and they are genuinely excited about transforming how their organizations build and deploy applications. Many don’t have a negative perception of security folks, but they don’t really understand what security folks do either. Read the full post at DisruptOps Share:

Share:
Read Post

DisruptOps: What Security Managers Need to Know About Amazon S3 Exposures (2/2)

What Security Managers Need to Know About Amazon S3 Exposures (2/2) Our first Disrupt:Ops post discussed how exposure of S3 data becomes such a problem, with some details on how buckets become public in the first place. This post goes a bit deeper, before laying a foundation for how to manage S3 to avoid these mistakes yourself. Read the full post at DisruptOps Share:

Share:
Read Post

DisruptOps: What Security Managers Need to Know About Amazon S3 Exposures (1/2)

As we spin up Disrupt:OPS we are beginning to post cloud-specific content over there, mixing theory with practical how-to guidance. Not to worry! We have plenty of content still planned for Securosis. But we haven’t added any staff at Securosis so there is only so much we can write. In the meantime, linking to non-product posts from Securosis should help ensure you don’t lose sleep over missing even a single cloud-related blog entry. So here’s #1 from the Disrupt:Ops hit parade! What Security Managers Need to Know About Amazon S3 Exposures (1/2) The accidental (or deliberate) exposure of sensitive data on Amazon S3 is one of those deceptively complex issues. On the surface it seems entirely simple to avoid, yet despite wide awareness we see a constant stream of public exposures and embarrassments, combined with a healthy dollop of misunderstanding and victim blaming. Read the full post at DisruptOps Share:

Share:
Read Post

Firestarter: Hardware Hacks and Lift and Pray

Did China manage to hardware hack the Apple and Amazon data centers? Or did Bloomberg get it wrong? And what the heck can you do about it anyway? This week we start with a discussion of today’s blockbuster security news, before shifting gears back to cloud. It turns out most organizations are having to lift and shift to cloud, even when that is not ideal. We talk about some of your options, even in the face of ridiculous management timelines. Watch or listen: Share:

Share:
Read Post
dinosaur-sidebar

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.