Securosis

Research

Securing Big Data: Operational Security Issues

Before I dig into today’s post I want to share a couple observations. First, my new copy of the Harvard Business Review just arrived. The cover story is “Getting Control of Big Data”. It’s telling that HBR thinks big data is a trend important enough to warrant a full spread, and feel business managers need to understand big data and the benefits and risks it poses to business. As soon as I finish this post I intend to dive into these articles. Now that I have just about finished this research effort, I look forward to contrasting what I have discovered with their perspective. Second, when we talk about big data security, we are really referring to both data and infrastructure security. We want to protect the application (or database, if you prefer that term) that manages data, with the end goal of protecting the information under management. If an attacker can access data directly, bypassing the database management system, they will. Barring a direct path to the information, they will look for weaknesses in or ways to subvert the database application. So it’s important to remember that when we talk about database security we mean both data and infrastructure protection. Finally, a point about clarity. Big data security is one of the tougher topics to describe, especially as we here at Securosis prefer to describe things in black and white terms for the sake of clarity. But for just about every rule we establish and every emphatic statement we make, we have to acknowledge exceptions. Given the variety of different big data distributions and add-on capabilities, you can likely find a single instance of every security control described in today’s post. But it’s usually a single security control, like encryption, with the other security controls absent from the various packages. Nothing offers even a partial suite of solutions, much less a comprehensive offering. Today I want to discuss operational security of big data environments. Unlike yesterday’s post that discussed architectural security issues endemic to the platform, it is now time to address security controls of an operational nature. That includes “turning the dials” things like configuration management and access controls, as well as “bolt-on” capabilities such as auditing and security gateways. We see the greatest impact in these areas, and vendors jumping in with security offerings to fill the gaps. Normally when we consider how to secure data repositories, we consider the following major areas: Encryption: The standard for protecting data at rest is encryption to protect data from undesired access. And just because folks don’t use archiving features to back up data does not mean a rogue DBA or cloud service manager won’t. I think two or three of the more obscure NoSQL variants provides encryption for data at rest, but most do not. And the majority of available encryption products offer neither sufficient horizontal scalability nor adequate transparency for use with big data. This is a critical issue. Administrative data access: Each node has an admin, and each admin can read the node’s data if they choose. As with encryption, we need a boundary or facility to provide separation of duties between different administrators. The requirement is the same as on relational platforms – but big data platforms lack their array of built-in facilities, documentation, and third party tools to address requirements. Unwanted direct access to data files or data node processes can be addressed through a combination of access controls, separation of roles, and encryption technologies, but out-of-the box data is only as secure as your least trustworthy administrator. It’s up to the system designer to select controls to close this gap. Configuration and patch management: When managing a cluster of servers, it’s common to have nodes running different configurations and patch levels. And if you’re using dissimilar platforms to support the cluster you need to figure out what how to handle management. Existing configuration management tools work for underlying platforms, and HDFS Federation will help with cluster management, but careful planning is still necessary. I will go more detail about how in the next post, when I make recommendations. The cluster may tolerate nodes cycling without loss of data service interruption, but reboots can still cause serious performance issues, depending on which nodes are affected and how the cluster is configured. The upshot is that people don’t patch, fearing user complaints. Perhaps you have heard that one before. Authentication of applications/clients: Hadoop uses Kerberos to authenticate users and add-on services to the HDFS cluster. But a rogue client can be inserted onto the network if a Kerberos ticket is stolen or duplicated. This is more of a concern when embedding credentials in virtual and cloud environments, where it’s relatively easy to introduce an exact replica of a client app or service. A clone of a node is often all that’s needed to introduce a corrupted node or service into a cluster, it’s easy to impersonate or a service in the cluster, but it requires an attacker to compromise the management plane of your environment, or obtain a backup, of a client. Regardless of it being a pain to set up, strong authentication through Kerberos is one of your principle security tools, it helps solve the critical problem of who can access hadoop services. Audit and logging: One area with a variety of add-on capabilities is logging. Scribe and LogStash are open source tools that integrate into most big data environments, as do a number of commercial products. So you just need to find a compatible tool, install it, integrate it with other systems such as SIEM or log management, and then actually review the results. Without actually looking at the data and developing policies to detect fraud, logging is not useful. Monitoring, filtering, and blocking: There are no built-in monitoring tools to look for misuse or block malicious queries. In fact I don’t believe anyone has ever described what a malicious query might look like in a big data environment – other than crappy MapReduce

Share:
Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.