Securosis

Research

Summary: Who pays who?

Adrian here…

Apple buying space on Google’s cloud made news this week. Many people were surprised that Apple relies on others to provide cloud services, but it has been leveraging AWS and others for years. Our internal chat was alive with discussion about build vs. buy for different providers of cloud services. Perhaps a hundred or so companies have the scale to make a go at building from scratch at this point, and the odds of success for many of those are small. You need massive scale before the costs make it worth building your own, especially given the custom engineering required to get equivalent hardware margins. That leaves a handful of firms who can make a go of this, and it’s still not always clear whether they should. Even Apple buys others’ services, and it usually makes good economic sense.

We did not really talk about RSA Conference highlights, but the Rugged DevOps event (slides are up) was the highlight of RSAC week for me. The presentations were all thought-provoking. Concepts which were consistently reinforced included:

Constantly test, constantly improve
Without data you’re just another person with an opinion
Don’t update; dispose and improve
Micro-services and Docker containers are the basic building blocks for application development today

Micro-services make sense to me, and I have successfully used that design concept, but I have zero practical experience with Docker. Which is a shocker because it’s freakin’ everywhere, but I have never yet taken the time to learn. That stops this week. AWS and Azure both support it, and it’s embedded into Big Data frameworks as well, so it’s everywhere I want to be. I saw two vendor presentations on security concerns around Docker deployment models, and yeah, it scares me a bit. But Docker addresses the basic demand for easy updates, packaging, and accelerating deployment, so it stays. Security will iterate improvements to the model over time, as we usually do. DevOps doesn’t fix everything.
That’s not me being a security curmudgeon – it’s me being excited by new technologies that let me get work done faster.

Amazon’s CTO wants to make it impossible for anyone else to access your data – including him. And no, Werner does not have Captain Crunch stuck to his beard.
Cisco Acquires CliQr For $260M – basically software defined cloud management.
The Dangers of Docker.sock
Get Ready for Docker’s 3rd Birthday!
Docker may be the dumbest thing you do today
RightScale 2016 State of the Cloud Report (registration required)
We protect the wrong things and we slow everything down
Have you heard of Google’s Project Loon? No? Then how about Microsoft’s version: Pegasus II. It’s IoT meets cloud.
Data Lakes – no longer just a marketing buzzword.


Summary: The Cloud Horizon

By Adrian

Two weeks ago Rich sketched out some changes to our Friday Summary, including how the content will change. But we haven’t spelled out our reasons. Our motivation is simple. In a decade, over half your systems will be in some cloud somewhere. The Summary will still be about security, but we’ll focus on security for cloud services, cloud applications, and how DevOps techniques intertwine with each. Rather than rehash the on-premises security issues we have covered (ad nauseam) for 9 years, we believe it’s far more helpful to IT and security folks to discuss what is on the near horizon that they are not already familiar with. We can say with certainty that most of what you’ve learned about “the right way to do things” in security will be challenged by cloud deployments, so we are tuning the Summary to increase understanding of the changes in store, and what to do about them: trends, features, tools, and even some code. We know it’s not for everybody, but if you’re seriously interested, you can subscribe directly to the Friday Summary.

The RSA Conference is next week, so don’t forget to get a copy of Securosis’s Guide to the RSA Conference. But be warned; Mike’s been at the meme generator again, and some things you just can’t unsee. Oh, and if you’re interested in attending the Eighth Annual Securosis Disaster Recovery Breakfast at RSA, please RSVP. That way we know how much bacon to order. Or Bloody Marys to make. Something like that.
Top Posts for the Week

CSA Summit at RSA Conference
Docker Containers as a Service walkthrough
Scheduling SSH jobs using AWS Lambda
Transparency and Auditing on AWS
Introducing custom authorizers in Amazon API Gateway
S3 Lifecycle Policies, Versioning & Encryption: AWS Security
AWS Basic Security Checklist
CloudWatch Logs Subscription Consumer + Elasticsearch + Kibana Dashboards
Securely Accessing Customer AWS Accounts with Cross-Account IAM Roles
Red Hat Brings DevOps to the Network with New Ansible Capabilities
Introducing the Fastly Security Speaker Series
Account Separation and Mandatory Access Control
Customizing CloudFormation With Python
Tidas: a new service for building password-less apps
NXLog Open Source Log Management tool
Why the FBI’s request to Apple will affect civil rights for a generation
Staying on top of the DevOps game in 2016
Continuous Web Security Testing with CircleCI
Spotify Moves Itself Onto Google’s Cloud–Lucky for Google
Continuous Delivery and Effective Feature Flagging with LaunchDarkly – AWS Startup Collection
Design Patterns using Amazon DynamoDB
Using Amazon API Gateway with microservices deployed on Amazon ECS
8 Common AWS Security Issues – and How to Fix Them
Using Roles to Secure Your Environment: Part 2
Automate EBS Snapshots using a Lambda function
Attending RSA in San Francisco? Visit the AWS Pop-up Loft for Security Talks!
Amazon CTO On Encryption: “Evil Players Will Get Access To These Backdoors”
IBM previews new tools for developing with Swift in the cloud

Tool of the Week

This is a new section highlighting a cloud, DevOps, or security tool we think you should take a look at. We still struggle to keep track of all the interesting tools that can help us, so if you have submissions please email them to info@securosis.com. Alerts literally drive DevOps.
One may fire off a cloud-based service, or it might indicate a failure a human needs to look at. When putting together a continuous integration pipeline, or processing cloud services, how do you communicate status? SMS and email are the common output formats, and developer tools like Slack or bug tracking systems tend to be the endpoints, but it’s hard to manage and integrate the streams of automated outputs. And once you get one message of a particular event type, you usually don’t want to see that event again for a while. You can create a simple web console, or use AWS to stream to specified recipients, but that’s all manual setup. Things like Slack can help with individuals, teams, and third parties, but managing them is frankly a pain in the ass. As you scale up cloud and DevOps processes it’s easy to get overwhelmed. One of the tools I was looking at this week was (x)matters, which provides an integration and management hub for automated messages. It can understand messages from multiple sources and offers aggregation to avoid over-pinging users. I have not seen many products addressing this problem, so I wanted to pass it along.

Securosis Blog Posts this Week

Firestarter: RSA Conference – the Good, Bad, and the Ugly.
Securing Hadoop: Technical Recommendations.
Securing Hadoop: Enterprise Security For NoSQL.

Other Securosis News and Quotes

I posted a piece at Macworld on the FBI vs. Apple that has gotten a lot of attention. It got linked all over the place and I did a bunch of interviews, but I won’t spam you with them.
We are posting our whole RSA Conference Guide as posts over at the RSA Conference blog – here are the latest:
  Securosis Guide: Training Security Jedi
  Securosis Guide: The Beginning of the End(point) for the Empire
  Securosis Guide: Escape from Cloud City

Training and Events

We are giving multiple presentations at the RSA Conference.
Rich and Mike are giving Cloud Security Accountability Tour
Rich is co-presenting with Bill Shinn of AWS: Aspirin as a Service: Using the Cloud to Cure Security Headaches
David Mortman is presenting:
  Learning from Unicorns While Living with Legacy
  Docker: Containing the Security Excitement
  Docker: Containing the Security Excitement (Focus-On)
  Leveraging Analytics for Data Protection Decisions
Rich is giving a presentation on Rugged DevOps at Scale at DevOps Connect the Monday of RSAC
We are running two classes at Black Hat USA:
  Cloud Security Hands-On (CCSK-Plus)
  Advanced Cloud Security and Applied SecDevOps


Securing Hadoop: Technical Recommendations

Before we wrap up this series on securing Hadoop databases, I am happy to announce that Vormetric has asked to license this content, and Hortonworks is also evaluating a license. It’s community support that allows us to bring you this research free of charge. Also, I’ve received a couple of email and Twitter responses to the content; if you have more input to offer, now is the time to send it along to be evaluated with the rest of the feedback, as we will assemble the final paper in the coming week. And with that, on to the recommendations.

The following are our recommendations to address security issues with Hadoop and NoSQL database clusters. The last time we made recommendations we joked that many security tools broke Hadoop scalability; your cluster was secure because it was likely no one would use it. Fast forward four years and both commercial and open source technologies have advanced considerably: they not only address the threats you’re worried about, but were designed specifically for Hadoop. This means the chances that a security tool will compromise cluster performance and scalability are low, and the integration hassles of old are mostly behind us.

In fact, it’s because of the rapid technical advancements in the open source community that we have done an about-face on where to look for security capabilities. We are no longer focused on just third-party security tools, but largely on the open source community, which helped close the major gaps in Hadoop security. That said, many of these capabilities are new, and like most new things, lack a degree of maturity. You still need to go through a tool selection process based upon your needs, and then do the integration and configuration work.

Requirements

As security in and around Hadoop is still relatively young, it is not a foregone conclusion that all security tools will work with a clustered NoSQL database.
We still witness instances where vendors parade the same old products they offer for other back-office systems and relational databases. To ensure you are not duped by security vendors you still need to do your homework: evaluate products to ensure they are architecturally and environmentally consistent with the cluster architecture – not in conflict with the essential characteristics of Hadoop. Any security control used for NoSQL must meet the following requirements:

1. It must not compromise the basic functionality of the cluster.
2. It should scale in the same manner as the cluster.
3. It should address a security threat to NoSQL databases or data stored within the cluster.

Our Recommendations

In the end, our big data security recommendations boil down to a handful of standard tools which can be effective in setting a secure baseline for Hadoop environments:

Use Kerberos for node authentication: We believed – at the outset of this project – that we would no longer recommend Kerberos. Implementation and deployment challenges with Kerberos suggested customers would go in a different direction. We were 100% wrong. Our research showed that adoption has increased considerably over the last 24 months, specifically because the enterprise distributions of Hadoop have streamlined the integration of Kerberos, making it reasonably easy to deploy. Now, more than ever, Kerberos is being used as a cornerstone of cluster security. It remains effective for validating nodes and – for some – authenticating users. But other security controls piggy-back off Kerberos as well. Kerberos is one of the most effective security controls at our disposal, it’s built into the Hadoop infrastructure, and enterprise bundles make it accessible, so we recommend you use it.

Use file layer encryption: Simply stated, this is how you will protect data.
File encryption protects against two attacker techniques for circumventing application security controls: it protects data if malicious users or administrators gain access to data nodes and directly inspect files, and it renders stolen files or copied disk images unreadable. Oh, and if you need to address compliance or data governance requirements, data encryption is not optional. While it may be tempting to rely upon encrypted SAN/NAS storage devices, they don’t provide protection from credentialed user access, granular protection of files, or multi-key support. And file layer encryption provides consistent protection across different platforms regardless of OS/platform/storage type, with some products even protecting encryption operations in memory. Just as important, encryption meets our requirements for big data security – it is transparent to both Hadoop and calling applications, and scales out as the cluster grows.

But you have a choice to make: use open source HDFS encryption, or a third-party commercial product. Open source products are freely available, and have open source key management support. But keep in mind two limitations. First, the HDFS encryption engine only protects data on HDFS, leaving other types of files exposed, whereas commercial variants that work at the file system layer cover all files. Second, the open source options lack some of the external key management support, trusted binaries, and full support that commercial products offer. Free is always nice, but for many of the enterprise customers we polled, complete coverage and support tilted the balance. Regardless of which option you choose, this is a mandatory security control.

Use key management: File layer encryption is not effective if an attacker can access encryption keys. Many big data cluster administrators store keys on local disk drives because it’s quick and easy, but it’s also insecure as keys can be collected by the platform administrator or an attacker.
And we are seeing keytab files sitting around unprotected in file systems. Use a key management service to distribute keys and certificates, and manage different keys for each group, application, and user. This requires additional setup and possibly commercial key management products to scale with your big data environment, but it’s critical. Most of the encryption controls we recommend depend on key/certificate security.

Use Apache Ranger: In the original version of this research we were most worried about the use of a dozen modules with Hadoop, all deployed with ad-hoc configuration, hidden within the complexities of the cluster, each offering up a unique attack surface to potential attackers. Deployment validation
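To make the unprotected keytab concern above a bit more concrete, here is a minimal Python sketch of a check you might run on a node. It is an illustration, not part of any Hadoop tooling: the function name `find_exposed_keytabs` is hypothetical, it assumes keytabs are named with a `.keytab` extension, and it only looks at POSIX permission bits (it knows nothing about ACLs or who owns the file).

```python
import stat
from pathlib import Path


def find_exposed_keytabs(root):
    """Return (path, mode) pairs for keytab files that are not owner-only.

    Assumption: keytabs end in ".keytab". Adjust the pattern for your site.
    """
    exposed = []
    for path in Path(root).rglob("*.keytab"):
        if not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        # Any group or other permission bit means someone besides the
        # owning service account can touch the file.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            exposed.append((str(path), oct(mode)))
    return sorted(exposed)
```

Calling `find_exposed_keytabs("/etc/security")` (the path is only an example) returns an empty list when every keytab under that directory is restricted to its owner. A check like this covers only files at rest on a live node; it does nothing for keytabs baked into images or snapshots, which is where a key management service earns its keep.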


Securing Hadoop: Enterprise Security For NoSQL

Hadoop is now enterprise software. There, I said it. I know lots of readers in the IT space still look at Hadoop as an interloper, or worse, part of the rogue IT problem. But better than 50% of the enterprises we spoke with are running Hadoop somewhere within the organization. A small percentage are running Mongo, Cassandra or Riak in parallel with Hadoop, for specific projects. Discussions on what ‘big data’ is, whether it is a viable technology, or even whether open source can be considered ‘enterprise software’ are long past. What began as proof of concept projects have matured into critical application services. And with that change, IT teams are now tasked with getting a handle on Hadoop security, to which they respond with questions like “How do I secure Hadoop?” and “How do I map existing data governance policies to NoSQL databases?”

Security vendors will tell you both attacks on corporate IT systems and data breaches are prevalent, so with gobs of data under management, Hadoop provides a tempting target for ‘Hackers’. All of which is true, but as of today, there really have not been major data breaches where Hadoop played a part in the story. As such this sort of ‘FUD’ carries little weight with IT operations. But make no mistake, security is a requirement! As sensitive information, customer data, medical histories, intellectual property and just about every type of data used in enterprise computing is now commonly used in Hadoop clusters, the ‘C’ word (i.e.: Compliance) has become part of their daily vocabulary. One of the big changes we’ve seen in the last couple of years is Hadoop becoming business-critical infrastructure, and another – directly caused by the first – is that IT is being tasked with bringing existing clusters in line with enterprise compliance requirements. This is somewhat challenging as a fresh install of Hadoop suffers all the same weak points traditional IT systems have, so it takes work to get security set up and the reports being created.
For clusters that are already up and running, you need to choose technologies and a deployment roadmap that do not upset ongoing operations. On top of that, there is the additional challenge that the in-house tools you use to secure things like SAP, or the SIEM infrastructure you use for compliance reporting, may not be suitable when it comes to NoSQL.

Building security into the cluster

The number of security solutions that are compatible with – if not outright built for – Hadoop is the biggest change since 2012. All of the major security pillars – authentication, authorization, encryption, key management and configuration management – are covered and the tools are viable. Most of the advancements have come from the firms that provide enterprise distributions of Hadoop. They have built, and in many cases contributed back to the open source community, security tools that accomplish the basics of cluster security. When you look at the threat-response models introduced in the previous two posts, every compensating security control is now available. Better still, they have done a lot of the integration legwork for services like Kerberos, taking a lot of the pain out of deployments.

Here are some of the components and functions that were not available – or not viable – in 2012.

LDAP/AD Integration – Technically AD and LDAP integration were available in 2012, but these services have both been advanced, and are easier to integrate than before. In fact, this area has received the most attention, and integration is as simple as a setup wizard with some of the commercial platforms. The benefits are obvious, as firms can leverage existing access and authorization schemes, and defer user and role management to external sources.

Apache Ranger – Ranger is one of the more interesting technologies to become available, and it closes the biggest gap: module security policies and configuration management.
It provides a tool for cluster administrators to set policies for different modules like Hive, Kafka, HBase or Yarn. What’s more, those policies are in context to the module, so it sets policies for files and directories when in HDFS, SQL policies when in Hive, and so on. This helps with data governance and compliance as administrators set how a cluster should be used, or how data is to be accessed, in ways that simple role based access controls cannot.

Apache Knox – You can think of Knox in its simplest form as a Hadoop firewall. More correctly, it is an API gateway. It handles HTTP and REST-ful requests, enforcing authentication and usage policies on inbound requests, and blocking everything else. Knox can be used as a ‘virtual moat’ around a cluster, or used with network segmentation to further reduce network attack surface.

Apache Atlas – Atlas is a proposed open source governance framework for Hadoop. It allows you to annotate files and tables, set relationships between data sets, and even import meta-data from other sources. These features are helpful for reporting, data discovery and for controlling access. Atlas is new and we expect it to see significant maturing in coming years, but for now it offers some valuable tools for basic data governance and reporting.

Apache Ambari – Ambari is a facility for provisioning and managing Hadoop clusters. It helps admins set configurations and propagate changes to the entire cluster. During our interviews we only spoke to two firms using this capability, but we received positive feedback from both. Additionally we spoke with a handful of companies who had written their own configuration and launch scripts, with pre-deployment validation checks, usually for cloud and virtual machine deployments. This latter approach was more time consuming to create, but offered greater capabilities, with each function orchestrated within IT operational processes (e.g.: continuous deployment, failure recovery, DevOps).
For most, Ambari’s ability to get you up and running quickly and provide consistent cluster management is a big win and a suitable choice. Monitoring – Hive, PIQL, Impala, Spark SQL and similar modules offer SQL or pseudo-SQL syntax. This means that the activity monitoring, dynamic masking, redaction and tokenization technologies originally developed for


Securing Hadoop: Operational Security Issues

Beyond the architectural security issues endemic to Hadoop and NoSQL platforms discussed in the last post, IT teams expect some common security processes and supporting tools familiar from other data management platforms. That includes “turning the dials” on configuration management, vulnerability assessment, and maintaining patch levels across a complex assembly of supporting modules. The day-to-day processes IT managers follow to ensure typical application platforms are properly configured have evolved over years – core platform capabilities, community contributions, and commercial third-party support to fill in gaps. Best practices, checklists, and validation tools to verify things like admin rights are sufficiently tight, and that nodes are patched against known and perhaps even unknown vulnerabilities. Hadoop security has come a long way in just a few years, but it still lacks the maturity in day to day operational security offerings, and it is here that we find most firms continue to struggle. The following is an overview of the most common threats to data management systems, where operational controls offer preventative security measures to close off most common attacks. Again we will discuss the challenges, then map them to mitigation options. Authentication and authorization: Identity and authentication are central to any security effort – without them we cannot determine who should get access to data. Fortunately the greatest gains in NoSQL security have been in identity and access management. This is largely thanks to providers of enterprise Hadoop distributions, who have performed much of the integration and setup work. We have evolved from simple in-database authentication and crude platform identity management to much better integrated LDAP, Active Directory, Kerberos, and X.509 based authentication options. 
Leveraging those capabilities we can use established roles for authorization mapping, and sometimes extend to fine-grained authorization services with Apache Sentry, or custom authorization mapping controlled from within the calling application rather than the database.

Administrative data access: Most organizations have platform administrators and NoSQL database administrators, both with access to the cluster’s files. To provide separation of duties – to ensure administrators cannot view content – a facility is needed to segregate administrative roles and keep unwanted access to a minimum. Direct access to files or data is commonly addressed through a combination of role-based authorization, access control lists, file permissions, and segregation of administrative roles – such as with separate administrative accounts, bearing different roles and credentials. This provides basic protection, but cannot protect archived or snapshotted content. Stronger security requires a combination of data encryption and key management services, with unique keys for each application or cluster. This prevents different tenants (applications) in a shared cluster from viewing each other’s data.

Configuration and Patch Management: With a cluster of servers, which may have hundreds of nodes, it is common to run different configurations and patch levels at one time. As nodes are added we see configuration skew. Keeping track of revisions is difficult. Existing configuration management tools can cover the underlying platforms, and HDFS Federation will help with cluster management, but they both leave a lot to be desired – including issuing encryption keys, avoiding ad hoc configuration changes, ensuring file permissions are set correctly, and ensuring TLS is correctly configured.
NoSQL systems do not yet have counterparts for the configuration management tools available for relational platforms, and even commercial Hadoop distributions offer scant advice on recommended configurations and pre-deployment checklists. But administrators still need to ensure configuration scripts, patches, and open source code revisions are consistent. So we see NoSQL databases deployed on virtual servers and cloud instances, with home-grown pre-deployment scripts. Alternatively a “golden master” node may embody extensive configuration and validation, propagated automatically to new nodes before they can be added into the cluster. Software Bundles: The application and Hadoop stacks are assembled from many different components. Underlying platforms and file systems also vary – with their own configuration settings, ownership rights, and patch levels. We see organizations increasingly using source code control systems to handle open source version management and application stack management. Container technologies also help developers bundle up consistent application deployments. Authentication of applications and nodes: If an attacker can add a new node they control to the cluster, they can exfiltrate data from the cluster. To authenticate nodes (rather than users) before they can join a cluster, most firms we spoke with either employ X.509 certificates or Kerberos. Both can authenticate users as well, but we draw this distinction to underscore the threat of rogue applications or nodes being added to the cluster. Deployment of these services brings risks as well. For example if a Kerberos keytab file can be accessed or duplicated – perhaps using credentials extracted from virtual image files or snapshots – a node’s identity can be forged. Certificate-based identity options implicitly complicate setup and deployment, but properly deployed they can provide strong authentication and stronger security. 
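The home-grown pre-deployment scripts mentioned under Configuration and Patch Management can be as simple as comparing a node’s settings against a baseline before it is allowed to join. The sketch below is one hedged way to do that in Python. The property names are real Hadoop settings, but the expected values are our illustrative baseline rather than an official checklist, the function names are hypothetical, and it assumes you feed it one merged `<configuration>` document (in practice these properties are split across core-site.xml and hdfs-site.xml).

```python
import xml.etree.ElementTree as ET

# Illustrative baseline only; tune the expected values to your own policy.
REQUIRED_SETTINGS = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
    "hadoop.rpc.protection": "privacy",
    "dfs.encrypt.data.transfer": "true",
}


def load_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def validate(props, required=REQUIRED_SETTINGS):
    """Return a list of problems; an empty list means the node passes."""
    problems = []
    for name, expected in required.items():
        actual = props.get(name)
        if actual is None:
            problems.append(f"{name}: missing (expected {expected})")
        elif actual.lower() != expected:
            problems.append(f"{name}: {actual} (expected {expected})")
    return problems
```

A launch script could refuse to add the node whenever `validate(load_properties(text))` returns anything. The same baseline can be run against a “golden master” image, so drift is caught before it propagates.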
Audit and Logging: If you suspect someone has breached your cluster, can you detect it, or trace back to the root cause? You need an activity record, which is usually provided by event logging. A variety of add-on logging capabilities are available, both open source and commercial. Scribe and LogStash are open source tools which integrate into most big data environments, as do a number of commercial products. You can leverage the existing cluster to store logs, build an independent cluster, or even leverage other dedicated platforms like a SIEM or Splunk. That said, some logging options do not provide an auditor sufficient information to determine exactly what actions occurred. You will need to verify that your logs are capturing both the correct event types and user actions. A user ID and IP address are insufficient – you also need to know what queries were issued. Monitoring, filtering, and blocking: There are no built-in monitoring tools to detect misuse or block malicious queries. There isn’t even yet a consensus on what a malicious big data query looks like – aside from crappy MapReduce scripts written by bad programmers. We are just seeing the first viable releases of Hadoop activity monitoring tools. No longer the “after-market speed regulators” they once were, current tools typically embedded into a
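Returning to the Audit and Logging point above, that a user ID and IP address are not enough: one way to verify your logs capture what an auditor needs is to sample records and count which ones are missing required fields. This is a minimal sketch under stated assumptions. The field names (`timestamp`, `user`, `source_ip`, `action`, `query`) are hypothetical and will differ by logging tool, and it assumes each record has already been parsed into a Python dict.

```python
# Hypothetical field names; map these to whatever your logging tool emits.
REQUIRED_FIELDS = ("timestamp", "user", "source_ip", "action", "query")


def missing_audit_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in one record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def audit_gaps(records, required=REQUIRED_FIELDS):
    """Count how many records are missing each required field."""
    gaps = {}
    for record in records:
        for field in missing_audit_fields(record, required):
            gaps[field] = gaps.get(field, 0) + 1
    return gaps
```

If `audit_gaps(sample)` reports that most records lack `query`, the log tells you who connected but not what they did, which is the gap described above.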


Securing Hadoop: Architectural Security Issues

Now that we have sketched out the elements of a Hadoop cluster, and what one looks like, let’s talk threats to the databases. We want to consider both the database infrastructure itself, as well as the data under management. Given the complexity of a Hadoop cluster, the task is closer to securing an entire data center than a typical relational database. All the features that provide flexibility, scalability, performance, and openness create specific security challenges. The following are some specific threats to clustered databases.

Data access & ownership: Role-based access is central to most database security schemes, and NoSQL is no different. Relational and quasi-relational platforms include roles, groups, schemas, label security, and various other facilities for limiting user access to subsets of available data. Most big data environments now offer integration with identity stores, along with role-based facilities to divide up data access between groups of users. That said, authentication and authorization require cooperation between the application designer and the IT team managing the cluster. Leveraging existing Active Directory or LDAP services helps tremendously with defining user identities, and pre-defined roles may be available for limiting access to sensitive data.

Data at rest protection: The standard for protecting data at rest is encryption, which protects against attempts to access data outside established application interfaces. With Hadoop systems we worry about people stealing archives or directly reading files from disk. Encrypted files are protected against access by users without encryption keys. Replication effectively replaces backups for big data, but beware a rogue administrator or cloud service manager creating their own backups. Encryption limits how data can be copied from the cluster. This is a change from 2012, when the lack of suitable encryption was a serious issue.
Apache offers HDFS encryption as an option; this is a major advance, but remember that you can only encrypt HDFS, and you’ll need to fill the gaps with key management and key storage. Several commercial Hadoop vendors offer transparent encryption, and third parties have advanced the state of the art, with transparent encryption options for both HDFS and non-HDFS on-disk formats, especially coupled with parallel progress in key management.

Inter-node communication: Hadoop and the vast majority of distributions (Cassandra, MongoDB, Couchbase, etc.) don’t communicate securely by default – they use unencrypted RPC over TCP/IP. TLS and SSL are bundled in big data distributions, but not typically used between applications and databases – and almost never for inter-node communication. This leaves data in transit, and application queries, accessible for inspection and tampering.

Client interaction: Clients interact with resource managers and nodes. While gateway services can be created to load data, clients communicate directly with both resource managers and individual data nodes. Compromised clients can send malicious data or links to either service. This facilitates efficient communication but makes it difficult to protect nodes from clients, clients from nodes, and even name servers from nodes. Worse, the distribution of self-organizing nodes is a poor fit for security tools such as gateways, firewalls, and monitors. Many security tools are designed to require a choke-point or span port, which may not be available in a peer-to-peer mesh cluster.

Distributed nodes: One of the reasons big data makes sense is an old truism: “moving computation is cheaper than moving data”. Data is processed wherever resources are available, enabling massively parallel computation. Unfortunately this produces complicated environments with lots of attack surface.
With so many moving parts, it is difficult to verify consistency or security across a highly distributed cluster of (possibly heterogeneous) platforms. Patching, configuration management, node identity, and data at rest protection – and consistent deployment of each – are all issues.

Threat-response models

One or more security countermeasures are available to mitigate each threat identified above. The following diagram shows which specific options you have at your disposal to help you choose a ‘preventative’ security measure. We don’t have room to go into much detail on the tradeoffs of each response – each area really deserves its own paper. But we do want to mention a couple of areas where we have seen the most change since our original research four years ago.

If your goal is to protect session privacy – either between clients and data nodes, or for inter-node communication – Transport Layer Security (TLS) is your first choice. This was unheard of in 2012, but since then about 25% of the companies we spoke with have implemented SSL or TLS for inter-node communication – not just between applications and name servers. Transport encryption protects all communications from access or modification by attackers. Some firms instead use network segmentation and firewalls to ensure that attackers cannot access network traffic. This approach is less robust but much easier to implement. Some clusters were deployed to third-party cloud services, where virtualized network services make sniffing nearly impossible; these companies typically chose not to encrypt internal cluster communications.

Enforcing data usage is one of the areas where we have seen the most progress, thanks to database links into existing Active Directory and LDAP identity stores. This seems obvious now but was a rarity in 2012, when data architects were focused on scalability and getting basic analytics up and running.
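As a rough illustration of how directory groups can drive data access, here is a minimal Python sketch. It assumes group memberships have already been fetched from Active Directory or LDAP by some other means; the group names, role names, data set names, and the `roles_for` and `can_read` helpers are all hypothetical and are not part of any Hadoop API.

```python
# Hypothetical mapping from directory groups (as they might appear in
# Active Directory or LDAP) to cluster roles. All names are illustrative.
GROUP_TO_ROLE = {
    "cn=analysts,ou=groups,dc=example,dc=com": "analyst",
    "cn=fraud,ou=groups,dc=example,dc=com": "fraud_team",
}

# Hypothetical mapping from roles to the data sets each role may read.
ROLE_TO_DATASETS = {
    "analyst": {"web_logs", "sales_summary"},
    "fraud_team": {"transactions", "customer_pii"},
}


def roles_for(groups):
    """Translate a user's directory groups into cluster roles."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def can_read(groups, dataset):
    """Default deny: allow a read only if some role grants the data set."""
    return any(
        dataset in ROLE_TO_DATASETS.get(role, set())
        for role in roles_for(groups)
    )
```

The point of the sketch is the default-deny shape: a user with no recognized group gets no roles and therefore no data, and sensitive data sets are granted only to the roles that explicitly list them.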
Fortunately support for linking identity stores to Hadoop clusters has advanced considerably, making it much easier to leverage existing roles and management infrastructure.

But we also have other tools at our disposal. We don’t see it often, but a handful of organizations encrypt sensitive data elements at the application layer, so information is stored as encrypted elements. This way the application manages decryption and key management functions, and can offer additional controls over who can see which information. This is very secure, but must be designed in during application design and coded into the application from the beginning. Retrofitting application-layer encryption into an existing application and database stack is highly challenging at best, which is why we also see wide usage of masking and redaction technologies – from both enterprise Hadoop vendors and third-party security vendors. These technologies offer fine control over which data is displayed to which users, and can be easily built into existing clusters to
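
To make the masking and redaction idea above a little more concrete, here is a minimal Python sketch. It is not how any particular Hadoop vendor or third-party product works; the `mask_ssn` and `redact_record` helpers, the field names, and the role names are assumptions made purely for illustration.

```python
import re

# Matches a US Social Security number such as 123-45-6789 and captures
# the last four digits so they can be kept visible.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text):
    """Mask all but the last four digits of any SSN found in free text."""
    return SSN_PATTERN.sub(lambda m: "***-**-" + m.group(1), text)


def redact_record(record, role, sensitive_fields, privileged_roles):
    """Return a copy of the record with sensitive fields hidden from
    any role that is not explicitly privileged."""
    if role in privileged_roles:
        return dict(record)
    return {
        key: ("[REDACTED]" if key in sensitive_fields else value)
        for key, value in record.items()
    }
```

Because both helpers operate on data as it is read rather than as it is stored, this style of control can be layered onto an existing cluster without changing what is on disk, which is the main contrast with application-layer encryption.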


Securing Hadoop: Architecture and Composition

Our goal for this post is to succinctly outline what Hadoop (and most NoSQL) clusters look like, how they are assembled, and how they are used. This provides better understanding of the security challenges, and what sort of protections need to be leveraged to secure them. Developers and data scientists continue to stretch system performance and scalability, using customized combinations of open source and commercial products, so there is really no such thing as a ‘standard’ Hadoop deployment. With these considerations in mind, it is time to map out threats to the cluster.

NoSQL databases enable companies to collect, manage, and analyze incredibly large data sets. Thousands of firms are working on big data projects, from small startups to large enterprises. Since our original paper in 2012 the rate of adoption has only increased; platforms such as Hadoop, Cassandra, Mongo, and RIAK are now commonplace, with some firms supporting multiple installations. In just a couple years they went from “rogue IT” to “core systems”. Most firms recognized the value of “big data”, acknowledged these platforms are essential, and tasked IT teams with bringing them “under IT governance”. Most firms today are taking their first steps to retrofit security and governance controls onto Hadoop. Let’s dig into how all the pieces fit together:

Architecture and Data Flow

Hadoop has been wildly successful because it scales well, can be configured to handle a wide variety of use cases, and is very inexpensive compared to relational and data warehouse alternatives. Which is all another way of saying it’s cheap, fast, and flexible. To show why and how it scales, let’s take a look at a Hadoop cluster architecture:

There are several things to note here. The architecture promotes scaling and performance. It provides parallel processing, and additional nodes provide ‘horizontal’ scalability.
This architecture is also inherently multi-tenant, supporting multiple applications across one or more file groups. But there are a lot of moving parts; each node communicates with its peers to ensure that data is properly replicated, nodes are on-line and functional, storage is optimized, and application requests are being processed. We’ll dig into specific threats to Hadoop clusters later in this series.

Hadoop Stack

To appreciate Hadoop’s flexibility, you need to understand that a cluster can be fully customized. It is useful to think about the Hadoop framework as a ‘stack’, much like a LAMP stack, but much less standardized. While Pig and Hive are commonly used, the ability to mix and match components makes deployments much more diverse. For example, Sqoop and Yarn are alternative data access services. You can select different big data environments to support columnar, graph, document, XML, or multidimensional data. And over the last couple years MapReduce has largely given way to SQL query engines – with Spark, Drill, Impala, and Hive all accommodating increasing use of SQL-style queries. This modularity offers great flexibility to assemble and tailor clusters to behave and perform exactly as desired. But it also makes security more difficult – each option brings its own security options and issues.

The beauty is that you can set up a cluster to satisfy your usability, scalability, and performance goals. You can tailor it to specific types of data, or add modules to facilitate analysis of certain data sets. But that flexibility brings complexity. Each module runs a specific version of code, has its own configuration, and may require independent authentication to work in the cluster. Many pieces must work in tandem here to process data, so each requires its own security review.

Some of you reading this are already familiar with the architecture and component stack of a Hadoop cluster. You may be asking, “Why are we going through these basics?”
To understand threats and appropriate responses, you need to first understand how all the pieces of the cluster work together. Each component interface is a trust relationship, and each relationship is a target. Each component offers attackers a specific set of potential exploits, and defenders have a corresponding set of options for attack detection and prevention. Understanding architecture and cluster composition is the first step to putting together your security strategy.

Our next post will present several strategies used to secure big data. Each model includes basic benefits and requires supplementary security tools. After selecting a strategy, you can put together a collection of security controls to meet your objectives.
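
One way to act on the idea that every component interface is a trust relationship is to write the interfaces down and review each one. The Python sketch below is only an illustration of that bookkeeping: the `Interface` record, the `review_gaps` helper, and the sample values for authentication and encryption are hypothetical, and say nothing about how any real cluster is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """One trust relationship between two cluster components."""
    source: str
    target: str
    authenticated: bool  # does the target verify who is calling?
    encrypted: bool      # is the connection protected in transit?


def review_gaps(interfaces):
    """List each interface that lacks authentication or encryption,
    along with a description of what is missing."""
    findings = []
    for iface in interfaces:
        gaps = []
        if not iface.authenticated:
            gaps.append("no authentication")
        if not iface.encrypted:
            gaps.append("no transport encryption")
        if gaps:
            findings.append((f"{iface.source} -> {iface.target}", gaps))
    return findings


# Illustrative inventory; the settings shown are made up.
inventory = [
    Interface("Hive", "HDFS", authenticated=True, encrypted=True),
    Interface("client", "YARN", authenticated=True, encrypted=False),
    Interface("data node", "data node", authenticated=False, encrypted=False),
]
```

Calling `review_gaps(inventory)` returns only the interfaces that still need attention, which gives a simple checklist to revisit whenever a module is added to the stack.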


Securing Hadoop: Security Recommendations for NoSQL platforms [New Series]

It’s been three and a half years since we published our research paper on Securing Big Data. That research paper has been one of the more popular papers we’ve ever written. And it’s no wonder, as NoSQL adoption was faster than we expected; we see hundreds of new projects popping up, leveraging the scale, analytics, and low cost of these platforms. It’s not hyperbole to claim it has revolutionized the database market over the last 5 years, and community support behind these platforms – and especially Hadoop – is staggering.

At the time we wrote the last paper, security for Hadoop – much less the other platforms – was something of a barren wasteland. They did not include basic controls for data protection, most third-party tools could not scale along with NoSQL and thus were of little use to developers, and leaders of NoSQL firms directed resources to improving performance and scalability, not security. Heck, in 2012 the version of Hadoop I evaluated did not even require an administrative password!

But when it comes to NoSQL security, and Hadoop specifically, things have changed dramatically. As we advise clients on how to implement security controls, there are many new options to consider. And while there remain some gaps in monitoring and assessment capabilities, Hadoop has (mostly) reached security parity with the relational platforms of old. We can’t call it a barren wasteland any longer, so to accurately advise people on approaches and tools to leverage, we can no longer refer them back to that original paper.

So we are kicking off a new research series to refresh this paper. Most of the content will be new. And this time we will do this a little bit differently than last time. First, we are going to provide less background on what makes NoSQL different from relational databases, as most people in IT are now comfortable with the architectural and functional distinctions between the two.
Second, most of our recommendations will still apply to NoSQL platforms in general, but this research will be more focused on Hadoop, as we get a majority of questions on Hadoop security despite dozens of alternatives. Finally, as there are lots more aspects to talk about, we’ll weave preventative and detective controls into a more operational (i.e., day-to-day management) model for both data and database infrastructure.

Here is how we are laying out the series:

Hadoop Architecture and Assembly: The goal with this post is to succinctly outline what Hadoop and similar styles of NoSQL clusters look like, how they are assembled, and how they are used. In this light we get a better idea of the security challenges and what sort of protections need to be leveraged. As developers and data scientists stretch systems for performance and scalability, using custom assemblages of open source and commercial products, there really is no such thing as a standard Hadoop deployment. So with these considerations in mind we will map out threats to the cluster.

Use Cases & Security Architectures: This post will discuss the strategic considerations for deploying security for big data. Depending upon which model you choose, you change where certain types of threats are addressed, and consequently what tools you will rely upon to provide security. Or stated another way, the security model you choose will dictate what security technologies you need to prevent and detect threats. There are several approaches that organizations take to secure Hadoop and other NoSQL clusters. These range from securing the network around the cluster and identity management, to maintaining security controls on each node within the cluster, or even taking a data-centric approach to security. We’ll go over the major trends we see today, and discuss the advantages and pitfalls of each approach.

Building Security Into the Cluster: Here is where we discuss how all of the pieces fit together.
There are many security controls available, and each addresses a specific threat vector an attacker may employ. We’ll focus on security controls you want to build into your cluster from the start: identity, authorization, transport layer security, application security, and data encryption. This will focus on the base security controls that allow you to define how the cluster should be used from a security standpoint.

Operational Security: Here we will focus on the day-to-day controls for monitoring ongoing security and user behavior: aspects like configuration management, patching, logging, monitoring, and node validation. We’ll even discuss integrating a DevOps approach to cluster administration to improve speed and consistency.

Commercial Hadoop and NoSQL variants: Hadoop is the dominant flavor of ‘big data’ in use today. In this section we will discuss what the commercial Hadoop platform vendors are doing to promote security for their customers with a blend of open source, home-grown, and third-party security product support. There is no reason to roll your own security out of necessity, as commercial variants often add on their own products or provide bundles for you. Each offers unique capabilities and each has a vision of what their customers should focus on, so we will cover some of the current offerings. We will also offer some advice on the application of security to non-Hadoop platforms. While Hadoop is the most commonly used platform, there are specialized flavors of NoSQL that are eminently appropriate for certain business challenges and are in wide use. Some even use HDFS or other Hadoop components that allow the use of the same security controls across different clusters. We will close out this section discussing where the security controls we have already discussed can be deployed in non-Hadoop environments where appropriate.
As with our original paper, this is not intended to be an exhaustive look at all potential security options, but to help the IT and development teams who run these clusters get basic security controls in place. Up next, Hadoop Architecture and Assembly.


Building Security Into DevOps [New Paper]

We are pleased to announce the launch of our latest research paper, on Building Security Into DevOps. We expect DevOps to fundamentally change the practice of software development over the next decade, and with it how we handle application security. From the report:

The following graphic reflects our conversations, with development and security practitioners, on where they are successfully deploying security testing tools in a DevOps framework. The callouts map the types of tests being conducted at specific phases of CI & CD. Keep in mind that it’s early days for DevOps and the orchestration of security tools – basically what works where – is far from settled. More importantly, many security tools were built before these concepts of rapid and automated deployment existed; older products are too slow, some could not focus their tests on new code, and still others did not offer API support. Which is another way of saying not all tools are created equal, so you’ll need to evaluate for both performance and API integration capabilities as well as code coverage capabilities.

A special thanks to Veracode for licensing this content. As usual everything was written completely independently, using our Totally Transparent Research process. It is only thanks to licenses that we are able to give this research away free. You can download a free copy of the white paper in our research library, or grab a copy directly: Building Security Into DevOps (PDF).


Summary: Surviving the Holidays

With the holidays upon us, and the weather in Phoenix at that optimal temperature of 50F warmer than wherever people come from, the migration has begun. The snowbirds are back in Phoenix. And all my relatives want to visit. All pretty much at the same time. As I write this I am recovering from 20 contiguous days of four different groups of friends and relatives staying at my home. Overlapping, I might add. And it was glorious – it was great to see each and every one of them – but I heaved a great sigh of relief when the last party got onto a plane and flew back home. I think I have baked, roasted, toasted, and barbecued every type of food I know how to cook. I’ve been a tour guide across the state – twice over – showing off every interesting place within a three-hour drive. Today’s summary is a toast to all of you who survived Thanksgiving – I am thankful for many things, and I am also thankful this holiday is only once a year. Paul Ford has a thoughtful piece called I Dreamed of a Perfect Database. It nails down the evolutionary track of many of us, who have long straddled the database and software development worlds. As our needs changed there were grass-roots projects to make new types of databases – that did, well, whatever the heck we needed them to do. Cheaper and faster. More data, or maybe more types of data, or maybe a single specific type of data with functions optimized for it. There were some that performed analytics or cube analysis, and some that offered lightning fast in-memory lookups or graph data. We got self-healing, self-organizing, self-indexing clouds of data nodes, with whatever the heck we wanted sitting on top of them. When the Internet boom hit, Oracle was the database of choice. During this last cloudburst we got 250 flavors of NoSQL. But Paul’s dream is getting closer to reality. When you assemble Hadoop with a stack of add on modules, namely Apache Spark, you get pretty close. Want SQL? OK, you can have it. Or MapReduce. 
Deep analytics or memory-resident lookups. You want it, you can have it. The point is that the demands on databases will always be in flux. Performance and scalability have always been core needs, but how we want to use data will continue to morph. The current generation of databases, typically based on Hadoop, are veritable Swiss Army knives, to be formed and customized as we need them. There has never been a better time to be a database developer!

If you run a bug bounty program you know there is a lot more to it than most people consider when they start. Randy Westergren’s post on his experience with the United Airlines Bug Bounty Program offers some insight into what can happen. For example, when multiple researchers find the same critical flaw, the researchers who do not get paid can – and likely will – go public. Sure, this is bad behavior by the researcher. Your legal team can try to stop it, but you need to plan for this situation before it comes up. Second, it is amazing to me what in-house developers consider a suitably fast release date for vulnerability fixes; it is often totally unacceptable to the research community. Both are developers by nature, but to one party three months is lightning fast. The other considers that criminally dangerous. You’ll need to set expectations going in. United Airlines was communicative, but in today’s world six months to patch is an eternity. Virtual patching techniques, API gateways, and Continuous Deployment techniques allow many organizations to deal with these issues far more quickly. A bug bounty program is a great way to leverage the community of experts to help you find flaws your in-house team might never discover, but you need to have this effort fully planned out before you start.

On to the Summary:

Webcasts, Podcasts, Outside Writing, and Conferences

  • Rich quoted on automotive ‘cyber security’
  • Gunnar’s Security Champions Guide to Web Application via Akamai.
Other Securosis Posts

  • Incite 12/2/2015: Grateful Habits.
  • Summary: Boy in the Bubble.
  • Cloud Security Best Practice: Limit Blast Radius with Multiple Accounts.
  • The Blame Game.
  • Summary: Refurbished.
  • Critical Security Capabilities for Cloud Providers.
  • Massive, Very Bad Java 0-Day (and, Sigh, Oracle).

Favorite Outside Posts

  • Adrian Lane: Microsoft’s New Threat Modeling Tool. This post is a couple weeks old but I forgot to mention it. Microsoft added tools to their threat modeling approach to catch errors earlier in the process. We talk about the need to find vulnerabilities earlier in the process, and MS is helping to do just that.
  • Mike Rothman: Think Security is Expensive, Insecurity Costs Much More: It’s a hard thing to justify spending on security; this article makes the point that you should do it right the first time. And I’ll even give Tony a pass for mentioning Ponemon. The general point is good.
  • Chris Pepper: Man Uses LifeLock To Track Ex-Wife; Company Didn’t Care

Research Reports and Presentations

  • Pragmatic Security for Cloud and Hybrid Networks.
  • EMV Migration and the Changing Payments Landscape.
  • Network-based Threat Detection.
  • Applied Threat Intelligence.
  • Endpoint Defense: Essential Practices.
  • Cracking the Confusion: Encryption and Tokenization for Data Centers, Servers, and Applications.
  • Security and Privacy on the Encrypted Network.
  • Monitoring the Hybrid Cloud: Evolving to the CloudSOC.
  • Security Best Practices for Amazon Web Services.
  • Securing Enterprise Applications.

Top News and Posts

  • Mozilla’s Improving Revocation: OCSP Must-Staple and Short-lived Certificates
  • Mark Cuban slams SEC for blocking email privacy reform effort
  • Java 0-day shocker
  • VTech Hack
  • Seven Tips for Personal Online Security
  • DHS Giving Firms Free Penetration Tests
  • Worldwide Cryptography Product Survey
  • Criminals steal $4 million in cash with novel ‘reverse ATM’ attack


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.