Securosis

Research

Securing Hadoop: Enterprise Security For NoSQL

Hadoop is now enterprise software. There, I said it. I know lots of readers in the IT space still look at Hadoop as an interloper, or worse, part of the rogue IT problem. But better than 50% of the enterprises we spoke with are running Hadoop somewhere within the organization. A small percentage are running Mongo, Cassandra or Riak in parallel with Hadoop, for specific projects. Discussions on what ‘big data’ is, if it is a viable technology, or even if open source can be considered ‘enterprise software’ are long past. What began as proof of concept projects has matured into critical application services. And with that change, IT teams are now tasked with getting a handle on Hadoop security, to which they respond with questions like “How do I secure Hadoop?” and “How do I map existing data governance policies to NoSQL databases?”

Security vendors will tell you both attacks on corporate IT systems and data breaches are prevalent, so with gobs of data under management, Hadoop provides a tempting target for ‘Hackers’. All of which is true, but as of today, there really have not been major data breaches where Hadoop played a part in the story. As such this sort of ‘FUD’ carries little weight with IT operations. But make no mistake, security is a requirement! As sensitive information, customer data, medical histories, intellectual property and just about every type of data used in enterprise computing is now commonly used in Hadoop clusters, the ‘C’ word (i.e.: Compliance) has become part of their daily vocabulary. One of the big changes we’ve seen in the last couple of years is Hadoop becoming business critical infrastructure, and another – directly caused by the first – is IT being tasked with bringing existing clusters in line with enterprise compliance requirements. This is somewhat challenging as a fresh install of Hadoop suffers all the same weak points traditional IT systems have, so it takes work to get security set up and the reports created.
For clusters that are already up and running, you need to choose technologies and a deployment roadmap that do not upset ongoing operations. On top of that, there is the additional challenge that the in-house tools you use to secure things like SAP, or the SIEM infrastructure you use for compliance reporting, may not be suitable when it comes to NoSQL.

Building security into the cluster

The number of security solutions that are compatible with – if not outright built for – Hadoop is the biggest change since 2012. All of the major security pillars – authentication, authorization, encryption, key management and configuration management – are covered and the tools are viable. Most of the advancements have come from the firms that provide enterprise distributions of Hadoop. They have built, and in many cases contributed back to the open source community, security tools that accomplish the basics of cluster security. When you look at the threat-response models introduced in the previous two posts, every compensating security control is now available. Better still, they have done a lot of the integration legwork for services like Kerberos, taking a lot of the pain out of deployments. Here are some of the components and functions that were not available – or not viable – in 2012.

LDAP/AD Integration – Technically AD and LDAP integration were available in 2012, but these services have both been advanced, and are easier to integrate than before. In fact, this area has received the most attention, and integration is as simple as a setup wizard with some of the commercial platforms. The benefits are obvious, as firms can leverage existing access and authorization schemes, and defer user and role management to external sources.

Apache Ranger – Ranger is one of the more interesting technologies to become available, and it closes the biggest gap: module security policies and configuration management.
It provides a tool for cluster administrators to set policies for different modules like Hive, Kafka, HBase or Yarn. What’s more, those policies are in context to the module, so it sets policies for files and directories when in HDFS, SQL policies when in Hive, and so on. This helps with data governance and compliance as administrators set how a cluster should be used, or how data is to be accessed, in ways that simple role-based access controls cannot.

Apache Knox – You can think of Knox in its simplest form as a Hadoop firewall. More correctly, it is an API gateway. It handles HTTP and RESTful requests, enforcing authentication and usage policies on inbound requests, and blocking everything else. Knox can be used as a ‘virtual moat’ around a cluster, or used with network segmentation to further reduce network attack surface.

Apache Atlas – Atlas is a proposed open source governance framework for Hadoop. It lets you annotate files and tables, set relationships between data sets, and even import metadata from other sources. These features are helpful for reporting, data discovery and for controlling access. Atlas is new and we expect it to see significant maturing in coming years, but for now it offers some valuable tools for basic data governance and reporting.

Apache Ambari – Ambari is a facility for provisioning and managing Hadoop clusters. It helps admins set configurations and propagate changes to the entire cluster. During our interviews we only spoke to two firms using this capability, but we received positive feedback from both. Additionally we spoke with a handful of companies who had written their own configuration and launch scripts, with pre-deployment validation checks, usually for cloud and virtual machine deployments. This latter approach was more time consuming to create, but offered greater capabilities, with each function orchestrated within IT operational processes (e.g.: continuous deployment, failure recovery, DevOps).
For most, Ambari’s ability to get you up and running quickly and provide consistent cluster management is a big win and a suitable choice.

Monitoring – Hive, PIQL, Impala, Spark SQL and similar modules offer SQL or pseudo-SQL syntax. This means that the activity monitoring, dynamic masking, redaction and tokenization technologies originally developed for


The Summary is dead. Long live the Summary!

As part of our changes at Securosis this year, it’s time to say goodbye to the old Friday Summary, and hello to the new one. Adrian and I started the Summary way back before Mike joined the company, as our own version of his weekly Security Incite. Our objective was to review the highlights of the week, both our work and things we found on the Internet, typically with an introduction based on events in our personal lives. As we look at growing and changing our focus this year, it’s time for a different format. Mike’s Incite (usually released on Wednesdays) does a great job highlighting important security stories, or whatever we find interesting. The Summary has always overlapped a bit. We also developed a tendency to overstuff it with links.

Moving forward we are switching gears, and the Summary will now focus on our main coverage areas: cloud, DevOps, and automation security. The new sections will be more tightly curated and prioritized, to better fit a weekly newsletter format for folks who don’t have time to keep up on everything. We plan to keep the Incite our source for general security industry analysis, with the revised Summary targeting our new focus areas.

We are also changing our email list provider from Aweber to MailChimp due to an ongoing technical issue. As part of that switch we will soon offer more email subscription options, which we used to have. You can pick the daily digest of all our posts, the weekly Incite, and/or the weekly Summary. If you want to subscribe directly to the Friday Summary only, just click here. If you have any feedback, as always please feel free to leave a comment or email us at info@securosis.com.

And don’t forget: The EIGHTH Annual Disaster Recovery Breakfast: Clouds Ahead.

Top Posts for the Week

We missed it when it was released, but Google now has limited management plane logging support.
It still isn’t up to CloudTrail, and it’s still in beta, but this is one of the most critical security capabilities enterprises need from a cloud provider. Rumor is Microsoft also has it in beta.

This is another good example of using AWS capabilities for security functionality. This is the sort of thing that is built into most WAFs (including cloud WAFs) but we like this post more for showing how you can automate and wire things together than for its particular use case. How to Configure Rate-Based Blacklisting with AWS WAF and AWS Lambda

A good non-security perspective on Continuous Delivery. We see a lot of organizations throw the term (along with DevOps) around without focusing on some of the foundational things you need to make it work. Are you ready for Continuous Delivery?

GitHub posted a good incident report. This can serve as a decent model for both security and non-security incidents: January 28th Incident Report

Node is really popular, but still gives us the security willies at times. This good piece lays out some of the issues: The battle for Node.js security has only begun

CloudFormation and other immutable infrastructure tools often have gaps, especially when new products are released. Here’s how to use Python to deal with them, using a security example: Customizing CloudFormation with Python

Props to Amazon for this one: AWS’ exhaustive terms of service covers zombie outbreaks

Tool of the Week

This is a new section highlighting a cloud, DevOps, or security tool we think you should take a look at. We still struggle to keep track of all the interesting tools that can help us; if you have submissions please email them to info@securosis.com.

We are still looking at how we want to handle logging as we rearchitect securosis.com. Our friend Matt J. recommended I look at the fluentd open source log collector. It looks like a good replacement for Logstash, which is pretty heavy and can be hard to configure.
You can pump everything into fluentd in an instance, container, or auto-scaled cluster if you need it. It can perform analysis right there, plus you can send logs down the chain to things like ElasticSearch/Kibana, AWS Kinesis, or different kinds of storage. What I really like is how it normalizes data into JSON as much as possible, which is great because that’s how we are structuring all our Trinity application logs.

Our plan is to use fluentd with some basic rules for securosis.com, pushing the logs into AWS hosted ElasticSearch (to reduce management overhead), and then Kibana to roll our own SIEM. We see a bunch of clients following a similar approach. This also fits well into cloud logging architectures where you collect the logs locally and only send alerts back to the SOC. Especially with S3 support, that can really reduce overall costs.

Securosis Blog Posts this Week

Securing Hadoop: Operational Security Issues.

Other Securosis News and Quotes

Cloud Security: Software Defined. Event Driven. Awesome.

We are posting our RSA Conference Guide on the RSA Conference blog – here are the latest posts:
The Securosis Guide to the RSA Conference 2016: The FUD Awakens!
Securosis Guide: Threat Intelligence & Bothan Spies
Securosis Guide: R2DevOps
Securosis Guide: Escape from Cloud City

Training and Events

We are giving multiple presentations at the RSA Conference.
Rich and Mike are presenting Cloud Security Accountability Tour.
Rich is co-presenting with Bill Shinn of AWS: Aspirin as a Service: Using the Cloud to Cure Security Headaches.
David Mortman is presenting:
Learning from Unicorns While Living with Legacy
Docker: Containing the Security Excitement
Docker: Containing the Security Excitement (Focus-On)
Leveraging Analytics for Data Protection Decisions

Rich is presenting on Rugged DevOps at Scale at DevOps Connect the Monday of RSAC.

We are running two classes at Black Hat USA.
Cloud Security Hands-On (CCSK-Plus)
Advanced Cloud Security and Applied SecDevOps


Securing Hadoop: Operational Security Issues

Beyond the architectural security issues endemic to Hadoop and NoSQL platforms discussed in the last post, IT teams expect some common security processes and supporting tools familiar from other data management platforms. That includes “turning the dials” on configuration management, vulnerability assessment, and maintaining patch levels across a complex assembly of supporting modules. The day-to-day processes IT managers follow to ensure typical application platforms are properly configured have evolved over years, drawing on core platform capabilities, community contributions, and commercial third-party support to fill in gaps. These include best practices, checklists, and validation tools to verify that admin rights are sufficiently tight, and that nodes are patched against known and perhaps even unknown vulnerabilities. Hadoop security has come a long way in just a few years, but it still lacks maturity in day-to-day operational security offerings, and it is here that we find most firms continue to struggle.

The following is an overview of the most common threats to data management systems, where operational controls offer preventative security measures to close off most common attacks. Again we will discuss the challenges, then map them to mitigation options.

Authentication and authorization: Identity and authentication are central to any security effort – without them we cannot determine who should get access to data. Fortunately the greatest gains in NoSQL security have been in identity and access management. This is largely thanks to providers of enterprise Hadoop distributions, who have performed much of the integration and setup work. We have evolved from simple in-database authentication and crude platform identity management to much better integrated LDAP, Active Directory, Kerberos, and X.509 based authentication options.
Leveraging those capabilities we can use established roles for authorization mapping, and sometimes extend to fine-grained authorization services with Apache Sentry, or custom authorization mapping controlled from within the calling application rather than the database.

Administrative data access: Most organizations have platform administrators and NoSQL database administrators, both with access to the cluster’s files. To provide separation of duties – to ensure administrators cannot view content – a facility is needed to segregate administrative roles and keep unwanted access to a minimum. Direct access to files or data is commonly addressed through a combination of role-based authorization, access control lists, file permissions, and segregation of administrative roles – such as with separate administrative accounts, bearing different roles and credentials. This provides basic protection, but cannot protect archived or snapshotted content. Stronger security requires a combination of data encryption and key management services, with unique keys for each application or cluster. This prevents different tenants (applications) in a shared cluster from viewing each other’s data.

Configuration and Patch Management: With a cluster of servers, which may have hundreds of nodes, it is common to run different configurations and patch levels at one time. As nodes are added we see configuration skew. Keeping track of revisions is difficult. Existing configuration management tools can cover the underlying platforms, and HDFS Federation will help with cluster management, but they both leave a lot to be desired – including issuing encryption keys, avoiding ad hoc configuration changes, ensuring file permissions are set correctly, and ensuring TLS is correctly configured.
NoSQL systems do not yet have counterparts for the configuration management tools available for relational platforms, and even commercial Hadoop distributions offer scant advice on recommended configurations and pre-deployment checklists. But administrators still need to ensure configuration scripts, patches, and open source code revisions are consistent. So we see NoSQL databases deployed on virtual servers and cloud instances, with home-grown pre-deployment scripts. Alternatively a “golden master” node may embody extensive configuration and validation, propagated automatically to new nodes before they can be added into the cluster.

Software Bundles: The application and Hadoop stacks are assembled from many different components. Underlying platforms and file systems also vary – with their own configuration settings, ownership rights, and patch levels. We see organizations increasingly using source code control systems to handle open source version management and application stack management. Container technologies also help developers bundle up consistent application deployments.

Authentication of applications and nodes: If an attacker can add a new node they control to the cluster, they can exfiltrate data from the cluster. To authenticate nodes (rather than users) before they can join a cluster, most firms we spoke with either employ X.509 certificates or Kerberos. Both can authenticate users as well, but we draw this distinction to underscore the threat of rogue applications or nodes being added to the cluster. Deployment of these services brings risks as well. For example if a Kerberos keytab file can be accessed or duplicated – perhaps using credentials extracted from virtual image files or snapshots – a node’s identity can be forged. Certificate-based identity options implicitly complicate setup and deployment, but properly deployed they can provide strong authentication and stronger security.
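
The keytab risk described above is the kind of thing a home-grown pre-deployment validation script can check. The following is a minimal sketch, not taken from any Hadoop distribution: it assumes keytabs are ordinary files that should be accessible only by the owning service account, and the function names (`keytab_problems`, `check_keytabs`) and any paths you pass in are hypothetical.

```python
import os
import stat


def keytab_problems(path):
    """Return a list of permission problems found for one keytab file."""
    problems = []
    info = os.stat(path)
    if not stat.S_ISREG(info.st_mode):
        problems.append("not a regular file")
    # Any group/other permission bit means another account could copy the key.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("accessible by group or other users")
    return problems


def check_keytabs(paths):
    """Map each keytab path to its problems, keeping only paths with findings."""
    findings = {}
    for path in paths:
        problems = keytab_problems(path)
        if problems:
            findings[path] = problems
    return findings
```

A deployment script could call `check_keytabs` with the keytab paths it knows about and refuse to add the node to the cluster if the result is non-empty. This only covers file permissions on a live node; it does nothing about keytabs baked into image files or snapshots, which still need encryption or separate handling.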
Audit and Logging: If you suspect someone has breached your cluster, can you detect it, or trace back to the root cause? You need an activity record, which is usually provided by event logging. A variety of add-on logging capabilities are available, both open source and commercial. Scribe and LogStash are open source tools which integrate into most big data environments, as do a number of commercial products. You can leverage the existing cluster to store logs, build an independent cluster, or even leverage other dedicated platforms like a SIEM or Splunk. That said, some logging options do not provide an auditor sufficient information to determine exactly what actions occurred. You will need to verify that your logs are capturing both the correct event types and user actions. A user ID and IP address are insufficient – you also need to know what queries were issued. Monitoring, filtering, and blocking: There are no built-in monitoring tools to detect misuse or block malicious queries. There isn’t even yet a consensus on what a malicious big data query looks like – aside from crappy MapReduce scripts written by bad programmers. We are just seeing the first viable releases of Hadoop activity monitoring tools. No longer the “after-market speed regulators” they once were, current tools typically embedded into a


Summary: Die Blah, Die!!

Rich here. I was a little burnt out when the start of this year rolled around. Not “security burnout” – just one of the regular downs that hit everyone in life from time to time. Some of it was due to our weird year with the company, a bunch of it was due to travel and impending deadlines, plus there was all the extra stress of trying to train for a marathon while injured (and working a ton). Oh yeah, and I have kids. Two of whom are in school. With homework. And I thought being a paramedic or infosec professional was stressful?!? Even finishing the marathon (did I mention that enough?) didn’t pull me out of my funk. Even starting the planning for Securosis 2.0 only mildly engaged my enthusiasm. I wasn’t depressed by any means – my life is too awesome for that – but I think many of you know what I mean. Just a… temporary lack of motivation. But last week it all faded away. All it took was a break from airplanes, putting some new tech skills into practice, and rebuilding the entire company. A break from work travel is kind of like the reverse of a vacation. The best vacations are a month long – a week to clear the head, two weeks to enjoy the vacation, a week to let the real world back in. A gap in work travel does the same thing, except instead of enjoying vacation you get to enjoy hitting deadlines. It’s kind of the same. Then I spent time on a pet technical project and built the code to show how event-driven security can work. I had to re-learn Python while learning two new Amazon services. It was a cool challenge, and rewarding to build something that worked like I hoped. At the same time I was picking up other new skills for my other RSA Conference demos. The best part was starting to rebuild the company itself. We’re pretty serious about calling this our “Securosis 2.0 pivot”. The past couple weeks we have been planning the structure and products, building out initial collateral, and redesigning the website (don’t worry – with our design firm). 
I’ve been working with our contractors to build new infrastructure, evaluating new products and platforms, and firming up some partnerships. Not alone – Mike and Adrian are also hard at work – but I think my pieces are a lot more fun because I get the technical parts. It’s one thing to build a demo or write a technical blog post, but it’s totally different to be building your future. And that was the final nail in the blah’s coffin. A month home. Learning new technical skills to build new things. Rebuilding the company to redefine my future. It turns out all that is a pretty motivating combination, especially with some good beer and workouts in the mix, and another trip to see Star Wars (3D IMAX with the kids this time). Now the real challenge: seeing if it can survive the homeowner’s association meeting I need to attend tonight. If I can make it through that, I can survive anything. Photo credit: Blah from pinterest And now on to the Summary: Webcasts, Podcasts, Outside Writing, and Conferences Adrian quoted in CSO Online: Credit card security has no silver bullet Mort quoted on container security: Containers: Security Minefield – or Channel Goldmine? Me on ridiculous travel security: Podcast 492: How to travel like an international superspy A piece I wrote over at TidBITS on government, encryption, and back doors. Also relevant to the Securosis audience: Why Apple Defends Encryption. Securosis Posts Incite 2/3/2016: Courage. Event-Driven AWS Security: A Practical Example. Securing Hadoop: Architectural Security Issues. Securing Hadoop: Architecture and Composition. Securing Hadoop: Security Recommendations for NoSQL platforms [New Series]. The EIGHTH Annual Disaster Recovery Breakfast: Clouds Ahead. Security is Changing. So is Securosis. Incite 1/20/2016 – Ch-ch-ch-ch-changes. Research Reports and Presentations Threat Detection Evolution. Pragmatic Security for Cloud and Hybrid Networks. EMV Migration and the Changing Payments Landscape. 
Network-based Threat Detection. Applied Threat Intelligence. Endpoint Defense: Essential Practices. Cracking the Confusion: Encryption and Tokenization for Data Centers, Servers, and Applications. Security and Privacy on the Encrypted Network. Monitoring the Hybrid Cloud: Evolving to the CloudSOC. Security Best Practices for Amazon Web Services. Top News and Posts Why lost phones keep pointing at this Atlanta couple’s house This is a really important case: Security firm sued for filing “woefully inadequate” forensics report Chromodo browser disables key web security. Note to security vendors: put your customers first, not marketing. Severe and unpatched eBay vulnerability allows attackers to distribute malware. Not going to be patched, seriously? Software Security Ideas Ahead of Their Time New Technologies Give Government Ample Means to Track Suspects, Study Finds Friendly Fire. This is a really great post on the role of red teams. Congress to investigate US involvement in Juniper’s backdoor. Blog Comment of the Week This week’s best comment goes to Andy, in response to Event-Driven AWS Security: A Practical Example. Cool post. We could consider the above as a solution to an out of band modification of a security group. If the creation and modification of all security groups is via Cloudformation scripts, a DevOps SDLC could be implemented to ensure only approved changes are pushed through in the first place. Another question is how does the above trigger know the modification is unwanted?! It’s a wee bugbear I have with AWS that there’s not currently a mechanism to reference rule functions or change controls. My response: I actually have some techniques to handle out of band approvals, but it gets more advanced pretty quickly (plan is to throw some of them into Trinity once we start letting anyone use it). 
One quick example… build a workflow that kicks off a notification for approval, then the approval modifies something in Dynamo or S3, then that is one of the conditionals to check. E.g. have your change management system save down a token in S3 in a different account, then the Lambda


Incite 2/3/2016: Courage

A few weeks ago I spoke about dealing with the inevitable changes of life and setting sail on the SS Uncertainty to whatever is next. It’s very easy to talk about changes and moving forward, but it’s actually pretty hard to do. When moving through a transformation, you not only have to accept the great unknown of the future, but you also need to grapple with what society expects you to do. We’ve all been programmed since a very early age to adhere to cultural norms or suffer the consequences. Those consequences may be minor, like having your friends and family think you’re an idiot. Or decisions could result in very major consequences, like being ostracized from your community, or even death in some areas of the world. In my culture in the US, it’s expected that a majority of people should meander through their lives; with their 2.2 kids, their dog, and their white picket fence, which is great for some folks. But when you don’t fit into that very easy and simple box, moving forward along a less conventional path requires significant courage. I recently went skiing for the first time in about 20 years. Being a ski n00b, I invested in two half-day lessons – it would have been inconvenient to ski right off the mountain. The first instructor was an interesting guy in his 60’s, a US Air Force helicopter pilot who retired and has been teaching skiing for the past 25 years. His seemingly conventional path worked for him – he seemed very happy, especially with the artificial knee that allowed him to ski a bit more aggressively. But my instructor on the second day was very interesting. We got a chance to chat quite a bit on the lifts, and I learned that a few years ago he was studying to be a physician’s assistant. He started as an orderly in a hospital and climbed the ranks until it made sense for him to go to school and get a more formal education. So he took his tests and applied and got into a few programs. Then he didn’t go. Something didn’t feel right. 
It wasn’t the amount of work – he’d been working since he was little. It wasn’t really fear – he knew he could do the job. It was that he didn’t have passion for a medical career. He was passionate about skiing. He’d been teaching since he was 16, and that’s what he loved to do. So he sold a bunch of his stuff, minimized his lifestyle, and has been teaching skiing for the past 7 years. He said initially his Mom was pretty hard on him about the decision. But as she (and the rest of his family) realized how happy and fulfilled he is, they became OK with his unconventional path.

Now that is courage. But he said something to me as we were about to unload from the lift for the last run of the day. “Mike, this isn’t work for me. I happen to get paid, but I just love teaching and skiing, so it doesn’t feel like a job.” It was inspiring because we all have days when we know we aren’t doing what we’re passionate about. If there are too many of those days, it’s time to make changes.

Changes require courage, especially if the path you want to follow doesn’t fit into the typical playbook. But it’s your life, not theirs. So climb aboard the SS Uncertainty (with me) and embark on a wild and strange adventure. We get a short amount of time on this Earth – make the most of it. I know I’m trying to do just that.

Editor’s note: despite Mike’s post on courage, he declined my invitation to go ski Devil’s Crotch when we are out in Colorado. Just saying. -rich

–Mike

Photo credit: “Courage” from bfick

It’s that time of year again! The 8th annual Disaster Recovery Breakfast will once again happen at the RSA Conference. Thursday morning, March 3 from 8 – 11 at Jillians. Check out the invite or just email us at rsvp (at) securosis.com to make sure we have an accurate count.

The fine folks at the RSA Conference posted the talk Jennifer Minella and I did on mindfulness at the 2014 conference. You can check it out on YouTube. Take an hour.
Your emails, alerts, and Twitter timeline will be there when you get back. Securosis Firestarter Have you checked out our video podcast? Rich, Adrian, and Mike get into a Google Hangout and… hang out. We talk a bit about security as well. We try to keep these to 15 minutes or less, and usually fail. Dec 8 – 2015 Wrap Up and 2016 Non-Predictions Nov 16 – The Blame Game Nov 3 – Get Your Marshmallows Oct 19 – re:Invent Yourself (or else) Aug 12 – Karma July 13 – Living with the OPM Hack May 26 – We Don’t Know Sh–. You Don’t Know Sh– May 4 – RSAC wrap-up. Same as it ever was. March 31 – Using RSA March 16 – Cyber Cash Cow March 2 – Cyber vs. Terror (yeah, we went there) February 16 – Cyber!!! February 9 – It’s Not My Fault! January 26 – 2015 Trends January 15 – Toddler Heavy Research We are back at work on a variety of blog series, so here is a list of the research currently underway. Remember you can get our Heavy Feed via RSS, with our content in all its unabridged glory. And you can get all our research papers too. Securing Hadoop Architectural Security Issues Architecture and Composition Security Recommendations for NoSQL platforms SIEM Kung Fu Fundamentals Building a Threat Intelligence Program Success and Sharing Using TI Gathering TI Introduction Recently Published Papers Threat Detection Evolution Building Security into DevOps Pragmatic Security for Cloud and Hybrid Networks EMV Migration and the Changing Payments Landscape Applied Threat Intelligence Endpoint Defense: Essential Practices Cracking the Confusion: Encryption & Tokenization for Data Centers, Servers & Applications Security and Privacy on the Encrypted Network Monitoring the Hybrid Cloud Best Practices for AWS Security * The Future of Security Incite 4 U Evolution visually: Wade Baker posted a really awesome


Event-Driven AWS Security: A Practical Example

Would you like the ability to revert unapproved security group (firewall) changes in Amazon Web Services in 10 seconds, without external tools? That’s about 10-20 minutes faster than is typically possible with a SIEM or other external tools. If that got your attention, then read on…

If you follow me on Twitter, you might have noticed I went a bit nuts when Amazon Web Services announced their new CloudWatch events a couple weeks ago. I saw them as an incredibly powerful tool for event-driven security. I will post about the underlying concepts tomorrow, but right now I think it’s better to just show you how it works first. This entire thing took about 4 hours to put together, and it was my first time writing a Lambda function or using Python in 10 years.

This example configures an AWS account to automatically revert any Security Group (firewall) changes without human interaction, using nothing but native AWS capabilities. No security tools, no servers, nada. Just wiring together things already built into AWS. In my limited testing it’s effective in 10 seconds or less, and it’s only 100 lines of code – including comments. Yes, this post is much longer than the code to make it all work.

I will walk you through setting it up manually, but in production you would want to automate this configuration so you can manage it across multiple AWS accounts. That’s what we use Trinity for, and I’ll talk more about automating automation at the end of the post. Also, this is Amazon specific because no other providers yet expose the needed capabilities.

For background it might help to read the AWS CloudWatch events launch post. The short version is that you can instrument a large portion of AWS, and trigger actions based on a wide set of very granular events. Yes, this is an example of the kind of research we are focusing on as part of our cloud pivot. This might look long, but if you follow my instructions you can set it all up in 10-15 minutes. Tops.
Prep Work: Turn on CloudTrail

If you use AWS you should have CloudTrail set up already; if not you need to activate it and feed the logs to CloudWatch using these instructions. This only takes a minute or two if you accept all the defaults.

Step 1: Configure IAM

To make life easier I put all my code up on the Securosis public GitHub repository. You don’t need to pull that code – you will copy and paste everything into your AWS console.

Your first step is to configure an IAM policy for your workflow; then create a role that Lambda can assume when running the code. Lambda is an AWS service that allows you to store and run code based on triggers. Lambda code runs in a container, but doesn’t require you to manage containers or servers for it. You load the code, and then it executes when triggered. You can build entirely serverless architectures with Lambda, which is useful if you want to eliminate most of your attack surface, but that’s a discussion for another day.

IAM in Amazon Web Services is how you manage who can do what in your account, including the capabilities of Amazon services themselves. It is ridiculously granular and powerful, which makes it the most critical security tool for protecting AWS accounts.

  • Log into the AWS console.
  • Go to the Identity and Access Management (IAM) dashboard.
  • Click on Policies, then Create Policy.
  • Choose Create Your Own Policy.
  • Name it lambda_revert_security_group. Enter a description, then copy and paste my policy from GitHub.

My policy allows the Lambda function to access CloudWatch logs, write to the log, view security group information, and revoke ingress or egress statements (but not create new ones). Damn, I love granular policies!

Once the policy is set you need to Create New Role. This is the role which the Lambda function will assume when it runs. Name it lambda_revert_security_group, assign it an AWS Lambda role type, then attach the lambda_revert_security_group policy you just created.

That’s it for the IAM changes.
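
As a rough illustration of the kind of policy described in Step 1, the sketch below builds a similar policy document in Python. This is not the policy from the Securosis GitHub repository: the statement layout, the helper names (build_revert_policy, allowed_actions), and the wildcard resources are assumptions made for the example. The action names themselves are standard IAM actions for CloudWatch Logs and EC2. The point is the shape of the permissions: the function can read and revoke rules, but never authorize new ones.

```python
import json


def build_revert_policy():
    """Return an IAM policy document (as a dict) for a revert function.

    Assumption: wildcard resources are used for brevity; a production
    policy would normally be scoped more tightly.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Let the function create and write its own CloudWatch logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
            {
                # Read security groups and remove rules, but never add them.
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeSecurityGroups",
                    "ec2:RevokeSecurityGroupIngress",
                    "ec2:RevokeSecurityGroupEgress",
                ],
                "Resource": "*",
            },
        ],
    }


def allowed_actions(policy):
    """Flatten every allowed action in the policy into a set."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            actions.update(statement["Action"])
    return actions


if __name__ == "__main__":
    # Print JSON that could be pasted into the policy editor.
    print(json.dumps(build_revert_policy(), indent=2))
```

Checking the flattened action set for anything containing "Authorize" is a quick way to confirm that a policy like this cannot be used to open new firewall rules.
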
Next you need to set up the Lambda function and the CloudWatch event.

Step 2: Create the Lambda function

First make sure you know which AWS region you are working in. I prefer us-west-2 (Oregon) for lab work because it is up to date and tends to support new capabilities early. us-east-1 is the granddaddy of regions, but my lab account has so much cruft after 6+ years that things don’t always work right for me there.

  • Go to Lambda (under Compute on the main services page) and Create a Lambda function.
  • Don’t pick a blueprint – hit the Skip button to advance to the next page.
  • Name your function revertSecurityGroup. Enter a description, and pick Python for the runtime. Then paste my code into the main window.
  • After that pick the lambda_revert_security_group IAM role the function will use.
  • Click Next, then Create function.

A few points on Lambda. You aren’t billed until the function triggers; then you are billed per request and runtime. Lambda is very good for quick tasks, but it does have a timeout (I think an hour these days), and the longer you run a function the less sense it makes compared to a dedicated server. I actually looked at migrating Trinity to Lambda because we could offload our workflows, but at that time it had a 5-minute timeout, and running hour-long workflows at scale would likely have killed us financially.

Now some notes on my code. The main function handler includes a bunch of conditional statements you can use to only trigger reverting security group changes based on things like who requested the change, which security group was changed, whether the security group is in a specified VPC, and whether the security group has a particular tag. None of those lines will work for you, because they refer to specific identifiers in my account – you need to change them to work in your account. By default, the function will revert any security group change in your account.
You need to cut and paste the line “revert_security_group(event)” into a conditional block to run only on matching conditions. The function only works for inbound rule changes. It
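
To make the structure described above more concrete, here is a minimal sketch of a handler with one conditional block around the revert call. It is not the code from the Securosis repository. The function names (should_revert, build_revoke_request), the trusted_arns and protected_groups parameters, and the assumed shape of the CloudTrail event delivered by CloudWatch events are all illustrative assumptions. The only AWS call used is the boto3 EC2 client method revoke_security_group_ingress, and the client can be injected so the logic can be exercised without an AWS account. Like the original, it only handles inbound rule changes, and this sketch further limits itself to IPv4 CIDR rules.

```python
def should_revert(event, protected_groups=None, trusted_arns=()):
    """Decide whether a logged change should be reverted.

    protected_groups=None means "revert changes to every group".
    """
    detail = event.get("detail", {})
    if detail.get("eventName") != "AuthorizeSecurityGroupIngress":
        return False  # only inbound rule additions are handled here
    if detail.get("userIdentity", {}).get("arn") in trusted_arns:
        return False  # change was made by an approved identity
    group_id = detail.get("requestParameters", {}).get("groupId")
    return protected_groups is None or group_id in protected_groups


def build_revoke_request(event):
    """Translate the logged request into revoke_security_group_ingress kwargs.

    Assumption: rules arrive under requestParameters.ipPermissions.items.
    """
    params = event["detail"]["requestParameters"]
    permissions = []
    for item in params.get("ipPermissions", {}).get("items", []):
        permission = {"IpProtocol": item["ipProtocol"]}
        # Ports may be absent for "all traffic" rules, so copy them only if set.
        if "fromPort" in item:
            permission["FromPort"] = item["fromPort"]
        if "toPort" in item:
            permission["ToPort"] = item["toPort"]
        ranges = item.get("ipRanges", {}).get("items", [])
        permission["IpRanges"] = [{"CidrIp": r["cidrIp"]} for r in ranges]
        permissions.append(permission)
    return {"GroupId": params["groupId"], "IpPermissions": permissions}


def lambda_handler(event, context, ec2_client=None):
    """Entry point: revert the change if it matches the conditions."""
    if not should_revert(event):
        return "ignored"
    if ec2_client is None:
        import boto3  # provided by the Lambda Python runtime
        ec2_client = boto3.client("ec2")
    ec2_client.revoke_security_group_ingress(**build_revoke_request(event))
    return "reverted"
```

Passing a set of group IDs as protected_groups, or a tuple of role or user ARNs as trusted_arns, narrows the behavior from "revert everything" to "revert only matching changes", which is the same idea as moving the revert call into a conditional block.
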


Securing Hadoop: Architectural Security Issues

Now that we have sketched out the elements of a Hadoop cluster, and what one looks like, let’s talk about threats to the databases. We want to consider both the database infrastructure itself and the data under management. Given the complexity of a Hadoop cluster, the task is closer to securing an entire data center than a typical relational database. All the features that provide flexibility, scalability, performance, and openness create specific security challenges. The following are some specific threats to clustered databases.

Data access & ownership: Role-based access is central to most database security schemes, and NoSQL is no different. Relational and quasi-relational platforms include roles, groups, schemas, label security, and various other facilities for limiting user access to subsets of available data. Most big data environments now offer integration with identity stores, along with role-based facilities to divide up data access between groups of users. That said, authentication and authorization require cooperation between the application designer and the IT team managing the cluster. Leveraging existing Active Directory or LDAP services helps tremendously with defining user identities, and pre-defined roles may be available for limiting access to sensitive data.

Data at rest protection: The standard for protecting data at rest is encryption, which protects against attempts to access data outside established application interfaces. With Hadoop systems we worry about people stealing archives or directly reading files from disk. Encrypted files are protected against access by users without encryption keys. Replication effectively replaces backups for big data, but beware a rogue administrator or cloud service manager creating their own backups. Encryption limits how data can be copied from the cluster. The situation has improved since 2012, when the lack of suitable encryption was a serious issue.
Apache offers HDFS encryption as an option; this is a major advance, but remember that you can only encrypt HDFS, and you’ll need to fill the gaps with key management and key storage. Several commercial Hadoop vendors offer transparent encryption, and third parties have advanced the state of the art, with transparent encryption options for both HDFS and non-HDFS on-disk formats, especially coupled with parallel progress in key management.

Inter-node communication: Hadoop and the vast majority of NoSQL platforms (Cassandra, MongoDB, Couchbase, etc.) don’t communicate securely by default – they use unencrypted RPC over TCP/IP. TLS and SSL are bundled in big data distributions, but not typically used between applications and databases – and almost never for inter-node communication. This leaves data in transit, and application queries, accessible for inspection and tampering.

Client interaction: Clients interact with resource managers and nodes. While gateway services can be created to load data, clients communicate directly with both resource managers and individual data nodes. This direct communication is efficient, but compromised clients can send malicious data or links to either service, and it is difficult to protect nodes from clients, clients from nodes, and even name servers from nodes. Worse, the distribution of self-organizing nodes is a poor fit for security tools such as gateways, firewalls, and monitors. Many security tools are designed to require a choke-point or span port, which may not be available in a peer-to-peer mesh cluster.

Distributed nodes: One of the reasons big data makes sense is an old truism: “moving computation is cheaper than moving data”. Data is processed wherever resources are available, enabling massively parallel computation. Unfortunately this produces complicated environments with lots of attack surface.
With so many moving parts, it is difficult to verify consistency or security across a highly distributed cluster of (possibly heterogeneous) platforms. Patching, configuration management, node identity, and data at rest protection – and consistent deployment of each – are all issues.

Threat-response models

One or more security countermeasures are available to mitigate each threat identified above. The following diagram shows which specific options you have at your disposal to help you choose a ‘preventative’ security measure. We don’t have room to go into much detail on the tradeoffs of each response – each area really deserves its own paper. But we do want to mention a couple areas where we have seen the most change since our original research four years ago.

If your goal is to protect session privacy – either between clients and data nodes, or for inter-node communication – Transport Layer Security (TLS) is your first choice. This was unheard of in 2012, but since then about 25% of the companies we spoke with have implemented SSL or TLS for inter-node communication – not just between applications and name servers. Transport encryption protects all communications from access or modification by attackers. Some firms instead use network segmentation and firewalls to ensure that attackers cannot access network traffic. This approach is less robust but much easier to implement. Some clusters were deployed to third-party cloud services, where virtualized network services make sniffing nearly impossible; these companies typically chose not to encrypt internal cluster communications.

Enforcing data usage is one of the areas where we have seen the most progress, thanks to database links into existing Active Directory and LDAP identity stores. This seems obvious now but was a rarity in 2012, when data architects were focused on scalability and getting basic analytics up and running.
Fortunately support for linking identity stores to Hadoop clusters has advanced considerably, making it much easier to leverage existing roles and management infrastructure.

But we also have other tools at our disposal. We don’t see it often, but a handful of organizations encrypt sensitive data elements at the application layer, so information is stored as encrypted elements. This way the application manages decryption and key management functions, and can offer additional controls over who can see which information. This is very secure, but must be planned during application design and coded into the application from the beginning. Retrofitting application-layer encryption into an existing application and database stack is highly challenging at best, which is why we also see wide usage of masking and redaction technologies – from both enterprise Hadoop vendors and third-party security vendors. These technologies offer fine control over which data is displayed to which users, and can be easily built into existing clusters to
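
As a small illustration of the masking idea mentioned above, the sketch below shows role-based masking of individual fields in a record before it is displayed. It is not taken from any Hadoop vendor or security product; the role names, field names, the UNMASKED_FIELDS table, and the "show the last four characters" rule are all hypothetical choices made for the example.

```python
# Hypothetical policy: which roles may see which fields unmasked.
UNMASKED_FIELDS = {
    "analyst": {"name"},
    "billing": {"name", "card_number"},
}


def mask_value(value, visible=4):
    """Replace all but the last `visible` characters with '*'.

    Values that are `visible` characters or shorter are masked entirely.
    """
    text = str(value)
    if len(text) <= visible:
        return "*" * len(text)
    return "*" * (len(text) - visible) + text[-visible:]


def apply_masking(record, role):
    """Return a copy of `record` with fields masked for the given role.

    Unknown roles get every field masked (deny by default).
    """
    allowed = UNMASKED_FIELDS.get(role, set())
    return {
        field: (value if field in allowed else mask_value(value))
        for field, value in record.items()
    }


if __name__ == "__main__":
    row = {"name": "Ada", "card_number": "4111111111111111"}
    for role in ("analyst", "billing", "guest"):
        print(role, apply_masking(row, role))
```

The stored data is left untouched; only the view returned to each user changes, which is why this kind of control can be layered onto an existing cluster more easily than application-layer encryption.
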


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.