Securosis


New Series: Understanding and Selecting Identity Management for Cloud Services

Adrian and Gunnar here, kicking off a new series on Identity Management for Cloud Services. We have been hearing about Federated Identity and Single Sign-On services for the last decade, but demand for these features has only fully blossomed in the last few years, as companies have needed to integrate their internal identity management systems. The meanings of these terms have been actively evolving under the influence of cloud computing. The ability to manage what resources your users can access outside your corporate network, on third-party systems outside your control, is not just a simple change in deployment models but a fundamental shift in how we handle authentication, authorization, and provisioning. Enterprises want to extend the capabilities of low-cost cloud service providers to their users while maintaining security, policy management, and compliance functions. We want to illuminate these changes in approach and technology.

And if you have not been keeping up to date with these changes in the IAM market, you will likely need to unlearn what you know. We are not talking about making your old Active Directory accessible to internal and external users, or running LDAP in your Amazon EC2 constellation. We are talking about the fusion of multiple identity and access management capabilities, possibly across multiple cloud services. We are gaining the ability to authorize users across multiple services without distributing credentials to each and every service provider.

Cloud services, be they SaaS, PaaS, or IaaS, are not just new environments in which to deploy existing IAM tools. They fundamentally shift existing IAM concepts. It’s not just that the way IT resources are deployed in the cloud, and the way consumers want to interact with those resources, have changed; those changes are driven by economic models of efficiency and scale.
For example, enterprise IAM is largely about provisioning users and resources into a common directory, such as Active Directory or RACF, where the IAM tool enforces access policy. The cloud changes this model to a chain of responsibility, so a single IAM instance cannot completely mediate access policy. A cloud IAM instance shares responsibility for functions such as asserting or validating identity. Carving up this set of shared access policy responsibilities is a game changer for the enterprise. We need to rethink how we manage trust and identities in order to take advantage of elastic, on-demand, and widely available web services for heterogeneous clients.

Right now, behind the scenes, new approaches to identity and access management are being deployed, often seamlessly, into cloud services we already use. They reduce the risk and complexity of mapping identity to public or semi-public infrastructure, while remaining flexible enough to take full advantage of multiple cloud service and deployment models.

Our goal for this series is to illustrate current trends and technologies that support cloud identity, describe the features available today, and help you navigate the existing choices. The series will cover:

The Problem Space: We will introduce the issues driving cloud identity, across fully outsourced, hybrid, and proxy cloud services and deployment models. We will discuss how the cloud model differs from traditional in-house IAM, and the issues raised by the loss of control and visibility into cloud provider environments. We will consider the goals of IAM services for the cloud, drilling into topics including identity propagation, federation, and roles and responsibilities (around authentication, authorization, provisioning, and auditing). We will wrap up with the security goals we must achieve, and how compliance and risk influence decisions.
The Cloud Providers: For each of the cloud service models (SaaS, PaaS, and IaaS) we will delve into the IAM services built into the infrastructure. We will profile IAM offerings from some of the leading independent cloud identity vendors for each service model, covering what they offer and how their features are leveraged or integrated. We will illustrate these capabilities with a simple chart showing what each provides, highlighting the conceptual model each vendor embraces to supply identity services. We will talk about what you will be responsible for as a customer, in terms of integration and management, including some deficiencies of these services and areas to consider augmenting.

Use Cases: We will discuss three of the principal use cases we see today, as organizations move existing applications to the cloud and develop new cloud services: extending existing IAM systems to cover external SaaS services, developing IAM for new applications deployed on IaaS/PaaS, and adopting Identity as a Service for fully external IAM.

Architecture and Design: We will start by describing key concepts, including consumer/service patterns, roles, assertions, tokens, identity providers, relying party applications, and trust. We will discuss the available technologies for the heavy lifting (such as SAML, XACML, and SCIM) and the problems they are designed to solve. We will finish with an outline of the different architectural models that will frame how you implement cloud identity services, including the integration patterns and tools that support each model.

Implementation Roadmap: IAM projects are complex, encompass most IT infrastructure, and may take years to implement. Trying to do everything at once is a recipe for failure. This portion of our discussion will help ensure you don’t bite off more than you can chew.
We will discuss how to select an architectural model that meets your requirements, based on the cloud service and deployment models you have selected. Then we will create different implementation roadmaps depending on your project goals and critical business requirements.

Buyer’s Guide: We will close by examining key decision criteria to help select a platform. We will provide questions to determine which vendors offer solutions that support your architectural model, and criteria to measure the appropriateness of a vendor solution against your design goals. We will also help walk you through the evaluation process.

As always, we encourage you to ask questions and chime in with comments and suggestions.


Securing Big Data: Recommendations and Open Issues

Our previous two posts outlined several security issues inherent to big data architecture, and operational security issues common to big data clusters. With those in mind, how can one go about securing a big data cluster? What tools and techniques should you employ? Before we can answer those questions we need some ground rules, because not all ‘solutions’ are created equal. Many vendors claim to offer big data security, but they are really just selling the same products they offer for other back office systems and relational databases. Those products might work in a big data cluster, but only by compromising the big data model to make it fit the restricted envelope of what they can support. Their constraints on scalability, coverage, management, and deployment are all at odds with the essential big data features we have discussed. Any security product for big data needs a few characteristics:

It must not compromise the basic functionality of the cluster.
It should scale in the same manner as the cluster.
It should not compromise the essential characteristics of big data.
It should address, or at least mitigate, a security threat to big data environments or data stored within the cluster.

So how can we secure big data repositories today? The following is a list of common challenges, with security measures to address them:

User access: We use identity and access management systems to control users, including both regular and administrator access.

Separation of duties: We use a combination of authentication, authorization, and encryption to provide separation of duties between administrative personnel. We use application space, namespace, or schemata to logically segregate user access to a subset of the data under management.

Indirect access: To close “back doors” (access to data outside permitted interfaces) we use a combination of encryption, access control, and configuration management.
User activity: We use logging and user activity monitoring (where available) to alert on suspicious activity and enable forensic analysis.

Data protection: Removing sensitive information prior to insertion and masking data (via tools) are common strategies for reducing risk. But the majority of big data clusters we are aware of already store redundant copies of sensitive data. This means the data stored on disk must be protected against unauthorized access, and data encryption is the de facto method of protecting sensitive data at rest. In keeping with the requirements above, any encryption solution must scale with the cluster, must not interfere with MapReduce capabilities, and must not store keys on hard drives along with the encrypted data; keys must be handled by a secure key manager.

Eavesdropping: We use SSL and TLS encryption to protect network communications. Hadoop offers SSL, but its implementation is limited to client connections. Cloudera offers good integration of TLS; otherwise look for third-party products to close this gap.

Name and data node protection: By default, Hadoop HTTP web consoles (JobTracker, NameNode, TaskTrackers, and DataNodes) allow access without any form of authentication. The good news is that Hadoop RPC and HTTP web consoles can be configured to require Kerberos authentication. Bi-directional authentication of nodes is built into Hadoop, and available in some other big data environments as well. Hadoop’s model uses Kerberos to authenticate applications to nodes, nodes to applications, and client requests for MapReduce and similar functions. Care must be taken to secure the granting and storage of Kerberos tickets, but this is a very effective method for controlling which nodes and applications can participate in the cluster.
Application protection: Big data clusters are built on web-enabled platforms, which means that remote injection, cross-site scripting, buffer overflows, and logic attacks against and through client applications are all possible avenues of attack for access to the cluster. Countermeasures typically include a mixture of secure code development practices (such as input validation and address space randomization), network segmentation, and third-party tools (including Web Application Firewalls, IDS, authentication, and authorization). Some platforms offer built-in features to bolster application protection, such as YARN’s web application proxy service.

Archive protection: As backups are largely an intractable problem for big data, we don’t need to worry much about traditional backup/archive security. But just because legitimate users cannot perform conventional backups does not mean an attacker would not create at least a partial backup. We need to secure the management plane to keep unwanted copies of data or data nodes from being propagated. Access controls, and possibly network segregation, are effective countermeasures against attackers trying to gain administrative access, and encryption can help protect data in case other protections are defeated.

In the end, our big data security recommendations boil down to a handful of standard tools which can be effective in setting a secure baseline for big data environments:

Use Kerberos: This is an effective method for keeping rogue nodes and applications off your cluster, and it can help protect web console access, making administrative functions harder to compromise. We know Kerberos is a pain to set up, and (re-)validation of new nodes and applications takes work. But without bi-directional trust establishment it is too easy to fool Hadoop into letting malicious applications into the cluster, or into accepting malicious nodes, which can then add, alter, or extract data.
Kerberos is one of the most effective security controls at your disposal, and it’s built into the Hadoop infrastructure, so use it.

File layer encryption: File encryption addresses two attacker methods for circumventing normal application security controls. It protects data if malicious users or administrators gain access to data nodes and directly inspect files, and it renders stolen files or disk images unreadable. These are two of the most serious threats. Just as importantly, it meets our requirements for big data security tools: it is transparent to both Hadoop and calling applications, and scales out as the cluster grows. Open source products are available for most Linux systems; commercial products additionally offer external key management, trusted binaries, and full support. This is a cost-effective way to address several
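Because these recommendations lean on Kerberos for both Hadoop RPC and the web consoles, a quick sanity check is whether a node’s core-site.xml actually enables it. The Python sketch below assumes the standard core-site.xml property names and the shipped defaults listed in the code; the baseline values are our own choice. Depending on the Hadoop version, web console authentication also needs an HTTP authentication filter configured, which this sketch does not check.

```python
import xml.etree.ElementTree as ET

# Baseline for a Kerberized cluster. The property names are standard core-site.xml
# keys; treating these exact values as "required" is our assumption to adapt locally.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",   # RPC authentication (default "simple")
    "hadoop.security.authorization": "true",        # service-level authorization checks
    "hadoop.rpc.protection": "privacy",             # SASL QOP: authenticate, integrity-check, and encrypt RPC
    "hadoop.http.authentication.type": "kerberos",  # SPNEGO/Kerberos for HTTP web consoles
}

# Defaults that apply when a property is absent from the file.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
    "hadoop.rpc.protection": "authentication",
    "hadoop.http.authentication.type": "simple",
}


def load_properties(xml_text: str) -> dict:
    """Parse <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def audit_core_site(xml_text: str) -> list:
    """Return one finding per security property that does not match the baseline."""
    props = load_properties(xml_text)
    findings = []
    for key, wanted in EXPECTED.items():
        actual = props.get(key, DEFAULTS[key])
        if actual.lower() != wanted:
            findings.append(f"{key} is '{actual}', expected '{wanted}'")
    return findings


if __name__ == "__main__":
    sample = """<configuration>
      <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
    </configuration>"""
    for finding in audit_core_site(sample):
        print(finding)
```

Missing properties are treated as their defaults, which is the point: an empty core-site.xml means simple (non-Kerberos) authentication and unauthenticated web consoles.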


Securing Big Data: Operational Security Issues

Before I dig into today’s post I want to share a couple observations. First, my new copy of the Harvard Business Review just arrived. The cover story is “Getting Control of Big Data”. It’s telling that HBR considers big data a trend important enough to warrant a full spread, and feels business managers need to understand big data and the benefits and risks it poses to business. As soon as I finish this post I intend to dive into these articles. Now that I have just about finished this research effort, I look forward to contrasting what I have discovered with their perspective.

Second, when we talk about big data security, we are really referring to both data and infrastructure security. We want to protect the application (or database, if you prefer that term) that manages data, with the end goal of protecting the information under management. If an attacker can access data directly, bypassing the database management system, they will. Barring a direct path to the information, they will look for weaknesses in, or ways to subvert, the database application. So it’s important to remember that when we talk about database security we mean both data and infrastructure protection.

Finally, a point about clarity. Big data security is one of the tougher topics to describe, especially as we here at Securosis prefer to describe things in black and white terms for the sake of clarity. But for just about every rule we establish and every emphatic statement we make, we have to acknowledge exceptions. Given the variety of big data distributions and add-on capabilities, you can likely find a single instance of every security control described in today’s post. But it’s usually a single security control, like encryption, with the other security controls absent from the various packages. Nothing offers even a partial suite of solutions, much less a comprehensive offering.

Today I want to discuss operational security of big data environments.
Unlike yesterday’s post, which discussed architectural security issues endemic to the platform, it is now time to address security controls of an operational nature. That includes “turning the dials” things like configuration management and access controls, as well as “bolt-on” capabilities such as auditing and security gateways. We see the greatest impact in these areas, and vendors jumping in with security offerings to fill the gaps. Normally when we consider how to secure data repositories, we consider the following major areas:

Encryption: The standard for protecting data at rest is encryption, which protects data from undesired access. And just because folks don’t use archiving features to back up data does not mean a rogue DBA or cloud service manager won’t. I think two or three of the more obscure NoSQL variants provide encryption for data at rest, but most do not. And the majority of available encryption products offer neither sufficient horizontal scalability nor adequate transparency for use with big data. This is a critical issue.

Administrative data access: Each node has an admin, and each admin can read the node’s data if they choose. As with encryption, we need a boundary or facility to provide separation of duties between different administrators. The requirement is the same as on relational platforms, but big data platforms lack their array of built-in facilities, documentation, and third-party tools to address it. Unwanted direct access to data files or data node processes can be addressed through a combination of access controls, separation of roles, and encryption technologies, but out of the box, data is only as secure as your least trustworthy administrator. It’s up to the system designer to select controls to close this gap.

Configuration and patch management: When managing a cluster of servers, it’s common to have nodes running different configurations and patch levels.
And if you’re using dissimilar platforms to support the cluster, you need to figure out how to handle management. Existing configuration management tools work for underlying platforms, and HDFS Federation will help with cluster management, but careful planning is still necessary. I will go into more detail about how in the next post, when I make recommendations. The cluster may tolerate nodes cycling without data loss or service interruption, but reboots can still cause serious performance issues, depending on which nodes are affected and how the cluster is configured. The upshot is that people don’t patch, fearing user complaints. Perhaps you have heard that one before.

Authentication of applications/clients: Hadoop uses Kerberos to authenticate users and add-on services to the HDFS cluster. But a rogue client can be inserted onto the network if a Kerberos ticket is stolen or duplicated. This is more of a concern when embedding credentials in virtual and cloud environments, where it’s relatively easy to introduce an exact replica of a client app or service. A clone of a node is often all that’s needed to introduce a corrupted node or service into a cluster. It’s easy to impersonate a node or service in the cluster, but doing so requires an attacker to compromise the management plane of your environment, or to obtain a backup of a client. Despite being a pain to set up, strong authentication through Kerberos is one of your principal security tools: it helps solve the critical problem of who can access Hadoop services.

Audit and logging: One area with a variety of add-on capabilities is logging. Scribe and LogStash are open source tools that integrate into most big data environments, as do a number of commercial products. So you just need to find a compatible tool, install it, integrate it with other systems such as SIEM or log management, and then actually review the results.
Without actually looking at the data and developing policies to detect fraud, logging is not useful.

Monitoring, filtering, and blocking: There are no built-in monitoring tools to look for misuse or block malicious queries. In fact I don’t believe anyone has ever described what a malicious query might look like in a big data environment, other than crappy MapReduce
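For the configuration and patch management problem described in this post, a first step is simply noticing which nodes have drifted from the rest of the cluster. This minimal Python sketch assumes you already collect per-node settings (software versions, security properties) into a dictionary, in a hypothetical format, and treats the value most nodes report as the baseline; a vetted golden configuration would be a better baseline where you have one.

```python
from collections import Counter

MISSING = "<missing>"


def find_drift(node_reports: dict) -> dict:
    """Flag nodes whose settings differ from the value most nodes report.

    node_reports maps hostname -> {setting name: value}, as collected by whatever
    inventory or configuration management tool you already run (hypothetical
    format). Returns {hostname: {setting: (node_value, majority_value)}}.
    """
    keys = set()
    for settings in node_reports.values():
        keys.update(settings)

    # Majority value per setting serves as the baseline; ties go to the value seen first.
    baseline = {
        key: Counter(s.get(key, MISSING) for s in node_reports.values()).most_common(1)[0][0]
        for key in keys
    }

    drift = {}
    for host, settings in node_reports.items():
        diffs = {
            key: (settings.get(key, MISSING), baseline[key])
            for key in sorted(keys)
            if settings.get(key, MISSING) != baseline[key]
        }
        if diffs:
            drift[host] = diffs
    return drift


if __name__ == "__main__":
    reports = {
        "dn1": {"hadoop_version": "1.0.3", "hadoop.rpc.protection": "privacy"},
        "dn2": {"hadoop_version": "1.0.3", "hadoop.rpc.protection": "privacy"},
        "dn3": {"hadoop_version": "1.0.2", "hadoop.rpc.protection": "authentication"},
    }
    print(find_drift(reports))
```

Reporting drift is cheap and does not require rebooting anything, so it can run continuously even where teams are reluctant to patch.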


New Research Paper: Pragmatic WAF Management

We are proud to announce a new research paper on Pragmatic Web Application Firewall Management. This paper has been a long time coming; we have been researching this topic for three years, looking for the right time to discuss WAF’s issues.

Our key finding is that Web Application Firewalls can genuinely raise the bar on application security. Properly set up, they block many attacks such as SQL injection and, just as importantly, ‘virtually’ patch applications faster than code fixes can be implemented. There is ample evidence that building security into applications from the get-go is more efficient, but unfortunately it may not be practical or even realistic. Most firms already have dozens, if not thousands, of vulnerable web apps that will take years to fix. So the real answer is both: “build security in” and “bolt security on”. And that is how WAFs help protect web applications when timely code fixes are not an option.

During our research we heard a lot of negative feedback from various security practitioners, specifically pen testers, about how WAFs barely slow skilled attackers down. We heard many true horror stories, but they were not due to any specific deficiency in WAF technology. The common theme among critics was that problems stemmed from customers’ ineffective management practices in WAF deployment and rule tuning. We also heard many stories about push-back from development teams who refused to wade through the reams of vulnerability output generated by WAFs. Some of this was due to the poor report quality of WAF products, and some was due to internal politics and process issues. But in both cases we concluded from hundreds of conversations that WAF provides unique value, and its issues can be mitigated through effective management. For more detailed information on our recommendations, as well as how we reached our conclusions, we encourage you to grab a copy of the white paper.
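To illustrate what a ‘virtual patch’ means in practice, here is a minimal Python sketch, not any vendor’s rule language. It assumes a hypothetical /account page whose “id” parameter is vulnerable to SQL injection, and applies an allow-list rule to that one parameter until the code fix ships. A real WAF also normalizes encodings, inspects request bodies and headers, and logs what it blocks.

```python
import re
from urllib.parse import parse_qs

# HYPOTHETICAL virtual patch: assume the application's /account page builds SQL by
# concatenating the "id" parameter. Until the code fix ships, the filter allows only
# short numeric ids on that page (a positive, allow-list rule) and blocks the rest.
VIRTUAL_PATCHES = {
    ("/account", "id"): re.compile(r"\d{1,10}"),
}


def allow_request(path: str, query_string: str) -> bool:
    """Return True if the request may reach the application, False to block it."""
    params = parse_qs(query_string, keep_blank_values=True)
    for (patched_path, param), allowed in VIRTUAL_PATCHES.items():
        if path != patched_path:
            continue
        for value in params.get(param, []):
            # parse_qs has already URL-decoded the value, so %27 arrives as a quote.
            if not allowed.fullmatch(value):
                return False
    return True


if __name__ == "__main__":
    print(allow_request("/account", "id=42"))                          # True
    print(allow_request("/account", "id=1%27%20OR%20%271%27%3D%271"))  # False
    print(allow_request("/search", "q=1%27%20OR%201%3D1"))             # True: no patch for /search
```

An allow-list on one known-bad parameter is narrow by design: it is easy to reason about and unlikely to break legitimate traffic, which is much of what separates a well-managed WAF deployment from a noisy one.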
Finally, Securosis provides the vast bulk of our research for free and without user registration. Our goal, as always, is to help organizations understand security issues and products, and to help get your job done with as little headache as possible. But it’s community support that enables us to produce our research, so we want to make special mention of the firms who sponsored this paper: Alert Logic, Barracuda Networks, and Fortinet. We want to thank our sponsors, as well as those of you who took time to discuss your WAF stories and provide feedback during this project! The paper is available to download: Pragmatic WAF Management (PDF).


Securing Big Data: Architectural Issues

In the previous post we went to some length to define what big data is, because the architectural model is critical to understanding how it poses different security challenges than traditional databases, data warehouses, and massively parallel processing environments.

What distinguishes big data environments is their fundamentally different deployment model: highly distributed and elastic data repositories enabled by the Hadoop Distributed File System. A distributed file system provides many of the essential characteristics (distributed redundant storage across resources) and enables massively parallel computation. But specific aspects of how each layer of the stack integrates, such as how data nodes communicate with clients and resource management facilities, raise many concerns. For those of you not familiar with the model, this is the Hadoop architecture.

Architectural Issues

Distributed nodes: The idea that “Moving Computation is Cheaper than Moving Data” is key to the model. Data is processed wherever processing resources are available, enabling massively parallel computation. It also creates a complicated environment with a large attack surface, and it’s harder to verify consistency of security across all nodes in a highly distributed cluster of possibly heterogeneous platforms.

‘Sharded’ data: Data within big data clusters is fluid, with multiple copies moving to and from different nodes to ensure redundancy and resiliency. This automated movement makes it very difficult to know precisely where data is located at any moment in time, or how many copies are available. This runs counter to traditional centralized data security models, where data is wrapped in various protections until it’s used for processing. Big data is replicated in many places and moves as needed. The ‘containerized’ data security model is missing, as are many other relational database concepts.

Write once, read many: Big data clusters handle data differently than other data management systems.
Rather than the classical “Insert, Update, Select, and Delete” set of basic operations, they focus on write (Insert) and read (Select). Some big data environments don’t offer delete or update capabilities at all. It’s a ‘write once, read many’ model, which is excellent for performance. And it’s a great way to collect a sequence of events and track changes over time, but removing and overwriting sensitive data can be problematic. Data management is optimized for performance of insertion and query processing, at the expense of content manipulation.

Inter-node communication: Hadoop and the vast majority of available add-ons that extend core functions don’t communicate securely. TLS and SSL are rarely available. When they are, as with HDFS proxies, they only cover client-to-proxy communication, not proxy-to-node sessions. Cassandra does offer well-engineered TLS, but it’s the exception.

Data access/ownership: Role-based access is central to most database security schemes. Relational and quasi-relational platforms include roles, groups, schemas, label security, and various other facilities for limiting user access, based on identity, to an authorized subset of the available data set. Most big data environments offer access limitations at the schema level, but no finer granularity than that. It is possible to mimic these more advanced capabilities in big data environments, but that requires the application designer to build these functions into applications and data storage.

Client interaction: Clients interact with resource managers and nodes. While gateway services for loading data can be defined, clients communicate directly with both the master/name server and individual data nodes. The tradeoff is a limited ability to protect nodes from clients, clients from nodes, and even name servers from nodes.
Worse, the distribution of self-organizing nodes runs counter to many security tools, such as gateways, firewalls, and monitoring, which require a ‘chokepoint’ deployment architecture. Security gateways assume linear processing, and become clunky or overly restrictive in peer-to-peer clusters.

NoSecurity: Finally, and perhaps most importantly, big data stacks build in almost no security. As of this writing, aside from service-level authorization, access control integration, and web proxy capabilities from YARN, no facilities are available to protect data stores, applications, or core Hadoop features. All big data installations are built upon a web services model, with few or no facilities for countering common web threats (i.e., anything on the OWASP Top Ten), so most big data installations are vulnerable to well-known attacks.

There are a couple other issues with securing big data on an architectural level, which are not issues with big data specifically, but with security products in general. To add security capabilities to a big data environment, they need to scale with the data. Most ‘bolt-on’ security does not scale this way, and simply cannot keep up. Because the security controls are not built into the products, there is a classic impedance mismatch between NoSQL environments and aftermarket security tools. Most security vendors have adapted their existing offerings as well as they can, usually working at data load time, but only a handful of traditional security products can dynamically scale along with a Hadoop cluster. The next post will go into day-to-day operational security issues.
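The Data access/ownership issue in this post means finer-grained controls have to live in application code. As a minimal Python sketch, assuming each record is tagged with a hypothetical sensitivity label and owning group at load time, an application can mimic label security by filtering rows before returning them:

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class User:
    name: str
    clearance: str      # highest label this user may read
    groups: frozenset   # data-owning groups the user belongs to


def visible_records(user: User, records: list) -> list:
    """Application-enforced row filtering, mimicking label security.

    Assumes each record was tagged with a 'label' and an owning 'group' when it
    was loaded. The store itself only checks schema-level access, so this check
    protects nothing from someone who reads the underlying files directly.
    """
    limit = LEVELS[user.clearance]
    return [
        r for r in records
        if LEVELS[r["label"]] <= limit
        and (r["label"] == "public" or r["group"] in user.groups)
    ]


if __name__ == "__main__":
    records = [
        {"id": 1, "label": "public", "group": "hr"},
        {"id": 2, "label": "internal", "group": "sales"},
        {"id": 3, "label": "internal", "group": "hr"},
        {"id": 4, "label": "restricted", "group": "sales"},
    ]
    alice = User("alice", "internal", frozenset({"sales"}))
    print([r["id"] for r in visible_records(alice, records)])  # [1, 2]
```

Because clients can talk directly to data nodes, a check like this only protects data that is read through the application; it complements, rather than replaces, node-level access controls and encryption.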


Friday Summary: September 21, 2012

Adrian here … I had a few surgical procedures over the past few weeks. They corrected some vascular defects that were causing several problems, some of which had been coming on for such a long time I was unaware that there was an issue. The whole boiling frog in a beaker concept. And with the slow progression I was ignorant of the extent of the damage it was causing. The good news is that the procedures were successful and their benefit was far greater than I anticipated. This whole series of events hammered home a concept that I have been intellectually aware of for a long time, but had not lived out to this degree. Many people have blogged about how and why people make bad security tradeoffs: instinct, fear, lower brain functions, and other ways we are wired to make some decisions and not others. Bruce Schneier has been talking about this for 10 years or more. But I think for the first time I really understand it at a basic level. When I was a kid I had a very strong vasovagal response. I get the lightheadedness, nausea, feeling of being extremely hot, and sweating. I don’t get the fuzziness, inability to speak, or weakness. But I only ever got it when I went to the eye doctor and they administered the glaucoma test. Nothing else has ever bugged me, until this recent surgery. For the first time I saw it in slow motion, with the internal conversation going something like this: The upper, rational part of my brain says: “I’m really looking forward to getting this stuff fixed and moving on with my life.” The lower part that’s wired into all critical infrastructure says: “Something’s wrong. Something bad is happening to your leg. Fix it!” The upper brain: “It’s okay, the doctor’s just fixing some veins. Don’t …” Lower brain: “NO, it’s not! Kick that F**ker in the head! Kick him then run like hell!” Lower brain wins. And all these years I just thought I hated my eye doctor. Who knew? But getting that very strange sensation again was both odd and educational.
Being aware of the condition and watching yourself react as an adult is a whole different experience; you consciously witness two parts of your brain at odds. I know how to work through it without passing out, but it involves the same stomach compression techniques jet pilots learn to combat G-forces. A great way to creep out the hospital staff too, but it kept me alert through the physical manifestations of the conflict to ‘witness’ the internal debate. No wonder we’re so flawed when it comes to making judgments when threats or fear are involved. I can be aware of it and still do very little about it. Your body would rather shut down than deal with it.

On to the Summary:

Webcasts, Podcasts, Outside Writing, and Conferences

Adrian’s Dark Reading post on Encrypted Query Processing.

Favorite Securosis Posts

Mike Rothman: Inflection. Rich provides some food for thought on what the future of security looks like. Read it. Think about it. See what makes sense to you.
Adrian Lane: It’s Time for Enterprises to Support a “Backup” Browser. No way to be secure with just one. And don’t get me started on browsing from mobile devices!

Other Securosis Posts

Incite 9/20/2012: Scabs.
Securing Big Data: Security Issues with Hadoop Environments.
Attend Gunnar’s Kick-A Mobile Security and Development Class.
Friday Summary: September 14, 2012.

Favorite Outside Posts

Mike Rothman: Antivirus programs often poorly configured, study finds. In a “master of the obvious” research finding, the OPSWAT guys tell us that even if AV worked (which it doesn’t), most folks misconfigure their controls anyway and don’t have the right stuff turned on. And you wonder why the busiest guys in the industry are the folks tracking all the breaches?
Adrian Lane: Looking Inside Your Screenshots. So many good posts this week, but I thought this was the most interesting. I have never been a big fan of digital watermarking: it’s too easy to detect and evade for images and music files, and we know it degrades content. But in this case it’s more difficult to detect and does not degrade the content, and it gives Blizzard a hammer to use in legal cases as they have solid user/client identity. Sneaky, and if you give it some thought, there are other practical applications of this approach.
Rich: Compliance lessons from Lance at EmergentChaos. As the resident Securosis cycling fan there’s no way I wasn’t going to pick this one. Only difference is Lance previously, clearly, stated he didn’t dope… which isn’t the same as his recent comments more focused on ‘compliance’.

Project Quant Posts

Malware Analysis Quant: Index of Posts.
Malware Analysis Quant: Metrics – Monitor for Reinfection.
Malware Analysis Quant: Metrics – Remediate.
Malware Analysis Quant: Metrics – Find Infected Devices.

Research Reports and Presentations

Understanding and Selecting Data Masking Solutions.
Evolving Endpoint Malware Detection: Dealing with Advanced and Targeted Attacks.
Implementing and Managing a Data Loss Prevention Solution.
Defending Data on iOS.

Top News and Posts

OWASP ZAP – the Firefox of web security tools
Coders Behind the Flame Malware Left Incriminating Clues on Control Servers. A fun read. And if most code received this level of scrutiny, we would have much better code!
Attack Easily Cracks Oracle Database Passwords
Internet Explorer Fix is available now
Media Manipulation and Social Engineering
Mobile Pwn2Own ownage
Hacker Steals $140k From Lock Poker Account
RSnake donated XSS filter cheat sheet to OWASP
BSIMM 4 Released
Petco Releases Coupon On Internet, Forgets How Internet Works
Majority of companies suffered a web application security breach
Massachusetts group to pay $1.5M HIPAA settlement. I would love to see the “corrective plan of action”.
Web Cryptography API draft published
Java zero-day leads to Internet Explorer zero-day

Blog Comment of the Week

Remember, for every comment selected, Securosis makes a $25 donation to Hackers for Charity. This week’s best


Securing Big Data: Security Issues with Hadoop Environments

How do I secure “big data”? A simple and common question. But one without a direct answer – simple or otherwise. We know thousands of firms are working on big data projects, from small startups to large enterprises. New technologies enable any company to collect, manage, and analyze incredibly large data sets. As these systems become more common, the repositories are more likely to be stuffed with sensitive data. Only after companies are reliant on “big data” do they ask “How can I secure it?” This question comes up so much, attended by so much interest and confusion, that it’s time for an open discussion on big data security. We want to cover several areas to help people get a better handle on the challenges. Specifically, we want to cover three things:

  • Why It’s Different Architecturally: What’s different about these systems, both in how they process information and how they are deployed? We will list some of the specific architectural differences and discuss how they impact data and database security.
  • Why It’s Different Operationally: We will go into detail on operational security issues with big data platforms. We will offer perspective on the challenges in securing big data and the deficiencies of the systems used to manage it – particularly their lack of native security features.
  • Recommendations and Open Issues: We will outline strategies for securing these data repositories, with tactical recommendations for securing certain facets of these environments. We will also highlight some gaps where no good solutions exist.

Getting back to our initial question – how to secure big data – what is so darn difficult about answering it? For starters, “What is big data?” Before we can offer advice on securing anything, we need to agree what we’re talking about. We can’t discuss big data security without an idea of what “big data” means. But there is a major problem: the term is so overused that it has become almost meaningless.
When we talk to customers, developers, vendors, and members of the media, they all have their own idea of what “big data” is – but unfortunately they are all different. It’s a complex subject and even the wiki definition fails to capture the essence. Like art, everyone knows it when they see it, but nobody can agree on a definition.

Defining Big Data

What we know is that big data systems can store very large amounts of data; can manage that data across many systems; and provide some facility for data queries, data consistency, and systems management. So does “big data” mean any giant data repository? No. We are not talking about giant mainframe environments. We’re not talking about Grid clusters, massively parallel databases, SAN arrays, cloud-in-a-box, or even traditional data warehouses. We have had the capability to create very large data repositories and databases for decades. The challenge is not to manage a boatload of data – many platforms can do that. And it’s not just about analysis of very large data sets. Various data management platforms provide the capability to analyze large amounts of data, but their cost and complexity make them non-viable for most applications. The big data revolution is not about new thresholds of scalability for storage and analysis. Can we define big data as a specific technology? Can we say that big data is any Hadoop HDFS/Lustre/Google GFS/shard storage system? No – again, big data is more than managing a big data set. Is big data any MapReduce cluster? Probably not, because it’s more than how you query large data sets. Heck, even PL/SQL subsystems in Oracle can be set up to work like MapReduce. Is big data an application? Actually, it’s all of these things and more. When we talk to developers, the people actually building big data systems and applications, we get a better idea of what we’re talking about. The design simplicity of these platforms is what attracts developers.
They are readily available, and their (relatively) low cost of deployment makes them accessible to a wider range of users. With all these traits combined, large-scale data analysis becomes cost-effective. Big data is not a specific technology – it’s defined more by a collection of attributes and capabilities. Sound familiar? It’s more than a little like the struggle to define cloud computing, so we’ll steal from the NIST cloud computing definition and start with some essential characteristics. We define big data as any data repository with the following characteristics:

  • Handles large amounts (a petabyte or more) of data
  • Distributed, redundant data storage
  • Parallel task processing
  • Provides data processing (MapReduce or equivalent) capabilities
  • Central management and orchestration
  • Inexpensive – relatively
  • Hardware agnostic
  • Accessible – both (relatively) easy to use, and available as a commercial or open source product
  • Extensible – basic capabilities can be augmented and altered

In a nutshell: big, cheap, and easy data management. The “big data” revolution is built on these three pillars – the ability to scale data stores at greatly reduced cost is what makes it all possible. It’s data analytics available to the masses. It may or may not have traditional ‘database’ capabilities (indexing, transactional consistency, or relational mapping). It may or may not be fault tolerant. It may or may not have failover capabilities (redundant control nodes). It may or may not allow complex data types. It may or may not provide real-time results to queries. But big data offers all those other characteristics, and it turns out that they – even without traditional database features – are enough to get useful work done. So does big data mean the Hadoop framework? Yes. The Hadoop framework (e.g. HDFS, MapReduce, YARN, Common) is the poster child for big data, and it offers all the characteristics we outlined.
Most big data systems actually use one or more Hadoop components, and extend some or all of its basic functionality. Amazon’s SimpleDB also satisfies the requirements, although it is architected differently than Hadoop. Google’s proprietary BigTable architecture is very similar to Hadoop, but we exclude
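Two of the characteristics listed above, parallel task processing and “MapReduce or equivalent” data processing, are easier to picture with a toy example. What follows is a minimal single-machine sketch of the MapReduce pattern in Python, assuming threads stand in for cluster nodes and short strings stand in for input splits. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical illustrations, not Hadoop’s API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group every intermediate value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for a single key."""
    return key, sum(values)


def map_reduce(splits, workers=4):
    """Run map tasks in parallel, shuffle, then run reduce tasks in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    groups = shuffle(chain.from_iterable(mapped))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = pool.map(lambda item: reduce_phase(*item), groups.items())
    return dict(reduced)


if __name__ == "__main__":
    splits = ["big data is big", "cheap and easy data management"]
    print(map_reduce(splits))
```

On a real cluster, the splits would typically be read from distributed storage such as HDFS, and the framework would move intermediate pairs between nodes during the shuffle. The sketch only mirrors the data flow, not the distribution, scheduling, or fault handling.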


Friday Summary: September 7, 2012

I thought “35 years later, Voyager 1 is heading for the stars” was very cool. It brought back many memories of starting my career at Jet Propulsion Laboratory. Voyager had been in space for a decade when I started there, but these satellites were just starting to send the stunning images back from Saturn and Jupiter. Every morning people got into work early just to see what data was sent back from the night before. Friends were processing the images, doing error and color corrections, and we were seeing other planets up close and personal for the first time. We used to get copies provided to us as employees, many with color enhancement to highlight certain features of the planets and moons. It added an element of excitement to my early career that almost made us forget we were at work. And it was fun working there. JPL teemed with really smart Caltech grads with math skills beyond most mortals. I got to see Carl Sagan speak – twice. I got to see artifacts from the rocket projects that nearly burned down the Caltech campus, and prompted JPL’s creation in the back canyons of La Canada – where they were unlikely to set anyone else on fire. I went on tours of many of the projects, control centers, and laboratories where components of space vehicles were tested. And there were many other satellite projects going on at the time, like the Galileo Spacecraft, which was in many ways more impressive than Voyager. Sure, doing mainframe and dBaseIII+ database programming seemed mundane in comparison, but what I was actually being paid for was just a small part of working there. Stuff like Voyager got me interested in science and technology, and at the time I thought I was working in one of the coolest places on the planet. It helped push me through college because I knew there was way more interesting stuff going on outside – in the real world.
Where else could you go see wind tunnels and rocket engines and hand-held nuclear reactors and giant gold-plated radio antennae during your lunch break? The back lot was quite literally a bunch of “space junk”, with things like a platform that held the lunar rover on the Apollo spacecraft during its trip to the moon just lying in the weeds. How freakin’ cool is that? And I marvel at a simple, fragile appliance that was catapulted into space at catastrophic speeds, through planetary rings and heated fields of plasma. Something designed and built before the Apple II was even available for sale. But it continues to function and send back radio data to this day. Amazing. On to the Summary: Webcasts, Podcasts, Outside Writing, and Conferences Mike quoted in a Silicon Angle series on CyberWars. Probably too much hype and overuse of buzzwords, but decent perspectives on the attackers. Part 1, Part 2, Part 3. Mike’s Dark Reading column on tough choices. Rich will participate in Protecting Your Digital Life on August 22. Adrian joined Rich and Martin on The Network Security Podcast, episode 285. Adrian won the Nimby Award for Best Identity Forecast Blog. Favorite Securosis Posts Mike Rothman: Gaming the Tetragon. Since we haven’t written much new stuff of late, I figured I’d go back and mine some of the classics of yore. My recent rant on Earning Quadrant Leadership wasn’t the first time I made similar points about the MQ. The first was a couple of months after I joined Securosis, in this post complete with a fancy picture. Users should pay attention to this stuff because if your preferred solution isn’t in the ‘right’ quadrant you might not get to buy it. So you need to game the system from both sides. Adrian Lane: Database Connections and Trust. This week I pulled out an old post to show the app developer mindset – when it comes to data storage and non-relational environments these issues are even more important. Other Securosis Posts Incite 9/4/2012: Dealing with Dealers.
Friday Summary: August 31, 2012. Favorite Outside Posts Adrian Lane: Advanced Exploitation of Xen Hypervisor Memory Corruption Bug. On the more technical side, but this is interesting. Mike Rothman: Mobile Attack Surface. GP does it again. Great post here expanding on some of Jim Manico and Jim Bird’s work on defining mobile attack surface. This quote is right on the money: “I use the Attack Surface Model in combination with a Threat Model to identify and locate countermeasures.” Mobile devices are necessarily different and we need to start thinking about how our security is going to necessarily change. Necessarily. Project Quant Posts Malware Analysis Quant: Index of Posts. Malware Analysis Quant: Metrics – Monitor for Reinfection. Malware Analysis Quant: Metrics – Remediate. Malware Analysis Quant: Metrics – Find Infected Devices. Malware Analysis Quant: Metrics – Define Rules and Search Queries. Malware Analysis Quant: Metrics – The Malware Profile. Malware Analysis Quant: Metrics – Dynamic Analysis. Research Reports and Presentations Understanding and Selecting Data Masking Solutions. Evolving Endpoint Malware Detection: Dealing with Advanced and Targeted Attacks. Implementing and Managing a Data Loss Prevention Solution. Defending Data on iOS. Malware Analysis Quant Report. Report: Understanding and Selecting a Database Security Platform. Vulnerability Management Evolution: From Tactical Scanner to Strategic Platform. Top News and Posts Hacker ‘steals’ Hertfordshire Police Database. Anonymous Leaks Apple UDIDs Following Alleged Hack of FBI. How the FBI might’ve been owned (12M Apple records). FBI Says Laptop Wasn’t Hacked; Never Possessed File of Apple Device IDs. Confirm nothing. Deny everything. Make counter-accusations. That’s the playbook. Apple Releases Fix for Critical Java Flaw. Hacker steals $250k in Bitcoins from online exchange Bitfloor. FBI Arrests Suspected LulzSec Hacker For Sony Pictures Attack. Right here in the greater Phoenix area. Huh.
Adobe fixes Photoshop heap overflow. McAfee has detected 1.5 million new malware samples in the last three months. A Handy Way to Foil ATM Skimmer Scams. TSA Denies Stonewalling Nude Body-Scanner Court Order. Blog Comment of the Week Remember, for every comment


Friday Summary: August 24, 2012

This will probably sound weird, but for the first time in many years I am bummed that summer is ending. This is odd because I’m not really into vacations. I have only taken a real vacation – which I define as my wife and myself leaving the house together for more than 24 hours – twice in the last twelve years. And one of those vacations was a disaster I would not care to relive – drunken friends and crashing houseboats onto rocks is something I can do without. Anyway, vacations are just not something we really do. And when you have as many critters as we do – each needing regular attention – going anywhere gets a bit difficult. I travel a lot as part of this job, so I have no need to “get away” for its own sake. I’m happy to putter around the house, and I have made my home a great place to take time off. This year a close friend and I ventured up to south Lake Tahoe and visited Echo Lake. It’s a place my friend has been going with his parents since he was born, but both his parents have now passed, so we decided to keep the tradition alive. We planned a couple days hanging out and not catching fish. The trip started with a few bad omens: both on the way there and back, we got stuck in several traffic jams – including a high-speed chase/rollover accident that stranded us for a few hours in the hot Oakland sun. But that did not matter. Sitting in traffic and sitting in the boat, I had a freaking great time! In fact I really did not want to come back. There was hiking I wanted to do but we ran out of time. And kayaking – no time. And swimming. And they had a Sailfish one-design regatta – I wanted in on that! Drinking Scotch with total strangers and just watching the sun set. And more fishing. I wanted to see if I could get my mountain bike back into the wilderness trails. I wanted a summer vacation, the three month kind I have not had since early high school. I started to fantasize about a tiny cabin on the water to help make all this happen.
I could have stayed three months without a second thought. Honestly, I was like a little kid on the last week of summer. I really did not want to come back. I know about all the studies that say you need time off work to be mentally healthy and invigorate yourself. I see a blog post every year on the need for time off and the importance of vacations. And I have seen the benefits of employees regularly taking time off. Whatever. That’s for other people. Not me. Or it was. Now I want a real vacation. It was damn fun, and even if it doesn’t help me beat burnout or reinvigorate me mentally – although this trip did – I just want to go do that again. It was odd, feeling that urge to get away for the first time in a very long time. And here I find myself looking at listings for vacation properties – weird. I included a boatload of news this week, so check it out. On to the Summary: Webcasts, Podcasts, Outside Writing, and Conferences Rich participated in Protecting Your Digital Life at TidBITS. Adrian joined Rich and Martin on The Network Security Podcast, episode 285. Adrian won the Nimby Award for Best Identity Forecast Blog. Favorite Securosis Posts Adrian Lane: Endpoint Security Management Buyer’s Guide. I’m betting this is the most practical and helpful part for end users. Mike Rothman: Endpoint Security Management Buyer’s Guide – 10 Questions. Okay, it’s my post, so I’m a homer. But I love distilling down a bunch of content into only 10 questions. Makes you focus on what’s important. Rich: Force Attacker Perfection. This is an older post of mine, but I think it is becoming increasingly relevant now that we are seeing more interest in active countermeasures, which can really enhance the concept. Other Securosis Posts Incite 8/22/2012: Cassette Legends. [New White Paper] Understanding and Selecting Data Masking Solutions. Friday Summary: August 17, 2012. Favorite Outside Posts Dave Lewis: Identity is Center Stage in Mobile Security Venn.
Mike Rothman: VOTE FOR DAVE!!!! Hey CISSPs! Our very own Dave Lewis is running for the ISC2 board, so if you have that (worthless) piece of paper, then get off your hind section and sign Dave’s petition. Significant and much-needed change is coming to the ISC2. And they don’t know what they are in for. It will start with the Brick of Enlightenment. Adrian Lane: Hacker Camp Recount. Very cool! Rich: Bill Brenner slams vendors for their useless briefings. I hope all marketing people read this. But keep in mind that the needs of a journalist are different than those of an analyst, which are different than those of a prospect in a sales situation. Tune the deck for the audience. Project Quant Posts Malware Analysis Quant: Index of Posts. Malware Analysis Quant: Metrics – Monitor for Reinfection. Malware Analysis Quant: Metrics – Remediate. Malware Analysis Quant: Metrics – Find Infected Devices. Malware Analysis Quant: Metrics – Define Rules and Search Queries. Research Reports and Presentations Understanding and Selecting Data Masking Solutions. Evolving Endpoint Malware Detection: Dealing with Advanced and Targeted Attacks. Implementing and Managing a Data Loss Prevention Solution. Defending Data on iOS. Malware Analysis Quant Report. Report: Understanding and Selecting a Database Security Platform. Vulnerability Management Evolution: From Tactical Scanner to Strategic Platform. Top News and Posts Hoff on SDN. It’s possible Rich and Hoff will team up again for RSA, and perhaps they will cover this material and combine it with Rich’s data and app-level automation research. Maybe. Amazon Glacier. $.01 per GB. Holy. Crap. McAfee update breaks computers. FBI surveillance backdoor might be open to hackers. New agnostic malware


[New White Paper] Understanding and Selecting Data Masking Solutions

Today we are launching a new research paper on Understanding and Selecting Data Masking Solutions. As we spoke with vendors, customers, and data security professionals over the last 18 months, we felt big changes occurring with masking products. We received many new customer inquiries regarding masking, often for use cases outside classic test data creation. We wanted to discuss these changes and share what we see with the community. Our goal has been to ensure the research addresses common questions from both technical and non-technical audiences. We did our best to cover the business applications of masking in a non-technical, jargon-free way. Not everyone who is interested in data security has a degree in data management or security, so we geared the first third of the paper to problems you can reasonably expect to solve with masking technologies. Those of you interested in the nuts and bolts need not fear – we drill into the myriad of technical variables later in the paper. The following excerpt offers an overview of what the paper covers: Data masking technology provides data security by replacing sensitive information with a non-sensitive proxy, but doing so in such a way that the copy looks – and acts – like the original. This means non-sensitive data can be used in business processes without changing the supporting applications or data storage facilities. You remove the risk without breaking the business! In the most common use case, masking limits the propagation of sensitive data within IT systems by distributing surrogate data sets for testing and analysis. In other cases, masking will dynamically provide masked content if a user’s request for sensitive information is deemed ‘risky’. We are particularly proud of this paper – it is the result of a lot of research, and it took a great deal of time to refine the data.
We are not aware of any other research paper that fully captures the breadth of technology options available, or anything else that discusses evolving uses for the technology. With the rapid expansion of the data masking market, many people are looking for a handle on what’s possible with masking, and that convinced us to do a deep research paper. We quickly discovered a couple of issues when we started the research. Masking is such a generic term that most people think they have a handle on how it works, but it turns out they are typically aware of only a small sliver of the available options. Additionally, the use cases for masking have grown far beyond creating test data, evolving into a general data protection and management framework. As masking techniques and deployment options evolve, we see a change in the vocabulary used to describe the variations. We hope this research will enhance your understanding of masking systems. Finally, we would like to thank those companies who chose to sponsor this research: IBM and Informatica. Without sponsors like these who contribute to the work we do, we could not offer this quality research free of charge to the community. Please visit their sites to download the paper, or you can find a copy in our research library: Understanding and Selecting Data Masking Solutions.
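The excerpt describes masking as swapping sensitive values for a non-sensitive proxy that still looks and acts like the original, either statically (surrogate test data) or dynamically (when a request is deemed risky). Here is a minimal Python sketch of both ideas under our own assumptions: the HMAC-keyed digit substitution, the hard-coded `MASKING_KEY`, and the `mask_digits` and `read_field` names are hypothetical illustrations, not how any particular masking product works.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch this from a key manager.
MASKING_KEY = b"example-key-do-not-use"


def mask_digits(value: str, keep_last: int = 4, key: bytes = MASKING_KEY) -> str:
    """Static masking: replace digits deterministically while preserving
    length, separators, and the last `keep_last` digits."""
    digits = [c for c in value if c.isdigit()]
    to_mask = len(digits) - keep_last
    # Keyed pseudorandom bytes derived from the original digits, so the same
    # input always yields the same proxy (useful for consistent joins).
    stream = hmac.new(key, "".join(digits).encode(), hashlib.sha256).digest()
    out, i = [], 0
    for c in value:
        if c.isdigit():
            out.append(str(stream[i % len(stream)] % 10) if i < to_mask else c)
            i += 1
        else:
            out.append(c)  # keep dashes, spaces, etc. so the format is unchanged
    return "".join(out)


def read_field(value: str, request_is_risky: bool) -> str:
    """Dynamic masking: return the proxy only when the request is deemed risky."""
    return mask_digits(value) if request_is_risky else value


if __name__ == "__main__":
    card = "4111-1111-1111-1111"
    print(mask_digits(card))                              # same format, masked digits
    print(read_field(card, request_is_risky=False))       # original value
```

Because the substitution is keyed and deterministic, a given card number masks to the same proxy in every table processed with the same key. The trade-off is that anyone holding the key and a candidate value can recompute the proxy, so key handling matters as much as the masking function itself.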


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments and input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.