Securosis

Research

Understanding and Selecting Data Masking: Technical Architecture

Today we will discuss platform architectures and deployment models. Before I jump into the architectural models, it’s worth mentioning that these architectures are designed in response to how enterprises use data. Data is valuable because we use it to support business functions. Data has value in use. The more places we can leverage data to make decisions, the more valuable it is. However, as we have seen over the last decade, data propagation carries many risks. Masking architectures are designed to fit within existing data management frameworks and mitigate risks to information without sacrificing usefulness. In essence we are inserting controls into existing processes, using masking as a guardian, to identify risks and protect data as it migrates through the enterprise applications that automate business processes.

As I mentioned in the introduction, we have come a long way from masking as nothing more than a set of scripts run by an admin or database administrator. Back then you connected directly to a database, or ran scripts from the console, and manually moved files around. Today’s platforms proactively discover sensitive data and manage policies centrally, handling security and data distribution across dozens of different types of information management systems, automatically generating masked data as needed for different audiences. Masking products can stand alone, serving disparate data management systems simultaneously, or be embedded as a core function of a dedicated data management service.

Base Architecture

Single Server/Appliance: A single appliance or software installation that performs static ‘ETL’ data masking services. The server is wholly self-contained – performing all extraction, masking, and loading from a single location. This model is typically used in small and mid-sized enterprises.
It can scale geographically, with independent servers in regional offices to handle masking functions, usually in response to specific regional regulatory requirements.

Distributed: This option consists of a central management server with remote agents/plug-ins/appliances that perform discovery and masking functions. The central server distributes masking rules, directs endpoint functionality, catalogs locations and nature of sensitive data, and tracks masked data sets. Remote agents periodically receive updates with new masking rules from the central server, and report back sensitive data that has been discovered, along with the results of masking jobs. Scaling is by pushing processing load out to the endpoints.

Centralized Architecture: Multiple masking servers, centrally located and managed by a single management server, are used primarily for production and management of masked data for multiple test and analytics systems.

Proxy/Bridge Cluster: One or more appliances or agents that dynamically mask streamed content, typically deployed in front of relational databases, to provide proxy-based data masking. This model is used for real-time masking of non-static data, such as database queries or loading into NoSQL databases. Multiple appliances provide scalability and failover capabilities. This may or may not be used in a two-tier architecture.

Appliances, software, and virtual appliance options are all available. But unlike most security products, where appliances dominate the market, masking vendors generally deliver their products as software. Windows, Linux, and UNIX support is all common, as is support for many types of files and relational databases. Support for virtual appliance deployment is common among the larger vendors but not universal, so inquire about availability if that is key to your IT service model.
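To make the Distributed model above more concrete, here is a minimal Python sketch of a central server that holds versioned masking rules and an agent that pulls them, masks rows locally, and reports back. Everything in it is an assumption made for illustration: the class names (`CentralServer`, `Agent`), the two mask functions, and the report format are hypothetical and do not reflect any vendor's product or API.

```python
from dataclasses import dataclass, field


# Hypothetical mask functions; real platforms offer many more mask types.
def redact(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def keep_last4(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


MASKS = {"redact": redact, "keep_last4": keep_last4}


@dataclass
class CentralServer:
    """Holds the masking policy and collects agent reports (illustrative only)."""
    rules: dict = field(default_factory=dict)    # column name -> mask name
    version: int = 0                             # bumped on every rule change
    reports: list = field(default_factory=list)  # what agents send back

    def set_rule(self, column: str, mask_name: str) -> None:
        self.rules[column] = mask_name
        self.version += 1

    def current_policy(self):
        # Hand out a copy so agents cannot modify the central rules.
        return self.version, dict(self.rules)

    def receive_report(self, agent_id, policy_version, rows_masked, columns_found):
        self.reports.append((agent_id, policy_version, rows_masked, columns_found))


@dataclass
class Agent:
    """Runs next to the data, masks locally, and reports results centrally."""
    agent_id: str
    server: CentralServer
    policy_version: int = -1
    rules: dict = field(default_factory=dict)

    def sync(self) -> None:
        """Pull the latest rules from the central server."""
        self.policy_version, self.rules = self.server.current_policy()

    def mask_rows(self, rows):
        masked, found = [], set()
        for row in rows:
            out = dict(row)
            for column, mask_name in self.rules.items():
                if column in out:
                    out[column] = MASKS[mask_name](out[column])
                    found.add(column)
            masked.append(out)
        # Report which sensitive columns were seen and how many rows were masked.
        self.server.receive_report(
            self.agent_id, self.policy_version, len(masked), sorted(found)
        )
        return masked
```

In this sketch the processing load sits with the agent, which is the scaling property the Distributed model relies on; the central server only stores policy and bookkeeping.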
A key masking evolution is the ability to apply masking policies across different data management systems (file management, databases, document management, etc.) regardless of platform type (Windows vs. Linux vs. …). Modern masking platforms are essentially data management systems, with policies set at a central location and applied to multiple systems through direct connection or remote agent software. As data is collected and moved from point A to point B, one or more data masks are applied to one or more ‘columns’ of the data.

Deployment and Endpoint Options

While masking architecture is conceptually simple, there are many different deployment options, each particularly suited to protecting one or more data management systems. And given masking technologies must work on static data copies, live database repositories, and dynamically generated data (streaming data feeds, application generated content, ad hoc data queries, etc.), a wide variety of deployment options are available to accommodate the different data management environments. Most companies deploy centralized masking servers to produce safe test and analytics data, but vendors offer the flexibility to embed masking directly into other applications and environments where large-footprint masking installations or appliances are unsuitable. The following is a sample of the common deployments used for remote data collection and processing.

Agents: Agents are software components installed on a server, usually the same server that hosts the data management application. Agents have the option of being as simple or advanced as the masking vendor cares to make them. They can be nothing more than a data collector, sending data back to a remote masking server for processing, or might provide masking as data is collected. In the latter case, the agent masks data as it is received, either completely in memory or from a temporary file.
Agents can be managed remotely by a masking server or directly by the data management application, effectively extending data management and collaboration system capabilities (e.g., MS SharePoint, SAP). One of the advantages of using agents at the endpoint rather than in-database stored procedures – which we will describe in a moment – is that all traces of unmasked data can be destroyed. Either by masking in ‘ephemeral’ memory, or by ensuring temporary files are overwritten, sensitive data is not leaked through temporary storage. Agents do consume local processor, memory, and storage – a significant issue for legacy platforms – but only a minor consideration for virtual machines and cloud deployments.

Web Server Plug-ins: Technically a form of agent, these plug-ins are installed as web application services, as part of an Apache/web application stack used to support the local application which manages data. Plug-ins are an efficient way to transparently implement masking within existing application environments, acting on the data stream before it reaches the application or extending the application’s functionality


Friday Summary: June 1, 2012

It’s the first of June, and I’m sure most of you are thinking about vacation, if not actually on vacation at this point. I’m here holding down the fort while the rest of Securosis is visiting places cooler and more fun. I’m taking time to reflect on security topics and my research agenda.

I have been mulling over the topic of IT buying security products for the sake of security. Sounds irrational, right? We have known for years that people only buy security products to help satisfy compliance requirements, and then only grudgingly, to meet the minimum requirements. But people buying security to help secure things keeps popping up here and there, and I have been waiting for better evidence before blogging about it. Just before the RSA conference I decided to bring it up in an internal meeting, and the conversation went a bit like this:

Me: “I think I should mention buying security for the sake of security as a trend.”
Partner #1: “Why?”
Me: “The number of security driven inquiries has doubled.”
Partner #1: “Twice nothing is nothing. Move on.”
Me: “Agreed, but twice 3-5% is something to take notice of.”
Partner #2: “Where are you getting your data from?”
Me: “Customer conversations and anecdotal vendor evidence. At least a dozen, maybe 15 references, since January, mostly in the area of data and database security.”
Partner #2: “Meh. Not a great sample pool, or sample size. It’s so small in comparison to compliance it’s an afterthought. It’s really not worth mentioning.”
Me: “Yeah, OK, agreed. But the customer questions seem to be driven by risk analysis, and the conversations just seem different. I think we could keep our eyes open on this.”

So it’s not really worth talking about, but here I am mentioning it because it keeps popping up. I figured I’d open it up for discussion with our readers, to see what others are seeing. It’s not an actual trend, but it’s interesting – to me, at least.
The evidence clearly shows that security is a compliance-driven market, and there is not enough evidence to say we see a real change. But the conversations are a bit different than they used to be. More often focused on security, more focused on data, with some understanding of risk and a bit of a six-sigma-esque approach to security roadmaps. So maybe it’s not security at all – maybe it’s sophistication of buyers and their internal processes. And why do I care? Because if security or risk is the driver, it changes who buys the products and what features they focus on and ask about – because the use cases differ between security and compliance buyers. I am thinking out loud, but I’d love to hear what’s driving your product selection today.

The other issue to talk about is my research agenda. It’s been hectic here since a month before RSA and it’s only just starting to let up. So it’s time to take a breath and look at the topics you want to hear about. Since Mike joined we have really filled out endpoint and network security; and we have continued to do a lot in analytics, data security, and security management. But despite the amount of expertise we have in house, we have done very little with application security, cloud, and access management. WAF management has been among the top 4 items on my research agenda for 2.5 years now, but has yet to percolate to the top. Identity and Access Management for cloud computing is an incredibly confusing topic which I think we could really shed some light on. And there are plenty of interesting technologies for application security we should delve into as well. We will reset the research agenda again soon, so now is a good time to weigh in on the areas you’re most interested in.

Oh, and if you visit Arizona in the coming weeks, stay away from flashlights. Apparently they’re dangerous. Yikes!
On to the Summary:

Webcasts, Podcasts, Outside Writing, and Conferences

The Macalope consults The Mogull.
Adrian presents on selecting a tokenization strategy.
We missed Rich’s TidBITS article on hardening Mac OS X.

Favorite Securosis Posts

Adrian Lane: Low Hanging Fruit. When my encrypted tunnel failed the other day and email immediately decided to synch, I prayed no one was listening. Made me change all my passwords just in case.
Mike Rothman: Pragmatic Key Management: Introduction. Rich had me at Pragmatic. I look forward to this series – crypto is integral to the cloud and we all need to revisit our Bob & Alice flowcharts.

Other Securosis Posts

White Paper: Understanding and Selecting a Database Security Platform.
White Paper: Vulnerability Management Evolution.
Security, Metrics, Martial Arts, and Triathlon: a Meandering Friday Summary.
Evolving Endpoint Malware Detection: Control Lost.
Continuous Learning.
Friday Summary: May 18, 2012.
Understanding and Selecting Data Masking: How It Works.
Understanding and Selecting Data Masking: Defining Data Masking.

Favorite Outside Posts

Adrian Lane: The Cost of Fixing Vulnerabilities vs. Antivirus Software. Jeremiah asks whether our security investment dollars can be spent better. Most firms I speak with keep metrics to determine whether security programs are helping, improve over time, and provide some hints about the relative cost/benefit tradeoffs of different security investments. The data supports Jeremiah’s assertion.
Mike Rothman: E-Soft (e-soft.co.uk) Uses Bogus Copyright Claims to Stifle Research. I guess some companies never learn from others. Security by obscurity is not a winning strategy. How about actually fixing the damn bug? Yeah, that’s too radical.

Project Quant Posts

Malware Analysis Quant: Index of Posts.
Malware Analysis Quant: Metrics – Monitor for Reinfection.
Malware Analysis Quant: Metrics – Remediate.
Malware Analysis Quant: Metrics – Find Infected Devices.
Malware Analysis Quant: Metrics – Define Rules and Search Queries.
Malware Analysis Quant: Metrics – The Malware Profile.
Malware Analysis Quant: Metrics – Dynamic Analysis.

Research Reports and Presentations

Report: Understanding and Selecting a Database Security Platform.
Vulnerability Management Evolution: From Tactical Scanner to Strategic Platform.
Watching the Watchers:


Pragmatic Key Management: Understanding Data Encryption Systems

One of the common problems in working with encryption is getting caught up with the intimate details of things like encryption algorithms, key lengths, cipher modes, and other minutiae. Not that these details aren’t important – depending on what you’re doing they might be critical – but in the larger scheme of things these aren’t the aspects most likely to trip up your implementation. Before we get into different key management strategies, let’s take a moment to look at crypto systems at the macro level. We will stick to data encryption for this paper, but these principles apply to other types of cryptosystems as well.

Note: For simplicity I will often use “encryption” instead of “cryptographic operation” through this series. If you’re a crypto geek, don’t get too hung up… I know the difference – it’s for readability.

The three components of a data encryption system

Three major components define the overall structure of an encryption system:

The data: The object or objects to encrypt. It might seem silly to break this out, but the security and complexity of the system are influenced by the nature of the payload, as well as by where it is located or collected.

The encryption engine: The component that handles the actual encryption (and decryption) operations.

The key manager: The component that handles keys and passes them to the encryption engine.

In a basic encryption system all three components are likely located on the same system. Take personal full disk encryption (the default you might use on your home Windows PC or Mac) – the encryption key, data, and engine are all kept and run on the same hardware. Lose that hardware and you lose the key and data – and the engine, but that isn’t normally relevant. But once we get into SMB and the enterprise we tend to split out the components for security, management, reliability, and compliance.
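The split between key manager and encryption engine described above can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: `KeyManager` and `EncryptionEngine` are hypothetical names, the key manager lives in process memory rather than on a separate server, and the cipher itself is deliberately left as a pair of caller-supplied functions (`encrypt_fn`, `decrypt_fn`) because the point here is where the key lives, not which algorithm is used. A real deployment would plug in an authenticated cipher from a vetted library.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class KeyManager:
    """Hypothetical stand-in for an external key manager: stores key
    versions, rotates them, and records every access for auditing."""
    _keys: dict = field(default_factory=dict)    # key name -> list of versions
    audit_log: list = field(default_factory=list)

    def _record(self, action, name, version, requester):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, name, version, requester))

    def create_key(self, name, requester):
        self._keys[name] = [secrets.token_bytes(32)]  # random 256-bit key
        self._record("create", name, 1, requester)
        return 1

    def rotate_key(self, name, requester):
        # Old versions are kept so existing data can still be decrypted.
        self._keys[name].append(secrets.token_bytes(32))
        version = len(self._keys[name])
        self._record("rotate", name, version, requester)
        return version

    def get_key(self, name, requester, version=None):
        versions = self._keys[name]
        version = version or len(versions)  # default to the newest version
        self._record("get", name, version, requester)
        return version, versions[version - 1]


class EncryptionEngine:
    """Performs the crypto operations but never stores keys itself."""

    def __init__(self, key_manager, key_name, encrypt_fn, decrypt_fn,
                 identity="engine-1"):
        self.key_manager = key_manager
        self.key_name = key_name
        self.encrypt_fn = encrypt_fn  # assumed signature: (key, data) -> bytes
        self.decrypt_fn = decrypt_fn  # assumed signature: (key, data) -> bytes
        self.identity = identity

    def encrypt(self, plaintext: bytes):
        version, key = self.key_manager.get_key(self.key_name, self.identity)
        # Return the key version with the ciphertext so it survives rotation.
        return version, self.encrypt_fn(key, plaintext)

    def decrypt(self, version: int, ciphertext: bytes) -> bytes:
        _, key = self.key_manager.get_key(self.key_name, self.identity, version)
        return self.decrypt_fn(key, ciphertext)
```

Storing the key version next to each ciphertext is one simple way to keep older data readable after a rotation; other designs are possible, and this one is chosen only because it is easy to follow.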
Building a data encryption system

Where you place these components defines the structure, security, and manageability of your encryption system:

Full Disk Encryption

Our full disk encryption example above isn’t the sort of approach you would want to take for an organization of any size greater than 1. All major FDE systems do a good job of protecting the key if the device is lost, so we aren’t worried about security too much from that perspective, but managing the key on the local system means the system is much less manageable and reliable than if all the FDE keys are stored together. Enterprise-class FDE manages the keys centrally – even if they are also stored locally – to enable a host of more advanced functions, including better recovery options, audit and compliance, and the ability to manage hundreds of thousands of systems.

Database encryption

Let’s consider another example: database encryption. By default, all database management systems (DBMS) that support encryption do so with the data, the key, and the encryption engine all within the DBMS. But you can mix and match those components to satisfy different requirements. The most common alternative is to pull the key out of the DBMS and store it in an external key manager. This can protect the key from compromise of the DBMS itself, and increases separation of duties and security. It also reduces the likelihood of lost keys and enables extensive management capabilities – including easier key rotation, expiration, and auditing. But the key could be exposed to someone on the DBMS host itself because it must be stored in memory before it can be used to encrypt or decrypt. One way to protect against this is to pull both the encryption engine and key out of the DBMS. This could be handled through an external proxy, but more often custom code is developed to send the data to an external encryption server or appliance.
Of course this adds complexity and latency…

Cloud encryption

Cloud computing has given rise to a couple of additional scenarios. To protect an Infrastructure as a Service (IaaS) storage volume running at an external cloud provider you can place the encryption engine in a running instance, store the data in a separate volume, and use an external key manager which could be a hardware appliance connected through VPN and managed in your own data center. To protect enterprise files in an object storage service like Amazon S3 or RackSpace Cloud Files, you can encrypt them on a local system before storing them in the cloud – managing keys either on the local system or with a centralized key manager. While some of these services support built-in encryption, they typically store and manage the key themselves, which means the provider has the (hopefully purely theoretical) ability to access your data. But if you control the key and the encryption engine the provider cannot read your files.

Backup and storage encryption

Many backup systems today include some sort of an encryption option, but the implementations typically offer only the most basic key management. Backing up in one location and restoring in another may be a difficult prospect if the key is stored only in the backup system. Additionally, backup and storage systems themselves might place the encryption engine in any of a wide variety of locations – from individual disk and tape drives, to backup controllers, to server software, to inline proxies. Some systems store the key with the data – sometimes in special hardware added to the tape or drive – while others place it with the engine, and still others keep it in an external key management server. Between all this complexity and poor vendor implementations, I tend to see external key management used for backup and storage more than for just about any other data encryption usage.

Application encryption

Our last example is application encryption.
One of the more secure ways to encrypt application data is to collect it in the application, send it to an encryption server or appliance, and then store the encrypted data in a separate database. The keys


Pragmatic Key Management: Introduction

Few terms strike as much dread in the hearts of security professionals as key management. Those two simple words evoke painful memories of massive PKI failures, with millions spent to send encrypted email to the person in the adjacent cube. Or perhaps it recalls the head-splitting migraine you got when assigned to reconcile incompatible proprietary implementations of a single encryption standard. Or memories of half-baked product implementations that worked fine in isolation on a single system, but were effectively impossible to manage at scale. And by scale, I mean “more than one”.

Over the years key management has mostly been a difficult and complex process. This has been aggravated by the recent resurgence in encryption – driven by regulatory compliance, cloud computing, mobility, and fundamental security needs. Fortunately, encryption today is not the encryption of yesteryear. New techniques and tools remove much of the historical pain of key management – while also supporting new and innovative uses. We also see a change in how organizations approach key management – a move toward practical and lightweight solutions.

In this series we will explore the latest approaches for pragmatic key management. We will start with the fundamentals of crypto systems rather than encryption algorithms, what they mean for enterprise deployment, and how to select a strategy that suits your particular project requirements.

The historic pain of key management

Technically there is no reason key management needs to be as hard as it has been. A key is little more than a blob of text to store and exchange as needed. The problem is that everyone implemented their own methods of storing, using, and exchanging keys. No two systems worked exactly alike, and many encryption implementations and products didn’t include the features needed to use encryption in the real world – and still don’t.
Many products with encryption features supported only their own proprietary key management – which often failed to meet enterprise requirements in areas such as rotation, backup, separation of duties, and reporting. Encryption is featured in many different types of products but developers who plug an encryption library into an existing tool have (historically) rarely had enough experience in key management to produce refined, easy to use, and effective systems. On the other hand, some security professionals remember early failed PKI deployments that cost millions and provided little value. This was at the opposite end of the spectrum – key management deployed for its own sake, without thought given to how the keys and certificates would be used.

Why key management isn’t as hard as you think it is

As with most technologies, key management has advanced significantly since those days. Current tools and strategies offer a spectrum of possibilities, all far better standardized and with much more robust management capabilities. We no longer have to deploy key management with an all-or-nothing approach, either relying completely on local management or on an enterprise-wide deployment. Increased standardization (powered in large part by KMIP, the Key Management Interoperability Protocol) and improved, enterprise-class key management tools make it much easier to fit deployments to requirements. Products that implement encryption now tend to include better management features, with increased support for external key management systems when those features are insufficient. We now have smoother migration paths which support a much broader range of scenarios.

I am not saying life is now perfect. There are plenty of products that still rely on poorly implemented key management and don’t support KMIP or other ways of integrating with external key managers, but fortunately they are slowly dying off or being fixed due to constant customer pressure.
Additionally, dedicated key managers often support a range of non-standards-based integration options for those laggards. It isn’t always great, but it is much easier to manage keys now than even a few years ago.

The new business drivers for encryption and key management

These advances are driven by increasing customer use of, and demand for, encryption. We can trace this back to 3 primary drivers:

Expanding and sustained regulatory demand for encryption. Encryption has always been hinted at by a variety of regulations, but it is now mandated in industry compliance standards (most notably the Payment Card Industry Data Security Standard – PCI-DSS) and certain government regulations. Even when it isn’t mandated, most breach disclosure laws reduce or eliminate the need to publicly report loss of client information if the lost data was encrypted.

Increasing use of cloud computing and external service providers. Customers of cloud and other hosting providers want to protect their data when they give up physical control of it. While the provider often has better security than the customer, this doesn’t reduce our visceral response to someone else handling our sensitive information.

The increase in public data exposures. While we can’t precisely quantify the growth of actual data loss, it is certainly far more public than it has ever been before. Executives who previously ignored data security concerns are now asking security managers how to stay out of the headlines.

More enforcement of more regulations, increasing use of outsiders to manage our data, and increasing awareness of data loss problems, are all combining to produce the greatest growth the encryption market has seen in a long time.

Key management isn’t just about encryption (but that is our focus today)

Before we delve into how to manage keys, it is important to remember that cryptographic keys are used for more than just encryption, and that there are many different kinds of encryption.
Our focus in this series is on data encryption – not digital signing, authentication, identity verification, or other crypto operations. We will not spend much time on digital certificates, certificate authorities, or other signature-based operations. Instead we will focus on data encryption, which is only one area of cryptography. Much of what we see is as much a philosophical change as improvement in particular tools or techniques. I have long been bothered by people’s tendency to either indulge in encryption idealism at one end, or dive


White Paper: Understanding and Selecting a Database Security Platform

We are pleased to announce the availability of a new research paper, Understanding and Selecting Database Security Platforms. This paper covers most facets of database security today. We started to refresh our original Database Activity Monitoring paper in October 2011, but stopped short when our research showed that platform evolution had stopped converging – and had instead diverged again to embrace independent visions of database security and splintering customer requirements. We decided our original DAM research was becoming obsolete. Use cases have evolved and vendors have added dozens of new capabilities – they have covered the majority of database security requirements, and expanded out into other areas. These changes are so significant that we needed to seriously revisit our use cases and market drivers, and delve into the different ways preventative and detective data security technologies have been bundled with DAM to create far more comprehensive solutions. We have worked hard to fairly represent the different visions of how database security fits within enterprise IT, and to show the different value propositions offered by these variations. These fundamental changes have altered the technical makeup of products so much that we needed new vocabulary to describe these products. The new paper is called “Understanding and Selecting Database Security Platforms” (DSP) to reflect these major product and market changes.

We want to thank our sponsors for the Database Security Platform paper: Application Security Inc, GreenSQL, Imperva, and McAfee. Without sponsors we would not be able to provide our research for free, so we appreciate deeply that several vendors chose to participate in this effort and endorse our research positions. You can download the DSP paper.


Incite 5/30/2012: Low Hanging Fruit

As you might have noticed, there was no Incite last week. Turns out the Boss and I were in Barcelona to celebrate 15 years of wedded bliss. We usually run about 6 months late on everything, so the timing was perfect. We had 3 days to ourselves and then two other couples from ATL joined us for the rest of the week. We got to indulge our appreciation for art – hitting the Dali, Miro, and Picasso museums. We also saw some Gaudi structures that are just mind-boggling. Then we joked about how Americans are not patient enough to ever build anything like the Sagrada Familia. Even though we were halfway around the world, we weren’t disconnected. Unless we wanted to be. I rented a MiFi, so when we checked in (mostly with the kids) we just fired up the MiFi, and Skype or FaceTime back home. Not cheap, but cheaper than paying for expensive WiFi and cellular roaming. And it was exceedingly cool to be walking around the Passion Facade of the Sagrada Familia, showing the kids the sculptures via FaceTime, connected via a MiFi on a broadband cellular network in a different country. We took it slow and enjoyed exploring the city, tooling around the markets, and feasting on natural Catalan cooking – not the mixture of additives, preservatives, and otherwise engineered nutrition we call food in the US. And we did more walking in a day than we normally do in a week. We also relaxed. It’s been a pretty intense year so far, and this was our first opportunity to take a breath and enjoy the progress we have made. But real life has a way of intruding on even the most idyllic situations. As we were enjoying a late lunch at a cafe off Las Robles, our friends mentioned how it’s been a little while since they were online. We had already had the discussion about weak passwords on their webmail accounts as we enjoyed cervezas Park Gueell the day before. Their name and a single digit number may be easy to remember, but it’s not really a good password. 
When my friend then told me how he checked email from a public computer in London, I braced for what I knew was likely to come next. So I started interrogating him as to what he uses that email address for. Bank accounts? Brokerage sites? Utilities? Airlines? Commerce sites? No, no, and no. OK, I can breathe now. Then I proceeded to talk about how losing control of your email can result in a bad day. I thought we were in the clear.

Then my buddy’s wife piped in, “Well, I checked my bank account from that computer also, was that bad?” Ugh. Well, yes, that was bad. Quite bad indeed. Then I walked them through how a public computer usually has some kind of key logger and accessing a sensitive account from that device isn’t something you want to do. Ever. She turned ashen and started to panic. To avoid borking the rest of my holiday, I had her log into her account via the bank’s iOS app and scrutinize the transactions. Nothing out of the ordinary, so we all breathed a sigh of relief. She couldn’t reset the password from that app and none of us had a laptop with us. But she promised to change the password immediately when she got back to the US.

It was a great reminder of the low-hanging fruit out there for attackers. It’s probably not you, but it’s likely to be plenty of folks you know. Which means things aren’t going to get better anytime soon, though you already knew that.

–Mike

Photo credits: “Low-hanging fruit explained” originally uploaded by Adam Fagen

Heavy Research

We’re back at work on a variety of blog series, so here is a list of the research currently underway. Remember you can get our Heavy Feed via RSS, with all our content in its unabridged glory. And you can get all our research papers too.
Understanding and Selecting Data Masking
How It Works
Defining Data Masking
Introduction

Evolving Endpoint Malware Detection
Control Lost

Incite 4 U

Bear hunting for security professionals: Fascinating post by Chris Nickerson about Running from your Information Security Program. How else could you integrate bear hunting in Russia (yes, real bears), running, and security? He talks about how these Russian dudes take down bears with nothing more than a stick and a knife. Probably not how you’d plan to do it, right? Chris’ points are well taken, especially challenging the adage about not needing to be totally secure – just more secure than the other guys. That’s what I love about pen testers – they question everything, challenge assumptions, and spend a great deal of their lives proving those assumptions wrong. The answer? Plan for the inevitable attacks and make sure you can respond. Yes, it’s something lots of folks (including us) have been talking about for a long time. Though I do enjoy highlighting new and interesting ways to tell important stories. – MR

Job security: Say you’re the CISO of a retail chain. Do you think you’d be fired if 10% of your transactions were hacked and resulted in fraud? Maybe you should consider working for the IRS, because apparently gigantic fraud rates not only don’t get you fired there – you get sympathetic press. I bet the guys at Global Payments and Heartland are jealous! And someone at the IRS actually thought that anonymous Internet tax filings, with subsequent anonymous distribution of refunds, was a great idea. I’m willing to bet that whoever created the program is not only still working at the IRS (where else?), but that they will keep the program as is. There are occasions where it’s better to ditch fundamentally flawed processes – and losing millions, if not hundreds of millions, of dollars is a good indicator that your process still has a few glitches – and start over. Most

Understanding and Selecting Data Masking: How It Works

In this post I want to show how masking works, focusing on how masking platforms move and manipulate data. I originally intended to start with architectures and mechanics of masking systems; but it should be more helpful to start by describing the different masking models, how data flows through different systems, and the advantages and disadvantages of each. I will comment on common data sources and destinations, and the issues to consider when evaluating masking technology. There are many different types of data repositories and services which can be masked, so I will go into detail on these choices. For now we will stick to relational databases, to keep things simple. Let’s jump right in and discuss how the technology works.

ETL

When most people think about masking, they think about ETL. ‘ETL’ is short for Extraction-Transformation-Load – a concise description of the classic (and still most common) masking process. Sometimes referred to as ‘static’ masking, ETL works against a fixed export from the source repository. Each phase of ETL is typically performed on a separate server: a source data repository, a masking server that orchestrates the transformation, and a destination database. The masking server connects to the source, retrieves a copy of the data, applies the mask to specified columns of data, and then loads the result onto the target server. This process may be partially manual, fully driven by an administrator, or fully automated. Let’s examine the steps in greater detail:

  • Extract: The first step is to ‘extract’ the data from some storage repository – most often a relational database. The extracted data is often formatted to make it easier for the mask to be applied. For example, extraction can be performed with a simple SELECT query issued against a database, filtering out unwanted rows and formatting columns in the query.
Results may be streamed directly to the masking application for processing or dumped into a file – such as a comma-separated .csv or tab-separated .tsv file. The extracted data is then securely transferred, as an encrypted file or over an encrypted SSL connection, to the masking platform.
  • Transform: The second step is to apply the data mask, transforming sensitive production data into a safe approximation of the original content. See Defining Masking for available transformations. Masks are almost always applied to what database geeks call “columnar data” – which simply means data of the same type is grouped together. For example, a database may contain a ‘customer’ table, where each customer entry includes a Social Security number (SSN). These values are grouped together into a single column, in files and databases alike, making it easier for the masking application to identify which data to mask. The masking application parses through the data, and for each column of data to be masked, it replaces each entry in the column with a masked value.
  • Load: In the last step masked data is loaded into a destination database. The masked data is copied to one or more destination databases, where it is loaded back into tables. The destination database does not contain sensitive data, so it is not subject to the same security and audit requirements as the original database with the unmasked data.

ETL is the most generic and most flexible of masking approaches. The logical ETL process flow is implemented in dedicated masking platforms, data management tools with integrated masking and encryption libraries, embedded database tools – all the way down to home-grown scripts. I see all these used in production environments, with the level of skill and labor required increasing as you progress down the chain. While many masking platforms replicate the full process – performing extraction, masking, and loading on separate systems – that is not always the case.
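To make the three ETL steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular masking product works: the `customer` table, its columns, the helper names, and the use of two in-memory SQLite databases as stand-ins for the source and destination servers are all hypothetical. The mask itself is the simple SSN redaction described in this series.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)"

def extract(source):
    # Extract: a plain SELECT pulls only the columns we want to move.
    return source.execute("SELECT id, name, ssn FROM customer ORDER BY id").fetchall()

def transform(rows):
    # Transform: mask the sensitive column (full redaction), leave the rest alone.
    return [(cid, name, "XXX-XX-XXXX") for cid, name, _ssn in rows]

def load(dest, rows):
    # Load: write the masked rows into the destination database.
    dest.execute(SCHEMA)
    dest.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    dest.commit()

def run_etl(source, dest):
    rows = extract(source)
    load(dest, transform(rows))
    return len(rows)

def demo_source():
    # Hypothetical "production" data, for illustration only.
    src = sqlite3.connect(":memory:")
    src.execute(SCHEMA)
    src.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Alice", "123-45-6789"), (2, "Bob", "987-65-4321")],
    )
    src.commit()
    return src

if __name__ == "__main__":
    source = demo_source()
    destination = sqlite3.connect(":memory:")
    run_etl(source, destination)
    for row in destination.execute("SELECT * FROM customer ORDER BY id"):
        print(row)
```

The source database is never modified; only the copy that lands in the destination carries masked values. A real deployment would also encrypt the transfer between systems, which this sketch leaves out.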
Here are some alternative masking models and processes.

In-place Masking

In some cases you need to create a masked copy within the source database – perhaps before moving it to another less sensitive database. In other cases the production data is moved unchanged (securely!) into another system, and then masked at the destination. When production data is discovered on a test system, the data may be masked without being moved at all. All these variations are called “in-place masking” because they skip both movement steps. The masks are applied as before, but inside the database – which raises its own security and performance considerations.

There are very good reasons to mask in place. The first is to take advantage of databases’ facility with management and manipulation of data. They are incredibly adept at data transformation, and offer very high masking performance. Leveraging built-in functions and stored procedures can speed up the masking process because the database has already parsed the data. Masking data in place – replacing data rather than creating a new copy – protects database archives and data files from snooping, should someone access backup tapes or raw disk files. If the security of data after it leaves the production database is your principal concern, then ETL and in-place masking prior to moving data to another location should satisfy security and audit requirements. Many test environments have poor security, which may require masking prior to export or use of a secure ETL exchange, to ensure sensitive data is never exposed on the network or in the destination data repository.

That said, among enterprise customers we have interviewed, masking data at the source (in the production database) is not a popular option. The additional computational overhead of the masking operation, in addition to the overhead required to read and write the data being transformed, may have an unacceptable impact on database performance.
In many organizations legacy databases struggle to keep up with day-to-day operation, and cannot absorb the additional load. Masking in the target database (after the data has been moved) is not very popular either – masking solutions are generally purchased to avoid putting sensitive data on insecure test systems, and such customers prefer to avoid loading data into untrusted test systems prior to masking. In-place masking is typically
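As a rough illustration of the in-place model described above, the sketch below applies a mask with a single UPDATE statement so the work is done by the database's own string functions and no data is moved. The table, column, and function names are hypothetical, SQLite stands in for whatever database you actually run, and keeping the last four digits is just one possible rule chosen for the example.

```python
import sqlite3

def mask_in_place(db):
    """Overwrite the ssn column inside the database and return the rows touched."""
    with db:  # commits on success, rolls back if the UPDATE fails
        cur = db.execute(
            # substr(ssn, -4) is SQLite's built-in way to take the last four characters.
            "UPDATE customer SET ssn = 'XXX-XX-' || substr(ssn, -4)"
        )
    return cur.rowcount

def demo_db():
    # Hypothetical test system that received unmasked production rows.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
    db.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Alice", "123-45-6789"), (2, "Bob", "987-65-4321")],
    )
    db.commit()
    return db

if __name__ == "__main__":
    database = demo_db()
    print(mask_in_place(database), "rows masked")
    print(database.execute("SELECT ssn FROM customer ORDER BY id").fetchall())
```

Note that the UPDATE reads and rewrites every row in the table, which is exactly the extra load on the database that the text above warns about.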

Security, Metrics, Martial Arts, and Triathlon: a Meandering Friday Summary

Rich here. One of the more fascinating – and unexpected – aspects of migrating from martial arts to triathlon as my primary sport has been the important role of metrics, and how they have changed my views on security. Both sports are pretty darn geeky. On the martial arts side we have intense history, technique, and strategy. Positional errors of a fraction of an inch can mean the difference between success, failure, and injury. But overall there is less emphasis on hard metrics. We use them for conditioning but lack much of the instrumentation needed to collect the kinds of metrics that can make the difference between victory and defeat in competition. For example, very few martial artists could gather hard statistics on how an opponent reacts under specific circumstances, never mind translating that to a specific strategy. Nor do we measure things like speed and power in specific physical configurations. Some martial artists track some fraction of this at a macro level, but generally not with statistical depth. I remember that when training for nationals I knew I would be up against one particular opponent and I studied his strengths, weaknesses, and reactions in certain situations, but I certainly didn’t calculate anything. Besides, some 16 year old kid kicked my ass in the first round and I never went up against the person I planned for (major nutritional failure on my part). Oops. A lot of strategy. Sometimes metrics, but not often and not solid. And a lot of reliance on instinct and core training. Sounds a lot like security. Triathlon is on the opposite end of the spectrum – as are most endurance sports. There is definitely strategy, but even that is defined mostly by raw numbers. I have been tracking my athletic performance metrics fairly intensely since I moved mostly to endurance sports (due to the kids). This started around 10 years ago, although only over the last 3 years have I really focused on it.
Additionally, since getting sick last summer I have also started tracking all sorts of other metrics – mostly my daily movements (Jawbone Up, which isn’t available right now), and sleep (Zeo). For the past year I have kept most of this in TrainingPeaks. I’m learning more about myself than I thought possible. I know what paces I can sustain, and what distances, to within a handful of seconds. I know how those are affected by different weather conditions. I know exactly how what I eat affects how I perform different kinds of workouts. I know how food, exercise, and alcohol affect my sleep. I have learned things like how to dial in my diet (no carb no good, but mostly natural with a small amount of processed carbs hits the sweet spot). I know how many days I can go on reduced sleep before I am more likely to get sick. I even figured out just about exactly what will cause one of the stomach incidents that freaked me out so badly last year. I pretty much track myself 24/7. The Jawbone counts how much I move during the day. The Zeo how well I sleep. My Garmin 910XT how well I swim, bike, and run. A Withings scale for weight and body fat. And TrainingPeaks for mood, illness, injury, training stress (mathematically calculated from my workouts), and whatever else I want to put in there. (I have toyed with diet, but don’t really track calories yet). I measure, track over time, and then correlate to make training and lifestyle decisions. These are not theoretical – I use those metrics to change how I live, and then I track my outcomes. I know, for example, that I can optimize my training in the amount of time I have for triathlon, but my single sport performance drops to predictable degrees. All this for someone back-of-pack and over 40. The pros? The levels to which they can tune their lives and training are insane. And it all directly affects performance and their ability to win. But, as with everything, the numbers don’t tell the full story. 
They can’t precisely predict who will win on race day. Maybe the leader will get caught behind a crash. Maybe they’ll miss just enough sleep, or hit a crosswind at the wrong time, or just have an off day. Maybe someone else will dig deep and blow past everything the numbers predict. But without those numbers, tracked and acted on, for years on end, no pro would ever have a chance of being in the race. Security today is a lot more like martial arts than triathlon, but I’m starting to think the ratio is skewed in the wrong direction. We can track a lot more than we do, and base far more decisions on data than on instinct. Yes, we are battling an opponent, but our race lasts years – not three five-minute rounds. And unlike professional martial artists, we don’t even know our ideal fighting weight, never mind our conditioning level. Believe it or not, I wasn’t always a metrics wonk. I used to think skill and instinct mattered more than anything else. The older I get, the more I realize how very wrong that is. On to the Summary: Webcasts, Podcasts, Outside Writing, and Conferences Mike’s monthly Dark Reading blog: Time to deploy the FUD weapon? Rich quoted by the Macalope: The Macalope Daily: Protesting too much (Subscription required). Favorite Securosis Posts Adrian Lane: Evolving Endpoint Malware Detection: Control Lost. New threats and redefining what ‘endpoint’ actually means are a couple good reasons to follow this series. Mike Rothman: Understanding and Selecting Data Masking: Introduction. Masking is a truly under-appreciated function. Until your production data shows up in an Internet-accessible cloud instance, that is. Adrian’s series should shed some light on this topic. Rich: Continuous Learning. I’m not sure my quote fit here, but I’m sure a fan of people diversifying their knowledge. Other Securosis Posts Our posting volume is down a bit due to

Understanding and Selecting Data Masking: Defining Data Masking

Before I start today’s post, thank you for all the letters saying that people are looking forward to this series. We have put a lot of work into this research to ensure we capture the state of currently available technology, and we are eager to address this under-served market. As always, we encourage blog comments because they help readers understand other viewpoints that we may not reflect in the posts proper. And for the record, I’m not knocking Twitter debates – they are useful as well, but they’re more ephemeral and less accessible to folks outside the Twitter cliques – not everybody wants to follow security geeks like me. And I also apologize for our slow start since initial launch – between meeting with vendors, some medical issues, and client off-site meetings, I’m a bit behind. But I have collected all the data I think is needed to do justice to this subject, so let’s get rolling! In today’s post I will define masking and show the basics of how it works. First a couple basic terms with their traditional definitions: Mask: Similar to the traditional definition, of a facade or a method of concealment, a data mask is a function that transforms data into something similar but new. It may or may not be reversible. Obfuscation: Hiding the original value of data. Data Masking Definition Data masking platforms at minimum replace sensitive data elements in a data repository with similar values, and optionally move masked data to another location. Masking effectively creates proxy data which retains part of the value of the original. The point is to provide data that looks and acts like the original data, but which lacks sensitivity and doesn’t pose a risk of exposure, enabling use of reduced security controls for masked data repositories. This in turn reduces the scope and complexity of IT security efforts. The mask should make it impossible or impractical to reverse engineer masked values back to the original data without special additional information. 
We will cover additional deployment models and options later in this series, but the following graphic provides an overview:

Keep in mind that ‘masking’ is a generic term, and it encompasses several possible data masking processes. In a broader sense data masking – or just ‘masking’ for the remainder of this series – encompasses collection of data, obfuscation of data, storage of data, and possibly movement of the masked information. But ‘mask’ is also used in reference to the masking operation itself – how we change the original data into something else. There are many different ways to obfuscate data depending on the type of data being stored, each embodied by a different function, and each suitable for different security and data use cases. It might be helpful to think of masking in terms of Halloween masks: the level of complexity and degree of concealment both vary, depending upon the effect desired by the wearer.

The following is a list of common data masks used to obfuscate data, and how their functionalities differ:

  • Substitution: Substitution is simply replacing one value with another. For example, the mask might substitute a person’s first and last names with names from some random phone book entry. The resulting data still constitutes a name, but has no logical relationship with the original real name unless you have access to the original substitution table.
  • Redaction/Nulling: This is a form of substitution where we simply replace sensitive data with a generic value, such as ‘X’. For example, we could replace a phone number with “(XXX)XXX-XXXX”, or a Social Security Number (SSN) with XXX-XX-XXXX. This is the simplest and fastest form of masking, but retains very little (arguably no) information from the original.
  • Shuffling: Shuffling is a method of randomizing existing values vertically across a data set.
For example, shuffling individual values in a salary column from a table of employee data would make the table useless for learning what any particular employee earns. But it would not change aggregate or average values for the table. Shuffling is a common randomization technique for disassociating sensitive data relationships (e.g., Bob makes $X per year) while retaining aggregate values.
  • Transposition: This means to swap one value with another, or a portion of one string with another. Transposition can be as complex as an encryption function (see below) or as simple as swapping the first four digits of a credit card number with the last four. There are many variations, but transposition usually refers to a mathematical function which moves existing data around in a consistent pattern.
  • Averaging: Averaging is an obfuscation technique where individual numeric values are replaced by a value derived by averaging some portion of the individual number values. In our salary example above, we could substitute individual salaries with the average across a group or corporate division to hide individual salary values while retaining an aggregate relationship to the real data.
  • De-identification: A generic term that applies to any process that strips identifying information, such as who produced the data set, or personal identities within the data set. De-identification is an important topic when dealing with complex, multi-column data sets that provide ample means for someone to reverse engineer masked data back into individual identities.
  • Tokenization: Tokenization is substitution of data elements with random placeholder values, although vendors overuse the term ‘tokenization’ for a variety of other techniques. Tokens are non-reversible because the token bears no logical relationship with the original value.
  • Format Preserving Encryption: Encryption is the process of transforming data into an unreadable state.
For any given value the process consistently produces the same result, and it can only be reversed with special knowledge (the key). While most encryption algorithms produce strings of arbitrary length, format preserving encryption transforms the data into an unreadable state while retaining the format (overall appearance) of the original values. Each of these mask types excels in some use cases, and also of course incurs a certain amount of overhead due to its
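A few of the simpler masks from the list above (redaction, shuffling, transposition, and averaging) can be sketched in a handful of lines of Python. This is only an illustration of the ideas, with hypothetical function names and made-up sample values; commercial masking platforms implement these far more carefully, and nothing here should be read as any vendor's algorithm.

```python
import random
from statistics import mean

def redact(value, fill="X"):
    # Redaction: replace every letter or digit, keep punctuation so the format survives.
    return "".join(fill if ch.isalnum() else ch for ch in value)

def shuffle_column(values, seed=None):
    # Shuffling: reorder existing values within a column; the set of values,
    # and therefore totals and averages, is unchanged.
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def transpose_card(number):
    # Transposition: swap the first four digits of a card number with the last four.
    return number[-4:] + number[4:-4] + number[:4]

def average_by_group(rows):
    # Averaging: replace each (group, salary) value with that group's mean salary.
    by_group = {}
    for group, salary in rows:
        by_group.setdefault(group, []).append(salary)
    means = {group: mean(salaries) for group, salaries in by_group.items()}
    return [(group, means[group]) for group, _salary in rows]

if __name__ == "__main__":
    print(redact("(555)123-4567"))
    print(redact("123-45-6789"))
    print(shuffle_column([50000, 72000, 91000]))
    print(transpose_card("4111111111111234"))
    print(average_by_group([("eng", 100), ("eng", 200), ("ops", 50)]))
```

Comparing the outputs shows the trade-off the list describes: redaction keeps only the format, shuffling keeps the column's aggregate values but breaks the link to individuals, and averaging keeps a group-level relationship to the real data.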

Evolving Endpoint Malware Detection: Control Lost

Today we start our latest blog series, which we are calling Evolving Endpoint Malware Detection: Dealing with Advanced and Targeted Attacks – a logical next step from much of the research we have already done around the evolution of malware and emerging controls to deal with it. We started a few years back by documenting Endpoint Security Fundamentals, and more recently looked at network-based approaches to detect malware at the perimeter. Finally we undertook the Herculean task of decomposing the processes involved in confirming an infection, analyzing the malware, and tracking its proliferation with our Malware Analysis Quant research.

Since you were a wee lad in the security field, the importance of layered defense has been drummed into your head. No one control is sufficient. In fact, no set of controls is sufficient to stop the kinds of attacks we see every day. But by stacking as many complementary controls as you can (without totally screwing up the user experience), you can make it hard enough for the attackers that they go elsewhere, looking for lower-hanging fruit. Regardless of how good defense in depth sounds, the reality is that with the advent of increased mobility we need to continue protecting the endpoint, as we generally can’t control the location or network being used. Obviously no one would say our current endpoint protection approaches work particularly well, so it’s time to critically evaluate how to do it better. But that’s jumping ahead a bit. First let’s look at the changing requirements before we vilify existing endpoint security controls.

Control Lost

Sensitive corporate data has never been more accessible. Between PCs and smartphones and cloud-based services (Salesforce.com, Jive, Dropbox, etc.) designed to facilitate collaboration, you cannot assume any device – even those you own and control – isn’t accessing critical information. Just think about how your personal work environment has changed over the past couple years.
You store data somewhere in the cloud. You access corporate data on all sorts of devices. You connect through a variety of networks, some ‘borrowed’ from friends or local coffee shops. We once had control of our computing environments, but that’s no longer the case. You can’t assume anything nowadays. The device could be owned by the employee and/or your CFO’s kid could surf anywhere on a corporate laptop. Folks connect through hotel networks and any other public avenues. Obviously this doesn’t mean you should (or can) just give up and stop worrying about controlling your internal networks. But you cannot assume your perimeter defenses, with their fancy egress filtering and content analysis, are in play.

And just in case the lack of control over the infrastructure isn’t unsettling enough, you still need to consider the user factor. You know, the unfortunate tendency of employees to click pretty much anything that looks interesting. Potentially contracting all sorts of bad stuff, bringing it back into your corporate environment, and putting data at risk. Again, we have to fortify the endpoint to the greatest degree possible.

Advancing Adversaries

The attackers aren’t making things any easier. Today’s professional malware writers have gotten ahead of these trends by using advanced malware (remote access trojans [RATs] and other commercial malware techniques) to defeat traditional endpoint defenses. It is well established that traditional file-matching approaches (on both endpoints and mail & web gateways) no longer effectively detect these attacks – due to techniques such as polymorphism, malware droppers, and code obfuscation. Even better, you cannot expect to see an attack before it hits you. Whether it’s a rapidly morphing malware attack or a targeted attempt, yesterday’s generic sample gathering processes (honeynets, WildList, etc.) don’t help, because these malware files are unique and customized to the target.
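To see why exact file matching struggles against files that are unique to each target, consider this deliberately simplified Python sketch. It assumes a detector that only compares SHA-256 hashes against a list of known-bad samples; real endpoint products use richer signatures than this, and the payload bytes and names below are made up for illustration.

```python
import hashlib

def signature(payload):
    # A file "signature" here is just the SHA-256 hash of the file contents.
    return hashlib.sha256(payload).hexdigest()

# Hypothetical blocklist built from a previously captured sample.
KNOWN_BAD = {signature(b"malicious-sample-v1")}

def is_known_bad(payload):
    # Exact matching: flag the file only if its hash is already on the list.
    return signature(payload) in KNOWN_BAD

if __name__ == "__main__":
    original = b"malicious-sample-v1"
    variant = original + b"\x00"  # one extra byte, same behavior in this toy example
    print(is_known_bad(original))
    print(is_known_bad(variant))
```

Changing a single byte produces a completely different hash, so a variant customized for each target never appears on a list built from earlier samples.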
Vendors use the generic term “zero day” for malware you haven’t seen, but the sad reality is you haven’t seen anything important that’s being launched at you. It’s all new to you. When we said professional malware writers, we weren’t kidding. The bad guys now take an agile software approach to building their attacks. They have tools to develop and test the effectiveness of their malware, and are even able to determine whether existing malware protection tools will detect their attacks. Even coordinated with reputation systems and other mechanisms for detecting zero-day attacks, today’s solutions are just not effective enough. All this means security practitioners need new tactics for detecting and blocking malware which targets their users. Evolving Endpoint Malware Detection The good news is that endpoint security vendors realized their traditional approaches were about as viable as dodo birds a few years back. They have been developing their approaches – the resulting products have reduced footprints, require far less computing resources, and are generally decent at detecting simple attacks. But as we have described, simple attacks aren’t the ones to worry about. So in this series we will investigate how endpoint protection will evolve to better detect and hopefully block the current wave of attacks. We will start the next post by identifying the behavioral indicators of a malware attack. Like any poker player, every attack includes its own ‘tells’ that enable you to recognize bad stuff happening. Then we will describe and evaluate a number of different techniques to identify these ‘tells’ at different points along the attack chain. Finally we will wrap up with a candid discussion of the trade-offs involved in dealing with this advanced malware. You can stop these attacks, but the cure may be worse than the disease. So we will offer suggestions for how to find that equilibrium point between detection, response, and user impact. 
We would like to thank the folks at Trusteer for sponsoring this blog series. As we have mentioned before, you get to enjoy our work for a pretty good price because forward-thinking companies believe in educating the industry in a vendor-neutral and objective fashion.

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.