Understanding and Selecting a Data Loss Prevention (DLP/CMF/CMP) Solution: Part 1
Data Loss Prevention is one of the most hyped, and least understood, tools in the security arsenal. With at least a half-dozen different names and even more technology approaches, it can be difficult to understand the ultimate value of the tools and which products best suit which environments. This series of posts will provide the necessary background in DLP to help you understand the technology, know what to look for in a product, and find the best match for your organization. I won’t be providing product ratings (I suggest the Gartner Magic Quadrant for that), but I will give you the tools you need for the selection process.
DLP is an adolescent technology that provides significant value for those organizations that need it, despite products that may not be as mature as those in other areas of IT. The market is currently dominated by startups, but large vendors have started stepping in, typically through acquisition.
The first problem in understanding DLP is figuring out what we’re actually talking about. The following names are all being used to describe the same market:
- Data Loss Prevention/Protection
- Data Leak Prevention/Protection
- Information Loss Prevention/Protection
- Information Leak Prevention/Protection
- Extrusion Prevention
- Content Monitoring and Filtering
- Content Monitoring and Protection
And I’m sure I’m missing a few. DLP seems the most common term, and while I consider its life limited, I’ll generally use it for these posts for simplicity. You can read more about how I think of this progression of solutions here.
Even a clear definition of DLP can be confusing and hard to find. I generally define DLP solutions as “products that, based on central policies, identify, monitor, and protect data at rest, in motion, and in use through deep content analysis”. I used to restrict myself to network-based monitoring and blocking solutions, but we’ve recently seen advances in endpoint protection. I’ll detail all these nuances as we dig deeper into the subject.
The DLP market is also split between DLP as a feature, and DLP as a product. A number of products, particularly email security solutions, provide some basic DLP functions, but aren’t necessarily real DLP products. The difference is:
- A DLP Product includes centralized management, policy creation, and enforcement workflow dedicated to the monitoring and protection of content and data. The user interface and functionality are dedicated to solving the business and technical problems of protecting content through content awareness.
- DLP Features include some of the detection and enforcement of DLP products, but are not dedicated to the task of protecting content and data.
This distinction is important because DLP products solve a specific business problem that may or may not be managed by the same business unit or user responsible for other security functions. We often see non-technical users, such as a legal or compliance officer, responsible for the protection of content. Even human resources is often involved with the disposition of DLP alerts. Some organizations find that the DLP policies themselves are highly sensitive or need to be managed by business unit leaders outside of security, which also supports a dedicated product. Because DLP is dedicated to a clear business problem (protect my content) that is differentiated from other security problems (protect my PC or protect my network), most of you should look for dedicated DLP solutions.
This doesn’t mean that DLP as a feature won’t be the right solution for you, especially in smaller organizations. It also doesn’t mean that you won’t buy a suite that includes DLP, as long as the DLP management is separate and dedicated to DLP. We’ll be seeing more and more suites as large vendors enter the space, and as we’ll discuss in a future post it often makes sense to run DLP analysis or enforcement within another product, but the central policy creation, management, and workflow should be dedicated to the DLP problem and be isolated from other security functions.
There are a few last terms I want to define before finishing off this post. The first is content awareness. One of the distinctions of DLP solutions is that they look at the content itself, not just the context. Context would be something like the sender and recipient of a message. Content awareness means digging into the PDF embedded in the Word file, which is itself inside a .zip file, and detecting that one paragraph matches a protected document. In a later post I’ll describe the major detection techniques, and which ones work best for which kinds of content.
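To make the content-versus-context distinction concrete, here is a minimal sketch of unpacking nested archives and checking the innermost files for a protected paragraph. It is an illustration only, not how any particular DLP product works: the names `iter_leaf_files`, `normalize`, and `find_protected` are hypothetical, the leaf files are assumed to be plain text, and the match is a simple whitespace-insensitive substring test. Real products need format-specific parsers for PDF and Word files and use more robust detection techniques.

```python
import io
import zipfile


def iter_leaf_files(name, data, depth=0, max_depth=5):
    """Yield (path, bytes) for every non-archive file, unpacking nested zips.

    max_depth is an assumed safety limit against deeply nested archives.
    """
    if depth < max_depth and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                inner = archive.read(info)
                yield from iter_leaf_files(
                    f"{name}/{info.filename}", inner, depth + 1, max_depth
                )
    else:
        yield name, data


def normalize(text):
    """Lowercase and collapse whitespace so reformatting does not hide a match."""
    return " ".join(text.lower().split())


def find_protected(name, data, protected_paragraph):
    """Return the paths of leaf files that contain the protected paragraph."""
    needle = normalize(protected_paragraph)
    hits = []
    for path, blob in iter_leaf_files(name, data):
        # Assumption: leaf files are plain text; undecodable bytes are dropped.
        text = blob.decode("utf-8", errors="ignore")
        if needle in normalize(text):
            hits.append(path)
    return hits
```

A context-only filter would see just the attachment name and the sender and recipient. The sketch above keeps opening containers until it reaches the text itself, which is the part that makes a tool content-aware.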
We also need to discuss what we mean by protecting data at rest, data in motion, and data in use.
- Data-at-rest includes scanning of storage and other content repositories to identify where sensitive content is located. We call this content discovery. For example, you can use a DLP product to scan your servers and identify any documents with credit card numbers. If that server isn’t authorized for that kind of data, the file can be encrypted or removed, or a warning sent to the file owner.
- Data-in-motion is sniffing of traffic on the network (passively or inline via proxy) to identify content being sent across communications channels. For example, this includes sniffing emails, instant messages, or web traffic for snippets of sensitive source code. In-motion tools can often block based on central policies, depending on the type of traffic.
- Data-in-use is typically handled by endpoint solutions that monitor data as the user interacts with it. For example, they can identify when you attempt to transfer a sensitive document to a USB drive and block it (as opposed to blocking use of the USB drive entirely). Data-in-use tools should also detect things like cut and paste, or use of sensitive data in an unapproved application (such as someone attempting to encrypt data to sneak it past the sensors).
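As a rough illustration of the data-at-rest example above (scanning servers for documents with credit card numbers), here is a minimal content discovery sketch. The names `CARD_CANDIDATE`, `luhn_valid`, `find_card_numbers`, and `discover` are hypothetical, and the approach (a regular expression for candidates plus a Luhn checksum to weed out random digit strings) is one common technique, not a description of any specific product. It assumes the files are readable as text and simply reports hits; what to do with them (encrypt, remove, warn the owner) would be a policy decision.

```python
import re
from pathlib import Path

# Candidate pattern: 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in text that look like card numbers and pass Luhn."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.append(digits)
    return found


def discover(root):
    """Walk a directory tree and map each file path to its number of hits."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Assumption: files are text; undecodable bytes are ignored.
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_card_numbers(text)
            if hits:
                report[str(path)] = len(hits)
    return report
```

The checksum step matters because a bare pattern match on long digit strings would flag order numbers, phone lists, and the like. Validating candidates is one simple way to keep false positives down.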
The last thing to remember about DLP is that it is highly effective against bad business processes (unencrypted FTP exchange of medical records with your insurance company) and mistakes. While DLP offers some protection against malicious activity, we’re at least a few years away from these tools really protecting against a knowledgeable malicious attacker. Fortunately for us, most of our risk doesn’t fall into this category.
That’s it for today; as we move forward we’ll talk more about different features, how this all works, and what to look for in a product.

andyitguy Sep 7
Good post, Rich. I look forward to the rest of them. This is something on my horizon and it’s good to get your take on it.
raesene Sep 8
Interesting post. As you say, I think that DLP will definitely be a growing area over time.
It’s a great point you make about non-IT folk needing to be involved in the management of a DLP solution. There’s a tendency for IT people to deploy security solutions without necessarily engaging the business fully, and this is definitely one time where the business will need to be fully bought into the concept and the requirements of the product.
One thing though that I always wonder with this kind of solution is how it deals with content encryption. Desktop encryption of the kind provided by WinZip/MS Office/PGP Desktop effectively blinds any form of content inspection technology and is likely to be deployed by people trying to send sensitive information. Also, transport encryption (SSH, HTTPS) can blind content sensors looking for data in transit.
My other concern with DLP is that it sounds somewhat like content-IDS pattern matching on words/phrases/Regular expressions.
Now one of the key problems with IDS technology has been false positives, and I’d think that it’s likely that DLP will suffer even more from this as people try to unambiguously define content that they want to match on…
Rob Newby Sep 8
Hi Rich, nice post, saves me a research job. I’m looking forward to the detection techniques post, I’ve never got that far into DLP, so it’s a blind spot for me which needs correcting, and I’m interested in what Rory’s pointed out too.
Just to throw in my 2c: I think DLP as a product and as a feature are important distinctions because:
1. Without the product, the feature will never be improved or standardised.
2. Without the feature, the product will never be important enough for the big guys to buy.
3. They will eventually converge to become the same thing across the network, remote nodes centrally controlled, but each of the separate methods of node protection needs to be fully understood before applying to a framework.
We are in a market now where businesses are started just to be acquired, and the “product” companies are essentially independent development arms for whoever decides to buy them in the long term.
I’m interested to hear your opinions of data classification with regard to this: whether a framework can work here, albeit only in large organisations initially, or whether this is doomed to failure because there are not enough market forces in play.
rmogull Sep 8
raesene,
As you’ll see in some future posts, those are some well-known issues that are either solved, partially solved, or in development. For example, most of the tools can use the ICAP protocol to sniff SSL connections when a reverse proxy like BlueCoat is used.
As for the false positives, they vary a lot based on what kind of data you’re trying to detect, but the numbers I’ve seen are FAR lower than IDS and very manageable.
Rob-
I have some mixed feelings on data classification, but I’m also working on some tutorial posts on that. The interesting area is the overlap of the DLP content discovery and the data classification tools like Infoscape. Some weird stuff happening there, mostly due to how the buying centers work.
john libawize May 18
well done