As most of you have probably figured out by now, I tend to expend a lot of hot air trying to define DLP/CMF/CMP (Data Loss Prevention, Content Monitoring and Filtering, or Content Monitoring and Protection). I often take vendors to task for abusing the terms, since they just increase market confusion.
As Rothman points out, it won’t be me, or any particular vendor, that really defines DLP. Only the market defines the market, although some of us influential types occasionally get to nudge it in our preferred direction.
While I took Postini/Google to task for calling regular expressions on a single channel (email) “DLP”, the dirty little secret of DLP is that probably 80-90% of deployments today rely mostly, or totally, on regex for content analysis.
Barely anyone deploys the fancy advanced features that I spend so much time talking about, and that the vendors spend so much time developing. So why do I spend so much time fighting for the purity of DLP? Because without us pushing the products forward with new features and more advanced analysis, most organizations will, in the long run, realize only a fraction of their investment’s value in risk reduction and operational efficiency.
But if all you want to do is detect credit card and Social Security numbers, and you find the false positives manageable, something with a regex engine is probably good enough for you. At least for now.
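For the curious, here’s roughly what that “good enough” regex tier amounts to. This is a minimal Python sketch of my own, not any vendor’s actual engine: the patterns, names, and thresholds are all illustrative assumptions. It pairs naive SSN and credit card patterns with a Luhn checksum to keep the false positives manageable.

```python
import re

# Illustrative patterns only; real DLP engines tune these far more carefully.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13-16 digits with optional space/dash separators between them.
CC_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: filters out digit runs that merely look like card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan(text: str) -> list[tuple[str, str]]:
    """Return (type, match) pairs for SSN-like and Luhn-valid card-like strings."""
    hits = [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    hits += [("CC", m.group()) for m in CC_RE.finditer(text) if luhn_valid(m.group())]
    return hits

print(scan("SSN 123-45-6789 on file, card 4111 1111 1111 1111 charged."))
# -> [('SSN', '123-45-6789'), ('CC', '4111 1111 1111 1111')]
```

That’s the whole trick, and it’s also why anything subtler than a well-formatted identifier sails right past this tier of analysis.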
3 Replies to “The Dirty Little Secret Of DLP”
I agree that there are certain structural weaknesses. The point I so clumsily tried to make was that if more customers with an IP loss problem were in the buying community, the products would be improving WRT that threat.
I do see some ability for the DLP concept to help minimize the threat, or “raise the bar” as it were, but that can’t be fleshed out when the buyers aren’t exerting appropriate pressure in that direction.
Well, I disagree on DLP’s effectiveness in stopping malicious attacks. That one is a limitation of the products, due to the complexity of the problem. They appear weak because they are weak in that area.
I think it’s more a symptom of what drives these projects. Many organizations finally make the move because of perceived regulatory requirements, not internal motivation. It’s a lot easier to understand the problem of structured data loss, and that’s what gets more time in the headlines. Loss of intellectual property is more insidious, difficult to scope, and difficult to quantify.
This focus on what we can call well-known data seems, to me, a symptom of the sales process. It’s simplest to highlight the cases where loss of SSNs caused grief for a company (FUD++) and convince a prospect of the need for a product. Then comes the evaluation, and the alerts fly in. Again, in the compressed time of a normal eval, looking for well-known data is easy.
So the market is defined in part by sales staff showing what is easy to show, to customers who have a higher risk of losing that sort of data. They like it, they buy it, and so the market seems fixated on SSNs and credit cards (and consequently on regex). Those with a problem protecting IP and other unstructured data need to move into the market and help define that side more fully.
This is also, I’d say, why it looks weak at stopping those with malicious intent.