Optimism and Cautions on OpenDLP

By Rich

I’m starting to think I shouldn’t take vacations. Aside from the Symantec acquisition of PGP and GuardianEdge last week, someone went off and released the first open source DLP tool.

It’s called OpenDLP, and version 0.1 is currently available on Google Code. People have asked me for a long time why there aren’t any FOSS DLP options out there, and it’s nice to finally see someone put in the non-trivial effort and release a tool. DLP isn’t easy to create, and Andrew Gavin deserves major credit for kicking off the project.

First, let’s classify OpenDLP. It is an agent-based content discovery/data-at-rest tool. You install an agent on endpoints, which then scans local storage and sends results to a central management server. The agent is a C program, and the management server runs on Apache/MySQL. The tool supports regular expressions and scanning of plain text files.
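To make the "regex over plain text files" model concrete, here is a minimal Python sketch of that kind of scan. It is not OpenDLP's code (the real agent is a C program), and the policy name, the pattern, and the function names are all assumptions made up for illustration. Reporting results to a management server is left out entirely.

```python
import re
from pathlib import Path

# Hypothetical policy table: policy name -> compiled regex.
# This is NOT OpenDLP's policy format, just an illustration.
POLICIES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_file(path, policies=POLICIES):
    """Return a list of (policy_name, matched_text) found in one text file."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return [
        (name, match.group(0))
        for name, pattern in policies.items()
        for match in pattern.finditer(text)
    ]

def scan_tree(root, policies=POLICIES):
    """Walk a directory tree and collect findings per file."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            hits = scan_file(path, policies)
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `scan_tree` on a directory returns a dictionary keyed by file path, which is roughly the kind of result set an agent would then have to ship to a central server.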


Pros:

  • Free.
  • You can customize the code.
  • Communications are encrypted with SSL.
  • Supports any version of Windows you are likely to run.
  • Includes agent management, and the agent is designed to be non-intrusive.
  • Supports full regular expressions for building policies.


Cons:

  • Scans stored data on endpoints only. Might be usable on Windows servers, but I would test very carefully first.
  • Unable to scan non-plain-text or compressed files, including current Office documents (the XML-based .docx, .xlsx, and .pptx formats).
  • No advanced content analysis – regex only, which limits the types of content this will work for.
  • Requires NetBIOS… which some environments ban.
  • I have been told via email (not from a DLP vendor, for the record) that the code may be a bit messy… which I’d consider a security concern.

Thus this is a narrow implementation of DLP – that’s not a criticism, just a definition.
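As a rough illustration of why regex-only matching limits what you can reliably find, consider card numbers. The Python sketch below pairs a loose pattern with the Luhn checksum, a standard check digit scheme for payment card numbers. The pattern and function names are assumptions for this example; nothing here is taken from OpenDLP or from any commercial product.

```python
import re

# Loose pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_ok(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_cards(text):
    """Return (matched_text, passes_luhn) for every regex candidate in text."""
    results = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        results.append((match.group(0), luhn_ok(digits)))
    return results
```

The regex alone flags any 16-digit string, including order references and serial numbers. The checksum step is one small example of the extra analysis that separates likely card numbers from lookalikes.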

I don’t have a large enough environment to give this a real test, but considering that it is a 0.1 version I think we should give it a little breathing space to improve. The to-do list already includes adding .zip file support, for example. I think it’s safe to say that (assuming the project gathers support) we will see it improve over time.
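For a sense of what .zip support involves, here is a minimal Python sketch that applies a regex to every member of a ZIP archive. The newer Office formats are ZIP containers of XML files, so the same approach reaches inside them. The function name is hypothetical and this is not how OpenDLP implements (or plans to implement) the feature.

```python
import io
import re
import zipfile

def scan_zip_bytes(data, pattern):
    """Return {member_name: [matches]} for a ZIP archive supplied as bytes."""
    findings = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently; binary members simply produce no matches.
            text = archive.read(info).decode("utf-8", errors="ignore")
            hits = [m.group(0) for m in pattern.finditer(text)]
            if hits:
                findings[info.filename] = hits
    return findings
```

One caveat with this naive approach: in a real Office document a value can be split across several XML elements, so matching against the raw XML can miss content that a proper text extraction would catch.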

In summary, this is too soon to deploy in any production capacity, but definitely worth checking out and contributing to. I really hope the project succeeds and matures.


I’ll start hunting them down. If you don’t mind sharing, which PCI forums/boards do you use?

By Rich


It is an Aussie company that released it. However, I believe its user base is largely concentrated in US/Canada/Europe. It’s a big industry, so I’m not surprised if you didn’t know about it yet. From what we can see they don’t do any marketing and seem to be more focussed on industry word of mouth. I’ve seen it talked about extensively on specific PCI forums and message boards, which made us take a closer look.


By John C

OpenDLP 0.2 was released yesterday and addresses some of Rich’s concerns. According to the changelog, the agent’s code has been significantly cleaned up and it now supports reading inside ZIP files (including Office 2007 and OpenOffice documents).

By Anonymous Coward


Is that an Oz-based solution? It is completely unheard of in my circles here in North America… which includes customers, QSAs, consultants, and so on…

By Rich

Thanks for commenting on OpenDLP. We looked at it briefly and determined it wasn’t suitable for a commercial organisation that needs a level of certainty when finding PII. We reached the same conclusion about Cornell Spider.

Our organisation’s main need for a DLP / PII search tool stems from PCI DSS compliance. For this we found a very industry-specific tool called Card Recon (

According to all the QSAs we spoke with, it seems to be the leading tool for finding payment cards due to its low false positive rate. There also seemed to be few other options that focus specifically on this industry.

We have rolled it out remotely across our Windows and Linux hosts via scripts; however, we will shortly be receiving a centralized management console, which should make that easier.

If PCI’s your issue, I would suggest considering Card Recon. If you’re looking for other types of data, or you don’t have a budget to spend (it was a similar price to antivirus), then Cornell/Senf/OpenDLP is your best bet, if you’re happy to spend the additional time getting it right for your level of use.


By John C

@John: Cornell Spider is a GUI-only tool and is not able to automatically deploy on thousands of systems. It requires users/admins to point and click through its interface to scan the system on which it’s installed. While Cornell Spider is very useful for one-off scans on single systems, it would take a lot of work to get it to a point where it would be as easy as OpenDLP to deploy and use in a corporate setting with many endpoints.

By Pablo

The Cornell Spider project has been running for some time and appears to have a reasonable degree of adoption, at least within US education. It is also an open source data-at-rest scanner. Rich, it would be interesting to get your view on this tool and its place alongside OpenDLP.

By John Stringer

But Kevin,

If you look at the details of breaches like the various Gonzalez incidents, the bad guys are already doing this. I’d rather put an open source tool in the hands of users who won’t buy full DLP than assume this will do more harm than good.

By Rich

Don’t want to sound like a downer here, since it’s great to see the “DLP movement” endorsed by the open source community, but… it seems likely we’re going to see this tool used in Incursion/Capture/Exfiltration operations sometime soon.

Internal data spills are a frequent target of hacker activity. This toolkit will raise the bar for enterprises defending against these spills, since it will reveal spill events to adversaries at a faster pace.


By Kevin Rowney
