Login  |  Register  |  Contact

Data Security

Thursday, January 22, 2009

The Business Justification For Data Security

By Rich

You’ve probably noticed that we’ve been a little quieter than usual here on the blog. After blasting out our series on Building a Web Application Security Program, we haven’t been putting up much original content.

That’s because we’ve been working on one of our tougher projects over the past 2 weeks. Adrian and I have both been involved with data security (information-centric) security since long before we met. I was the first analyst to cover it over at Gartner, and Adrian spent many years as VP of Development and CTO in data security startups. A while back we started talking about models for justifying data security investments. Many of our clients struggle with the business case for data security, even though they know the intrinsic value. All too often they are asked to use ROI or other inappropriate models.

A few months ago one of our vendor clients asked if we were planning on any research in this area. We initially thought they wanted yet-another ROI model, but once we explained our positions they asked to sign up and license the content. Thus, in the very near future, we will be releasing a report (also distributed by SANS) on The Business Justification for Data Security. (For the record, I like the term information-centric better, but we have to acknowledge the reality that “data security” is more commonly used).

Normally we prefer to develop our content live on the blog, as with the application security series, but this was complex enough that we felt we needed to form a first draft of the complete model, then release it for public review. Starting today, we’re going to release the core content of the report for public review as a series of posts. Rather than making you read the exhaustive report, we’re reformatting and condensing the content (the report itself will be available for free, as always, in the near future). Even after we release the PDF we’re open to input and intend to continuously revise the content over time.

The Business Justification Model

Today I’m just going to outline the core concepts and structure of the model. Our principle position is that you can’t fully quantify the value of information; it changes too often, and doesn’t always correlate to a measurable monetary amount. Sure, it’s theoretically possible, but practically speaking we assume the first person to fully and accurately quantify the value of information will win the nobel prize.

Our model is built on the foundation that you quantify what you can, qualify the rest, and use a structured approach to combine those results into an overall business justification. 200901221427.jpg We purposely designed this as a business justification model, not a risk/loss model. Yes, we talk about risk, valuation, and loss, but only in the context of justifying security investments. That’s very different from a full risk assessment/management model.

Our model follows four steps:

  1. Data Valuation: In this step you quantify and qualify the value of the data, accounting for changing business context (when you can). It’s also where you rank the importance of data, so you know if you are investing in protecting the right things in the right order.
  2. Risk Estimation: We provide a model to combine qualitative and quantitative risk estimates. Again, since this is a business justification model, we show you how to do this in a pragmatic way designed to meet this goal, rather than bogging you down in near-impossible endless assessment cycles. We provide a starting list of data-security specific risk categories to focus on.
  3. Potential Loss Assessment: While it may seem counter-intuitive, we break potential losses from our risk estimate since a single kind of loss may map to multiple risk categories. Again, you’ll see we combine the quantitative and qualitative. As with the risk categories, we also provide you with a starting list.
  4. Positive Benefits Evaluation: Many data security investments also contain positive benefits beyond just reducing risk/losses. Reduced TCO and lower audit costs are just two examples.

After walking through these steps we show how to match the potential security investment to these assessments and evaluate the potential benefits, which is the core of the business justification. A summarized result might look like:

- Investing in DLP content discovery (data at rest scanning) will reduce our PCI related audit costs by 15% by providing detailed, current reports of the location of all PCI data. This translates to $xx per annual audit. - Last year we lost 43 laptops, 27 of which contained sensitive information. Laptop full drive encryption for all mobile workers effectively eliminates this risk. Since Y tool also integrates with our systems management console and tells us exactly which systems are encrypted, this reduces our risk of an unencrypted laptop slipping through the gaps by 90%. - Our SOX auditor requires us to implement full monitoring of database administrators of financial applications within 2 fiscal quarters. We estimate this will cost us $X using native auditing, but the administrators will be able to modify the logs, and we will need Y man-hours per audit cycle to analyze logs and create the reports. Database Activity Monitoring costs %Y, which is more than native auditing, but by correlating the logs and providing the compliance reports it reduces the risk of a DBA modifying a log by Z%, and reduces our audit costs by 10%, which translates to a net potential gain of $ZZ. - Installation of DLP reduces the chance of protected data being placed on a USB drive by 60%, the chances of it being emailed outside the organization by 80%, and the chance an employee will upload it to their personal webmail account by 70%.

We’ll be detailing more of the sections in the coming days, and releasing the full report early next month. But please let us know what you think of the overall structure. Also, if you want to take a look at a draft (and we know you) drop us a line…

We’re really excited to get this out there. My favorite parts are where we debunk ROI and ALE.


Thursday, December 11, 2008

How The Cloud Destroys Everything I Love (About Web App Security)

By Rich

On Tuesday, Chris Hoff joined me to guest host the Network Security Podcast and we got into a deep discussion on cloud security. And as you know, for the past couple of weeks we’ve been building our series on web application security. This, of course, led to all sorts of impure thoughts about where things are headed. I wouldn’t say I’m ready to run around in tattered clothes screaming about the end of the Earth, but the company isn’t called Securosis just because it has a nice ring to it.

If you think about it a certain way, cloud computing just destroys everything we talk about for web application security. And not just in one of those, “oh crap, here’s one of those analysts spewing BS about something being dead” ways. Before jumping into the details, in this case I’m talking very specifically of cloud based computing infrastructure- e.g., Amazon EC2/S3. This is where we program our web applications to run on top of a cloud infrastructure, not dedicated resources in a colo or a “traditional” virtual server. I also sprinkle in cloud services- e.g., APIs we can hook into using any application, even if the app is located on our own server (e.g., Google APIs).

Stealing from our yet incomplete series on web app sec and our discussions of ADMP, here’s what I mean:

  • Secure development (somewhat) breaks: we’re now developing on a platform we can’t fully control- in a development environment we may not be able to isolate/lock down. While we should be able to do a good job with our own code, there is a high probability that the infrastructure under us can change unexpectedly. We can mitigate this risk more than some of the other ones I’ll mention- first, through SLAs with our cloud infrastructure provider, second by adjusting our development process to account for the cloud. For example, make sure you develop on the cloud (and secure as best you can) rather than completely developing in a local virtual environment that you then shift to the cloud. This clearly comes with a different set of security risks (putting development code on the Internet) that also need to be, and can be, managed. Data de-identification becomes especially important.
  • Static and dynamic analysis tools (mostly) break: We can still analyze our own source code, but once we interact with cloud based services beyond just using them as a host for a virtual machine, we lose some ability to analyze the code (anything we don’t program ourselves). Thus we lose visibility into the inner workings of any third party/SaaS APIs (authentication, presentation, and so on), and they are likely to randomly change under our feet as the providing vendor continually develops them. We can still perform external dynamic testing, but depending on the nature of the cloud infrastructure we’re using we can’t necessarily monitor the application during runtime and instrument it the same way we can in our test environments. Sure, we can mitigate all of this to some degree, especially if the cloud infrastructure service providers give us the right hooks, but I don’t hold out much hope this is at the top of their priorities. (Note for testing tools vendors- big opportunity here).
  • Vulnerability assessment and penetration testing… mostly don’t break: So maybe the cloud doesn’t destroy everything I love. This is one reason I like VA and pen testing- they never go out of style. We still lose some ability to test/attack service APIs.
  • Web application firewalls really break: We can’t really put a box we control in front of the entire cloud, can we? Unless the WAF is built into the cloud, good luck getting it to work. Cloud vendors will have to offer this as a service, or we’ll need to route traffic through our WAF before it hits the back end of the cloud, negating some of the reasons we switch to the cloud in the first place. We can mitigate some of this through either the traffic routing option, virtual WAFs built into our cloud deployment (we need new products for it), or cloud providers building WAF functionality into their infrastructure for us.
  • Application and Database Activity Monitoring break: We can no longer use external monitoring devices or services, and have to integrate any monitoring into our cloud-based application. As with pretty much all of this list it’s not an impossible problem, just one people will ignore. For example, I highly doubt most of the database activity monitoring techniques will work in the cloud- network monitoring, memory monitoring, or kernel extensions. Native audit might, but not all database management systems provide effective audit logs, and you still need a way to collect them as your app and db shoot around the cloud for resource optimization.

I could write more about each of these areas, but you get the point. When we run web applications on cloud based infrastructure, using cloud based software services, we break much of the nascent web application security models we’re just starting to get our fingers around. The world isn’t over*, but it sure just moved out from under our feet.

*This doesn’t destroy the world, but it’s quite possible that the Keanu Reeves version of The Day the Earth Stood Still will.


Wednesday, December 10, 2008

A Good (Potential) Risk Management IQ Test For Management

By Rich

It looks like China is thinking about requiring in-depth technical information on all foreign technology products before they will be allowed into China.

I highly suspect this won’t actually happen, but you never know. If it does, here is a simple risk related IQ test for management:

  1. Will you reveal your source code and engineering documents to a government with a documented history of passing said information on to domestic producers who often clone competitive technologies and sell at lower than the market value you like?
  2. Do you have the risk tolerance to accept domestic Chinese abuse of your intellectual property should you reveal it?

If the answer to 1 is “yes” and 2 is “no”, the IQ is “0”. Any other answer shows at least as basic understanding of risk tolerance and management.

I worked a while back with an Indian company that engaged in a partnership with China to co-produce a particular high value product. That information was promptly stolen and spread to other local manufacturers.

I don’t have a problem with China, but not only do they culturally view intellectual property differently than us, there is a documented history of what the western world would consider abuse of IP. If you can live with that, you should absolutely engage with that market. If can’t accept the risk of IP theft, stay away.

(P.S.- This is also true of offshore development. Stop calling me after you have offshored and asking how to secure your date. You know, closing barn doors and cows and all).


Thursday, December 04, 2008

Analysis Of The Microsoft/RSA Data Loss Prevention Partnership

By Rich

By the time I post this you won’t be able to find a tech news site that isn’t covering this one. I know, since my name was on the list of analysts the press could contact and I spent a few hours talking to everyone covering the story yesterday. Rather than just reciting the press release, I’d like to add some analysis, put things into context, and speculate wildly. For the record, this is a big deal in the long term, and will likely benefit all of the major DLP vendors, even though there’s nothing earth shattering in the short term.

As you read this, Microsoft and RSA are announcing a partnership for Data Loss Prevention. Here are the nitty gritty details, not all of which will be apparent from the press release:

  • This month, the RSA DLP product (Tablus for you old folks) will be able to assign Microsoft RMS (what Microsoft calls DRM) rights to stored data based on content discovery. The way this works is that the RMS administrator will define a data protection template (what rights are assigned to what users). The RSA DLP administrator then creates a content detection policy, which can then apply the RMS rights automatically based on the content of files. The RSA DLP solution will then scan file repositories (including endpoints) and apply the RMS rights/controls to protect the content.
  • Microsoft has licensed the RSA DLP technology to embed into various Microsoft products. They aren’t offering much detail at this time, nor any timelines, but we do know a few specifics. Microsoft will slowly begin adding the RSA DLP content analysis engine to various products. The non-NDA slides hint at everything from SQL Server, Exchange, and Sharepoint, to Windows and Office. Microsoft will also include basic DLP management into their other management tools.
  • Policies will work across both Microsoft and RSA in the future as the products evolve. Microsoft will be limiting itself to their environment, with RSA as the upgrade path for fuller DLP coverage.

And that’s it for now. RSA DLP 6.5 will link into RMS, with Microsoft licensing the technology for future use in their products. Now for the analysis:

  • This is an extremely significant development in the long term future of DLP. Actually, it’s a nail in the coffin of the term “DLP” and moves us clearly and directly to what we call “CMP”- Content Monitoring and Protection. It moves us closer and closer to the DLP engine being available everywhere (and somewhat commoditized), and the real value in being in the central policy management, analysis, workflow, and incident management system. DLP/CMP vendors don’t go away- but their focus changes as the agent technology is built more broadly into the IT infrastructure (this definitely won’t be limited to just Microsoft).
  • It’s not very exciting in the short term. RSA isn’t the first to plug DLP into RMS (Workshare does it, but they aren’t nearly as big in the DLP market). RSA is only enabling this for content discovery (data at rest) and rights won’t be applied immediately as files are created/saved. It’s really the next stages of this that are interesting.
  • This is good for all the major DLP vendors, although a bit better for RSA. It’s big validation for the DLP/CMP market, and since Microsoft is licensing the technology to embed, it’s reasonable to assume that down the road it may be accessible to other DLP vendors (be aware- that’s major speculation on my part).
  • This partnership also highlights the tight relationship between DLP/CMP and identity management. Most of the DLP vendors plug into Microsoft Active Directory to determine users/groups/roles for the application of content protection policies. One of the biggest obstacles to a successful DLP deployment can be a poor directory infrastructure. If you don’t know what users have what roles, it’s awfully hard to create content-based policies that are enforced based on users and roles.
  • We don’t know how much cash is involved, but financially this is likely good for RSA (the licensing part). I don’t expect it to overly impact sales in the short term, and the other major DLP vendors shouldn’t be too worried for now. DLP deals will still be competitive based on the capabilities of current products, more than what’s coming in an indeterminate future.

Now just imagine a world where you run a query on a SQL database, and any sensitive results are appropriately protected as you place them into an Excel spreadsheet. You then drop that spreadsheet into a Powerpoint presentation and email it to the sales team. It’s still quietly protected, and when one sales guy tries to email it to his Gmail account, it’s blocked. When he transfers it to a USB device, it’s encrypted using a company key so he can’t put it on his home computer. If he accidentally sends it to someone in the call center, they can’t read it. In the final PDF, he can’t cut out the table and put it in another document. That’s where we are headed- DLP/CMP is enmeshed into the background, protecting content through it’s lifecycle based on central policies and content and context awareness.

In summary, it’s great in the long term, good but not exciting in the short term, and beneficial to the entire DLP market, with a slight edge for RSA. There are a ton of open questions and issues, and we’ll be watching and analyzing this one for a while.

As always, feel free to email me if you have any questions.


Monday, November 17, 2008

An Amusing Use For DLP

By Rich

Here’s a valuable lesson for you college students out there, from Dave Meizlik: if your professor is married to one of the leads at a DLP vendor, think twice before plagiarizing a published dissertation.

We talked generally for a while about the problem and then it hit me what if I downloaded a bunch of relevant dissertations, fingerprinted them with a DLP solution, and then sent the girls dissertation through the systems analysis engine for comparison? Would the DLP solution be able to detect plagiarism? It almost seemed too simple. … So when we got home from lunch I started up my laptop and RDP”d into my DLP system. I had my wife download a bunch of relevant dissertations from her school”s database, and within minutes I fingerprinted roughly 50 dissertation files, many of which were a couple hundred pages in length, and built a policy to block transmission of any of the data in those files. I then took her students dissertation and emailed it from a client station to my personal email. Now because the system was monitoring SMTP traffic it sent the email (with the student”s paper as an attachment) to the content analysis engine. I waited a second another and then I impatiently hit send receive and there it was, an automated notification telling me that my email had violated my new policy and had been blocked.

I suspect that’s one grad student who’s going to be serving fries soon…


Wednesday, November 12, 2008

Cloud Security Macro Layers

By Rich

There’s been a lot of discussion on cloud computing in the blogosphere and general press lately, and although I’ll probably hate myself for it, it’s time to jump in beyond some sophomoric (albeit really funny) humor.

Chris Hoff inspired this with his post on TCG IF-MAP; a framework/standard for exchanging network security objects and events. It’s roots are in NAC, although as Alan Shimel informs us there’s been very little adoption.

Since cloud computing is a crappy marketing term that can mean pretty much whatever you want, I won’t dig into the various permutations in this post. For the purposes of this post I’ll be focusing on distributed services (e.g. grid computing), online services, and SaaS. I won’t be referring to in the cloud filtering and other network-only services.

Chris’s posting, and most of the ones I’ve seen out there, are heavily focused on network security concepts as they relate to the cloud. but if we look at cloud computing from a macro level, there are additional layers that are just as critical (in no particular order):


  • Network: The usual network security controls.
  • Service: Security around the exposed APIs and services.
  • User: Authentication- which in the cloud word will need to move to more adaptive authentication, rather than our current username/password static model.
  • Transaction: Security controls around individual transactions- via transaction authentication, adaptive authorization, or other approaches.
  • Data: Information-centric security controls for cloud based data. How’s that for buzzword bingo? Okay, this actually includes security controls over the back end data, distributed data, and any content exchanged with the user.

Down the road we’ll dig into these in more detail, but anytime we start distributing services and functionality over an open public network with no inherent security controls, we need to focus on the design issues and reduce design flaws as early as possible. We can’t just look at this as a network problem- our authentication, authorization, information, and service (layer 7) controls are likely even more important.

This gets me thinking it’s time to write a new framework… not that anyone will adopt it.


Monday, October 20, 2008

Your Simple Guide To Endpoint Encryption Options

By Rich

On the surface endpoint encryption is pretty straightforward these days (WAY better than when I first covered it 8 years ago), but when you start matching all the options to your requirements it can be a tad confusing.

I like to break things out into some simple categories/use cases when I’m helping people figure out the best approach. While this could end up as one of those huge blog posts that ends up as a whitepaper, for today I’ll stick with the basics. Here are the major endpoint encryption options and the most common use cases for them:

  • Full Drive Encryption (FDE): To protect data when you lose a laptop/desktop (but usually laptop). Your system boots up to a mini-operating system where you authenticate, then the rest of the drive is decrypted/encrypted on the fly as you use it. There are a ton of options, including McAfee, CheckPoint, WinMagic, Utimaco, GuardianEdge, PGP, BitArmor, BitLocker, TrueCrypt, and SafeNet.
  • Partial Drive Encryption: To protect data when you lose a laptop/desktop. Similar to whole drive, with some differences for dealing with system updates and such. There’s only one vendor doing this today (Credent), and the effect is equivalent to FDE except in limited circumstances.
  • Volume/Home Directory Encryption: For protecting all of a user’s or group’s data on a shared system. Either the users home directory or a specific volume is encrypted. Offers some of the protection of FDE, but there is a greater chance data may end up in shared spaces and be potentially recovered. FileVault and TrueCrypt are examples.
  • Media Encryption: For encrypting an entire CD, memory stick, etc. Most of the FDE vendors support this.
  • File/Folder Encryption: To protect data on a shared system- including protecting sensitive data from administrators. FDE and file folder encryption are not mutually exclusive- FDE protects against physical loss, while file/folder protects against other individuals with access to a system. Imagine the CEO with an encrypted laptop that still wants to protect the financials from a system administrator. Also useful for encrypting a folder on a shared drive. Again, a ton of options, including PGP (and the free GPG), WinMagic, Utimaco, PKWare, SafeNet, McAfee, WinZip, and many of the other FDE vendors (I just listed the ones I know for sure).
  • Distributed Encryption: This is a special form of file/folder encryption where keys are centrally managed with the encryption engine distributed. It’s used to encrypt files/folders for groups or individuals that move around different systems. There are a bunch of different technical approaches, but basically as long as the product is on the system you are using, and has access to the central server, you don’t need to manually manage keys. Ideally, to encrypt you can right-click the file and select the group/key you’d like to use (or this is handled transparently). Options include Vormetric, BitArmor, PGP, Utimaco, and WinMagic (I think some others are adding it).
  • Email Encryption: To encrypt email messages and attachments. A ton of vendors that are fodder for another post.
  • Hardware Encrypted Drives: Keys are managed by software, and the drive is encrypted using special hardware built-in. The equivalent of FDE with slightly better performance (unless you are using it in a high-activity environment) and better security. Downside is cost, and I only recommend it for high security situations until prices (including the software) drop to what you’d pay for software. Seagate is first out of the gate, with laptop, portable, and full size options.

Here’s how I break out my advice:

  • If you have a laptop, use FDE.
  • If you want to protect files locally from admins or other users, add file/folder. Ideally you want to use the same vendor for both, although there are free/open source options depending on your platform (for those of you on a budget).
  • If you exchange stuff using portable media, encrypt it, preferably using the same tool as the two above.
  • If you are in an enterprise and exchange a lot of sensitive data, especially on things like group projects, use distributed encryption over regular file/folder. It will save a ton of headaches. There aren’t free options, so this is really an enterprise-only thing.
  • Email encryption is a separate beast- odds are you won’t link it to your other encryption efforts (yet) but this will likely change in the next couple years. Enterprise options are linked up on the email server vs. handling it all on the client, thus why you may manage it separately.

I generally recommend keeping it simple- FDE is pretty much mandatory, but many of you don’t quite need file/folder yet. Email is really nice to have, but for a single user you are often better off with a free option since the commercial advantages mostly come into play on the server.

Personally I used to use FileVault on my Mac for home directory encryption, and GPG for email. I then temporarily switched to a beta of PGP for whole drive encryption (and everything else; but as a single user the mail.app plugin worked better than the service option). My license expired and my drive decrypted, so I’m starting to look at other options (PGP worked very well, but I prefer a perpetual license; odds are I will end up back on it since there aren’t many Mac options for FDE- just them, CheckPoint, and WinMagic if you have a Seagate encrypting drive). FileVault worked well for a while, but I did encounter some problems during a system migration and we still get problem reports on our earlier blog entry about it.

Oh- and don’t forget about the Three Laws. And if there were products I missed, please drop them in the comments.


Wednesday, October 15, 2008

My Take On The Database Security Market Challenges

By Rich

Yesterday, Adrian posted his take on a conversation we had last week. We were headed over to happy hour, talking about the usual dribble us analyst types get all hot and bothered about, when he dropped the bombshell that one of our favorite groups of products could be in serious trouble.

For the record, we hadn’t started happy hour yet.

Although everyone on the vendor side is challenged with such a screwed up economy, I believe the forces affecting the database security market place it in particular jeopardy. This bothers me, because I consider these to be some of the highest value tools in our information-centric security arsenal.

Since I’m about to head off to San Diego for a Jimmy Buffett concert, I’ll try and keep this concise.

  • Database security is more a collection of markets and tools than a single market. We have encryption, Database Activity Monitoring, vulnerability assessment, data masking, and a few other pieces. Each of these bits has different buying cycles, and in some cases, different buying centers. Users aren’t happy with the complexity, yet when they go shopping the tend to want to put their own car together (due to internal issues) than buy the full product.
  • Buying cycles are long and complex due to the mix of database and security. Average cycles are 9-12 months for many products, unless there’s a short term compliance mandate. Long cycles are hard to manage in a tight economy.
  • It isn’t a threat driven market. Sure, the threats are bad, but as I’ve talked about before they don’t keep people from checking their email or playing solitaire, thus they are perceived as less.
  • The tools are too technical. I’m sorry to my friends on the vendor side, but most of the tools are very technical and take a lot of training. These aren’t drop in boxes, and that’s another reason buying cycles are long. I’ve been talking with some people who have gone through vendor product training in the last 6 months, and they all said the tools required DBA skills, but not many on the security side have them.
  • They are compliance driven, but not compliance mandated. These tools can seriously help with a plethora of compliance initiatives, but there is rarely a checkbox requiring them. Going back to my economics post, if you don’t hit that checkbox or clearly save money, getting a sale will be rough.
  • Big vendors want to own the market, and think they have the pieces. Oracle and IBM have clearly stepped into the space, even when products aren’t as directly competitive (or capable) as the smaller vendors. Better or not, as we continue to drive towards “good enough” many clients will stop with their big vendor first (especially since the DBAs are so familiar with the product line).
  • There are more short-term acquisition targets than acquirers. The Symantecs and McAfees of the world aren’t looking too strongly at the database security market, mostly leaving the database vendors themselves. Only IBM seems to be pursuing any sort of acquisition strategy. Oracle is building their own, and we haven’t heard much in this area out of Microsoft. Sybase is partnered with a company that seems to be exiting the market, and none of the other database companies are worth talking about. The database tools vendors have hovered around this area, but outside of data masking (which they do themselves) don’t seem overly interested.
  • It’s all down to the numbers and investor patience. Few of the startups are in the black yet, and some have fairly large amounts of investment behind them. If run rates are too high, and sales cycles too low, I won’t be surprised to see some companies dumped below their value. IPLocks, for example, didn’t sell for nearly it’s value (based on the numbers alone, I’m not even talking product).

There are a few ways to navigate through this, and the companies that haven’t aggressively adjusted their strategies in the past few weeks are headed for trouble.

I’m not kidding, I really hated writing this post. This isn’t a “X is Dead” stir the pot kind of thing, but a concern that one of the most important linchpins of information centric security is in probable trouble. To use Adrian’s words:

But the evolutionary cycle coincides with a very nasty economic downturn, which will be long enough that venture investment will probably not be available to bail out those who cannot maintain profitability. Those that earn most of their revenue from other products or services may be immune, but the DB Security vendors who are not yet profitable are candidates for acquisition under semi-controlled circumstances, fire-sale or bankruptcy, depending upon how and when they act.


Thursday, September 25, 2008

On Oracle World and Inference Attacks

By Rich

Some days I feel the suffocating weight of travel more than others. Typically, those days are near the end of a long travel binge; one lasting about 3 months this time.

When I first started traveling I was in my 20’s, effectively single (rotating girlfriends), and relatively unencumbered. At first it was an incredibly exciting adventure, but that quickly wore off as my social ties started to decay (friends call less if you’re never around) and my physical conditioning decayed faster. I dropped from 20 hours or more a week of activity and workouts to nearly 0 when on the road. It killed my progression in martial arts and previously heavy participation in rescues. Not that the travel was all bad; I managed to see the world (and circumnavigate it), hit every continent except Antarctica, and, more importantly, meet my wife. I learned how to hit every tourist spot in a city in about 2 days, pack for a 2-week multi-continental trip using only carry-on, and am completely comfortable being dropped nearly anywhere in the world.

Eventually I hit a balance and for the most part keep my trips down to 1 or 2 a month, which isn’t so destructive as to ruin my body and piss off my family. But despite my best scheduling efforts sometimes things get out of control. That’s why I’m excited to finish off my last trip in the latest binge (Oracle World) for about a month and get caught up with blogging and the business. For those of you earlier in your careers I highly recommend a little travel, but don’t let it take over your life. I’ve been on the run for 8 years now and there is definitely a cost if you don’t keep it under control. As we say in martial arts, there is balance in everything, including balance.

Now on to Oracle World and a little security.

I’m consistently amazed at the scope of Oracle World. I go to a lot of shows at the Moscone Center in San Francisco, from Macworld to RSA, and Oracle World dwarfs them all. For those of you that know the area, they hold sessions in the center and every hotel in walking distance, close of the road between North and South, and effectively take over the entire area. Comparing it to RSA, it’s a strong reminder that we (security) are far from the center of the world. Not that Oracle is the center, but the business applications they, and competitors, produce.

This year I was invited to speak on a panel on data masking/test data generation. As usual, it’s something we’ve talked about before, and it’s clearly a warming topic thanks to PCI and HIPAA. I’ve covered data masking for years, and was even involved in a real project long before joining Gartner, but it’s only VERY recently that interest really seems to be accelerating. You can read this post for my Five Laws of Data Masking.

Two interesting points came out of the panel. The first was the incredible amount of interest people had in public source and healthcare data masking. Rather than just asking us about best practices (the panel was myself, someone from Visa, PWC, and Oracle), the audience seemed more focused on how organizations are protecting their personal financial and healthcare data. Yes, even DNA databases.

The second, and more relevant point, is the problem of inference attacks. Inference attacks are where you use data mining and ancillary sources to compromise your target. For example, if you capture a de-identified healthcare database, you may still be able to reconstruct the record by mining other sources. For example, if you have a database of patient records where patient names and numbers have been scrambled, you might still be able to identify an individual by combining that with scheduling information, doctor lists, zip code, and so on.

Another example was a real situation I was involved with. We needed to work with a company to de-identify a customer database that included deployment characteristics, but not allow inference attacks. The problem wasn’t the bulk of the database, but the outliers, which also happened to be the most interesting cases. If there are a limited number of companies of a certain size deploying a certain technology, competitors might be able to identify the source company by looking at the deals they were involved with, which ones they lost, and who won the deal. Match those characteristics, and they then identify the record and could mine deeper information. Bad guys could do the same thing and perhaps determine deployment specifics that aid an attack.

If logic flaws are the bane of application security design, inference attacks are the bane of data warehousing and masking.


Tuesday, August 12, 2008

New Whitepaper: Best Practices For Endpoint DLP

By Rich

We’re proud to announce a new whitepaper dedicated to best practices in endpoint DLP. It’s a combination of our series of posts on the subject, enhanced with additional material, diagrams, and editing. The title is (no surprise) Best Practices for Endpoint Data Loss Prevention. It was actually complete before Black Hat, but I’m just getting a chance to put it up now.

The paper covers features, best practices for deployment, and example use cases, to give you an idea of how it works.

It’s my usual independent content, much of which started here as blog posts. Thanks to Symantec (Vontu) for Sponsoring and Chris Pepper for editing.


Wednesday, July 23, 2008

Best Practices For Endpoint DLP: Use Cases

By Rich

We’ve covered a lot of ground over the past few posts on endpoint DLP. Our last post finished our discussion of best practices and I’d like to close with a few short fictional use cases based on real deployments.

Endpoint Discovery and File Monitoring for PCI Compliance Support

BuyMore is a large regional home goods and grocery retailer in the southwest United States. In a previous PCI audit, credit card information was discovered on some employee laptops mixed in with loyalty program data and customer demographics. An expensive, manual audit and cleansing was performed within business units handling this content. To avoid similar issues in the future, BuyMore purchased an endpoint DLP solution with discovery and real time file monitoring support.

BuyMore has a highly distributed infrastructure due to multiple acquisitions and independently managed retail outlets (approximately 150 locations). During initial testing it was determined that database fingerprinting would be the best content analysis technique for the corporate headquarters, regional offices, and retail outlet servers, while rules-based analysis is the best fit for the systems used by store managers. The eventual goal is to transition all locations to database fingerprinting, once a database consolidation and cleansing program is complete.

During Phase 1, endpoint agents were deployed to corporate headquarters laptops for the customer relations and marketing team. An initial content discovery scan was performed, with policy violations reported to managers and the affected employees. For violations, a second scan was performed 30 days later to ensure that the data was removed. In Phase 2, the endpoint agents were switched into real time monitoring mode when the central management server was available (to support the database fingerprinting policy). Systems that leave the corporate network are then scanned monthly when the connect back in, with the tool tuned to only scan files modified since the last scan. All systems are scanned on a rotating quarterly basis, and reports generated and provided to the auditors.

For Phase 3, agents were expanded to the rest of the corporate headquarters team over the course of 6 months, on a business unit by business unit basis.

For the final phase, agents were deployed to retail outlets on a store by store basis. Due to the lower quality of database data in these locations, a rules-based policy for credit cards was used. Policy violations automatically generate an email to the store manager, and are reported to the central policy server for followup by a compliance manager.

At the end of 18 months, corporate headquarters and 78% or retail outlets were covered. BuyMore is planning on adding USB blocking in their next year of deployment, and already completed deployment of network filtering and content discovery for storage repositories.

Endpoint Enforcement for Intellectual Property Protection

EngineeringCo is a small contract engineering firm with 500 employees in the high tech manufacturing industry. They specialize in designing highly competitive mobile phones for major manufacturers. In 2006 they suffered a major theft of their intellectual property when a contractor transferred product description documents and CAD diagrams for a new design onto a USB device and sold them to a competitor in Asia, which beat their client to market by 3 months.

EngineeringCo purchased a full DLP suite in 2007 and completed deployment of partial document matching policies on the network, followed by network-scanning-based content discovery policies for corporate desktops. After 6 months they added network blocking for email, http, and ftp, and violations are at an acceptable level. In the first half of 2008 they began deployment of endpoint agents for engineering laptops (approximately 150 systems).

Because the information involved is so valuable, EngineeringCo decided to deploy full partial document matching policies on their endpoints. Testing determined performance is acceptable on current systems if the analysis signatures are limited to 500 MB in total size. To accommodate this limit, a special directory was established for each major project where managers drop key documents, rather than all project documents (which are still scanned and protected at the network). Engineers can work with documents, but the endpoint agent blocks network transmission except for internal email and file sharing, and any portable storage. The network gateway prevents engineers from emailing documents externally using their corporate email, but since it’s a gateway solution internal emails aren’t scanned.

Engineering teams are typically 5-25 individuals, and agents were deployed on a team by team basis, taking approximately 6 months total.

These are, of course, fictional best practices examples, but they’re drawn from discussions with dozens of DLP clients. The key takeaways are:

  1. Start small, with a few simple policies and a limited footprint.
  2. Grow deployments as you reduce incidents/violations to keep your incident queue under control and educate employees.
  3. Start with monitoring/alerting and employee education, then move on to enforcement.
  4. This is risk reduction, not risk elimination. Use the tool to identify and reduce exposure but don’t expect it to magically solve all your data security problems.
  5. When you add new policies, test first with a limited audience before rolling them out to the entire scope, even if you are already covering the entire enterprise with other policies.


Thursday, July 17, 2008

Best Practices for Endpoint DLP: Part 5, Deployment

By Rich

In our last post we talked about prepping for deployment- setting expectations, prioritizing, integrating with the infrastructure, and defining workflow. Now it’s time to get out of the lab and get our hands dirty.

Today we’re going to move beyond planning into deployment.

  1. Integrate with your infrastructure: Endpoint DLP tools require integration with a few different infrastructure elements. First, if you are using a full DLP suite, figure out if you need to perform any extra integration before moving to endpoint deployments. Some suites OEM the endpoint agent and you may need some additional components to get up and running. In other cases, you’ll need to plan capacity and possibly deploy additional servers to handle the endpoint load. Next, integrate with your directory infrastructure if you haven’t already. Determine if you need any additional information to tie users to devices (in most cases, this is built into the tool and its directory integration components).
  2. Integrate on the endpoint: In your preparatory steps you should have performed testing to be comfortable that the agent is compatible with your standard images and other workstation configurations. Now you need to add the agent to the production images and prepare deployment packages. Don’t forget to configure the agent before deployment, especially the home server location and how much space and resources to use on the endpoint. Depending on your tool, this may be managed after initial deployment by your management server.
  3. Deploy agents to initial workgroups: You’ll want to start with a limited deployment before rolling out to the larger enterprise. Pick a workgroup where you can test your initial policies.
  4. Build initial policies: For your first deployment, you should start with a small subset of policies, or even a single policy, in alert or content classification/discovery mode (where the tool reports on sensitive data, but doesn’t generate policy violations).
  5. Baseline, then expand deployment: Deploy your initial policies to the starting workgroup. Try to roll the policies out one monitoring/enforcement mode at a time, e.g., start with endpoint discovery, then move to USB blocking, then add network alerting, then blocking, and so on. Once you have a good feel for the effectiveness of the policies, performance, and enterprise integration, you can expand into a wider deployment, covering more of the enterprise. After the first few you’ll have a good understanding of how quickly, and how widely, you can roll out new policies.
  6. Tune policies: Even stable policies may require tuning over time. In some cases it’s to improve effectiveness, in others to reduce false positives, and in still other cases to adapt to evolving business needs. You’ll want to initially tune policies during baselining, but continue to tune them as the deployment expands. Most DLP clients report that they don’t spend much time tuning policies after baselining, but it’s always a good idea to keep your policies current with enterprise needs.
  7. Add enforcement/protection: By this point you should understand the effectiveness of your policies, and have educated users where you’ve found policy violations. You can now start switching to enforcement or protective actions, such as blocking, network filtering, or encryption of files. It’s important to notify users of enforcement actions as they occur, otherwise you might frustrate them u ecessarily. If you’re making a major change to established business process, consider scaling out enforcement options on a business unit by business unit basis (e.g., restricting access to a common content type to meet a new compliance need).

Deploying endpoint DLP isn’t really very difficult; the most common mistake enterprises make is deploying agents and policies too widely, too quickly. When you combine a new endpoint agent with intrusive enforcement actions that interfere (positively or negatively) with people’s work habits, you risk grumpy employees and political backlash. Most organizations find that a staged rollout of agents, followed by first deploying monitoring policies before moving into enforcement, then a staged rollout of policies, is the most effective approach.


Monday, July 07, 2008

Best Practices for Endpoint DLP: Part 3

By Rich

In our last post we discussed the core functions of an endpoint DLP tool. Today we’re going to talk more about agent deployment, management, policy creation, enforcement workflow, and overall integration.

Agent Management

Agent management consists of two main functions- deployment and maintenance. On the deployment side, most tools today are designed to work with whatever workstation management tools your organization already uses. As with other software tools, you create a deployment package and then distribute it along with any other software updates. If you don’t already have a software deployment tool, you’ll want to look for an endpoint DLP tool that includes basic deployment capabilities. Since all endpoint DLP tools include central policy management, deployment is fairly straightforward. There’s little need to customize packages based on user, group, or other variables beyond the location of the central management server.

The rest of the agent’s lifecycle, aside from major updates, is controlled through the central management server. Agents should communicate regularly with the central server to receive policy updates and report incidents/activity. When the central management server is accessible, this should happen in near real time. When the endpoint is off the enterprise network (without VPN/remote access), the DLP tool will store violations locally in a secure repository that’s encrypted and inaccessible to the user. The tool will then connect with the management server next time it’s accessible, receiving policy updates and reporting activity. The management server should produce aging reports to help you identify endpoints which are out of date and need to be refreshed. Under some circumstances, the endpoint may be able to communicate remote violations through encrypted email or another secure mechanism from outside the corporate firewall.

Aside from content policy updates and activity reporting, there are a few other features that need central management. For content discovery, you’ll need to control scanning schedule/frequency, and control bandwidth and performance (e.g., capping CPU usage). For real time monitoring and enforcement you’ll also want performance controls, including limits on how much space is used to store policies and the local cache of incident information.

Once you set your base configuration, you shouldn’t need to do much endpoint management directly. Things like enforcement actions are handled implicitly as part of policy, thus integrated into the main DLP policy interface.

Policy Creation and Workflow

Policy creation for endpoints should be fully integrated into your central DLP policy framework for consistent enforcement across data in motion, at rest, and in use. Policies are thus content focused, rather than location focused- another advantage of full suites over individual point products. In the policy management interface you first define the content to protect, then pick channels and enforcement actions (all, of course, tied to users/groups and context). For example, you might want to create a policy to protect customer account numbers. You’d start by creating a database fingerprinting policy pulling names and account numbers from the customer database; this is the content definition phase. Assuming you want the policy to apply equally to all employees, you then define network protective actions- e.g., blocking unencrypted emails with account numbers, blocking http and ftp traffic, and alerting on other channels where blocking isn’t possible. For content discovery, quarantine any files with more than one account number that are not on a registered server. Then, for endpoints, restrict account numbers from unencrypted files, portable storage, or network communications when the user is off the corporate network, switching to a rules-based (regular expression) policy when access to the policy server isn’t available.

In some cases you might need to design these as separate but related policies- for example, the database fingerprinting policy applies when the endpoint is on the network, and a simplified rules-based policy when the endpoint is remote.

Incident management should also be fully integrated into the overall DLP incident handling queue. Incidents appear in a single interface, and can be routed to handlers based on policy violated, user, severity, channel, or other criteria. Remember that DLP is focused on solving the business problem of protecting your information, and thus tends to require a dedicated workflow.

For endpoint DLP you’ll need some additional information beyond network or non-endpoint discovery policies. Since some violations will occur when the system is off the network and unable to communicate with the central management server, “delayed notification” violations need to be appropriately stamped and prioritized in the management interface. You’d hate to miss the loss of your entire customer database because it showed up as a week-old incident when the sales laptop finally reconnected.

Otherwise, workflow is fully integrated into your main DLP solution, and any endpoint-specific actions are handled through the same mechanisms as discovery or network activity.


If you’re running an endpoint only solution, an integrated user interface obviously isn’t an issue. For full suite solutions, as we just discussed, policy creation, management, and incident workflow should be completely integrated with network and discovery policies.

Other endpoint management is typically a separate tab in the main interface, alongside management areas for discovery/storage management and network integration/management. While you want an integrated management interface, you don’t want it so integrated that it becomes confusing or unwieldy to use.

In most DLP tools, content discovery is managed separately to define repositories and manage scanning schedules and performance. Endpoint DLP discovery should be included here, and allow you to specify device and user groups instead of having to manage endpoints individually.

That’s about it for the technology side; in our next posts we’ll look at best practices for deployment and management, and present a few generic use cases.

I realize I’m pretty biased towards full-suite solutions, and this is your chance to call me on it. If you disagree, please let me know in the comments…


Wednesday, July 02, 2008

Best Practices For Endpoint DLP: Part 2

By Rich

In Part 1 I talked about the definition of endpoint DLP, the business drivers, and how it integrates with full-suite solutions. Today (and over the next few days) we’re going to start digging into the technology itself.

Base Agent Functions

There is massive variation in the capabilities of different endpoint agents. Even for a single given function, there can be a dozen different approaches, all with varying degrees of success. Also, not all agents contain all features; in fact, most agents lack one or more major areas of functionality.

Agents include four generic layers/features:

  1. Content Discovery: Scanning of stored content for policy violations.
  2. File System Protection: Monitoring and enforcement of file operations as they occur (as opposed to discovery, which is scanning of content already written to media). Most often, this is used to prevent content from being written to portable media/USB. It’s also where tools hook in for automatic encryption or application of DRM rights.
  3. Network Protection: Monitoring and enforcement of network operations. Provides protection similar to gateway DLP when a system is off the corporate network. Since most systems treat printing and faxing as a form of network traffic, this is where most print/fax protection can be enforced (the rest comes from special print/fax hooks).
  4. GUI/Kernel Protection: A more generic category to cover data in use scenarios, such as cut/paste, application restrictions, and print screen.

Between these four categories we cover most of the day to day operations a user might perform that places content at risk. It hits our primary drivers from the last post- protecting data from portable storage, protecting systems off the corporate network, and supporting discovery on the endpoint. Most of the tools on the market start with file and (then) networking features before moving on to some of the more complex GUI/kernel functions.

Agent Content Awareness

Even if you have an endpoint with a quad-core processor and 8 GB of RAM, the odds are you don’t want to devote all of that horsepower to enforcing DLP.

Content analysis may be resource intensive, depending on the types of policies you are trying to enforce. Also, different agents have different enforcement capabilities which may or may not match up to their gateway counterparts. At a minimum, most endpoint tools support rules/regular expressions, some degree of partial document matching, and a whole lot of contextual analysis. Others support their entire repertoire of content analysis techniques, but you will likely have to tune policies to run on a more resource constrained endpoint.

Some tools rely on the central management server for aspects of content analysis, to offload agent overhead. Rather than performing all analysis locally, they will ship content back to the server, then act on any results. This obviously isn’t ideal, since those policies can’t be enforced when the endpoint is off the enterprise network, and it will suck up a fair bit of bandwidth. But it does allow enforcement of policies that are otherwise totally unrealistic on an endpoint, such as database fingerprinting of a large enterprise DB.

One emerging option is policies that adapt based on endpoint location. For example, when you’re on the enterprise network most policies are enforced at the gateway. Once you access the Internet outside the corporate walls, a different set of policies is enforced. For example, you might use database fingerprinting (exact database matching) of the customer DB at the gateway when the laptop is in the office or on a (non split tunneled) VPN, but drop to a rule/regex for Social Security Numbers (or account numbers) for mobile workers. Sure, you’ll get more false positives, but you’re still able to protect your sensitive information while meeting performance requirements.

Next up: more on the technology, followed by best practices for deployment and implementation.


Monday, June 30, 2008

Best Practices For Endpoint DLP: Part 1

By Rich

As the first analyst to ever cover Data Loss Prevention, I’ve had a bit of a tumultuous relationship with endpoint DLP. Early on I tended to exclude endpoint only solutions because they were more limited in functionality, and couldn’t help at all with protecting data loss from unmanaged systems. But even then I always said that, eventually, endpoint DLP would be a critical component of any DLP solution. When we’re looking at a problem like data loss, no individual point solution will give us everything we need.

Over the next few posts we’re going to dig into endpoint DLP. I’ll start by discussing how I define it, and why I don’t generally recommend stand-alone endpoint DLP. I’ll talk about key features to look for, then focus on best practices for implementation.

It won’t come as any surprise that these posts are building up into another one of my whitepapers. This is about as transparent a research process as I can think of. And speaking of transparency, like most of my other papers this one is sponsored, but the content is completely objective (sponsors can suggest a topic, if it’s objective, but they don’t have input on the content).


As always, we need to start with our definition for DLP/CMP:

“Products that, based on central policies, identify, monitor, and protect data at rest, in motion, and in use through deep content analysis”.

Endpoint DLP helps manage all three parts of this problem. The first is protecting data at rest when it’s on the endpoint; or what we call content discovery (and I wrote up in great detail). Our primary goal is keeping track of sensitive data as it proliferates out to laptops, desktops, and even portable media. The second part, and the most difficult problem in DLP, is protecting data in use. This is a catch all term we use to describe DLP monitoring and protection of content as it’s used on a desktop- cut and paste, moving data in and out of applications, and even tying in with encryption and enterprise Document Rights Management (DRM). Finally, endpoint DLP provides data in motion protection for systems outside the purview of network DLP- such as a laptop out in the field.

Endpoint DLP is a little difficult to discuss since it’s one of the fastest changing areas in a rapidly evolving space. I don’t believe any single product has every little piece of functionality I’m going to talk about, so (at least where functionality is concerned) this series will lay out all the recommended options which you can then prioritize to meet your own needs.

Endpoint DLP Drivers

In the beginning of the DLP market we nearly always recommended organizations start with network DLP. A network tool allows you to protect both managed and unmanaged systems (like contractor laptops), and is typically easier to deploy in an enterprise (since you don’t have to muck with every desktop and server). It also has advantages in terms of the number and types of content protection policies you can deploy, how it integrates with email for workflow, and the scope of channels covered. During the DLP market’s the first few years, it was hard to even find a content-aware endpoint agent.

But customer demand for endpoint DLP quickly grew thanks to two major needs- content discovery on the endpoint, and the ability to prevent loss through USB storage devices. We continue to see basic USB blocking tools with absolutely no content awareness brand themselves as DLP. The first batches of endpoint DLP tools focused on exactly these problems- discovery and content-aware portable media/USB device control.

The next major driver for endpoint DLP is supporting network policies when a system is outside the corporate gateway. We all live in an increasingly mobile workforce where we need to support consistent policies no matter where someone is physically located, nor how they connect to the Internet.

Finally, we see some demand for deeper integration of DLP with how a user interacts with their system. In part, this is to support more intensive policies to reduce malicious loss of data. You might, for example, disallow certain content from moving into certain applications, like encryption. Some of these same kinds of hooks are used to limit cut/paste, print screen, and fax, or to enable more advanced security like automatic encryption or application of DRM rights.

The Full Suite Advantage

As we’ve already hinted, there are some limitations to endpoint only DLP solutions. The first is that they only protect managed systems where you can deploy an agent. If you’re worried about contractors on your network or you want protection in case someone tries to use a server to send data outside the walls, you’re out of luck. Also, because some content analysis policies are processor and memory intensive, it is problematic to get them running on resource-constrained endpoints. Finally, there are many discovery situations where you don’t want to deploy a local endpoint agent for your content analysis- e.g. when performing discovery on a major SAN.

Thus my bias towards full-suite solutions. Network DLP reduces losses on the enterprise network from both managed and unmanaged systems, and servers and workstations. Content discovery finds and protects stored data throughout the enterprise, while endpoint DLP protects systems that leave the network, and reduces risks across vectors that circumvent the network. It’s the combination of all these layers that provides the best overall risk reduction. All of this is managed through a single policy, workflow, and administration server; rather than forcing you to create different policies; for different channels and products, with different capabilities, workflow, and management.

In our next post we’ll discuss the technology and major features to look for, followed by posts on best practices for implementation.