Login  |  Register  |  Contact

Data Loss Prevention

Thursday, March 11, 2010

Low Hanging Fruit: Quick Wins with Data Loss Prevention

By Rich

Two of the most common criticisms of DLP that comes up in user discussions are a) its complexity and b) the fear of false positives. Security professionals worry that DLP is an expensive widget that will fail to deliver the expected value -- turning into yet another black hole of productivity. But when used properly DLP provides rapid assessment and identification of data security issues not available with any other technology.

I don't mean to play down the real complexities you might encounter as you roll out a complete data protection program. Business use of information is itself complicated, and no tool designed to protect that data can simplify or mask the underlying business processes. However, there are steps you can take to obtain significant immediate value and security gains without blowing your productivity or wasting important resources.

Over the next few posts I'll highlight the lowest hanging fruit for DLP, refined in conversations with hundreds of DLP users. These aren't meant to incorporate the entire DLP process, but to show you how to get real and immediate wins before you move on to more complex policies and use cases.

Establish Your Process

Nearly every DLP reference I've talked with has discovered actionable offenses committed by employees as soon as they turn the tool on. Some of these require little more than contacting a business unit to change a bad process, but quite a few result in security guards escorting people out of the building, or even legal action. One of my favorite stories is the time the DLP vendor plugged in the tool for a lunchtime demonstration on the same day a senior executive decided to send proprietary information to a competitor. Needless to say, the vendor lost their hard drives that day, but they didn't seem too unhappy.

Even if you aren't planning on moving straight to enforcement mode, you need to put a process in place to manage the issues that will crop up once you activate your tool. The kinds of issues you need to figure out how to address in advance fall into two categories:

  • Business Process Failures: Although you'll likely manage most business process issues as you roll out your sustained deployment, the odds are high some will be of such high concern they will require immediate remediation. These are often compliance related.
  • Egregious Employee Violations: Most employee-related issues can be dealt with as you gradually shift into enforcement mode, but as in the example above, you will encounter situations requiring immediate action.

In terms of process, I suggest two tracks based on the nature of the incident. Business process failures usually involve escalation within security or IT, possible involvement of compliance or risk management, and engagement with the business unity itself. You are less concerned with getting someone in trouble than stopping the problem.

Employee violations, due to their legal sensitivity, require a more formal process. Typically you'll need to open an investigation and immediately escalate to management while engaging legal and human resources (since this might be a firing offense). Contingencies need to be established in case law enforcement is engaged, including plans to provide forensic evidence to law enforcement without having them walk out the door with your nice new DLP box and hard drives. Essentially you want to implement whatever process you already have in place for internal employee investigations and potential termination.

In our next post we'll focus more on rolling out the tool, followed by how to configure it for those quick wins I keep teasing you with.

–Rich

Monday, February 01, 2010

Pragmatic Data Security: Discover

By Rich

In the Discovery phase we figure where the heck our sensitive information is, how it's being used, and how well it's protected. If performed manually, or with too broad an approach, Discovery can be quite difficult and time consuming. In the pragmatic approach we stick with a very narrow scope and leverage automation for greater efficiency. A mid-sized organization can see immediate benefits in a matter of weeks to months, and usually finish a comprehensive review (including all endpoints) within a year or less.

Discover: The Process

Before we get into the process, be aware that your job will be infinitely harder if you don't have a reasonably up to date directory infrastructure. If you can't figure out your users, groups, and roles, it will be much harder to identify misuse of data or build enforcement policies. Take the time to clean up your directory before you start scanning and filtering for content. Also, the odds are very high that you will find something that requires disciplinary action. Make sure you have a process in place to handle policy violations, and work with HR and Legal before you start finding things that will get someone fired (trust me, those odds are pretty darn high).

You have a couple choices for where to start -- depending on your goals, you can begin with applications/databases, storage repositories (including endpoints), or the network. If you are dealing with something like PCI, stored data is usually the best place to start, since avoiding unencrypted card numbers on storage is an explicit requirement. For HIPAA, you might want to start on the network since most of the violations in organizations I talk to relate to policy violations over email/web/FTP due to bad business processes. For each area, here's how you do it:

  • Storage and Endpoints: Unless you have a heck of a lot of bodies, you will need a Data Loss Prevention tool with content discovery capabilities (I mention a few alternatives in the Tools section, but DLP is your best choice). Build a policy based on the content definition you built in the first phase. Remember, stick to a single data/content type to start. Unless you are in a smaller organization and plan on scanning everything, you need to identify your initial target range -- typically major repositories or endpoints grouped by business unit. Don't pick something too broad or you might end up with too many results to do anything with. Also, you'll need some sort of access to the server -- either by installing an agent or through access to a file share. Once you get your first results, tune your policy as needed and start expanding your scope to scan more systems.
  • Network: Again, a DLP tool is your friend here, although unlike with content discovery you have more options to leverage other tools for some sort of basic analysis. They won't be nearly as effective, and I really suggest using the right tool for the job. Put your network tool in monitoring mode and build a policy to generate alerts using the same data definition we talked about when scanning storage. You might focus on just a few key channels to start -- such as email, web, and FTP; with a narrow IP range/subnet if you are in a larger organization. This will give you a good idea of how your data is being used, identify some bad business process (like unencrypted FTP to a partner), and which users or departments are the worst abusers. Based on your initial results you'll tune your policy as needed. Right now our goal is to figure out where we have problems -- we will get to fixing them in a different phase.
  • Applications & Databases: Your goal is to determine which applications and databases have sensitive data, and you have a few different approaches to choose from. This is the part of the process where a manual effort can be somewhat effective, although it's not as comprehensive as using automated tools. Simply reach out to different business units, especially the application support and database management teams, to create an inventory. Don't ask them which systems have sensitive data, ask them for an inventory of all systems. The odds are very high your data is stored in places you don't expect, so to check these systems perform a flat file dump and scan the output with a pattern matching tool. If you have the budget, I suggest using a database discovery tool -- preferably one with built in content discovery (there aren't many on the market, as we'll mention in the Tools section). Depending on the tool you use, it will either sniff the network for database connections and then identify those systems, or scan based on IP ranges. If the tool includes content discovery, you'll usually give it some level of administrative access to scan the internal database structures.

I just presented a lot of options, but remember we are taking the pragmatic approach. I don't expect you to try all this at once -- pick one area, with a narrow scope, knowing you will expand later. Focus on wherever you think you might have the greatest initial impact, or where you have known problems. I'm not an idealist -- some of this is hard work and takes time, but it isn't an endless process and you will have a positive impact.

We aren't necessarily done once we figure out where the data is -- for approved repositories, I really recommend you also re-check their security. Run at least a basic vulnerability scan, and for bigger repositories I recommend a focused penetration test. (Of course, if you already know it's insecure you probably don't need to beat the dead horse with another check). Later, in the Secure phase, we'll need to lock down the approved repositories so it's important to know which security holes to plug.

Discover: Technologies

Unlike the Define phase, here we have a plethora of options. I'll break this into two parts: recommended tools that are best for the job, and ancillary tools in case you don't have a budget for anything new. Since we're focused on the process in this series, I'll skip definitions and descriptions of the technologies, most of which you can find in our Research Library

Recommended Tools

  1. Data Loss Prevention (DLP): This is the best tool for storage, network, and endpoint discovery. Nothing else is nearly as effective.
  2. Database Discovery: While there are only a few tools on the market, they are extremely helpful for finding all the unexpected databases that tend to be floating around most organizations. Some offer content discovery, but it's usually limited to regular expressions/keywords (which is often totally fine for looking within a database).
  3. Database Activity Monitoring (DAM): A couple of the tools include content discovery (some also include database discovery). I only recommend DAM in the discover phase if you also intend on using it later for database monitoring -- otherwise it's not the right investment.

Ancillary Tools

  1. IDS/IPS/Deep Packet Inspection: There are a bunch of different deep packet inspection network tools -- including UTM, Web Application Firewalls, and web gateways -- that now include basic regular expression pattern matching for "poor man's" DLP functionality. They only help with data that fits a pattern, they don't include any workflow, and they usually have a ton of false positives. If the tool can't crack open file attachments/transfers it probably won't be very helpful.
  2. Electronic Discovery, Search, and Data Classification: Most of these tools perform some level of pattern matching or indexing that can help with discovery. They tend to have much higher false positive rates than DLP (and usually cost more if you're buying new), but if you already have one and budgets are tight they can help.
  3. Email Security Gateways: Most of the email security gateways on the market can scan for content, but they are obviously limited to only email, and aren't necessarily well suited to the discovery process.
  4. FOSS Discovery Tools: There are a couple of free/open source content discovery tools, mostly projects from higher education institutions that built their own tools to weed out improper use of Social Security numbers due to a regulatory change a few years back.

Discover: Case Study

Frank from Billy Bob's Bait Shop and Sushi Outlet decides to use a DLP tool to help figure out where any unencrypted credit card numbers might be stored. He decides to go with a full suite DLP tool since he knows he needs to scan his network, storage, servers in the retail outlets, and employee systems.

Before turning on the tool, he contacts Legal and HR to set up a process in case they find any employees illegally using these numbers, as opposed to the accidental or business-process leaks he also expects to manage. Although his directory servers are a little messy due to all the short-term employees endemic to retail operations, he's confident his core Active Directory server is relatively up to date, especially where systems/servers are concerned.

Since he's using a DLP tool, he develops a three-tier policy to base his discovery scans on:

  1. Using the one database with stored unencrypted numbers, he creates a database fingerprinting policy to alert on exact matches from that database (his DLP tool uses hashes, not the original values, so it isn't creating a new security exposure). These are critical alerts.
  2. His next policy uses database fingerprints of all customer names from the customer database, combined with a regular expression for generic credit card numbers. If a customer name appears with something that matches a credit card number (based on the regex pattern) it generates a medium alert.
  3. His lowest priority policy uses the default "PCI" category built into his DLP tool, which is predominantly basic pattern matching.

He breaks his project down into three phases, to run during overlapping periods:

  1. Using those three policies, he turns on network monitoring for email, web, and FTP.
  2. He begins scanning his storage repositories, starting in the data center. Once he finishes those, he will expand the scans into systems in the retail outlets. He expects his data center scan to go relatively quickly, but is planning on 6-12 months to cover the retail outlets.
  3. He is testing endpoint discovery in the lab, but since their workstation management is a bit messy he isn't planning on trying to install agents and beginning scans until the second year of the project.

It took Frank about two months to coordinate with other business/IT units before starting the project. Installing DLP on the network only took a few hours because everything ran through one main gateway, and he wasn't worried about installing any proxy/blocking technology.

Frank immediately saw network results, and found one serious business process problem where unencrypted numbers were included in files being FTPed to a business partner. The rest of his incidents involved individual accidents, and for the most part they weren't losing credit card numbers over the monitored channels.

The content discovery portion took a bit longer since there wasn't a consistent administrative account he could use to access and scan all the servers. Even though they are a relatively small operation, it took about 2 months of full time scanning to get through the data center due to all the manual coordination involved. They found a large number of old spreadsheets with credit card numbers in various directories, and a few in flat files -- especially database dumps from development.

The retail outlets actually took less time than he expected. Most of the servers, except at the largest regional locations, were remotely managed and well inventoried. He found that 20% of them were running on an older credit card transaction system that stored unencrypted credit card numbers.

Remember, this is a 1,000 person organization... if you work someplace with five or ten times the employees and infrastructure, your process will take longer. Don't assume it will take five or ten times longer, though -- it all depends on scope, infrastructure, and a variety of other factors.

–Rich

Monday, June 01, 2009

The State of Web Application and Data Security—Mid 2009

By Rich

One of the more difficult aspects of the analyst gig is sorting through all the information you get, and isolating out any inherent biases. The kinds of inquiries we get from clients can all too easily skew our perceptions of the industry, since people tend to come to us for specific reasons, and those reasons don't necessarily represent the mean of the industry. Aside from all the vendor updates (and customer references), our end user conversations usually involve helping someone with a specific problem -- ranging from vendor selection, to basic technology education, to strategy development/problem solving. People call us when they need help, not when things are running well, so it's all too easy to assume a particular technology is being used more widely than it really is, or a problem is bigger or smaller than it really is, because everyone calling us is asking about it. Countering this takes a lot of outreach to find out what people are really doing even when they aren't calling us.

Over the past few weeks I've had a series of opportunities to work with end users outside the context of normal inbound inquiries, and it's been fairly enlightening. These included direct client calls, executive roundtables such as one I participated in recently with IANS (with a mix from Fortune 50 to mid-size enterprises), and some outreach on our part. They reinforced some of what we've been thinking, while breaking other assumptions. I thought it would be good to compile these together into a "state of the industry" summary. Since I spend most of my time focused on web application and data security, I'll only cover those areas:

image

When it comes to web application and data security, if there isn't a compliance requirement, there isn't budget -- Nearly all of the security professionals we've spoken with recognize the importance of web application and data security, but they consistently tell us that unless there is a compliance requirement it's very difficult for them to get budget. That's not to say it's impossible, but non-compliance projects (however important) are way down the priority list in most organizations. In a room of a dozen high-level security managers of (mostly) large enterprises, they all reinforced that compliance drove nearly all of their new projects, and there was little support for non-compliance-related web application or data security initiatives. I doubt this surprises any of you.

"Compliance" may mean more than compliance -- Activities that are positioned as helping with compliance, even if they aren't a direct requirement, are more likely to gain funding. This is especially true for projects that could reduce compliance costs. They will have a longer approval cycle, often 9 months or so, compared to the 3-6 months for directly-required compliance activities. Initiatives directly tied to limiting potential data breach notifications are the most cited driver. Two technology examples are full disk encryption and portable device control.

PCI is the single biggest compliance driver for web application and data security -- I may not be thrilled with PCI, but it's driving more web application and data security improvements than anything else.

The term Data Loss Prevention has lost meaning -- I discussed this in a post last week. Even those who have gone through a DLP tool selection process often use the term to encompass more than the narrow definition we prefer.

It's easier to get resources to do some things manually than to buy a tool -- Although tools would be much more efficient and effective for some projects, in terms of costs and results, manual projects using existing resources are easier to get approval for. As one manager put it, "I already have the bodies, and I won't get any more money for new tools." The most common example cited was content discovery (we'll talk more about this a few points down).

Most people use DLP for network (primarily email) monitoring, not content discovery or endpoint protection -- Even though we tend to think discovery offers equal or greater value, most organizations with DLP use it for network monitoring.

Interest in content discovery, especially DLP-based, is high, but resources are hard to get for discovery projects -- Most security managers I talk with are very interested in content discovery, but they are less educated on the options and don't have the resources. They tell me that finding the data is the easy part -- getting resources to do anything about it is the limiting factor.

The Web Application Firewall (WAF) market and Security Source Code Tools markets are nearly equal in size, with more clients on WAFs, and more money spent on source code tools per client -- While it's hard to fully quantify, we think the source code tools cost more per implementation, but WAFs are in slightly wider use.

WAFs are a quicker hit for PCI compliance -- Most organizations deploying WAFs do so for PCI compliance, and they're seen as a quicker fix than secure source code projects.

Most WAF deployments are out of band, and false positives are a major problem for default deployments -- Customers are installing WAFs for compliance, but are generally unable to deploy them inline (initially) due to the tuning requirements.

Full drive encryption is mature, and well deployed in the early mainstream -- Full drive encryption, while not perfect, is deployable in even large enterprises. It's now considered a level-setting best practice in financial services, and usage is growing in healthcare and insurance. Other asset recovery options, such as remote data destruction and phone home applications, are now seen as little more than snake oil. As one CISO told us, "I don't care about the laptop, we just encrypt it and don't worry about it when it goes missing".

File and folder encryption is not in wide use -- Very few organizations are performing any wide scale file/folder encryption, outside of some targeted encryption of PII for compliance requirements.

Database encryption is hard, and not widely used -- Most organizations are dissatisfied with database encryption options, and do not deploy it widely. Within a large organization there is likely some DB encryption, with preference given to file/folder/media protection over column level encryption, but most organizations prefer to avoid it. Performance and key management are cited as the primary obstacles, even when using native tools. Current versions of database encryption (primarily native encryption) do perform better than older versions, but key management is still unsatisfactory. Large encryption projects, when initiated, take an average of 12-18 months.

Large enterprises prefer application-level encryption of credit card numbers, and tokenization -- When it comes to credit card numbers, security managers prefer to encrypt it at the application level, or consolidate numbers into a central source, using representative "tokens" throughout the rest of the application stack. These projects take a minimum of 12-18 months, similar to database encryption projects (the two are often tied together, with encryption used in the source database).

Email encryption and DRM tend to be workgroup-specific deployments -- Email encryption and DRM use is scattered throughout the industry, but is still generally limited to workgroup-level projects due to the complexity of management, or lack of demand/compliance from users.

Database Activity Monitoring usage continues to grow slowly, mostly for compliance, but not quickly enough to save lagging vendors -- Many DAM deployments are still tied to SOX auditing, and it's not as widely used for other data security initiatives. Performance is reasonable when you can use endpoint agents, which some DBAs still resist. Network monitoring is not seen as effective, but may still be used when local monitoring isn't an option. Network requirements, depending on the tool, may also inhibit deployments.

My main takeaway is that security managers know what they need to do to protect information assets, but they lack the time, resources, and management support for many initiatives. There is also broad dissatisfaction with security tools and vendors in general, in large part due to poor expectation setting during the sales process, and deliberately confusing marketing. It's not that the tools don't work, but that they're never quite as easy as promised.

It's an interesting dilemma, since there is clear and broad recognition that data security (and by extension, web application security) is likely our most pressing overall issue in terms of security, but due to a variety of factors (many of which we covered in our Business Justification for Data Security paper), the resources just aren't there to really tackle it head-on.

–Rich

Tuesday, February 10, 2009

Do You Use DLP? We Should Talk

By Rich

As an analyst, I've been covering DLP since before there was anything called DLP. I like to joke that I've talked with more people that have evaluated and deployed DLP than anyone else on the face of the planet. Yes, it's exactly as exciting as it sounds.

But all those references were fairly self-selected. They've either been Gartner clients, or our current enterprise clients, that were/are typically looking for help in product selection or dealing with some sort of problem. Many of the rest are vendor-supplied references. This combination skews the conversations towards people picking products, people with problems, or those a vendor think will make them look good.

I'm currently working on an article for Information Security magazine on "Real-World DLP", and I'm hunting for some new references to expand that field a bit. If you are using DLP, successfully or not, and are willing to talk confidentially, please drop me a line. I'm looking for real-world stories, good and bad. If you are willing to go on the record, we're also looking for good quote sources. The focus of the article is more on implementation than selection, and will be vendor-neutral.

To be honest, one reason I'm putting this out in the open is to see if my normal reference channels are skewed. It's time to see how our current positions and assumptions play out on the mean streets of reality.

Of course I'll be totally pissed if I've been wrong this entire time and have to retract everything I've ever written on DLP.

**Update - Oh yeah, my email address is rmogull, that is with two 'L's, at securosis dot com. Please let me know.

–Rich

Thursday, December 04, 2008

Analysis Of The Microsoft/RSA Data Loss Prevention Partnership

By Rich

By the time I post this you won't be able to find a tech news site that isn't covering this one. I know, since my name was on the list of analysts the press could contact and I spent a few hours talking to everyone covering the story yesterday. Rather than just reciting the press release, I'd like to add some analysis, put things into context, and speculate wildly. For the record, this is a big deal in the long term, and will likely benefit all of the major DLP vendors, even though there's nothing earth shattering in the short term.

As you read this, Microsoft and RSA are announcing a partnership for Data Loss Prevention. Here are the nitty gritty details, not all of which will be apparent from the press release:

  • This month, the RSA DLP product (Tablus for you old folks) will be able to assign Microsoft RMS (what Microsoft calls DRM) rights to stored data based on content discovery. The way this works is that the RMS administrator will define a data protection template (what rights are assigned to what users). The RSA DLP administrator then creates a content detection policy, which can then apply the RMS rights automatically based on the content of files. The RSA DLP solution will then scan file repositories (including endpoints) and apply the RMS rights/controls to protect the content.
  • Microsoft has licensed the RSA DLP technology to embed into various Microsoft products. They aren't offering much detail at this time, nor any timelines, but we do know a few specifics. Microsoft will slowly begin adding the RSA DLP content analysis engine to various products. The non-NDA slides hint at everything from SQL Server, Exchange, and Sharepoint, to Windows and Office. Microsoft will also include basic DLP management into their other management tools.
  • Policies will work across both Microsoft and RSA in the future as the products evolve. Microsoft will be limiting itself to their environment, with RSA as the upgrade path for fuller DLP coverage.

And that's it for now. RSA DLP 6.5 will link into RMS, with Microsoft licensing the technology for future use in their products. Now for the analysis:

  • This is an extremely significant development in the long term future of DLP. Actually, it's a nail in the coffin of the term "DLP" and moves us clearly and directly to what we call "CMP"- Content Monitoring and Protection. It moves us closer and closer to the DLP engine being available everywhere (and somewhat commoditized), and the real value in being in the central policy management, analysis, workflow, and incident management system. DLP/CMP vendors don't go away- but their focus changes as the agent technology is built more broadly into the IT infrastructure (this definitely won't be limited to just Microsoft).
  • It's not very exciting in the short term. RSA isn't the first to plug DLP into RMS (Workshare does it, but they aren't nearly as big in the DLP market). RSA is only enabling this for content discovery (data at rest) and rights won't be applied immediately as files are created/saved. It's really the next stages of this that are interesting.
  • This is good for all the major DLP vendors, although a bit better for RSA. It's big validation for the DLP/CMP market, and since Microsoft is licensing the technology to embed, it's reasonable to assume that down the road it may be accessible to other DLP vendors (be aware- that's major speculation on my part).
  • This partnership also highlights the tight relationship between DLP/CMP and identity management. Most of the DLP vendors plug into Microsoft Active Directory to determine users/groups/roles for the application of content protection policies. One of the biggest obstacles to a successful DLP deployment can be a poor directory infrastructure. If you don't know what users have what roles, it's awfully hard to create content-based policies that are enforced based on users and roles.
  • We don't know how much cash is involved, but financially this is likely good for RSA (the licensing part). I don't expect it to overly impact sales in the short term, and the other major DLP vendors shouldn't be too worried for now. DLP deals will still be competitive based on the capabilities of current products, more than what's coming in an indeterminate future.

Now just imagine a world where you run a query on a SQL database, and any sensitive results are appropriately protected as you place them into an Excel spreadsheet. You then drop that spreadsheet into a Powerpoint presentation and email it to the sales team. It's still quietly protected, and when one sales guy tries to email it to his Gmail account, it's blocked. When he transfers it to a USB device, it's encrypted using a company key so he can't put it on his home computer. If he accidentally sends it to someone in the call center, they can't read it. In the final PDF, he can't cut out the table and put it in another document. That's where we are headed- DLP/CMP is enmeshed into the background, protecting content through it's lifecycle based on central policies and content and context awareness.

In summary, it's great in the long term, good but not exciting in the short term, and beneficial to the entire DLP market, with a slight edge for RSA. There are a ton of open questions and issues, and we'll be watching and analyzing this one for a while.

As always, feel free to email me if you have any questions.

–Rich

Wednesday, July 23, 2008

Best Practices For Endpoint DLP: Use Cases

By Rich

We've covered a lot of ground over the past few posts on endpoint DLP. Our last post finished our discussion of best practices and I'd like to close with a few short fictional use cases based on real deployments.

Endpoint Discovery and File Monitoring for PCI Compliance Support

BuyMore is a large regional home goods and grocery retailer in the southwest United States. In a previous PCI audit, credit card information was discovered on some employee laptops mixed in with loyalty program data and customer demographics. An expensive, manual audit and cleansing was performed within business units handling this content. To avoid similar issues in the future, BuyMore purchased an endpoint DLP solution with discovery and real time file monitoring support.

BuyMore has a highly distributed infrastructure due to multiple acquisitions and independently managed retail outlets (approximately 150 locations). During initial testing it was determined that database fingerprinting would be the best content analysis technique for the corporate headquarters, regional offices, and retail outlet servers, while rules-based analysis is the best fit for the systems used by store managers. The eventual goal is to transition all locations to database fingerprinting, once a database consolidation and cleansing program is complete.

During Phase 1, endpoint agents were deployed to corporate headquarters laptops for the customer relations and marketing team. An initial content discovery scan was performed, with policy violations reported to managers and the affected employees. For violations, a second scan was performed 30 days later to ensure that the data was removed. In Phase 2, the endpoint agents were switched into real time monitoring mode when the central management server was available (to support the database fingerprinting policy). Systems that leave the corporate network are then scanned monthly when the connect back in, with the tool tuned to only scan files modified since the last scan. All systems are scanned on a rotating quarterly basis, and reports generated and provided to the auditors.

For Phase 3, agents were expanded to the rest of the corporate headquarters team over the course of 6 months, on a business unit by business unit basis.

For the final phase, agents were deployed to retail outlets on a store by store basis. Due to the lower quality of database data in these locations, a rules-based policy for credit cards was used. Policy violations automatically generate an email to the store manager, and are reported to the central policy server for followup by a compliance manager.

At the end of 18 months, corporate headquarters and 78% or retail outlets were covered. BuyMore is planning on adding USB blocking in their next year of deployment, and already completed deployment of network filtering and content discovery for storage repositories.

Endpoint Enforcement for Intellectual Property Protection

EngineeringCo is a small contract engineering firm with 500 employees in the high tech manufacturing industry. They specialize in designing highly competitive mobile phones for major manufacturers. In 2006 they suffered a major theft of their intellectual property when a contractor transferred product description documents and CAD diagrams for a new design onto a USB device and sold them to a competitor in Asia, which beat their client to market by 3 months.

EngineeringCo purchased a full DLP suite in 2007 and completed deployment of partial document matching policies on the network, followed by network-scanning-based content discovery policies for corporate desktops. After 6 months they added network blocking for email, http, and ftp, and violations are at an acceptable level. In the first half of 2008 they began deployment of endpoint agents for engineering laptops (approximately 150 systems).

Because the information involved is so valuable, EngineeringCo decided to deploy full partial document matching policies on their endpoints. Testing determined performance is acceptable on current systems if the analysis signatures are limited to 500 MB in total size. To accommodate this limit, a special directory was established for each major project where managers drop key documents, rather than all project documents (which are still scanned and protected at the network). Engineers can work with documents, but the endpoint agent blocks network transmission except for internal email and file sharing, and any portable storage. The network gateway prevents engineers from emailing documents externally using their corporate email, but since it's a gateway solution internal emails aren't scanned.

Engineering teams are typically 5-25 individuals, and agents were deployed on a team by team basis, taking approximately 6 months total.

These are, of course, fictional best practices examples, but they're drawn from discussions with dozens of DLP clients. The key takeaways are:

  1. Start small, with a few simple policies and a limited footprint.
  2. Grow deployments as you reduce incidents/violations to keep your incident queue under control and educate employees.
  3. Start with monitoring/alerting and employee education, then move on to enforcement.
  4. This is risk reduction, not risk elimination. Use the tool to identify and reduce exposure but don't expect it to magically solve all your data security problems.
  5. When you add new policies, test first with a limited audience before rolling them out to the entire scope, even if you are already covering the entire enterprise with other policies.

–Rich

Thursday, July 17, 2008

Best Practices for Endpoint DLP: Part 5, Deployment

By Rich

In our last post we talked about prepping for deployment- setting expectations, prioritizing, integrating with the infrastructure, and defining workflow. Now it's time to get out of the lab and get our hands dirty.

Today we're going to move beyond planning into deployment.

  1. Integrate with your infrastructure: Endpoint DLP tools require integration with a few different infrastructure elements. First, if you are using a full DLP suite, figure out if you need to perform any extra integration before moving to endpoint deployments. Some suites OEM the endpoint agent and you may need some additional components to get up and running. In other cases, you'll need to plan capacity and possibly deploy additional servers to handle the endpoint load. Next, integrate with your directory infrastructure if you haven't already. Determine if you need any additional information to tie users to devices (in most cases, this is built into the tool and its directory integration components).
  2. Integrate on the endpoint: In your preparatory steps you should have performed testing to be comfortable that the agent is compatible with your standard images and other workstation configurations. Now you need to add the agent to the production images and prepare deployment packages. Don't forget to configure the agent before deployment, especially the home server location and how much space and resources to use on the endpoint. Depending on your tool, this may be managed after initial deployment by your management server.
  3. Deploy agents to initial workgroups: You'll want to start with a limited deployment before rolling out to the larger enterprise. Pick a workgroup where you can test your initial policies.
  4. Build initial policies: For your first deployment, you should start with a small subset of policies, or even a single policy, in alert or content classification/discovery mode (where the tool reports on sensitive data, but doesn't generate policy violations).
  5. Baseline, then expand deployment: Deploy your initial policies to the starting workgroup. Try to roll the policies out one monitoring/enforcement mode at a time, e.g., start with endpoint discovery, then move to USB blocking, then add network alerting, then blocking, and so on. Once you have a good feel for the effectiveness of the policies, performance, and enterprise integration, you can expand into a wider deployment, covering more of the enterprise. After the first few you'll have a good understanding of how quickly, and how widely, you can roll out new policies.
  6. Tune policies: Even stable policies may require tuning over time. In some cases it's to improve effectiveness, in others to reduce false positives, and in still other cases to adapt to evolving business needs. You'll want to initially tune policies during baselining, but continue to tune them as the deployment expands. Most DLP clients report that they don't spend much time tuning policies after baselining, but it's always a good idea to keep your policies current with enterprise needs.
  7. Add enforcement/protection: By this point you should understand the effectiveness of your policies, and have educated users where you've found policy violations. You can now start switching to enforcement or protective actions, such as blocking, network filtering, or encryption of files. It's important to notify users of enforcement actions as they occur, otherwise you might frustrate them u ecessarily. If you're making a major change to established business process, consider scaling out enforcement options on a business unit by business unit basis (e.g., restricting access to a common content type to meet a new compliance need).

Deploying endpoint DLP isn't really very difficult; the most common mistake enterprises make is deploying agents and policies too widely, too quickly. When you combine a new endpoint agent with intrusive enforcement actions that interfere (positively or negatively) with people's work habits, you risk grumpy employees and political backlash. Most organizations find that a staged rollout of agents, followed by first deploying monitoring policies before moving into enforcement, then a staged rollout of policies, is the most effective approach.

–Rich

Wednesday, July 16, 2008

Upcoming Webcast- DLP and DAM Together

By Rich

On July 29th I'll be giving a webcast entitled Using Data Leakage Prevention and Database Activity Monitoring for Data Protection. It's a mix of my content on DLP, DAM and Information Centric security, designed to show you how to piece these technologies together.

It's sponsored by Tizor, and you can register here (the content, as always, is my independent stuff). Here's the description:

When it comes to data security, few things are certain, but there is one thing that very few security experts will dispute. Enterprises need a new way of thinking about data security, because traditional data security methods are just not working. Data Leakage Prevention (DLP) and Database Activity Monitoring (DAM) are two fundamental components of the new security landscape. Predicated on the need to "know" what is actually happening with sensitive data, DLP and DAM address pressing security issues. But despite the value that these two technologies offer, there is a great deal of confusion about what these technologies actually do and how they should be implemented. At this webinar, Rich Mogull, one of today"s most well respected security experts, will clear up the confusion about DLP and DAM. Rich will discuss: * The business problems created by a lack of data centric security * How these problems relate to today"s threats and technologies * What DLP and DAM do and how they fit into the enterprise security environment * Best practices for creating a data centric security model for your organization

–Rich

Monday, July 07, 2008

Best Practices for Endpoint DLP: Part 3

By Rich

In our last post we discussed the core functions of an endpoint DLP tool. Today we're going to talk more about agent deployment, management, policy creation, enforcement workflow, and overall integration.

Agent Management

Agent management consists of two main functions- deployment and maintenance. On the deployment side, most tools today are designed to work with whatever workstation management tools your organization already uses. As with other software tools, you create a deployment package and then distribute it along with any other software updates. If you don't already have a software deployment tool, you'll want to look for an endpoint DLP tool that includes basic deployment capabilities. Since all endpoint DLP tools include central policy management, deployment is fairly straightforward. There's little need to customize packages based on user, group, or other variables beyond the location of the central management server.

The rest of the agent's lifecycle, aside from major updates, is controlled through the central management server. Agents should communicate regularly with the central server to receive policy updates and report incidents/activity. When the central management server is accessible, this should happen in near real time. When the endpoint is off the enterprise network (without VPN/remote access), the DLP tool will store violations locally in a secure repository that's encrypted and inaccessible to the user. The tool will then connect with the management server next time it's accessible, receiving policy updates and reporting activity. The management server should produce aging reports to help you identify endpoints which are out of date and need to be refreshed. Under some circumstances, the endpoint may be able to communicate remote violations through encrypted email or another secure mechanism from outside the corporate firewall.

Aside from content policy updates and activity reporting, there are a few other features that need central management. For content discovery, you'll need to control scanning schedule/frequency, and control bandwidth and performance (e.g., capping CPU usage). For real time monitoring and enforcement you'll also want performance controls, including limits on how much space is used to store policies and the local cache of incident information.

Once you set your base configuration, you shouldn't need to do much endpoint management directly. Things like enforcement actions are handled implicitly as part of policy, thus integrated into the main DLP policy interface.

Policy Creation and Workflow

Policy creation for endpoints should be fully integrated into your central DLP policy framework for consistent enforcement across data in motion, at rest, and in use. Policies are thus content focused, rather than location focused- another advantage of full suites over individual point products. In the policy management interface you first define the content to protect, then pick channels and enforcement actions (all, of course, tied to users/groups and context). For example, you might want to create a policy to protect customer account numbers. You'd start by creating a database fingerprinting policy pulling names and account numbers from the customer database; this is the content definition phase. Assuming you want the policy to apply equally to all employees, you then define network protective actions- e.g., blocking unencrypted emails with account numbers, blocking http and ftp traffic, and alerting on other channels where blocking isn't possible. For content discovery, quarantine any files with more than one account number that are not on a registered server. Then, for endpoints, restrict account numbers from unencrypted files, portable storage, or network communications when the user is off the corporate network, switching to a rules-based (regular expression) policy when access to the policy server isn't available.

In some cases you might need to design these as separate but related policies- for example, the database fingerprinting policy applies when the endpoint is on the network, and a simplified rules-based policy when the endpoint is remote.

Incident management should also be fully integrated into the overall DLP incident handling queue. Incidents appear in a single interface, and can be routed to handlers based on policy violated, user, severity, channel, or other criteria. Remember that DLP is focused on solving the business problem of protecting your information, and thus tends to require a dedicated workflow.

For endpoint DLP you'll need some additional information beyond network or non-endpoint discovery policies. Since some violations will occur when the system is off the network and unable to communicate with the central management server, "delayed notification" violations need to be appropriately stamped and prioritized in the management interface. You'd hate to miss the loss of your entire customer database because it showed up as a week-old incident when the sales laptop finally reconnected.

Otherwise, workflow is fully integrated into your main DLP solution, and any endpoint-specific actions are handled through the same mechanisms as discovery or network activity.

Integration

If you're running an endpoint only solution, an integrated user interface obviously isn't an issue. For full suite solutions, as we just discussed, policy creation, management, and incident workflow should be completely integrated with network and discovery policies.

Other endpoint management is typically a separate tab in the main interface, alongside management areas for discovery/storage management and network integration/management. While you want an integrated management interface, you don't want it so integrated that it becomes confusing or unwieldy to use.

In most DLP tools, content discovery is managed separately to define repositories and manage scanning schedules and performance. Endpoint DLP discovery should be included here, and allow you to specify device and user groups instead of having to manage endpoints individually.

That's about it for the technology side; in our next posts we'll look at best practices for deployment and management, and present a few generic use cases.

I realize I'm pretty biased towards full-suite solutions, and this is your chance to call me on it. If you disagree, please let me know in the comments...

–Rich

Wednesday, July 02, 2008

Best Practices For Endpoint DLP: Part 2

By Rich

In Part 1 I talked about the definition of endpoint DLP, the business drivers, and how it integrates with full-suite solutions. Today (and over the next few days) we're going to start digging into the technology itself.

Base Agent Functions

There is massive variation in the capabilities of different endpoint agents. Even for a single given function, there can be a dozen different approaches, all with varying degrees of success. Also, not all agents contain all features; in fact, most agents lack one or more major areas of functionality.

Agents include four generic layers/features:

  1. Content Discovery: Scanning of stored content for policy violations.
  2. File System Protection: Monitoring and enforcement of file operations as they occur (as opposed to discovery, which is scanning of content already written to media). Most often, this is used to prevent content from being written to portable media/USB. It's also where tools hook in for automatic encryption or application of DRM rights.
  3. Network Protection: Monitoring and enforcement of network operations. Provides protection similar to gateway DLP when a system is off the corporate network. Since most systems treat printing and faxing as a form of network traffic, this is where most print/fax protection can be enforced (the rest comes from special print/fax hooks).
  4. GUI/Kernel Protection: A more generic category to cover data in use scenarios, such as cut/paste, application restrictions, and print screen.

Between these four categories we cover most of the day to day operations a user might perform that places content at risk. It hits our primary drivers from the last post- protecting data from portable storage, protecting systems off the corporate network, and supporting discovery on the endpoint. Most of the tools on the market start with file and (then) networking features before moving on to some of the more complex GUI/kernel functions.

Agent Content Awareness

Even if you have an endpoint with a quad-core processor and 8 GB of RAM, the odds are you don't want to devote all of that horsepower to enforcing DLP.

Content analysis may be resource intensive, depending on the types of policies you are trying to enforce. Also, different agents have different enforcement capabilities which may or may not match up to their gateway counterparts. At a minimum, most endpoint tools support rules/regular expressions, some degree of partial document matching, and a whole lot of contextual analysis. Others support their entire repertoire of content analysis techniques, but you will likely have to tune policies to run on a more resource constrained endpoint.

Some tools rely on the central management server for aspects of content analysis, to offload agent overhead. Rather than performing all analysis locally, they will ship content back to the server, then act on any results. This obviously isn't ideal, since those policies can't be enforced when the endpoint is off the enterprise network, and it will suck up a fair bit of bandwidth. But it does allow enforcement of policies that are otherwise totally unrealistic on an endpoint, such as database fingerprinting of a large enterprise DB.

One emerging option is policies that adapt based on endpoint location. For example, when you're on the enterprise network most policies are enforced at the gateway. Once you access the Internet outside the corporate walls, a different set of policies is enforced. For example, you might use database fingerprinting (exact database matching) of the customer DB at the gateway when the laptop is in the office or on a (non split tunneled) VPN, but drop to a rule/regex for Social Security Numbers (or account numbers) for mobile workers. Sure, you'll get more false positives, but you're still able to protect your sensitive information while meeting performance requirements.

Next up: more on the technology, followed by best practices for deployment and implementation.

–Rich

Monday, June 30, 2008

Best Practices For Endpoint DLP: Part 1

By Rich

As the first analyst to ever cover Data Loss Prevention, I've had a bit of a tumultuous relationship with endpoint DLP. Early on I tended to exclude endpoint only solutions because they were more limited in functionality, and couldn't help at all with protecting data loss from unmanaged systems. But even then I always said that, eventually, endpoint DLP would be a critical component of any DLP solution. When we're looking at a problem like data loss, no individual point solution will give us everything we need.

Over the next few posts we're going to dig into endpoint DLP. I'll start by discussing how I define it, and why I don't generally recommend stand-alone endpoint DLP. I'll talk about key features to look for, then focus on best practices for implementation.

It won't come as any surprise that these posts are building up into another one of my whitepapers. This is about as transparent a research process as I can think of. And speaking of transparency, like most of my other papers this one is sponsored, but the content is completely objective (sponsors can suggest a topic, if it's objective, but they don't have input on the content).

Definition

As always, we need to start with our definition for DLP/CMP:

"Products that, based on central policies, identify, monitor, and protect data at rest, in motion, and in use through deep content analysis".

Endpoint DLP helps manage all three parts of this problem. The first is protecting data at rest when it's on the endpoint; or what we call content discovery (and I wrote up in great detail). Our primary goal is keeping track of sensitive data as it proliferates out to laptops, desktops, and even portable media. The second part, and the most difficult problem in DLP, is protecting data in use. This is a catch all term we use to describe DLP monitoring and protection of content as it's used on a desktop- cut and paste, moving data in and out of applications, and even tying in with encryption and enterprise Document Rights Management (DRM). Finally, endpoint DLP provides data in motion protection for systems outside the purview of network DLP- such as a laptop out in the field.

Endpoint DLP is a little difficult to discuss since it's one of the fastest changing areas in a rapidly evolving space. I don't believe any single product has every little piece of functionality I'm going to talk about, so (at least where functionality is concerned) this series will lay out all the recommended options which you can then prioritize to meet your own needs.

Endpoint DLP Drivers

In the beginning of the DLP market we nearly always recommended organizations start with network DLP. A network tool allows you to protect both managed and unmanaged systems (like contractor laptops), and is typically easier to deploy in an enterprise (since you don't have to muck with every desktop and server). It also has advantages in terms of the number and types of content protection policies you can deploy, how it integrates with email for workflow, and the scope of channels covered. During the DLP market's the first few years, it was hard to even find a content-aware endpoint agent.

But customer demand for endpoint DLP quickly grew thanks to two major needs- content discovery on the endpoint, and the ability to prevent loss through USB storage devices. We continue to see basic USB blocking tools with absolutely no content awareness brand themselves as DLP. The first batches of endpoint DLP tools focused on exactly these problems- discovery and content-aware portable media/USB device control.

The next major driver for endpoint DLP is supporting network policies when a system is outside the corporate gateway. We all live in an increasingly mobile workforce where we need to support consistent policies no matter where someone is physically located, nor how they connect to the Internet.

Finally, we see some demand for deeper integration of DLP with how a user interacts with their system. In part, this is to support more intensive policies to reduce malicious loss of data. You might, for example, disallow certain content from moving into certain applications, like encryption. Some of these same kinds of hooks are used to limit cut/paste, print screen, and fax, or to enable more advanced security like automatic encryption or application of DRM rights.

The Full Suite Advantage

As we've already hinted, there are some limitations to endpoint only DLP solutions. The first is that they only protect managed systems where you can deploy an agent. If you're worried about contractors on your network or you want protection in case someone tries to use a server to send data outside the walls, you're out of luck. Also, because some content analysis policies are processor and memory intensive, it is problematic to get them running on resource-constrained endpoints. Finally, there are many discovery situations where you don't want to deploy a local endpoint agent for your content analysis- e.g. when performing discovery on a major SAN.

Thus my bias towards full-suite solutions. Network DLP reduces losses on the enterprise network from both managed and unmanaged systems, and servers and workstations. Content discovery finds and protects stored data throughout the enterprise, while endpoint DLP protects systems that leave the network, and reduces risks across vectors that circumvent the network. It's the combination of all these layers that provides the best overall risk reduction. All of this is managed through a single policy, workflow, and administration server; rather than forcing you to create different policies; for different channels and products, with different capabilities, workflow, and management.

In our next post we'll discuss the technology and major features to look for, followed by posts on best practices for implementation.

–Rich

Sunday, June 01, 2008

Webcast June 4th: DLP Content Discovery

By Rich

Yes, it's one of those weeks, with two webcasts and a conference (SANS Pen Testing and Application Security in Vegas).

For this one we'll be talking about DLP content discovery for Vontu/Symantec. It's not just me; there will be a customer case study (yes, an honest to goodness security person willing to talk about what they've done). Here's the official description, and you can register here:

Where Is Your Confidential Data and How Do You Protect It? A Real Life Customer Success

Do you know where your confidential data is stored and how to protect it? Industry analysts predict that data discovery will be the single fastest-growing segment of the Data Loss Prevention (DLP) market in 2008 and beyond. In this webcast, you will get the opportunity to hear firsthand how Sharp HealthCare implemented a DLP solution to secure their sensitive customer data stored across the organization, and what business results they are seeing today. Join Rich Mogull, founder of Securosis LLC and former Gartner analyst, and Starla Rivers, Technical Security Architect at Sharp, as they address how to easily deploy DLP and quickly realize the solution benefits.

–Rich

Monday, May 19, 2008

New Whitepaper: Best Practices For DLP Content Discovery

By Rich

One of the most under-appreciated aspects of DLP solutions is content discovery- scanning stored data to identify sensitive content, classify information, and (in some cases) even protect the data. Major DLP tools have long evolved past just scanning network traffic for credit card and Social Security Numbers.

Today I'm releasing a new whitepaper on the topic: DLP Content Discovery: Best Practices for Stored Data Discovery and Protection.

The paper covers features, best practices for deployment, and example use cases to give you an idea of how it works.

It's my usual independent content, much of which started here as blog posts. Thanks to Symantec (Vontu) for Sponsoring and Chris Pepper for editing.

–Rich

Wednesday, May 14, 2008

Database Activity Monitoring Is As Big As, Or Bigger Than, DLP

By Rich

Last night I had this recurring dream I seem to have a few times a year. It involves a plane crash, but not one that I'm on. The dream always changes, but in every case I'm out and about someplace, I look up and see a struggling plane, it crashes, and I rush over to help. The dream almost always end before I do anything, and since I'm no longer a field medic portions of it usually involve me figuring out how I can help. Must be my overblown, currently unused hero complex or something. Never doubt the bounds of my ego.

This has absolutely nothing to do with what I want to talk about today; I just think it's weird.

I swear I posted on this before, but I can't find it anywhere. During a dinner with one of the Database Activity Monitoring vendors (the best DAM name in the industry) I mentioned their market size was equal to, or slightly larger than, the pure-play DLP market (that's where we toss out peripheral products that only use DLP as a feature). I assumed this was common knowledge, but their jaws dropped. We ran through some back of the envelope calculations, and placed DLP at about $70M in 2007, with DAM right in the same range. My estimates might be off by up to $20M, but that's basically a rounding error when we're talking total market revenues.

Here are a few caveats. My estimates don't include a lot of peripheral vendors, and I slice down as best I can to estimate DLP vs. DAM specific sales. For example, Orchestria launched a new DLP product in 2007, but I only include a fraction of their revenue in the market rollup since most of the revenue is still coming from their compliance product. Same for Verdasys, Proofpoint, Imperva (also has web application firewalls), and other vendors either with multiple product lines, or where DLP or DAM is a feature of something else. I work with most of the major DLP and DAM vendors, and while some share their revenues under NDA, some don't, and these are mostly private companies. I'm also certain a couple of them are lying to me.

So there you have it. DLP is heavily hyped, but DAM is essentially the same size without as much hype. I believe this is because DAM, in some cases, directly addresses a compliance requirement, while DLP isn't usually required (although can often really help).

–Rich

Monday, May 05, 2008

Information-Centric Security Tip: Know Your Users and Infrastructure

By Rich

I was on a client reference today learning about someone's DLP deployment, and it highlighted one of the biggest issues we often face when moving to an information-centric model. No, it's not a failure of content analysis techniques, data classification, or over-hyped tools, it's that we often don't even know who owns what, who's supposed to have access to what, or our own infrastructure.

I often start my data security/information-centric rants by mentioning you need to have good identity management in place, but I don't normally spend a whole lot of time talking about the details.

The truth is, this comes up all the time when I'm talking with end users who are implementing this stuff. Oftenthey don't have a good directory infrastructure, or one that reflects the org chart, and thus they can't do everything they want with their DLP, DAM, or other tools. Sometimes they don't even know where all their assets/servers are, or how to access them for scanning.

Thus the tip- if you have a good directory infrastructure that accurately reflects your organizational structure, you'll be in much better shape for any of these projects. Many of these tools can directly integrate with AD/LDAP, allowing you to build role-based policies.

You can't inform someone's manager they're sending customer lists home or running weird DB queries if you don't know who they work for.

–Rich