
Content Discovery

Wednesday, April 04, 2012

Understanding and Selecting DSP: Extended Features

By Adrian Lane

In the original Understanding and Selecting a Database Activity Monitoring Solution paper we discussed a number of Advanced Features for analysis and enforcement that have since largely become part of the standard feature set for DSP products. We covered monitoring, vulnerability assessment, and blocking as the minimum feature set required for a Data Security Platform, and we find these in just about every product on the market. Today’s post will cover extensions of those core features, focusing on new methods of data analysis and protection, along with several operational capabilities needed for enterprise deployments. A key area where DSP extends DAM is in novel security features to protect databases and extend protection across other applications and data storage repositories.

In other words, these are some of the big differentiating features that affect which products you look at if you want anything beyond the basics, but they aren’t all in wide use.

Analysis and Protection

  • Query Whitelisting: Query ‘whitelisting’ is where the DSP platform, working as an in-line reverse proxy for the database, only permits known SQL queries to pass through to the database. This is a form of blocking, as we discussed in the base architecture section. But traditional blocking techniques rely on query parameter and attribute analysis. This technique has two significant advantages. First is that detection is based on the structure of the query, matching the format of the FROM and WHERE clauses, to determine if the query matches the approved list. Second is how the list of approved queries is generated. In most cases the DSP maps out the entire SQL grammar – in essence a list of every possible supported query – into a binary search tree for very fast comparison. Alternatively, by monitoring application activity, the DSP platform can automatically mark which queries are permitted in baselining mode – of course the user can edit this list as needed. Any query not on the whitelist is logged and discarded – and never reaches the database. With this method of blocking, false positives are very low and the majority of SQL injection attacks are automatically blocked. The downside is that the list of acceptable queries must be updated with each application change – otherwise legitimate requests are blocked.
  • Dynamic Data Masking: Masking is a method of altering data so that the original data is obfuscated but the aggregate value is maintained. Essentially we substitute out individual bits of sensitive data and replace them with random values that look like the originals. For example we can substitute a list of customer names in a database with a random selection of names from a phone book. Several DSP platforms provide on-the-fly masking for sensitive data. Others detect and substitute sensitive information prior to insertion. There are several variations, each offering different security and performance benefits. This is different from the dedicated static data masking tools used to develop test and development databases from production systems.
  • Application Activity Monitoring: Databases rarely exist in isolation – more often they are extensions of applications, but we tend to look at them as isolated components. Application Activity Monitoring adds the ability to watch application activity – not only the database queries that result from it. This information can be correlated between the application and the database to gain a clear picture of just how data is used at both levels, and to identify anomalies which indicate a security or compliance failure. There are two variations currently available on the market. The first is Web Application Firewalls, which protect applications from SQL injection, scripting, and other attacks on the application and/or database. WAFs are commonly used to monitor application traffic, but can be deployed in-line or out-of-band to block or reset connections, respectively. Some WAFs can integrate with DSPs to correlate activity between the two. The other form is monitoring of application specific events, such as SAP transaction codes. Some of these commands are evaluated by the application, using application logic in the database. In either case inspection of these events is performed in a single location, with alerts on odd behavior.
  • File Activity Monitoring: Like DAM, FAM monitors and records all activity within designated file repositories at the user level and alerts on policy violations. Rather than SELECT, INSERT, UPDATE, and DELETE queries, FAM records file opens, saves, deletions, and copies. For both security and compliance, this means you no longer care if data is structured or unstructured – you can define a consistent set of policies around data, not just database, usage. You can read more about FAM in Understanding and Selecting a File Activity Monitoring Solution.
  • Query Rewrites: Another useful technique for protecting data and databases from malicious queries is query rewriting. Deployed through a reverse database proxy, the DSP evaluates incoming queries for common attributes and query structure. If a query looks suspicious, or violates security policy, it is substituted with a similar authorized query. For example, if a query requests a column of Social Security numbers, that column can be omitted from the results by removing it from the SELECT list. Queries that include the highly suspect "1=1" WHERE clause may simply return the value 1. Rewriting queries protects application continuity, as the queries are not simply discarded – they return a subset of the requested data, so false positives don’t cause the application to hang or crash.
  • Connection-Pooled User Identification: One of the problems with connection pooling, whereby an application uses a single shared database connection for all users, is loss of the ability to track which actions are taken by which users at the database level. Connection pooling is common and essential for application development, but if all queries originate from the same account, granular security monitoring becomes difficult. This feature uses a variety of techniques to correlate every query back to an application user for better auditing at the database level.
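
To make the structure-based matching behind query whitelisting a bit more concrete, here is a minimal Python sketch. It is not how any particular DSP product works: the `fingerprint` function, the `QueryWhitelist` class, and the regex-based normalization are all assumptions for illustration, and a real product would parse the SQL grammar rather than lean on regular expressions.

```python
import re

def fingerprint(sql: str) -> str:
    """Reduce a query to its structure by replacing literals with '?'."""
    s = sql.strip().rstrip(";")
    s = re.sub(r"'(?:[^']|'')*'", "?", s)      # string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)   # numeric literals
    s = re.sub(r"\s+", " ", s)                 # collapse whitespace
    return s.lower()

class QueryWhitelist:
    """Hypothetical whitelist: a set of approved query structures."""

    def __init__(self):
        self.approved = set()

    def learn(self, sql: str) -> None:
        # Baselining mode: record the structure of observed queries.
        self.approved.add(fingerprint(sql))

    def permits(self, sql: str) -> bool:
        # Enforcement mode: only known structures pass through.
        return fingerprint(sql) in self.approved

wl = QueryWhitelist()
wl.learn("SELECT name, email FROM customers WHERE id = 42")

normal = "select name, email from customers where id = 7"
injected = "SELECT name, email FROM customers WHERE id = 7 OR 1=1"
print(wl.permits(normal))    # same structure, different literal
print(wl.permits(injected))  # extra OR clause changes the structure
```

The injected query is rejected because the appended `OR 1=1` changes the shape of the WHERE clause, not because of any particular parameter value. The same property explains the downside mentioned above: any application change that alters query structure has to be re-learned.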


  • Database Discovery: Databases have a habit of popping up all over the place without administrators being aware. Everything from virtual copies of production databases showing up in test environments, to Microsoft Access databases embedded in applications. These databases are commonly not secured to any standard, often have default configurations, and provide targets of opportunity for attackers. Database discovery works by scanning networks looking for databases communicating on standard database ports. Discovery tools may snapshot all current databases or alert admins when new undocumented databases appear. In some cases they can automatically initiate a vulnerability scan.
  • Content Discovery: As much as we like to think we know our databases, we don’t always know what’s inside them. DSP solutions offer content discovery features to identify the use of things like Social Security numbers, even if they aren’t located where you expect. Discovery tools crawl through registered databases, looking for content and metadata that match policies, and generate alerts for sensitive content in unapproved locations. For example, you could create a policy to identify credit card numbers in any database and generate a report for PCI compliance. The tools can run on a scheduled basis so you can perform ongoing assessments, rather than combing through everything by hand every time an auditor comes knocking. Most start with a scan of column and table metadata, then follow with an analysis of the first n rows of each table, rather than trying to scan everything.
  • Dynamic Content Analysis: Some tools allow you to act on the discovery results. Instead of manually identifying every field with Social Security numbers and building a different protection policy for each location, you create a single policy that alerts every time an administrator runs a SELECT query on any field discovered to contain one or more SSNs. As systems grow and change over time, the discovery continually identifies fields containing protected content and automatically applies the policy. We are also seeing DSP tools that monitor the results of live queries for sensitive data. Policies are then freed from being tied to specific fields, and can generate alerts or perform enforcement actions based on the result set. For example, a policy could generate an alert any time a query result contains a credit card number, no matter what columns were referenced in the query.
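
As a rough illustration of a policy tied to the result set rather than to specific columns, the sketch below checks every cell of a query result for something that looks like a credit card number. The function names are hypothetical, and combining a digit pattern with a Luhn checksum is our assumption about one reasonable way to cut false positives, not a description of any vendor's detection logic.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(value) -> list:
    """Return Luhn-valid card-like numbers found in a single cell."""
    hits = []
    for m in CARD_RE.finditer(str(value)):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

def result_set_violations(rows) -> list:
    """Return (row_index, column_index) for each cell holding a card number."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if find_card_numbers(cell)
    ]

rows = [
    ("alice", "note: paid with 4111 1111 1111 1111"),  # well-known test number
    ("bob", "order 1234567890123456"),                 # 16 digits, fails Luhn
]
print(result_set_violations(rows))
```

Note that the policy never names a column. The free-text notes field is flagged simply because of what came back in the result.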

Next we will discuss administration and policy management for DSP.

–Adrian Lane

Thursday, February 23, 2012

Implementing DLP: Ongoing Management

By Rich

Managing DLP tends to not be overly time consuming unless you are running off badly defined policies. Most of your time in the system is spent on incident handling, followed by policy management.

To give you some numbers, the average organization can expect to need about the equivalent of one full time person for every 10,000 monitored employees. This is really just a rough starting point – we’ve seen ratios as low as 1/25,000 and as high as 1/1000 depending on the nature and number of policies.
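
If you want to turn those ratios into a quick estimate, the arithmetic is simple. The function name and the 40,000-employee example below are ours, purely for illustration.

```python
def estimate_handlers(monitored_employees: int, employees_per_fte: int = 10_000) -> float:
    """Full-time-equivalent incident handlers for a given ratio."""
    return monitored_employees / employees_per_fte

employees = 40_000  # hypothetical organization
typical = estimate_handlers(employees)        # 1 per 10,000
low = estimate_handlers(employees, 25_000)    # light policy set
high = estimate_handlers(employees, 1_000)    # many or noisy policies

print(f"typical: {typical:.1f} FTE, range: {low:.1f} to {high:.1f} FTE")
```

For 40,000 monitored employees that works out to about 4 people, with a plausible range from under 2 to 40, which is why the nature and number of your policies matters far more than raw headcount.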

Managing Incidents

After deployment of the product and your initial policy set you will likely need fewer people to manage incidents. Even as you add policies you might not need additional people since just having a DLP tool and managing incidents improves user education and reduces the number of incidents.

Here is a typical process:

Manage incident handling queue

The incident handling queue is the user interface for managing incidents. This is where the incident handlers start their day, and it should have some key features:

  • Ability to customize the incident view for the individual handler. Some are more technical and want to see detailed IP addresses or machine names, while others focus on users and policies.
  • Incidents should be pre-filtered based on the handler. In a larger organization this allows you to automatically assign incidents based on the type of policy, business unit involved, and so on.
  • The handler should be able to sort and filter at will; especially to sort based on the type of policy or the severity of the incident (usually the number of violations – e.g. a million account numbers in a file versus 5 numbers).
  • Support for one-click dispositions to close, assign, or escalate incidents right from the queue as opposed to having to open them individually.
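
The queue features above can be sketched in a few lines of Python. Everything here is hypothetical (the `Incident` fields, the `ROUTING` table, the handler names); it only shows the idea of pre-filtering by handler, sorting by severity, and applying a disposition without opening the incident.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    incident_id: int
    policy: str
    business_unit: str
    violation_count: int   # used here as the severity measure
    status: str = "open"

# Hypothetical routing table: which handler owns which policy type.
ROUTING = {"PCI": "alice", "HR records": "bob"}

def queue_for(handler: str, incidents: list) -> list:
    """Open incidents routed to this handler, most severe first."""
    mine = [i for i in incidents
            if i.status == "open" and ROUTING.get(i.policy) == handler]
    return sorted(mine, key=lambda i: i.violation_count, reverse=True)

def dispose(incident: Incident, action: str) -> None:
    """One-click disposition straight from the queue."""
    if action not in {"closed", "assigned", "escalated"}:
        raise ValueError(f"unknown disposition: {action}")
    incident.status = action

incidents = [
    Incident(1, "PCI", "Finance", 5),
    Incident(2, "PCI", "Sales", 1_000_000),
    Incident(3, "HR records", "HR", 12),
]

queue = queue_for("alice", incidents)
print([i.incident_id for i in queue])   # the million-number file comes first
dispose(queue[0], "escalated")
print([i.incident_id for i in queue_for("alice", incidents)])
```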

Most organizations tend to distribute incident handling among a group of people as only part of their job. Incidents will be either automatically or manually routed around depending on the policy and the severity. Practically speaking, unless you are a large enterprise this could be a part-time responsibility for a single person, with some additional people in other departments like legal and human resources able to access the system or reports as needed for bigger incidents.

Initial investigation

Some incidents might be handled right from the initial incident queue; especially ones where a blocking action was triggered. But due to the nature of dealing with sensitive information there are plenty of alerts that will require at least a little initial investigation.

Most DLP tools provide all the initial information you need when you drill down on a single incident. This may even include the email or file involved with the policy violations highlighted in the text. The job of the handler is to determine if this is a real incident, the severity, and how to handle it.

Useful information at this point is a history of other violations by that user and other violations of that policy. This helps you determine if there is a bigger issue/trend. Technical details will help you reconstruct more of what actually happened, and all of this should be available on a single screen to reduce the amount of effort needed to find the information you need.

If the handler works for the security team, he or she can also dig into other data sources if needed, such as a SIEM or firewall logs. This isn’t something you should have to do often.

Initial disposition

Based on the initial investigation the handler closes the incident, assigns it to someone else, escalates to a higher authority, or marks it for a deeper investigation.

Escalation and Case Management

Anyone who deploys DLP will eventually find incidents that require a deeper investigation and escalation. And by “eventually” we mean “within hours” for some of you.

DLP, by its nature, will find problems that require investigating your own employees. That’s why we emphasize having a good incident handling process from the start since these cases might lead to someone being fired. When you escalate, consider involving legal and human resources. Many DLP tools include case management features so you can upload supporting documentation and produce needed reports, plus track your investigative activities.


The last (incredibly obvious) step is to close the incident. You’ll need to determine a retention policy and if your DLP tool doesn’t support retention needs you can always output a report with all the salient incident details.

As with a lot of what we’ve discussed, you’ll probably handle most incidents within minutes (or less) in the DLP tool, but we’ve detailed a common process for those times you need to dig in deeper.


Most DLP systems keep old incidents in the database, which will obviously fill it up over time. Periodically archiving old incidents (such as anything 1 year or older) is a good practice, especially since you might need to restore the records as part of a future investigation.
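
A minimal sketch of that archiving rule, assuming incidents are simple records with a `closed_at` timestamp (a field name we made up) and that still-open incidents are never archived:

```python
from datetime import datetime, timedelta

def split_for_archive(incidents, now, max_age_days=365):
    """Separate incidents to keep online from those old enough to archive."""
    cutoff = now - timedelta(days=max_age_days)
    keep, archive = [], []
    for inc in incidents:
        closed = inc["closed_at"]
        if closed is not None and closed <= cutoff:
            archive.append(inc)   # write these out somewhere restorable
        else:
            keep.append(inc)
    return keep, archive

incidents = [
    {"id": 101, "closed_at": datetime(2010, 12, 1)},
    {"id": 102, "closed_at": datetime(2011, 11, 15)},
    {"id": 103, "closed_at": None},   # still open
]

keep, archive = split_for_archive(incidents, now=datetime(2012, 2, 23))
print([i["id"] for i in keep], [i["id"] for i in archive])
```

The point of returning the archive list rather than deleting is the one made above: you may need to restore those records for a future investigation.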

Managing Policies

Anytime you look at adding a significant new policy you should follow the Full Deployment process we described above, but there are still a lot of day to day policy maintenance activities. These tend not to take up a lot of time, but if you skip them for too long you might find your policy set getting stale and either not offering enough security, or causing other issues due to being out of date.

Policy distribution

If you manage multiple DLP components or regions you will need to ensure policies are properly distributed and tuned for the destination environment. If you distribute policies across national boundaries this is especially important since there might be legal considerations that mandate adjusting the policy.

This includes any changes to policies. For example, if you adjust a US-centric policy that’s been adapted to other regions, you’ll then need to update those regional policies to maintain consistency. If you manage remote offices with their own network connections you want to make sure policy updates are pushed out properly and are consistent.

Adding policies

Brand new policies will take the same effort as the initial policies, except that you’ll be more familiar with the system. Thus we suggest you follow the Full Deployment process again.

Policy reviews

As with anything, today’s policy might not apply the same in a year, or two, or five. The last thing you want to end up with is a disastrous mess of stale, yet highly customized and poorly understood policies, as you often see on firewalls.

Reviews should consist of:

  • Periodic reviews of the entire policy set to see if it still accurately reflects your needs and if new policies are required, or older ones should be retired.
  • Scheduled reviews and testing of individual policies to confirm that they still work as expected. Put it on the calendar when you create a new policy to check it at least annually. Run a few basic tests, and look at all the violations of the policy over a given time period to get a sense of how it works. Review the users and groups assigned to the policy to see if they still reflect the real users and business units in your organization.
  • Ad-hoc reviews when a policy seems to be providing unexpected results. A good tool to help figure this out is your trending reports – any big changes or deviations from a trend are worth investigating at the policy level.
  • Policy reviews during product updates, since these may change how a policy works or give you new analysis or enforcement options.

Updates and tuning

Even effective policies will need periodic updating and additional tuning. While you don’t necessarily need to follow the entire Full Deployment process for minor updates, they should still be tested in a monitoring mode before you move into any kind of automated enforcement.

Also make sure you communicate any noticeable changes to affected business units so you don’t catch them by surprise. We’ve heard plenty of examples of someone in security flipping a new enforcement switch or changing a policy in a way that really impacted business operations. Maybe that’s the goal, but it’s always best to communicate and hash things out ahead of time.

If you find a policy really seems ineffective then it’s time for a full review. For example, we know of one very large DLP user who had unacceptable levels of false positives on their account number protection due to the numbers being too similar to other numbers commonly in use in regular communications. They solved the problem (after a year or more) by switching from pattern matching to a database fingerprinting policy that checked against the actual account numbers in a customer database.
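
To see why that switch helps, compare the two approaches side by side. This is a simplified sketch under our own assumptions: account numbers are nine digits, the fingerprint is a plain SHA-256 hash, and the function names are invented. Real database fingerprinting features differ by product.

```python
import hashlib
import re

# Pattern matching: anything shaped like a 9-digit account number.
ACCOUNT_PATTERN = re.compile(r"\b\d{9}\b")

def pattern_matches(text: str) -> list:
    return ACCOUNT_PATTERN.findall(text)

def build_fingerprints(account_numbers) -> set:
    """Hash the real account numbers pulled from the customer database."""
    return {hashlib.sha256(n.encode()).hexdigest() for n in account_numbers}

def fingerprint_matches(text: str, fingerprints: set) -> list:
    """Only flag candidates that are actual account numbers."""
    return [c for c in ACCOUNT_PATTERN.findall(text)
            if hashlib.sha256(c.encode()).hexdigest() in fingerprints]

fingerprints = build_fingerprints(["123456789", "987654321"])
message = "Invoice 555000111 covers account 123456789."

print(pattern_matches(message))                    # flags the invoice number too
print(fingerprint_matches(message, fingerprints))  # flags only the real account
```

The pattern alone cannot tell an invoice number from an account number. Checking candidates against the actual values removes that whole class of false positive, at the cost of keeping the fingerprint set in sync with the customer database.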

Retiring policies

There are a lot of DLP policies you might use for a limited time, such as a partial document matching policy to protect corporate financials before they are released. After the release date, there’s no reason to keep the policy.

We suggest you archive these policies instead of deleting them. And if your tool supports it, set expiration dates on policies… with notification so it doesn’t shut down and leave a security hole without you knowing about it.
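
If your tool doesn't handle expiration notices for you, a small scheduled check can cover the gap. The sketch below assumes you can export policies as records with a name and an optional expiration date; the field names and the 14-day warning window are our own choices.

```python
from datetime import date, timedelta

def review_expirations(policies, today, warn_days=14):
    """Return (names to warn about, names already expired and due for archiving)."""
    notify, archive = [], []
    for p in policies:
        expires = p.get("expires")
        if expires is None:
            continue                      # no expiration set
        if expires < today:
            archive.append(p["name"])     # archive, don't delete
        elif expires <= today + timedelta(days=warn_days):
            notify.append(p["name"])      # warn before it shuts off
    return notify, archive

policies = [
    {"name": "Q1 financials (partial document match)", "expires": date(2012, 4, 20)},
    {"name": "PCI card numbers", "expires": None},
    {"name": "2011 annual report draft", "expires": date(2012, 3, 1)},
]

notify, archive = review_expirations(policies, today=date(2012, 4, 10))
print("expiring soon:", notify)
print("archive:", archive)
```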

Backup and archiving

Even if you are doing full system backups it’s a good idea to perform periodic policy set backups. Many DLP tools offer this as part of the feature set. This allows you to migrate policies to new servers/appliances or recover policies when other parts of the system fail and a full restore is problematic.

We aren’t saying these sorts of disasters are common; in fact we’ve never heard of one, but we’re paranoid security folks.

Archiving old policies also helps if you need to review them while reviewing an old incident as part of a new investigation or a legal discovery situation.


Analysis, as opposed to incident handling, focuses on big picture trends. We suggest three kinds of analysis:

Trend analysis

Often built into the DLP server’s dashboard, this analysis looks across incidents to evaluate overall trends such as:

  • Are overall incidents increasing or decreasing?
  • Which policies are having more or fewer incidents over time?
  • Which business units experience more incidents?
  • Are there any sudden increases in violations by a business unit, or of a policy, that might not be seen if overall trends aren’t changing?
  • Is a certain type of incident tied to a business process that should be changed?

The idea is to mine your data to evaluate how your risk is increasing or decreasing over time. When you’re in the muck of day to day incident handling it’s often hard to notice these trends.
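
Most dashboards will do this for you, but if you export incident data the check for a hidden spike is easy to sketch. The record layout, the function names, and the thresholds (a doubling, with at least 5 incidents) are assumptions for illustration only.

```python
from collections import Counter

def counts_by(incidents, key, period):
    """Count incidents in one period, grouped by the given field."""
    return Counter(i[key] for i in incidents if i["period"] == period)

def sudden_increases(incidents, key, previous, current, factor=2.0, min_count=5):
    """Groups whose incident count at least doubled between two periods."""
    before = counts_by(incidents, key, previous)
    now = counts_by(incidents, key, current)
    return sorted(k for k, n in now.items()
                  if n >= min_count and n >= factor * before.get(k, 0))

# Overall volume is flat (12 incidents each month), but the mix has shifted.
incidents = (
    [{"business_unit": "finance", "period": "2012-01"}] * 10
    + [{"business_unit": "legal", "period": "2012-01"}] * 2
    + [{"business_unit": "finance", "period": "2012-02"}] * 2
    + [{"business_unit": "legal", "period": "2012-02"}] * 10
)

print(sudden_increases(incidents, "business_unit", "2012-01", "2012-02"))
```

The same function works per policy by grouping on a different field, which covers the "by a business unit, or of a policy" question above.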

Risk analysis

A risk analysis is designed to show what you are missing. DLP tools only look for what you tell them to look for, and thus won’t catch unprotected data you haven’t built a policy for.

A risk analysis is essentially the Quick Wins process. You turn on a series of policies with no intention of enforcing them, but merely to gather information and see if there are any hot spots you should look at more in depth or create dedicated policies for.

Effectiveness analysis

This helps assess the effectiveness of your DLP tool usage. Instead of looking at general reports, think of it like testing the tool again. Try some common scenarios to circumvent your DLP to figure out where you need to make changes.

Content discovery/classification

Content discovery is the process of scanning storage for the initial identification of sensitive content and tends to be a bit different than network or endpoint deployments. While you can treat it the same, identifying policy violations and responding to them, many organizations view content discovery as a different process, often part of a larger data security or compliance project.

Content discovery projects will often turn up huge amounts of policy violations due to files being stored all over the place. Compounding the problem is the difficulty in identifying the file owner or business unit that’s using the data, and why they have it. Thus you tend to need more analysis, at least with your first run through a server or other storage repository, to find the data, identify who uses and owns it, the business need (if any), and alternative options to keep the data more secure.


We’ve covered most non-product-specific troubleshooting throughout this series. Problems people encounter tend to fall into the following categories:

  • Too many false positives or negatives, which you can manage using our policy tuning and analysis recommendations.
  • System components not talking to each other. For example, some DLP tools separate out endpoint and network management (often due to acquiring different products) and then integrate them at the user interface level. Unless there is a simple network routing issue, fixing these may require the help of your vendor.
  • Component integrations to external tools like web and email gateways may fail. Assuming you were able to get them to talk to each other previously, the culprit is usually a software update introducing an incompatibility. Unfortunately, you’ll need to run it down in the log files if you can’t pick out the exact cause.
  • New or replacement tools may not work with your existing DLP tool. For example, swapping out a web gateway or using a new edge switch with different SPAN/Mirror port capabilities.

We really don’t hear about too many problems with DLP tools outside of getting the initial installation properly hooked into infrastructures and tuning policies.

Maintenance for DLP tools is relatively low, consisting mostly of five activities (two of which we already discussed):

  • Full system backups, which you will definitely do for the central management server, and possibly any remote collectors/servers depending on your tool. Some tools don’t require this since you can swap in a new default server or appliance and then push down the configuration.
  • Archiving old incidents to free up space and resources. But don’t be too aggressive since you generally want a nice library of incidents to support future investigations.
  • Archiving and backing up policies. Archiving policies means removing them from the system, while backups include all the active policies. Keeping these separate from full system backups provides more flexibility for restoring to new systems or migrating to additional servers.
  • Health checks to ensure all system components are still talking to each other.
  • Updating endpoint and server agents to the latest versions (after testing, of course).


Ongoing reporting is an extremely important aspect of running a Data Loss Prevention tool. It helps you show management and other stakeholders that you, and your tool, are providing value and managing risk.

At a minimum you should produce quarterly, if not monthly, rollup reports on trends and summarizing overall activity. Ideally you’ll show decreasing policy violations, but if there is an increase of some sort you can use that to get the resources to investigate the root cause.

You will also produce a separate set of reports for compliance. These may be on a project basis, tied to any audit cycles, or scheduled like any other reports. For example, you might run quarterly content discovery reports showing you don’t have any unencrypted credit card data in a storage repository, and provide these to your PCI assessor to reduce potential audit scope. Or you might run monthly HIPAA reports for the HIPAA compliance officer (if you work in healthcare).

Although you can have the DLP tool automatically generate and email reports, depending on your internal political environment you might want to review these before passing them to outsiders in case there are any problems with the data. Also, it’s never a good idea to name employees in general reports – keep identifications to incident investigations and case management summaries that have a limited audience.


Wednesday, February 15, 2012

Implementing DLP: Deploying Storage and Endpoint

By Rich

Storage deployment

From a technical perspective, deploying storage DLP is even easier than the most basic network DLP. You can simply point it at an open file share, load up the proper access rights, and start analyzing. The problem most people run into is figuring out which servers to target, which access rights to use, and whether the network and storage repository can handle the overhead.

Remote scanning

All storage DLP solutions support remotely scanning a repository by connecting to an open file share. To run they need to connect (at least administrator-only) to a share on the server to scan.

But straightforward or not, there are three issues people commonly encounter:

  1. Sometimes it’s difficult to figure out where all the servers are and what file shares are exposed. To resolve this you can use a variety of network scanning tools if you don’t have a good inventory to start.
  2. After you find the repositories you need to gain access rights. And those rights need to be privileged enough to view all files on the server. This is a business process issue, not a technical problem, but most organizations need to do a little legwork to track down at least a few server owners.
  3. Depending on your network architecture you may need to position DLP servers closer to the file repositories. This is very similar to a hierarchical network deployment but we are positioning closer to the storage to reduce network impact or work around internal network restrictions (not that everyone segregates their internal network, even though that single security step is one of the most powerful tools in our arsenal). For very large repositories which you don’t want to install a server agent on, you might even need to connect the DLP server to the same switch. We have even heard of organizations adding a second network interface on a private network segment to support particularly intense scanning.

All of this is configured in the DLP management console, where you configure the servers to scan, enter the credentials, assign policies, and determine scan frequency and schedule.

Server agents

Server agents support higher performance without network impact, because the analysis is done right on the storage repository, with only results pushed back to the DLP server. This assumes you can install the agent and the server has the processing power and memory to support the analysis. Some agents also provide additional context you can’t get from remote scanning.

Installing the server agent is no more difficult than installing any other software, but as we have mentioned (multiple times) you need to make sure you test to understand compatibility and performance impact. Then you configure the agent to connect to the production DLP server.

Unless you run into connection issues due to your network architecture, you then move over to the DLP management console to tune the configuration. The main things to set are scan frequency, policies, and performance throttles. Agents rarely run all the time – you choose a schedule, similar to antivirus, to reduce overhead and scan during slower hours.

Depending on the product, some agents require a constant connection to the DLP server. They may compress data and send it to the server for analysis rather than checking everything locally. This is very product-specific, so work with your vendor to figure out which option works best for you – especially if their server agent’s internal analysis capabilities are limited compared to the DLP server’s. As an example, some document and database matching policies impose high memory requirements which are infeasible on a storage server, but may be acceptable on the shiny new DLP server.

Document management system/NAS integration

Certain document management systems and Network Attached Storage products expose plugin architectures or other mechanisms that allow the DLP tool to connect directly, rather than relying on an open file share.

This method may provide additional context and information, as with a server agent. This is extremely dependent on which products you use, so we can’t provide much guidance beyond “do what the manual says”.

Database scanning

If your product supports database scanning you will usually make a connection to the database using an ODBC agent and then configure what to scan.

As with storage DLP, deployment of database DLP may require extensive business process work: to find the servers, get permission, and obtain credentials. Once you start scanning, it is extremely unlikely you will be able to scan all database records. DLP tools tend to focus on scanning the table structure and table names to pick out high-risk areas such as credit card fields, and then they scan a certain number of rows to see what kind of data is in the fields.

So the process becomes:

  1. Identify the target database.
  2. Obtain credentials and make an ODBC connection.
  3. Scan attribute names (field/column names).
  4. (Optional) Define which fields to scan/monitor.
  5. Analyze the first n rows of identified fields.

We only scan a certain number of rows because the focus isn’t on comprehensive realtime monitoring – that’s what Database Activity Monitoring is for – and to avoid unacceptable performance impact. But scanning a small number of rows should be enough to identify which tables hold sensitive data, which is hard to do manually.
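
Steps 3 through 5 can be sketched in Python. To keep the example runnable we use the standard library's `sqlite3` with an in-memory database as a stand-in for an ODBC connection. The name and value patterns, the function names, and the sample table are all assumptions; a real tool's detection rules will be far richer.

```python
import re
import sqlite3

SUSPECT_NAME = re.compile(r"card|ccn|ssn|social", re.IGNORECASE)
SSN_VALUE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def suspect_columns(conn):
    """Step 3: scan column names for anything that looks high-risk."""
    found = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if SUSPECT_NAME.search(col[1]):   # col[1] is the column name
                found.append((table, col[1]))
    return found

def sample_matches(conn, table, column, n=100):
    """Step 5: inspect only the first n rows of an identified column."""
    rows = conn.execute(
        f'SELECT "{column}" FROM "{table}" LIMIT ?', (n,)).fetchall()
    return sum(1 for (v,) in rows
               if v is not None and SSN_VALUE.match(str(v)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, full_name TEXT, ssn TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "A. Example", "078-05-1120"), (2, "B. Example", "n/a")])

for table, column in suspect_columns(conn):
    print(table, column, sample_matches(conn, table, column))
```

The `LIMIT` is what keeps the performance impact bounded: the scan tells you which tables deserve attention without reading every record.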

Endpoint deployment

Endpoints are, by far, the most variable component of Data Loss Prevention. There are massive differences between the various products on the market, and far more performance constraints required to fit on general-purpose workstations and laptops, rather than on dedicated servers. Fortunately, as widely as the features and functions vary, the deployment process is consistent.

  1. Test, then test more: I realize I have told you to test your endpoint agents at least 3 times by now, but this is the single most common problem people encounter. If you haven’t already, make sure you test your agents on a variety of real-world systems in your environment to make sure performance is acceptable.
  2. Create a deployment package or enable in your EPP tool: The best way to deploy the DLP agent is to use whatever software distribution tool you already use for normal system updates. This means building a deployment package with the agent configured to connect to the DLP server. Remember to account for any network restrictions that could isolate endpoints from the server. In some cases the DLP agent may be integrated into your existing EPP (Endpoint Protection Platform) tool. Most often you will need to deploy an additional agent, but if it is fully integrated you won’t need to push a software update, and can configure and enable it either through the DLP management console or in the EPP tool itself.
  3. Activate and confirm deployment: Once the agent is deployed go back to your DLP management console to validate that systems are covered, agents are running, and they can communicate with the DLP server. You don’t want to turn on any policies yet – for now you’re just confirming that the agents deployed successfully and are communicating.
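
The confirmation in step 3 boils down to comparing your system inventory against what the DLP server sees. A minimal sketch, assuming you can export both as simple lists (the hostnames and the shape of `agent_status` are made up):

```python
def coverage_gaps(inventory, agent_status):
    """Compare the asset inventory with agents known to the DLP server.

    agent_status maps hostname -> True if the agent checked in recently.
    """
    missing = sorted(h for h in inventory if h not in agent_status)
    silent = sorted(h for h in inventory
                    if h in agent_status and not agent_status[h])
    return missing, silent

inventory = ["laptop-01", "laptop-02", "desktop-07", "desktop-09"]
agent_status = {"laptop-01": True, "laptop-02": False, "desktop-07": True}

missing, silent = coverage_gaps(inventory, agent_status)
print("no agent installed:", missing)
print("installed but not communicating:", silent)
```

Systems in the second list are often the ones cut off by the network restrictions mentioned in step 2.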


Tuesday, March 15, 2011

FAM: Market Drivers, Business Justifications, and Use Cases

By Rich

Now that we have defined File Activity Monitoring it’s time to talk about why people are buying it, how it’s being used, and why you might want it.

Market Drivers

As I mentioned earlier the first time I saw FAM was when I dropped the acronym into the Data Security Lifecycle. Although some people were tossing the general idea around, there wasn’t a single product on the market. A few vendors were considering introducing something, but in conversations with users there clearly wasn’t market demand.

This has changed dramatically over the past two years, due to a combination of indirect compliance needs, headline-driven security concerns, and gaps in existing security tools. Although the FAM market is completely nascent, interest is slowly growing as organizations look for better handles on their unstructured file repositories.

We see three main market drivers:

  • As an offshoot of compliance. Few regulations require continuous monitoring of user access to files, but quite a few require some level of audit of access control, particularly for sensitive files. As you’ll see later, most FAM tools also include entitlement assessment, and they monitor and clearly report on activity. We see some organizations consider FAM initially to help generate compliance reports, and later activate additional capabilities to improve security.
  • Security concerns. The combination of APT-style attacks against sensitive data repositories, and headline-grabbing cases like Wikileaks, are driving clear interest in gaining control over file repositories.
  • To increase visibility. Although few FAM deployments start with the goal of providing visibility into file usage, once a deployment starts it’s not uncommon to use it to gain a better understanding of how files are used within the organization, even if this isn’t to meet a compliance or security need.

FAM, like its cousin Database Activity Monitoring, typically starts as a smaller project to protect a highly sensitive repository and then grows to expand coverage as it proves its value. Since it isn’t generally required directly for compliance, we don’t expect the market to explode, but rather to grow steadily.

Business Justifications

If we turn around the market drivers, four key business justifications emerge for deployment of FAM:

  • To meet a compliance obligation or reduce compliance costs. For example, to generate reports on who has access to sensitive information, or who accessed regulated files over a particular time period.
  • To reduce the risk of major data breaches. While FAM can’t protect every file in the enterprise, it provides significant protection for the major file repositories that turn a self-contained data breach into an unmitigated disaster. You’ll still lose files, but not necessarily the entire vault.
  • To reduce file management costs. Even if you use document management systems, few tools provide as much insight into file usage as FAM. By tying usage, entitlements, and user/group activity to repositories and individual files, FAM enables robust analysis to support other document management initiatives such as consolidation.
  • To support content discovery. Surprisingly, many content discovery tools (mostly Data Loss Prevention), and manual processes, struggle to identify file owners. FAM can use a combination of entitlement analysis and activity monitoring to help determine who owns each file.
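
To make the file owner idea concrete, here is a minimal Python sketch that guesses an owner from activity alone: the user responsible for most of the access to a file is treated as its likely owner. The function name, the `(user, path)` event format, and the 50% cutoff are assumptions for illustration; a real FAM tool would presumably also weigh entitlements.

```python
from collections import Counter, defaultdict


def likely_owners(access_events, min_share=0.5):
    """Map each file to its most frequent accessor, if that user dominates.

    access_events is an iterable of (user, path) pairs (a hypothetical format).
    """
    counts = defaultdict(Counter)
    for user, path in access_events:
        counts[path][user] += 1

    owners = {}
    for path, users in counts.items():
        user, hits = users.most_common(1)[0]
        # Only name an owner when one user accounts for enough of the activity.
        if hits / sum(users.values()) >= min_share:
            owners[path] = user
    return owners
```

Files with no dominant user are left out of the result rather than guessed at, since a wrong owner is worse than an unknown one when you are about to remove or consolidate data.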

Example Use Cases

By now you likely have a good idea how FAM can be used, but here are a few direct use cases:

  • Company A deployed FAM to protect sensitive engineering documents from external attacks and insider abuse. They monitor the shared engineering file share and generate a security alert if more than 5 documents are accessed in less than 5 minutes, and then block copying of the entire directory.
  • A pharmaceutical company uses FAM to meet compliance requirements for drug studies. The tool generates a quarterly report of all access to study files and generates security alerts when IT administrators access files.
  • Company C recently performed a large content discovery project to locate all regulated Personally Identifiable Information, but struggled to determine file owners. Their goal is to reduce sensitive data proliferation, but simple file permissions rarely indicate the file owner, which is needed before removing or consolidating data. With FAM they monitor the discovered files to determine the most common accessors – who are often the file owners.
  • Company D has had problems with sales executives sucking down proprietary customer information before taking jobs with competitors. They use FAM to generate alerts based on both high-volume access and authorized users accessing older files they’ve never touched before.

As you can see, the combination of tying users to activity, with the capability to generate alerts (or block) based on flexible use policies, makes FAM interesting. Imagine being able to kick off a security investigation based on a large amount of file access, or low-and-slow access by a service or administrative account.
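
A volume-based policy like Company A’s (more than 5 documents in less than 5 minutes) boils down to a per-user sliding window. The following is a minimal Python sketch of that logic only; the class name, the event fields, and the assumption that timestamps arrive in order (as seconds) are mine, and it says nothing about how a real product collects the events or blocks the copy.

```python
from collections import defaultdict, deque


class AccessRateMonitor:
    """Flag users who touch too many distinct files within a time window."""

    def __init__(self, max_files=5, window_seconds=300):
        self.max_files = max_files
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # user -> deque of (timestamp, path)

    def record(self, user, path, timestamp):
        """Record one access; return True if the user is over the threshold.

        Assumes timestamps (in seconds) are non-decreasing for each user.
        """
        events = self._recent[user]
        events.append((timestamp, path))
        # Drop accesses that are a full window (or more) older than this one.
        while events and timestamp - events[0][0] >= self.window_seconds:
            events.popleft()
        distinct_paths = {p for _, p in events}
        return len(distinct_paths) > self.max_files
```

Counting distinct paths rather than raw events keeps someone repeatedly saving one document from triggering the alert. A low-and-slow policy would use the same structure with a much longer window.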

File Activity Monitoring vs. Data Loss Prevention

The relationship between FAM and DLP is interesting. These two technologies are extremely complementary – so much that in one case (as of this writing) FAM is a feature of a DLP product – but they also achieve slightly different goals.

The core value of DLP is its content analysis capabilities: the ability to dig into a file and understand the content inside. FAM, on the other hand, doesn’t necessarily need to know the contents of a file or repository to provide value. Certain access patterns themselves often indicate a security problem, and knowing the exact file contents isn’t always needed for compliance initiatives such as access auditing.

FAM and DLP work extremely well together, but each provides plenty of value on its own.


Monday, February 01, 2010

Pragmatic Data Security: Discover

By Rich

In the Discovery phase we figure out where the heck our sensitive information is, how it’s being used, and how well it’s protected. If performed manually, or with too broad an approach, Discovery can be quite difficult and time consuming. In the pragmatic approach we stick with a very narrow scope and leverage automation for greater efficiency. A mid-sized organization can see immediate benefits in a matter of weeks to months, and usually finish a comprehensive review (including all endpoints) within a year or less.

Discover: The Process

Before we get into the process, be aware that your job will be infinitely harder if you don’t have a reasonably up to date directory infrastructure. If you can’t figure out your users, groups, and roles, it will be much harder to identify misuse of data or build enforcement policies. Take the time to clean up your directory before you start scanning and filtering for content. Also, the odds are very high that you will find something that requires disciplinary action. Make sure you have a process in place to handle policy violations, and work with HR and Legal before you start finding things that will get someone fired (trust me, those odds are pretty darn high).

You have a couple choices for where to start – depending on your goals, you can begin with applications/databases, storage repositories (including endpoints), or the network. If you are dealing with something like PCI, stored data is usually the best place to start, since avoiding unencrypted card numbers on storage is an explicit requirement. For HIPAA, you might want to start on the network since most of the violations in organizations I talk to relate to policy violations over email/web/FTP due to bad business processes. For each area, here’s how you do it:

  • Storage and Endpoints: Unless you have a heck of a lot of bodies, you will need a Data Loss Prevention tool with content discovery capabilities (I mention a few alternatives in the Tools section, but DLP is your best choice). Build a policy based on the content definition you built in the first phase. Remember, stick to a single data/content type to start. Unless you are in a smaller organization and plan on scanning everything, you need to identify your initial target range – typically major repositories or endpoints grouped by business unit. Don’t pick something too broad or you might end up with too many results to do anything with. Also, you’ll need some sort of access to the server – either by installing an agent or through access to a file share. Once you get your first results, tune your policy as needed and start expanding your scope to scan more systems.
  • Network: Again, a DLP tool is your friend here, although unlike with content discovery you have more options to leverage other tools for some sort of basic analysis. They won’t be nearly as effective, and I really suggest using the right tool for the job. Put your network tool in monitoring mode and build a policy to generate alerts using the same data definition we talked about when scanning storage. You might focus on just a few key channels to start – such as email, web, and FTP – with a narrow IP range/subnet if you are in a larger organization. This will give you a good idea of how your data is being used, identify some bad business processes (like unencrypted FTP to a partner), and show which users or departments are the worst abusers. Based on your initial results you’ll tune your policy as needed. Right now our goal is to figure out where we have problems – we will get to fixing them in a different phase.
  • Applications & Databases: Your goal is to determine which applications and databases have sensitive data, and you have a few different approaches to choose from. This is the part of the process where a manual effort can be somewhat effective, although it’s not as comprehensive as using automated tools. Simply reach out to different business units, especially the application support and database management teams, to create an inventory. Don’t ask them which systems have sensitive data, ask them for an inventory of all systems. The odds are very high your data is stored in places you don’t expect, so to check these systems perform a flat file dump and scan the output with a pattern matching tool. If you have the budget, I suggest using a database discovery tool – preferably one with built in content discovery (there aren’t many on the market, as we’ll mention in the Tools section). Depending on the tool you use, it will either sniff the network for database connections and then identify those systems, or scan based on IP ranges. If the tool includes content discovery, you’ll usually give it some level of administrative access to scan the internal database structures.
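
For the manual route (dump to a flat file, then scan with a pattern matching tool), here is a minimal Python sketch for credit card numbers. The regular expression and function names are assumptions for illustration. The Luhn checksum is the standard check digit test for card numbers, and it cuts out many of the false positives a bare regex produces.

```python
import re

# Hypothetical pattern: 13-16 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def scan_dump(lines):
    """Yield (line_number, masked_number) for each likely card number."""
    for line_number, line in enumerate(lines, start=1):
        for match in CARD_CANDIDATE.finditer(line):
            digits = re.sub(r"[ -]", "", match.group())
            if luhn_valid(digits):
                # Report only the last four digits so the scan output
                # does not become a new copy of the sensitive data.
                yield line_number, "*" * (len(digits) - 4) + digits[-4:]
```

To scan a real dump you would pass an open file object as `lines`. This only catches data that fits a pattern, which is exactly the limitation of manual discovery compared to a dedicated tool.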

I just presented a lot of options, but remember we are taking the pragmatic approach. I don’t expect you to try all this at once – pick one area, with a narrow scope, knowing you will expand later. Focus on wherever you think you might have the greatest initial impact, or where you have known problems. I’m not an idealist – some of this is hard work and takes time, but it isn’t an endless process and you will have a positive impact.

We aren’t necessarily done once we figure out where the data is – for approved repositories, I really recommend you also re-check their security. Run at least a basic vulnerability scan, and for bigger repositories I recommend a focused penetration test. (Of course, if you already know it’s insecure you probably don’t need to beat the dead horse with another check). Later, in the Secure phase, we’ll need to lock down the approved repositories so it’s important to know which security holes to plug.

Discover: Technologies

Unlike the Define phase, here we have a plethora of options. I’ll break this into two parts: recommended tools that are best for the job, and ancillary tools in case you don’t have a budget for anything new. Since we’re focused on the process in this series, I’ll skip definitions and descriptions of the technologies, most of which you can find in our Research Library.

Recommended Tools

  1. Data Loss Prevention (DLP): This is the best tool for storage, network, and endpoint discovery. Nothing else is nearly as effective.
  2. Database Discovery: While there are only a few tools on the market, they are extremely helpful for finding all the unexpected databases that tend to be floating around most organizations. Some offer content discovery, but it’s usually limited to regular expressions/keywords (which is often totally fine for looking within a database).
  3. Database Activity Monitoring (DAM): A couple of the tools include content discovery (some also include database discovery). I only recommend DAM in the discover phase if you also intend on using it later for database monitoring – otherwise it’s not the right investment.

Ancillary Tools

  1. IDS/IPS/Deep Packet Inspection: There are a bunch of different deep packet inspection network tools – including UTM, Web Application Firewalls, and web gateways – that now include basic regular expression pattern matching for “poor man’s” DLP functionality. They only help with data that fits a pattern, they don’t include any workflow, and they usually have a ton of false positives. If the tool can’t crack open file attachments/transfers it probably won’t be very helpful.
  2. Electronic Discovery, Search, and Data Classification: Most of these tools perform some level of pattern matching or indexing that can help with discovery. They tend to have much higher false positive rates than DLP (and usually cost more if you’re buying new), but if you already have one and budgets are tight they can help.
  3. Email Security Gateways: Most of the email security gateways on the market can scan for content, but they are obviously limited to only email, and aren’t necessarily well suited to the discovery process.
  4. FOSS Discovery Tools: There are a couple of free/open source content discovery tools, mostly projects from higher education institutions that built their own tools to weed out improper use of Social Security numbers due to a regulatory change a few years back.

Discover: Case Study

Frank from Billy Bob’s Bait Shop and Sushi Outlet decides to use a DLP tool to help figure out where any unencrypted credit card numbers might be stored. He decides to go with a full suite DLP tool since he knows he needs to scan his network, storage, servers in the retail outlets, and employee systems.

Before turning on the tool, he contacts Legal and HR to set up a process in case they find any employees illegally using these numbers, as opposed to the accidental or business-process leaks he also expects to manage. Although his directory servers are a little messy due to all the short-term employees endemic to retail operations, he’s confident his core Active Directory server is relatively up to date, especially where systems/servers are concerned.

Since he’s using a DLP tool, he develops a three-tier policy to base his discovery scans on:

  1. Using the one database with stored unencrypted numbers, he creates a database fingerprinting policy to alert on exact matches from that database (his DLP tool uses hashes, not the original values, so it isn’t creating a new security exposure). These are critical alerts.
  2. His next policy uses database fingerprints of all customer names from the customer database, combined with a regular expression for generic credit card numbers. If a customer name appears with something that matches a credit card number (based on the regex pattern) it generates a medium alert.
  3. His lowest priority policy uses the default “PCI” category built into his DLP tool, which is predominantly basic pattern matching.
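
Frank’s three tiers can be sketched as a simple classifier. This is a minimal Python illustration of the decision order only, not how his DLP tool works internally: the names, the bare 13-16 digit regex, and the plain SHA-256 fingerprint are all assumptions, and for brevity the customer names are compared as plain text rather than fingerprinted as in his actual policy.

```python
import hashlib
import re

CARD_PATTERN = re.compile(r"\b\d{13,16}\b")  # hypothetical generic card regex


def fingerprint(value):
    """Hash a value so the policy never stores the original.

    Plain SHA-256 is a stand-in here; it is an assumption, not a
    description of any product's fingerprinting scheme.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def classify(text, card_fingerprints, customer_names):
    """Return 'critical', 'medium', 'low', or None for a piece of content."""
    candidates = CARD_PATTERN.findall(text)
    if not candidates:
        return None
    # Tier 1: exact match against fingerprints of known stored numbers.
    if any(fingerprint(c) in card_fingerprints for c in candidates):
        return "critical"
    # Tier 2: a known customer name alongside something shaped like a card number.
    lowered = text.lower()
    if any(name.lower() in lowered for name in customer_names):
        return "medium"
    # Tier 3: generic pattern match only.
    return "low"
```

Checking the tiers from most to least specific means each piece of content gets the highest severity it qualifies for, so the critical queue stays small enough to act on.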

He breaks his project down into three phases, to run during overlapping periods:

  1. Using those three policies, he turns on network monitoring for email, web, and FTP.
  2. He begins scanning his storage repositories, starting in the data center. Once he finishes those, he will expand the scans into systems in the retail outlets. He expects his data center scan to go relatively quickly, but is planning on 6-12 months to cover the retail outlets.
  3. He is testing endpoint discovery in the lab, but since their workstation management is a bit messy he isn’t planning on trying to install agents and beginning scans until the second year of the project.

It took Frank about two months to coordinate with other business/IT units before starting the project. Installing DLP on the network only took a few hours because everything ran through one main gateway, and he wasn’t worried about installing any proxy/blocking technology.

Frank immediately saw network results, and found one serious business process problem where unencrypted numbers were included in files being FTPed to a business partner. The rest of his incidents involved individual accidents, and for the most part they weren’t losing credit card numbers over the monitored channels.

The content discovery portion took a bit longer since there wasn’t a consistent administrative account he could use to access and scan all the servers. Even though they are a relatively small operation, it took about 2 months of full time scanning to get through the data center due to all the manual coordination involved. They found a large number of old spreadsheets with credit card numbers in various directories, and a few in flat files – especially database dumps from development.

The retail outlets actually took less time than he expected. Most of the servers, except at the largest regional locations, were remotely managed and well inventoried. He found that 20% of them were running on an older credit card transaction system that stored unencrypted credit card numbers.

Remember, this is a 1,000 person organization… if you work someplace with five or ten times the employees and infrastructure, your process will take longer. Don’t assume it will take five or ten times longer, though – it all depends on scope, infrastructure, and a variety of other factors.


Wednesday, January 27, 2010

Pragmatic Data Security: Define Phase

By Rich

Now that we’ve described the Pragmatic Data Security Cycle, it’s time to dig into the phases. As we roll through each of these I’m going to break it into three parts: the process, the technologies, and a case study. For the case study we’re going to follow a fictional organization through the entire process. Instead of showing you every single data protection option at each phase, we’ll focus on a narrow project that better represents what you will likely experience.

Define: The Process

From a process standpoint, this is both the easiest and hardest of the phases. Easy, since there’s only one thing you need to do and it isn’t very technical or complex; hard, since it may involve coordination across multiple business units and the quest for executive sponsorship.

  1. Identify an executive sponsor to support your efforts. Without management support, the rest of the process will be extremely difficult.
  2. Identify the one piece of information/content/data you want to protect. The definition shouldn’t be too broad. For example, “engineering plans” is too broad, but “engineering plans for project X” is acceptable. Using “PCI/NPI/HIPAA” is acceptable, assuming you narrow it down in the next step.
  3. Define and model the information you defined in the step above. For totally unstructured content like engineering plans, identify a repository to use for your definition, or any watermarking/labels you are certain will be available to identify and protect the information. For PCI/NPI/HIPAA determine the exact fields/pieces of data to protect. For PCI it might be only the credit card number, for NPI it might be names and addresses, and for HIPAA it might be ICD9 billing codes. If you are protecting data from a database, also identify the source repository.
  4. Identify key business units with a stake in the information, and contact them to verify the priority, structure, and repositories for this information. It’s no fun if you think you’re going to protect a database of customer data, only to find out halfway through that it’s not really the important one from a business perspective.

That’s it: find a sponsor, identify the category, identify the data/repository, and confirm with the business folks.

Define: Technologies

None. This is a manual business process and the only technology you need is something to take notes with… or maybe email to communicate.

Define: Case Study

Billy Bob’s Bait Shop and Sushi Outlet is a mid-sized, multi-site retail organization that specializes in “The freshest seafood, for your family or aquatic friends”. Billy Bob’s consists of a corporate headquarters and a few dozen retail outlets in three states. There are about 1,000 employees, and a growing web business due to their capability to ship fresh bait or sushi to any location in the US overnight.

Billy Bob’s is struggling with PCI compliance and wants to avoid a major security breach after seeing the damage caused to their major competitor during a breach (John Boy’s Worms and Grub).

They do not have a dedicated security team, but their CIO designated one of their top network administrators (the former firewall manager) to head up security operations. Frank has a solid history as a network administrator and is familiar with security (including some SANS training and a CISSP class). Due to problems with their first PCI assessment, Frank has the backing of the CIO.

The category of data is PCI. After some research, Frank decides to go with a multilevel definition – at the top is credit card numbers. Since they are (supposedly) not storing them in a database they could feed to any data protection tools, Frank is starting with a regular expression to identify credit card numbers, and then plans on refining it using customer names (which are stored in the database). He is hoping that whatever tools he picks can use a generic credit card number definition for low-priority alerts, and a credit card (generic) tied with a customer name to trigger higher priority alerts. Frank also plans on using violation counts to help find real problem areas.

Frank now has a generic category (PCI), a specific definition (generic regex and customer name from a database) and the repository location (the customer database itself). From the heads of customer relations and billing, he learned that there are really two databases he needs to worry about: the main transaction processing/records system for the web outlet, and the point of sale transaction processing system for the retail outlets. The web outlet does not store unencrypted credit card numbers, but the retail outlets currently do, and they are working with the transaction processor to fix that. Thus he is adding credit card numbers from the retail database to his list of data sources. Fortunately, they are only stored in the central processing database, and not at the individual retail outlets.

That’s the setup – in our next post we will cover the Discovery process to figure out where the heck all that data is.


Wednesday, January 20, 2010

The Rights Management Dilemma

By Rich

Over the past few months I’ve seen a major uptick in the number of user inquiries I’m taking on enterprise digital rights management (or enterprise rights management, but I hate that term). Having covered EDRM for something like 8 years or so now, I’m only slightly surprised.

I wouldn’t say there’s a new massive groundswell of sudden desperate motivation to protect corporate intellectual assets. Rather, it seems like a string of knee-jerk reactions related to specific events. What concerns me is that I’ve noticed two consistent trends throughout these discussions:

  1. EDRM is being mandated from someplace in management. Not, “protect our data”, but EDRM specifically.
  2. There is no interest in discussing how to best protect the content in question, especially other technologies or process changes.

People are being told to get EDRM, get it now, and nothing else matters.

This is problematic on multiple levels. While rights management is one of the most powerful technologies to protect information assets, it’s also one of the most difficult to manage and implement once you hit a certain scale. It’s also far from a panacea, and in many of these organizations it either needs to be combined with other technologies and processes, or should be considered after other more basic steps are taken. For example, most of these clients haven’t performed any content discovery (manual or with DLP) to find out where the information they want to protect is located in the first place.

Rights management is typically most effective when:

  1. It’s deployed on a workgroup level.
  2. The users involved are willing and able to adjust their workflow to incorporate EDRM.
  3. There is minimal need for information exchange of the working files with external organizations.
  4. The content to protect is easy to identify, and centrally concentrated at the start of the project.

Where EDRM tends to fail is with enterprise-wide deployments, or when the culture of the user population doesn’t prioritize the value of their content sufficiently to justify the necessary process changes.

I do think that EDRM will play a very large role in the future of information-centric security, but only once its inevitable merging with data loss prevention is complete. The dilemma of rights management is that its very power and flexibility is also its greatest liability (sort of like some epic comic book thing). It’s just too much to ask users to keep track of which user populations map to which rights on which documents. This is changing, especially with the emerging DRM/DLP partnerships, but it’s been the primary reason EDRM deployments have been so self-limiting.

Thus I find myself frequently cautioning EDRM prospects to carefully scope and manage their projects, or look at other technologies first, at the same time I’m telling them it’s the future of information centric security.

Anyone seen my lithium?


Monday, June 01, 2009

The State of Web Application and Data Security—Mid 2009

By Rich

One of the more difficult aspects of the analyst gig is sorting through all the information you get, and isolating out any inherent biases. The kinds of inquiries we get from clients can all too easily skew our perceptions of the industry, since people tend to come to us for specific reasons, and those reasons don’t necessarily represent the mean of the industry. Aside from all the vendor updates (and customer references), our end user conversations usually involve helping someone with a specific problem – ranging from vendor selection, to basic technology education, to strategy development/problem solving. People call us when they need help, not when things are running well, so it’s all too easy to assume a particular technology is being used more widely than it really is, or a problem is bigger or smaller than it really is, because everyone calling us is asking about it. Countering this takes a lot of outreach to find out what people are really doing even when they aren’t calling us.

Over the past few weeks I’ve had a series of opportunities to work with end users outside the context of normal inbound inquiries, and it’s been fairly enlightening. These included direct client calls, executive roundtables such as one I participated in recently with IANS (with a mix from Fortune 50 to mid-size enterprises), and some outreach on our part. They reinforced some of what we’ve been thinking, while breaking other assumptions. I thought it would be good to compile these together into a “state of the industry” summary. Since I spend most of my time focused on web application and data security, I’ll only cover those areas:


When it comes to web application and data security, if there isn’t a compliance requirement, there isn’t budget – Nearly all of the security professionals we’ve spoken with recognize the importance of web application and data security, but they consistently tell us that unless there is a compliance requirement it’s very difficult for them to get budget. That’s not to say it’s impossible, but non-compliance projects (however important) are way down the priority list in most organizations. In a room of a dozen high-level security managers of (mostly) large enterprises, they all reinforced that compliance drove nearly all of their new projects, and there was little support for non-compliance-related web application or data security initiatives. I doubt this surprises any of you.

“Compliance” may mean more than compliance – Activities that are positioned as helping with compliance, even if they aren’t a direct requirement, are more likely to gain funding. This is especially true for projects that could reduce compliance costs. They will have a longer approval cycle, often 9 months or so, compared to the 3-6 months for directly-required compliance activities. Initiatives directly tied to limiting potential data breach notifications are the most cited driver. Two technology examples are full disk encryption and portable device control.

PCI is the single biggest compliance driver for web application and data security – I may not be thrilled with PCI, but it’s driving more web application and data security improvements than anything else.

The term Data Loss Prevention has lost meaning – I discussed this in a post last week. Even those who have gone through a DLP tool selection process often use the term to encompass more than the narrow definition we prefer.

It’s easier to get resources to do some things manually than to buy a tool – Although tools would be much more efficient and effective for some projects, in terms of costs and results, manual projects using existing resources are easier to get approval for. As one manager put it, “I already have the bodies, and I won’t get any more money for new tools.” The most common example cited was content discovery (we’ll talk more about this a few points down).

Most people use DLP for network (primarily email) monitoring, not content discovery or endpoint protection – Even though we tend to think discovery offers equal or greater value, most organizations with DLP use it for network monitoring.

Interest in content discovery, especially DLP-based, is high, but resources are hard to get for discovery projects – Most security managers I talk with are very interested in content discovery, but they are less educated on the options and don’t have the resources. They tell me that finding the data is the easy part – getting resources to do anything about it is the limiting factor.

The Web Application Firewall (WAF) and Security Source Code Tools markets are nearly equal in size, with more clients on WAFs, and more money spent on source code tools per client – While it’s hard to fully quantify, we think the source code tools cost more per implementation, but WAFs are in slightly wider use.

WAFs are a quicker hit for PCI compliance – Most organizations deploying WAFs do so for PCI compliance, and they’re seen as a quicker fix than secure source code projects.

Most WAF deployments are out of band, and false positives are a major problem for default deployments – Customers are installing WAFs for compliance, but are generally unable to deploy them inline (initially) due to the tuning requirements.

Full drive encryption is mature, and well deployed in the early mainstream – Full drive encryption, while not perfect, is deployable in even large enterprises. It’s now considered a level-setting best practice in financial services, and usage is growing in healthcare and insurance. Other asset recovery options, such as remote data destruction and phone home applications, are now seen as little more than snake oil. As one CISO told us, “I don’t care about the laptop, we just encrypt it and don’t worry about it when it goes missing”.

File and folder encryption is not in wide use – Very few organizations are performing any wide scale file/folder encryption, outside of some targeted encryption of PII for compliance requirements.

Database encryption is hard, and not widely used – Most organizations are dissatisfied with database encryption options, and do not deploy it widely. Within a large organization there is likely some DB encryption, with preference given to file/folder/media protection over column level encryption, but most organizations prefer to avoid it. Performance and key management are cited as the primary obstacles, even when using native tools. Current versions of database encryption (primarily native encryption) do perform better than older versions, but key management is still unsatisfactory. Large encryption projects, when initiated, take an average of 12-18 months.

Large enterprises prefer application-level encryption of credit card numbers, and tokenization – When it comes to credit card numbers, security managers prefer to encrypt them at the application level, or consolidate numbers into a central source, using representative “tokens” throughout the rest of the application stack. These projects take a minimum of 12-18 months, similar to database encryption projects (the two are often tied together, with encryption used in the source database).

Email encryption and DRM tend to be workgroup-specific deployments – Email encryption and DRM use is scattered throughout the industry, but is still generally limited to workgroup-level projects due to the complexity of management, or lack of demand/compliance from users.

Database Activity Monitoring usage continues to grow slowly, mostly for compliance, but not quickly enough to save lagging vendors – Many DAM deployments are still tied to SOX auditing, and it’s not as widely used for other data security initiatives. Performance is reasonable when you can use endpoint agents, which some DBAs still resist. Network monitoring is not seen as effective, but may still be used when local monitoring isn’t an option. Network requirements, depending on the tool, may also inhibit deployments.

My main takeaway is that security managers know what they need to do to protect information assets, but they lack the time, resources, and management support for many initiatives. There is also broad dissatisfaction with security tools and vendors in general, in large part due to poor expectation setting during the sales process, and deliberately confusing marketing. It’s not that the tools don’t work, but that they’re never quite as easy as promised.

It’s an interesting dilemma, since there is clear and broad recognition that data security (and by extension, web application security) is likely our most pressing overall issue in terms of security, but due to a variety of factors (many of which we covered in our Business Justification for Data Security paper), the resources just aren’t there to really tackle it head-on.


Thursday, May 21, 2009

The Pragmatic Data (Information-Centric) Security Cycle

By Rich

Way back when I started Securosis, I came up with something called the Data Security Lifecycle, which I later renamed the Information-Centric Security Cycle. While I think it does a good job of capturing all the components of data security, it’s also somewhat dense. That lifecycle was designed to be a comprehensive outline of protective controls and information management, but I’ve since realized that if you have a specific data security problem, it isn’t the best place to start.

In a couple weeks I’ll be speaking at the TechTarget Financial Information Security Decisions conference in New York, where I’m presenting Pragmatic Data Security. By “pragmatic” I mean something you can implement as soon as you get home. Where the lifecycle answers the question, “How can I secure all my data throughout its entire lifecycle?” pragmatic data security answers, “How can I protect this specific data at this point in time, in my existing environment?”

It starts with a slimmed down cycle:


  1. Define what information you want to protect (specifically, not general data classification)
  2. Discover where it’s located (various tools/techniques, preferably automated, like DLP, rather than manual)
  3. Secure the data where it’s stored, and/or eliminate data where it shouldn’t be (access controls, encryption)
  4. Monitor data usage (various tools, including DLP, DAM, logs, SIEM)
  5. Protect the data from exfiltration (DLP, USB control, email security, web gateways, etc.)

For example, if you want to protect credit card numbers you’d define them in step 1, use DLP content discovery in step 2 to locate where they are stored, remove them or lock the repositories down in step 3, use DAM and DLP to monitor where they’re going in step 4, and use blocking technologies to keep them from leaving the organization in step 5.
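To make steps 1 and 2 a bit more concrete, here is a minimal sketch of what "defining" and "discovering" credit card numbers might look like. It is only an illustration, not how any particular DLP product works: the pattern, the function names, and the choice of a Luhn checksum to weed out random digit strings are all my assumptions.

```python
import re

# Step 1 (define): hypothetical definition of a card number as 13 to 16
# digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Step 2 (discover): return candidate card numbers found in text."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Calling `find_card_numbers` on the contents of each file in a repository gives you a rough inventory to act on in step 3. A real tool layers far more context on top of this, but the shape of the problem is the same.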

All too often I’m seeing people get totally wrapped up in complex “boil the ocean” projects that never go anywhere, vs. defining and solving a specific problem. You don’t need to start your entire data security program with some massive data classification program. Pick one defined type of data/information, and just go protect it. Find it, lock it down, watch how it’s being used, and stop it from going where you don’t want.

Yeah, parts are hard, but hard != impossible. If you keep your focus, any hard problem is just a series of smaller, defined steps.


Thursday, December 04, 2008

Analysis Of The Microsoft/RSA Data Loss Prevention Partnership

By Rich

By the time I post this you won’t be able to find a tech news site that isn’t covering this one. I know, since my name was on the list of analysts the press could contact and I spent a few hours talking to everyone covering the story yesterday. Rather than just reciting the press release, I’d like to add some analysis, put things into context, and speculate wildly. For the record, this is a big deal in the long term, and will likely benefit all of the major DLP vendors, even though there’s nothing earth shattering in the short term.

As you read this, Microsoft and RSA are announcing a partnership for Data Loss Prevention. Here are the nitty gritty details, not all of which will be apparent from the press release:

  • This month, the RSA DLP product (Tablus for you old folks) will be able to assign Microsoft RMS (what Microsoft calls DRM) rights to stored data based on content discovery. The way this works is that the RMS administrator will define a data protection template (what rights are assigned to what users). The RSA DLP administrator then creates a content detection policy, which can then apply the RMS rights automatically based on the content of files. The RSA DLP solution will then scan file repositories (including endpoints) and apply the RMS rights/controls to protect the content.
  • Microsoft has licensed the RSA DLP technology to embed into various Microsoft products. They aren’t offering much detail at this time, nor any timelines, but we do know a few specifics. Microsoft will slowly begin adding the RSA DLP content analysis engine to various products. The non-NDA slides hint at everything from SQL Server, Exchange, and Sharepoint, to Windows and Office. Microsoft will also include basic DLP management into their other management tools.
  • Policies will work across both Microsoft and RSA in the future as the products evolve. Microsoft will be limiting itself to their environment, with RSA as the upgrade path for fuller DLP coverage.

And that’s it for now. RSA DLP 6.5 will link into RMS, with Microsoft licensing the technology for future use in their products. Now for the analysis:

  • This is an extremely significant development in the long term future of DLP. Actually, it’s a nail in the coffin of the term “DLP” and moves us clearly and directly to what we call “CMP”- Content Monitoring and Protection. It moves us closer and closer to the DLP engine being available everywhere (and somewhat commoditized), with the real value being in the central policy management, analysis, workflow, and incident management system. DLP/CMP vendors don’t go away- but their focus changes as the agent technology is built more broadly into the IT infrastructure (this definitely won’t be limited to just Microsoft).
  • It’s not very exciting in the short term. RSA isn’t the first to plug DLP into RMS (Workshare does it, but they aren’t nearly as big in the DLP market). RSA is only enabling this for content discovery (data at rest) and rights won’t be applied immediately as files are created/saved. It’s really the next stages of this that are interesting.
  • This is good for all the major DLP vendors, although a bit better for RSA. It’s big validation for the DLP/CMP market, and since Microsoft is licensing the technology to embed, it’s reasonable to assume that down the road it may be accessible to other DLP vendors (be aware- that’s major speculation on my part).
  • This partnership also highlights the tight relationship between DLP/CMP and identity management. Most of the DLP vendors plug into Microsoft Active Directory to determine users/groups/roles for the application of content protection policies. One of the biggest obstacles to a successful DLP deployment can be a poor directory infrastructure. If you don’t know what users have what roles, it’s awfully hard to create content-based policies that are enforced based on users and roles.
  • We don’t know how much cash is involved, but financially this is likely good for RSA (the licensing part). I don’t expect it to overly impact sales in the short term, and the other major DLP vendors shouldn’t be too worried for now. DLP deals will still be competitive based on the capabilities of current products, more than what’s coming in an indeterminate future.

Now just imagine a world where you run a query on a SQL database, and any sensitive results are appropriately protected as you place them into an Excel spreadsheet. You then drop that spreadsheet into a Powerpoint presentation and email it to the sales team. It’s still quietly protected, and when one sales guy tries to email it to his Gmail account, it’s blocked. When he transfers it to a USB device, it’s encrypted using a company key so he can’t put it on his home computer. If he accidentally sends it to someone in the call center, they can’t read it. In the final PDF, he can’t cut out the table and put it in another document. That’s where we are headed- DLP/CMP is enmeshed into the background, protecting content through its lifecycle based on central policies and content and context awareness.

In summary, it’s great in the long term, good but not exciting in the short term, and beneficial to the entire DLP market, with a slight edge for RSA. There are a ton of open questions and issues, and we’ll be watching and analyzing this one for a while.

As always, feel free to email me if you have any questions.


Thursday, July 10, 2008

ADMP and Assessment

By Adrian Lane

Application and Database Monitoring and Protection. ADMP for short.

In Rich’s previous post, under “Enter ADMP”, he discussed coordination of security applications to help address security issues. They may gather data in different ways, from different segments within the IT infrastructure, and cooperate with other applications based upon the information they have gathered or gleaned from analysis. What is being described is not shoving every service into an appliance for one stop shopping; that is decidedly not what we are getting at. Conceptually it is far closer to DLP ‘suites’ that offer endpoint and network security, with consolidated policy management.

Rich has been driving this discussion for some time, but the concept is not yet fully evolved. We are both advocates and see this as a natural evolution of application security products. Oddly, Rich and I very seldom discuss the details prior to posting, and this topic is no exception. I wanted to discuss a couple items I believe should be included under the ADMP umbrella, namely Assessment and Discovery. Assessment and Discovery can automatically seed monitoring products with what to monitor, and cooperate with their policy set.

Thus far the focus through a majority of our posts has been monitoring and protection, as in active protection, for ADMP. It reflects a primary area of interest for us as well as what we perceive as the core value for customers. The cooperation between monitored points within the infrastructure, both for collected data and the resulting data analysis, represents a step forward and can increase the effectiveness of each monitoring point. Vendors such as Imperva are taking steps into this type of strategy, specifically for tracking how a user’s web activity maps to the back end infrastructure. I imagine they will come up with more creative uses for this deployment topology in the future.

Here I am driving at the cooperation between preventative (assessment and discovery in this context) and detective (monitoring) controls. Or more precisely, how monitoring and various types of assessment and discovery can cooperate to make the entire offering more efficient and effective. And when I talk about assessment, I am not talking about a network port scan to guess what applications and versions are running- but rather active interrogation and/or inspection of the application. And for discovery, not just the location of servers and applications, but a more thorough investigation of content, configuration and functions.

Over the last four years I have advocated discovery, assessment and then monitoring, in that order. Discover what assets I have, assess what my known weaknesses are, and then fix what I can. I would then turn on monitoring for generic threats that concern me, but also tune my monitoring policies to accommodate weaknesses in my configuration. My assumption is that there will always be vulnerabilities which monitoring will assist with controlling. But with application platforms- particularly databases- most firms are not and cannot be fully compliant with best practices and still offer the business processing functions the database is intended for. Typically, the security weaknesses that remain part of the daily operation of the applications and databases stem from some required setting or module that is just not that secure.

I know that there are some who disagree with this; Bruce Schneier has advocated for a long time that “Monitor First” is the correct approach. My feeling is that IT is a little different, and (adapting his analogy) I may not know where all of the valuables are stored, and I may not know what type of alarm is needed to protect the safe. I can discover a lot from monitoring, and it allows me to witness both behavior and method during an attack, and use that to my advantage in the future. Assessment can provide tremendous value in terms of knowing what and how to protect, and it can do so prior to an attack. Most assessment and discovery tools are run periodically; while they may not be continuous, nor designed to find threats in real time, they are still not a “set and forget” part of security. They are best run periodically to account for the fluid nature of IT systems.

I would add assessment of web applications, databases, and traditional enterprise applications into this equation. Some of the web application assessment vendors have announced their ability to cooperate with WAF solutions, as WhiteHat Security has done with F5. Augmenting monitoring/WAF is a very good idea IMO, both in terms of coping with the limitations inherent to assessment of live web applications without causing disaster, and the impossibility of getting complete coverage of all possible generated content. Being able to shield known limitations of the application, due either to design or patching delay, is a good example of the value here.

In the same way, many back-end application platforms provide functionality that is relied upon for business processing that is less than secure. These might be things like database links or insecure network ‘listener’ configurations, which cannot be immediately resolved, either due to business continuity or timing constraints. An assessment platform (or even a policy management tool, but more on that later) or a rummage through database tables looking for personally identifiable information, which is then fed to a database monitoring solution, can help deal with such difficult situations. Interrogation of the database reveals the weakness or sensitive information, and the result set is fed to the monitoring tool to check for inappropriate use of the feature or access to the data. I have covered many of these business drivers in a previous post on Database Vulnerability Assessment. And it is very much for drivers like PCI that I believe the coupling of assessment with monitoring and auditing is so powerful- the applications help compensate for one another, enabling each to do what it is best at, passing off coverage of areas where they are less effective.
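As a rough illustration of that hand-off, the sketch below turns assessment and discovery findings into monitoring rules. Everything in it is hypothetical: the `Finding` and `MonitorRule` types, the two finding kinds, and the alert wording are assumptions made for the example, not the format of any actual assessment or monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One result from an assessment or discovery scan (hypothetical shape)."""
    database: str
    kind: str    # "weak_config" or "sensitive_data"
    target: str  # e.g. a database link name, or a table.column


@dataclass(frozen=True)
class MonitorRule:
    """One rule handed to the monitoring tool (hypothetical shape)."""
    database: str
    watch: str
    alert_on: str


def rules_from_findings(findings):
    """Seed monitoring with what assessment and discovery turned up."""
    rules = []
    for f in findings:
        if f.kind == "weak_config":
            # The weakness cannot be fixed right away, so watch its use.
            rules.append(MonitorRule(f.database, f.target,
                                     "any use of this feature"))
        elif f.kind == "sensitive_data":
            # Discovery located sensitive content, so watch who reads it.
            rules.append(MonitorRule(f.database, f.target,
                                     "access outside approved applications"))
    return rules
```

The point of the design is that the monitoring policy is derived from what the interrogation actually found, rather than written by hand and hoped to match the environment.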

Next up, I want to talk about policy formats, the ability to construct policies that apply to multiple platforms, and how to include result handling.

–Adrian Lane

Wednesday, July 02, 2008

Best Practices For Endpoint DLP: Part 2

By Rich

In Part 1 I talked about the definition of endpoint DLP, the business drivers, and how it integrates with full-suite solutions. Today (and over the next few days) we’re going to start digging into the technology itself.

Base Agent Functions

There is massive variation in the capabilities of different endpoint agents. Even for a single given function, there can be a dozen different approaches, all with varying degrees of success. Also, not all agents contain all features; in fact, most agents lack one or more major areas of functionality.

Agents include four generic layers/features:

  1. Content Discovery: Scanning of stored content for policy violations.
  2. File System Protection: Monitoring and enforcement of file operations as they occur (as opposed to discovery, which is scanning of content already written to media). Most often, this is used to prevent content from being written to portable media/USB. It’s also where tools hook in for automatic encryption or application of DRM rights.
  3. Network Protection: Monitoring and enforcement of network operations. Provides protection similar to gateway DLP when a system is off the corporate network. Since most systems treat printing and faxing as a form of network traffic, this is where most print/fax protection can be enforced (the rest comes from special print/fax hooks).
  4. GUI/Kernel Protection: A more generic category to cover data in use scenarios, such as cut/paste, application restrictions, and print screen.

Between these four categories we cover most of the day to day operations a user might perform that places content at risk. It hits our primary drivers from the last post- protecting data from portable storage, protecting systems off the corporate network, and supporting discovery on the endpoint. Most of the tools on the market start with file and (then) networking features before moving on to some of the more complex GUI/kernel functions.

Agent Content Awareness

Even if you have an endpoint with a quad-core processor and 8 GB of RAM, the odds are you don’t want to devote all of that horsepower to enforcing DLP.

Content analysis may be resource intensive, depending on the types of policies you are trying to enforce. Also, different agents have different enforcement capabilities which may or may not match up to their gateway counterparts. At a minimum, most endpoint tools support rules/regular expressions, some degree of partial document matching, and a whole lot of contextual analysis. Others support their entire repertoire of content analysis techniques, but you will likely have to tune policies to run on a more resource constrained endpoint.

Some tools rely on the central management server for aspects of content analysis, to offload agent overhead. Rather than performing all analysis locally, they will ship content back to the server, then act on any results. This obviously isn’t ideal, since those policies can’t be enforced when the endpoint is off the enterprise network, and it will suck up a fair bit of bandwidth. But it does allow enforcement of policies that are otherwise totally unrealistic on an endpoint, such as database fingerprinting of a large enterprise DB.

One emerging option is policies that adapt based on endpoint location. For example, when you’re on the enterprise network most policies are enforced at the gateway. Once you access the Internet outside the corporate walls, a different set of policies is enforced. For example, you might use database fingerprinting (exact database matching) of the customer DB at the gateway when the laptop is in the office or on a (non split tunneled) VPN, but drop to a rule/regex for Social Security Numbers (or account numbers) for mobile workers. Sure, you’ll get more false positives, but you’re still able to protect your sensitive information while meeting performance requirements.
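Here is a minimal sketch of that location-based switch. The location labels, the SSN pattern, and the function names are assumptions for illustration only; real agents detect location and express policies in their own ways.

```python
import re

# Hypothetical lightweight rule for mobile workers: SSNs written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical labels for locations where traffic passes the corporate gateway.
ENTERPRISE_LOCATIONS = {"office", "vpn_full_tunnel"}


def endpoint_policy(location: str) -> dict:
    """Pick where and how a policy is enforced for the current location."""
    if location in ENTERPRISE_LOCATIONS:
        # On the enterprise network the gateway does the heavy analysis.
        return {"enforced_at": "gateway", "technique": "database fingerprinting"}
    # Off the network the agent falls back to a cheap local rule.
    return {"enforced_at": "endpoint", "technique": "regex"}


def endpoint_violation(location: str, content: str) -> bool:
    """Return True if the local agent should flag this content."""
    policy = endpoint_policy(location)
    if policy["enforced_at"] == "gateway":
        return False  # the agent defers to the gateway in this sketch
    return bool(SSN_PATTERN.search(content))
```

The regex will flag anything shaped like an SSN, which is where the extra false positives come from, but it costs almost nothing to run on a laptop.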

Next up: more on the technology, followed by best practices for deployment and implementation.


Monday, June 30, 2008

Best Practices For Endpoint DLP: Part 1

By Rich

As the first analyst to ever cover Data Loss Prevention, I’ve had a bit of a tumultuous relationship with endpoint DLP. Early on I tended to exclude endpoint only solutions because they were more limited in functionality, and couldn’t help at all with preventing data loss from unmanaged systems. But even then I always said that, eventually, endpoint DLP would be a critical component of any DLP solution. When we’re looking at a problem like data loss, no individual point solution will give us everything we need.

Over the next few posts we’re going to dig into endpoint DLP. I’ll start by discussing how I define it, and why I don’t generally recommend stand-alone endpoint DLP. I’ll talk about key features to look for, then focus on best practices for implementation.

It won’t come as any surprise that these posts are building up into another one of my whitepapers. This is about as transparent a research process as I can think of. And speaking of transparency, like most of my other papers this one is sponsored, but the content is completely objective (sponsors can suggest a topic, if it’s objective, but they don’t have input on the content).


As always, we need to start with our definition for DLP/CMP:

“Products that, based on central policies, identify, monitor, and protect data at rest, in motion, and in use through deep content analysis”.

Endpoint DLP helps manage all three parts of this problem. The first is protecting data at rest when it’s on the endpoint; or what we call content discovery (which I wrote up in great detail). Our primary goal is keeping track of sensitive data as it proliferates out to laptops, desktops, and even portable media. The second part, and the most difficult problem in DLP, is protecting data in use. This is a catch all term we use to describe DLP monitoring and protection of content as it’s used on a desktop- cut and paste, moving data in and out of applications, and even tying in with encryption and enterprise Document Rights Management (DRM). Finally, endpoint DLP provides data in motion protection for systems outside the purview of network DLP- such as a laptop out in the field.

Endpoint DLP is a little difficult to discuss since it’s one of the fastest changing areas in a rapidly evolving space. I don’t believe any single product has every little piece of functionality I’m going to talk about, so (at least where functionality is concerned) this series will lay out all the recommended options which you can then prioritize to meet your own needs.

Endpoint DLP Drivers

In the beginning of the DLP market we nearly always recommended organizations start with network DLP. A network tool allows you to protect both managed and unmanaged systems (like contractor laptops), and is typically easier to deploy in an enterprise (since you don’t have to muck with every desktop and server). It also has advantages in terms of the number and types of content protection policies you can deploy, how it integrates with email for workflow, and the scope of channels covered. During the DLP market’s first few years, it was hard to even find a content-aware endpoint agent.

But customer demand for endpoint DLP quickly grew thanks to two major needs- content discovery on the endpoint, and the ability to prevent loss through USB storage devices. We continue to see basic USB blocking tools with absolutely no content awareness brand themselves as DLP. The first batches of endpoint DLP tools focused on exactly these problems- discovery and content-aware portable media/USB device control.

The next major driver for endpoint DLP is supporting network policies when a system is outside the corporate gateway. We all live in an increasingly mobile workforce where we need to support consistent policies no matter where someone is physically located, nor how they connect to the Internet.

Finally, we see some demand for deeper integration of DLP with how a user interacts with their system. In part, this is to support more intensive policies to reduce malicious loss of data. You might, for example, disallow certain content from moving into certain applications, like encryption. Some of these same kinds of hooks are used to limit cut/paste, print screen, and fax, or to enable more advanced security like automatic encryption or application of DRM rights.

The Full Suite Advantage

As we’ve already hinted, there are some limitations to endpoint only DLP solutions. The first is that they only protect managed systems where you can deploy an agent. If you’re worried about contractors on your network or you want protection in case someone tries to use a server to send data outside the walls, you’re out of luck. Also, because some content analysis policies are processor and memory intensive, it is problematic to get them running on resource-constrained endpoints. Finally, there are many discovery situations where you don’t want to deploy a local endpoint agent for your content analysis- e.g. when performing discovery on a major SAN.

Thus my bias towards full-suite solutions. Network DLP reduces losses on the enterprise network from both managed and unmanaged systems, and servers and workstations. Content discovery finds and protects stored data throughout the enterprise, while endpoint DLP protects systems that leave the network, and reduces risks across vectors that circumvent the network. It’s the combination of all these layers that provides the best overall risk reduction. All of this is managed through a single policy, workflow, and administration server, rather than forcing you to create different policies for different channels and products, with different capabilities, workflow, and management.

In our next post we’ll discuss the technology and major features to look for, followed by posts on best practices for implementation.


Sunday, June 01, 2008

Webcast June 4th: DLP Content Discovery

By Rich

Yes, it’s one of those weeks, with two webcasts and a conference (SANS Pen Testing and Application Security in Vegas).

For this one we’ll be talking about DLP content discovery for Vontu/Symantec. It’s not just me; there will be a customer case study (yes, an honest to goodness security person willing to talk about what they’ve done). Here’s the official description, and you can register here:

Where Is Your Confidential Data and How Do You Protect It? A Real Life Customer Success

Do you know where your confidential data is stored and how to protect it? Industry analysts predict that data discovery will be the single fastest-growing segment of the Data Loss Prevention (DLP) market in 2008 and beyond. In this webcast, you will get the opportunity to hear firsthand how Sharp HealthCare implemented a DLP solution to secure their sensitive customer data stored across the organization, and what business results they are seeing today. Join Rich Mogull, founder of Securosis LLC and former Gartner analyst, and Starla Rivers, Technical Security Architect at Sharp, as they address how to easily deploy DLP and quickly realize the solution benefits.


Monday, May 19, 2008

New Whitepaper: Best Practices For DLP Content Discovery

By Rich

One of the most under-appreciated aspects of DLP solutions is content discovery- scanning stored data to identify sensitive content, classify information, and (in some cases) even protect the data. Major DLP tools have long evolved past just scanning network traffic for credit card and Social Security Numbers.

Today I’m releasing a new whitepaper on the topic: DLP Content Discovery: Best Practices for Stored Data Discovery and Protection.

The paper covers features, best practices for deployment, and example use cases to give you an idea of how it works.

It’s my usual independent content, much of which started here as blog posts. Thanks to Symantec (Vontu) for sponsoring and Chris Pepper for editing.