Best Practices For DLP Content Discovery: Part 2

Someone call the Guinness records people- I’m actually posting the next part of this series when I said I would!

Okay, maybe there’s a deadline or something, but still…

In part 1 we discussed the value of DLP content discovery, defined it a little bit, and listed a few use cases to demonstrate its value. Today we’re going to delve into the technology and a few major features you should look for.

First I want to follow up on something from the last post. I reached out to one of the DLP vendors I work with, and they said they are seeing around 60% of their clients purchase discovery in their initial DLP deployment. Anecdotal conversations with other vendors/clients support this assertion. Now we don’t know exactly how soon they roll it out, but my experience supports the position that somewhere over 50% of clients roll out some form of discovery within the first 12-18 months of their DLP deployment.

Now on to the…

Technology

Let’s start with the definition of content discovery. It’s merely the definition of DLP/CMP, but excluding the in use and in motion components:

“Products that, based on central policies, identify, monitor, and protect data at rest through deep content analysis”.

As with the rest of DLP, the key distinguishing characteristic (as opposed to other data at rest tools like content classification and e-discovery) is deep content analysis based on central policies. While covering all content analysis techniques is beyond the scope of this post, examples include partial document matching, database fingerprinting (or exact data matching), rules-based, conceptual, statistical, pre-defined categories (like PCI compliance), and combinations of the above. They offer far deeper analysis than just simple keyword and regular expression matching. Ideally, DLP content discovery should also offer preventative controls, not just policy alerts on violations. How does this work?

Architecture

At the heart is the central policy server; the same system/device that manages the rest of your DLP deployment. The three key features of the central management server are policy creation, deployment management/administration, and incident handling/workflow. In large deployments you may have multiple central servers, but they all interconnect in a hierarchical deployment.

Data at rest is analyzed using one of four techniques/components:

  1. Remote scanning: either the central policy server or a dedicated scanning server connects with storage repositories/hosts via network shares or other administrative access. Files are then scanned for content violations. Connections are often made using administrative credentials, and any content transferred between the two should be encrypted, but this may require reconfiguration of the storage repository and isn’t always possible. Most tools allow bandwidth throttling to limit network impact, and placing scanning servers closer to the storage also increases speed and limits impact. It supports scanning nearly any storage repository, but even with optimization performance will be limited due to reliance on networking.
  2. Server agent: a thin agent is installed on the server and scans content locally. Agents can be tuned to limit performance impact, and results are sent securely to the central management server. While scanning performance is higher than remote scanning, it requires platform support and local software installation.
  3. Endpoint agent: while you can scan endpoints/workstations remotely using administrative file shares, this will rapidly eat up network bandwidth. DLP solutions increasingly include endpoint agents with local discovery capabilities. These agents normally include other DLP functions, such as USB monitoring/blocking.
  4. Application integration: direct integration, often using an agent, with document management, content management, or other storage-oriented applications. This integration not only supports visibility into managed content, but allows the discovery tool to understand local context and possibly enforce actions within the system.

A good content discovery tool will understand file context, not just content. For example, the tool can analyze access controls on the files and, using its directory integration, understand which users and groups have what access. Thus the accounting department can access corporate financials, but any files with that content allowing all-user access are identified for remediation. Engineering teams can see engineering plans, but the access controls are automatically updated to restrict access by the accounting team if engineering content shows up in the wrong repository.
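To make the content-plus-context idea concrete, here is a minimal Python sketch. It assumes a POSIX file system, where the "other" read bit stands in for all-user access, and a single hypothetical keyword rule (FINANCIAL_RULE) standing in for a real content policy. The function names are illustrative only; a real product would use far deeper content analysis and directory integration.

```python
import os
import re
import stat

# Hypothetical content rule: a phrase marking corporate financial data.
FINANCIAL_RULE = re.compile(r"quarterly earnings", re.IGNORECASE)


def is_world_readable(path):
    """Return True if the POSIX 'other' read bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def find_overexposed(root, rule=FINANCIAL_RULE):
    """Walk root and return files that match the content rule AND
    allow all-user read access, sorted by path."""
    flagged = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="ignore") as handle:
                text = handle.read()
            # Content alone is not a violation; content plus loose
            # permissions is what gets flagged for remediation.
            if rule.search(text) and is_world_readable(path):
                flagged.append(path)
    return flagged and sorted(flagged)
```

Calling find_overexposed on a share returns only the files where sensitive content and overly broad access coincide, which is the distinction between file content and file context described above.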

From an architectural perspective you’ll want to look for solutions that support multiple options, with performance that meets your requirements.

That’s it for today. Tomorrow we’ll review enforcement options (which we’ve hinted at), management, workflow, and reporting. I’m not going to repeat everything from the big DLP whitepaper, but concentrate on aspects important to protecting data at rest.

Understanding and Selecting a Database Activity Monitoring Solution: Part 5, Advanced Features

We’re going to be finishing the series off this week, in large part so I can get it compiled together into a whitepaper with SANS, sponsored by Imperva, Guardium, and Sentrigo, before the big RSA show. I won’t be sleeping much this week as I compile and re-write the posts, add additional content that didn’t make it into the blog, create some images, and toss it back and forth with my editor. What? You didn’t think all I did was cut and paste this stuff, did you?

For review, you can look up our previous entries here:

Part 1
Part 2
Part 3
Part 4

What do I mean by advanced features? In our other posts we focused on the core solution set, but most of the products have quite a bit more to offer. There’s no way we can cover everything, and I don’t intend this to be an advertisement for any particular solution set, but there are a few major features we see appearing in more than one product. I’m going to highlight a few I think are particularly interesting and worthy of consideration in the selection process.

Content Discovery

As much as we like to think we know our databases, the reality is we really don’t always know what’s inside them. Many of our systems grew organically over the years, some are managed by external consultants or application vendors, and others find sensitive data stored in unusual locations. To counter these problems, some database activity monitoring solutions are adding content discovery features similar to DLP. These tools allow you to set content-based policies to identify the use of things like credit card numbers in the database, even if they aren’t located where you expect. Discovery tools crawl through registered databases, looking for sensitive content based on policies, and generate alerts for sensitive content in new locations. For example, you could create a policy to identify any credit card number in any database, and generate a report for PCI compliance. The tools can run on a scheduled basis so you can perform ongoing assessments, rather than combing through everything by hand every time an auditor comes knocking.
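As a rough illustration of such a crawl, the Python sketch below walks every table and column of a SQLite database and reports the columns that contain something that looks like a credit card number. The regular expression, the Luhn checksum filter, and the function names are assumptions made for this example; commercial tools use their own policies and analysis techniques, and SQLite merely stands in for a registered database.

```python
import re
import sqlite3

# 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0


def discover_card_columns(conn):
    """Return sorted (table, column) pairs holding a likely card number."""
    hits = set()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        columns = [row[1] for row in
                   conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            for (value,) in conn.execute(f'SELECT "{column}" FROM "{table}"'):
                text = "" if value is None else str(value)
                if any(luhn_valid(match.group())
                       for match in CARD_CANDIDATE.finditer(text)):
                    hits.add((table, column))
                    break  # one hit is enough to report the column
    return sorted(hits)
```

Run on a schedule, the output of discover_card_columns can be compared against the previous run so that only new locations generate alerts.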

Some tools allow you to then build policies based on the discovery results. Instead of manually identifying every field with Social Security Numbers and building a different protection policy for each, you create a single policy that generates an alert every time an administrator runs a SELECT query on any field which matches the SSN rule. As the system grows and changes over time, the discovery component identifies the fields matching the protected content, and automatically applies the policy.

We’re also starting to see DAM tools that monitor live queries for sensitive data. Policies are then freed from being tied to specific fields, and can generate alerts or perform enforcement actions based on the result set. For example, a policy could generate an alert any time a query result contains a credit card number, no matter what columns were referenced in the query.

Connection Pooled User Identification

One of the more difficult problems we face in database security is the sometimes arbitrary distinction between databases and applications. Rather than looking at them as a single system, we break out database and application design and administration, and try to apply controls in each without understanding the state of the other. This is readily apparent in the connection pooling problem. Connection pooling is a technique where we connect large applications to large databases using a single shared connection running under a single database user account. Unless the application was carefully designed, all queries come from that single user account (e.g., APP_USR) and we have no way, at the database level, to identify the user performing the transaction. This creates a level of abstraction which makes it difficult, if not impossible, to monitor specific user activity and apply user policies at the database level.

An advanced feature of some database activity monitoring solutions allows them to track and correlate individual query activity back to the application user. This typically involves integration or monitoring at the application level. You now know which database transactions were performed by which application users, which is extremely valuable for both audit and security reasons.
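One way to picture the correlation is the Python sketch below. It assumes the application logs which pooled connection served each user request and for how long, and that the monitor sees a connection identifier and timestamp for each query. The AppRequest and DbQuery records and the attribute_query function are hypothetical; actual products implement this in their own ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppRequest:
    app_user: str        # the real end user in the application
    connection_id: int   # pooled connection that served the request
    start: float         # request start time (seconds)
    end: float           # request end time (seconds)


@dataclass
class DbQuery:
    connection_id: int
    timestamp: float
    sql: str


def attribute_query(query: DbQuery,
                    requests: List[AppRequest]) -> Optional[str]:
    """Return the application user whose request held the pooled
    connection when the query ran, or None if nothing matches."""
    for request in requests:
        if (request.connection_id == query.connection_id
                and request.start <= query.timestamp <= request.end):
            return request.app_user
    return None
```

Every query still arrives as the shared account (e.g., APP_USR), but the lookup ties it back to an individual, and a None result is itself worth investigating since it means activity on the pooled account that no application request explains.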

Blocking and Enforcement

Today, most users just deploy database activity monitoring to audit and alert on user activity, but many of the tools are perfectly capable of enforcing preventative policies. Enforcement happens at either the network layer or on the database server itself, depending on the product architecture.

Enforcement policies tend to fall into two categories. The first, similar to many of the monitoring policies we’ve described, are focused on user behaviors like viewing or changing sensitive records. Rather than just alerting after a DBA pulls every account number out of the system, you can block the query. The second is focused on database exploits; similar to an intrusion prevention solution, the system blocks queries matching signatures for known attacks like SQL injection.

The nature and level of blocking will vary based on the architecture of the DAM tool. Integrated agent solutions may offer features like transaction rollback, while network tools block the traffic from hitting the DBMS in the first place. Digging into the specific architectures and benefits is beyond the scope of this post.

Application Activity Monitoring

Databases rarely exist in a vacuum; more often than not they are an extension of applications, yet we tend to look at them as isolated components. Application Activity Monitoring adds the ability to watch application activity, not just the database queries that result. This information can be correlated between the application and the database to gain a clear picture of just how data is being used at both levels, and identify anomalies which may indicate a security or compliance failure.

Since application design and platforms vary even more than databases, products can’t cover every custom application in your environment. We see vendors focusing on major custom application platforms, like SAP and Oracle, and monitoring web-based application activity.

Pre-Configured Application Policies

With or without application monitoring, some DAM solutions come with pre-configured policies for common applications (e.g., PeopleSoft). Although you’ll need to tune them to account for any application customization, they can jump-start the policy building process and save you from manually building all your compliance and security policies.

Pre-Configured Compliance Policies

Although no tool will make you compliant out of the box, pre-configured compliance policies for common platforms give you a head start on the process, especially when you don’t know where to start. Most vendors hire or partner with auditors to help them build their compliance policy and reporting packages for common regulations like SOX and PCI-DSS that frequently impact databases.

Change Management

Although many organizations have rigorous change management policies for the database platform and underlying system, far fewer enforce change management at the query level (which is invisible to traditional change management tools). An advanced feature of certain DAM solutions integrates with change management tools for closed-loop tracking of query-level changes. The requested change is approved in the change management system and a ticket number issued. The DBA enters that ticket number as part of their session, and all database changes (even to individual field updates) are recorded and correlated back to the original change ticket.
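The ticket-to-change correlation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TicketedSession class, the change_audit table, and the ticket format are invented for the example, and a real DAM tool records activity from outside the DBA’s session rather than trusting a wrapper the DBA controls.

```python
import sqlite3


class TicketedSession:
    """Wrap a connection so each statement is logged with a change ticket."""

    def __init__(self, conn, dba, ticket):
        if not ticket:
            raise ValueError("a change ticket number is required")
        self.conn = conn
        self.dba = dba
        self.ticket = ticket
        # Hypothetical audit table; a real tool stores this off the host.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS change_audit "
            "(ticket TEXT, dba TEXT, statement TEXT)")

    def execute(self, sql, params=()):
        """Run the statement, then record it against the ticket."""
        cursor = self.conn.execute(sql, params)
        self.conn.execute(
            "INSERT INTO change_audit VALUES (?, ?, ?)",
            (self.ticket, self.dba, sql))
        return cursor
```

An auditor can then select from change_audit by ticket number and see exactly which statements were run under that approved change.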

Vulnerability Assessment

Vulnerability assessment in databases is a topic worthy of its own series of posts (wink wink), but is sometimes offered as a feature (usually at additional cost) of a DAM solution. Database vulnerability assessment tools look deeper than patch levels, down to specific configurations and even an analysis of user entitlements. Some tools integrate the results of VA to DAM policies to generate alerts or block activity that’s suspected of targeting an identified vulnerability. This is a huge area on its own, and is beyond the scope of this post.

This list is far from exhaustive; I’ve tried to highlight some of the more common advanced features you’ll see through the selection process. I’m sure I’ve missed a few and I encourage those of you on the vendor side to add them in the comments, just avoid too much marketing fluff or I’ll filter the comment. If they’re good (and clearly a big feature starting to appear in multiple products) they might even make it into the white paper.

This post concludes our coverage of DAM features. By now you should have a good idea of how the technology works and the different options available to you. In the final post (probably tomorrow, considering my deadlines) we’ll dig into how to run a successful product selection process.


Database Security Rule: Use System Generated Primary Keys

I was reading an article by RSnake this morning on the problems of using a username as a primary key, and it reminded me of something I’ve been meaning to write about for a while.

As a former database designer and current security geek I’m often stunned by how often designers/developers choose really bad primary keys for their databases. Even back in my developer days, when security for me meant taking down drunks at concerts, I knew better than to use something like a username, credit card number, or Social Security Number as a primary key. To be honest, it had nothing to do with security and everything to do with good database design.

Back when I was starting my IT career I was fortunate to work closely with the professor at the University of Colorado who taught database design. One of his cardinal rules was to, wherever possible, use system generated keys as the primary key. Randomly generated keys, as opposed to sequential keys that could “leak” information. Our designs should also strive to conform to at least Fourth Normal Form, although full normalization wasn’t always possible for practical reasons.

We never used Social Security Numbers as primary keys because they are neither always unique (there are mistakes in the system) nor available for all potential users (foreign students were assigned fakes). Credit card numbers are bad because they are not a unique identifier for an individual- I have multiple credit card numbers, any of which can identify me, all of which are temporal (change over time). I have no idea why retailers so often use credit card numbers as all or part of a primary key, since I may use multiple cards even on the same day.

Usernames more closely conform to a viable primary key from a pure, non-security design perspective, but I never liked them for some of the reasons RSnake cites. I always feel like usernames are temporal, even when they aren’t, and while I haven’t tested it, I think the performance of numeric keys is probably higher.

Thus my first rule in picking a primary key, from both a pure design and security perspective, is to use system generated random keys (preferably not sequential auto increment).

For other fields you want unique, like username, SSN, or credit card number, just set a unique index on the field. Worst case, you can even use this to correlate across unrelated systems assuming the rest of the attributes line up (we used to do this using the SSN, before we all learned using them at all is bad).
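Here is a minimal sketch of that design in Python with SQLite. The table and column names are made up for illustration, and a random UUID is only one way to generate a non-sequential key; a random integer would serve the same purpose if you prefer numeric keys.

```python
import sqlite3
import uuid


def create_schema(conn):
    """Random system-generated primary key, unique index on username."""
    conn.execute(
        "CREATE TABLE users ("
        " user_id TEXT PRIMARY KEY,"
        " username TEXT NOT NULL)")
    conn.execute(
        "CREATE UNIQUE INDEX idx_users_username ON users (username)")


def add_user(conn, username):
    """Insert a user under a random key and return that key."""
    user_id = str(uuid.uuid4())  # random, so it leaks no ordering
    conn.execute(
        "INSERT INTO users (user_id, username) VALUES (?, ?)",
        (user_id, username))
    return user_id


def rename_user(conn, user_id, new_username):
    """Change the username without touching the primary key."""
    conn.execute(
        "UPDATE users SET username = ? WHERE user_id = ?",
        (new_username, user_id))
```

The unique index still rejects a duplicate username with an integrity error, while rename_user shows the payoff: the username can change without rewriting the key that other tables reference.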

One criticism of RSnake’s examples of account hijacking is that even with a pure primary key, if you hard delete a username someone can still impersonate the account, depending on the design of the system.


The Future Of Information-Centric Security: From Data Loss Prevention to Content Monitoring and Protection, Part 1

Over the past couple of weeks Mike Rothman has been posting his Security Incites, a series of predictions for 2008. Prediction number 9 was titled, “Get the Jumper Cables for DLP”, and I, of course, have to disagree with at least some of it.

There are three reasons I spend so much time talking about DLP here on the blog. First, I think it’s one of the least understood security technologies on the market, yet one with high value when used properly. There’s a lot of confusion out there, and I think I provide more value by clearing that up than by talking about more established technologies. Second, DLP was one of the first technologies I covered as an analyst, long before there was an established market. I have something like 6 years invested in it, which is longer than most of the people working at most of the vendors. Can’t let that go to waste. Finally, it’s because I do believe that what we now call DLP will form the core of a significant chunk of our information-centric (data) security moving forward.

Rather than pick through Mike’s prediction I’m going to take this opportunity to start laying out the evolution of DLP so you can make your own decisions as to where we’re headed. Since I’m still recovering from my shoulder surgery and only running at about 60-70%, this series will consist of a bunch of shorter posts rather than my usual long-winded Hoffesque diatribes.

Sidebar: Why DLP is a bad name: When I first started covering this market we had a hard time deciding what to call it. I even once had a conference call with the two leading competitors to try and hash out a term. I picked Content Monitoring and Filtering, which I now use to describe the second phase of the technology. While it wasn’t sexy, I felt that the tools offered a lot more than just “data leak prevention”, and that such a generic term could be easily co-opted by other data protection technologies, like encryption. For once I was right- everything from USB port blockers to digital rights management calls itself DLP these days, confusing customers, while the “DLP” solutions have added discovery, classification, and other capabilities well beyond mere leak prevention.

A Three Phase Evolution

I believe we’ll see three phases in the evolution of this technology over the next 5-7 years. While the technology itself will evolve more quickly than that, the realities of the market, new technology adoption, and deployment practicalities mean we won’t see complete, mainstream deployments until the latter part of that timeframe.

Don’t read that the wrong way- most, probably all of you will deploy much of DLP/CMP over the next 5 years, but only the early mainstream will achieve the full vision I’m describing by then. At that point your organization will be more of a limiting factor than the technology. If you want it, it will be there.

The three phases we’re seeing are:

  1. Data Loss Prevention: Although most people call today’s solutions DLP, the leading solutions have all moved well beyond this phase of the market. I still have to use the term so people know what I’m talking about, but the top solutions are already in the next phase. DLP solutions are characterized by protecting predominantly data in motion (including USB transfers). These are true “leak/loss prevention” only solutions. Content analysis techniques tend to be more basic, sometimes limited to just regular expressions/rules combined with a little context.
  2. Content Monitoring and Filtering: In this phase we see more robust solutions; with protection for data in motion, at rest, and in use. The tools are more widespread, covering all major channels from network, to endpoints and storage. Content analysis techniques are more advanced, with (at a minimum) regular expressions/rules, partial document matching, and database fingerprinting (exact data matching).
  3. Content Monitoring and Protection: In this final phase (okay, it’s just as far out as I’m comfortable predicting) the technology becomes ubiquitous in user productivity applications and communications. Enterprise DRM is integrated and content is classified at the point of creation. Advanced content analysis techniques become more effective, better allowing us to classify more complex data, taking into account business context. Data is protected through its lifecycle.

Here’s an easier way to think about it: DLP is about preventing basic leaks of easy to identify sensitive content. With CMF, we start protecting a wider range of content, and putting controls in place before it’s already trying to fly out the door. With CMP, we have cradle to grave content classification and protection.

This is just a top level overview. Over the next several posts I’ll detail more of the specifics of each phase. I consider this complementary to my series and paper on Understanding and Selecting a DLP Solution. That series focused on helping you pick and deploy a tool today, while this series will help you navigate the waters as the tools and market evolve and you make upgrade and deployment decisions.

Hmm… I smell another paper coming…


Understanding and Selecting a Database Activity Monitoring Solution: Part 4, Alerts, Workflow, and Reporting

It seems that every time I write the next part of this multipart series I find myself apologizing for taking too long between posts. I swear I have a good excuse this time- with the whole doctor sticking cameras into my shoulder, shaving out bits, cutting tendons and tying them to new places, putting in plastic anchors, and sewing torn parts of muscles together thing. I’m 11 days into my recovery and while the days are fine, despite learning not to use my arm for the next three months, the nights… let’s just say I fear the nights. I think I’m getting closer to figuring out the right combination of drugs, body position, and pillows that will let me get a little closer to some functional sleep.

But business is good, I’m gaining a little more productivity every day, and… enough about me.

In today’s post we’re going to delve deeper into Database Activity Monitoring. We’re going to talk about alerting, workflow, and reporting.

In my previous post we discussed central management, including policy creation. One of the key advantages of DAM over passive auditing and logging solutions is the ability to define policies for active alerts and manage remediation. While policies are mostly deployed in a passive mode (alerting only) some products also support active blocking, which we will cover in a future post.

I’m really not a fan of relying on passive auditing for security; it’s often important, but with the tools we have today we can generate immediate alerts allowing us to contain security incidents before they spread, or even stop a multi-stage attack before completion. This is one key characteristic separating proactive security tools from simple monitoring/logging tools.

Alerts

Your DAM tools should support both active alerting and an incident handling queue, similar to DLP. These alerts take a few different forms, from email integration, to self-contained events, to communications with outside security tools (like SIEM) using anything from SNMP to syslog to proprietary integration.

Policies should support granular alerting based on conditions, such as thresholds. For example, detection of a single errant query might trigger a low level incident within the included incident handling system, while an incident involving an administrator or high count of credit cards is emailed to a security admin and dropped into the SIEM tool as a high alert.

Not to say you should rely on a SIEM or other external tool to manage your incidents; those tools will never contain the full context and investigative abilities of the dedicated DAM workflow. External alerts play a valuable role in escalating incidents and correlating with external factors, but the primary handling will tend to be managed within the DAM tool itself. Databases are complex beasts, and full understanding of what’s going on internally requires a dedicated tool.

Policy based alerts tend to fall into three interrelated categories which often overlap:

  1. User activity: Incidents when a user takes an action that violates policy. It could be a user running a query on sensitive data, updating an existing financial transaction outside of an application, or an application running a query never seen before.
  2. Attack activity/signatures: Some DAM solutions include pre-built detection for certain attack activity. This may be linked to vulnerability analysis, signature based, or heuristic (I’m sure some vendors will chime in with even more options).
  3. System and administrative activity: Incidents involving administrative or internal system activity. E.g. new account creation, privilege escalation, DML/DDL changes, system updates, stored procedures, or other configuration changes. Think of these alerts as being focused on SQL (and non-SQL) outside of simple SELECT, INSERT, UPDATE, DELETE queries.

Workflow

Once an incident is created and any external alerts sent out, it should appear in an incident handling queue for management. This is similar to what we see in DLP and many other security tools, but optimized for database activity.

The queue should be visually well-designed to make critical information easier to find, and allow customization for different work styles and interests. Unlike DLP, it’s less important that the queue appeal to non-technical handlers since it’s far less likely that anyone without database and security knowledge will work directly within the system. For DAM, we tend to rely more on reports for the auditors, risk managers, and other non-security types.

Incidents should be easy to sort and include color coding for sensitivity and criticality. When you click on an incident, it should let you drill down into more details to assist the investigative process. Handlers should be able to assign, share, and route incidents to different users within the system. I’m a big fan of having a drop down field to change incident status right on the incident row. The system should also support role based administration, allowing you to assign specific handlers/administrators based on the policy violated, database affected, or other factors.

The basic workflow must allow for quick sorting, analysis, and investigation of incidents. Once an incident is detected, the handler can close it, add supporting investigative material, change the priority, assign it to someone else, or escalate it. To support investigations you should be able to correlate the current incident with other activity in that database by that user, violations of that policy across different systems, and other factors to help determine what’s going on. Since incident handlers may come from either a database or a security background, look for a tool that appeals to both audiences and supplies each with the information they need to understand the incidents and investigate appropriately.

My description has so far focused on database-only incidents, but some systems are now expanding into platform activity on the database host, or application activity.

Reports

As with nearly any security tool you’ll want flexible reporting options, but pay particular attention to compliance and auditing reports. Aside from all the security advantages we’ve been talking about, many organizations initially deploy DAM to meet their database audit and compliance requirements. Pre-built report templates can save valuable time, and some vendors have worked with auditors from the major firms to help design their reports for specific regulations, like SOX.

Reports should fall into at least three broad categories: compliance and non-technical reports, security reports (incidents), and general technical reports.

That’s about it for alerts, workflow, and reporting. These features are pretty straightforward and similar to other security tools, yet dedicated specifically for databases. In our next post we’ll start talking about advanced features, like connection pooling, blocking, and change management.


Evaluating And Protecting Yourself From The Cold-Boot Encryption Attack

Even in my drug-addled state last week it was hard to miss the cold boot encryption attack released by Ed Felten and the Princeton Center for Information Technology Policy. This is some seriously impressive work with major implications, but despite all the articles I’ve seen there has been little information on how to evaluate and mitigate your personal or organizational risk.

That’s where I come in.

I’m not going to assume you know a lot about file and media encryption, so we’ll start with an explanation of how, and why, the attack works. Then we’ll evaluate the risk and discuss mitigation strategies. I’ll close with some suggestions for vendors to close out this vulnerability. And yes, this works on a Mac with FileVault.

What is the cold boot attack and how does it work?

All encryption systems need access to a key to encrypt and decrypt data. It doesn’t matter what you’re encrypting- a hard drive, file, database, or whatever, you need a key. When encrypting and decrypting data, because of how computer systems are designed, the key always passes through memory at some point. For smaller content this is a transient process and the key is only in memory for a short time (assuming the software is designed properly), but when you need constant access to data the key is kept in memory. This is nearly ubiquitous for full-disk encryption or file encryption systems that leave files open for read/write operations. It’s not something we worried about, because when you turn a computer off the RAM (memory for the non geeks) loses power and anything stored is lost. Thus we would password protect our encrypted systems so that even if they wake up from sleep mode, an attacker would have to reboot the system unless they had the key, confident this process would erase the key from memory and keep the data secure.

What the Princeton researchers demonstrated is that modern RAM doesn’t degrade immediately after power is removed. The contents of memory can persist from seconds to minutes, and that time extends when cold is applied to the memory. An easy way to do this is to just use a can of dust off spray.

That’s the first part of the attack- keeping the contents in memory after the system is shut down.

For the second part of the attack they use a special tool, which they haven’t made public, to recover memory contents from RAM. In the demo this tool is on a bootable USB drive, so merely rebooting the computer from this USB stick, ignoring the host operating system of the computer, allows them to scan memory and recover the encryption key. Additional work allowed them to recover a full key even if a few bits were lost as the memory degraded.

To execute the attack, the attacker opens the computer, sprays the memory with an upside-down can of dust off to cool it, then reboots off the USB device with their software for key recovery on it, thus recovering the keys and gaining access to the data.

If you use a boot password or something similar they perform the same attack, but remove the memory and place it into a different system for key recovery. Thanks to the cold spray you have more than enough time to pull this off.

Evaluating the Risk

There are no public tools for this attack but it’s only a matter of time. Your immediate risk is low, but don’t be surprised if tools appear reasonably soon. This is a serious vulnerability, with a probability of attack that only increases over time.

In other words, don’t panic, but keep your eyes open. Once a public tool appears it’s time to be more concerned.

The researchers outline how most current protection techniques only partially, if at all, mitigate this flaw. Since memory can be removed, BIOS locks and other restrictions are ineffective.

You are only at risk when your computer is powered on or in sleep mode and you lose physical control of it. Powering off your system begins the memory degradation process and you are safe within a few minutes.

Reducing Your Risk

The most effective method is to power off your system completely (not sleep or hibernate mode) when it’s at risk of physical loss. This is inconvenient, but I’m going to start powering off when I’m in higher risk areas (like airport security) and can’t maintain physical control of the system.

Which brings us to recommendation number 2- don’t let someone steal your computer. I personally maintain physical control over my system nearly all the time when it’s out of my home (and I have a pretty good security system there). Hotels are the greatest risk, and I do tend to power off when I’m out of the room. You sales guys should start getting into the habit of not using sleep mode when you leave your computer locked in a rental car. At least until the encryption and laptop vendors come up with alternative protections.

For those of you with very sensitive information, combine file and folder encryption for sensitive files with your whole disk encryption. A few vendors offer this (feel free to brag in the comments guys). Just close those sensitive files or images before entering sleep mode, and make sure they are password protected and not linked to your normal login credentials.

Also consider an encryption system that supports storing the keys on a smart card (not in memory). I don’t believe there are many practical options today, but expect to see them crop up thanks to this paper.

Finally, ask your vendor their plans to manage this risk. Today it’s not a big deal, but we don’t know if it will be 2 weeks, 2 months, or 2 years before public tools appear (and it’s safe to assume some governments have this by now; or more accurately, it would be unsafe and foolish to assume any government does not have this capability by now).

Thus, your overall risk is currently low but growing. You can reduce that risk through good habits and some additional software.

What Vendors Can Do

I don’t know to what degree this technique works on commercial encryption products, but vendors should evaluate the risk to their products and keep customers updated. Saying it isn’t a problem or the risk is low isn’t the right answer- you’ll lose customers that way. If you are working on a solution, let them know since the risk really is low for now.

I suspect we’ll see a couple of different approaches. Over time, this is something that will migrate into hardware- even just a small bit of RAM soldered to the board, probably integrated with some future, mythical, TPM. On the software side I have to believe there are ways we can reduce the risk- for example, flushing the active key from memory during sleep (while turning off hibernate, which writes memory to disk and is always bad anyway) and transitioning to a password protected temp key to access the primary key.
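To make the software idea a little more concrete, here is a minimal sketch of "flush the key on sleep, get it back with a password on wake". It is my own simplification, not how any shipping product works: it password-protects the primary key directly rather than using a separate temp key, the XOR step is only a stand-in for a real key-wrap cipher, and the function names (`wrap_key`, `unwrap_key`, `wipe`) are hypothetical. Also note that Python can't guarantee no other copies of the key linger in memory; a real implementation would live much lower in the stack.

```python
import hashlib
import os

def derive_pad(password, salt, length):
    """Stretch the password into `length` bytes with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=length)

def wrap_key(primary_key, password):
    """Before sleep: return (salt, wrapped_key).
    XOR with a password-derived pad is a stand-in for a real key-wrap cipher."""
    salt = os.urandom(16)
    pad = derive_pad(password, salt, len(primary_key))
    wrapped = bytes(a ^ b for a, b in zip(primary_key, pad))
    return salt, wrapped

def wipe(buf):
    """Overwrite a mutable key buffer in place so the cleartext key is gone."""
    for i in range(len(buf)):
        buf[i] = 0

def unwrap_key(salt, wrapped, password):
    """After wake: rebuild the primary key from the user's password."""
    pad = derive_pad(password, salt, len(wrapped))
    return bytearray(a ^ b for a, b in zip(wrapped, pad))
```

With this shape, what sits in RAM during sleep is only the salt and the wrapped key, so a cold boot of the memory yields nothing usable without the password. The sketch has no integrity check, so a wrong password silently produces a wrong key.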

Hardware tokens/smart cards are another great option, assuming we can control active access to the key and you remember to unplug it. There are a lot of really smart engineers out there who will probably come up with fixes, at least for third party encryption tools, before this attack becomes widespread.

Conclusion

This is an impressive attack that we all need to take seriously. You are at risk if you lose physical control of an encrypted system that is either powered on or in sleep or hibernate mode.

Turning off your system when it’s at greatest risk of loss or theft is a very effective mitigation, but it will be difficult to train average users to stop using sleep mode due to the convenience.

Using file encryption for sensitive content in combination with whole disk may also reduce the risk when done properly.

Talk to your vendor, and make sure they are REALLY not susceptible or have a roadmap to eliminate this method of attack. If they offer the protection, understand and implement the necessary configuration profile, which may not be the default.

Vendors: talk to your customers and get working on the problem if you are vulnerable. Recognize that hardware solutions are always longer term and you should really see if there is a way to offer protection within the software.

Me? I’m not too worried, but I have extremely good habits around the physical control of my laptop, and will now shut down more under certain circumstances. Since I have a fast Mac, rebooting isn’t all that bad anyway…


Introduction To Database Encryption

Database encryption is like a home repair project- either it’s really easy and goes exactly as planned, or about five minutes in you realize you might not want to make any weekend plans for the next 2-3 years, and perhaps you should take a trip to the flower store before trying to explain why your family will be living with exposed wall studs and dangling wires for a while.

Database encryption (and encryption in general) was one of the first technologies I covered when I first became an analyst. Early on I realized something didn’t smell right; I had vendors talking about using encryption to prevent attacks and to “enhance” access controls. But their products were completely linked to access controls, which didn’t really add any value. Also, most attacks against databases involve compromising user accounts or running queries within the privileges of the user, so how would encryption add any value? Encryption doesn’t do a darn thing against many SQL injection attacks or abuse by authorized users.

This led to a lot of introspection and the eventual development of the Three Laws of Data Encryption. We can thus divide database encryption into two categories:

  1. Encryption for Separation of Duties: In this case we will almost always use encryption to protect against our own administrators or other privileged user access, since we can more easily and efficiently use access controls for everyone else. The example is encryption of credit card numbers, with the keys stored outside of the database, to allow stored numbers for credit card processing but to eliminate the possibility of administrators or users accessing the numbers.
  2. Encryption for Media Protection: Here we encrypt database objects (tables/columns), database files, or storage media to prevent exposure of information due to physical loss of the media.

As you can imagine, encrypting for media protection is much easier than encryption for separation of duties, but it clearly doesn’t offer the same security benefits.

Thus, the first thing we need to decide when looking at database encryption is what are we trying to protect against? If we’re just going after the PCI checkbox or are worried about losing data from swapping out hard drives, someone stealing the files off the server, or misplacing backup tapes, then encryption for media protection is our answer. I’ll discuss it more in a future post, but it’s a fairly straightforward process with manageable performance implications.

If we want to encrypt for separation of duties, then life gets a little more complicated. Databases are complex beasts; far more complex than most people give them credit for. Just go try and teach yourself relational calculus or indexing. They like structured data, and once we start mucking with that by randomizing our data through encryption we start messing with performance. That’s not even counting the normal performance impact of encryption itself.

As with encryption for media protection I’ll talk more specifically about encryption for separation of duties in future posts, but as a general rule of thumb it’s not overly difficult to build encryption into a new database, but if you are encrypting a legacy database accessed by applications (legacy or otherwise) you are sometimes looking at a 2-3 year project due to the required database and application changes. We run into problems with indices, range searches, referential integrity, application integration, connection pooling, key management, and … well, there’s a lot to talk about here.
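Here is a small sketch of why range searches and indices are on that list. It uses a keyed hash as a stand-in for ciphertext (it is not real encryption, and the key, the salary data, and the helper names are all made up for illustration). I'm also assuming the stored tokens are deterministic; with randomized encryption even the equality lookup below would stop working.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key; in practice it lives outside the database

def opaque(value):
    """Stand-in for ciphertext: a keyed hash of the value, NOT real encryption."""
    return hmac.new(KEY, str(value).encode(), hashlib.sha256).hexdigest()

salaries = [30000, 45000, 52000, 61000, 75000, 98000]

# What the database stores: token -> value. The value is kept here only so
# the demo can pretend to decrypt; a real column would hold just the token.
stored = {opaque(s): s for s in salaries}

def find_equal(value):
    """Equality still works: transform the search term and look up the token."""
    return opaque(value) in stored

def find_range(low, high, decrypt):
    """Range search: token order is unrelated to numeric order, so an index on
    the stored column is useless and every row has to be decrypted and checked."""
    return sorted(v for v in (decrypt(t) for t in stored) if low <= v <= high)
```

Calling `find_range(40000, 70000, stored.get)` returns the three salaries in that band, but only after touching every row. That full scan is the performance hit, before you even count the cost of the encryption itself.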

To close this post out, the first thing to look at when considering database encryption is what threat you are trying to protect against. If it’s loss of the database files and media, look towards media protection. If you want to limit regular user access, look to access controls or other internal database security features. If it’s separation of duties for discrete data (again, we’ll talk more later) then consider column/field encryption, and make sure you can store the keys outside of the database.

As you’ve probably figured out by now, this is one of those multiple-post series things I like to do. In the next one we’ll talk about encryption for media protection and why you might want to combine it with database activity monitoring. After that, I’ll dig into field (or other object) encryption for separation of duties, then we’ll close with more detailed recommendations and a discussion of key management.

BTW- I’m going in for some minor shoulder surgery on Monday which will slow me down for a little while. I’ll have some guest posts for next week, and should be back up and running fairly soon.


How Data Loss Prevention and Database Activity Monitoring Will Connect

There was a pretty good article over at eWeek today talking about the similarities and differences between DLP and DAM. It was kind of strange to read it, since I used to be the lead analyst covering those markets and I might have been the first person to use the DAM term.

As I’ve discussed here before, I think information-centric security will evolve into two major stacks. DLP is the start of the Content Monitoring and Protection stack, while DAM is the start of the Application and Database Monitoring and Protection stack. We’ll have to see if CMP and ADMP survive as terms now that I’m not with a big analyst firm.

Over time I’ll post more on how those stacks will evolve and what they’ll contain. Reading some of the comments on my last DAM post it’s clear that I still haven’t fully articulated this and need to write some papers on it.

Today I’m going to skip ahead, thanks to the eWeek article, and discuss how the two sides will work together. I’ve come up with this division for a lot of reasons, mostly to do with buying centers, technology overlaps, business problems, and business and threat models.

I have to start with a couple assertions. In the model I’m about to show, the CMP stack is embedded into the world of productivity applications and communications- including DRM applied at the time of information creation using content aware policies. Second, ADMP protects information in business applications and databases, and includes static data labeling (which could come from the DBMS) and can also apply on-the-fly labels using content analysis. CMP is for user-land (Office apps, email, etc.); ADMP is more data center oriented.

What will happen is that rights/labels assigned in one stack will be passed to the other stack as information moves between the two. If I run an extract from a database that includes sensitive information, that extract is tagged as sensitive. If that data goes into an Excel spreadsheet, then a Word document, then a PDF, the rights are maintained through each stage, based on central policies.

For example:

  1. I run a query from a customer database that includes social security numbers in the result.
  2. That data is labeled as sensitive, since the SSN column is labeled as sensitive.
  3. I extract that data to Excel. The extract is only allowed because Excel is integrated as an application that can apply DRM rights.
  4. The document in Excel instantly has mandatory DRM rights applied, based on central policies for that classification of data. We’ve now transitioned from ADMP to CMP.
  5. Those DRM rights are maintained through any subsequent movements of the information.
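The five steps above can be sketched as a toy model of labels following the data. Everything here is hypothetical (the `Artifact` type, the policy table, the list of DRM-capable applications); it only illustrates the hand-off, not any real product's API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical central policy: data label -> DRM policy to apply
CENTRAL_POLICY = {"sensitive": "internal-only"}
# Hypothetical list of applications integrated with the DRM system
DRM_CAPABLE_APPS = {"excel", "word", "pdf"}

@dataclass(frozen=True)
class Artifact:
    name: str
    labels: FrozenSet[str] = frozenset()
    drm: Optional[str] = None

def run_query(columns, column_labels):
    """Steps 1-2: the result inherits the labels of the columns it touches."""
    labels = frozenset(column_labels[c] for c in columns if c in column_labels)
    return Artifact("query result", labels)

def move_to(artifact, app):
    """Steps 3-5: labeled data may only move into a DRM-capable application,
    and the DRM policy is looked up centrally and carried along."""
    if artifact.labels and app not in DRM_CAPABLE_APPS:
        raise PermissionError(f"{app} cannot enforce DRM for labeled data")
    drm = next((CENTRAL_POLICY[label] for label in sorted(artifact.labels)
                if label in CENTRAL_POLICY), None)
    return Artifact(f"{app} document", artifact.labels, drm)
```

For example, `move_to(run_query(["name", "ssn"], {"ssn": "sensitive"}), "excel")` yields a document that still carries the sensitive label plus the centrally defined DRM policy, and sending the same result to an unintegrated application is refused.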

Here’s an animation from a presentation I gave last week that shows what I mean. Click it at least 3 times to advance.

This is just one example of how they’ll bridge, and yes, it sounds like science fiction. But all the components we need are well in development and you might see real-world examples sooner than you think.


Understanding and Selecting a Database Activity Monitoring Solution: Part 3, Central Management

There are a lot of things I love about working for myself, but I have to admit sometimes it’s hard to keep everything balanced. For a while there I was taking whatever work came in the door that aligned with my goals and didn’t violate my objectivity requirements. Needless to say, the past few months have been absolutely insane; deadline after deadline, 2-3 trips a month, and a heck of a lot of writing.

The upside is I’m ahead on my goals for the year. The downside, other than a little stress, is that I haven’t been able to keep the content on the blog up as high as I’d like. How can I tell? This is part 3 of my series on Database Activity Monitoring, and I last posted part 2 in the beginning of November.

Oops.

With that mea culpa out of the way (assuming Jews are allowed to mea culpa), let’s jump back in to DAM.

Part 1
Part 2

Today we’re going to start on the basic characteristics of the central management server, including aggregation and correlation and policy creation. Tomorrow (for real) we’ll cover alerting, workflow, and reporting.

Aggregation and Correlation

The one characteristic Database Activity Monitoring solutions share with log management or even Security Information and Event Management (SIEM) tools is the ability to collect disparate activity logs from a variety of database management systems. Where they tend to exceed the capabilities of these related technologies is their ability to not only aggregate, but to normalize and correlate events. By understanding the Structured Query Language (SQL) of each database platform, they can interpret queries and understand their meanings. While a simple SELECT statement might mean the same thing across different database platforms, each database management system (DBMS) is chock full of its own particular syntax. A DAM solution should understand the SQL for each covered platform and be able to normalize events so the user doesn’t necessarily need to know the ins and outs of each DBMS. For example, if you want to review all privilege escalations on all covered systems, the DAM solution will recognize those events regardless of platform and present you with a complete report without you having to understand the SQL.
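As a rough illustration of normalization, here is a sketch that maps platform-specific statements onto one common event type. The three patterns are examples I picked for privilege escalation and are nowhere near complete; a real DAM tool parses the SQL properly rather than leaning on regular expressions, and the event names are my own.

```python
import re

# Illustrative patterns only: one privilege-escalation statement per platform.
PATTERNS = {
    "oracle": [
        (re.compile(r"^\s*GRANT\s+DBA\s+TO\s+(\w+)", re.I), "PRIVILEGE_ESCALATION"),
    ],
    "mssql": [
        (re.compile(r"^\s*ALTER\s+SERVER\s+ROLE\s+\[?sysadmin\]?\s+ADD\s+MEMBER\s+\[?(\w+)\]?", re.I),
         "PRIVILEGE_ESCALATION"),
    ],
    "mysql": [
        (re.compile(r"^\s*GRANT\s+ALL\s+PRIVILEGES\s+ON\s+\*\.\*\s+TO\s+'?(\w+)'?", re.I),
         "PRIVILEGE_ESCALATION"),
    ],
}

def normalize(platform, sql):
    """Turn a platform-specific statement into a platform-neutral event."""
    for pattern, event_type in PATTERNS.get(platform, []):
        match = pattern.search(sql)
        if match:
            return {"platform": platform, "event": event_type, "target": match.group(1)}
    return {"platform": platform, "event": "OTHER", "target": None}
```

A report on "all privilege escalations" then just filters on the normalized event type, and the person reading it never has to know which syntax each DBMS uses.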

A more advanced feature is to then correlate activity across different transactions and platforms, rather than just looking at single events. For instance, smart DAM tools can recognize a higher than normal transaction volume by a particular user, or (as we’ll discuss in policies) tie in a privilege escalation event with a large SELECT query on sensitive data, which could indicate an attack.
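A minimal sketch of that second kind of correlation, assuming events have already been normalized into dictionaries. The field names, the 30-minute window, the row threshold, and the list of sensitive tables are all assumptions for the example.

```python
from datetime import datetime, timedelta

SENSITIVE_TABLES = {"customers"}   # hypothetical registration of sensitive data
WINDOW = timedelta(minutes=30)     # how long an escalation stays "recent"
ROW_THRESHOLD = 1000               # what counts as a large SELECT

def correlate(events):
    """Alert when a user's privilege escalation is followed, within WINDOW,
    by a large SELECT against a sensitive table."""
    alerts = []
    last_escalation = {}  # user -> time of most recent escalation
    for e in sorted(events, key=lambda e: e["time"]):
        if e["event"] == "PRIVILEGE_ESCALATION":
            last_escalation[e["user"]] = e["time"]
        elif (e["event"] == "SELECT"
              and e["table"] in SENSITIVE_TABLES
              and e["rows"] >= ROW_THRESHOLD):
            escalated_at = last_escalation.get(e["user"])
            if escalated_at is not None and e["time"] - escalated_at <= WINDOW:
                alerts.append({"user": e["user"], "table": e["table"],
                               "rows": e["rows"], "escalated_at": escalated_at})
    return alerts
```

Neither event is necessarily a violation on its own; it's the sequence that raises the alert.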

It also goes without saying (but I’ll say it anyway) that all activity is centrally collected in a secure repository to prevent tampering or security incidents involving the repository itself.

Since you’ll be collecting a massive volume of data, your DAM tool needs to support automatic archiving. Archiving should support separate backups of system activity, configuration, policies, alerts, and case management.

Policy Creation

One of the distinguishing characteristics of Database Activity Monitoring tools is that they don’t just collect and log activity, they analyze it in real time for policy violations. While still technically a detective control (we’ll talk about preventative deployments later), the ability to alert and respond in practically real time offers security capabilities far beyond simple log analysis. Successful, loss-bearing database attacks are rarely the result of a single malicious query- they involve a sequence of events leading to the eventual damage. Ideally, policies will be established to detect the activity early enough to prevent the final loss-bearing act. Even when an alert is triggered after the fact, it supports immediate incident response and investigation far sooner than analysis days or weeks later.

Policies fall into two basic categories, and I’m sure some of the engineers working on these products will drop additional options down in the comments:

  1. Rules-based: Specific rules are set up and monitored for violations. They can include specific queries, result counts, administrative functions (new user creation, rights changes), signature-based SQL injection detection, UPDATE or other transactions by users of a certain level on certain tables/fields, or any other activity that can be specifically described. Advanced rules can correlate across different parts of a database or even different databases, adjusting for data sensitivity based on DBMS labels or through registration in the DAM tool.
  2. Heuristic: The DAM solution monitors database activity and builds a profile of “normal” activity. Deviations then generate policy alerts. Heuristics are complicated and take proper tuning to work effectively. They are a good way to build a base policy set, especially with complex systems where manually creating deterministic rules isn’t realistic. Policies are then tuned over time to reduce false positives. For well-defined systems where activity is pretty standard, such as an application talking to a database using a limited set of queries, heuristics are very useful. They fail, of course, if you profile malicious activity as known good activity.
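To show the difference between the two categories, here is a bare-bones sketch of each. The rule is a fixed result-count threshold; the heuristic learns the "shapes" of queries an application normally sends and flags anything it hasn't seen. The names and the literal-stripping approach are my own simplification, and real products profile far more than query shape.

```python
import re

def rule_large_result(event, limit=1000):
    """Rules-based: flag any query returning more rows than a fixed limit."""
    return event["rows"] > limit

def template(sql):
    """Reduce a query to its shape by replacing literals with placeholders."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

class Baseline:
    """Heuristic: learn the query shapes seen during a profiling period,
    then treat any unseen shape as a deviation."""

    def __init__(self):
        self.known = set()

    def learn(self, sql):
        self.known.add(template(sql))

    def is_deviation(self, sql):
        return template(sql) not in self.known
```

The weakness mentioned above is easy to see here: whatever gets passed to `learn` during profiling becomes "normal", including an attacker's queries if they were already in the system.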

The more mature a solution, the more likely it is to come with sets of pre-packaged policies. For example, some tools come with pre-defined policies for standard deployments of databases behind major applications, like Oracle Financials or SAP. Yes, you’ll have to tune the policies, but it’s far better than starting from scratch. Pre-built policies for PCI, SOX, and other generic compliance requirements may need even more tuning, but will help you kick start the process and save many hours of custom policy building.

Policies should include user/group, source/destination, and other important contextual options. Policies should also support advanced definitions, like complex, multi-level nesting and combinations. Ideally, the DAM solution will include policy creation tools that limit the need to write everything out in SQL or some other definition language. Yes, you can’t avoid having to do some things by hand, but basic policies should be as point-and-click easy as possible.

For common kinds of policies, like detecting privileged user activity or count thresholds on sensitive data, policy wizards are extremely useful.

Content-Based Policies

An emerging feature in some tools is support for content-based policies. Similar to DLP, the tools are able to analyze queries and results for specific content.

Identifying all known locations of sensitive data within multiple heterogeneous database management systems is a complex process, even with the support of content discovery (which we’ll talk about later). Credit card numbers and Social Security numbers can easily be placed where they shouldn’t be, either on purpose or by accident. Content-based policies, typically using regular expressions, analyze database activity for unapproved use of sensitive data. For example, a policy could look for credit card numbers in any result set except those previously approved.
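Here is a minimal sketch of that kind of policy: scan a result set for things that look like credit card numbers, skipping columns that are already approved to hold them. The Luhn checksum is my addition to cut down on false positives from a bare regular expression, and the row format and function names are assumptions for the example.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_ok(candidate):
    """Luhn checksum, the standard check digit scheme for card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_cards(result_set, approved_columns=frozenset()):
    """Return (row index, column) for every unapproved card number found."""
    hits = []
    for row_index, row in enumerate(result_set):
        for column, value in row.items():
            if column in approved_columns:
                continue
            for match in CARD_RE.finditer(str(value)):
                if luhn_ok(match.group()):
                    hits.append((row_index, column))
    return hits
```

Run against a result set where a card number has leaked into a free-text notes field, this flags the notes column while leaving the approved card column alone.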

It’s very early days, but I expect we’ll see more and more content and context awareness in DAM tools over time. Let’s be honest- the most critical data we’re usually trying to protect (at least these days) falls into structured formats we can define and look for when it breaks outside its normal boundaries (including data labeling or other registration techniques). Long term we’ll be able to do some really interesting things as we improve our ability to monitor and understand business context with the content, moving us ever closer to the elusive goal of catching the use of legitimate rights to commit forbidden actions.

That’s the basics of what to look for in aggregation, correlation, and policy creation. Tomorrow we’ll spend time on alerting, workflow, and reporting, before moving on to more advanced features like user identification in connection pooling, change management, and content discovery.


The Five Laws Of Data Masking

Tomorrow I’ll be giving a webcast over at ZDNet (sponsored by Oracle) on the Top 5 Database Security Resolutions for 2008. The resolutions have changed a bit since I first posted about them over here, and I decided to swap in data masking for the last one. I almost pulled it back out after I found out my sponsor (Oracle) just released a data masking product (I try to avoid being too promotional in my webinars), but it’s something I’ve been talking about for a while and it’s too important to pull just because a few people might think I was being biased.

We’re up to nearly 600 people registered for the event, making it one of the largest webcasts I’ve done.

But enough self-promotion; it’s time to talk about data masking.

Data masking started popping up as an issue about 3 years ago. At the time I was covering database security, but client calls were bouncing around between me on the security team and someone over in application development. It’s one of these annoying security issues that crosses organizational boundaries and ends up the responsibility of those with little security experience. It’s an issue that grew organically- first popping up in some audits related to GLBA (a financial services regulation), and now something we see required for PCI and a few other regulations.

Data masking is really a bad term for what we’re talking about. We can technically mask data anywhere, but when we use the term data masking we usually mean “test data generation” or “analytical data generation”. It’s the conversion of production data into either test and development data or data for a data warehouse (OLAP). For this post we’ll focus on test data generation, but the same techniques can be used for an OLAP where you want data that represents production data, but still protects the sensitive stuff.

And that’s our goal- to take sensitive data from a production system and convert it into non-sensitive data suitable for testing or analysis. We can do this through substitution, transposition, obfuscation, de-coupling, scrambling, hashing, or even encryption.

I’m going to quickly eliminate hashing and encryption from the discussion- those techniques are very effective at protecting data, but the result breaks the second rule of data masking- that the data is still representative of the source, without being sensitive.

Organizations are increasingly finding that data masking is mandated for regulatory compliance. It’s also an extremely effective way to reduce enterprise risk. Development and test environments are rarely as secure as production, and there’s little reason developers should have access to sensitive data. Analytical systems are often accessed by a wide variety of users, most of whom shouldn’t see sensitive data, with only a fraction of the access and other security controls in transactional systems.

With that, and since I get way more hits if I have the “x laws” in the title, here are the Five Laws of Data Masking:

  1. Masking must not be reversible. However you mask your data, it should never be possible to use it to retrieve the original sensitive data.
  2. The results must be representative of the source data. The reason to mask data instead of just generating random data is that masking allows you to protect sensitive information that still resembles production data for development and testing purposes. This could include geographic distributions, credit card distributions (e.g., leaving the first 4 numbers unchanged, but scrambling the rest), or maintaining human readability of (fake) names and addresses.
  3. Referential integrity must be maintained. Your masking solution should maintain referential integrity- if a credit card number is a primary key, and scrambled as part of masking, then all instances of that number linked through key pairs must be scrambled identically.
  4. Only mask non-sensitive data if it can be used to recreate sensitive data. It isn’t necessary to mask everything in your database, just those parts that you deem sensitive. But remember, some non-sensitive data can be used to either recreate or tie back to sensitive data. For example, if you scramble a medical ID but the treatment codes for a record could only map back to the original record, you also need to scramble those codes. This is called inference analysis, and your masking should protect against it.
  5. Masking must be a repeatable process. One-off masking is not only nearly impossible to maintain, but it’s fairly ineffective. Development/test data needs to represent constantly changing production data as closely as possible. Analytical data may need to be generated daily, or even hourly. If masking isn’t an automated process it’s inefficient, expensive, and ineffective. I know of some organizations that centralize masking and offer it as an internal service to the enterprise.
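A few of these laws are easier to see in code. The sketch below masks a card number by keeping the first 4 digits and replacing the rest with digits derived from a keyed hash. The hash is only used to pick the replacement digits, so the output still looks like a card number (law 2), and because the same input always gives the same output, the value stays consistent across tables (law 3). The key, table layout, and function name are hypothetical. It's a sketch under those assumptions, not a complete masking tool: the output won't pass a Luhn check, and anyone holding the key could brute-force the originals, so the key must never travel with the masked data (law 1).

```python
import hashlib
import hmac

# Hypothetical masking key: it stays with the masking job, never in dev/test.
MASKING_KEY = b"replace-me-and-keep-me-out-of-test"

def mask_card(card_number, keep=4):
    """Keep the first `keep` digits; derive the rest from a keyed hash so the
    same input always masks to the same output."""
    mac = hmac.new(MASKING_KEY, card_number.encode(), hashlib.sha256).digest()
    tail_length = len(card_number) - keep
    # Map digest bytes to decimal digits (slightly biased, fine for a sketch)
    scrambled = "".join(str(byte % 10) for byte in mac[:tail_length])
    return card_number[:keep] + scrambled

# Two production tables linked by card number
customers = [{"card": "4111111111111111", "name": "Pat Example"}]
orders = [{"card": "4111111111111111", "total": 20}]

# Mask both tables with the same function so joins still line up
masked_customers = [{**row, "card": mask_card(row["card"])} for row in customers]
masked_orders = [{**row, "card": mask_card(row["card"])} for row in orders]
```

Because it is just a function over the data, it can run on every refresh of the test environment, which is what law 5 asks for.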

These “laws” are just to start the discussion on masking. In future posts I’ll discuss my recommended data masking process and what features to look for in tools.

And if you absolutely can’t wait until I get around to a follow-on post, join me for the webinar on Friday where I’ll dig in a little deeper.
