Come Attend Database Security School

I was fortunate enough to be invited by TechTarget to put together their “Database Security School”. It’s a compilation of four online educational components: a webcast, podcast, article, and online quiz.

If you manage to put up with me for all four lessons you should walk away with some new ideas on how to approach database security.

Check it out and let me know what you think…


Whitepaper: Understanding and Selecting a Database Activity Monitoring Solution

Today, in cooperation with SANS, Securosis is releasing Understanding and Selecting a Database Activity Monitoring Solution. This is a compilation of my multipart series on DAM, fully edited with expanded content.

The paper is sponsored by Guardium, Imperva, Secerno, Sentrigo, and Tizor, but all content was developed independently by me and reviewed by SANS. It is available here, and will soon be available in the SANS Reading Room or directly from the vendors.

It was a fair bit of work and I hope you like it. The content is copyrighted under a Creative Commons license, so feel free to share it and even cut out any helpful bits and pieces as long as you attribute the source.

As always, questions, comments, and complaints are welcome…

… and there isn’t a DAM joke in the entire thing; I save those for the blog.


Understanding and Selecting a Database Activity Monitoring Solution: Part 6, The Selection Process

At long last, thousands of words and 5 months later, it’s time to close out our series on Database Activity Monitoring. Today we’ll cover the selection process.

For review, you can look up our previous entries here:

Part 1
Part 2
Part 3
Part 4
Part 5

Define Needs

Before you start looking at any tools, you need to understand why you might need DAM, how you plan on using it, and the business processes around management, policy creation, and incident handling.

Create a selection committee: Database Activity Monitoring initiatives tend to involve four major technical stakeholders, and one or two non-technical business units. On the technical side it’s important to engage the database and application administrators with systems that may be within the scope of the project over time, not just the one database and/or application you plan on starting with. Although many DAM projects start with a limited scope, they can quickly grow into enterprise-wide programs. Security and the database team are typically the main project drivers, and the office of the CIO is often involved due to compliance needs or to mediate cross-team issues. On the non-technical side, you should have representatives from audit, as well as compliance and risk (if they exist in your organization). Once you identify the major stakeholders, you’ll want to bring representatives together into a selection committee.

Define the systems and platforms to protect: DAM projects are typically driven by a clear audit or security goal tied to particular systems, applications, or databases. In this stage, detail the scope of what will be protected and the technical specifics of the platforms involved. You’ll use this list to determine technical requirements and prioritize features and platform support later in the selection process. Remember that your needs will grow over time, so break the list into a group of high priority systems with immediate needs, and a second group summarizing all major platforms you may need to protect later.

Determine protection and compliance requirements: For some systems you might want strict preventative security controls, while for others you may just need comprehensive activity monitoring for a compliance requirement. In this step you map your protection and compliance needs to the platforms and systems from the previous step. This will help you determine everything from technical requirements to process workflow.

Outline process workflow and reporting requirements: Database Activity Monitoring workflow tends to vary based on the use case. When used as an internal control for separation of duties, security will monitor and manage events and have an escalation process should database administrators violate policy. When used as an active security control, the workflow may more actively engage security and database administration as partners in managing incidents. In most cases, audit, legal, or compliance will have at least some sort of reporting role. Since different DAM tools have different strengths and weaknesses in terms of management interfaces, reporting, and internal workflow, knowing your process before defining technical requirements can prevent headaches down the road.

By the completion of this phase you should have defined key stakeholders, convened a selection team, prioritized the systems to protect, determined protection requirements, and roughed out workflow needs.

Formalize Requirements

This phase can be performed by a smaller team working under the mandate of the selection committee. Here, the generic needs determined in phase 1 are translated into specific technical features, while any additional requirements are considered. This is the time to come up with any criteria for directory integration, additional infrastructure integration, data storage, hierarchical deployments, change management integration, and so on. You can always refine these requirements after you proceed to the selection process and get a better feel for how the products work.

At the conclusion of this stage you develop a formal RFI (Request For Information) to release to vendors, and a rough RFP (Request For Proposals) that you’ll clean up and formally issue in the evaluation phase.

Evaluate Products

As with any products, it’s sometimes difficult to cut through the marketing materials and figure out if a product really meets your needs. The following steps should minimize your risk and help you feel confident in your final decision:

Issue the RFI: Larger organizations should issue an RFI through established channels and contact a few leading DAM vendors directly. If you’re a smaller organization, start by sending your RFI to a trusted VAR and email a few of the DAM vendors that seem appropriate for your organization.

Perform a paper evaluation: Before bringing anyone in, match any materials from the vendor or other sources to your RFI and draft RFP. Your goal is to build a short list of 3 products which match your needs. You should also use outside research sources and product comparisons.

Bring in 3 vendors for an on-site presentation and demonstration: Instead of generic demonstrations, ask the vendors to walk you through specific use cases that match your expected needs. Don’t expect a full response to your draft RFP; these meetings are to help you better understand the different options out there and eventually finalize your requirements.

Finalize your RFP and issue it to your short list of vendors: At this point you should completely understand your specific requirements and issue a formal, final RFP.

Assess RFP responses and begin product testing: Review the RFP results and drop anyone who doesn’t meet any of your minimal requirements (such as platform support), as opposed to “nice to have” features. Then bring in any remaining products for in-house testing. You’ll want to replicate your highest volume system and the corresponding traffic, if at all possible. Build a few basic policies that match your use cases, then violate them, so you can get a feel for policy creation and workflow.

Select, negotiate, and buy: Finish testing, take the results to the full selection committee, and begin negotiating with your top choice.

Internal Testing

  • Platform support and installation to determine compatibility with your database/application environment. This is the single most important factor to test, including monitoring coverage for the connection methods used in your organization, since different database platforms support a variety of connection types.
  • Performance. Is network or agent performance acceptable for your environment? Don’t set arbitrary standards; monitor performance on your production systems to make sure testing represents operational requirements.
  • Policy creation and management. Create policies to understand the process and complexity. Do you need to write everything as SQL? Will built-in policies meet your needs? Are there wizards and less-technical options for non-database experts to create policies? Then violate policies and try to evade or overwhelm the tool to learn where its limits are.
  • Incident workflow. Review the working interface with those employees who will be responsible for enforcement.
  • Behavioral profiling, if the product supports that as a way of developing policies.
  • Directory integration.
  • Change management integration.
  • Enforcement/blocking/rollback and other advanced features.


Understanding and Selecting a Database Activity Monitoring Solution: Part 5, Advanced Features

We’re going to be finishing the series off this week, in large part so I can get it compiled together into a whitepaper with SANS, sponsored by Imperva, Guardium, and Sentrigo, before the big RSA show. I won’t be sleeping much this week as I compile and re-write the posts, add additional content that didn’t make it into the blog, create some images, and toss it back and forth with my editor. What? You didn’t think all I did was cut and paste this stuff, did you?

For review, you can look up our previous entries here:

Part 1
Part 2
Part 3
Part 4

What do I mean by advanced features? In our other posts we focused on the core solution set, but most of the products have quite a bit more to offer. There’s no way we can cover everything, and I don’t intend this to be an advertisement for any particular solution set, but there are a few major features we see appearing in more than one product. I’m going to highlight a few I think are particularly interesting and worthy of consideration in the selection process.

Content Discovery

As much as we like to think we know our databases, the reality is we really don’t always know what’s inside them. Many of our systems grew organically over the years, some are managed by external consultants or application vendors, and others find sensitive data stored in unusual locations. To counter these problems, some database activity monitoring solutions are adding content discovery features similar to DLP. These tools allow you to set content-based policies to identify the use of things like credit card numbers in the database, even if they aren’t located where you expect. Discovery tools crawl through registered databases, looking for sensitive content based on policies, and generate alerts for sensitive content in new locations. For example, you could create a policy to identify any credit card number in any database, and generate a report for PCI compliance. The tools can run on a scheduled basis so you can perform ongoing assessments, rather than combing through everything by hand every time an auditor comes knocking.

Some tools allow you to then build policies based on the discovery results. Instead of manually identifying every field with Social Security Numbers and building a different protection policy for each, you create a single policy that generates an alert every time an administrator runs a SELECT query on any field which matches the SSN rule. As the system grows and changes over time, the discovery component identifies the fields matching the protected content, and automatically applies the policy.

We’re also starting to see DAM tools that monitor live queries for sensitive data. Policies are then freed from being tied to specific fields, and can generate alerts or perform enforcement actions based on the result set. For example, a policy could generate an alert any time a query result contains a credit card number, no matter what columns were referenced in the query.
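To make the discovery idea a little more concrete, here is a minimal sketch in Python. It is not how any particular product works: it assumes a SQLite database, a made-up helper named `find_card_columns`, and a deliberately simple rule (13 to 16 digits that pass the Luhn checksum). It crawls every column of every table and reports where card-like values turn up, whether or not you expected them there.

```python
import re
import sqlite3

# Candidate card numbers: 13-16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_card_columns(conn: sqlite3.Connection) -> set[tuple[str, str]]:
    """Hypothetical discovery pass: (table, column) pairs holding card-like text."""
    hits = set()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            for (value,) in conn.execute(f'SELECT "{column}" FROM "{table}"'):
                if not isinstance(value, str):
                    continue  # this sketch only inspects text values
                for match in CARD_PATTERN.findall(value):
                    if luhn_valid(match):
                        hits.add((table, column))
    return hits


# Demo data: a card number hiding in a free-text notes field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, notes TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'paid with 4111 1111 1111 1111')")
conn.execute("INSERT INTO orders VALUES (2, 'call back at 555-0100')")
discovered = find_card_columns(conn)
```

The Luhn check is there to cut down on false positives from order numbers and the like. A scheduled job could run something along these lines and diff the results against the last scan to flag sensitive content in new locations.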

Connection Pooled User Identification

One of the more difficult problems we face in database security is the sometimes arbitrary distinction between databases and applications. Rather than looking at them as a single system, we break out database and application design and administration, and try to apply controls in each without understanding the state of the other. This is readily apparent in the connection pooling problem. Connection pooling is a technique where we connect large applications to large databases using a single shared connection running under a single database user account. Unless the application was carefully designed, all queries come from that single user account (e.g., APP_USR) and we have no way, at the database level, to identify the user performing the transaction. This creates a level of abstraction which makes it difficult, if not impossible, to monitor specific user activity and apply user policies at the database level.

An advanced feature of some database activity monitoring solutions allows them to track and correlate individual query activity back to the application user. This typically involves integration or monitoring at the application level. You now know which database transactions were performed by which application users, which is extremely valuable for both audit and security reasons.
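As a rough illustration of the correlation idea, here is a minimal sketch in Python, assuming the application itself passes along the end user's identity with each query. The `AttributedConnection` wrapper and `AuditRecord` type are hypothetical names invented for this example; real products do this through application integration or monitoring, not necessarily like this.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class AuditRecord:
    db_account: str  # the shared account the pool logs in as
    app_user: str    # the real end user, supplied by the application
    sql: str


class AttributedConnection:
    """Hypothetical wrapper: one shared connection, per-user attribution."""

    def __init__(self, conn: sqlite3.Connection, db_account: str):
        self._conn = conn
        self._db_account = db_account
        self.audit_log: list[AuditRecord] = []

    def execute(self, app_user: str, sql: str, params: tuple = ()):
        # Record who is really behind the query before it reaches the database.
        self.audit_log.append(AuditRecord(self._db_account, app_user, sql))
        return self._conn.execute(sql, params)


# Every query runs as APP_USR, but the audit trail keeps the application user.
shared = AttributedConnection(sqlite3.connect(":memory:"), db_account="APP_USR")
shared.execute("alice", "CREATE TABLE accounts (id INTEGER, balance REAL)")
shared.execute("bob", "SELECT balance FROM accounts WHERE id = ?", (1,))
users_seen = [record.app_user for record in shared.audit_log]
```

From the database's point of view both statements came from `APP_USR`; only the audit log knows that alice and bob were behind them, and that is the gap this feature fills.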

Blocking and Enforcement

Today, most users just deploy database activity monitoring to audit and alert on user activity, but many of the tools are perfectly capable of enforcing preventative policies. Enforcement happens at either the network layer or on the database server itself, depending on the product architecture.

Enforcement policies tend to fall into two categories. The first, similar to many of the monitoring policies we’ve described, are focused on user behaviors like viewing or changing sensitive records. Rather than just alerting after a DBA pulls every account number out of the system, you can block the query. The second is focused on database exploits; similar to an intrusion prevention solution, the system blocks queries matching signatures for known attacks like SQL injection.

The nature and level of blocking will vary based on the architecture of the DAM tool. Integrated agent solutions may offer features like transaction rollback, while network tools block the traffic from hitting the DBMS in the first place. Digging into the specific architectures and benefits is beyond the scope of this post.
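For a feel of how the two categories of enforcement policy might look, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the two regular expressions are crude stand-ins for real attack signatures, and the table name, user name, and `evaluate` function are invented.

```python
import re
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "allow" or "block"
    reason: str


# Illustrative signatures only; real rule sets are far larger and carefully tuned.
ATTACK_SIGNATURES = [
    re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE),
    re.compile(r"\bor\b\s+'?1'?\s*=\s*'?1'?", re.IGNORECASE),
]

SENSITIVE_TABLES = {"accounts"}   # assumed table name
PRIVILEGED_USERS = {"dba_admin"}  # assumed administrator account


def evaluate(user: str, sql: str) -> Decision:
    """Decide whether a query should reach the database."""
    # Exploit category: block anything matching a known attack signature.
    for signature in ATTACK_SIGNATURES:
        if signature.search(sql):
            return Decision("block", "matches attack signature")
    # Behavior category: a privileged user reading a sensitive table with no filter.
    lowered = sql.lower()
    touches_sensitive = any(
        re.search(rf"\b{table}\b", lowered) for table in SENSITIVE_TABLES)
    unfiltered_select = (
        lowered.lstrip().startswith("select") and " where " not in lowered)
    if user in PRIVILEGED_USERS and touches_sensitive and unfiltered_select:
        return Decision("block", "privileged bulk read of sensitive table")
    return Decision("allow", "no policy matched")


bulk_read = evaluate("dba_admin", "SELECT account_number FROM accounts")
injection = evaluate("app_user", "SELECT name FROM users WHERE id = 1 OR 1=1")
normal = evaluate("app_user", "SELECT balance FROM accounts WHERE id = 7")
```

String matching like this is trivially evaded, which is a good reminder to test real products by trying to get around their policies.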

Application Activity Monitoring

Databases rarely exist in a vacuum; more often than not they are an extension of applications, yet we tend to look at them as isolated components. Application Activity Monitoring adds the ability to watch application activity, not just the database queries that result. This information can be correlated between the application and the database to gain a clear picture of just how data is being used at both levels, and identify anomalies which may indicate a security or compliance failure.

Since application design and platforms vary even more than databases, products can’t cover every custom application in your environment. We see vendors focusing on major custom application platforms, like SAP and Oracle, and monitoring web-based application activity.

Pre-Configured Application Policies

With or without application monitoring, some DAM solutions come with pre-configured policies for common applications (e.g., PeopleSoft). Although you’ll need to tune them to account for any application customization, they can jump-start the policy building process and save you from manually building all your compliance and security policies.

Pre-Configured Compliance Policies

Although no tool will make you compliant out of the box, pre-configured compliance policies for common platforms give you a head start on the process, especially when you don’t know where to start. Most vendors hire or partner with auditors to help them build their compliance policy and reporting packages for common regulations like SOX and PCI-DSS that frequently impact databases.

Change Management

Although many organizations have rigorous change management policies for the database platform and underlying system, far fewer enforce change management at the query level (which is invisible to traditional change management tools). An advanced feature of certain DAM solutions integrates with change management tools for closed-loop tracking of query-level changes. The requested change is approved in the change management system and a ticket number issued. The DBA enters that ticket number as part of their session, and all database changes (even to individual field updates) are recorded and correlated back to the original change ticket.
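The ticket workflow can be sketched in a few lines of Python. This is only an illustration under assumed names (`ChangeTracker`, `ChangeSession`, and the ticket IDs are all made up), with an in-memory set standing in for the change management system.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSession:
    """Hypothetical DBA session tied to one approved change ticket."""
    dba: str
    ticket: str
    statements: list = field(default_factory=list)


class ChangeTracker:
    def __init__(self, approved_tickets):
        # Stand-in for a lookup against the change management system.
        self._approved = set(approved_tickets)
        self._by_ticket = {}

    def open_session(self, dba: str, ticket: str) -> ChangeSession:
        if ticket not in self._approved:
            raise PermissionError(f"ticket {ticket} is not approved")
        session = ChangeSession(dba, ticket)
        self._by_ticket.setdefault(ticket, []).append(session)
        return session

    def record(self, session: ChangeSession, sql: str) -> None:
        # Every change made in the session is logged against its ticket.
        session.statements.append(sql)

    def changes_for(self, ticket: str) -> list:
        return [(s.dba, sql)
                for s in self._by_ticket.get(ticket, [])
                for sql in s.statements]


tracker = ChangeTracker(approved_tickets={"CHG-1001"})
session = tracker.open_session("dana", "CHG-1001")
tracker.record(session, "UPDATE prices SET amount = 9.99 WHERE sku = 'A1'")
ticket_changes = tracker.changes_for("CHG-1001")
```

An auditor can then ask for everything done under a given ticket, and any session opened without an approved ticket is refused up front.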

Vulnerability Assessment

Vulnerability assessment in databases is a topic worthy of its own series of posts (wink wink), but is sometimes offered as a feature (usually at additional cost) of a DAM solution. Database vulnerability assessment tools look deeper than patch levels, down to specific configurations and even an analysis of user entitlements. Some tools feed VA results into DAM policies to generate alerts or block activity that’s suspected of targeting an identified vulnerability. This is a huge area on its own, and is beyond the scope of this post.

This list is far from exhaustive; I’ve tried to highlight some of the more common advanced features you’ll see through the selection process. I’m sure I’ve missed a few and I encourage those of you on the vendor side to add them in the comments; just avoid too much marketing fluff or I’ll filter the comment. If they’re good (and clearly a big feature starting to appear in multiple products) they might even make it into the white paper.

This post concludes our coverage of DAM features. By now you should have a good idea of how the technology works and the different options available to you. In the final post (probably tomorrow, considering my deadlines) we’ll dig into how to run a successful product selection process.


Database Security Rule: Use System Generated Primary Keys

I was reading an article by RSnake this morning on the problems of using a username as a primary key, and it reminded me of something I’ve been meaning to write about for a while.

As a former database designer and current security geek I’m often stunned by how often designers/developers choose really bad primary keys for their databases. Even back in my developer days, when security for me meant taking down drunks at concerts, I knew better than to use something like a username, credit card number, or Social Security Number as a primary key. To be honest, it had nothing to do with security and everything to do with good database design.

Back when I was starting my IT career I was fortunate to work closely with the professor at the University of Colorado who taught database design. One of his cardinal rules was to use system generated keys as the primary key wherever possible: randomly generated keys, as opposed to sequential keys that could “leak” information. Our designs should also strive to conform to at least Fourth Normal Form, although full normalization wasn’t always possible for practical reasons.

We never used Social Security Numbers as primary keys because they are neither always unique (there are mistakes in the system) nor available for all potential users (foreign students were assigned fakes). Credit card numbers are bad because they are not a unique identifier for an individual: I have multiple credit card numbers, any of which can identify me, all of which are temporal (change over time). I have no idea why retailers so often use credit card numbers as all or part of a primary key, since I may use multiple cards even on the same day.

Usernames come closer to a viable primary key from a pure, non-security design perspective, but I never liked them, for some of the reasons RSnake cites. I always feel like usernames are temporal, even when they aren’t, and while I haven’t tested it, I suspect numeric keys perform better.

Thus my first rule in picking a primary key, from both a pure design and security perspective, is to use system generated random keys (preferably not sequential auto increment).

For other fields you want unique, like username, SSN, or credit card number, just set a unique index on the field. Worst case, you can even use this to correlate across unrelated systems assuming the rest of the attributes line up (we used to do this using the SSN, before we all learned using them at all is bad).
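Here is a minimal sketch of that rule in Python with SQLite. The table and column names are made up for the example, and `uuid.uuid4()` is just one convenient way to get a random, non-sequential key; use whatever your platform offers.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id  TEXT PRIMARY KEY,     -- system generated, carries no meaning
        username TEXT NOT NULL UNIQUE  -- still unique, but free to change
    )
""")


def create_user(username: str) -> str:
    """Insert a user under a random key and return that key."""
    user_id = uuid.uuid4().hex  # random, so it leaks nothing about ordering
    conn.execute(
        "INSERT INTO users (user_id, username) VALUES (?, ?)",
        (user_id, username))
    return user_id


alice_id = create_user("alice")

# The unique constraint still rejects a duplicate username.
try:
    create_user("alice")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Renaming the user leaves the primary key, and anything referencing it, untouched.
conn.execute("UPDATE users SET username = ? WHERE user_id = ?", ("alice2", alice_id))
renamed = conn.execute(
    "SELECT username FROM users WHERE user_id = ?", (alice_id,)).fetchone()[0]
```

Other tables reference `user_id`, so the username can change, or be reissued, without touching a single foreign key.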

One criticism of RSnake’s examples of account hijacking is that even with a pure primary key, if you hard delete a username someone can still impersonate the account, depending on the design of the system.


Understanding and Selecting a Database Activity Monitoring Solution: Part 4, Alerts, Workflow, and Reporting

It seems that every time I write the next part of this multipart series I find myself apologizing for taking too long between posts. I swear I have a good excuse this time, what with the whole doctor sticking cameras into my shoulder, shaving out bits, cutting tendons and tying them to new places, putting in plastic anchors, and sewing torn parts of muscles together thing. I’m 11 days into my recovery and while the days are fine, despite learning not to use my arm for the next three months, the nights… let’s just say I fear the nights. I think I’m getting closer to figuring out the right combination of drugs, body position, and pillows that will let me get a little closer to some functional sleep.

But business is good, I’m gaining a little more productivity every day, and… enough about me.

In today’s post we’re going to delve deeper into Database Activity Monitoring. We’re going to talk about alerting, workflow, and reporting.

In my previous post we discussed central management, including policy creation. One of the key advantages of DAM over passive auditing and logging solutions is the ability to define policies for active alerts and manage remediation. While policies are mostly deployed in a passive mode (alerting only) some products also support active blocking, which we will cover in a future post.

I’m really not a fan of relying on passive auditing for security; it’s often important, but with the tools we have today we can generate immediate alerts allowing us to contain security incidents before they spread, or even stop a multi-stage attack before completion. This is one key characteristic separating proactive security tools from simple monitoring/logging tools.

Alerts

Your DAM tools should support both active alerting and an incident handling queue, similar to DLP. These alerts take a few different forms, from email integration, to self-contained events, to communications with outside security tools (like SIEM) using anything from SNMP to syslog to proprietary integration.

Policies should support granular alerting based on conditions, such as thresholds. For example, detection of a single errant query might trigger a low level incident within the included incident handling system, while an incident involving an administrator or high count of credit cards is emailed to a security admin and dropped into the SIEM tool as a high alert.
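The example above can be sketched as a simple routing rule in Python. The `Event` fields, the threshold of 100 card numbers, and the channel names are all assumptions made for illustration; your own thresholds will come out of your requirements work.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    is_admin: bool
    card_numbers_returned: int


def route_alert(event: Event, card_threshold: int = 100) -> dict:
    """Hypothetical routing rule; threshold and channel names are made up."""
    if event.is_admin or event.card_numbers_returned >= card_threshold:
        # Escalate: internal queue plus email to a security admin and the SIEM.
        return {"severity": "high",
                "channels": ["incident_queue", "email", "siem"]}
    # A single errant query stays in the built-in incident queue.
    return {"severity": "low", "channels": ["incident_queue"]}


errant_query = route_alert(Event("jsmith", is_admin=False, card_numbers_returned=1))
bulk_pull = route_alert(Event("jsmith", is_admin=False, card_numbers_returned=5000))
admin_action = route_alert(Event("dba_admin", is_admin=True, card_numbers_returned=0))
```

Note that every incident lands in the internal queue no matter what; the external channels only add escalation on top.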

Not to say you should rely on a SIEM or other external tool to manage your incidents; those tools will never contain the full context and investigative abilities of the dedicated DAM workflow. External alerts play a valuable role in escalating incidents and correlating with external factors, but the primary handling will tend to be managed within the DAM tool itself. Databases are complex beasts, and full understanding of what’s going on internally requires a dedicated tool.

Policy based alerts tend to fall into three interrelated categories which often overlap:

  1. User activity: Incidents when a user takes an action that violates policy. It could be a user running a query on sensitive data, updating an existing financial transaction outside of an application, or an application running a query never seen before.
  2. Attack activity/signatures: Some DAM solutions include pre-built detection for certain attack activity. This may be linked to vulnerability analysis, signature based, or heuristic (I’m sure some vendors will chime in with even more options).
  3. System and administrative activity: Incidents involving administrative or internal system activity, e.g., new account creation, privilege escalation, DML/DDL changes, system updates, stored procedures, or other configuration changes. Think of these alerts as being focused on SQL (and non-SQL) outside of simple SELECT, INSERT, UPDATE, DELETE queries.

Workflow

Once an incident is created and any external alerts sent out, it should appear in an incident handling queue for management. This is similar to what we see in DLP and many other security tools, but optimized for database activity.

The queue should be visually well-designed to make critical information easier to find, and allow customization for different work styles and interests. Unlike DLP, it’s less important that the queue appeal to non-technical handlers since it’s far less likely that anyone without database and security knowledge will work directly within the system. For DAM, we tend to rely more on reports for the auditors, risk managers, and other non-security types.

Incidents should be easy to sort and include color coding for sensitivity and criticality. When you click on an incident, it should let you drill down into more details to assist the investigative process. Handlers should be able to assign, share, and route incidents to different users within the system. I’m a big fan of having a drop down field to change incident status right on the incident row. The system should also support role based administration, allowing you to assign specific handlers/administrators based on the policy violated, database affected, or other factors.

The basic workflow must allow for quick sorting, analysis, and investigation of incidents. Once an incident is detected, the handler can close it, add supporting investigative material, change the priority, assign it to someone else, or escalate it. To support investigations you should be able to correlate the current incident with other activity in that database by that user, violations of that policy across different systems, and other factors to help determine what’s going on. Since incident handlers may come from either a database or a security background, look for a tool that appeals to both audiences and supplies each with the information they need to understand the incidents and investigate appropriately.

My description has so far focused on database-only incidents, but some systems are now expanding into platform activity on the database host, or application activity.

Reports

As with nearly any security tool you’ll want flexible reporting options, but pay particular attention to the compliance and auditing reports. Aside from all the security advantages we’ve been talking about, many organizations initially deploy DAM to meet their database audit and compliance requirements. Pre-built report templates can save valuable time, and some vendors have worked with auditors from the major firms to help design their reports for specific regulations, like SOX.

Reports should fall into at least three broad categories: compliance and non-technical reports, security reports (incidents), and general technical reports.

That’s about it for alerts, workflow, and reporting. These features are pretty straightforward and similar to other security tools, yet dedicated specifically for databases. In our next post we’ll start talking about advanced features, like connection pooling, blocking, and change management.


Introduction To Database Encryption

Database encryption is like a home repair project: either it’s really easy and goes exactly as planned, or about five minutes in you realize you might not want to make any weekend plans for the next 2-3 years, and perhaps you should take a trip to the flower store before trying to explain why your family will be living with exposed wall studs and dangling wires for a while.

Database encryption (and encryption in general) was one of the first technologies I covered when I first became an analyst. Early on I realized something didn’t smell right; I had vendors talking about using encryption to prevent attacks and to “enhance” access controls. But their products were completely linked to access controls, which didn’t really add any value. Also, most attacks against databases involve compromising user accounts or running queries within the privileges of the user, so how would encryption add any value? Encryption doesn’t do a darn thing against many SQL injection attacks or abuse by authorized users.

This led to a lot of introspection and the eventual development of the Three Laws of Data Encryption. We can thus divide database encryption into two categories:

  1. Encryption for Separation of Duties: In this case we will almost always use encryption to protect against our own administrators or other privileged user access, since we can more easily and efficiently use access controls for everyone else. The example is encryption of credit card numbers, with the keys stored outside of the database, to allow stored numbers for credit card processing but to eliminate the possibility of administrators or users accessing the numbers.
  2. Encryption for Media Protection: Here we encrypt database objects (tables/columns), database files, or storage media to prevent exposure of information due to physical loss of the media.

As you can imagine, encrypting for media protection is much easier than encryption for separation of duties, but it clearly doesn’t offer the same security benefits.

Thus, the first thing we need to decide when looking at database encryption is what are we trying to protect against? If we’re just going after the PCI checkbox or are worried about losing data from swapping out hard drives, someone stealing the files off the server, or misplacing backup tapes, then encryption for media protection is our answer. I’ll discuss it more in a future post, but it’s a fairly straightforward process with manageable performance implications.

If we want to encrypt for separation of duties, then life gets a little more complicated. Databases are complex beasts; far more complex than most people give them credit for. Just go try and teach yourself relational calculus or indexing. They like structured data, and once we start mucking with that by randomizing our data through encryption we start messing with performance. That’s not even counting the normal performance impact of encryption itself.

As with encryption for media protection I’ll talk more specifically about encryption for separation of duties in future posts, but as a general rule of thumb it’s not overly difficult to build encryption into a new database, but if you are encrypting a legacy database accessed by applications (legacy or otherwise) you are sometimes looking at a 2-3 year project due to the required database and application changes. We run into problems with indices, range searches, referential integrity, application integration, connection pooling, key management, and … well, there’s a lot to talk about here.

To close this post out, the first thing to look at when considering database encryption is what threat you are trying to protect against. If it’s loss of the database files and media, look towards media protection. If you want to limit regular user access, look to access controls or other internal database security features. If it’s separation of duties for discrete data (again, we’ll talk more later) then consider column/field encryption, and make sure you can store the keys outside of the database.

As you’ve probably figured out by now, this is one of those multiple-post series things I like to do. In the next one we’ll talk about encryption for media protection and why you might want to combine it with database activity monitoring. After that, I’ll dig into field (or other object) encryption for separation of duties, then we’ll close with more detailed recommendations and a discussion of key management.

BTW- I’m going in for some minor shoulder surgery on Monday which will slow me down for a little while. I’ll have some guest posts for next week, and should be back up and running fairly soon.


How Data Loss Prevention and Database Activity Monitoring Will Connect

There was a pretty good article over at eWeek today talking about the similarities and differences between DLP and DAM. It was kind of strange to read it, since I used to be the lead analyst covering those markets and I might have been the first person to use the DAM term.

As I’ve discussed here before, I think information-centric security will evolve into two major stacks. DLP is the start of the Content Monitoring and Protection stack, while DAM is the start of the Application and Database Monitoring and Protection stack. We’ll have to see if CMP and ADMP survive as terms now that I’m not with a big analyst firm.

Over time I’ll post more on how those stacks will evolve and what they’ll contain. Reading some of the comments on my last DAM post it’s clear that I still haven’t fully articulated this and need to write some papers on it.

Today I’m going to skip ahead, thanks to the eWeek article, and discuss how the two sides will work together. I’ve come up with this division for a lot of reasons, mostly to do with buying centers, technology overlaps, business problems, and business and threat models.

I have to start with a couple assertions. In the model I’m about to show, the CMP stack is embedded into the world of productivity applications and communications- including DRM applied at the time of information creation using content aware policies. Second, ADMP protects information in business applications and databases, and includes static data labeling (which could come from the DBMS) and can also apply on-the-fly labels using content analysis. CMP is for user-land (Office apps, email, etc.); ADMP is more data center oriented.

What will happen is that rights/labels assigned in one stack will be passed to the other stack as information moves between the two. If I run an extract from a database that includes sensitive information, that extract is tagged as sensitive. If that data goes into an Excel spreadsheet, then a Word document, then a PDF, the rights are maintained through each stage, based on central policies.

For example:

  1. I run a query from a customer database that includes social security numbers in the result.
  2. That data is labeled as sensitive, since the SSN column is labeled as sensitive.
  3. I extract that data to Excel. The extract is only allowed because Excel is integrated as an application that can apply DRM rights.
  4. The document in Excel instantly has mandatory DRM rights applied, based on central policies for that classification of data. We’ve now transitioned from ADMP to CMP.
  5. Those DRM rights are maintained through any subsequent movements of the information.
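The steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any shipping product: the policy table, column labels, application list, and every function name here are hypothetical, made up purely to show labels following the data from the ADMP side to the CMP side.

```python
from dataclasses import dataclass, field

# Hypothetical central policy: classification label -> DRM rights to apply.
CENTRAL_POLICY = {"sensitive": {"encrypt": True, "allow_print": False}}

# Hypothetical static column labels held on the ADMP side.
COLUMN_LABELS = {"customers.ssn": "sensitive"}

# Applications assumed to be integrated to enforce DRM rights.
DRM_CAPABLE_APPS = {"excel", "word", "pdf"}


@dataclass
class Artifact:
    name: str
    labels: set = field(default_factory=set)
    rights: dict = field(default_factory=dict)


def run_query(columns):
    """Steps 1-2: the result inherits the labels of the columns it touches."""
    labels = {COLUMN_LABELS[c] for c in columns if c in COLUMN_LABELS}
    return Artifact("query result", labels)


def export(artifact, app):
    """Steps 3-5: allow the move only into DRM-capable apps, then apply rights."""
    if artifact.labels and app not in DRM_CAPABLE_APPS:
        raise PermissionError(f"{app} cannot enforce rights for labeled data")
    rights = {}
    for label in artifact.labels:
        rights.update(CENTRAL_POLICY.get(label, {}))
    return Artifact(f"{app} document", set(artifact.labels), rights)


result = run_query(["customers.name", "customers.ssn"])
spreadsheet = export(result, "excel")                 # ADMP hands off to CMP here
pdf = export(export(spreadsheet, "word"), "pdf")      # rights follow each move
```

The design point is that the rights are never chosen by the user or the application. They are looked up from the central policy every time labeled data moves, so the label is the only thing that has to travel with the data.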

Here’s an animation from a presentation I gave last week that shows what I mean. Click it at least 3 times to advance.

This is just one example of how they’ll bridge, and yes, it sounds like science fiction. But all the components we need are well in development and you might see real-world examples sooner than you think.


41% Of Enterprises Mask Test And Development Data

Last week I gave a webinar on database security for ZDNet, sponsored by Oracle. We had an exceptionally good turnout and ran a couple of polls during the session.

Oracle just posted the results on a new security blog they’ve set up.

One of the questions was on data masking, something we’ve discussed here before. I asked the audience how many actively performed data masking within their organizations.

We got a great response, with a sample size of 139. Not huge, but still somewhat statistically significant. Most organizations don’t data mask, and of those that do, only a combined 13% have a formalized program. No surprises, but it’s nice to see it in some real numbers.

And don’t forget data masking law number 5.

Here’s the obligatory pretty picture, and you can still replay the session over at ZDNet.



The Five Laws Of Data Masking

Tomorrow I’ll be giving a webcast over at ZDNet (sponsored by Oracle) on the Top 5 Database Security Resolutions for 2008. The resolutions have changed a bit since I first posted about them over here, and I decided to swap in data masking for the last one. I almost pulled it back out after I found out my sponsor (Oracle) just released a data masking product (I try to avoid being too promotional in my webinars), but it’s something I’ve been talking about for a while and it’s too important to pull just because a few people might think I was being biased.

We’re up to nearly 600 people registered for the event, making it one of the largest webcasts I’ve done.

But enough self-promotion; it’s time to talk about data masking.

Data masking started popping up as an issue about 3 years ago. At the time I was covering database security, but client calls were bouncing around between me on the security team and someone over in application development. It’s one of those annoying security issues that crosses organizational boundaries and ends up the responsibility of those with little security experience. It’s an issue that grew organically- first popping up in some audits related to GLBA (a financial services regulation), and now something we see required for PCI and a few other regulations.

Data masking is really a bad term for what we’re talking about. We can technically mask data anywhere, but when we use the term data masking we usually mean “test data generation” or “analytical data generation”. It’s the conversion of production data into either test and development data or data for a data warehouse (OLAP). For this post we’ll focus on test data generation, but the same techniques can be used for an OLAP where you want data that represents production data, but still protects the sensitive stuff.

And that’s our goal- to take sensitive data from a production system and convert it into non-sensitive data suitable for testing or analysis. We can do this through substitution, transposition, obfuscation, de-coupling, scrambling, hashing, or even encryption.

I’m going to quickly eliminate hashing and encryption from the discussion- those techniques are very effective at protecting data, but the result breaks the second law of data masking listed below: the data must still be representative of the source, without being sensitive.

Organizations are increasingly finding that data masking is mandated for regulatory compliance. It’s also an extremely effective way to reduce enterprise risk. Development and test environments are rarely as secure as production, and there’s little reason developers should have access to sensitive data. Analytical systems are often accessed by a wide variety of users, most of whom shouldn’t see sensitive data, with only a fraction of the access and other security controls in transactional systems.

With that, and since I get way more hits if I have the “x laws” in the title, here are the Five Laws of Data Masking:

  1. Masking must not be reversible. However you mask your data, it should never be possible to use it to retrieve the original sensitive data.
  2. The results must be representative of the source data. The reason to mask data instead of just generating random data is that masking allows you to protect sensitive information that still resembles production data for development and testing purposes. This could include geographic distributions, credit card distributions (e.g., leaving the first 4 numbers unchanged, but scrambling the rest), or maintaining human readability of (fake) names and addresses.
  3. Referential integrity must be maintained. Your masking solution should maintain referential integrity- if a credit card number is a primary key, and scrambled as part of masking, then all instances of that number linked through key pairs must be scrambled identically.
  4. Only mask non-sensitive data if it can be used to recreate sensitive data. It isn’t necessary to mask everything in your database, just those parts that you deem sensitive. But remember, some non-sensitive data can be used to either recreate or tie back to sensitive data. For example, if you scramble a medical ID but the treatment codes for a record could only map back to the original record, you also need to scramble those codes. This is called inference analysis, and your masking should protect against it.
  5. Masking must be a repeatable process. One-off masking is not only nearly impossible to maintain, but it’s fairly ineffective. Development/test data needs to represent constantly changing production data as closely as possible. Analytical data may need to be generated daily, or even hourly. If masking isn’t an automated process it’s inefficient, expensive, and ineffective. I know of some organizations that centralize masking and offer it as an internal service to the enterprise.
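Here is a minimal Python sketch of how a few of these laws might look in practice for a credit card column. Everything in it is an assumption for illustration: the function names, the sample tables, and the key are hypothetical, and the keyed hash (HMAC-SHA256) is just one way to get a one-way, repeatable scramble, not a recommendation of a specific product or method. It keeps the first 4 digits (law 2), scrambles the rest in a way that can't be undone without the key (law 1), and maps the same number to the same masked value everywhere (laws 3 and 5).

```python
import hashlib
import hmac

# Hypothetical secret held only by the masking job, never shipped with masked data.
MASKING_KEY = b"example-masking-key"


def mask_card_number(card_number: str, key: bytes = MASKING_KEY) -> str:
    """Keep the first 4 digits; replace the rest with keyed-hash-derived digits."""
    digits = "".join(c for c in card_number if c.isdigit())
    prefix, rest = digits[:4], digits[4:]
    digest = hmac.new(key, digits.encode(), hashlib.sha256).digest()
    # One digest byte per replaced digit, reduced to 0-9.
    masked_rest = "".join(str(digest[i % len(digest)] % 10) for i in range(len(rest)))
    return prefix + masked_rest


def mask_rows(rows, column="card"):
    """Return copies of the rows with the given column masked."""
    return [{**row, column: mask_card_number(row[column])} for row in rows]


# Two made-up tables linked by card number.
customers = [{"card": "4111111111111111", "name": "Test Customer"}]
orders = [
    {"card": "4111111111111111", "total": 25},
    {"card": "5500000000000004", "total": 40},
]

masked_customers = mask_rows(customers)
masked_orders = mask_rows(orders)
```

Because the same input always produces the same output, the customer and order rows still join after masking, and the job can be rerun on fresh production data. A few caveats: two different numbers could in principle mask to the same value, which matters if the column is a primary key; the scramble is only as irreversible as the key is secret; and this does nothing about inference through other columns (law 4).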

These “laws” are just to start the discussion on masking. In future posts I’ll discuss my recommended data masking process and what features to look for in tools.

And if you absolutely can’t wait until I get around to a follow-on post, join me for the webinar on Friday where I’ll dig in a little deeper.
