This is the next installment in what is now officially the longest running blog series in Securosis history: Database Encryption. In case you have forgotten, Rich provided the Introduction and the first section on Media Protection, and I covered the threat analysis portion to help you determine which threats to consider when developing a database encryption strategy. You may want to peek back at those posts as a refresher if this is a subject that interests you, as we like to use our own terminology. It’s for clarity, not because we’re arrogant. Really!

For what we are calling “database media protection” as described in Part 1, we covered the automatic encryption of the data files or database objects through native encryption built into the database engine. Most of the major relational database platforms provide this option, which can be “seamlessly” deployed without modification to applications and infrastructure that use the database. This is a very effective way to prevent recovery of data stored on lost or stolen media. And it is handy when you have renegade IT personnel who hate managing separate encryption solutions. Simple. Effective. Invisible. And only a moderate performance penalty. What more could you want?

If you have to meet compliance requirements, probably a lot more. You need to secure credit card data within the database to comply with the PCI Data Security Standard. You are unable to catalog all of the applications that use sensitive data stored in your database, so you want to stop data leakage at the source. Your DBAs want to be ‘helpful’, but their ad-hoc adjustments break the accounting system. Your quality assurance team exports production data into unsecured test systems. Medical records need to be kept private. While database media protection is effective in addressing problems with data at rest, it does not help enforce proper data usage. Requirements to prevent misuse by credentialed users or compromised user accounts, or enforce separation of duties, are outside the scope of basic database encryption. For these reasons and many others, you decide that you need to protect the data within the database through more granular forms of database encryption; table, column, or row level security. This is where the fun starts! Encrypting for separation of duties is far more complex than encrypting for media protection; it involves protecting data from legitimate database users, requiring more changes to the database itself. It’s still native database encryption, but this simple conceptual change creates exceptional implementation issues. It will be harder to configure, your performance will suffer, and you will break your applications along the way. Following our earlier analogy, this is where we transition from hanging picture hooks to a full home remodeling project. In this section we will examine how to employ granular encryption to support separation of duties within the database itself, and the problems this addresses. Then we will delve into the problems you will to run into and what you need to consider before taking the plunge.

Before we jump in, note that each of these options are commonly referred to as a ‘Level’ of encryption; this does not mean they offer more or less security, but rather identifies where encryption is applied within the database storage hierarchy (element, row, column, table, tablespace, database, etc). There are three major encryption options that support separation of duties within the database. Not every database vendor supports all of these options, but generally at least two of the three, and that is enough to accomplish the goals above. The common options are:

  1. Column Level Encryption: As the name suggests, column level encryption applies to all data in a single, specific column in a table. This column is encrypted using a single key that supports one or more database users. Subsequent queries to examine or modify encrypted columns must possess the correct database privileges, but additionally must provide credentials to access the encryption/decryption key. This could be as simple as passing a different user ID and password to the key manager, or as sophisticated as a full cryptographic certificate exchange, depending upon the implementation. By instructing the database to encrypt all data stored in a column, you focus on specific data that needs to be protected. Column level encryption is the popular choice for compliance with PCI-DSS by restricting access to a very small group. The downside is that the column is encrypted as a whole, so every select requires the entire column to be deencrypted, and every modification requires the entire column to be re-encrypted and certified. This is the most commonly available option in relational database platforms, but has the poorest performance.
  2. Table / Tablespace Encryption: Table level encryption is where the entire contents of a table or group of tables are encrypted as one element. Much like full database encryption, this method protects all the data within the table, and is a good option when all more than one column in the table contains sensitive information. While it does not offer fine-grained access control to specific data elements, it more efficient option than column encryption when multiple columns contain sensitive data, and requires fewer application and query modification. Examples of when to use this technique include personally identifiable information grouped together – like medical records or financial transactions – and this is an appropriate approach for HIPAA compliance. Performance is manageable, and is best when the sensitive tables can be fully segregated into their own tablespace or database.
  3. Field/Cell/Row Level Encryption, Label Security: Row level encryption is where a single row in a table is encrypted, and field or cell level encryption is where individual data elements within a database table are encrypted. They offer very fined control over data access, but can be a management and performance nightmare. Depending upon the implementation, there might be one key used for all elements or a key for each row. The performance penalty is a sharp limitation, especially when selecting or modifying multiple rows. More commonly, separation of duties is supported by label security. This strategy involves structural modifications to the database to support “labeling” each row with attributes corresponding to access rights or user groups. Additionally, each user is assigned access rights that map to one or more of these labels. When a user makes a request, they are only allowed to retrieve/view a subset of the rows with matching label attributes. The query is only applied to a subset of database independent of any action on the user’s part or query modifications. This offers much higher performance and works well with large databases. It can be used in conjunction with field/cell level encryption to provide high security, but as this is often sufficient to address separation of duties, it is used in conjunction with transparent forms of database encryption.

These advantages come at a cost, and one of these costs is the re-engineering effort required for the applications that rely upon the data that has been encrypted. Most database queries rely on functions to format the results or derive information, and fail when referencing encrypted data. For example, grouping functions like ‘summation’ or ‘average’, and more advanced comparisons such as ‘like’ and ‘range’ not longer work. Indices on encrypted columns fail as they are not trying to arrange randomized data. Foreign key relationships and compound keys cause errors and unintended side effects with both application and database functions. Reporting applications and batch jobs run under generic accounts lack permissions to perform their intended functions. The full effect of retrofitting tables and queries designed under a different set of assumptions cannot be adequately estimated, and requires complete regression testing and data verification.

The single biggest complaint we hear from companies when implementing granular encryption regards the performance impact. Depending upon the specific vendor implementation, column level encryption may require anything from several blocks to the entire column of data being decrypted before the query results can be returned. In cases where there are millions of rows scattered across millions of data blocks, the processing overhead is staggering. Encryption also precludes use of several standard performance optimizations, further reducing performance and throughput. For example, establishing a database connection is a time consuming effort for the database, often far exceeding the time needed to execute the user’s query. “Connection Pooling” is a common database feature where connections are pre-established under a generic application user account and remain idle until a user makes a request. But when access to encrypted data requires a complete user ID and credentials, generic service accounts cannot access the encrypted data. Each request needs to be established with a credentialed user account, or the connection modified such that the credentials are passed and authenticated. Another example is data caching, where the database fetches commonly accessed information and stores it in memory. With encryption and label security, what each user sees may be different, and caching is less effective.

Many of these issues can be mitigated or completely addressed, but only when designing encryption into the application and database structures from scratch. If you are moving forward with an encryption project, it is far better to implement these changes into new tables and functions rather than attempt to retrofit new functions into tables and applications designed under a different set of assumptions.

In our next post we will take a closer look at key management options. There are several variants available to support encryption functions, performance, and even separation of duties.