Updated: 06/30/2010

One of the most daunting tasks in information security is protecting sensitive data in (often complex and distributed) enterprise applications. Even the most hardened security professionals enter these projects with at least a modicum of trepidation. Coordinating effective information protection across application, database, storage, and server teams is challenging under the best of circumstances – and much tougher when also facing the common blend of legacy systems and conflicting business requirements.

For the most part our answer to this problem has been various forms of encryption, but over the past few years we’ve seen increasing interest in and adoption of tokenization.

Encryption, implemented properly, is one of the most effective security controls available to us. It renders information unreadable except to authorized users, and protects data both in motion and at rest. But encryption isn’t the only data protection option, and there are many cases where alternatives make more sense. Sometimes the right choice is to remove the data entirely.

Tokenization is just such a technology: it replaces the original sensitive data with non-sensitive placeholders. Tokenization is closely related to encryption – they both mask sensitive information – but its approach to data protection is different. With encryption we protect the data by scrambling it using a process that’s reversible if you have the right key. Anyone with access to the key and the encrypted data can regenerate the original values.

With tokenization the process is not reversible: there is no key or algorithm that turns a token back into the original value. Instead we substitute a token value that’s only associated with the “real” data within a well-protected database. This token can even have the exact same format (size & structure) as the original value, helping minimize application changes. But the token is effectively random, rather than a scrambled version of the original data. The token by itself cannot be compromised to reveal sensitive data.
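To make the substitution concrete, here is a minimal Python sketch of the idea. It is not any vendor's implementation: the TokenVault class, its in-memory dictionaries, and the rule of swapping each digit for a random digit are all our own illustrative assumptions. A real token server would sit behind strong access controls and persistent, well-protected storage.

```python
import secrets

DIGITS = "0123456789"


class TokenVault:
    """Toy, in-memory stand-in for the well-protected token database (hypothetical)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random token with the same size and structure as value."""
        if not any(ch in DIGITS for ch in value):
            raise ValueError("expected at least one digit to replace")
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            # Each digit becomes a random digit; separators stay in place.
            # No key is involved, so the token has no mathematical
            # relationship to the original value.
            token = "".join(
                secrets.choice(DIGITS) if ch in DIGITS else ch for ch in value
            )
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look the original value up; only the vault can do this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
original = vault.detokenize(token)
```

The only way back from token to original is the lookup inside the vault, which is why the protection of that database matters so much.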

The power of tokenization is that although the token value is usable within its native application environment, it is completely useless outside. So tokenization is ideal to protect sensitive identifying information such as credit card numbers, Social Security Numbers, and the other personally identifiable information bad guys tend to steal and use or sell on the underground market. Unless they crack the tokenization server itself to obtain the original data, stolen tokens are worthless.

Interest in tokenization has accelerated because it protects data at a lower overall cost. Adding encryption to systems – especially legacy systems – introduces a burden outside the original design. Making application changes to accommodate encrypted data can dramatically increase overhead, reduce performance, and expand the responsibilities of programmers and systems management staff. In distributed application environments the need to encrypt, decrypt, and re-encrypt data in different locations results in exposures that attackers can take advantage of. More instances where systems handle keys and data mean more opportunities for compromise. For example, one growing attack is the use of memory parsing malware: malicious software installed on servers and capable of directly accessing memory to pull encryption keys or data from RAM, even when running without administrative privileges.

Aside from minimizing application changes, tokenization also reduces potential data exposure. When properly implemented, tokenization enables applications to use the token throughout the whole system, only accessing the protected value when absolutely necessary. You can use, store, and transact with the token without fear of exposing the sensitive data it represents. Although at times you need to pull out the real value, tokenization allows you to constrain its usage to your most secure implementations.

For example, one of the most common uses for tokenization is credit card transaction systems. We’ll go into more depth later, but using a token for the credit card number allows us to track transactions and records, only exposing the real number when we need to send a transaction off to the payment processor. And if the processor uses tokenization as well, we might even be able to completely eliminate storing credit card numbers.

This doesn’t mean tokenization is always a better choice than encryption. They are closely related and the trick is to determine which will work best under the particular circumstances.

In this series we’ll dig deep into tokenization to explain how the technology works, explore different use cases and deployment scenarios, and review selection criteria to pick the right option. We’ll cover everything from tokenization services for payment processing and PCI compliance to rolling your own solution for internal applications.

In our next post we’ll describe the different business justifications, and follow up with a high-level description of the different tokenization models. After that we’ll post on the technology details, deployment, use cases, and finally selection criteria and guidance.

If you haven’t figured it out by now, we’ll be pulling all this together into a white paper for release later this summer. Just keep this in mind: sometimes the best data security choice is to avoid keeping the data at all. Tokenization lets us remove sensitive data while retaining much of its value.