We get lots of questions about tokenization – particularly about substituting tokens for sensitive data. Many questions from would-be customers are based on misunderstandings about the technology, or the way the technology should be applied. Even more troublesome is the misleading way the technology is marketed as a replacement for data encryption. In most cases it’s not an either/or proposition. If you have sensitive information you will be using encryption somewhere in your organization. If you want to use tokenization, the question becomes how much of your encrypted data to supplant with tokens, and how to go about it.
A few months back I posted a rebuttal to Larry Ponemon’s comments about the Ponemon survey “What auditors think about Crypto”. To me, the survey focused on the wrong question. Auditor opinions on encryption are basically irrelevant. For securing data at rest and in motion, encryption is the backbone technology in the IT arsenal and an essential data security control for compliance. You could not avoid using encryption even if you and your auditor both suddenly decided that avoiding it would be a great thing. The real question they should have asked is, “What do auditors think of tokenization, and when is it appropriate to substitute it for encryption?” That’s a subjective debate where auditor opinions are important.
Tokenization technology is getting a ton of press lately, and it’s fair to ask why – particularly as its value is not always clear. After all, tokenization is not specified by any data privacy regulations as a way to comply with state or federal laws. Tokenization is not officially endorsed in the PCI Data Security Standard, but it’s most often used to secure credit card data. Actually, tokenization is just now being discussed by the task forces under the purview of the PCI Security Standards Council, while PCI assessors are accepting it as a viable solution. Vendors are even saying it helps with HIPAA; but practical considerations raise real concerns about whether it’s an appropriate solution at all.
It’s time to examine the practical questions about how tokenization is being used for compliance. With this post I am launching a short series on the tradeoffs between encryption and tokenization for compliance initiatives. About a year ago we performed an extensive research project on Understanding and Selecting Tokenization, focusing on the nuts and bolts of how token systems are constructed, with common use cases and buying criteria. If you want detailed technical information, use that paper. If you are looking to understand how tokenization fits within different compliance scenarios, this research will provide a less technical examination of how to solve data security problems with tokenization. I will focus less on describing the technology and buying criteria, and more on contrasting the application of encryption against tokenization.
Before we delve into the specifics, it’s worth revisiting a couple of key definitions to frame our discussion:
Tokenization is a method of replacing sensitive data with non-sensitive placeholders called tokens. These tokens are swapped with data stored in relational databases and files. The tokens are commonly random numbers that take the form of the original data but have no intrinsic value. A tokenized credit card number cannot be used (for example) as a credit card for financial transactions. Its only value is as a reference to the original value stored in the token server that created and issued the token. Note that we are not talking about identity tokens such as the SecurID tokens involved in RSA’s recent data breach.
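To make the definition concrete, here is a minimal sketch of a token server in Python. It is an illustration under stated assumptions, not how any particular product works: the `TokenVault` class, its in-memory dictionaries, and the method names are all hypothetical, and a real token server would use a hardened, access-controlled data store. The sketch shows the two properties described above: the token takes the form of the original data (same length, digits only), and its only value is as a reference back to the original held by the vault.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory stand-in for a token server."""

    def __init__(self):
        # Both maps live only inside the vault, never in the application database.
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, card_number: str) -> str:
        """Return a random, same-length numeric token for a digit string."""
        if card_number in self._value_to_token:
            # Reuse the token already issued for this value.
            return self._value_to_token[card_number]
        while True:
            # Random digits: no mathematical relationship to the original.
            token = "".join(secrets.choice("0123456789") for _ in card_number)
            if token != card_number and token not in self._token_to_value:
                break
        self._token_to_value[token] = card_number
        self._value_to_token[card_number] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]


vault = TokenVault()
original = "4111111111111111"          # well-known test card number
token = vault.tokenize(original)       # safe to store in place of the original
restored = vault.detokenize(token)     # only possible with access to the vault
```

The application stores `token` where it used to store the card number. Recovering the original is a lookup against the vault, not a computation on the token, which is why a stolen token on its own is useless.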
Encryption is a method of protecting data by scrambling it into an unreadable form. It’s a systematic encoding process which is only reversible if you have the right key. Correctly implemented, encryption is nearly impossible to break, and the original data cannot be recovered without the key. The problem is that attackers are smart enough to go after the encryption keys, which is much easier than breaking good encryption. Anyone with access to the key and the encrypted data can recreate the original data. Tokens, in contrast, are not reversible: a random token has no mathematical relationship to the original value, so there is no key that turns it back into the data.
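The point about keys can be shown with a deliberately tiny example. The sketch below uses a one-time-pad style XOR with a random key as long as the data, chosen only because it fits in a few lines of standard-library Python; it is an assumption made for illustration, not a recommendation, and real systems should use a vetted cipher from a maintained cryptographic library. The helper name `xor_bytes` is hypothetical.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of equal length (toy one-time-pad operation)."""
    if len(key) != len(data):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"4111111111111111"              # well-known test card number
key = secrets.token_bytes(len(plaintext))    # the secret an attacker will go after
ciphertext = xor_bytes(plaintext, key)       # unreadable binary without the key

# Anyone holding both the key and the ciphertext gets the data back.
recovered = xor_bytes(ciphertext, key)
```

Nothing about the ciphertext needs to be broken here: possession of `key` is enough to reverse the process, which is why key management matters more to an attacker than the strength of the algorithm.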
There is a common misconception that tokenization and format preserving tokens – or more correctly Format Preserving Encryption (FPE) – are the same thing, but they are not. The easiest way to understand the difference is to look at what each does with the sensitive data. Format Preserving Encryption is a method of creating tokens out of sensitive data. But format preserving encryption is still encryption – not tokenization. Format preserving encryption is a way to avoid re-coding applications or re-structuring databases to accommodate encrypted (binary) data. Both tokenization and FPE offer this advantage. But encryption obfuscates sensitive information, while tokenization removes it entirely (to another location). And you can’t steal data that’s not there. You don’t worry about encryption keys when there is no encrypted data.
In followup posts I will discuss how to employ the two technologies – specifically for payment, privacy, and health-related information. I’ll cover the high-profile compliance mandates most commonly cited as reference examples for both, and look at tradeoffs between them. My goal is to provide enough information to determine if one or both of these technologies is a good fit to address your compliance requirements.