We’ve been writing a lot about tokenization, both as we build the content for our next white paper and in Adrian’s response to the PCI Council’s guidance on tokenization. I want to address something that’s really been ticking me off…
In our latest post in the series we described the details of token generation. One of the options, which we had to include since it’s built into many of the products, is to encrypt the original value and then use the encrypted value as the token.
Here’s the thing: if you encrypt the value, it’s encryption, not tokenization! Encryption obfuscates the original data; a token removes it entirely.
Conceptually the major advantages of tokenization are:
- The token cannot be reversed back to the original value.
- The token maintains the same structure and data type as the original value.
While format-preserving encryption (FPE) can retain the structure and data type, it’s still reversible back to the original value if you have the key and algorithm. Yes, you can add a per-organization salt, but it’s still encryption. I can see some cases where using a hash might make sense, but only if it’s a format-preserving hash.
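To make the distinction concrete, here’s a minimal sketch (illustrative Python, not any vendor’s implementation): a true token is random, and the only road back to the PAN is a lookup table on the token server; an encrypted “token” is *derived* from the PAN, so the key alone reverses it.

```python
import secrets

# True tokenization: the token is a random surrogate; the only link
# back to the PAN is this server-side lookup table (the token vault).
vault: dict[str, str] = {}

def tokenize(pan: str) -> str:
    while True:
        # Format-preserving random value: same length, all digits.
        token = "".join(secrets.choice("0123456789") for _ in range(len(pan)))
        if token not in vault:
            vault[token] = pan
            return token

def detokenize(token: str) -> str:
    # Without access to the vault there is nothing to reverse.
    return vault[token]

# Encryption-as-"token": the output is derived from the PAN, so anyone
# holding the key (and algorithm) can reverse it without any vault.
# (Toy XOR for illustration only -- real products would use FPE.)
KEY = secrets.token_bytes(16)

def encrypt_as_token(pan: str) -> bytes:
    return bytes(b ^ k for b, k in zip(pan.encode(), KEY))

def decrypt_token(ct: bytes) -> str:
    return bytes(b ^ k for b, k in zip(ct, KEY)).decode()
```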
I worry that marketing is deliberately muddling the terms.
Opinions? Otherwise, I declare here and now that if you are using an encrypted value and calling it a ‘token’, that is not tokenization.
19 Replies to “FireStarter: an Encrypted Value Is *Not* a Token!”
’tis a beautiful thing to be challenged and learn… thank you.
If I could paraphrase what I grok from this: cryptosystems are complicated and come in a wide array of arrangements, while tokenization has only one model (client/server). That one model has qualities that make it difficult to screw up from a client perspective, so systems dealing only in tokens may be excluded from further inspection.
That makes sense to me. Got it, I think.
Perhaps my hang-up is because I know there are functionally equivalent cryptosystems and token solutions. Both can require client authentication, with plaintext being sent to a server and de-valued data being returned. To reverse it, both require authentication, with the de-valued data being sent in and the original plaintext being returned. Both would require a breach at the one centralized location before the reversal process could be compromised/distributed.
But, to my first statement here, that is but one of a myriad of possible solutions for crypto, and the only method for tokens. So while I personally feel the less-robust cryptosystems are spoiling things for the more-robust ones, I think the point of this thread is more that the architecture of tokenization is getting the spotlight.
@Jay,
I have a different perspective here.
In the real world, we’ve seen a number of attacks where the attacker is able to compromise the key (memory parsing attacks, for example), or capture the data when it is unencrypted at some point in the application chain.
Tokenization nearly wipes out that concern. The attacker needs to hit either the tokenization server or the back-end systems that use the real value, which is typically a *far* smaller set of systems than what we see in an encryption implementation.
Thus tokenization materially reduces the attack surface, which is why it also reduces audit scope. That isn’t true of encryption, since anywhere the value is present is still part of the attack surface. With tokenization, that’s reduced to only the locations where the original value is stored.
Make sense?
@Jay – I understand your viewpoint, but we are arguing two different points. I am not questioning the effectiveness of cryptography. I am saying any cryptosystem must be verified, and if I have the option to avoid that responsibility altogether, that’s preferable. To respond to some of your points as concisely as possible…
> “If the merchant wants to access the data, why use tokens at all? Why not just encrypt the database fields?” That’s what they are doing today. Why would anyone substitute one cryptosystem for another if the first one was not broken and the two are functionally equivalent?
> The tokens (encrypted & random number variants) are equivalent … up until the point you introduce decryption apparatus and provide a means to retrieve PAN data. They may _still_ be equivalent, but only once the security of the crypto system is verified.
> The point is to _not_ perform an audit. Cryptosystem implementation and deployments can be botched. Removing the need to audit is where time and money are saved.
> “Why should systems that only deal in that ciphertext be in scope for PCI when an equivalent token is considered out of scope?” Because there is a chance, however slim, that FPE can be hacked, and a random number can’t be.
> The hope is that the security problem devolves to access control of the token server. And the hope is there is only one token server to worry about so you focus your time, money and resources at that point. Further — if you are a merchant — you are better off from a security perspective if the token server is not housed within your environment at all. That way any attack and possible compromise is entirely outside of the merchant environment.
> There are some craptastic token servers out there … using reversible hashes, storing keys inside the token repository, using fungible encryption methods, etc.
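To show why a “reversible hash” is so bad (a hypothetical sketch, not any particular product): an unsalted hash of a PAN can be brute-forced in seconds, because the candidate space is tiny and structured.

```python
import hashlib
from itertools import product

# Anti-pattern: a "token" that is just an unsalted hash of the PAN.
def bad_token(pan: str) -> str:
    return hashlib.sha256(pan.encode()).hexdigest()

# Reversal by brute force: with the BIN (first 6 digits) known and the
# last 4 printed on every receipt, only the 10**6 possible middle
# digits of a 16-digit PAN remain to try.
def crack(token: str, bin6: str, last4: str) -> str | None:
    for middle in product("0123456789", repeat=6):
        candidate = bin6 + "".join(middle) + last4
        if bad_token(candidate) == token:
            return candidate
    return None
```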
Once again, I see your argument, and I don’t really disagree with it. The potential of cryptography is not in question. But as there is a cheaper and more secure option available, the use case for encryption _is_ in question.
-Adrian
@Adrian – I must be missing the point, my apologies; perhaps I’m just approaching this from too much of a cryptonerd perspective. Though I’d like to think I’m not being overly theoretical.
To extend your example: for any merchant that wants to gain access to the de-tokenized content, we will need to make a de-tokenization interface available. They will have the ability to get at the credit card/PAN behind every token they hold. On the crypto side, if releasing keys to merchants is unacceptable, require that merchants send ciphertext back to be decrypted so the key is never shared… What’s the difference between those two?
Let’s say my cryptosystem leverages a networked HSM. Clients connect and authenticate, send in an account number, and get back ciphertext. To reverse that operation, a client would have to connect and authenticate, send in ciphertext, and receive back an account number. Is it not safe to assume that the ciphertext can be passed around safely? Why should systems that only deal in that ciphertext be in scope for PCI when an equivalent token is considered out of scope?
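Put as an interface, the two systems I’m describing have exactly the same shape (a sketch with made-up names, purely for illustration):

```python
from typing import Protocol

class DevaluationService(Protocol):
    """The interface shape shared by an HSM-backed cryptosystem and a
    tokenization server (names here are invented for illustration)."""

    def protect(self, credentials: str, pan: str) -> str:
        """Authenticate, send in an account number, receive de-valued
        data: ciphertext in one case, a token in the other."""
        ...

    def reveal(self, credentials: str, devalued: str) -> str:
        """Authenticate, send in the de-valued data, receive the
        account number back."""
        ...
```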
Conversely, how do clients authenticate to a tokenization system? Because the security of the tokens (from an attacker’s perspective) basically shifts to that authentication method. What if it’s a password stored next to the tokens? What if it’s mutual SSL authentication using asymmetric keys? Are we just back to needing good key management and access control?
My whole point is that, from my viewpoint, I think encrypting data is getting a bad rap when the real problem is poorly implemented security controls. I don’t see any reason to believe that we can’t have poorly implemented tokenization systems.
If we can’t control access into a cryptosystem, I don’t see why we’d do any better controlling access to a token system. With PCI DSS saying tokenization is “better”, my guess is we’ll see a whole bunch of mediocre token systems that will eventually lead us to realize that, hey, we can build tokenization systems just as craptastic as our cryptosystems.
@Jay – Very well stated. If the argument were just the theoretical perspective I might agree, but the theoretical security of the token is not a realistic assessment here. You do not get to gauge the security of the _system_ based upon how hard it is to crack the standalone token. These things are not equivalent.
The key here is to remember that PCI DSS allows systems that substitute tokens for credit card data to be removed from the audit, based upon the premise that PAN data is not available. But if the data is merely encrypted and the keys are present, it _is_ available! How can you justify removing a system from a PCI audit when you are actually storing the credit card/PAN and the keys? Sure, the theoretical algorithm, and the implementation of that algorithm, may be secure. Yes, if key management is done right it will be _really_ hard to break. But the fact is you don’t know whether key management has been properly performed! Is the encryption system secure, or has it been compromised? It’s a ridiculous argument to claim that tokens constructed through mathematical functions are unbreakable and that we therefore don’t need to audit tokenized systems.
For any merchant that wants to gain access to the encrypted content, you will need to have the keys accessible. That means the entire cryptosystem is present, and it all needs to be reviewed. If you include systems with tokenized data in the scope of the PCI audit, you have nullified the cost-savings benefit that makes tokenization so attractive.
Thanks again for a great comment.
-Adrian
I gotta disagree; perhaps I’m missing something, or else you’re confusing the possibility of reversing encryption with the probability of reversing it. The statement “The token cannot be reversed back to the original value” could be appended with “unless access is granted to do so”. Because some systems want to reverse tokens for various purposes, the reversal process exists, right?
But the argument here is that encryption is somehow different (injecting: from a security perspective) from tokenization.
We could easily make the statement that an encrypted value cannot be reversed back to the original value (unless access is granted to do so). Having an encrypted value without access to the key is functionally no different than having a token without access to reverse it.
Breaking it down further, clients wishing to de-tokenize are probably set up to authenticate to a tokenization system, much the same way clients wishing to decrypt must authenticate to a key management system or Hardware Security Module (HSM). Access to the key can and should be controlled just as access to a tokenization solution is.
Perhaps, Rich, your position assumes a foundation of poor key management? Because I struggle with the concept that standalone tokens and standalone ciphertext would have different value to an attacker. Both are de-valued, and both rely on some other breakdown in security to return value to the data. Is this assuming that keys are somehow more vulnerable to compromise than a tokenization solution? Or are you thinking that brute force of keys is a realistic attack?
Love it. Even more that we’re on the same page and didn’t even discuss this. I posted last Thursday as well!
Tadd,
What I’m bringing to light is the discussion of how an entire tokenization system should be applied and appropriately labeled. A token by itself is completely useless. For transactions to occur, a real card number has to be passed to the card brand networks. The first sentence of your second paragraph is where the issue lies. There must be a point where the token turns into a credit card number. That is where the tokenization system becomes vulnerable to attack. That is why some systems send ciphertext back to the client to store as a way of splitting the data from the key.
Another topic entirely is that there are tokenization systems where the client sends the token to a server and receives a card number in response. Defeats the whole purpose. :/
No matter how the “tokenization” system is implemented, at a basic level there will be ciphertext somewhere and a key used for decryption. If the service provider houses both ciphertext and key, an attacker who gets into their systems hits the jackpot and gets all the stored data. If the key and data are split, a merchant or service provider hacked individually does not put that stored data at risk.
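A minimal sketch of that split (illustrative only, using the Python cryptography package): the provider holds the key, the merchant stores only ciphertext, and neither store is useful on its own.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Provider side -- holds the key, never stores ciphertext.
provider_key = Fernet.generate_key()

def provider_encrypt(pan: str) -> bytes:
    # Returns ciphertext to the merchant; nothing is stored here.
    return Fernet(provider_key).encrypt(pan.encode())

def provider_decrypt(ciphertext: bytes) -> str:
    # Only an authenticated merchant request reunites key and data.
    return Fernet(provider_key).decrypt(ciphertext).decode()

# Merchant side -- stores ciphertext in place of the PAN.
stored = provider_encrypt("4111111111111111")
# ... later, to run a transaction, the merchant sends `stored` back:
pan = provider_decrypt(stored)
```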
In that type of implementation, you can call the returned data “ciphertext”, “keying material” or other technically correct names depending on how the split is performed. Unfortunately, most merchants, VARs, and ISVs are confused and don’t understand applied cryptography. I would be willing to bet they are more likely to buy “tokenization” since that’s the name that appears in card brand and PCI SSC literature.
The PCI SSC has already published an FAQ entry stating that if it can be shown that encrypted card data cannot be decrypted, it’s not considered cardholder data. That kind of a judgment makes logical sense.
At the end of the day, the goal is to decrease the attack surface of payment card data, which in turn helps reduce fraud. Given an implementation that returns ciphertext or keying material to the merchant, the marketing realities, and the intent of PCI, what would you recommend a vendor do?
The point being that FPE isn’t a standard. What you buy from company A can’t be decrypted by company B. NIST is considering the AES FFX mode and, from what I’ve heard, considering it seriously (though I’ve also heard they were unaware of the CC/SSN use case). But until that happens, non-standard crypto should terrify people looking to encrypt an entire line of business.
Layer onto that the fact that companies A and B are startups that might get sold any day to another company that may or may not care about the install base of that particular product. That’s a big risk.
So companies that offer FPE have to tie it back to something that’s recognizable, reversible, and reasonably standard. They have to borrow the “reducing audit scope” message of tokenization; otherwise, why do it at all?