We’ve been writing a lot about tokenization lately, both as we build the content for our next white paper and in Adrian’s response to the PCI Council’s guidance on tokenization. I want to address something that’s really been ticking me off…
In our latest post in the series we described the details of token generation. One of the options, which we had to include since it’s built into many of the products, is encrypting the original value and then using the encrypted value as the token.
Here’s the thing: if you encrypt the value, it’s encryption, not tokenization! Encryption obfuscates the original data; a token removes it.
Conceptually the major advantages of tokenization are:
- The token cannot be reversed back to the original value.
- The token maintains the same structure and data type as the original value.
While format preserving encryption can retain the structure and data type, it’s still reversible back to the original if you have the key and algorithm. Yes, you can add per-organization salt, but this is still encryption. I can see some cases where using a hash might make sense, but only if it’s a format preserving hash.
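To make the distinction concrete, here’s a minimal sketch of the lookup-table approach – purely illustrative, not any vendor’s implementation. The token is random, so there is no key that reverses it; the only path back to the PAN is the vault the token server protects. With format preserving encryption, by contrast, anyone holding the key and algorithm can reverse the value offline.

```python
import secrets

# Illustrative sketch only -- not any product's implementation.
_vault = {}  # token -> original PAN; this lookup table is what the token server protects


def tokenize(pan: str) -> str:
    """Return a same-length numeric token with no mathematical relationship to the PAN."""
    token = "".join(secrets.choice("0123456789") for _ in pan)
    while token in _vault:  # regenerate on the (unlikely) collision with an existing token
        token = "".join(secrets.choice("0123456789") for _ in pan)
    _vault[token] = pan
    return token


def detokenize(token: str) -> str:
    """Reversal requires access to the vault -- there is no key or algorithm to steal."""
    return _vault[token]
```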
I worry that marketing is deliberately muddling the terms.
Opinions? Otherwise, I declare here and now that if you are using an encrypted value and calling it a ‘token’, that is not tokenization.
19 Replies to “FireStarter: an Encrypted Value Is *Not* a Token!”
@Rich – “At least with tokenization you have a single choke point to focus your efforts on.”
I think focusing your efforts on securing the tokenization/de-tokenization process (regardless of method) is the correct approach and I think everyone can agree on this.
Worrying about an attacker’s focus on decrypting a ciphertext in the absence of a decryption service or the key isn’t warranted in most cases.
Sorry about the lack of disclosure – I hadn’t seen any of the other vendors (Voltage, RSA, Mercury Payments) in this thread mention their affiliations, and I’ve never been called out on that before on this site. My comments are my own and do not reflect my employer’s views. As a security professional I try to stay objective toward all points of view.
Mark,
You can say the same things about plenty of encryption systems that are compromised on a regular basis, including those in the payment industry.
In other words – a specious argument. Both the token and encryption servers need to be secure or the system breaks down, for either approach. At least with tokenization you have a single choke point to focus your efforts on.
If anything we’ve seen real world exploitation of key management in payment systems, but not (yet) tokenization.
(BTW – please disclose when you work for a vendor involved in what you are commenting on. We don’t filter anything, but we do ask for that as a courtesy.)
Wow – I’m almost sad I missed this thread when it was raging. I think the long and short of it is that the reason people are in such disagreement over what a token is or which is better (encryption vs. tokenization) is that tokenization is not a technology that you can easily put your finger on. It’s a use case of data transformation…not a specific technology or method.
High-value data goes into the black box; out comes de-valued data.
And the opposite…
De-valued data goes into the black box; out comes high-value data.
Because we require reversibility in tokenization, the attacks against a token are all going to focus on that function. Using a cipher as a generation method means you have to focus your attacks against the decryption process. Using a lookup table to associate de-valued data to high value data means that the attacks must focus on access to the lookup function.
The generation method is irrelevant as long as the reversal process can’t be performed without going to “the black box.” In cryptography the attacks against ciphers are well documented and often enter the realm of millennia (or more) to perpetrate. On the other hand, when was the last time you heard of someone spoofing, or socially engineering, credentials to a database? Or when was the last time someone published an 0-day against Oracle? (Here’s one on your site – http://securosis.com/blog/litchfield-discloses-oracle-0-day-at-black-hat/)
Compare that to the last time you heard of an equivalent attack against AES…?
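A rough way to picture that point (a hypothetical interface, not any particular product’s API): whichever generation method sits behind it, callers only ever see two operations, so the practical target is access to the reversal call rather than the math behind it.

```python
from abc import ABC, abstractmethod


class TokenService(ABC):
    """Hypothetical 'black box' interface: the generation method behind it
    (random lookup table, hash, or cipher) is interchangeable."""

    @abstractmethod
    def tokenize(self, high_value: str) -> str:
        """High value data in, de-valued data out."""

    @abstractmethod
    def detokenize(self, token: str) -> str:
        """De-valued data in, high value data out -- the choke point that
        credentials and access controls have to protect."""
```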
It also seems to me that using a lookup function can quickly get out of hand if you are a service provider and provide each merchant with its own collection of token=PAN pairings.
Again, sorry I missed the first round of fires on this.
@Jay and @Mark – your comments are very well stated – I might have to quote you on my next call about this topic 🙂
Mark,
You are correct that we did not discuss – or worse, trivialized – authentication at the token entry point. It’s an important point, and one that is difficult to discuss because it changes for every deployment model. As Rich mentioned, we will cover it in the deployment guidance later in the series.
I am also ignorant of another point you raise … a dictionary attack on input patterns. Can you clarify what you mean? Is this possible with random-number tokens? Hashed & salted tokens?
-Adrian
Mark,
Reasonable enough. We do have another post in the series (near the end) covering deployment guidance and pitfalls to avoid. I think that should take care of your concerns. You’re catching us mid-series here 🙂
Hi Rich,
Thanks for responding – I think we’re actually on the same page, and I certainly welcome your efforts in bringing attention to the different ways Tokenization can take place against a backdrop of best practices.
I guess it’s just that when I see someone claiming they can tokenize by splitting a credit card number into two halves and storing them in two databases in the same site so it’s “no longer cardholder data”, the hairs on the back of my neck stand up. There’s no “science” there, as I’m sure you would agree :)
And no, I didn’t read that or see it recommended on your site.
With regards to the FUD aspect – I respectfully disagree. Tokenization is a powerful technique and I have no qualms with that – but one of the points you’ve missed in your analysis is the critical aspect of authenticating the tokenization entry point (not just the detokenization). If this isn’t carefully managed, then in some cases it is possible to perform a dictionary-based attack, which is made easier if the input fields have predictable input patterns (consider a regionally issued credit card from a smaller bank with one BIN). Revealing live PANs without access to a detokenizing system is a risk that needs to be avoided.
In your recommendation, all that is covered is this: “Send new data to be tokenized and retrieve the token.”
I think this is a shortcoming in your recommendations, given the likely use of Tokenization for data that has known input patterns, which reduces the effort needed in a dictionary or table attack. Whilst these kinds of attacks may be difficult, they are not impossible. It’s also possible to avoid them by design. I suspect this is one of the considerations Visa has looked at too.
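As a purely illustrative back-of-the-envelope sketch of that search space (assumed numbers, not any real issuer):

```python
# Rough illustration of why predictable input patterns matter (assumed numbers).
pan_length = 16
known_bin_digits = 6   # a single known BIN fixes the first six digits
check_digits = 1       # the final Luhn digit is derived, not free
free_digits = pan_length - known_bin_digits - check_digits  # 9 digits left to guess

full_space = 10 ** free_digits   # ~1,000,000,000 candidate PANs
small_issuer_space = 10 ** 6     # far fewer if the issuer only has ~a million accounts

print(f"candidates with BIN known:     {full_space:,}")
print(f"candidates for a small issuer: {small_issuer_space:,}")

# Against an unauthenticated tokenization entry point, each candidate PAN can be
# submitted and the returned token recorded, building a token-to-PAN dictionary
# without ever touching the detokenization service.
```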
Keep lighting the fires 🙂
Regards,
Mark Bower
Mark,
Sorry for the delay in approving your comment. We were *all* out of the office for Black Hat or vacations last week, and while I usually still approve comments remotely, I lost track during the event.
I want to respond to a couple of points…
1. The PCI scope is pretty clear-
Note to Adrian/Editor: my comments were also following Rich (not Jay).
Also – the intro position that this is a follow-up to the PCI Council is not correct – this subject concerns Visa’s recommendations, which are independent and separate from the PCI SSC.
Probably important to make that clear!
Adrian,
Regarding your statement: “Key here is to remember, PCI DSS is allowing systems that substitute credit card data with tokens to be removed from the audit based upon the premise that PAN data is not available”
I’d be interested if you could point to the specific part of PCI DSS today that states that tokens remove systems from the validation requirements. There’s a lot of work going on in this area, but to be clear, nowhere is this stated in PCI DSS.
Thus, merely claiming one is “using Tokenization” may or may not reduce scope, and may or may not increase security: it has to be done right, and only a QSA can make that decision when looking at the specifics of an implementation.
A lot of claims are made about Tokenization security, and many are not based on science. I would also point out that getting Tokenization right is a lot more involved than merely substituting data and managing a Data Vault. Many of the types of attacks on cryptosystems still apply, in slightly different forms, to Tokenization systems – especially if such systems do not pay close attention to the token generation process, to exactly what you tokenize in the first place, and most importantly to how you manage credentials and access to BOTH the tokenizing system and the detokenizing system, and to any images of them that are distributed.
The suggestion that Tokenization is “simple” is also somewhat misleading: if you have to manage, sync, distribute, contain and monitor a growing database of tokens, keys and other sensitive materials (credentials), then this starts to become a significant surface to risk-manage – especially the entry and exit points and their data paths. Also, how do you manage a re-tokenize event if your token systems have somehow been compromised, so that the tokens themselves can be manipulated, injected and abused? Assuring that the tokenizing engine has not been tampered with, and that the sources of entropy used to generate tokens are within specification, are further considerations. One cannot afford to underestimate the ingenuity of today’s sophisticated attackers.
An open-access tokenizer, for example, may permit a successful table-based attack on a poorly implemented system given knowledge of cardholder data patterns. A badly designed hashing token approach which does not pay attention to security may lead to simple compromise without even attacking the token database. It’s refreshing to see VISA’s guidance recognizing that more rigor is necessary. Perhaps these types of attacks are what VISA indicated in their statement:
“Where properly implemented, tokenization may help simplify a merchant’s payment card environment,” said Eduardo Perez, Head of Global Payment System Security, Visa Inc. “However, we know from working with the industry and from forensics investigations, that there are some common implementation pitfalls that have contributed to data compromises. For example, entities have failed to monitor for malfunctions, anomalies and suspicious activity, allowing an intruder to manipulate the tokenization system undetected. As more merchants look at tokenization solutions, these best practices will provide guidance on how to implement those solutions effectively and highlight areas for particular vigilance,”
With regard to Jay’s comments:
“In the real world, we