Tokenization Topic Roundup
Tokenization has been one of our more interesting research projects. Rich and I thoroughly understood tokenization server functions and requirements when we began this project, but we have been surprised by the depth of complexity underlying the different implementations. The variety of variations and different issues that reside ‘under the covers’ really makes each vendor unique. The more we dig, the more interesting tidbits we find. Every time we talk to a vendor we learn something new, and we are reminded how each development team must make design tradeoffs to get their products to market. It’s not that the products are flawed – more that we can see ripples from each vendor’s biggest customers in their choices, and this effect is amplified by how new the tokenization market still is. We have left most of these subtle details out of this series, as they do not help make buying decisions and/or are minutiae specific to PCI. But in a few cases – especially some of Visa’s recommendations, and omissions in the PCI guidelines, these details have generated a considerable amount of correspondence. I wanted to raise some of these discussions here to see if they are interesting and helpful, and whether they warrant inclusion in the white paper. We are an open research company, so I am going to ‘out’ the more interesting and relevant email. Single Use vs. Multi-Use Tokens I think Rich brought this up first, but a dozen others have emailed to ask for more about single use vs. multi-use tokens. A single use token (terrible name, by the way) is created to represent not only a specific sensitive item – a credit card number – but is unique to a single transaction at a specific merchant. Such a token might represent your July 4th purchase of gasoline at Shell. A multi-use token, in contrast, would be used for all your credit card purchases at Shell – or in some models your credit card at every merchant serviced by that payment processor. We have heard varied concerns over this, but several have labeled multi-use tokens “an accident waiting to happen.” Some respondents feel that if the token becomes generic for a merchant-customer relationship, it takes on the value of the credit card – not at the point of sale, but for use in back-office fraud. I suggest that this issue also exists for medical information, and that there will be sufficient data points for accessing or interacting with multi-use tokens to guess the sensitive value it represents. A couple other emails complained that inattention to detail in the token generation process make attacks realistic, and multi-use tokens are a very attractive target. Exploitable weaknesses might include lack of salting, using a known merchant ID as the salt, and poor or missing of initialization vectors (IVs) for encryption-based tokens. As with the rest of security, a good tool can’t compensate for a fundamentally flawed implementation. I am curious what you all think about this. Token Distinguishability In the Visa Best Practices guide for tokenization, they recommend making it possible to distinguish between a token and clear text PAN data. I recognize that during the process of migrating from storing credit card numbers to replacement with tokens, it might be difficult to tell the difference through manual review. But I have trouble finding a compelling customer reason for this recommendation. Ulf Mattsson of Protegrity emailed me a couple times on this topic and said: This requirement is quite logical. Real problems could arise if it were not possible to distinguish between real card data and tokens representing card data. It does however complicate systems that process card data. All systems would need to be modified to correctly identify real data and tokenised data. These systems might also need to properly take different actions depending on whether they are working with real or token data. So, although a logical requirement, also one that could cause real bother if real and token data were routinely mixed in day to day transactions. I would hope that systems would either be built for real data, or token data, and not be required to process both types of data concurrently. If built for real data, the system should flag token data as erroneous; if built for token data, the system should flag real data as erroneous. Regardless, after the original PAN data has been replaced with tokens, is there really a need to distinguish a token from a real number? Is this a pure PCI issue, or will other applications of this technology require similar differentiation? Is the only reason this problem exists because people aren’t properly separating functions that require the token vs. the value? Exhausting the Token Space If a token format is designed to preserve the last four real digits of a credit card number, that only leaves 11-12 digits to differentiate one from another. If the token must also pass a LUHN check – as some customers require – only a relatively small set of numbers (which are not real credit card numbers) remain available – especially if you need a unique token for each transaction. I think Martin McKey or someone from RSA brought up the subject of exhausting the token space, at the RSA conference. This is obviously more of an issue for payment processors than in-house token servers, but there are only so many numbers to go around, and at some point you will run out. Can you age and obsolete tokens? What’s the lifetime of a token? Can the token server reclaim and re-use them? How and when do you return the token to the pool of tokens available for (re-)use? Another related issue is token retention guidelines for merchants. A single use token should be discarded after some particular time, but this has implications on the rest of the token system, and adds an important differentiation from real credit card numbers with (presumably) longer lifetimes. Will merchants be able to disassociate the token used for