Rich and I are kicking off a short series called “Data Encryption 101: A Pragmatic Approach for PCI Compliance”. As the name implies, our goal is to provide actionable advice for PCI compliance as it relates to encrypted data storage. We write a lot about PCI because we get plenty of end-user questions on the subject. Every PCI research project we produce talks specifically about the need to protect credit cards, but we have never before dug into the details of how. This really hit home during the tokenization series – even when you are trying to get rid of credit card numbers you still need to encrypt data in the token server, and the best way to employ encryption varies depending on the user’s environment and application processing needs. It’s not like we can point a merchant to the PCI specification and say “Do that”. There is no practical advice in the Data Security Standard for protecting PAN data, and I think some of the acceptable ‘approaches’ are, honestly, a waste of time and effort.
PCI says you need to render the stored Primary Account Number (PAN), at a minimum, unreadable. That’s clear. The specification points to a number of methods it considers appropriate (hashing, encryption, truncation), emphasizes the need for “strong” cryptography, and raises some operational issues with key storage and disk/database encryption. And that’s where things fall apart – the technology, deployment models, and supporting systems offer hundreds of variations, and many of them are inappropriate for any given situation. These nuggets of information are little more than reference points in a game of “connect the dots”, without an orderly sequence or a clear idea of the picture you are supposedly drawing. Here are some specific ambiguities and misdirections in the PCI standard:
- Hashing: Hashing is not encryption, and not a great way to protect credit cards. Sure, hashed values can be fairly secure and they are allowed by the PCI DSS specification, but they don’t solve a business problem. Why would you hash rather than encrypt? If you need access to credit card data badly enough to store it in the first place, hashing is a non-starter because you cannot get the original data back. If you don’t need the original numbers at all, replace them with encrypted or random numbers. If you are going to the trouble of storing the credit card number you will want encryption – it is reversible, resistant to dictionary attacks, and more secure.
- Strong Cryptography: Have you ever seen a vendor advertise weak cryptography? I didn’t think so. Vendors tout strong crypto, and the PCI specification mentions it for a reason: once upon a time there was an issue with vendors developing “custom” obfuscation techniques that were easily broken, or totally screwing up the implementation of otherwise effective ciphers. This problem is exceptionally rare today. The PCI mention of strong cryptography is simply a red herring. Vendors will happily discuss their sooper-strong crypto and how they provide compliant algorithms, but this is a distraction from the selection process. You should not be spending more than a few minutes worrying about the relative strength of encryption ciphers, or the merits of 128- vs. 256-bit keys. PCI provides a list of approved ciphers, and the commercial vendors have done a good job with their implementations. The details are irrelevant to end users.
- Disk Encryption: The PCI specification mentions disk encryption in a matter-of-fact way that implies it’s an acceptable implementation for concealing stored PAN data. There are several forms of “disk encryption”, just as there are several forms of “database encryption”. Some variants work well for securing media, but offer no meaningful increase in data security for PCI purposes. Encrypted SAN/NAS is one example of disk encryption that is wholly unsuitable, because the OS and applications automatically receive decrypted data whenever they request it. Sure, the data is protected in case someone attempts to cart off your storage array, but that’s not what you need to protect against.
- Key Management: There is a lot of confusion around key management. How do you verify keys are properly stored? What does it mean that decryption keys should not be tied to accounts, especially since keys are commonly embedded within applications? What are the tradeoffs of central key management? These are principal business concerns that get no coverage in the specification, yet they are critical to the selection process for both security and cost containment.
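To make the Hashing point about dictionary attacks concrete: a card number carries little entropy once its first six digits (the issuer BIN) and last four digits are known, and PCI DSS allows both of those to be displayed. The Python sketch below is illustrative only. It assumes the stored value is an unsalted SHA-256 hash of the bare PAN, and the function names are hypothetical. It recovers the PAN by enumerating the unknown middle digits, using the Luhn check digit to skip nine in ten candidates before hashing.

```python
import hashlib
from typing import Optional


def luhn_valid(pan: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(pan)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sha256_hex(pan: str) -> str:
    """Unsalted, unkeyed SHA-256 of the PAN: the weak pattern being attacked."""
    return hashlib.sha256(pan.encode("ascii")).hexdigest()


def recover_pan(target_hash: str, prefix: str, suffix: str,
                length: int = 16) -> Optional[str]:
    """Brute-force the unknown middle digits of a PAN from its unsalted hash.

    Assumes the attacker already knows the leading digits (for example the
    issuer BIN) and the trailing digits (for example the 'last four').
    """
    unknown = length - len(prefix) - len(suffix)
    if unknown < 1:
        raise ValueError("need at least one unknown digit")
    for n in range(10 ** unknown):
        candidate = f"{prefix}{n:0{unknown}d}{suffix}"
        # Cheap Luhn filter first; only 1 in 10 candidates gets hashed.
        if luhn_valid(candidate) and sha256_hex(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a well-known test card number, not a real account.
    stolen_hash = sha256_hex("4111111111111111")
    print(recover_pan(stolen_hash, prefix="411111", suffix="1111"))  # prints 4111111111111111
```

With six unknown digits there are one million candidates, and exactly 100,000 of them pass the Luhn check, so the search is quick even in pure Python. A per-record salt stored next to the hash changes little, because whoever steals the hash usually gets the salt as well; the weakness is the small search space, not precomputed tables.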
Most compliance regulations must balance description against prescription for controls, telling people clearly what they need to do without dictating how it must be done. Standards should describe what needs to be accomplished without being so specific that they forbid effective technologies and methods. The PCI Data Security Standard is not particularly successful at striking this balance, so our goal for this series is to cut through some of these confusing issues, making specific recommendations for what technologies are effective and how you should approach the decision-making process.
Unlike most of our Understanding and Selecting series on security topics, this will be a short series of posts, very focused on meeting PCI’s data storage requirement. In our next post we will create a strategic outline for securing stored payment data and discuss suitable encryption tools that address common customer use cases. We’ll follow up with a discussion of key management and supporting infrastructure considerations, then finally a list of criteria to consider when evaluating and purchasing data encryption solutions.
7 Replies to “Data Encryption for PCI 101: Introduction”
Kurt – Absolutely no disagreement that hashes are ‘cheaper’ than true random numbers. But if we are making this comparison we’re talking about substitution, likely tokenization, and we can pre-compute the random numbers.
–
You’re touching on some scientific discussions that people outside of cryptographic circles are not normally exposed to. How do we know randomness is good enough, for a short-shelf-life token or for a block cipher for that matter? There are a handful of good PRNGs on the market and, provided you supply them with good entropy, they deliver pretty good random numbers. There is no way I can do justice to PRNG and entropy collection issues in a blog comment, but it’s _hard_ to get right. Even without special hardware we can do pretty well, provided an attacker does not compromise either or both systems. Special hardware, even a noise-generating dongle, helps a lot. I like using 3-D graphing software to examine distribution patterns to see if I am getting close or if something has gone horribly wrong. I wonder if any of the vendors would care to comment on how they verify their solutions? Just asking.
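For the tokenization case discussed here, most platforms already expose an operating-system CSPRNG, so random tokens do not have to depend on a home-grown generator. The following is a minimal Python sketch, assuming an in-memory vault and 16-digit numeric tokens; the class and method names are hypothetical, and a real token server would persist the mapping and encrypt the stored PANs. Python’s `secrets` module draws from the OS randomness source (`os.urandom`).

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault mapping random tokens to PANs.

    A real token server would persist this mapping and encrypt the stored PANs.
    """

    def __init__(self, token_length: int = 16) -> None:
        self.token_length = token_length
        self._token_to_pan: dict[str, str] = {}
        self._pan_to_token: dict[str, str] = {}

    def _random_token(self) -> str:
        # secrets uses the operating system's CSPRNG (os.urandom), which the
        # OS seeds from its own entropy sources.
        return "".join(secrets.choice("0123456789")
                       for _ in range(self.token_length))

    def tokenize(self, pan: str) -> str:
        """Return the existing token for this PAN, or issue a new random one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        token = self._random_token()
        # Retry on the unlikely collision with an issued token or the PAN itself.
        while token in self._token_to_pan or token == pan:
            token = self._random_token()
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original PAN; raises KeyError for unknown tokens."""
        return self._token_to_pan[token]
```

Because each token is drawn independently of the card number, nothing about the PAN can be recovered from the token alone; the vault lookup is the only path back. The same generator could fill a pool of tokens ahead of time, which is the pre-computation mentioned above.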
—
One of the points we were trying to make in the tokenization paper was the use of hashes as a database index to encrypted values. I am not saying hashing does not have its place, but I am saying it’s not the preferred option. Still, if we are talking about tokenization, I would prefer random numbers first, a code book second, then hashing as replacement strategies.
—
We are writing the PCI Encryption 101 paper to help IT departments make sensible choices for secure credit card _storage_. I appreciate the comments, but it’s ironic that the dynamic of what’s happening with the comments (steering away from the core of the paper) is so common that it’s actually our motivation for writing this paper …
-Adrian
Perhaps one reason to prefer hashing over random number substitution is that strong hashes are cheaper than strong (true) random numbers.
In the absence of special hardware, computers are incapable of generating truly random numbers*. They can only generate pseudo-random numbers, and it’s not difficult to imagine that most engineers would overestimate the strength of such PRNGs, just as the strength of custom obfuscation techniques has been overestimated in the past.
(* Computers are finite-state deterministic automata, and being deterministic they must sample randomness from the outside world in order to make use of true randomness.)
@Dan – I think you might have misread my post. You and I are in agreement.
For many smaller merchants, the only reason they keep a card number after the transaction has been processed is to research chargebacks. A hash of the card number would work just fine for determining if a particular card number was used.
Hashing can be useful, for example, when you don’t necessarily need anyone to be able to view everyone’s credit card numbers, but it is useful to be able to search for a particular one when it is known.
Given a blank slate (technology and process-wise) you might not need it, but how often do you get to work on a PCI gig that doesn’t involve legacy technology and processes?
It’s still a useful technique to have in your toolbox.
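The chargeback lookup described in this comment can be done with a keyed hash (HMAC) instead of a plain hash, so that someone who copies the table cannot brute-force card numbers without also obtaining the key. This Python sketch assumes a 32-byte secret key held outside the database; `LOOKUP_KEY`, `pan_fingerprint`, and `ChargebackIndex` are hypothetical names, not part of any product. Note the trade-off: the key now has to be managed, which is the very cost hashing was supposed to avoid.

```python
import hashlib
import hmac
import secrets

# Hypothetical key: in practice it would come from a key manager or HSM and
# never be stored alongside the fingerprints it protects.
LOOKUP_KEY = secrets.token_bytes(32)


def pan_fingerprint(pan: str, key: bytes = LOOKUP_KEY) -> str:
    """Keyed HMAC-SHA-256 of the PAN's digits, used only as a search index."""
    normalized = "".join(ch for ch in pan if ch.isdigit())
    return hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


class ChargebackIndex:
    """Maps PAN fingerprints to transaction IDs without storing the PAN."""

    def __init__(self, key: bytes = LOOKUP_KEY) -> None:
        self._key = key
        self._transactions: dict[str, list[str]] = {}

    def record(self, pan: str, transaction_id: str) -> None:
        fp = pan_fingerprint(pan, self._key)
        self._transactions.setdefault(fp, []).append(transaction_id)

    def lookup(self, pan: str) -> list[str]:
        """Return transactions for a card number the caller already knows."""
        return list(self._transactions.get(pan_fingerprint(pan, self._key), []))
```

The merchant can answer “was this card used here?” for a number supplied in a chargeback notice, but the index alone never reveals a card number.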
@Kevin – Not sure where you are going with using hashes for one-click / recurring – in this field hashing generally refers to one-way hashing which would not be useful. (i.e., you can’t send a hash to a processor and you can’t extract the PAN from the hash short of brute force, etc – so you’d still need to retain the actual PAN somewhere).
@Jay – Wow, what a great comment. Yes, I do see some non-security vendors mock up encryption algorithms by picking up a copy of Applied Cryptography and having a go at it. And, as an example, it runs slowly, so they improvise and remove a round. Oops. But there are dozens of other, more subtle, ways to screw things up. If one of the vendors really screws up, competition between the prime players is fierce enough that failures are discovered and ‘illuminated’ rather quickly. That’s why I was comfortable making the emphatic statement. But your point is well taken.
–
I think the bigger issue is the home-grown apps from IT people who dabble in application development and need to save money. What do they do? They hash, or use a ROT cipher. Because it’s easy. And it is secure enough that their peers can’t break it. The self-assessment questionnaire rolls in and they check the box for “encryption”. Yippee!
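To illustrate why a ROT-style scheme is not encryption in the PCI sense, here is a hypothetical digit-rotation “cipher” and a break for it in Python. The scheme is an assumption for illustration, not one taken from any real application: there are only ten possible shifts, and the Luhn check digit usually rules out most of the wrong guesses without any knowledge of the “key”.

```python
def rot_digits(pan: str, shift: int) -> str:
    """Hypothetical home-grown 'cipher': rotate every digit by a fixed amount."""
    return "".join(str((int(ch) + shift) % 10) for ch in pan)


def luhn_valid(pan: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(pan)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def break_rot(ciphertext: str) -> list[str]:
    """Try all ten shifts and keep only the Luhn-valid plaintext candidates."""
    guesses = (rot_digits(ciphertext, -s) for s in range(10))
    return [guess for guess in guesses if luhn_valid(guess)]
```

Ten guesses and a checksum are the entire attack, which is why a box checked for “encryption” on the questionnaire says little about what is actually protecting the data.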
–
Transparent encryption will be, in my opinion, the most contentious issue of this series.
Thanks for the comment,
Adrian
Looking forward to these posts. One point I’d like to offer my perspective on:
“once upon a time there was an issue with vendors developing “custom” obfuscation techniques that were easily broken, or totally screwing up the implementation of otherwise effective ciphers. This problem is exceptionally rare today”
I agree, it’s exceptionally rare for *security* products to attempt their own cryptography. But I’ve had maybe 5 attempts at “custom” crypto come across my desk in the past year alone. All of them have been in products whose primary purpose was not security, but which are now being asked to embed some type of cryptography because of customer demand. When they attempt to add encryption (or other cryptographic/key management functions), they sometimes stray from their areas of strength and end up with some interesting things.
Where these boots are hitting the ground, Kerckhoffs’s principles from the late 1800s and Shannon’s maxim are still being violated routinely.
The other mistake I see is vendors attempting to leverage other technology like TDE in the database and telling customers that their data is fully “encrypted”, but it’s implemented in such a way that 99% of the threats see the T part and aren’t bothered with the E part.
Like I said, looking forward to this series!
I think hashing might still be a viable solution. If an organization does not need access to the credit card number, but still needs to be able to show that a particular known credit card number was used in a transaction then hashing would be an acceptable solution. The key question is will a hashed card number suffice for defense against chargeback claims. If so, then organizations that do not offer one-click shopping or recurring billing may very well be able to avoid the hassles of key management and simply hash the card number.