Download Our Kick-Ass Database Encryption and Tokenization Paper

By Rich

It’s kind of weird, but our first white paper to remain unsponsored is also the one I consider our best yet. Adrian and I have spent nearly two years pulling this one together – with more writes, re-writes, and do-overs than I care to contemplate.

We started with a straight description of encryption options, before figuring out that it’s all too complex, and what people really need is a better way to make sense of the options and figure out which will work best in their environments. So we completely changed our terminology, and came up with an original way to describe and approach the encryption problem – we realized that deciding how to best encrypt a database really comes down to managing credentialed vs. non-credentialed users.

Then, based on talking with users & customers, we noticed that tokenization was being thrown into the mix, so we added it to the “decision tree” and technology description sections. And to help it all make sense, we added a bunch of use cases (including a really weird one based on an actual situation Adrian found himself in).

We are (finally) pretty darn happy with this report, and don’t want to leave it in a drawer until someone decides to sponsor.

On the landing page you can leave comments, or you can just download the paper.

We could definitely use some feedback – we expect to update this material fairly frequently – and feel free to spread the word…


Thanks, Adrian. That almost wraps it up—I do think we have converged to the same wavelength. One last thing about Token Rotation. Here is what I was thinking (dreaming? fantasizing?) about…

1. All systems have the token-SSN for a given real-SSN.
2. Token rotation occurs. Let’s just say every 3 months.
3. Definitely stipulated that the SSN is *NOT* part of the PK.

The way I see getting away from big-bang is this…there is no “magic” to the 3 months, it just seems like a good guideline. So there is also a 3-month grace period for a system to execute its rotation to the new current token set.

1. System A sends an interface file to System B.
2. System A has the latest version of the tokens, but System B has the old version.
3. System B (somehow) knows what version System A has.
4. System B invokes the token repository, which replaces the current values with the prior values, in the interface file from System A, that System B needs to work with.
5. System B can now join the interface file from System A, using common (prior) token values.
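For what it's worth, the steps above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: `RotatingTokenVault`, `rotate`, and `translate` are made-up names, the vault is a plain in-memory dictionary rather than a real token repository, and the "grace period" is modeled by keeping exactly one prior token set.

```python
import secrets


class RotatingTokenVault:
    """Hypothetical in-memory vault holding the current and the prior token set."""

    def __init__(self):
        self.version = 1
        self._token_for = {}  # (version, real_ssn) -> token
        self._ssn_for = {}    # (version, token) -> real_ssn

    def tokenize(self, real_ssn, version=None):
        """Return the token for real_ssn in the given (default: current) version."""
        version = self.version if version is None else version
        key = (version, real_ssn)
        if key not in self._token_for:
            token = "T" + secrets.token_hex(8)  # random placeholder, not derived from the SSN
            self._token_for[key] = token
            self._ssn_for[(version, token)] = real_ssn
        return self._token_for[key]

    def rotate(self):
        """Start a new token set and keep only the immediately prior one."""
        old_version = self.version
        self.version += 1
        # Issue a fresh token for every SSN known in the outgoing version.
        for ver, ssn in list(self._token_for):
            if ver == old_version:
                self.tokenize(ssn, self.version)
        # Anything older than the prior set falls outside the grace period.
        expired = old_version - 1
        self._token_for = {k: v for k, v in self._token_for.items() if k[0] != expired}
        self._ssn_for = {k: v for k, v in self._ssn_for.items() if k[0] != expired}

    def translate(self, token, from_version, to_version):
        """Map a token from one retained version to another (KeyError if unknown)."""
        real_ssn = self._ssn_for[(from_version, token)]
        return self._token_for[(to_version, real_ssn)]


vault = RotatingTokenVault()
old_token = vault.tokenize("000-12-3456")   # obviously fake SSN
vault.rotate()
new_token = vault.tokenize("000-12-3456")   # what System A now sends

# System B is still on version 1, so it asks the vault to swap the values back.
token_for_b = vault.translate(new_token, from_version=2, to_version=1)
```

In this sketch System B never sees the real SSN; it only gets back the prior token it already holds, so its joins keep working until it rotates too.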

As I write it, it definitely seems pretty complicated. Maybe a few years down the road…

Thank you VERY MUCH for all your insight.

By Erik

@Erik - I am pretty sure I completely understand the situation you are describing. It’s common, and what I was thinking of when I made my initial response. Whether it is a database or a flat file, you can perform this substitution using a single token to represent a single SSN. Depending upon how your data is deployed, this could be a real pain in the ass to change. Can you rotate the tokens? Yes. But the alteration must be performed across all systems simultaneously. Tricky, easy to make a mistake, and if the token is a database key, a huge pain in the ass. If you are worried about data leakage, you will need to change your architecture. Sorry, but this is a “Have your cake and eat it too” type of situation.

As I said before, you can have the token server act as the cross reference so that the different systems are not aware that tokens are different from server to server. But this is based upon some assumptions about how the applications that use this data work. This can work provided the joining logic does not run on multiple servers at the same time. If they do run across data sets, you need global substitution, performed synchronously, and that does not _really_ address the issue because you can still cross reference enough information across enough systems to guess the real SSN. Not sure if I am describing that clearly or not. Regardless, the limitation is in how the systems are tightly coupled.

We are all focusing on PCI as this is where the $$$ is being spent today. We see the desire in the health care industry to use tokenization, but that is small compared to PCI. Today, anyway. Long term PII and other types of data will be equally tokenized.


By Adrian Lane

Hey Adrian and James, both your answers make sense. But I feel like either I am missing something, or I am not painting a good picture of what I envision.

Right now, we have all these different systems that contain SSN. The thing is, having SSN is also a *feature*, not just a bug. The systems all want to use it to join data they receive in interfaces from other systems. Obviously, from a PII point of view, though, that is a *bug*.

So I want to replace real-SSN in all those systems with token-SSN. I want it to be the *same* token-SSN everywhere, so that they can use it to join cross-system data, as they do today. So although I understand that each system *could* have a different token, I actually want them all to have the *same* token-SSN for the same real-SSN.

And it seems to me that is where the long-term leakage concern comes in…if token-SSN is so widely used and proliferated, is it really safe to assume that 1, 3, 5 years down the road, that no mapping of t-SSN to r-SSN has leaked out? I am not a security veteran, so maybe I am being excessively paranoid here, but it seems like some form of token rotation would be indicated here.

It’s interesting, many articles (not just Securosis) refer to SSN as another case for tokenization, but mostly they focus on PCI (for understandable reasons). It just seems to me that this inter-system usage, and long-lived nature of the tokenized value, indicates some different implementation details.

Thanks mucho for your patience and thoughtful responses!

By Erik


I think you can associate the raw data (SSN) with expiration criteria so the token is invalid after a reasonable period of time. Combine that with the multiple-target approach Adrian mentioned, and you’d have

Token-Target-Expiration-(Raw Data) tuples in your storage.

Upon request to access the raw data, the target and expiration criteria are tested. If either fails, the raw data cannot be retrieved.
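A minimal Python sketch of that tuple idea might look like the following. The names (`TokenRecord`, `ExpiringTokenStore`, `issue`, `detokenize`) are invented for illustration, the store is just an in-memory dictionary, and the caller passes in the current time so the check is easy to follow.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TokenRecord:
    """One Token-Target-Expiration-(Raw Data) tuple."""
    token: str
    target: str
    expires_at: datetime
    raw_data: str


class ExpiringTokenStore:
    """Hypothetical store; a real one would sit behind access controls."""

    def __init__(self):
        self._records = {}  # token -> TokenRecord

    def issue(self, raw_data, target, ttl, now):
        """Create a random token for one target, valid until now + ttl."""
        token = secrets.token_hex(8)
        self._records[token] = TokenRecord(token, target, now + ttl, raw_data)
        return token

    def detokenize(self, token, target, now):
        """Return the raw data only if the target matches and the token is unexpired."""
        record = self._records.get(token)
        if record is None or record.target != target or now >= record.expires_at:
            return None
        return record.raw_data


store = ExpiringTokenStore()
issued_at = datetime(2024, 1, 1)
token = store.issue("000-12-3456", target="billing", ttl=timedelta(days=90), now=issued_at)

store.detokenize(token, "billing", now=issued_at)                        # allowed
store.detokenize(token, "claims", now=issued_at)                         # wrong target
store.detokenize(token, "billing", now=issued_at + timedelta(days=91))   # expired
```

Returning `None` for every failure is a choice made here so that a caller cannot tell an expired token from a wrong target; a real service might log the difference internally.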

By james lin

@Erik - Propagate? Heck, they use SSN as a primary key in the database. Still. To this day. No reason to do it. But substituting a token when it is used as a primary key is a tricky management task ... but I digress. Long answer but I want to make sure I cover this topic from a couple different perspectives:

First, your basic premise is correct. You could leak the value over time as associated information will give the viewer enough data points to be reasonably certain they know the original value.

Per my comment above, whether it be provided as a service, or maintained in house, you need not provide the same token to every user, service or partner for a single SSN, greatly reducing the chances of leakage. You just need to keep track of the mapping.

A token is nothing more than a random placeholder. Somewhere on the back end the placeholder is mapped to the real SSN/CC#/PII/Fubar data. Provided you don’t do something like use it as a database primary key, there is no reason why you cannot rotate the value if you care to do so. Am I aware of a token provider who does this today? No, not really. Am I aware of data masking vendors who do this today? Yes, there are several. And there are several vendors of Format Preserving Encryption that have customers that use FPE as format preserving tokens. The customers rotate keys on a periodic basis, so the values both change over time _and_ are inconsistent from one machine to the next (because the token is only changed/data re-encrypted when it is accessed).

If one of the token providers does not offer this as an option I would be shocked, at least as kind of a hack if not an intended feature. I’ll ask the vendors on the next round of briefings.

By Adrian Lane


Hi Adrian. Your response totally makes sense, but does not answer the question I *meant* to ask. Let me try to sharpen my focus…my envisioned usage is to replace SSN in all of our many internal systems. I am sure it will not be the first time you have heard that organizations tend to propagate SSN to systems that have no formal (semantic) justification for it, because it is such a convenient universal identifier.

So I would like to replace SSN with a token (call it tSSN) throughout the enterprise. For many/most systems, the token is *all* they would ever need, to link data across internal systems. Systems that sometimes needed the real SSN of course would access that from the token server *when needed*—with all the controls and justifications that go with determining the narrowly limited “when needed” cases.

So back to my question…because SSN is so long-lived, as opposed to a credit card number, my concern is that over time the tokens could become compromised. All it would take, in theory, would be for someone to compile a list of individuals with tSSN, then also compile a list of individuals with the real SSN, and leave them “lying around” (e.g., unsecured on a shared drive, or emailed around). Although compiling a list of records with real SSN should not be allowed, and should become *hard* to do (only specific, approved, controlled processes should be able to access real SSN), it still seems like there are so many ways it could leak out, over time. Most especially during any testing period for a system converting over to tokens.

Thus my question about whether provisions for token rotation are appropriate. Obviously they couldn’t *hurt*, from a pure security point of view—but is the extra effort for regular token-rotation justified, or does it seem like overkill?

(I have another question which relates to how to do this without a killer big-bang data-conversion…I have an idea for doing that, in conjunction with token rotation—if token rotation were a given…but in the interest of clarity and focus, I will save that for a separate thread.)


By Erik

@Erik - Good question. This is true only in the case where the token server issues a single token for a single SSN. This does not need to be the case. Let’s take a credit card example, and assume a payment processor hosts the token server for a large number of merchants. Token servers for Credit Card numbers often issue a single token to a single merchant so they can track within their organization payments from a single customer. However, the payment processor issues different tokens to other merchants associated with the original credit card number.

Taking this example to PII, different health care organizations accessing a single database can be issued different tokens to minimize the ability to reverse engineer the original number. It can still be reverse engineered with enough data points, but makes it drastically harder to do.
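To make the per-merchant idea concrete, here is a rough Python sketch. It is not how any particular token vendor works: `PerConsumerTokenServer`, `tokenize`, and `cross_reference` are hypothetical names, and the mapping lives in memory purely for illustration.

```python
import secrets


class PerConsumerTokenServer:
    """Hypothetical token server that issues a different token to each consumer."""

    def __init__(self):
        self._token_for = {}  # (consumer, real_value) -> token
        self._value_for = {}  # (consumer, token) -> real_value

    def tokenize(self, consumer, real_value):
        """Return a stable token for this consumer; other consumers get other tokens."""
        key = (consumer, real_value)
        if key not in self._token_for:
            token = secrets.token_hex(8)  # random, so it reveals nothing about the value
            self._token_for[key] = token
            self._value_for[(consumer, token)] = real_value
        return self._token_for[key]

    def cross_reference(self, token, from_consumer, to_consumer):
        """Translate one consumer's token into another's without exposing the real value."""
        real_value = self._value_for[(from_consumer, token)]
        return self.tokenize(to_consumer, real_value)


server = PerConsumerTokenServer()
clinic_token = server.tokenize("clinic", "000-12-3456")     # obviously fake SSN
insurer_token = server.tokenize("insurer", "000-12-3456")

# Each organization sees a stable token of its own, and the two do not match.
same_for_clinic = server.tokenize("clinic", "000-12-3456") == clinic_token
mapped = server.cross_reference(clinic_token, "clinic", "insurer")
```

The trade-off shows up in `cross_reference`: two organizations can only join their data by going back through the server, which is exactly the coupling that makes leaked token lists less useful.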

Does that answer the question?


By Adrian Lane


Hi. Thanks for this terrific white paper. I have been doing a ton of research on encryption and tokenization, for purposes of securing SSNs in corporate in-house systems, and this has wrapped up and synthesized just about everything I have read elsewhere. Thanks again.

Here is my first question…I work in healthcare, and our potential use for tokenization is to protect PII and specifically, SSNs. It seems to me the SSN/PII application has some important differences from PCI. Notably, it seems like there would be a need for token-rotation. The reason I say this is that, over time, there would be increasing risk that cross-reference lists matching token SSN to real SSN could be created—by either crackers, or by well-meaning employees who do so for their own purposes. (Marc, one of the commenters on your prior post on tokenization alluded to this, even for CC data.)


By Erik
