Tokenization vs. Encryption: Healthcare Data Security

Securing Personal Health Records (PHR) for healthcare providers is supposed to be the next frontier for many security technologies. Security vendors market solutions for Protected Health Information (PHI) because HIPAA and HITECH impose data security and privacy requirements. Should a healthcare provider fail in their custodial duty to protect patient data, they face penalties – theoretically at least – so they are motivated to secure the data that fuels their business. Tokenization is one of the technologies being discussed to help secure medical information, based on its success with payment card data, but unfortunately protecting PHR is a very different problem.

A few firms have adopted tokenization to substitute for personal information – usually a single token that represents name, address and Social Security number – with the remainder of the data in the clear. But this use case is a narrow one. The remainder of the health-related data used for analysis – age, medical conditions, medications, zip code, health care, insurance, etc. – can be used while the patient (theoretically) remains anonymous. But this approach is not very effective, because the medical, billing and treatment data left in the clear is itself part of what needs to be anonymized. It has not yet been legally tested, but a company may be protected if it substitutes a person’s name, address, and Social Security number, even if the rest of the data should be lost or stolen. Technically the company has transformed the records into an ‘indecipherable’ state, so even if a skilled person can reverse engineer the token back into the original patient identity, the company has reduced the risk of penalties. At least until a court decides what “low probability” means.

So while there is a lot of hype around tokenization for PHI, here’s why the model does not work. It’s a ‘many-to-many’ problem: we have many pieces of data which are bundled in different ways to serve many different audiences. For example, PHI is complex and made up of hundreds of different data points. A person’s medical history is a combination of personal attributes, doctor visits, complaints, medical ailments, outsourced services, doctors and hospitals who have served the patient, etc. It’s an entangled set of personal, financial, and medical data points. And many different groups need access to some or all of it: doctors, hospitals, insurance providers, drug companies, clinics, health maintenance organizations, state and federal governments, and so on. And each audience needs to see a different slice of the data – but must not see PHI they are not authorized for.

The problem is knowing which data to tokenize for any given audience, and maintaining tokens for each use case. If you create tokens for someone’s name and medical condition, while leaving drug information exposed, you have effectively leaked the patient’s medical condition. Billing and insurance can’t get their jobs done without access to the patient’s real name, address, and Social Security number. If you tokenized medical conditions to ensure patient privacy, that would be useless to doctors. And if you issue the same tokens for certain pieces of information (such as name & Social Security number) it’s fairly easy for someone to guess the tokenized values from other patient information – meaning they can reverse engineer the full set of personal information. You need to issue a different token for each and every audience, and in fact for each party which requests patient data.
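To make the per-audience requirement concrete, here is a minimal sketch of a token vault that issues a different random token for each (audience, value) pair. Everything in it is an assumption made for illustration: the `TokenVault` class, its method names, and the audience labels are hypothetical, and a real token server would persist and encrypt its lookup table rather than hold it in memory.

```python
import secrets


class TokenVault:
    """Toy in-memory vault: one random token per (audience, value) pair."""

    def __init__(self):
        self._forward = {}  # (audience, value) -> token
        self._reverse = {}  # token -> (audience, value)

    def tokenize(self, audience, value):
        key = (audience, value)
        if key not in self._forward:
            # Random tokens carry no information about the original value.
            token = secrets.token_hex(8)
            while token in self._reverse:  # regenerate on the rare collision
                token = secrets.token_hex(8)
            self._forward[key] = token
            self._reverse[token] = key
        return self._forward[key]

    def detokenize(self, audience, token):
        owner, value = self._reverse[token]
        if owner != audience:
            raise PermissionError("token was not issued to this audience")
        return value

    def entries(self):
        """Number of token mappings the vault has to track."""
        return len(self._forward)


if __name__ == "__main__":
    vault = TokenVault()
    ssn = "123-45-6789"  # made-up example value
    print(vault.tokenize("insurer", ssn))
    print(vault.tokenize("researcher", ssn))  # different token, same value
    print(vault.entries())
```

Because the vault stores one entry per audience per value, its size grows with the product of audiences and tokenized fields, which is the tracking burden described above.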

Can tokens work in this ‘many-to-many’ model? It’s possible but not recommended. You would need a very sophisticated token tracking system to divide up the data, issuing and tracking different tokens for different audiences. No such system exists today. Furthermore, it simply does not scale across very large databases with dozens of audiences and thousands of patients.

This is an area where encryption is superior to tokenization. In the PHI model, you encrypt different portions of personal health care data under different encryption keys. The advantage is that only those with the requisite keys can see the data. The downside is that this form of encryption also requires advanced application support to manage the different data sets to be viewed or updated by different audiences. It’s still a many-to-many problem, but it is feasible using key management services. The key management must be very scalable to handle even a modest community of users. And since content is distributed across multiple audiences who may contribute new information, record management is particularly complicated. This works better than tokenization, but still does not scale particularly well.
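The key management burden is easier to see with a small model. The sketch below does not perform any encryption; it only models which audiences would hold which keys. The field groups, audience names, and the assumption that each patient record gets its own key per field group are all hypothetical choices made for illustration.

```python
# Hypothetical field groups; each group would be encrypted under its own key.
FIELD_GROUPS = {
    "identity": ["name", "address", "ssn"],
    "clinical": ["conditions", "medications"],
    "billing": ["insurance_id", "charges"],
}

# Hypothetical policy: which group keys each audience is granted.
KEY_GRANTS = {
    "doctor": {"identity", "clinical"},
    "insurer": {"identity", "billing"},
    "researcher": {"clinical"},
}


def readable_fields(audience):
    """Fields an audience could decrypt, given the keys it has been granted."""
    groups = KEY_GRANTS.get(audience, set())
    return sorted(field for group in groups for field in FIELD_GROUPS[group])


def key_grants_to_manage(patients):
    """Total key grants, assuming one key per field group per patient."""
    grants_per_patient = sum(len(groups) for groups in KEY_GRANTS.values())
    return patients * grants_per_patient


if __name__ == "__main__":
    print(readable_fields("researcher"))
    print(key_grants_to_manage(1000))
```

Even with three audiences and three field groups, the number of grants to issue, rotate, and revoke grows linearly with the patient count, and every new audience or field group multiplies it further.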

If you need to access the original data at some point in the future, encryption is your only choice. If you don’t need to know who the patient is, now or in the future, the practical alternative is masking. Masking technologies scramble data, either working on an entire database or on a subset of the data. Masking can scramble individual columns in different ways so that the masked value looks like the original – retaining its format and data type just like a token – but is no longer sensitive data. Masking also is effective for maintaining aggregate value across an entire database, meaning the sum and average values within the data set can be preserved while changing all the individual data elements. Masking can be done in such a way that it’s extremely difficult to reverse engineer back to the original values. In some cases, masking and encryption provide a powerful combination for distribution and sharing of medical information.
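One simple masking technique that preserves aggregates is shuffling a column across records. The sketch below is only an illustration under stated assumptions: the `shuffle_mask` function, the record layout, and the sample values are hypothetical, and commercial masking products use more sophisticated methods.

```python
import random


def shuffle_mask(records, column, seed=None):
    """Return copies of the records with one column's values shuffled.

    The set of values is unchanged, so the column's sum and average are
    preserved, but each value is detached from the record it came from.
    """
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    return [{**record, column: value} for record, value in zip(records, values)]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    rows = [
        {"patient": "A", "charge": 120},
        {"patient": "B", "charge": 340},
        {"patient": "C", "charge": 95},
        {"patient": "D", "charge": 410},
    ]
    masked = shuffle_mask(rows, "charge")
    print(sum(r["charge"] for r in rows), sum(r["charge"] for r in masked))
```

Shuffling keeps format, data type, and totals intact, but on a small data set some rows may keep their original value by chance, so it is weaker there than on a large one.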

Tokenization is an important and useful security tool with cost and security advantages in select use cases – in some cases tokens are recommended because they work better than encrypted data. The goal is to reduce data exposure by reducing the number of places sensitive data is stored – using encryption, tokenization, masking, or something else. But every token server still relies on encryption and key management to safeguard stored data. End users may only see tokens, but somewhere in the tokenization system you can always find encryption services supporting it. We recommend tokenization in various scenarios to reduce costs, simplify IT operations, and reduce risk.