In my last post I discussed how tokenization is being deployed to solve payment data security issues. Today it is a niche technology, used almost exclusively to solve a single problem: protecting credit card data. As a technology, data tokenization has yet to cross the chasm, but our research indicates it is beginning to be used to protect personal information as well. In this post I will talk about using tokens to protect Personally Identifiable Information (PII) – Social Security numbers, driver’s license numbers, and other sensitive personal information. Data tokenization has value beyond simple credit card substitution – protecting PII is its next frontier.
The problem is that thousands of major corporations built systems around Social Security numbers, driver’s license numbers, or other information that represents a person’s identity. This data was engineered into the foundational layers of myriad applications and business processes that organizations still rely on. The identifier (record number) literally tied all their systems together. For example, you could not open a new telephone account or get an insurance policy without supplying a Social Security number – not because the company needed the number legally or technically, but because its IT systems required the number to function. SSNs provided secondary benefits as well: a basis for business analysis, a common index for third-party data services, and useful information for fraud detection. But the hard requirement to provide an SSN (or driver’s license number, etc.) existed because application infrastructures were designed around these standard identifiers.
PII was intrinsically woven into database and application functions, making it very hard to remove or replace without hurting stability and performance. Every access to customer information – billing, order status, dispute resolution, and customer service – required an SSN. Even public web portals and phone systems used SSNs to identify customers. Unfortunately, this exposed sensitive information to employees with no valid reason to access customer SSNs, and contributed to data leakage and fraud. Many state and local government organizations still use SSNs this way, despite the risks.
Organizations have implemented a form of tokenization – albeit unwittingly – by substituting arbitrary customer ID numbers for SSNs and driver’s license numbers. The Social Security numbers are then moved into secure databases and exposed only to select employees under controlled circumstances. These ad hoc, home-grown tokenization implementations are no less tokenization than the systems offered by payment processors. A handful of organizations have taken this one step further, using third-party solutions to manage token creation, substitution, data security, and management. But there are still thousands of organizations with sensitive data sitting in files and databases, used to identify (index) clients and customers. PII remains a huge potential market for off-the-shelf tokenization products.
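To make this concrete, here is a minimal sketch of what such a home-grown token vault might look like. This is illustrative Python only – the TokenVault class, its method names, and the in-memory dictionaries standing in for a hardened database are assumptions for the example, not any particular product:

```python
import secrets

class TokenVault:
    """Toy token vault: maps real SSNs to surrogate customer IDs.

    In practice the mapping would live in a hardened, access-controlled
    database; plain dicts stand in for that store here.
    """

    def __init__(self):
        self._token_to_ssn = {}
        self._ssn_to_token = {}

    def tokenize(self, ssn: str) -> str:
        # Return the existing token so each SSN maps to exactly one ID.
        if ssn in self._ssn_to_token:
            return self._ssn_to_token[ssn]
        # Random 9-digit surrogate with no mathematical relation to the SSN.
        token = f"{secrets.randbelow(10**9):09d}"
        while token in self._token_to_ssn:
            token = f"{secrets.randbelow(10**9):09d}"
        self._token_to_ssn[token] = ssn
        self._ssn_to_token[ssn] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only privileged callers should ever reach this path.
        return self._token_to_ssn[token]

vault = TokenVault()
customer_id = vault.tokenize("078-05-1120")  # the famous "advertising" SSN
print(customer_id)                           # e.g. '409716238'
```

Every ordinary system stores and passes around the surrogate ID; only the vault, under tight access control, can map it back to the real SSN.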
While this is conceptually simple – and plainly a good idea for security – not every company uses tokenization for PII, whether commercial or ad hoc, because most lack sufficient incentive. Companies have little motivation to protect your personal information: if it’s lost or stolen, you are the one who must clean up the mess. Many state regulations require companies to protect PII and alert customers in the event of a data breach, but these laws are not adequately enforced and provide too many loopholes, so very few companies ever face fines. For example, most laws are written to excuse breaches if data encryption was in use. So if a company encrypts network communications, or encrypts data archives, or encrypts its database, it may be exempt from disclosure. The practical upshot is that companies encrypt data in one context – and escape legal penalties such as fines – while leaving it exposed in other contexts. The fact that so many data breaches continue to expose customer data clearly demonstrates the lack of effective data security.
Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data, or data residing on file systems, without modifying business processes. You still need to install encryption and key management services, and you must understand that this only protects data from access that circumvents applications. You can add application-layer encryption to protect data in use, but that requires changing applications and databases, paying the integration cost, and accepting the performance impact.
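For comparison, application-layer field encryption might look something like the sketch below. It uses the third-party cryptography package purely as an example; the library choice and the simplified key handling are assumptions – in a real deployment the key would come from a key management service:

```python
# Assumption: the `cryptography` package (pip install cryptography) is
# available; it is one common choice, not the only one.
from cryptography.fernet import Fernet

# In production the key comes from a key management service, never
# generated and held alongside the data like this.
key = Fernet.generate_key()
cipher = Fernet(key)

# The application encrypts the SSN before writing it to the database...
stored_value = cipher.encrypt(b"078-05-1120")

# ...and must decrypt it on every read that needs the real value.
ssn = cipher.decrypt(stored_value)
```

Note that every application touching the field needs the decryption path – which is exactly the integration cost and performance impact described above.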
In cases like PII – which most application functions do not actually need – tokenizing personal information reduces the risk of loss or theft without impacting operations. Risk is reduced because you can’t steal what’s not there. This makes tokenization superior to encryption for security: if encryption is deployed insecurely, if administrative accounts are hijacked, or if encryption keys are compromised, the data is exposed. Tokenization also simplifies operations – PII is stored in a single database, and you don’t need to deploy key management or encryption systems across the enterprise. Setup and maintenance are both reduced, as is the number of servers which require extensive security. Tokenization of PII is often the best strategy: cheaper, faster, and more secure than the alternatives.
One Reply to “Tokenization vs. Encryption: Personal Information Security”
Thanks for the insightful posting, Adrian.
For very large organizations, there’s another potential benefit of tokenizing personal identification data such as SSNs. If a robust tokenization product is implemented and used across multiple internal systems, data shared between those systems no longer contains the real sensitive data element. That can reduce the number of security controls needed for those inter-system communication processes.
And if we have token values in the production systems, practices that extract or back up production data to populate non-production systems would, de facto, no longer carry real SSN data either. We can then create dedicated secondary tokenization services for the non-production systems, where we pair the token values with non-real SSN values. This allows testing of the tokenize/de-tokenize processes in non-production without ever exposing real SSN values.
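As a rough sketch of how those non-production pairs might be generated (the fake_ssn helper is hypothetical, not any vendor’s API; it leans on the fact that the SSA never issues area numbers 900–999, so the synthetic values cannot belong to a real person):

```python
import secrets

def fake_ssn() -> str:
    # SSA never issues area numbers 900-999, so synthetic values in this
    # range cannot collide with a real person's SSN.
    area = 900 + secrets.randbelow(100)
    group = 1 + secrets.randbelow(99)      # skip the invalid "00" group
    serial = 1 + secrets.randbelow(9999)   # skip the invalid "0000" serial
    return f"{area:03d}-{group:02d}-{serial:04d}"

def build_test_vault(production_tokens):
    # Pair each production token with a non-real SSN so non-production
    # systems can exercise the tokenize/de-tokenize path end to end.
    return {token: fake_ssn() for token in production_tokens}
```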
Of course the production tokenization service becomes a very high-value target. A compromise of that system would expose all of the sensitive SSN-to-token value pairs. But it’s far easier to do a proper job of locking down the new tokenization service – with encryption, key management, administrator access controls, firewalls, and other tools – than to do the same for hundreds of legacy systems.
One key concern with this sort of system design is limiting the impact that a tokenize/de-tokenize call has on normal processing. As you stated, real SSNs are required for many mandatory reporting processes – reports that must be filed with many state and federal government agencies and regulatory bodies. So the de-tokenize process must be fast.
And the tokenize process must also be considered. Like it or not, interactions with external agencies and individuals will continue to operate on SSN values, and data feeds from external sources will need to be processed to convert incoming data from real SSNs to tokens.
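That boundary conversion could be as simple as the following sketch – assuming a CSV feed, an “ssn” column name, and a vault object with a tokenize method like the one sketched in the post, all of which are illustrative assumptions:

```python
import csv

def tokenize_feed(vault, in_path, out_path, ssn_field="ssn"):
    # Swap real SSNs for tokens at the boundary, before the inbound
    # feed reaches any internal system.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row[ssn_field] = vault.tokenize(row[ssn_field])
            writer.writerow(row)
```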
Determining the proper system designs to handle these as-yet-unquantified performance demands is critical. Care must be taken to choose a design that can adapt in scale and performance, and disaster recovery requirements must not be neglected either.
It’s an interesting challenge. From a business-to-business perspective, I expect to see an increase in the number of customer requirements for solutions such as tokenization to protect SSN values in systems.