I was reading an article by Rsnake this morning on the problems of using a username as a primary key, and it reminded me of something I’ve been meaning to write about for a while.
As a former database designer and current security geek I’m often stunned by how often designers/developers choose really bad primary keys for their databases. Even back in my developer days, when security for me meant taking down drunks at concerts, I knew better than to use something like a username, credit card number, or Social Security Number as a primary key. To be honest, it had nothing to do with security and everything to do with good database design.
Back when I was starting my IT career I was fortunate to work closely with the professor at the University of Colorado who taught database design. One of his cardinal rules was to, wherever possible, use system generated keys as the primary key. Randomly generated keys, as opposed to sequential keys that could “leak” information. Our designs should also strive to conform to at least Fourth Normal Form, although full normalization wasn’t always possible for practical reasons.
We never used Social Security Numbers as primary keys because they are neither always unique (there are mistakes in the system) nor available for all potential users (foreign students were assigned fakes). Credit card numbers are bad because they are not a unique identifier for an individual- I have multiple credit card numbers, any of which can identify me, all of which are temporal (change over time). I have no idea why retailers so often use credit card numbers as all or part of a primary key, since I may use multiple cards even on the same day.
Usernames more closely conform to a viable primary key from a pure, non-security design perspective, but I never liked them for some of the reasons RSnake cites, and I always feel like usernames are temporal, even when they aren’t, and while I haven’t tested it I think the performance of numeric keys is probably higher.
Thus my first rule in picking a primary key, from both a pure design and security perspective, is to use system generated random keys (preferably not sequential auto increment). For other fields you want unique, like username, SSN, or credit card number, just set a unique index on the field. Worst case, you can even use this to correlate across unrelated systems assuming the rest of the attributes line up (we used to do this using the SSN, before we all learned using them at all is bad).
One criticism of Rsnake’s examples of account hijacking is that even with a pure primary key, if you hard delete a username someone can still impersonate the account, depending on the design of the system.
<
p style=”text-align:right;font-size:10px;”>Technorati Tags: Database Security, Information-centric security, Primary Key
Reader interactions
5 Replies to “Database Security Rule: Use System Generated Primary Keys”
Good observation on auto-incrementing and lockouts. I am actually surprised that there haven’t been more “denial of service”-style attacks this way. Of course more web services are using captchas, especially after you hit 3 fails, so that is helping…
The problem with autoincrementing primary keys:
Because auto primary keys are usually integers a simple for loop from 0 to n will exhaust all possible combinations in a fairly short amount of time.
Now imagine a webservice methods called login that takes a userid and password that has a max login attempts of three.
Wrap a loop with three tries aroung the inner loop and you can lock out all accounts on the system.
Use GUID’s for primary keys passed over webservices etc as its significantly harder to automate
I just don’‘t like sequential keys because they can leak too much information. For example, low numbers can often indicate a superuser or administrator account. There are also some timing based attacks with sequential keys, although they do take much more sophistication to exploit.
I’‘m with you on using generated keys but I usually use sequences.
I’‘m not sure that using randomly generated keys is the answer. Performance wise, it can cause all sorts of interesting problems in index range scans when paging through result sets. Generating a random primary key can also be a bit problematic in large tables as checking for key existence can take time.
Ah yes, back when storage was expensive and DBA time was cheap, and we actually cared about Boyce-Codd and Fourth Normal forms enough to ask about them in interviews. It made sense to make the primary key meaningful as to reduce storage overhead. The good old days …
I am actually curious why your aversion to sequence numbers as a primary key? I think you can ‘‘leak’’ creation date/time, but I am not sure what else.