If you’re building a website that allows the pubic to log-in, you need to store your passwords so you can check your users are who they say they are when logging-in. This is my introduction to the current state of the art for storing your users’ passwords in your database.
Make It Someone Else’s Problem
I’ll say this right up front, the best way is to get someone else to do it. Use an outsourced service or install a component that deals with the whole thing. You’ll have passed responsibility to someone who’s very speciality is already knowing everything I’ve written here, as well as all the nuances I’ve skipped over.
But that’s not always acceptable. Sometimes you need to build your own system.
Doing it wrong – Store the password
We’ll start with various wrong ways to do it and build up to the right way.
The first wrong way is to store the password in the clear in your database. You’ll have a table of users, add a string field called “password” and store it there. When a user comes along to log in, you compare the supplied password with the actual password and if they match you let the user in.
One problem is that your user database might leak and all your users password are right there. Do you trust all your insiders? Are you quite sure that all components of your system are leakproof? There’s little you can do to stop a trusted insider from having a peak at just one user’s record. What are you going to do, have no trusted insiders?
Better but still wrong – Hash the password first
If the problem is that someone knows what everyone’s password is, the solution is for no-one to know what anyone’s password is. As luck would have it, there’s a branch of cryptography that’s perfect for this – the hash function. Instead of storing the password in clear, store a hash instead.
A hash function takes a string of characters and mixes all them up in a repeatable way. Unlike encryption, there isn’t a key and you can’t get the original text back. For example, the SHA1 hash of “rutabaga” is “C8A52CE9 1ED32187 38D43809 B31856AB 619E0ABE”. This will be the same today, tomorrow and forever.
The first time a user registers with your service, they supply you the password they want to use, but before writing it to the database, you run a hash over the supplied password and store the result of the instead. Later, the same user comes back and types in their password again. You run the hash over the supplied password and compare it against the hash in your database. If they match, let the user in.
The other useful property of a hash function is that it is irreversible. There’s no secret key to go from “C8A52E9…” back into “rutabaga”. If all you have is the hash, the original text is lost. Now, if an attacker gets a copy of the user database, they have a problem. All they have is the result of the hash and there’s no way to get the original password back from that – and that’s what you need to log in.
Except you can reverse a hash.
The Bad Guys: “Tell us the original password that produced this hash result!”
Hash Functions: “We’re designed for that to be impossible.”
The Bad Guys: “Really?”
Hash Functions: “Yes. You’d literally have to try every single possible input and store the result in a lookup table.”
The Bad Guys: “Okay, we’ll do that.”
Hash Function: “Wait, what?”
Hash functions are designed to be one-way. There’s no hint of what the original text could have been because none of that information survives. But there’s a way around that detail.
A problem with humans is that we are predictable in how we think of passwords. We like words from the dictionary, patterns etc. From this knowledge, we can make a list of all these likely potential passwords. Then for each likely password, find the hash of each one, storing the original text against each hash. This might sound like a lot of computation but we only need to do it once.
Finally, the clever bit, sort the list by the hash.
There we have it. The Bumper Book of Password Hashes. Each hash, one per line, with the text that went into produce that hash next to it.
The Bumper Book of Password Hashes - SHA1 Edition C8A52CE9062E654D02D08B9AE56BE5A16A3C7663 =)Ve06Va C8A52CE90DCA962E41A8E164EB649207206E553B h30/4h50 C8A52CE91C77FB87893CA977353A65F8C406AA69 Ds?F8Jjj C8A52CE91DBEF9713D61537840CC58F0D8D4B3E9 HPpxLGT/mevs C8A52CE9295EA07D4AD52A1DF84D442E3E106A37 7-KDA-)0:0aF C8A52CE92A077F5A2944D4E20A2953FDF56570F0 oG6Ksdc C8A52CE9351C7D852B09CAE66B1B0D9DB204838A =C0V/5et9s C8A52CE93B4B6AE01A8985C2FE96371967A40DCB -j0880YA3b C8A52CE9426DC99277D114CAB37971B65D18F8B9 a^cY=e3%u67 C8A52CE9451C0944A561CC5E76D0D62C61083A56 4UJKQLwhuQ C8A52CE950C8A276987097569EB248D2E4D68EB9 hTu3sbX3g C8A52CE958C75B126B6D9772D1C430DF6B5CC785 V7Qej5q8Ly3r C8A52CE962606E0ED8617AD9A6C8C9C84FF202FE rUEOy6ZW C8A52CE968E0BEC0CEF5E1D93AF7EFD1987C60CF =hL)F#sDN08r C8A52CE97214314C4DE54168B6D5F7CCEDF35D3E NXd241ts C8A52CE9733B9EED59E95F3A0BCA6594B5BB0841 N0KjP2n7j C8A52CE98E9DA6676C5B0009312A9EF289305236 ue52C^Jc0aA)2N# C8A52CE992F14E7020DC40896AB929D838A118F3 1s/2J00HT)Xt#t5 C8A52CE9A4BF120810B7D9B24F77031184CCF01C 06PeP)r8cr C8A52CE9A9D6D36FA9A1BC2D376A91B221DE83B2 c8DL?Tbr)23:t* C8A52CE9B37385A2CC1894A083E87ACD2EDCE026 z0VoZ/Sw1orL C8A52CE9CCC4088AFEAD6534B827FDB657706EA9 nnNeYZLxeg C8A52CE9CD75EB936FA3B0EEED25B1322C913996 k0StwVCnwA C8A52CE9DE11A6B3739D726FE29B067DC1DD470C KL%du)YF C8A52CE9E99069CC192876B00788632AE75965E6 mdVg/C2Y C8A52CE9ED8DE406CD60F95D5B1B64CD3C3BF1AC DnY73:8e C8A52CE9F8CE32484D73B7B179048E3FB91061EB 4#cN6bYVV)b#*^9 C8A52CE9FE3FDC64F6F088D2DC41EB85CF97D465 Y866(-)5 Page 3,366,268,137
Suppose you’ve got someone’s password hash which starts with C8A52CE9 and you want to know what password produces that particular hash. Grab the book and flick through until you get to the pages with all the hashes that begin C8A52CE9. If it was included in the original set, the original password will be listed right there.
(This technique is better known as a “Rainbow Table”. My name is better.)
A popular service for looking up password hashes is known as Google. You might have heard of it.
Good but not quite done – Salt
A way to make the Bumper Book of Password Hashes obsolete is to add “salt” to the hash. Instead of hashing only the password, also add some random bytes into the mix. Instead of hashing the password, hash the combination of the password and the salt too.
The book might list the hash of “rutabaga”, but it isn’t going to list the hash of “(lots of randomness)rutabaga”. That simple act of adding some random bytes means the book is now useless.
If an attacker manages to find a leaked copy of the user database, they will be able to start guessing and checking on their own. If you make sure each user has different salt bytes, then any computational effort the attacker does make is only good for one single user. Even if an attacker found the password of one user, there’s nothing to bring forward to attack the next user. Even if both users use the same password, the attacker has to start again.
Hopefully that extra effort is long enough for the service admins to realise the leak has happened and start replacing passwords.
How long? Let’s make their job even harder.
The state of the art – Password Stretching
Through the long journey, we’ve arrived at the current state of the art.
These are open standards and your web platform almost certainly has a library that implements most of them. This article isn’t going to recommend one over another. We’ll just say they’re all pretty darn good except for the ones which are not. (Okay, start by searching for “PBKDF2” and see where it leads you.)
The hash functions we’ve encountered so far are fast. They’re designed that way. For passwords, what we really want is something slow. You think being deliberately slow is a bad thing, but let’s follow this rabbit down the hole.
Instead of a nice fast hash like SHA1, we’re going to use SHA1000. It’s just like SHA1 in terms of being one-way and such. The difference is that it is so badly designed it takes a thousand times more processing time to finish.
So why on earth would we use such a badly designed hash? The answer is that not only do you have to spend the processing time running it, so does your attacker. They were already looking at spending a large amount of processing time going through every word in the dictionary looking for a password. By using SHA1000 instead, you’ve just multiplied their workload by a thousand!
These password stretching algorithms aren’t actually badly designed hashes, but they are configurable for how difficult you want them to be. PBKDF2 can be set to have a number of rounds. One round is the same workload as SHA1. Three hundred thousand rounds is a lot more.
Imagine you’re storing your passwords with PBKDF2 set to 300,000 rounds and each user has a unique salt. When a user logs in, you look up that user’s salt and start running the PBKDF2 code for 300,000 loops with the supplied password. If the end result matches the expected result, you allow the user in.
For an attacker with a leaked copy of each user’s salt and expected hash, they can start guessing and checking over and over. Try each word in the dictionary and see if the result matches the expected result for each one. The attacker is faced with a ridiculous amount of computer time to go through all of that.
Now we’ve caught up, let’s head over to part two.
Picture Credits:
𸠓Password” by mk_is_here.
𸠓Equal in stature” by Kevin Dooley.
𸠓IMG_3310” by oFace Killah.
𸠓Entropy” by Robert Nunnally.
𸠓A rainbow in salty air” by torne.