Why do we repeatedly hash passwords in a loop?

If you’re building a website that allows the pubic to log-in, you need to store your passwords so you can check your users are who they say they are when logging-in. This is my introduction to the current state of the art for storing your users’ passwords in your database.

Make It Someone Else’s Problem

I’ll say this right up front, the best way is to get someone else to do it. Use an outsourced service or install a component that deals with the whole thing. You’ll have passed responsibility to someone who’s very speciality is already knowing everything I’ve written here, as well as all the nuances I’ve skipped over.

But that’s not always acceptable. Sometimes you need to build your own system.

“Knock three times on the ceiling if you want me.
Twice on the pipe, if the answer is no.”

Doing it wrong – Store the password

We’ll start with various wrong ways to do it and build up to the right way.

The first wrong way is to store the password in the clear in your database. You’ll have a table of users, add a string field called “password” and store it there. When a user comes along to log in, you compare the supplied password with the actual password and if they match you let the user in.

One problem is that your user database might leak and all your users password are right there. Do you trust all your insiders? Are you quite sure that all components of your system are leakproof? There’s little you can do to stop a trusted insider from having a peak at just one user’s record. What are you going to do, have no trusted insiders?

“Enemy lasagne. Robust below wax. Semiautomatic aqua. Accompany slacks. White coffee gymnastic. Motorcycle unibrow.
Existential plastic. Extra nightly cow.”

Better but still wrong – Hash the password first

If the problem is that someone knows what everyone’s password is, the solution is for no-one to know what anyone’s password is. As luck would have it, there’s a branch of cryptography that’s perfect for this – the hash function. Instead of storing the password in clear, store a hash instead.

A hash function takes a string of characters and mixes all them up in a repeatable way. Unlike encryption, there isn’t a key and you can’t get the original text back. For example, the SHA1 hash of “rutabaga” is “C8A52CE9 1ED32187 38D43809 B31856AB 619E0ABE”. This will be the same today, tomorrow and forever.

The first time a user registers with your service, they supply you the password they want to use, but before writing it to the database, you run a hash over the supplied password and store the result of the instead. Later, the same user comes back and types in their password again. You run the hash over the supplied password and compare it against the hash in your database. If they match, let the user in.

The other useful property of a hash function is that it is irreversible. There’s no secret key to go from “C8A52E9…” back into “rutabaga”. If all you have is the hash, the original text is lost. Now, if an attacker gets a copy of the user database, they have a problem. All they have is the result of the hash and there’s no way to get the original password back from that – and that’s what you need to log in.

“Music’s on, I’m waking up, we fight the fire, then we burn it up,
and it’s over now, we got the love, there’s no sleeping now.”

Except you can reverse a hash.

The Bad Guys: “Tell us the original password that produced this hash result!”
Hash Functions: “We’re designed for that to be impossible.”
The Bad Guys: “Really?”
Hash Functions: “Yes. You’d literally have to try every single possible input and store the result in a lookup table.”
The Bad Guys: “Okay, we’ll do that.”
Hash Function: “Wait, what?”

Hash functions are designed to be one-way. There’s no hint of what the original text could have been because none of that information survives. But there’s a way around that detail.

A problem with humans is that we are predictable in how we think of passwords. We like words from the dictionary, patterns etc. From this knowledge, we can make a list of all these likely potential passwords. Then for each likely password, find the hash of each one, storing the original text against each hash. This might sound like a lot of computation but we only need to do it once.

Finally, the clever bit, sort the list by the hash.

There we have it. The Bumper Book of Password Hashes. Each hash, one per line, with the text that went into produce that hash next to it.

     The Bumper Book of Password Hashes - SHA1 Edition
C8A52CE9062E654D02D08B9AE56BE5A16A3C7663 =)Ve06Va
C8A52CE90DCA962E41A8E164EB649207206E553B h30/4h50
C8A52CE91C77FB87893CA977353A65F8C406AA69 Ds?F8Jjj
C8A52CE91DBEF9713D61537840CC58F0D8D4B3E9 HPpxLGT/mevs
C8A52CE9295EA07D4AD52A1DF84D442E3E106A37 7-KDA-)0:0aF
C8A52CE92A077F5A2944D4E20A2953FDF56570F0 oG6Ksdc
C8A52CE9351C7D852B09CAE66B1B0D9DB204838A =C0V/5et9s
C8A52CE93B4B6AE01A8985C2FE96371967A40DCB -j0880YA3b
C8A52CE9426DC99277D114CAB37971B65D18F8B9 a^cY=e3%u67
C8A52CE9451C0944A561CC5E76D0D62C61083A56 4UJKQLwhuQ
C8A52CE950C8A276987097569EB248D2E4D68EB9 hTu3sbX3g
C8A52CE958C75B126B6D9772D1C430DF6B5CC785 V7Qej5q8Ly3r
C8A52CE962606E0ED8617AD9A6C8C9C84FF202FE rUEOy6ZW
C8A52CE968E0BEC0CEF5E1D93AF7EFD1987C60CF =hL)F#sDN08r
C8A52CE97214314C4DE54168B6D5F7CCEDF35D3E NXd241ts
C8A52CE9733B9EED59E95F3A0BCA6594B5BB0841 N0KjP2n7j
C8A52CE98E9DA6676C5B0009312A9EF289305236 ue52C^Jc0aA)2N#
C8A52CE992F14E7020DC40896AB929D838A118F3 1s/2J00HT)Xt#t5
C8A52CE9A4BF120810B7D9B24F77031184CCF01C 06PeP)r8cr
C8A52CE9A9D6D36FA9A1BC2D376A91B221DE83B2 c8DL?Tbr)23:t*
C8A52CE9B37385A2CC1894A083E87ACD2EDCE026 z0VoZ/Sw1orL
C8A52CE9CCC4088AFEAD6534B827FDB657706EA9 nnNeYZLxeg
C8A52CE9CD75EB936FA3B0EEED25B1322C913996 k0StwVCnwA
C8A52CE9DE11A6B3739D726FE29B067DC1DD470C KL%du)YF
C8A52CE9E99069CC192876B00788632AE75965E6 mdVg/C2Y
C8A52CE9ED8DE406CD60F95D5B1B64CD3C3BF1AC DnY73:8e
C8A52CE9F8CE32484D73B7B179048E3FB91061EB 4#cN6bYVV)b#*^9
C8A52CE9FE3FDC64F6F088D2DC41EB85CF97D465 Y866(-)5
                                       Page 3,366,268,137

Suppose you’ve got someone’s password hash which starts with C8A52CE9 and you want to know what password produces that particular hash. Grab the book and flick through until you get to the pages with all the hashes that begin C8A52CE9. If it was included in the original set, the original password will be listed right there.

(This technique is better known as a “Rainbow Table”. My name is better.)

A popular service for looking up password hashes is known as Google. You might have heard of it.

Google search for a hash result, returning "rutabaga" as the obvious source of the hash.
“Full moon in the city and the night was young. I was hungry for love, I was hungry for fun. I was hunting you down and I was the bait. When I saw you there I didn’t mean to hesitate.”

Good but not quite done – Salt

A way to make the Bumper Book of Password Hashes obsolete is to add “salt” to the hash. Instead of hashing only the password, also add some random bytes into the mix. Instead of hashing the password, hash the combination of the password and the salt too.

The book might list the hash of “rutabaga”, but it isn’t going to list the hash of “(lots of randomness)rutabaga”. That simple act of adding some random bytes means the book is now useless.

If an attacker manages to find a leaked copy of the user database, they will be able to start guessing and checking on their own. If you make sure each user has different salt bytes, then any computational effort the attacker does make is only good for one single user. Even if an attacker found the password of one user, there’s nothing to bring forward to attack the next user. Even if both users use the same password, the attacker has to start again.

Hopefully that extra effort is long enough for the service admins to realise the leak has happened and start replacing passwords.

How long? Let’s make their job even harder.

“Matthew and Son, the work’s never done, there’s always something new. The files in your head, you take them to bed, you’re never ever through. And they’ve been working all day, all day, all day!”

The state of the art – Password Stretching

Through the long journey, we’ve arrived at the current state of the art.

These are open standards and your web platform almost certainly has a library that implements most of them. This article isn’t going to recommend one over another. We’ll just say they’re all pretty darn good except for the ones which are not. (Okay, start by searching for “PBKDF2” and see where it leads you.)

The hash functions we’ve encountered so far are fast. They’re designed that way. For passwords, what we really want is something slow. You think being deliberately slow is a bad thing, but let’s follow this rabbit down the hole.

Instead of a nice fast hash like SHA1, we’re going to use SHA1000. It’s just like SHA1 in terms of being one-way and such. The difference is that it is so badly designed it takes a thousand times more processing time to finish.

So why on earth would we use such a badly designed hash? The answer is that not only do you have to spend the processing time running it, so does your attacker. They were already looking at spending a large amount of processing time going through every word in the dictionary looking for a password. By using SHA1000 instead, you’ve just multiplied their workload by a thousand!

These password stretching algorithms aren’t actually badly designed hashes, but they are configurable for how difficult you want them to be. PBKDF2 can be set to have a number of rounds. One round is the same workload as SHA1. Three hundred thousand rounds is a lot more.

Imagine you’re storing your passwords with PBKDF2 set to 300,000 rounds and each user has a unique salt. When a user logs in, you look up that user’s salt and start running the PBKDF2 code for 300,000 loops with the supplied password. If the end result matches the expected result, you allow the user in.

For an attacker with a leaked copy of each user’s salt and expected hash, they can start guessing and checking over and over. Try each word in the dictionary and see if the result matches the expected result for each one. The attacker is faced with a ridiculous amount of computer time to go through all of that.

Now we’ve caught up, let’s head over to part two.

Picture Credits:
📸 “Password” by mk_is_here.
📸 “Equal in stature” by Kevin Dooley.
📸 “IMG_3310” by oFace Killah.
📸 “Entropy” by Robert Nunnally.
📸 “A rainbow in salty air” by torne.

Vinegar – refined Vigenère – can you break my cipher?

I’m idly interested in cryptography, the art of scrambling a message so that it can be transmitted securely, and only someone with the magic key can understand the message.

When I was young, I designed a cryptographic algorithm. I thought I was so clever, but just because *I* couldn’t break it, that doesn’t make it secure.

In this article, I present my naive cryptographic algorithm. It’s very flawed, so please don’t use it for anything important. Can you find the flaw?

This article will start with some background on substitution ciphers and the Vigenère cipher, which my method was based upon. Then, we’ll look at my big idea itself, Vinegar. To keep it interesting, there’s a little code breaking challenge as well. Enjoy!


How Etaoin Shrdlu defeated substitution ciphers.

Like most children, my first encounter with cryptography was a substitution cipher. A friend gave me a sheet of paper with each of the 26 letters and wacky squiggle next to each one. This would be our secret code. Replace each letter with it’s squiggle and it would just look like a bunch of squiggles.

We thought it was unbreakable, but it wasn’t. This sort of code can be cracked by knowing that E is the most common letter in English, so the most common squiggle in the hidden message is probably an E. The next most common squiggle is probably a T. Once you’ve covered the twelve most common letters in English; E, T, A, O, I, N, S, H, R, D, L and U;
♦ou ♦an easil♦ ♦or♦ out ♦hat the other ♦issin♦ letters ♦ould ♦e.

What we need is a cipher the produces a coded message with an equal mixture of symbols. Enter Vigenère.

The Vigenère cipher

Centuries ago, the choice of people who wanted to communicate in private was Vigenère. Here’s how it works.

          A 0
Z 25 B 1
Y 24 C 2
X 23 D 3
W 22 E 4
V 21 F 5
U 20 G 6
T 19 H 7
S 18 I 8
R 17 J 9
Q 16 K 10
P 15 L 11
O 14 M 12
N 13

(This was meant to be circular.)

The key to understanding a lot of cryptography is that the 26 letters of the English language can be used as numbers. On this chart, each letter has been given a number. Now, it’s possible to do simple calculations with letters. What’s C+C? The answer is E, because 2+2=4.

Don’t get excited, but you can do subtraction as well. K-H is D, because 10-7=3.

You may be wondering what should happen if you add past ‘Z’ or subtract past ‘A’. For our purposes, imagine the 26 letters on a clock face in a circle. On a normal clock, when a hand ticks past the 12, it moves onto 1, not 13. Its the same with the Vigenère clock of letters. After the letter ‘Z’, is the letter ‘A’.

Finally, because we’ve constrained our system to these 26 values, adding ‘B’ (1) turns out to be the same as subtracting ‘Z’ (25). Regardless of where you start, performing either +B or -Z will end at the same letter. In fact, all of the 26 possible additions will have an equivalent subtraction. You can find each letter’s pair on the chart by looking for the letter across from the other. ‘A’ and ‘N’ are self pairing; +A is the same as -A, and +N is the same as -N.

Working within this system, we can use this to encrypt secret messages. Imagine Bob and Carol wish to communicate in private, but the bad guys can read their messages. To stop the bad guys, Bob and Carol meet up in advance and agreed the code they will use future.

Later, Bob wants to send the message “But now you will die by chainsaw” to Carol. (He’s a fan of Internet cartoons.) Now we have the system of adding letters together, Bob can perform a simple calculation on each letter of the secret plain text message. He takes the previously agreed keyword, “WILHELM”, and adds each letter of the keyword to each letter of the plain text, repeating the keyword as often as needed;

  BUT NOW YOU WILL DIE BY CHAINSAW
+ WIL HEL MWI LHEL MWI LH ELMWILHE
  XCE USH KKC HPPW PEM MF GSMEVDHA

When Carol receives the encoded message, she can get back the plain-text by subtracting the keyword.

  XCE USH KKC HPPW PEM MF GSMEVDHA
- WIL HEL MWI LHEL MWI LH ELMWILHE
  BUT NOW YOU WILL DIE BY CHAINSAW

Vigenère fixes the flaw with substitution ciphers because all the letter Es in the original message will all (mostly) come out as different letters.

Vigenère eventually fell out of use once a new flaw was discovered. Frequency analysis was still hiding there. If you know that the length of the keyword is 7, then you know that every 7th letter was encoded with the same letter from the keyword. So if you circle the 1st, 8th, 15th, etc letter of a long enough hidden message, the most frequent letter of those circled letters is probably an ‘E’, etc. Repeat for each group of letters 7 spaces apart and you can work out what the plain-text message was.

(How did we know the keyword was 7 letters long? There are ways. Wikipedia have an in-depth description but you don’t need to know how for this puzzle.) If you want to experiment, “Sharky” has a rather splendid web-app to perform the encoding and decoding.

Vinegar – Refined Vigenère

This is my improvement to the Vigenère cipher, which I called Vinegar. (Because that’s close to how I kept mispronouncing it.)

The problem with Vigenère is that the keyword is repeated, and that repetition exposed the vulnerability. What we need is a keyword that’s long without repeating, but small enough to be remembered.

Vinegar takes a 17 letter keyword and expands it to 210 letters. With that long 210 letter keyword, you can use Vigenère without having to repeat the keyword, and it’s the repetition in Vigenère that exposes it’s vulnerability.

Why 17? Because it’s the sum of the first four prime numbers. 2+3+5+7=17. We’ll use the keyword “WILHELMVONHACKENS” for this example.

Split the keyword up into groups of 2, 3, 5 and 7 letters. WI, LHE, LMVON and HACKENS.

Repeat each sub-key to make up 210 letters:

  WIWIWIWIWIWIWIWIWIWIWIWIWI...
  LHELHELHELHELHELHELHELHELH...
  LMVONLMVONLMVONLMVONLMVONL...
+ HACKENSHACKENSHACKENSHACKE...

Because each group has a prime number length, the four groups will effectively expand into a 210 letter Vigenère keyword for the price of a 17 letter keyword. (210 is 2x3x5x7.) Add each column together to get the long key…

ZBXRUKLROI YCPVUERRZP DMYCEEZGYZ ULZUIETMYK BQJDPOTUNJ LHIEHSTOTJ WONOQZDOBY VYENRRHOVE VJLSBAOYVM KIVJABGCVG QIGQFLPJFG YXFAWKQBJG SDFLDPAKQQ SLUKNGZLIU SFAKYNEVRB CFIZXXVUST GFCFXICZCC NPCNMHMQBD FTCHSHXXGN OAMHAWWHXM PSQHUCWSER

To avoid the Vigenère vulnerability, we can only use a 17 letter keyword once per 210 letters. So if you are encoding a longer message, use the last 17 letters of each 210 block for a new keyword to use for the next block of 210 characters.

So there’s my cipher. It has a flaw. Can you work it out? To try your hand, I came up with a random 17 letter codeword and expanded it to 210 letters. I then used that long key to encrypt my secret message. (My message is a little shorter than 210 letters, so I left some at the end un-used.) The plain-text is perfectly normal English. Spaces and punctuation are retained from the plain text.

Iwy seix zfvdzykjxm moebj dkaavmin vjkehleozp atir sdvwkvm cf hhbd vw gauj ty qzintte av mjbo xr xxnb whuieift, zaed jmioidh xv ts xtt elmv fg zok xgwnlpn vbues mp irmc twpb, yebhaoz rdlnrpbj jgg kzmlkyah vo dvta hn jzfxggaxcq.

Can you decode my message without the key? Post me a comment. Enjoy.

Picture Credits:
I’m lying by Taylor Dawn Fortune
Local Praire Dog Gossip by Art G
With grateful thanks to Richard Heathfield.

‘First Past the Post’ isn’t.

I’m a bit of a nerd for vote counting systems. So I discovered with delight that in the UK, there will soon be a referendum on changing the way votes are counted. As I write this, most of the UK (and the USA) uses a method we call “First past the post”.

It’s called “First past the post”, but it isn’t. The name is about as ridiculous as calling North Korea;  â€œThe Democratic People’s Republic of Korea”. The country is on the Korean peninsula, so at least that name is a little bit honest.

Put what you know to one side and think about what a vote counting system called “first past the post” should look like by the name alone. It’s a clear analogy to running in a race with a “post” at the end of the track. So there’s a predetermined number of votes and the first candidate to get past that threshold wins?

Nope. “First past the post” doesn’t work that way at all.

Imagine a vote of 100 voters selecting one of three candidates; A, B and C.

A 45 votes
B 30 votes
C 25 votes

If we put the “post” at 51 votes (a simple majority), then all three candidates lost. They all fell over and dropped out of the race before anyone reached the finish. Three pathetic failures. Rather than declaring no-one the winner, the people organising the race then dig up the post and re-plant it just behind A, as if the post was always there. He is declared the first one to pass the post.

If that wasn’t confusing enough, the other counting system to be offered in the referendum, “Alternative Vote” or “AV”, does indeed look like how you’d imagine “first past the post” to work. To continue the analogy with AV, candidate C drops out of the race after the first round. The second preferences of C’s voters are counted and added to A’s and B’s total.

A 45 + 3 votes 48 votes
B 30 + 22 votes 52 votes
C 25 votes

Hurrah! After two rounds, B was the first candidate past the post. A had an early lead but couldn’t keep up as far as the finishing line, overtaken by B in the closing straight.

So remember, the vote counting system that works in a “first past the post” manner is the one called “Alternative Vote”. The one without anything like a “post” is called “First past the post”. Clear?

Picture credits:
Vote Goat by Jeremy Richardson (Mr Jaded on flickr).
Racing demons by Simon Webster (shaggy359 on flickr).

Computers are fast!

Killer Sudoku is a popular variant on the perennial Sudoku game found in British newspapers. The same rules about the numbers 1 to 9 appearing only once in row, column and 3×3 box still apply, but unlike Sudoku, the puzzle doesn’t come with any numbers filled in. Instead, cells are groups with dotted lines and the total of all those cells are written in.

Here’s part of a puzzle. The two cells grouped by a 10 could be 1+9, 2+8, 3+7 or 4+6. (It can’t be 5+5 because that would mean two 5s in same box.) Not a lot to work on, but also look at the three cells grouped by a 23 in the same 3×3 box; The only possible fit is 6+8+9. This means the only possible fit for the 10 group is 3+7, as all the possibilities would clash with the 6,8 or 9 required to make the 23.

I could go on. If you want to complete the puzzle, its the Guardian‘s puzzle number 166. The point of this article isn’t to solve the puzzle, but something I observed along the way.

Am I sure there’s no other way to split 23 into three parts, 1 to 9, without duplicates? I confess, mental arithmetic is not my forte. Part of the puzzle solving process for me is going through each dotted group and building a crib sheet on all the possible ways to make the total, so I can cross them off later.

This isn’t a part of the puzzle I particularly enjoy, so I decided to automate it. (Is that cheating? I don’t think so in my circumstances, but that’s another topic of conversation.)

So I sat down and designed an algorithm…

  • Input: n (the number of cells, 2 to 9) and m (the total, 3 to 45).
  • Output: All solutions to SUM(n integers 1-9 without duplicates) = m.

Only problem, as with so many mathematical endeavors, life got in the way. By the time I had worked out a basic algorithm, I noticed it was 2AM and suddenly felt rather tired, but I didn’t want to stop, so I simplified my plan.

Loop, starting with (1,1,1,1,1,1) then (1,1,1,1,1,2), ending with (9,9,9,9,9,9). For each combination, if they add up to the target, have no duplicates, and is in order, add it to the list. It was a very inefficient algorithm, but it would work and I wouldn’t have to expend much thought on building it.

Then something surprising happened. While testing with 2 or 3 digit targets, it was near instantly fast, but I had expected 6 digit targets to take something like a few minutes. In fact, it took 2 seconds.

2 seconds!!!!! I had learnt to write programs on my early 80s BBC Micro and I was used to leaving it running overnight calculating something like this. Computers are fast!

(If all that seems too much like hard work. Wikipedia have built the tables for you.)