On Password Strength (xkcd 936)
In his latest xkcd comic, Randall Munroe suggests that four random common words make a more secure password than one short, random word with character substitutions.
I have some quibbles here. Randall assumes that the form of the password is known in both situations, in which case the four-word password is obviously superior to the one-word password. But in the first situation, do we really know that we’re going to have exactly one uncommon dictionary word, followed by a punctuation mark and a single digit? Randall says “add a few more bits to account for the fact that this is only one of a few common formats” in the subtitle text, but if we add another uncommon base word (Randall estimates that there are 2^16 of them) to the roughly 2^28 he gives the one-word format, we’re already up to 2^44, the same difficulty level as the four-word password. On top of that, we can add arbitrary punctuation or digits between base words.
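To make the arithmetic explicit, here is a minimal sketch using the bit counts quoted above. The function name and the guess rate passed to it are my own illustrative choices, not anything taken from the comic.

```python
# Bit counts as discussed in the paragraph above.
ONE_WORD_BITS = 28   # one uncommon base word plus substitutions, punctuation, digit
BASE_WORD_BITS = 16  # picking one of 2^16 uncommon base words
FOUR_WORD_BITS = 44  # four random common words


def seconds_to_exhaust(bits, guesses_per_second):
    """Worst-case time to try every candidate in a 2**bits search space.

    guesses_per_second is a hypothetical attacker speed supplied by the caller.
    """
    return 2 ** bits / guesses_per_second


# Adding a second base word to the one-word format adds its bits.
two_base_words_bits = ONE_WORD_BITS + BASE_WORD_BITS

print(two_base_words_bits == FOUR_WORD_BITS)  # True
print(seconds_to_exhaust(two_base_words_bits, 1000))
```

Bits add when independent choices are combined, which is why one extra base word closes the whole gap.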
Randall is also overestimating the number of different words that people will use to craft a four-word password. His estimate of 2^44 works out to 2^11 (2048) choices per word, which is derived (most likely) from the common estimate that there are roughly 3000 words in conversational English. However, he doesn’t consider that certain sorts of words are far less common than others, and that a sequence of English words lends itself to crafting a password in particular patterns. In particular, even if there are 3000 conversational English words, there are only a few ways to turn four of them into a coherent sentence. English sentences are predictable, especially with common words.
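As a quick check on those numbers, the sketch below computes the strength of a passphrase whose words are each drawn uniformly and independently from a fixed vocabulary. The function name is mine, and the uniform-choice assumption is exactly the one I am questioning.

```python
import math


def passphrase_bits(vocabulary_size, word_count=4):
    """Bits of entropy if every word is picked uniformly at random."""
    return word_count * math.log2(vocabulary_size)


print(passphrase_bits(2048))            # 44.0
print(round(passphrase_bits(3000), 1))  # 46.2
```

So 2^44 corresponds to a 2048-word list, and a 3000-word vocabulary would only add about two bits. Either figure is an upper bound that holds only when the words really are chosen at random.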
Subject-Verb-Object order gives the English language structure, but it also makes passwords easier to guess. By assuming common structures (as Randall does with the one-word password example), we can quickly narrow down the security level we’re actually going to realize. For instance, there are only a few hundred (at most) prepositions in the English language. In a user-constructed sentence, there is a high likelihood of one of these showing up. If we had a few thousand of these passwords to analyze, we’d see obvious patterns: people would use "The", "A", or "An" as the first word of the password with high frequency. Short, easy nouns would be common. Commonly misspelled words would rarely be used. In other words, these passphrases would have structure. Google already has a product that attempts to predict the next word you will type. How well would similar machine learning attempts fare on these passwords, especially if given a first-word seed?
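Here is a rough sketch of how much a known sentence template could cost. Every slot size below is a made-up placeholder (I have not measured any of them), and the template itself is just one hypothetical pattern an attacker might try first.

```python
import math

UNIFORM_VOCABULARY = 3000  # the conversational-English estimate used above

# Hypothetical number of likely candidates for each slot in a four-word
# "article noun verb preposition" style sentence. Illustrative values only.
TEMPLATE_SLOTS = {
    "article": 3,        # "The", "A", "An"
    "noun": 1000,
    "verb": 500,
    "preposition": 150,
}


def uniform_bits(vocabulary_size, word_count):
    """Entropy when every word is drawn uniformly from the full vocabulary."""
    return word_count * math.log2(vocabulary_size)


def template_bits(slots):
    """Entropy when the attacker knows the template and each slot's candidates."""
    return sum(math.log2(size) for size in slots.values())


print(round(uniform_bits(UNIFORM_VOCABULARY, len(TEMPLATE_SLOTS)), 1))
print(round(template_bits(TEMPLATE_SLOTS), 1))
```

With these placeholder numbers the templated sentence comes out far weaker than four independent words. Real user-chosen phrases would need real frequency data to evaluate.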
There are other psychological factors to consider. "I Love You" would perhaps become even more popular as a password than it is now. Shoulder-surfing (i.e., looking over someone’s shoulder to steal their password) would become easier, because the human eye can pick out familiar words better than it can pick out someone typing Tr0ub4dor&3. On top of this, what’s to stop Alice and Bob from making their password "Eve is an idiot"? Yes, people already make their password the name of their pet, but this will actually encourage it! You’re telling users "yes, it’s quite alright to use common words".
I also don’t believe Randall’s claim that four random words are really easier to remember than one. "Correct horse battery staple" can just as easily be misremembered as "Correct battery horse staple" or "Correct horses battery staple". If the words are really "random", they won’t be easy to remember. If they are not random, they won’t be secure.
That’s really how it is with all passwords: there’s always going to be a trade-off between a password being easy to remember and being secure, unless the user has some truly private store of knowledge. The best solution to this? Muscle memory. Learn to type a randomly-generated string of characters and learn it well. Use and abuse it, and never forget a password again.
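For anyone who wants to try this, here is a minimal sketch of generating such a string with Python's standard secrets module. The 94-character alphabet and the default length of 12 are my own assumptions, so adjust both to whatever the site you are using allows.

```python
import math
import secrets
import string

# Assumed alphabet: ASCII letters, digits, and punctuation (94 characters).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=12):
    """Build a password by drawing each character from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def password_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy in bits of a uniformly random string of the given length."""
    return length * math.log2(alphabet_size)


print(random_password())
print(round(password_bits(12), 1))
```

Unlike a user-chosen phrase, every character here is independent of the others, so the bit count reflects real strength rather than an upper bound.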
