The Pitfalls of Password Strength Meters

I’ve spent a depressingly large proportion of the last few years writing about the fact that so few people recognize that they’re using poor password and PIN selection strategies. This is unsurprising, perhaps. After all, this issue is not just technological, but psychological and even ergonomic. If you’re not confident of your ability to create a sound password, you might use a password strength meter like Microsoft’s. I can’t vouch for how good it is, but a lot of people seem to find it helpful to have some guidance.

However, an article by Mark Stockley for Sophos suggests that a poor meter may be worse than useless. He took five of the 10,000 most common passwords, according to xato.net, all of which the cracking software John the Ripper cracked more or less instantly, and then ran them against five plug-in strength meters. One meter categorized all five as good, and another classified two of them as good. Of the remaining results, ten were classified as weak by various meters, six as medium, and two as ‘norm’ (normal, presumably).

Stockley’s contention is that:

A password strength meter that doesn’t reject all five out of hand is not up to the job of measuring password strength.

They all failed. And not only that, they don’t agree. 

Well, I won’t disagree: the results are inconsistent between meters and the classifications are misleading, unless you believe that ‘iloveyou!’ or even ‘abc123’ are good passwords. Why did they fail so spectacularly? The answer lies in the fact that the harshest categorization is ‘weak’.

There are a number of characteristics you can use to assess the strength and entropy (randomness or unpredictability) of a password or, preferably, passphrase, such as:

  • Number of characters
  • Variety of characters – a very long password consisting of the same repeated character is not resistant to password cracking software
  • The types of character used: alphabetical, numeric, symbols and special characters, and where they’re placed in relation to the other characters. (To take a simple example, when people append a number to their password and increment it every time they’re required to change it, that offers no effective barrier to password-cracking software.)
  • Case sensitivity
  • Use of dictionary words
  • Use of character substitutions (such as 0 for ‘o’, or 4 for ‘a’)
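
To illustrate how a meter built only on characteristics like these can mislead, here is a minimal sketch of a naive scorer in Python. The function name `naive_strength`, the weights, and the thresholds are all assumptions made up for this example; this is not the algorithm used by Microsoft's checker or by any of the meters Stockley tested.

```python
import string


def naive_strength(password: str) -> str:
    """Hypothetical scorer that looks only at length and character variety."""
    score = 0

    # Reward length (thresholds are arbitrary assumptions).
    if len(password) >= 8:
        score += 1
    if len(password) >= 12:
        score += 1

    # One point for each character class that is present.
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    score += sum(classes)

    # A long run of one or two repeated characters gets no credit at all.
    if len(set(password)) <= 2:
        score = 0

    if score <= 2:
        return "weak"
    if score <= 4:
        return "medium"
    return "strong"


for candidate in ["abc123", "trustno1", "ncc1701", "iloveyou!", "primetime21"]:
    print(candidate, naive_strength(candidate))
```

Because nothing in this sketch asks how common a password is, it has no way of rejecting any of the five outright: the worst it can say is ‘weak’.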

There are any number of algorithms that might be used to assess the effectiveness of a given string used as a passphrase. Obviously, some are better than others and you have to expect some variation in categorization. I tried the same passwords against the Microsoft checker, which wasn’t one of those tested by Stockley. Here are the categories assigned by the checker. The number in the first column represents their ranking in the list of 10,000 most common passwords at xato.net.

Ranking   Passphrase    Category
14        abc123        weak
29        trustno1      medium
158       ncc1701       weak
8778      iloveyou!     medium
8280      primetime21   medium

Clearly, there isn’t a separate category for ‘Don’t use this password because an awful lot of other people already do, so hackers will find it quickly.’ And the fact that trustno1, confirmed by at least one other list to be far more common than ncc1701, is categorized as medium suggests that ranking (or appearing at all) on such lists is not one of the categorization criteria applied by the Microsoft meter or, apparently, any of the five tested by Stockley.

That’s not to say that the lists are only used by password crackers. At one time, Twitter used a script to check passwords created by its users against a list of strings: if someone tried to set a password that was found on the list, it would not be allowed. And yes, abc123, trustno1, and ncc1701 could be found there (the list was very trivially obfuscated). iloveyou! wasn’t included, though iloveyou was. Nor was primetime21 or anything close to it.
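
A check of that kind can be sketched in a few lines of Python. This is not Twitter's script: the set `BANNED`, the function name `is_allowed`, and the decision to ignore case are assumptions for illustration, and the list here holds only a few entries where a real one would hold hundreds or thousands.

```python
# Tiny illustrative sample of a banned-password list (an assumption;
# a real list would be far longer).
BANNED = {"abc123", "trustno1", "ncc1701", "iloveyou"}


def is_allowed(candidate: str) -> bool:
    """Return False if the candidate matches a banned entry exactly.

    Comparison ignores case, which is a design choice made for this sketch.
    """
    return candidate.lower() not in BANNED


print(is_allowed("abc123"))       # False: on the list
print(is_allowed("iloveyou!"))    # True: exact matching misses the added '!'
print(is_allowed("primetime21"))  # True: nothing like it is listed
```

The second result shows the limitation of exact matching: a single appended character is enough to slip past the list.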

So how do these meters reach their conclusions? Well, one of them considered all five of those passwords ‘good’, so maybe it doesn’t have any negative criteria. All the others considered abc123 weak, even though it consists of a mix of letters and numbers, perhaps because it features two strictly serial sequences (abc and 123).

Perhaps ncc1701 fares better according to some meters because it doesn’t include a dictionary word (though it is, of course, the instantly recognizable number of the Starship Enterprise, which is why so many people use it). iloveyou! probably gains favor because strength meters like passwords that include punctuation characters.

Maybe primetime21 fares even better because primetime is technically not quite a dictionary word (though it concatenates two dictionary words and certainly exists as the name of a TV channel), even though its numeric component consists of appended digits. (It would be a very weak dictionary attack that didn’t try adding a series of digits to strings in its dictionary.)
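
To make that point concrete, here is a rough Python sketch of the kind of check a meter could apply in the same spirit: strip any trailing digits and punctuation, then see whether what remains is a dictionary word or two dictionary words run together. The word set `WORDS` and the function names are hypothetical, and real cracking tools apply far more mangling rules than this.

```python
import string

# Stand-in for a real dictionary file (an assumption for this sketch).
WORDS = {"prime", "time", "love", "you", "trust"}


def strip_suffix(password: str) -> str:
    """Drop trailing digits and punctuation, then lower-case the rest."""
    return password.rstrip(string.digits + string.punctuation).lower()


def is_word_or_pair(text: str, words: set) -> bool:
    """True if text is a single word or exactly two words concatenated."""
    if text in words:
        return True
    return any(
        text[:i] in words and text[i:] in words for i in range(1, len(text))
    )


def is_trivial_variant(password: str, words: set = WORDS) -> bool:
    """True if the password is just dictionary material plus a suffix."""
    return is_word_or_pair(strip_suffix(password), words)


print(is_trivial_variant("primetime21"))  # prime + time, with digits appended
```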

After I wrote some articles and a paper or two about password- and PIN-related issues, I found myself contacted by PR agencies and reporters every time someone came out with a new list, or a database of credentials was stolen and dumped onto Pastebin. Even worse, I kept coming across journalists publishing their own list of ‘the 10 (or 20, or 25) bad passwords’.

So I wrote a faintly disgruntled article pointing out that lists of bad passwords can actually be misleading, especially if they lead people to think they’ll be OK if they don’t use the most common passwords or PINs. My main point was that it’s better to focus on how people can think about improving their password creation strategies, rather than on a handful of the very worst passcodes.

Regarding PINs, research has indicated that 15% of the collected passcodes could be found in the top ten, which consists of the following:

  1. 1234
  2. 0000
  3. 2580
  4. 1111
  5. 5555 
  6. 5683
  7. 0852
  8. 2222
  9. 1212
  10. 1998

(There are similar metrics on offer for mixed character passwords, though I can’t say how reliable they are. Research by Mark Burnett from some time ago indicated that 91% of all user passwords sampled appear on the list of just the top 1000 passwords.)

Randomization is no guarantee of security. Indeed, randomization will sometimes give a bad PIN like 0000. You can use algorithms that are essentially pseudo-random but which are weighted to exclude the top n PINs, of course, but I don’t know if any service does that. 
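
As a minimal sketch of that idea, the Python below draws random PINs and simply discards any that appear on an exclusion list, here the top ten given above. The names `COMMON_PINS` and `generate_pin` are assumptions for this example, and it uses the standard library's `secrets` module rather than a plain pseudo-random generator; I'm not suggesting that any particular service works this way.

```python
import secrets

# The ten most common PINs listed above.
COMMON_PINS = frozenset(
    {"1234", "0000", "2580", "1111", "5555",
     "5683", "0852", "2222", "1212", "1998"}
)


def generate_pin(length: int = 4, excluded: frozenset = COMMON_PINS) -> str:
    """Return a random numeric PIN that is not on the excluded list.

    Rejected draws are thrown away and redrawn, so every PIN that is
    not excluded remains equally likely.
    """
    while True:
        pin = "".join(secrets.choice("0123456789") for _ in range(length))
        if pin not in excluded:
            return pin


print(generate_pin())
```

Excluding ten values out of 10,000 costs almost nothing in keyspace, while removing the PINs an attacker would try first.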

The issues aren’t exactly the same when you move away from fixed length and purely numeric passcodes to variable length passwords and passphrases with a mixture of character types, if only because of the number of variations available. Dictionary attacks, based on trying commonly used strings in the hope of cracking a password early in the process, are just one way of gaining illegitimate access to an account, but they are commonly used.

The Microsoft checker may not take much account of overused passwords, but at least it (quite rightly) points out that ‘This password checker does not guarantee the security of your password; it is provided for your personal reference,’ and offers some terse but by no means useless advice on selecting a password.

Stockley’s testing indicates that the fact that a website permits a given password, and rates it better than ‘weak’, doesn’t necessarily mean that your password is immune to instant cracking by comparison against a list of known bad passwords.

I’m not saying that meters should necessarily include checking against lengthy bad password lists, let alone huge dictionary files: that would certainly present practical difficulties for password checking scripts like the one Twitter used to use. It does mean, though, that you need to bear in mind that the clever idea you just had for your password may not be as unique as you thought it was, even if it isn’t an obvious dictionary word.

Wouldn’t it be nice if you could just try it against a list of the most commonly-used passwords? You can, of course, though you need to bear in mind that some of the ‘try your password’ sites that pop up every time there’s a breach in a popular service are aimed at stealing your password, not helping you. It might be less dangerous to acquire a list you can store in a document locally and do a ‘find’ for your password of choice, or a reliable online list, but how many passwords should it contain? How do you know it’s reliable?
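
If you do keep such a list locally, the ‘find’ can be scripted so that the password never leaves your machine. The sketch below assumes a plain text file with one password per line, most common first; the function name `rank_in_list` and the file layout are assumptions, and the result is only as good as the list you feed it.

```python
from typing import Optional


def rank_in_list(password: str, list_path: str) -> Optional[int]:
    """Return the 1-based line number of password in the file, or None.

    Assumes one password per line, ordered from most to least common.
    """
    with open(list_path, encoding="utf-8", errors="replace") as handle:
        for rank, line in enumerate(handle, start=1):
            if line.rstrip("\r\n") == password:
                return rank
    return None
```

When calling it interactively, reading the candidate with `getpass.getpass()` rather than typing it as a command-line argument keeps it off the screen and out of your shell history.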

Here are some conclusions I came to in a presentation I made at a conference last year:

In the age of Bring Your Own Device, unauthorized or inappropriate access to a device may give an attacker access to highly sensitive internal resources. So there’s also a need within the enterprise to find ways to encourage and enforce sensible, security-aware behavior when it comes to password and PIN selection strategy. And consider using alternative authentication strategies, where practical, even if your users protest about the inconvenience.

Inside and outside the workplace, it’s critical that those who’ve embraced the ‘share everything and don’t worry about privacy or security’ philosophy of social media are encouraged to recognize that the ready availability of so much personal and even sensitive data makes it less safe as a source of passcodes and passwords with personal meaning.

And, of course, the usual caveat applies: It doesn’t matter how good your password strategy is if you’re applying it on a site that doesn’t look after its – and your – authentication data properly.

Article updated 15/3/17 to remove dead links to Microsoft Password Generator
