Tough nut to crack

To exercise my (almost-non-existent) Python skillz, I wrote a script to generate passwords and passphrases from common words. The script (on GitHub) also computes the bit entropy of the generated passwords, and compares them against the plain words (without obfuscation).

As a bonus, I also wrote a generator for XKCD's passphrase scheme: choose four random words from a word list, resulting in a higher entropy and better recall.
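A minimal sketch of the XKCD idea, not the actual script: the function name `xkcd_passphrase` and the tiny `sample_words` list are stand-ins I made up for illustration (the real run reads a full dictionary file). It uses the standard `secrets` module so the picks come from a cryptographically strong source.

```python
import secrets


def xkcd_passphrase(words, count=4, separator=""):
    """Join `count` words drawn uniformly at random (with replacement)."""
    return separator.join(secrets.choice(words) for _ in range(count))


# Hypothetical stand-in list; a real run would load a large dictionary file.
sample_words = ["apyrene", "francisca", "subthreshold", "bioscientist"]

print(xkcd_passphrase(sample_words))
print(xkcd_passphrase(sample_words, count=3, separator="-"))
```

With a word list this small the result is trivially guessable; the scheme only pays off when the list is large.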

The script uses four schemes -- well, three, actually, if you don't count the plain text version (for control): simple character substitutions (e.g. "@" for "a", "3" for "e", similar to "l337 $p3a|<") and padding in-between words; padding short passwords with random characters; and the XKCD algorithm.
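To make the first two schemes concrete, here is a rough sketch under a few assumptions. Only the "@" and "3" substitutions come from the description above; the helper names `substitute` and `pad`, the padding alphabet, and the default minimum length are all hypothetical, and the real script may well randomize more than this does.

```python
import secrets
import string

# Only these two substitutions are mentioned in the post; extend as needed.
SUBSTITUTIONS = {"a": "@", "e": "3"}

# Assumed padding alphabet: digits plus ASCII punctuation.
PAD_CHARS = string.digits + string.punctuation


def substitute(word):
    """Replace every mapped character, leaving the rest untouched."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in word)


def pad(password, min_length=16):
    """Append random padding characters until min_length is reached."""
    missing = max(0, min_length - len(password))
    return password + "".join(secrets.choice(PAD_CHARS) for _ in range(missing))


print(substitute("apyrene"))
print(pad(substitute("bioscientist")))
```

The substitution here is deterministic, so on its own it adds very little; the random padding is what does the heavy lifting.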

Here's a test run using the word list from Linux, /usr/share/dict/words:


Password                                    Length   Entropy (bits)
Cryptogamia738314396617973004177668             35   208.40
0nl3pYUG%/50$?9&(.)+.70:01<[=_:                 31   203.19
resoaked16166235184403414647125208              34   202.44
flophouses37070490830801046024                  30   178.63
nontaxonomic1968304520123681                    28   166.72
ferrocerium42507400944791500                    28   166.72
^#fILLIPIng/@-2/1;.                             19   124.54
?&p5y(HROM37rI(2l                               17   111.43
]NigHt-cELlAr\]4?                               17   111.43
squads-leftpertestself-regulatedecarburized     43    75.68
arienzomiscounselingtritheocracyreverberantly   45    75.68
pseudepigraphic                                 15    70.51
bubble-and-squeakMariandDorkus                  30    56.76
Lyonaisquadriannulatebumblepuppy                32    56.76
re-excitelactoidununderstandable                32    56.76
subthreshold                                    12    56.41
bioscientist                                    12    56.41
francisca                                        9    42.30
apyrene                                          7    32.90

The padding scheme produces the passwords with the highest bit entropy, understandably, because of their sheer length. Complexity-wise, though, they won't cut it, since they lack a full mix of character classes (upper- and lower-case letters, punctuation marks, and digits). The character-substitution-plus-padding scheme does a better job on that front, while XKCD-style is good enough.
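The character-class point is easy to check by hand, but a few lines of Python make it explicit. This is only an illustrative sketch: `character_classes` is a name I picked, and it treats anything outside ASCII letters and digits as "punctuation", which is an assumption.

```python
import string


def character_classes(password):
    """Return the set of character classes present in the password."""
    classes = set()
    for ch in password:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("digit")
        else:
            classes.add("punctuation")  # assumption: everything else
    return classes


# Two passwords taken from the test run above.
print(sorted(character_classes("flophouses37070490830801046024")))
print(sorted(character_classes("^#fILLIPIng/@-2/1;.")))
```

The padded word only has lower-case letters and digits, while the substituted-and-padded one hits all four classes.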

(Caveat: I'm not a mathematician. This is a purely "pedestrian", hence loose, interpretation of entropy based on my limited understanding of the concept, but I'd like to think that the math is sound. :-)

Take note that I used a different algorithm for calculating the entropy for XKCD than for the other schemes. For XKCD, I assumed a fixed-size data set from which to derive the passphrases, so the calculation is:

$$E = l \cdot \log_2 R$$

where $E$ = entropy
         $l$ = number of words in the passphrase
         $R$ = total number of words in the data set

For the other schemes, I use the classic definition for calculating entropy, which is the same as above, but $l$ is the length of the password, and $R$ is the range of possible characters.
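Both variants boil down to the same one-liner. The sketch below uses a made-up function name, `entropy_bits`; the pool sizes of 26 (lower-case letters) and 94 (printable ASCII without the space) are my inference from the numbers in the test run, not something the script is confirmed to use. The 2048-word list in the last line is purely hypothetical.

```python
import math


def entropy_bits(length, pool_size):
    """E = l * log2(R): l symbols, each drawn from R possibilities."""
    return length * math.log2(pool_size)


# Character-based: "pseudepigraphic" is 15 characters, assumed R = 26.
print(round(entropy_bits(15, 26), 2))  # 70.51

# Character-based: a 31-character password, assumed R = 94.
print(round(entropy_bits(31, 94), 2))  # 203.19

# Word-based (XKCD): 4 words from a hypothetical 2048-word list.
print(entropy_bits(4, 2048))  # 44.0
```

The first two results line up with the 70.51 and 203.19 rows in the table, which is what suggests those pool sizes.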

I also tested the generated passwords against several online strength testers. Interestingly, Wolfram Alpha provides a great password strength tester. It also suggests passwords, based on your input.

Next step is to write a generator for the Schneier scheme: take a memorable sentence, add some memorable tricks (character substitutions, padding), then turn it into a password. That should be fun.
