Hybrid Rainbow

Hybrid Rainbow is new overpowered method combining hybrid attack and popular precalculation technique. Joining flexibility of hybrid attack with convenience and speed of table cryptanalysis it lets easily recover passwords considered "hard" and even "impossible to recover" before. Now, having hybrid table once calculated from good dictionary or dictionary set, for any hash-function passwords of 12-symbol length, as well as 13, and even 20-symbol long ones, can be recovered in a few seconds. Though generally password length doesn’t matter anymore.

Contents

1. Method creation premises
2. New boundaries range
3. Main settings for generator
4. Precalculator parameters
5. Working with Hybrid Rainbow via UDC
6. Examples and statistics
7. The main UDC help file


1. Method creation premises

Back in 2003, Philippe Oechslin announced new method called Rainbow attack ("Making a Faster Cryptanalytic Time-Memory Trade-Off"), which let to precalculate brute forcing. Having once calculated and saved table for specified symbol set and password length (like 7 Latin symbols) we can look up to it as often as we need to find passwords for according symbol set. However we just select required hashes from the table instead of brute forcing what decreases time of attack by factors.

For example, after creating a table for 7 alphanumeric symbols (which will take over a week) any password composed by 7 alphanumeric symbols can be recovered in 20-30 seconds. At the same range brute-force attack will take over 24 hours. Moreover, tables can be simply downloaded from the Internet to do not waste time for generation.

Tables proposed by Philippe Oechslin have probabilistic nature. That means any table at specified range doesn’t guarantee 100% chance to find any password from the range said. More the table size – higher success rate.

Here we came to the main problem of Rainbow-tables. If we want to get high success rate for wide attack range, table size can be too large. So, table for recovery of any 8-symbol password with 99% success rate take over 1.5TB (1500 Gigabytes). Time of cryptanalysis by sure can’t take less than time of loading of these tables from the disk, and that takes quite a lot of time (about 4-5 hours).

We can certainly greatly decrease size of tables by decreasing success rate; for instance, to provide 90% success rate of the same password range table size will be more than twice less (700 Gigabytes); 1% success rate table will be small enough to go at CD.

The question is what can we recover with these tables?

1% success rate table will bring no results at all, brute forcing is more effective in this case. 90% success rate table may not contain the most relevant combinations (which could be real passwords) as “password”, “testpass”, 'qwertyui", etc.

At the same time relevant combinations can easily take 1% of starting attack range.

To achieve this, we shall learn to generate tables to be composed of relevant combinations instead of different “garbage” (i.e. lines having too low chance to serve as password - like "qpdfvlpr")...

Tables proposed by Philippe Oechslin were just a demonstration of precalculation technique effectiveness, not a special tool for password recovery, as he mentioned at his work abstract. However many on-line services, newbies, and amateurs armed with his method “as is”. Even as it was, the method brought worthy results.

However 8-symbol hurdle still hadn’t been cleared after 4 years. Plain Rainbow-tables can’t help to recover 12-symbol password for MD5, for example, however hard we try. Not speaking of 12 symbols, even 9-symbol length causes great difficulties. Many web-portals propose to choose passwords not less than 6-10 symbol long. Especially so that it couldn’t be found using tables...

2. New boundaries range

Now let’s imagine we have generator which composes tables of relevant combinations without limits of number and length. For example, million of relevant 9-symbol words. How can this table help?

At one hand real filling of 9-symbol range will be less than a hundredth of percent. Related to random 9-symbol line its success rate will be the same (0.01%). Though at the other hand any of relevant combinations shall be checked first as it has higher real success probability. Such table may have up to 10% success rate (or even higher) related to human passwords. In this case comparing to classic Rainbow-tables we can get efficiency bonusup to 10%/0.01% = 1000 times higher.

Here I speak of "chance" because real success rate totally depends on user specified generator parameters.

The simplest method of precalculation is text dictionary composed ofrelevant combinations only. n different cases success rate may differ from 5% to 90% (depending on dictionary and initial hash selection). But that already is way too better results than ones given by Rainbow-tables of the same size.

Now let's look at user who tries to find difficult password. We assume he knows we have a table with 100 billions relevant combinations of different length (such table can take about 2GB), but he has no idea what is this table and what words does it contain. How to find a password so that it couldn't be recovered? We can recover just 100 billions of words, and user can select one of infinite number. One can take password "H@rdPas$word", which most likely won't be in our table. But how can he know if it is or not? We may have no password "peopmmssrl", but what if we do? So boundaries of "impossibility of recovery" fade just due to one comparatively small table.

And what if every cryptanalyzer has own large table of relevant words?...

"To be more reliable" also loses sense for passwords. Will "testpassword" be safer than "326"? One table won't have first of them, second one won't have the second password, while third table will have both.

3. Main settings for generator

How will we generate our tables? For first look at Hybrid Rainbow solution is simple - we use dictionaries.
Let’s take one dictionary D1. What words can we get from it? For instance, add “2007” to every word. Either double every word. Or reverse letters order. Each of these actions we call operation. Rule is consequence of operations. For instance, double word and reverse result. Each rule “adds” one more word to our dictionary applying rules to source dictionary line. So, having N rules we get N more words than we had in the source dictionary. Advantage of this method is its speed and no need to store anything but source words and list of rules.

There’s also another way. We take word W from dictionary D1 and add this word to every word in the source dictionary D1 (including W itself). We get new dictionary which contains the same number of words as source one. Now we take another word V and do the same. Well, we can do this for every word from D1. If we had M words in D1, we will get M dictionaries with M words in each. If we combine them, we get dictionary with M^2 (squared M) words. We call this dictionary “Cartesian square of D1” and mark it D1*D1.

So if D1 = {test, pass, 1} (three words), then D1*D1 will be {testtest, testpass, test1, passtest, passpass, pass1, 1test, 1pass, 11} (nine words).

What advantages does this method have? At first look it allows building various phrases composed of words from the word set.

Let’s look closer. We will build dictionary of word fragments instead of whole words. For instance, 4-letter combinations. We won't include combinations rarely used in the real words (passwords) like "pzla" or "qmzo", keeping only possible ones like “test” or “qqqq”. Build D1*D1. The most amazing is that it will mostly include relevant words! Though there will be squared number of words in the source dictionary. So for 1000 relevant fragments we get 1 000 000 rather relevant 8-symbol long combinations.

This fact allows, on the one hand, to simplify relevance rule, and, on the other hand, automate dictionary generation.

Now let’s generalize this technique. We will build Cartesian multiproduct instead of Cartesian square. From dictionary set D1, D2, ... Dk we build dictionary D1*D2*...*Dk, composed of various “phrases" where first word is taken from the first dictionary (D1), the second word from the second dictionary (D2) and so on till the last one from the last dictionary (Dk). This dictionary will have size equal to product of all involved dictionaries; moreover, it also keeps some degree of relevance.

We can even create generalized dictionary taking account of English morphology. For instance all English words can be got as multiproduct of prefixes.txt * stems.txt * suffixes.txt. Result dictionary will certainly include not only English words, but also “strange” words (like "reasker” or “premainy”), which will still have higher relevance than random symbol sets. The most important is that these three files with word fragments will take way too less space than even comparatively dictionary of English words.

So Hybrid Rainbow uses exactly Cartesian multiproduct as general generation method for relevant combinations.

So before generation all the dictionaries with “parts” shall be specified in the order of their multiplication. That’s the first parameter of method ( D = {D1, D2, D3, ... Dk} ).

4. Precalculator parameters

Right after we defined dictionaries we need ( D = {D1, D2, D3, ... Dk} ), way of product saving shall be specified.

Size of result dictionary |R| will be |D1|*|D2|*...*|Dk|, i.e. just product of size of all dictionaries. Let us take two dictionaries - 100 000 and 500 000 words – and multiply them. Here we get dictionary containing 50 000 000 000 words. It most likely won’t be possible to save it as text to the disk. Assuming average word length as 9 symbols, this dictionary will take 480 GB of disk space.

To reduce this size, chain length parameter can be used. Just as Philippe Oechslin in his original method, setting chain length L we cut dictionary size by L/16 times. So for our example if we set chain length 2048, saved file size will be 380 MB (50 000 000 000 * 16/2048 bytes).

However increasing chain length we also increase time taken by cryptanalysis (i.e. look up of the password in the table). So while one hash takes about couple seconds with chain length equal 2048, it will take up to couple minutes with chain length 10 000 for each hash. Generally, cryptanalysis time is square depending on chain length. Cryptanalysis usually loses efficiency for chain length over 100 000.

Single range for precalculation (R) doesn’t have to be precalculated on the same computer during single session. Naturally, this method supports resume after pause of calculation process. To generate several tables for the same range effectively, table number parameter is introduced. Tables with different names and numbers will never be identical even for the same range (R).

It’s also to be mentioned that cryptanalysis goes optimally when the table can fully be loaded to the RAM. It makes sense of generation of several small tables with different numbers instead of one large table.

So how to generate “small" table? “Perfomance factor” parameter will help to. Setting it to 100% means about 45-55% coverage of R (i.e. Success Rate) what is usually insufficient. To reach 90% Success Rate perfomance factor shall be set at least to 300%. In this case precalculation time will 3 times over forcing time by the same range.

We can see that time taken by brute forcing of the same range is enough just for half of the range for precalculation (with performance factor 100%). This is necessary fee for decrease of table file and improvement of cryptanalysis speed.

Small tables are possible with 1-25% value of performance factor. Each of these tables has own small Success Rate, but combining these tables we get total Success Rate higher than single table of the same size could have. Such effect got detailed description in the Philippe Oechslin work.

5. Working with Hybrid Rainbow via UDC

Now let’s get down to practice.

The only program with Hybrid Rainbow support we have today is The UDC, so let’s review methods The UDC provides to work with Hybrid tables.

All the parameters related to Hybrid Rainbow are gathered at the single tab named “HRainbow attack”. Tab is divided vertically; left one is dedicated to the ready table files, right one – to new table generation.

Target hash-function is defined at the “Progress” tab, as well as hash values to perform forcing on.

First let’s look at the part related to attack, i.e. cryptanalysis.

In the left list already installed tables are displayed. Each of them can be either enabled or disabled using check box. Tables with check box unmarked will be simply skipped during cryptanalysis. It’s needed to save time, excluding absolutely irrelevant tables (if there are some). Current number points to table where cryptanalysis starts. Numeration takes only marked tables into account, first table has number “null”.

When needed tables are marked you just need to launch attack using main menu “Recovery -> Local -> Hrainbow attack”. That’s all.

After launch you can monitor cryptanalysis process at “Console” tab where detailed information about performed actions is displayed.

Now let’s look at method of new tables generation.

First of all file name shall be specified where to save new table – without any prefixes or extensions, for example, "TestTable01".

A bit lower – the most important – list of dictionaries for multiproduction. Top left button allows to add dictionary to the bottom of the list. Middle left button removes selected dictionary from the list. Bottom one clears the whole list. Top right button allows to move selected dictionary one line up in the list. Right bottom one – moves it down. Order of dictionaries in the list corresponds to order of words in the target phrase.

If we had top dictionary containing single word "test", and bottom containing two words "pass" and "word", this order would result into two combinations: "testpass" and "testword". Reversing dictionaries order we get "passtest" and "wordtest".

Try to do not select huge dictionaries as for most purposes small (up to 50kB) dictionaries containing relevant words are needed. You can specify up to eight dictionaries.

Then we go to precalculator parameters in the same order as described above. There’s also success rate indicator automatically updated with every change of other parameters.

Below there’s useful link “Evaluate size, time, etc.” Which displays information about a table we’re going to generate: what kind of lines does it include, size of result file, time of table generation and cryptanalysis for this table.

After all parameters set we just launch generator from main menu: “Additionally -> HRainbow Generator”.

Generation certainly can be resumed after pause and saves table to disk by calculation.

6. Examples and statistics

Example #1."Numbers".

That most likely won’t be surprise that quite a lot of passwords are plain numerical only. Here we’ll try to precalculate them to speed up recovery.

At first look all digits in the password have approximately the same chance of presence. However if we look at combinations, possibilities won’t be equal anymore. For instance, “123” line happens to be present in usual password more often than “294”.

It would be an omission to do not use this fact.

So we shall start finding the most frequently used number combinations.

To do that we take real passwords selection to count frequency for every combination. More the selection range, more precise statistics data is. In our example source password range includes 50 thousands numeric passwords.

Chance calculation had been performed by simple program which checked for all numbers from 100 to 9999 how many occurrences every subline number got. For instance, password "123123321" includes 3 of each line "1", "2", "3"; 2 of each line "12", "23", "123"; all the others once each.

Then N of most frequent sublines were saved to dictionary. Moreover, all the missing lines from 0...9 (i.e. single digits) and 00...99 (i.e. double numbers) set had been added.

We assumed N equal 500. There can be other variants what give better results.

We save this output dictionary containing 610 lines (500 + 110) and raise it to biquadrate.

This dictionary will include all numbers from 0000 to 99999999 (four 2-digit sublines). It will also contain the most relevant combinations up to 16-digit length (four 4-digit numbers sublines) like 1234123412341234 or 0000111122223333.

610^4 = 138 458 410 000. To reduce this size and increase speed of search we use precalculation technique.

With chain length = 5000, performance factor = 300%, add our dictionary four times to the dictionary list. Other parameters are inessential in this case.

This table will take 610^4/(5000/16) = 443066912 bytes. I.e. a bit less than 450 megabytes.

Time of generation (performed once) of this table for MD5 on Athlon 3000+ will take about 24 hours.

Cryptanalysis time, i.e. actual time of password recovery for each hash value, will hardly exceed 30 seconds.

Important, that Hybrid Rainbow tables have probabilistic features and in this case attack may recover about 87% of passwords which are recovered by hybrid attack using the same dictionary.

Below you may compare Hybrid Rainbow and hybrid attack. There’s no sense to recover Hybrid Rainbow with classic rainbow-tables or brute-force attack, as these last methods can’t recover 16-symbol passwords at reasonable time (less than a year).

Code:
Method                             Preparation time   Recovery time   Success rate
Hybrid attack of our dictionary    0 seconds          5 hours             100%
Hybrid Rainbow precalculation      24 hours           30 seconds          87%

Table shows clear that use of Hybrid Rainbow for single password recovery has no sense. Hybrid attack is way more efficient for “recover and forget” scheme.

However all the other recovery techniques lose sense if there are appropriate hybrid tables available. Moreover, Hybrid Rainbow tables generation is purposeful if you often need to recover passwords from the same known range.

Example #2."Frequently used passwords".

Not only numeric passwords can be precalculated this way, but also composed of any symbols.

Here we’ve selected 14 000 of most frequently used combinations up to 4 symbols long based on initial collection including 150 thousands real passwords.

We cube this dictionary. 14000^3 = 2 744 000 000 000.

Tables will take about 16 GB of disk space, generation (for MD5) will take about 1 month or processor time. Here are generator parameters:

Code:
Table_0_0: chain length = 5000, performance factor = 10%, table number = 0
Table_0_1: chain length = 5000, performance factor = 10%, table number = 0
Table_1_0: chain length = 5000, performance factor = 10%, table number = 0
Table_1_1: chain length = 5000, performance factor = 10%, table number = 0
...
Table_9_0: chain length = 5000, performance factor = 10%, table number = 0
Table_9_1: chain length = 5000, performance factor = 10%, table number = 0

Cryptanalysis at 20 result tables (830 MB each) takes up to 10 minutes for one hash value (not considering time of tables loading).

However researches proved these tables are enough to recover about 85% passwords from any real forum (average values).

Maximal length of the password to recover using these tables is 12 symbols (three 4-symbol combinations), what is quite more than 8 symbols limit of classic Rainbow-tables.

The [SNS] Technologies, 28 april 2007

 
The translation of the original article was made InsidePro