Wednesday, December 30, 2015

Cracking speed improvements

Almost 8 years ago, we got a pretty big improvement with SSE2 code to crack WPA, a nice upgrade from MMX.

I recently posted a bug bounty to fix the compilation of Aircrack-ng on Cygwin 64 bit. It has been working fine on Linux 64 bit but, for some reason, compilation failed on Cygwin 64 bit.
We couldn't have tested it back then since Cygwin 64 bit didn't exist at the time.

darkfires took up the challenge of fixing the compilation on Cygwin 64 bit. After that, he helped fix a bunch of memory leaks and other issues and improved cracking speed quite a bit, which is the reason for this post.

The task was pretty daunting and a lot of testing was needed to make sure it works on the different CPU architectures (x86 32 and 64 bit, various ARM) and different OSes (Cygwin, Linux, BSD, Solaris, OSX).
On top of the usual 'fixing something on one, breaking it on another', here are three examples of how complicated it was:

  • Different CPUs support different features and instruction sets, and detecting them wasn't an easy task. For example, on the Raspberry Pi (v1), gcc supports 'neon' and we can compile aircrack-ng with NEON instructions, but the CPU itself doesn't support them, which means aircrack-ng crashes and NEON has to be disabled. On the Beaglebone, the CPU supports NEON instructions.
  • gcc can compile with AVX2 instructions on x86. However, if the CPU doesn't support them, aircrack-ng will crash with a nice error: 'Illegal instruction'.
  • Some code that works to get CPU features (such as MMX, SSE, AVX) works on some CPUs and doesn't on others.
There is no way to explain in detail how complicated it was to make it work on all those different combinations of CPUs and OSes. darkfires has spent countless hours making all of this work.
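
As an illustration of the AVX2 case above, a program can ask the CPU at run time whether an instruction set is available before taking that code path, instead of crashing with 'Illegal instruction'. The sketch below is not the detection code from the patch. It assumes GCC or Clang on x86 and relies on the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the helper names `cpu_has_avx2` and `pick_code_path` are made up for this example. On any other compiler or architecture it simply reports that AVX2 is unavailable.

```c
/* Hypothetical helper, not taken from the aircrack-ng sources.
 * Returns 1 if the CPU we are running on supports AVX2, 0 otherwise. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* populate the CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0; /* nonzero means supported */
#else
    return 0; /* assumption: no AVX2 on non-x86 or with unknown compilers */
#endif
}

/* Hypothetical dispatcher: name the code path to use for a given result. */
static const char *pick_code_path(int has_avx2)
{
    return has_avx2 ? "avx2" : "generic";
}
```

Calling `pick_code_path(cpu_has_avx2())` once at startup and branching on the result keeps the AVX2 code from ever running on a CPU that lacks it. The AVX2 code itself still has to live in a separately compiled function or file, otherwise the compiler may emit AVX2 instructions in the generic path too.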

To give you an idea of how much work has been done, the patch was ~375Kb and ~11K lines long.

On top of it, the Aircrack-ng CPU detection code has been rewritten on x86 to give more details. Here is what 'aircrack-ng -u' now looks like:

Vendor          = Intel
Model           = Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
Features        = MMX,SSE,SSE2,SSE3,SSSE3,SSE4.1,SSE4.2,AVX
Hyper-Threading = Yes
Logical CPUs    = 8
CPU cores       = 4
SIMD size       = 4 (128 bit)
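
The 'SIMD size' line reads as the number of 32-bit values handled per instruction: 128 bits / 32 = 4. That interpretation is inferred from the output above rather than stated anywhere, and the mapping from features to register width in the sketch below is an assumption too, with made-up function names. It does reflect one known hardware fact: the first version of AVX only offers 256-bit floating-point operations, and 256-bit integer operations arrived with AVX2, which would fit an AVX-only CPU reporting 128 bit.

```c
/* Hypothetical mapping from detected features to the integer SIMD
 * register width in bits. Not taken from the aircrack-ng sources. */
static int simd_width_bits(int has_sse2, int has_avx2)
{
    if (has_avx2) return 256; /* 256-bit integer operations need AVX2 */
    if (has_sse2) return 128; /* SSE2 provides 128-bit integer operations */
    return 32;                /* scalar fallback: one 32-bit word at a time */
}

/* Number of 32-bit lanes that fit in a register of the given width. */
static int simd_lanes(int width_bits)
{
    return width_bits / 32;
}
```

With these assumptions, `simd_lanes(simd_width_bits(1, 0))` gives 4, matching the output above, and an AVX2-capable CPU would give 8.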

Last but not least, here are the numbers.

CPU                1.2rc3      r2800       Increase
Celeron M 1.4GHz   138k/s      152k/s      +10%
i7-2630QM          ~3000k/s    ~4000k/s    +33%
E3-1231 v3         ~4900k/s    ~13100k/s   +167%
i5-4590            ~4700k/s    ~11600k/s   +146%
i7-6700K           ~6200k/s    ~17100k/s   +175%

It's still pretty far from GPU cracking speeds but these are pretty significant gains thanks to AVX. The second version of AVX (AVX2) provides the most gains, as you can see in the numbers above.

Bonus thing: if you are a package maintainer, you can compile aircrack-ng with different improvements. Simply edit common.cfg and set MULTIBIN=true; running make will then compile 3 different versions: the original, SSE and SIMD.

We have tested it quite a bit on different CPUs and OSes, but please test it a lot (simply get the latest revision from our subversion repository) and report back to us. Let us know how it works for you and what kind of improvements you're getting; we especially want to hear about any bugs. If you have a recent AMD CPU, we want to hear from you.

The plan is to make another release candidate in about 2 weeks.