6. Cryptanalyse de XOR avec clef redondante

6. Cryptanalyse de XOR avec clef redondante

Break repeating-key XOR

It is officially on, now.
This challenge isn't conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
Here's how:

  1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
  2. Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
    this is a test
    and
    wokka wokka!!!
    is 37. Make sure your code agrees before you proceed.
  3. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
  4. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
  5. Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
  6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
  7. Solve each block as if it was single-character XOR. You already have code to do this.
  8. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

No, that's not a mistake. We get more tech support questions for this challenge than any of the other ones. We promise, there aren't any blatant errors in this text. In particular: the "wokka wokka!!!" edit distance really is 37.

Miroir du fichier texte

Comme expliqué dans l'énoncé, la difficulté de cet exercice est supérieure aux précédents. De manière à le résoudre de la manière la plus claire possible, je vais séparer la résolution en 3 parties : dans un premier temps la mise en place d'une fonction permettant de calculer la distance de Hamming, dans un second temps la résolution de la taille de la clef et dans un troisième temps la recherche de la clef.

Distance de Hamming

L'article wikipédia sur le sujet ayant une définition mathématiques relativement absconse, concentrons nous sur l'explication pratique. La distance de Hamming entre 2 octets (un octet étant une série de 8 bits) est le nombre de bits qui diffèrent entre les deux. Par exemple :

Si l'on a a = 0001111 et b = 1101011 alors d(a,b) = 11001000 = 3

Lorsque l'on fait la somme des distance de Hamming de chaque lettre de deux mots, on parle alors de Borne de Hamming:

d(t,w) = d(1110100, 1110111) = 2
d(h,o) = d(1101000, 1101111) = 3
d(i,k) = d(1101001, 1101011) = 1
d(s,k) = d(1110011, 1101011) = 2
d( ,a) = d(0100000, 1100001) = 2
d(i, ) = d(1101001, 0100000) = 3
d(s,w) = d(1110011, 1110111) = 1
d( ,o) = d(0100000, 1101111) = 5
d(a,k) = d(1100001, 1101011) = 2
d( ,k) = d(0100000, 1101011) = 4
d(t,a) = d(1110100, 1100001) = 3
d(e,!) = d(1100101, 0100001) = 2
d(s,!) = d(1110011, 0100001) = 3
d(t,!) = d(1110100, 0100001) = 4
________________________________
                              37

En terme de code, on peut écrire les fonctions suivante:

def hamming_distance(first_bytes, second_bytes):
    length = max(len(first_bytes), len(second_bytes))
    first_bytes = f"{(length-len(first_bytes))*'0'}{first_bytes}"
    second_bytes = f"{(length-len(second_bytes))*'0'}{second_bytes}"
    distance = 0
    for i in range(length):
        if first_bytes[i] != second_bytes[i]: distance += 1
    return distance


def hamming_bound(first_string, second_string):
    distance = 0
    for i in range(len(first_string)):
        distance  += hamming_distance(format(ord(first_string[i]),"b"),format(ord(second_string[i]),"b"))
    return distance

string_a = "this is a test"
string_b = "wokka wokka!!!"

print(hamming_bound(string_a,string_b))

Si l'on exécute ce code, on obtient le résultat attendu: 37.

Résolution de l'exercice

Les consignes de l'énoncé ne sont pas particulièrement claires, mais le but est de:

A titre d'exemple, si on avait la chaine de caractère ABCDEFGHIJKLMNOPQRSTUVWXYZ, on calculerait les valeurs suivantes:

Pour KEYSIZE = 2
d(AB,CD) / 2
d(CD,EF) / 2
d(EF,GH) / 2
d(GH,IJ) / 2
...
Puis pour KEYSIZE = 3
d(ABC,CDE) / 3
d(CDE,FGH) / 3
d(FGH,IJK) / 3
d(IJK,LMN) / 3
...

Pour chaque taille de clef, on normalisera la somme des distances de Hamming. Un exemple de code pour effectuer l'ensemble des opérations décritent ci-dessus est:

import base64


def hamming_distance(first_bytes, second_bytes):
    length = max(len(first_bytes), len(second_bytes))
    first_bytes = f"{(length-len(first_bytes))*'0'}{first_bytes}"
    second_bytes = f"{(length-len(second_bytes))*'0'}{second_bytes}"
    distance = 0
    for i in range(length):
        if first_bytes[i] != second_bytes[i]: distance += 1
    return distance

def hamming_bound_bytes(first_bytes_string, second_bytes_string):
    distance = 0
    for i in range(len(first_bytes_string)):
        distance += hamming_distance(bin(first_bytes_string[i])[2:], bin(second_bytes_string[i])[2:])
    return distance

def xor_guess_key_size(string, key_min_size, key_max_size):
    hamming_normalized = {}
    for key in range(key_min_size, key_max_size+1):
        key_size_hamming = 0
        for i in range(0, (len(string) // key) - 1):
            key_size_hamming += (hamming_bound_bytes(string[i*key:(i+1)*key], string[(i+1)*key:(i+2)*key]) / key)
            hamming_normalized[key] = (key_size_hamming / ((len(string) // key) - 1))
    return min(hamming_normalized, key=hamming_normalized.get)

file = open('6.txt', 'r')
lines = "".join(file.read().splitlines())
raw_string = base64.b64decode(lines)

keysize = xor_guess_key_size(raw_string, 2, 40)
print("KEYSIZE length is", keysize)

Exécuter ce code donne la longueur de la clef XOR :

KEYSIZE length is 29

Remarque: D'un point de vue algorithmique la complexité de ce code avoisine O(n^4). En effet ; on calcul l'ensemble des distances de hamming pour chacun des bytes composant chaque sous-chaines et ce pour toutes les tailles de clef. C'est donc un code très peu performant qui exécute un total de 679857 boucles for

Recherche de la clef

Dans la section précédente on a trouvé que la clef utilisé pour chiffrer le fichier texte mesurait 29 caractères. Pour trouver le contenu de la clef, on va :

De manière plus clair, si on avait la clef XOR ICE, alors tous les 3 caractères du texte chiffré l'aurait été avec le même caractère de la clef. En exemple, si l'on chiffre la chaine I see a red door, alors tous les caractères en gras seront chiffrés avec la lettre I, puis tous les caractères suivants avec la lettre C, etc.

En terme de code on peut écrire:

keysize = 29
chunks = len(raw_string) // keysize

key = []
for key_char in range(keysize):
    same_key_char = b""
    for char in range(chunks):
        position = key_char+(keysize*char)
        same_key_char += (raw_string[position:position+1])
    result = xor_single_char_breaker(same_key_char.hex())[0]
    key.append(result["char"])

print("The key is:", "".join(key))

Ce qui retourne le résultat:

The key is: Terminator X: Bring the noise

Il ne nous reste plus qu'à déchiffrer le texte contenu dans le fichier avec cette clef:

print("Decoded text:", xor_string(raw_string.decode(), key))

Ce qui nous donne :

I'm back and I'm ringin' the bell
A rockin' on the mike while the fly girls yell
In ecstasy in the back of me
Well that's my DJ Deshay cuttin' all them Z's
Hittin' hard and the girlies goin' crazy
Vanilla's on the mike, man I'm not lazy.

I'm lettin' my drug kick in
It controls my mouth and I begin
To just let it flow, let my concepts go
My posse's to the side yellin', Go Vanilla Go!

Smooth 'cause that's the way I will be
And if you don't give a damn, then
Why you starin' at me
So get off 'cause I control the stage
There's no dissin' allowed
I'm in my own phase
The girlies sa y they love me and that is ok
And I can dance better than any kid n' play

Stage 2 -- Yea the one ya' wanna listen to
It's off my head so let the beat play through
So I can funk it up and make it sound good
1-2-3 Yo -- Knock on some wood
For good luck, I like my rhymes atrocious
Supercalafragilisticexpialidocious
I'm an effect and that you can bet
I can take a fly girl and make her wet.

I'm like Samson -- Samson to Delilah
There's no denyin', You can try to hang
But you'll keep tryin' to get my style
Over and over, practice makes perfect
But not if you're a loafer.

You'll get nowhere, no place, no time, no girls
Soon -- Oh my God, homebody, you probably eat
Spaghetti with a spoon! Come on and say it!

VIP. Vanilla Ice yep, yep, I'm comin' hard like a rhino
Intoxicating so you stagger like a wino
So punks stop trying and girl stop cryin'
Vanilla Ice is sellin' and you people are buyin'
'Cause why the freaks are jockin' like Crazy Glue
Movin' and groovin' trying to sing along
All through the ghetto groovin' this here song
Now you're amazed by the VIP posse.

Steppin' so hard like a German Nazi
Startled by the bases hittin' ground
There's no trippin' on mine, I'm just gettin' down
Sparkamatic, I'm hangin' tight like a fanatic
You trapped me once and I thought that
You might have it
So step down and lend me your ear
'89 in my time! You, '90 is my year.

You're weakenin' fast, YO! and I can tell it
Your body's gettin' hot, so, so I can smell it
So don't be mad and don't be sad
'Cause the lyrics belong to ICE, You can call me Dad
You're pitchin' a fit, so step back and endure
Let the witch doctor, Ice, do the dance to cure
So come up close and don't be square
You wanna battle me -- Anytime, anywhere

You thought that I was weak, Boy, you're dead wrong
So come on, everybody and sing this song

Say -- Play that funky music Say, go white boy, go white boy go
play that funky music Go white boy, go white boy, go
Lay down and boogie and play that funky music till you die.

Play that funky music Come on, Come on, let me hear
Play that funky music white boy you say it, say it
Play that funky music A little louder now
Play that funky music, white boy Come on, Come on, Come on
Play that funky music