Cryptography

From Telecomix Crypto Munitions Bureau

Cryptography is the art of concealing the meaning of messages. Related topic is steganography, the art of concealing the presence of hidden messages.

NOTICE: This article will be put on ice until the course is complete. It takes too much time from my studies to write everything down. Eventually, when the course is complete and i get free time, the article will be slit up in parts and more focus will be put into research, rather than basic education.

Maybe we also has to include LaTeX support for this wiki, to keep it away from those really horrible ascii graphs.

[edit] What is this?

These are random notes about cryptography. The article will be written iteratively, each iteration adding some new knowledge.

It is written by a programmer, for other programmers. No higher mathematics required. Often, I will just ignore stuff i dont understand and assume that the mathematicians did it right ~:) trolololol

[edit] Assumptions

The communication system used to carry messages to whoever we wish to communicate with is controlled by the adversary. ("We Do Not Trust The Intertubes, dragons and dictators rule the world lol.")

[edit] Terminology and general useful knowledge

[edit] Categories of ciphers

General categories of ciphers:

Public-key ciphers
- RSA, ElGamal, etc (The original public-key ciphers)
- Elliptic curve cryptography (We need to switch to these soon, RSA is becomming weaker..)
- Post-quantum public key algorithms (Built to resist the quantum computer)
Block ciphers
- A good block cipher should not have any known attack other than brute force.
Stream ciphers
- Stream ciphers are generally thought of as weak. Do not use. (With one exception: block ciphers in CTR-mode.)

[edit] Symmetric and asymmetric ciphers

Ciphers can also be divided into symmetric and asymmetric ciphers. Symmetric ciphers use one key for both encryption and decryption. Asymmetric ciphers have two keys: One public key and one private key. Symmetric ciphers are block ciphers and stream ciphers. Asymmetric ciphers are public-key ciphers.

[edit] Additional useful functions

Cryptographic systems (protocols) also makes use of

Hash functions (MD5, SHA, SHA-256..)
MACs (Message Authentication Codes. Can be implemented with hash functions.)

Hash functions and MACs turn bitstrings into fixed-size bitstrings. Good hash functions are very difficult to compute "backwards": If one has a hash function f(x) = y, the only method to compute x from y should be by brute force. (Since they map arbitrary length bitstrings into fixed-size bitstrings, there are multiple x'es that result in any y.)

[edit] Attacks against ciphers

There are a number of names for different types of attacks against cipher systems:

Brute force: Try all different combinations until a solution is found.
Meet-in-the-middle attack: See 2DES below for an example.
Linear and differential cryptanalysis: Old methods for cracking block ciphers. Uses complex math.
Man-in-the-middle attack: An adversary modifies messages being sent between peers, often to crack the key-exchange algorithm.
Side-channel attack: The adversary makes use of properties in the energy- or time-complexity of the crypto algorithms in order to deduce properties of the cipher key that is being used. This reduces the keyspace that needs to be searched, greatly reducing the security of a cipher.
- Like this: By probing how many milliseconds it takes for a website to decrypt and reply to RSA-encrypted messages, it is sometimes possible to make conclusions about the private key.
And a whole rainforest of other types of attacks.

[edit] Why are ciphers secure?

Ciphers make use of "mathematical riddles" that has no known methods for solving them quickly. There are something called complexity classes that are used to describe how much time/space is required to solve different types of logical problems. Ciphers are secure because they use problems for their riddles that are believed to lie in complexity classes that humanity currently does not know to solve efficiently.

Examples:

Block ciphers use combinatorial explosions (i think its called that?)
RSA uses the problem of integer factorization
ElGamal uses the problem of discrete logarithms

It is important to notice that many problems are not known to belong to some specific complexity class. This science is still in its cradle. For example, we do not know if P=NP.

Recently, humanity has begun to experiment with creating computers that uses quantum mechanics to extend the set of problems that can be efficiently solved. Quantum computers will, if they are possible to build, extend our abilities to solve logical problems.

[edit] Quantum paranoia

In case you are paranoid about NSA having built quantum computers for cracking your ciphers:

Groovers algorithm: Reduces brute force attacks from O(2^keysize) to O((2^keysize)^(1/2))-time. Brute force cracking a cipher using a quantum computer only requires the square root of the number of computations needed for a classical computer to do the same thing. Examples: AES256 becomes as secure as AES128, SHA-512 becomes as secure as SHA-256.
Shor's algorithm: Reduces prime-factorization to O((log n)^3)-time. This means that classical public key-ciphers can be cracked very quickly by quantum computers. Discrete logarithms can also be cracked by Shor's algo. Examples: RSA and ElGamal are not secure after the invention of the quantum computer.

Groovers and Shor's algorithms can only be implemented in quantum computers.

Notice that there is a class of ciphers known as the post-quantum ciphers. They are based upon problems that is thought to lie outside the BQP complexity class. An example of a post-quantum cipher is the McEliece.

[edit] Block ciphers

Characteristics: One key for both encryption and decryption. Quick ciphers. Encrypts a fixed-size block of bits.
Design goals: Completely random, known-plaintext attacks should not be possible.
How they works: Confusion and diffusion.
Examples: DES, AES, blowfish, twofish..
Modes of encryption: ECB, CBC, CTR..

More to write about:

Side-channel attacks (measuring energy/time consumption for cryptanalysis, and countermeasures)
How to do cryptanalysis for really simple block ciphers
2DES and why it fails
3DES and why it works

[edit] Components of block ciphers

Block ciphers do something like this:

select an X                      # X is something like 8, 10 or 12.
for i=0 to X do:                 # every loop is called a "round"
   confusion
   diffusion
optionally some more operations at the end to make it pretty
done

There are some different paradigms for doing the above:

Feistel network
SP networks
others

[edit] Confusion

Confusion is another word for substitution: To replace symbols with other symbols. (Example: S-block in AES.)

[edit] Diffusion

Spread out the bits over the entire block: To move symbols around. CAFNBE may become ABNCEF.

The reason for this: The cipher should be completely random. Flipping a single bit in the cleartext should result in about half the bits in the ciphertext is flipped. Diffusion makes so that a single symbol in the cleartext is "spread out" over the entire ciphertext block.

Example: The Feistel network's f()-function below does this.

[edit] Key scheduler

Often, every new round in a block cipher uses its own cipher key (for confusion and diffusion steps, so that their actions are unpredictable every round). The cipherkey is created from the original key and should be difficult to guess.

If you know how it works, plox insert your description here.

Maybe one could use a hash function for generating subkeys?

[edit] Feistel network

A feistel network describes a single round in a feistel block cipher. It looks like this.

   L(i-1)          R(i-1)
     |               |
    xor <-- f() -----+
     |        \      |
     |         \-------- key            # key should ideally be generated by a key scheduler
     |               |
    R(i)            L(i)

Description: If the message is 128 bits long, L() and R() are the left 64 and the right 64 bits of that message. Each round through the network will swap the left and right bit of the message, and also xor the right part of the message with the result of f(R(i-1),key). f() is some nice and preferably unpredictable function.

Feistel networks does both confusion and diffusion of a message. Generally: The more rounds, the better.

Or in other words:

L(i) = R(i-1)
R(i) = L(i-1) xor f(R(i-1),key)

which also means that

L(i-1) = R(i) xor f(R(i-1),key)
R(i-1) = L(i)

This definition can be used to recursively describe a N-round feistel cipher as a simple, but rather huge, equation. Creating a huge function like this can be a first step when performing cryptanalysis of a block cipher.

[edit] Example of ultra-simple feistel cipher

This cipher is severely flawed. Only for educational purpose. It has no key scheduler, has a really stupid feistel function, has blocksize: 8-bits, can be attacked by just simplifying the function.

We select the feistel function f(x) = ( x xor 45 + key ) mod 256. It has mod 256 because its block size is 8 bits (2^8 = 256). The fiestel cipher has 3 rounds. We start at round 3 and recursively resolve the function backwards (plox see the definition of the feistel cipher above):

Base cases:
- L(0) are the least significant 4 bits of the cleartext.
- R(0) are the most significant 4 bits of the cleartext.
- "||" means concatenation ( a || b = the bitstring a followed by the bitstring b )
We begin at round 3: L(3) || R(3)
R(2) || L(2) xor f(R(2), key)
L(1) xor f(R(1), key) || R(1) xor f(L(1) xor f(R(1),key), key)
End at round 0. Here is our cipher: R(0) xor f(L(0) xor f(L(0) xor f(R(0),key),key), key) || L(0) xor f(R(0),key) xor f(R(0) xor f(L(0) xor f(R(0),key),key), key)
We now have the function. Lets translate it to the actual definition:
- Our feistel function: f(x) = ( x xor 45 + key ) mod 256
- We wish to remove all f(x) in the description of the zeroth round. This will give us the full function :) It is a quite huge function already so lets break it up into pieces...
R(0) xor A || L(0) xor B xor C ...and...
1. A = f(L(0) xor D, key)
2. B = f(R(0), key))
3. C = f(R(0) xor E, key)
4. D = f(L(0) xor F, key)
5. E = f(L(0) xor G, key)
6. F = f(R(0), key)
7. G = f(R(0), key)
We remember that f(x) = ( x xor 45 + key ) mod 256 and re-assembles the function again:
1. G = ( R(0) xor 45 + key) mod 256
2. F = ( (R(0) xor 45 + key) mod 256
3. E = ( ( L(0) xor G ) xor 45 + key ) mod 256 = ( ( L(0) xor ( R(0) xor 45 + key) mod 256 ) xor 45 + key ) mod 256
4. D = ( ( L(0) xor F ) xor 45 + key ) mod 256 = ( ( L(0) xor f(R(0), key) ) xor 45 + key ) mod 256
5. C = ( ( R(0) xor E ) xor 45 + key ) mod 256 = ( ( R(0) xor ( ( L(0) xor G ) xor 45 + key ) mod 256 = ( ( L(0) xor ( R(0) xor 45 + key) mod 256 ) xor 45 + key ) mod 256 ) xor 45 + key ) mod 256
6. B = ( R(0) xor 45 + key) mod 256 )
7. A = ( ( L(0) xor D ) xor 45 + key ) mod 256 = ( ( L(0) xor ( ( L(0) xor F ) xor 45 + key ) mod 256 = ( ( L(0) xor f(R(0), key) ) xor 45 + key ) mod 256 ) xor 45 + key ) mod 256
We put this together and arrive at the final result. The ciphertext transform is: ciphertext = R(0) xor ( ( L(0) xor f(R(0), key) ) xor 45 + key ) mod 256 ) xor 45 + key ) mod 256 || L(0) xor ( R(0) xor 45 + key) mod 256 ) xor ( ( L(0) xor ( R(0) xor 45 + key) mod 256 ) xor 45 + key ) mod 256 ) xor 45 + key ) mod 256
One can remove all the "mod 256" if one only uses 8-bit operations. (Define a + b as a + b mod 256.)
if you has a lot of time and knows some boolean logics you can start simplifying the description of the cipher now. However, since the numbers of keys are just 256, its easily brute forced.

[edit] Encryption modes

When encrypting more than one block, things will fuck up if we just simply encrypt each block individually. Wikipedia has a nice article. Read it.

Common cipher modes:

ECB -- Electronic Code Book-mode. It encrypts each block individually. Is 99.9% fail. Do not use.
CBC -- Cipher Block Chain-mode. Mostly used, for example in many SSL-suites.
CTR -- Counter mode. Generates a high-quality pseudorandom bitstring that the plaintext is xored with. This means that the computation-intensive operations may be parallelized and buffered for later usage. Works a bit like an imperfect OTP and makes the block cipher behave like a stream cipher. Suitable for distributed concurrent multiprocessor communications systems.

[edit] AES

...write moar?

[edit] AES in x86

Modern x86 microprocessors supports hardware acceleration of AES. Below are assembler mnemonics. The programmer needs to implement the remaining parts of the cipher that is not done automagically by the processor. This means that the hardware implementation is independent of the type of AES: It can be made to work for 128-bit (10 rounds), 196-bit (12 rounds) and 256-bit (14 rounds).

Encryption acceleration: AESENC, AESENCLAST
Decryption acceleration: AESDEC, AESDECLAST
InvMixColumn Transformation: AESIMC
Key scheduler acceleration: AESKEYGENASSIST

[edit] DES

Wikipedia
56-bit key (64-bit if one includes parity bits)
Encrypts/decrypts in 64-bit sized blocks
DES is not secure. 2^56 different keys is possible to brute force (using a computing cluster or ASIC).

It is possible to extend DES to make it more secure. See 2DES and 3DES below.

[edit] 2DES

2DES is the idea of encrypting with DES twice, using different keys. One might think that this would increase the number of possible keys with a factor 2^56, but this is not true. 2DES is described here only for educational purpose. It is not, and should not, be used in the real world.

What is 2DES, and why is it insecure?

2DES encrypts messages using E_K1(E_K2(Message)). E_x is encryption using key x.
What if there is a third key K3 such that E_K3(Message) = E_K1(E_K2(Message))?
- It can be shown that this is often not the case. (no proof will be made of that here.)
There is a meet-in-the-middle attack against 2DES. This is why 3DES is used instead.
1. Compute 2^56 plaintext-ciphertext pairs (one pair for each possible key) and store them in a hash table. (2^56 is too large to store anywhere, but lets just assume.)
2. Decrypt the ciphertext message you wish to crack with all possible pairs. If there is a match, then mark the key as a candidate.
3. For each candidate, check if there is another ciphertext-cleartext pair that also match. (Let the hash table be indexed by ciphertext this time)
4. If it match, then the key is found.
5. How many computations are needed to perform the meet-in-the-middle attack? Number of encryptions, multiplied by number of decryptions, multiplied with the probability that a key is a candidate = (2^56)(2^56)(2^(-P)) = 2^57. Meaning: 2DES is just twice as difficult to crack than ordinary DES.

NOTICE: This section is FLAWED. Do not rely on it until it has been corrected.

[edit] 3DES

3DES avoids the meet-in-the-middle attack, by encrypting messages three times. This just defeats the attack, and only increases the keysize of DES with 2^56, to a total sum of 2^112. (Just like one would think 2DES would do..)

3DES could be described as: E_K1,K2,K3(Message) = E_K1(E_K2(E_K3(Message))).

Flaw: This makes the key 56*3 = 168 bits long. (Despite the fact that the cipher just has a keysize of 2^112!!)

3DES is instead described as: E_K1,K2(Message) = E_K1(D_K2(E_K3(Message))). (D_x means decryption using key x)

Benefit: Uses two keys with a total size of 112 bits, just as large as the keyspace is :)

[edit] Message Authentication Codes (MACs)

Encryption only provides confidentiality. Encryption does not provide data integrity. By combining a cipher with a MAC, one can remove the possibility that the adversary intercepts messages and replaces them with their own.

In other words

If only a MAC is used, the adversary will see what information is being transmitted, and not be able to modify the messages.
If only ciphers are used, the adversary will not be able to see what information is being transmitted, but will be able to garble the messages (and sometimes, in some special cases, also be able to crack the security.)
If a MAC is used together with a cipher, the adversary will not be able to see the cleartext message and also not be able to modify it without it is detected.

One can think of MACs as being like checksums. They can be implemented with hashes or public-key signatures. (And possibly with even more stuff..)

An encrypted message protected with a MAC could look like this: E_x(Message, MAC(K_AB, Message)). The keys x and AB would have to be shared between the peers first. Also, x and AB may be the same.

Properties of MAC's

Fixed size output. (Transforms any number of bits into a constant number of bits, which means that multiple messages might have the same MAC code.)
Not vulnerable to "known plaintext"-attacks. (In this case, consider the "plaintext" as any number. Slight abuse of terminology: the "plaintext" might be a encrypted message.)

[edit] Block cipher in CBC-mode as a MAC? (CBC-MAC)

Might not be safe. Fill in with more details here.

[edit] Hash functions as MACs

It is possible, and probably the most used technique. This topic will be covered in the future.

[edit] Key exchange

Key exchange is very difficult to do securely if one only uses block ciphers (symmetric ciphers). How to solve this?

Diffie–Hellman key exchange
- Can be attacked by a man in the middle
Public key-ciphers

[edit] Authentication

How does one actually for sure know that a certain key belongs to a certain individual?

Certificate Authorities
- Might need to comply with nations laws and sign keys so that militaries can perform attacks (This might be how Stuxnet got its certificates.)
Web of Trust. Individual persons sign each others keys. If someone you trust has signed someones key, it means that the person you trust has vouched that a certain key actually belongs to this someone. The more people that has signed the key, the more secure it is.

[edit] Public key cryptography

A public key cipher is a cipher where the key is split up in two parts: One public part that is made available for anyone and one private part that is kept secret.

Public key cryptography is generally several thousands times slower than block ciphers. Because of this, public key ciphers are mostly just used to exchange a key for a block cipher during the handshake. Switching to using a block cipher after the initial handshake makes secure communication less computationally expensive :)

Another word for public key ciphers is asymmetric ciphers.

[edit] Encryption and decryption

Encryption using the public key results in ciphertext that only the person that has access to the private key can decrypt.

(Proposal: Mediate upon what this means.)

[edit] Signatures

Most public ciphers has the following property: If one reverse the roles of the public and private keys, one ends up with a scheme for making signatures. By including a messages signature together with the message, it is possible for anyone to verify that it was the person holding the private key that sent the message. This is often used for identification/authentication.

Just like a real signature, but much-much-much-much-more difficult to forge.

[edit] ElGamal

ElGamal is a public cipher much like RSA. Encryption and decryption is done by calculating pretty much only exponents and divisions in an abelian group. Below are some number theory needed for understanding this.

[edit] The greatest common divisor

The greatest common divisor is the largest number that two numbers a and b can both be divided with.

Recursive definition:

gcd(a,0) = a
gcd(a,b) = gcd(b, a mod b)

in C:

int gcd(int a, int b) {
   int tmp;
   while( b > 0 ) {
      tmp = a % b;
      a = b;
      b = tmp;
   }
   return a;
}

Example: gcd(91,35) = 7.

[edit] The extended Euclidean algorithm

The gcdE(a,b) is used in some ciphers and closely resembles the gcd(a,b). gcdE computes in linear time with respect to the number of bits needed to represent the numbers. It also returns a little bit more information than gcd(a,b).

gcdE(a,b) = (d, s, t) ∈ Z³

d = greatest common divisor of a and b
as + bt = d. (s and t are not unique)
Example: gcdE(35, 56) = (7, 5, -3) since 7|35 and 7|56 and 35*5 + 56(-3) = 7.

recursive definition:

Base case (b = 0): the result is (a, 1, 0).
Recursion case:
- q = a div b (q is the whole-integer result of a/b)
- r = a mod b (r is the reminder from a/b)
- PROJECT ABORTED, see notice at the very top of the article.

[edit] Abelian groups

[edit] Generators

[edit] Discrete logarithms

[edit] Encryption/decryption using ElGamal

[edit] Example of ElGamal by hand

[edit] Replay attacks

If the adversary captures packets being sent between Alice and Bob, the adversary will be able to send the exact same messages at a later time. This can fool them that they are having a real conversation about the same topic as they had last time. (Think: Alice logs into the Bob computer system and performing some task. Then Eve sends the exact same packets that Alice sent to Bob, fooling Bob that she is Alice that wish to do the exact same thing again. Imagen Bob being a bank, and that Alice made a transaction to Eve. Or something.)

How to defeat replay attacks: Include sequence numbers in all messages. Make sure its not easy to guess them (like, having a counter being intitialized to a large random number (64 bits or so) at the beginning of each session. Then add one to the counter, for each message being sent.)

Obviously, the sequence number should be protected by a MAC (and preferably encrypted) otherwise the adversary will be able to modify it.

[edit] Padding

Padding is done because the message that is being encrypted not always match the block-size of the cipher, or because one does not wish to reveal the size of the message that was encrypted.

Padding with zeroes: Not OK. It is difficult to know when the message ends, if it ends with zeroes. It could also make it easier to crack the cipher.
One has to use something else than just zeroes. (OAEP for public-key ciphers)