Professor: Alfred Menezes | Term: Fall 2024

Chapter 1: Introduction to Cryptography

Cryptography is about securing communications in the presence of malicious adversaries.

Alice and Bob are communicating via an unsecured channel. Eve is eavesdropping (malicious). Communication should be

  • Confidential: Keep data secret from everyone not authorized to see it.
  • Data Integrity: Ensure data is not altered by others.
  • Data Origin Authentication: Corroborating the source of data.
  • Non-Repudiation: Preventing an entity from denying previous commitments or actions.

Transport Layer Security (TLS): The cryptographic protocol used by web browsers to securely communicate with websites (e.g. Facebook, Gmail).

TLS is used to assure an individual user of the authenticity of the website and establish a secure communications channel.

Symmetric-Key Cryptography: The client and server a priori share some secret information called a key.

They can encrypt their messages with AES and authenticate the resulting ciphertexts with HMAC.

Question

How do Alice and Bob share the shared secret ?

Public-key cryptography: The client and server a priori share some authenticated (but non-secret) information.

To establish a secret key, Alice selects the secret session key , and encrypts it with Bob’s RSA public key. Then only Bob can decrypt the resulting ciphertext with its RSA private key to recover .

Question

How does Alice obtain an authentic copy of Bob’s RSA public key?

Bob cannot send it to Alice over the internet because it can be intercepted.

Signature scheme: Bob’s RSA public key is signed by a Certification Authority (CA) using its secret signing key with the RSA signature scheme.

Alice can verify the signature using the CA’s RSA public verification key. In this way, Alice obtains an authentic copy of Bob’s RSA public key.

The CA’s RSA public key is embedded in Alice’s browser.

TLS potential vulnerabilities:

  • The cryptography is weak
  • Cryptography can be broken using quantum computers
  • Weak random number generation
  • Issuance of fraudulent certificates
  • Software bugs
  • Phishing attacks
  • TLS only protects data during transit. It does not protect data stored at the server

Definition

Cybersecurity is comprised of the concepts, technical measures, and administrative measures used to protect networks, computers, programs and data from deliberate or inadvertent unauthorized access, disclosure, manipulation, or use.

Cryptography Cybersecurity. Cryptography provides some mathematical tools that can assist with the provision of cybersecurity services. It is a small part of a complete security solution.

Security is a chain; weak links become targets.

Chapter 2: Symmetric-Key Encryption

Fundamental Concepts

Definition

Asymmetric-key encryption scheme (SKES) consists of:

  • - the plaintext space,
  • - the ciphertext space,
  • - they key space,
  • a family of encryption functions ,
  • a family of decryption functions ,

such that for all .

  1. Alice and Bob agree on a secret communicating over the secured channel
  2. Alice computes and sends the ciphertext to Bob over the unsecured channel.
  3. Bob retrieves the plaintext by computing

Note: The secret key might be used for a fixed time interval, or it might be used a fixed number of times.

Called symmetric-key encryption since the same key is used for encryption and decryption.

The Enigma and Lorenz Machines are examples.

Substitution Cipher

  • all English messages
  • all encrypted messages
  • all permutations of the English alphabet.
  • Apply permutation to , one letter at a time
  • Apply the inverse permutation to , one letter at a time

Security Model: Defined the computational abilities of the adversary, and how she interacts with the communicating parties.

Convention: We will always strive to model maximal adversary capabilities and minimal adversary goals.

Basic assumption: The adversary knows everything about the SKES, except the particular key chosen by Alice and Bob. Avoid security by obscurity.

Computational power of the adversary:

  • Information-theoretic security: Eve has infinite computational resources
  • Complexity-theoretic security: Eve is a “polynomial-time Turing machine”
  • Computational security: Eve has a bounded amount of computational power. We say Eve is computationally bounded.

Passive Attacks:

  • Ciphertext-only attack: The adversary knows some ciphertext
  • Known-plaintext attack: The adversary also knows some plaintext and the corresponding ciphertext. Active Attacks:
  • Chosen-plaintext attack: The adversary can also choose some plaintext and obtains the corresponding ciphertext
  • Clandestine attacks: bribery, blackmail (not covered in this course)

Adversary’s goal

  1. Recover the secret key
  2. Systematically recover plaintext from ciphertext, without necessarily learning .
  3. Learn some partial information about the plaintext from the ciphertext, other than its length

If the adversary can achieve 1 or 2, the SKES is said to be totally insecure.

If the adversary cannot learn any partial information about the plaintext from the ciphertext, the SKES is said to be partially secure.

Definition

A symmetric-key encryption scheme is said to be secure if it is semantically secure against chosen-plaintext attack by a computationally bounded adversary.

To break a symmetric-key encryption scheme, the adversary has to accomplish the following:

  1. The adversary is given a challenge ciphertext
  2. During its computation, the adversary can select arbitrary plaintext and obtain the corresponding ciphertexts
  3. After a feasible amount of computation, the adversary obtains some information about the plaintext corresponding to .

Desirable properties of a SKES

  1. Efficient algorithms should be known for computing and .
  2. The secret key should be small, but large enough to render exhaustive key search infeasible.
  3. The scheme should be secure.
  4. The scheme should be secure even against the designer of the system.

The simple substitution cipher is totally insecure against a chosen-plaintext attack.

Is exhaustive key search possible?

Number of keys to try is . This will take too long, even with many computers.

In this course:

  • operations is considered very easy.
  • operations is considered easy.
  • operations is considered feasible.
  • operations is considered barely feasible.
  • operations is considered infeasible.

The Bitcoin network is performing about hash operations per hour.

Definition

A cryptographic scheme is said to have a security level of bits if the fastest known attack on the scheme takes approximately operations. A security level of bits is desirable in practice.

Simple frequency analysis of ciphertext letters can be used to recover the secret key.

Vigenere Cipher

The secret key is an English word having no repeated letters, e.g. CRYPTO. We sum each letter in by to get . Here , and addition of letters is module . Decryption is subtraction module .

The Vigenere cipher is totally insecure against a chosen-plaintext attack.

It is also insecure against a cipher-text only attack.

Stream Ciphers

A stream cipher is a SKES that encrypts the plaintext one bit at a time. A block cipher is a SKES that encrypts the plaintext one block at a time.

One-time Pad

The secret key is a random string of letters with the same length as the message. We use the same process as the Vigenere.

The key should not be reused. If and , then . So, depends only on the plaintext (and not on the key ), and hence can leak information.

Convention: From now on, messages and keys will be assumed to be bit (binary) strings.

Notation: is bitwise exclusive-or (XOR) (bitwise addition mod ).

Example

Note that

So, for the one-time pad, encryption is and decryption is .

Perfect secrecy: The one-time pad is semantically secure against ciphertext-only attack by an adversary with infinite computational resources.

This can be formally proven using concepts from information theory. Shannon proved that if plaintexts are -bit strings, then any symmetric-key encryption with perfect secrecy must have . So perfect secrecy is useless in practice.

Basic Idea: Instead of using a random key in the one-time pad, use a “pseudorandom” key.

Definition

A pseudorandom bit generator (PRBG) is a deterministic algorithm that takes as input a (random) seed, and outputs a longer “pseudorandom” sequence called the keystream.

Stream cipher: uses a PRGB for encryption. The seed is the secret key shared by Alice and Bob.

No more perfect secrecy - security depends on the quality of the PRBG.

For a stream cipher to be secure:

  • Indistinguishability requirement: The keystream should be “indistinguishable” from a random sequence.
  • Unpredictability requirement: Given portions of the keystream, it should be computationally infeasible to learn any information about the remainder of the keystream
    • If an adversary knows a portion of ciphertext and the corresponding plaintext , then she can easily find the corresponding portion of the keystream.

Aside: Don’t use UNIX random number generators rand and srand for cryptographic purposes.

Now we introduce the ChaCha20 stream cipher. Designed by Dan Bernstein in 2008, the ChaCha20 is conceptually very simple. It is extremely fast in software an does not require and special hardware. No security weaknesses have been found.

Notation:

  • bit
  • bit constant
  • bit nonce
  • bit counter
  • A hexadecimal digit is a bit number.
  • A nonce is a non-repeating quantity

The initial state is:

More Notation:

  • xor
  • integer addition module
  • left-rotation by positions

Define : Quarter Round Function

Input: Four -bit words

Output:

ChaCha20 Keystream Generator

Select a nonce and initialize the counter to . While keystream bytes are required do:

  • Create the initial state , and make a copy of .
  • Update by repeating the following times:
  • Output ( keystream bytes).
  • Increment the counter

Encryption: The keystream bytes are xored with the plaintext bytes to produce ciphertext bytes. The nonce is appended to the ciphertext.

Block Ciphers: DES

Definition

A block cipher is a symmetric-key encryption scheme that breaks up the plaintext into blocks of a fixed length (e.g. bits), and encrypts the blocks one at a time.

In contrast, a stream cipher encrypts the plaintext one character (usually a bit) at a time.

A historically important example of a block cipher is the Data Encryption Standard (DES):

flowchart LR
key
key -- 56 bits --> DES
plaintext -- 64 bits --> DES
DES -- 64 bits --> ciphertext

Key length: bits, size of key space: , block length: bits.

Design principles articulated by Claude Shannon

Security:

  1. Diffusion: Each ciphertext bit should depend on all plaintext bits.
  2. Confusion: The relationship between key and ciphertext bits should be complicated.
  3. Key length: Should be small, but large enough to preclude exhausting key search.

Efficiency:

  1. Simplicity: Easier to implement and analyze.
  2. Speed: High encryption and decryption rates.
  3. Platform: Suitable for hardware and software.

The design principles of DES are still classified.

DES Problem #1: Small key size

Exhaustive search on the key space takes operations and can easily be parallelized.

DES Problem #2: Small block size

If plaintext blocks are distributed uniformly at random, then the expected number of ciphertext blocks observed before a collision occurs is . This is by the birthday paradox.

Birthday paradox: Suppose that an urn contains numbered balls. Suppose that balls are drawn from the urn, one at a time, with replacement. The expected number of draws before a ball is selected for the second time (called a collision) is approximately .

Thus the ciphertext reveals some information about the plaintext.

Block Ciphers: Triple-DES

Recall: The only substantial weaknesses known in DES are the obvious ones: small key length and small block length.

Question

How can one construct a more secure block cipher from DES?

Multiple encryption: Re-encrypt the ciphertext one or more times using independent keys. Hope that this increases the effective key length.

Note: Multiple encryption does not necessarily result in increased security.

Example

If denotes the encryption function for the simple substitution cipher with the key , then any more secure than ?

No, since

Encryption is: , where DES decryption.

The Double-DES key length is bits, so exhaustive key search takes operations.

Note: The block length is unchanged ( bits).

Meet-in-the-middle Attack on Double-DES

Input: 3 plaintext/ciphertext pairs Output: The secret key

  1. For each do: a. Compute and store
  2. For each do:
  3. Compute
  4. Search for in the table
  5. For each match in the table:
    1. Check if ; if so then:
      1. Check if ; if so then:
        1. Output and STOP.

The main idea is that

Question

How many plaintext/ciphertext pairs are needed for unique key determination?

Let be a block cipher with key space , and plaintext and ciphertext space .

Let be the secret key chosen by Alice and Bob, and let , be known plaintext/ciphertext pairs where the plaintext are distinct.

Question

So how large should be to ensure that there is only one key for which for all ?

We select so that where is the expected number of false keys.

For each , the encryption function is a permutation (bijection).

We make the heuristic assumption that for each is a random function, i.e. a randomly selected function. This assumption is certainly false since is not random, and because a random function is almost certainly not a permutation.

Now, fix .

The probability that for all is Thus, the expected number of false keys for which for all is

Time: The number of DES operations is

The security level of Double-DES is 57 bits, so Double-DES is not much more secure than DES.

A Triple-DES secret key is , where

Encryption is: , where DES encryption.

flowchart LR
k1
k2
k3
id1[DES]
id2[DES]
id3[DES]
k1 --> id1
k2 --> id2
k3 --> id3
m -- 64 bits --> id1
id1 -- 64 bits --> id2
id2 -- 64 bits --> id3 
id3 -- 64 bits --> c

Decryption is: , where DES decryption.

The Triple-DES key length is bits, so exhaustive key search takes operations (infeasible).

Meet-in-the-middle attack takes operations. So the security level of Triple-DES is 112 bits.

Block Ciphers: AES

The Advanced Encryption Standard (AES)

Key lengths: , and bits.

Block length: bits

2024: No attacks have been found on AES that are faster than exhaustive key search.

AES is an example of a substitution-permutation network.

Definition

A substitution-permutation network (SPN) is an iterated block cipher where a round consists of a substitution operation followed by a permutation operation

Components of an SPN cipher:

  • : the block length (in bits)
  • : the key length (in bits)
  • : the number of rounds
  • A fixed invertible function called substitution, where is a divisor of .
  • A fixed permutation on .
  • A key scheduling algorithm that determines subkeys from a key . Note: and the key scheduling algorithm are public.

The encryption looks like: plaintext for do ciphertext

Decryption is the reverse of encryption.

AES is an SPN, where the permutation operation is comprised of two invertible linear transformations.

All operations are byte oriented, e.g., so the box maps bits to bits. This facilitates fast implementations on software platforms.

The block length of AES is . Each subkey is bits.

AES accepts three key lengths. The number of rounds depends on the key length.

Each round updates a variable called State which consists of a array of bytes (note , the block length).

State is initialized with the plaintext.

Add AES round uses four invertible operations:

  1. AddRoundKey (key-mixing)
  2. SubBytes (S-box)
  3. ShiftRows (permutation)
  4. MixColumns (linear transformation)

After rounds are completed, a final subkey is XORed with State, the result being the ciphertext.

Definition

The elements of the finite filed are the polynomials of degree at most in , with addition and multiplication performed module the irreducible polynomial ( is the set of polynomials in with coefficients from ). We interpret an bit string as coefficients of the polynomial and vice versa.

Example

Let and , so and .

Addition: , so .

Multiplication: , which leaves a remainder of upon division by . Hence in , or .

Inversion: since .

Definition (S-box)

Let , and consider as an element of .

  1. Let if , and if
  2. Define
  3. Compute
  1. Then

ShiftRows: Permute the bytes of State by applying a cyclic shift to each row.

MixColumns:

  1. Read column of State as a polynomial (interpret the coefficients as elements of the finite field .
  2. Multiply this polynomial with the constant polynomial with the constant polynomial and reduce module . This gives a new polynomial

The operation.

Let , where each .

Let , where 01, 02, 03 are elements in (written in hex).

To compute do:

  • Compute (polynomial multiplication where coefficient arithmetic is in ).
  • Divide by to find the remainder polynomial . Equivalently, replace by , by , and by , and simplify.
  • Then

AES Encryption

From the key derive subkeys .

State plaintext

State State

for do

	State ← SubBytes(State)
	State ← ShiftRows(State)
	State ← MixColumns(State)
	State ← State ⊕ k_i

State SubBytes(State)

State ShiftRows(State)

State State

ciphertext State

AES Decryption

From the key derive subkeys

State ciphertext

State State

State InvShiftRows(State)

State InvSubBytes(State)

for do

	State ← State ⊕ k_i
	State ← InvMixColumns(State)
	State ← InvShiftRows(State)
	State ← InvSubBytes(State)

State State

plaintext State

For 128-bit keys, AES has 10 rounds, so we need 11 subkeys.

  • The first subkey is is the actual AES key.
  • The second subkey is
  • The third subkey is
  • The eleventh subkey is

The function are defined as follows:

  1. The input is divided into four bytes:
  2. Left-rotate the bytes:
  3. Apply the AES S-box to each byte:
  4. XOR the leftmost byte with the constant , and output the result:

The constants (in hex):

Modes of Operation

In practice, one usually wishes to encrypt a large quantity of data. The plaintext message is where each is an bit block.

Question

How should we use a block cipher to encrypt ?

Some modes of operation: ECB, CBC, CTR, GCM, CCM.

Suppose that the plaintext message is

Electronic Codebook Mode (ECB): encrypts the blocks independently, one at a time. Drawback: Identical plaintext result in identical ciphertext blocks.

Cipher Block Chaining (CBC) mode:

Encryption: Select ( is a random non-secret IV), and then compute for . The ciphertext is .

Decryption: for .

Identical plaintexts with different IVs result in different ciphertexts.

Chapter 3: Hash Functions

Fundamental Concepts

Hash functions play a fundamental role in cryptography. They are used in a variety of cryptographic primitives and protocols.

They are very difficult to design because of stringent security and performance requirements.

The most commonly used hash functions are:

  • SHA-1
  • SHA-2 family: SHA-224, SHA-256, SHA-384, SHA-512
  • SHA-3 family

SHA-256

SHA-256: SHA-256(“Hello there”) =

Definition

A hash function is a mapping such that:

  1. maps binary messages of arbitrary lengths to outputs of a fixed length : . ( is usually large, e.g., , whereas is small, e.g. )
  2. can be efficiently computed for all

is called an bit hash function. is called the hash or message digest of .

Notes:

  • The description of a hash function is public; there are no secret keys.
  • For simplicity, we will usually write instead of
  • More generally, a hash function is an efficiently computable function from a set to a set .

Consider to be a hash function mapping a bitstring to its last two digits.

is a preimage of .

is a collision. is a second preimage of .

Hash functions are used in all kind of applications. One reason for the widespread use of hash functions is speed.

Definition

A hash function is preimage resistant if, given a hash value , it is computationally infeasible to find (with non-negligible success probability) any with . ( is called a preimage of .)

Password protection on a multi-user computer system:

  • The server stores userid, (password)
  • If the attacker obtains a copy of the password file, they do not learn any passwords.
  • This requires preimage resistance.

Definition

A hash function is 2nd preimage resistant if, given , it is computationally infeasible to find (with non-negligible success probability) any with and

Modification Detection Codes (MDC’s):

  • To ensure that a message is not modified by unauthorized means, one computes and protects from unauthorized modification.
  • This is useful protection and requires 2nd preimage resistance.

Definition

A hash function is collision resistant if it is computationally infeasible to find with and . Such a pair is called a collision for .

Message digests for digital signature schemes:

  • For reasons of efficiency, instead of signing a long message , the (much shorter) message digest is signed.
  • This application requires preimage-resistance, 2nd preimage resistance, and collision resistance.
  • To see why collision resistance is required, suppose that the legitimate signer Alice can find a collision for . Alice can sign and later claimed to have signed .

Other applications:

  • Message Authentication Codes
  • Pseudorandom bit generation
  • Key derivation functions
  • Proof-of-work
  • Quantum-safe signature schemes

Relationships between PR, 2PR, CP

Breaking preimage resistance (PR):

Given:

Required: with .

Breaking 2nd preimage resistance (2PR):

Given: $x \in_{R} {0,1}^*

Required: with and

Breaking collision resistance (CR):

Given:

Required: with and

Claim : If is CR, then is 2PR

Proof: Suppose that is not 2PR.

We’ll show that is not CR.

Select . Since is not 2PR, we can efficiently find , with .

Thus, is a collision for that we have efficiently found, showing that is not CR.

Note: This proof established the contrapositive statement.

Claim : CR does not guarantee PR

Proof: Suppose that is CR.

Consider the hash function defined by

Then is CR (since is).

And is not PR since preimages can be efficiently found for at least half of all , namely the hash values that begin with .

Note: The hash function is rather contrived. For somewhat uniform hash functions, i.e., hash functions for which all hash values have roughly the same number of preimages, CR does not guarantee PR.

Claim : Suppose is somewhat uniform. If is CR, then is PR

Proof: Suppose that is not PR.

We’ll show that is not CR.

Select and compute . Since is not PR, we can efficiently find with . Since is somewhat uniform, we expect that has many preimages, and thus with very high probability. Thus, is a collision for that we have efficiently found, so is not CR.

Claim : PR does not guarantee 2PR

Proof: Suppose that is PR.

Define by for all .

Then is PR. However is not 2PR.

Claim : Suppose is somewhat uniform. If is 2PR, then is PR

Proof: Suppose that is not PR.

We will show that is not 2PR.

So, suppose we are given . We compute .

Since is not PR, we can efficiently find with .

Since is somewhat uniform, we expect that with very high probability. Hence, is a second preimage of that we have efficiently found.

Thus is not 2PR.

Claim : 2PR does not guarantee CR

Proof: Suppose that is 2PR.

Consider defined by if , and .

  • Then is not CR, since is a collision for
  • Suppose now that is not 2PR. We’ll show that is not 2PR.

So, we are given . Since is not 2PR, we can efficiently find , with . With probability essentially , we can assume that . Hence, .

Now, if , then

And, if , then .

In either case, we have efficiently found a second preimage for with respect to .

Hence, is not 2PR, a contradiction. Thus, is 2PR.

Generic Attacks

Definition

A generic attack on hash functions does not exploit any properties that the specific hash function might have.

In the analysis of a generic attack, we view as a random function in the sense that for each , the hash value was defined by selecting .

A random function is an ideal hash function (from a security point-of-view). However they are not practical.

Attack

Given , repeatedly select arbitrary until .

The expected number of hash operations is .

This generic attack is infeasible if .

Attack

Select arbitrary and store in a table sorted by first entry. Repeat until a collision is found.

Analysis: By the birthday paradox, the expected number of hash operations is

This generic attack is infeasible if .

This generic attack for finding collisions is optimal. The expected space required is .

VW: van Oorschot & Wiener parallel collision search.

The expected number of hash operations: . The expected space required is negligible.

It is easy to parallelize, -fold speedup with processors.

The VW collision-finding algorithm can easily be modified to find meaningful collisions.

If collision resistance is desired, then use an -bit hash function with .

Parallel Collision search (VW method)

Problem: Find a collision for .

Assumption: is a random function.

Notation: Let .

Define a sequence by for .

Let be the smallest index for which for some ; such a must exist. Then for all . By the birthday paradox, . In fact, and .

Now, with overwhelming probability, in which event is a collision for .

Question

How to find without using much storage?

We only store distinguished points.

Distinguished points: Select an easily-testable distinguishing property for elements of , e.g. leading 32 bits are all 0.

Let be the proportion of elements of that are distinguished.

VW method: Compute the sequence and only store the points that are distinguished.

VW Collision Finding

Stage 1: Detecting a collision

  1. Select
  2. Store in a sorted table.
  3. ( = last point stored)
  4. For do:
  5. Compute
  6. If is distinguished then
    1. If is already in the table, say where , then go to Stage 2
    2. Store in the table.

Stage 2: Finding a collision

  1. Set
  2. Suppose , and set $k \gets \ell_1 - \ell_2
  3. Compute
  4. For do:
  5. Compute
  6. Until
  7. The collision is

VW Analysis

Stage 1: Expected number of -evaluations is:

Stage 2: Expected number of evaluations is

Overall expected running time:

Expected storage: bits

Example

Consider . Take . Then the expected run time of VW collision search is evaluations (feasible), and the expected storage is 192 gigabytes (negligible).

Iterated Hash Functions

Merkle's Meta Method

Components:

  • Fixed initializing value
  • Efficiently-computable compression function

To compute where has bitlength do:

  1. Break up into bit blocks, , padding the last block with bits as necessary.
  2. Define , the length-block, to hold the right-justified binary representation of .
  3. Define .
  4. Compute for . (The $H_{i}‘s are called chaining variables).
  5. Define

Theorem (Merkle)

If the compression function is collision resistant, then the iterated hash function is also collision resistant.

Merkle’s theorem reduces the problem of designing collision-resistant hash functions to that of designing collision-resistant compression functions.

A major theme in cryptographic research is to formula precise security definitions and assumptions, and then prove that a protocol is secure.

A proof of security is certainly desirable since it rules out the possibility of attacks being discovered in the future.

However, it isn’t always easy to asses the practical security assurances (if any) that a security proof provides).

  • Assumptions might be unrealistic, or false, or circular.
  • The security model might not account for certain kinds of realistic attacks.
  • The security proof might be asymptotic.

Proof of Merkle’s Theorem ( is CR is CR)

Suppose that is not CR. We’ll show that is not CR.

Since is not R, we can efficiently find messages , with and .

Let , = bit-length(), length block.

Let , = bit-length(), length block.

We can efficiently compute:

Since , we have

Case 1: Now, if then . Thus, is a collision for that we have efficiently found.

Case 2: Suppose next that . Then and

  • Let be the largest index, , for which . Such an must exist since .
  • Then , so is a collision for that we have efficiently found.

Thus, is not collision resistant

MDx is a family of iterated hash functions. MD4 has 128-bit outputs. Professor Xiaoyun Wang found collisions for MD4 by hand.

MD5 is a strengthened version of MD4. MD5 should not be used if collision resistance is required, but is probably okay as a preimages-resistant hash function.

Secure Hash Algorithm (SHA) was designed by NSA and published by NIST in 1992. 160-bit iterated hash function based on MD4.

The SHA-2 design is similar to SHA-1, and thus there were lingering concerns that the SHA-1 weaknesses could eventually extend to SHA-2.

SHA-3 is used in practice, but not as widely deployed as SHA-2.

SHA-256

Iterated hash function (Merkle’s meta method). .

Compression function is

Input: bit string of arbitrary bitlength .

Output: 256-bit hash value of .

SHA-256 Notation

are 32-bit words.

  • : addition module
  • : bitwise complement
  • : shift right by positions
  • : rotate right by positions
  • : bitwise AND of
  • : bitwise exclusive-OR

SHA-256 constants

  • 32-bit initial chaining values (IVs): These words were obtained by taking the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers.

  • Per-round integer additive constants: These words were obtained by taking the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers.

SHA-256 preprocessing

  1. Pad with 1, followed by as few 0’s as possible so that the bitlength is 64 less than a multiple 512.
  2. Append the 64-bit binary representation of
  3. The formatted input is , where each is a 32-bit word
  4. Initialize the words of the chaining variable:

SHA-256 Processing

For each from 0 to do the following:

  • Copy the th block of sixteen 32-bit words into temporary storage:
  • Expand the 16-word block into a 64-word block: For from 16 to 63 do:
  • Initialize working variables:
  • For from 0 to 63 do:
  • Update chaining variable: Output: SHA256() =

Chapter 4: Message Authentication Codes

Fundamental Concepts

Definition

A message authentication code (MAC) scheme is a family of functions parameterized by an -bit key , where each function can be efficiently computed.

is called the MAC or tag of with key .

MAC schemes are used for providing (symmetric-key) data integrity and data origin authentication.

To provide data integrity and data origin authentication:

  1. Alice and Bob establish a secret key
  2. Alice computes the tag of a message and sends to Bob.
  3. Bob verifies that

Note: There is no confidentiality or non-repudiation.

Let be the secret key shared by Alice and Bob.

The adversary does not know , but is allowed to obtain (from either Alice or Bob) the tags for messages of her choosing. The adversary’s goal is to obtain the tag of a new message of the adversary’s choosing.

Definition

A scheme is secure if given some tags for messages ‘s of one’s own choosing, it is computationally infeasible to compute a message-tag pair ( for a new message . A scheme is secure if it is existentially unforgeable against chosen-message attack.

An ideal scheme has the following property: For each , the function is a random function. (This is useless in practice).

Generic Attack

Guessing the tag of a message . Attack: Select and guess that

Analysis: Assuming that the scheme is ideal, the success probability is

Generic Attack (2)

Attack: Given message-tag pairs , one can check whether a guess of the key is correct by verifying that for .

Analysis: Assuming the scheme is ideal, expected number of keys for which all pairs verify is .

Exhaustive key search is infeasible if

GSM (Global System for Mobile Communications) security is notable since it uses only symmetric-key primitives.

Objectives:

  1. Entity authentication: The cell phone service provider needs the assurance that entities accessing its service are legitimate subscribers
  2. Confidentiality: The data exchanged between a cell phone user and their cell phone service provider should be confidential

GSM does not provide end-to-end security.

Alice: Cell phone user. Bob: Cell phone service provider

  1. Alice sends an authentication request to Bob
  2. Bob selects a challenge and sends to Alice
  3. Alice’s SIM card uses to compute the response . Alice sends to Bob.
  4. Bob retrieves Alice’s key from its database, and verifies that
  5. Alice and Bob compute an encryption key , and thereafter use the encryption algorithm Enc to encrypt and decrypt messages for each other for the remainder of the session

CBC-MAC and HMAC

Let be an -bit block cipher with key space .

Assumption: Plaintext messages all have lengths that are multiples of .

To compute CBC-MAC:

  1. Divide into -bit blocks
  2. Compute
  3. For , compute
  4. Then CBC-MAC

CBC-MAC is not secure if variable-length messages are allowed.

Here is a chosen-message attack on CBC-MAC:

  1. Select an arbitrary 3-block message
  2. Obtain the tag of the one-block message
  3. Obtain the tag of the two-block message
  4. Output the forgery

Correctness: CBC-MAC

One countermeasure for variable-length messages is Encrypted CBC-MAC, where CBC-MAC tag is encrypted using a second key : EMAC, where CBC-MAC

Theorem

Suppose that is an ideal encryption scheme. Then EMAC is a secure MAC scheme.

Encryption scheme is ideal if for each is a random permutation.

Hash functions were not originally designed for message authentication. In particular, they are not “keyed” primitives.

Question

How to use a hash function to construct a secure MAC?

Let be an iterated -bit hash function. For simplicity, assume that all message shave bitlengths that are multiples of , and suppose that the length-block is omitted.

Let be the input blocklength of the compression function k \in_R {0,1}^n. Let denote padded with 0’s, so has bitlength .

Definition

This is insecure. Here is a length extension attack:

  • Suppose that (, MAC) is known.
  • Then MAC can be easily computed for any .

Also insecure if messages can be of any bitlength and a length block is postponed to .

Suppose that the adversary knows a message-tag pair , where is a one-block message. Hence, .

The adversary does the following:

  • Arbitrarily select a one-block message .
  • Compute and set . ( is a two-block message).
  • Output

Correctness: The message-tag pair is a valid forgery since .

Hash-based MAC. Define two bit strings: ipad , opad

Definition

HMAC opad, ipad,

HMAC is commonly used as a key derivation function (KDF).

Suppose that Alice has a secret key , and wishes to derive several session keys , e.g. to encrypt data in different communication sessions.

Alice computes HMAC, HMAC, HMAC

Rationale: Without knowledge of , an adversary is unable to learn anything about any particular session key , even though she might have learnt some other session keys.

Chapter 5: Authenticated Encryption

Fundamental Concepts

A symmetric-key encryption scheme provides confidentiality.

A MAC scheme provides authentication (data origin authentication and data integrity).

Question

What if confidentiality and authentication are both required?

Alice computes and MAC, and sends to Bob. Here, is the plaintext and is the secret key she shares with Bob.

Bob decrypts to obtain and then verifies that MAC

This generic method is not secure.

Instead consider the following.

Alice computes and MAC, and sends to Bob. Here, is the plaintext and is the secret key she shares with Bob.

Bob first verifies that MAC and then decrypts to obtain

This generic method has been proven to be secure, provided that the encryption scheme and the MAC scheme employed are secure.

Many special-purpose authenticated encryption schemes have been developed, the most popular of which is using a symmetric-key encryption such as AES in GCM. Some of these authenticated encryption schemes also allow for the authentication of “header” data.

Definition

An authenticated encryption scheme is -secure if:

  1. is semantically secure against chosen-plaintext attack; and
  2. has ciphertext integrity, i.e., an adversary who is able to obtain ciphertext-tag pairs for plaintext messages of her choosing, is unable to produce a valid ciphertext-tag pair where .

AES-GCM

AES-GCM is an authenticated encryption scheme designed by David McGrew and John Viega. Uses the CTR mode of encryption and GMAC, a custom-designed MAC scheme.

CTR: Counter mode of encryption.

Let be the secret key shared by Alice and Bob. Let be a plaintext message, where each is a 128-bit block and .

To encrypt , Alice does the following:

  1. Select a nonce
  2. Let
  3. For from 1 to do: and compute AES
  4. Send to Bob.

To decrypt, Bob does the following:

  1. Let
  2. For from 1 to do: and compute AES

CTR mode of encryption can be viewed as a stream cipher. As was the case with CBC encryption, identical plaintexts with different IVs result in different ciphertexts. Unlike CBC encryption, CTR encryption is parallelizable. Note that AES is not used. The secret key can have bitlength 128, 192, 256.

Let be a 128-bit block. We associate the binary polynomial with .

Let .

If and are 128-bit blocks, then define to be the block corresponding to the polynomial .

That is, is the remainder upon dividing by in . This is multiplication in the Galois field .

Let , where each is a 128-bit block. Let be the bitlength of . Let be the secret key.

  1. Ket , where is a nonce.
  2. Compute AES
  3. Let
  4. Compute the authentication tag AES
  5. Send

Example

Let Then Hence, can be computed using Horner’s rule: This requires three additions and four multiplications in

In general, if has blocklength , then computing using Horner’s rule requires additions and multiplications in .

Consider the simplified tag:

An adversary can guess the tag of a message with success probability . She can also guess the tag by making a guess for and computing . Her success probability is at most , where is the blocklength of . However, if the adversary sees a single valid message-tag pair she can solve the polynomial equation for . To circumvent the aforementioned attack, a second secret AES is used to hide AES. The secret AES serves as a one-time pad for .

Input:

AAD (Additional Authenticated Data), also called encryption context: Data to be authenticated (but not encrypted): Data to be encrypted and authenticated: . Secret key , shared between Alice and Bob.

Output: where

  • is a 96-big initialization vector.
  • is the additional authenticated data.
  • is the encrypted/authenticated data.
  • is a 128-bit authentication tag.

Alice does the following:

  1. Let , where are bitlengths of expressed as 64-bit integers.
  2. Select a nonce and let
  3. Encryption: For from 1 to do: Compute and AES
  4. Authentication: Compute AES. Compute AES
  5. Output:

Note:

Upon receiving Bob does the following:

  1. Let , where are bitlengths of expressed as 64-bit integers.
  2. Authentication: Compute AES. Compute AES. If then proceed to decryption; if then reject.
  3. Decryption: Let . For from 1 to do: Compute and AES
  4. Output:

IV’s should not be repeated (with the same key ). Suppose an IV is reused, and an eavesdropped captures two transmissions: . Suppose also that and have the same blocklengths, and that the eavesdropper knows .

Then AES and AES, so .

This polynomial equation can be quickly solved for , and then AES can be computed.

Thereafter, the adversary can properly encrypt/authenticate any plaintext (of blocklength at most that of ).

Chapter 6: Public-Key Cryptography

Fundamental Concepts

Symmetric-key cryptography: Communicating parties a priori share some secret keying information.

The shared secret keys can then be used to achieve confidentiality (e.g. using AES-CBC), or authentication (e.g. using HMAC), or both (e.g. using AES-GCM).

Question

How can Alice and Bob establish the secret key ?

Method 1: Point-to-point key distribution

flowchart LR
Alice --->|k| Bob

The secured channel could be:

  • A trusted courier
  • A face-to-face meeting
  • A SIM card that contains an authentication key

Method 2: Use a Trusted Third Party (TTP) .

Each user shares a secret key with for a symmetric-key encryption scheme . To establish this key, must visit once.

serves as a key distribution centre (KDC).

flowchart LR
A --->|Request A,B| T
T --->|EkBTk| B
T --->|EkATk| A
  1. sends a request for a key to share with
  2. selects a session key , and encrypts it for using
  3. encrypts for using

Drawbacks of using a KDC:

  1. The TTP must be unconditionally trusted
  2. The TTP is an attractive target
  3. The TTP must be on-line, and is therefore a potential bottleneck and critical reliability point.

With a network of users, each user has to share a different secret key with every other user. Each user has to store and manage different secret keys. The total number of secret keys is

Non-repudiation is impractical.

Public-key cryptography: Communicating parties a priori share some authenticated (but non-secret) information.

Merkle Puzzles

Alice and Bob establish a secret session key by communicating over an authenticated (but non-secret) channel.

  1. Alice creates puzzles (e.g., ). Each puzzle takes hours to solve. The solution to reveals a 128-bit session key and a randomly selected 128-bit serial number
  2. Alice sends to Bob
  3. Bob selects and solves puzzle to obtain and
  4. Bob sends to Alice.
  5. The secret session key is

Key pair generation for public key cryptography:

Each entity does the following:

  1. Generate a key pair
  2. ‘s public key is and her secret key is .

Security requirement: It should be infeasible for an adversary to recover from .

Example

where are randomly-selected prime numbers, and

To encrypt a message for Bob, Alice does:

  1. Obtain an authentic copy of Bob’s public key
  2. Compute , where is the encryption function.
  3. Send to Bob.

To decrypt , Bob does:

  1. Compute where is the decryption function.

To sign a message , Alice does:

  1. Compute Sign().
  2. Send to Bob

To verify Alice’s signature on , Bob does:

  1. Obtain an authentic copy of Alice’s public key .
  2. Accept if Verify “Accept”

If Alice generates a signed message , anyone who has the authentic copy of Alice’s public key can verify the authenticity of the signed message.

Advantages of public-key cryptography

  • No requirement for a secured channel
  • Each user has only one key pair
  • A signed message can be verified by anyone

Disadvantages of public-key cryptography

  • Public-key schemes are slower than their symmetric-key counterparts.

In practice, symmetric-key and public-key schemes are used together.

Elementary Number Theory

Content covered in MATH135

The set of integers is

An integer is said to divide an integer , written if there exists and integer such that .

Division algorithm: If and are integers with , then the orginary long division of by yields unique integers and with and .

An integer is said to be prime if its only positive divisors are and . Otherwise, is called composite.

Fundamental Theorem of Arithmetic: Every integer has a factorization as a product of prime powers: , where the are distinct primes and the are positive integers. Furthermore, the factorization is unique up to rearrangement of factors.

Prime Number Theorem: For , let denote the number of primes between and . For all , we have

Let and be integers, not both . The greatest common divisor of and , denoted , is the largest positive integer that divides both and .

Integers and are said to be relatively prime, or coprime, if .

Fact

If and are positive integers with , then .

Euclidian algorithm for computing where .

While do the following:

  • Set , a \gets b, b \gets r

Return.

Integers modulo : , where addition subtraction and multiplication are performed modulo .

Definition

Let . The multiplicative inverse of modulo is an integer such that . If such an exists, then it is unique, and is said to be invertible modulo ; the inverse of modulo is denoted by

Fact

Let . Then exists if and only if

Computing inverses modulo . Let with . Use the EEA to find integers and such that . Then , so

Algorithmic Number Theory

Theorem: Every integer has a unique prime factorization.

How we find this prime factorization efficiently is a hard problem. How we verify a prime factorization is easy. Deciding whether a number is prime or composite is easy.

An algorithm is a “well-defined computational procedure” that takes a variable input and eventually halts with some output.

The efficiency of an algorithm is measured by the scarce resources it consumes.

Definition

The input size is the number of bits required to write down the input using a reasonable encoding. e.g., the size of a positive integer is bits.

Definition

The running time of an algorithm is an upper bound, as a function of the input size, of the worst case number of basic operations the algorithm executes over all inputs of a fixed size.

Definition

An algorithm is a polynomial-time algorithm if its expected running time is , where is the input size and is a fixed positive integer.

Definition

If and are functions from the positive integers to the real numbers, then means that there exists a positive constant and a positive integer such that .

Modular exponentiation.

Input: A -bit integer , and integers

Output:

Naïve algorithm #1:

  1. Compute
  2. Return ()

Analysis: The bitlength of is since . Hence the algorithm is not polytime.

Naïve algorithm #2:

  1. For from 2 to do:
  2. Return()

This algorithm is also not polytime.

Let the binary representation of be , where . Then:

We can do the following repeated square-and-multiply algorithm for computing

Write in binary

  1. If then ; else .
  2. For from 1 to do:
    1. If then
  3. Return ()

Analysis: At most modular squaring and modular multiplications, so the worst case running time is bit operations. This is polytime.

Chapter 7: RSA

Basic RSA

RSA is used for public-key encryption and signatures.