Questions: Hash Functions and Collision Resistance
5 questions to test your understanding
Score: 0 / 5
Question 1 Short Answer
A hash function produces 128-bit outputs. An attacker wants to find a collision. Approximately how many random inputs must they hash before expecting a collision, and what principle explains this?
Think about your answer, then reveal below.
Model answer: Approximately 2^64 inputs, by the birthday paradox. In a set of n randomly chosen hash outputs from a space of 2^128 values, the probability of at least one collision exceeds 50% when n is roughly 2^{128/2} = 2^64. This is analogous to the birthday problem: in a room of 23 people, there's a >50% chance two share a birthday, despite 365 possible birthdays. The birthday bound means collision resistance is at most half the output length in bits of security.
The birthday bound is fundamental to hash function design. A 128-bit hash provides only 64 bits of collision resistance — potentially feasible for well-resourced attackers. This is why modern hash functions use 256-bit or larger outputs: SHA-256 provides 128 bits of collision resistance, which is currently considered secure. The birthday attack requires no cryptanalytic insight — it works against any hash function purely from output length.
Question 2 True / False
Collision resistance implies second-preimage resistance, but second-preimage resistance does not imply collision resistance.
TTrue
FFalse
Answer: True
If you can find second preimages efficiently (given m, find m' with H(m) = H(m')), you can find collisions — just pick any m and find its second preimage. So collision resistance implies second-preimage resistance. The reverse fails: a collision attacker gets to choose both messages freely, which is strictly more power than being given one message and searching for a match. The birthday attack finds collisions in O(2^{n/2}) time, while the best generic second-preimage attack requires O(2^n) time. A function could resist second preimages but fall to birthday-bound collision attacks.
Question 3 Multiple Choice
A developer uses MD5 to hash passwords, arguing that while MD5 collisions have been found, preimage attacks are still infeasible. What is wrong with this reasoning?
AMD5 preimage attacks are actually practical and passwords can be directly recovered
BPassword hashing requires collision resistance, not preimage resistance, so MD5's broken collision resistance is the relevant vulnerability
CWhile MD5's preimage resistance is not fully broken, password hashing has additional requirements (slowness, salting) that MD5 does not satisfy. MD5 is fast by design, enabling rapid brute-force and dictionary attacks. Dedicated password hashing functions like bcrypt or Argon2 are needed
DMD5 is fine for password hashing as long as the passwords are longer than 128 bits
The reasoning conflates cryptographic hash security properties with password hashing requirements. Even a hash with perfect preimage resistance is unsuitable for passwords if it's fast — the attacker's strategy is not to invert the hash mathematically but to hash billions of candidate passwords per second and compare. Password hashing functions deliberately incorporate tunable slowness (key stretching), memory hardness, and salting. Using MD5 or SHA-256 directly for passwords is a design error independent of collision attacks.
Question 4 Multiple Choice
The Merkle-Damgard construction builds a hash function from a fixed-size compression function. What structural vulnerability does it introduce that newer constructions like SHA-3 (sponge) avoid?
AMerkle-Damgard hashes are vulnerable to timing attacks because they process blocks sequentially
BLength extension attacks: knowing H(m) and |m| allows computing H(m || padding || m') without knowing m, because the hash output is the internal state after processing m
CMerkle-Damgard cannot process messages longer than 2^64 bits
DThe compression function must be collision-resistant, but SHA-3's sponge construction does not need a compression function at all
In Merkle-Damgard, the final hash value IS the internal state. An attacker who knows H(m) can continue the computation by feeding additional blocks, computing H(m || pad || m') without knowing m. This enables signature forgery on schemes that MAC as H(key || message). SHA-3's sponge construction avoids this by using a capacity portion of the state that is never output, so the hash value does not reveal the full internal state. HMAC also works around this by using a nested hash construction.
Question 5 True / False
SHA-256 always produces a 256-bit output regardless of whether the input is 1 byte or 1 terabyte.
TTrue
FFalse
Answer: True
Fixed-length output from arbitrary-length input is a defining property of hash functions. SHA-256 always outputs exactly 256 bits (32 bytes). This compression from an infinite input domain to a finite output domain means collisions must exist (by the pigeonhole principle) — the security guarantee is that finding them is computationally infeasible, not that they don't exist.