A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Hash Functions and Collision Resistance

Graduate Depth 96 in the knowledge graph ☐ I know this ☆ Set as goal

29topics build on this

514prerequisites beneath it

Symmetric Encryption and Block Ciphers Complexity Class P: Polynomial Time→→Blockchain Consensus Cryptography Commitment Schemes +4 more

Core Idea

A cryptographic hash function maps arbitrary-length inputs to fixed-length outputs and must satisfy three security properties: preimage resistance (given h, hard to find m with H(m) = h), second-preimage resistance (given m, hard to find m' != m with H(m) = H(m')), and collision resistance (hard to find any pair m != m' with H(m) = H(m')). Collision resistance is the strongest property and is limited by the birthday bound — O(2^n/2) for an n-bit hash. SHA-256 (256-bit output, birthday bound 2¹²⁸) is the current standard. Hash functions are foundational building blocks for MACs, digital signatures, commitment schemes, and proof-of-work systems.

Explainer

A cryptographic hash function H takes an input of any length and produces a fixed-length output (the hash or digest). SHA-256, the current workhorse, produces 256-bit digests. Unlike encryption, hashing is a one-way operation with no key and no decryption — the same input always produces the same output, and the goal is to make it infeasible to work backward from output to input. Hash functions serve as the cryptographic equivalent of fingerprints: a compact, deterministic summary that is practically impossible to forge.

Three security properties define a good cryptographic hash. Preimage resistance means that given a hash value h, it is infeasible to find any message m such that H(m) = h. Second-preimage resistance means that given a message m, it is infeasible to find a different message m' with H(m') = H(m). Collision resistance means it is infeasible to find any pair of distinct messages m, m' with H(m) = H(m'). Collision resistance is the strongest property and implies second-preimage resistance (but not vice versa). The generic attack cost for collisions is determined by the birthday bound: approximately 2^n/2 hash evaluations for an n-bit hash, because random collisions among 2^n/2 values are expected by the birthday paradox.

Most deployed hash functions use the Merkle-Damgard construction, which processes a message in fixed-size blocks using a compression function. An initial value (IV) is updated by absorbing each block through the compression function, and the final state becomes the hash. This is elegant and provably collision-resistant if the compression function is — but it introduces length extension vulnerabilities: since the output is the full internal state, an attacker who knows H(m) can continue the computation without knowing m, computing H(m || padding || extra) directly. SHA-3 uses the sponge construction instead, which maintains a larger internal state than the output, preventing this attack by design.

Hash functions are ubiquitous in cryptography and systems. They underpin HMAC (hash-based message authentication), digital signatures (sign the hash of a message rather than the full message), commitment schemes (commit to a value by publishing its hash), proof of work (Bitcoin mining finds inputs whose hash starts with many zeros), and data integrity (verify file downloads match expected hashes). Their security is entirely computational — collisions exist by the pigeonhole principle (infinite inputs, finite outputs), but finding them should require brute-force effort proportional to the birthday bound. When this guarantee breaks, as happened with MD5 (practical collision attacks found in 2004) and SHA-1 (collision demonstrated in 2017), the hash function must be retired from security-critical applications.

Practice Questions 5 questions