Hash Functions and Collision Resistance

Graduate Depth 66 in the knowledge graph I know this Set as goal
Unlocks 29 downstream topics
hash-function collision-resistance preimage-resistance sha-256 merkle-damgard birthday-attack

Core Idea

A cryptographic hash function maps arbitrary-length inputs to fixed-length outputs and must satisfy three security properties: preimage resistance (given h, hard to find m with H(m) = h), second-preimage resistance (given m, hard to find m' != m with H(m) = H(m')), and collision resistance (hard to find any pair m != m' with H(m) = H(m')). Collision resistance is the strongest property and is limited by the birthday bound — O(2^{n/2}) for an n-bit hash. SHA-256 (256-bit output, birthday bound 2^128) is the current standard. Hash functions are foundational building blocks for MACs, digital signatures, commitment schemes, and proof-of-work systems.

Explainer

A cryptographic hash function H takes an input of any length and produces a fixed-length output (the hash or digest). SHA-256, the current workhorse, produces 256-bit digests. Unlike encryption, hashing is a one-way operation with no key and no decryption — the same input always produces the same output, and the goal is to make it infeasible to work backward from output to input. Hash functions serve as the cryptographic equivalent of fingerprints: a compact, deterministic summary that is practically impossible to forge.

Three security properties define a good cryptographic hash. Preimage resistance means that given a hash value h, it is infeasible to find any message m such that H(m) = h. Second-preimage resistance means that given a message m, it is infeasible to find a different message m' with H(m') = H(m). Collision resistance means it is infeasible to find any pair of distinct messages m, m' with H(m) = H(m'). Collision resistance is the strongest property and implies second-preimage resistance (but not vice versa). The generic attack cost for collisions is determined by the birthday bound: approximately 2^{n/2} hash evaluations for an n-bit hash, because random collisions among 2^{n/2} values are expected by the birthday paradox.

Most deployed hash functions use the Merkle-Damgard construction, which processes a message in fixed-size blocks using a compression function. An initial value (IV) is updated by absorbing each block through the compression function, and the final state becomes the hash. This is elegant and provably collision-resistant if the compression function is — but it introduces length extension vulnerabilities: since the output is the full internal state, an attacker who knows H(m) can continue the computation without knowing m, computing H(m || padding || extra) directly. SHA-3 uses the sponge construction instead, which maintains a larger internal state than the output, preventing this attack by design.

Hash functions are ubiquitous in cryptography and systems. They underpin HMAC (hash-based message authentication), digital signatures (sign the hash of a message rather than the full message), commitment schemes (commit to a value by publishing its hash), proof of work (Bitcoin mining finds inputs whose hash starts with many zeros), and data integrity (verify file downloads match expected hashes). Their security is entirely computational — collisions exist by the pigeonhole principle (infinite inputs, finite outputs), but finding them should require brute-force effort proportional to the birthday bound. When this guarantee breaks, as happened with MD5 (practical collision attacks found in 2004) and SHA-1 (collision demonstrated in 2017), the hash function must be retired from security-critical applications.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsLogarithms IntroductionTime and Space ComplexityTime Complexity Classes: P and EXPTIMENondeterministic Time Complexity and NPThe P vs. NP ProblemComplexity Class P: Polynomial TimeHash Functions and Collision Resistance

Longest path: 67 steps · 381 total prerequisite topics

Prerequisites (2)

Leads To (6)