A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Parallel Algorithms and the PRAM Model

Research Depth 88 in the knowledge graph

501 prerequisites beneath it

Core Idea

The PRAM (Parallel Random Access Machine) is the standard theoretical model for shared-memory parallel computation: p processors operate synchronously on a shared memory, executing one instruction per step. The key complexity measures are work (total operations across all processors) and depth (longest chain of sequential dependencies, also called span or parallel time). Brent's theorem connects these: any algorithm with work W and depth D can be executed on p processors in time O(W/p + D). The complexity class NC (Nick's Class) captures problems solvable in polylogarithmic depth with polynomial work -- the parallel analog of P. Classic results include O(log n)-depth parallel prefix sums, O(log² n)-depth sorting networks, and the striking fact that some problems in P appear to be inherently sequential (P-complete problems), admitting no significant parallel speedup.

Explainer

Sequential algorithm analysis asks one question: how many steps does the algorithm take? Parallel algorithm analysis asks two: work (total operations, summed over all processors) and depth (the longest chain of dependent operations that must execute sequentially). These two measures capture fundamentally different aspects of an algorithm's parallelizability. An algorithm with small depth can exploit many processors simultaneously, while an algorithm with small work avoids wasting computation. The ideal is both: work matching the best sequential algorithm and polylogarithmic depth.
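To make the two measures concrete, here is a minimal sketch that sums a list by pairwise combination and counts both quantities. The function name `tree_reduce_sum` and the round-by-round simulation are illustrative assumptions, not part of any library; the inner loop stands in for operations that separate processors could perform in the same step.

```python
def tree_reduce_sum(values):
    """Sum `values` by pairwise combination, one parallel round at a time.

    Returns (total, work, depth): work counts additions over all rounds,
    depth counts rounds (the additions inside one round are independent).
    """
    level = list(values)
    work = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        # Every pair in this round could be added by its own processor at once.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])
            work += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1
    total = level[0] if level else 0
    return total, work, depth


total, work, depth = tree_reduce_sum(range(1, 17))
print(total, work, depth)  # 136 15 4
```

For 16 inputs the sketch performs 15 additions, the same as a sequential loop, but arranges them in only 4 dependent rounds: work n - 1 and depth ceil(log2 n).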

Brent's theorem bridges theory and practice by showing that any algorithm with work W and depth D can run on p processors in O(W/p + D) time. The W/p term represents the work distributed evenly among processors; the D term represents the inherent sequential bottleneck. When p is much smaller than W/D (the algorithm's parallelism), the running time is approximately W/p -- linear speedup. When p exceeds W/D, adding more processors does not help because the algorithm is depth-bound. This theorem justifies focusing on work-efficient algorithms (W within a constant factor of the best sequential running time): for any p up to about W/D, they run roughly p times faster than the best sequential algorithm.
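The bound can be checked on a small example. The sketch below assumes the computation is given as a list of level sizes (the number of independent operations at each dependency level) and schedules each level in chunks of at most p operations; the function names are hypothetical and chosen only for illustration.

```python
import math


def level_by_level_time(level_sizes, p):
    """Steps used when p processors run each level in chunks of at most p operations."""
    return sum(math.ceil(w / p) for w in level_sizes)


def brent_bound(level_sizes, p):
    """W/p + D for the same computation (W = total operations, D = number of levels)."""
    work = sum(level_sizes)
    depth = len(level_sizes)
    return work / p + depth


# Levels of a balanced tree reduction over 1024 inputs: 512, 256, ..., 1 operations.
levels = [1024 >> k for k in range(1, 11)]
for p in (1, 4, 32, 102, 1024):
    print(p, level_by_level_time(levels, p), brent_bound(levels, p))
```

Here W = 1023 and D = 10, so the parallelism W/D is about 102. Each level costs ceil(w/p) steps, which is less than w/p + 1, so summing over the D levels gives a total below W/p + D. Running the loop shows the time falling almost in proportion to p while p is well below 102, then flattening at the depth.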

The PRAM model provides the theoretical framework. It assumes p processors sharing a common memory, operating in lockstep. PRAM variants differ in memory access rules: EREW (exclusive read, exclusive write) is the most restrictive and models real hardware most closely; CRCW (concurrent read, concurrent write) is the most permissive and simplifies algorithm design. The difference matters: computing the OR of n bits takes O(1) depth on CRCW (every processor holding a 1 writes to the same output cell concurrently) but Omega(log n) on EREW. In general, a CRCW algorithm can be simulated on EREW with a logarithmic slowdown, so the variants agree on which problems are solvable in polylogarithmic depth with polynomially many processors, but the depth of a given algorithm can differ between them by a logarithmic factor.
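The OR example can be sketched as a sequential simulation that counts parallel steps. Both functions below are illustrative assumptions: `crcw_or` models the common-write rule (all concurrent writers store the same value), and `erew_or` combines cells pairwise so that no cell is read or written by two processors in the same step.

```python
def crcw_or(bits):
    """Simulate OR on a common-write CRCW PRAM: one synchronous step."""
    cell = 0  # shared output cell, assumed initialised to 0
    # Conceptually processor i runs this body in the same step as all others;
    # the loop only serialises the simulation.
    for b in bits:
        if b:
            cell = 1  # every concurrent writer writes the same value
    return cell, 1  # (result, parallel steps)


def erew_or(bits):
    """Simulate OR on an EREW PRAM by pairwise combining over disjoint cells."""
    cells = [1 if b else 0 for b in bits]
    steps = 0
    stride = 1
    while stride < len(cells):
        # Processor i touches only cells i and i + stride; these pairs are disjoint.
        for i in range(0, len(cells) - stride, 2 * stride):
            cells[i] |= cells[i + stride]
        stride *= 2
        steps += 1
    return (cells[0] if cells else 0), steps


print(crcw_or([0, 0, 0, 0, 0, 0, 0, 1]))  # (1, 1)
print(erew_or([0, 0, 0, 0, 0, 0, 0, 1]))  # (1, 3)
```

The step counts illustrate the gap described above: one step with concurrent writes versus ceil(log2 n) rounds when every access must be exclusive.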

The parallel prefix (scan) operation is the workhorse of PRAM algorithms. Given an array [x_1, ..., x_n] and an associative operator (written + here, though it need not be addition), it computes all prefixes [x_1, x_1 + x_2, ..., x_1 + ... + x_n] in O(n) work and O(log n) depth, assuming each application of the operator takes constant time. This deceptively simple primitive underlies an enormous range of parallel algorithms: array compaction (removing marked elements while preserving order), load balancing, segmented operations, tree computations (Euler tour technique), and even sorting (for example, radix sort). Once you can do parallel prefix, many problems that seem inherently sequential yield to elegant parallel solutions.
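One well-known way to reach O(n) work and O(log n) depth is the two-phase up-sweep/down-sweep scan usually attributed to Blelloch. The sketch below is a sequential simulation under stated assumptions: the input is padded to a power of two with an `identity` element supplied by the caller, and the function name and counters are hypothetical additions for illustration.

```python
import operator


def blelloch_inclusive_scan(xs, op=operator.add, identity=0):
    """Work-efficient scan: up-sweep then down-sweep over an implicit balanced tree.

    Returns (prefixes, work, depth): work counts applications of `op`,
    depth counts parallel rounds.
    """
    n = len(xs)
    if n == 0:
        return [], 0, 0
    size = 1
    while size < n:
        size *= 2
    a = list(xs) + [identity] * (size - n)  # pad to a power of two
    work = depth = 0

    # Up-sweep: each internal tree node receives the combination of its subtree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):  # independent in a round
            a[i] = op(a[i - stride], a[i])
            work += 1
        stride *= 2
        depth += 1

    # Down-sweep: push prefixes back down, producing an exclusive scan.
    a[size - 1] = identity
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] = op(a[i], left)  # prefix first, then left subtree: order matters
            work += 1
        stride //= 2
        depth += 1

    # One extra parallel round converts the exclusive scan to an inclusive one.
    inclusive = [op(a[i], xs[i]) for i in range(n)]
    return inclusive, work + n, depth + 1


print(blelloch_inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# ([3, 4, 11, 11, 15, 16, 22, 25], 22, 7)
```

For 8 inputs the sketch applies the operator 22 times in 7 rounds: linear work and about 2 log2 n depth. Because operand order is preserved in both sweeps, it also works for associative operators that are not commutative, such as string concatenation with `identity=""`.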

The complexity class NC formalizes "efficiently parallelizable": problems solvable with polynomial work and polylogarithmic depth. NC is contained in P (polylog depth, polynomial work implies polynomial sequential time), but whether NC equals P is a major open question. P-complete problems -- like the Circuit Value Problem (evaluating a Boolean circuit) -- are the hardest problems in P for parallel computation: they are in NC only if NC = P. The existence of P-complete problems suggests that some polynomial-time computations are inherently sequential, resisting any significant parallel speedup. This parallels the NP-completeness story: just as NP-complete problems are believed to require superpolynomial sequential time, P-complete problems are believed to require more than polylogarithmic depth when only polynomially many processors are available.
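The Circuit Value Problem is easy to solve sequentially, which is what makes its apparent resistance to parallelism notable. The sketch below assumes a hypothetical encoding (gates listed in topological order as `(name, op, operand_names)` tuples) and evaluates the circuit in one pass while recording the depth of each wire.

```python
def evaluate_circuit(inputs, gates):
    """Evaluate a Boolean circuit whose gates are listed in topological order.

    `inputs` maps input names to bools. Each gate is (name, op, operand_names)
    with op in {"AND", "OR", "NOT"}. Returns (values, depth), where depth[name]
    is the length of the longest path from any input to that wire.
    """
    values = dict(inputs)
    depth = {name: 0 for name in inputs}
    for name, op, operands in gates:
        args = [values[o] for o in operands]
        if op == "AND":
            values[name] = all(args)
        elif op == "OR":
            values[name] = any(args)
        elif op == "NOT":
            values[name] = not args[0]
        else:
            raise ValueError(f"unknown gate type: {op}")
        # A gate cannot fire before its deepest operand is known.
        depth[name] = 1 + max(depth[o] for o in operands)
    return values, depth


inputs = {"x1": True, "x2": False, "x3": True}
gates = [
    ("g1", "AND", ["x1", "x2"]),
    ("g2", "NOT", ["g1"]),
    ("g3", "OR", ["g2", "x3"]),
    ("g4", "AND", ["g3", "g2"]),
]
values, depth = evaluate_circuit(inputs, gates)
print(values["g4"], depth["g4"])  # True 4
```

Sequential evaluation takes time linear in the circuit size. The straightforward parallel strategy, evaluating all gates of one level together, needs as many rounds as the circuit is deep, and no general method is known that shortens this to polylogarithmic depth for arbitrary circuits.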

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → Defining and Calling Functions → Functions: Decomposing Problems → Function Parameters and Argument Passing → Return Values → Variable Scope → Introduction to Classes → Objects and Instances → Methods and Attributes → Algorithm Design Basics → Asymptotic Notation: Big-O, Big-Omega, Big-Theta → Big-O Notation and Complexity Analysis → Time and Space Complexity → Binary Search → Divide and Conquer → Merge Sort → Parallel Algorithms and the PRAM Model

Longest path: 89 steps · 501 total prerequisite topics

Prerequisites (4)

Big-O Notation and Complexity Analysis (hard) · Divide and Conquer (hard) · Merge Sort (soft) · Breadth-First Search (BFS) (soft)

Leads To (0)

No topics depend on this one yet.