A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Longest Common Subsequence (LCS) Problem

College · Depth 89 in the knowledge graph

51 topics build on this

469 prerequisites beneath it


Core Idea

The longest common subsequence problem finds the longest sequence of characters that appears in the same order (not necessarily contiguously) in two strings. DP solution: dp[i][j] = length of the LCS of the first i characters of string A and the first j characters of string B. Base cases: dp[0][j] = dp[i][0] = 0. Recurrence: if A[i−1] == B[j−1], dp[i][j] = dp[i−1][j−1] + 1; else dp[i][j] = max(dp[i−1][j], dp[i][j−1]).

How It's Best Learned

Trace the DP table by hand on short strings. Implement and reconstruct the LCS from the table. Test on various examples including repeated characters. See LCS as the foundation for edit distance and diff algorithms.

Common Misconceptions

- LCS is the same as edit distance (related but different: LCS finds a common subsequence; edit distance counts the minimum number of edits).
- LCS finds contiguous matches (no: it preserves order but can skip characters).
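To make the first misconception concrete, the sketch below computes both quantities for one pair of strings. The function names `lcs_length` and `levenshtein` are illustrative choices, not part of any library, and the Levenshtein version assumes unit cost for insertions, deletions, and substitutions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                 # delete from a
                dp[i][j - 1] + 1,                 # insert into a
                dp[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dp[m][n]


a, b = "kitten", "sitting"
print(lcs_length(a, b))   # 4 (for example "ittn")
print(levenshtein(a, b))  # 3
# With insertions and deletions only, the edit count is tied to the LCS:
print(len(a) + len(b) - 2 * lcs_length(a, b))  # 5
```

The two numbers measure different things: the LCS length counts what the strings share, while edit distance counts what has to change. They are linked exactly only when substitutions are disallowed, as the last line shows.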

Explainer

From your study of dynamic programming, you know the core pattern: define a subproblem, write a recurrence that relates larger subproblems to smaller ones, and fill in a table bottom-up to avoid redundant computation. The Longest Common Subsequence (LCS) problem is one of the cleanest applications of this pattern. Given two strings, say "ABCBDAB" and "BDCAB", you want the longest sequence of characters that appears in both strings in the same order, though not necessarily consecutively. Here one answer is "BCAB" (length 4); the LCS need not be unique, and "BDAB" is another common subsequence of the same length. Notice that "B", "C", "A", "B" appear in that order in both strings, but they do not form a contiguous block in either one. This distinction between a subsequence (same order, gaps allowed) and a substring (same order, no gaps) is critical.
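The subsequence versus substring distinction can be checked directly. The helper below is a small sketch; the name `is_subsequence` is an illustrative choice, and the substring test simply uses Python's built-in `in` operator on strings.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be read off text left to right, gaps allowed."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters can only match further to the right.
    return all(ch in remaining for ch in candidate)


a, b = "ABCBDAB", "BDCAB"
print(is_subsequence("BCAB", a), is_subsequence("BCAB", b))  # True True
print("BCAB" in a, "BCAB" in b)  # False False: not a substring of either
```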

The DP formulation builds a two-dimensional table `dp[i][j]` where each cell holds the length of the LCS of the first `i` characters of string A and the first `j` characters of string B. The base cases are straightforward: `dp[0][j] = 0` and `dp[i][0] = 0`, because the LCS of any string with an empty string is empty. The recurrence handles two cases. If the characters match (`A[i-1] == B[j-1]`), then this matching character extends the best solution from `dp[i-1][j-1]` by one: `dp[i][j] = dp[i-1][j-1] + 1`. If they don't match, you take the better of two options: skip the current character of A (`dp[i-1][j]`) or skip the current character of B (`dp[i][j-1]`). This "match or skip" logic is why LCS calls for DP rather than a greedy scan: you cannot simply take the first match you find, because an early match might block a longer overall subsequence.
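The recurrence above translates almost line for line into code. This is a minimal sketch that returns the whole table so it can be inspected; the name `lcs_table` is illustrative, and the type hints assume Python 3.9 or later.

```python
def lcs_table(a: str, b: str) -> list[list[int]]:
    """Fill dp so that dp[i][j] is the LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Match: extend the LCS of the two shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: drop the last character of a or of b.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


table = lcs_table("ABCBDAB", "BDCAB")
for row in table:
    print(row)
print(table[-1][-1])  # 4
```

Printing the rows gives the same grid you would fill in by hand, which makes it a convenient way to check a manual trace.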

Once you fill the entire table, `dp[m][n]` gives you the length of the LCS, where m and n are the lengths of the two strings. But often you want the actual subsequence, not just its length. To reconstruct the LCS, start at `dp[m][n]` and trace backward: at cell `(i, j)`, if `A[i-1] == B[j-1]`, that character is part of the LCS, so record it and move diagonally to `dp[i-1][j-1]`. If they don't match, move to the larger neighbor (up to `dp[i-1][j]` or left to `dp[i][j-1]`); when the two are equal, either direction leads to a valid LCS. Stop when `i` or `j` reaches 0. The characters are collected from last to first, so reverse them at the end. This backtracking follows the decisions the table encoded during the forward pass.
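The backtracking procedure can be sketched as follows. The function name `lcs` is illustrative, and breaking ties by moving up is an arbitrary choice made here; moving left on ties would be equally valid and may return a different LCS of the same length.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # tie broken in favor of moving up (an arbitrary choice)
        else:
            j -= 1
    chars.reverse()  # characters were collected last to first
    return "".join(chars)


result = lcs("ABCBDAB", "BDCAB")
print(result, len(result))
```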

LCS has deep practical significance. The Unix `diff` utility, which shows differences between two files, is at its core an LCS computation over lines: the common subsequence represents unchanged lines, and everything else is an insertion or deletion. Version control systems like Git use similar algorithms to compare and merge changes. LCS is also closely related to edit distance (Levenshtein distance), which uses the same table structure and adds a cost for substitutions. If you understand LCS well, edit distance is a natural next step: instead of just matching or skipping, you also allow replacing one character with another at a cost. The standard algorithm takes O(n × m) time and O(n × m) space. When you only need the length, the space can be reduced to O(min(n, m)) using the rolling-array optimization you likely saw in your DP introduction, since each row depends only on the previous row; the running time stays O(n × m). Reconstructing the subsequence itself still needs the full table, or a more elaborate divide-and-conquer scheme such as Hirschberg's algorithm.
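A minimal sketch of the rolling-array idea, assuming only the length is wanted. The name `lcs_length_two_rows` is illustrative. It keeps two rows at a time and swaps the inputs so that each row has the length of the shorter string plus one.

```python
def lcs_length_two_rows(a: str, b: str) -> int:
    """LCS length using O(min(len(a), len(b))) extra space."""
    if len(b) > len(a):
        a, b = b, a  # make b the shorter string so rows are as short as possible
    prev = [0] * (len(b) + 1)  # row for the previous prefix of a
    for ch in a:
        curr = [0]  # column 0 is always 0
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr  # the finished row becomes the "previous" row
    return prev[-1]


print(lcs_length_two_rows("ABCBDAB", "BDCAB"))  # 4
```

Because earlier rows are discarded, this version cannot backtrack to recover the subsequence; it trades that ability for the smaller memory footprint.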


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → Defining and Calling Functions → Functions: Decomposing Problems → Function Parameters and Argument Passing → Return Values → Variable Scope → Introduction to Classes → Objects and Instances → Methods and Attributes → Algorithm Design Basics → Tree Structure and Node Properties → Binary Trees → Tree Traversals → Depth-First Search (DFS) → Depth-First Search: Implementation and Applications → Topological Sort → Dynamic Programming → Longest Common Subsequence (LCS) Problem

Longest path: 90 steps · 469 total prerequisite topics

Prerequisites (1)

Dynamic Programming (hard)

Leads To (3)

Edit Distance: Levenshtein Distance and DP (soft)
Floyd-Warshall Algorithm for All-Pairs Shortest Paths (soft)
Longest Increasing Subsequence (LIS) Problem (soft)