CYK Algorithm and Membership Testing

Graduate Depth 65 in the knowledge graph I know this Set as goal
cyk parsing membership dynamic-programming cubic-time

Core Idea

The Cocke-Younger-Kasami algorithm tests whether a string belongs to a CFL in O(n³) time, assuming grammar in CNF. It builds a table: entry [i,j] lists non-terminals deriving substring i..j. CYK is polynomial—optimal for arbitrary CFGs without compilation overhead—making it crucial for parsing without grammar restrictions.

How It's Best Learned

Trace CYK on a small example (e.g., balanced parentheses grammar). Fill the table bottom-up, checking which rules produce needed sub-derives.

Explainer

The fundamental question in parsing is: given a grammar G and a string w, does G generate w? For context-free grammars, the CYK algorithm (Cocke-Younger-Kasami) answers this question in O(n³) time using dynamic programming — a technique you have seen in other contexts, where you solve small subproblems first and combine their solutions to tackle larger ones. CYK requires the grammar to be in Chomsky Normal Form (CNF), where every production is either A → BC (two nonterminals) or A → a (a single terminal). Your prerequisite on grammar normal forms showed that any CFG can be converted to CNF, so this is not a real restriction.

The algorithm builds a triangular table where each cell T[i,j] stores the set of nonterminals that can derive the substring of w from position i to position j. The base case fills the bottom row: for each single character w[i], T[i,i] contains every nonterminal A such that A → w[i] is a production rule. This is straightforward — you are just asking which rules directly produce each individual symbol.

The recursive case is where the dynamic programming logic appears. To fill T[i,j] for a substring of length greater than 1, you try every way to split the substring into two non-empty parts: positions i to k and k+1 to j, for every valid k. For each split, you check whether there is a production A → BC where B ∈ T[i,k] and C ∈ T[k+1,j]. If so, A goes into T[i,j]. You are asking: "Can I split this substring into two pieces, each derivable from nonterminals that combine via some rule?" By working bottom-up from substrings of length 1 to length n, every cell you need has already been computed by the time you need it.

The string w belongs to L(G) if and only if the start symbol S appears in T[1,n] — the cell representing the entire string. The cubic time complexity comes from three nested loops: O(n) choices for substring length, O(n) choices for starting position, and O(n) choices for the split point. While O(n³) is not fast enough for production parsers that process millions of lines (specialized parsers for restricted grammar subclasses run in linear time), CYK is theoretically important because it works for *any* CFG without requiring the grammar to have special structure beyond CNF. It is also the foundation for probabilistic parsing, where you attach probabilities to grammar rules and use the CYK table to find the most likely parse.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsSystems of Equations — Graphing MethodSystems of Equations — Elimination MethodSystems of Three VariablesMatrices IntroductionGraph Representation: Matrices and ListsGraph Paths, Cycles, and ConnectivityTrees and Spanning TreesBinary TreesTree TraversalsBreadth-First Search (BFS)Topological SortDynamic ProgrammingCYK Algorithm and Membership Testing

Longest path: 66 steps · 359 total prerequisite topics

Prerequisites (2)

Leads To (0)

No topics depend on this one yet.