What is the primary reason for converting a context-free grammar (CFG) to Chomsky Normal Form?
ACNF grammars generate strictly larger languages than the original grammars, making them more expressive
BCNF simplifies the grammar to minimize the number of production rules, reducing storage
CCNF's uniform binary structure enables the CYK parsing algorithm to run in O(n³) time
DCNF eliminates ambiguity — converting to CNF ensures every string has a unique parse tree
The main payoff of CNF is algorithmic: when every rule produces exactly two variables or one terminal, parse trees are full binary trees of predictable depth. The CYK algorithm exploits this by using dynamic programming over substrings, checking all possible binary splits. Without CNF's uniformity, rules of arbitrary length make this approach much harder to formulate. CNF does not expand the language (it preserves it), does not minimize rule count (it often adds rules), and does not guarantee unambiguity.
Question 2 Multiple Choice
A grammar G generates language L. After converting G to CNF form G', which statement is correct?
AG' generates a subset of L, because CNF's restrictions eliminate some derivations
BG' generates a superset of L, because the binarization step adds new variables that can derive more strings
CG' generates exactly L (or L minus the empty string if ε ∈ L), because CNF conversion preserves the language
DG' is guaranteed to be unambiguous, while G may have been ambiguous
CNF conversion is language-preserving: the converted grammar generates the same set of strings as the original, with the minor caveat that CNF cannot represent ε-productions, so if ε was in the original language it may be excluded. The key misconception is that restricting the form of productions restricts what can be generated — it doesn't, because the conversion systematically introduces new variables and rules that simulate the original behavior within the new format.
Question 3 True / False
Converting a CFG to Chomsky Normal Form changes the language the grammar generates, because removing ε-productions and unit productions eliminates some derivations.
TTrue
FFalse
Answer: False
This is the most common misconception about CNF conversion. The process is carefully designed to preserve the language: ε-productions are eliminated by creating new rules that account for every context in which the ε-deriving variable appears (with and without that variable); unit productions are eliminated by tracing rename chains and creating direct rules. The language is preserved — possibly minus ε if it was originally in the language, since CNF grammars cannot derive the empty string.
Question 4 True / False
In a parse tree generated by a grammar in Chomsky Normal Form, every internal node has exactly two children.
TTrue
FFalse
Answer: True
CNF allows only two types of rules: A → BC (a variable produces two variables) and A → a (a variable produces one terminal). The A → BC rule creates an internal node with exactly two children; A → a creates a leaf. Therefore every internal (non-leaf) node has exactly two children, making the parse tree a full binary tree. This regularity is what allows CYK to use a clean n×n table with exactly n−1 splits for a string of length n.
Question 5 Short Answer
Why must ε-productions be eliminated before unit productions when converting a CFG to CNF, rather than in any order?
Think about your answer, then reveal below.
Model answer: Eliminating unit productions (A → B) is done by tracing rename chains and replacing them with direct rules. But if ε-productions still exist when you do this, a unit chain might involve a nullable variable — one that can derive ε — creating new ε-productions that weren't there before. By eliminating ε-productions first, you ensure that when you collapse unit chains, no new ε-productions are reintroduced. Similarly, removing useless symbols after both steps avoids removing variables that are only 'useless' because earlier steps haven't been completed yet.
This ordering dependency is a practical application of the general principle that normalization algorithms often have correctness-preserving orderings. Applying the steps out of order can produce a grammar that looks normalized but silently generates a different language than intended. The correctness proof of CNF conversion relies on each step operating on a grammar that has already satisfied all the invariants established by prior steps.