Questions: Regular Expressions (Formal Language Theory)
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
Which of the following languages CANNOT be described by any formal regular expression?
AAll binary strings containing at least one '1'
BAll strings over {a, b} where every 'a' is immediately followed by a 'b'
CAll strings of the form aⁿbⁿ for n ≥ 0 (equal numbers of a's then b's)
DAll strings over {0, 1} that end in '00'
The language {aⁿbⁿ : n ≥ 0} is not regular — no finite automaton can count unboundedly, and no regular expression can describe it. The other three languages ARE regular: 'at least one 1' is (0∪1)*1(0∪1)*, 'every a followed by b' is (ab∪b)*, and 'ends in 00' is (0∪1)*00. The pumping lemma for regular languages formally proves aⁿbⁿ is non-regular, but the intuition is that you'd need to 'remember' how many a's you've seen to match them with b's — a finite automaton has no such memory.
Question 2 Multiple Choice
What language does the regular expression ab* denote?
AZero or more repetitions of the string 'ab'
BThe string 'a' followed by zero or more 'b's
CEither 'a' or zero or more 'b's
DOne or more 'b's, optionally preceded by 'a'
Operator precedence is the key: Kleene star binds tighter than concatenation. So ab* is parsed as a(b*) — the symbol 'a' concatenated with zero-or-more 'b's. This gives the language {a, ab, abb, abbb, ...}. To get zero or more repetitions of 'ab' (option A), you would need parentheses: (ab)*, which gives {ε, ab, abab, ababab, ...}. Getting precedence wrong is the most common error when writing or reading regular expressions.
Question 3 True / False
The regular expressions (ab)* and ab* describe the same language over the alphabet {a, b}.
TTrue
FFalse
Answer: False
(ab)* describes zero or more repetitions of the pair 'ab': {ε, ab, abab, ababab, ...}. ab* describes 'a' followed by zero or more 'b's: {a, ab, abb, abbb, ...}. These are entirely different sets. For example, 'abab' is in (ab)* but not in ab*; 'abb' is in ab* but not in (ab)*; and ε is in (ab)* but not in ab*. The difference stems from operator precedence: star binds to its immediate left operand — just 'b' in ab*, but the grouped '(ab)' when parenthesized.
Question 4 True / False
Any language accepted by a nondeterministic finite automaton (NFA) can be described by a formal regular expression using only union, concatenation, and Kleene star.
TTrue
FFalse
Answer: True
This is Kleene's theorem, one of the foundational results in formal language theory. It establishes a three-way equivalence: DFAs, NFAs, and formal regular expressions all describe exactly the same class of languages — the regular languages. This means the three operations (union, concatenation, star) are not just convenient notation; they are exactly sufficient to characterize everything a finite automaton can do. No additional operations are needed, and no additional operations can expand the class.
Question 5 Short Answer
Formal regular expressions use only three operations — union, concatenation, and Kleene star. Why does this minimal set describe exactly the same languages that finite automata can recognize?
Think about your answer, then reveal below.
Model answer: The three operations correspond directly to the structural operations on automata. Union corresponds to combining two NFAs with a new start state and epsilon transitions to each. Concatenation corresponds to connecting the accept states of one NFA to the start state of another. Kleene star corresponds to adding epsilon transitions from accept states back to the start state. Thompson's construction converts any regular expression to an NFA using these correspondences, and the state elimination algorithm converts any NFA back to a regular expression, proving the equivalence is exact.
The deep point is that these three operations are not arbitrary — they are exactly the operations you can perform on finite automata while staying within the class of finite automata. More powerful operations like backreferences (which 'remember' a matched string) require more computational power than a finite automaton has, which is why PCRE regex engines that support backreferences can match some non-regular languages.