Questions — Scanner Generator Implementation

Question 1 Multiple Choice

A scanner specification lists the keyword 'if' before the general identifier pattern [a-zA-Z_][a-zA-Z0-9_]*. When the scanner processes the input 'iffy', which token does it produce?

A. Two tokens: keyword 'if' followed by identifier 'fy'

B. One token: identifier 'iffy', because the longest match rule takes precedence

C. A lexical error, because 'iffy' partially matches both a keyword and an identifier

D. One token: keyword 'if', because keywords always have highest priority
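
To experiment with this question, the sketch below models a scanner that picks the longest match at each position and breaks ties in favor of the rule listed first. This is the convention used by lex-style generators, but the rule names (KEYWORD_IF, IDENTIFIER) and the helper functions are hypothetical and only illustrate the setup described in the question.

```python
import re

# Hypothetical token specification, listed in priority order:
# an earlier rule wins only when two matches have the same length.
TOKEN_SPEC = [
    ("KEYWORD_IF", re.compile(r"if")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")),
]


def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the longest match starting at pos."""
    best = None
    for name, pattern in TOKEN_SPEC:
        m = pattern.match(text, pos)
        # Strictly-greater comparison keeps the earlier rule on equal lengths.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def tokenize(text):
    """Split text into tokens by repeatedly taking the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"lexical error at position {pos}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens
```

Calling tokenize("iffy") and tokenize("if") and comparing the two results is one way to check your answer.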

Question 2 Multiple Choice

Why does a scanner generator convert the combined NFA to a DFA before emitting scanner code, rather than simulating the NFA directly at runtime?

A. NFAs cannot recognize the same languages as DFAs and would miss some tokens

B. DFAs enable deterministic, O(1)-per-character processing: each state and input character maps to exactly one next state, which allows a simple table-driven scanner loop

C. NFAs require exponentially more memory than DFAs and cannot be stored in a transition table

D. DFAs are simpler to construct from regular expressions than NFAs using Thompson's construction
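
For reference while weighing the options, here is a minimal sketch of what a table-driven loop over a DFA might look like. The two-state identifier DFA, the character classes, and all names are assumptions chosen for illustration, not the output of any particular scanner generator.

```python
# Hypothetical DFA for identifiers: state 0 is the start state,
# state 1 is the only accepting state.
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}


def char_class(ch):
    """Map a character to the column it selects in the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def longest_match(text, pos=0):
    """Run the DFA from pos and return the longest accepted lexeme, or None."""
    state, last_accept, i = 0, None, pos
    while i < len(text):
        # One table lookup per input character; a missing entry means "stuck".
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:
            break
        state, i = nxt, i + 1
        if state in ACCEPTING:
            last_accept = i  # remember the end of the most recent accepted prefix
    return None if last_accept is None else text[pos:last_accept]
```

The loop never tracks more than one current state, which is the property the question asks you to contrast with simulating an NFA.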

Question 3 True / False

A scanner generator combines all token patterns into a single NFA (using alternation) before converting to a DFA, so that the resulting DFA can classify tokens from any of the specified patterns in a single left-to-right pass.

T. True

F. False

Question 4 True / False

Because scanner generators use regular expressions, a sufficiently complex regex can recognize inputs with balanced nested parentheses, eliminating the need for a separate parser phase.

T. True

F. False

Question 5 Short Answer

Describe the pipeline from a regular expression specification to executable scanner code. What happens at each stage and why?

