Questions: Tokenization and Lexemes

5 questions to test your understanding


Question 1 Multiple Choice

A tokenizer encounters the input `<=`. It has defined patterns: `<` maps to LESS_THAN, and `<=` maps to LESS_EQUAL. Which token does it produce, and why?

A. Two tokens: LESS_THAN for `<` followed by EQUAL for `=`

B. One token: LESS_EQUAL for `<=`, because the longest-match rule selects the pattern that matches the most characters

C. An error: `<=` is ambiguous because both patterns could apply

D. One token: LESS_THAN, because simpler patterns take priority

Question 2 Multiple Choice

In the tokenized output for the source text `if (count >= 10)`, what is the *lexeme* for the `>=` operator?

A. GE (the token type name)

B. `>=` (the actual two-character substring from the source code)

C. 2 (the character count of the match)

D. Both `>=` and GE together, because a lexeme is always a type-value pair

Question 3 True / False

The keyword `while` in most programming languages would be tokenized as an IDENTIFIER, because it matches the identifier pattern `[a-zA-Z_][a-zA-Z0-9_]*`.

T. True

F. False

Question 4 True / False

In most compilers, whitespace and comments between tokens are consumed by the tokenizer but not emitted as tokens in the output sequence.

T. True

F. False

Question 5 Short Answer

Explain the difference between a lexeme and a token, and give a concrete example showing why the distinction matters.
