A query first joins a 1,000,000-row Employees table with a 500,000-row Transactions table, then filters for employees in department 'Engineering'. A database optimizer rewrites this to filter Employees first (returning ~5,000 rows) before performing the join. What principle justifies this rewrite?
AProjection elimination — removing unused columns reduces data size
BPushing selection down — applying filters early reduces rows flowing into expensive operations
DClosure property — operators always return relations, enabling any reorder
Pushing selections (filters) as early as possible is the single most impactful query optimization rule. Joining 1,000,000 rows × 500,000 rows before filtering produces a 500-billion-row intermediate result; filtering first reduces one input to ~5,000 rows before the join. Relational algebra proves these two expressions are equivalent, so the optimizer can safely substitute the cheaper plan. The closure property (option D) enables composition but does not itself justify reordering.
Question 2 Multiple Choice
A student writes π_{name}(Employees) and gets back 950 rows from a 1,000-row table. In SQL, SELECT name FROM Employees returns 1,000 rows. What explains the difference?
ASQL projection and relational algebra projection are computed differently — SQL is slower
BRelational algebra projection returns a set (no duplicates), while SQL SELECT returns a bag by default
CThe SQL query is incorrect; it should use DISTINCT to match relational algebra
DRelational algebra projection automatically joins with a key column to count rows
In relational algebra, the result of any operation is a relation, which by definition is a set — duplicate rows are eliminated. SQL's SELECT preserves duplicates (it operates on bags/multisets) unless you explicitly add DISTINCT. So π_{name}(Employees) and SELECT name FROM Employees are not equivalent unless the SQL uses SELECT DISTINCT name. This is a foundational difference between the formal algebra and its SQL implementation.
Question 3 True / False
The closure property of relational algebra means that operators can be arbitrarily nested and composed, feeding one operator's output into another's input.
TTrue
FFalse
Answer: True
Closure is the defining feature: every operator takes relations as input and produces a relation as output. This means the result of any expression is itself a valid input to any other operator, enabling arbitrary nesting. For example, σ_{salary > 100000}(π_{name,salary,dept}(Employees)) first projects columns, then filters rows — the intermediate relation produced by π feeds directly into σ.
Question 4 True / False
The natural join (⋈) between two relations produces the same result as a Cartesian product (×) when the relations share no attribute names.
TTrue
FFalse
Answer: True
When two relations share no column names, there are no shared attributes to join on, so the natural join condition is vacuously satisfied for all pairs — every row from one relation matches every row from the other. The result is identical to the Cartesian product. This is a precise and often surprising consequence of the definition. The natural join's useful behavior — automatic matching and duplicate column elimination — only kicks in when shared attribute names exist.
Question 5 Short Answer
Why does the real power of relational algebra lie in query optimization rather than query writing, and what makes 'pushing selections before joins' a valid optimization?
Think about your answer, then reveal below.
Model answer: Relational algebra defines precise equivalence rules proving that different orderings of operators produce identical results. Optimizers use these rules to substitute cheaper execution plans for the user's original query. Pushing selections before joins is valid because σ_{condition}(A ⋈ B) ≡ σ_{condition}(A) ⋈ B when the condition only involves attributes from A — both expressions produce the same final relation, but the right side joins smaller intermediate tables. The equivalence is algebraically provable, making the substitution safe regardless of data values.
SQL is for expressing what data you want; relational algebra is the substrate for proving that different ways of getting it are equivalent. Without a formal algebra, an optimizer would have to either enumerate all possible orderings (infeasible) or trust heuristics (unreliable). The algebra provides guaranteed-correct rewrite rules, which is why it is not merely an academic formalism but an active component of every serious database system.