A parallel matrix multiplication program uses multiple threads, each computing one row of the result matrix. Afterward, a normalization pass divides every element by the matrix's maximum value. A programmer wraps each row computation in a mutex to protect shared state. Does this correctly ensure the normalization pass sees all rows computed?
A. Yes: mutexes ensure ordered access, so the normalization thread cannot read a row until it is fully written
B. No: a barrier is needed here; mutexes protect shared data from concurrent access but do not prevent the normalization thread from starting before all row-computing threads have finished their rows
C. Yes: mutexes prevent any concurrent execution, so normalization cannot begin while rows are in progress
D. No: only semaphores can coordinate between computation phases; mutexes only work within a single phase
Mutexes and barriers solve different coordination problems. A mutex ensures that at most one thread accesses a protected resource at a time, which prevents data races. A barrier ensures that no thread advances past a point until all threads have reached it, which prevents one phase from starting before the previous phase is complete. The normalization pass needs a barrier: it must not begin until every row-computing thread has finished. Wrapping individual row writes in a mutex protects against concurrent writes to the same row, but nothing stops the normalization thread from starting while some rows are still being computed, in which case it would take the maximum over an incomplete matrix.
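The explanation above can be sketched in Python, whose `threading.Barrier` plays the role of the barrier discussed here. This is a minimal illustration under assumed inputs: the 2x2 matrices and the names `compute_row`, `normalize`, and `rows_done` are hypothetical, not taken from the question.

```python
import threading

# Hypothetical 2x2 inputs, chosen only for illustration.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
n = len(A)
C = [[0.0] * len(B[0]) for _ in range(n)]

# n row workers plus one normalizer all meet at the same barrier.
rows_done = threading.Barrier(n + 1)

def compute_row(i):
    # Each worker writes only its own row, so no mutex is needed here.
    C[i] = [sum(A[i][k] * B[k][j] for k in range(len(B)))
            for j in range(len(B[0]))]
    rows_done.wait()  # announce that this row is finished

def normalize():
    rows_done.wait()  # blocks until every row worker has arrived
    peak = max(max(row) for row in C)
    for row in C:
        for j in range(len(row)):
            row[j] /= peak

threads = [threading.Thread(target=compute_row, args=(i,)) for i in range(n)]
threads.append(threading.Thread(target=normalize))
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Removing the `rows_done.wait()` call from `normalize` would let it run against a partially filled `C`, which is the failure the question describes; no amount of locking around the row writes would prevent that.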
Question 2 Multiple Choice
In a simple centralized barrier implementation, the last thread to arrive at the barrier increments the counter to N and then calls broadcast on a condition variable rather than signal. Why broadcast?
A. Broadcasting is faster than signaling for groups larger than two threads
B. Signal wakes exactly one waiting thread; broadcast wakes all of them. All N-1 threads waiting at the barrier need to be released simultaneously, not just one
C. Broadcasting automatically resets the counter to zero, which signaling cannot do
D. Broadcasting guarantees FIFO ordering among threads leaving the barrier
pthread_cond_signal wakes at least one waiting thread (in practice usually exactly one, with no guarantee about which); pthread_cond_broadcast wakes all of them. At a barrier, N-1 threads may be sleeping on the condition variable, all waiting for the last arrival. The last thread must wake all of them: if it only signaled, typically one thread would wake up and proceed, and the remaining N-2 would sleep forever. The counter reset to zero is done explicitly in code before the broadcast, not by the broadcast itself.
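A centralized barrier of this kind can be sketched with Python's `threading.Condition`, where `notify_all()` is the analogue of `pthread_cond_broadcast` and `notify()` the analogue of `pthread_cond_signal`. The class name `CentralBarrier` and the `generation` field (added so the barrier can be reused) are assumptions for illustration, not part of the question.

```python
import threading

class CentralBarrier:
    """Hypothetical teaching class: one counter, one lock, one condition variable."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.generation = 0                # lets the barrier be reused safely
        self.cond = threading.Condition()  # bundles the mutex and condition variable

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.parties:
                # Last arrival: reset explicitly, then wake every sleeper.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # The loop guards against spurious wakeups.
                while gen == self.generation:
                    self.cond.wait()

N = 4
barrier = CentralBarrier(N)
log = []
log_lock = threading.Lock()

def worker(tid):
    with log_lock:
        log.append(("before", tid))
    barrier.wait()
    with log_lock:
        log.append(("after", tid))

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every "before" entry lands in `log` ahead of every "after" entry, because no thread leaves `wait()` until all N have entered it. Swapping `notify_all()` for `notify()` would wake only one waiter and leave the rest blocked.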
Question 3 True / False
A barrier can replace a mutex to protect a shared data structure from concurrent writes, since both are synchronization primitives.
T. True
F. False
Answer: False
These primitives serve fundamentally different purposes. A mutex protects a critical section: it allows only one thread inside at a time, preventing concurrent modification of shared data. A barrier coordinates phases: it ensures all threads complete phase N before any begin phase N+1. A barrier does nothing to prevent multiple threads from simultaneously writing to a shared structure within a phase; it only synchronizes the transitions between phases. A program whose threads share mutable data within a phase needs both: mutexes to protect data within a phase, barriers to synchronize the phase transitions.
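A small Python sketch shows the two primitives working side by side. The shared counter `total`, the iteration count, and the names `total_lock`, `phase_done`, and `seen` are hypothetical choices made for illustration.

```python
import threading

N = 4
total = 0
total_lock = threading.Lock()      # protects `total` within phase 1
phase_done = threading.Barrier(N)  # separates phase 1 from phase 2
seen = [None] * N

def worker(tid):
    global total
    # Phase 1: every thread updates the same variable, so a mutex is required.
    for _ in range(1000):
        with total_lock:
            total += 1
    # Phase transition: nobody reads until everybody has finished writing.
    phase_done.wait()
    # Phase 2: read-only access, so no lock is needed.
    seen[tid] = total

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Dropping the lock risks lost updates inside phase 1; dropping the barrier lets a fast thread record a partial `total` in phase 2. Neither primitive covers for the other.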
Question 4 True / False
A centralized barrier (one shared counter, one mutex, one condition variable) can become a performance bottleneck when the thread count is very large, because all threads must contend for the same lock.
T. True
F. False
Answer: True
With many threads all converging on a single mutex to increment the same counter, lock contention forces threads to acquire the mutex sequentially, effectively serializing the arrival phase at the barrier. For high thread counts, this sequential bottleneck wastes parallelism. Tree barriers address this by organizing threads into pairs or small groups that synchronize locally before propagating the 'all arrived' signal upward. Butterfly barriers instead have each thread synchronize pairwise with a different partner in each of about log2(N) rounds. Either way, the maximum contention drops from N threads on one lock to O(log N) levels or rounds of small-group synchronizations.
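The idea of synchronizing locally before synchronizing globally can be sketched with a two-level hierarchy built from Python's `threading.Barrier`. This is a simplified illustration, not a full tree or butterfly barrier: the class `TwoLevelBarrier`, its `group_size` parameter, and the convention that the first thread in each group acts as leader are all assumptions.

```python
import threading

class TwoLevelBarrier:
    """Hypothetical sketch: one local barrier per group, one barrier for group leaders."""

    def __init__(self, parties, group_size):
        assert parties % group_size == 0, "sketch assumes equal-sized groups"
        self.group_size = group_size
        n_groups = parties // group_size
        self.local = [threading.Barrier(group_size) for _ in range(n_groups)]
        self.top = threading.Barrier(n_groups)

    def wait(self, tid):
        group = self.local[tid // self.group_size]
        group.wait()                    # 1. arrive: synchronize within the group
        if tid % self.group_size == 0:
            self.top.wait()             # 2. leaders synchronize across groups
        group.wait()                    # 3. release: the leader frees its group

N = 8
barrier = TwoLevelBarrier(N, group_size=2)
log = []
log_lock = threading.Lock()

def worker(tid):
    with log_lock:
        log.append(("before", tid))
    barrier.wait(tid)
    with log_lock:
        log.append(("after", tid))

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Here no single barrier object is ever contended by more than `group_size` threads (or by more than the number of groups, at the top level). A real tree barrier would apply the same grouping recursively to reach O(log N) levels.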
Question 5 Short Answer
Why do iterative parallel simulations (like finite-element solvers or physics simulations) require barriers rather than just mutexes to produce correct results?
Model answer: In an iterative simulation, each thread computes values for its region based on the current state of neighboring regions. Correctness requires that all threads finish computing step N before any thread reads neighbors' values for step N+1. Without a barrier, a fast thread could begin step N+1 and read a neighbor's partially updated or not-yet-updated state from step N, producing incorrect results. Mutexes prevent concurrent access to the same memory location, but they cannot prevent a thread from advancing to the next iteration while other threads are still in the current one — only a barrier that holds everyone until all have arrived can guarantee that the entire state array reflects the completed step N before any thread proceeds.
This phase-completion guarantee is the defining use case for barriers. The pattern appears throughout scientific computing: iterative PDE solvers, cellular automata, parallel graph algorithms with rounds, and any simulation where the next state depends on the complete current state. The barrier is placed at the end of each iteration, creating a synchronization fence that separates each iteration's writes from the next iteration's reads across all threads.
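The end-of-iteration barrier can be sketched with a toy one-dimensional smoothing simulation in Python. Several things here are assumptions made for illustration: the three-point averaging rule in `smooth`, the initial data, and the use of two buffers (`cur` for step N, `nxt` for step N+1) so that a single barrier per iteration suffices. An in-place update would need a second barrier between the reads and the writes.

```python
import threading

N_THREADS = 4
STEPS = 10
initial = [0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical starting state
cur = list(initial)       # completed state for step N
nxt = [0.0] * len(cur)    # state being written for step N+1

def swap_buffers():
    # Run by exactly one thread once all have arrived, before any is released.
    global cur, nxt
    cur, nxt = nxt, cur

step_done = threading.Barrier(N_THREADS, action=swap_buffers)

def smooth(src, i):
    # Average a cell with its neighbors; edge cells reuse their own value.
    left = src[i - 1] if i > 0 else src[i]
    right = src[i + 1] if i < len(src) - 1 else src[i]
    return (left + src[i] + right) / 3.0

def worker(tid):
    chunk = len(initial) // N_THREADS
    lo, hi = tid * chunk, (tid + 1) * chunk
    for _ in range(STEPS):
        for i in range(lo, hi):
            nxt[i] = smooth(cur, i)  # read step N, write step N+1
        step_done.wait()             # nobody starts the next step early

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the barrier in place, the parallel result in `cur` matches a plain sequential loop that applies `smooth` to the whole array `STEPS` times. Without it, a fast thread could read neighbor cells that still hold an older step.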