Questions: Data Sharding and Partitioning Strategies

5 questions to test your understanding

Question 1 (Multiple Choice)

A database uses hash sharding on user_id. An analyst runs the query: 'Find all users who signed up between January and March 2024.' How does the database handle this query?

A. It routes the query to the shard responsible for the date range Jan–Mar 2024
B. It must contact every shard, because hash(user_id) scatters adjacent user_ids across all nodes; there is no correlation between signup date and shard location
C. It contacts only the shard that happens to store the most recent signups, since hash functions preserve insertion order
D. It uses a secondary index on signup date stored on the coordinator node to route the query efficiently
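A minimal sketch of why a date-range filter cannot be routed when the shard key is user_id. The shard count, the MD5-modulo routing function `shard_for`, and the sample signup dates are all hypothetical choices for illustration; a real database uses its own hash function and placement metadata.

```python
import hashlib
from datetime import date

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(user_id: int) -> int:
    """Hash-route a row by its shard key, user_id (MD5 chosen for illustration)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hypothetical sample data: user_id -> signup date (months Jan..Jun 2024).
users = {uid: date(2024, 1 + uid % 6, 1 + uid % 28) for uid in range(1, 101)}

# Place each row on the shard chosen by hash(user_id).
shards = {s: [] for s in range(NUM_SHARDS)}
for uid, signup in users.items():
    shards[shard_for(uid)].append((uid, signup))

def signups_between(start: date, end: date) -> tuple[list[int], list[int]]:
    """Scatter-gather: the filter is on signup date, which the router knows
    nothing about, so every shard is queried and the results are merged."""
    contacted, matches = [], []
    for shard_id, rows in shards.items():
        contacted.append(shard_id)
        matches.extend(uid for uid, d in rows if start <= d <= end)
    return contacted, sorted(matches)

contacted, q1_users = signups_between(date(2024, 1, 1), date(2024, 3, 31))
print("shards contacted:", contacted)    # every shard: [0, 1, 2, 3]
print("matching users:", len(q1_users))  # 50
print("shards holding matches:", sorted({shard_for(u) for u in q1_users}))
```

The matching users end up spread over several shards, so no single-shard routing decision could have answered the query.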
Question 2 (Multiple Choice)

A social media platform is choosing a shard key. Option A: shard on user_id (high cardinality, uniformly distributed). Option B: shard on country (low cardinality, uneven distribution). Why is option A generally better?

A. user_id produces more shards, which always improves performance
B. country causes hot spots because a few large countries (US, India) would receive a disproportionate fraction of traffic, overwhelming those shards while others sit idle
C. user_id is better because alphabetical ordering makes range queries on users more efficient
D. country is worse only because it has fewer possible values, not because of traffic distribution
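A small simulation of the hot-spot argument, assuming a made-up, deliberately skewed population (the country counts below are hypothetical, not real data) and the same illustrative MD5-modulo routing. Because every row with the same shard-key value must land on the same shard, all 400 "US" users share one shard no matter which hash function is used, while hashing 1,000 distinct user_ids spreads them roughly evenly.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def stable_hash(key: str) -> int:
    """Deterministic hash (MD5, illustrative) so results do not vary per run."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

# Hypothetical skewed population: country -> number of users (1,000 total).
population = {"US": 400, "IN": 350, "BR": 100, "DE": 50,
              "FR": 50, "JP": 30, "NZ": 10, "IS": 10}

users = []  # (user_id, country)
next_id = 1
for country, count in population.items():
    for _ in range(count):
        users.append((next_id, country))
        next_id += 1

# Low-cardinality key: all users of one country share a shard.
by_country = Counter(stable_hash(c) % NUM_SHARDS for _, c in users)
# High-cardinality key: each user_id is hashed independently.
by_user = Counter(stable_hash(str(uid)) % NUM_SHARDS for uid, _ in users)

print("busiest shard, sharded by country:", max(by_country.values()))
print("busiest shard, sharded by user_id:", max(by_user.values()))
```

With a perfectly balanced 4-shard cluster each shard would hold 250 users; sharding on country guarantees at least one shard holds 400 or more.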
Question 3 (True / False)

Hash sharding is strictly better than range sharding because it typically distributes load evenly and eliminates hot spots.

True
False
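One thing hash sharding gives up is range locality. This sketch compares how many shards a user_id range query must visit under each scheme; the split points in `RANGE_BOUNDS` and the MD5-modulo hash are hypothetical choices for illustration.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size
# Hypothetical split points for range sharding on user_id:
# shard 0 < 250 <= shard 1 < 500 <= shard 2 < 750 <= shard 3
RANGE_BOUNDS = [250, 500, 750]

def range_shard(user_id: int) -> int:
    """Range sharding: locate the key among the sorted split points."""
    return bisect.bisect_right(RANGE_BOUNDS, user_id)

def hash_shard(user_id: int) -> int:
    """Hash sharding: MD5 of the key modulo the shard count (illustrative)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def range_query_shards(lo: int, hi: int) -> tuple[set[int], set[int]]:
    """Shards a query for user_id in [lo, hi] must visit under each scheme."""
    # Range: only the contiguous run of shards between the endpoints' shards.
    by_range = set(range(range_shard(lo), range_shard(hi) + 1))
    # Hash: every shard that received at least one key in the interval
    # (enumerable here only because the keys are small integers).
    by_hash = {hash_shard(uid) for uid in range(lo, hi + 1)}
    return by_range, by_hash

by_range, by_hash = range_query_shards(100, 199)
print("range sharding visits:", sorted(by_range))  # [0]
print("hash sharding visits:", sorted(by_hash))
```

The range-sharded query stays on one shard, while the hashed keys in the same interval are scattered across the cluster; which scheme is better depends on the workload.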
Question 4 (True / False)

With range sharding on last name, all users with last names starting with 'J' can typically be served by querying a single shard.

True
False
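A minimal sketch of prefix routing under range sharding on last name. The split points are hypothetical, and names are assumed to be capitalized ASCII strings. Every name starting with 'J' sorts in the half-open interval ["J", "K"), so the query touches only the shards overlapping that interval: usually one, but two if a split point happens to fall inside the 'J' names (which is why the statement says "typically").

```python
import bisect

def shard_for_name(last_name: str, bounds: list[str]) -> int:
    """Range sharding: shard i holds names in [bounds[i-1], bounds[i])."""
    return bisect.bisect_right(bounds, last_name)

def shards_for_prefix(letter: str, bounds: list[str]) -> list[int]:
    """Shards that may hold names starting with `letter`."""
    lo = letter                # sorts before every name starting with letter
    hi = chr(ord(letter) + 1)  # every such name sorts before the next letter
    first = bisect.bisect_right(bounds, lo)
    last = bisect.bisect_left(bounds, hi)
    return list(range(first, last + 1))

# Hypothetical split points.
clean_split = ["F", "M", "S"]   # boundary falls between letters
split_in_j = ["F", "Jo", "S"]   # boundary falls inside the J names

print(shards_for_prefix("J", clean_split))  # [1]
print(shards_for_prefix("J", split_in_j))   # [1, 2]
print(shard_for_name("James", split_in_j), shard_for_name("Jones", split_in_j))  # 1 2
```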
Question 5 (Short Answer)

A startup is designing a sharding strategy for a social media platform. Why might sharding on user_id be better than sharding on country, and what failure mode remains even with a good shard key?

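To see the failure mode that survives a good shard key, this sketch hashes on user_id but gives one hypothetical celebrity account (`CELEBRITY_ID`) as many reads as 1,000 ordinary users combined. The request log, ids, and MD5-modulo routing are invented for illustration. Hashing spreads distinct keys, but every request for the same key still goes to the same shard.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(user_id: int) -> int:
    """Hash-route by user_id (MD5 modulo shard count, illustrative)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

CELEBRITY_ID = 42  # hypothetical hot key

# Hypothetical request log: 10 reads each for ordinary users 1000..1999,
# plus 10,000 reads of the celebrity's profile.
requests = [1000 + (i % 1000) for i in range(10_000)] + [CELEBRITY_ID] * 10_000

load = Counter(shard_for(uid) for uid in requests)
hot_shard = shard_for(CELEBRITY_ID)
share = load[hot_shard] / len(requests)

print("requests per shard:", dict(sorted(load.items())))
print(f"hot shard {hot_shard} serves {share:.0%} of all requests")
```

The celebrity's shard serves at least half of all traffic even though user_ids themselves are evenly distributed, so a skewed access pattern can still create a hot spot.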