A sublinear-time algorithm estimates the number of edges in a graph given in adjacency list representation. It samples s random vertices, queries their degrees, and returns n * (average sampled degree) / 2. How large must s be for epsilon-relative error?
As = O(1/epsilon) regardless of the graph structure
Bs = O(sqrt(n) / epsilon^2) — the variance depends on the degree distribution and sqrt(n) is needed because high-degree vertices dominate but are rare
Cs = O(n / epsilon) — you need to sample a constant fraction
The estimator is the sample mean of degrees times n/2. The variance depends on the degree distribution: if one vertex has degree n-1 and all others have degree 1, the variance is huge because hitting that vertex matters enormously. In general, achieving epsilon-relative error requires s = O(sqrt(n)/epsilon^2) samples, which is sublinear in n but depends on n. This is tight: there exist graphs where Omega(sqrt(n)) queries are necessary to estimate the edge count to constant relative accuracy. The sqrt(n) dependency reflects the difficulty of estimating sums when the values are highly skewed.
Question 2 True / False
Every decision problem that can be solved exactly in O(n) time can also be solved approximately (with one-sided error) in o(n) time.
TTrue
FFalse
Answer: False
Some properties require reading nearly the entire input even to approximate. For example, determining whether a sorted array has a specific element (search) requires Omega(log n) time even with random access, and for some problems the lower bound is Theta(n). The key distinction is between properties that are 'locally testable' (local structure reveals global properties) and those that are not. Sortedness of an array IS locally testable in O(sqrt(n) / epsilon) time, but determining whether exactly n/2 entries are positive requires Omega(n) queries even approximately. Sublinear algorithms exist when the property has sufficient local-to-global structure.
Question 3 True / False
Sublinear-time algorithms must use randomization — no deterministic sublinear-time algorithm can provide useful approximate answers.
TTrue
FFalse
Answer: True
A deterministic sublinear algorithm reads a fixed subset of the input (fewer than n locations). An adversary can construct two inputs that differ only outside the queried locations — one satisfying the property and one not — making them indistinguishable. Randomization is essential because the adversary (who must fix the input before the algorithm's random choices) cannot predict which locations will be queried. This is analogous to the argument that randomization helps in online algorithms: it prevents adversarial targeting of the algorithm's fixed behavior. All known sublinear-time algorithms for nontrivial problems are randomized.
Question 4 Short Answer
Explain the local-to-global principle that underlies sublinear algorithms: why can global properties of massive datasets sometimes be determined from small random samples?
Think about your answer, then reveal below.
Model answer: Many global properties decompose into local conditions: if the property holds, every local neighborhood is consistent; if the property fails on epsilon-fraction of the input, then epsilon-fraction of local neighborhoods contain evidence of failure. A random sample of O(1/epsilon) neighborhoods therefore detects failure with constant probability. For example, a graph is bipartite if and only if it contains no odd cycles. If a graph is epsilon-far from bipartite (need to remove epsilon*m edges), then many short random walks will encounter evidence of odd cycles. The local-to-global principle says that 'far from satisfying a property' implies 'many local witnesses of violation,' which random sampling finds. This principle fails when violations are concentrated in a small region — which is why some problems have sublinear lower bounds matching the input size.
This principle is formalized in property testing theory. Testable properties are those where epsilon-farness guarantees abundant local evidence. The study of which properties are testable and how many queries they require is a rich area connecting combinatorics, probability, and computational complexity.