Questions: Kolmogorov Complexity and Algorithmic Information Theory
4 questions to test your understanding
Score: 0 / 4
Question 1 Multiple Choice
Why is Kolmogorov complexity uncomputable, and why can't engineers simply use an approximate version (like LZ77 compression ratio) as a practical substitute?
AIt's uncomputable because it requires testing all possible programs, which takes infinite time
BThere exists a diagonalization argument: if K were computable, we could find the first string with K(x) > n using a program of length O(log n), which contradicts K(x) > n
CKolmogorov complexity is computable but incredibly slow — engineers avoid it for efficiency reasons, not fundamental ones
DIt depends on the universal Turing machine choice, making it incomputable
This is Chaitin's uncomputability theorem. If K were computable, consider a program P(n) that finds the first string x such that K(x) > n and outputs it. Program P itself has length O(log n). But then K(x) <= |P| + O(1) = O(log n) < n for sufficiently large n, contradicting the requirement that K(x) > n. Approximations like compression ratio (LZ77, zlib) are computable but don't guarantee correctness — they may underestimate K(x) for any particular string. However, these approximations are valuable in practice: universal data compression schemes like LZ77 compress typical strings to approximately their entropy, approximating K(x) for practical purposes on average.
Question 2 True / False
Chaitin's constant Omega, the halting probability of a universal Turing machine, is both well-defined and uncomputable. This means Omega's digits exist and are unique, but no algorithm can output them.
TTrue
FFalse
Answer: True
Omega = sum over {p : U(p) halts} 2^(-|p|) is a perfectly well-defined real number in (0, 1). Its value is independent of the choice of universal machine (up to a constant in the normalization). It can be approximated from below by running programs: each program that halts contributes 2^(-|p|) to the sum. But computing Omega to arbitrary precision would solve the halting problem (knowing the first n bits of Omega tells you which programs halt in n steps). Omega encodes the solutions to every computably enumerable problem in its digits — it is a real number containing provably inaccessible infinite information, illustrating the limits of formal systems.
Question 3 Short Answer
Explain how the universal prior 2^(-K(x)) relates to Solomonoff induction, and why Solomonoff induction is 'universal' despite being uncomputable.
Think about your answer, then reveal below.
Model answer: Solomonoff induction uses the universal prior P(x) = 2^(-K(x)) / Z where Z is a normalization constant. Given observed data D, the posterior over hypotheses (programs p generating candidate data) is updated via Bayes' rule. The predictive distribution for the next datum is the mixture over all programs consistent with D, weighted by their priors. This is 'universal' because it provably converges to the true distribution (assuming the true process is computable) regardless of what the true distribution is — the convergence is distribution-free. It's 'optimal' in the sense that Solomonoff's sequence prediction is at most a constant factor worse (in log-loss) than any other computable predictor. Despite uncomputability, Solomonoff induction provides the theoretical gold standard for prediction and connects to practical MDL (minimum description length) principle used in machine learning.
Solomonoff induction formalizes Occam's razor informationally: simpler hypotheses (shorter programs) get exponentially higher prior weight. This principle is so powerful that even averaging over all hypotheses (weighted by simplicity) eventually learns the truth. No computable predictor can be fundamentally better by more than a constant factor — Solomonoff is rate-optimal in theory. The practical impact is that MDL-based model selection (minimize description length of data given model) approximates Solomonoff induction without computing K(x) explicitly.
Question 4 Multiple Choice
For a string x = '01' repeated 500 times (1000 bits total), estimate K(x) and explain why this string, despite having length 1000, has much lower Kolmogorov complexity.
AK(x) ≈ 1000 bits, since the string must be fully specified
BK(x) ≈ 30 bits — a short program like 'print 01 500 times' generates the string
CK(x) ≈ log2(1000) ≈ 10 bits, since we only need to specify the length
DK(x) cannot be determined without running all possible programs
A program 'print 01 500 times' is roughly 30 bits (including encoding the number 500). This program outputs the full 1000-bit string, so K(x) <= 30 + O(1). There is no shorter program (to get K(x) significantly below 30, we'd need a shorter representation of 500, which requires approximately log2(500) ≈ 9 bits for the number alone). Thus K(x) is dominated by the program logic, not the string length. This is the essence of incompressibility: regular, structured strings have low K (short generators), while random strings have K ≈ |x|.