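A sequence {Xₙ} converges almost surely to X if P(lim_{n→∞} Xₙ = X) = 1, that is, P({ω: lim_{n→∞} Xₙ(ω) = X(ω)}) = 1. In words, the pointwise limit exists and equals X(ω) for all ω outside a set of probability zero. Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution, so it is the strongest of these three modes; in general it neither implies nor is implied by convergence in mean (Lᵖ convergence).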
To understand almost sure convergence, you need to think carefully about what a random variable actually is. Each random variable Xₙ is a function from the sample space Ω to the reals — at each outcome ω ∈ Ω, Xₙ(ω) is just a number. A sequence {Xₙ} converges almost surely to X if, for almost every individual outcome ω, the numerical sequence Xₙ(ω) converges to X(ω) in the ordinary sense from real analysis. The "almost" means we allow an exceptional set of measure zero — a set of outcomes so unlikely they collectively have probability zero. Except for those negligible outcomes, every single sample path converges to the target.
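To make the role of the exceptional set concrete, here is a minimal sketch under assumptions that are not in the text above: take Ω = [0, 1] with the uniform measure, Xₙ(ω) = ωⁿ, and candidate limit X = 0. The function names are hypothetical. Every path with ω < 1 converges to 0, while the single outcome ω = 1 (a set of probability zero) does not.

```python
# Illustrative assumption: Omega = [0, 1] with the uniform measure,
# X_n(omega) = omega**n, and candidate limit X(omega) = 0.

def x_n(omega, n):
    """Value of the n-th random variable at the fixed outcome omega."""
    return omega ** n

def path(omega, n_max):
    """The sample path X_1(omega), ..., X_{n_max}(omega) for one outcome."""
    return [x_n(omega, n) for n in range(1, n_max + 1)]

if __name__ == "__main__":
    for omega in (0.3, 0.9, 0.999, 1.0):
        last = path(omega, 20000)[-1]
        print(f"omega = {omega}: X_20000(omega) = {last:.3e}")
```

For each ω < 1 the printed value is tiny, although paths with ω close to 1 take longer to get there. At ω = 1 the path is constantly 1 and never approaches 0, which is harmless because P({1}) = 0.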
Compare this to convergence in probability, which you studied as a prerequisite. That mode says: for any ε > 0, P(|Xₙ − X| > ε) → 0 as n → ∞. This is a statement about the distribution of Xₙ − X at each fixed n: at step n, the probability of being far from X is small. It does NOT say that the path of a given ω actually settles down; individual paths can keep making large excursions, at ever rarer times, while the probabilities P(|Xₙ − X| > ε) still tend to zero. Almost sure convergence is strictly stronger: it demands that, for almost every ω and every ε > 0, the path Xₙ(ω) is eventually within ε of X(ω) and stays within ε from then on.
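A standard counterexample that separates the two modes is the "typewriter" sequence, sketched below. The setup is an assumption for illustration, not something defined in the text: Ω = [0, 1) with the uniform measure, and for n = 2ᵏ + j with 0 ≤ j < 2ᵏ, Xₙ is the indicator of the interval [j/2ᵏ, (j+1)/2ᵏ). Then P(Xₙ = 1) = 2⁻ᵏ → 0, so Xₙ → 0 in probability, yet every path hits 1 once in each block k and so never converges. The function names are hypothetical.

```python
def block_and_offset(n):
    """Write n = 2**k + j with 0 <= j < 2**k and return (k, j)."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x_n(omega, n):
    """Indicator that omega lies in the n-th typewriter interval."""
    k, j = block_and_offset(n)
    width = 2.0 ** -k
    return 1 if j * width <= omega < (j + 1) * width else 0

def prob_x_n_is_one(n):
    """P(X_n = 1) under the uniform measure: the interval length 2**-k."""
    k, _ = block_and_offset(n)
    return 2.0 ** -k

def hits(omega, n_max):
    """All indices n <= n_max at which the path of omega equals 1."""
    return [n for n in range(1, n_max + 1) if x_n(omega, n) == 1]

if __name__ == "__main__":
    omega = 0.3
    print("P(X_n = 1) at n = 1023:", prob_x_n_is_one(1023))
    print("hits for omega = 0.3 up to n = 7:", hits(omega, 7))  # [1, 2, 5]
    print("number of hits up to n = 1023:", len(hits(omega, 1023)))  # 10
```

The marginal probability shrinks geometrically, but the number of hits along a single path grows by exactly one per block, so Xₙ(ω) = 1 infinitely often for every ω in [0, 1).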
The Borel-Cantelli lemmas (your hard prerequisite) are a primary tool for proving almost sure convergence. The first lemma says: if Σₙ P(Aₙ) < ∞, then P(Aₙ infinitely often) = 0; that is, almost surely only finitely many of the events Aₙ occur. Apply this to the events Aₙ = {|Xₙ − X| > ε}: if you can show Σₙ P(|Xₙ − X| > ε) < ∞ for every ε > 0, then for each fixed ε, almost surely |Xₙ − X| ≤ ε for all sufficiently large n. Taking ε = 1/k for k = 1, 2, 3, … and discarding the countable union of the corresponding null sets (which is still a null set) gives Xₙ → X almost surely. Note that summability is sufficient but not necessary for almost sure convergence. This is the standard proof strategy: bound the tail probability, sum it, invoke Borel-Cantelli.
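The summability argument can be checked numerically. The sketch below rests on assumptions chosen purely for illustration: events Aₙ with P(Aₙ) = 1/n², simulated as independent (the first lemma itself does not need independence), and hypothetical helper names. It compares the union bound Σ_{n>N} P(Aₙ), which controls the chance of any occurrence after index N, with the simulated index of the last occurrence.

```python
import random

def summable(n):
    """Illustrative choice P(A_n) = 1/n**2; the sum over n is finite."""
    return 1.0 / n ** 2

def tail_sum(prob_fn, start, stop):
    """Sum of prob_fn(n) for start <= n < stop (a partial tail sum)."""
    return sum(prob_fn(n) for n in range(start, stop))

def last_occurrence(prob_fn, n_max, rng):
    """Simulate independent A_1..A_{n_max}; return the largest n with A_n
    occurring, or 0 if none occurred."""
    last = 0
    for n in range(1, n_max + 1):
        if rng.random() < prob_fn(n):
            last = n
    return last

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n_max, n_paths, cutoff = 5000, 100, 100
    lasts = [last_occurrence(summable, n_max, rng) for _ in range(n_paths)]
    late = sum(1 for last in lasts if last > cutoff) / n_paths
    bound = tail_sum(summable, cutoff + 1, n_max + 1)
    print(f"union bound on P(some A_n occurs, {cutoff} < n <= {n_max}): {bound:.4f}")
    print(f"simulated fraction of paths with an occurrence after {cutoff}: {late:.4f}")
```

Because the tail sum tends to zero as the cutoff grows, occurrences at late indices become vanishingly rare, which is the mechanism behind "only finitely many Aₙ occur almost surely."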
Almost sure convergence is the mode of convergence in the Strong Law of Large Numbers: if X₁, X₂, … are i.i.d. with finite mean μ, the sample mean X̄ₙ = (X₁ + ⋯ + Xₙ)/n converges almost surely to μ. This is a stronger statement than the Weak Law, which gives only convergence in probability. The strong law says that if you were to run a random experiment forever, then with probability 1 the running average of your outcomes would converge to the true mean: not just be close with high probability at each step, but eventually get within any given ε of μ and stay there. Understanding the difference between these modes of convergence is one of the genuine conceptual achievements of rigorous probability theory.
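A simulation cannot prove an almost sure statement, but it can show what the strong law describes along one sample path. The sketch below assumes i.i.d. fair coin flips (so μ = 0.5) and uses hypothetical helper names. It tracks the running mean and the largest deviation from μ over the whole tail n ≥ N, which is the quantity that must shrink if the path settles down.

```python
import random

def running_means(n_steps, rng):
    """Running averages of n_steps i.i.d. fair coin flips (values 0 or 1)."""
    total = 0
    means = []
    for n in range(1, n_steps + 1):
        total += 1 if rng.random() < 0.5 else 0
        means.append(total / n)
    return means

def sup_tail_deviation(means, start, mu):
    """Largest |mean_n - mu| over all simulated n >= start (1-indexed)."""
    return max(abs(m - mu) for m in means[start - 1:])

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed: one particular sample path
    means = running_means(100_000, rng)
    for start in (100, 1_000, 10_000, 50_000):
        dev = sup_tail_deviation(means, start, 0.5)
        print(f"max |mean_n - 0.5| over n >= {start}: {dev:.5f}")
```

The printed deviations are non-increasing in the starting index by construction. What the strong law adds is that, for almost every path, they tend to zero as the starting index grows, whereas the Weak Law only controls the deviation at each individual n.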