Monte Carlo methods estimate thermal averages by sampling microstates according to their Boltzmann weight P(state) ∝ exp(-E/kT). Importance sampling biases random walks toward likely states, vastly reducing computation. The algorithm efficiently explores the configuration space and provides results for systems where analytical solutions are intractable, such as the 3D Ising model.
From statistical ensembles, you know that the thermal average of any observable A is ⟨A⟩ = Σ A(s) P(s), summed over all microstates s, where P(s) = exp(-E(s)/k_B T) / Z is the Boltzmann weight and Z is the partition function. This formula is exact and elegant. It is also, for almost every system of practical interest, completely useless by direct computation: a system of N spins has 2^N microstates. For N = 100, that is more configurations than atoms in the observable universe. Some other approach is needed.
The Monte Carlo idea is to replace the impossible exact sum with a clever stochastic approximation: instead of visiting all states, randomly sample states and average over the sample. If you sample states according to the Boltzmann distribution P(s), then the sample average ⟨A⟩_sample = (1/n) Σᵢ A(sᵢ) converges to the true thermal average ⟨A⟩ as the sample size n grows. This works because states with high Boltzmann weight appear frequently in the sample and dominate the average, just as they should. The key question is: how do you generate samples from P(s) without knowing Z?
This is where importance sampling comes in. Rather than drawing random microstates uniformly (which would almost never land in the thermodynamically relevant region where P(s) is large), the algorithm performs a random walk through configuration space that automatically favors low-energy states at the right temperature. The most famous implementation is the Metropolis algorithm: propose a small random change to the current state (e.g., flip one spin); if the change lowers energy, always accept it; if it raises energy by ΔE, accept it with probability exp(-ΔE/k_B T). This acceptance rule ensures the random walk visits states in proportion to their Boltzmann weight — a property called detailed balance. After enough steps for the walk to "forget" its starting point (thermalization), subsequent states are drawn from the equilibrium distribution, and any observable averaged over them converges to its thermal average.
The practical power of Monte Carlo is that it scales polynomially with system size rather than exponentially, making it the method of choice for systems like the 3D Ising model, lattice quantum field theories, and protein folding. The cost is statistical: results have error bars that decrease as 1/√n, so precision requires long runs. A subtlety is autocorrelation — successive states in the random walk are correlated, so you need many more steps than independent samples to achieve a given precision. Near a phase transition, correlations extend over very long length scales (critical slowing down), making Monte Carlo expensive exactly where the physics is most interesting. Advanced algorithms such as cluster updates (Wolff, Swendsen-Wang) address this by flipping entire correlated clusters of spins simultaneously, dramatically reducing autocorrelation times near the critical point.