An ensemble is a collection of all possible microstates consistent with a given set of macroscopic constraints. Rather than following a single system's trajectory, ensemble theory computes observables as weighted averages over all allowed microstates. The ergodic hypothesis connects time averages for a single system to ensemble averages.
You know from your prerequisite on microstates and macrostates that a single macroscopic state — a gas at specified temperature, volume, and pressure — is consistent with an enormous number of microscopic configurations. Tracking the actual trajectory of 10²³ particles is both computationally impossible and physically unnecessary. Ensemble theory resolves this by replacing the impossible single-trajectory problem with a tractable statistical average. An ensemble is a conceptual collection of infinitely many identical copies of the system, each in a different microstate consistent with the same macroscopic constraints. Observables are then computed as averages over this collection, weighted by the probability of each microstate.
The genius of the ensemble approach is that it makes statistics do the work that trajectory-following cannot. Instead of asking "what is this system doing right now?", you ask "what is the probability distribution over microstates, and what do observables average to under that distribution?" You already know probability theory — an ensemble is simply a probability distribution over the space of microstates. For an isolated system with fixed energy E, volume V, and particle number N, the appropriate distribution is the microcanonical ensemble: uniform probability over all microstates with exactly energy E. Equal a priori probability for all accessible microstates is the fundamental postulate of equilibrium statistical mechanics, and it is from this postulate that entropy, temperature, and the other thermodynamic potentials derive.
The ergodic hypothesis provides the physical justification for ensemble averaging. It states that for a system at equilibrium, the time average of any observable (following one system for a very long time) equals the ensemble average (averaging over all copies at one instant), provided the system's trajectory eventually visits all accessible microstates with the appropriate frequency. This is what makes ensemble theory physically meaningful: you are not just computing a mathematical average over abstract copies, you are computing what a single real system will show when measured over time. The ergodic hypothesis fails for glasses and spin glasses — those systems get trapped in subsets of phase space — but for most equilibrium systems it holds, licensing the replacement of dynamics with statistics.
Different macroscopic constraints define different ensembles, each suited to different physical situations. The microcanonical ensemble (fixed E, V, N) describes an isolated system. The canonical ensemble (fixed T, V, N) describes a system in thermal contact with a heat reservoir — energy can fluctuate, but temperature is fixed and set by the reservoir. The grand canonical ensemble (fixed T, V, μ) allows both energy and particle number to fluctuate, controlled by temperature and chemical potential. Each ensemble produces the same macroscopic predictions in the thermodynamic limit (large N), because fluctuations relative to mean values scale as 1/√N and become negligible. The ensemble choice is therefore a matter of mathematical convenience, not physics: you use whichever formulation makes the calculation easiest for the constraints of your problem.