The axiom of choice states: for any collection {S_i : i ∈ I} of non-empty sets, there exists a choice function f such that f(i) ∈ S_i for each i ∈ I. Over ZF, this axiom is equivalent to Zorn's lemma (every partially ordered set in which every chain has an upper bound has a maximal element) and to the well-ordering theorem (every set can be well-ordered). It is independent of ZF: assuming ZF is consistent, ZF neither proves nor refutes it.
From indexed families of sets, you know that an indexed family {S_i : i ∈ I} assigns a set S_i to each index i. The axiom of choice (AC) asserts that no matter how large I is and no matter how the sets S_i are defined, as long as each S_i is non-empty, there is a function f with f(i) ∈ S_i for every i ∈ I. For finite families, you can construct f explicitly — just pick one element from each S_i in finitely many steps. For infinite families (and especially for uncountably infinite families of sets with no definable structure), the axiom asserts the existence of f without providing any rule for constructing it. This is the non-constructive character of AC.
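The finite case can be sketched in code. The following is a minimal illustration, not a formal proof: the names finite_choice, least_element_choice, and family are made up for this example. It shows the two situations where no axiom is needed: a finite family, where you simply pick in finitely many steps, and sets of natural numbers, where the rule "take the least element" defines the choice.

```python
from typing import Dict, Hashable, Set, TypeVar

X = TypeVar("X")


def finite_choice(family: Dict[Hashable, Set[X]]) -> Dict[Hashable, X]:
    """Build a choice function for a finite family of non-empty sets."""
    choice = {}
    for index, members in family.items():
        if not members:
            # AC only speaks about non-empty sets; an empty S_i rules out any f.
            raise ValueError(f"S_{index} is empty, so no choice function exists")
        # Any element will do; this takes whichever one iteration yields first.
        choice[index] = next(iter(members))
    return choice


def least_element_choice(members: Set[int]) -> int:
    """A definable rule for non-empty sets of natural numbers: take the minimum."""
    return min(members)


# Hypothetical example family, indexed by the strings "a", "b", "c".
family = {"a": {3, 5}, "b": {7}, "c": {2, 4, 6}}

f = finite_choice(family)
assert all(f[i] in family[i] for i in family)  # f(i) ∈ S_i for every i

g = {i: least_element_choice(s) for i, s in family.items()}
print(g)  # {'a': 3, 'b': 7, 'c': 2}
```

Neither function has an analogue for an arbitrary infinite family: a loop never finishes, and a rule like min needs structure (here, the order on the natural numbers) that the sets may not have. That gap is exactly what AC fills.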
A concrete analogy: imagine you have infinitely many drawers, each containing at least one sock. AC says there is a way to select one sock from each drawer all at once, even if the socks in a drawer are indistinguishable (no rule singles one out). For finitely many drawers, you could physically reach in and pick; for infinitely many, you are asserting that a mathematical object, the choice function, exists without exhibiting it. This adapts Bertrand Russell's shoes-and-socks example: from infinitely many pairs of shoes you can choose without AC by the rule "take the left shoe", but for infinitely many pairs of socks no such rule is available. The contrast shows where AC is genuinely needed: you cannot "define" your way to a choice function when the sets have no distinguishing structure.
The three equivalent formulations each expose a different face of the same principle. Zorn's lemma says: if every chain (totally ordered subset) in a partially ordered set P has an upper bound in P, then P has a maximal element, meaning an element with nothing strictly above it (not necessarily a greatest element). This is the standard tool in algebra and analysis: it is how you prove that every vector space has a basis, every nonzero ring with identity has a maximal ideal, and every proper filter extends to an ultrafilter. Knowing about well-founded relations helps here: the usual proof of Zorn's lemma from AC uses a choice function and transfinite recursion to build a chain that cannot be extended, and an upper bound of that chain is a maximal element. The well-ordering theorem says: every set can be given a total order in which every non-empty subset has a least element. For ℕ, the standard order is a well-ordering. For ℝ, the standard order is not (the open interval (0, 1) has no least element), and ZF alone cannot prove that any well-ordering of ℝ exists. The well-ordering theorem is perhaps the most startling equivalent: it implies that ℝ can be well-ordered, a claim that is provable in ZFC but for which ZFC does not single out any explicitly definable witness.
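To make "maximal element" in Zorn's lemma concrete, here is a small sketch on a finite poset, where no choice principle is needed and the maximal elements can be found by direct search. The poset (the integers 2 to 10 ordered by divisibility) and the names maximal_elements, divides, and P are assumptions chosen for illustration only.

```python
from typing import Callable, Iterable, List


def maximal_elements(
    elements: Iterable[int], leq: Callable[[int, int], bool]
) -> List[int]:
    """Return every m with no other element x satisfying m <= x."""
    items = list(elements)
    return [
        m for m in items
        if not any(m != x and leq(m, x) for x in items)
    ]


def divides(a: int, b: int) -> bool:
    """Partial order on positive integers: a <= b means a divides b."""
    return b % a == 0


# Hypothetical example poset: {2, ..., 10} ordered by divisibility.
P = range(2, 11)

print(maximal_elements(P, divides))  # [6, 7, 8, 9, 10]
```

The poset has five maximal elements and no greatest element, since none of them is divisible by all the others. Zorn's lemma promises only the former kind of element. In an infinite poset a search like this need not terminate, which is why the hypothesis on chains, together with AC, is needed.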
Independence means that, assuming ZF is consistent, AC is neither provable nor refutable from the ZF axioms alone. Gödel (1938) showed AC is consistent with ZF by constructing the constructible universe L, in which AC holds. Cohen (1963) showed ¬AC is consistent with ZF via forcing, constructing models of ZF in which AC fails, for example one in which ℝ cannot be well-ordered. The independence result means you are choosing whether to include AC as a foundational commitment, and the mathematical community's consensus choice is to include it (giving ZFC), because many standard theorems of classical analysis, algebra, and topology depend on it or on weaker forms of it. Understanding AC's equivalences is understanding a fundamental axis along which mathematical possibility varies.