The gradient of f is the vector ∇f = ⟨∂f/∂x, ∂f/∂y⟩ (in ℝ²) or ⟨∂f/∂x, ∂f/∂y, ∂f/∂z⟩ (in ℝ³) that collects all partial derivatives. The gradient points in the direction of steepest increase of f and is always perpendicular to the level curves (or level surfaces) of f. The magnitude |∇f| gives the rate of change in the steepest direction. These two properties — direction and orthogonality to level sets — make the gradient the central object of multivariable calculus.
Draw level curves and overlay the gradient field. Students should see geometrically that ∇f is perpendicular to level curves before they see any algebraic proof. The steepest-ascent interpretation connects directly to gradient descent in optimization and machine learning contexts, which provides strong motivation.
When you learned partial derivatives, you computed how f changes in the x-direction (holding y fixed) and in the y-direction (holding x fixed). The gradient simply bundles these into a single vector: ∇f = ⟨∂f/∂x, ∂f/∂y⟩. But the gradient is far more than a notational convenience — it encodes the directional behavior of f in every direction at once, through the formula for the directional derivative: Dᵤf = ∇f · u, where u is any unit vector.
The most important geometric fact about the gradient is its relationship to level curves. A level curve of f is the set of all points where f takes some constant value c — think of elevation contours on a topographic map. The gradient ∇f at any point is always perpendicular (normal) to the level curve through that point. This makes intuitive sense: if you walk along a level curve, your elevation doesn't change, so you're moving perpendicular to the direction of steepest change. The steepest ascent must be perpendicular to the flat direction.
This also explains why ∇f points in the direction of steepest increase. The directional derivative equals |∇f| cos(θ), where θ is the angle between ∇f and your direction of travel. This is largest when θ = 0 (moving parallel to ∇f) and equals |∇f|, the maximum possible rate of change. Moving in the −∇f direction gives the steepest descent — which is exactly what gradient descent algorithms in optimization exploit.
Two misconceptions deserve special attention. First, the gradient is a vector with both magnitude and direction — not a scalar. The magnitude |∇f| tells you how steeply f is changing; the direction tells you which way. Second, the gradient is perpendicular to level curves in the domain (the xy-plane), not to the graph of f in 3D space. These are different geometric objects, and confusing them is especially common when students first encounter surface normals in later topics.