Spatial audio describes techniques for reproducing sound in three-dimensional space, going beyond the left-right stereo field to include height and depth dimensions. Where conventional stereo uses two channels to create a horizontal image, spatial audio formats encode the full sphere of possible sound positions — above, below, front, back, and all angles between.
Ambisonics is a full-sphere surround sound technique developed by Michael Gerzon in the 1970s. Rather than recording individual speaker feeds, Ambisonics encodes the soundfield as a set of mathematical components (B-format signals) that describe the acoustic pressure and directional velocity at a single point in space. First-order Ambisonics (FOA) uses four channels (W, X, Y, Z — omnidirectional pressure plus three directional components). Higher-order Ambisonics (HOA) uses additional channels to encode finer spatial detail: second-order uses nine channels, third-order uses sixteen. The advantage of Ambisonics is format agnosticism — a single recorded or mixed B-format file can be decoded for any speaker array (stereo, quad, 5.1, 7.1.4, headphones with HRTF) at playback time.
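The encoding described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the traditional B-format convention (W scaled by 1/sqrt(2), azimuth measured counterclockwise from straight ahead, elevation upward from the horizontal plane), and the function names `hoa_channel_count` and `encode_foa` are made up for this example. Other conventions, such as different channel orderings and normalizations, exist and would change the scaling.

```python
import numpy as np

def hoa_channel_count(order: int) -> int:
    """Number of Ambisonic channels for a given order: (order + 1) squared."""
    return (order + 1) ** 2

def encode_foa(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    Assumed convention: azimuth counterclockwise from front (positive = left),
    elevation positive upward, W attenuated by 1/sqrt(2).
    """
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    s = np.asarray(signal, dtype=float)
    w = s / np.sqrt(2.0)              # omnidirectional pressure
    x = s * np.cos(az) * np.cos(el)   # front-back component
    y = s * np.sin(az) * np.cos(el)   # left-right component
    z = s * np.sin(el)                # up-down component
    return np.stack([w, x, y, z])

# A source directly to the left lands almost entirely in Y.
bformat = encode_foa(np.ones(4), azimuth_deg=90, elevation_deg=0)
```

The resulting array has one row per B-format channel and one column per sample. A decoder for a particular speaker layout or for headphones would then operate on these four rows.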
HRTF (Head-Related Transfer Function) describes the acoustic transformation applied to sound as it diffracts around the head and pinnae (outer ears) before reaching the eardrums. HRTFs encode the cues (interaural time and level differences plus direction-dependent spectral filtering) that allow localization of sounds in three dimensions from just two ears. Convolving audio with HRTF filters produces binaural audio: headphone playback that creates convincing height and out-of-head sound placement. Personalized HRTFs (measured from a specific individual) generally produce more accurate localization than generic averages, which is why Apple offers HRTF personalization through an iPhone scan of the listener's head and ears.
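As a rough sketch of the convolution step, the example below renders a mono signal to two ears. The impulse responses here are not measured HRTFs: `toy_hrir_pair` is a hypothetical stand-in that models only an interaural delay and a level drop at the far ear, with no pinna filtering, so it cannot convey height. The maximum delay of about 0.66 ms and the 0.5 attenuation factor are illustrative assumptions. In practice the two filters would come from a measured HRTF dataset.

```python
import numpy as np

def toy_hrir_pair(azimuth_deg, sample_rate=48000, length=64):
    """Hypothetical placeholder for measured HRIRs (time and level cues only).

    Returns (left, right) impulse responses. Azimuth is counterclockwise
    from front, so positive values put the source on the listener's left.
    """
    az = np.radians(azimuth_deg)
    max_itd_s = 0.00066  # assumed maximum interaural delay, in seconds
    itd_samples = int(round(abs(np.sin(az)) * max_itd_s * sample_rate))
    far_gain = 1.0 - 0.5 * abs(np.sin(az))  # assumed far-ear attenuation
    near = np.zeros(length)
    near[0] = 1.0
    far = np.zeros(length)
    far[itd_samples] = far_gain
    return (near, far) if az >= 0 else (far, near)

def binauralize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right impulse-response pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# A click placed hard left arrives at the right ear later and quieter.
click = np.array([1.0, 0.0, 0.0, 0.0])
h_left, h_right = toy_hrir_pair(90)
stereo = binauralize(click, h_left, h_right)
```

Swapping the toy pair for measured responses leaves `binauralize` unchanged, which is the point of the sketch: the spatial information lives entirely in the filters.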
Object-based audio formats (Dolby Atmos, Sony 360 Reality Audio, MPEG-H) encode audio as individual sound objects with positional metadata rather than fixed speaker channel assignments. At playback, the renderer maps each object to the available speakers or headphones using the HRTF or speaker feed calculation appropriate to the playback configuration. This format flexibility is why a Dolby Atmos mix can play on a 7.1.4 cinema system, a 5.1.2 home theater, or headphones — the same mix data is decoded differently for each context.
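To illustrate how one object position can yield different speaker feeds on different layouts, here is a deliberately simplified sketch. It is not how the Dolby Atmos renderer or any other commercial renderer works (those algorithms are not described in this text). It assumes a horizontal-only layout and uses constant-power panning between the two speakers adjacent to the object. The function name `render_object_gains` and the example layouts are hypothetical.

```python
import numpy as np

def render_object_gains(object_az_deg, speaker_az_deg):
    """Per-speaker gains for one object, horizontal plane only.

    Pans with constant power between the two speakers that bracket the
    object's azimuth. Angles are in degrees, counterclockwise from front.
    """
    spk = np.asarray(speaker_az_deg, dtype=float) % 360.0
    order = np.argsort(spk)
    az = object_az_deg % 360.0
    gains = np.zeros(len(spk))
    for i in range(len(order)):
        a = order[i]
        b = order[(i + 1) % len(order)]
        span = (spk[b] - spk[a]) % 360.0 or 360.0  # arc from speaker a to b
        offset = (az - spk[a]) % 360.0             # object position on that arc
        if offset <= span:
            frac = offset / span
            gains[a] = np.cos(frac * np.pi / 2)
            gains[b] = np.sin(frac * np.pi / 2)
            break
    return gains

# The same object metadata rendered for two different layouts.
stereo_gains = render_object_gains(0, [30, -30])
quad_gains = render_object_gains(45, [45, 135, 225, 315])
```

The object's metadata (here just an azimuth) stays the same in both calls. Only the layout passed to the renderer changes, which mirrors the separation between mix data and playback configuration described above.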
Spatial audio represents the leading edge of consumer and professional audio technology. Apple's Spatial Audio (headphone Atmos with dynamic head tracking), Sony's 360 Reality Audio, and Amazon Music's Spatial Audio catalog are driving mainstream adoption of binaural and object-based formats. Simultaneously, immersive audio for VR, AR, and spatial computing demands accurate, interactive three-dimensional audio with head-tracked HRTF rendering.
The technical demands of spatial audio production are substantially higher than those of stereo. Mixing in Atmos requires Atmos-capable software (Pro Tools + Dolby Atmos Production Suite, Logic Pro's Spatial Audio mixer, Nuendo) and a speaker array for monitoring (7.1.4 is a commonly recommended minimum). Evaluating object placement and height imaging requires both speaker monitoring and binaural headphone checking, as the listening experience differs significantly between systems.
Ambisonics has become particularly important for VR and 360-degree video, where the listener's head orientation changes dynamically. A recorded Ambisonic soundfield can be rotated in real time to match head tracking, maintaining correct spatial correspondence between visual and auditory scenes. This interactivity distinguishes spatial audio production for immersive media from traditional post-production workflows, requiring new tools, new monitoring approaches, and a fundamentally different way of thinking about the relationship between sound and space.
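The rotation step is inexpensive for first-order material, as the following sketch suggests. It assumes a B-format array with rows ordered W, X, Y, Z, azimuth and yaw measured counterclockwise (positive = toward the listener's left), and handles yaw only. Pitch and roll would need a full 3D rotation matrix. The function names are invented for this example.

```python
import numpy as np

def rotate_foa_yaw(bformat, yaw_deg):
    """Rotate a (4, n) W, X, Y, Z soundfield about the vertical axis.

    Positive yaw_deg turns the whole scene counterclockwise. W and Z are
    unaffected by a rotation around the vertical axis.
    """
    w, x, y, z = np.asarray(bformat, dtype=float)
    phi = np.radians(yaw_deg)
    x_rot = x * np.cos(phi) - y * np.sin(phi)
    y_rot = x * np.sin(phi) + y * np.cos(phi)
    return np.stack([w, x_rot, y_rot, z])

def compensate_head_yaw(bformat, head_yaw_deg):
    """Counter-rotate the scene so sources stay fixed in the world."""
    return rotate_foa_yaw(bformat, -head_yaw_deg)

# A source straight ahead (X = 1, Y = 0). After the listener turns 90 degrees
# to the left, it should be heard on the right (Y = -1).
front_source = np.array([[0.7071], [1.0], [0.0], [0.0]])
heard = compensate_head_yaw(front_source, head_yaw_deg=90)
```

Because the rotation is a small matrix multiply per sample block, it can be updated on every head-tracker reading before the soundfield is decoded to headphones.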