Spatial Audio and Ambisonics

spatial-audio ambisonics binaural immersive-audio

Core Idea

Spatial audio describes techniques for reproducing sound in three-dimensional space, going beyond the left-right stereo field to include height and depth dimensions. Where conventional stereo uses two channels to create a horizontal image, spatial audio formats encode the full sphere of possible sound positions — above, below, front, back, and all angles between.

Ambisonics is a full-sphere surround sound technique developed by Michael Gerzon in the 1970s. Rather than recording individual speaker feeds, Ambisonics encodes the soundfield as a set of mathematical components (B-format signals) that describe the acoustic pressure and directional velocity at a single point in space. First-order Ambisonics (FOA) uses four channels: W (omnidirectional pressure) plus X, Y, and Z (three directional components). Higher-order Ambisonics (HOA) uses additional channels to encode finer spatial detail. An order-N system needs (N + 1)^2 channels, so second-order uses nine and third-order uses sixteen. The advantage of Ambisonics is format agnosticism: a single recorded or mixed B-format file can be decoded for any speaker array (stereo, quad, 5.1, 7.1.4) or for headphones via HRTF at playback time.
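
The channel counts and the encode/decode idea above can be sketched in a few lines of Python. This is a minimal illustration rather than a production encoder. It assumes the traditional B-format convention (W scaled by 1/sqrt(2), channel order W, X, Y, Z) with azimuth measured counterclockwise from straight ahead, and the function names are hypothetical. The decoder is a basic horizontal "virtual cardioid microphone" pointed at each speaker, chosen for simplicity; real decoders are usually more elaborate.

```python
import math


def ambisonic_channel_count(order: int) -> int:
    """Channels needed for full-sphere Ambisonics of a given order: (order + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: W carries the signal scaled by 1/sqrt(2);
    azimuth 0 is front and increases to the left; elevation 0 is ear level.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    return (w, x, y, z)


def decode_virtual_cardioid(bformat, speaker_azimuth_deg: float) -> float:
    """Feed for one horizontal speaker, modelled as a cardioid aimed at it."""
    w, x, y, _z = bformat  # height is ignored in this horizontal-only sketch
    theta = math.radians(speaker_azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(theta) + y * math.sin(theta))


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, ambisonic_channel_count(order))
    b = encode_foa(1.0, azimuth_deg=0.0, elevation_deg=0.0)
    # The same B-format sample can feed any layout: here a quad array.
    for speaker in (45.0, 135.0, -135.0, -45.0):
        print(speaker, round(decode_virtual_cardioid(b, speaker), 3))
```

Because the speaker layout appears only in the decode step, the same encoded sample can be rendered for a different array by changing the list of speaker azimuths.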

HRTF (Head-Related Transfer Function) is the acoustic transformation applied to sound as it diffracts around the head, torso, and pinnae (outer ears) before reaching the eardrums. HRTFs encode the cues that allow localization of sounds in three dimensions from just two ears: the difference in arrival time between the ears, the difference in level, and the direction-dependent spectral coloring introduced by the pinnae. Convolving audio with a pair of HRTF filters (one per ear) produces binaural audio, headphone playback that creates convincing height and out-of-head sound placement. Personalized HRTFs (measured from a specific individual) produce more accurate localization than generic averages, which is why Apple offers HRTF personalization through iPhone scanning.
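
The convolution step can be illustrated with a small sketch. The impulse responses below are made-up toy values (a pure delay and attenuation for the far ear), not measured HRTF data, and the function names are hypothetical. Real head-related impulse responses are typically hundreds of samples long and would be loaded from a measured dataset.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


if __name__ == "__main__":
    # Toy impulse responses for a source on the listener's left (assumed values):
    # the left ear hears it immediately at full level, the right ear hears it
    # three samples later and quieter.
    hrir_left = [1.0, 0.0, 0.0, 0.0]
    hrir_right = [0.0, 0.0, 0.0, 0.5]
    mono = [1.0, 0.5]
    left, right = render_binaural(mono, hrir_left, hrir_right)
    print("left: ", left)
    print("right:", right)
```

Even this crude pair reproduces two of the localization cues: an interaural time difference (the three-sample delay) and an interaural level difference (the 0.5 gain). The spectral cues that convey height only appear with measured responses.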

Object-based audio formats (Dolby Atmos, Sony 360 Reality Audio, MPEG-H) encode audio as individual sound objects with positional metadata rather than fixed speaker channel assignments. At playback, the renderer maps each object to the available speakers or headphones, using speaker gain calculations or HRTF filtering as appropriate to the playback configuration. This format flexibility is why a Dolby Atmos mix can play on a 7.1.4 cinema system, a 5.1.2 home theater, or headphones: the same mix data is rendered differently for each context.
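
A toy renderer makes the idea of rendering at playback concrete. The sketch below is not how the Dolby Atmos renderer or any commercial renderer works. It assumes a horizontal ring of speakers and uses simple pairwise constant-power panning between the two speakers on either side of the object. The `AudioObject` class and `render_gains` function are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # positional metadata: 0 = front, positive = to the left


def render_gains(obj, speaker_azimuths_deg):
    """Return one gain per speaker using pairwise constant-power panning."""
    n = len(speaker_azimuths_deg)
    if n < 2:
        raise ValueError("need at least two speakers")
    # Visit speakers in order of increasing azimuth around the ring.
    ring = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    target = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a, b = ring[k], ring[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360.0
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360.0
        offset = (target - start) % 360.0
        if span > 0.0 and offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            return gains
    raise ValueError("speaker azimuths must be distinct")


if __name__ == "__main__":
    obj = AudioObject(name="dialogue", azimuth_deg=0.0)
    layouts = {
        "stereo": [30.0, -30.0],
        "quad": [45.0, 135.0, -135.0, -45.0],
    }
    # The object's metadata never changes; only the layout passed in differs.
    for label, layout in layouts.items():
        print(label, [round(g, 3) for g in render_gains(obj, layout)])
```

The mix stores only the object and its position. The speaker layout is supplied when `render_gains` is called, which mirrors how one object-based mix can serve several playback systems.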

Explainer

Spatial audio represents the leading edge of consumer and professional audio technology. Apple's Spatial Audio (headphone Atmos with dynamic head tracking), Sony's 360 Reality Audio, and Amazon Music's Spatial Audio catalog are driving mainstream adoption of binaural and object-based formats. Simultaneously, immersive audio for VR, AR, and spatial computing demands accurate, interactive three-dimensional audio with head-tracked HRTF rendering.

The technical demands of spatial audio production are substantially higher than those of stereo. Mixing in Atmos requires Atmos-capable software (Pro Tools + Dolby Atmos Production Suite, Logic Pro's Spatial Audio mixer, Nuendo) and, for speaker monitoring, an array that is typically at least 7.1.4. Evaluating object placement and height imaging calls for both speaker monitoring and binaural headphone checking, because the listening experience differs significantly between the two.

Ambisonics has become particularly important for VR and 360-degree video, where the listener's head orientation changes dynamically. A recorded Ambisonic soundfield can be rotated in real time to match head tracking, maintaining correct spatial correspondence between visual and auditory scenes. This interactivity distinguishes spatial audio production for immersive media from traditional post-production workflows, requiring new tools, new monitoring approaches, and a fundamentally different way of thinking about the relationship between sound and space.
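
For first-order material, the rotation described above reduces to a 2D rotation of the X and Y components when the head turns about the vertical axis. The sketch below assumes channel order (W, X, Y, Z) and azimuth increasing to the listener's left. The function names are hypothetical, and pitch and roll are left out for brevity.

```python
import math


def rotate_foa_yaw(bformat, angle_deg: float):
    """Rotate a first-order soundfield about the vertical axis.

    A positive angle moves every source counterclockwise (to the left).
    W (pressure) and Z (height) are unaffected by a yaw rotation.
    """
    w, x, y, z = bformat
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return (w, x_rot, y_rot, z)


def compensate_head_yaw(bformat, head_yaw_deg: float):
    """Counter-rotate the soundfield so sources stay fixed in the world.

    If the head turns left by some angle, the scene must turn right by the
    same angle relative to the head.
    """
    return rotate_foa_yaw(bformat, -head_yaw_deg)


if __name__ == "__main__":
    # One sample of a source straight ahead (X = 1, Y = 0).
    front_source = (1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0)
    # The listener turns 90 degrees to the left; the source should now sit on
    # the listener's right, i.e. on the negative Y axis.
    w, x, y, z = compensate_head_yaw(front_source, 90.0)
    print(round(x, 6), round(y, 6))
```

Since the rotation is a small matrix multiply per sample frame, it can be applied continuously as head-tracking data arrives, before the soundfield is decoded to speakers or headphones.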

