Floating-Point Representation (IEEE 754)

College · Depth 51 in the knowledge graph
Unlocks 55 downstream topics
floating-point IEEE-754 real-numbers precision

Core Idea

IEEE 754 floating-point represents real numbers in binary scientific notation: a sign bit, a biased exponent, and a significand (mantissa). A 32-bit single-precision float has 1 sign bit, 8 exponent bits, and 23 mantissa bits. The format can represent very large and very small numbers but introduces rounding errors because most real numbers cannot be represented exactly in a finite number of bits. Special values like infinity, negative zero, and NaN (Not a Number) handle edge cases in computation.

How It's Best Learned

Encode and decode several floating-point values by hand using the IEEE 754 formula. Explore precision loss by evaluating 1.0 + epsilon == 1.0 in a programming language for progressively smaller values of epsilon, and note where the comparison first becomes true. Use visualization tools to see the distribution of representable values.
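The epsilon experiment suggested above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed exercise solution: the function name `machine_epsilon` is made up for this example, and because Python's `float` is a 64-bit double, the value found is the double-precision epsilon rather than the single-precision one.

```python
import sys

def machine_epsilon() -> float:
    """Halve eps until adding half of it to 1.0 no longer changes the result."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps

eps = machine_epsilon()
print(eps == sys.float_info.epsilon)  # True: both are 2**-52 for doubles
print(1.0 + eps == 1.0)               # False: eps is still large enough to register
print(1.0 + eps / 2 == 1.0)           # True: half of eps is rounded away
```

The loop stops at the smallest power of two that still changes 1.0 when added to it, which is one common working definition of machine epsilon.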

Explainer

You already know how two's complement represents signed integers by fixing a set number of bits and interpreting them with positional place values. But fixed-width integers cannot represent fractions, and even a 64-bit integer is too small to hold a number like 6.022 × 10²³. Floating-point representation extends the idea of binary encoding to approximate real numbers, using the same principle as scientific notation: separate a number into a significand (the meaningful digits) and an exponent (the scale). In decimal scientific notation, 0.0042 becomes 4.2 × 10⁻³. IEEE 754 does the same thing in binary: a number is stored as ±1.mantissa × 2^exponent.
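To make the binary version of scientific notation concrete, here is a small Python sketch that splits a number into sign, significand, and exponent. The helper name `binary_scientific` is hypothetical. It relies on the standard `math.frexp`, which returns a fraction in [0.5, 1), so the result is rescaled to the 1.xxx form used above. It assumes a nonzero, finite input.

```python
import math

def binary_scientific(x: float) -> tuple[int, float, int]:
    """Split nonzero finite x into (sign, significand, exponent), 1 <= significand < 2."""
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    fraction, exp = math.frexp(abs(x))   # abs(x) == fraction * 2**exp, 0.5 <= fraction < 1
    return sign, fraction * 2, exp - 1   # rescale so the significand lies in [1, 2)

print(binary_scientific(-6.5))    # (-1, 1.625, 2), i.e. -1.625 * 2**2
print(binary_scientific(0.0042))  # exponent is -8, significand slightly above 1.075
```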

A 32-bit single-precision float divides its bits into three fields: 1 sign bit (0 for positive, 1 for negative), 8 exponent bits, and 23 mantissa bits. The exponent uses a biased encoding: the stored value is the actual exponent plus 127, so a stored exponent of 130 means an actual exponent of 3. This avoids the need for two's complement in the exponent field and makes comparison simpler. The mantissa stores only the fractional part of the significand because the leading 1 is implicit. Any nonzero binary number in normalized scientific notation starts with 1 (the only nonzero digit in binary), so there is no need to store it. This trick gives you 24 bits of precision from 23 stored bits. The all-zeros and all-ones exponent patterns (stored values 0 and 255) are reserved for zero, subnormal numbers, infinities, and NaN, so the implicit leading 1 applies only to stored exponents 1 through 254.
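The field layout described above can be checked with Python's standard `struct` module. The sketch below is illustrative only: the function names `decode_float32` and `rebuild_normal` are invented for this example, and the rebuild step assumes a normalized value (stored exponent between 1 and 254).

```python
import struct

def decode_float32(x: float) -> tuple[int, int, int]:
    """Return the (sign, stored_exponent, mantissa) fields of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    sign = bits >> 31                        # top bit
    stored_exponent = (bits >> 23) & 0xFF    # next 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF               # low 23 bits, fraction only
    return sign, stored_exponent, mantissa

def rebuild_normal(sign: int, stored_exponent: int, mantissa: int) -> float:
    """Apply (-1)^sign * 1.mantissa * 2^(stored_exponent - 127) for normalized values."""
    significand = 1 + mantissa / 2**23       # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (stored_exponent - 127)

fields = decode_float32(10.0)
print(fields)                   # (0, 130, 2097152): 10.0 = 1.25 * 2**3
print(rebuild_normal(*fields))  # 10.0
```

The stored exponent of 130 for 10.0 matches the example in the text: 130 minus the bias of 127 gives an actual exponent of 3.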

The consequence of finite precision is rounding error. The representable floating-point values are not evenly distributed across the number line: they are densely packed near zero and increasingly sparse as magnitude grows. Between 1.0 and 2.0, there are 2²³ (about 8 million) representable values. Between 2.0 and 4.0, there are the same 2²³ values spread over twice the range, so the gap between consecutive representable numbers doubles. This means that adding a tiny number to a large number can produce no change at all: if the small number is less than half the gap size at the large number's magnitude, it gets rounded away. The expression `1.0 + 1e-8 == 1.0` evaluates to true in single precision, not because of a bug, but because 10⁻⁸ is less than half the spacing between representable values near 1.0 (2⁻²³ ≈ 1.19 × 10⁻⁷), so the sum rounds back to 1.0. In double precision, where the spacing near 1.0 is 2⁻⁵², the same expression is false.
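The doubling of the gap, and the vanishing of `1e-8` next to 1.0, can be observed from Python even though its own `float` is a double. The sketch below emulates single precision by round-tripping through `struct`'s 4-byte `f` format. The helpers `to_f32` and `gap_above` are hypothetical names for this example, and `gap_above` assumes a positive, finite input below the largest float.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def gap_above(x: float) -> float:
    """Distance from the positive float32 nearest x to the next larger float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (nxt,) = struct.unpack(">f", struct.pack(">I", bits + 1))  # next bit pattern up
    return nxt - to_f32(x)

print(gap_above(1.0) == 2**-23)    # True: spacing just above 1.0
print(gap_above(2.0) == 2**-22)    # True: spacing doubles at 2.0
print(to_f32(1.0 + 1e-8) == 1.0)   # True: the sum rounds back to 1.0 in single precision
print(1.0 + 1e-8 == 1.0)           # False: a double still has room for 1e-8
```

Incrementing the integer bit pattern by one steps to the next representable value, which works because positive floats sort in the same order as their bit patterns.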

IEEE 754 also defines special values to handle exceptional cases gracefully. Positive and negative infinity result from overflow and from operations like dividing a nonzero number by zero, with the sign of the result following the signs of the operands. NaN (Not a Number) represents undefined results like 0/0 or √(−1) and has the unique property that NaN ≠ NaN. Negative zero exists because the sign bit is independent of the magnitude. It compares equal to positive zero but preserves sign information for certain mathematical operations. Under the standard's default exception handling, these special values ensure that floating-point arithmetic does not trap or halt unexpectedly; every operation produces a defined result, even if that result is "this computation is undefined."
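A few of these special values can be inspected directly in Python, as the sketch below shows. One caveat about the language rather than the standard: Python raises `ZeroDivisionError` for `1.0 / 0.0` instead of returning infinity, so this example produces infinity through overflow and NaN through `inf - inf`. The variable names are chosen only for illustration.

```python
import math

inf = float("inf")
nan = float("nan")

overflowed = 1e308 * 10   # too large for a double, so the result is +infinity
undefined = inf - inf     # no meaningful value, so the result is NaN

print(overflowed == inf)               # True
print(math.isnan(undefined))           # True
print(nan == nan)                      # False: NaN is unequal even to itself
print(-0.0 == 0.0)                     # True: the two zeros compare equal
print(math.copysign(1.0, -0.0))        # -1.0: but the sign bit is still there
print(math.atan2(0.0, -0.0) == math.pi, math.atan2(0.0, 0.0) == 0.0)  # True True
```

The last line is one example of an operation where the sign of zero changes the answer: `atan2` uses it to decide which side of the negative x-axis the point lies on.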

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Operators and Expressions → Arithmetic Operators and Operator Precedence → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Combinational Circuit Design → Flip-Flops and Latches → Binary Counters: Design and Analysis → Binary Arithmetic → Fixed-Point Number Representation → Two's Complement Representation → Floating-Point Representation (IEEE 754)

Longest path: 52 steps · 223 total prerequisite topics
