
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Floating Point Representation

Graduate · Depth 84 in the knowledge graph

13 topics build on this

340 prerequisites beneath it



Core Idea

Floating point numbers are represented in computers using a fixed number of bits: a sign bit, an exponent, and a mantissa (fractional part). The IEEE 754 standard defines how these fields are encoded and how arithmetic operations on them are performed and rounded. This limited-precision representation lets computers store a wide range of values, but it introduces systematic rounding errors into computation.

Explainer

Computers must represent real numbers using a finite string of bits, which immediately poses a problem: there are uncountably many real numbers but only finitely many bit patterns. Floating point is the engineering solution. Instead of trying to represent all numbers, it represents a carefully chosen finite set that covers a wide range of magnitudes while maintaining consistent *relative* precision. The key insight is that scientific computation usually cares about significant digits rather than the absolute position of the decimal point. A measurement of 6.022 × 10²³ has four significant digits; writing it out as a 24-digit integer changes its form, not its precision.
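To see what "consistent relative precision" means in practice, the short Python sketch below measures the gap from a value to the next larger double. It assumes Python 3.9 or later for `math.ulp`, and a platform where `float` is an IEEE 754 double, as on mainstream systems. The absolute gap grows with magnitude, but relative to the value it stays between $2^{-53}$ and $2^{-52}$. The helper name `relative_gap` is illustrative, not a standard function.

```python
import math

def relative_gap(x: float) -> float:
    """Gap from positive x to the next larger double, divided by x (illustrative helper)."""
    # math.ulp(x) is the value of the least significant bit of x:
    # for positive finite x, the next larger double is x + math.ulp(x).
    return math.ulp(x) / x

# Values spanning roughly 600 orders of magnitude (all normalized doubles).
for x in [1e-300, 1.0, 1000.0, 6.022e23, 1e300]:
    print(f"x = {x:<9.4g}  absolute gap = {math.ulp(x):.3e}  "
          f"relative gap = {relative_gap(x):.3e}")
```

The absolute gap column spans hundreds of orders of magnitude, while the relative gap column stays near $10^{-16}$. That is the trade floating point makes.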

IEEE 754 double precision (the default in most languages) uses 64 bits: 1 sign bit, 11 exponent bits, and 52 mantissa (fraction) bits. For a normalized number, the stored value is $(-1)^s \times 1.f \times 2^{e-1023}$. Here $s$ is the sign bit, $e$ is the stored (biased) exponent with $1 \le e \le 2046$, and $1.f$ is the mantissa with an implicit leading 1 bit. Every normalized binary number starts with 1, so this bit does not need to be stored. The stored exponents 0 and 2047 are reserved for zero, subnormal numbers, infinities, and NaN. The 52 stored bits plus the implicit bit give 53 bits of precision, about 15–16 significant decimal digits. The 11 exponent bits allow a range from roughly $10^{-308}$ to $10^{308}$. This is the same idea as scientific notation in base 2: the exponent controls the scale, and the mantissa controls the significant digits.
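You can check the encoding directly. The Python sketch below assumes, as on mainstream platforms, that `float` is an IEEE 754 double. It uses the standard `struct` module to reinterpret the 8 bytes of a float as a 64-bit integer. It then extracts the three fields and evaluates the formula for a normalized number. The helper names `decode_double` and `rebuild_normalized` are hypothetical, chosen for illustration.

```python
import struct

def decode_double(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, stored exponent, fraction bits). Hypothetical helper."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer (big-endian).
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits: the "f" in 1.f
    return sign, exponent, fraction

def rebuild_normalized(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate (-1)^s * 1.f * 2^(e - 1023). Hypothetical helper, normalized numbers only."""
    if not 1 <= exponent <= 2046:
        raise ValueError("exponents 0 and 2047 encode zero, subnormals, inf and NaN")
    significand = 1 + fraction / 2**52  # implicit leading 1 plus the stored fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

# -6.5 = -1.625 * 2^2, so the stored exponent is 1023 + 2 = 1025
s, e, f = decode_double(-6.5)
print(s, e, f)                      # 1 1025 2814749767106560
print(rebuild_normalized(s, e, f))  # -6.5
```

Every step of the reconstruction (dividing by a power of two, adding the implicit 1, scaling by $2^{e-1023}$) is exact for normalized inputs. The sketch therefore round-trips any normalized double bit-for-bit.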

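The critical consequence is that most real numbers cannot be represented exactly. Consider the decimal 0.1. In binary it is the repeating fraction 0.0001100110011..., so it must be rounded to the nearest representable double. This is why `0.1 + 0.2 ≠ 0.3` in floating point arithmetic, a famous surprise for beginners: the computed sum prints as 0.30000000000000004, which is not the double nearest to 0.3. Machine epsilon, $\varepsilon = 2^{-52} \approx 2.22 \times 10^{-16}$, is the gap between 1.0 and the next representable double. Relative to the number's magnitude, the gap between any normalized number and its neighbor is at most $\varepsilon$. Under the default round-to-nearest mode, each basic arithmetic operation is correctly rounded. Barring overflow and underflow, its result therefore carries a relative error of at most $\varepsilon/2 \approx 1.11 \times 10^{-16}$, the unit roundoff. Individually tiny, these errors can accumulate dramatically over many operations, a phenomenon you will study when analyzing numerical algorithms.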
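Python's standard library can reproduce each claim in this paragraph, again assuming `float` is an IEEE 754 double. `fractions.Fraction` exposes the exact value a float stores, and `sys.float_info.epsilon` is machine epsilon. Comparing a computed sum against the exact rational sum confirms the $\varepsilon/2$ bound for a single operation. The last lines show error accumulating over repeated addition. `math.fsum` appears only as a point of comparison, since it tracks intermediate partial sums to avoid that loss.

```python
import math
import sys
from fractions import Fraction

# 0.1 has no finite binary expansion; Fraction reveals the exact stored value.
print(Fraction(0.1))          # 3602879701896397/36028797018963968  (denominator 2**55)
print(0.1 + 0.2 == 0.3)       # False
print(0.1 + 0.2)              # 0.30000000000000004

eps = sys.float_info.epsilon  # machine epsilon: gap between 1.0 and the next double
print(eps == 2**-52)          # True
u = eps / 2                   # unit roundoff: bound on one operation's relative error

# A single correctly rounded addition stays within u of the exact sum
# of the two stored operands.
exact = Fraction(0.1) + Fraction(0.2)
rel_err = abs(Fraction(0.1 + 0.2) - exact) / exact
print(rel_err <= u)           # True

# Errors accumulate over many operations.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0, total)            # False 0.9999999999999999
print(math.fsum([0.1] * 10) == 1.0)   # True: fsum tracks partial sums
```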

Special values complete the system. IEEE 754 reserves bit patterns for ±infinity and for NaN (Not a Number). Infinity is produced by overflow and by dividing a nonzero number by zero, e.g. 1.0/0.0. NaN is produced by undefined results like 0.0/0.0 or √(−1). These values allow computations to continue and propagate failure information rather than crashing. A NaN in your output signals that something went wrong upstream; it is a diagnostic, not a valid result. Understanding how floating point works is a prerequisite for understanding why numerical algorithms must be designed carefully: expressions that are mathematically equivalent, such as (a + b) + c and a + (b + c), may produce different results when computed in finite precision.
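Here is a short Python sketch of these special values. One caveat is specific to Python. Plain `float` division raises `ZeroDivisionError` for `1.0 / 0.0`, and `math.sqrt(-1)` raises `ValueError`, instead of returning the IEEE 754 results. The sketch therefore produces infinity by overflow and NaN via `inf - inf`. The final lines illustrate the closing point with a concrete pair of mathematically equivalent sums.

```python
import math
import sys

# Overflow yields infinity rather than an error.
big = sys.float_info.max            # largest finite double, about 1.8e308
print(big * 2)                      # inf

# Python raises ZeroDivisionError for 1.0 / 0.0, so make NaN via inf - inf.
nan = math.inf - math.inf
print(nan)                          # nan
print(nan + 1.0, nan * 0.0)         # nan nan  (NaN propagates through arithmetic)
print(nan == nan, math.isnan(nan))  # False True  (NaN is unequal even to itself)

# Mathematically equivalent, numerically different: addition is not associative.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)                  # 1.0
print(a + (b + c))                  # 0.0  (-1e16 + 1.0 rounds back to -1e16)
```

Because `nan == nan` is false, `math.isnan` is the reliable way to detect the diagnostic described above.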


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Binary Counters: Design and Analysis → Binary Arithmetic → Fixed-Point Number Representation → Two's Complement Representation → Floating-Point Representation (IEEE 754) → Machine Epsilon and Unit Roundoff → Floating Point Representation

Longest path: 85 steps · 340 total prerequisite topics

Prerequisites (1)

Machine Epsilon and Unit Roundoff (soft)

Leads To (1)

Rounding Errors and Error Propagation (hard)