A programmer writes `if (0.1 + 0.2 == 0.3)` in a language using IEEE 754 double precision. What will happen, and why?
A. The condition evaluates to true: 0.1 + 0.2 equals 0.3 exactly in double precision
B. The condition evaluates to false: neither 0.1 nor 0.2 has an exact binary representation, so their sum differs slightly from the stored value of 0.3
C. The condition evaluates to true only on CPUs with a hardware floating-point unit
D. The expression raises an overflow exception since 0.3 cannot be represented
Answer: B
The decimal 0.1 is a repeating fraction in binary (0.000110011...) and cannot be stored exactly; the same holds for 0.2 and 0.3. The rounding errors in the stored 0.1 and 0.2, together with the rounding of the sum itself, do not cancel, so the result (0.30000000000000004) lands one unit in the last place above the separately stored value of 0.3, a difference of 2⁻⁵⁴ ≈ 5.55 × 10⁻¹⁷. This is not a bug in the computer; it is a fundamental consequence of representing real numbers with finite binary precision. The usual practice is to compare floating-point numbers with a tolerance, `|a − b| < δ`, with δ chosen to suit the problem.
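The comparison described above can be reproduced in a few lines. This is a minimal sketch in Python, assuming its `float` type is an IEEE 754 double (true on mainstream platforms); the variable names and the `rel_tol` value are illustrative choices, not part of the question.

```python
import math

total = 0.1 + 0.2      # rounded sum of the stored 0.1 and the stored 0.2
target = 0.3           # the double nearest to 0.3

# Exact equality fails because the two doubles are different bit patterns.
exactly_equal = (total == target)

# The subtraction of two adjacent doubles is exact, so this is the true gap.
gap = total - target

# Tolerance-based comparison; rel_tol=1e-9 is an arbitrary illustrative value.
close_enough = math.isclose(total, target, rel_tol=1e-9)

print(exactly_equal, gap, close_enough)
```

The gap is exactly 2⁻⁵⁴, the spacing between adjacent doubles in the interval [0.25, 0.5).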
Question 2 Multiple Choice
Why does IEEE 754 floating point maintain approximately the same number of significant decimal digits across its entire representable range — whether storing a number near 10⁻³⁰⁰ or near 10³⁰⁰?
A. Because the hardware allocates more mantissa bits to smaller numbers to compensate for their tiny magnitude
B. Because relative precision, bounded by machine epsilon, is constant regardless of magnitude
C. Because the exponent bits grow larger as the number grows, maintaining absolute error
D. Because the mantissa is stored in decimal form internally, independent of the binary exponent
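Answer: B
Floating point is designed for relative, not absolute, precision. The mantissa stores the significant digits and the exponent scales the magnitude. Whether the number is 6.022 × 10²³ or 6.022 × 10⁻⁵, the same 52 stored mantissa bits (53 bits of precision with the implicit leading bit) carry the same ~15–16 significant decimal digits. Machine epsilon ε = 2⁻⁵² ≈ 2.22 × 10⁻¹⁶ sets the scale of the relative error: under the default round-to-nearest mode, a single operation whose result stays in the normal range has relative error at most ε/2, while the absolute error scales with the magnitude of the number. The guarantee weakens only for subnormal values below about 2.2 × 10⁻³⁰⁸, which have fewer significant bits.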
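To make the distinction between absolute and relative spacing concrete, the sketch below measures the gap between adjacent doubles at several magnitudes. It assumes Python 3.9 or later (for `math.ulp`); the sample values and the helper name `relative_spacing` are illustrative.

```python
import math
import sys

eps = sys.float_info.epsilon   # machine epsilon for doubles, 2**-52

def relative_spacing(x: float) -> float:
    """Gap to the next representable double, divided by |x|."""
    return math.ulp(x) / abs(x)

samples = [1e-300, 6.022e-5, 1.0, 6.022e23, 1e300]

# Map each sample to (absolute spacing, relative spacing).
spacings = {x: (math.ulp(x), relative_spacing(x)) for x in samples}

for x, (absolute, relative) in spacings.items():
    print(f"x={x:.3e}  ulp={absolute:.3e}  ulp/|x|={relative:.3e}")
```

The absolute spacing spans hundreds of orders of magnitude across these samples, while the relative spacing stays between ε/2 and ε for every normal value.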
Question 3 True / False
The decimal number 0.1 cannot be represented exactly in IEEE 754 binary floating point because it requires an infinitely repeating binary fraction.
T. True
F. False
Answer: True
In binary, 0.1 = 0.000110011001100110011..., a repeating pattern analogous to 1/3 = 0.333... in decimal. Since the 53-bit significand (52 stored mantissa bits plus an implicit leading bit) must round this infinite sequence to the nearest representable value, the stored value differs from the true 0.1 by approximately 5.5 × 10⁻¹⁸. This is unavoidable in binary floating point and applies to many 'simple' decimals: 0.2, 0.3, 0.6, 0.7 all have repeating binary representations.
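The stored value can be inspected exactly with Python's standard `decimal` and `fractions` modules, which convert a double to its exact decimal or rational value without further rounding. This is a small sketch; the helper `is_exact` is a hypothetical name introduced here for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# Exact decimal expansion of the double nearest to 0.1.
stored = Decimal(0.1)

# Exact rational difference between the stored double and the true 1/10.
error = Fraction(0.1) - Fraction(1, 10)

def is_exact(text: str) -> bool:
    """True if the decimal literal in `text` is exactly representable as a double."""
    return Fraction(float(text)) == Fraction(text)

print(stored)
print(float(error))
print({t: is_exact(t) for t in ["0.1", "0.2", "0.3", "0.5", "0.25", "0.6", "0.7"]})
```

Values such as 0.5 and 0.25 are exact because their denominators are powers of two; the others in the list are not.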
Question 4 True / False
Floating-point arithmetic errors are unpredictable and random, so numerical analysts cannot systematically bound or control their effect on a computation.
T. True
F. False
Answer: False
Floating-point errors are systematic and bounded, not random: the same operation on the same inputs always produces the same rounded result. Under round-to-nearest, each arithmetic operation introduces a relative error of at most ε/2 ≈ 1.11 × 10⁻¹⁶, where ε is machine epsilon (barring overflow and underflow). Numerical analysis provides rigorous frameworks (condition number analysis, backward error analysis, interval arithmetic) for bounding how errors accumulate across a computation. Algorithms can be designed (or avoided) based on their error amplification behavior. The challenge is not unpredictability but the potential for systematic amplification in ill-conditioned computations.
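The per-operation bound can be checked directly by comparing a floating-point sum against exact rational arithmetic. The sketch below is illustrative only: the sample pairs and the helper `relative_error_of_sum` are assumptions made for this example, and the second half shows one case where a compensated algorithm (`math.fsum` from the standard library) avoids the accumulated error of a naive loop.

```python
import math
import sys
from fractions import Fraction

# Unit roundoff eps/2 = 2**-53, held as an exact rational.
U = Fraction(sys.float_info.epsilon) / 2

def relative_error_of_sum(a: float, b: float) -> Fraction:
    """Exact relative error of the floating-point sum a + b (exact sum must be nonzero)."""
    exact = Fraction(a) + Fraction(b)     # no rounding: rationals are exact
    computed = Fraction(a + b)            # the rounded double, converted exactly
    return abs(computed - exact) / abs(exact)

pairs = [(0.1, 0.2), (1e16, 1.0), (3.14, 2.71e-8)]
errors = [relative_error_of_sum(a, b) for a, b in pairs]

# Accumulation: adding 0.1 ten times in a plain loop drifts away from 1.0.
running = 0.0
for _ in range(10):
    running += 0.1

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
compensated = math.fsum([0.1] * 10)

print([float(e) for e in errors], running, compensated)
```

Every measured error sits at or below ε/2, and the results are identical on every run, which is what makes systematic error analysis possible.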
Question 5 Short Answer
Explain why floating point uses relative precision rather than absolute precision, and what practical consequence this has for comparing floating-point numbers for equality.
Model answer: Floating point maintains relative precision because scientific computation usually cares about significant digits, not absolute position. A measurement of 6.022 × 10²³ has the same meaningful precision as 6.022 × 10⁻⁵: the scale differs but the information content is similar. Absolute precision (a fixed number of decimal places) would spend bits on insignificant digits for large numbers and leave few or no significant digits for tiny ones. The practical consequence for equality comparison is that == is unreliable for computed values: since most decimal fractions cannot be represented exactly and each operation rounds its result, two values that should be 'equal' often differ by a tiny rounding error. The usual test is |a − b| < δ for some tolerance δ appropriate to the problem, often scaled to the magnitudes of a and b.
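Choosing δ is where relative precision matters in practice. The sketch below, assuming Python 3.9 or later (for `math.nextafter`), contrasts a fixed absolute tolerance with the relative tolerance used by the standard library's `math.isclose`; the tolerance values are arbitrary illustrative choices.

```python
import math

ABS_TOL = 1e-9   # illustrative fixed absolute tolerance
REL_TOL = 1e-9   # illustrative relative tolerance

# Two adjacent doubles near 1e20: as close as two distinct doubles can be.
big_a = 1e20
big_b = math.nextafter(big_a, math.inf)

fixed_says_equal = abs(big_a - big_b) < ABS_TOL                    # fails at this scale
relative_says_equal = math.isclose(big_a, big_b, rel_tol=REL_TOL)  # succeeds

# Near zero the situation reverses: a purely relative test is too strict.
tiny = 1e-12
relative_near_zero = math.isclose(tiny, 0.0, rel_tol=REL_TOL)
combined_near_zero = math.isclose(tiny, 0.0, rel_tol=REL_TOL, abs_tol=ABS_TOL)

print(big_b - big_a, fixed_says_equal, relative_says_equal)
print(relative_near_zero, combined_near_zero)
```

A relative tolerance follows the spacing of the doubles themselves, while an absolute floor handles comparisons against values at or near zero.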
The relative-precision design is what makes floating point powerful for scientific computing but dangerous for financial calculations (where absolute precision at a fixed decimal place is required). Equality comparison failures are the most common practical pitfall encountered by programmers new to floating point, and they arise directly from this fundamental design choice.
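For the financial case, one common alternative is decimal arithmetic with a fixed number of places. The following is a minimal sketch using Python's standard `decimal` module; the price, the two-decimal-place quantum, the helper `to_cents`, and the half-even rounding policy are all assumptions chosen for illustration, and real systems follow whatever rounding rule their domain requires.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")   # assumed smallest currency unit

def to_cents(amount: Decimal) -> Decimal:
    """Round to two decimal places using round-half-even (an assumed policy)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_EVEN)

# Decimal literals built from strings are stored exactly.
price = Decimal("0.10")
decimal_total = sum([price] * 10, Decimal("0"))

# The same accumulation in binary floating point drifts.
float_total = 0.0
for _ in range(10):
    float_total += 0.10

print(decimal_total, float_total, to_cents(Decimal("2.675")))
```

Constructing `Decimal` values from strings rather than from floats matters here: `Decimal(0.10)` would inherit the binary rounding error of the double.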