Autonomous systems operate in open, partially observable environments with safety-critical consequences: failures can cause injury or death. Traditional software testing (unit tests, integration tests) is insufficient on its own because autonomous systems encounter scenarios far too numerous to enumerate: every combination of weather, traffic, road condition, and other agents' behavior creates a unique situation. Verification must therefore take a different approach: either prove mathematically that the system is safe under specified assumptions (formal verification), or conduct statistical testing to measure failure rates and ensure they meet safety targets (validation through testing). A third approach combines both: identify the critical failure modes, verify those formally or through targeted testing, and establish a safety case explaining why the system is acceptably safe. Automotive functional safety standards (like ISO 26262) call for hazards to be identified systematically, their consequences assessed, and each one either eliminated (for example through redundancy) or detected and handled gracefully (for example through monitoring with a fallback). This is far more rigorous than typical software development practice.
Autonomous system safety is fundamentally different from traditional software quality. Software bugs might crash an application or lose data; autonomous system failures cause real-world harm. This difference demands a different verification and validation approach.
The rarity problem: Autonomous vehicles must achieve extremely low failure rates (targets like 0.5 fatalities per 100 million miles, roughly half the U.S. human-driver rate of a little over 1 fatality per 100 million miles). Statistically validating such low rates through testing alone is impractical: because fatal failures are so rare, you would need hundreds of millions to billions of test miles to bound the rate with any confidence. Instead, validation uses a multi-faceted approach. Component-level testing measures performance on critical functions (perception accuracy, planning robustness, control stability) on large, diverse datasets. Scenario-based testing uses simulation and recorded data to exercise the system on known challenging situations. Real-world testing with safety drivers accumulates operational experience and identifies failure modes nobody anticipated.
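The scale of the rarity problem can be made concrete with a short calculation. The sketch below assumes fatal failures follow a Poisson process with a constant per-mile rate (a simplifying assumption, not something the text above specifies) and asks how many failure-free miles are needed before the true rate can be said to be below a target at a given confidence level. The function name is hypothetical.

```python
import math


def miles_for_zero_event_bound(target_rate_per_mile: float,
                               confidence: float = 0.95) -> float:
    """Failure-free miles needed to claim rate < target at the given confidence.

    Assumes a Poisson model: P(zero events in n miles | rate r) = exp(-r * n).
    We require that probability to be at most (1 - confidence) when r equals
    the target, which gives n = -ln(1 - confidence) / r.
    """
    if target_rate_per_mile <= 0.0:
        raise ValueError("target rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / target_rate_per_mile


# Target from the text: 0.5 fatalities per 100 million miles.
target = 0.5 / 100_000_000
print(f"{miles_for_zero_event_bound(target):.3g} failure-free miles")
```

Under these assumptions the answer is roughly 600 million miles with zero fatalities, and any observed fatality pushes the requirement higher, which is why road testing alone cannot carry the statistical argument.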
Formal verification proves mathematical properties of specific components. You can formally prove that a control system with a particular feedback structure is stable, that a motion planner will not exceed kinematic limits, or that no buffer overflow will occur. These are valuable guarantees for safety-critical algorithms. However, formal verification cannot verify end-to-end safety: it cannot prove that perception will always correctly identify pedestrians, that the planning algorithm will find solutions to novel spatial puzzles, or that the overall system will handle the full spectrum of real-world variability.
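As a toy illustration of what a component-level guarantee looks like, the sketch below exhaustively explores every reachable state of a small, finite speed-governor model and checks that a speed limit is never exceeded. This is explicit-state reachability checking on a hand-made abstraction, not a production verification tool; the model, constants, and function names are all hypothetical.

```python
from collections import deque

SPEED_LIMIT = 10      # hypothetical hard limit, in abstract integer speed units
MAX_ACCEL = 2         # largest speed change the controller may command per step
MAX_DISTURBANCE = 1   # worst-case unmodeled speed change per step (e.g. a slope)


def safe_controller(speed: int) -> int:
    """Accelerate only as far as leaves margin for the worst disturbance."""
    headroom = SPEED_LIMIT - MAX_DISTURBANCE - speed
    return max(-MAX_ACCEL, min(MAX_ACCEL, headroom))


def naive_controller(speed: int) -> int:
    """Same controller without the disturbance margin (deliberately flawed)."""
    headroom = SPEED_LIMIT - speed
    return max(-MAX_ACCEL, min(MAX_ACCEL, headroom))


def successors(speed: int, controller):
    """Every possible next speed: commanded change plus each disturbance."""
    command = controller(speed)
    for disturbance in range(-MAX_DISTURBANCE, MAX_DISTURBANCE + 1):
        yield max(0, speed + command + disturbance)


def find_violation(initial_speed: int, controller):
    """Breadth-first search over all reachable states.

    Returns None if the invariant speed <= SPEED_LIMIT holds everywhere,
    otherwise a path of speeds leading to a violating state.
    """
    parent = {initial_speed: None}
    queue = deque([initial_speed])
    while queue:
        speed = queue.popleft()
        if speed > SPEED_LIMIT:
            path = []
            while speed is not None:
                path.append(speed)
                speed = parent[speed]
            return path[::-1]
        for nxt in successors(speed, controller):
            if nxt not in parent:
                parent[nxt] = speed
                queue.append(nxt)
    return None


print("safe controller:", find_violation(0, safe_controller))
print("naive controller:", find_violation(0, naive_controller))
```

The search proves the property for the model only. Whether the model faithfully captures the real vehicle (disturbance bounds, actuator limits) is a separate question, which is the gap between component-level proofs and end-to-end safety described above.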
Failure mode and effects analysis (FMEA) is a systematic hazard analysis: enumerate potential failure modes (sensor fails, perception misses object, planning algorithm gets stuck, control command is delayed), assess their probability and severity, and establish mitigation strategies. For high-severity failures (collision-causing), mitigation might be: eliminate the failure (redundant sensors, diverse algorithms), detect and handle the failure (monitoring with fallback behavior, safe stop), or restrict operation to avoid triggering the failure (limit speed, restrict to structured roads).
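An FMEA worksheet can be sketched as a small data structure. The example below uses the failure modes named above and ranks them by severity and a severity-times-occurrence score. The 1 to 10 scales and every rating in it are made-up illustrative numbers, not data from any real analysis; classic FMEA worksheets often add a detection rating as a third factor, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int     # 1 (negligible) to 10 (collision-causing); illustrative
    occurrence: int   # 1 (remote) to 10 (frequent); illustrative
    mitigation: str

    @property
    def risk(self) -> int:
        """Simple risk score: severity multiplied by occurrence."""
        return self.severity * self.occurrence


def prioritize(modes, severity_floor: int = 9):
    """High-severity modes first, then descending risk score within each group."""
    return sorted(
        modes,
        key=lambda m: (m.severity >= severity_floor, m.risk),
        reverse=True,
    )


# Hypothetical ratings for the failure modes listed in the text.
MODES = [
    FailureMode("sensor fails", 8, 3, "eliminate: redundant sensors"),
    FailureMode("perception misses object", 10, 4,
                "eliminate: diverse algorithms; detect: monitoring, safe stop"),
    FailureMode("planning algorithm gets stuck", 6, 5,
                "detect: monitoring with fallback behavior"),
    FailureMode("control command is delayed", 9, 2,
                "restrict: limit speed; detect: latency monitoring"),
]

for mode in prioritize(MODES):
    print(f"{mode.name:32s} S={mode.severity:2d} O={mode.occurrence:2d} "
          f"risk={mode.risk:3d}  {mode.mitigation}")
```

Sorting on severity before the combined score reflects the point made above: collision-causing failures get attention even when their estimated probability is low.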
Operational Design Domain (ODD) defines the boundaries of validated operation. No autonomous system works in all conditions: camera-based perception degrades at night, control performance degrades on icy roads, decision-making for residential streets differs from highways. The ODD states these boundaries explicitly, for example: paved roads, daylight to twilight, dry to light rain, speed limits up to 130 km/h, highways and major urban streets. Validation is specific to the ODD. When conditions fall outside the ODD, the system triggers safe behaviors: notification to the safety driver, reduction of speed, or handoff to human control. This transparency prevents over-claiming capabilities while allowing systems to be deployed progressively as operational domains expand.
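One way to make an ODD machine-checkable is to encode it as data and test current conditions against it. The sketch below mirrors the example boundaries from the paragraph above; the class names, field names, category strings, and the fallback message are hypothetical, and a real system would derive conditions from sensors and map data rather than hand-built records.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Conditions:
    road: str               # e.g. "highway", "major_urban", "residential"
    paved: bool
    light: str              # e.g. "daylight", "twilight", "night"
    weather: str            # e.g. "dry", "light_rain", "heavy_rain", "ice"
    speed_limit_kmh: float


@dataclass(frozen=True)
class ODD:
    roads: FrozenSet[str]
    lights: FrozenSet[str]
    weather: FrozenSet[str]
    max_speed_limit_kmh: float
    require_paved: bool = True

    def violations(self, c: Conditions) -> List[str]:
        """List every boundary the given conditions fall outside of."""
        problems = []
        if self.require_paved and not c.paved:
            problems.append("unpaved road")
        if c.road not in self.roads:
            problems.append(f"road type {c.road!r}")
        if c.light not in self.lights:
            problems.append(f"lighting {c.light!r}")
        if c.weather not in self.weather:
            problems.append(f"weather {c.weather!r}")
        if c.speed_limit_kmh > self.max_speed_limit_kmh:
            problems.append(f"speed limit {c.speed_limit_kmh} km/h")
        return problems


# The example ODD from the text.
EXAMPLE_ODD = ODD(
    roads=frozenset({"highway", "major_urban"}),
    lights=frozenset({"daylight", "twilight"}),
    weather=frozenset({"dry", "light_rain"}),
    max_speed_limit_kmh=130.0,
)


def fallback_action(odd: ODD, c: Conditions) -> str:
    """Continue inside the ODD; otherwise trigger a safe behavior."""
    problems = odd.violations(c)
    if not problems:
        return "continue"
    return "notify safety driver and reduce speed: " + ", ".join(problems)


print(fallback_action(EXAMPLE_ODD,
                      Conditions("highway", True, "daylight", "dry", 120.0)))
print(fallback_action(EXAMPLE_ODD,
                      Conditions("residential", True, "night", "ice", 30.0)))
```

Returning the list of violated boundaries, rather than a bare yes or no, keeps the reason for a handoff visible to the safety driver and to later failure analysis.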
Safety standards like ISO 26262 (functional safety) and the more recent ISO 21448 (safety of the intended functionality, or SOTIF, first published as ISO/PAS 21448) define how safety should be engineered. For each identified hazard, ISO 26262 calls for: an ASIL rating (derived from severity, exposure, and controllability), failure rate targets, proof of mitigation through redundancy or monitoring, and validation and verification evidence. Following these standards doesn't guarantee a safe system, but it ensures a disciplined approach to identifying and addressing hazards.
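ASIL determination in ISO 26262 combines three classifications: severity (S0 to S3), exposure (E0 to E4), and controllability (C0 to C3). The sketch below uses a commonly cited shortcut, summing the three class numbers, that reproduces the standard's lookup table as far as I am aware; treat it as an illustration and consult the standard itself for any real classification. The function name is hypothetical.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return "QM", "A", "B", "C", or "D" for the given S, E, C class numbers.

    Shortcut (an assumption to verify against the standard's table): if any
    class is 0 the result is QM; otherwise S + E + C of 7, 8, 9, 10 maps to
    ASIL A, B, C, D, and anything lower is QM (quality management only).
    """
    if not 0 <= severity <= 3:
        raise ValueError("severity class must be 0..3")
    if not 0 <= exposure <= 4:
        raise ValueError("exposure class must be 0..4")
    if not 0 <= controllability <= 3:
        raise ValueError("controllability class must be 0..3")
    if 0 in (severity, exposure, controllability):
        return "QM"
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


# Life-threatening, high exposure, hard to control: the most demanding level.
print(asil(3, 4, 3))
# Same severity and exposure, but the driver can usually recover.
print(asil(3, 4, 1))
```

The higher the resulting ASIL, the stricter the development process and evidence requirements for the functions that mitigate that hazard.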
Real-world deployment introduces additional validation. Deployed vehicles accumulate real driving experience, often with safety drivers ready to intervene. This reveals failure modes impossible to anticipate in testing. When failures occur, they are analyzed: was it a system failure or misuse? Can it be addressed through software updates, operational restrictions, or hardware changes? This feedback loop is critical to improving safety over time.
The bottom line: autonomous system safety requires combining formal verification (for critical algorithms), statistical testing and comparison to human baselines (for rare event rates), scenario-based validation (for corner cases), hazard analysis (for systematically identified failure modes), real-world operational feedback (for failure modes nobody anticipated), and transparent disclosure of operational limits. No single approach is sufficient.