Patrick Rogers, AdaCore
Friday June 24th, afternoon
Software for current safety-critical applications, e.g., flight control systems, is so large and complex that full testing is not feasible. Furthermore, complete proofs of correctness are at best inherently limited by the potential for specification errors. The combination of potential specification errors and overall complexity defines the problem of handling unanticipated software faults. Software fault tolerance is the use of software mechanisms to deal with these unanticipated software faults.
This half-day tutorial explores the software-based techniques and mechanisms available for tolerating unanticipated software design faults in safety-critical systems. We examine the rationale for tolerating software faults, the similarities to mechanisms for tolerating hardware faults, and the advantages and disadvantages of the common techniques. Special attention is paid to the concept of design diversity as the underlying theory for the most widely used mechanisms (e.g., N-Version Programming) and, in particular, to whether design diversity can achieve the extremely low failure rates required for safety-critical systems. The mechanisms explored are illustrated with concrete implementations in Ada 95. This leads to a discussion of the issues of concurrency and exceptions in safety-critical applications and the appropriate application of language features.
Participants will have an appreciation of the necessity for tolerating software faults as well as a firm foundation for further study and informed application of available mechanisms. A bibliography of suggested reading is provided to that end.
Systems software architects and developers responsible for safety-critical software will gain an appreciation for software fault tolerance facilities, including their limitations, and will have a firm foundation for informed application of available mechanisms as well as further study.
Patrick Rogers is a Senior Member of the Technical Staff with Ada Core Technologies, specializing in high-integrity application support. A computing professional since 1975 and an Ada developer since 1980, he has extensive experience in real-time applications in Ada and C++ in both embedded and Linux/Unix/POSIX-based environments. A lecturer and trainer since 1981, he has provided numerous tutorials and courses in software fault tolerance, hard real-time schedulability analysis, object-oriented programming, and the Ada programming language. He holds B.S. and M.S. degrees in computer science from the University of Houston and a Ph.D. in computer science from the University of York, earned in the Real-Time Systems Research Group on the topic of software fault tolerance.