Title of Presentation: Technology Validation:  NMP ST8 Dependable Multiprocessor (DM) Project

Primary (Corresponding) Author: John Samson

Organization of Primary Author: Honeywell Aerospace, Space Electronic Systems

Co-Authors:  Minesh Patel, Alan George, Zbigniew Kalbarczyk, Rafi Some


Abstract:  Current and future space-based applications are requiring, and will continue to require, increasing amounts of onboard processing capability.  One way to achieve a high level of processing capability is through the use of COTS high-performance processors.  While current COTS high-performance processors exhibit adequate Total Ionizing Dose (TID) performance to meet the requirements of the natural space radiation environment, Single Event Upsets (SEUs) caused by heavy ions and solar flares are, and will remain, a problem.  Traditional approaches to mitigating the SEU problem involve fixed redundancy schemes such as Self Checking Pairs (SCP) or Triple Modular Redundancy (TMR).  While effective in mitigating the effects of SEUs, these techniques come at a high price (100% overhead for SCP and 200% overhead for TMR), particularly when such a level of protection is not needed.  In such cases, it would be beneficial to convert that overhead into useful mission processing capability.  The idea behind the Dependable Multiprocessor (DM) is to be able to configure the processing system to maximize the processing capability available to the mission as a function of the mission environment, which includes mission application criticality and mode of operation as well as the physical space radiation environment.  In the DM technology, this is accomplished in two ways: first, through the use of high-level, fault-tolerant middleware to effectively and efficiently manage a high-performance, fault-tolerant, COTS-based cluster processor in a radiation environment, and second, through the effective and efficient use of software-implemented fault tolerance and Algorithm-Based Fault Tolerance (ABFT) to enhance system immunity to SEUs.
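
As an illustration of the ABFT idea mentioned above, the following is a minimal sketch of the classic checksum scheme for matrix multiplication. It is not taken from the DM software; the function names (abft_matmul, find_corrupted, correct_single_error) and the tolerance value are assumptions made for this example. The inputs are extended with a checksum row and a checksum column, so that a single corrupted element of the product can be located and repaired from the checksums instead of recomputing the product on redundant processors.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def abft_matmul(a, b):
    """Multiply checksum-extended inputs (illustrative helper, not a DM API).

    A gains an extra row of column sums and B gains an extra column of
    row sums, so the product carries its own row and column checksums.
    """
    a_ext = a + [[sum(col) for col in zip(*a)]]
    b_ext = [row + [sum(row)] for row in b]
    return matmul(a_ext, b_ext)


def find_corrupted(c_full, tol=1e-9):
    """Return (rows, cols) of the data block whose checksums do not match."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols


def correct_single_error(c_full):
    """Repair one corrupted data element in place; return True if repaired."""
    bad_rows, bad_cols = find_corrupted(c_full)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        m = len(c_full[0]) - 1
        # Rebuild the element from its row checksum and the other row entries.
        others = sum(c_full[i][k] for k in range(m) if k != j)
        c_full[i][j] = c_full[i][m] - others
        return True
    return False


if __name__ == "__main__":
    c = abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    c[1][0] += 8                      # simulate an SEU in one result element
    print(find_corrupted(c))          # row and column of the upset
    print(correct_single_error(c), c[1][0])
```

In this sketch the cost of protection is one extra row and one extra column of arithmetic, which for large matrices is far below the 100% and 200% overheads quoted for SCP and TMR. It detects and corrects only a single erroneous element per product; multiple upsets or upsets in the checksums themselves would need additional handling.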

DM technology has been developed as part of NASA’s New Millennium Program (NMP) ST8 project.  The objective of the NMP ST8 DM effort is to combine high-performance, fault-tolerant, COTS-based cluster processing with replication services, ABFT, and fault-tolerant middleware in an architecture and software framework capable of supporting a wide variety of mission applications.  DM development is continuing as one of the four selected ST8 flight experiments.

The DM project has completed Phase B, the Formulation Phase of the development effort, and is now in Phase C/D, the Implementation Phase.  During Phase B, the DM project successfully passed the TRL5 Technology Maturity Assessment technology validation gate, the E-PDR (Experiment-Preliminary Design Review) gate, and the NAR (Non-Advocate Review) gate.  The ST8 project passed the P-PDR (Project-Preliminary Design Review) gate and received NASA Project Confirmation.  Plans for the DM TRL7 flight experiment, demonstration, and validation were also refined during Phase B.  The paper will provide an overview of DM technology and will describe the TRL5 validation experiments, demonstrations, and performance; the flight system baseline; and the plans for the DM flight experiment and validation.