Answering Your Autonomous Safety Critical Functions Questions
February 3, 2021
MTSI received more questions than time allowed during the recent Verification, Validation, and Qualification of Autonomous Safety Critical Functions webinar. In gratitude to those who viewed the webinar and submitted questions, the MTSI team of Russ Wolfe, Vice President of Engineering; Steve Parker, Director of Warfighter Strategy; Ron Wingfield, Airworthiness Certification Lead; Allen Rabayda, Artificial Intelligence Lead; Arpit Mehta, Test & Evaluation Lead; and Dean Barten, Senior Principal Engineer, prepared the answers below.
Q1) How does this presentation relate to IEEE P7009 or UL4600?
Ron Wingfield: This presentation shares the same goals as the IEEE and ANSI efforts: how to obtain fail-safe design and control safety of autonomous vehicles, whether ground, air, rail, etc.
Q2) Is there a test suite for testing chaotic flight events like turbulence or icing?
Russ Wolfe: There isn’t necessarily anything commercially available today, but this technology could certainly be applied to that challenge.
Q3) In your experience and based on current very limited guidance material in ML or AI, what would be the expectation of the Verification to be performed on the target versus the simulated environment for the algorithm training for cert?
Allen Rabayda: Verification will depend greatly on the task and may be broken into sub-verification components. For instance, tracking will require target identification (e.g., can the ML accurately distinguish a target from other objects? Straightforward ROC curves apply here), target ID persistence (e.g., can the AI/ML correctly ID targets in space and through time? Measure percent custody through single frames, say 50% for 30 frames/sec video, and across different look angles in 3D), and target ID in environments (e.g., can the hyperparameters be effectively tuned to particular environments such as air-to-ground, cluttered urban, desert, or sea-based?).
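The sub-verification components above can be expressed as concrete, repeatable metrics. The sketch below, assuming a scikit-learn-style classifier output, computes a target-ID ROC/AUC and a percent-custody check; the labels, scores, and frame counts are illustrative placeholders, not program data.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Target identification: ROC curve over held-out detections
# (illustrative ground-truth labels and model confidence scores).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.3])

fpr, tpr, _ = roc_curve(y_true, y_score)
print(f"Target-ID AUC: {auc(fpr, tpr):.2f}")

# Target-ID persistence: percent custody across video frames,
# e.g. a requirement of >= 50% custody over 30 frames/sec video.
frames_with_custody = 21
total_frames = 30
custody_pct = 100.0 * frames_with_custody / total_frames
print(f"Custody: {custody_pct:.0f}% (requirement: >= 50%)")
```

In practice each environment (urban, desert, sea-based) would get its own held-out evaluation set, so hyperparameter tuning per environment can be verified independently.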
Q4) How do you propose to quantify the safety of safety-critical AI equipment?
Ron Wingfield: Ensure ALL requirements (e.g., safety, software safety) are complete; drill deep, as most programs are deficient. Quantify ALL pass and fail conditions/results. Determine out-of-bounds/failure conditions and/or safe conditions; when a condition, event, or operation's mishap severity is Catastrophic or Critical, add redundancy in the design where required and re-test. Wash, rinse, repeat…
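The quantify-classify-mitigate-retest loop described above can be sketched as an executable check, loosely following MIL-STD-882-style severity categories. The condition names and severity assignments below are hypothetical examples, not real program data.

```python
# Severities that drive a redesign per the answer above (MIL-STD-882 style).
SEVERITY_REQUIRES_REDUNDANCY = {"Catastrophic", "Critical"}

# Hypothetical quantified pass/fail results for failure conditions.
failure_conditions = [
    {"condition": "loss of GPS",        "severity": "Marginal",     "passed": True},
    {"condition": "flight-control CPU", "severity": "Catastrophic", "passed": False},
]

def needs_redundancy(fc):
    """A failed condition with Catastrophic/Critical severity drives redesign."""
    return (not fc["passed"]) and fc["severity"] in SEVERITY_REQUIRES_REDUNDANCY

for fc in failure_conditions:
    action = "add redundancy and re-test" if needs_redundancy(fc) else "document result"
    print(f"{fc['condition']}: {action}")
```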
Q5) How do you maintain VV&Q in a system that continues to learn after it is developed and deployed? Does this require a continuous V&V process over the lifecycle, perhaps on a mission-to-mission basis?
Ron Wingfield: Via sustainment airworthiness of a validated design, with dedicated platform industry/PM/authority and systems integrity teams working perhaps down to the mission-to-mission level, especially in the platform's infancy, to ensure compliance is maintained.
Q6) How do the levels of redundancy in manned flight (often 3) translate to unmanned systems? Is a minimum of 3 redundancies adequate in unmanned?
Dean Barten: Levels of redundancy are typically tied to number of passengers and/or reliability requirements for the system. The airlines and latest technology fighter aircraft are where we see triple redundancy. For example, most Army manned aircraft have dual redundancy in critical flight systems. However, to date most unmanned systems have been designed with reliability requirements that allow for single redundancy.
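The link between redundancy level and reliability requirements can be seen in a back-of-the-envelope calculation: with k independent channels each failing with probability p, total loss requires all k to fail, so loss probability falls as p^k. The per-channel figure below is an illustrative assumption, not data from any actual aircraft, and real systems rarely have fully independent failures.

```python
def loss_probability(p_channel: float, k: int) -> float:
    """Probability that all k independent redundant channels fail."""
    return p_channel ** k

p = 1e-4  # assumed per-channel failure probability (illustrative)
for k, label in [(1, "single"), (2, "dual"), (3, "triple")]:
    print(f"{label} redundancy: {loss_probability(p, k):.0e}")
```

This is why a system with relaxed reliability requirements can tolerate single redundancy, while passenger-carrying aircraft drive toward dual or triple.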
Q7) To what extent is on-board recording for test verification sufficient, and should live telemetry be considered?
Arpit Mehta: It can vary depending on the kind of system and the risk associated with it. For example, when fighter jets are in developmental testing, many of them are instrumented and provide live telemetry for real-time decision making. It also depends on what kind of data you are trying to get: if you are trying to identify performance-related data that are not easily observable, or the system has complex embedded software, onboard recording can provide valuable data for your design changes.
Q8) DO-178C is not enough to ensure correctness of the AI; you mentioned requirements need to be defined so they can be verified and deemed deterministic. Are you aware of SAE WG G34?
Ron Wingfield: Correct, DO-178 is not enough. It is only a piece of the greater puzzle. We must engage the full complement of Aeronautical Design Standards, specifications and define (to a deep degree) criteria, standards and MoC. WG G34 would be part of that.
Q9) Do you think DO-178C will be used for AI embedded software?
Ron Wingfield: Yes; it is currently the best option for industry.
Q10) How do you know you’ve done sufficient testing for deployment? How do you convince the regulator?
Ron Wingfield & Arpit Mehta: The answer is uniquely dependent on the authority- but the universal goal of each authority is for the air system to safely attain, sustain, and terminate flight IAW the approved usage and limitations. Early/prior agreement on the platform-unique tailored total set of (necessary and sufficient) criteria, standards and methods of compliance (MoC) (called the basis of certification) is paramount in ensuring the logical/systematic means to demonstrate verification and validation of the basis of certification. Agreed to MoC (e.g., test, analysis) will force the generation of artifacts/objective evidence used by an authority to determine airworthiness or assume accepted risk if criteria and standards are not achieved.
Q11) Will AI always require a run-time assurance system using dissimilar redundancy, so that when the AI fails (not if) there is a deterministic backup? How does one deal with misclassification by a perception AI application without an input domain boundary?
Dean Barten: Runtime assurance architecture offers a very promising path forward in fielding future AI capabilities. If we can demonstrate with a high level of certainty that the air vehicle can’t possibly run into the ground, an obstacle, or another aircraft, then obtaining authority to operate AI functions on that vehicle should be significantly easier to gain.
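The runtime assurance architecture mentioned above can be sketched as a simplex-style switch: a deterministic monitor checks the AI command against a safety envelope and reverts to a verified backup controller when the envelope would be violated. The envelope limit, state fields, and controller stubs below are all hypothetical.

```python
MIN_ALTITUDE_FT = 500.0  # assumed hard floor of the safety envelope

def ai_controller(state):
    # Placeholder for the learned, non-deterministic controller.
    return state["ai_commanded_climb_rate"]  # ft/min

def backup_controller(state):
    # Deterministic, separately verified recovery behavior: climb.
    return +500.0  # ft/min

def rta_switch(state):
    """Use the AI command only while the one-second-ahead prediction stays safe."""
    predicted_alt = state["altitude_ft"] + ai_controller(state) / 60.0
    if predicted_alt < MIN_ALTITUDE_FT:
        return backup_controller(state), "backup"
    return ai_controller(state), "ai"

# The monitor rejects an unsafe commanded descent near the floor.
cmd, source = rta_switch({"altitude_ft": 505.0, "ai_commanded_climb_rate": -1200.0})
print(source, cmd)
```

The key property is that the monitor and backup are simple enough to verify deterministically, so assurance of the whole system does not rest on verifying the AI itself.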
Q12) Compared to testing and simulation, formal methods-based V&V approaches face a major scalability challenge due to the limited expandability of the AI/ML components. Any vision on how to address these challenges in formal approaches?
Allen Rabayda: The requirements traceability matrix may be the biggest challenge of this formal VV&Q process but is still very doable. We may be able to leverage/adapt those processes to those tried and true ones laid out for federated simulations in MIL-STD-3022. Keep in mind, L-V-C, and today’s autonomous systems share common core components and approaches. Another VV&Q challenge to this formalization comes by way of developing an appropriate set of metrics and acceptability criteria. Things like software best practices (e.g. standard libraries/versions like TensorFlow, scikit-learn, etc.) and quantification of bias/risks can bound these challenges and serve as a foundation for our metrics.
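The metrics and acceptability criteria mentioned above can be codified as executable checks so they are applied identically on every training cycle. The thresholds and the bias measure below (positive-prediction-rate gap between two slices of the input domain) are illustrative assumptions, not an established standard.

```python
# Hypothetical acceptability criteria for one release gate.
criteria = {"min_accuracy": 0.95, "max_bias_gap": 0.05}

def accuracy(y_true, y_pred):
    """Fraction of predictions matching ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bias_gap(rate_slice_a, rate_slice_b):
    """Absolute difference in positive-prediction rate between two slices
    of the input domain (e.g., desert vs. urban backgrounds)."""
    return abs(rate_slice_a - rate_slice_b)

acc = accuracy([1, 0, 1, 1], [1, 0, 1, 1])   # placeholder evaluation data
gap = bias_gap(0.52, 0.49)                   # placeholder slice rates
accepted = acc >= criteria["min_accuracy"] and gap <= criteria["max_bias_gap"]
print(f"accuracy={acc:.2f}, bias_gap={gap:.2f}, accepted={accepted}")
```

Pinning software versions (TensorFlow, scikit-learn, etc.) alongside these checks makes each accreditation run reproducible.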
Q13) How do you see concepts around in-time system-wide safety assurance (ISSA) capabilities and architectures playing into the safety of these systems?
Dean Barten & Ron Wingfield: See Question #6.
Q14) Must all AI systems be continually learning, or does it reach a point where the learning is frozen, and the system can become stable, thus ending the V&V process?
Allen Rabayda: I’ll address for both a supervised and active learning approach.
1) Supervised Learning Approach. In this approach, the learning is “frozen” and the model is then typically deployed for use. It will generally operate in an environment very similar to the training environment in which it was developed. These are agile expert systems that, once trained, perform inference. Accreditation should be performed for each inference engine upon successful completion of each set of training cycles. V&V should be continuous, even after deployment.
2) Active Learning Approach. Once properly structured for a particular set of autonomous functions, this approach can provide orders of magnitude more flexibility for DoD mission types. Training for this approach is continuous, often exhibiting high variance in stability and non-convergence to an optimum. To overcome these challenges, one can implement various policy gradient techniques, use discrete vs. continuous action spaces, or institute loss-function constraints. V&V here should be closely monitored and should evolve with each AI agent in the network. As input fidelity or input/output timescales vary, or reward/action spaces change, V&V must adapt its metrics and evaluation processes.
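The distinction between the two approaches can be sketched with scikit-learn's `SGDClassifier`: a frozen model is trained once and only performs inference thereafter, while a continually learning model updates on each new data batch, so each update invalidates prior V&V evidence. The data here are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)  # synthetic, linearly separable labels

# 1) Supervised approach: train once, then freeze. Only inference remains,
#    so the verified model artifact is stable.
frozen = SGDClassifier(random_state=0).fit(X_train, y_train)

# 2) Active/continual approach: the model keeps updating after deployment,
#    so each partial_fit call changes the artifact under V&V.
online = SGDClassifier(random_state=0)
online.partial_fit(X_train, y_train, classes=np.array([0, 1]))

X_new = rng.normal(size=(50, 4))
y_new = (X_new[:, 0] > 0).astype(int)
online.partial_fit(X_new, y_new)  # learning continues on mission data

print("frozen accuracy on new data:", frozen.score(X_new, y_new))
```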
Q15) What type of operational test environments (from the field, physical locations, features of environments) would help create field data that is trusted?
Arpit Mehta: It depends on the system for which you are trying to gather the data (e.g., computer vision). You want data from the environment in which your system/product is going to operate. For autonomous cars, that would be freeways, highways, streets, etc., where you may want to capture specific data like stop signs and lane changes, and test your lidar system. For example, if you are trying to develop a sense-and-avoid system based on computer vision, you may want a diverse set of data that varies in shape, size, and color (i.e., humans, dogs, cars, wires, etc.). This may not be the specific answer you are looking for, but I am happy to chat more, so feel free to e-mail me.
Q16) How do datasets used with trained algorithms enter into the qualification equation?
Allen Rabayda: See Questions #12 and #14.