Methods for Estimating a Probability of Rare Accidents in Complex Non-Stationary Systems

Figure 1 - Illustration of the principle of particle filtering.


By Ítalo Romani de Oliveira | Jeffery Musiak


Despite increasing levels of automation, aerospace systems are often embedded in heterogeneous organizations that comprise both technical infrastructure (hardware and software) and human beings with specific roles and responsibilities.

Estimating the probability of failures or accidents in aerospace systems is often necessary when new concepts or designs are introduced, mainly because such events may pose threats to human life or property. In highly sophisticated “socio-technical” systems, in which people interact with technical infrastructure, accident scenarios are not obvious and may result from complex combinations of events, so adequate analysis techniques must be employed.

The traditional method of analyzing safety in aerospace systems is the elaboration of a Safety Case, which is a structured argument, supported by evidence, intended to justify that a system is acceptably safe for a specific application in a specific operating environment. Safety cases are used as acceptable means of compliance when a regulatory authority does not have a repeatable and prescribed process for compliance assurance or certificate issuance, or when the system under scrutiny is sui generis and no existing standard pertains, situations often encountered in innovative designs.

The limitation is that traditional Safety Cases do not guarantee full consistency among their several sections and individual elements, so non-obvious failure paths may be overlooked entirely.

These challenges grow with the system complexity that has accompanied technological advances. Thanks in large part to these advances, accident rates in aviation have consistently dropped over the last few decades, while cost and energy efficiency improved markedly. A corresponding evolution in safety analysis methods is needed.

One of the emerging alternatives to Safety Cases is called System-Theoretic Accident Model and Processes (STAMP). Its basic principle is to identify leading indicators for risk management based on the assumptions underlying our safety engineering practices and on the vulnerability of those assumptions, rather than on the likelihood of loss events.

STAMP is a qualitative and comprehensive accident model for analyzing accidents in complex systems. It assists in recognizing scenarios, dysfunctional interactions, and the incorrect models and processes that must be addressed in the development of a safer system. However, its main limitation is that, by itself, it does not calculate any probability of accident, thus requiring complementary modeling techniques able to do so.

Another class of analysis techniques, multi-agent dynamic risk models (MA-DRMs), has proven successful for quantifying the safety properties of complex socio-technical systems. Such models estimate failure and accident rates during early design stages, particularly for systems whose failure could result in catastrophic consequences. These rates need to stay under certain target levels of safety (TLS) set by governing regulations.

One of the difficulties in this estimation is that the TLS, expressed as a probability of occurrence of an undesired event per unit of time, is often on the order of 10⁻⁹ or less—and such rare occurrences are difficult to verify analytically using realistic models.
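A back-of-the-envelope calculation shows why brute-force simulation cannot verify such a TLS. The relative standard error of the naive Monte Carlo estimator of a probability p is roughly 1/√(Np), so the required number of trials grows inversely with p. The sketch below is illustrative; the 10% error target is an arbitrary choice, not a regulatory figure.

```python
import math

def samples_needed(p, rel_err):
    """Number of i.i.d. trials so that the sample-mean estimator of a
    probability p has relative standard error sqrt((1 - p) / (N * p))
    at most rel_err."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))

# Verifying a 1e-9 event probability to 10% relative error:
n = samples_needed(1e-9, rel_err=0.10)
print(f"{n:.2e}")  # on the order of 1e11 simulated trials
```

At roughly 10¹¹ full system simulations, naive sampling is infeasible for realistic MA-DRM models, which motivates the variance-reduction techniques discussed next.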

One way to proceed with this estimation is to use sequential Monte Carlo methods, among which one of the best known is referred to as “multilevel” or “importance splitting.” Additionally, in order to improve computational efficiency, sequential Monte Carlo solutions are expressed in the form of a particle filter, as is the case of the Interacting Particle System (IPS), illustrated in Figure 1.
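The splitting idea can be sketched on a toy problem: estimating the chance that a random walk ever climbs past a distant threshold. The rare event is decomposed into a chain of intermediate levels, and the target probability is the product of the (much larger) conditional level-crossing probabilities. This is a minimal fixed-level sketch, not the authors' aerospace models; the level values, step count, and noise scale are illustrative assumptions.

```python
import random

def run_path(start, level, steps=100, sigma=0.3):
    """Simulate a Gaussian random walk from `start`; report whether it
    reaches `level` within `steps`, and where it ended."""
    x = start
    for _ in range(steps):
        x += random.gauss(0.0, sigma)
        if x >= level:
            return True, x
    return False, x

def splitting_estimate(levels, n_particles=1000):
    """Multilevel splitting: advance a particle population level by level,
    resampling survivors so effort concentrates on promising paths."""
    particles = [0.0] * n_particles
    prob = 1.0
    for level in levels:
        survivors = [end for x in particles
                     for reached, end in [run_path(x, level)] if reached]
        if not survivors:
            return 0.0  # extinction: no particle reached this level
        prob *= len(survivors) / n_particles
        # resample survivors back up to the full population size
        particles = [random.choice(survivors) for _ in range(n_particles)]
    return prob

random.seed(42)
p_hat = splitting_estimate(levels=[2.0, 4.0, 6.0, 8.0])
```

Each factor in the product is moderate (easily estimated with a small population), even though their product can be vanishingly small; the IPS algorithm refines this scheme with weighted, interacting particles.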

However, developing and tuning a proper particle filter to deal with rare events is challenging because of the degeneracy that results from the lack of diversity after successive resampling steps, which in turn results in high variance or, often, in not calculating any occurrences of target events.
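A standard diagnostic for this degeneracy is the effective sample size (ESS) of the particle weights: it equals the particle count when weights are uniform and collapses toward one when a single particle dominates, at which point resampling merely clones a few survivors. The snippet below is a generic illustration, not code from the authors' study.

```python
def effective_sample_size(weights):
    """ESS = (sum w)^2 / (sum w^2): equals len(weights) for uniform
    weights and approaches 1 under severe weight degeneracy."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

balanced = effective_sample_size([1.0] * 100)            # -> 100.0
degenerate = effective_sample_size([1e-6] * 99 + [1.0])  # -> ~1.0
```

When the ESS drops far below the particle count, the empirical variance of the estimator explodes, which is exactly the failure mode described above for rare target events.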

Another way to perform rare event sampling is to employ optimization techniques such as the Cross-Entropy method to find alternative probability distributions under which the desired event occurs more frequently; these distributions can then be used in importance sampling to estimate the base rate of occurrence of the rare event. However, such a technique forces the fit of predetermined distribution shapes, which may result in large errors.
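The mechanics can be seen on a textbook case: estimating P(X > 5) for a standard normal X (about 2.87 × 10⁻⁷). The Cross-Entropy iterations tilt the sampling mean toward the rare region, and importance-sampling likelihood ratios correct for the change of distribution. All parameter values here (sample size, elite fraction, threshold) are illustrative assumptions.

```python
import math
import random

def ce_rare_event(gamma=5.0, n=2000, rho=0.1, seed=1):
    """Cross-Entropy sketch: estimate p = P(X > gamma) for X ~ N(0, 1)
    by shifting the sampling mean toward the rare region, then applying
    importance sampling under the tilted distribution."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(50):                               # CE updating loop
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        gamma_t = min(gamma, xs[int((1 - rho) * n)])  # current elite level
        elite = [x for x in xs if x >= gamma_t]
        mu = sum(elite) / len(elite)                  # tilt toward elites
        if gamma_t >= gamma:                          # target level reached
            break
    # Importance sampling under N(mu, 1); the likelihood ratio is
    # phi(x) / phi(x - mu) = exp(-mu * x + mu^2 / 2).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > gamma:
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n

p_est = ce_rare_event()  # exact value: 1 - Phi(5) ≈ 2.87e-7
```

The limitation noted above shows up here in the choice of the tilted family: the method only shifts the mean of a normal distribution, and if the true rare-event dynamics are not well captured by that predetermined shape, the likelihood-ratio weights become erratic and the estimate degrades.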

Many other methods exist for rare event probability estimation, among which the family of methods named Markov-Chain Monte Carlo (MCMC) has achieved prominent success in failure analysis of complex systems. However, such methods require the system under analysis to have states with a stationary probability distribution, which is hardly true for socio-technical systems that depend on human agents and do not run on a continuous basis.

Despite these difficulties, our research shows that there are ways to produce statistically significant results for rare accidents with non-stationary MA-DRM models of complex systems. The approach that has produced the best results so far is based on smart partitioning of the probability search space. We developed an algorithm that, given an MA-DRM model of the system, explores the event probability space and partitions it in a way that allows computation time to be focused on the partitions with higher probability of accidents. So far, we have applied it in case studies involving simulated scenarios of highly automated manned aircraft, and it proved capable of producing statistically significant results for accidents of very low probability. We also compared it with the standard Interacting Particle System (IPS), and the latter failed to achieve statistically significant results, as can be seen in Figure 2.

In this figure, one can see that only the smart partitioning algorithm reaches the collision event (proximity range equals zero), while IPS does not: it rapidly diverges, with mean values going extremely low but with very high upper limits, indicating a lack of statistical significance. Also, the smart partitioning algorithm took one-fifth of the time taken by IPS, using about the same amount of memory.

It is worth noting that our bibliographic search did not find an example of probability estimation below 10⁻¹⁰ for models of complexity similar to ours, whereas here we have reliable results below 10⁻¹⁷. Our models allow the use of Stochastic Differential Equations (SDE), which by definition introduce an infinite number of random variables and add to the model complexity; this complexity is, in turn, managed by efficient sampling and careful implementation of the model.

We believe that our approach can greatly help the development of innovative aerospace products and operational concepts by diagnosing system safety early in the design phase.

Ítalo Romani de Oliveira is a computer scientist and electrical engineer based in São Paulo, Brazil, who specializes in air traffic management and avionics.

Jeffery Musiak is a Boeing Associate Technical Fellow for airspace and operational efficiency and a geophysicist with more than 30 years of experience modeling complex systems.

    Figure 2 - Comparison of probabilities of reaching filtering distances, according to different algorithms.
