Tag Archives: Fault

Architecture Product Development RAMS Reliability Safety Video

[004] From Root Cause Investigation to Fault Tree Analysis

[Figure: Example Fault Tree Analysis (FTA), generated automatically from a component model]

In post [003] we referred to “each directly or indirectly required component” when talking about determining the system function’s failure rate. But which components are required? – Well, basically all those individual items or combinations of items whose local failure affects the considered function so that it no longer works.


In other words, we have to find out those components that are crucial for the functionality. A common way of doing that is a so-called root-cause investigation, assuming the individual function has failed.

One aspect of this post’s video is a demonstration of how these root causes can easily be derived from the functional system model, using a kind of automatic backward reasoning. For each detected root cause – be it a single fault, a double fault or an even higher-order fault – the graphic of the connected SmartRAMS blocks displays the affected system parts for each particular scenario.
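To make the idea of a root-cause search concrete, here is a minimal Python sketch. It is not the SmartRAMS implementation: the four-component model, the names `MODEL`, `delivers` and `root_causes`, and the "all"/"any" supply rule are assumptions made for illustration only. It also uses plain brute-force enumeration of fault combinations instead of the backward reasoning shown in the video, which is simpler to read but scales poorly.

```python
from itertools import combinations

# Hypothetical model: each component lists its suppliers and whether it
# needs all of them ("all") or just one of them ("any", redundant supply).
MODEL = {
    "grid":      {"needs": "all", "suppliers": []},
    "generator": {"needs": "all", "suppliers": []},
    "bus":       {"needs": "any", "suppliers": ["grid", "generator"]},
    "pump":      {"needs": "all", "suppliers": ["bus"]},
}
FUNCTION_REQUIRES = "pump"  # the function works if this component delivers


def delivers(name, faulty):
    """True if component `name` provides its output, given the faulty set."""
    if name in faulty:
        return False
    node = MODEL[name]
    states = [delivers(s, faulty) for s in node["suppliers"]]
    if not states:  # source component: no incoming supply needed
        return True
    return all(states) if node["needs"] == "all" else any(states)


def root_causes(max_order=2):
    """Minimal fault combinations (up to max_order) that kill the function."""
    found = []
    for order in range(1, max_order + 1):
        for combo in combinations(sorted(MODEL), order):
            faulty = set(combo)
            if any(prev <= faulty for prev in found):
                continue  # already explained by a smaller root cause
            if not delivers(FUNCTION_REQUIRES, faulty):
                found.append(faulty)
    return found


causes = root_causes()
for cause in causes:
    print(sorted(cause))
```

In this toy model the search reports two single faults (the bus and the pump) and one double fault (grid and generator together), which corresponds to the minimal cut sets of the fault tree for the pump function.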

Root cause analysis is often performed during system operation – i.e. late in the product life cycle – during diagnosis or troubleshooting. However, its reasoning and findings are closely related to the top-down investigation in a Fault Tree Analysis (FTA), which is usually performed very early in product development.

Risk analysis by FTA aims to check whether the anticipated architecture meets the safety and reliability requirements. The system model composed of the simple Boolean library items also supports this purpose. We can automatically derive the fault trees and – as a side effect – compute the function failure rates from the components’ lambda values. This is the other aspect shown in the video.


In post [005] we are going to demonstrate these features using an emergency power system as a simple risk assessment example.

 

Reliability Standards

[003] Reliability Modeling from Fault to Failure

[Figure: Components and functions in system design, determining the reliability of systems]

One of the core features to investigate in RAMS analyses is the functional reliability of the system. It is represented quantitatively by the failure rate lambda, which specifies the number of failures within a certain time period, e.g. “failures per million hours (fpmh)”.

Again, as in post [001], we clearly have to distinguish between the failure rates of system functions and those of system components, as shown in the figure.



At the start of the design process, what is given are probably upper limits for the failure rates of the functions – not the components (!) – of the designated system. These individual maximum values for lambda might be defined by the customer, by specification, by design rules, by standards (e.g. IEC 61508, ISO 26262, MIL-STD-882D) or otherwise. When developing a flexible block library, we need to represent this given parameter lambda_max by a value slot on function level.

On the other hand, what has to be determined is each function’s actual failure rate, say lambda_act. It depends mainly on two factors:

  1. the failure rate lambda of each directly or indirectly required component, i.e. its quality
  2. the system topology, how the components are connected and interact, i.e. the design architecture.

Concerning component quality, we simply provide a value slot on component level to represent the individual lambda. In the simplest case it will be just a fixed parameter for lambda. More ambitious approaches, such as making the component’s failure rate depend on environmental or usage parameters, can be considered. In industrial practice the “mean time between failures (MTBF)” is also frequently used, which – under common conditions – is the inverse of the failure rate lambda.
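The relation between MTBF and lambda can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that "common conditions" means a constant failure rate, which is the case where MTBF = 1 / lambda holds.

```python
HOURS_PER_MILLION = 1_000_000


def mtbf_hours_from_fpmh(lambda_fpmh):
    """MTBF in hours for a constant failure rate given in failures per million hours."""
    if lambda_fpmh <= 0:
        raise ValueError("failure rate must be positive")
    return HOURS_PER_MILLION / lambda_fpmh


def fpmh_from_mtbf_hours(mtbf_hours):
    """Failure rate in failures per million hours for a given MTBF in hours."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_MILLION / mtbf_hours


print(mtbf_hours_from_fpmh(25.0))  # 40000.0
```

A component with 25 fpmh therefore has an MTBF of 40,000 hours under this assumption.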

Concerning system topology, we need an arithmetic that takes the topology into account when determining the actual function failure rates. This is basically not too complicated if we look at the system on a smaller, local or component scale instead of trying to derive the calculation on a global level. Assuming that the suppliers of a component are independent, it follows simple rules, very similar to those applied in classical Fault Tree Analysis (FTA):

  1. Add up the probabilities of the individual failures if each of them separately might “kill” the output. Example: The probability that an individual component fails to deliver its output service is the sum of its own failure rate and the probability that its immediate supply fails.
  2. Multiply the probabilities of the individual failures if only the combination will “kill” the output. Example: The probability that the redundant supply of a component fails is the product of the failure rates of the individual supplies.

Applying these rules recursively from the function viewpoint, depending on the individual redundancy situation at each component and its inputs, up to the first elements in the supplier-consumer path, is a major step towards modularizing the failure rate computation of each function.
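The recursive application of the two rules can be sketched as follows. This is an illustration under stated assumptions, not the block library's actual arithmetic: the model, the lambda values, the mission time `MISSION_HOURS` and the names `MODEL` and `p_fail` are all hypothetical. The sketch converts each failure rate into a failure probability over the mission time with the approximation p ≈ lambda · t, which is only reasonable for small values, and it relies on the independence assumption stated above (no component shared between redundant branches).

```python
import math

MISSION_HOURS = 1000.0  # assumed mission time for converting rates to probabilities

# Hypothetical model: lambda in failures per hour, plus the supply rule.
MODEL = {
    "grid":      {"lam": 1e-5, "needs": "all", "suppliers": []},
    "generator": {"lam": 5e-5, "needs": "all", "suppliers": []},
    "bus":       {"lam": 1e-7, "needs": "any", "suppliers": ["grid", "generator"]},
    "pump":      {"lam": 2e-6, "needs": "all", "suppliers": ["bus"]},
}


def p_fail(name):
    """Approximate probability that `name` fails to deliver its output."""
    node = MODEL[name]
    own = node["lam"] * MISSION_HOURS  # small-value approximation of 1 - exp(-lam * t)
    supplier_p = [p_fail(s) for s in node["suppliers"]]
    if not supplier_p:
        supply = 0.0                    # source component: nothing upstream can fail
    elif node["needs"] == "any":
        supply = math.prod(supplier_p)  # rule 2: redundant supply, all must fail
    else:
        supply = sum(supplier_p)        # rule 1: any single supplier failure kills it
    return own + supply                 # rule 1: own fault or supply failure


print(f"{p_fail('pump'):.4f}")  # 0.0026
```

In this toy example the redundant supply contributes 0.01 · 0.05 = 0.0005, the bus adds its own 0.0001, and the pump adds 0.002, giving about 0.0026 for the function. A value like this could then be compared against the limit stored in the function's lambda_max slot, expressed over the same mission time.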

How closely related Fault Tree Analysis FTA and Root Cause Analysis are will be shown in post [004].

Architecture Product Development Safety Video

[002] The safety viewpoint – from faults to failure

[Figure: Schematic of a component fault leading to function failure]

Is functionality all we need? Are we done when we have found an architecture by which all functions work? – Yes and no!

From a safety perspective we also have to consider the inverse view and ask under which conditions which function will fail. If an individual component fails, it might affect the various system functions in different ways. Redundancy plays a crucial role here.

When does a function fail? Depending on its internal requirements, it will fail if one or several of its supplying components fail, namely those at the end of the functional path. And each of these will fail …


  • if it is defective itself or
  • if one or several of its own suppliers fail.

Recursively, we can trace back the whole functional network, a network of suppliers and consumers of services, each with an individual internal condition. In the simplest form this may be expressed by simple Boolean logic.

The following video shows some failure cases of the system in the figure shown in post [001]. Square boxes represent the components, rectangular ones the functions.

The blocks used here basically incorporate no more than such generic Boolean conditions. If a component provides an expected output – like “energized” – it is displayed in blue. If it provides no signal, it is white. A nominally working function is green, while a failed function shows up in red. The analysis might remind you of FMEA – we’ll come back to that later.

The key point is that each component can interactively be set to “fails”, which means it fails to deliver its own service locally, independent of its incoming supply. As the video shows, a simple simulation of such a network will immediately reveal which functions will be affected by which component faults – individual ones or combinations.
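Such a simulation can be sketched in a few lines of Python. The network below is hypothetical and is not the system from post [001]; the names `SUPPLY_OK`, `FUNCTIONS` and `simulate` are assumptions made for illustration. Each component carries a Boolean condition over the outputs of its suppliers, and a component delivers its service only if it is healthy and that condition holds.

```python
# Hypothetical network: each component maps to a Boolean condition that
# says when its incoming supply is sufficient. `out(x)` is the output of x.
SUPPLY_OK = {
    "battery_a": lambda out: True,  # source, needs no supply
    "battery_b": lambda out: True,  # source, needs no supply
    "bus":       lambda out: out("battery_a") or out("battery_b"),  # redundant
    "lamp":      lambda out: out("bus"),
    "radio":     lambda out: out("battery_b"),
}
# Each function depends on the component at the end of its functional path.
FUNCTIONS = {"lighting": "lamp", "communication": "radio"}


def simulate(failed):
    """Return {function: True if working, False if failed} for a set of faults."""
    def out(name):
        # A component delivers only if it is healthy AND its supply is sufficient.
        return name not in failed and SUPPLY_OK[name](out)
    return {func: out(comp) for func, comp in FUNCTIONS.items()}


# Inject every single fault and show which functions survive.
for fault in SUPPLY_OK:
    print(fault, simulate({fault}))
```

In this toy network a fault in battery_b takes down communication but not lighting, because the bus is still fed by battery_a, while a double fault of both batteries takes down both functions.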

After having clarified these basic dependencies, we will introduce the concept of failure rates in post [003].