Quality Risk Analysis: FMEA

Continuing from the earlier post on Quality Risk Analysis, this post looks at an approach known as FMEA (Failure Modes and Effects Analysis). FMEA is a proactive approach to defect prevention: it involves analyzing potential or actual failure modes, rating and ranking the risk each poses to the software, and taking appropriate actions to mitigate that risk. It is used to improve the quality of work products during the development life cycle and to help reduce defects.

Failure modes are the ways in which failures occur, where a failure is a potential or actual error or defect. Effects analysis is the study of the consequences of these failures. Failures are prioritized according to how serious their consequences are, how frequently they occur and how easily they can be detected. This helps product teams anticipate failure modes and assess the associated risks. Once prioritized, the riskiest failure modes can be targeted first, either to design them out of the software or at least to mitigate their effects. FMEA also documents current knowledge and actions about the risks of failures, for use in continuous improvement.

Potential failure modes can be identified from many different sources, including:
  • Brainstorming
  • Bug and triage data
  • Defect taxonomy
  • Root cause analysis
  • Security vulnerabilities and threat models
  • Customer feedback
  • Sustaining engineering fixes
  • Support issues and fixes
  • Static analysis tools

Software FMEA ROI is calculated in terms of a cost avoidance factor: the amount of cost avoided by identifying issues early in the life cycle. It is calculated by multiplying the number of issues found in a specific phase by the cost value of addressing those issues during that phase. The main purpose of a Software FMEA is to catch defects in the development phase that introduces them: requirements defects in the requirements phase, design defects in the design phase, and so on.
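The cost avoidance calculation can be sketched in a few lines of Python. This is a minimal illustration, not a standard formula from any tool: the function name `cost_avoidance`, the phase names and the cost figures are all assumptions made up for the example, and how an organization arrives at its per-phase cost value is left open here.

```python
from typing import Mapping


def cost_avoidance(issues_found: Mapping[str, int],
                   cost_per_issue: Mapping[str, float]) -> float:
    """Sum, over phases, of (issues found in a phase) x (cost value for that phase)."""
    return sum(count * cost_per_issue[phase]
               for phase, count in issues_found.items())


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    cost_per_issue = {"requirements": 100.0, "design": 500.0}
    issues_found = {"requirements": 4, "design": 2}
    # 4 * 100 + 2 * 500 = 1400.0
    print(cost_avoidance(issues_found, cost_per_issue))
```

Keeping the cost values in a separate mapping makes it easy to swap in an organization's own figures without touching the calculation.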

Some benefits of Software FMEA
  • More robust and reliable software; better quality of software
  • A focus on defect prevention: identifying and eliminating defects at the software design stage helps drive quality upstream
  • Reduced cost of testing when measured in terms of cost of poor quality. Proactively identifying and eliminating software defects saves time and money: if a defect cannot occur, there is no need to fix it
  • Enhanced productivity, by developing higher-quality software in less time. Prioritizing potential failures by risk supports the most effective allocation of people and resources to prevent them
  • Because the technique requires detailed analysis of expected failures, it gives a more complete view of potential issues and a clearer, better-informed understanding of the risks in the system. Engineering knowledge is retained for future software development projects and iterations, which helps an organization avoid relearning what is already known
  • Helps guide design and development decisions
  • Helps guide testing, both by focusing it on the areas where more testing is needed and by informing test design requirements

Some watch areas affecting FMEA
  • The potential time commitment required can discourage participation
  • If focus area documentation does not exist before the FMEA session, it has to be created, which adds to the time required
  • Generally, the more knowledgeable and experienced the session participants are, the better the FMEA results. The risk is that key individuals are often busy and therefore unable or unwilling to commit their time to the process

High-level summary of the Software FMEA process
  • After the potential failure modes are identified, each is analyzed further for its potential causes and potential effects (Causes and Effects Analysis)
  • For each failure mode, a Risk Priority Number (RPN) is assigned based on:
    • Occurrence Rating, Range 1-10; the higher the occurrence probability, the higher the rating
    • Severity Rating, Range 1-10; the higher the severity associated with the potential failure mode, the higher the rating
    • Detectability Rating, Range 1-10; the lower the detectability, the higher the rating
  • Another method is to use a rating scale of High, Medium and Low for Occurrence, Severity and Detectability Ratings
    • High: 9
    • Medium: 6
    • Low: 3
  • RPN = Occurrence * Severity * Detectability. On the 1-10 scale, Maximum = 1000 and Minimum = 1; on the High/Medium/Low scale, Maximum = 729 and Minimum = 27
  • For every potential failure with an RPN score of 150 or greater, the FMEA team proposes recommended actions to be completed within the phase in which the failure was found
  • The RPN score must be recomputed after each recommended action is completed, to show that the risk has been significantly mitigated
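
The rating, ranking and recomputation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `FailureMode` class, the `rank_by_risk` helper and the example failure modes and ratings are hypothetical; only the 1-10 scale, the High/Medium/Low values, the RPN formula and the threshold of 150 come from the process described above.

```python
from dataclasses import dataclass, replace

RPN_ACTION_THRESHOLD = 150  # RPN at or above this calls for recommended actions

# Alternative three-level rating scale
LEVELS = {"High": 9, "Medium": 6, "Low": 3}


@dataclass(frozen=True)
class FailureMode:
    """Hypothetical record for one failure mode and its three ratings."""
    name: str
    occurrence: int      # 1-10, higher = more likely to occur
    severity: int        # 1-10, higher = more serious consequences
    detectability: int   # 1-10, higher = HARDER to detect

    def __post_init__(self):
        # Reject ratings outside the 1-10 range.
        for field_name in ("occurrence", "severity", "detectability"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        return self.occurrence * self.severity * self.detectability

    @property
    def needs_action(self) -> bool:
        return self.rpn >= RPN_ACTION_THRESHOLD


def rank_by_risk(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Made-up failure modes and ratings, for illustration only.
    modes = [
        FailureMode("Report total rounding error", 6, 3, 2),
        FailureMode("Session not invalidated on logout", 4, 8, 7),
        FailureMode("Import job fails silently",
                    LEVELS["High"], LEVELS["Medium"], LEVELS["Low"]),
    ]
    for m in rank_by_risk(modes):
        print(m.name, m.rpn, "action needed" if m.needs_action else "ok")

    # After a recommended action (say, an added automated check) makes the
    # failure easier to detect, re-rate it and recompute the RPN.
    mitigated = replace(modes[1], detectability=3)
    print(mitigated.rpn, mitigated.needs_action)
```

Treating each re-rating as a new record, rather than overwriting the old one, keeps the before and after RPN values available for the FMEA documentation.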