Maintenance Analysis and Improvement Tools [part 1]


"Every problem is an opportunity."

-Kiichiro Toyoda, founder of Toyota

  • 1 Introduction
  • 2 Terminology
  • 3 Maintenance Root Cause Analysis Tools
  • 4 Six Sigma and Quality Maintenance Tools
  • 5 Lean Maintenance Tools
  • 6 Other Analysis and Improvement Tools
  • 7 Summary
  • 8 QUIZ
  • 9 References

Learning goals:

• Why are analysis tools necessary?

• Types of analysis tools available

• What analysis tools should be used? When and where should they be used?

• Application of 6 sigma in a non-production environment

• What is meant by lean and VSM, and how they apply to maintenance

1. Introduction

Organizations must continually improve processes, reduce costs, and cut waste to remain competitive. Data (O&M data) should be analyzed using various techniques and tools in order to develop and implement effective plans that can lead to improvements in assets and processes.

Recent industry surveys have indicated that although many organizations have started investing time and effort to improve their processes, it isn't unusual to see the same problems popping up over and over. The impact of these problems on customers (internal and external), employees, profitability, and competitiveness has been well documented. One factor making such problems highly visible is the formalized management systems guided by documents such as ISO 9000. A new requirement of "continuous improvement" in ISO 9001 requires organizations to collect and analyze data on process performance using audits, internal performance indicators, and customer feedback. Corrective action is to be taken on any problems that are identified, to prevent recurrence.

Unfortunately, insufficient effort has been placed on providing guidance on how to carry out an effective diagnosis to identify the causes of problems. Organizations generally try to fix the symptoms of the problems instead of fixing the root causes. They often try to implement what we may call a "duct tape solution," hoping it will address the problem.

Meanwhile, the risks associated with repeat problems have significantly increased.

In addition, our assets/systems are getting very complex. Although the identification of problems is more rigorous, the ability to solve them has not necessarily improved at the same rate. Much of the training that is generally provided is too high level and philosophical, or is not focused on analytical problem solving. People are not being taught how to think logically and deductively. They lack the knowledge of what tools to use and how to apply them appropriately.

This section provides a tool box of problem-solving tools that focus on the analytical process involved in finding the actual causes of problems. It discusses various tools and techniques that are available, ranging from simple checklists and spreadsheets to sophisticated modeling software, that are helpful for solving problems. We will focus our discussion on a few of these tools, briefly describing situations in which they can appropriately be used.

There are many tools available to us. For the sake of streamlining our discussion, they have been classified into four major categories. These are:

• Maintenance Root Cause Analysis Tools

• Six Sigma and Quality Maintenance Tools

• Lean Maintenance Tools

• Other Analysis and Improvement Tools

Each of these will be discussed in the next four sections.

2 Terminology

5 Whys

A problem-solving technique for discovering the root cause of a problem. This technique helps users to get to the root of the problem quickly by simply asking "why" a number of times until the root cause becomes evident.

Barrier Analysis

A technique often used, particularly in process industries, based on tracing energy flows. It has a focus on barriers to those flows, and helps to identify how and why the barriers did not prevent the energy flows from causing damage.

Cause-and-Effects Analysis

Also called Ishikawa or fishbone chart. It identifies many possible causes for an effect or problem, and then sorts ideas into useful categories to help in developing appropriate corrective actions.

Cause Mapping

A simple, but effective method of analyzing, documenting, communicating, and solving a problem to show how individual cause-and-effect relationships are inter-connected.


Checklists

A generic tool that can be developed for a wide variety of purposes. It’s a structured, pre-prepared form for collecting, recording, and analyzing data as the work progresses. Some examples are an operator's start-up checklist, a PM checklist, and a maintainability checklist used by designers.

Control Charts

A graph used to display how a process changes over time. Comparing current data to historical control limits indicates whether the process variation is in control or out of control.
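The comparison against historical control limits can be illustrated with a short sketch. It assumes limits set at the historical mean plus or minus three standard deviations, a common convention that is not stated in this text; the function names and the bearing-temperature readings are hypothetical.

```python
from statistics import mean, pstdev

def control_limits(history, k=3.0):
    """Return (lower, center, upper) limits from historical readings.

    Assumption: limits are the historical mean +/- k standard deviations.
    """
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center, center + k * spread

def out_of_control(history, current, k=3.0):
    """Return the current readings that fall outside the historical limits."""
    lower, _, upper = control_limits(history, k)
    return [x for x in current if x < lower or x > upper]

# Hypothetical bearing temperatures in degrees C
history = [70, 72, 71, 69, 70, 71, 70, 69, 72, 70]
current = [71, 70, 78, 69]

flagged = out_of_control(history, current)
print(flagged)  # readings that suggest the process is out of control
```

In practice the limits would be derived from a much longer history, and the charting method would depend on the type of data being tracked.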

Design for Six Sigma (DFSS)

A systematic methodology using tools and training to enable the design of products, processes, and services that meet customer expectations at 6 Sigma quality levels. DFSS optimizes the design process to achieve very high quality and repeatable 6 Sigma performance. It follows a five-phase process called DMADV (Define - Measure - Analyze - Design - Verify), which is sometimes used synonymously with DFSS.

Design of Experiments (DOE)

A method for carrying out carefully planned experiments on a process. Usually, design of experiments involves a series of experiments that start by looking broadly at a large number of variables and then focus on the few critical ones.

Failure Modes and Effects Analysis (FMEA)

A technique to examine an asset, process, or design to determine potential ways it can fail and the potential effects (consequences), and subsequently to identify appropriate mitigation tasks for the highest priority risks.

Fault Tree

This analysis tool is constructed starting with the final failure (or event) and progressively tracing each cause that led to the previous cause. This continues till the trail can be traced back no further. Once the fault tree is completed and checked for logical flow, it’s determined what changes would prevent the sequence of causes (or events) with marked consequences from occurring again.

Flow Chart

A graphical summary of the process steps (such as production, storage, transportation) and flows (movement of information and materials) that make up a procedure or process from beginning to end. This information is used in defining, documenting, studying, and improving the system. Also called flow diagram, flow process chart, or network diagram.

Mistake Proofing

Mistake proofing, also known as Poka-Yoke (Japanese equivalent), is the use of any automatic device or method that either makes it impossible for an error to occur or makes the error immediately obvious once it has occurred.

Muda

Japanese lean word for waste; non-value-added work.

Mura

Japanese lean word for unevenness; inconsistency.

Muri

Japanese lean word for overburden; unreasonable work.

Pareto Analysis

This bar graph displays variances by the number of their occurrences. Variances are shown in descending order to identify the largest opportunities for improvement, and to separate the critical few from the trivial many. The concept is also known as the 80/20 Principle.
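As a rough illustration of the ordering behind a Pareto chart, the sketch below counts occurrences, sorts them in descending order, and adds a cumulative percentage so the "critical few" stand out. The function name and the failure records are hypothetical.

```python
from collections import Counter

def pareto_table(events):
    """Return (cause, count, cumulative %) rows, largest count first."""
    counts = Counter(events).most_common()
    total = sum(n for _, n in counts)
    table, running = [], 0
    for cause, n in counts:
        running += n
        table.append((cause, n, round(100 * running / total, 1)))
    return table

# Hypothetical failure records for one asset over a year
failures = (["seal leak"] * 12 + ["bearing"] * 5
            + ["misalignment"] * 2 + ["operator error"] * 1)

for cause, count, cumulative in pareto_table(failures):
    print(f"{cause:15s} {count:3d} {cumulative:6.1f}%")
```

Here the first two causes account for 85% of the recorded failures, which is where improvement effort would be directed first.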

PDCA - Deming's Improvement Cycle

Plan - Do - Check - Act (PDCA) is known as Deming's methodology to make improvements.

Root Cause Analysis

Identification and evaluation of the reason for an undesirable condition or non-conformance. A methodology that leads to the discovery of the cause of a problem or root cause.

Scatter Diagram

A diagram that graphs pairs of numerical data, one variable on each axis, to look for a trend or a relationship.

Six Sigma

This methodology systematically analyzes processes to reduce process variations and also to eliminate wastes. Six Sigma is also used to further drive productivity and quality improvements in any type of organization. DMAIC (Define - Measure - Analyze - Improve - Control) represents the steps used to guide implementation of the Six Sigma process.

Standard Deviation

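Standard deviation measures the variation of values from the mean. It’s denoted by the Greek letter σ (sigma) and is calculated using the following formula:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$

where Σ = sum of, Xi = observed values, X bar (X with a line over the top) = arithmetic mean, and n = number of observations. When the observations are a sample rather than the whole population, n - 1 is used in the denominator in place of n.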
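The calculation can be checked numerically with a small sketch. It assumes the population form of the standard deviation (dividing by n); the sample form divides by n - 1 and gives a slightly larger value. The repair-time data are hypothetical.

```python
import math
from statistics import pstdev, stdev

def std_dev(values):
    """Population standard deviation: root of the mean squared deviation."""
    n = len(values)
    x_bar = sum(values) / n  # arithmetic mean
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)

# Hypothetical repair times in hours
repair_hours = [2, 4, 4, 4, 5, 5, 7, 9]

print(std_dev(repair_hours))  # hand-written population form
print(pstdev(repair_hours))   # standard library, population form
print(stdev(repair_hours))    # standard library, sample form (n - 1)
```

For these eight values the mean is 5 and the squared deviations sum to 32, so the population standard deviation is the square root of 4.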


Stratification

A technique that separates data gathered from a variety of sources so that a pattern can be seen.

Theory of Constraints (TOC)

Concepts and methodology aimed mainly at achieving the most efficient flow of material in a plant. Basically it’s a scheduling and inventory control philosophy that proposes that any organization has a chain of interdependent links (departments, functions, resources); some may have potential for greater performance, but cannot realize it because of a weak link - bottleneck (constraint). TOC supports identification and removal of bottlenecks. It’s also sometimes called bottleneck analysis.

Value Stream Mapping (VSM)

VSM is a tool that helps to visualize and understand the flow of information and material as it makes its way through the process value stream. It identifies steps that are not adding any value; these are waste and need to be removed from the process or improved.

3 Maintenance Root Cause Analysis Tools

Root Cause Analysis (RCA)

Root Cause Analysis (RCA), or Root Cause Failure Analysis (RCFA) as it’s sometimes called, is a step-by-step methodology that leads to the discovery of the prime cause (or the root cause) of the failure. If the root cause of a failure is not addressed in a timely fashion, the failure will repeat itself, usually causing unnecessary loss of production and increasing the cost of maintenance. RCA is a structured way to arrive at the root cause, thus facilitating elimination of the cause and not just the symptoms associated with it.

Assets, components, and processes can fail for a number of reasons. But usually there is a definite progression of actions and consequences that lead to a failure. An RCA investigation traces the cause and effect trail from the failure back to the root cause. RCA is much like a detective at work trying to solve a crime or, in a somewhat similar way, the National Transportation Safety Board (NTSB) trying to piece together evidence following a plane crash to determine the cause of the failure.

Several studies by many organizations have repeatedly indicated that about 90% of the time, unwanted situations caused by failures are related to process problems; only about 10% are related to personnel problems. Yet most organizations spend far more time looking for culprits than focusing on finding root causes. Because of this misdirected effort, we often miss the opportunity to learn and benefit from understanding the root cause of the unwanted failures and eliminating those causes.

Consider the following two scenarios.

Scenario #1 The Plant Manager walked into the plant and found a puddle of oil on the floor near a tube assembly machine. The manager instructed the area supervisor to have the oil cleaned up immediately. The next day, while in the same area of the plant, the Plant Manager again found oil on the floor and asked the area supervisor to get the oil cleaned up from the floor. In fact, the manager was a little upset with the supervisor for not following directions given the day before. His parting words were either to get the oil cleaned up or he would find someone who would.

Scenario #2 The Plant Manager walked into the plant and found a puddle of oil on the floor near a tube assembly machine. The manager asked the area supervisor why there was oil on the floor. The supervisor indicated that it was due to a leaky oil seal in the hydraulic pipe joint above. The Plant Manager then asked when the oil seal had been replaced.

The supervisor responded that Maintenance had installed 5 or 6 oil seals over the past few weeks and each one seemed to leak. The supervisor also indicated that Maintenance had been talking to Purchasing about the seals because it seemed they were all failing - leaking prematurely.

The Plant Manager then went to talk to Purchasing about the situation with the seals. The Purchasing Manager indicated that they had in fact received a bad batch of oil seals from the supplier as reported by the maintenance department. The Purchasing Manager also indicated that they had been trying for the past month or so to get the supplier to make good on the last order of 50 seals that all seemed to be bad.

The Plant Manager then asked the Purchasing Manager why they had purchased from this supplier if their quality was poor. The Purchasing Manager replied that the supplier was the lowest bidder when quotes were received from various suppliers. When the Plant Manager asked the Purchasing Manager why they went with the lowest bidder without considering bidder's quality issues, the Purchasing Manager explained that he was directed by the Finance Manager to reduce cost.

Next, the Plant Manager went to talk to the Finance Manager about the situation. The Finance Manager noted that his direction to Purchasing to always take the lowest bid was in response to the Plant Manager's memo telling them to be as cost conscious as possible and only purchase from the lowest bidder, thus saving money. The Plant Manager was horrified to realize that he was the reason there was oil on the plant floor. What a discovery!

We may find Scenario #2 somewhat funny, and even laugh when the problem comes full circle. We have found that most of the time everyone in the organization tries to do their best and to do the right things. But sometimes things don't work out the way we envision. The root cause of this whole situation is sub-optimization with no overall vision.

Scenario #2 also provides a good example of how one should proceed with a root cause analysis. We need to continue asking "Why?" until a pattern emerges and the cause of the difficult situation becomes obvious.

When we have a problem, how do we approach it for a solution? Do we jump in and start treating the symptoms, much like continually cleaning oil as in Scenario #1? If we only fix the symptoms, based on what we see on the surface, the problem will almost certainly happen again. Then we will keep fixing the problem, again and again, but never solving it.

The practice of RCA is predicated on the belief that problems are best solved by attempting to correct or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. By directing corrective measures at root causes, the likelihood of problem recurrence will be minimized. In many cases, complete prevention of recurrence through a single intervention is unlikely. Therefore, RCA is often considered to be an iterative process; it’s frequently viewed as a part of a continuous improvement tool box.

Root cause analysis is not a single, defined methodology; there are several types or philosophies of RCA in existence. Most of these can be classified into four very broadly defined categories based on their field of application: safety-based, production-based, process-based, and asset-based.

1. Safety-based RCA is performed to find causes of accidents related to occupational safety, health, and environment.

2. Product- or Production-based RCA is performed to identify causes of poor quality, production and other problems in manufacturing related to the product.

3. Process-based RCA is performed to identify causes of problems related to processes, including business systems.

4. Asset-based RCA is performed for failure analysis of assets or systems in engineering and the maintenance area.

Despite the seeming disparity in purpose and definition among the various types of root cause analysis, there are some general principles that can be considered as universal.

General Principles for RCA

• Aiming corrective measures at root causes is more effective than merely treating the symptoms of a problem.

• To be effective, RCA must be performed systematically, and conclusions must be backed up by evidence.

• There is usually more than one root cause for any given problem.

The Six Steps in Performing an RCA

• Define the problem - the failure.

• Collect data / evidence about issues that contributed to the problem.

• Identify possible causal factors.

• Develop solutions and recommendations.

• Implement the recommendations.

• Track the recommended solutions to ensure effectiveness.

Step One: Define the Problem

• What happened?

• What were the specific symptoms?

What happened? What failed? How was the problem discovered? What was the sequence of events that led to the failure or breakdown? There should be a physical examination of the area or asset involved and a detailed description of the actual event. There is a high probability of missing the right cause if the problem is not checked from every aspect.

Step Two: Collect Data / Evidence

• What proof do we have that the problem exists?

• What sequence of events leads to the problem?

• How long has the problem existed?

• What is the impact of the problem?

Analyze the situation to evaluate factors that contributed to the problem. To maximize the effectiveness of RCA, get everyone involved together, e.g., operators, maintainers, and others who are familiar with the situation. People most familiar with the problem can help lead to a better understanding of the issues. Identify issues that contributed to the problem collectively. The details of the problem or failure can be organized by using the '3W2H' (what, when, where, how, how much) tool.

Don’t make any assumptions when examining a problem. No two problems are exactly the same in nature and cause. Actually, it’s rare for the exact same failure to occur twice. Each problem should be reviewed as if you are looking at the situation for the first time. Two failure phenomena may appear to be the same, but their causes can be different.

Step Three: Identify Possible Causal Factors

• What are the causal factors?

• Why does the causal factor exist?

• What is the real reason the problem occurred?

Identify every possible cause of failure. Looking at problems in the past may be helpful in determining cause of failure; however, you should not limit your search to past causes. Every possible cause must be considered and examined.

In performing cause analysis, it will become clear in the process that some causes will be illogical. Remove all illogical causes after careful examination. If we determine that the cause of the failure was human error, separate that cause from the physical causes.

Use these tools to help identify causal factors:

• 5 Whys - Ask "Why?" until the root of the problem is found.

• Cause and Effect Diagrams - Create a chart of all of the possible causal factors, to see where the trouble may have begun.

• Drill Down - Break down a problem into small, detailed parts to better understand the big picture.

• So What? - Ask "So what?" to determine all the possible consequences of a fact.

These tools are designed to encourage analyzing deeper at each level of cause and effect. During this stage, identify as many causal factors as possible. Too often, people identify one or two factors and then stop, but that's not sufficient. With RCA, we don't want to simply treat the most obvious causes; we may need to dig deeper.

Step Four: Develop Solutions and Recommendations.

Based on the factors that may cause or have already caused the failure, you need to develop mitigating corrective actions. To develop appropriate recommendations, group the cause factors into three basic cause types:

a) Physical causes - Tangible, material items failed in some way (for example, a crane's brakes stopped working; the hydraulic cylinder rod didn't stop at the right location).

b) Human causes - The operator or mechanic did something wrong or did not do something that was needed. Human causes typically lead to physical causes (for example, brake fluid was not topped up, which led to brake failure; the limit switch was not relocated to the right location after the last set-up change).

c) Organizational / Process causes - A system, process, or policy that people use to make decisions or do their work is faulty (for example, no one person was responsible for vehicle maintenance, and everyone assumed someone else had filled the brake fluid; no written procedure was available to ensure relocation of the limit switch after the set-up change).

Root Cause Analysis should investigate all three types of causes. It involves investigating the patterns of negative effects, finding hidden flaws in the system, and discovering specific actions that contributed to the problem. This often means that RCA reveals more than one root cause.

Step Five: Implement Recommendations

a) What can we do to prevent the problem from happening again?

b) How will the solution be implemented?

c) Who will be responsible for it?

d) What are the risks of implementing the solution?

e) How will implementation success be measured?

Develop a plan with a schedule for implementing the suggested solution or recommendations. The plan should be presented to all stakeholders including management for their approval. The plan should also identify how implementation progress will be tracked and what metrics should be used to measure the effectiveness of the recommended solution.

Step Six: Track the recommended solutions to ensure effectiveness.

After recommended solutions have been implemented, tracking the appropriate metrics to measure the effectiveness of the recommended solution is an essential component of RCA. If the solution has not been effective, the team should revisit the RCA and modify the solution, then implement the revised recommendations and start re-measuring the effectiveness.

Examples of tools and techniques to perform root cause analysis

• 5 Whys

• Cause and effect diagram or fishbone diagram

• Failure mode and effects analysis (FMEA)

• Pareto analysis - 80/20 rule

• Fault tree analysis

• Barrier analysis

• Cause mapping

Examples of some basic elements of root causes

Materials

• Defective raw material

• Wrong type of material for this job

Asset / Machine

• Incorrect asset / machine used

• Incorrect tool selected

• Poorly maintained or inadequate maintenance

• Poor design

• Poor machine installation

• Defective machine or tool

Environment

• Poorly maintained workplace

• Inadequate job design or layout of work

• Surfaces poorly maintained

• Physical demands of the task

• Forces of nature or Act of God

Safety and Management

• No or poor management involvement

• Inattention to task

• Task hazards not guarded properly

• Poor recognition of hazard

• Previously identified hazards were not eliminated

• High stress demands

Methods

• No or poor procedures

• Not following the procedures

• Poor communication

People System / Involvement

• Lack of or poor training

• Lack of process / machine operating procedures

• Lack of or poor employee involvement

Root Cause Analysis Template

Recommended template for final RCA report:

1. Undesirable Event

2. Undesirable Event Summary

3. Data Summary from FMEA / Pareto Analysis

4. Identified Root Causes

a. Physical

b. Human

c. Organization / Process and Procedures

5. Recommended Corrective Action

6. Implementation Plan

7. Metrics to measure effectiveness

8. Team Members

9. Special / Additional comments

Several computer software programs are available that can be very useful in supporting and documenting the root cause analysis. It’s suggested you use a commercially-available program or a spreadsheet to document the process findings.
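Where neither is at hand, the report template can also be captured as a simple structured record. The following is only a sketch of one possible layout: the class and field names are hypothetical, and the example entries loosely reuse the crane brake illustration from Step Four.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class RootCauses:
    physical: list = field(default_factory=list)
    human: list = field(default_factory=list)
    organizational: list = field(default_factory=list)

@dataclass
class RCAReport:
    # Fields follow the nine sections of the recommended template
    undesirable_event: str
    event_summary: str
    data_summary: str = ""
    root_causes: RootCauses = field(default_factory=RootCauses)
    corrective_actions: list = field(default_factory=list)
    implementation_plan: str = ""
    effectiveness_metrics: list = field(default_factory=list)
    team_members: list = field(default_factory=list)
    comments: str = ""

    def missing_sections(self):
        """Names of template sections that are still empty."""
        flat = asdict(self)
        # Treat the three cause types as one section for this check
        flat["root_causes"] = [c for v in flat["root_causes"].values() for c in v]
        return [name for name, value in flat.items() if not value]

report = RCAReport(
    undesirable_event="Crane brake failure",
    event_summary="Brakes stopped working; brake fluid was low",
    root_causes=RootCauses(
        physical=["Brakes stopped working"],
        human=["Brake fluid was not topped up"],
        organizational=["No one person responsible for vehicle maintenance"],
    ),
)
missing = report.missing_sections()
print(missing)  # sections the team still has to complete
```

A check like `missing_sections` is one way to keep a report from being closed out before corrective actions and effectiveness metrics have been recorded.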

5 Whys Analysis

The 5 Whys is a simple problem-solving technique that helps users get to the root of the problem quickly. Made popular in the 1970s by the Toyota Production System, the 5 Whys analysis involves looking at any problem and asking "Why?" and "What caused this problem?" Quite often, the answer to the first why will prompt another why, the answer to the second why will prompt another, and so on - hence the name 5 Whys analysis.

Benefits of the 5 Whys include:

• It helps to quickly determine the root cause of a problem.

• It’s easy to learn and apply.

How to Use the Tool

When looking to solve a problem, start at the end result and work backward (toward the root cause), continually asking, "Why?" This process will need to be repeated over and over until the root cause of the problem becomes apparent. If it doesn't quickly give an answer that's obviously right, then you may need more sophisticated problem solving techniques.


The following example shows the effectiveness of the 5 Whys analysis as a problem-solving technique:

1. Why is our customer (operations department xyz) unhappy?

• Because we did not deliver our services (fixing the asset) when we said we would.

2. Why were we unable to meet the agreed-upon schedule for delivery?

• The job took much longer than we thought it would.

3. Why did it take so much longer?

• Because we underestimated the work requirements.

4. Why did we underestimate the work?

• Because we made a quick estimate of the time needed to complete it, and did not list the individual steps needed to complete the total job. In short, we did not do work planning for this job.

5. Why didn't we do planning - detailed analysis - for this job?

• Because we were running behind on other projects and were short on planner resources. Our customer (operations department xyz) was forcing us to do the job quickly.

Clearly, we needed better planning, including improved time estimates and a list of the steps needed to complete the job efficiently.

The 5 Whys analysis is an effective tool for uncovering the root causes of a problem. Because it’s so elementary in nature, it can be adapted quickly and applied to almost any problem. Remember, if it doesn't prompt an intuitive answer, other problem-solving techniques may need to be applied.
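The question-and-answer chain from the example can be recorded in a very small structure, which makes it easy to document an analysis. This is a minimal sketch; the function name is hypothetical, and the answers are shortened from the example above.

```python
def five_whys(problem, answers):
    """Pair each answer with the 'why' it responds to.

    Returns the list of (question, answer) pairs and the last answer,
    which is treated as the candidate root cause.
    """
    chain = []
    current = problem
    for answer in answers:
        chain.append((f"Why: {current}?", answer))
        current = answer  # the next "why" is asked about this answer
    return chain, current

problem = "The customer is unhappy"
answers = [
    "Service was not delivered when promised",
    "The job took longer than expected",
    "Work requirements were underestimated",
    "No work planning was done for the job",
    "Planner resources were short",
]

chain, root = five_whys(problem, answers)
for question, answer in chain:
    print(question, "->", answer)
print("Candidate root cause:", root)
```

The number of answers is not fixed at five; the chain simply stops when the team agrees that the last answer is something it can act on.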

Cause-and-Effects Analysis (or Fishbone Diagram)

What Is a Fishbone Diagram?

Dr. Kaoru Ishikawa, a Japanese quality control statistician, invented the fishbone diagram, which often looks like the skeleton of a fish, hence its name. The fishbone diagram is an analytical tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects. Because of this function of the fishbone diagram, it’s also called a Cause-and-Effect Diagram.

Whatever name we choose to call this analysis, it helps us in a systematic and simple way to categorize the many potential causes of problems and to identify root causes. Usually this analysis is performed as a team. After listing the possible causes for a problem, the team can analyze each cause carefully, giving due importance to statements made by each team member during the brainstorming session, accepting or ruling out certain causes, and eventually arriving at the root cause of the problem. In general, fishbone diagrams give us increased understanding of complex problems by visual means of analyses.

When Should We Use a Fishbone Diagram?

It’s helpful to use the fishbone diagram in the following cases:

• To stimulate thinking during a brainstorming session

• When there are many possible causes for a problem

• To evaluate all the possible reasons when a process is beginning to have difficulties, problems, or breakdowns

• To investigate why an asset or process is not performing properly or producing the desired results

• To analyze and find the root causes of a complicated problem and to understand relationships between potential causes

• To dissect problems into smaller pieces

When Should We Not Use a Fishbone Diagram?

Of course, the fishbone diagram isn't applicable to every situation. Listed below are just a few examples in which we should not use the fishbone diagram because it either is not relevant or does not produce the expected results:

• The problem is simple or is already known.

• The team size is too small for brainstorming.

• There is a communication problem among the team members.

• The team has experts who can fix any problem without much difficulty.

How to Construct a Fishbone Diagram

The following five steps are essential when constructing a fishbone diagram:

1. Define the problem.

2. Brainstorm.

3. Identify all causes.

4. Select any causes that may be at the root of the problem.

5. Develop corrective action plan to eliminate or reduce the impact of the causes selected in Step 4.

The first step is fairly simple and straightforward. Define the problem for which the root cause needs to be identified. Usually the maintenance / reliability engineer or technical leader chooses the problem that needs a permanent fix, and that is worth brainstorming with the team.

FIG. 1 Fishbone Diagram: Failure of a Hydraulic Pump

After the problem is identified, the team leader can start constructing the Fishbone diagram. The leader defines the problem in a square box to the right side of a page or worksheet. A straight line is drawn from the left to the problem box with an arrow pointing towards the box. The problem box now becomes the fish head and its bones are to be filled in during the steps that follow. FIG. 1 provides an example of a Hydraulic Pump analysis. In this example, a hydraulic pump that is not pumping the desired output (oil - specified pressure and volume) has become a problem.

The next step is to start identifying major components and suspected causes of this failure, e.g., bearing failure, motor failure, seal failure, or shaft failure. All major causes are identified and connected as parts (the bones) of the Fishbone diagram. Causes of bearing failures are also listed in this example. The next step is to refine the major causes to find the secondary causes and other causes occurring under each of the major categories.

In general, the following steps are taken to draw the fishbone diagram:

1. List the problem/issue to be investigated in the "head of the fish".

2. Label each "bone" of the "fish". Major categories typically include:

• The 4 Ms: Methods, Machines, Materials, and Manpower

• The 4 Ps: Place, Procedure, People, and Policies

• The 4 Ss: Surroundings, Suppliers, Systems, and Skills

• The 6 Ms: Machine, Method, Materials, Measurement, Man, and Mother Nature (Environment)

• The 6 EPMs: Equipment/Asset, Process, People, Materials, Environment, and Management

a. The team may use one of the categories suggested above, combine them in any manner, or make up others as needed. The categories are to help organize the ideas.

b. Use an idea-generating technique (e.g., brainstorming) to identify the factors within each category that may be affecting the problem or effect being studied. The team should ask, "What is the issue, and what are its causes and effects?"

c. Repeat this procedure with each factor under the category to produce sub-factors. Continue asking, "Why is this happening?" and put additional segments under each factor and subsequently under each sub-factor.

d. Continue until you no longer get useful information when you ask, "Why is that happening?"

e. Analyze the results of the fishbone after team members agree that an adequate amount of detail has been provided under each major category. For example, look for those items that appear in more than one category. These become the most likely causes.

f. For those items identified as the most likely causes, the team should reach consensus on their priority. The first item listed should be the most probable cause.
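The analysis in step e, looking for items that appear in more than one category, can be sketched with a simple mapping of categories to causes. The category names follow the 6 EPMs; the function name and the causes listed for the hydraulic pump are hypothetical.

```python
from collections import defaultdict

def repeated_causes(fishbone):
    """Return causes listed under more than one category, most widespread first."""
    seen = defaultdict(set)
    for category, causes in fishbone.items():
        for cause in causes:
            # Normalize so the same cause written differently still matches
            seen[cause.strip().lower()].add(category)
    repeated = {c: sorted(cats) for c, cats in seen.items() if len(cats) > 1}
    return sorted(repeated.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical brainstorming output for a hydraulic pump failure
fishbone = {
    "Equipment/Asset": ["Worn bearing", "Seal failure"],
    "Process": ["No lubrication schedule"],
    "People": ["Inadequate training", "No lubrication schedule"],
    "Materials": ["Contaminated oil", "Wrong seal type"],
    "Environment": ["Contaminated oil"],
    "Management": ["Inadequate training", "No lubrication schedule"],
}

likely = repeated_causes(fishbone)
for cause, categories in likely:
    print(cause, "->", ", ".join(categories))
```

The ordering produced here is only a starting point; the team still needs to reach consensus on priority as described in step f.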

An example of another fishbone diagram is shown in FIG. 2. In this example, an analysis team is trying to understand a poor humidity control problem in a drier application. The team used five specific headings to prompt the ideas.

FIG. 2 Fishbone Diagram: Poor Humidity Control Problem in Drier

FIG. 3 Elements of Excellence

FIG. 3 shows yet another example of a fishbone diagram. Here, "Maintenance Excellence" is the problem statement (result). The diagram lists causes, in this case the actions needed to achieve excellence in maintenance. Note that the problem to be solved can be positive too. We tend to think of problems only in a negative sense, but these tools can be used either way.

Sometimes the fishbone diagram can become very large because the team may identify many possible causes. This makes the diagram very complex; comprehending the relationship of the causes can be difficult. A good fishbone diagram is one which has explored all the possibilities for a problem but is still easy to understand when developing corrective action plans.

Failure Modes and Effects Analysis (FMEA)

This analysis tool is also called failure modes, effects and criticality analysis (FMECA), and potential failure modes and effects analysis.

FMEA is a step-by-step methodology for identifying all possible failures in the design of an asset (product), in a manufacturing or assembly process, in the operations and maintenance phase, or in providing services.

Failure modes are the ways, or modes, in which something might fail.

Failures are any potential or actual errors or defects that affect the customer, user, or asset itself. Effects refer to the consequences of those failures.

Failures are prioritized according to how serious their consequences are, how frequently they occur, and how easily they can be detected. The purpose of FMEA is to take actions to eliminate or reduce failures, starting with the highest-priority ones.
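One common way to combine these three factors is a risk priority number (RPN), the product of the severity, occurrence, and detection ratings. The text above does not name a specific scoring scheme, so the following is only a minimal sketch under stated assumptions: each factor is rated on a 1 to 10 scale, a higher detection rating means the failure is harder to detect, and the failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (very frequent)
    detection: int   # assumed scale: 1 (easily detected) to 10 (undetectable)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for label in ("severity", "occurrence", "detection"):
            value = getattr(self, label)
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings for illustration only.
modes = [
    FailureMode("bearing seizure", severity=8, occurrence=4, detection=6),
    FailureMode("seal leak", severity=5, occurrence=6, detection=3),
    FailureMode("shaft fracture", severity=10, occurrence=2, detection=7),
]

# Highest-priority failure modes first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```

A single product can hide a very severe but rare failure behind a moderate score, so teams often review high-severity items separately rather than relying on the ranking alone.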

FMEA is an economical and effective tool for finding potential failures early in the design-development phase, where it’s easier to take actions to overcome these issues, thereby enhancing reliability through design. FMEA is used to identify potential failure modes and their effect on the operation of the assets; it is also helpful when developing effective PM actions to mitigate the consequences of failure. It’s an important step in anticipating what might go wrong with assets. Although anticipating every failure mode may not be possible, the analysis team should formulate as extensive a list of potential failure modes as possible. Early and consistent use of FMEAs in the design process allows us to design out failures, in turn making assets more reliable and safe.

It has been observed that designers and engineers often use a safety factor as a way of making sure that the design will work and will protect the asset (product) and user upon failure. In the past, asset and systems designers have not done a good job of designing reliability and quality into the asset.

The use of a large safety factor does not necessarily translate into a reliable asset. In fact, it often leads to an overdesigned product with reliability problems.

FIG. 4 FMEA Steps

Types and Usage of FMEAs

There are several types of FMEAs, based on how and where they are used, e.g., in the design-development, operations, or maintenance phase of the assets. An FMEA should be performed whenever failures could create potential harm or injury to the user (operator), environmental challenges, or breakdown of the asset, in turn causing loss of production. FMEAs can be classified into the following categories:

a. Design: focuses on components and subsystems

b. Process: focuses on manufacturing and assembly processes

c. Maintenance: focuses on asset functions

d. Service: focuses on service functions

e. Software: focuses on software functions

Although the purpose, terminology, and other details can vary according to type (e.g., Process FMEA, Design FMEA), the basic methodology is similar for all.

FIG. 4 depicts the sequence in which a FMEA is performed. The typical sequence of steps answers the following set of questions:

1. What are the components and the functions they provide?

2. What can go wrong?

3. What are the effects?

4. How bad are the effects?

5. What are the causes?

6. How often can they fail?

7. How can this be prevented?

8. Can this be detected?

9. What can be done; what design, process, or procedural changes can be made?

FIG. 5 SAE/AIAG FMEA Guidelines

FIG. 6a Documenting FMEA with a Spreadsheet

FIG. 7 Failure Modes and Causes

FIG. 6b Documenting FMEA continued

Published Standards and Guidelines

There are a number of published guidelines and standards for the requirements and recommended reporting format of FMEAs. Some of the key published standards for this analysis include

• SAE J1739

• MIL-STD-1629A (out of print / cancelled)

The Automotive Industry Action Group (AIAG) guidelines and reference manual are very similar to SAE standard J1739. FIG. 5 shows the SAE/AIAG recommended format for performing and reporting FMEA. In addition, many industries and organizations have developed their own FMEA procedures and formats to meet the specific requirements of their products and processes. FIG. 6 illustrates another organization's approach to documenting FMEA using a simple spreadsheet. Part a shows the elements of Failure Modes Identification and Effects, whereas Part b shows Prevention Impact and Mitigation Assessment.

In general, FMEA requires the identification of the following basic information:

i. Items - components

ii. Functions

iii. Failure modes

iv. Effects of Failure

v. Causes of Failure

vi. Probability (Frequency) of Failure

vii. Severity of Effects

viii. Likelihood of Detection

ix. Current Mitigating Plan

x. Recommended Actions
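The ten items above map naturally onto one row of a worksheet. The following is a minimal sketch, assuming one record per failure mode and a plain CSV file as the spreadsheet format; the field names, the 1 to 10 rating scales, and the sample row are hypothetical and do not reproduce the SAE/AIAG form or the spreadsheet in FIG. 6.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class FmeaRow:
    # One field per item (i) through (x) in the list above; names are illustrative.
    item: str
    function: str
    failure_mode: str
    effects: str
    causes: str
    probability: int          # assumed 1-10 rating
    severity: int             # assumed 1-10 rating
    detection: int            # assumed 1-10 rating
    current_mitigation: str
    recommended_actions: str


def to_csv(rows):
    """Serialize FMEA rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FmeaRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical entry for illustration only.
rows = [
    FmeaRow(
        item="Pump bearing",
        function="Support shaft rotation",
        failure_mode="Bearing seizure",
        effects="Pump trips; loss of flow",
        causes="Inadequate lubrication",
        probability=4,
        severity=8,
        detection=6,
        current_mitigation="Monthly vibration check",
        recommended_actions="Add oil-level inspection to weekly PM",
    )
]

csv_text = to_csv(rows)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in any spreadsheet program.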

