
Showing posts with label RCA. Show all posts

Monday, February 3, 2020

FMEA, and RCA, Got You Overwhelmed, and Paralyzed?


I get it, Failure Modes and Effects Analysis (FMEA) and Root Cause Analysis (RCA) are hard, or at the very least hard to execute consistently. If you look at the number of them that you need to do or should do, it can be quite overwhelming. Many sites get paralyzed by this and do very little or nothing with these very important tools. Just like many of the other elements we talk about that are needed for reliability improvement, it is critical to create a plan and set an expectation. The work must be broken into manageable, bite-size pieces.
For sites that are just beginning to complete RCAs and FMEAs, I often suggest the following goal: What if each reliability engineer or maintenance engineer committed to one FMEA and two RCAs per month? Would this be doable in your organization? The expectation is that these would be good quality FMEAs of 200 lines or more and RCAs that go beyond 5 whys or fish bones. They all need to be taken through implementation and follow-up, even if that portion occurs after the month they are scheduled in. I would suggest that FMEAs be selected based on "leveragability." What I mean by that is, select your target assets not just based on criticality or failure rate but also based on how many assets you can apply the FMEA against while developing equipment maintenance plans and systemic improvements. Think of it as: which assets give me the most bang for the buck?
If your site were to adopt this goal and execute on it then by the end of the year you would have 12 FMEAs implemented and 24 problems addressed with RCA for each reliability or maintenance engineer.  Imagine the change in performance you would experience with this volume of issues leaving the system. 
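To put numbers on that goal, here is a quick back-of-the-envelope sketch. The monthly targets come from the goal above; the three-engineer site is only an assumed example.

```python
# Monthly targets per engineer, taken from the goal described above.
FMEAS_PER_MONTH = 1
RCAS_PER_MONTH = 2
MONTHS_PER_YEAR = 12


def annual_output(engineers: int) -> dict:
    """Return the yearly FMEA and RCA counts for a given number of engineers."""
    return {
        "fmeas": engineers * FMEAS_PER_MONTH * MONTHS_PER_YEAR,
        "rcas": engineers * RCAS_PER_MONTH * MONTHS_PER_YEAR,
    }


# One engineer matches the totals in the text: 12 FMEAs and 24 RCAs.
print(annual_output(1))
# Assumed example: a site with three engineers.
print(annual_output(3))
```

Even a small team adds up quickly, which is the whole point of setting a modest monthly expectation.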
I get it, these tools can be overwhelming, but if you set a goal and hold each other accountable then your organization can be substantially better than most of your competition within the first year. 

Monday, February 4, 2019

PEMAC Conference Discussion About The Lies Reliability People Tell.


We talked about lies, nothing but lies… In our PEMAC Conference table sessions we talked about some of the maintenance and reliability models and tools that we use and some of the subtleties that often aren’t understood or taught correctly.

We discussed the six failure curves of RCM (Reliability Centered Maintenance) and how many people think of and explain them as relating to types or classes of assets, when in reality they relate to the failure modes of assets. This means that one asset could have many failure modes that relate to different curves, so suggesting that one of the curves represents an asset class is incorrect. This explanation then helps individuals understand that 68% of the failure modes are infant mortality according to the Nowlan and Heap study, but that does not mean 68% of assets fail in the infant phase. It is like an ice cream shop saying 68% of its flavors have chocolate in them: that does not mean that 68% of the ice cream the shop sells has chocolate in it. Some people like vanilla, and maybe a lot do. If so, you could sell a majority of vanilla even though most of your other flavors have chocolate in them. With this logic, you could have 68% of your failure modes be infant mortality, but when you look at the number of failures, the majority could be wear out because of the environment, equipment type, or process.
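The difference between "share of failure modes" and "share of failures" is easy to see with a few numbers. The sketch below uses a made-up failure history for a single asset; the mode names and counts are invented for illustration and are not from the Nowlan and Heap study.

```python
# Hypothetical failure history for one asset. The mode names, curve labels,
# and failure counts are invented for illustration, not taken from any study.
failure_modes = [
    {"mode": "seal installed incorrectly", "curve": "infant mortality", "failures": 2},
    {"mode": "loose wiring after rebuild", "curve": "infant mortality", "failures": 1},
    {"mode": "wrong lubricant at startup", "curve": "infant mortality", "failures": 1},
    {"mode": "impeller abrasive wear", "curve": "wear out", "failures": 16},
]


def share_of_modes(modes, curve):
    """Fraction of distinct failure modes that follow the given curve."""
    matching = [m for m in modes if m["curve"] == curve]
    return len(matching) / len(modes)


def share_of_failures(modes, curve):
    """Fraction of actual failure events that follow the given curve."""
    total = sum(m["failures"] for m in modes)
    matching = sum(m["failures"] for m in modes if m["curve"] == curve)
    return matching / total


print(f"Infant mortality share of modes:    {share_of_modes(failure_modes, 'infant mortality'):.0%}")
print(f"Infant mortality share of failures: {share_of_failures(failure_modes, 'infant mortality'):.0%}")
print(f"Wear out share of failures:         {share_of_failures(failure_modes, 'wear out'):.0%}")
```

In this invented case three of the four failure modes are infant mortality, yet most of the failure events come from the single wear out mode. That is the chocolate and vanilla situation in plant terms.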

The second lie we talked about was RCA (Root Cause Analysis). The table discussed that there is no such thing as a single root cause because every problem has multiple root causes. All problems need both actions that happen instantaneously and conditions that have existed over time. The example we used was fire. Fire does not have one cause; it has three: ignition, which is likely instantaneous, and fuel and oxygen, which are likely conditions that have existed over time. The key is that it takes all three causes, not just one root cause.
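One way to picture the fire example is as a simple "AND" relationship: the effect only occurs when the action and both conditions are present together. This is just a sketch of that logic, with a function name chosen for illustration.

```python
def fire_occurs(ignition: bool, fuel: bool, oxygen: bool) -> bool:
    """Fire needs the action (ignition) AND both conditions (fuel, oxygen)."""
    return ignition and fuel and oxygen


# All three causes present: the effect occurs.
print(fire_occurs(ignition=True, fuel=True, oxygen=True))

# Remove any one of the three causes and the effect is prevented.
print(fire_occurs(ignition=False, fuel=True, oxygen=True))
print(fire_occurs(ignition=True, fuel=False, oxygen=True))
print(fire_occurs(ignition=True, fuel=True, oxygen=False))
```

Seen this way, each of the three causes is also a place where a solution could be applied.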

We finished up by talking with the group about where and how they could use these learnings in their site to improve their reliability programs. 

Do you agree with our thoughts? Share your opinions below. 

Tuesday, February 16, 2016

Preconceived Notions Get in Your Way with RCA


Preconceived notions very commonly get in your way with Root Cause Analysis. Here is a perfect example. In this picture you can plainly see that the people on the left are taller, right? Look again... Maybe we did not have all of the facts at first. It seems that in the video there is more going on than we thought. Our notions of what a room is and how it is typically shaped do not hold true in this case. This irregularity led us to a poor understanding of the situation. Many times we go after issues with our personal opinion of the problem and the solution predetermined, and it blinds us to the truth of the matter. All the data must support the conclusion, not just the parts that you like. I have seen these preconceived notions derail investigations time after time, and it is one of the reasons that I suggest not using subject matter experts as RCA facilitators in this blog. Check out this post for 5 ways to help prevent jumping to conclusions. Then let us know what thoughts and ideas you have for avoiding this common problem by listing them in the comments section below.

Friday, September 18, 2015

You Need to Stop Jumping to Conclusions... Oh Me Too. Here are 5 Tips.

Why do we jump to conclusions?
  • Many times it requires no personal change...perfect.
  • "It is what I know from my past" 
  • It is a convenient answer that quickly explains away a problem. 
  • In most cases it is also easier than taking time to understand the details and facts.
What drives this behavior?
Jumping to conclusions seemingly avoids or prevents the appearance of ignorance, and can squelch the emotional reactions of others. "Don't get worked up, I already know what's wrong here and am working on it."
What can you do to stop yourself from jumping to conclusions?
  1. Take time to listen and understand.
  2. Collect and analyze the facts from multiple sources.
  3. Set a goal to try to prove yourself wrong instead of working so hard to prove to others that you are right, especially if you have a preconceived notion.
  4. True analysis of all of the facts is key. If they are indeed facts and not assumptions then, by definition, they all have to support your conclusion. If they do not, then you are missing some detail or your assumptions might be in the way of the truth.
  5. Use root cause analysis tools to help in viewing the facts and their relationships. Some of the Transitional RCA tools can be found here.
In the end, if you take the time required, practice the restraint needed not to jump to conclusions, and really understand the situation, you will not find yourself trying to defend a conclusion that you should never have leaped to in the first place.

Let's go solve the world's problems one fact at a time!

Thursday, August 20, 2015

Technical Subject Matter Expert Facilitating Your RCA? Stop It.

What is the number one thing that all technical subject matter experts possess besides large amounts of knowledge?
Opinions... followed by years of past history and possibly preconceived notions.
What makes them great as contributing RCA team members for problem solving can be kryptonite to them as facilitators. Their history and expertise lead to three possible problems when they lead or facilitate an investigation:
1. They take the team down a road to their favorite conclusion that may or may not be based on all the facts discovered.
2. They are so respected as SMEs that no one will challenge their thinking or ideas with new ones.
3. They can be blind to facts that don't fit their paradigms.
Some offer an opinion from the leader's chair, and then it is a race to prove that they are right. Don't get me wrong, there are extraordinary folks out there but, in general, most SMEs struggle with these problems when asked to facilitate an RCA. So why put them in this situation? Let a leader lead and an SME provide knowledge in a facilitated manner.
A good facilitator tries to create an environment where everyone on the team is providing ideas and input. They are not ignoring any of the facts even if they are inconvenient and they are working to drive the team to consider all the possibilities and solutions. Not every problem will be solved with an answer from the past, so facilitation becomes important to draw out these new, more effective solutions.
My experience says to separate the roles and have others from different parts of the organization facilitate. Then you can instantly watch your root cause analysis teams drive more failures from your site at a lower total cost of implementation.

Thursday, October 30, 2014

Software Got You Stuck? Do Not Let RCA Applications Hold Up Your Root Cause Efforts

Let us be clear: root cause analysis is a way of thinking, not a software application, yet there are sites that are spending thousands of dollars and hundreds of hours learning software instead of solving problems. Software is not inherently bad, but you don't need a sports car to learn to drive.
If you are getting started with your RCA efforts and software is part of your plan then be aware of these potential problems:
  • Software can limit team involvement: the facilitator is head down in the computer rather than up, truly facilitating responses from the team.
  • Software can slow the flow of ideas, especially if it is slow to create or move causes and links: the time required to create or edit means others are waiting and forgetting points and causes.
  • Software can complicate reporting: the simplest and most effective reporting for most sites is the A3 style RCA report, which will be the topic of the next blog post.
  • Computers create barriers between facilitators and the RCA team: if you have to enter the information directly into the software, then you should consider having both a facilitator and a recorder or scribe.
If you can problem solve without the software, or by capturing the information after the analysis, then here are a few tips and benefits of a software-free RCA.
Use sticky notes and a big blank wall or a white board. (3M's Super Sticky Post-it Notes work well.)
This allows good group involvement: team members can write and share, or share verbally while you capture the causes. This gives you two streams of causes in two different communication styles. If two people share the same or similar cause, then you stack the notes and both participants know they were heard. This can be key to good facilitation.
With sticky notes you can start by understanding the sequence of events and include any time-stamped data from PLCs, cameras, etc. Then, once you identify the key events, or forcing functions as they are sometimes known, you can transition to a fault tree or logic tree with ease. This will provide a better understanding of the systemic and latent causes of the key event.
With the very hands-on nature of the sticky note method you can move and reorganize causal chains quickly, and as you discover new causes they can be added with ease and without huge disruptions to the flow of ideas. When you are done, you simply take a quick picture of the analysis with the ubiquitous cell phone and paste it into your chosen report format. These can then be shared with others electronically.
The point of today's post is not that software is bad. It is simply that it is not required to get started and make substantial improvements in your facility. Many of our students have saved hundreds of thousands of dollars for their sites using nothing more than sticky notes and a sound root cause methodology. Once root cause becomes part of your business culture, you can capture, catalog, and share more effectively with the help of software, but don't let it hold you back from the start.

Monday, April 1, 2013

Are You a Reliability Champion or Reliability Killer?

As I travel from plant to plant and facility to facility I see both Reliability Champions and Reliability Killers.
First, let's talk about the "bad guys." These are the people in the organization who are not necessarily against getting better, but they do not "have the time" to make the improvements and certainly do not want to wait for others to make them either. These folks are quick to ask "When is the asset going to be back up?" without asking "What caused the problem?" and "How do we prevent it from happening again?" These are the ones who will spend the company's money to buy predictive and precision maintenance tools but, because of the learning curve, do not allow them to be used lest they slow down the repair process. Their goal is not to fix it right, just to fix it again and preferably as fast as possible. During one extended visit to a site I watched the company spend a substantial amount of money on alignment equipment and training, but as soon as the crafts tried to use it and took a few extra minutes, the supervisor yelled at them "to get that stuff off there and use a straight edge. We have to get this line up!" He was a Reliability Killer.
So what does a Reliability Champion look like? They are still concerned with getting the equipment fixed, but they allow that bit of extra time that can get us to root cause and improve the precision of the repair. They use your target business processes. They know that if we take a little downtime now, we prevent a lot of downtime in the future. Their definition of fixed is a permanent repair, not one requiring baling wire, duct tape, and a hammer. They like to fix it once and forget it (after it is recorded in the CMMS).
In the end, the reliability killer can be neutralized and in some cases even converted to a champion. It starts with communication and the creation of awareness. Then, as the killer sees the error of his ways and why changing could benefit him, he can accept more knowledge that can be used to correct the situation or at least stop the killing. It is not easy, but it is possible with a plan and the desire to communicate.
So which category do your leaders fit into? Are you the Champs or the Killers? What are your plans to change that?

Friday, February 8, 2013

Boeing Dreamliner, Root Causes, and Asset Infant Mortality

Boeing and their Dreamliners are taking a lot of heat over their faulty lithium ion batteries. They are dealing with the birthing pains of new technologies. They chose to move away from older style nickel cadmium batteries, which have a huge weight disadvantage and a lower power density than the lighter lithium units. There is some speculation that the excessive heat and fire on two jets was caused by a bad batch of batteries, but we will not know until the investigation and root cause analysis are complete. The key word here is speculation, and if you are a practitioner of root cause, speculation has only a small place in the process. You can start with it, but in order to make an actionable finding it must be backed up with facts. Whatever you do, don't forget to consider that much of the speculation will be wrong, and you have to keep an open problem solving mind. As of today Boeing has not been able to ascertain the root cause of the battery fire, according to reports released yesterday. They have begun to propose design changes that address the symptoms of the root cause in order to mitigate their losses in the short term. However, I believe in the long term the FAA will hold Boeing to a root cause solution.

Regardless of the outcome, I believe we can learn from the event as it has unfolded. We know that according to the original Nowlan and Heap study, which became the foundation of Reliability Centered Maintenance, 68 percent of the failure modes within our facilities or on our airplanes can be categorized as infant mortality. We also know that many of us, including myself in the early days, look to redesign as the first solution to many problems before truly understanding all of the root causes. If we put the two together, then we can likely expect that if we use solely a redesign strategy, the introduction of more failure modes and more infant mortality is a given. When we redesign to remove one known failure mode, we stand a very good chance of adding many more unknown failure modes if we are not using a tool like Failure Modes and Effects Analysis (FMEA). When we redesign we very seldom make things simpler, and therefore the more complex redesign comes with more points of failure. So please understand: redesign is a valid solution, but it should not be your go-to solution within most facilities. In the spirit of wrapping it up, the three takeaways from today's post are:
  • Use root cause and don't be afraid of short term solutions but always have a long term plan for defect elimination or mitigation at the systemic and latent levels based on return on investment and risk thresholds.
  • Redesign should not be your first choice in most failure investigations. Infant mortality and unknown failure modes abound in redesign decisions.
  • If you are a root cause practitioner, then you cannot give in to speculation and rumors without data. Let's leave those to the tabloids and non-fact-based news outlets, of which we have plenty.
To provide a historical perspective and a bit of data, the following quote was provided in an earnings conference call last week by United Continental Holdings Inc. Chief Executive Jeff Smisek. He defended the plane saying:

"History teaches us that all new aircraft types have issues, and the 787 is no different," Smisek said. "We continue to have confidence in the aircraft and in Boeing's ability to fix the issues, just as they have done on every other new aircraft model they've produced."

Designers and re-designers must expect infant mortality and use tools like RCM, FMEA, and RCA to mitigate these problems and the risks associated with them. What are you doing to reduce infant mortality in your plant?

Thursday, January 24, 2013

Five Whys and Wishbone: Program and Training Sponsorship

Over the years I have seen many sponsors give great supporting speeches, but I have also noted a few that failed miserably along the way. Let's learn from them. One in particular was the kickoff of a new root cause analysis facilitators program in a facility with a history of "flavor of the month" programs and lackluster performance. The "leader" did not attend the workshop because he was "fully up to speed" but came in at the end to say how important this new initiative was to the facility's future. He opened his speech with how he had used "five whys and wishbone" to solve problems in the past and that it could be done with no additional resources, so it was perfect for them. It went downhill from there. All I could think was, "I guess you could use wishbones, but how many turkeys would it take to get to the root causes?" For those of you who have seen my root cause methodology, you know we talk about five whys and fish bones as the simplest of tools that can be used to create a culture of root cause, but they certainly are not the tools of serious problem solvers like the reliability engineers this "sponsor" was speaking to. My point is, if you want your initiative to have even half a chance of success, then take some time to learn what it is about. Brush up on the details. Spend a few minutes with the instructor finding out what key questions you need to ask to really lend your support to the program and drive results. Also take a few minutes to understand what the practitioners will need.
Many leaders spend the time and show their support by attending the full class and learning with their team, but if this is not an option, many instructors offer executive half-day overviews of the course that can be provided prior to the start of the full workshop. In these sessions they cover all of the basic requirements, key concepts, questions that need to be asked to drive results, and key expectations the leader should have.
Happy Training 


Monday, November 12, 2012

Solving the Crime of Unreliability: Elements of a Process for RCA

I was recently watching a popular crime drama on TV and I noticed that the process they follow when solving a crime is very similar to the one I follow when solving a reliability problem in a facility.
The first thing the detectives do is identify the questions they would like answers to and then collect all the evidence they can to begin to answer those questions. Then they build a timeline to understand where things fit around the crime. Then they combine the evidence and the timeline and identify the motives and then finally the suspects. I have oversimplified all they do, but the core process steps are still there.
Solving the Crime of Unreliability in a facility starts with identification of the questions and the evidence to be collected. Then just like the investigator the next step is the collection of said evidence. I suggest folks use collection kits to help categorize and capture the data in its entirety. There is a blog here about the kits I use and what they contain.
Early on I skipped the element of time and did not complete the timeline or sequence of events prior to the use of other tools. Over time I learned this was a mistake in many cases and caused me to miss details. In two recent RCA investigations that were completed by others and later reviewed and refined by me, we discovered whole new causal chains and missed causes related to rebuild and maintenance execution that were not identified in the initial investigation. This was because the original RCA team focused on their preconceived notions and did not look at what happened just prior to the failure in the sequence of events. Completing the sequence of events opened their eyes. It will do two things for you: first, it identifies other potential causes, and second, it clarifies the causes that you have already identified. Just as the crime scene investigators then take the timelines and evidence and begin to look at the relationships, I do the same. I choose a tool like fault tree or logic tree, among others, to attack the evidence in the sequence and to draw the connections and the causal chains.
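If your evidence carries time stamps, building the sequence of events can be as simple as sorting everything into one list regardless of where it came from. The sketch below assumes a made-up pump failure; the sources, times, and event descriptions are all hypothetical.

```python
from datetime import datetime

# Hypothetical evidence items; the sources, times, and descriptions are invented.
evidence = [
    {"time": "2012-11-05 14:32", "source": "PLC", "event": "Motor tripped on overload"},
    {"time": "2012-11-05 09:10", "source": "work order", "event": "Pump rebuild completed"},
    {"time": "2012-11-05 14:20", "source": "operator interview", "event": "Unusual noise reported"},
    {"time": "2012-11-05 09:45", "source": "PLC", "event": "Pump restarted"},
]


def build_sequence_of_events(items):
    """Sort evidence chronologically so events just before the failure stand out."""
    return sorted(items, key=lambda e: datetime.strptime(e["time"], "%Y-%m-%d %H:%M"))


for item in build_sequence_of_events(evidence):
    print(f'{item["time"]}  [{item["source"]}]  {item["event"]}')
```

Once the items are in order, the rebuild earlier the same day sits right at the top of the list, which is exactly the kind of detail a team working from preconceived notions can miss.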
If you find the crime of unreliability has been committed in your facility, then you may want to make sure you have included each of these steps in your RCA process.

Thursday, September 20, 2012

Fish Bone Alone Doesn't Deliver Root Causes

Chances are, fish bone or Ishikawa diagrams alone will not get you to the root causes. I refer to the fish bone as a basic root cause tool. These diagrams serve a purpose and they do enable root cause investigations, but they do not necessarily have the power to be a stand-alone tool.
Let's talk about why and then what they are very good for.
The reason they are not able to give you all the root causes comes from the way they are used. In general they produce a categorized list of physical causes and human causes, but they do not identify causal chains or underlying systemic or latent causes. Many times they only feature the symptoms of these latent causes as the bones of the fish. If you choose to use only the fish bone, you have the potential to miss many issues and the connections that tie them together. I have reviewed many of these diagrams where the real root causes were just under the surface of the list but never brought to light during the investigation.
The real focus for me is the return on investment. If my root cause program is driven by only the fish bone, then I am implementing more solutions trying to address all of the identified bones of the fish at additional cost, and I am more than likely suffering reoccurring failures that drive lost production and additional analysis cost.
So what do I use the fish bone for? Personally, I like to use it in three ways. The first is as a lead-in to the FMEA when working with a group that may not have used that tool before. I take the bones of the fish and transition them over to a spreadsheet or FMEA tool. This can help me get folks engaged and help to begin the population of the next tool.
The second way that I use the fish bone is as a brainstorming tool, where we can identify many things that could have caused the failure. Then I assign the individual bones to groups and they go out and look for data to confirm or deny the existence of each cause. When the team gets back together to apply the next tool and pursue root causes and solutions, we have data to keep the process moving.
The third way that I use the tool is as a facilitation exercise for when I have a quiet or boisterous root cause team or team members. In this situation we use sticky notes and we all write and stick causes to the diagram. This gives the less expressive folks an alternative in writing instead of speaking, and it allows the expressive folks to see the sticky note and know they were heard. This can get a group to develop many more possible causes, which they can then verify, investigate, and eliminate with data, producing a better lead-in to the next, more thorough tool in your root cause process.
So to wrap it up, fish bone analysis is a tool that has its place in the root cause process, but if you want the lowest total cost of solution you will need to couple it with other tools from your root cause tool box. To read more on root cause, check out these posts from the past.

What are your thoughts?

Tuesday, August 21, 2012

Preliminary RCA Reports Promote Failures

Numerous government agencies, including the FAA, create preliminary failure reports. However, like many other things our government does, preliminary failure reports are a bad idea for businesses, as they drive many bad behaviors, including reoccurring failures.
If you are currently providing these "educated guesses" to your organization prior to the actual root cause analysis report, then the best advice I can give is to stop as soon as possible. I know it is not quite that easy, but when these preliminary reports are issued, the manager and those affected are quick to review them even though they are merely guesses at the causes and may not be based on real data. The second issue is that once your complete analysis report is released, with all of the contributing factors, full causal chains, solutions, and the data to support them, no one is really interested enough to take the time to read it. After all, they have already seen the prelim. It is like trying to read a book after someone spoils the ending: it is just not worth the effort. If you issue the prelim, that means that major decisions that affect the business are made based on quick reviews of partial sets of facts and straight conjecture. This conjecture leads to repetitive failure. In fact, many of the preliminary reports are nothing more than a regurgitation of past experiences, not the facts related to this situation and real proven data.
I know many of you feel the pressure to get a report out within 24 hours, but in most cases this is just not a reasonable request, and if you cannot immediately change that expectation you should at least be working toward it. If you need a fast turnaround on your RCA investigations, then streamline your RCA process to limit the number of analyses required per month, request more resources for each investigation, and/or speed up the report creation process. One way to speed up the RCA process is to use our A3 RCA reporting process, where you put all of the information from the RCA on an A3 size sheet of paper that is populated as you move through the analysis process, not at the end. I don't have the space here to show the full document, but for an example send me an email at shon@reliabilitynow.net.
The point is, don't put out your reactive guesses at a cause if you can streamline your RCA process and put out a complete analysis in less than a week. Just like you can't get to excellence in reliability overnight, you also cannot complete an effective RCA overnight.

Monday, June 18, 2012

Democratic Root Cause Analysis?

So we had primary elections last week and it reminded me of an RCA process that a client just shared with me. I call it Democratic RCA, among a few other things. Let me describe it for you. The first step was to draw a "cause and effect" or fish bone diagram for the problem. Then they got a group together and added causes to each category. So far so good. This is a great way to get the team going, encourage participation, and cleanse the palate, if you will, before you dive into the more advanced RCA tools. This is where things went awry. Once they finished the cause list for each branch or bone of the cause and effect diagram, they voted for the most likely causes, and those became the root causes for their analysis. While democracy and the power of the vote work in government, they have no place in root cause investigations. RCA is about facts, not conjecture. It is not about opinion unless you can show the data to back it up. This root cause popularity contest will lead to limited, if any, out-of-the-box thinking and will be steered by the most convincing group member or members. Causes will become a reinvention of something that has occurred in the past and is comfortable to the team. This is not root cause analysis.
If the team wanted to use their fish bone, they could have skipped the vote and divided the group up into sub-teams that could attack each branch of the fish bone. Each sub-team could go out and look for evidence on each of the possible causes, and then the team could choose one of the other RCA tools, like fault tree, to perform a more in-depth analysis. Now the analysis would be based on the facts that the sub-teams brought back from the field, not a voting process for the team's favorite solution. In the end, this data-based process will allow for more open-minded solution development and better fact-based results for the effort of the team.
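To show how a vote and the evidence can point in different directions, here is a small sketch. The candidate causes, vote counts, and evidence findings are all invented for illustration.

```python
# Hypothetical candidate causes from a fish bone session. The votes and the
# evidence findings are invented to show how the two methods can disagree.
candidates = [
    {"cause": "operator error", "votes": 6, "evidence_found": False},
    {"cause": "worn coupling", "votes": 1, "evidence_found": True},
    {"cause": "incorrect torque spec in procedure", "votes": 0, "evidence_found": True},
]


def select_by_vote(items):
    """The 'democratic' approach: keep whichever cause got the most votes."""
    top = max(item["votes"] for item in items)
    return [item["cause"] for item in items if item["votes"] == top]


def select_by_evidence(items):
    """The data-based approach: keep only causes confirmed from the field."""
    return [item["cause"] for item in items if item["evidence_found"]]


print("Chosen by vote:    ", select_by_vote(candidates))
print("Chosen by evidence:", select_by_evidence(candidates))
```

In this made-up case the popular answer has no data behind it, and the two causes that do would never have been carried forward by the vote.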

Monday, June 11, 2012

Three Silly Secrets About Root Cause Analysis

Here are three not so closely held secrets that can help clarify a few of the points around Root Cause Analysis (RCA). I hope you find them enjoyable and they help you think about a few things differently.
There is no such thing as one root cause...
Why!?!
  • The connected causes, or causal chains as we call them, continue forever in an infinite continuum moving out from the event. Let's say you were doing a root cause on your office mate tripping over your computer cord, and you took it to an extreme: if your office mate had not been born, then the cord could not have been tripped over by him today. Now we know you cannot control your office mate's parents' breeding patterns in order to prevent the fall, but the point here is that you can investigate and build well beyond your ability to effectively mitigate or eliminate the causes and you would still not be at the one root cause. The key here is to try to find the paths that lead to the systemic and latent causes and then take the time to evaluate solutions at that level for the best return on investment. This may mean that the solutions are out of your span of control but within the span of control of your managers or others, allowing you to be successful.
  • Rob Base and DJ E-Z Rock once said "It takes two to make a thing go right." I would add "or wrong." With RCA those two things are existing conditions and instantaneous actions. There are always at least two causes per one effect, therefore it should be called Root Causes Analysis. Using the example from above, there was a cord, which was the condition, and a walking office mate, which was the action. When you consider both, you can look for the hidden causes and lowest cost solutions as you drill down into the problem.
Cause and effect are the same thing...
How!?!
  • The cause of the event, or reason for the RCA, is the effect of its own cause.
  • For example, from above: the falling office mate was the effect of the action of walking across the floor. The action of walking across the floor was the effect of you requesting help moving your monitor. In this example they are all effects, but rewritten they can all be causes. You requesting help with your monitor was the cause of your office mate walking across the floor, and your office mate walking across the floor was the cause of him tripping over the cord. The point is don't get hung up on either word. Use whichever you like best to build the causal chains, but to avoid confusion don't mix them together.
Root Cause Analysis reports create no return on investment...
What!?!
  • You do not get a return on investment from a report. The return on investment comes from the implementation and verification of the solutions. Why is this important? If I have to choose between spending all my time creating a beautiful report or spending my time ensuring implementation of the solutions, then I choose the latter.
Thoughts?

Friday, May 18, 2012

Transitional Root Cause Analysis


When I discuss RCA I use a method called Transitional Root Cause Analysis or TRCA for short.
It is made up of 10 tools that can be explained and understood in a very short period of time.
In the next few minutes I will demonstrate both the simplicity and rules for use for 3 of the 10 and explain why we consider them transitional in nature.
In this blog we will use what I categorize as the tree methods. These are three tools that build out into a tree root like structure. So let's take a look: the first one is called the "5 why"  method. It is very common in industry today and is a very basic root cause tool. It is created simply by asking the question why multiple times to create one causal chain. It creates a simple main tap root to build off of.
Now, if we take the 5 why diagram and branch it out by adding more elements at each level then we get a better representation of all of the causes that come together to create an effect. This transition of the 5 why is known as a fault tree. This method allows us to easily see all factors that led to a failure, but sometimes we need to show a bit more information to make the graphic more meaningful. 
If for instance an effect can only occur when all of its causes exist at the same place and in the same moment in time then we use the word "and" at that junction of the roots. If we eliminate any one of those causes then we can eliminate the effect. On the flip side, if either of the causes could precipitate the effect on its own then the word "or" would be placed at the junction. This would be read as this or that could cause the effect above. This allows you to see that both possibilities must be addressed to prevent the effect.
These three tree methods transition from one to the next by adding one simple new feature as needed during the root cause process. First, we take a "5 why" and branch it to get the fault tree, then we add in the "and" and "or" to get the logic tree. Three powerful tools that build on each other to get you to the lowest cost solution that mitigates the risk. The other Transitional tools work very much in the same way and allow us to use the right tool for the job instead of trying to use a screwdriver as a hammer.
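To make the "and" and "or" junctions concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not part of the TRCA toolset: the `Node` class, the `occurs` function, and the cause names in the cord example are all hypothetical. A node with one child behaves like a link in a "5 why" chain, a node with several children is a fault tree branch, and the gate turns it into a logic tree.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One cause in the tree (hypothetical structure for illustration).

    gate is "and" or "or" and is only used when the node has children.
    """
    name: str
    gate: str = "and"
    children: List["Node"] = field(default_factory=list)


def occurs(node: Node, present: Set[str]) -> bool:
    """Return True if this effect occurs, given the leaf causes that are present."""
    if not node.children:
        # A leaf is a base cause: it either exists or it does not.
        return node.name in present
    results = [occurs(child, present) for child in node.children]
    if node.gate == "and":
        return all(results)   # every cause must exist at the same time
    if node.gate == "or":
        return any(results)   # any one cause is enough
    raise ValueError(f"unknown gate {node.gate!r} on {node.name!r}")


# Made-up logic tree for the cord example: a condition "and" an action.
cord = Node("cord across walkway", "or", [
    Node("no outlet near desk"),
    Node("temporary cord left out"),
])
walking = Node("office mate walking across floor")
trip = Node("office mate trips", "and", [cord, walking])

# Condition and action both present: the effect occurs.
print(occurs(trip, {"no outlet near desk", "office mate walking across floor"}))  # True
# Remove the condition (one side of the "and"): the effect is eliminated.
print(occurs(trip, {"office mate walking across floor"}))  # False
# Fixing only one side of the "or" is not enough.
print(occurs(trip, {"temporary cord left out", "office mate walking across floor"}))  # True
```

The last call shows why both possibilities under an "or" have to be addressed: removing the outlet problem alone still leaves a path to the trip.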
If you would like to learn more about the Transitional method to RCA please send me an email and I will help you along.

Monday, April 2, 2012

Have Your Corporate Metrics Become a Time Sponge?

The corporate quest to standardize metrics can become a time sponge, absorbing countless hours and distracting facilities from the real focus of bottom line performance improvement. Today's blog is about discovering how to keep KPIs from soaking up all of your precious time and focusing on driving changes in behavior, not just the numbers.
There are metrics and Key Performance Indicator (KPI) standardization efforts happening all over the world. They are happening in plants and corporate offices as well as within organizations such as the Society for Maintenance and Reliability Professionals and the European Federation of National Maintenance Societies. Many of these groups and other authors have spent endless pages defining these metrics. While I believe their work is incredibly valuable for individual facilities to understand their current state, what I commonly see is that the documents they produce only lead to further internal corporate argument. Some companies have spent years and countless hours on the quest to define metrics like Overall Equipment Effectiveness (OEE), even with the standards that already exist in the market. It becomes a discussion of what time goes into the calculation, what time is out, what speed is the best demonstrated throughput, how do we define quality product and so on. At the end of the day an apples-to-apples comparison of different sites can be tough if the metric cannot be pulled from a standard data source such as the ERP, EAM, or Profit and Loss (P&L) statements. The EAM data is also suspect unless many of the steps of the business processes have been standardized from site to site to provide consistent data.
I believe one solution is to get the definition close, skip the argument over every detail, and then compare sites on a delta basis. So instead of reporting just your percentage OEE or any other metric, report and focus on the change in the value from quarter to quarter or month to month. If we all agree that the goal is to improve then let's compare the efforts to improve, not just the numbers. If one site is able to increase their OEE by a full point in one quarter then do an RCA to understand why and learn from them. What behaviors have they changed? What processes have they refined?
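The delta-based comparison can be sketched in a few lines of Python. This is only an illustration: the site names and OEE percentages are invented, and `period_deltas` and `rank_by_latest_change` are hypothetical helper names, not part of any standard.

```python
from typing import Dict, List, Tuple


def period_deltas(values: List[float]) -> List[float]:
    """Change in a metric from each period to the next, in percentage points."""
    return [round(later - earlier, 2) for earlier, later in zip(values, values[1:])]


def rank_by_latest_change(history: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank sites by their most recent period-over-period change, best first."""
    latest = {
        site: period_deltas(values)[-1]
        for site, values in history.items()
        if len(values) >= 2  # a delta needs at least two periods
    }
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


# Invented quarterly OEE percentages, each calculated with that site's own definition.
oee_by_quarter = {
    "Site A": [62.0, 62.4, 63.5],
    "Site B": [81.0, 81.1, 80.9],
}

print(period_deltas(oee_by_quarter["Site A"]))   # [0.4, 1.1]
print(rank_by_latest_change(oee_by_quarter))     # [('Site A', 1.1), ('Site B', -0.2)]
```

In this made-up data Site B reports the higher OEE, but Site A gained more than a full point in the last quarter, so Site A is the one to study for the root cause of success.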
Focus on the Root Cause of Success and spend the time making behavioral change not debating corporate metrics. Wring out the time sponge and get back hours to devote to real results.

Wednesday, March 14, 2012

Root Cause Analysis: Getting the Maximum Return On Investment from RCA

Many Root Cause Analysis (RCA) practitioners tend to rest a bit when they finish the report. They think “Wow that’s a good looking report, now off to solve another problem” and they miss the fact that they have not solved the first problem yet. There is no return on investment for doing RCA reports… It is the implementation of the most cost effective solution that delivers the maximum money to the bottom line. With that said below is a list of four ways to ensure that you get the most bang for your RCA buck.
Follow multiple causal chains to provide multiple solutions for review
Look for all of the ways it could be prevented or mitigated, including both the actions and conditions that led to the event. If you are just using 5 Whys you may miss one or the other.
Evaluate all the solutions 
Sometimes we tend to recommend our “pet solutions” meaning that we use the solutions that have worked in the past without understanding the cost and benefits and comparing that to other possible solutions. We need to look at all of the ways to lower risk to an acceptable level at an acceptable cost. The logic below can help with that goal.
Be Logical: Remember “and” and “or”
When looking at the fault tree, use logic symbols to see where you only need to tackle one item to prevent or mitigate the failure (the “and” case) and where you need to attack multiple causes (the “or” branches). If either this “or” this can cause the failure then you must address both to mitigate the risk, whereas with the “and,” eliminating the lowest cost cause in the branch will prevent the recurrence of the failure.
Remember latent and systemic level roots have the best savings potential but can also have the highest cost of implementation because in many cases you must change culture and human behavior. Because of this you should understand problems to the latent level and address problems at the cost effective level.
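The “and” and “or” cost logic above can be sketched as a small Python function. Treat it as a rough illustration under stated assumptions: the `Cause` class, the `cheapest_fix` function, the tripping-over-a-cord example, and every dollar figure are invented, each cause is assumed to appear only once in the tree, and only the lowest-level causes carry a fix cost.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cause:
    """A cause in a logic tree (hypothetical structure for illustration)."""
    name: str
    gate: str = "and"                 # "and" or "or"; ignored for leaves
    fix_cost: float = float("inf")    # cost to eliminate a leaf cause
    children: List["Cause"] = field(default_factory=list)


def cheapest_fix(node: Cause) -> Tuple[float, List[str]]:
    """Return (cost, causes to eliminate) for the cheapest way to prevent node."""
    if not node.children:
        return node.fix_cost, [node.name]
    options = [cheapest_fix(child) for child in node.children]
    if node.gate == "and":
        # Removing any single cause breaks the "and", so pick the cheapest one.
        return min(options, key=lambda option: option[0])
    if node.gate == "or":
        # Every cause can trigger the effect alone, so all must be addressed.
        total = sum(cost for cost, _ in options)
        names = [name for _, causes in options for name in causes]
        return total, names
    raise ValueError(f"unknown gate {node.gate!r} on {node.name!r}")


# Invented example: someone trips over a computer cord.
tree = Cause("person trips over cord", "and", children=[
    Cause("cord across walkway", "or", children=[
        Cause("no outlet near desk", fix_cost=150.0),
        Cause("temporary cord left out", fix_cost=20.0),
    ]),
    Cause("people must cross this area", fix_cost=5000.0),
])

cost, actions = cheapest_fix(tree)
print(cost)     # 170.0
print(actions)  # ['no outlet near desk', 'temporary cord left out']
```

Here the cheapest path is to remove the condition, which means paying for both items under the “or,” rather than rerouting foot traffic. A real evaluation would also weigh residual risk and life cycle cost, which this sketch leaves out.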

In the end RCA is not about developing pretty reports it is about developing a business case for the best solution from a cost and risk standpoint then ensuring that those recommendations both get implemented and get the results we expected at the lowest cost. Or to say it another way:
Have fun… understand causes… evaluate solutions… implement… verify… succeed!

Tuesday, February 14, 2012

Five Why "Nots": 5 Reasons Why Reliability Engineers Should Use More Than 5 Whys for Root Cause Analysis

First of all let's talk about what Five Whys is before we mention what it is not. It is a problem solving tool used in many facilities and is commonly associated with Lean, Six Sigma and Kaizen implementations. The technique was originally developed by Sakichi Toyoda and was used within the Toyota Motor Corporation during the evolution of its manufacturing methodologies. The method is quite simple really and involves asking "why" multiple times until the individual believes that they have reached the process cause of the problem. This sometimes (but not always) means you will stop at the fifth "why" hence the name.
While I like five whys as an "on the floor" problem solving tool I cringe when people call it root cause for five reasons:
1. There really is no such thing as "root cause." A more correct phrase would be "root causes" because there are almost always conditions and actions that come together to manifest the failure. 5 Whys as it is most often used only addresses one branch of the causal chain, either the condition or the action.
2. By only following one causal chain you do not get the opportunity to analyze all the contributing causes and look for the lowest cost solution that eliminates or mitigates the risk to an acceptable level.
3. The results from 5 Whys are not repeatable. Different people using 5 Whys come up with different causes for the same problem. It is all based on their existing knowledge, and experience.
4. Five Whys many times is just used to prove what the practitioner already thought instead of looking at other possibilities. 5 Whys investigators are plagued by an inability to go beyond their current knowledge which leads to them not identifying causes that they do not already know.
5. My experience tells me that Engineers that use five whys solely do more investigations due to problem reoccurrence. They are sometimes celebrated for their ability to do 27 RCA investigations per month but when you look at the list 18 are problems that they should have taken the time to do a true analysis the first time and they would not be focusing on them again. Using the 5 why method causes a tendency for investigators to stop at symptoms rather than going on to lower-level root causes. This leads to reoccurrence of the problem.
So as I mentioned earlier, Five Whys is a great tool for shop floor troubleshooting, but if you are going to focus on root causes then you may want to consider the more advanced root cause analysis tools. I would suggest that root cause tools like logic tree or fault tree will identify more of the contributing factors and decrease the chances of failure reoccurrence. To see how to make a simple transition from 5 Whys to logic tree check out this single point lesson here.
Good luck solving your problems and I hope you have fun doing so!

Tuesday, February 7, 2012

Just Because You Can Does Not Mean You Should: Six Ways to Evaluate Change

Six ways to evaluate a change in advance and increase the potential for success.
These six questions came out of the solution implementation phase of my Root Cause Analysis (RCA) process. However they would apply nicely to just about any change you are trying to make.  The point here is to think through the key success factors, ensure that this change is not just for change’s sake, and ensure the change provides the returns we expect over time.  The questions are broken into three basic areas, solution evaluation, risk of the change, and the communication process for the change. If we hit these three areas we can eliminate ninety percent of the issues that cripple most change efforts.
1.    What problem does the change solve or prevent? In other words what is the true goal of the change? This is what we will test for with our metrics and KPIs so try to keep it measurable.
2.    What is the change worth? What will this provide as a return for our efforts? This becomes the financial driver for the project and should meet your internal standards for Return On Investment (ROI). The important point here is to evaluate the total cost of the change or life cycle cost as we call it and the total savings over that life cycle. Many companies are very good at capturing the savings but not the life cycle cost. This leads to poorly chosen solutions.
3.    Can the problem reoccur? This question helps us understand how effective our chosen solution is versus other options. We are identifying the residual risk and evaluating the options for change.
4.    What problems could it create? This question covers the unintended results of the change. It is a continuation of our risk mitigation planning step where we are making sure we have identified as many possible new failure modes as possible and evaluated their impact.
5.    Who is affected? We know that people are always involved in failures at some level so we must identify who needs to understand the changes to prevent these failures.
6.    What should they know? What training, communication, coaching, or documentation do they need? This combined with the previous step becomes the communication strategy for the change.
If you evaluate each of the changes you plan to make with these criteria you are left with a simple goal statement with metrics for improvement, a risk analysis, and a communication plan. Together these will help you proactively mitigate issues with your change and truly evaluate the change before the expense of implementation has been incurred. I hope these six make your life easier and your changes continue to be a success.
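For question 2, the difference between capturing only the savings and capturing the full life cycle cost can be shown with a short Python sketch. It is a simplified, undiscounted calculation under my own assumptions; the `life_cycle_roi` function, the two options, and all of the figures are invented for illustration, and your internal ROI standard may call for a different formula.

```python
def life_cycle_roi(upfront_cost: float, annual_cost: float,
                   annual_savings: float, years: int) -> float:
    """Simple undiscounted ROI over the life of a change.

    ROI = (total savings - total life cycle cost) / total life cycle cost
    """
    total_cost = upfront_cost + annual_cost * years
    if total_cost <= 0:
        raise ValueError("total life cycle cost must be positive")
    total_savings = annual_savings * years
    return (total_savings - total_cost) / total_cost


# Two invented options that deliver the same savings over five years.
# Option A costs more up front but is cheap to sustain.
option_a = life_cycle_roi(upfront_cost=10_000, annual_cost=500,
                          annual_savings=6_000, years=5)
# Option B looks cheap up front but carries a high recurring cost.
option_b = life_cycle_roi(upfront_cost=2_000, annual_cost=4_000,
                          annual_savings=6_000, years=5)

print(round(option_a, 2))  # 1.4
print(round(option_b, 2))  # 0.36
```

Judged on upfront cost and savings alone, Option B looks like the better choice. Once the recurring cost is included, Option A returns far more over the life cycle.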

Wednesday, February 1, 2012

What I Learned from a Pharmaceutical Facility in Pennsylvania

This is a continuation of the "What I learned" series. 

Pharmaceutical manufacturing is a very interesting business. For many years excess maintenance cost and maintenance downtime have not been areas of prominent concern for many of these manufacturers. But, with the need for additional capacity, the expiration of various blockbuster patents, and the rise of contract manufacturing, their world is changing.
This facility was looking to be ahead of the curve and ensure they were ready for the changing environment. They wanted to be in charge of their destiny and not have it dictated to them.
The site showed me that there are many ways to succeed with reliability implementation and it can be called many different things.
This site used the energy behind the implementation of a new Enterprise Resource Planning (ERP) system, and more specifically the Enterprise Asset Management (EAM) module, as the framework for their reliability improvement efforts. As they implemented modules and sections they improved the processes and data that were required to reach a new level of maintenance efficiency. This sounds like a logical choice, but more often than not sites get overwhelmed by this level of change all at once. Then the EAM implementation becomes simply a reimplementation of old practices, bad data, and ugly procedures in a shiny new software tool. It is a bit like adding a jet engine to a 1975 AMC Pacer. It sounds cool but it just does not end that well.
This site challenged the past standards and decided the old way was not good enough. They built a clear plan with managed subprojects to prevent overloading the organization.
They realized that the regulatory and product safety administrations were not there to put you out of business. Instead, they are there to ensure you do what you said you were going to do. This site removed non-value-added Preventive Maintenance (PM) tasks and refined procedures that once drove unreliability. They did all of this without increasing risk for the company or the customer. In fact, as they improved reliability with tools like Reliability Centered Maintenance (RCM) and Root Cause Analysis (RCA) in the later part of their ERP implementation, they were able to provide more stable equipment and processes, which leads to a higher quality product.
Thanks to my time with this organization I know that when building a strategy for maintenance improvement it is very important to take a long hard look at the current state of the site, the culture, and the history of major initiatives, and then prescribe an improvement plan that is specific to that site and its culture. As a consultant or a leader this means you cannot always do it the same way you did your last project. You have to be constantly pushing the boundaries, matching the needs and using the culture to help them reach the goals they have set. This may change the title of your maintenance improvement initiative to a lean or Six Sigma project, or it may be a part of an EAM or ERP implementation, but in the end you use the site’s successful tools and momentum to drive the results that you need.