Monday, April 23, 2012

FMEA: It's Not Just for Maintenance Anymore

I find it very interesting how many of the tools we reliability and maintenance engineers use in our jobs apply in a much broader sense. The obvious one is root cause analysis, or RCA as it is known. It can be applied by anyone to solve nearly any problem. In fact, in my workshops I have had operations, human resources, and even information technology folks learning about the processes and tools to reduce or eliminate failures. But have you ever thought about how you can use failure modes and effects analysis (FMEA) and its derivative, failure modes, effects, and criticality analysis (FMECA), beyond equipment maintenance plan development?
I find it can be used for every project that you are a part of. For example, say you have been asked to join a team that is implementing SAP, Maximo, or any other EAM. One of the first things you need after scoping and chartering is a risk plan, and in essence that is exactly what an FMEA creates. You can look at the goals of the project and then consider what might go wrong to diminish goal attainment. Then you can use the risk priority number (RPN) from an FMECA to sort the potential issues. Lastly, the team can design mitigation or elimination steps for those failure modes that meet a minimum risk threshold. So, for example, if one of the goals (or functions, in FMEA speak) of the project is to complete the implementation in nine months, you might consider that a site leadership change could affect the group’s ability to meet this goal. The change in leadership could mean a change in priorities, among other things. We would then determine its RPN by asking: What is the likelihood of the leadership changing? How badly would the leadership change affect on-time project completion? And what are the chances we could detect the leadership change prior to occurrence? Doing this for each potential failure point, or mode, gives us a ranked list. We could then build elements into our communication plan and other project plans to proactively address these potential failure modes. Then, if they do occur, everyone would be aware and prepared to address them.
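For those who like to see the arithmetic, here is a minimal sketch of that ranking step in Python. It assumes the common convention of scoring severity, occurrence, and detection on 1 to 10 scales and multiplying them to get the RPN. The scores, the threshold of 100, and every failure mode other than the leadership change are made up for illustration; your team would supply its own.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 = barely affects the goal, 10 = goal missed entirely
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost always seen coming, 10 = almost never

    @property
    def rpn(self) -> int:
        # Risk priority number: product of the three scores.
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes, threshold):
    """Return modes at or above the RPN threshold, highest risk first."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    return [m for m in ranked if m.rpn >= threshold]


# Hypothetical scores for a nine-month EAM implementation goal.
modes = [
    FailureMode("Site leadership change shifts priorities", 7, 4, 6),
    FailureMode("Key users unavailable for training", 5, 6, 3),
    FailureMode("Legacy data migration runs long", 8, 5, 4),
]

for mode in rank_failure_modes(modes, threshold=100):
    print(f"{mode.rpn:4d}  {mode.description}")
```

Only the modes that clear the threshold come back, so the team can spend its mitigation effort on the top of the list.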
What other RE and ME tools do you use in unorthodox ways?

Thursday, April 19, 2012

Are You Doing PM Right: 5 Things to Keep You in Flight

So earlier this week I had the opportunity to fly the helicopter shown to the right in for its 100-hour Preventive Maintenance Inspection, which led to many of the thoughts in this post. Private aviation, or General Aviation as it is known, is really at about the same place as many of us on the reliability maturity scale. You might say they are "not quite best practice". Now, for those of you who travel commercially, don’t fret: that world tends to operate on a different level with a different mindset. While I was talking with the mechanic who was servicing the helicopter, these points came to light.
1. They could really remove a lot of non-value-added steps from the PM. Many of these have no discernible link to actual failure modes. Some of them looked more like an exercise by the vendor in selling parts and satisfying lawyers. As an aside, someone should explain the basic tenets of RCM and the failure curves to lawyers so that they understand that new does not equal reliable. Statistically, the most dangerous plane a private pilot can fly is one that has just gone through an overhaul and is full of new bits.
2. If a skilled pilot or a skilled operator can perform a check, then teach them and let them do it as part of a pre-flight or walk-down inspection. This will lower maintenance cost and increase the organization's understanding of failures and its ability to identify them early, allowing for planning and scheduling of the repair work.
On the helicopter, the rear drive pulley was worn, and the damage was visible without tools or removal of guards. But all the preflight checklist says is “check the pulley”, and most pilots don’t know what the failure modes of the pulley are, so they only check for the most obvious and catastrophic. If a pilot had noted the defect on the pulley during an inspection, the parts could have been ordered in advance. The ship would have been out of operation for much less time and could have been generating additional revenue, making the business more profitable.
3. Simply put, use Predictive Maintenance (PdM) technologies where you can to replace invasive Preventive Maintenance (PM) tasks that require downtime and can induce failures.
If you can check the equipment while it is running, then you have less downtime and, many times, less maintenance cost. You also catch the failure much earlier, allowing the proper parts to be ordered and shipped in for your downtime windows.
4. PMs and checklists should be quantifiable and repeatable. Each step should give tolerances or operating ranges that allow each operator or mechanic to do it the same way each time. One step in the helicopter PM job plan asks the technician to use a screwdriver to pry against a valve and check for movement. This is quite ambiguous. The mechanic and I were both left with questions like:
How big of a screwdriver? How much force should we apply? How much movement is acceptable?
In the end the mechanic shared with me that he does not follow the PM for that step because he has no idea what is “good”. He has opted for a different method that he was taught by others.
5. Lastly, if you see the phrase “As needed” or “As necessary”, then you know you have a problem. Most of us do not know what these phrases mean; therefore, we need additional detail to be successful.
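To make point 4 concrete, here is a rough sketch of what a quantified PM step could look like if you captured it as data rather than free text. The step wording, the unit, and the limits below are invented for illustration and do not come from any maintenance manual; the idea is only that a step with a stated range gives every mechanic the same answer for the same reading.

```python
from dataclasses import dataclass


@dataclass
class PMStep:
    instruction: str
    unit: str
    minimum: float  # lowest acceptable reading
    maximum: float  # highest acceptable reading

    def evaluate(self, reading: float) -> str:
        """Return PASS when the reading is inside the stated range."""
        if self.minimum <= reading <= self.maximum:
            return "PASS"
        return "FAIL"


# Hypothetical step: the limits are placeholders, not real tolerances.
valve_check = PMStep(
    instruction="Apply the specified force to the valve and measure free play",
    unit="mm",
    minimum=0.0,
    maximum=0.5,
)

for reading in (0.3, 0.8):
    print(f"{reading} {valve_check.unit}: {valve_check.evaluate(reading)}")
```

Compare that with “pry against the valve and check for movement”: the sketch answers "how much movement is acceptable?" before the mechanic ever picks up a tool.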
In the end, whether you are hanging on for dear life under a set of spinning blades or running a packaging line in a manufacturing facility, you need to do great PMs to get great results. Hopefully these suggestions will help you look at things in a new way.

Wednesday, April 11, 2012

How Long Does It Take to Get from P to F on the Curve?

The P-F interval (shown to the left) is sometimes taken as a graphic that shows how equipment fails and how long that failure will take. This is sometimes called prognostics or remaining life. The truth of the matter is that it is really just a model of what one failure mode's failure probability looks like as it progresses down the path to functional failure. The probability of failure goes up as you move down the curve. We cannot plot out how long something will last once a defect is present. We simply do not have enough failure data on most common failure modes to accurately predict the time to point F, the point of functional failure. Our model is great for explaining key concepts but fails to predict key points with anything more than probability. If we did have enough failure data points for each failure mode, then the downtime incurred collecting the data more than likely would have put us out of business or out of a job. Reliability Engineering is about eliminating and mitigating failures and, by extension, reducing the occurrence of data points.
Now, there are exceptions where more is known about the individual failure modes. One example is some of the work the University of Tennessee is doing in the area of battery prognostics, but this is the exception, not the rule.
Let me show you why we use the P-F model to make explanations, not predictions. Let's take a bearing in a pump that was run with dirty oil. In that dirty oil was a particle of alumina or another hard substance. That particle is caught between the bearing element and the bearing race, and a small dent or pit is created as the element tries to crush the hard impurity. Now our bearing has a defect that is not that different from a pothole in the road: over time it will grow. When the elements roll over the pit, the sides begin to chip away. These chips add more impurities to the lube, the pit becomes larger, and the failure mode moves down the P-F curve in our model. Here is where chaos reigns. What if the load on the bearing increases and the elements press on the spalling area with more force? What if an element catches a piece of the spalled-out metal and skids, creating additional heat and damage? What if the load decreases or the product being pumped changes? What if the filter that was bypassing is changed and suddenly the oil is cleaner? The point is that all of these situations could change the time to point F. This is why we use condition based maintenance (CBM) techniques to identify defects and start our planned replacement process, rather than trying to measure more frequently and "get all the good out of the bearing."
The P-F curve is a great model for explaining why we do CBM and how it differs from traditional Preventive Maintenance. It is also very powerful for showing the link between the technologies and planning and scheduling. But it is not a timeline that can be used to "predict" when failure will take you down.
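If you want to see the effect of those "what ifs" in numbers, here is a toy simulation. It is only a sketch under loud assumptions: the defect is modeled as a size that grows by a small percentage every cycle, the operating condition is a random load factor that changes every 50 cycles, and functional failure is declared at an arbitrary size. None of these numbers represent real bearing physics; they are placeholders chosen to show how changing conditions spread out the time to point F.

```python
import random


def cycles_to_functional_failure(rng, start_size=1.0, failure_size=100.0,
                                 base_growth=0.02, regime_length=50):
    """Count cycles until a growing defect reaches the failure size.

    All parameters are made-up illustration values, not bearing data.
    """
    size = start_size
    cycles = 0
    load_factor = 1.0
    while size < failure_size:
        # Operating conditions (load, product, oil cleanliness) change
        # from time to time; here they are redrawn every regime_length cycles.
        if cycles % regime_length == 0:
            load_factor = rng.uniform(0.2, 3.0)
        size *= 1.0 + base_growth * load_factor
        cycles += 1
    return cycles


rng = random.Random(42)  # fixed seed so the run is repeatable
results = sorted(cycles_to_functional_failure(rng) for _ in range(1000))

print("shortest time to F:", results[0])
print("median time to F:  ", results[len(results) // 2])
print("longest time to F: ", results[-1])
```

The same starting defect reaches point F after very different numbers of cycles depending on the conditions it happens to see, which is the argument for starting the planned replacement when the defect is found rather than betting on a date.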
How do you use the P-F curve?

Monday, April 2, 2012

Have Your Corporate Metrics Become a Time Sponge?

The corporate quest to standardize metrics can become a time sponge, absorbing countless hours and distracting facilities from the real focus of bottom-line performance improvement. Today's blog is about discovering how to keep KPIs from soaking up all of your precious time and how to focus on driving changes in behavior, not just the numbers.
There are metric and Key Performance Indicator (KPI) standardization efforts happening all over the world. They are happening in plants and corporate offices as well as within organizations such as the Society for Maintenance and Reliability Professionals and the European Federation of National Maintenance Societies. Many of these groups and other authors have spent endless pages defining these metrics. While I believe their work is incredibly valuable for individual facilities to understand their current state, what I commonly see is that the documents they produce only lead to further internal corporate argument. Some companies have spent years and countless hours on the quest to define metrics like Overall Equipment Effectiveness (OEE), even with the standards that already exist in the market. It becomes a discussion of what time goes into the calculation, what time is out, what speed is the best demonstrated throughput, how we define quality product, and so on. At the end of the day, an apples-to-apples comparison of different sites can be tough if the metric cannot be pulled from a standard data source such as the ERP, EAM, or Profit and Loss (P&L) statements. The EAM data is also suspect unless many of the steps of the business processes have been standardized from site to site to provide consistent data.
I believe one solution is to get the definition close, skip the effort of arguing through all the details, and then compare sites on a delta basis. So instead of reporting just your percentage OEE or any other metric, report and focus on the change in the value from quarter to quarter or month to month. If we all agree that the goal is to improve, then let's compare the efforts to improve, not just the numbers. If one site is able to increase its OEE by a full point in one quarter, then do an RCA to understand why and learn from them. What behaviors have they changed? What processes have they refined?
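As a rough sketch of what delta-basis reporting could look like, the snippet below takes each site's own reported OEE by quarter and ranks the sites by the change since the previous quarter, in percentage points. The site names and OEE values are hypothetical, and each site is assumed to calculate OEE its own way, consistently from quarter to quarter.

```python
def oee_deltas(history):
    """Map each site to its latest quarter-over-quarter OEE change.

    history: {site_name: [oee_percent_by_quarter, ...]}, oldest first.
    Sites with fewer than two quarters of data are skipped.
    """
    return {
        site: round(values[-1] - values[-2], 2)
        for site, values in history.items()
        if len(values) >= 2
    }


def rank_by_improvement(history):
    """Return (site, delta) pairs, largest improvement first."""
    deltas = oee_deltas(history)
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers: each site uses its own OEE definition.
history = {
    "Site A": [61.0, 62.1],
    "Site B": [78.5, 78.3],
    "Site C": [55.2, 55.9],
}

for site, delta in rank_by_improvement(history):
    print(f"{site}: {delta:+.1f} points")
```

Notice that Site B reports the highest OEE but lands at the bottom of this list, while Site A, with a much lower absolute number, is the one worth an RCA on what went right.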
Focus on the Root Cause of Success and spend the time making behavioral change not debating corporate metrics. Wring out the time sponge and get back hours to devote to real results.