Tuesday, October 11, 2016

Is The MTTR Metric Killing Your Reliability?

Metrics, or Key Performance Indicators (KPIs), are a force for good when they are used at the right time and with the right complementary or supporting elements. But when they are used at the wrong point in a facility's maturity, they can have unintended consequences or, even worse, drive the wrong behaviors.
An example that I continue to see causing more issues than it solves is the use of the KPI known as Mean Time To Repair (MTTR). When this metric is used alone, in immature organizations, or without an understanding of its unintended consequences, it can drive your organization in the opposite direction of world-class performance. If your organization is immature from a reliability culture standpoint and you choose MTTR as your focus, then you set yourself up to become very reactive by being very quick to respond to failures. The facts are:
  • Reactive response is at least 5 times more expensive than planned and scheduled work.
  • Operations will beat on you to get faster and faster at responding so that you lower MTTR. 
  • Rushed repairs are less reliable.
  • Reactive response requires more expensive spare parts stock.
  • Repetitive failures and repairs increase the chance of introducing infant mortality failures.
  • You will find yourself with highly skilled maintenance technicians just standing on the manufacturing floor doing nothing while waiting for a failure to occur.
  • Pressure to make the repair as quickly as possible can lead to taking elevated safety risks, either intentionally or unintentionally.
Many of the sites that choose MTTR as a primary metric early in their reliability journey create a brigade of firefighters at the ready with crash carts and mounds of spare parts. What we really want is to prevent the failures from happening in the first place, or at least reduce their frequency. For that we might use other metrics like Mean Time Between Failures (MTBF) early on and then deploy MTTR after we increase the reliability of our systems. This later use of MTTR will allow us to address the issues we can't prevent and to understand any training gaps and other issues that might be affecting repair times.
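To make the two metrics concrete, here is a minimal sketch of how they could be computed from a failure log. The `Failure` record, the 720-hour period, and the log entries are all made-up assumptions for illustration, and MTBF is taken here as operating (up) time divided by the number of failures, which is one common convention rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """One failure event; times are hours since the observation period began."""
    failed_at: float    # when the asset went down
    restored_at: float  # when it was back in service


def mttr(failures):
    """Mean Time To Repair: average downtime per failure."""
    return sum(f.restored_at - f.failed_at for f in failures) / len(failures)


def mtbf(failures, period_hours):
    """Mean Time Between Failures: operating (up) time divided by failure count."""
    downtime = sum(f.restored_at - f.failed_at for f in failures)
    return (period_hours - downtime) / len(failures)


# Hypothetical log: 4 failures over a 720-hour month (made-up numbers).
log = [Failure(100, 102), Failure(250, 253), Failure(400, 401), Failure(600, 602)]
print(f"MTTR = {mttr(log):.1f} h")
print(f"MTBF = {mtbf(log, 720):.1f} h")
```

Tracked together, a falling MTTR with a flat or falling MTBF suggests the site is getting faster at firefighting rather than more reliable.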
Picture it this way: if MTTR is all you have, then your organization will create tools like crash carts and quick response teams instead of using tools like Root Cause Analysis (RCA) to understand and eliminate the recurring problem. From the real world, I have seen bearing quick-change carts developed to speed up the repair of recurring failures where, if they had just tensioned the belts properly, the failures would have been eliminated. Five really fast 2-hour bearing replacements are still much worse than bearings that don't need repairs at all. This site needed to understand better, not respond quicker.
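The arithmetic behind the bearing example can be sketched in a few lines. The five 2-hour replacements come from the story above; the middle scenario (one failure with a slower 4-hour repair) is a hypothetical added only to show that a worse MTTR can still mean less total downtime.

```python
def total_downtime(failure_count, hours_per_repair):
    """Total downtime in hours: number of failures times the average repair time."""
    return failure_count * hours_per_repair


# Scenario from the post: five fast 2-hour bearing replacements.
fast_response = total_downtime(5, 2.0)

# Hypothetical: fewer failures, but a slower 4-hour repair (MTTR doubles).
fewer_failures = total_downtime(1, 4.0)

# Root cause eliminated (belts tensioned properly): no failures at all.
root_cause_fixed = total_downtime(0, 2.0)

print(f"fast response:    {fast_response} h down")
print(f"fewer failures:   {fewer_failures} h down")
print(f"root cause fixed: {root_cause_fixed} h down")
```

MTTR alone would rank the first scenario best, even though it loses the most production time.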
Are your metrics driving the wrong behaviors? Are you using them at the right time?
Tell us what metrics have not worked for you and why in the comments below. 




1 comment:

  1. I would add that MTTR doesn't really address the variability of even a normal job. Variability is part of the normal process; sometimes one bolt will simply take longer to remove. Ignoring that variability by reducing repair time to a single average will lead to the unintended consequences you have listed. Instead, use the lognormal distribution to describe the distribution of repair times - much more informative.
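As a rough sketch of the lognormal suggestion, the parameters can be estimated from the logarithms of observed repair times using only the Python standard library. The repair times below are made-up data, and using the sample standard deviation of the logs is an assumption; a dedicated statistics package may fit the distribution differently.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_lognormal(repair_hours):
    """Estimate lognormal parameters (mu, sigma) from positive repair times."""
    logs = [math.log(t) for t in repair_hours]
    return mean(logs), stdev(logs)


def lognormal_quantile(mu, sigma, p):
    """Repair time that a fraction p of jobs are expected to finish within."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Hypothetical repair times in hours for the same job (made-up data).
times = [1.5, 1.8, 2.0, 2.1, 2.4, 3.0, 6.5]
mu, sigma = fit_lognormal(times)

print(f"arithmetic mean (MTTR): {mean(times):.2f} h")
print(f"median: {lognormal_quantile(mu, sigma, 0.5):.2f} h")
print(f"90th percentile: {lognormal_quantile(mu, sigma, 0.9):.2f} h")
```

Reporting the median and an upper percentile alongside the mean shows how much of the MTTR figure comes from the occasional long job rather than from the typical repair.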
