ReliabilityNow.com: February 2013

Thursday, February 28, 2013

The 4's and the 8's: Estimating Work Orders Wrong

This week I have been teaching asset management principles, and as part of that we discussed the work execution process, including the planning of maintenance work and the schedule that follows. I shared with the group that, in my experience, many if not most work orders are slotted into half shift and whole shift time estimates for the schedule. A gentleman from a very large facility chimed in and added some data to the mix. He said that within his facility, 88 percent of the work orders currently in his backlog were estimated at 8 hours, 6 percent were estimated at 4 hours, and the remainder were at random intervals. In other words, 94 percent of the backlog was estimated at exactly a half shift or a whole shift.
If you can only achieve that level of accuracy then are the jobs really planned?
The planning process should include breaking the job down into smaller steps, where estimation is easier and more accurate. If you are not breaking the job down into its component tasks, your inherently inaccurate job plans will lead to inaccurate schedules, and those in turn lead to wasted craft time, excessive downtime, and broken delivery promises to operations. Job plans in general do not have to be planned to the minute, but estimating only in 4 hour and 8 hour blocks is definitely not precise enough.
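As a minimal sketch of what task level estimating could look like, the Python below sums hypothetical task estimates for one job, rounding each up to the next quarter hour, and then checks what share of a backlog sits at exactly 4 or 8 hours. The task names, the hours, the quarter hour rounding rule, and the function names are all assumptions made for illustration, not data from the facility mentioned above.

```python
import math


def round_up_to_quarter_hour(hours):
    """Round an estimate up to the next quarter hour (assumed planning granularity)."""
    return math.ceil(hours * 4) / 4


def estimate_job(task_hours):
    """Job estimate = sum of the individually rounded task estimates."""
    return sum(round_up_to_quarter_hour(h) for h in task_hours)


def shift_block_share(estimates, shift_blocks=(4.0, 8.0)):
    """Fraction of work orders estimated at exactly a half or whole shift."""
    if not estimates:
        return 0.0
    hits = sum(1 for e in estimates if e in shift_blocks)
    return hits / len(estimates)


# Hypothetical breakdown of a pump seal replacement into component tasks (hours).
seal_job_tasks = {
    "lock out and isolate pump": 0.5,
    "remove guard and coupling": 0.7,
    "replace mechanical seal": 1.6,
    "align and reassemble": 1.0,
    "return to service": 0.4,
}
print(estimate_job(seal_job_tasks.values()))  # 4.5

# Hypothetical 100 order backlog shaped like the figures quoted in the post.
backlog = [8.0] * 88 + [4.0] * 6 + [1.5, 2.0, 2.5, 3.0, 6.0, 12.0]
print(f"{shift_block_share(backlog):.0%}")  # 94%
```

A high shift block share does not prove the estimates are wrong, but it is a cheap signal that jobs may be getting a default number rather than a task by task plan.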
What level of planning accuracy would you suggest works for your facility?
To the hour? To the quarter hour?

Tuesday, February 19, 2013

Losing the War on Downtime While Winning the Battles

I recently had the opportunity to write an article for UPTIME Magazine that focused on distinguishing between short term wins and long term sustainability when it comes to saving and making money with your maintenance team. In the article, I share in jest four short term solutions that might generate short term cash, but only at the cost of long term sustainability and future reliability efforts. I then share some thoughts on longer term solutions that provide greater reliability and, more importantly, are sustainable.
One of the short term battles that I see many sites "winning" is the kaizen event, TPM event, continuous improvement event, or whatever it may be called at the site. These can be characterized as one or two week events where we clean up an area, "solve" a few problems in a very reactive manner, and then decree that the area is now reliable. Battle won!
The issue is threefold:
First, in many of these events the symptoms are addressed, not the root causes, because the team in its haste does not use detailed enough RCA tools and processes.
Second, everything about the event is done in a reactive manner. Maintenance craftsmen are kept at the ready to address every issue the team finds as quickly as possible. This demonstrates little planning and reactive scheduling, and it rewards your quick fix firefighters with the work that they love.
Third, the team moves on and the sustainability is left to the area. Some of the people who are expected to maintain this new "reliable" state have no idea why the changes were made or what is in it for them.
So what happens in the end? When you return to the kaizen area just a few short months later, it is in its previous condition or even worse than it was before. The war is lost because, even though the kaizen battles have been won and celebrated, the equipment condition and reliability are the same or worse.
Short term focused events can work within your long term war on downtime, but they must be constructed to demonstrate the best practices, not reinforce the worst. They must use effective problem solving tools and generate work that can be planned, scheduled, and then executed with precision and at the lowest cost. Lastly, there need to be communication strategies to share the intent, goals, and sustainability needs, along with metrics to reinforce the new behaviors. If you strive to demonstrate the best practices in all you do, then you can win both the battle and the war on unreliability and downtime.

Wednesday, February 13, 2013

Nine "Ps" for Profitable Plant Reliability Improvement Efforts

So if you could sum up the common areas of focus during reliability improvement efforts what would they be?

The thought behind this blog post was this: if someone asks us what we are doing, or what all is involved in a reliability improvement effort, how can we give them the scope in a concise and memorable way? This could be used early on, in the discovery or kickoff phase, to outline the effort without overwhelming anyone.

I have listed nine things that I would focus on and they all start with P for ease of remembering.

Predictive Maintenance

Using technology to understand equipment condition in a noninvasive way before the functional failure occurs

Example: Vibration, Ultrasonic, Infrared

Preventive Maintenance

Traditional and more invasive time based inspections which should be failure mode based

Example: Visual Inspection of gears in a gear box

Precision Maintenance

Performing maintenance craft work to best in class standards to prevent infant mortality

Example: Alignment, Balancing, Bolt Torquing

Process

Clear series of steps to identify, prioritize, plan, schedule, execute, and capture history with who is responsible for each

Example: Work Identification Process, Root Cause Process, Work Completion Process

Problem Solving

The process for understanding the real causes of problems and using business case thinking to select solutions that reduce or eliminate the chance of recurrence

Example: Root Cause Analysis, Fault Tree, Sequence of Events

Prioritizing of Work

The process of determining sequences of work as well as level of effort using tools like equipment criticality and work order type

Example: RIME index

Parts

These are the processes required to have the right part at the right time in the right condition at the right place for the right cost

Example: Cycle counting process, proper storage procedures, kitting process

Planned Execution

This piece is about taking the identified work, building the work instructions and work package, collecting the required parts, and then scheduling the execution.

Example: Job Packages, Schedules, Gantt Charts

People

This is where we deal with the change management and leadership portion, which is required in order to truly make a change to the organization

Example: Situational Leadership, Communication Planning, Risk Identification, Training

So here are my nine "Ps" that you can share as early communication to get your organization on board with your reliability efforts and develop the Profit we all want.

What would you add?
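To make the "Prioritizing of Work" item above a little more concrete, here is a minimal sketch of a RIME style ranking. RIME (Ranking Index for Maintenance Expenditures) is commonly described as an equipment criticality ranking multiplied by a work class ranking, with higher products worked first. The 1 to 10 scales, the sample work orders, and the class and function names below are assumptions for illustration only; your site would define its own rankings.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    title: str
    asset_criticality: int  # assumed scale: 1 (low) to 10 (most critical asset)
    work_class: int         # assumed scale: 1 (cosmetic) to 10 (safety or breakdown)

    @property
    def rime(self):
        """RIME style index: asset criticality multiplied by work class."""
        return self.asset_criticality * self.work_class


def prioritize(work_orders):
    """Return work orders sorted from highest to lowest index."""
    return sorted(work_orders, key=lambda wo: wo.rime, reverse=True)


# Hypothetical backlog entries.
backlog = [
    WorkOrder("Repaint handrail", asset_criticality=3, work_class=2),
    WorkOrder("Replace leaking pump seal", asset_criticality=9, work_class=8),
    WorkOrder("Rebuild spare gearbox", asset_criticality=6, work_class=5),
]

for wo in prioritize(backlog):
    print(f"{wo.rime:>3}  {wo.title}")
```

The value of an index like this is less in the arithmetic and more in forcing the site to agree on criticality and work class rankings ahead of time, so that sequencing is not decided by whoever shouts loudest.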

Friday, February 8, 2013

Boeing Dreamliner, Root Causes, and Asset Infant Mortality

Boeing and their Dreamliners are taking a lot of heat over their faulty lithium ion batteries. They are dealing with the birthing pains of new technologies. They chose to move away from older style nickel cadmium batteries, which have a huge weight disadvantage and a lower power density than the lighter lithium units. There is some speculation that the excessive heat and fire on two jets was caused by a bad batch of batteries, but we will not know until the investigation and root cause analysis are complete. The key word here is speculation, and if you are a practitioner of root cause, speculation has only a small place in the process. You can start with it, but in order to make an actionable finding it must be backed up with facts. Whatever you do, don't forget to consider that much of the speculation will be wrong, and you have to keep an open problem solving mind. As of today, Boeing has not been able to ascertain the root cause of the battery fire, according to reports released yesterday. They have begun to propose design changes that address the symptoms of the root cause in order to mitigate their losses in the short term. However, I believe that in the long term the FAA will hold Boeing to a root cause solution.

Regardless of the outcome, I believe we can learn from the event as it has unfolded. We know that, according to the original Nowlan and Heap study, which became the foundation of Reliability Centered Maintenance, 68 percent of the failure modes within our facilities or on our airplanes can be categorized as infant mortality. We also know that many of us, including myself in the early days, look to redesign as the first solution to many problems before truly understanding all of the root causes. If we put the two together, then we can likely expect that if we rely solely on a redesign strategy, the introduction of more failure modes and more infant mortality is a given. When we redesign to remove one known failure mode, we stand a very good chance of adding many more unknown failure modes if we are not using a tool like Failure Modes and Effects Analysis, or FMEA. When we redesign we very seldom make things simpler, and therefore the more complex redesign comes with more points of failure. So please understand that redesign is a valid solution, but it should not be your go-to solution within most facilities. In the spirit of wrapping it up, the three takeaways from today's post are:

First, use root cause and don't be afraid of short term solutions, but always have a long term plan for defect elimination or mitigation at the systemic and latent levels, based on return on investment and risk thresholds.
Second, redesign should not be your first choice in most failure investigations. Infant mortality and unknown failure modes abound in redesign decisions.
Third, if you are a root cause practitioner, then you cannot give in to speculation and rumors without data. Let's leave those to the tabloids and the non fact based news outlets, of which we have plenty.

To provide a historical perspective and a bit of data, the following quote was provided in an earnings conference call last week by United Continental Holdings Inc. Chief Executive Jeff Smisek. He defended the plane saying:

"History teaches us that all new aircraft types have issues, and the 787 is no different," Smisek said. "We continue to have confidence in the aircraft and in Boeing's ability to fix the issues, just as they have done on every other new aircraft model they've produced."

Designers and redesigners must expect infant mortality and use tools like RCM, FMEA, and RCA to mitigate these problems and the risks associated with them. What are you doing to reduce infant mortality in your plant?
