Monday, December 31, 2012

What Are Your Three Reliability Resolutions?

Photo by Shon Isenhour
Reliability Resolution Wrecker
Yes, it is that time of the year... resolution season. Everyone is planning to lose weight, get in shape, learn a new instrument, lower their stress levels, spend more time with the family, play more and harder, or make some other life change. If you live your life in an operations, maintenance, and reliability role, the one thing you need is more time to make these resolutions possible. The problem is that many of us cannot go to the gym or spend more time with the family when our reactive facility requires our constant attention. I have lived in that reactive world. The plant controlled my life instead of the leadership controlling the plant. I think the plant gremlins went into overdrive around the holidays. The facility tended to break down on holidays and important weekends, and each breakdown required a trip into the plant to sort it out. So, thinking back, what resolutions would I have most liked to make the plant's "Reliability Resolutions"? Here they are:
1. We were good outage planners and executors, but we needed to rely less on preventive maintenance outage work and more on the predictive tools. We needed to stop the invasive inspections that require downtime and instead incorporate and trust the predictive tools to provide surgical precision within our outages. This would have shortened outages and decreased infant mortality.
2. We needed a resolution to make precision maintenance our new standard. We needed to provide the training and the reinforcement structure to drive proper alignment, balancing, torquing and lubrication.
3. We had an ageing workforce, so our third resolution should have been one aimed at capturing their knowledge within the EAM or CMMS. There was a lot of equipment history within the heads of the "mature" crafts. If we could gather this data and add it to the precision maintenance mentality resolution, we could make a step change in the infant mortality rates and recurring failures.
So these are my three "Reliability Resolutions." With these resolutions we could have begun the process of retaking the facility and controlling it instead of letting the equipment be the boss. If we kept our focus then in no time we would have had the free time to focus on our personal resolutions again. What would your three "Reliability Resolutions" be? If you don't mind sharing then please post them below.
Happy New Year!

Thursday, December 27, 2012

Wrapping Up The Year In Reliability With A Challenge

2012 has been a very interesting year. We have seen increased national focus on manufacturing and some additional focus on engineering as well. We have seen some industries begin to gain strength and production demand while others are not so patiently waiting for their own increase.
From an asset management standpoint, efforts continue to increase as the new ISO 55000 standard gets closer to reality. You should see more and more discussion on this topic in the new year. I will feature some here at
In general, reliability improvement efforts seem to be stable and possibly showing the beginnings of increased focus for next year. Many industries are hesitant to dive in until the beginning of the year when "things will be clearer." If you are trying to get your organization to move forward you may hear many excuses for the delay, but they tend to fall into two categories. Below I have listed the two and a few simple thoughts on how best to help each group pull the trigger on reliability.
Group 1. We are too busy to take on reliability at the moment...
Goal you create: More uptime through reliability
For these folks, I would suggest helping them see all of the production losses they are facing and showing them how to support the reliability improvement efforts without affecting current production demands. One way to do this is through a reduction in non-value-added activities and the associated downtime. One example would be having a small group eliminate non-value-adding preventive maintenance tasks. This effort will increase uptime of the equipment by reducing the downtime required for PM activities, and it will free up resources to work on other improvement tasks by reducing the maintenance workload. The site might use the freed-up labor to work with planners to improve job plans and incorporate precision maintenance. This would eliminate more maintenance downtime and build on the improvements. By the end of the year these sites should be making more product and more profit because of their reliability improvements.
Challenge: This group may try to delay the improvement until tomorrow because they are just too busy making the production for today. You have to break that cycle by creating a burning platform and a reason for change.
Group 2. We are not busy enough to take on reliability at the moment...
Goal you create: Lower manufacturing cost through reliability
Interestingly, the tactics here are similar to the previous group in that we want to reduce non-value-added tasks, but the goal is completely different. This time we don't always want more production, since the demand is low, but we want to produce what we need at the lowest cost possible. This can be the perfect time to attack long-standing reliability issues because the work can be done at a much lower cost by planning it out and scheduling it using the downtime provided by lower demand. Since production downtime cost is the biggest part of the spend for many improvements, a slowdown makes them easier to do. Now we need to free up money and labor to pay for the reliability improvements, and one way that we can do that is through refining our equipment maintenance plans to remove unnecessary parts expense and labor cost. The quickest way can be a combination of RCM, PM review, and RCA done to prevent failures, stop recurrence, and remove any item that does not address a known and likely failure mode. By reducing activity for activity's sake you reduce material cost and contractor cost while also reducing infant mortality.
Challenge: In this group the fear of failure is high due to economic pressures and this fear can stop the organization from being able to make the changes that lead to the improvements. Master the elements of change management to ease the transition.
Now, there are many other ways to go about addressing both of these situations, but they are situation specific and a function of the site's maturity. You may even find that both of these situations exist in your site. Feel free to reach out and we can discuss your specific details and craft a plan directed at your needs.

I wish you a great year of reliability in 2013 and I challenge you to attack unreliability in 2013 where it lives, in the minds of your organization. Create Desire, Educate, Apply, Sustain!

Wednesday, December 19, 2012

Retropost: Do you have an "Elf on a Shelf" in Your Facility?

 Do you have an "Elf on a Shelf" in Your Facility?: The Elf on a Shelf has become a great behavior modification tool in many households. In ours, all we have to do is point at the little guy a...

Monday, December 17, 2012

3 Reasons Why Operations Does Not Support Maintenance and Reliability

One of the most common things maintenance folks say is that operations does not support maintenance and reliability. It sounds like this:
"If it weren't for operations we would be reliable"
"They think their job is to break it and then it is our job to fix it... and fast"
"They will not let us have the equipment for PM and they wonder why it breaks down"
Want to know what operations has to say? Here are three quotes and a set of underlying causes:
1. Operations says: "Every time I give them the equipment for PM downtime it runs worse on start up than it did at shutdown"
Reason: Maintenance overly relies on invasive PMs that induce infant mortality instead of using Condition Based Maintenance (CBM), which is performed while the equipment is running and does not induce failures. If maintenance could get fifteen percent of their labor dedicated to CBM and then fifteen percent dedicated to PM, then the balance would be better and the number of maintenance-induced failures would drop. One example is eliminating the PM where you open a gearbox up for a gear inspection and transitioning it to CBM inspections, which can accomplish the same task without the potential for reassembly errors or foreign contamination in the gearbox.
2. Operations says: "Maintenance never sticks to the schedule. They ask for 8 hours and take 16."
Reason: Maintenance creates a schedule with work that is only marginally planned and then overruns the outage timeline because the estimates are completely inaccurate. If you don't take the time to break the job down into estimable tasks or steps, then it becomes very hard to produce an accurate schedule. The way this sounds in the field is "Oh that job, it will take about a half shift for two guys."
3. Operations says: "This equipment runs better if I can just keep it running and keep maintenance out of it."
Reason: Maintenance does not practice precision maintenance; therefore, as work is completed, defects are induced and equipment fails prematurely. One example is the installation of a bearing on a shaft with a hammer and chisel instead of a bearing heater and impact fitting tools.

The point here is that if we as maintenance and reliability professionals start by addressing our issues it becomes much easier to ask operations to address theirs. Or to say it another way:
If you wanna make the world a better place
take a look at yourself, and then make a change

Wednesday, December 12, 2012

Looking Back for the Keys to Sustainability

From time to time I get to go back and visit plants and facilities that were clients of mine from years ago. This week happens to be one of those weeks. I find it so exciting to go back and see what they have held on to, what they have improved, and where they have slipped back and why. Interestingly, I see different things in each site. Some sites progress with continued focus on reliability, others move into other initiatives like lean or six sigma and build on their reliability results for further success, and a third group loses focus and slides back toward the old norm and the reactive philosophies of the past. These sites have many reasons why they slip back, including leadership changes, union issues, retirements, and sudden changes in the marketplace.
The question for today’s blog is how do we lock it in? How do we sustain the cultural change that has been or is being completed?
The plants that have the most success have done these four things.
First, they have moved past a champion model. Reliability for them is more than just one person’s vision. They do not have one single leader for the initiative that either charismatically leads the pack or forces compliance within the site. They have many folks who see the benefits of reliability and evangelize it consistently. Nearly all implementations start as a champion model with a key leader, but the sustainable ones work this down into the organization and develop an army of like-minded drivers that demonstrate that reliability is the new way we do business.
Second, they communicated broadly as they implemented with a plan, consistent activity, and increasing site involvement. These groups start by going through a risk analysis of the transformational change. They look to understand what might go wrong and what they can do to mitigate that risk. They communicate at every stage of the process, and the message and media change based on the risk and the needs of the impacted. They focus on situational leadership and provide the individuals with what they need to help them progress through the change and then sustain. In short, they plan their communication and they work the plan, increasing involvement and pushing toward the tipping point and increased likelihood of sustainability.
Third, they have a clear goal, vision, and business case and they reject things that go against that vision.  If the vision changes then that is fine but they work hard to communicate the changes. They evaluate all initiatives site wide and include only the ones that support the goals. Of those they select they sequence them in a way that supports and allows them to build. This means they may do a part of one initiative with the required resources and then complete a section of another selected improvement strategy knowing that they will build on each other and help improve overall site performance. If they take this approach then they build a system unique to them that can be developed without overloading the resources and provides for stable continuous improvement.
Fourth, they use metrics effectively. They don’t focus on every metric all the time; they focus on the metrics that drive the behaviors that they need to change at that moment. Once the behavior is changed and becomes the new norm, they move their focus to other metrics and other behaviors.
There are other factors that come into play but these are some key success factors that seem to be present for success and sustainability.

Monday, December 10, 2012

Monday, December 3, 2012

Data Collectors or Dust Collectors: Three Ways to Knock the Dust Off

We all love shiny new reliability toys, right?
Well, I am spending the week in the middle of a conference center full of beautiful shiny new condition monitoring and predictive maintenance tools in Bahrain at the Maintcon conference. I know that the IMC-2012 International Maintenance Conference is kicking off as you read this in Florida as well. So, literally thousands of people will be in a sea of shiny new equipment this week. They will check out the newest in touch screen technology, they will ogle the cool wireless features, and they may even lust after the robustness of these new super industrialized models.
But let me share a dirty secret: many of them are not looking at data collectors. They are actually looking at future dust collectors.
While I was sitting around with many of the leaders of the companies that provide these incredible technologies, they all lamented with sad eyes about the cool tools that have never made it into regular use in some facilities. One vendor said it makes him want to cry when he sees "his equipment setting on the top shelf with a layer of dust on it." They are proud of their work and they want to see it used to improve facility reliability. Below are some of the actual excuses mumbled by maintenance folks in facilities globally for why there is dust on their technology:
  • "No time to get trained on the unit" Training issue
  • "No one does anything when I identify a fault" Communication, process, and training issue
  • "No budget left for training class" Training issue
  • "Too busy fighting fires" Process issue
  • "The other maintenance guys don't trust the technology" Training issue
  • "Operations will not give me the down time" Process issue
  • "The old way was easier" Training issue
  • "I can't get the time to mount the sensors" Process and priority issue
  • "I didn't order it. I wanted the other one." Hmmmm... Attitude issue? OK, how about a change management issue
So what can we do? Here are three thoughts that might help you avoid your own set of dust collectors.
First, don't buy technology if you don't have the basic business processes in place. Good technology with bad processes just makes bad things happen faster. Think about how you are going to use the technology. How will you plan and then schedule the resolutions of the findings? If you cannot plan and schedule the repairs then you are merely refining your run-to-failure strategy and continuing to make repairs at 5 times the cost.
Second, package the training into the purchase price of the unit and issue one purchase order. Don't try to "buy it in bits." Get it all at once or wait until you can. You will want to capitalize on the fact that it is new to get folks to engage in the training and apply the technology in the field. Of course you should train your users to operate it, but don't forget to create awareness training for those that will be affected, like your craftsmen who will make the repairs based on the technology and your planners who will use the findings to plan the work.
Third, plan the time to set it up right. Develop your equipment list, routes, and alarms right from the start. Going back after the fact is gruesome and frustrating and working in a bad database just makes an analyst mad.
What things have you done at your site to prevent your data collectors from becoming dust collectors?
Please feel free to share below.

Have a great week

Friday, November 30, 2012

Ignoring Alarms and Feeling the Fire

I find myself in the Middle East this week, and while having lunch today the fire alarm went off in the hotel. The interesting thing is that no one moved or showed much concern for that matter. One guy in the distance did say, "If I do not see fire, smell fire, or feel fire then I am not moving." I have seen similar behavior all over the world. This all got me to thinking: how often do our assets give us alarms that we choose not to hear? How many of us ignore the early points on the P-F curve only to wait for the sight of fire or the smell of smoke? Once we smell the smoke or see the fire it is likely too late to execute a properly planned job, and because we may not be able to get the repair done quickly enough, the chances of a catastrophic failure are very high. When equipment suffers a catastrophic failure you are subject to higher cost from every angle: more parts cost due to additional damaged components, more shipping cost due to expedited spare parts, more labor cost due to overtime required to complete the unplanned repair, more contractor cost for specialty repairs and support, and finally more operations losses that can cause the total bill for the repair to skyrocket. The most recent statistic that I have found states that the emergent repair will cost you nearly 5 times as much as the planned and properly executed job. So if we can look for the early alarms and heed their warning we can lower maintenance cost and increase production substantially. We can use operational data like amp draw and differential pressure, or we can use condition-based tools like vibration, ultrasound, and IR to give us these early indications. The key is to correctly set up the alarms so that we can trust the technology and not just sit and listen to it sound.
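To make the idea of a trusted early alarm a little more concrete, here is a minimal Python sketch of a two-level alarm on a trended reading. Everything in it is an assumption for illustration: the function name, the amp draw readings, and the alert and danger limits are hypothetical, and real limits would come from your own equipment baselines.

```python
def classify_reading(value, alert_limit, danger_limit):
    """Classify one reading as 'normal', 'alert', or 'danger'.

    The alert limit is the early warning (plan and schedule the repair);
    the danger limit is the smell-of-smoke stage.
    """
    if alert_limit >= danger_limit:
        raise ValueError("alert_limit must be below danger_limit")
    if value >= danger_limit:
        return "danger"
    if value >= alert_limit:
        return "alert"
    return "normal"


# Hypothetical weekly motor amp draw readings and limits (illustrative only).
readings = [41.0, 42.5, 46.2, 51.8]
ALERT_AMPS = 45.0
DANGER_AMPS = 50.0

states = [classify_reading(r, ALERT_AMPS, DANGER_AMPS) for r in readings]

# Index of the first reading that should have triggered planning work.
first_warning = next((i for i, s in enumerate(states) if s != "normal"), None)

for week, (amps, state) in enumerate(zip(readings, states), start=1):
    print(f"week {week}: {amps} A -> {state}")
```

The point of the two levels is that the alert leaves time to plan and schedule the job, while the danger level means you are already close to the fire.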
What early signals might you or your operators be ignoring that if caught could reduce repair cost and ancillary damage and increase production reliability?

Listen for the alarms do not wait for the fire
because at that point the situation is dire.

Saturday, November 24, 2012

Thanksgiving Leftovers and Lower Cost Reliability

What if after the Thanksgiving feast you just stuck the leftover food in your china cabinet until the next time you wanted it? This sounds crazy, yet many times after plant outages when kitted work is completed the leftovers are crammed into a tool box or under a desk in an office. A tool box is no more a suitable storage place for leftover parts than a china cabinet is for leftover turkey and potato salad. When you are done with the turkey it goes into the refrigerator, and when you have leftover parts they should be returned to the maintenance storeroom where their quality can be verified and they can be preserved in a controlled environment.
The questions are:
Do you have a "parts return to stores" process?
Does it make returns easy to prevent "squirreling" of parts on the shop floor?
Does it ensure quality parts are kept in stores and damaged parts are disposed of?
In the end we want to prevent spoilage and ensure that the part and the turkey provide maximum health and reliability and not infant mortality. This way of thinking will get us a lower cost level of reliability now.

Monday, November 19, 2012

5 Reasons to be Thankful for Reliability

Many of you are advocating reliability improvement within your facility on a daily basis. If you want to increase your buy-in from the general population, then one of the best things you can do is tell them what's in it for them.
Here are 5 reasons to be thankful for reliability if you have it and why you might want it if you don't:
5. Better relationships with co-workers
If you are not fighting about lack of access to the equipment, quality problems, over-budget maintenance cost, or not meeting customer deadlines, then your coworkers act less like evil trolls and more like people you might want to play golf with on the weekend.
4. Less stress about job stability
When a facility runs reliably then it can be a star within the division as opposed to the whipping boy in last place on the cost curve waiting for the other shoe to drop. Reliable plants are not immune to closures and cutbacks but statistically they outlive their peers.
3. Afternoons with your kids
When you enjoy employment in a reliable facility there is very little forced overtime. This is thanks to great job plans, low reactive behavior and schedules that people adhere to. This lack of forced overtime means you can make more time to spend with your kids playing ball, reading and preparing for the future.
2. Holidays with family
When you have reliable equipment you will have fewer calls to deal with plant issues during your family time because the plant is performing, the outages are executed efficiently, and the staff is trained to deal with the minor issues that arise from time to time. This all means you can eat turkey with your parents, family, and friends without pesky plant problems calling you away.
1. Safety at work
A reliable plant is a safe plant. Both Ron Moore in the book "Making Common Sense Common Practice" and a study that is currently being done at the University of Tennessee have demonstrated the statistics to prove reliability improves safety. But if we just think about it logically for a minute, if you can plan and schedule the repair with the right tools and the right parts and the right people with the right expertise, then surely it must be safer than an emergency repair done on forced overtime after a long exhausting day with tools that don't work quite right and parts that don't fit quite right, all while an operations manager continuously rushes you to get it running one way or another.

Tell us why you are thankful for your reliability in the comments below.

Monday, November 12, 2012

Solving the Crime of Unreliability: Elements of a Process for RCA

I was recently watching a popular crime drama on TV and I noticed that they follow a very similar process when solving a crime that I do when solving a reliability problem in a facility.
The first thing the detectives do is identify questions they have that they would like answers to and then collect all the evidence they can to begin to answer those questions. Then they build a timeline to understand where things fit around the crime. Then they combine the evidence and the timeline together and identify the motives and finally the suspects. I have oversimplified all they do, but the core process steps are still there.
Solving the Crime of Unreliability in a facility starts with identification of the questions and the evidence to be collected. Then just like the investigator the next step is the collection of said evidence. I suggest folks use collection kits to help categorize and capture the data in its entirety. There is a blog here about the kits I use and what they contain.
Early on I skipped the element of time and did not complete the timeline or sequence of events prior to the use of other tools. Over time I learned this was a mistake in many cases and caused me to miss details. In two recent RCA investigations that were completed by others and reviewed and refined later by me, we discovered whole new causal chains and missed causes related to rebuild and maintenance execution that were not identified in the initial investigation. This was due to the fact that the original RCA team focused on their preconceived notions and did not look at what happened just prior to the failure in the sequence of events. Completing the sequence of events opened their eyes. It will do two things for you: first it identifies other potential causes and second it clarifies the causes that you have already identified. Just as the crime scene investigators then take the timelines and evidence and begin to look at the relationships, I do the same. I choose a tool like fault tree or logic tree, among others, to attack evidence in the sequence and to draw the connections and the causal chains.
If you find the crime of unreliability has been committed in your facility then you may want to make sure you have included each of these steps in your RCA process.

Monday, November 5, 2012

Force versus Finesse

Forcing Misalignment
This weekend I was watching my daughter assemble a toy robot. She is young and does not yet understand the concept of finesse. She is more of the brute force school of assembly. In her mind, if it doesn't go together then what you need to do is push harder. If you need to escalate beyond that then bang it on something. If that doesn't work then it is obviously broken and therefore can't be assembled. You can see her in the photo misaligned and still pushing harder.
Alignment Finesse
As I watched this I realized she is not the only one who does this on a regular basis. Many times we get a process or method in our mind and we skip the finesse and go straight for brute force. In our world it may look a bit different but it could sound like this: "Boss says I have got to get this new process rolled out... I told our folks to just follow the new process... They didn't follow the process so I am going to yell and scream about following the process... If that doesn't work then I will let them know that they are just not going to work out and I will look for someone who will follow the process." OK, maybe a bit simplified and exaggerated, but my money is on it not being that far off.
The real problem may be that the method being installed or implemented does not fit the application or the process is too different culturally for folks to grasp. Digging deeper we find that we did not take the time to look at the situation and understand all of the moving parts and how they work together. It could be that we missed an emotional, political, or rational issue with the change, and until we look at each of these areas and address any underlying cause, adoption may escape us.
If you are in that situation take a look at the three areas shown to the above left. Think about what could go wrong in each of the dimensions and how you can proactively address that area using finesse in the place of sheer force.

Tuesday, October 30, 2012

Are Your Reliability Efforts Haunted by the Lack of an Effective Plan?

So many facilities struggle with timely results and expected return on investments within their reliability improvement efforts. One of the demons that continuously shows its face is the sequencing within the improvement plan.
Sites tend to want to pursue the exciting and fun things and forget many of the foundational elements that support the shiny stuff. For example, a site will purchase technology in the form of software or predictive tools before developing a workflow process. Without understanding the work management process you don't have the data needed for the software to analyze nor the ability to execute the work identified with the predictive tools. This just leads to the exciting new tools being underutilized and then eventually put on a shelf to gather dust.
When you are looking at your initiative and planning your strategy, take a look at what is required to support the pieces that you desire. Be honest about your maturity in these support elements and build your base before your tower. Towers are great to look at and show others but foundations are where a smart man spends his time.
Some of the common issues I see include:

  1. Scheduling work without planning first. How can you have an accurate schedule when you have not had a planner break down the job into small enough parts to accurately estimate time required?
  2. Predictive tools applied without a work process to execute the findings prior to point F on the P-F curve.
  3. Software such as EAM and CMMS without a standard work process to make them work.
  4. Software for engineering analysis without data from FRACAS or failure coding to input for analysis.
  5. Initiatives like lean and six sigma applied before maintenance-induced variation and waste are reduced.
  6. Reliability engineering tools applied before maintenance engineering tools. When you have a pure reliability engineer working on the future in a fully reactive facility it becomes a lot like rearranging the deck chairs on the Titanic. You need the maintenance engineer to introduce procedures, precision maintenance, and failure mode based maintenance to stop the sinking and then we can focus on the next cruise.
If you want to keep the gremlins out of your improvement strategy then focus on the processes first then apply the tools, software and shiny stuff.

Monday, October 22, 2012

SMRP Annual Conference Recap

The Society for Maintenance and Reliability Professionals (SMRP) 20th Annual Conference was last week in Orlando and I thought I would take a few minutes to share the highlights as I saw them.
Attendance was great with over 900 people from all over the world at a great venue that provided for networking and learning with over 70 industry papers presented.
Disney Institute provided a seminar on Business Excellence. Supplier members provided many other pre and post conference workshops which were well attended and from what I saw very well presented.
The tours were great. I was able to attend a behind the scenes maintenance tour at Universal Studios that was very informative and entertaining. The group rode the Spiderman ride and then went into the maintenance shop to see the ride equipment and witness the maintenance testing of the assets we had just ridden on. They wrapped up the night with a great dinner and a presentation of all the fine work they are doing to improve reliability and the experience for the customer.
In one of the conference tracks SMRP had two panels of past SMRP chairmen that provided thoughts on all facets of the organization and the industry. They spoke at length on topics like the upcoming ISO-55000 standard and the industry skills crisis.

Many SMRP benefits were featured, including:
Creation of 3 Special Interest Groups: Oil, Gas and PetroChem, Reliability Analytics and Pharma & Biotech. It looks like a few others are on the horizon
Continuation of 18 local chapters. Is there one near you?
Award winning Solutions Magazine
Free Regional Workshops
SMRP Metrics Compendium 
SMRP Bench Marking Survey
CMRT and CMRP certifications are provided to allow members to demonstrate their skills
Past year Conference Proceedings are now available to members

Last but certainly not least the new board of directors was elected and I have the honor of serving as the Chairman for the next year.
It was a great week spent with an incredibly passionate group of reliability professionals. 

Tuesday, October 9, 2012

I'm A Change Manager... But Do Not Ask Me To Change. Part 3 in a 3 Part Series: Duck Hunt

Communication is key, but we have all heard that before. That in itself is not news. But the element that I see missing from many change initiatives is the plan. So maybe it should be "Communication planning is key." If you are trying to create change in the organization and you do not have a communication plan then your communication may be ad hoc and ineffective.
It would be much like going on a duck hunt with a rifle. You would make a lot of noise but the chances of success are unnecessarily low. You need a distributed plan that like a shotgun blast approaches the problem in multiple ways and increases the chances of success.
If you completed the FMEA of your project  (from the last blog) and you have your list of ways the project could fail and the causes associated with those failures then many of those can be addressed in your communication plan.
The plan should include the following things:
  • Items to be communicated (Goal of communication element)
  • Audience (Who needs to hear or see the message?)
  • Time frame and number of iterations (When does it get sent?)
  • Media (How does it get sent?)
  • Person responsible to create (Who?)
  • Person responsible to deliver (Who?)
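One way to keep a plan like this honest is to write it down as structured data so you can check it for gaps. The sketch below is only an illustration: the field names, the sample rows, and the `single_channel_messages` helper are all hypothetical, chosen to mirror the list above. It flags any message that relies on a single medium, which is the "rifle" approach described earlier.

```python
from dataclasses import dataclass


@dataclass
class CommunicationItem:
    """One row of a communication plan (field names are illustrative)."""
    message: str      # item to be communicated (goal of the element)
    audience: str     # who needs to hear or see the message
    start_week: int   # when the first iteration goes out
    iterations: int   # how many times it is repeated
    media: str        # how it gets sent
    creator: str      # person responsible to create
    deliverer: str    # person responsible to deliver


def media_per_message(plan):
    """Map each message to the set of media that carry it."""
    coverage = {}
    for item in plan:
        coverage.setdefault(item.message, set()).add(item.media)
    return coverage


def single_channel_messages(plan):
    """Return messages that rely on only one medium."""
    coverage = media_per_message(plan)
    return sorted(msg for msg, media in coverage.items() if len(media) < 2)


# Hypothetical sample plan, invented for illustration only.
sample_plan = [
    CommunicationItem("Why we are changing", "All crafts", 1, 4,
                      "Tool box talk", "Project lead", "Supervisors"),
    CommunicationItem("Why we are changing", "All crafts", 1, 2,
                      "Email", "Project lead", "Plant manager"),
    CommunicationItem("New work order process", "Planners", 3, 1,
                      "Email", "Planning lead", "Planning lead"),
]

for message in single_channel_messages(sample_plan):
    print(f"Only one medium planned for: {message}")
```

In this made-up plan only the work order message is flagged, which would be the prompt to add a second medium such as a face-to-face session.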
The other key thing to think about when developing your communication plan is striking the right balance of broadcast and two-way dialogue around the messages. Two-way dialogue is the most effective form of communication; however, it is not always efficient. An email blast to the complete population is very efficient but not very effective. You have to find a balance of both.
In the end you need to think about the messages and the points you want to communicate and then craft a plan that repeats the message in multiple media for extended periods of time to ensure the highest level of penetration and understanding.
Happy Hunting!

Thursday, October 4, 2012

I'm A Change Manager... But Do Not Ask Me To Change. Part 2 in a 3 Part Series

In this post we will focus on risk management for the change effort. Since many of you are maintenance and reliability engineering type folks, I would suggest that you use your failure modes and effects analysis (FMEA) tools to look at the change you are trying to make. Think about what the function or goal of the change is. Next, think about the ways your change or initiative could fail to meet its function. Then think about what might go wrong that could cause each of the failures. Once you have completed that step you can look at what you are currently doing to prevent the failure. Then you could apply a ranking like risk priority number (RPN) to bring the high risk modes of project failure to the top of the list. RPN is assigned based on three factors multiplied together: severity of the failure times the probability of occurrence times the chances of detection prior to failure. These are all on a ten point scale. Once this is done, you can look at the high risk elements and create communication and project strategies that will lower the risk in one of the three factors of RPN. If you take this step then you will be ready for next week’s blog, which will dive into the communication strategy. If you would like to see past blogs on this subject then click here.
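As a rough illustration of the RPN ranking described above, here is a minimal Python sketch. The failure modes and their scores are made up for the example, and the names `FailureMode` and `rank_by_risk` are hypothetical. It also assumes the common FMEA convention that a higher detection score means the problem is less likely to be caught before failure, so that a higher RPN always means higher risk.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One way the change effort could fail, scored on 1-10 scales."""
    description: str
    severity: int    # 10 = most severe consequence
    occurrence: int  # 10 = most likely to occur
    detection: int   # 10 = least likely to be detected in time (assumed convention)

    def __post_init__(self):
        # Reject scores outside the ten point scale.
        for name in ("severity", "occurrence", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    def rpn(self) -> int:
        """Risk priority number: the three factors multiplied together."""
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return failure modes sorted with the highest RPN first."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


# Hypothetical project failure modes, invented for illustration only.
example_modes = [
    FailureMode("Project plan is not kept current", 6, 5, 4),
    FailureMode("Crafts never hear why the change is happening", 8, 7, 6),
    FailureMode("Sponsor leaves mid-project", 9, 2, 3),
]

for mode in rank_by_risk(example_modes):
    print(f"{mode.rpn():4d}  {mode.description}")
```

The items at the top of the printed list are the ones to address first, by lowering severity, occurrence, or the detection score through your project and communication strategies.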

Thursday, September 27, 2012

I'm A Change Manager... But Do Not Ask Me To Change. 3 Part Series

I was reminded this week during my travel that even when your business is facilitating change in organizations, change is still hard.
For those of you who travel regularly for business, you probably read the USAToday newspaper. It ends up under or outside your hotel door every morning. For many of us it becomes a ritualistic part of our morning on the road. Well, this week as I traveled I noticed the USAToday has changed... a lot. The content and the political slant are the same, but the layout, logo, fonts and headers are completely different. Things are not exactly where they were, and now they have added new bits like the "tweet of the day." I have quickly developed a modest dislike for the changes. The format actually bothers me because it does not look like "my paper". My morning ritual has forever been changed and I was not involved or warned. As I have spoken with other business travelers, they too were shocked by the major changes. And what is up with that weird extra blank bit at the bottom of the page? But enough of the ranting. The real point is that no matter who you are, chances are you are not a big fan of change, no matter what the size and scope. Change is hard even when it is necessary for survival.
So what can we as leaders do to facilitate changes and ensure they are successful?
Three things that I find that can help are project planning, risk analysis, and focused communication.
In this blog we will talk about the first of these, the project plan. It is a crucial step that many projects that miss their projected return on investment treat as optional. These sites may start with a plan but do not edit the plan and keep it current. The interesting thing is that the project plan meets the needs of the individual who is changing at every point in their change process. If you are familiar with Ken Blanchard and his Situational Leadership II model, I believe the plan plays a key part in each of those four famous phases. It helps early on by describing the scope and elements of the change and answering how the individual will be affected. As the project progresses and the individual reaches what I call the valley of despair, it breaks the overwhelmingly large project into small bite-size tasks that can more easily be completed, driving the change forward. As the individual hits the third phase of the change process, the plan shows completed tasks or progress even when the overall change effort may not have made it to the point of generating a return on investment. This visualization of accomplishment is important to drive the project forward to completion. During this completion phase the project should be delivering the projected return on investment, and now the completed project plan is a trophy of sorts, a way to take pride in the accomplishments and an example for others of how they can make the change as well.
Now if only I had seen the plan for the big change to the USAToday...

Thursday, September 20, 2012

Fish Bone Alone Doesn't Deliver Root Causes

Chances are, fish bone or Ishikawa diagrams alone will not get you to the root causes. I refer to the fish bone as a basic root cause tool. These diagrams serve a purpose and they do enable root cause investigations, but they do not necessarily have the power to be a stand-alone tool.
Let's talk about why and then what they are very good for.
The reason they are not able to give you all the root causes comes from the way they are used. In general they produce a categorized list of physical causes and human causes, but they do not identify causal chains or underlying systemic or latent causes. Many times they only feature the symptoms of these latent causes as the bones of the fish. If you choose to only use the fish bone, you have the potential to miss many issues and the connections that tie them together. I have reviewed many of these diagrams where the real root causes were just under the surface of the list but never brought to light during the investigation.
The real focus for me is the return on investment. If my root cause program is driven by only the fish bone, then I am implementing more solutions trying to address all of the identified bones of the fish at additional cost, and I am more than likely suffering reoccurring failures that drive lost production and additional analysis cost.
So what do I use the fish bone for? Personally I like to use it in three ways. First, as a lead-in to the FMEA when working with a group that may not have used that tool before. I would take the bones of the fish and transition them over to a spreadsheet or FMEA tool. This can help me get folks engaged and help to begin the population of the next tool.
The second way that I use the fish bone is as a brainstorming tool where we can identify many things that could have caused the failure. Then I assign the individual bones to groups and they go out and look for data to confirm or deny the existence of that cause. When the team gets back together to apply the next tool and pursue root causes and solutions, we have data to keep the process moving.
The third reason that I use the tool is as a facilitation exercise for when I have a quiet or boisterous root cause team or team members. In this situation we use sticky notes and we all write and stick causes to the diagram. This gives the less expressive folks an alternative in writing instead of speaking, and it allows the expressive folks to see the sticky note and know they were heard. This can get a group to develop many more possible causes, and then they can verify, investigate and eliminate with data, producing a better lead-in to the next, more thorough tool in your root cause process.
So to wrap it up, fish bone analysis is a tool that has its place in the root cause process, but if you want the lowest total cost of solution you will need to couple it with other tools from your root cause tool box. To read more on root cause, check out these posts from the past.

What are your thoughts?

Monday, September 10, 2012

News From the Reliability World: GPAllied Acquires ABB Reliability Services

NEWS from the Reliability World:
I wanted to double-blog this week to welcome a new team of coworkers to the Allied Reliability Group and GPAllied. This week we announced the acquisition of ABB Reliability Services and the old HSBRT group. I am personally quite excited to get to work with these great folks once again. We worked together in the past and I have learned both from and with them. I am convinced that what they bring to the table will make our organization stronger and allow us to provide more results for our clients. I look forward to some of them adding content here for the ReliabilityNow readers to enjoy.
I have never mentioned all that the Allied Reliability Group has to offer but with these additions I think a quick list is in order.
Total Plant Reliability Implementation
WCR Benchmarking with 30 years of history
Operator Care
Craft Skills
Reliability Engineering
Condition based maintenance
CMMS /EAM implementation
Planning and Scheduling
Rapid Action Teams
Change Management
Blended Learning Solutions
World Class Public Education

ABB RS, welcome to the Allied Reliability Group
Welcome to the new GPAllied

For more information please see the press release here!

You Don't Need an Asset Manager!?!

This week I am attending the Global Forum on Maintenance and Asset Management (GFMAM) executive meeting in Rio de Janeiro, Brazil. We are talking about things like reliability, the Asset Management Landscape and the upcoming ISO 55000 standard. One point that has been discussed quite a lot is the concept of an Asset Manager. In the US we are seeing companies and individuals "upscale" their titles from maintenance to reliability to life cycle management and now asset management. The sometimes missed point is that each of these is substantially different from the one before. You could argue that maintenance is a subset of reliability and reliability is a subset of asset management, but even if you don't agree with that, I would suggest that you might agree the scope of asset management is nonetheless very broad. If you would like to see GFMAM's asset management landscape, which outlines the many elements, click on the link above. So based on the breadth of the topic of asset management, many of us have come to the conclusion that one person cannot acquire the necessary level of knowledge to adequately manage the full scope and will not have the necessary time and focus.
My current understanding affords me the opinion that the best solution is an asset management core team. This team would be made up of an engineering manager, maintenance manager, operations manager, and a financial manager working together. If we use this structure we can cover all of the topics of the asset management landscape with individuals who have an understanding of the core concepts of asset management but also have specific knowledge of the topics in their area of focus and function. On the other hand, if your organization goes with the stand-alone asset manager, by the global definition, the individual will be overwhelmed and could find themselves in conflict with the other managers. By the nature of the job description the AM will play in the other managers' sandbox and at the least create a perception of minimizing their power and influence.
What are your thoughts? Have you seen the asset management landscape? Do you have an operations or maintenance manager that reports to an asset manager?

Tuesday, August 28, 2012

Five Ways to Improve Plant Reliability with the Internet

Many of us are becoming more and more dependent on the internet to manage many parts of our lives; however, some of our reliability peers and brethren still have not embraced the power of the web. So, I thought I would share 5 great places to start improving your reliability with Al Gore's incredible invention, the Internet.
Five things reliability folks should use on the internet:

#1 Google 
Google is the most obvious tool but also the most powerful. It can provide access to any of the following:
Vendor websites
Old PDF vendor manuals and sales literature with specifications
Common problems with equipment, in the form of articles from publications, blogs, bulletin boards, and historical sites
Spare parts for obsolete equipment
Special tools
New technologies
The key here is to search different combinations of words because not everyone calls a piece of equipment or technology by the same name. You know, it is the old grease zerk, fitting, nipple issue that has plagued CMMS users for years. 
I will be facilitating a root cause analysis tomorrow, and the first step in that process for me is to google the equipment manufacturer as well as the product and a list of common failures of that equipment. Sometimes I am amazed by what is already out there on the web that I can use, not to prejudge the root cause, but to improve my understanding of anything from design operating context to expectations for maintenance and known weak design points.
#2 Wikipedia
Wikipedia is my go-to site for information about companies, products, equipment, industries, and concepts. You can click on each linked word above to see an actual example.
Say you are curious about Monte Carlo Simulations (if you are, just click). Quickly you can see a definition, details, and many times even examples of how to use the tool. The one point to remember is that it is all crowd sourced, meaning that the "facts" come from many people and are reviewed regularly, but it can still have some inaccuracies.
#3 LinkedIn
LinkedIn is where you can connect with many different people from your own and other industries. It is a networking site for professionals and a great place to get answers to questions from others in similar situations. On LinkedIn I find both the Groups and Answers sections extremely useful. Groups is where you can find a congregation of folks who are interested in the same topic, company, or discipline and join them in this interest. You can pose questions to the members of the group or just read their posts and learn from their questions and thoughts. The Answers section, which you can find under the More button in the header menu, is a great place to go to ask a question of a larger audience. Your questions can be sent out to more than just a group; they can be sent out to everyone who has an interest in the topic you select. This can be thousands of people. You can also address the question just to your network of connections, depending on what your needs are. If you send it out to everyone on a topic, you will be amazed how quickly people will respond with great answers, and along the way you may make some great new connections.
#4 Ebay
While everyone knows you can buy clothes or toys or cameras or cars, most people do not realize you can buy industrial equipment and parts. I find myself sometimes searching specifically for obsolete items to keep an old unit running late in its product life cycle. Here is a link to the business section where you will see a list of items that you may have been looking for for quite some time. You may find the one switch you need to keep that one original widget machine running that you cannot get the capital to replace.
#5 Twitter
Twitter, while clogged with reports or "tweets" of lunch locations, meal details, and celebrity death hoaxes, can be a great place to see what is new in your industry or in maintenance and reliability in general. The key here is to follow only those people with similar interests, and then you will see when they share information, a quote, a blog post or a relevant news story. This is one of the best ways to see what is new and find new content to use in your reliability toolbox talks from a past week's blog. I constantly see new articles that I can use with clients and with our internal folks as well. Also, by following or setting up searches for the hashtags for conferences, you get a great look at the content that all of the presenters are sharing in real time even if you can't be there.

What sites would you add? What site has been a lifesaver for you?

Tuesday, August 21, 2012

Preliminary RCA Reports Promote Failures

Numerous government agencies, including the FAA, create preliminary failure reports. However, like many other things our government does, preliminary failure reports are a bad idea for businesses, as they drive many bad behaviors including reoccurring failures.
If you are currently providing these "educated guesses" to your organization prior to the actual root cause analysis report, then the best advice I can give is to stop as soon as possible. I know it is not quite that easy, but when these preliminary reports are issued, the manager and those affected are quick to review them even though they are merely guesses at the causes and may not be based on real data. The second issue is that once your complete analysis report, with all of the contributing factors, full causal chains, solutions, and the data to support them, is released, no one is interested enough to take the time to read it. After all, they have already seen the prelim. It is like trying to read a book after someone spoils the ending: it is just not worth the effort. If you issue the prelim, major decisions that affect the business are made based on quick reviews of partial sets of facts and straight conjecture. This conjecture leads to repetitive failure. In fact, many of the preliminary reports are nothing more than a regurgitation of past experiences, not the facts related to this situation and real proven data.
I know many of you feel the pressure to get a report out within 24 hours, but in most cases this is just not a reasonable request, and if you cannot immediately change that expectation you should at least be working toward it. If you need a fast turnaround on your RCA investigations, then streamline your RCA process to limit the number of analyses required per month, request more resources for each investigation, and/or speed up the report creation process. One way to speed up the RCA process is to use our A3 RCA reporting process, where you put all of the information from the RCA on an A3 size sheet of paper that is populated as you move through the analysis process, not at the end. I don't have the space here to show the full document but for an example send me an email at
The point is, don't put out your reactive guesses at a cause if you can streamline your RCA process and put out a complete analysis in less than a week. Just like you can't get to excellence in reliability overnight, you also cannot complete an effective RCA in an overnight time frame.

Thursday, August 16, 2012

Tool Box Talks: Not Just For Safety Anymore

Many companies have made great improvements in the area of safety in the last ten years, but some have begun to plateau. They are looking for that little push to get to the next level. We know, based on data that was released by Ron Moore in his book "Making Common Sense Common Practice", that safety has a strong correlation with reliability. So I would suggest that if you want to break through to a new level of safety performance, then focus on your organization's reliability.
Photo by Shon Isenhour
How? Treat reliability just like we have safety. Make it everyone's responsibility. Keep it in front of the organization.
Many of us already have a morning tool box meeting where we stand around the tool box and talk about the jobs and the safety aspects of the day's work. If you add reliability topics to this discussion as well, then this is one way you can keep the topic in the forefront of everyone's mind. It does not need to be a lecture, as we are looking for a short five minute topic. Now this alone will not step change your reliability performance, but as part of a balanced diet of other improvement tasks it can begin to turn the tide on reliability and on safety.
So what does it look like? I have seen it done many ways. In some cases the supervisor covers a topic from a single point lesson or other communication tool or a member of the team researches and covers a topic with the group. Getting a team member involved is great because the individual learns through their research and through teaching while providing the information for others to learn from.
What should we talk about?
I have seen topics that covered tasks like how to operate the bearing heater for precision installs, or how to choose the correct bolt grade or Loctite type. But I have also seen quick job plan reviews and root cause analysis report outs. As you rotate through, you will want to focus on why each topic is important to the individual and the overall improvement effort. The key is, as with safety, to keep the organization focused on the goal and the prize.

What topics would you cover?
What other ideas have you used to keep reliability in the organization's eye?

Monday, August 6, 2012

Curiosity Mars Rover, NASA, Engineering Excellence and a little bit of Reliability

So, ever since the NASA Shuttle program was canceled, the view from the outside of NASA has been a bit drab, but today they shot an arrow and hit a bull's-eye 350 million miles away. This physics forum was predicting a 60-70% chance of success. Those are not great odds, and when you look at the complexity of this delivery system, success becomes even more incredible. That's why this is so exciting.
This mission cost about $7 for every American citizen, according to the technical team; however, I get the feeling that is the initial build and launch cost, not life cycle cost. The nominal mission length is two years, but they have now said they won't be shocked if it lasts much longer than that. As an example of longevity, look at the history of the last two rovers, Spirit and Opportunity. They were designed for 3 months of use; Spirit made it 6 years and Opportunity is still going in its 8th year. That is serious reliability.
Many things that we as engineers design or procure will far exceed their "design life". I have spent time in 90-year-old hydroelectric dams, 60-year-old rolling mills and 50-year-old smelters that were as reliable today as they had ever been. The key point to remember is that according to Nowlan and Heap, 89% of all failure modes are not time based. So with that said, age is not the major driver. It is how you design, operate and maintain them. In the end, that is all you can control. NASA engineered flexibility into their rovers. This flexibility comes in the form of redundancy, but also the ability to re-purpose elements to do new functions as the situation dictates. They also practice a supreme form of precision assembly that equates to our precision maintenance. It includes standards of alignment, torque, cleanliness, and lubrication. They also make use of sealed-for-life products like bearings, because it is somewhat hard to change out a set of bearings on the surface of Mars. Finally, they use predictive maintenance technologies to their fullest, measuring and trending all aspects of the craft as it operates to identify defects and mitigate where possible. Are you thinking like NASA? Does your equipment demonstrate their levels of reliability? What could you change today?
 I will leave you with one little piece of trivia that I learned while putting this post together:
According to Miles O'Brien, the holes in Curiosity's wheels (seen above) spell "JPL" in Morse code. This is interesting because NASA made JPL take their name off the treads during the design and build phase. See, even rocket scientists have a sense of humor.

Tuesday, July 31, 2012

Want Success with Root Cause Analysis? Prove Yourself Wrong

As practitioners of Root Cause Analysis (and generally hard headed engineering types), one of our worst enemies is, well... us. Over time we see failures and reoccurring problems and we start to draw general conclusions. Then a few months later we see something similar, we apply past experience maybe a bit too freely, and we end up using the tools of root cause to focus on proving our theory correct. This can lead to missing causal chains and pursuing solutions that will not solve the real problem; ergo, return on investment is not achieved.
To combat this issue, I recommend that if you feel the itch to make a "know it all" proclamation to the RCA team, first take a few minutes to at least try to prove yourself wrong. Use the tools to look for evidence and data that do not support your theory. It is not about proving yourself right, it is about proving yourself wrong. If you cannot prove yourself wrong, then by all means proceed.

This advice applies specifically to RCA facilitators but is good advice for others as well. This is not about stifling brainstorming; it is more about preventing railroading by the more outspoken types.

What advice would you give to new RCA champions?

Friday, July 27, 2012

Learning through Application for Return On Investment

As we develop new curriculum for our clients, we have put an incredible amount of focus on moving them from "training for training's sake" to training for a documented return on investment.
Today I thought I would share a few of the elements that you might look for or create for your training efforts to drive a return on investment.

  • First determine your organizational and individual training needs. Then you can match the curriculum with those needs.
  • Create a charter for application of the training in their facility. This should include the tasks that will be used to apply the learning points from the curriculum and the expected return on investment from completing those tasks.
  • Ensure that this charter is approved and owned by both the student and their manager.
  • Verify that those that will be affected (above and beyond the student) by the training application work are aware and properly motivated. This could be operators, maintainers or supervisors in the area of application.
  • Create course material that is not just hundreds of PowerPoint slides. It should be interactive and social. We use simulations, games, case studies, e-learning, and teach-back single point lessons to ensure that we are engaging all of the learning styles of our students. Our goal is to spend only one quarter of each hour on material directly from the slides.
  • Don't just have a training class. Connect your training with coaching in the field and project work that allows for application of each required learning point. This demonstrates learning while also driving your return on investment. 
In the end we want to verify that the learning objectives have been retained and applied within the facility correctly, and through this application we will then see the return on investment for the training effort.

Wednesday, July 18, 2012

Is Your Maintenance Organization Centralized Or Decentralized?

During a recent conversation with a group of clients we were discussing the idea of centralized and decentralized maintenance, or in other words, whether maintenance should report to a maintenance leader or to an operations leader by area. Based on that conversation I wanted to share my thoughts on how I see the two and why you might want one over the other based on your maturity as an organization, and then solicit your opinions on the subject.
If I were to discover an organization that was very reactive and lacked reliability maturity I would recommend a centralized maintenance structure where all of the maintenance organization reports up through a mature maintenance and reliability leader. Reason: if you are trying to improve reliability you need a strong central leader to drive the message early on and then build a coalition that moves the understanding out into the organization and ingrains it into the culture. As the organization reaches a higher level of maturity where operations understands the guiding principles of reliability that will help the organization to continue to improve then we can look at decentralization.
If you make the move to decentralized maintenance too early then you risk the craftsmen being stationed by operations next to a machine "to stand guard". Of course they would do this to facilitate faster reacting to failures and reduction of their set up and down time. What we want to occur instead is for those crafts to identify and eliminate the failure modes and prevent re-occurrence through root cause analysis, improved craft skills, precision maintenance and other tools not just learn to react faster.
What are your thoughts?

Wednesday, July 11, 2012

Five Hats for Reliability Engineers

I was recently meeting with some new reliability engineers as they were getting ready to step into the role for the first time. They were asking what kind of things they should expect to be doing in the role. I came up with five hats that they will wear on a regular basis.  I have listed them below with a bit of detail.
The Technical Reliability Hat
Reliability engineers, first and foremost, are there to analyze failure history and prevent or mitigate failure in the future. While wearing this hat they will use tools like RCM, RCA, statistics, PdM, FRACAS, experts, and the internet.
The Trainer Hat
Reliability engineers will wear this hat when they begin to share new techniques for predictive maintenance or precision maintenance, as well as any other techniques that have not been used in the past. Here they will be looking to introduce the “best practice” and demonstrate how it should be done and why it is important. While wearing this hat you will see them use training manuals, equipment manuals, single point lessons, experts from outside the company, and maybe even a bit of PowerPoint.
The Coach Hat
Successful reliability engineers will put this hat on when they go out on the floor after providing training. They will work with individuals to ensure complete understanding of the improvements to the process. While wearing this hat you will notice the engineer using tools like his ears to listen and understand the concerns of the individual and his hands to demonstrate the concepts while answering the questions that linger in the minds of the students.
The Sales and Marketing Hat
Reliability Engineering is not understood as of yet by the masses. That is why you must have a sales and marketing hat. When you wear it you will be marketing the value of precision maintenance, RCM, RCA, FRACAS, and the cost of not doing it. You will also be selling the predictive maintenance tools to both the maintenance crafts as well as the leadership. You will feature the “saves” to build a basic understanding by the affected parties until everyone “buys in” to the concept. When wearing this hat the RE will be using samples of past success, failed components as props,  pictures, case studies, and benchmarking results.
The Meeting Hat
This is an ugly hat in some organizations where they exhibit the traits of a “Meeting Manufacturer”. These MM sites seem more focused on making meetings than they are on making products. In general, however, the RE should expect some time with the meeting hat. Wearing the hat is the only way to get many things accomplished within the organization. While wearing this hat you may find that you need to take along all of the hats above to be successful.

Avoid the Fire Helmet
This is the one hat that reliability engineers need to avoid as much as possible. If you are continuously forced to wear this hat you will not be able to put in the time wearing the others. To say it differently, if you are focused on today's spot fires you cannot be focused on tomorrow's forest fires, and you will be held back from improving overall plant reliability.

What other hats do you keep lying around to wear as a Reliability Engineer?