Simplified Troubleshooting Overview

This post is part of a series describing the generic, high-level troubleshooting process.

In this series I will be attempting to give a very high-level explanation of the troubleshooting process. This will be generic and can be applied to almost any equipment, although it is geared toward electrical/electronic troubleshooting.

Previous Posts:
What is Troubleshooting?

Effective troubleshooting is 60% science, 40% art, and 50% luck. You will need to draw from your training, experience, and gut feeling.

  • Your training will give you an understanding of how things are supposed to work, and the basis of where to look within the system for possible causes.
  • Experience is the second best teacher. It will give you an understanding of how things work in the real world, and past troubleshooting may lead you to the problem more quickly.
  • Sometimes your gut will tell you to try something that does not make logical sense, but somehow it corrects the issue. I have learned to listen to these feelings and to stop asking “Why?” once the problem is solved. Usually, I never do figure out how that could have been the problem.

You may have noticed that I called experience the second best teacher. I know that everyone says that experience is the best teacher, but I disagree. It is much better to learn from other people’s experiences. You can gain the knowledge without having to go through the time, mistakes, and pain of learning it for yourself. Learn everything that you can from anyone who knows more than you on a specific topic, even if they are half your age.

Troubleshooting is a process of trying and failing over and over again until you find the solution. You can’t be afraid of being wrong; if you’re afraid of failing, you will never solve the issue. This is made easier when the equipment is completely non-functional, because how could you possibly make it worse?

I have put together eleven steps to troubleshooting that I feel capture this process at a generic level. These steps are mostly in order, but you will find yourself jumping around, going back up the list, or possibly skipping steps. The next posts will break down each of these steps in more detail.

Step 1. Understand the correct inputs and outputs of the “black box.”
Step 2. Determine which output(s) are incorrect.
Step 3. Figure out which input(s) affect that output(s).
Step 4. Verify that those inputs are as expected.
Step 5. Analyze the path(s) from input(s) to output(s).
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

What is Troubleshooting?

This post is part of a series describing the generic, high-level troubleshooting process. Stay tuned for future posts.

Merriam-Webster’s Collegiate Dictionary defines “troubleshooting” as “to operate or serve as a troubleshooter.”

troubleshooter n.

  1. a skilled worker employed to locate trouble and make repairs in machinery and technical equipment
  2. an expert in resolving diplomatic or political disputes : a mediator of disputes that are at an impasse
  3. a person skilled at solving or anticipating problems or difficulties

To most people, equipment (of whatever kind, from a light switch, to an assembly line, to your smartphone) is just a “Black Box.” They understand that if they give it “X” input, they get “Y” output. That is the way it should be. No one can, or needs to, understand how everything in their life converts “X” to “Y.” Everyone also recognizes that when they get “Z” as an output, something is not right. The job of a troubleshooter is to determine what is wrong inside of the “Black Box” that is causing the “Z” output, and to understand how to correct it.

Step 6: Troubleshooting Overview

Check all connections.

This post is part of a series describing the generic, high-level troubleshooting process. Side effects include frustration, annoyance, hair pulling, possible bad language, and thinking out loud.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview
Step 1. Understand the correct inputs and outputs of the “black box.”
Step 2. Determine which output(s) are incorrect.
Step 3. Figure out which input(s) affect that output(s).
Step 4. Verify that those inputs are as expected.
Step 5. Analyze the path(s) from input(s) to output(s).

Next Posts:
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 6. Check all connections.
“The Jim Sweet patented swiping action.” That’s what we called it in one of my previous lives, where I learned this “trick” of disconnecting and reconnecting connectors. Contacts in connectors will build up corrosion, which can completely stop equipment from working. Sometimes it is the obvious level of corrosion, where everything is green or white. Sometimes it is so little (maybe a piece of dust) that simply disconnecting and reconnecting a connector will dislodge the problem and fix everything. Sometimes a little contact cleaner will do the trick. Other times you will need to replace contacts or entire connectors to get rid of the bad connection.

I can’t tell you how many times I have taken something apart looking for a problem, not found one, given up and put it back together, and like magic… it works perfectly. When this happens, it is both satisfying and annoying. I like to know what fixed it just as much as I like it to be fixed. Often what fixed it is like the number of licks to the center of a Tootsie Pop: “The world may never know.”

Whenever possible, disconnect every connector in the path (one at a time), looking for corrosion, broken wires, recessed pins, bad crimps on contacts, or anything else that is not right. Also take a look at any splices and replace any that are suspect. Fix or replace anything that you find in this step. Whether you find anything or not, it’s a good idea to put everything back together and test it again. You might have fixed it without knowing, and you can buy a bag of Tootsie Pops to celebrate.
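
If you would rather know which connector was the culprit, one option is to retest after each reseat instead of only at the end. Here is a minimal sketch of that idea in Python. Everything in it is made up for illustration: the connector names, the `reseat_and_retest` helper, and the simulated dirty contact stand in for real hands-on work and a real functional test.

```python
from typing import Callable, Iterable, Optional


def reseat_and_retest(
    connectors: Iterable[str],
    reseat: Callable[[str], None],
    system_works: Callable[[], bool],
) -> Optional[str]:
    """Reseat connectors one at a time; return the first one whose reseating restores function."""
    for name in connectors:
        reseat(name)          # disconnect, inspect, reconnect
        if system_works():    # retest before moving on
            return name
    return None


# Simulated bench (an assumption): connector J3 has a dirty contact that reseating clears.
dirty = {"J3"}
fixed_by = reseat_and_retest(
    ["J1", "J2", "J3", "J4"],
    reseat=dirty.discard,
    system_works=lambda: not dirty,
)
print(fixed_by)
```

The trade-off is time: retesting after every connector is slower, but it turns “the world may never know” into a name you can write down.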

Step 5: Troubleshooting Overview

Analyze the path(s) from input(s) to output(s).

This post is part of a series describing the generic, high-level troubleshooting process. These statements have not been evaluated by the FDA.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview
Step 1. Understand the correct inputs and outputs of the “black box.”
Step 2. Determine which output(s) are incorrect.
Step 3. Figure out which input(s) affect that output(s).
Step 4. Verify that those inputs are as expected.

Next Posts:
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 5. Analyze the path(s) from input(s) to output(s).
By now you have determined that you are looking at the right equipment (even if it is not where you started), learned what the inputs and outputs should look like, deduced which outputs are not correct, identified which inputs affect those outputs, and verified that the inputs are as expected. It’s time to do a deeper dive into the inner workings of the equipment.

This step is a more detailed version of Step 3. If you didn’t have schematics, wiring diagrams, or other documentation in Step 3, you’ve probably already done this step. If you had paperwork, now is the time to physically look at the electron flow and see the routes, wires, switches, relays, circuits, pneumatics, motors, fuses, and everything else between input and output inside the actual equipment.

What is the path? What does it go through? Go back and read Step 3 if needed. At this step, we still don’t need to know what the “dots” do, we just need to see everything for ourselves. It is much easier if you learn the path and see what it looks like before you dive into the following steps.

Step 4: Troubleshooting Overview

Verify that those inputs are as expected.

This post is part of a series describing the generic, high-level troubleshooting process. Do not use if you are allergic to troubleshooting, blogs, or learning.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview
Step 1. Understand the correct inputs and outputs of the “black box.”
Step 2. Determine which output(s) are incorrect.
Step 3. Figure out which input(s) affect that output(s).

Next Posts:
Step 5. Analyze the path(s) from input(s) to output(s).
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 4. Verify that those inputs are as expected.
In this step, we verify that the inputs to the system are correct. If you find an input that is not as expected, then maybe the issue is upstream of this equipment, or there is no issue at all. It is not unusual to find that the problem is:

  • Not a problem
    • Someone had an input set wrong
    • There was a misunderstanding of how it is supposed to work
  • In a completely different box
    • An upstream output is wrong, giving this box a wrong input
    • A downstream input has an issue affecting the output of this box (as discussed in Step 2)

You need to do everything that you can to verify that the issue is actually with the equipment that you are analyzing. If you don’t, you could waste time troubleshooting in the wrong place. If you realize that the issue is in another box, move your analysis to the other box and start over at Step 1. Only once you have personally verified that the inputs are correct, the output is incorrect, and the issue is not downstream can you say that there is an actual issue with the equipment that you are troubleshooting.
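
When the inputs are measurable values, this check can be as simple as comparing each reading against its expected range. The sketch below shows one way to do that in Python. The input names, voltage ranges, and readings are all hypothetical; substitute whatever your documentation says the inputs should look like.

```python
# Hypothetical expected ranges for each input, as (low, high) in volts.
EXPECTED = {
    "supply": (11.5, 12.5),
    "sensor": (0.5, 4.5),
}


def out_of_range(measured: dict[str, float]) -> list[str]:
    """Return the names of inputs that are missing or outside their expected range."""
    bad = []
    for name, (low, high) in EXPECTED.items():
        value = measured.get(name)
        if value is None or not (low <= value <= high):
            bad.append(name)
    return bad


# Made-up readings: the supply is sagging, so look upstream before blaming this box.
readings = {"supply": 10.9, "sensor": 2.1}
print(out_of_range(readings))
```

An input that was never measured is reported the same way as a bad one, since an unverified input is not a verified input.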

Step 3: Troubleshooting Overview

Figure out which input(s) affect that output(s).

This post is part of a series describing the generic, high-level troubleshooting process. It is not intended to diagnose, treat, prevent, or cure any disease.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview
Step 1. Understand the correct inputs and outputs of the “black box.”
Step 2. Determine which output(s) are incorrect.

Next Posts:
Step 4. Verify that those inputs are as expected.
Step 5. Analyze the path(s) from input(s) to output(s).
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 3. Figure out which input(s) affect that output(s).
Now that we know what the “black box” is supposed to do and which outputs are not correct, the next step is to start diving into the inner workings of the system. We need to determine which input(s) are supposed to influence the output that is being troubleshot. The key phrase is “supposed to.” Since the system is not functioning properly, testing may not give us the correct results, so we need to analyze the system. Hopefully you have schematics, a wiring diagram, or something else to make this easier. If not, you will need to dive in and trace the electron flow yourself.

See how you do with the made-up diagram below. The connecting dots are one-way: everything coming in from the left is combined and output to the right. Anything on the right does not affect things to the left. Which input(s) affect which output(s)?

How did you do?
Now let’s get more specific.
Output “W” is not functioning correctly.
Which input(s) influence the output at “W”?
Can you tell?
Need a hint?
Sorry, I can’t help you other than to let you know that the answer is below.
Mainly, at this point, I’m just writing lines hoping to push the answer far enough away that you don’t accidentally see it.

The answer is…
B, C, D, & E
Did you get it right? Take a look at the image below to see why. The pink lines are where I traced backward from “W” to the inputs.

I wish real-world systems were this easy, but they are not, mainly because the “dots” are relays, PLCs, sensors, electronic circuits, even full circuit boards. This makes the paths much harder to trace, but it needs to be done in enough detail to find the needed answers. There is no need to determine what the “dots” do or how they do it at this point, just that they connect input to output.

One other thing to check: do any of the other inputs change the output? In this case, does “W” change when “A” changes? It shouldn’t. If it does, that is another clue for your troubleshooting. Consider everything that you learn (no matter how small) to be a clue to the puzzle.
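
The backward trace itself is mechanical enough to sketch in code. The Python below walks backward from an output and collects every input that can reach it. The wiring is made up for illustration and chosen so that “W” traces back to B, C, D, and E; the names `n1` through `n4` stand in for the “dots,” and “V” is a second hypothetical output.

```python
# Hypothetical wiring: each key feeds the nodes listed after it (left to right only).
feeds = {
    "A": ["n1"],
    "B": ["n1", "n2"],
    "C": ["n2"],
    "D": ["n3"],
    "E": ["n3"],
    "n1": ["V"],
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": ["W"],
}
INPUTS = {"A", "B", "C", "D", "E"}


def inputs_affecting(output: str) -> list[str]:
    """Walk backward from an output and collect every input that can reach it."""
    # Invert the map so each node lists what feeds it.
    fed_by: dict[str, list[str]] = {}
    for source, targets in feeds.items():
        for target in targets:
            fed_by.setdefault(target, []).append(source)

    seen, to_visit = set(), [output]
    while to_visit:
        node = to_visit.pop()
        if node in seen:
            continue
        seen.add(node)
        to_visit.extend(fed_by.get(node, []))
    return sorted(seen & INPUTS)


print(inputs_affecting("W"))
```

In this made-up wiring, “A” never appears in the trace for “W,” which is exactly why a change in “W” when “A” changes would be a clue.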

Step 2: Troubleshooting Overview

Determine which output(s) are incorrect.

This post is part of a series describing the generic, high-level troubleshooting process.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview
Step 1. Understand the correct inputs and outputs of the “black box.”

Next Posts:
Step 3. Figure out which input(s) affect that output(s).
Step 4. Verify that those inputs are as expected.
Step 5. Analyze the path(s) from input(s) to output(s).
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 2. Determine which output(s) are incorrect.
Now that you understand what the input(s) and output(s) are, and what they are supposed to look like, you can compare the expected output(s) to the current output(s) for the current input(s). If one or more outputs are not what you are expecting, then you have a place to start looking. Make sure that you check all outputs. Often more than one will be wrong, which could be an indication of multiple problems, or could help point you to a single problem that affects some or all of the incorrect outputs.

When possible, change the input(s) and recheck the output(s). The fault may only happen on certain inputs, or only on a specific combination of inputs. If you know that an input configuration causes the fault, make certain that you set the inputs that way and check the output(s). Do not check only that one configuration, though; the box may have issues with other input settings as well. The more information that you have on the current vs. expected behavior of the “black box,” the easier your troubleshooting will be.
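
For a box with a handful of on/off inputs, checking every combination is a small loop. Here is a minimal Python sketch of that sweep. The two-input box, its documented behavior, and the faulty “measurement” are all invented for the example; in real life `observed_output` would be you with a meter.

```python
from itertools import product


def expected_output(a: bool, b: bool) -> bool:
    """Documented behavior of the hypothetical box: output is on when both inputs are on."""
    return a and b


def observed_output(a: bool, b: bool) -> bool:
    """Stand-in for a real measurement; this fake box misbehaves when only b is on."""
    return b


# Try every input combination and record where observed differs from expected.
mismatches = [
    (a, b)
    for a, b in product([False, True], repeat=2)
    if observed_output(a, b) != expected_output(a, b)
]
print(mismatches)
```

Here the fault shows up in only one of the four combinations, which is the kind of detail that is easy to miss if you check just the configuration you started with.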

If multiple errors seem to point to multiple problems, or you’re not sure, take note of all of them, but pick a single fault to work on first. It is easy to get lost when working multiple issues simultaneously. Troubleshooting is not a place to try to multitask. It will end up taking longer than if you just focus on one issue at a time.

When one of the incorrect outputs is an input to another “black box,” disconnect it from the other box, if possible. It is not uncommon for the input side of the next box to have an issue which appears to be a problem with the output of the first box. Often, when this is the case, simply disconnecting the two boxes will then give you the correct output on the first box. If this happens, you need to troubleshoot the second box for that issue, not the first. Congratulations, you just saved yourself hours or days of looking in the wrong place.

Step 1: Troubleshooting Overview

Understand the correct inputs and outputs of the “black box.”

This post is part of a series describing the generic, high-level troubleshooting process.

Previous Posts:
What is Troubleshooting?
Simplified Troubleshooting Overview

Next Posts:
Step 2. Determine which output(s) are incorrect.
Step 3. Figure out which input(s) affect that output(s).
Step 4. Verify that those inputs are as expected.
Step 5. Analyze the path(s) from input(s) to output(s).
Step 6. Check all connections.
Step 7. What are the devices in those path(s) that change the data?
Step 8. Eliminate the devices that could not cause this output.
Step 9. Test the remaining devices.
Step 10. Repair/Replace faulty device(s).
Step 11. Fully test the system.

Step 1. Understand the correct inputs and outputs of the “black box.”
The first step when beginning troubleshooting is to get a firm understanding of how the equipment is supposed to function. If you don’t know what it is supposed to do, then you won’t recognize when it is not doing what it is supposed to. Don’t worry about the inner workings just yet; that will come. For now, consider it a “black box,” and figure out what it does before you figure out how it does it.

This seems obvious, but I have watched others, and myself, dive head first into trying to fix a problem before knowing anything about the equipment. Not taking the time to learn about the equipment, or assuming that you understand it, can cause a lot of frustration and lost time chasing ghosts. I personally have spent hours chasing problems and pulling my hair out, only to finally realize that the section I was troubleshooting was doing exactly what it was designed to do. My desire to fix it fast caused me to skip this step, and I completely wasted that time as a result.

  • What are the input(s) to the box?
    • What should they look like?
    • Where do they come from?
    • Are they user inputs, or are they outputs from other boxes?
  • What are the output(s) from the box?
    • What should they look like?
    • Where do they go?
    • Is it a screen, lights, bells, etc., or is the output an input to another box?
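
One way to keep track of the answers to these questions is a small record per box. The Python sketch below is only an illustration: the `Signal` and `BlackBox` names, their fields, and the hallway light example are all assumptions, not part of any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    name: str
    expected: str      # what the signal should look like
    connects_to: str   # where it comes from (inputs) or goes to (outputs)


@dataclass
class BlackBox:
    name: str
    inputs: list[Signal] = field(default_factory=list)
    outputs: list[Signal] = field(default_factory=list)

    def unanswered(self) -> list[str]:
        """Return the signals whose expected behavior or connection is still unknown."""
        return [s.name for s in self.inputs + self.outputs
                if not s.expected or not s.connects_to]


# Hypothetical example: a hallway light circuit.
box = BlackBox(
    name="hallway light",
    inputs=[Signal("wall switch", "open or closed", "user"),
            Signal("mains supply", "", "breaker panel")],   # expected value not looked up yet
    outputs=[Signal("lamp", "lit when the switch is closed", "")],  # destination not traced yet
)
gaps = box.unanswered()
print(gaps)
```

Any signal that still shows up in `gaps` is a question from the list above that has not been answered yet.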

This step is also critical when assisting someone else who is stuck in their troubleshooting. Often, when I jump in to help someone who has been working on an issue for a while, my questions about how it is supposed to work trigger a thought in their own mind about where the problem could be. It is usually something that they had not thought of before, probably because they hadn’t taken the time to fully understand the “black box.” Often this leads them to the problem that has been eluding them, without my ever completing Step 1 for myself.