Troubleshooting: A skill that never goes obsolete
140 comments
· February 25, 2025
fossuser
samuell
I think troubleshooting has a lot of overlap with thinking along the lines of the scientific method.
1. You have to start forming hypotheses that you test, but you should be ready to throw them away as quickly as you thought of them when the test results say so. Let data
2. You should preferably think hard about effective ways to quickly rule out influencing variables, so you can quickly zero in on the area where the erroneous effect is coming from.
3. You have to really rule out confounders. Make sure to turn off any caches or similar that might play games with you.
The area where I see most colleagues fail in this process is not being stringent enough with things like ruling out confounders and being systematic about organizing the outputs from hypothesis testing, to make sure you are 100% sure which outputs belong to which inputs etc.
It is the discipline and strictness in the process that will do the trick. Anything less and you will just trick yourself.
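A minimal sketch of point 2, assuming the suspects can be put in order (commits, config changes, lines of input) and that the failure stays once it appears. The `is_broken` check and the numbers are hypothetical stand-ins for whatever test reproduces the problem; the idea is just that halving the search space rules out variables much faster than checking them one by one.

```python
from typing import Callable, List


def first_bad(n_changes: int, is_broken: Callable[[int], bool]) -> int:
    """Return the 1-based index of the first change that introduces the failure.

    Assumes is_broken(0) is False (known-good baseline), is_broken(n_changes)
    is True, and that the system stays broken once the bad change is applied.
    """
    lo, hi = 0, n_changes  # invariant: lo is known good, hi is known broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(mid):
            hi = mid  # failure is already present: culprit is at or before mid
        else:
            lo = mid  # still healthy: culprit is after mid
    return hi


# Hypothetical scenario: 100 changes were applied and the 37th broke things.
calls: List[int] = []


def is_broken(n: int) -> bool:
    """Stand-in for 'apply the first n changes and run the repro'."""
    calls.append(n)  # keep a record of which inputs were actually tested
    return n >= 37


culprit = first_bad(100, is_broken)
print(f"first bad change: {culprit}, checks run: {len(calls)}")
```

Recording every tested input in `calls` is the small-scale version of knowing which outputs belong to which inputs.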
fossuser
Yes! This generally rings true to me.
The other thing I see trip people up is being unwilling to make a fast hypothesis that can be easily tested to narrow scope.
Instead they’ll often try to look at the code to understand but that’s usually slower for anything remotely complex.
sandinmyjoints
Have you found effective ways to market this skill?
fossuser
Usually it’s best in an operational type role, can be support, sre, tpm, etc. depending on your strengths. It’s best when paired with good comms and somewhat good social skills.
You build credibility by jumping in and doing a lot of support type stuff early on (which then also makes you better at whatever the product is, more familiar with what sucks for users).
upcoming-sesame
The parent comment mirrored my sentiments exactly.
In my company, I'm often the person who joins a production bug troubleshooting call, after sometimes hours of investigation, and rapidly identifies the root cause.
My typical workflow is:
* Clarify the issue and our assumptions. Often, simply restating the observed behavior aligns everyone.
* Pose questions to validate or challenge those assumptions.
* Suggest alternative methods to test the primary hypothesis.
Often, testing the initial hypothesis reveals its inaccuracy, leading to a swift discovery of the actual root cause.
Ultimately, it comes down to critical thinking and questioning assumptions I think.
credit_guy
> Realizing that I spend more time troubleshooting than I do building or doing ...
That's not good. The problem with troubleshooting is that it messes with your reward system. After you fix a hard-to-debug problem, you feel a sense of accomplishment. Which would be ok, but the problem is that this sense of accomplishment is oftentimes higher than it should be. You go home at the end of the day thinking "well, today I didn't build anything, but it's fine, because I fixed that bug". You are becoming complacent.
If you end up saying to yourself, like the author of this blog here, that you troubleshoot more than you build or you do, then you have a problem. Soon you'll be seen by others as a car mechanic. Maybe a reliable car mechanic. But reliable car mechanics don't get paid a lot.
This might be a controversial take but here it is: being proud of your troubleshooting skills sits somewhere between being proud of your typing speed and being proud of your word document formatting skills. These things never go obsolete, but don't fool yourself into thinking they are gold currency on the job market.
sjsdaiuasgdia
I think you may be leaning too far in the other direction.
I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. Every job I've had across 3 decades, regardless of my actual title or formal responsibilities, I'm the firefighter. People call me when they can't figure something out. People call me when something big breaks and needs to be fixed urgently. Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering and not making any progress because they can't troubleshoot their way out of a wet paper bag.
I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs. When I nearly left a prior employer because much of the work wasn't aligned with what I wanted to do, management created a new role with better aligned work and higher pay to convince me to stay. In my current role, I'm very happy with my salary, working environment, management, and team.
I wish troubleshooting skills were as common as typing and document formatting skills. I wouldn't need to help out nearly as many people because they could handle their own crises.
EvanAnderson
> I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. ... People call me when they can't figure something out. ... Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering ...
This describes a sizable portion of my career. It's lucrative, it's gratifying, and it's fun. It's as close as I'm going to get to being a "kick-ass mercenary".
Seeing new environments, new applications, and new problems never gets old. The stories that come from the work are priceless, too.
> I wish troubleshooting skills were as common as typing and document formatting skills.
When I conduct interviews this is the main skill I screen for. I think it can be taught, but somebody who already has it and is missing some particular technical experience is vastly more valuable.
steveBK123
I've found past a certain point career-wise, troubleshooting really can't be taught. It's sort of a mindset/attitude to me. If you are 5+ years into your career and haven't gotten there, you probably just don't care. It's the attitude of a developer who is indifferent to the craft and just wants to cobble together found code as quickly as possible to move onto the next thing.
A good troubleshooter can enable higher output across a team because they are like grease in the machine. Particularly indifferent troubleshooters become a net drag because instead of being able to help others they are always interrupting others for help.
hinkley
I hired a contractor who thought she had bombed the interview because she didn’t solve the problem I gave her immediately and struggled with it. But when she got stuck it was because she was not seeing that the code she wrote didn’t match the code she described planning to write.
But she didn’t panic, she cracked open the debugger and went section by section through the code until she finally spotted her typo. Which is exactly the sort of person who won’t crumble every time something doesn’t work exactly the way the documentation says it does. We hired six people, and only renewed two, of which she was one. So as far as I’m concerned, I succeeded in my interview.
ChrisMarshallNY
Absolutely.
I was a manager for over 25 years, and this was exactly the type of thing that I looked for.
LeetCode tests actually tend to bias against that kind of skill.
dblohm7
> I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs.
If only your experience was universal in that regard! I once had that role in an early-career job -- but I was looked down upon by peers and management because I was doing mostly maintenance work. The "good" developers, in their minds, were the ones shipping the most new features -- the irony being that those features would then blow up out in the field, at which time they landed on my desk to turn them into production-worthy code.
ForTheKidz
That's just poor management, IMO. The good ones will have your number in their cell phone to call when the stuff they shipped breaks (or even better, allow you to take the time you need to not ship broken code to begin with). Plus it doesn't take much time in the industry to realize that shipping a broken product is a far worse look than shipping slower, and that the faster you can fix a broken product the less money you'll bleed.
llm_trw
That's your problem. You keep the fixes for a rainy day when production is down and the business is losing $10m an hour.
>Yeah boss I can fix it, but how much is it worth to you since this isn't in my job description.
hinkley
I’ve been wondering lately how much of this is being good at troubleshooting, and how much of it is being good at picking up a problem someone else gave you, poking at it, and then putting it down again.
Not everyone is cut out to do that. Asking them to look at a puzzle derails their entire day, almost every time instead of just when it’s hard. So even when it’s their puzzle they resist picking it up because it’s a guaranteed bad day.
walledstance
Wonderful description. Thank you for capturing a snap shot that conveys the power of troubleshooting.
wiseowise
Textbook survivorship bias.
> I wouldn't need to help out nearly as many people because they could handle their own crises.
They don't need to, because there's always you who can figure out boring minutiae for them while they deliver business value.
pasc1878
Thus it is the team that is delivering business value, not the other developers on their own.
rokhayakebe
In fact you could easily be the guy they keep on a monthly retainer just for peace of mind.
lurk2
The word retainer has an appealing mercenary quality to it. The dream is that your knowledge of an esoteric system set up in the 1980s gets you warehoused in a data closet at a mid-sized organization, where you can spend the rest of your days browsing Hacker News and watching pirated films.
0manrho
> But reliable car mechanics don't get paid a lot.
Not a great analogy. Reliable car mechanics often get paid very well in comparison to their peers. Used to be one. Got into tech as a result of how much tech got into cars. Do they pay as well as tech jobs? Depends. I made more money and worked less hours than my buddy in IT at one of the largest corporations in America in the same city (granted not a techhub like SV, Seattle, NYC, etc, but most places aren't).
The key differentiator here is not time spent building vs time spent repairing (troubleshooting). It's knowing what's worth spending your time on, and when to say "No", because not everything needs fixed, nor is every problem necessarily yours to solve.
Truly good diagnostics skills is knowing what's worth spending time on, regardless of whether it's repairing something that exists or building something that doesn't. A tire with only 10% of treadwear could technically be replaced with something better, but is that worth anyone's time or money? Probably not. But if the tire on the opposite side is still brand new, and they were replaced at the same time, diagnostics tells you the alignment is off, and that issue - whatever it may be - very well could be worth everyone's time and money to fix.
Code is no different. Don't try to fix/improve/build everything. Focus on what matters. Good troubleshooting/diagnostic skills is a big part of knowing what does, and doesn't.
heisenbit
Troubleshooting skills are really valuable but hard to market. You can deal with lots of different technologies and effortlessly draw conclusions from domains that, for others, are totally disconnected. Sadly the tech market values expertise that is based on keywords. So while it is fun and creates huge value it is worth staying mostly on a path that can be explained to less mentally flexible mortals.
xerox13ster
This mentality through most of my career has left me trapped as technical support, and it's damn near impossible to climb out of the pit I've dug for myself. What you say about being seen as a car mechanic is true.
foo_barrio
This played out at my last place. My boss would assign my co-worker to build the world's crappiest car in the least amount of time and when it broke down I would be the only one that seemed to be able to fix it (while my co-worker was busy building some other crappy car). I would have built a much better car in the first place! However I would have taken more time and the goal was to build and release as fast as possible. My boss was okay with the risk of said crappy car, my co-worker got promoted and I slowly burned out.
It's a tough balancing act to make sure you sell yourself correctly and fight to work on things you want to!
steveBK123
We had a guy like this on our team once, it took a year to convince management he was a net drag on the team. Half the team quit, the other half said they would if they had to work with him any longer.
To prove the point we put him on a strategic rewrite and gave him master/trunk while the entire team moved to a feature branch for 6 months. This was complimentary to his ego as he was sick of us bureaucrats in the rest of the team telling him what to do and being such a burden on his genius creativity.
By the end he was unable to build / run his own branch, while the remaining team lost no velocity and was making regular releases to end users. The choice was easy at that point.
fjjjrjj
I feel this way about documentation. I do it, a lot. I get compliments and positive feedback on it. It helps me remember things I would otherwise forget. I hope that others would be inspired by my example but it hasn't happened. I could be selfish and hoard my own documentation and let others sink or swim. But that hurts me too as I'd have to pick up their slack.
MathMonkeyMan
I'm reminded of the Gervais Principle. Doing the work is not the way to "win," but not winning might be the better lifestyle. Depends on your motivations, aspirations, and ethics. It's easy to chase the total compensation number, because it's just _there_ and like what are we doing anyway? But then what are you doing, anyway?
ForTheKidz
> and it's damn near impossible to climb out of the pit I've dug for myself.
By far the easiest way to do so will be to find another job. If you can't do this, yea, mentality will lock you in to positions you don't want to be in.
pasc1878
The problem is that the new employer looks for CVs of those who achieve flashy things not those who fix things.
Fixing things only gets noticed by coworkers and good managers.
rokhayakebe
Until you start your own company, even if it is just you.
jemmyw
> But reliable car mechanics don't get paid a lot
I dunno, the mechanic I go to is reliable and so busy it's hard to get a slot these days, and he seems to be doing very well for himself. So many mechanics are unreliable.
imglorp
If you are debugging your work too much, maybe it's you.
Obligatory Kernighan’s law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
Curiositry
This is a fascinating take. I have been thinking about your comment for two days.
I think you're right in some cases (when working in a field one has mastered, for example), and I think I could probably go in the direction of getting it right the first time.
But the way I see it, any time I'm doing something new or innovative, I'm doing something I don't know how to do, which takes trial and error; and troubleshooting is basically figuring things out by trial and error, in a systematic way.
Though a lot of the time it is used for fixing bugs, I think troubleshooting as a skill and mindset is equally useful for creating new things, where you are solving for something.
ge96
time dilation
Curiositry
Based on the timestamps, it could only be. But this story, timestamps notwithstanding, was submitted by suprisetalk ~2 days ago.
Then it was placed in news.ycombinator.com/pool (https://news.ycombinator.com/pool?next=43176091), and got two comments; credit_guy's comment was one of them.
Then, today, it hit the frontpage.
Notice that if you mouse over "9 hours ago" on the story it shows the timestamp 2025-02-25. 9 hours ago was not 2025-02-25. If you mouse over the "7 hours ago" on credit_guy's comment, the timestamp shows 2025-02-26. One day after it was submitted, two days before it made the frontpage.
fatbird
My skill at troubleshooting has caused me to be the goto guy in every project, which lends great credibility and opportunities for leadership. Your pride in your troubleshooting skills isn't pride in a side-quest, it's pride in having a deep understanding of how systems work in general and in the specific.
"Good troubleshooter" might not look great on a CV, but all of your coworkers naming you as the most valuable member of the team, and a natural leader, is worth more than any feature launches.
tonyedgecombe
>My skill at troubleshooting has caused me to be the goto guy in every project
I stopped trying to be that person because it came with too many costs. It wasn't that I didn't want to do it but that I wanted other people to be able to in my absence.
linza
The value is in leadership, and being able to avoid certain classes of bugs from appearing in the first place. Troubleshooting just happens to be the skill that allows you to gain the knowledge to lead.
fatbird
Exactly:
"I found the root cause and corrected it. It may be an issue in these other two places so we should check there as well."
"Great. How do we avoid issues like this in the future?"
"By doing X thing a different way, and ensuring that Y thing is also in place."
fennecbutt
I feel like the article is just a very long-winded way to say what I try to help junior devs understand and that's: take a step back and start from the very top, change only one thing at a time, don't become fixated.
So many issues turn out to be the smallest configuration mishaps, this is why I also promote using a debugger as much as I can as well - there's nothing quite like being able to see _exactly_ what's going on.
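A rough sketch of "change only one thing at a time" for the config-mishap case, assuming you have a known-good config and a broken one as plain dicts. The keys, values, and the `fails` check are hypothetical; in practice `fails` would deploy or run the thing and look for the symptom.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel so "key absent" differs from any real value


def single_change_culprits(
    good: Dict[str, Any],
    bad: Dict[str, Any],
    fails: Callable[[Dict[str, Any]], bool],
) -> List[str]:
    """Apply each difference between `good` and `bad` on its own.

    Returns the keys whose change, applied alone to the known-good config,
    is enough to reproduce the failure.
    """
    culprits = []
    for key in sorted(set(good) | set(bad)):
        if good.get(key, _MISSING) == bad.get(key, _MISSING):
            continue  # identical in both configs, cannot explain the failure
        candidate = dict(good)  # always start again from the known-good state
        if key in bad:
            candidate[key] = bad[key]
        else:
            candidate.pop(key, None)  # key was removed in the bad config
        if fails(candidate):
            culprits.append(key)
    return culprits


# Hypothetical configs: two settings differ, only one actually matters.
good_cfg = {"cache": "off", "timeout": 30, "region": "eu"}
bad_cfg = {"cache": "on", "timeout": 5, "region": "eu"}


def fails(cfg: Dict[str, Any]) -> bool:
    """Stand-in for running the system and checking for the symptom."""
    return cfg["timeout"] < 10


result = single_change_culprits(good_cfg, bad_cfg, fails)
print(result)
```

One caveat: this only finds changes that break things on their own. If two settings have to change together to trigger the bug, the list comes back empty and you need to test combinations.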
ForTheKidz
> this is why I also promote using a debugger as much as I can as well - there's nothing quite like being able to see _exactly_ what's going on.
I don't know, the speed of just reading code and maybe inserting some diagnostic messages is hard to beat. It's a pretty bad day if I feel like I need to bust out a debugger—99% of "seeing _exactly_ what's going on" is not going to be relevant and will just distract you.
MrDarcy
The code rarely resembles the runtime state of the system. Debuggers are an incredible shortcut almost all the time.
ForTheKidz
I strongly and emphatically disagree on every point (I don't even know what your statement about runtime state and code even means actually, it just seems like a category error, but how do you even write code without being able to reason about runtime state? It just makes no sense.), but I understand there are people who love their debugger and I respect them.
Basically the only time I pop open the debugger is when it's otherwise difficult to see runtime behavior—say, a certain condition in a server for which you can't easily access logs. Outside of that it feels like major overhead and distraction from getting the bug fixed. Plus iterating without print statements is a tedious, tedious affair.
Don't get me wrong, it's a critical and necessary skill that junior devs often struggle to understand and master. I just think overreliance on the debugger will slow your velocity over time when most bugs have straightforward causes easiest to see by simply reading the code (which you'll have to do anyway with a debugger). I can't tell you how many times I've had to tell devs to put away their tools so we can calmly analyze the code without flipping back and forth between views. The vast majority of the time it's the second pair of eyes that resolves the problem.
SoftTalker
I use printf (or log messages, or whatever equivalent). Haven't found a need for more.
davidmurdoch
Depends on the platform. Using a JS debugger is so easy, yet few devs I know use it.
genghisjahn
From Zen and the Art of Motorcycle Maintenance: ‘There’s no fault isolation problem in motorcycle maintenance that can stand up to it. When you’ve hit a really tough one, tried everything, racked your brain and nothing works, and you know that this time Nature has really decided to be difficult, you say, “Okay, Nature, that’s the end of the nice guy,” and you crank up the formal scientific method.’
analog31
The more I hear about this book, the more I realize that I was way too young when I read it.
genghisjahn
I read it at age 18 and thought, "I should go buy a motorcycle and ride it around. That's the answer." Then I read it at age 30 and thought..."Oh, that wasn't the point at all."
mbonnet
So then what is the point?
wpm
Hugged to death for me right now, here's an archive link: https://web.archive.org/web/20250228192142/https://www.autod...
Curiositry
Thanks for posting an archive link. My site has survived previous HN traffic spikes on Fly.io's free tier, but 256mb of RAM wasn't quite adequate this time :)
RadiozRadioz
No disrespect, but I thought the whole point of these magic cloud platforms was that this situation never happens.
Curiositry
Yes, but you also need to be smart enough to operate magic cloud platforms, and be a paying customer. I am neither.
cdmyrm
The whole point of magic cloud platforms is to upcharge for everything and convince people there's no other way to run software.
mxschumacher
it's a static website, why does it have to consume RAM in the origin? I see Cloudflare is in front, are you caching HTML?
rs186
Related: https://news.ycombinator.com/item?id=42682602
Just finished the book recently. It's very insightful. As someone who considers himself good at debugging* and is still trying to improve debugging skills and efficiency, I view this book (and similar resources like this article) as a guide that also helps me reflect on what could have been done in a better way next time.
* Multiple times, I helped others find the root cause of a bug after they had spent hours on it and had no clue what was happening
maleldil
Seconded. I don't think I learned much from the book, but it helped make my thought process more structured and methodical. I had a friend recently start as a developer, and I strongly recommended this to them.
I believe Hillel Wayne gives a copy to every junior dev he meets.
sevensor
What I missed here was the importance of keeping careful notes as you go. What exactly happened when we constructed that weird input and commented out line 353? What hypotheses are we entertaining? Can we rule out any of them based on our evidence? It’s very easy to dupe yourself if you’re doing it all in your head.
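One way to keep those notes, as a minimal sketch: an in-memory log where every trial records the hypothesis, what was changed, what was observed, and whether the observation supports the hypothesis. The class names and the sample entries are made up for illustration; a text file works just as well if it captures the same fields.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Trial:
    hypothesis: str  # what we think is going on
    change: str      # the one thing we changed for this trial
    observed: str    # what actually happened
    supports: bool   # does the observation fit the hypothesis?
    at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class Notebook:
    def __init__(self) -> None:
        self.trials: List[Trial] = []

    def record(self, hypothesis: str, change: str, observed: str, supports: bool) -> Trial:
        trial = Trial(hypothesis, change, observed, supports)
        self.trials.append(trial)
        return trial

    def open_hypotheses(self) -> List[str]:
        """Hypotheses that no recorded trial has contradicted yet."""
        ruled_out = {t.hypothesis for t in self.trials if not t.supports}
        still_open: List[str] = []
        for t in self.trials:
            if t.hypothesis not in ruled_out and t.hypothesis not in still_open:
                still_open.append(t.hypothesis)
        return still_open

    def dump(self) -> str:
        """One JSON object per line, easy to paste into a ticket."""
        return "\n".join(json.dumps(asdict(t)) for t in self.trials)


# Hypothetical session.
notes = Notebook()
notes.record("parser chokes on empty fields", "fed it the weird input", "crash reproduced", True)
notes.record("cache returns a stale result", "disabled the cache, same input", "still crashes", False)
notes.record("parser chokes on empty fields", "commented out line 353", "no crash", True)

print(notes.open_hypotheses())
print(notes.dump())
```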
RedNifre
What do you think will be the last skills/jobs to go obsolete?
I think it's "Wanting the right thing" (This includes figuring out what the right thing is) and "Being able to articulate your wish clearly" (This includes clarifying your thoughts).
sevensor
Are there people with those skills today? They seem to be in terribly short supply. I’ve seen more than one company spin its wheels for ages because nobody could clearly express an operational vision.
LeonB
There will always be work for people who are smart, hard-working, creative, and willing to do exactly what their billionaire-owner asks.
jaakl
Well “building” is also troubleshooting, fixing a problem. Just at a somewhat more general level: ideally it is not fixing a “small” well-defined problem in software, but a bigger and fuzzier problem in the real world; the thinking process and tooling are much the same. Of course many devs don't think of it like that, they just try to fulfill given requirements without understanding the real problem they are troubleshooting. Actually a lot of software “builds” are really troubleshooting attempts on top of other software too, which makes that border even fuzzier.
emrah
If one is to be pedantic, troubleshooting does not involve fixing, which is a separate and also valuable skill
herczegzsolt
Often the fix is trivial once the issue has been troubleshot properly.
You may still wish to not fix it for various reasons.
nonesuchuser
Half the comments here are nitpicking the car mechanic analogy (naturally), the other half are complaining about the site shitting the bed.
Yes, debugging is important, and too many people can't do it, which is unsettling considering how many bugs those people are putting into the code in the first place.
mrayycombi
Brendan Gregg's USE method is for performance troubleshooting but could work in any situation (broken is just the worst performance, right?)
Troubleshooting is one of my main comparative advantages - I'm better at it than I am at programming and I enjoy it more. It's also a relatively independent skill, not everyone is good at it or likes it. It reminds me of the lateral thinking puzzles I did as a kid where you had to ask questions to uncover whatever the weird situation was. You have to question your assumptions - think about how you might be wrong, something I like to do in general anyway.
There's a certain way of reasoning about the problem and thinking about what it might be with limited information in a systematic way. It's also a bit broader than debugging - you can do a lot of troubleshooting (sometimes faster and more effectively) by doing things other than reading the code.
It's also been somewhat of a career advantage because it seems to be both rarer than standard dev skill and something that most people dislike (while it's my favorite thing to do). It also overlaps a lot with other more general types of problem solving.
Anyway - a lot of the article resonates with how I think about it too.