Saturday, 14 September 2019

Debugging level: psychic


Over time, I've developed a kind of unofficial policy on error reports concerning our devices.

Report one of an issue: "Weird." (might be user error, might be a case of cosmic rays, might be something else)
Report two, relatively close to the first: "...Okay, that's curious."
Report three: "I need to take a look."

Getting a good, detailed report also helps. You know what I mean, as opposed to "I did that and it said something and then I just pressed everything and anything and now everything is gone." (cue facepalm.)

Yes, this situation happens way too often, as people react to an unexpected situation by panicking, when a somewhat more correct procedure would be to take a step back and consider what to do. Like picking up the phone before touching anything.

But when it comes to actually troubleshooting these cases (the actual issues, that is), these days it seems I do most of it just by reading the most-likely-relevant code and trying to figure out what needs to happen to cause the reported issue. Not that I could do much else: when a customer calls to tell me about a very rare case of weird behavior, it is usually something I can't reproduce in my test environment, at least not easily.

Most of the time that doesn't help and I have to drop the issue. But like I've often said, the mind keeps working on these things even when they are at the back of your mind. Some processing happens in the background, and the next time I open the editor things look a bit different, and maybe some other location needs to be examined too. That might not be enough for a breakthrough, but repeat this a few more times and usually I can construct a theory of what could go wrong to cause the observed issues, and fix it.

Of course I might be wrong and it is something else. But even then, at least that single potential issue is now in order. Moving on to the next.

I think this ability to reason about what is going on, as opposed to using a debugger or logging, is a skill anyone can develop. I do use debuggers and logging too, but when the issue isn't one that I can observe directly, those don't help. Having written all of the code, including the interrupt handlers, helps as well, since I know it inside out, but knowing everything is not required. What is required is knowing the parts that interact, even indirectly. Reading, say, the data processing code doesn't help if there is, for example, an interrupt handler elsewhere that also touches that data and you don't know about it.
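To make that last point concrete, here is a minimal sketch of the kind of hidden interaction meant above. It is not taken from any real device: the two-byte tick counter, timer_isr, and the fire_between hook are all made up for illustration, and the hook only simulates an interrupt landing at the worst possible moment so the effect can be seen on a desktop compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit tick counter kept as two bytes, as on an 8-bit MCU
   that cannot read a 16-bit value in a single instruction. */
volatile uint8_t tick_lo;
volatile uint8_t tick_hi;

/* Test helper: put the counter into a known state. */
void set_ticks(uint16_t value)
{
    tick_lo = (uint8_t)(value & 0xFF);
    tick_hi = (uint8_t)(value >> 8);
}

/* Stand-in for a timer interrupt handler: increments the counter,
   carrying into the high byte when the low byte wraps. */
void timer_isr(void)
{
    tick_lo = (uint8_t)(tick_lo + 1);
    if (tick_lo == 0) {
        tick_hi = (uint8_t)(tick_hi + 1);
    }
}

/* Hook used only to simulate an interrupt firing between two reads. */
typedef void (*irq_hook)(void);

/* Naive read: looks perfectly fine if you only read this function. */
uint16_t read_ticks_naive(irq_hook fire_between)
{
    uint8_t lo = tick_lo;
    if (fire_between != NULL) {
        fire_between();            /* "interrupt" lands right here */
    }
    uint8_t hi = tick_hi;
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* One possible fix: re-read until the high byte is stable around the read. */
uint16_t read_ticks_checked(irq_hook fire_between)
{
    uint8_t hi, lo;
    do {
        hi = tick_hi;
        lo = tick_lo;
        if (fire_between != NULL) {
            fire_between();        /* simulate a single interrupt */
            fire_between = NULL;
        }
    } while (hi != tick_hi);       /* high byte changed: value was torn */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

With the counter at 0x00FF and the simulated interrupt firing between the two reads, read_ticks_naive returns 0x01FF, a value the counter never held, while read_ticks_checked returns 0x0100. On real hardware this happens only when the interrupt lands in a window a few instructions wide, which is exactly why such a bug shows up as a rare report from the field and not in the test environment. Briefly disabling interrupts around the read would be another common way to close that window.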

After that, it's mostly about figuring out what could happen. This is not rocket science (well, unless it quite literally is something like a rocket control module), but it takes some practice and a suitably paranoid mindset to think about unexpected events you might have seen or, really, thought of.


