Friday, 8 February 2019

Corrupt debug data


I work mostly on embedded stuff, in a more "classical" sense than the current trend. That is, I work on "bare iron", with no OS between my program and the chip. That is how I've done things and that is how I still do them - although in some cases something like a Raspberry Pi core on a custom PCB, with the application-specific stuff placed on it, is a better choice. And sometimes feature creep makes things go sideways: where the original design was easy to do as a fully custom board with some tiny microcontroller, added requirements pile on more and more weight, to the degree where it would have been better to use an RPi to begin with. Unfortunately that would have required a full redesign, which was just not possible due to things I won't go into here.

But that was not what I was going to write about anyway, so disregard that. I was talking about bare iron stuff. When I was getting started, 20 years ago now, in-circuit debuggers/emulators were crazy expensive and I couldn't afford them. So for debugging I used the next best thing: debug output. The system prints out debug statements that are part of the code, to be removed from the release build.
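For anyone who hasn't done it this way, here is a minimal sketch of what "debug prints that are removed from the release build" can look like in C. This is not my actual code: the `DEBUG_BUILD` switch, the `DBG_PRINT` macro, `dbg_printf` and `dbg_line` are all made-up names for illustration, and I'm assuming the toolchain provides `vsnprintf`, which isn't a given on every tiny MCU.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical build switch: pass -DDEBUG_BUILD for development builds only. */
#ifdef DEBUG_BUILD
#define DBG_PRINT(...) dbg_printf(__VA_ARGS__)
#else
#define DBG_PRINT(...) ((void)0)  /* release build: the call disappears */
#endif

/* Last formatted message; stands in for the UART transmit buffer. */
static char dbg_line[128];

/* Format a message into dbg_line and return the vsnprintf result.
 * On real hardware the formatted bytes would then go to the debug
 * serial port (that part is left out of this sketch). */
int dbg_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(dbg_line, sizeof dbg_line, fmt, ap);
    va_end(ap);
    return n;
}
```

One thing to keep in mind with this kind of macro: in the release build the arguments are never evaluated, so a `DBG_PRINT` call should not contain anything with side effects.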

My first designs often had just one serial port, usually due to the limited MCU I was using, so this resulted in weird and sometimes even twisted systems where the serial port was shared between the main application and the debug stuff. Good times, good times.

Things are very different now, but I still don't like debuggers with my embedded stuff, still preferring debug prints, these days with a dedicated serial port that just isn't populated on production boards.

This of course makes crashes sometimes hard to track down, so I've developed some tools for that, which I've mentioned before. Now I've also got a network-enabled logger, where the latest debug data is stored on the device in case of issues and sent out over WLAN. The methods have become more sophisticated, but the idea is still the same: when there is a problem, I use my psychic debugging powers (thanks to Raymond Chen for that term) to make my initial guess on what the problem might be - often based on the very limited information the (very often non-technical, to put it mildly) customer gives me - and throw debug prints in there. More often than not my guesses end up being correct and the issue is found and fixed. Granted, sometimes debug prints aren't even needed, as the issue can be found just by reading the code and comparing it against the reported symptoms.
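The "latest debug data stored on device" part is basically a ring buffer that the debug prints go into, so that the newest output survives until someone asks for it. A rough sketch of the idea follows; again this is not my production code, and `LOG_SIZE`, `log_put` and `log_snapshot` are hypothetical names picked for this example. How the snapshot then gets sent over WLAN is left out entirely.

```c
#include <stddef.h>

#define LOG_SIZE 256u  /* hypothetical capacity in bytes */

static char log_buf[LOG_SIZE];
static size_t log_head;   /* next write position */
static size_t log_count;  /* number of valid bytes, at most LOG_SIZE */

/* Append a string to the ring buffer, overwriting the oldest bytes
 * once the buffer is full. */
void log_put(const char *s)
{
    while (*s != '\0') {
        log_buf[log_head] = *s++;
        log_head = (log_head + 1u) % LOG_SIZE;
        if (log_count < LOG_SIZE) {
            log_count++;
        }
    }
}

/* Copy the newest stored bytes into out, in the order they were
 * written. Returns the number of bytes copied (no terminator added). */
size_t log_snapshot(char *out, size_t out_size)
{
    size_t n = (log_count < out_size) ? log_count : out_size;
    size_t start = (log_head + LOG_SIZE - n) % LOG_SIZE;
    for (size_t i = 0; i < n; i++) {
        out[i] = log_buf[(start + i) % LOG_SIZE];
    }
    return n;
}
```

Keeping the newest bytes rather than the oldest is a deliberate choice here: after a crash, the last few lines before it are usually the interesting ones.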

But sometimes this system fails me, like just now.

I was working with a new product, making some relatively small changes around the application codebase (at the moment weighing in at some 37 kloc - not really that much for a system with a full-color graphic display, touch screen, WLAN, Bluetooth and GPS in it - and remember, no OS or libraries to take care of those either) when I suddenly noticed my debug prints were corrupt. Messages seemed to be half of message A and half of message B, and sometimes parts of them were missing completely.

I write in C. You can guess what my initial guess was. Memory corruption, must be. Some change I made soiled itself and everything around it - including the debug data buffer where the data is stored briefly before being pushed out through the serial port.

I spent a full hour looking for this issue, to no avail. No matter what I did, no matter which parts of the application I disabled (trying to isolate the issue), the problem remained.

Eventually I got a vague hunch and unplugged the USB-RS232 converter I've been using and plugged it back in. Problem fixed, no more corruption.

Now that I've got that figured out, I do think that I've run into this issue once before, but as much as I try, I can't recall what exactly happened back then, so this hunch may have been more of a recollection.

What makes this curious is that I've been using these same USB-RS232 converters for more than 10 years now, with a notable lack of this kind of issue. This very specific converter I've had almost from the start - I recognize it by the red tape I've wrapped around it (because I often have three or four of these on my desk, all plugged into the computer - how would you recognize the correct one without some kind of marking?).

The only variable here is that this computer is running Windows 10, with its built-in USB-RS232 drivers, while my main work computer still has Windows 7 on it with manufacturer-provided drivers. Makes me wonder about the reliability of newer systems, just a little bit.



