Saturday, 25 November 2017

Where did my code crash?


If you don't have a suitable JTAG debugger handy, a program crash in a microcontroller can be annoyingly difficult to trace, especially if it happens only occasionally. Here is one trick for tracking down those crashes with relatively low overhead:

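#include <stdio.h>
#include <string.h>

char dtraceBuff[32];

/* Remember the latest checkpoint. strncpy stops reading at the end of
   the string, so short literals are not read past their terminator. */
void dtrace(const char *data)
{
  strncpy(dtraceBuff, data, sizeof(dtraceBuff) - 1);
  dtraceBuff[sizeof(dtraceBuff) - 1] = 0;
}


/* Print the checkpoint that is currently stored in the buffer. */
void dtraceOut(void)
{
  dtraceBuff[sizeof(dtraceBuff) - 1] = 0;
  printf("DTrace: %s\n", dtraceBuff);
}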


Now you can sprinkle dtrace() calls liberally around the code:

dtrace("intHandler");
dtrace("intHandlerEnd");
dtrace("mainloop 1");
dtrace("mainloop 2");
...


At the beginning of main() (after your output system is initialized, of course), put a call to dtraceOut(). Replace the printf() there with your favorite method of getting debug data out of your MCU.
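As a rough sketch of where the calls end up, the layout could look like the following. The names initOutput(), readSensors(), updateOutputs(), startup() and mainLoopOnce() are hypothetical placeholders for your own code, not part of the trick itself; dtrace() is written here with a bounded string copy.

```c
#include <stdio.h>
#include <string.h>

char dtraceBuff[32];

void dtrace(const char *data)
{
  /* Bounded copy that always leaves a terminating zero. */
  strncpy(dtraceBuff, data, sizeof(dtraceBuff) - 1);
  dtraceBuff[sizeof(dtraceBuff) - 1] = 0;
}

void dtraceOut(void)
{
  dtraceBuff[sizeof(dtraceBuff) - 1] = 0;
  printf("DTrace: %s\n", dtraceBuff);
}

/* Hypothetical placeholders for application code. */
static void initOutput(void)    { /* e.g. set up the UART behind printf */ }
static void readSensors(void)   { }
static void updateOutputs(void) { }

/* Call once at the top of main(). */
void startup(void)
{
  initOutput();   /* output must work before anything is printed */
  dtraceOut();    /* report whatever the buffer currently holds */
}

/* One pass of the main loop; call repeatedly from main(). */
void mainLoopOnce(void)
{
  dtrace("mainloop 1");
  readSensors();
  dtrace("mainloop 2");
  updateOutputs();
}
```

In a real program main() would call startup() once and then mainLoopOnce() in an endless loop.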

So, when the code crashes and restarts, you can see immediately that DTrace says, for example, "mainloop 1". So what happens between "mainloop 1" and "mainloop 2"? If the issue isn't immediately clear, add more dtrace() calls there and try again. Eventually you will find the crash point.

But wait: if you just copy and paste the above into your program, you will quickly find out that it doesn't actually work. No matter what happens, DTrace will only print an empty string. Why is this?

The C standard requires that variables with static storage duration and no explicit initializer start out as zero. So when the program crashes and the MCU is reset, the C startup code wipes all such data before main() runs, including dtraceBuff here.
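The effect can be imitated on a PC. This is only a simulation under stated assumptions: fakeBss stands in for the RAM area the startup code clears, and fakeStartupClear() stands in for the clearing step; both names are made up for this illustration, and real startup code is toolchain specific.

```c
#include <string.h>

/* Stand-in for the RAM region that holds zero-initialized variables. */
char fakeBss[64];

/* Conceptually what startup code does after every reset, before main(). */
void fakeStartupClear(void)
{
  memset(fakeBss, 0, sizeof(fakeBss));
}

/* Record a checkpoint, "reset", and return the length of what survives. */
size_t checkpointLengthAfterReset(void)
{
  strcpy(fakeBss, "mainloop 1");  /* what dtrace() stored before the crash */
  fakeStartupClear();             /* reset: the startup code runs again */
  return strlen(fakeBss);         /* nothing of the checkpoint is left */
}
```

checkpointLengthAfterReset() returns 0, which is exactly the empty string that dtraceOut() ends up printing.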

There are, however, ways to prevent this, and the exact method depends on the compiler used. For Microchip's XC16, there is the persistent attribute:

char dtraceBuff[32] __attribute__((persistent));

For GCC (ARM), I couldn't quickly find a similarly simple attribute. The next best option is to work with the linker script and related __attribute__s, reserving part of RAM for persistent (non-cleared) data and placing your variables there. This is a somewhat more complex task than a single attribute, so I will not go into details in this post. Personally, I have modified the startup code I use so that it does not clear any variables, but if you do that, you absolutely must remember to initialize everything yourself, and that can itself be a source of subtle bugs.
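Just for orientation, the linker script route might look roughly like this on the C side. Everything specific here is an assumption: the section name .noinit, the memory region name RAM, and the DTRACE_USE_NOINIT build flag are illustrative, and your linker script and startup code must really leave that section alone for this to work.

```c
#include <string.h>

/*
 * Assumption: the linker script contains an output section that is neither
 * loaded nor cleared at startup, for example (GNU ld syntax, region name
 * RAM is hypothetical):
 *
 *   .noinit (NOLOAD) : { *(.noinit) *(.noinit.*) } > RAM
 *
 * and it lies outside the range that the startup code zeroes.
 */
#ifdef DTRACE_USE_NOINIT   /* hypothetical flag, set only for the target build */
#define DTRACE_PERSISTENT __attribute__((section(".noinit")))
#else
#define DTRACE_PERSISTENT  /* ordinary variable, e.g. when building on a PC */
#endif

char dtraceBuff[32] DTRACE_PERSISTENT;

void dtrace(const char *data)
{
  /* Bounded copy that always leaves a terminating zero. */
  strncpy(dtraceBuff, data, sizeof(dtraceBuff) - 1);
  dtraceBuff[sizeof(dtraceBuff) - 1] = 0;
}
```

Keep in mind that after a cold power-up such a buffer holds whatever happened to be in RAM, so the first printout may be garbage rather than an empty string.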

There is also the option of using nonvolatile memory for trace data, but I'd recommend not going that route unless you absolutely, positively have to. Nonvolatile memories can typically handle only so many writes before they wear out (flash is typically rated for around 10k erase cycles and EEPROM for roughly 100k to 1M, depending on the part; MRAM and FRAM, on the other hand, have practically "infinite" life), and unless they are embedded in the MCU they are slower to access. Sometimes, however, that may be the only option.
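To get a feeling for how quickly tracing can eat through that endurance, here is a back-of-the-envelope helper. It assumes, as a simplification, that every dtrace() call costs one full erase/write cycle on the same location; real parts erase by sector and drivers may spread writes around, so treat the result as a rough worst case. The function name and the trace rate used below are made up for the illustration.

```c
/* Seconds until one memory location reaches its rated endurance when it
   is rewritten writesPerSecond times every second. */
double secondsToWearOut(double enduranceCycles, double writesPerSecond)
{
  return enduranceCycles / writesPerSecond;
}
```

For example, secondsToWearOut(10000.0, 1000.0) gives 10: a location rated for 10k cycles that is hit by a thousand trace writes per second is used up in ten seconds.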



