Monday, 8 January 2018

Interesting bug tracking session.


Recently I was testing a new board I had been working on. I already knew that I needed to make some minor changes here and there, but before committing to the next iteration I wanted to check the remaining parts of the board. So far there was nothing major: a few component changes, a few tweaks here and there.

So I was going through things quickly. Write a small software stub to test a piece of hardware, upload, run, seems to be working, next... Until I got to a certain radio module. And it just crashed the software. Based on software traces (see this post about that) it seemed to crash in the middle of a short delay loop. The same loop that is used everywhere in the same code.

...what?

This just didn't make any sense. I have said it before, I am not a fan of embedded debuggers, but once again I had to dive in. So I traced the offending code, and whenever the software hit this one specific instruction, it crashed.

The instruction: (ARM assembly, exact format IIRC)

str r3,[r0, #20]

Store the contents of register r3 to memory, at address (r0+20).
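
For readers less used to assembly: this is an ordinary word store, the kind a compiler emits when writing a struct field or a peripheral register 20 bytes past a base pointer. A minimal C sketch of the equivalent operation follows. The helper name `store_word` is made up for illustration, and the actual base address and target register are not recorded in this post, so any memory passed in is just a stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at (base + byte_offset).
 * C-level equivalent of `str r3, [r0, #20]` with
 * r0 = base, r3 = value and byte_offset = 20. */
void store_word(volatile uint32_t *base, uint32_t byte_offset, uint32_t value)
{
    /* Keep the access word-aligned, as the original instruction is. */
    assert(byte_offset % sizeof(uint32_t) == 0);
    base[byte_offset / sizeof(uint32_t)] = value;
}
```

Calling `store_word(regs, 20, value)` on a `uint32_t regs[8]` array writes `regs[5]`, which is all the instruction itself does. Nothing in it can fault as long as the address is valid.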

...what?

Everything about this instruction is correct. The register being stored, the address it is written to, no errors there. And still it just crashes. Every single time. Mind you, at this point I had fiddled with the code somewhat, so in the meantime the timings had changed slightly - yet it was crashing on this same instruction every time.

At this point I went back to an earlier hunch I had already dismissed once (after a quick measurement that turned up nothing at the time). As this is a radio module we're talking about, it's a bit power-hungry. Especially at startup. And there are some large-ish (relatively speaking) capacitors there too. And part of the code that crashed was controlling the FET that feeds power to the module...

Aaand yes, that was the issue, as a somewhat more careful measurement showed. When power was enabled through the FET, there was a very short - a few hundred microseconds long - drop in the 3.3 V rail. Not even a large drop - 300 mV or so - but just enough to trigger the external voltage monitor/reset circuit of the main MCU. Causing a "crash". Not the kind of crash I was expecting, but anyway.

So, is this board pulling so much current during startup that it actually makes the power rail sag that badly? Doesn't sound likely, but it seemed so. So I added some caps to the supply side of the FET. Then some more caps. And more. No change.
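
For reference, here is roughly why extra supply-side capacitance should help if the sag comes only from charging the module's capacitors. This is a rough sketch assuming ideal charge sharing (no regulator response, no series resistance). The function name and the capacitor values in the note below are hypothetical, not measured from this board.

```c
#include <assert.h>

/* Rail voltage right after an uncharged load capacitance is switched
 * onto the rail, assuming ideal charge sharing:
 *   V_after = V_rail * C_rail / (C_rail + C_load)
 * Capacitances are in farads, voltages in volts. */
double rail_after_switch(double v_rail, double c_rail, double c_load)
{
    assert(c_rail > 0.0 && c_load >= 0.0);
    return v_rail * c_rail / (c_rail + c_load);
}
```

With, say, 100 µF on the rail and 10 µF behind the FET, `rail_after_switch(3.3, 100e-6, 10e-6)` comes out at about 3.0 V, a 300 mV dip. Doubling the rail capacitance should shrink that dip noticeably, so when piling on caps changes nothing at all, the load is probably not just capacitor inrush.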

Next, feed the module power directly. It takes 200 mA, 500 mA, 1 A? No, this can't be right anymore. So on another hunch, take out the multimeter and do some resistance measurements... Aaaand right. The module's supply is shorted to ground.

This is a professionally assembled board, by the way, with machine pick-and-place and reflow soldering. Should be all good, but I guess not.

This module comes in a QFN-type package, meaning that the contacts are underneath it, completely inaccessible. The pitch isn't exactly tiny - 1.25 mm - but the contacts have just 0.25 mm of space between them. Very, very close to each other. As it (obviously) had to be, the board I was testing happened to have a misplaced module, approximately 0.5 mm away from where it should be. Just great.
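
A quick back-of-the-envelope check shows why 0.5 mm is enough to cause trouble. The sketch below assumes each module contact is as wide as its footprint pad (pitch minus gap, so 1.0 mm here) and that the module is shifted along the pad row. Both are assumptions for illustration, and `neighbour_overlap_mm` is a made-up helper.

```c
#include <assert.h>

/* Overlap (in mm) between a shifted module contact and the NEXT pad
 * in the row. Assumes contact width == pad width == pitch - gap and a
 * shift smaller than one pitch, along the row.
 * A result > 0 means the contact touches both its own pad and the
 * neighbouring one, i.e. it bridges them. */
double neighbour_overlap_mm(double pitch, double gap, double shift)
{
    assert(pitch > gap && gap >= 0.0);
    assert(shift >= 0.0 && shift < pitch);

    double width = pitch - gap;              /* pad / contact width        */
    double overlap = shift + width - pitch;  /* far edge vs. next pad edge */
    return overlap > 0.0 ? overlap : 0.0;
}
```

With the numbers above, `neighbour_overlap_mm(1.25, 0.25, 0.5)` gives 0.25, so under these assumptions each contact would reach 0.25 mm onto the neighbouring pad. Any shift larger than the 0.25 mm gap starts bridging.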

The annoying thing with these QFNs is that they're extremely hard to desolder, at least without destroying them in the process with excessive heat. Essentially impossible with my tools, unless I wanted to destroy most of the board too - and I do have a relatively good hot air soldering station.

So I am kinda out of good options here. If this were a production run, I'd be having a hard discussion with the manufacturer (about 40% of the boards made had this issue), but as this is just a prototype, it might be easier to just junk the bad boards and try to figure out how to improve the footprint of this part to make it easier to solder. Not an easy job, there.





