torstai 21. lokakuuta 2021

DSP optimization : flip bits

Story time.

Long, long time ago a friend and I worked on very different projects, details not important but he worked with extremely low power DSPs, I with larger embedded stuff, but from my reading and on topics we discussed I got some familiarity with DSP too. None that would apply to real work, but basics.

Mind you, I only know (and only roughly) how things were (with this DSP) back then; I haven't followed DSP advances since then, so any parallels to today's hardware are pure guesswork.

DSPs, for those who don't know, were (and are, I guess) most of the time very powerful when doing tasks they are suited to and do it with very small amount of power, but that tradeoff comes with a cost of programming complexity. And all code would be written in straight assembly, absolutely no one was feeling masochistic enough to use C for these babies.

So these DSPs were very finicky. They had basic pipelining inside (i.e. DSP can start processing next instruction(s) while previous is still processed) but they didn't have any pipeline protection. Like first instruction might use register X, and $deity save you if you tried to use register X again before first instruction was finished. This would cause conflict and bad data. And this applies to branching too.

Branhcing was expensive in terms of processing cycles, so you didn't really want to use that in processing loop. DSP had handy "REPEAT n" instruction that could be used instead in tight loops that had no branching cost on top of fixed initial setup cost of few cycles.

Once he mentioned that he needed to do invert bits in 16-bit word. Bit 0 becomes bit 15, bit 1 becomes 14 and so on. This is quite trivial operation in theory; something like "REPEAT 16 ; ROL A ; ROR B"; rotate highest bit from A to carry bit, then carry to B, repeat 16 times. There was no specific instruction for this so it needed to be manual (or said instruction could not be used for some reason).

Except this needed to be inside existing REPEAT loop. There was only one level of REPEAT, you could not have another repeat withing REPEAT. 

Only answer we thought up was to unroll this loop. ROL A; ROR B; ROL A ... total of 16 times. Ugly as sin, but got the job done.

Later I heard that one after one pretty much his entire software team had dropped by, trying to figure out how they could make this prettier. None could, which in the end they grumblingly admitted, so this code stayed in. I got a small chuckle out of this mental picture.