Tuesday, 24 February 2015

Serial protocols; frames and sequence numbers

At last I got around to writing the last (for now) chapter on serial protocols. The reason for the delay was that I was actually implementing this system for the first time, for a new board I was making: a "simple" test generator that can generate pulses, generate voltage drop sequences, count incoming pulses and so on. This also means that this isn't Arduino code anymore; the board I made uses a PIC24 series MCU so the code is a bit different, but adapting it shouldn't be too difficult.

This was actually version three of the pulse generator I made; the second version is now in use but has some problems. The first is that since it's a modified production board with hacked-on circuitry it isn't very scalable (thus the specifically designed new board for the third iteration); the second is its serial protocol.

The protocol used in v2 was loosely what I presented in the previous chapter, but the problem is that the send, timeout, re-send sequence sometimes does things twice; not very fun when you are trying to generate very specific pulse sequences. Due to this the tester actually fails more often than the product being tested. Annoying but not fatal (run the test again and it's good), thus the third revision.

But I digress; on to the protocol itself.

Data frames. I'm defining a data frame here as a sequence of bytes where the Start Of Frame (SOF) character always starts data reception and the End of Frame (EOF) character ends it. Data between these two characters is the actual payload. This makes synchronization of communications easy; all you need to do is listen for SOF and start processing data when it is received.
Of course this means that you can't use SOF and EOF characters as such in payload data, so we assign a third character, ESCape. Whenever the data stream has SOF, EOF or ESC in it, it is changed to ESC-SOF, ESC-EOF or ESC-ESC so that the situation can be detected.
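
To make the escaping rule concrete, here is a minimal stand-alone sketch of both directions. The function names (stuffBytes, unstuffBytes) and the _CH suffixed defines are made up for this illustration and are not part of the actual firmware; the character values are the same ones used in the listing later in this post.

```c
#include <stddef.h>

/* Control characters; same values as in the listing later in this post. */
#define SOF_CH 0xA8
#define EOF_CH 0xD5
#define ESC_CH 0xF0

static int isControl(unsigned char c)
{
  return (c == SOF_CH) || (c == EOF_CH) || (c == ESC_CH);
}

/* Escape payload: every SOF/EOF/ESC byte gets an ESC in front of it.
 * Returns escaped length, or 0 if input is empty or out[] is too small. */
size_t stuffBytes(const unsigned char *in, size_t inLen,
                  unsigned char *out, size_t outMax)
{
  size_t i, o = 0;
  for (i = 0; i < inLen; ++i)
    { if (isControl(in[i]))
        { if (o >= outMax) return 0;
          out[o++] = ESC_CH;
        }
      if (o >= outMax) return 0;
      out[o++] = in[i];
    }
  return o;
}

/* Reverse operation: drop each ESC and keep the byte after it as-is.
 * out[] must have room for inLen bytes. Returns the unescaped length. */
size_t unstuffBytes(const unsigned char *in, size_t inLen, unsigned char *out)
{
  size_t i, o = 0;
  for (i = 0; i < inLen; ++i)
    { if ((in[i] == ESC_CH) && (i + 1 < inLen))
        ++i; /* skip the escape, copy the next byte verbatim */
      out[o++] = in[i];
    }
  return o;
}
```

For example payload 01 A8 F0 02 goes on the wire as 01 F0 A8 F0 F0 02, and comes back as the original four bytes after unstuffing.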

Then the sequence numbers. While those are not related to the framing above, I'm covering both topics here because both were implemented in the code I made. By adding a simple sequence number to commands we can detect situations where a command is sent multiple times. If commands are always run to completion before starting a new one the system doesn't even need to cache them, just keeping track of the latest one would be enough, but I've added a short history to the example here. With a slight modification this also allows state tracking of long-running commands (say, "generate 1000 pulses @ 50 Hz"); when starting execution you send an "Executing" response, and when done you change it to "Done". So the master can just send the same request (with the same seq id) again and again until "Done" is received -- assuming of course that the command parser is run in the middle of execution too.
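
The "Executing"/"Done" idea can be sketched with a tiny simulation. Everything here is hypothetical and only for illustration: the reply codes, the SimSlave structure and the fixed three polls a command takes are my assumptions, not what the real board does.

```c
/* Hypothetical reply codes; real firmware may use something else. */
enum { REPLY_EXECUTING = 'E', REPLY_DONE = 'D' };

/* Tiny simulated receiver: remembers the sequence ID of the command it is
 * running and how many polls remain until that command is finished. */
struct SimSlave {
  unsigned char activeSeq;   /* 0 = nothing received yet */
  unsigned char pollsLeft;   /* polls until the command completes */
  unsigned int  startCount;  /* how many times a command was really started */
};

/* Handle one request. A repeated sequence ID never restarts the command;
 * it only reports the current state. */
int slaveRequest(struct SimSlave *s, unsigned char seq)
{
  if (seq != s->activeSeq)
    { s->activeSeq = seq;     /* new command: start it */
      s->pollsLeft = 3;
      ++s->startCount;
      return REPLY_EXECUTING;
    }
  if (s->pollsLeft > 0)
    { --s->pollsLeft;
      return (s->pollsLeft == 0) ? REPLY_DONE : REPLY_EXECUTING;
    }
  return REPLY_DONE;          /* finished earlier; repeat the stored answer */
}

/* Master side: repeat the same request until "Done" or until we give up.
 * Returns number of requests sent, or -1 if maxTries was not enough. */
int masterRunCommand(struct SimSlave *s, unsigned char seq, int maxTries)
{
  int tries;
  for (tries = 1; tries <= maxTries; ++tries)
    { if (slaveRequest(s, seq) == REPLY_DONE)
        return tries;
    }
  return -1;
}
```

The point to notice is startCount: no matter how many times the master repeats the request, the command itself is started only once per sequence ID.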

In the following code I'm using frame format "SOF SeqID Data0...DataN BCC EOF". The command's SeqID is in range 1..15 (0x01..0x0F) (since we store only four results a larger range is not needed), and the reply's SeqID is the command's SeqID ORed with 0x80, so 0x81..0x8F. Just before EOF comes the checksum (I'm used to the short form "BCC", which is technically not correct here, but bear with me).
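
The ID arithmetic is simple enough, but for the master side it could look something like this. These three helpers are a sketch with names I made up; they are not in the receiver code below.

```c
/* Next command sequence ID: 1..15, wrapping back to 1 (0 is never used). */
unsigned char nextSeqId(unsigned char seq)
{
  return (seq >= 0x0F) ? 0x01 : (unsigned char)(seq + 1);
}

/* Reply ID for a command ID: same number with bit 7 set (0x81..0x8F). */
unsigned char replyIdFor(unsigned char cmdSeq)
{
  return (unsigned char)(cmdSeq | 0x80);
}

/* Does a received reply ID answer the command we sent? */
int replyMatches(unsigned char cmdSeq, unsigned char replyId)
{
  return (cmdSeq >= 0x01) && (cmdSeq <= 0x0F) &&
         (replyId == replyIdFor(cmdSeq));
}
```

Note that none of the valid IDs (0x01..0x0F, 0x81..0x8F) collides with the control characters, so the ID byte never needs escaping.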

#include <string.h> // memcpy

// Control characters
#define SOF 0xA8
#define EOF 0xD5  // note: clashes with EOF of <stdio.h> if that header is included
#define ESC 0xF0

// Command receive buffer
unsigned char cmdBuff[128];
unsigned char cmdBuffLen, cmdBuffPrevChar;

// Buffer for last 4 results
#define MAX_CMD_RESULTS 4
// [0]=command ID, [1]=length, rest=raw payload data
unsigned char cmdResults[MAX_CMD_RESULTS][10]; 

We don't want the checksum to match any of the control characters, so here is a slight modification to the previous calculation (and again, this is not in any sense a "secure" method, just a quick and dirty way to catch basic bit flips and such during communication).

unsigned char calcBcc(unsigned char *buff, unsigned char len)
{
  unsigned char i;
  unsigned char sum = 0x12;
  for (i = 0; i < len; ++i)
    { sum = sum ^ buff[i];
    }
   // checksum can't be any of special chars, so change it.
  if ((sum == SOF) || (sum == EOF) || (sum == ESC)) 
    --sum;
  return sum;
}

The main receiver loop that handles frame reception. This assumes that the actual receive part is handled elsewhere (typically in an interrupt). This should typically be run in the main loop.

  while (serialGetRxBufferSize())
    { unsigned char c = serialGetRxChar();
       // Start of Frame and not escaped?
      if ((c == SOF) && (cmdBuffPrevChar != ESC))
        { cmdBuff[0] = c;
          cmdBuffLen = 1;
          cmdBuffPrevChar = 0;
        }
        // EOF and not escaped?
      else if ((c == EOF) && (cmdBuffPrevChar != ESC))
        {
          processFrame();
          cmdBuffLen = 0; // frame finished
          cmdBuffPrevChar = 0;
        }
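      else if (cmdBuffLen)
        { if (cmdBuffLen < sizeof(cmdBuff)-1)
            {  // BCC is calculated over escaped data
              cmdBuff[cmdBuffLen++] = c;
               // A character following ESC is plain data; forget it so that
               // ESC-ESC does not hide a real SOF/EOF coming right after.
              cmdBuffPrevChar = (cmdBuffPrevChar == ESC) ? 0 : c;
            }
          else
            { cmdBuffLen = 0; // overflow - quietly ignore
            }
        }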
      else
        { // data outside frame - ignored
        } 
    }

The above handles receiving the frame, so next is the parser with its helpers. This only sends back an ACK with "Test" payload string on correct frames.

/* -----------------------------------------------------
 * Remove escaping within the command buffer.
 * This is done as in-place operation by moving data towards
 * the start of buffer when escape characters are found.
 * 
 * I actually have to admit that I haven't really tested
 * escaping so far; data I've used has been ASCII only so far.
 * 
 */
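void unescapeCmdBuff()
{
  unsigned char iptr = 1; // "in pointer"; index 0 holds SOF
  unsigned char optr = 1; // "out pointer"
  while (iptr < cmdBuffLen) // valid indexes are 0..cmdBuffLen-1
    {
       // escape char; skip it (unless it is the very last byte)
      if ((cmdBuff[iptr] == ESC) && (iptr + 1 < cmdBuffLen))
        { ++iptr;
        }
      cmdBuff[optr++] = cmdBuff[iptr++]; // copy
    }
  cmdBuffLen = optr; // data got shorter by the number of removed ESC chars
}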

/* ---------------------------------------------------------
 * Process full received input frame.
 * cmdBuff[0] has SOF, [cmdBuffLen-1] has BCC and rest is data, 
 * excluding EOF that is not stored. 
 */
void processFrame()
{
  unsigned int i;

  if (cmdBuffLen < 4) // minimum for frame: SOF ID DATA0 BCC (eof); 
    return;           // anything less is invalid

  i = calcBcc(cmdBuff+1, cmdBuffLen-2);

  if (i != cmdBuff[cmdBuffLen-1])
    { return; // bad checksum, ignored/dropped; controller will send again.
    }

  if (sendResult(cmdBuff[1], 0, 0)) // check if result is cached already
    { return; // had result; it was sent.
    }

   // first un-escape the buffer
  unescapeCmdBuff();

   // Result for this request was not stored, so generate it.

  sendAck(cmdBuff[1], (unsigned char*)"Test", 4); // send something
}

Frame sending. Internally we store results in non-framed format, so whenever anything is sent back we have to build the entire frame again. Reply payloads are limited to 8 bytes by the sendFrame function's buffer allocation.

/* -------------------------------------------------------------
 * Send reply data back as frame, adding SOF, EOF, BCC and escaping.
 */
void sendFrame(unsigned char id, unsigned char *data, unsigned char dataLen)
{
   // Build frame
  unsigned char frame[20], fptr, i;
  frame[0] = SOF;
  frame[1] = id;

   // Add data to frame, escaping special characters.
  fptr = 2;
  for (i = 0; i < dataLen; ++i)
    { unsigned char c = data[i];
      if ((c == SOF) || (c == EOF) || (c == ESC))
        { frame[fptr++] = ESC;
        }
      frame[fptr++] = c;
       // Note that there are no guards on overflow here. Assumption is that
       // max send data len is 8, so escaped length is max 16 and frame overhead
       // four more, so 20 is just enough even in worst-case.
    }

  frame[fptr] = calcBcc(frame+1, fptr-1); // SOF excluded from BCC calculation
  frame[fptr+1] = EOF;
  serialWriteData(frame, fptr+2); // fptr+2 = total length
}
 

Then the result sending function, which actually serves a dual purpose. If called with a NULL data pointer (by processFrame) it checks if the result for the given request is already stored and if so, sends it again (i.e. it handles a duplicate request from the controller).

If called with non-NULL data, the above is done again (granted, unnecessarily) and when there is no old result, the new one is stored in the result FIFO and sent to the controller.
Yes, this could (and should) be split into two functions, the first doing the "check if we have a result" part and the second doing the "add new result" part, for a clearer functional split. This is just how I wrote it so I'm leaving it.

/* ----------------------------------------------------------------
 * Send result (new, or copy of previous).
 * Returns 1 if result was sent (either copy of old, or new), 
 *   or 0 if no result was sent (ie there was no result stored for given ID)
 */
unsigned char sendResult(unsigned char id, unsigned char *data, unsigned char len)
{
   // First check if we already have cached result for given command.
   // If so, send it.
  unsigned int i;
  for (i = 0; i < MAX_CMD_RESULTS; ++i)
    { if (cmdResults[i][0] == id)
        {
          sendFrame(id | 0x80, &cmdResults[i][2], cmdResults[i][1]);
          return 1;
        }
    }
   // Was not cached and no new data -- don't do anything.
  if (data == 0)
    return 0; // no cached data and no new data given -- fail

   // Newest result is stored as first entry in FIFO, so push older
   // results back (the oldest one drops out).
  for (i = MAX_CMD_RESULTS-1; i > 0; --i)
    { memcpy(&cmdResults[i], &cmdResults[i-1], sizeof(cmdResults[i]) );
    }
   // Then store new result data and send it. Payload is limited to
   // the 8 bytes that fit in a result slot.
  if (len > 8)
    len = 8;
  cmdResults[0][0] = id;
  cmdResults[0][1] = len;
  memcpy(&cmdResults[0][2], data, len);
  sendFrame(id | 0x80, data, len);
  return 1;
}
 
/* ------------------------------------------------
 * Shortcut functions; Ack (A) with payload data and NAK with no data.
 */
void sendAck(unsigned char id, unsigned char *data, unsigned char len)
{
  unsigned char abuff[10];
  if (len > 7) // 'A' + payload must fit in the 8-byte result slot
    len = 7;
  abuff[0] = 'A';  
  memcpy(abuff+1, data, len);
  sendResult(id, abuff, len+1);
}

void sendNak(unsigned char id)
{
  sendResult(id, (unsigned char*)"N", 1);
} 

This was the receiver side; on the master side the same code can mostly be used. This unfortunately isn't a very clear example, but then again, the protocol here is getting quite complex - functionality doesn't come without cost in this case either.

I've left out the initial setup (all variables should be cleared to zero) and the initial handshake between controller and receiver. The latter is actually quite interesting, as when the controller is starting up it has no idea of the receiver's state (assuming the receiver has been used previously and not turned off in between) - any sequence ID it uses might already be cached in the receiver, so results to commands are initially nonsense. I'm leaving that as an exercise for the reader for now -- there are multiple different solutions, which vary in complexity and none of which is clearly superior to the others.
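
Just to give the idea, one of the simpler options would be to reserve a sequence ID that is never cached and that makes the receiver forget its old results. This is only a sketch under my own assumptions (ID 0x00 as the reset request, names like handleResetRequest invented for this example); it is not what the code above does.

```c
#include <string.h>

#define RESULT_SLOTS 4
#define SEQ_RESET    0x00  /* hypothetical: reserved ID, never cached */

/* Same layout idea as cmdResults above: [0]=ID, where 0 means empty slot. */
unsigned char resultCache[RESULT_SLOTS][10];

/* Returns 1 if the request was a reset and the cache has been cleared. */
int handleResetRequest(unsigned char seq)
{
  if (seq != SEQ_RESET)
    return 0;
  memset(resultCache, 0, sizeof(resultCache)); /* forget all old results */
  return 1;
}

/* Lookup for normal commands; returns slot index or -1 if not cached.
 * The reset ID is rejected here so that it can't match an empty slot. */
int findCachedResult(unsigned char seq)
{
  int i;
  if (seq == SEQ_RESET)
    return -1;
  for (i = 0; i < RESULT_SLOTS; ++i)
    { if (resultCache[i][0] == seq)
        return i;
    }
  return -1;
}
```

The controller would send the reset request once at startup (repeating it until acknowledged) and only then start numbering real commands from 1.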





Friday, 20 February 2015

New thermal toy

I've got a new toy!

I've been drooling over thermal cameras for a while but the price has always been a bit too much to justify the purchase. Just now however there was a sufficiently good deal, so I got myself a new Flir E4.

While it is not necessary for most of my stuff, it certainly is useful occasionally. It got its first actual use the day I got it. This PCB was returned with a note that it had essentially random failures. I tested it (with the usual functional tests) and could not find a fault. Then something else came up and I forgot it on the desk for a while. Later I came back and the PCB felt hot -- that's wrong, it shouldn't heat up enough to notice. So whip out the camera and what do I see:


The lower hot part is the voltage regulator (buck converter), the upper hot spot is the MCU. No need to look further, that one is ready for the junk pile. A faulty MCU means that there's no point trying to fix it; too much trouble for too little gain. To compare, here's a properly working board on the left (note the details, damn the multispectral tech in E4 is useful!) and the bad one on the right. The converter still heats up a bit on the properly working board, but not even close to the other one (and yes, I let the other one cool down first and turned these on at the same time.)


Then playing with it a bit more for fun. Here's our dog, with its hot eyes and cold nose.


In the evening it's sauna time. The stones on top of the (electric) heater are pretty hot as expected, but the sides are surprisingly cool. Not that I'll go and touch them when the sauna's hot anyway.


Since it's winter, I also went outside to look for hot spots in our house walls. Not surprisingly the windows leak a lot of heat, but otherwise I couldn't notice any specific hot (leakage) spots. I'll need to do more testing; those images are actually quite hard to interpret.

I know that there is a hack available for upgrading the E4, but so far I haven't tried it. Gotta try that sometime, more thermal resolution would be nice. My main annoyance as it is now is the thermal scale; auto is pretty much useless when you want to compare different shots (the scale changes, making easy checks impossible) and locking takes the current reading but doesn't allow you to set low and high limits. Not good either.




Tuesday, 17 February 2015

Playing with Chrom-Art accelerator of STM32F429I Discovery


Previously we got something to show on screen, but writing data to screen pixel-by-pixel in software isn't very efficient. Fortunately the STM32F429 has the Chrom-Art accelerator built in, and we can offload some operations to it.

When thinking of a graphics accelerator one easily thinks of current high-end PC graphics boards and their embedded alternatives, with massive binary-only drivers and inaccessible datasheets the manufacturer might quickly flash in front of you, assuming you visit their office, pass a background check and a lie detector test and sign an NDA.

Fortunately this accelerator isn't one of those and everything is in the datasheets (well, I hope, unless they've kept something hidden). On the flip side, unfortunately it isn't very powerful either, being limited to simple pixel-by-pixel (no scaling) copy and blend operations.

The accelerator is quite clearly intended to be used in RGB (16-bit and above) modes, as it does not officially even support 8-bit output modes (it can however read 8-bit data as input and convert it to RGB data for blending/output). So, since I am using 8-bit output mode for now, I won't be getting a lot out of it. By lying to it about source and target modes we can however use it for faster bitmap copies for drawing, without blending though.

But the code. Unlike most peripheral devices DMA2D (two dimensional DMA transfer, get it?) doesn't require other initialization than enabling the clock, which we already did during display initialization previously, so we can proceed directly to operations;


First a simple helper function; it does nothing but wait until the previous operation is done so we can start the next one.
/* -------------------------------------------------------------
 * Wait until DMA2D operation is done.
 */
void dispWaitDMA()
{
  unsigned int n = 0;
  while (DMA2D->CR & 1) // wait until transfer is finished
    { if (++n >= 2000000)
        { serialWrite((unsigned char*)"DMA2D fail\r");
          return;
        }
    }
}


// very simple test bitmap; 16x16, 8-bit data.
static const char bitmap1[] = {
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0,15,15,15,15,15,15,15, 0, 0, 0, 0, 0,
     0, 0,15,15,15,15,15,15,15,15,15, 0, 0, 0, 0, 0,
     0, 0,15,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15,15,15,15,15, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15,15,15,15,15, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0,15,15, 0, 0, 0, 0,20,26,31,31,28,26,23, 0,
     0, 0,15,15, 0, 0, 0, 0,20,26,31,31,28,26,23, 0,
     0, 0, 0, 0, 0, 0, 0, 0,20,26,31,31,28,26,23, 0,
     0, 0, 0, 0, 0, 0, 0, 0,20,26,31,31,28,26,23, 0 };



Then drawing a box. I'm using L8 display mode which the controller does not officially support, so we lie to it and claim that we are writing ARGB4444 data. Since the controller does nothing but memory writes without conversions this works, although it also limits us to operations with an even number of pixels.
/* -------------------------------------------------------------
 * Draw box filled with single color.
 */
void dispDrawBox(unsigned int x, unsigned int y, unsigned int w, unsigned int h, unsigned int color)
{

   // Controller does not officially support indexed mode for register-memory transfer,
   // so we cheat a bit - we claim to be using ARGB4444 (16bit) mode instead, but fill
   // it with L8L8 data. Since color space conversions are not done, this will go through
   // with no problems.
   // As a side effect this limits us to even addresses and widths, so x and w *must* be even.
  DMA2D->OPFCCR = 4; // ARGB4444. We're lying - we use it as L8L8 instead.
   // Color is 8-bit index value; since register expects 16-bits we duplicate it.
  DMA2D->OCOLR = color + (color << 8);

   // Target address. LSB of X can be cleared to force even address.
  DMA2D->OMAR = (unsigned int)(dispFrameBuff + x + y*DISP_WIDTH); 
   // Controller thinks it's drawing 16-bit data, so /2's are here to correct that.
  DMA2D->OOR = ((DISP_WIDTH-w)/2); // Display pitch, in 16-bit units
  DMA2D->NLR = ((w/2) << 16) | (h); // Width and height

  DMA2D->CR = (3 << 16) | 1; // 3=register-to-memory, start.
  
  dispWaitDMA();
}
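
The even-number limitation and the /2 corrections are easy to get wrong, so here is a small sketch of the same arithmetic as a plain function that can be checked on a PC. The structure and function name (Dma2dRect, dma2dRectL8) are mine and only for illustration; the formulas are the ones used in dispDrawBox above.

```c
/* Values that would go to DMA2D->OOR and DMA2D->NLR, plus the byte offset
 * into the frame buffer that is added to its address for DMA2D->OMAR. */
struct Dma2dRect {
  unsigned int byteOffset; /* x + y * pitch, in bytes (L8 = 1 byte/pixel) */
  unsigned int oor;        /* output line offset, in 16-bit units */
  unsigned int nlr;        /* (16-bit units per line << 16) | line count */
};

/* Returns 1 and fills *r if the rectangle can be handled with the
 * "pretend it is 16-bit data" trick, 0 otherwise. dispWidth is the display
 * pitch in pixels (DISP_WIDTH in this post) and is assumed to be even. */
int dma2dRectL8(unsigned int x, unsigned int y, unsigned int w, unsigned int h,
                unsigned int dispWidth, struct Dma2dRect *r)
{
  if ((x & 1) || (w & 1) || (w == 0) || (h == 0)) /* even and non-empty only */
    return 0;
  if (x + w > dispWidth)                          /* must fit on one line */
    return 0;
  r->byteOffset = x + y * dispWidth;
  r->oor = (dispWidth - w) / 2;
  r->nlr = ((w / 2) << 16) | h;
  return 1;
}
```

For a 16x16 box at (10,20) on the 240 pixel wide display this gives offset 4810, OOR 112 and NLR 0x00080010.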
Then the bitmap operation. Again, 8-bit output mode limits us to full bitmap copy operations with no transparency (alpha channel), but even that can be helpful. Now we have to lie about the source format too; if source and output formats aren't the same, DMA2D tries to do format conversions, which will cause problems as the data isn't what it expects.
/* -------------------------------------------------------------------
 * Copy bitmap from memory (sram or flash) to display buffer.
 * This assumes that source buffer is exactly (w*h) bytes in size and has L8 data in it.
 */
void dispBlit(unsigned int x, unsigned int y, unsigned int w, unsigned int h, const void *adx)
{
    // Memory-memory transfer.
    //  PFC can only be used with RGB output modes so that is unavailable,
    //  restricting this to full rectangle (no transparency) transfers.
    //  Same limitations with width as dispDrawBox also apply.
  
   // Output as rectangle.
  DMA2D->OMAR = (unsigned int)(dispFrameBuff + x + y*DISP_WIDTH);
  DMA2D->OOR = ((DISP_WIDTH-w)/2); 
  DMA2D->NLR = ((w/2) << 16) | (h); 
  
  DMA2D->FGMAR = (unsigned int)(adx); // foreground (source buffer) address.
  DMA2D->FGOR = 0; // zero row offset; pixels rows are directly after each other in source buffer.
  
  DMA2D->OPFCCR = 4; // Output format; ARGB4444. We're lying - we use it as L8L8 instead (output stage doesn't support L8 mode)
  DMA2D->FGPFCCR = 4; // Source format ARGB4444. Lying again due to output format limitations; using L8 (although that is our data) will not work here.
  
  DMA2D->CR = 1; // mem-mem transfer (fg only), start transfer.

  dispWaitDMA();
}
So there, simple bitmap operations. RGB modes would offer more freedom here, including blending, but for now I'm happy to work with a smaller memory footprint. After all, I'm planning a more informative display (think something like a smart display for home heating and individual device tracking/control) rather than flashy graphics.

Tuesday, 10 February 2015

Initializing STM32F4 LCD-TFT (LTDC) controller

(Update: there is also slightly updated version of this available now)

Last time I referred quickly to the STM32F4 Discovery board and its display. Now for everyone's pleasure here is the source for getting that display to work, in a quick'n'dirty way. Note that I won't be handing you fully functional copy-pasteable code here; some things like debug output (through USART) and GPIO setup code are left out. The former you most likely don't need, and the latter should be relatively trivial.

Since the board I'm planning on building will not have external RAM, I chose to go for a minimal memory footprint and set up the display for 320x240 L8 (indexed RGB, 8 bits per pixel; 76800 bytes total) mode. This leaves enough memory for other goodies and even another display layer if it becomes necessary. It has some other annoying limitations though, but more on them in later posts...
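
For comparison, the frame buffer arithmetic for different pixel formats is easy to check with a one-liner. The function name frameBufferBytes is just something I made up for this sketch.

```c
/* Frame buffer size in bytes for given resolution and bits per pixel. */
unsigned long frameBufferBytes(unsigned int w, unsigned int h, unsigned int bpp)
{
  return ((unsigned long)w * h * bpp) / 8;
}
```

For 320x240 this gives 76800 bytes in L8 (8 bpp), 153600 bytes in RGB565 (16 bpp) and 307200 bytes in ARGB8888 (32 bpp), so each step up doubles the memory taken from the MCU's internal SRAM.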

So, the code. First includes and display module definitions, along with frame buffer data.

[ EDIT: When I started using different display module I found out that the timing information here (both the defines here, and the initialization code below) is somewhat incorrect in general case, although it kinda sorta works for this specific display. I will try to post corrected (and more easily readable) data/code for included display when I find time to return to it; currently I'm a bit busy with the new one... ]

#include "stm32f4_discovery.h"
#include "stm32f4xx_conf.h"

 // Display module timing configuration. These are different for 
 // each display module; see module's datasheet for details.
#define DISP_WIDTH 240
#define DISP_HEIGHT 320
#define DISP_HSYNC_W  10
#define DISP_VSYNC_H  2
#define DISP_ACCUM_HORIZ_BACKPORCH  29
#define DISP_ACCUM_VERT_BACKPORCH  3
#define DISP_ACCUM_ACTIVE_W 269
#define DISP_ACCUM_ACTIVE_H 323
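#define DISP_TOTALW 279
#define DISP_TOTALH 327

 // Frame buffer; one byte per pixel in L8 mode (76800 bytes).
unsigned char dispFrameBuff[DISP_WIDTH * DISP_HEIGHT];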


Highest level display initialization routines. You should only need to call dispInit() in your main to get some data shown.


#define LTDC_CFG_PINCOUNT 22
const int ltdc_lcd_pins[] =
    // port AF pin. For some reason LCD has pins in both AF9 (alternate function 9) 
    // and AF14 slots.
  { PORTA | 3 | (14<<8), PORTA | 4 | (14<<8), PORTA | 6  | (14<<8), 
    PORTA | 11  | (14<<8), PORTA | 12  | (14<<8),
    PORTB | 0 | (9<<8),  PORTB | 1 | (9<<8),
    PORTB | 8 | (14<<8), PORTB | 9 | (14<<8), PORTB | 10 | (14<<8), 
    PORTB | 11 | (14<<8),PORTC | 6 | (14<<8), PORTC | 7 | (14<<8), 
    PORTC | 10 | (14<<8),PORTD | 3 | (14<<8), PORTD | 6 | (14<<8), 
    PORTF | 10 | (14<<8), PORTG | 6 | (14<<8), PORTG | 7 | (14<<8), 
    PORTG | 10 | (9<<8), PORTG | 11 | (14<<8), PORTG | 12 | (9<<8) };
   // some pins for easier probing;
   // PC2  = CS
   // PD13 = CMD/DATA
   // PF9  = SDA
   // PF7  = CLK
   // PA4  = vsync (active low)
   // PC6  = hsync (active low)
   // PF10 = DE (active high)
   // PG7  = dotclk
const int ltdc_spi_pins[] = // SPI pin listing; they're used in IO mode
  { PORTF | 7, PORTF | 9, PORTC | 2, PORTD | 13 };


 
void dispInit()
{
   // Enable clocks for GPIOs (pins) and devices. Last is DMA2D (Chrom-Art)
  unsigned int i;
  RCC->AHB1ENR |= (RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOBEN | RCC_AHB1ENR_GPIOCEN | 
     RCC_AHB1ENR_GPIODEN | RCC_AHB1ENR_GPIOFEN | RCC_AHB1ENR_GPIOGEN | (1<<23) ); 

  // Initialize pins. SPI pins are set up in IO mode, others in alternate function mode.
  for (i = 0; i < LTDC_CFG_PINCOUNT; ++i)
    { unsigned int af = (ltdc_lcd_pins[i] >> 8) & 15;
       // Set pin (3) in (PORTA) to (AF14) through GPIO controller
      ioPinSetAF(ltdc_lcd_pins[i] & (~0xff00), 0, af);
       // SPI has only four pins to set up; all push-pull output. 
      if (i < 4)
        ioPinSetOutput(ltdc_spi_pins[i], 0); 
    }

  // SPI CS high and clock low
  ioSetPin(PORTC | 2);
  ioClearPin(PORTF | 7);

  delay(10);

  dispSPIInit();

  dispLTDCInit();

}

The nasty part. Even a single bit error in the SPI init sequence may result in nothing being shown, and examples on how things should be set up are often completely missing. As I said previously, I had to measure how the example binary does it to get everything working...

 // Display's SPI init data table.
 // Bit8 is command/data indication (0=command). 
 // Value 0x800 indicates delay (used after some commands). 
 // 0xfff is end of table.
 // Mostly these are written so first value on line is command, rest data.
static const unsigned short ltdc_init_spi_data[] = {
     0x0CA, 0x1C3, 0x108, 0x150,
     0x0CF, 0x100, 0x1C1, 0x130,
     0x0ED, 0x164, 0x103, 0x112, 0x181,
     0x0E8, 0x185, 0x100, 0x178,
     0x0CB, 0x139, 0x12C, 0x100, 0x134, 0x102,
     0x0F7, 0x120,
     0x0EA, 0x100, 0x100,
     0x0B1, 0x100, 0x11B,
     0x0B6, 0x10A, 0x1A2,
     0x0C0, 0x110,
     0x0C1, 0x110,
     0x0C5, 0x145, 0x115,
     0x0C7, 0x190,
     0x036, 0x1C8,
     0x0F2, 0x100,
     0x0B0, 0x1C2,
     0x0B6, 0x10A, 0x1A7, 0x127, 0x104,
     0x02A, 0x100, 0x100, 0x100, 0x1EF,
     0x02B, 0x100, 0x100, 0x101, 0x13F,
     0x0F6, 0x101, 0x100, 0x106,
     0x02C, 0x800,
     0x026, 0x101,
     0x0E0, 0x10F, 0x129, 0x124, 0x10C, 0x10E, 0x109, 0x14E, 0x178, 0x13C,
         0x109, 0x113, 0x105, 0x117, 0x111, 0x100,
     0x0E1, 0x100, 0x116, 0x11B, 0x104, 0x111, 0x107, 0x131, 0x133, 0x142, 0x105, 
         0x10C, 0x10A, 0x128, 0x12F, 0x10F,
     0x011, 0x800,
    0x029,
    0x02C,
     0xFFF /* END */ };



/* -----------------------------------------------------------
 * Send single SPI command to display.
 */
void dispSPIWrite(unsigned int dc, unsigned int data)
{
  ioClearPin(PORTC | 2);
  delay(2);

  ioPin(PORTD | 13, dc); // 0=cmd, 1=data
  delay(2);

  unsigned int i;
  for (i = 0; i< 8; ++i)
    {
      ioPin(PORTF | 9, (data & 0x80) );
      delay(2);
      ioSetPin(PORTF | 7);
      delay(2);
      ioClearPin(PORTF | 7);
      data <<= 1;
    }

  delay(2);
  ioSetPin(PORTC | 2);
  delay(2);
}


/* ---------------------------------------------------------
 * Send display init commands via SPI interface. GPIO pins must be 
 * enabled before this. Goes through the init table above and send data 
 * to controller. I'm using simple bit-banging interface as this is 
 * short one-time operation; no need to use full SPI peripheral for it.
 */
void dispSPIInit()
{
  unsigned int i = 0;
  unsigned int w = ltdc_init_spi_data[0];
  while (w != 0xfff)
    {
      if (w == 0x800)
        { delay(500); // delay command; around 15us should be enough 
        }
      else
        { dispSPIWrite((w & 0x100), (w & 0xff));
        }

      w = ltdc_init_spi_data[++i];
    }


}
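
When typing in a table like this by hand it helps to have a quick sanity check that can be run on a PC. Below is a minimal sketch that classifies table entries the same way dispSPIInit() does and counts the commands; the names (SpiEntryKind, spiEntryKind, spiCountCommands) are made up for this example.

```c
enum SpiEntryKind { SPI_ENTRY_CMD, SPI_ENTRY_DATA, SPI_ENTRY_DELAY, SPI_ENTRY_END };

/* Classify one init table entry: 0xFFF ends the table, 0x800 is a delay,
 * otherwise bit 8 selects data (1) or command (0). */
enum SpiEntryKind spiEntryKind(unsigned short w)
{
  if (w == 0xFFF) return SPI_ENTRY_END;
  if (w == 0x800) return SPI_ENTRY_DELAY;
  return (w & 0x100) ? SPI_ENTRY_DATA : SPI_ENTRY_CMD;
}

/* Count commands in a table that is terminated with 0xFFF. The result can
 * be compared against the number of commands seen in the original capture. */
unsigned int spiCountCommands(const unsigned short *table)
{
  unsigned int n = 0;
  while (spiEntryKind(*table) != SPI_ENTRY_END)
    { if (spiEntryKind(*table) == SPI_ENTRY_CMD)
        ++n;
      ++table;
    }
  return n;
}
```

A table that is missing its 0xFFF terminator would make both this and dispSPIInit() run past the end of the array, so that is the first thing to check.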

And finally the setup of LTDC. Note that the source package I had did not have LTDC definitions in its headers so I had to add them myself. If you need them too (and don't want to type them up yourself; all the information is after all in the datasheets) let me know.

/* -----------------------------------------------------------------
 * Init internal LTDC controller.
 */
void dispLTDCInit()
{
  RCC->APB2ENR |= (1 << 26); // Enable LTDC clock.

  /*   Clock setup: LCD pixel clock = HSE / M * N / R / DIVR (see your 
   *   display module specs to see what it should be). Most discovery board 
   *   templates have HSE at 8 MHz and M at 8. This gives PLL input clock of 1 MHz.
   *
   *   Then we use values:
   *      N = 192,  R = 4,  DIVR = 8 so output freq is 6 MHz
   *
   * PLL also has Q divider, for SAI (audio), but we do not use it so
   * it is just set to something here.
   */

#define PLLSAI_N     192  /* Multiplier for input clock (which is HSE/M) */
#define PLLSAI_R     4    /* Divider 1; /4 */
#define PLLSAI_DIVR  2    /* Divider 2; 0=/2, 1=/4, 2=/8, 3=/16 */

  RCC->PLLSAICFGR = (PLLSAI_R << 28) | (4 << 24) | (PLLSAI_N << 6);
  RCC->DCKCFGR = (RCC->DCKCFGR & (~0x30000)) | (PLLSAI_DIVR << 16);    

  RCC->CR |= (1<<28); // enable PLL SAI and wait until it is ready
  unsigned int n = 0;
  while (!(RCC->CR & (1<<29)))
    { if (++n > 2000000)
        { serialWrite((unsigned char*)"pllsai fail\r"); // PLL clock did not start; failure somewhere.
          return;
        }
    }

  LTDC->GCR = 0; // disable LCD-TFT controller for re-initialisation

   // Set up LCD timing variables. At first these values may seem like black magic
   // but trust me, if you spend some time with display datasheet they become clear eventually.
   // Or just trust me on these :)
  LTDC->SSCR = ((DISP_HSYNC_W-1) << 16) | ((DISP_VSYNC_H-1) << 0);
  LTDC->BPCR = ((DISP_ACCUM_HORIZ_BACKPORCH) << 16) | ((DISP_ACCUM_VERT_BACKPORCH));
  LTDC->AWCR = (DISP_ACCUM_ACTIVE_W << 16) | (DISP_ACCUM_ACTIVE_H);
  LTDC->TWCR = (DISP_TOTALW << 16) | (DISP_TOTALH);

   // background color (23:0 : R8G8B8). Purple-ish for testing.
  LTDC->BCCR = 0xff00ff;

   // Enable layer 1. We have 320x240 (76800) bytes in internal memory in L8 (with color LUT) mode
   // Discovery board has also external memory that could be used for bigger
   // display sizes, color depths or multiple display pages.

   // From the display controller's point of view the image is 240x320 pixels, portrait mode.
   // Entire area configured for layer 1.
  LTDC->L1CR = 0; // disable for reprogramming.
  LTDC->L1WHPCR = ((DISP_ACCUM_HORIZ_BACKPORCH + DISP_WIDTH) << 16) | (DISP_ACCUM_HORIZ_BACKPORCH+1);
  LTDC->L1WVPCR = ((DISP_ACCUM_VERT_BACKPORCH + DISP_HEIGHT) << 16) | (DISP_ACCUM_VERT_BACKPORCH+1);
  LTDC->L1PFCR = 5; // Pixel format: 5=L8. 2= RGB565
  LTDC->L1CFBAR = (unsigned int)dispFrameBuff;
  LTDC->L1CFBLR = (DISP_WIDTH << 16) | (DISP_WIDTH+3); // hi:line pitch in bytes; lo:(width_in_bytes+3).
  LTDC->L1CFBLNR = DISP_HEIGHT;
  LTDC->L1CKCR = 0x000; // Black data is transparent, allowing background to be shown.

    // Set color look-up tables. Here we fill only first 112 entries with black-to-full color values.
  unsigned int i;
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+0)  << 24) + (i*0x101010); //   0-15: 16 greyscales
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+16) << 24) + (i*0x100000); //  16-31: 16 reds
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+32) << 24) + (i*0x001000); //  32-47: 16 greens
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+48) << 24) + (i*0x000010); //  48-63: 16 blues
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+64) << 24) + (i*0x101000); //  64-79: 16 yellows
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+80) << 24) + (i*0x100010); //  80-95: 16 purples
  for (i = 0; i < 16; ++i) 
    LTDC->L1CLUTWR = ((i+96) << 24) + (i*0x001010); // 96-111: 16 cyans

  LTDC->L1CR = 0x13; // Enable layer1 (0x01), CLUT (0x10) and color key (0x02)

   // Fill display buffer with some data.
  for (i = 0; i < sizeof(dispFrameBuff); ++i)
    { dispFrameBuff[i] = 7;
    }

   // White borders
  for (i = 0; i < DISP_WIDTH; ++i)
    {
      dispFrameBuff[i] = 15;
      dispFrameBuff[i+DISP_WIDTH] = 15;
      dispFrameBuff[i+(DISP_HEIGHT-1)*DISP_WIDTH] = 15;
      dispFrameBuff[i+(DISP_HEIGHT-2)*DISP_WIDTH] = 15;
    }
  for (i = 0; i < DISP_HEIGHT; ++i)
    {
      dispFrameBuff[i*DISP_WIDTH] = 15;
      dispFrameBuff[i*DISP_WIDTH+1] = 15;
      dispFrameBuff[i*DISP_WIDTH+DISP_WIDTH-1] = 15;
      dispFrameBuff[i*DISP_WIDTH+DISP_WIDTH-2] = 15;
    }

   // Color square at upper left corner
  for (i = 0; i < 16; ++i)
    { unsigned int j;
      for (j = 0; j < 16; ++j)
        { dispFrameBuff[DISP_WIDTH*2+2+ i*DISP_WIDTH*2+j*2] = i+j*16;
          dispFrameBuff[DISP_WIDTH*2+2+ i*DISP_WIDTH*2+1+j*2] = i+j*16;
          dispFrameBuff[DISP_WIDTH*3+2+ i*DISP_WIDTH*2+j*2] = i+j*16;
          dispFrameBuff[DISP_WIDTH*3+2+ i*DISP_WIDTH*2+1+j*2] = i+j*16;
        }
    }


   // Reload shadow registers to active in SRCR
  LTDC->SRCR = 1; // 1=reload immediately; 2=reload on vertical blank

   // Enable controller.
  LTDC->GCR = 1;


}
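
The clock arithmetic from the comment in dispLTDCInit() can be double-checked with a few lines of plain C. This is only a sketch with function names I invented; the refresh rate part additionally assumes that DISP_TOTALW and DISP_TOTALH hold "total minus one" values, so the real totals would be 280 and 328.

```c
/* LCD pixel clock in Hz: HSE / M * N / R / DIVR. */
unsigned long lcdPixelClockHz(unsigned long hseHz, unsigned int m,
                              unsigned int n, unsigned int r, unsigned int divr)
{
  return hseHz / m * n / r / divr;
}

/* PLLSAI_DIVR register field (0..3) to the actual divider (2, 4, 8, 16). */
unsigned int pllsaiDivrToDivider(unsigned int field)
{
  return 2u << (field & 3);
}

/* Approximate refresh rate in Hz, rounded down. totalW and totalH are the
 * full line length and line count, including sync and porches. */
unsigned long lcdRefreshHz(unsigned long pixelClockHz,
                           unsigned int totalW, unsigned int totalH)
{
  return pixelClockHz / ((unsigned long)totalW * totalH);
}
```

With HSE = 8 MHz, M = 8, N = 192, R = 4 and DIVR field 2 (divide by 8) the pixel clock comes out as 6 MHz, and with the assumed 280x328 totals the refresh rate is about 65 Hz.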

There you go. You should now have some data on screen. Next up is a similar quick introduction to the Chrom-Art controller so you don't have to use slow manual writing for everything.

Friday, 6 February 2015

STM32F4 discovery with LCD

It's been pretty busy and I haven't had time to post lately. Hopefully I'll be writing the next chapter on serial protocols soon (I've got a new test board where I plan to actually use the improved routines), but for now there's something else.

After finishing with more urgent tasks I've been playing with the STM32F429I discovery board. Whenever I've started with a new processor earlier, the rule of thumb has been that it will take a week to get the basics going - you know, development environment set up, programming interface made, clocks set up, timers running, LED flashing. This time it only took maybe a day to get to that point, since it seems most ARM-based devices are very similar today and there are plenty of examples around. And of course the board can be programmed through USB too, so no need to fiddle with the programming setup.

The bad thing is that in my experience the examples and libraries provided by chip manufacturers are utter crap. This was no exception. While I understand why they exist - to get things running quickly for beginners - they make things unnecessarily difficult for those of us who prefer to program the hardware registers directly (and more importantly actually understand what they do - libraries tend to hide the ugly details so when something goes wrong you're out of luck.)

Libraries also come with a lot of extra cruft, so picking the parts I need for the job gets difficult - like USART, where I mostly need just simple 8N1 serial with no flow control -- everything else is noise I have to filter out.

But the main topic here, the nice LCD-TFT display the board has... The thing I hate with these displays is that before you can see anything they require lots of settings programmed in (typically by using an SPI-like 3- or 4-wire bus), and even a single bit error in the middle will easily result in just a blank screen. When dealing with some random display from somewhere you are lucky if you find a datasheet that tells you next to nothing, or some snippet or datasheet written in Russian or Chinese (don't read this as an insult towards either; I just can't read either so such things don't really help me much.) 

In this case most examples use the libraries mentioned above, which of course gets annoying since that's exactly what I don't want. Trying to find and read the library sources may or may not get you anywhere. Most examples I found didn't include SPI setup either, or used different display so they weren't of much use. 

Reading the board schematic and the LCD datasheets, along with the LCD controller's (which were not the best I've seen but not the worst either), gave me the serial pins and operating mode; a fairly typical setup of chip select, serial data, serial clock and data/command pins. Since the SPI example I managed to find somewhere wasn't working I had to resort to hard methods: loading the example program to the chip and probing its setup with a scope.


One byte shown. Here yellow is serial clock, purple is data, green is chip select and blue is command/data selection; it's high so this byte is data. Fortunately CS is released between every byte, which makes this easier to read as there is clear space between bytes.
Deciphering the above gives us bits 00010000 (MSB first, data on rising edge of clock) - so the byte is 0x10. There were some 100 bytes in total so reading it all out took a while. This process is also very sensitive to interruptions - fortunately I got lucky and no one called, so I didn't have to backtrack to see what I had just done. This obviously requires a scope where the entire burst fits in memory in one go; the LeCroy I'm using has 500 kpoint sample memory, which was enough. Higher-end scopes may have built-in routines to decipher serial data but I had to do this the hard way.
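
If you write the bits down as you read them from the scope, turning them into bytes can at least be left to the computer. A minimal sketch (the function name bitsToByte is mine), assuming one sample per rising clock edge with the first sample being the MSB:

```c
/* Assemble a byte from eight data-line samples taken at rising clock edges,
 * first sample = MSB. Any non-zero sample counts as a 1 bit. */
unsigned char bitsToByte(const unsigned char bits[8])
{
  unsigned char value = 0;
  int i;
  for (i = 0; i < 8; ++i)
    value = (unsigned char)((value << 1) | (bits[i] ? 1 : 0));
  return value;
}
```

Samples 0,0,0,1,0,0,0,0 from the picture above give 0x10, as expected.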

Move these bytes to code and whoops, the display is working! (I had written the MCU-side LCD controller setup earlier and it seems to be working already - that part was relatively simple).

(If someone wants I can provide a working code example too; the end result was surprisingly short, around 200 lines including comments and empty lines for the setup part.)