Part-time Electronics: Serial protocols (part 2)

In the previous part I showed a few simple cases of a serial protocol. Now we'll improve them a bit by adding some buffering and simple checksum validation.

The checksum is implemented only on downlink (to Arduino) messages, but extending the calculation to uplink messages is trivial. This program does nothing but acknowledge messages with a valid checksum; an invalid checksum results in a NAK, and a message that is too short is ignored.

A side note on code comments: the code here is definitely not an example of how you should comment your code. Since this is a tutorial, I've chosen to put most explanations and discussion in the main text, mostly because the topic itself requires far more explanation than I would normally put in code. (Typically I add "why" comments, but the explanations here would add a major "how exactly" part too, and the latter is in my opinion bad in living production code.)

char recvBuff[4];
char recvLen = 0;

// Function to calculate checksum. Simple XOR sum is used.
unsigned char calcMsgSum(char *buff, int len)
{
  unsigned char sum = 0x77; 
  int i; // same type as len, so the loop condition compares like with like
  for (i = 0; i < len; ++i)
    sum ^= buff[i];
  if ((sum == 13) || (sum == 10)) // Trivia for reader: Why is this here?
    sum += 10;
  return sum;
}
 
void loop()
{
  if (Serial.available() > 0) {
    char c = Serial.read();
    
    if (c == 10) {
      // LF ignored
      
    } else if (c == 13) {
      // CR used as end of line
      
      // Minimum of two characters required; data and checksum
      if (recvLen > 1) {
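        // Verify received checksum against calculated. The cast matters:
        // plain char is signed on AVR, so a checksum byte of 0x80 or above
        // would otherwise never compare equal to the unsigned msgSum.
        unsigned char msgSum = calcMsgSum(recvBuff, recvLen-1);
        if (msgSum == (unsigned char)recvBuff[recvLen-1]) {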
          // For this example we do nothing but acknowledge message.
          Serial.write("A\r");  
        } else {
          Serial.write("N\r");  
        }
        
      }       
      recvLen = 0; // clear buffer for next command      
      
    } else {
      // Not CR or LF; store received char to buffer. 
      // There is never, I repeat, *NEVER* an excuse to not check your buffers. Always do it.
      if (recvLen < sizeof(recvBuff)) { 
        recvBuff[recvLen] = c;
        ++recvLen;
      }
    }
  }
}

Some test strings, including the checksum character: "D3" and "VER6". As a check on the first one: 'D' is 0x44, and 0x77 XOR 0x44 = 0x33, which is the character '3'.

The checksum used here is a simple XOR sum. As a checksum algorithm, an XOR sum is fairly weak compared to, say, CRC, but it is simple and quick to write, so it is used as an example. Naturally the sender must have a (functionally) equal checksum routine to calculate it when building messages. Checksums in general are not the topic here, so I'll keep using this same XOR algorithm in my examples to keep them simpler.

The data itself has a simple format: first the data (max 3 bytes), then the checksum (1 byte), and the message is terminated with CR. An LF character (which may be added by some terminal programs) is ignored.
This of course means that CR or LF characters may not appear in the main message data. Using CR allows easy testing with a simple terminal program (aside from the checksum calculation).
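Since the sender needs a matching checksum routine, here is a minimal sketch of what the sending side might look like in plain C (for example in a PC-side program). The function name build_frame() is my own invention for illustration and is not part of the Arduino sketch; it assumes the format described above (1 to 3 data bytes, checksum, CR).

```c
#include <stddef.h>

/* Same XOR checksum as calcMsgSum() in the receiver. */
unsigned char calc_msg_sum(const char *buff, size_t len)
{
    unsigned char sum = 0x77;
    for (size_t i = 0; i < len; ++i)
        sum ^= (unsigned char)buff[i];
    if (sum == 13 || sum == 10)
        sum += 10;
    return sum;
}

/*
 * Hypothetical helper: writes data + checksum + CR into out.
 * Returns the frame length, or 0 if the data is empty, longer than
 * 3 bytes, contains CR/LF, or does not fit into out.
 */
size_t build_frame(const char *data, size_t len, char *out, size_t out_size)
{
    if (len == 0 || len > 3 || out_size < len + 2)
        return 0;
    for (size_t i = 0; i < len; ++i) {
        if (data[i] == 13 || data[i] == 10)
            return 0;               /* CR/LF not allowed in data */
        out[i] = data[i];
    }
    out[len] = (char)calc_msg_sum(data, len);
    out[len + 1] = '\r';            /* CR terminates the message */
    return len + 2;
}
```

Calling build_frame("VER", 3, ...) gives "VER6" followed by CR, which matches the test string above.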

Note the buffer checking: if you don't do it already, build a habit of ALWAYS - with absolutely no exceptions - checking your buffers when dealing with incoming data. It does not matter where the data is coming from; even data internal to your system can get corrupted. C gets a bad enough rep on this already, so do your part to keep things tidy and safe. </rant>

If the code receives too many characters, you'd normally want to mark the message invalid without bothering with further processing (the received message will be invalid anyway). This code doesn't do that (again, to keep things simpler), but it isn't difficult to add.
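A rough sketch of how such an overflow flag could work, written as plain C so it can be tried on a PC without an Arduino. The struct, the enum and the function names are hypothetical and not from the sketch above, and answering an overlong message with NAK (rather than staying silent) is my assumption.

```c
#include <stddef.h>

#define RECV_BUFF_SIZE 4

/* Hypothetical receiver state; the Arduino sketch uses globals instead. */
struct receiver {
    char buff[RECV_BUFF_SIZE];
    size_t len;
    int overflow;               /* set when a character had to be dropped */
};

enum recv_result { RECV_NONE, RECV_ACK, RECV_NAK };

/* Same XOR checksum as calcMsgSum() in the sketch. */
unsigned char calc_msg_sum(const char *buff, size_t len)
{
    unsigned char sum = 0x77;
    for (size_t i = 0; i < len; ++i)
        sum ^= (unsigned char)buff[i];
    if (sum == 13 || sum == 10)
        sum += 10;
    return sum;
}

/* Feed one received character; returns the reply to send, if any. */
enum recv_result receiver_feed(struct receiver *r, char c)
{
    if (c == 10)
        return RECV_NONE;                   /* LF ignored */

    if (c == 13) {                          /* CR ends the message */
        enum recv_result res = RECV_NONE;
        if (r->overflow) {
            res = RECV_NAK;                 /* too long: reject unchecked */
        } else if (r->len > 1) {
            unsigned char sum = calc_msg_sum(r->buff, r->len - 1);
            res = (sum == (unsigned char)r->buff[r->len - 1]) ? RECV_ACK
                                                              : RECV_NAK;
        }
        r->len = 0;                         /* clear state for next message */
        r->overflow = 0;
        return res;
    }

    if (r->len < sizeof r->buff)
        r->buff[r->len++] = c;
    else
        r->overflow = 1;                    /* remember that data was lost */
    return RECV_NONE;
}
```

The difference shows with input such as "VER6X" followed by CR: without the flag the trailing X is silently dropped and the remaining "VER6" is acknowledged, while with the flag the whole message is rejected.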

But what if you want to transfer arbitrary binary data - likely including CR/LF characters - over the link? Escape characters will help you. In this example we map character 127 (0x7F) to mean that the following character is data and not a control character. The changed parts are the new nextIsEscaped flag and the first two branches of the if chain.

unsigned char nextIsEscaped = 0; // We need flag for signalling this
  
void loop()
{
  if (Serial.available() > 0) {
    char c = Serial.read();
    
    if (nextIsEscaped) {
      if (recvLen < sizeof(recvBuff)) { // escaped data character - add to buffer
        recvBuff[recvLen] = c;
        ++recvLen;
      }
      nextIsEscaped = 0;
      
    } else if (c == 0x7F) {
      nextIsEscaped = 1;
      
    } else if (c == 10) {
      // LF ignored
      
    } else if (c == 13) {
      // CR used as end of line
      
      // Minimum of two characters required; data and checksum
      if (recvLen > 1) {
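        // Verify received checksum against calculated. The cast matters:
        // plain char is signed on AVR, so a checksum byte of 0x80 or above
        // would otherwise never compare equal to the unsigned msgSum.
        unsigned char msgSum = calcMsgSum(recvBuff, recvLen-1);
        if (msgSum == (unsigned char)recvBuff[recvLen-1]) {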
          // For this example we do nothing but acknowledge message.
          Serial.write("A\r");  
        } else {
          Serial.write("N\r");  
        }
        
      }       
      recvLen = 0; // clear buffer for next command      
    } else {
      // Not escape char, CR or LF; store received char to buffer. 
      // There is never, I repeat, *NEVER* an excuse to not check your buffers. Always do it.
      if (recvLen < sizeof(recvBuff)) { 
        recvBuff[recvLen] = c;
        ++recvLen;
      }
    }
  }
}

So, now if we receive a 0x7F character, the system state is updated so that the next incoming character will always be interpreted as data and not as a control character. Since our control characters here are CR, LF and 0x7F, the sending system must always escape these characters before sending them as part of the data. The checksum byte is also data and needs to be escaped. Note, however, that the transmit side must calculate the checksum before escaping the message.
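To make the sender's order of operations concrete, here is a minimal sketch of the transmit side in plain C: checksum first over the raw data, then escaping of both data and checksum, then an unescaped CR. The names build_escaped_frame() and needs_escape() are made up for this example, and the checksum keeps the CR/LF adjustment so that it still matches calcMsgSum() as listed earlier.

```c
#include <stddef.h>

#define ESC_CHAR 0x7F

/* Same XOR checksum as calcMsgSum() in the receiver. */
unsigned char calc_msg_sum(const unsigned char *buff, size_t len)
{
    unsigned char sum = 0x77;
    for (size_t i = 0; i < len; ++i)
        sum ^= buff[i];
    if (sum == 13 || sum == 10)
        sum += 10;
    return sum;
}

/* CR, LF and the escape character itself must be escaped inside a frame. */
static int needs_escape(unsigned char b)
{
    return b == 13 || b == 10 || b == ESC_CHAR;
}

/*
 * Hypothetical helper: builds escaped data + escaped checksum + CR.
 * Returns the frame length, or 0 if len is not 1..3 or out is too small.
 */
size_t build_escaped_frame(const unsigned char *data, size_t len,
                           unsigned char *out, size_t out_size)
{
    size_t pos = 0;
    unsigned char sum;

    if (len == 0 || len > 3)
        return 0;
    sum = calc_msg_sum(data, len);  /* checksum covers the unescaped data */

    for (size_t i = 0; i <= len; ++i) {
        unsigned char b = (i < len) ? data[i] : sum;  /* checksum goes last */
        if (needs_escape(b)) {
            if (pos >= out_size)
                return 0;
            out[pos++] = ESC_CHAR;
        }
        if (pos >= out_size)
            return 0;
        out[pos++] = b;
    }
    if (pos >= out_size)
        return 0;
    out[pos++] = 13;                /* unescaped CR terminates the frame */
    return pos;
}
```

In the worst case every data byte and the checksum need escaping, so the output buffer should hold at least 2 * (3 + 1) + 1 = 9 bytes.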

Also, remember those two lines in the earlier checksum calculation marked as "trivia"? Those aren't needed anymore here (but they don't hurt much either if left there).

Unfortunately this change actually reduces our checksum's effectiveness a bit, since the checksum now verifies the unescaped string and not the bytes actually received. In practice the reduced effectiveness most likely does not matter, but it is still worth noting.

Another thing to note is that neither the lack of ignore-message-on-overflow nor the weak checksum calculation is a security (as in "blackhat-looking-to-0wn-device") issue; it's up to the next-level message parser to make sure that invalid messages (be they accidental or intentional) are detected and rejected.

So I'll leave you here for a while. Hopefully I'll get to writing the next part quicker than this one. There's still a lot to improve here, and so far we've only tackled one part of the communication protocol: transmit errors.

Monday, 17 November 2014
