Monday, May 18, 2009

More thoughts on my token bus based RS485 protocol

It was suggested on the PICLIST that I list all possible errors that could happen with my protocol and write possible solutions. Let’s see how this goes.

In a perfect bus where there is no line disturbance, no random death of a node, and proper following of protocol by each node, we can assume that data will not get corrupted, the token will not get lost and the buss will never fail. However, in reality, there is noise and line disturbance, nodes may randomly go down or leave or get hung up, or nodes may fail to follow protocol for some other reason. Due to these issues, it is important that we try and list the possible problems that may occur on the bus and provide solutions if we can.

Possible problems:

  • The biggest problem in this type of protocol is when the token itself gets lost.
    • When the bus is in working order, it is always busy due to the token being passed around. If the token gets lost for some reason, no node on the bus will have voice and therefore, the bus goes idle.
    • The token may get lost due to several reasons such as line disturbance during passing of the token from node A to B, or if a node which owns the token goes down. In either case, the bus goes idle because no node has voice and therefore no node transmits.
    • Fortunately, we can use the silence on the bus to our advantage as an indicator that a new token must be assigned. A possible solution to this problem is that every node will detect a lost token when there is a silence on the bus for longer than (ID * 2) + 10 ms (for sake of example; better values need to be calculated during implementation). Therefore, the node with the lowest ID will detect the silence first. As soon as this node detects the silence, it has to assume ownership of the token and start transmitting something (before the next node detects silence).
    • Some things to consider: this method should work well. However, it assumes that all nodes give full priority to the network. If a node gets partially hung up doing a task for (let’s say) 50 milliseconds, it could cause the bus to issue a new token. The problem now is that two tokens are on the bus and this will effectively cripple the bus. However, this shouldn’t happen since part of the software stack design requires full priority be given to the bus.
  • Another problem arises during corruption of data.
    • When node A sends a normal data packet to node B, it expects that B receives the packet successfully. But what if B doesn’t get the packet due to data corruption? For a non critical packet, this may be acceptable. But when we actually need that packet to reach its destination, we need a way to make sure that it got there.
    • A possible solution to this (discussed in the original protocol description) is the ACK packet. How this works is that the sender node (node A) sends a packet and enables the ACK_REQ flag to signal that it wants an ACK from the target that it got the packet. When the target node (node B) receives the packet, it sees that the ACK_REQ flag is enabled and sends back an ACK  (or NACK) packet to the original sender (when it has voice, of course). If there is a problem where B does not get the packet at all, or the packet is too corrupted to identify the sender, A will trigger a timeout 100 ms after it sent the packet (again, these numbers are for example purposes only). This number, however, depends on the number of nodes present on the bus. If there is a large number of nodes on the bus, we must wait a long enough time for the token to get to node B for it to send back an ACK (or NACK). But it is safe to assume a large number like 100 ms or 250 ms for this since we don’t expect errors to occur very often. This leads us to our next problem:
  • What happens if the ACK (or NACK) packet gets corrupted?
    • So far, A has successfully sent a packet (with ACK_REQ) to B. B now sends an ACK (or NACK) back to A. Suddenly, something happens, causing the packet to get completely corrupted. What happens now? In this situation, B has received the packet but A is still waiting for an ACK and thinks that B didn’t get the packet at all.
    • For this situation, we can attach a sequence number to the packet. So even if the same packet (with the same sequence number) gets sent out multiple times, it wont get “used” more than once by the target.

No comments:

Post a Comment