OpenLCB nodes on CAN segments do CAN arbitration by using their unique Node ID alias (NIDa) as part of the CAN header. This page documents a proposed algorithm for assigning unique Node ID aliases.
This method of assigning aliases and checking that they are unique is necessary because of the characteristics and limitations of CAN.
Each node has 48-bit unique id, and each wants a 12-bit local alias. Each assigns itself a potential 12-bit alias. The problem comes in determining whether this alias is in use by another node.
If a node sends out an check message containing just the alias, then it could expect that another node to complain if it has the same alias. This would work, except in the (very unlikely) case where both nodes send out the identical check message simultaneously. Neither would recognize a conflict and both would consider that they own the same alias. Therefore, a method needs to be found that guarantees that the check message(s) are unique, even if they are send simultaneously.
Unfortunately, limitations of CAN will not allow the full 48-bit NodeID to be included in the check message. Doing so would violate CAN rules as it could not be wholly contained in the header, and packets with identical headers are not allowed to have differing data-parts.
Therefore, the CIM/RIM method splits the NodeID into byte-sized parts, sends out 4 check messages (CIM = Check Id Messages, RIM = Reserved Id Message), each consisting of the alias and one quarter (12 bits) of the 48 bit id. Only if all of the CIMs *do not* conflict with another node (ie no RIM is received) does it guarantee that the alias is not in use by another node. Even if two nodes send out the identical packets simultaneously, at least one of these will differ due to a differing ID-part between the two nodes.
Must not require any user intervention.
Must be possible to map the assigned alias to a Node Sequence Number and vice-versa.
It is not required to come up with the same alias every time a node or segment is powered up.
Must always converge to an alias, regardless of when or in what order nodes come up.
Another node watching the process must be able to monitor how the decision was reached and what the result was.
It is desirable to detect duplicate (non-unique) NIDs, but 100% reliable detection is not required.
Start with the six-byte unique Node ID in each node, with the bytes labelled N1 through N6.
Initialize a specific full-sequence 48bit PRNG with the 48 Node ID.
Take the next element from the PRNG . Derive the 12-bit value for TCA, the tentative alias. If TCA is zero, repeat.
Start listening for CAN frames containing Check ID messages (CIMs) and Reserved ID Messages (RIMs). A CIM consists of an extended header with a 12 bit Source NIDa field, an 3 bit sequence field, a 12-bit TestN field, and other bits to denote that this is a Check ID Message. There is no other variable information in the frame. Having NIDa, Nn and TestN in the message ensure that CAN arbitration will work without error. An RIM is the same, except that it can be identified as being a different message for diagnostic purposes; it means the same thing upon receipt.
Send four CIMs with 0 through 3 in the Nn field, the 12-bit quarters of the NodeID in the TestN field, and TCA in the Source NIDa field. During this process, if a CIM or RIM is received that contains the TCA value in its CI field, TCA does not belong to this node. Repeat from (A).
Keep listening until your CAN hardware says your last CIM message has been sent and the CAN link is idle, plus 500 msec. If you cannot detect that condition, wait 1 second.
If no CIM or RIM with this TCA value in the CI field is received before the end of the previous step, you've successfully allocated TCA as this node's source NIDa. Send an RIM with TCA in the CI field and go into normal operation.
If that message fails with an error occurring during the data portion of the message, it's possible that two nodes have the same NodeID, a fatal error. To prevent the OpenLCB being taken down, this node should stop trying to allocate an alias and signal the error to the user via some non-CAN method, e.g. LEDs, emitting smoke, etc.
If any other error occurs, repeat from A.
While in normal operation, if you receive a CIM with your source NIDa, send a RIM, which will cause the other node to back off and try something else.
It should not be possible to receive an RIM with your allocated source NIDa in normal operation. If you do, log an error (on whatever it is they log it on there) and restart the process at (A).
The 12 bit alias has to be derived from a 48-bit sequence number in a way that keeps as much entropy as possible.
Sequentially numbered nodes will have initial seeds that differ only in the low bits of the PRNG output. For shift-register PRNGs, this can present a problem, as all bytes of the result are shifting together. Combining bytes (or even 12-bit chunks) via XOR or add operations results in reduced entropy.
The OpenLCB proposed PRNG (next section) uses add operations with a constant term to avoid this problem, so treating the 48-bit PRNG output as four 12-but values, which are then XORed, works well.
The proposed 48-bit PRNG is from “A 48-Bit Pseudo-Random Number Generator”, Heidi G. Kuehn, Communications of the ACM, Volume 4, Issue 8 (August 1961), pages 350-352.
xi+1 = (29+1) xi + c
where c = 29,741,096,258,473 or 0x1B0CA37A4BA9.The paper actually describes a PRNG that uses signed arithmetic, but for our purposes the 48th “sign” bit isn't important.
Note that, unlike some other PNRGs, this one can generate a zero result, and can accept a zero seed.
CAN arbitration reliably avoids collisions between frames with unique headers. It does not guarantee arbitration between frames with identical headers and no data; if the timing is right, they may overlay each other such that only one frame appears to have been sent. It is very difficult to ensure that two nodes send initialization packets only at different times. In addition, CAN will eventually signal an error if two packets with the same header and different data payloads collide, but not all CAN interface hardware provides reliable indications about why that error occurred.
At the highest level, this algorithm is broadcasting the complete Node ID and tentative alias to see if any other node is checking the same tentative alias. If two nodes have taken the same tentative alias, at least one of the four packets used in that broadcast will not be identical between the two nodes, because their six-byte Node IDs are different in at least one bit. CAN will successfully arbitrate this, and the other node will receive the frame, causing it to back off.
CAN transmissions are not atomic operations; you can receive a frame between the time you tell the hardware to send a frame and the time that frame is actually sent. It's therefore possible for both nodes contending for the same alias value to back off and try again. Using the pseudo-random number generator as a sequence number makes it likely that the next attempt will be made using different alias values in the two nodes. The use of the sequence generator also makes the arbitration process faster in the case of sequential Node ID values. People like to use small and sequential numbers for things, and the sequence generator maps those to very different values that are less likely to collide during arbitration.
The delays at the end of the algorithm are to ensure that higher latency nodes, such as software nodes working through USB convertors, can reply to CID messages from nodes that come up on a working link. OpenLCB is a soft-real-time system, and software that's interacting with it needs to have response times of a couple hundred milliseconds or better to be reliable. The algorithm provides a wait of a few times that to enable those programs to take part.
This is SVN $Revision: 417 $