There's a small test program in OpenLCB SVN for estimating the probability of collisions in identifiers:

prototypes/java/test/simulations/NodeIDCollisions.java

For example, if you assign random and independent 12-bits aliases to N nodes and then put them together, what fraction of the time will there be a collision?

Nodes |
Net Collision Rate |
Node Collision Rate |
---|---|---|

10 |
0.0109 |
0.001096 |

25 |
0.0706 |
0.00292 |

100 |
0.702 |
0.011980 |

The two "collision rate" numbers are the fraction of trials in which one or more collisions were observed on the network, and the fraction of the nodes that had a collision when the network first started up.

The first example shows that 70% of the time, when you put together a 100 node network, you'll have at least one collision in the network if you use this method of assigning IDs. Any particular node, though, has only a 1% chance of having a collision and needing to retry.

The number of nodes doesn't stop at 100, though. Since the protocol allows off-CAN-segment nodes to communicate, and you have to be able to address them, the numbers can continue to rise:

Nodes |
Net Collision Rate |
Node Collision Rate |
---|---|---|

250 |
0.99962 |
0.0297 |

1000 |
1.0 |
0.1126 |

Large modular layouts __will__ have collisions, but even big
ones won't require nodes to retry very often.

If IDs are sequentially, as opposed to randomly, assigned by a single source, the problem goes away. For example, if a manufacturer assigns serial numbers to boards, they won't conflict. But what if two manufacturers are involved? Does that make the problem better or worse?

We model this as picking N unique numbers from each of M manufacturers who have sequentially numbered L boards so far. What's the chance then? If the manufacturers have only made a small number of boards, e.g. 100, the chance of a collision becomes pretty large:

Nodes |
Number of Vendors |
Number Produced |
Net Collision Rate |
Node Collision Rate |
---|---|---|---|---|

10 |
2 |
100 |
0.6672 |
0.0997 |

5 |
5 |
100 |
0.9375 |
0.4757 |

5 |
20 |
100 |
1.0 |
(continuous) |

With 100 boards produced by each of several vendors, collisions are very likely!

If they've produced a thousand boards, the chance goes down, but it's still significant:

Nodes |
Number of Vendors |
Number Produced |
Net Collision Rate |
Node Collision Rate |
---|---|---|---|---|

10 |
2 |
1000 |
0.0975 |
0.0102 |

5 |
5 |
1000 |
0.2247 |
0.0501 |

5 |
20 |
1000 |
0.9928 |
0.9256 |

These rates are quite high; sequential numbering without some form of extra information (e.g. a manufacturer number) causes high collision rates.

OpenLCB's use of a seeded PRNG greatly reduces the size of this problem.

This section describes a study of the earlier 16-bit alias proposal.

There's a small test program in SVN for estimating the probability of collisions in identifiers:

prototypes/java/test/simulations/NodeIDCollisions.java

For example, if you assign random and independent 16-bits aliases to N nodes and then put them together, what fraction of the time will there be a collision?

Nodes |
Net Collision Rate |
Node Collision Rate |
---|---|---|

10 |
0.0006727 |
0.00006727 |

25 |
0.0046 |
0.00018452 |

100 |
0.0727 |
.00075333 |

The two "collision rate" numbers are the fraction of trials in which one or more collisions were observed on the network, and the fraction of the nodes that had a collision when the network first started up.

The first example shows that 7% of the time, when you put together a 100 node network, you'll have at least one collision in the network if you use this method of assigning IDs. Any particular node, though, has only a 0.07% chance of causing a problem.

The number of nodes doesn't stop at 100, though. Since the protocol allows off-CAN-segment nodes to communicate, and you have to be able to address them, the numbers can continue to rise:

Nodes |
Net Collision Rate |
Node Collision Rate |
---|---|---|

250 |
0.3793 |
0.0019 |

1000 |
0.9996 |
0.0076 |

Large modular layouts __will__ have collisions.

If IDs are sequentially, as opposed to randomly, assigned by a single source, the problem goes away. For example, if a manufacturer assigns serial numbers to boards, they won't conflict. But what if two manufacturers are involved? Does that make the problem better or worse?

We model this as picking N unique numbers from each of M manufacturers who have sequentially numbered L boards so far. What's the chance then? If the manufacturers have only made a small number of boards, e.g. 100, the chance of a collision becomes pretty large:

Nodes |
Number of Vendors |
Number Produced |
Net Collision Rate |
Node Collision Rate |
---|---|---|---|---|

10 |
2 |
100 |
0.6672 |
0.0997 |

5 |
5 |
100 |
0.9375 |
0.4757 |

5 |
20 |
100 |
1.0 |
(continuous) |

With 100 boards produced by each of several vendors, collisions are very likely!

If they've produced a thousand boards, the chance goes down, but it's still significant:

Nodes |
Number of Vendors |
Number Produced |
Net Collision Rate |
Node Collision Rate |
---|---|---|---|---|

10 |
2 |
1000 |
0.0975 |
0.0102 |

5 |
5 |
1000 |
0.2247 |
0.0501 |

5 |
20 |
1000 |
0.9928 |
0.9256 |

These rates are quite high; sequential numbering without some form of extra information (e.g. a manufacturer number) causes high collision rates.

Properly seeding the alias algorithm can greatly reduce this problem. For example, putting manufacturer information in the top byte of the alias, and serial number in the lower, reduces the collision rate to zero for up to 255 manufactured nodes. Even in the case of a few thousand, careful choice of packing can reduce the collision rate to close to (but not exactly) zero. The collision algorithm really only earns its keep when the network has caught on enough that tens of thousands of nodes are in use. That's not such a stretch, however, as there are tens of millions of DCC decoders in operation now, and the goal of the current networking proposals is to last for a similar length of time.

This is SVN $Revision: 213 $