How unreliable is UDP?
I realized something recently: I know virtually nothing about UDP. Oh, I know it's connectionless, has no handshaking and thus doesn't provide any guarantees about delivery or ordering. But, in practice, what does that actually mean?
I set up 5 VPSes to send each other a few UDP packets over a 7-hour period. I didn't send much traffic (though that's certainly worth trying). Every 9-11 seconds, each server randomly picked a target and sent 5-10 packets ranging from 16 to 1016 bytes.
Two servers were in the same data center in New Jersey (NJ 1 and NJ 2). The other three were in LA, Amsterdam (NLD) and Tokyo (JPN), one each.
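For reference, a sender along these lines could be sketched as follows. This is not the code used for the test, only a minimal sketch under stated assumptions: the 16-byte header layout (a sender id and a per-target counter, both unsigned 64-bit integers) is a guess, and `build_packet`, `send_burst`, `run` and the `peers` list of `(host, port)` tuples are hypothetical names.

```python
import os
import random
import socket
import struct
import time

# Assumed header layout: sender id + per-target counter, 8 bytes each = 16 bytes.
HEADER = struct.Struct("!QQ")


def build_packet(sender_id, counter, payload_len):
    """Return a 16-byte header followed by payload_len random bytes."""
    return HEADER.pack(sender_id, counter) + os.urandom(payload_len)


def send_burst(sock, sender_id, peers, counters):
    """Pick one random target and send it 5-10 packets of 16-1016 bytes."""
    target = random.choice(peers)
    for _ in range(random.randint(5, 10)):
        counters[target] = counters.get(target, 0) + 1
        payload_len = random.randint(0, 1000)
        sock.sendto(build_packet(sender_id, counters[target], payload_len), target)
    return target


def run(sender_id, peers):
    """Send a burst every 9-11 seconds, forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    counters = {}  # one counter per target, so the receiver can spot gaps
    while True:
        send_burst(sock, sender_id, peers, counters)
        time.sleep(random.uniform(9, 11))
```

Keeping one counter per target is what lets the receiving side detect both loss (gaps) and reordering later on.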
[Un]Reliability
The first thing I wanted to know was how unreliable UDP was. Are we talking about a delivery rate of 25%? 50%? 75%?
Packets Received (received/sent; rows are senders, columns are receivers)
Sender | NJ 1 | NJ 2 | LA | NLD | JPN |
---|---|---|---|---|---|
NJ 1 | - | 2981/2981 | 2888/2889 | 2964/2964 | 3053/3054 |
NJ 2 | 3016/3016 | - | 3100/3101 | 2734/2735 | 3054/3054 |
LA | 2901/2941 | 2932/2975 | - | 2938/2942 | 2712/2712 |
NLD | 3038/3038 | 2771/2772 | 2724/2724 | - | 2791/2791 |
JPN | 2551/2552 | 2886/2886 | 2836/2838 | 2887/2887 | - |
These numbers were better than I had expected. I specifically thought NLD <-> JPN would see above-normal loss, but there was none. Data sent out of LA, specifically to the two servers in NJ, seems to have struggled somewhat (40 and 43 packets lost). Was there a pattern?
First, I thought maybe the size of the packet would be an issue. Admittedly, I kept them small (16 byte header, 0-1000 byte payload):
Packet Loss Per Size (bytes)
0-115 | 116-215 | 216-315 | 316-515 | 516-715 | 716-915 |
---|---|---|---|---|---|
13 | 11 | 12 | 13 | 23 | 23 |
Nothing obvious there. Did the packet loss happen around the same time? Unfortunately, I didn't keep timestamps (why?!), but I did keep a counter per pair. Of the 43 packets that failed to make it from LA to NJ2, 29 were lost during two periods of roughly one minute each. The NJ1 packet loss also largely happened during two short periods.
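One way to look for this kind of clustering using only counters is to group the missing counters into consecutive runs. The sketch below assumes counters for a pair start at `first`, end at `last`, and increase by one per packet sent. `missing_runs` is a hypothetical helper, not the analysis code actually used.

```python
def missing_runs(received, first, last):
    """Return (start, length) for each run of consecutive counters that never arrived."""
    seen = set(received)
    runs = []
    run_start = None
    for counter in range(first, last + 1):
        if counter not in seen:
            if run_start is None:
                run_start = counter  # a new gap begins here
        elif run_start is not None:
            runs.append((run_start, counter - run_start))
            run_start = None
    if run_start is not None:
        # The gap runs through the last counter sent.
        runs.append((run_start, last + 1 - run_start))
    return runs


print(missing_runs([1, 2, 5, 6, 9], first=1, last=10))
# [(3, 2), (7, 2), (10, 1)]
```

Long runs, or many short runs close together, suggest that loss came in bursts rather than being spread evenly. Turning counter ranges back into wall-clock time still needs an estimate of the send rate, since no timestamps were recorded.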
Ordering
The other thing I was interested in was ordering.
The first way I looked at this was to measure the number of inversions in the array. Essentially, that's the number of pairs that are out of order. If you have an array with the values 10, 8, 3, 7, 4, you end up having to do 8 swaps ((10, 8), (10, 3), (10, 7), (10, 4), (8, 3), (8, 7), (8, 4), (7, 4)).
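Counting inversions can be sketched with a merge sort that tallies, at each merge step, how many left-hand elements a right-hand element jumps over. The function name `count_inversions` is made up for this example, and it is not necessarily how the numbers below were computed.

```python
def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j], in O(n log n)."""

    def sort_count(items):
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_count = sort_count(items[:mid])
        right, right_count = sort_count(items[mid:])
        merged = []
        count = left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left,
                # so it forms an inversion with each of them.
                merged.append(right[j])
                j += 1
                count += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort_count(list(seq))[1]


print(count_inversions([10, 8, 3, 7, 4]))  # 8
```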
Inversions
Sender | NJ 1 | NJ 2 | LA | NLD | JPN |
---|---|---|---|---|---|
NJ 1 | - | 0 | 2994 | 2581 | 4658 |
NJ 2 | 0 | - | 3147 | 2459 | 4645 |
LA | 3980 | 3861 | - | 3237 | 4010 |
NLD | 3125 | 1826 | 3133 | - | 4189 |
JPN | 3920 | 4417 | 4147 | 4425 | - |
Don't know about you, but I'm not sure I find that useful. It sure seems high. Of course, one of the reasons to use UDP is when you're able to discard some packets. If you send 10 000 packets, and they're all ordered, except that the last one is somehow first, you can just discard it rather than doing 9999 swaps.
What if we discard any packet that comes after a later packet we've already processed (later meaning the counter is greater)? For example, if we get 1, 5, 4, 3, 6, 7, we'd discard 4 and 3 since we've already seen 5. How many "good" packets would that leave?
# of ordered packets
Sender | NJ 1 | NJ 2 | LA | NLD | JPN |
---|---|---|---|---|---|
NJ 1 | - | 2981 | 1514 | 1658 | 1123 |
NJ 2 | 3016 | - | 1627 | 1483 | 1161 |
LA | 1227 | 1259 | - | 1485 | 1067 |
NLD | 1407 | 1645 | 1220 | - | 1096 |
JPN | 980 | 1083 | 1141 | 1087 | - |
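The discard rule described above amounts to keeping a packet only if its counter is higher than every counter seen so far. A minimal sketch, with `keep_in_order` as a hypothetical name and `counters` assumed to be the per-pair counters in arrival order:

```python
def keep_in_order(counters):
    """Keep only packets whose counter exceeds every counter seen before them."""
    kept = []
    highest = None
    for counter in counters:
        if highest is None or counter > highest:
            kept.append(counter)
            highest = counter
        # Otherwise the packet arrived after a later one and is dropped.
    return kept


print(keep_in_order([1, 5, 4, 3, 6, 7]))  # [1, 5, 6, 7]
```

Note that a single packet arriving far too early is expensive under this rule: everything with a lower counter that arrives afterwards gets thrown away.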
As a slight tweak, what if we group 5 packets together, sort them, then re-apply the above discarding code:
# of ordered packets (with grouping)
Sender | NJ 1 | NJ 2 | LA | NLD | JPN |
---|---|---|---|---|---|
NJ 1 | - | 2981 | 2061 | 2235 | 1807 |
NJ 2 | 3016 | - | 2214 | 2041 | 1889 |
LA | 1868 | 1873 | - | 2066 | 1720 |
NLD | 2200 | 2273 | 1920 | - | 1712 |
JPN | 1541 | 1804 | 1735 | 1732 | - |
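The grouping variant can be sketched the same way. The assumption here is that groups are fixed, non-overlapping windows of 5 packets in arrival order; the exact grouping used for the table above may differ. `keep_in_order_grouped` is a hypothetical name.

```python
def keep_in_order_grouped(counters, group_size=5):
    """Sort each window of group_size arrivals, then drop anything still out of order."""
    kept = []
    highest = None
    for start in range(0, len(counters), group_size):
        # Buffer one window of arrivals and process it in counter order.
        for counter in sorted(counters[start:start + group_size]):
            if highest is None or counter > highest:
                kept.append(counter)
                highest = counter
    return kept


print(keep_in_order_grouped([1, 5, 4, 3, 6, 7]))  # [1, 3, 4, 5, 6, 7]
```

The trade-off is latency: nothing in a window can be processed until the whole window has arrived.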
Conclusion
It's hard to draw any conclusions without running this for longer and with more data. Still, it seems that UDP reliability is pretty good. Distance usually involves more hops, and each hop increases the risk of something going wrong, but if things are normally ok, then distance doesn't seem to be an issue.
What is an issue is ordering. Here, distance does appear to be a bigger factor. By grouping the packets we see a substantial and expected improvement. In a lot of cases, ordering might not matter. Unless you're streaming, it's possible that simply keeping a timestamp and re-ordering on the receiving side would work.
I'd like to test more things. More data for a longer period of time and more locations. I'd also like to compare the performance to TCP. But, overall, I feel that the better-than-I-expected reliability makes UDP something I should keep in my toolbox.