I’ve spent considerable time this week working on a problem caused by accidental BGP AS path prepending. For those not in the know, AS path prepending is the practice of artificially lengthening the BGP AS_PATH attribute attached to one’s Internet routes in order to influence route selection on other networks. It is a fairly coarse tool, but it can be an effective way to control incoming traffic flow, which is otherwise difficult to influence.
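To make the mechanism concrete, here is a minimal Python sketch, assuming a remote network that compares two routes to the same origin purely on AS path length. Real BGP evaluates LOCAL_PREF and other attributes first and breaks ties with further rules; the name-based tie-break below merely stands in for those. All ASNs are hypothetical values from the documentation range.

```python
# Minimal sketch: AS path prepending and shortest-AS-path preference.
# ASNs are hypothetical (documentation range); only path length is compared.

def prepend(as_path: list[int], asn: int, times: int) -> list[int]:
    """Return a new AS path with `asn` added `times` extra times at the front."""
    return [asn] * times + as_path

def preferred(paths: dict[str, list[int]]) -> str:
    """Pick the neighbour with the shortest AS path; ties broken by name
    as a stand-in for BGP's later tie-breakers."""
    return min(paths, key=lambda n: (len(paths[n]), n))

ORIGIN = 64500  # hypothetical origin AS

# Without prepending, both upstreams deliver equally long paths.
routes = {
    "via-64501": [64501] + prepend([ORIGIN], ORIGIN, 0),
    "via-64502": [64502] + prepend([ORIGIN], ORIGIN, 0),
}
print(preferred(routes))  # via-64501 (tie broken by name)

# Prepending twice on the advertisement towards 64501 makes that path
# longer, so the remote network now prefers the route via 64502.
routes["via-64501"] = [64501] + prepend([ORIGIN], ORIGIN, 2)
print(preferred(routes))  # via-64502
```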
Unfortunately, over the last week or so, several incidents of excessive prepending have been seen, with AS paths containing up to 200-odd ASNs. Such paths are not strictly illegal: RFC4271, the latest revision of the BGP protocol specification, only appears to impose a limit indirectly, through the maximum BGP message size of 4096 bytes. However, Cisco IOS routers running certain older code revisions are quite vulnerable to BGP UPDATE messages carrying long AS paths.
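Some rough arithmetic shows how little the 4096-byte limit constrains path length. The sketch below follows the RFC4271 encoding (AS_SEQUENCE segments with a one-octet ASN count, so at most 255 ASNs per segment), under illustrative assumptions of its own: 2-octet ASNs, only the mandatory ORIGIN and NEXT_HOP attributes besides AS_PATH, and a single IPv4 /24 in the NLRI.

```python
# Rough size arithmetic for a BGP UPDATE carrying a long AS_PATH (RFC 4271).
# Illustrative assumptions: 2-octet ASNs, AS_SEQUENCE segments only,
# ORIGIN and NEXT_HOP as the only other attributes, one IPv4 /24 in the NLRI.

MAX_MESSAGE = 4096       # maximum BGP message size, octets
HEADER = 19              # marker (16) + length (2) + type (1)
MAX_SEGMENT_ASNS = 255   # segment length field is a single octet

def as_path_attr_len(n_asns: int, asn_octets: int = 2) -> int:
    """Encoded size of an AS_PATH attribute holding n_asns ASNs."""
    segments = -(-n_asns // MAX_SEGMENT_ASNS)    # ceiling division
    value = segments * 2 + n_asns * asn_octets   # type + length octet per segment
    header = 3 if value <= 255 else 4            # flags, type, 1- or 2-octet length
    return header + value

def update_len(n_asns: int) -> int:
    """Total UPDATE size under the assumptions above."""
    lengths = 2 + 2      # withdrawn routes length + total path attribute length
    origin = 3 + 1       # ORIGIN: 3-octet header + 1-octet value
    next_hop = 3 + 4     # NEXT_HOP: 3-octet header + IPv4 address
    nlri = 1 + 3         # prefix length octet + 3 octets for a /24
    return HEADER + lengths + origin + as_path_attr_len(n_asns) + next_hop + nlri

for n in (5, 200, 252, 1000):
    print(n, as_path_attr_len(n), update_len(n), update_len(n) <= MAX_MESSAGE)
```

Under these assumptions even a thousand-ASN path needs only about half of the maximum message size, so the message limit alone does little to keep paths short.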
There appear to be a number of weaknesses, but they all have similar effects. Either the update with the long AS path is rejected, a NOTIFICATION is sent to the other peer and the session is torn down; or the update is mishandled because of its content and passed on in corrupted form, whereupon the next router declares it invalid and tears down its own session. The consequence is the same: the BGP session is interrupted, so anyone downstream of the router loses the full routing table, and any routes the affected router advertises upstream flap.
Observers on the NANOG mailing list were quick to notice the recent instances of the problem, and there was speculation and puzzlement over the logic behind such a BGP advertisement. As most BGP-capable network engineers understand, prepending is not a linear control whose effect can be turned up or down gradually. It behaves more like a threshold: once the prepended path is longer than the competing paths, adding further ASNs changes nothing. Advertising a route with 200 ASNs therefore seems particularly over-zealous and unlikely to be deliberate, legitimate use.
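The threshold behaviour can be illustrated with a small sweep over prepend counts. This is a sketch under simplifying assumptions: a remote network sees one route through upstream A (on which the origin prepends) and one through upstream B that starts out two ASes longer, and it compares only AS path length. All ASNs are hypothetical documentation-range values.

```python
# Sketch: prepending acts like a threshold, not a dial.
# Only AS_PATH length is compared; real BGP applies LOCAL_PREF first and
# further tie-breakers afterwards, which this ignores.

ORIGIN = 64500        # hypothetical origin AS
UPSTREAM_A = 64501    # hypothetical upstream the origin prepends towards
UPSTREAM_B = 64502    # hypothetical alternative upstream
EXTRA_HOPS_VIA_B = 2  # assume the path via B crosses two more ASes

def path_via_a(prepends: int) -> list[int]:
    return [UPSTREAM_A] + [ORIGIN] * (prepends + 1)

def path_via_b() -> list[int]:
    transit = [64510 + i for i in range(EXTRA_HOPS_VIA_B)]  # hypothetical transit ASes
    return [UPSTREAM_B] + transit + [ORIGIN]

def chosen(prepends: int) -> str:
    # Shorter AS path wins; on a tie keep A as a stand-in for later tie-breakers.
    return "A" if len(path_via_a(prepends)) <= len(path_via_b()) else "B"

for n in range(8):
    print(n, len(path_via_a(n)), len(path_via_b()), chosen(n))
```

The selection flips once, at three prepends in this setup; every prepend beyond that leaves the outcome unchanged, which is why a path of 200 ASNs achieves nothing a handful would not.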
Some speculated that it was ill-informed routing policy or inexperienced operators, while others suggested that this incident, and the earlier related incidents that all showed the same characteristics, were essentially a remote-control DoS attack against anyone dependent on routers running the older code.
Since Interoute was implicated in the most recent case, a colleague and I were able to investigate the problem thoroughly with our customer, and the cause turned out to be a remarkable demonstration of how a relatively simple software defect can cause such a widespread network problem.
The network in question turned out to be running MikroTik RouterOS, an alternative BGP-capable router platform offering quite specialised network functionality. I have little direct experience of the system myself and am unsure of the heritage of its BGP implementation, but based on what we observed here, it appears to be home-grown rather than derived from the more commonly available implementations: gated, zebra, quagga or openbgpd.
The network operator had indeed intended to prepend his AS number to his route advertisements in order to discourage traffic, but a relatively small user interface issue allowed him to configure it incorrectly. Instead of asking for the AS path to be added, as might have been expected on a Cisco IOS device, the configuration asked the operator to specify the number of times to prepend.
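The difference between the two styles of input is easy to underestimate. The sketch below contrasts them using two hypothetical helper functions (not any vendor’s actual code): a list style, where the operator writes out the ASNs to add, and a count style, where the operator gives how many extra copies of his own ASN to add. The ASN is a hypothetical documentation-range value.

```python
# Sketch contrasting two ways a router CLI might express the same prepend.
# Both helpers are hypothetical illustrations, not any vendor's real code.

OWN_ASN = 64500  # hypothetical ASN (documentation range)

def prepend_explicit(path: list[int], asns: list[int]) -> list[int]:
    """List style: the operator writes out the ASNs to add."""
    return asns + path

def prepend_count(path: list[int], own_asn: int, count: int) -> list[int]:
    """Count style: the operator says how many extra copies of own_asn to add."""
    return [own_asn] * count + path

base = [OWN_ASN]
intended = prepend_explicit(base, [OWN_ASN, OWN_ASN])  # two extra copies
print(len(intended))                                    # 3

# The same intent expressed correctly in a count-style interface:
print(prepend_count(base, OWN_ASN, 2) == intended)      # True

# An operator thinking in list style who types the ASN into the count field:
mistaken = prepend_count(base, OWN_ASN, OWN_ASN)
print(len(mistaken))                                    # 64501
```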
As a sensible precaution, the documentation states that the number of AS prepends that can be applied (i.e. the number of times the AS is repeated in the path) is limited to a small value, but unfortunately the command-line interface doesn’t actually enforce the limit. Instead, the user input is taken literally and interpreted as the number of times to repeat the AS number! Since most ASNs in use on the Internet are large 16-bit values, entering one where a small count was expected produces an erroneous configuration that attempts to generate BGP updates with excessively long AS paths, which in turn tickle the Cisco bugs.
In fact, as one shrewd observer noted, the prepend count is not taken entirely verbatim: when it is seen as too large, it is reduced modulo 256 in order to “make it fit”. It’s unclear whether this is an unintentional side-effect of a string-to-number conversion library used by the code, or whether the modulo reduction was thought to be helpful in some way.
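A short sketch makes the consequence of the modulo reduction visible, alongside the range check the documentation implies. The function names are hypothetical, the reduction mirrors only the behaviour observed from outside, and MAX_PREPEND is a hypothetical stand-in for whatever limit the vendor documentation actually states.

```python
# Sketch of the observed input handling next to a validated alternative.
# Function names are hypothetical; MAX_PREPEND is an assumed limit, not
# a value taken from any vendor documentation.

MAX_PREPEND = 16  # hypothetical documented limit

def observed_prepend_count(user_input: str) -> int:
    """What the router appeared to do: reduce the input modulo 256."""
    return int(user_input) % 256

def validated_prepend_count(user_input: str) -> int:
    """What the CLI arguably should do: reject values outside the documented range."""
    count = int(user_input)
    if not 0 <= count <= MAX_PREPEND:
        raise ValueError(f"prepend count {count} outside 0..{MAX_PREPEND}")
    return count

# An operator types his (hypothetical) ASN where a count was expected.
print(observed_prepend_count("64500"))  # 244 copies of the ASN
try:
    validated_prepend_count("64500")
except ValueError as err:
    print("rejected:", err)
```

Reducing modulo 256 still leaves values of up to 255, far above any useful prepend count, so it does not “make it fit” in any practical sense; an explicit range check fails loudly at configuration time instead of producing a 200-odd ASN path on the wire.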
Thankfully, we were able to work with our customer to help him identify and correct the problem and minimise further disruption to Internet users, but this relatively simple defect has caused noticeable incidents worldwide (now usefully documented on BGPmon) and may yet cause further pain for those with vulnerable platforms. Operators of vulnerable Cisco platforms can use the Maximum BGP AS Path Limit feature in IOS (bgp maxas-limit) to limit the effects, although it isn’t effective in all cases.
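The idea behind an AS path length limit can be sketched as a simple receive-side filter: routes whose AS path holds more ASNs than a configured maximum are dropped rather than installed or propagated. This models the concept only and says nothing about how IOS implements it or about the specific bugs involved; the Route type, the limit of 50 and the prefixes (documentation ranges) are all illustrative assumptions.

```python
# Sketch of an AS path length guard in the spirit of a maximum AS path limit.
# Route, the limit and the prefixes are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: list[int]

def filter_long_paths(routes: list[Route], max_as: int) -> tuple[list[Route], list[Route]]:
    """Split routes into (accepted, dropped) by AS path length."""
    accepted = [r for r in routes if len(r.as_path) <= max_as]
    dropped = [r for r in routes if len(r.as_path) > max_as]
    return accepted, dropped

routes = [
    Route("192.0.2.0/24", [64501, 64500]),
    Route("198.51.100.0/24", [64502] + [64500] * 245),  # over-prepended
]
accepted, dropped = filter_long_paths(routes, max_as=50)
print([r.prefix for r in accepted])  # ['192.0.2.0/24']
print([r.prefix for r in dropped])   # ['198.51.100.0/24']
```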