Re-inventing Ping for Link Aggregation

February 18, 2012

Without doubt, few tools can match the usefulness and diagnostic power of traceroute and ping. These tools offer invaluable insight into the operation and performance of network elements that we take for granted in today’s use of the Internet. Whether it’s a case of trying to discover why your laptop wifi or home broadband has stopped working, or why the office laser printer won’t print, tech-savvy users and network engineers alike have long relied on these tools to pinpoint problems.

The same goes for network trouble-shooting in large-scale ISP networks. The principles are the same, even if the interface bandwidths are slightly different. Recently, though, I was offered a stark reminder of just how dependent today’s large-scale ISP networks are on link aggregation technology, and how a technology that makes simple promises can be complicated underneath.

To digress briefly, link aggregation technology allows network devices to seamlessly “bond together” multiple interfaces into a single, larger-capacity interface. It provides a stop-gap between today’s ubiquitous 10 Gigabit Ethernet technology – a staple of most ISPs – and upcoming 100 Gigabit Ethernet technology, which for most is still lab-bound and expensive. Link aggregation is widely used by large-scale ISPs, who routinely bundle 16 or more 10 Gigabit Ethernet interfaces together to make trunks of 160Gbps or more, while patiently waiting for equipment vendors to make 100GE more practical and cost-effective.

To be useful, though, link aggregation needs to be paired with an effective load-balancing algorithm: there has to be a good way of dividing the traffic demand amongst all of the available bearers. You can’t simply dish out packets to different interfaces in a round-robin fashion. Because queueing and path delays differ slightly between bearers, doing so reorders packets within individual user sessions, and TCP receivers treat out-of-order arrivals as possible loss, generating duplicate acknowledgements that can trigger needless retransmissions and cut the sender’s throughput.

Round-robin per-packet load sharing across multiple links
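A small simulation shows why per-packet round-robin reorders traffic. The numbers are assumptions chosen for illustration: one flow sends a packet every millisecond, alternating between two bearers whose one-way latencies are 5 ms and 7 ms; arrival_order is a hypothetical helper.

```shell
#!/bin/sh
# Round-robin spraying of a single flow over two bearers with unequal
# latency. Assumed numbers: one packet sent per millisecond, bearer 0
# adds 5 ms and bearer 1 adds 7 ms. Prints packet numbers in the order
# they arrive at the far end.
arrival_order() {
  i=1
  while [ "$i" -le 8 ]; do
    link=$(( (i - 1) % 2 ))          # round-robin: 0, 1, 0, 1, ...
    arrival=$(( i + 5 + 2 * link ))  # send time + bearer latency
    echo "$arrival $i"
    i=$((i + 1))
  done | sort -n | cut -d' ' -f2 | paste -s -d ' ' -
}

arrival_order   # prints: 1 3 2 5 4 7 6 8
```

Packets 2, 4 and 6 each arrive after the packet that follows them, so a TCP receiver would see a steady stream of out-of-order segments even though nothing was lost.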

Instead, the established approach is to compute a “session-aware” hash over packet header fields that stay constant for the life of a conversation, and to use the result to select an individual bearer link. This ensures that conversations between Internet endpoints “stick” to an individual 10GE bearer in an ISP’s network, while still allowing the ISP, in aggregate, to carry traffic in excess of a single bearer link.

Session-based traffic load sharing across multiple links
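To make the idea concrete, here is a minimal sketch of session-aware bearer selection. It assumes POSIX cksum (a CRC) as a stand-in for the router’s own, vendor-specific hash, a flow key built from the two IP addresses (documentation-range examples), and a 16-link bundle; pick_bearer is a hypothetical helper, not any vendor’s implementation.

```shell
#!/bin/sh
# Session-aware bearer selection, sketched with POSIX cksum (a CRC) as a
# stand-in for the router's own hash. Addresses and the 16-link bundle
# are illustrative assumptions.

# pick_bearer KEY LINKS: print a bearer index in the range 0..LINKS-1
pick_bearer() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

# Every packet of one conversation carries the same key, so it always
# maps to the same bearer and is not reordered by the bundle.
pick_bearer "192.0.2.10>198.51.100.20" 16
pick_bearer "192.0.2.10>198.51.100.20" 16   # same bearer as the line above
pick_bearer "192.0.2.77>198.51.100.20" 16   # a different session may differ
```

Since packets from one conversation always share a key, they queue behind each other on one bearer and arrive in order; balance across the bundle comes only from having many different conversations.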

Most network equipment vendors that support link aggregation also support this kind of session-aware load-sharing, and at Interoute we make great use of Juniper’s MX-960 MPLS/IP router platform, which boasts a generally well-performing implementation of layer-3/layer-4-aware load-balancing. In addition to considering IP address endpoints in the hash decision, it can also include application port numbers. This reduces the chance that “chatty” end-stations, or proxy servers and NAT devices masquerading many end-users behind a single address, can hog bandwidth on a single bearer.

The algorithm is so good at achieving a fair balance across all bearer links within a bundle that, in most cases, the multiple bearer links can simply be considered one link of a much larger size. To illustrate, the attached graphs show a group of bearer links within the same bundle; note the almost equal balance of traffic across all links.

Effective traffic load-sharing on Juniper's MX-960
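The effect of adding port numbers to the hash can be sketched by comparing two keys for 200 sessions from a single NAT or proxy address to one web server. POSIX cksum stands in for the router’s hash, and the addresses, port range and 16-link bundle are illustrative assumptions; pick_bearer and bearers_used are hypothetical helpers.

```shell
#!/bin/sh
# Many sessions from one NAT/proxy address to one server: compare a
# layer-3 key (addresses only) with a layer-3/4 key (addresses plus TCP
# ports). cksum stands in for the router's hash; addresses, ports and
# the 16-link bundle are illustrative assumptions.
pick_bearer() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

NAT=192.0.2.1 SERVER=198.51.100.20 LINKS=16

# bearers_used MODE: count distinct bearers used by 200 client sessions
bearers_used() {
  port=40000
  while [ "$port" -lt 40200 ]; do
    if [ "$1" = l3 ]; then
      pick_bearer "$NAT>$SERVER" "$LINKS"                # ports ignored
    else
      pick_bearer "$NAT:$port>$SERVER:80" "$LINKS"       # ports included
    fi
    port=$((port + 1))
  done | sort -u | wc -l | tr -d ' '
}

echo "layer-3 key:   $(bearers_used l3) bearer(s) used"
echo "layer-3/4 key: $(bearers_used l34) bearer(s) used"
```

With the layer-3 key every session shares one bearer (the “hog” problem); with ports included, the same sessions spread across the bundle.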

In this particular incident, however, the load-sharing algorithm used on the MX-960, or more specifically the software that underpinned it, let us down, and I was reminded of how important it is to understand what’s really going on behind the scenes, rather than accepting a conveniently abstracted technology view of the situation.

We’d performed a routine software upgrade of one of our Juniper MX-960 routers in Frankfurt during a maintenance window. The upgrade was designed to rid us of instrumentation management problems and address some perceived security vulnerabilities. It was one of several recent upgrades in a rather long and tedious quest to find a suitable, stable, secure software release that supports the latest 16-port cards that the MX boasts.

Several hours after the low-tide maintenance window closed – approaching daylight in the CET timezone – Interoute’s Prague NOC began to receive the first of several complaints from customers regarding network performance degradation. Network-savvy customers almost always include ping and traceroute output in any fault-finding evidence that they produce, but in this case such evidence was scarce. They couldn’t quite pinpoint the fault, but they knew things weren’t working correctly. The event of closest correlation was the Frankfurt node software upgrade, and experience fosters a healthy scepticism of coincidence. Sure enough, when we re-routed customers around this device, their problems seemingly went away.

Now it’s always a difficult decision when one has to choose between action that will likely satisfy a customer’s need for a short-term fix and prolonging a situation in order to garner more evidence to nail the issue long-term. We had Juniper engaged, but so far none of the instrumentation data that we were able to glean presented a smoking gun as to exactly what was misbehaving. Likewise, we were uncomfortable leaving customers on an artificially re-routed path while a suspicious and not yet fully identified problem remained on one of our Frankfurt nodes with no progress being made. We needed to study the problem live. There was no option but to find another similarly-configured customer on the Frankfurt node, suffering the same problems, who could tolerate them for long enough for us to gain more insight.

We didn’t have to look very far. As business hours dawned in the UK, the monitoring system associated with an internal VPN customer started to report sporadic failures in the SNMP polls underpinning its management system, specifically on paths through Frankfurt. We scrutinised the symptoms, identified the affected endpoints and drew up a list of the network elements that the traffic traversed. General ping testing was fine, but SNMP polls would occasionally fail and SSH sessions would sometimes fail to connect, which was extremely puzzling.

It was at this point that our attention turned to how different traffic types were handled within the Juniper core, and we were reminded of our design decision to enable the load-sharing features that consider TCP and UDP port numbers. While this results in a very smooth distribution of traffic across bearer links within a bundle, in our case it meant that successive sessions between the same two endpoints could take different bearer links and so have quite different experiences of the network. For example, a user downloading a file via HTTP might find his download traversing the first bearer link in a bundle, while if he were to press Stop/Refresh, he’d see it move to a different bearer link in the bundle. This would happen because the source TCP port of his client PC would change between the two download attempts.

What seemed to be happening was that, depending on how the traffic was hashed by the load-sharing algorithm, it could be lost or discarded in transit through the Frankfurt node. Ironically, failures like this, which are really a function of the link aggregation features, completely defeat TCP’s usual reliable transfer mechanisms: a retransmitted segment carries the same addresses and port numbers as the original, so it hashes to the same bearer and meets the same fate. If traffic for a session is hashed incorrectly, or hashed to a faulty bearer, that traffic will always suffer until something causes the hash to change. Most client/server protocols, while using a well-known server port, use a pseudo-random ephemeral port number for the client. As a result, the symptoms appear as some connections failing while others succeed.
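The failure mode can be sketched as a small simulation. It assumes a 16-link bundle, a layer-3/4 key, SSH (port 22) sessions from one client using successive ports from the IANA ephemeral range, POSIX cksum as a stand-in for the router’s hash, and, purely for illustration, that whichever bearer the first session lands on is the faulty one.

```shell
#!/bin/sh
# Sketch: with a layer-3/4 hash, one bad bearer in a 16-link bundle
# breaks only the sessions whose key hashes to it. cksum stands in for
# the router's proprietary hash; addresses, port range and bundle size
# are illustrative assumptions.
pick_bearer() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

CLIENT=192.0.2.10 SERVER=198.51.100.20 LINKS=16
# For illustration, treat the bearer the first session lands on as faulty.
FAULTY=$(pick_bearer "$CLIENT:49152>$SERVER:22" "$LINKS")

sessions=0 failed=0
port=49152                       # start of the IANA ephemeral port range
while [ "$port" -lt 49472 ]; do  # 320 sessions, one per client port
  # TCP retransmissions reuse the same addresses and ports, so a session
  # that hashes to the faulty bearer keeps hashing there.
  if [ "$(pick_bearer "$CLIENT:$port>$SERVER:22" "$LINKS")" -eq "$FAULTY" ]; then
    failed=$((failed + 1))
  fi
  sessions=$((sessions + 1))
  port=$((port + 1))
done
echo "$failed of $sessions sessions hashed to faulty bearer $FAULTY"
```

Roughly one session in sixteen should land on the bad link, and those sessions stay broken however often TCP retransmits, while a fresh attempt from a new ephemeral port usually succeeds, which matches the intermittent symptoms.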

We soon realised that, faced with such complexity, we needed a variation on the usual ping program. In our situation, the ICMP probes produced by ping would always hash the same way, because the hash for them depended only on the source and destination IP addresses. As a result, ICMP ping tests couldn’t adequately exercise the multiple bearer links in an aggregate bundle.

We found our answer in one of the application protocols that had first alerted us to the situation. SNMP uses UDP datagrams to communicate between manager and agent. Without the complication of TCP retransmissions, its behaviour was much more predictable, and it was easy to write a small shell script that sends repeated SNMP queries to a target, originating each query from a different UDP client port.

Demonstrating session-based packet loss

The results were damning and gave us a solid control check against any actions that we were performing on the network to determine if we were making things better or worse.

We persevered, re-engaged Juniper, and managed to pin the problem down to a specific card configuration on our Frankfurt node – we were spanning a link aggregation bundle across two different types of line card – and this, in conjunction with the software upgrade, appeared to be the most likely cause of our problem.

Under carefully controlled conditions, we were able to disable bearer links on one card and re-enable them on another, thereby removing the difference in card type. Our new ping tool instantly confirmed the results of our endeavours, and we could breathe easy again.

We noted the incompatibility, audited the rest of the network for other instances and made corrective plans as required. But we’d re-learned some important lessons during the exercise:

  • The best network monitoring technologies rely on actuating and observing real network traffic, rather than measuring network performance with instrumentation.
  • The most useful and productive technologies often abstract away detail and complexity in order to enable more sophisticated solutions. But we forget the fundamentals at our peril.

For posterity, our rather simple SNMP/UDP ping shell script wrapper, requiring a version of Net-SNMP, is reproduced here. No warranties!

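#!/bin/sh
# SNMP "ping": send repeated SNMP GET requests (UDP datagrams) to a target
# host and print "!" for each reply and "." for each failure.
# Each snmpget run opens a fresh UDP socket, so the OS normally assigns a
# new ephemeral source port per query and successive probes can hash to
# different bearers. Retries are disabled (-r 0) so that a lost datagram
# is reported rather than masked by a retransmission.
if [ $# -eq 0 ]; then
  echo "Usage: $0 host [community] [snmp-oid]" >&2
  exit 1
fi
HOST=$1
COMMUNITY=${2:-public}
OID=${3:-sysUpTime.0}
echo "PING $COMMUNITY@$HOST $OID"
while true; do
  # -v 2c selects community-based SNMPv2c explicitly rather than relying
  # on the default version configured in snmp.conf.
  if snmpget -v 2c -r 0 -c "$COMMUNITY" "$HOST" "$OID" >/dev/null 2>&1; then
    printf '!'
  else
    printf '.'
  fi
  sleep 1
done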
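The wrapper above loops until interrupted and leaves the counting to the eye. When a fixed-length run with a loss percentage makes a better before/after control check, a bounded variant along these lines may help. It is a sketch: probe_loss and its PROBE_INTERVAL setting are hypothetical names, and the probe command is passed in as arguments so that any single-shot check can be wrapped.

```shell
#!/bin/sh
# probe_loss COUNT COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND COUNT times, printing "!" on success and "." on failure,
# then a ping-style loss summary. PROBE_INTERVAL (seconds between
# probes, default 1) is likewise an assumption of this sketch.
probe_loss() {
  count=$1; shift
  if [ "$count" -lt 1 ]; then
    echo "probe_loss: COUNT must be at least 1" >&2
    return 1
  fi
  interval=${PROBE_INTERVAL:-1}
  sent=0 received=0
  while [ "$sent" -lt "$count" ]; do
    if "$@" >/dev/null 2>&1; then
      received=$((received + 1))
      printf '!'
    else
      printf '.'
    fi
    sent=$((sent + 1))
    if [ "$sent" -lt "$count" ]; then sleep "$interval"; fi
  done
  echo
  echo "$sent sent, $received received, $(( (sent - received) * 100 / sent ))% loss"
}

# Example (host and community are placeholders):
#   probe_loss 100 snmpget -v 2c -r 0 -c public router.example.net sysUpTime.0
```

Keeping retries disabled in the wrapped command (as -r 0 does for snmpget) matters here, since retries would mask exactly the per-port losses being hunted.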

Booting up Interoute’s VDC in 10 seconds…

December 10, 2011

I’ve spent the weekend beta-testing Interoute’s exciting new Virtual Data Centre proposition and I feel compelled to share my experiences. VDC, as it’s known internally at Interoute, is a recent development from the group based in London. It attempts to fuse the flexibility and elasticity of public cloud computing offerings with the home territory of the Enterprise VPN business, in which Interoute’s dense IP/MPLS network has always been strong and dependable, having arguably pioneered MPLS VPN technology in Europe. As a well-known critic of poor implementations of virtualisation technology within Interoute, I was invited to participate in an early beta trial and comment freely.

The general idea of Virtual Data Centre is to revolutionise the role of the typical Enterprise IT function in order to mutate today’s compute functionality into a utility resource as consumable as power or floor-space, while still maintaining the assurance, reliability, dependability and integrity of one’s own private enterprise network.

The Problem

Conventional public cloud offerings in this area from respectable competitors such as Amazon or Rackspace have already demonstrated how virtualisation technology is changing the way that people think about compute power. Cloud-powered Internet technologies are found everywhere: whether it is the latest version of the Sims game on Facebook, or Dropbox’s ubiquitous and dependable personal file folder.

But the problem most enterprises have is the conundrum of how to capitalise on so-called “cloud-power” while retaining control of their own privacy and security. Most large-scale cloud compute implementations are, by their nature, Internet-based, since only Internet scale provides the volume that yields the skills and experience to operate a virtualisation platform correctly, completely and dependably. It’s also only Internet-based demand that provides the variation in demand that makes an elastic compute capability viable: one customer’s peak usage is another customer’s quiet time.

Private cloud, or to be more specific, the internal implementation of cloud compute facilities for sharing within the enterprise, is often unable to offer the same level of benefit because of this lack of variation in demand. Within one organisation, demand is highly likely to peak at the same times, meaning resource starvation is a very real possibility.

This implicit association with the public Internet and viable cloud computing, then, is often troublesome for enterprise IT managers to accept. While enterprise applications providers such as Microsoft have taken strides recently to ensure that their client/server applications operate quite seamlessly whether making use of private network transport or whether using the public Internet, there are still a myriad of applications in common use in the enterprise space that simply don’t expect the Internet to be between them and their client user.

The Solution

But a managed VPN service provider, already operating in the space of corporate and enterprise WAN connectivity, is in an ideal place to take Internet packet economics, apply it to cloud compute functionality and make it a compelling platform for Enterprise IT managers.

And this is what is happening at Interoute. Already providing the backbone transport across a large European footprint for Enterprises that have realised that operating WAN networks is a distraction from their core business, Interoute have designed a multi-customer-aware Virtual Data Centre architecture that integrates closely with MPLS VPN technology, now accepted as a secure industry standard for providing isolated logical WAN networks for multiple customers over a common physical network asset.

The result is a platform which allows a customer to view and manipulate the set of virtual resources in his enterprise network via the very same web-based customer portal that he already uses for routine maintenance of his VPN. Consequently, turning up a new server application in the enterprise network is transformed from a long, unwieldy process involving hardware costing, space, cooling and environmental analysis, physical installation and software tuning, into a fast, responsive activity in line with the user’s business requirements.

VDC In Action

Access to the VDC Control Centre is via Interoute’s standard customer portal, or the Hub, as it’s often known. Customers log in with existing credentials and are immediately taken to a rich, Flash-based control panel that offers a visualisation of the current state of the Virtual Data Centre.

The package the VDC team have offered me provides a generous 48 virtual CPUs, 150-odd GB of RAM, 6 VLANs and two distinct storage options, VM-specific and shared. To allow me to try out the platform, my VDC has been associated with a special internal VPN that is used to provide services to a bespoke customer.

Initial virtual resource summary

Discussions are still underway, but I don’t think I’m stepping on anyone’s toes by suggesting that the pricing scheme likely to be offered is based on a combination of the resources consumed either on a fixed rental basis, or time-based. There is no real virtual-machine tax per se; one simply dimensions the VMs to make use of the resources in question.

These resources deserve an explanation. Virtual CPUs, RAM and internal storage are all associated with a VM, and these properties define its key performance characteristics. A VM’s use of these resources is specified by a template “image” held in the ever-growing Interoute App Store library. For example, there is a basic vanilla Windows Server 2008 image that uses 1 CPU, 50GB of internal disk and 512MB of RAM.

There are possibilities to customise these images to adjust these properties, and I’m reliably told that a VM-importer will allow a customer to “bring-their-own” pre-built VM for execution, although it isn’t available presently.

In order to cater for network applications operating on shared data, however – probably one of the most common cases – the shared storage option allows the customer to specify an amount of storage space that is made available to VMs using common storage protocols. In my opinion, this separation of application logic and data is a vital distinction in allowing this technology to supersede traditional managed application hosting models. If I somehow manage to “break” an application, whether its operation, its management or otherwise, recovery should be as simple as deploying a fresh VM from the App Store library. My data is quite safe.

The final resource that requires a mention is the VLAN. By allowing a user to create separate networks, and attach VMs to these networks, it is possible to create quite complex multi-layer topologies mirroring conventional tiered data-centre applications.

Application groups (by another name)

Double-clicking on the Virtual Data Centre in question – a customer can have more than one, associated with geography or availability zone – introduces the view of Virtual Appliances. While the terminology is at first confusing, one quickly realises that this is actually an “application group”, and it allows a customer to easily operate several distinct cloud-based projects within a single VDC, pooling compute resources into one lump while keeping the applications and developers separate. I imagine the VDC team are looking at the possibility of defining users and groups and segregating administrative access to VM installations by application group.

Inside the virtual 19" rack

Within an application group, a user is given a very simplistic list-view or graphical view showing the virtual machines within the group and their current status. In the graphical view, the left hand-side of the screen shows a view into the Interoute App Store library, which is arranged by category and promises a variety of images from basic installed operating systems that users can further customise to free-standing OS/application combinations designed to just “deploy-and-go”.

It’s a simple matter of dragging an image onto the blank canvas to add it to the application group. For testing purposes, I ignore the plethora of Linux images and select something reliable instead. I can, after all, easily change my mind!

A lot easier than installing real NICs!

Once the VM is in place on my canvas, it’s also easy to modify some of its mutable properties: specifically, the external storage that is available and the networks that are attached. Helpfully, the software automatically selects the next free IP address from a pool of addresses that Interoute have nominated for default use in customers’ VPNs. I can modify this assignment as necessary, and add additional network interfaces or storage volumes. Once happy, I can simply click OK and then press the reassuringly chunky Flash-rendered power button in order to “deploy” my VM. It lacks the tactile feedback, but is just as effective nonetheless.

Turning the key...

It is at this point that one realises the truly revolutionary potential for changing the way that Enterprise IT works. During my testing, I could simply drag an image from the library to the canvas and deploy it in under a minute. It booted in several seconds, and I could log in and use it shortly afterwards.

Granted, my “application” was simply a vanilla FreeBSD installation, and the VDC interface and concept still have a few rough edges, but the experience of rapid deployment, installation and usability is compelling. What will be key is how quickly Interoute manage to grow their current App Store library into a comprehensive one-stop shop of ready-made Enterprise applications that can be deployed in-cloud and made available to users extremely rapidly.

What will also be significant for more complicated applications will be the possibility to delegate the management of an application to a specialist or a channel partner. It’s quite likely that application service providers, currently in the business of providing applications to customers and hosting them, will be extremely interested in the idea of becoming a partner or reseller on Interoute’s VDC.

Finally, I am curious to see how the technology may be used to enable a relatively new capability of staging and trialling. The speed of deployment means that it is extremely feasible for customers to “try-out” applications, almost without commitment.

Summary

Interoute’s VDC development represents a recognition that the unit of deployable application is changing fast – from CD-ROM media in the 1990s, to downloadable source code, .tar.gz or .MSI files in the 2000s, to the ubiquitous .VMX/.VMDK virtual machine image of today. Through VDC, Interoute is providing a vehicle for Enterprise customers to host applications and functionality in a secure private cloud, where the distraction of hardware and OS maintenance is irrelevant and insignificant.

I think it’s an exciting development, and I eagerly await early customer responses.

An Early Peek at the BlackBerry PlayBook

February 17, 2011

PlayBook Home Screen

It was with much anticipation and excitement that Research in Motion, arguably pioneers of practical smartphone technology, revealed their new iPad-tackling PlayBook tablet device at GSMA’s Mobile World Congress in Barcelona this week. RIM took a significant portion of the App Planet hall at MWC and showcased an early pre-release of their new PlayBook tablet, along with selected developers who were able to demonstrate early applications. The actual operating system software is still under development, but the devices were actually there and available to play with. Final release is intended for Q1 in the US but pricing is not confirmed at this time.

The first thing to notice is how light the devices are. At 400g, they don’t weigh much more than the average smart-phone but the 7″ screen manages a comfortable 1024×600 pixel resolution which gives both space and definition. The device boasts an impressive 1GHz CPU with 1GB RAM which RIM promise will make applications “fly”, and includes both front and rear facing cameras, paving the way for a FaceTime competitor.

With the beta operating system and bundled applications installed, the device is fast and usable. iPad veterans could be seen struggling to adapt, fumbling with pinch gestures to zoom, which the PlayBook doesn’t support. The PlayBook offers its own gestures, however, to minimise apps, rotate through apps and bring up the virtual keyboard, and these are easy to get used to. Annoyingly – for me at least – the virtual keyboard makes the same mistake as the iPad in failing to combine alphabetic and numeric characters, instead requiring the user to switch between the two.

HTML5 Test Result

The new WebKit-based browser is slick and fast and the development version scored a promising 219 on HTML5Test, supporting the Canvas API completely. Provocatively, it also includes support for Flash 10.1, a serious win over the Apple contender. Flash video works and, in conjunction with the built-in H.264 support, which can drive an external 1080p HDMI device, ensures that there should be no Internet-based video that this device can’t view.

Testing Flash with the interoute.com website

Flash games such as Bloxorz work with good performance, but it’s at this point that one realises that Steve Jobs’ fifth point about Flash is quite accurate: many Flash apps are written without consideration for touch devices, placing a lot of user-interface significance on hover versus click interactions, and hover is clearly not possible on a touch device. Bloxorz, in particular, needs cursor-key input, which the PlayBook’s virtual keyboard doesn’t provide.

Multi-tasking, something to which BlackBerry users have long been accustomed, is not left as an afterthought either. The underlying QNX operating system seems to effortlessly juggle multiple applications in response to the user’s demands. An interesting experiment showed a Quake demo managing 30 frames per second while the bundled YouTube app happily ran HD video, with the user able to flick between the two with a finger gesture.

Network connectivity is Wi-Fi only, with support for 802.11a/b/g/n and, quite topically given recent events, I noticed that IPv6 is enabled by default. A 3G version is promised for later in the year, but here the story gets quite interesting. The PlayBook itself is not a BlackBerry. This means that there is no push email, no BlackBerry Messenger and no BES/BIS connection to configure. The device is essentially an Internet tablet with Internet applications, but RIM have stated that it is possible to pair it with a BlackBerry over Bluetooth in order to get the traditional BlackBerry services. This has the potential to cause quite a bit of unnecessary confusion over how a user gets network access: via native PlayBook Wi-Fi? Via paired 3G?

If RIM is sensible, they will try to get away from the current mess of network provider-specific BIS with its poor support for push notifications and mailbox synchronisations, while still allowing for the BlackBerry-specific network communication for enterprise VPN connectivity and BlackBerry Messenger. I noticed through my fiddling that the PlayBook does indeed possess a PIN, the essential address token for communication on the BlackBerry network, so perhaps there is hope.

No new platform would be complete without a taster of applications to come, and there seemed to be plenty of candidates. Electronic Arts were on hand at MWC to show a speedy driving game that makes use of the internal accelerometers to allow a player to steer with the device, while Citrix showed their commitments to the business market with a concept demonstration of how a Citrix client for PlayBook might look.

RIM, keen to encourage development on this exciting new platform, ran a series of boot-camps which were immediately over-subscribed. I was lucky enough to get a seat on one of the sessions that ran through one of the two immediately supportable development methods for PlayBook apps, WebWorks. In recognition of the fact that JavaScript is slowly taking over the world, WebWorks combines the use of HTML5, CSS and JavaScript, along with customised JavaScript extensions in order to encourage the generation of “Weblets” that can be deployed as easily as a web page, but stored and executed locally on the PlayBook and make use of native-like functionality.

The other major technology for targeting the PlayBook is Adobe AIR, which is considered the native API. What is surprising here is that RIM is not promoting the Java development model that has been so successful for BlackBerry. Instead, the PlayBook contains an AIR runtime environment and applications are deployed as AIR packages. Java developers are not left out in the cold, however; a Java interface is promised at some point in the future.

During the boot-camp, Sanya Kiruluka and her developer relations team confidently ran through how the mandatory Hello World app and other examples can be created using the WebWorks framework, executed in a virtual machine QNX simulator (a rather innovative idea) and deployed to production hardware. They also announced a competition to stimulate creativity by offering a free PlayBook to a selected winner who manages to design, implement and publish to AppWorld a working and qualified PlayBook application by March 15th.

It’s quite clear that application availability will play a significant part in defining the PlayBook’s success, but early indications are that the hardware base makes for a firm foundation.

Facebook Social Engineering

February 2, 2010

So after returning from a slight blog writing hiatus (for a variety of reasons), I find myself logged into Facebook late one Sunday afternoon when a former work colleague that I haven’t seen for a while pops up on the integrated IM system.

I don’t use Facebook’s chat facility much, but I am not averse to it. I offer a warm greeting and he politely returns it, but when I enquire about his wellbeing he drops the rather devastating bombshell that he’s been mugged while holidaying in Scotland! Not just mugged, but mugged at gunpoint no less. He explains that he’s lost all his cash, credit cards, wallet and phone in the attack but that he’s okay physically.

This is a lot to take in with the aftermath of Sunday lunch, and while I am breathing a sigh of relief that he has survived unscathed, you’re probably already a little suspicious of what might be coming. I still remain quite firmly committed, however.

His flight home is in two hours, he explains, and the hotel won’t let him leave without settling his bill! No problem, I reassure him, don’t panic. We can easily sort this out over the phone now with a credit card and we can settle it later when he returns to London.

Grateful at this news, he triggers alarm bell number one: can I transfer the money by Western Union wire? What? Why bother with that when I could just call up the hotel desk and secure it on a credit card?

I dismiss this anomaly, and proceed to tell him the plan: go down to reception, tell them a friend will settle the bill and get their phone number so we can arrange the transaction. Tell them to give you a cash advance for a taxi to the airport as well. He seems a little confused by the last point and then rings alarm bell number two in my head: the hotel has a +44 702 “follow-me” number! At this point, my suspicions are aroused: I’m curious, but I still don’t want to believe I’ve been taken in.

In the casual discussion that follows, I reminisce about a tale from work involving one of our colleagues, deliberately forgetting the colleague’s name, and wait for him to banish all my concerns by completing the tale.

But he goes quiet. He can’t do it, and the inescapable truth hits me. This really is a scam. This is not really my friend at all, but someone who has somehow managed to gain the credentials necessary to pose as him. I persist with my questioning and, realising that he has been rumbled, he concedes defeat by logging out and blocking me, presumably to prevent me raising the alarm with my friend’s friends by writing on his wall or similar.

Now, as a well-prepared and diligent reader, you’re probably surprised and disappointed that I came so close to handing over my credit card details to scammers of unknown origin. But what makes this scam so effective are the mechanisms it exploits to disarm one’s normal caution and guarded behaviour when dealing with unknown Internet correspondents.

  • Facebook friends tend to rank rather high on the trust list. This is not a random email from the exiled president of a small African nation seeking financial help to realise his investment in gold, diamonds, his father’s inheritance or whatever. This is someone I’ve explicitly authorised, someone who matches a profile photograph, someone with whom I have conversed before.
  • Direct IM conversations leave no opportunity for the consideration and reflection that would usually be available before making important decisions. It’s similar to the double-glazing salesman who offers the “sign today only” deal.
  • Questioning the authenticity of a message can be considered quite a hostile thing to do and people often feel reluctant to do so. I am sure that if I emailed or IM’d a work colleague and asked for a sensitive bit of data as a convenience, they would probably oblige.
  • The urgency of the situation – a flight – and the fact that my friend has already been through a shocking ordeal demand decisive action from me if I am a good friend.

The combination of all these factors makes this attack an extremely potent one, and the only failings were the use of some casual language, poor anticipation of how I was likely to respond, and a shaky grasp of the logistics and local conventions involved in making a quick payment. For most of the dialogue, I harboured more concerns over how to extract the cash from my Liverpudlian friend on his return than I did about the authenticity of the request!

The good news is that Facebook appear to be well aware of this class of scam and do offer some sensible practical advice on dealing with the problem and reporting the issue.

I do hope that no-one else is adversely affected by scams such as this. A work colleague pointed me to some useful advice and general tips on dealing with IM, whatever the platform, since it is essentially “pre-authenticated”, but only in a weak manner:

  • Get an awareness of people’s writing styles and language, in email, IM or other short-form messages. They can be quite distinctive. For instance, a colleague I know at work can be relied upon 100% to put apostrophes in plurals and leave them out of contractions. I know that if I ever see correct punctuation from him, I should be suspicious!
  • Form a characteristic greeting that you always use to initiate and respond with. Correspondents will grow accustomed to it, and will hopefully note its absence in fraudulent communications. Examples include esoteric greetings or even saying hello in a foreign language. Consistent repeatable behaviour is the key.

As the sophistication of electronic “social engineering” attacks increases, I am sure that people will need to become more hardened and vigilant in their use of social networking technology, but hopefully this won’t detract from the usefulness and effectiveness that it provides.

Quagga chokes on large 4-octet AS numbers

•May 11, 2009

Last weekend – a bank holiday weekend in the UK – saw a rather significant BGP-related disruption on the Internet. Fortunately it didn’t affect equipment from the mainstream router vendors, but it caused service interruptions for anyone dependent on certain versions of the Quagga routing protocol suite (an open-source collection of routing protocol implementations with a configuration management interface that closely resembles that of mainstream Cisco routers). In a lot of cases, this was restricted to informational route server platforms that provide looking glass views onto backbone networks, but several alternative vendors also produce router appliances based upon the Quagga code base.

I spoke at length with the Interoute on-call engineer for the weekend after several customers reported that their BGP routers had crashed for some unknown reason. He’d investigated the situation and, after discussion, it became apparent that other Internet users were reporting similar problems and that the routers involved were all based upon the Quagga routing protocol suite. The problem was linked to a software defect that manifested itself on exposure to AS numbers longer than five digits, i.e. above 99999.

Once the nature of this defect was understood, we were able to identify the specific BGP updates that were causing the customers’ routers to crash and filter them from further advertisement so that connectivity was restored to those affected customers.

Further examination then revealed that the Quagga BGP daemon seemed to be trying to render an AS number into a string buffer dimensioned for only 5 characters. While this would have been perfectly sufficient for today’s ASNs in the range 0-65535, clearly larger-numbered ASNs could not be handled this way.
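To make the failure concrete, here is a minimal Python sketch (not Quagga’s code) that counts the decimal characters an ASN needs and checks it against a five-character field, as described above. In C the buffer would also need a byte for the terminating NUL, and an unchecked sprintf() writes past the end rather than failing cleanly, which is how a formatting slip becomes a crash. The function names are hypothetical.

```python
MAX_2OCTET_ASN = 2**16 - 1   # 65535: five decimal digits
MAX_4OCTET_ASN = 2**32 - 1   # 4294967295: ten decimal digits


def digits_needed(asn: int) -> int:
    """Characters needed to print an ASN in plain decimal, excluding any NUL."""
    return len(str(asn))


def fits(asn: int, field_chars: int) -> bool:
    """True if the ASN can be rendered into a field of `field_chars` characters."""
    return digits_needed(asn) <= field_chars


# A five-character field covers every 2-octet ASN but not every 4-octet one.
for asn in (MAX_2OCTET_ASN, 99999, 100000, MAX_4OCTET_ASN):
    print(asn, digits_needed(asn), fits(asn, 5))
```

Any ASN up to 99999 squeezes in, 100000 does not, and the largest 4-octet ASN needs ten characters, so a safe C buffer would be 11 bytes including the NUL.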

Larger-numbered ASNs require support for RFC4893, a recent IETF standard that extends the BGP AS number space from 2 octets to 4 octets so that such ASNs can be represented in standard BGP attributes such as the AS Path. Quagga claims to implement this fully, but it seems that some sections of the code did not fully consider the ramifications of dealing with 4-octet ASNs.
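As a rough illustration of what RFC4893 changes on the wire, the sketch below encodes a single AS_SEQUENCE segment of an AS_PATH either with 4-octet ASNs or, towards an older 2-octet-only peer, with any ASN above 65535 replaced by the reserved placeholder AS_TRANS (23456). In the latter case the true path travels separately in the AS4_PATH attribute, which is not shown. The function name and example ASNs are illustrative only.

```python
import struct

AS_TRANS = 23456   # reserved 2-octet placeholder ASN defined by RFC4893
AS_SEQUENCE = 2    # AS_PATH segment type for an ordered sequence (RFC4271)


def encode_as_sequence(asns: list[int], four_octet: bool) -> bytes:
    """Encode one AS_SEQUENCE segment: type, ASN count, then the ASNs.

    With four_octet=True every ASN takes 4 octets. Otherwise each takes 2
    octets and any ASN above 65535 is replaced by AS_TRANS (the real path
    would then be carried in AS4_PATH, which this sketch omits).
    """
    if len(asns) > 255:
        raise ValueError("one segment can hold at most 255 ASNs")
    if four_octet:
        body = b"".join(struct.pack("!I", asn) for asn in asns)
    else:
        body = b"".join(
            struct.pack("!H", asn if asn <= 0xFFFF else AS_TRANS) for asn in asns
        )
    return struct.pack("!BB", AS_SEQUENCE, len(asns)) + body


path = [65001, 100000]  # hypothetical path including a 6-digit ASN
print(encode_as_sequence(path, four_octet=True).hex())   # 02020000fde9000186a0
print(encode_as_sequence(path, four_octet=False).hex())  # 0202fde95ba0
```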

Rather frustratingly, the Quagga development team had actually identified and corrected the problem back in February, when it was first reported, but vendors offering router appliances based on the software suite had not yet had the chance to redistribute the fix to their customers.

Since then, the problem had become significant for production routers because a new network using a freshly-assigned ASN over 100000 had attached to the Internet, and this was causing unpatched Quagga routing daemons around the world to crash every time they encountered an AS path containing the longer AS number!

The full irony of the situation emerged slightly later on the nanog mailing list, when it turned out that the new network using the problematic ASN was actually a test network designed to demonstrate the production-readiness of 4-octet AS numbers to service providers!

So in summary, it seems that this was another small but painful step on the way to getting what is a complicated, but essential, upgrade to the global BGP routing system accepted for mainstream use by service providers and customers alike.

Further information:
Patched version of the Quagga routing protocol suite
Geoff Huston’s insightful analysis of AS number resource consumption from 2005

Problems with BGP Prepending

•February 21, 2009

I’ve spent considerable time this week working on a problem caused by accidental BGP AS Path prepending. For those not in the know, AS Path prepending is the practice of artificially lengthening the BGP AS Path attribute associated with one’s Internet routes in order to influence route selection on foreign networks. It is a fairly coarse tool, but it can be an effective way of controlling incoming traffic flow, which is otherwise difficult to influence.
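A minimal sketch of the mechanism, assuming a hypothetical customer AS 64500 multihomed to two transit providers (64501 and 64502) and considering only the AS-path-length step of route selection: prepending twice towards provider B makes the path via provider A the shorter, and therefore preferred, one for a remote network that hears both.

```python
def advertise(origin_asn: int, prepends: int) -> list[int]:
    """AS path as sent by the origin: its own ASN once, plus `prepends` extra copies."""
    return [origin_asn] * (1 + prepends)


def shortest(paths: dict[str, list[int]]) -> list[str]:
    """Names of the candidate paths with the fewest ASNs.

    Only the AS-path-length step of best-path selection is modelled; earlier
    steps such as local preference are assumed equal, and ties would be
    broken by later steps that are not shown here.
    """
    best = min(len(p) for p in paths.values())
    return [name for name, p in paths.items() if len(p) == best]


# Hypothetical customer AS 64500, multihomed to providers 64501 (A) and 64502 (B).
via_a = [64501] + advertise(64500, 0)   # [64501, 64500]
via_b = [64502] + advertise(64500, 2)   # [64502, 64500, 64500, 64500]
print(shortest({"A": via_a, "B": via_b}))  # ['A']
```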

Unfortunately, over the last week or so, several incidents of excessive prepending have been witnessed, with AS paths containing up to 200-odd ASNs. Such paths are not actually illegal: RFC4271, the latest cut of the BGP protocol specification, only seems to impose a limit indirectly, by way of the maximum BGP message size of 4096 bytes. However, Cisco IOS routers running certain older code revisions are quite vulnerable to BGP update messages carrying long AS paths.
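To put 200-odd ASNs in proportion, the sketch below estimates how many bytes an AS_PATH attribute occupies on the wire. It assumes the path is made up of plain AS_SEQUENCE segments (each limited to 255 ASNs by its one-octet count) and that the Extended Length flag is only used when the attribute value exceeds 255 bytes.

```python
import math

BGP_MAX_MESSAGE = 4096   # maximum BGP message size in bytes (RFC4271)


def as_path_attr_len(n_asns: int, asn_octets: int = 2) -> int:
    """On-the-wire size in bytes of an AS_PATH attribute of AS_SEQUENCE segments.

    Each segment holds at most 255 ASNs and costs 2 bytes of type/count
    overhead. The attribute header is flags + type + length, where the length
    takes 2 bytes if Extended Length is set; this sketch assumes the flag is
    set only when the value exceeds 255 bytes.
    """
    segments = math.ceil(n_asns / 255)
    value_len = 2 * segments + asn_octets * n_asns
    header_len = 4 if value_len > 255 else 3
    return header_len + value_len


for n in (1, 200, 255, 256):
    print(n, as_path_attr_len(n), as_path_attr_len(n, asn_octets=4))
```

With 2-octet ASNs, a 200-ASN path comes to 406 bytes, roughly a tenth of the 4096-byte limit, so nothing in the message format stops such an update from being sent.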

There appear to be a number of weaknesses, but they all have similar effects. Either the update with the long AS path is rejected, a NOTIFICATION is sent to the peer and the session is torn down, or the update is mishandled because of its content and passed on in corrupted form, whereupon the next router declares it invalid and tears down its own session. The consequence is the same: BGP is interrupted, taking the full routing table with it for anyone downstream of the router, and any routes advertised upstream from the affected router flap.

Observers on the nanog mailing list (here and here) were quick to notice the recent instances of the problem, and there was speculation and puzzlement over the logic behind such a BGP advertisement. As most BGP-capable network engineers understand, prepending is not a linear control whose effect can be smoothly turned up or down. It behaves more like a threshold: once the prepended path is longer than the alternatives, further prepends change nothing, so advertising a BGP route with 200 ASNs seems particularly over-zealous and unlikely to be a deliberate, legitimate use.
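The threshold behaviour can be sketched as follows. Each hypothetical remote network sees some gap between the length of the backup path and the length of the primary path; it only moves to the backup once the prepends exceed that gap, with equal lengths left to later tie-breakers. The gap values are made up for illustration.

```python
def shifted_count(length_gaps: list[int], prepends: int) -> int:
    """Count remote networks that move to the backup path.

    length_gaps[i] is (backup AS path length - primary AS path length) as
    seen by remote network i before any prepending. Prepending on the primary
    lengthens that path, and network i only switches once the primary becomes
    strictly longer, i.e. when prepends > gap. Equal lengths are left to later
    tie-breakers, so they are not counted as shifted here.
    """
    return sum(1 for gap in length_gaps if prepends > gap)


gaps = [0, 1, 1, 2, 3]  # hypothetical gaps for five remote networks
for p in (0, 1, 2, 3, 4, 10, 200):
    print(p, shifted_count(gaps, p))
```

The count climbs 0, 1, 3, 4, 5 over the first few prepends and then stays at 5: once every remote network has moved, prepending 10 or 200 times achieves nothing more.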

Some speculated that it reflected ill-informed routing policy or incapable operators, while others suggested that this and the related incidents before it, which all exhibited the same characteristics, were essentially a remote-control DoS attack against customers dependent on routers running older code.

Since Interoute was implicated in the most recent case, however, a colleague and I were able to investigate the problem thoroughly with our customer, and the cause proved a remarkable demonstration of how a relatively simple software defect can create such a widespread network problem.

The network in question turned out to be running the MikroTik RouterOS platform, an alternative BGP-capable router platform with some quite specialised network functionality. I’ve little direct experience of the system myself and I’m unsure of the heritage of its BGP implementation, but based on the observations here, it would appear to be home-grown rather than rooted in one of the more commonly available implementations: gated, zebra, quagga or openbgpd.

The network operator had indeed intended to prepend his AS number to his route advertisements in order to discourage traffic, but a relatively small user interface issue had allowed him to configure it incorrectly. Instead of asking for the desired AS path, as might have been expected on a Cisco IOS device, the prepend configuration asked the operator for the number of times to prepend.

As a sensible precaution, the documentation states that the number of prepends that can be applied (i.e. the number of times the AS will be repeated in the path) is limited to a reasonable value, but unfortunately the command-line interface doesn’t actually enforce the limit. Instead, the user input is taken literally and interpreted as the number of times to repeat the AS number! Since most ASNs in use on the Internet are large 16-bit numbers, entering one where a count was expected produces an erroneous configuration that generates BGP updates with excessively long AS paths, which in turn tickle the Cisco bugs.

In fact, as one shrewd observer noted, the number of prepends entered is not taken verbatim: when it is too large, it is reduced modulo 256 to “make it fit”. It’s unclear whether this behaviour is an unintentional side-effect of a string-to-number handling library that the code uses, or whether the modulo reduction was thought to be helpful in some way.
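The two parsing behaviours can be contrasted in a short sketch: one wraps an oversized count modulo 256, as observed, and one enforces a documented limit. Both functions are hypothetical, and PREPEND_LIMIT is an assumed value because the real documented limit isn’t given here. Entering a hypothetical ASN of 13000 where a count was expected yields 200 prepends under the wrapping behaviour, in the same range as the 200-odd ASN paths seen in the incidents, whereas the strict parser rejects it.

```python
PREPEND_LIMIT = 16  # hypothetical documented maximum; the real figure is not given here


def parse_prepend_wrapping(text: str) -> int:
    """Observed behaviour: an oversized count is silently reduced modulo 256."""
    return int(text) % 256


def parse_prepend_strict(text: str, limit: int = PREPEND_LIMIT) -> int:
    """Safer behaviour: reject anything outside 0..limit."""
    value = int(text)
    if not 0 <= value <= limit:
        raise ValueError(f"prepend count must be 0..{limit}, got {value}")
    return value


typed = "13000"  # hypothetical ASN typed where a count was expected
print(parse_prepend_wrapping(typed))  # 200
try:
    parse_prepend_strict(typed)
except ValueError as err:
    print("rejected:", err)
```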

Thankfully, we were able to work with our customer to help him identify and correct the problem and minimise further disruption to Internet users. However, this relatively simple defect seems to have caused noticeable incidents worldwide (now usefully documented on bgpmon) and may still cause further pain for those with vulnerable platforms. Operators of vulnerable Cisco platforms can use the Maximum BGP AS Path Limit feature in IOS to limit the effects, although it isn’t effective in all cases.
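The idea behind an AS path length limit can be sketched as a simple inbound filter. This is not Cisco’s implementation: MAX_AS_PATH_LEN is a hypothetical threshold, the path is assumed to be a flat AS_SEQUENCE, and the prefixes come from documentation ranges. A check like this only helps once the update has been decoded successfully, which may be one reason such protections cannot cover every case.

```python
MAX_AS_PATH_LEN = 50  # hypothetical threshold; choose to suit your own policy


def accept(as_path: list[int], limit: int = MAX_AS_PATH_LEN) -> bool:
    """Accept a route only if its AS path is no longer than `limit` ASNs.

    Assumes the path is a flat AS_SEQUENCE (RFC4271 counts an AS_SET as one
    hop, which this sketch ignores).
    """
    return len(as_path) <= limit


updates = [
    ("192.0.2.0/24", [64501, 64500]),
    ("198.51.100.0/24", [64502] + [64510] * 200),  # hypothetical over-prepended route
]
accepted = [prefix for prefix, path in updates if accept(path)]
print(accepted)  # ['192.0.2.0/24']
```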

Capacity Planning and Traffic Engineering with Packet Design

•February 8, 2009

Today I was given a brief overview of Packet Design’s software products for capacity planning and traffic engineering on IP networks.

I can’t provide a complete review of the capabilities of their product since it is quite comprehensive, but I can comment on the aspects that were presented to me which may have particular interest to those seeking a tool to assist in packet network traffic engineering.

Packet Design’s Route Explorer appliance is a routing protocol modelling tool. To model the network’s IGP, it acts as a fully-fledged OSPF or IS-IS speaker attached directly to the network via the closest backbone router. Through this interface, it passively observes the link state advertisements flooded within the IGP and builds a replica topology map showing which devices are connected and by which links. This can be visualised on a GUI.

This allows an engineer to quickly query internal routes and answer questions related to which links are used to satisfy which routes on the network without actually touching the network. An engineer can also assess the impact of works activities or the benefits of proposed topology changes by artificially manipulating the IGP topology in terms of links or link metrics and Route Explorer will show the resulting effects on routing decisions.
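The “what-if” idea can be illustrated with an ordinary shortest-path-first calculation over a topology reduced to directed links and metrics, much as one might derive it from link state advertisements. This is a generic Dijkstra sketch, not Packet Design’s implementation, and the four-router topology is hypothetical: raising the metric on link B-D moves traffic from A to D onto the path via C.

```python
import heapq


def spf(links: dict[tuple[str, str], int], source: str) -> dict[str, tuple[int, list[str]]]:
    """Dijkstra over directed links {(from, to): metric}; returns {node: (cost, path)}."""
    neighbours: dict[str, list[tuple[str, int]]] = {}
    for (a, b), metric in links.items():
        neighbours.setdefault(a, []).append((b, metric))
    best: dict[str, tuple[int, list[str]]] = {}
    queue = [(0, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in best:
            continue  # already settled via a cheaper (or equal) path
        best[node] = (cost, path)
        for nxt, metric in neighbours.get(node, []):
            if nxt not in best:
                heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return best


def bidirectional(edges: list[tuple[str, str, int]]) -> dict[tuple[str, str], int]:
    """Turn (a, b, metric) edges into links in both directions with the same metric."""
    links = {}
    for a, b, metric in edges:
        links[(a, b)] = metric
        links[(b, a)] = metric
    return links


# Hypothetical four-router topology.
links = bidirectional([("A", "B", 10), ("B", "D", 10), ("A", "C", 10), ("C", "D", 15)])
print(spf(links, "A")["D"])    # (20, ['A', 'B', 'D'])

# What-if: raise the B-D metric, e.g. ahead of planned works on that link.
what_if = dict(links)
what_if[("B", "D")] = what_if[("D", "B")] = 30
print(spf(what_if, "A")["D"])  # (25, ['A', 'C', 'D'])
```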

In addition to IGP simulation, Route Explorer can also speak IBGP to backbone routers on the network in order to glean information about the external networks to which one is connected. This effectively extends the “what-if” route look-ups possible within the IGP to include all exterior Internet routes which is very powerful. Such capability allows one to assess the network routing effects associated with connecting a new customer, or losing a customer.
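Extending the what-if to external routes means asking which exit each router would choose. The sketch below models only part of BGP best-path selection, AS path length followed by IGP cost to the exit router, and assumes all earlier steps (local preference, origin, MED, eBGP over iBGP) tie. The router names, costs and candidate paths are hypothetical.

```python
def best_exit(candidates: list[tuple[str, int]], igp_cost: dict[str, int]) -> str:
    """Choose the exit router for one external prefix, as seen from one router.

    candidates: (exit_router, as_path_length) for each iBGP-learnt path.
    igp_cost:   this router's IGP cost to each exit router.
    Earlier best-path steps are assumed to tie; the router name stands in
    for the real final tie-breakers.
    """
    exit_router, _ = min(candidates, key=lambda c: (c[1], igp_cost[c[0]], c[0]))
    return exit_router


# Hypothetical prefix learnt at exits B and D with equal AS path lengths.
candidates = [("B", 2), ("D", 2)]
print(best_exit(candidates, {"B": 10, "D": 20}))  # from router A: B
print(best_exit(candidates, {"B": 10, "D": 0}))   # from router D itself: D
# What-if: the customer or peer behind exit D is lost.
print(best_exit([("B", 2)], {"B": 10, "D": 0}))   # B
```

Routers A and D pick different exits for the same prefix, and removing exit D from the candidates models losing the customer behind it.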

Route Explorer capacity is dimensioned based upon the route count that one expects to feed the devices for modelling and it would appear to be capable of supporting split-AS routing policies where one part of the network may have a slightly different view of the best exit than another. 

The Traffic Explorer appliance complements Route Explorer by collecting Netflow accounting records from router platforms and mapping them onto the topology discovered by Route Explorer in order to compute the estimated usage on a per-link basis. The general recommendation is to enable Netflow accounting in a sampled mode on all external interfaces so that traffic entering and leaving the autonomous system is observed by the Traffic Explorer platform but not repeatedly counted.
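A minimal sketch of mapping sampled flow records onto a routing model to estimate per-link load follows. It assumes records come only from external interfaces on ingress (so each packet is counted once), 1-in-N packet sampling scaled back up by N, and that the exit router and internal path for each destination have already been computed by the routing model. It is not Traffic Explorer’s algorithm, and all names and figures are hypothetical.

```python
def link_loads(
    flows: list[tuple[str, str, int]],
    exit_for_prefix: dict[str, str],
    paths: dict[tuple[str, str], list[str]],
    sampling_rate: int,
) -> dict[tuple[str, str], int]:
    """Estimate bytes carried on each directed internal link.

    flows: (ingress_router, destination_prefix, sampled_bytes), exported only
        from external interfaces on ingress, so each packet is seen once.
    exit_for_prefix: egress router chosen by the routing model per prefix.
    paths: router-by-router internal path from the IGP model.
    sampling_rate: N for 1-in-N packet sampling; sampled bytes are scaled by N.
    """
    loads: dict[tuple[str, str], int] = {}
    for ingress, prefix, sampled_bytes in flows:
        route = paths[(ingress, exit_for_prefix[prefix])]
        estimate = sampled_bytes * sampling_rate
        for link in zip(route, route[1:]):
            loads[link] = loads.get(link, 0) + estimate
    return loads


flows = [("A", "203.0.113.0/24", 1500), ("C", "203.0.113.0/24", 500)]
exit_for_prefix = {"203.0.113.0/24": "D"}
paths = {("A", "D"): ["A", "B", "D"], ("C", "D"): ["C", "D"]}
print(link_loads(flows, exit_for_prefix, paths, sampling_rate=100))
```

Re-running the same records against a what-if routing model is what turns a topology change into an estimated traffic swing.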

As well as offering a near-realtime view of link usage that is independent of SNMP interface counters, Traffic Explorer extends the usefulness of Route Explorer, since the “what-if” scenarios can also include the resulting traffic swing, an extremely attractive feature for impact analysis and for planning new or upgraded circuits.

Traffic Explorer is dimensioned by the rate of receipt of Netflow accounting records and both Traffic Explorer and Route Explorer are licensed by appliance. This is an attractive feature since in most cases it means license fees are proportional to traffic and hopefully revenue. Competing products in this area often license in terms of the raw network element count, irrespective of actual traffic levels, which can punish resilient network design.

Packet Design’s portfolio also includes an MPLS VPN Explorer capability which makes use of MPLS-based Netflow and multi-protocol BGP in order to report on VPN-specific routing and traffic. 

User access to both Route Explorer and Traffic Explorer is through X11 or VNC to a visualisation server, which abstracts away the details of the individual collector elements needed to create the view. The X11/VNC access is a slight disappointment, in my opinion, since one would think that a native client with a dedicated client/server protocol would perform better, especially on geographically diverse networks where a central server is unlikely to be in the same place as the user.

In summary, the Packet Design solution appears to be a promising addition to a network looking to enrich its view of routing and traffic analysis.

 