Booting up Interoute’s VDC in 10 seconds…

•December 10, 2011

I’ve spent the weekend beta-testing Interoute’s exciting new Virtual Data Centre proposition and I feel compelled to share my experiences. VDC, as it’s known internally at Interoute, is a recent technology development from the development group based in London. It attempts to fuse the flexibility and elasticity of public cloud computing offerings with the home territory of Enterprise VPN business, in which Interoute’s dense IP/MPLS network has always been strong and dependable, having arguably pioneered MPLS VPN technology in Europe. As a well-known critic of poor implementations of virtualisation technology within Interoute, I was invited to participate in an early beta trial and comment freely.

The general idea of Virtual Data Centre is to revolutionise the role of the typical Enterprise IT function in order to mutate today’s compute functionality into a utility resource as consumable as power or floor-space, while still maintaining the assurance, reliability, dependability and integrity of one’s own private enterprise network.

The Problem

Conventional public cloud offerings in this area from respectable competitors such as Amazon or Rackspace have already demonstrated how virtualisation technology is changing the way that people think about compute power. Cloud-powered Internet technologies are found everywhere: whether it is the latest version of the Sims game on Facebook, or the ubiquitous Dropbox, the dependable personal files folder.

But the problem most enterprises have is the conundrum of how to capitalise on so-called “cloud-power” while retaining control of their own privacy and security. Most large-scale cloud compute implementations are, by their nature, Internet-based, since it is only Internet scale that provides the volume that yields the skills and experience to operate a virtualisation platform correctly, completely and dependably. It’s also only Internet-based demand that provides the variation in demand that makes an elastic compute capability viable: one customer’s peak usage is another customer’s quiet time.

Private cloud or, to be more specific, the internal implementation of cloud compute facilities for sharing within the enterprise, is often unable to offer the same level of benefit because of this lack of variation in demand. Within one organisation, demand synchronisation is highly likely, meaning resource starvation is a very real possibility.

This implicit association with the public Internet and viable cloud computing, then, is often troublesome for enterprise IT managers to accept. While enterprise applications providers such as Microsoft have taken strides recently to ensure that their client/server applications operate quite seamlessly whether making use of private network transport or whether using the public Internet, there are still a myriad of applications in common use in the enterprise space that simply don’t expect the Internet to be between them and their client user.

The Solution

But a managed VPN service provider, already operating in the space of corporate and enterprise WAN connectivity, is in an ideal place to take Internet packet economics, apply it to cloud compute functionality and make it a compelling platform for Enterprise IT managers.

And this is what is happening at Interoute. Already providing the backbone transport across a large European footprint for Enterprises that have realised that operating WAN networks is a distraction from their core business, Interoute have architected a multi-customer-aware Virtual Data Centre architecture that attaches intimately with the MPLS VPN technology now accepted as a secure industry standard for providing isolated logical WAN networks for multiple customers using a common physical network asset.

The result is a platform which allows a customer to view and manipulate the set of virtual resources in his enterprise network, via the very same web-based customer portal that he’s already used to using for routine maintenance of his VPN. Consequently, turning up a new server application in the enterprise network is transformed from the long, unwieldy process that used to involve hardware costing, space, cooling and environmental analysis, physical installation and software tuning, to a fast, responsive activity in line with the user’s business requirements.

VDC In Action

Access to the VDC Control Centre is via Interoute’s standard customer portal, or the Hub, as it’s often known. Customers log in with existing credentials and are immediately taken to a Flash-based enriched control panel that offers a visualisation of the current state of the Virtual Data Centre.

The package the VDC team have offered me provides a generous 48 virtual CPUs, 150-odd GB of RAM, 6 VLANs and two distinct storage options: VM-specific and shared. To allow me to try out the platform, my specific VDC has been associated with a special internal VPN that is used for providing services to a bespoke customer.

Initial virtual resource summary

Discussions are still underway, but I don’t think I’m stepping on anyone’s toes by suggesting that the pricing scheme likely to be offered is based on a combination of the resources consumed either on a fixed rental basis, or time-based. There is no real virtual-machine tax per se; one simply dimensions the VMs to make use of the resources in question.

These resources deserve an explanation. Virtual CPUs, RAM and internal storage are all associated with a VM and these properties define the key performance characteristics. A VM’s use of these resources is defined in a template “image” defined in the ever-growing Interoute App Store library. For example, there is a basic vanilla Windows Server 2008 image, making use of 1 CPU, 50GB internal disk and 512MB RAM.
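As a rough illustration of how dimensioning VMs against a resource pool might work, here is a minimal Python sketch. The class and field names are my own invention rather than anything from the VDC platform, the RAM pool is assumed to be exactly 150GB, and internal disk is recorded on the template but not checked against the pool, since I have no figure for the pool's storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for an App Store image and its resource needs."""
    name: str
    vcpus: int
    ram_mb: int
    disk_gb: int


@dataclass
class Pool:
    """Hypothetical view of the CPU and RAM left in a Virtual Data Centre."""
    vcpus: int
    ram_mb: int

    def deploy(self, template: Template) -> bool:
        """Reserve the template's CPU and RAM; return False if the pool is short."""
        if template.vcpus > self.vcpus or template.ram_mb > self.ram_mb:
            return False
        self.vcpus -= template.vcpus
        self.ram_mb -= template.ram_mb
        return True


# Figures from the text: 1 CPU, 50GB disk, 512MB RAM per vanilla Windows image.
windows_2008 = Template("Windows Server 2008", vcpus=1, ram_mb=512, disk_gb=50)
pool = Pool(vcpus=48, ram_mb=150 * 1024)  # assumption: exactly 150GB of RAM

deployed = 0
while pool.deploy(windows_2008):
    deployed += 1

print(deployed, "VMs deployed;", pool.vcpus, "vCPUs and", pool.ram_mb, "MB RAM left")
```

With these numbers the pool runs out of virtual CPUs long before it runs out of RAM, which is the kind of trade-off one would weigh up when customising an image.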

There are possibilities to customise these images to adjust these properties, and I’m reliably told that a VM-importer will allow a customer to “bring-their-own” pre-built VM for execution, although it isn’t available presently.

In order to cater for network applications on shared data, however – probably one of the common cases – the shared storage option allows the customer to specify an amount of storage space that can be made available using common storage protocols to VMs. In my opinion, it is this separation of application logic and data that is a vital distinction in allowing this technology to supersede traditional managed application hosting models. If I manage to somehow “break” an application’s operation, management or otherwise, recovery should be as simple as taking a new VM from the App Store library again. My data is quite safe.

The final resource that requires a mention is the VLAN. By allowing a user to create separate networks, and attach VMs to these networks, it is possible to create quite complex multi-layer topologies mirroring conventional tiered data-centre applications.

Application groups (by another name)

Double-clicking on the Virtual Data Centre in question – a customer can have more than one, associated with geography or availability zone – introduces the view of Virtual Appliances. While the terminology is at first confusing, one quickly realises that this is actually an “application group” and it allows a customer to easily operate several distinct cloud-based projects within a single VDC, pooling together compute resources into one lump, while keeping the necessary application and developers separated. I imagine the VDC team are likely looking at the possibility of defining users and groups and segregating administrative access to VM installations based on application groups.

Inside the virtual 19" rack

Within an application group, a user is given a very simplistic list-view or graphical view showing the virtual machines within the group and their current status. In the graphical view, the left hand-side of the screen shows a view into the Interoute App Store library, which is arranged by category and promises a variety of images from basic installed operating systems that users can further customise to free-standing OS/application combinations designed to just “deploy-and-go”.

It’s a simple matter of dragging an image onto the blank canvas in order to add it to the application group. For testing purposes, I ignore the plethora of Linux images, and select something reliable instead. I can, after all, easily change my mind!

A lot easier than installing real NICs!

Once in place on my canvas, it’s also easy to modify some of the mutable properties associated with the VM: specifically, the external storage that is available and the networks that are attached, though helpfully, the software automatically selects the next free IP address in a pool of addresses that Interoute have nominated for default use in customers’ VPNs. I can modify this assignment as necessary, and add additional network interfaces or storage volumes. Once happy, I can simply click OK and then press the reassuringly-chunky flash-rendered power-button in order to “deploy” my VM. It lacks the tactile feedback, but is just as effective nonetheless.
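I don't know how the portal actually picks the next free address, but the idea is simple enough to sketch with Python's standard ipaddress module. The function name, the pool prefix and the addresses in use below are all made up for illustration.

```python
import ipaddress


def next_free_ip(pool_cidr, in_use):
    """Return the first usable host address in pool_cidr that is not in in_use.

    Returns None when the pool is exhausted.
    """
    taken = {ipaddress.ip_address(address) for address in in_use}
    for host in ipaddress.ip_network(pool_cidr).hosts():
        if host not in taken:
            return host
    return None


# Hypothetical pool with two addresses already handed out to earlier VMs.
print(next_free_ip("10.0.0.0/29", ["10.0.0.1", "10.0.0.2"]))
```

A linear scan like this is fine for the small pools one would expect behind a single VLAN; a real allocator would presumably also need to cope with two deployments racing for the same address.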

Turning the key...

It is at this point that one realises the truly revolutionary potential for changing the way that Enterprise IT works. During my testing, I could simply drag an image from the library to the canvas and deploy it in under a minute. It booted in several seconds, and I could login to it to use it shortly afterwards.

Granted, my “application” was simply a vanilla FreeBSD installation, and the VDC interface and concept still have a few little rough edges, but the experience of rapid deployment, installation and usability is compelling. What will be key is how quickly Interoute manage to grow their current App Store library to become a comprehensive one-stop-shop of ready-made Enterprise applications that can be deployed in-cloud and made available to users extremely rapidly.

What will also be significant for more complicated applications will be the possibility to delegate the management of an application to a specialist or a channel partner. It’s quite likely that application service providers, currently in the business of providing applications to customers and hosting them, will be extremely interested by the idea of becoming a partner or reseller on Interoute’s VDC.

Finally, I am curious to see how the technology may be used to enable a relatively new capability of staging and trialling. The speed of deployment means that it is extremely feasible for customers to “try-out” applications, almost without commitment.

Summary

Interoute’s VDC development represents a recognition that the unit of deployable application is fast-changing – from CD-ROM media in the 1990s, to downloadable source code, .tar.gz or .MSI file in the 2000s, to the ubiquitous .VMX/.VMDK virtual machine image of today. Through VDC, Interoute is providing a vehicle for Enterprise customers to host applications and functionality in a secure private cloud, where the distraction of hardware and OS maintenance is irrelevant and insignificant.

I think it’s an exciting development, and I eagerly await early customer responses.

An Early Peek at the BlackBerry PlayBook

•February 17, 2011

PlayBook Home Screen

It was with much anticipation and excitement that Research in Motion, arguably pioneers of practical smartphone technology, revealed their new iPad-tackling PlayBook tablet device at GSMA’s Mobile World Congress in Barcelona this week. RIM took a significant portion of the App Planet hall at MWC and showcased an early pre-release of their new PlayBook tablet, along with selected developers who were able to demonstrate early applications. The actual operating system software is still under development, but the devices were actually there and available to play with. Final release is intended for Q1 in the US but pricing is not confirmed at this time.

The first thing to notice is how light the devices are. At 400g, they don’t weigh much more than the average smart-phone but the 7″ screen manages a comfortable 1024×600 pixel resolution which gives both space and definition. The device boasts an impressive 1GHz CPU with 1GB RAM which RIM promise will make applications “fly”, and includes both front and rear facing cameras, paving the way for a FaceTime competitor.

With the beta operating system installed and bundled applications, the device is fast and usable. iPad veterans could be seen struggling to adapt and fumbling with pinch gestures to zoom which the PlayBook doesn’t support. The PlayBook offers its own gestures, however, to minimise apps, rotate through apps, and bring up the virtual keyboard respectively, and these are easy to get used to. Annoyingly – for me at least – the virtual keyboard makes the same mistake as the iPad in failing to combine alpha and numeric characters together, instead requiring the user to shift between the two.

HTML5 Test Result

The new WebKit-based browser is slick and fast and the development version scored a promising 219 on HTML5Test, supporting the Canvas API completely. Provocatively, it also includes support for Flash 10.1, a serious win over the Apple contender. Flash video works and, in conjunction with the built-in H.264 support, which can drive an external 1080p HDMI device, ensures that there should be no Internet-based video that this device can’t view.

Testing Flash with the interoute.com website

Flash games, such as Bloxorz, work with good performance, but it’s at this point that one realises that Steve Jobs’ fifth point about Flash is quite accurate: in a lot of cases Flash apps are written without consideration for touch devices, placing a lot of user-interface significance on hover versus click interactions which are clearly not possible on a touch device. Bloxorz, in particular, needs cursor key input for which the PlayBook’s virtual keyboard doesn’t cater.

Multi-tasking, something to which BlackBerry users have long been accustomed, is not left as an after-thought either. The underlying QNX operating system seems to effortlessly juggle multiple applications around in response to the user’s demand. An interesting experiment showed a Quake demo managing 30 frames per second while the bundled YouTube app happily runs HD video and the user is able to flick between the two with a finger-gesture.

Network connectivity is wi-fi only with support for all of 802.11a/b/g/n and, quite topically given recent events, I noticed that IPv6 is enabled by default. A 3G version is promised for later in the year but here the story gets quite interesting. The PlayBook itself is not a BlackBerry. This means that there is no push email, no BlackBerry Messenger and no BES/BIS connection to configure. The device is essentially an Internet tablet device with Internet applications but RIM have stated that it is possible to pair with a BlackBerry over Bluetooth in order to get the traditional BlackBerry services. This has the potential to cause quite a bit of unnecessary confusion over quite how a user gets network access: via native PlayBook wi-fi? Via paired 3G?

If RIM is sensible, they will try to get away from the current mess of network provider-specific BIS with its poor support for push notifications and mailbox synchronisations, while still allowing for the BlackBerry-specific network communication for enterprise VPN connectivity and BlackBerry Messenger. I noticed through my fiddling that the PlayBook does indeed possess a PIN, the essential address token for communication on the BlackBerry network, so perhaps there is hope.

No new platform would be complete without a taster of applications to come, and there seemed to be plenty of candidates. Electronic Arts were on hand at MWC to show a speedy driving game that makes use of the internal accelerometers to allow a player to steer with the device, while Citrix showed their commitments to the business market with a concept demonstration of how a Citrix client for PlayBook might look.

RIM, keen to encourage development on this exciting new platform, ran a series of boot-camps which were immediately over-subscribed. I was lucky enough to get a seat on one of the sessions that ran through one of the two immediately supportable development methods for PlayBook apps, WebWorks. In recognition of the fact that JavaScript is slowly taking over the world, WebWorks combines the use of HTML5, CSS and JavaScript, along with customised JavaScript extensions in order to encourage the generation of “Weblets” that can be deployed as easily as a web page, but stored and executed locally on the PlayBook and make use of native-like functionality.

The other major technology for targeting the PlayBook is Adobe Air, which is considered the native API. What is surprising, here, is that RIM is not promoting the use of the Java development model that has been so successful for BlackBerry. Instead, the PlayBook contains an Air runtime execution environment and applications are deployed as Air packages. Java users are not left out in the cold however; a Java interface is promised for some time in the future.

During the boot-camp, Sanya Kiruluka and her developer relations team confidently ran through how the mandatory Hello World app and other examples can be created using the WebWorks framework, executed in a virtual machine QNX simulator (a rather innovative idea) and deployed to production hardware. They also announced a competition to stimulate creativity by offering a free PlayBook to a selected winner who manages to design, implement and publish to AppWorld a working and qualified PlayBook application by March 15th.

It’s quite clear that application availability will play a significant part in defining the PlayBook’s success, but early indications are that the hardware base makes for a firm foundation.

Facebook Social Engineering

•February 2, 2010

So after returning from a slight blog writing hiatus (for a variety of reasons), I find myself logged into Facebook late one Sunday afternoon when a former work colleague that I haven’t seen for a while pops up on the integrated IM system.

I don’t tend to use Facebook’s chat facility that much but I am not averse to it. I offer a warm greeting and he politely returns it, but when I enquire into his wellbeing he drops the rather devastating bombshell that he’s been mugged while holidaying in Scotland! Not just mugged, but mugged at gunpoint no less. He explains that he’s lost all his cash, credit cards, wallet and phone in the attack but that he’s okay physically.

This is a lot to take in with the aftermath of Sunday lunch, and while I am breathing a sigh of relief that he has survived unscathed, you’re probably already a little suspicious of what might be coming. I still remain quite firmly committed, however.

His flight home is in two hours, he explains, and the hotel won’t let him leave without settling his bill! No problem, I reassure him, don’t panic. We can easily sort this out over the phone now with a credit card and we can settle it later when he returns to London.

Grateful at this news, he triggers alarm bell number one: can I transfer the money by Western Union wire? What? Why bother with that when I could just call up the hotel desk and secure it on a credit card?

I dismiss this anomaly, and proceed to tell him the plan: go down to reception, tell them a friend will settle the bill and get their phone number so we can arrange the transaction. Tell them to give you a cash advance for a taxi to the airport as well. He seems a little confused by the last point and then rings alarm bell number two in my head: the hotel has a +44 702 “follow-me” number! At this point, my suspicions are aroused: I’m curious, but I still don’t want to believe I’ve been taken in.

In the casual discussion that follows, I reminisce on a tale from work involving one of our colleagues, deliberately forgetting his name, and look to him to banish all my concerns by completing the tale.

But he goes quiet. He can’t do it, and the inescapable truth hits me. This really is a scam. This is not really my friend at all, but someone who has somehow managed to gain the credentials necessary to pose as him. I persist with my questioning and, realising that he has been rumbled, he concedes defeat by logging out and blocking me, presumably to prevent me raising the alarm with my friend’s friends by writing on his wall or similar.

Now as a well-prepared and diligent reader, you’re probably surprised and disappointed that I got so close to handing over my credit card details to scammers of unknown origin. But what makes this scam so considerable are the mechanisms that it capitalises on to disarm one’s normal caution and guarded behaviour when dealing with unknown Internet correspondents.

  • Facebook friends tend to come rather high on the trust list. This is not a random email from the exiled president of a small African nation seeking financial help to realise his investment in gold, diamonds, father’s inheritance or whatever. This is someone I’ve explicitly authorised, someone who matches a photograph, someone with whom I have conversed.
  • Direct IM conversations leave no opportunity for the consideration and reflection that would usually be available before making important decisions. It’s similar to the double-glazing salesman who offers the “sign today only” deal.
  • Questioning the authenticity of a message can be considered quite a hostile thing to do and people often feel reluctant to do so. I am sure that if I emailed or IM’d a work colleague and asked for a sensitive bit of data as a convenience, they would probably oblige.
  • The urgency of the situation – a flight – and the fact that my friend has already been through a shocking ordeal demands decisive action from me if I am a good friend.

The combination of all these factors makes this attack an extremely potent one, and the only failings were the use of some casual language, poor anticipation of the likely responses to the situation and poor knowledge of the logistics and local conventions involved in expedient payment. For most of the dialogue, I harboured more concerns over how to extract the cash from my Liverpudlian friend on his return than I did about the authenticity of the request!

The good news is that Facebook appear to be well aware of this class of scam and do offer some sensible practical advice on dealing with the problem and reporting the issue.

I do hope that no-one else is adversely affected by scams such as this, but a work colleague pointed me to some useful advice and general tips on dealing with IM – whatever the platform – which is essentially “pre-authenticated” but in a weak manner:

  • Get an awareness of people’s writing styles and language: in email, and IM or other short form. They can be quite unique. For instance, a colleague I know at work can be relied upon 100% to put apostrophes on plurals and omit them from contractions. I know if I ever see correct punctuation from him that I should be suspicious!
  • Form a characteristic greeting that you always use to initiate and respond with. Correspondents will grow accustomed to it, and will hopefully note its absence in fraudulent communications. Examples include esoteric greetings or even saying hello in a foreign language. Consistent repeatable behaviour is the key.

As the sophistication of electronic “social engineering” attacks increases, I am sure that it will be necessary for people to become more hardened and vigilant in their use of social networking technology, but hopefully this won’t detract from the usefulness and effectiveness that it provides.

Quagga chokes on large 4-octet AS numbers

•May 11, 2009

Last weekend – a bank holiday weekend in the UK – saw a rather significant BGP-related disruption on the Internet. Fortunately it didn’t affect the mainstream router vendors, but caused service interruptions for anyone dependent on certain versions of the Quagga routing protocol suite (an open-source collection of routing protocol implementations with a configuration management interface that closely resembles mainstream Cisco routers). In a lot of cases, this was restricted to informational route server platforms that provide looking glass capabilities on to backbone networks, but several alternative vendors also produce router appliances based upon the Quagga code base.

I spoke at length with the Interoute on-call engineer for the weekend after several customers reported that their BGP routers had crashed for some unknown reason. He’d investigated the situation and, after discussion, it became apparent that other Internet users were reporting similar problems and that the routers involved were all based upon the Quagga routing protocol suite. The problem was linked to a software defect that manifests itself upon exposure to AS numbers exceeding 5 digits, i.e. above 99999.

Once the nature of this defect was understood, we were able to identify the specific BGP updates that were causing the customers’ routers to crash and filter them from further advertisement so that connectivity was restored to those affected customers.

Further examination then revealed that the Quagga BGP daemon seemed to be trying to render an AS number into a string buffer dimensioned for only 5 characters. While this would have been perfectly sufficient for today’s ASNs in the range 0-65535, clearly larger-numbered ASNs could not be handled this way.
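To make the arithmetic concrete, here is a small Python sketch that checks how many decimal characters an AS number needs. It is not Quagga's code (which is written in C); the only figure taken from the analysis above is the 5-character buffer, and the function names are mine.

```python
BUFFER_CHARS = 5  # size of the string buffer described above

MAX_2_OCTET_ASN = 2**16 - 1   # 65535
MAX_4_OCTET_ASN = 2**32 - 1   # 4294967295


def decimal_width(asn):
    """Number of characters needed to print the ASN in plain decimal."""
    return len(str(asn))


def fits_in_buffer(asn, width=BUFFER_CHARS):
    """True if the decimal form of the ASN fits in `width` characters."""
    return decimal_width(asn) <= width


for asn in (MAX_2_OCTET_ASN, 99999, 100000, MAX_4_OCTET_ASN):
    print(asn, decimal_width(asn), fits_in_buffer(asn))
```

Every 2-octet ASN fits, but a 4-octet ASN can need up to ten characters, so any ASN from 100000 upwards overruns a buffer sized on the old assumption.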

Larger-numbered ASNs require support for the recent IETF RFC4893 draft standard to extend the BGP AS number space from 2-octets to 4-octets in order to be represented in standard BGP attributes such as the AS Path. Quagga claims to fully implement this but it seems that some sections of the code did not fully consider the ramifications of dealing with 4-octet ASNs.
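The difference between the two encodings can be sketched in a few lines of Python. This only packs the AS numbers themselves, not a complete AS Path attribute, and the function name and example ASNs are my own. The one detail taken from RFC 4893 is AS_TRANS (23456), the reserved 2-octet value that stands in for an ASN too large to fit when talking to a speaker that only understands 2-octet numbers.

```python
import struct

AS_TRANS = 23456  # reserved 2-octet placeholder defined by RFC 4893


def pack_asns(asns, four_octet):
    """Pack a list of ASNs as big-endian 4-octet or 2-octet integers."""
    if four_octet:
        return struct.pack(f"!{len(asns)}I", *asns)
    # A 2-octet-only speaker sees AS_TRANS wherever the real ASN does not fit.
    narrowed = [asn if asn <= 0xFFFF else AS_TRANS for asn in asns]
    return struct.pack(f"!{len(asns)}H", *narrowed)


path = [64500, 196608]  # illustrative ASNs; 196608 needs more than 16 bits

print(pack_asns(path, four_octet=True).hex())
print(pack_asns(path, four_octet=False).hex())
```

The wire format is therefore the easy part; the Quagga defect was in turning the larger number back into text.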

Rather frustratingly, the Quagga development team had actually already identified and corrected the problem in February when it was first reported but vendors offering router appliances based on the software suite had not yet had sufficient chance to re-distribute the software updates to address the problem.

Since then, the problem had become significant to production routers because a new network making use of a freshly-assigned ASN over 100000 had attached to the Internet and this was causing unpatched Quagga routing daemons around the world to crash every time they encountered an AS path containing the longer AS number!

The full irony of the situation emerged slightly later on the nanog mailing list when it turned out that the new network making use of the problematic ASN was actually a test network designed to demonstrate the production-readiness of 4-octet AS numbers to service providers!

So in summary, it seems that this was another small but painful step on the way to getting what is a complicated, but essential, upgrade to the global BGP routing system accepted for mainstream use by service providers and customers alike.

Further information:
Patched version of the Quagga routing protocol suite
Geoff Huston’s insightful analysis of AS number resource consumption from 2005

Problems with BGP Prepending

•February 21, 2009

I’ve spent considerable time this week working on a problem caused by accidental BGP AS Path prepending. For those not in the know, AS Path prepending is the practice of artificially extending the BGP AS Path attribute associated with one’s Internet routes in order to influence the preference in route selection on foreign networks. It is a fairly coarse metric, but it can be an effective mechanism in controlling incoming traffic flow which is often difficult to deal with otherwise.
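For anyone who prefers code to prose, here is a deliberately simplified Python sketch of the effect. It compares candidate routes on AS path length alone, whereas real BGP route selection considers other attributes (local preference, for one) before it gets that far. The function names are mine and the ASNs come from the range reserved for documentation.

```python
def prepend(as_path, asn, times):
    """Return as_path with asn repeated `times` extra times at the front."""
    return [asn] * times + list(as_path)


def best_route(routes):
    """Choose the candidate with the shortest AS path; the first one wins a tie."""
    return min(routes, key=lambda route: len(route["as_path"]))


ORIGIN = 64500                       # our own (illustrative) AS
plain = [ORIGIN]                     # what we announce to transit B
padded = prepend(plain, ORIGIN, 2)   # what we announce to transit A

# The two paths as a distant network would see them.
candidates = [
    {"via": "transit A", "as_path": [64496] + padded},
    {"via": "transit B", "as_path": [64497] + plain},
]

print(best_route(candidates)["via"])
```

Two extra copies of the origin AS are enough to tip the decision towards transit B here, and adding more makes no further difference.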

Unfortunately, over the last week or so, several incidents of excessive prepending have been witnessed with AS paths containing up to 200-odd ASNs. This isn’t illegal in itself, since RFC 4271 – the latest cut of the BGP protocol specification – only seems to state a limit by way of a maximum BGP message size, which is 4096 bytes. However, Cisco IOS routers running certain older code revisions are quite vulnerable to BGP update messages which are accompanied by long AS paths.
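A quick back-of-the-envelope calculation shows how far a 200-ASN path is from that limit. The Python sketch below estimates the size of the AS Path attribute alone, using the RFC 4271 layout as I understand it: a flags octet, a type octet and a 1- or 2-octet length, followed by segments that each carry a 2-octet header and at most 255 ASNs. It assumes 2-octet ASNs and plain AS_SEQUENCE segments, and it ignores the rest of the update message.

```python
MAX_BGP_MESSAGE = 4096  # bytes, per RFC 4271
MAX_SEGMENT_ASNS = 255  # the segment length field is a single octet


def as_path_attribute_size(path_len, asn_octets=2):
    """Estimate the bytes used by an AS_PATH attribute holding path_len ASNs."""
    full, rest = divmod(path_len, MAX_SEGMENT_ASNS)
    segments = full + (1 if rest else 0)
    value = segments * 2 + path_len * asn_octets  # 2-octet header per segment
    header = 3 if value <= 255 else 4  # extended length needs a 2-octet length
    return header + value


for n in (10, 200, 256):
    print(n, "ASNs:", as_path_attribute_size(n), "of", MAX_BGP_MESSAGE, "bytes")
```

Even a path of 256 ASNs comes to barely an eighth of the maximum message size, so the protocol itself has no objection; the trouble lies entirely in how particular implementations cope.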

There appear to be a number of weaknesses but they all have similar effects: either the update with the long AS path is discarded, a notification is sent to the peer and the session is torn down; or the update is mishandled because of its content, corrupted and passed on, whereupon the next router declares it invalid and tears down its session. The consequence is the same: interruption of BGP, and thus loss of the full routing table, for anyone downstream of the router, and flapping of any routes advertised upstream by the affected router.

Observers on the nanog mailing list (here and here) were quick to notice the recent instances of the problem and there was speculation and puzzlement over the logic behind such a BGP advertisement. As most BGP-capable network engineers understand, prepending is not a linear feature whose effects can be turned up or down. It is more of a threshold-crossing event so advertising a BGP route with 200 ASNs seems particularly over-zealous and unlikely to be deliberate legitimate use.

Some speculated that it was ill-informed routing policy or incapable operators while others suggested that this and the other related incidents prior to it – which all exhibited the same characteristics – were essentially a remote-control DoS attack against customers dependent on the routers running older code.

Since Interoute was implicated in the most recent case, however, a colleague and I were able to investigate the problem thoroughly with our customer and the solution was quite remarkable in demonstrating how a relatively simple software defect can cause such a widespread network problem.

The network in question turned out to be running the MikroTik RouterOS platform, which is an alternative BGP-capable router platform capable of quite specialised network functionality. I’ve little direct experience of the system myself and I’m unsure of the heritage of the BGP implementation, but based on the observations seen here, it would appear to be organic rather than having roots in the more commonly available implementations: gated, zebra, quagga or openbgpd.

The network operator had indeed intended to prepend the AS number on his route advertisements in order to discourage traffic, but a relatively small user interface error had allowed him to configure it incorrectly. When configuring the prepend operation, instead of specifying the desired AS path to be seen – as might have been expected on a Cisco IOS device – the configuration asked the operator to specify the number of times to prepend instead.

As a sensible precaution, the documentation states that the number of AS prepends that can be applied (i.e. the number of times the AS will be repeated in the path) is limited to a sensible amount but unfortunately the command-line user interface doesn’t actually enforce the limit. Instead the user input is taken literally and interpreted as the number of times to repeat the AS number! Since most ASNs in use on the Internet are large 16-bit quantities, this results in an erroneous configuration that attempts to generate BGP updates with excessively long AS paths which consequently tickle the Cisco bugs.

In fact – as one shrewd observer noted – the number of prepends input is not taken verbatim, but is divided modulo 256 when it is seen as being too large in order to “make it fit”. It’s unclear whether this functionality is an unintentional side-effect of a string-to-number handling library that the code is using or whether the modulo division was thought to be helpful in some way.
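The wrap-around is easy to reproduce. The Python sketch below is not the router's code, just the modulo arithmetic described above applied to a made-up AS number from the private range.

```python
def effective_prepends(user_input):
    """Prepend count after the wrap-around described above (input modulo 256)."""
    return user_input % 256


# Hypothetical operator who types his AS number where a repeat count is expected.
asn_typed_as_count = 65001

print(effective_prepends(3))                   # a sensible count is unchanged
print(effective_prepends(asn_typed_as_count))  # the ASN wraps to a large count
```

An operator entering 65001 would end up with 233 prepends, which is the same order of magnitude as the 200-odd ASN paths seen in the wild.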

Thankfully we were able to work with our customer to help him identify and correct the problem and minimise further disruption experienced by Internet users, but it seems that this relatively simple defect has caused noticeable incidents world-wide (now usefully documented on bgpmon) and may still cause further pain for those with vulnerable platforms. Those with vulnerable Cisco platforms can make use of the Maximum BGP AS Path Limit feature in IOS to limit the effects, although it isn’t effective in all cases.

Capacity Planning and Traffic Engineering with Packet Design

•February 8, 2009 • Leave a Comment

I was afforded today a brief overview of Packet Design’s software products in the areas of capacity planning and traffic engineering on IP networks.

I can’t provide a complete review of the capabilities of their product since it is quite comprehensive, but I can comment on the aspects that were presented to me which may have particular interest to those seeking a tool to assist in packet network traffic engineering.

Packet Design’s Route Explorer appliance is a routing protocol modelling tool. To model the network’s IGP, it acts as a fully-fledged OSPF or IS-IS speaker that is attached directly into the network via the closest backbone router. Through this interface, it passively observes the link state advertisements that occur within the IGP protocol and it is able to create a replica topology map that shows which devices are connected and by which links. This can be visualised on a GUI.

This allows an engineer to quickly query internal routes and answer questions related to which links are used to satisfy which routes on the network without actually touching the network. An engineer can also assess the impact of works activities or the benefits of proposed topology changes by artificially manipulating the IGP topology in terms of links or link metrics and Route Explorer will show the resulting effects on routing decisions.
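To illustrate the kind of “what-if” query described here, the following is a minimal sketch in Python of a shortest-path calculation over a link-state style topology, run before and after a metric change. The node names, metrics and the `shortest_path` helper are all hypothetical; this is not Route Explorer’s implementation, just the general idea using Dijkstra’s algorithm.

```python
import heapq


def shortest_path(links, source, target):
    """Dijkstra over an undirected topology given as {(a, b): metric}."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, metric in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + metric, neighbour, path + [neighbour]))
    return None  # target unreachable


# Hypothetical four-node topology with IGP metrics.
links = {("LON", "PAR"): 10, ("PAR", "GVA"): 10, ("LON", "AMS"): 10, ("AMS", "GVA"): 15}
baseline = shortest_path(links, "LON", "GVA")

# What-if: raise the LON-PAR metric, as one might ahead of planned works.
what_if = dict(links)
what_if[("LON", "PAR")] = 100
changed = shortest_path(what_if, "LON", "GVA")

print("baseline:", baseline)
print("what-if: ", changed)
```

Because the modified copy of the topology is evaluated offline, the live network is never touched, which is the attraction of this style of tool.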

In addition to IGP simulation, Route Explorer can also speak IBGP to backbone routers on the network in order to glean information about the external networks to which one is connected. This effectively extends the “what-if” route look-ups possible within the IGP to include all exterior Internet routes which is very powerful. Such capability allows one to assess the network routing effects associated with connecting a new customer, or losing a customer.

Route Explorer capacity is dimensioned based upon the route count that one expects to feed the devices for modelling and it would appear to be capable of supporting split-AS routing policies where one part of the network may have a slightly different view of the best exit than another. 

The Traffic Explorer appliance complements Route Explorer by collecting Netflow accounting records from router platforms and mapping them onto the topology discovered by Route Explorer in order to compute the estimated usage on a per-link basis. The general recommendation is to enable Netflow accounting in a sampled mode on all external interfaces so that traffic entering and leaving the autonomous system is observed by the Traffic Explorer platform but not repeatedly counted.
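The mapping step can be sketched roughly as follows in Python. Everything here is an assumption for illustration: the 1-in-1000 sampling rate, the five-minute interval, the record layout and the `paths` lookup (standing in for the routing model) are hypothetical and do not reflect Traffic Explorer’s actual data formats.

```python
from collections import defaultdict

SAMPLING_RATE = 1000     # assumed 1-in-1000 packet sampling
INTERVAL_SECONDS = 300   # assumed collection interval

# Hypothetical path lookup standing in for the discovered routing topology.
paths = {
    ("LON", "GVA"): ["LON", "PAR", "GVA"],
    ("LON", "PAR"): ["LON", "PAR"],
}

# Hypothetical sampled records seen once, at the ingress edge:
# (ingress node, egress node, sampled bytes).
records = [
    ("LON", "GVA", 3_000_000),
    ("LON", "PAR", 1_500_000),
    ("LON", "GVA", 750_000),
]


def link_usage_bps(records, paths, sampling_rate, interval):
    """Scale sampled bytes up and add the estimated rate to each link on the path."""
    usage = defaultdict(float)
    for ingress, egress, sampled_bytes in records:
        estimated_bps = sampled_bytes * sampling_rate * 8 / interval
        hops = paths[(ingress, egress)]
        for link in zip(hops, hops[1:]):
            usage[link] += estimated_bps
    return dict(usage)


usage = link_usage_bps(records, paths, SAMPLING_RATE, INTERVAL_SECONDS)
for link, bps in usage.items():
    print(link, f"{bps / 1e6:.0f} Mbit/s")
```

Collecting only on external interfaces means each flow appears once in `records`, so the per-link totals are not inflated by the same traffic being reported at every hop.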

As well as offering a near-realtime view on link usage that is independent of SNMP interface meters, Traffic Explorer extends the usefulness of Route Explorer since the “what-if” scenarios that can be conceived can also include the resulting traffic swing, which is an extremely attractive feature for impact analysis and for planning new or upgraded circuits.

Traffic Explorer is dimensioned by the rate of receipt of Netflow accounting records and both Traffic Explorer and Route Explorer are licensed by appliance. This is an attractive feature since in most cases it means license fees are proportional to traffic and hopefully revenue. Competing products in this area often license in terms of the raw network element count, irrespective of actual traffic levels, which can punish resilient network design.

Packet Design’s portfolio also includes an MPLS VPN Explorer capability which makes use of MPLS-based Netflow and multi-protocol BGP in order to report on VPN-specific routing and traffic. 

User access to both Route Explorer and Traffic Explorer is through X11 or VNC to a visualisation server which abstracts the details of the individual collector elements needed to create the view. The X11/VNC access is a slight disappointment, in my opinion, since one would think that a native client with a specific client/server protocol would probably perform better, especially on geographically-diverse networks where a central server is unlikely to be in the same place as the user.

In summary, the Packet Design solution appears to be a promising addition to a network looking to enrich its view of routing and traffic analysis.

Submarine Cable Breaks

•January 7, 2009 • Leave a Comment

One of the most exciting things about working in a large telecommunications company such as Interoute is the unifying effect on people all around the company when faced with a crisis. Such crises are thankfully rare, but in the early morning of Friday December 19, significant and substantial damage was caused to three major submarine cables running between Europe and the Middle East and, as a result, telecommunications services were severely compromised. Some people experienced failures of main circuits, backup circuits and even tertiary systems. Internet connectivity was congested and slow-running to and from destinations in the affected region.

At Interoute, there was a controlled chaos in the operations centres in Prague and Geneva as overwhelmed staff dealt with an abundance of lengthy or complicated re-routes for private-wire customers and had to make tough decisions regarding congested packet network links. With the Christmas holiday break looming dangerously close and indications from the submarine cable operators that repair efforts were underway but unlikely to be completed before the New Year, the main objective was to achieve a steady-state for as many customers as possible until full service could be restored.

Emails were urgently exchanged, and conference calls hurriedly convened. For each challenge that presented itself, plans, options, suggestions were solicited from all corners of the company, and even customers and suppliers. They were drafted and re-drafted, considered and critiqued, refined and accepted, or dismissed as necessary, until what was left was hopefully a feasible and workable method to overcome the problems. A collective sense of responsibility had gripped the company and people from all aspects of the business were motivated extraordinarily to work together as much as possible to overcome the problem.

I contributed to the drafting of one plan – of several candidates – to restore network connectivity to an affected IP node in Malta, which had been badly compromised by the damage. I worked into the night along with others to establish the details that would allow the re-stitching together of an IP/MPLS facility over a partner network’s infrastructure. As it happens, my plan was rendered unnecessary by the successful early commission of an alternative SDH transmission system which could restore the original topology with much lower risk of complications, but it was good to know it was an option.

I am sure that Interoute weren’t the only service provider faced with difficult challenges over this episode, but I do know that most of the people I work with felt proud to be part of an organisation that could step up to face extraordinary challenges in such a determined way.

Network Planning with Cariden Mate

•October 10, 2008 • 1 Comment

Over the last few weeks, I’ve been spending time evaluating Cariden Mate – a software tool designed to assist the task of capacity planning and traffic engineering on large scale IP networks. As a result I can provide a brief review of its capabilities.

Cariden Mate ships on three major platforms: Windows, Solaris UNIX and Macintosh OS X. In a style consistent with other network management tools but disappointing nonetheless, there seems to be an implicit assumption that the network operator is likely to dedicate a user workstation to running the software, since the user interface and network element data collection software all exist within the same installable package. In most cases, what is infinitely preferable is a server installation that communicates with the network elements and client installations that provide the necessary interfaces to the user.

That said, unlike a lot of other network management tools, the separate components of Cariden Mate, such as the GUI, the network collecting agents and the processing tools, are actually all available to execute separately in a modular fashion and are documented as such. This allows a skilled systems architect to construct a client/server-style architecture or, indeed, any other customised architecture that might introduce other data or systems in a flexible manner. One requires an amount of development time and effort to do this, of course.

The operation of Cariden Mate is centred around a plan file. A plan file contains various items of data about a network and since it is not always possible to gather this data from one place, the plan file format is flexible enough to be edited by hand, manipulated by user scripts or generated automatically by Cariden Mate’s built-in tools.

For those wanting a quick start, the GUI offers a fairly comprehensive “Get Plan” operation that will probe a router and generate a plan file by collecting a backbone router’s IGP database – both OSPF and IS-IS are supported – and enriching it with interface and bandwidth levels, obtained through SNMP.  This allows Cariden Mate to understand which nodes are connected to which other nodes and with what link metrics.

With this data alone, it is possible to generate a plan file that a user can use offline as a planning tool. The plan file is rendered in the GUI and shows how devices are linked in the network topology and allows the user to discover which links would be used for any specified route. The user is also able to define an arbitrary traffic matrix (a simple spreadsheet-like table of input node, output node and traffic level) and see it reflected on the discovered topology in terms of link usage, thus facilitating capacity checks for customer turn-ups.

At this point, the topology can also be manipulated: link metrics can be changed or links can be taken down, and Cariden Mate will show the new routes that will be taken by traffic and the resulting load on the remaining links as a result of the provided traffic matrix.

Assuming the availability of a reasonably accurate traffic matrix – in the right format – using Cariden Mate in this way without further access to the network is a powerful tool in its own right.

In addition to this, however, Cariden Mate provides tools to enrich the plan file with currently measured interface link utilisation data collected from SNMP. The tools are flexible and can be run in the background in order to constantly poll the network and populate an archive or, alternatively, the GUI’s “Get Plan” function is able to grab a single snapshot of link utilisation for the current time.

Cariden Mate GUI


In possession of the link utilisation data, Cariden Mate’s powerful Estimation and Deduction functions can be used to compute a candidate traffic matrix that would result in the observed link utilisation. The mathematics and exact algorithm I confess to not understanding at all beyond the premise that the software seems to consider all possible traffic matrices that could be put to the network and how they would affect the links and offers the most likely traffic candidate.
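For a feel of what estimating a traffic matrix involves, here is a minimal sketch in Python of the simple gravity model, a well-known baseline technique. To be clear, this is a stand-in and not Cariden’s algorithm: it uses only per-node edge totals rather than every observed link utilisation, and the node names and figures are hypothetical.

```python
def gravity_matrix(ingress, egress):
    """Estimate demand[src][dst] = ingress[src] * egress[dst] / total traffic."""
    total = sum(egress.values())
    return {
        src: {dst: ingress[src] * egress[dst] / total for dst in egress}
        for src in ingress
    }


# Hypothetical edge totals in Mbit/s; ingress and egress sum to the same total.
ingress = {"LON": 600, "PAR": 300, "GVA": 100}
egress = {"LON": 200, "PAR": 500, "GVA": 300}

matrix = gravity_matrix(ingress, egress)
for src, row in matrix.items():
    print(src, row)
```

Each row of the result sums back to that node’s ingress total, so the estimate is at least consistent with what was measured at the edge. For simplicity the sketch leaves the diagonal (traffic from a node to itself) in place. A more capable estimator would go on to reconcile the candidate matrix with the utilisation observed on interior links.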

The traffic matrix is very valuable data when one considers the difficulties in acquiring an actual observed matrix (different network platforms, different line cards with different features etc.). It can be based on data collections from a perceived network high-tide and then be made available offline in order to run simulations against in order to determine where new circuits would best be placed and what the likely impact of planned works would be.

Cariden also provides a light HTTP interface into a plan file using its WeatherMap-style interface. This presents a dynamically rendered image which shows the plan file with links coloured according to usage from continuous data stored in an archive file. This is a convenient and license-effective way to propagate basic network visualisation to users in the organisation that don’t require the full-blown GUI functionality.

Cariden Mate Web Interface


Licensing is typically per-node and per-seat. The per-node licensing is reasonable but it de-values the topology discovery mechanism which happily discovers every node in the IGP database – inferring location from network naming convention as it goes – whether you intend to use the Cariden Mate functionality for it or not. Unless you license the software for all nodes in the IGP, discovery is hampered by the necessary manual “deletion strategy” to remove all the low-end access routers that you don’t care about. Per-seat licensing is enforced through a stubborn adherence to MAC address for client installations, which is frustrating.

But in conclusion, Cariden Mate provides a very useful complement to the usual first-generation network performance reporting tools that measure interface usage. It offers users at various levels both an understanding and visibility of the topology, and a view of end-to-end traffic which cannot be gleaned from SNMP data alone.

 