Protecting Distributed Network Infrastructure Availability across a Cloud-Connected Workforce

"The cost of network downtime for the distributed workforce has risen dramatically", says Waldie

Arguably, the cost-benefit sweet spot for cloud adoption has been among small and medium-sized enterprises (SMEs), where virtualizing back office infrastructure and services has delivered greater scalability and reliability along with reduced operating costs. Enterprise branch offices have enjoyed additional benefits from the migration of server room silos to cloud systems, fostering co-operation and transparency across divisions and departments.

The downside is that this has given rise to distributed sites that are completely dependent on the cloud for day-to-day operations. When the distributed network infrastructure that tethers these sites to the cloud becomes unavailable, businesses cannot service their customers. Short-term financial losses may run into the tens of thousands per hour in lost revenue and lost productivity, on top of longer-term damage to reputation.

As a consequence, the cost of network downtime for the distributed workforce has risen dramatically. Factor in the mean time to repair (MTTR) at a remote site with little or no local IT staff, and any outage becomes very expensive. The sites that have the most to gain through cloud adoption also have the most to lose, whether from a loss of availability caused by a technical fault or from a deliberate incident carried out by a disgruntled employee or malicious third party. Provider SLAs offer some insurance, but the devil is in the details, and they are unlikely to cover the true cost of an outage.
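To make the arithmetic concrete, here is a minimal sketch that multiplies an hourly loss by the time to repair. The function name and all figures are hypothetical and chosen purely for illustration; they are not measurements from any real site.

```python
def outage_cost(hourly_loss: float, mttr_hours: float) -> float:
    """Direct cost of one outage: hourly loss multiplied by hours to repair."""
    if hourly_loss < 0 or mttr_hours < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_loss * mttr_hours


# Hypothetical figures: the same hourly loss at two sites, where the
# remote site takes longer to repair because nobody is on hand locally.
staffed_site = outage_cost(hourly_loss=20_000, mttr_hours=1)
remote_site = outage_cost(hourly_loss=20_000, mttr_hours=6)
print(f"staffed site: {staffed_site:,.0f}  remote site: {remote_site:,.0f}")
```

The model ignores reputational damage and any SLA credits, so it should be read as a lower bound on the direct loss rather than a full costing.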

So while incidents such as the Amazon outage and SaaS data leaks of 2011 have drawn attention to the risks at the cloud hub, there is also an acute vulnerability at the spokes. Fortunately, this vulnerability can be proactively mitigated by extending some best practices from the data center to distributed sites, including out-of-band management of network infrastructure, monitoring, and run book automation.

Distributed sites may have a secondary DSL circuit or redundant fiber link provisioned for failover network connectivity; however, redundancy is not enough. Consider a bug or exploit in the distributed routers’ firmware, or a routing change at the ISP or core router that necessitates an emergency configuration change at the remote site. Remote sites require an out-of-band access path separate from the site’s primary means of connectivity, and a dedicated out-of-band management box with a PSTN or cellular modem is a cost-effective way to meet this requirement.

This box, in turn, is cabled to the out-of-band management or console ports of critical routers, switches and load balancers, providing a convenient channel not only for always-available remote management access, but also for continuous monitoring and run book automation. As this box is effectively a public-facing bastion with back-door management access to critical infrastructure, careful consideration must be given to security. At the very least this means logged, auditable management sessions, per-user infrastructure management access policies, and SSH to secure inbound connections, with X.509-authenticated IPsec or OpenVPN recommended.

Continuous monitoring of remote network infrastructure availability and performance is necessary to know what’s normal for your network, and to detect and respond to incidents as they happen. Tools for this task include DC monitoring workhorses MRTG and Nagios/Icinga, the latter in particular being well adapted to both central and distributed installs.
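As a rough illustration of "knowing what's normal", the sketch below compares a new measurement against a baseline built from recent history. The function name, the three-standard-deviation threshold and the sample latency figures are assumptions made for this example; tools such as Nagios or Icinga have their own threshold mechanisms, which this does not attempt to reproduce.

```python
from statistics import mean, stdev


def is_anomalous(history, sample, threshold=3.0):
    """Return True if sample lies more than `threshold` standard
    deviations from the mean of the recent history."""
    if len(history) < 2:
        return False  # too little data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return sample != baseline
    return abs(sample - baseline) / spread > threshold


# Hypothetical round-trip times (ms) to a branch office router.
recent_latency = [20, 21, 19, 20, 22, 20]
print(is_anomalous(recent_latency, 21))
print(is_anomalous(recent_latency, 80))
```

A fixed statistical threshold is only one way to define "normal"; in practice the baseline window and threshold would be tuned per site and per metric.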

The responsibility for infrastructure monitoring is increasingly being absorbed by specialized data center infrastructure management (DCIM) appliances. Nevertheless, it is important that any monitoring solution can scale beyond the data center to distributed ICT, environmental and physical security sensors, to maintain situational awareness across all distributed workplaces (branch offices, campus sites, and even home workers).

Run book automation (RBA) implements the workflow that administrators use to detect and respond to an incident or outage as a series of automatically triggered programs or scripts. RBA effectively installs a virtual network administrator at each remote site to reduce MTTR by way of automatic recovery actions.

RBA is monitoring with teeth, and is very powerful when combined with open-source infrastructure management tools commonly used in the DC NOC. For example, router connectivity can be monitored, with automatic rollback to a known-good router configuration using RANCID configuration management tools in the event of a remote administrator's fat-finger error, config corruption or a sabotage attempt.
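The detect-and-respond workflow described above can be sketched as a small rule that runs a recovery action after a number of consecutive failed checks. The `Runbook` class, its parameters and the three-failure limit are hypothetical names and values chosen for illustration. The check and recovery steps are passed in as plain callables, so how a router is actually probed, or how a known-good configuration is restored, is deliberately left out.

```python
from typing import Callable


class Runbook:
    """Trigger a recovery action after N consecutive failed checks."""

    def __init__(self, check: Callable[[], bool],
                 recover: Callable[[], None], max_failures: int = 3):
        self.check = check              # returns True when the device is healthy
        self.recover = recover          # e.g. restore a known-good configuration
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one check; return True if the recovery action fired."""
        if self.check():
            self.failures = 0           # a healthy result resets the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.recover()
            self.failures = 0           # start counting afresh after recovery
            return True
        return False


# Simulated check results stand in for real connectivity probes.
results = iter([True, False, False, False, True])
actions = []
runbook = Runbook(lambda: next(results), lambda: actions.append("rollback"))
fired = [runbook.tick() for _ in range(5)]
print(fired, actions)
```

Requiring several consecutive failures before acting is a design choice that keeps a single dropped probe from triggering a rollback; a real deployment would also log each action so the session remains auditable.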

 

Opengear is exhibiting at Infosecurity Europe 2012, the No. 1 industry event in Europe held on 24–26 April 2012 at Earl’s Court, London. The event provides an unrivalled free education program, exhibitors showcasing new and emerging technologies, and offers practical and professional expertise. Visit the Infosecurity Europe website for further information.

 


Robert Waldie is Opengear’s vice president of business development in the UK and Europe. Waldie manages the Opengear UK operation and is responsible for technology partnering, channel development, marketing strategy, channel sales training and technical support there and in Europe. Prior to setting up the UK operation, he led the Opengear software development team. Before Opengear, Waldie worked as a software engineer with Secure Computing / Cyberguard / Snapgear and BRDC developing embedded Linux security and network applications. Waldie holds a BSc in computer science and a BA in linguistics from the University of Queensland. 
