Comment: Extreme Data Protection in Virtualized Environments

Eicher provides five data protection imperatives that organizations should consider during virtual server planning

Transforming an organization through server virtualization requires a strategic and coordinated approach. Data protection – which includes backup, secondary storage and disaster recovery – is an area that can complicate virtualized data centers if hastily implemented. Following are five data protection imperatives that organizations should consider during virtual server planning.

#1: Minimize impact to host systems during backups

In virtual environments, numerous virtual machines (VMs) share the resources of a single physical host. Backups can degrade the performance and response time of applications running on other VMs on the same host. On a large host with many VMs, competing backup jobs can bring the host to a halt, leaving critical data unprotected.

There are various approaches for minimizing the impact to host systems during backups. The simplest approach is to limit the number of VMs on a system, making sure not to exceed the number that can be effectively backed up. While effective, this runs counter to the purpose of virtualization, which is to consolidate applications to the fewest possible physical servers. It would also limit the financial benefits of consolidation.

A second approach is to stagger the scheduling of VM backups. If performance is impacted when four backups are simultaneously running, limit backups to three at a time. This can solve the performance issue, but it can create other challenges. As data grows over time, jobs may take longer to run, creating backup overlap.
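As a rough illustration of this staggering approach, the sketch below (in Python, with a hypothetical VM list and a placeholder run_backup function standing in for the real backup tool) uses a semaphore to cap how many backup jobs run at once:

import threading

MAX_CONCURRENT_BACKUPS = 3  # cap chosen to avoid host contention (illustrative)
slots = threading.BoundedSemaphore(MAX_CONCURRENT_BACKUPS)

def run_backup(vm_name):
    # Placeholder for the actual backup command or API call.
    print(f"backing up {vm_name}")

def backup_job(vm_name):
    with slots:              # wait for a free slot before starting
        run_backup(vm_name)  # at most MAX_CONCURRENT_BACKUPS jobs run in parallel

vms = ["vm-01", "vm-02", "vm-03", "vm-04", "vm-05"]  # hypothetical VM names
threads = [threading.Thread(target=backup_job, args=(vm,)) for vm in vms]
for t in threads:
    t.start()
for t in threads:
    t.join()

The same idea applies regardless of tooling: the schedule, not the backup software, enforces the concurrency limit, which is why growing job durations can eventually cause overlap.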

An early attempt at solving the backup problem was the use of a proxy server. For VMware, this model is known as VMware Consolidated Backup (VCB). With VCB, a separate server is dedicated to running backups directly off the storage. While this seemed good in theory, in practice there was still a significant performance impact because of VMware snapshots.

VMware followed with a new storage application programming interface (API) called vStorage APIs for Data Protection. This introduced the concept of changed block tracking (CBT). CBT tracks data changes at the block level, rather than the file level, resulting in significantly less data being moved during backup. CBT goes a long way toward solving the problem of backup impact, although it does still rely on VMware snapshots.
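To make the CBT idea concrete, here is a minimal sketch in generic Python (not the actual vStorage API; the disk and target objects are assumed placeholders): writes mark blocks as dirty, and the backup copies only those blocks.

BLOCK_SIZE = 4096  # bytes per tracked block (illustrative)

class ChangeTracker:
    """Toy changed-block tracker: records which blocks each write touched."""

    def __init__(self):
        self.dirty = set()

    def record_write(self, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def changed_blocks(self):
        blocks = sorted(self.dirty)
        self.dirty.clear()  # the next backup starts from a clean slate
        return blocks

def incremental_backup(disk, tracker, target):
    # Copy only the blocks that changed since the last backup.
    for block in tracker.changed_blocks():
        disk.seek(block * BLOCK_SIZE)
        target.write_block(block, disk.read(BLOCK_SIZE))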

A final approach is to install an efficient data protection agent on each virtual machine and run backup jobs just as they would be run in a physical environment. Such an agent requires technology that tracks, captures, and transfers data streams at the block level without invoking VMware snapshots. As a result, no strain is placed on the resident applications, open files are not an issue, and the file system, CPU, and other VMs are minimally impacted.

#2: Reduce network traffic impact during backups to maximize backup speed

Reduction of network traffic is best achieved through very small backups, eliminating network bottlenecks as the backup image travels from VM to LAN to SAN to backup target disk. Block-level incremental backups achieve this whereas full base backups, and even file-level incrementals, do not.
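A back-of-the-envelope calculation, using illustrative numbers rather than measurements, shows the scale of the difference:

vm_size_gb = 500                # total data on the VM (assumed)
daily_block_change = 0.02       # 2% of blocks change per day (assumed)
changed_file_share = 0.10       # 10% of data sits in files touched per day (assumed)

full_backup_gb = vm_size_gb                              # full base backup
file_incremental_gb = vm_size_gb * changed_file_share    # whole changed files
block_incremental_gb = vm_size_gb * daily_block_change   # changed blocks only

print(full_backup_gb, file_incremental_gb, block_incremental_gb)
# roughly 500 GB vs 50 GB vs 10 GB moved across the VM-to-LAN-to-SAN path

Under these assumptions, the block-level incremental moves an order of magnitude less data than even a file-level incremental.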

Minimal resource contention, low network traffic and small snapshots all lead to faster backups, which deliver improved reliability and allow for more frequent backups and recovery points. This also means more VMs can be backed up per server, increasing VM host density and amplifying the benefits of a virtualization investment.

#3: Focus on simplicity and speed for recovery

Numerous user implementations have revealed that server virtualization introduces new recovery challenges. Recovery complications arise when backups are performed at the physical VM host level or through a proxy.

Users of traditional, file-based backup often assume that the searchable catalog they are used to is available in any backup tool. With VMs, this is not always the case. Systems that perform full VM image backups or rely on snapshot-based backups often cannot catalog the data, leaving no easy way to find an individual file.

Fast and simple recovery can be achieved if point-in-time server backup images on the target disks are always fully ‘hydrated’ and ready to be used for multiple purposes. A data protection model following this practice provides immediate recovery to a VM, cloning to a VM, and even quick migration from a physical to a virtual machine simply by transferring a server backup image onto a VM host server.

#4: Minimize secondary storage requirements

Traditional backup results in multiple copies of the entire IT environment on secondary storage. Explosive data growth has made those copies larger than ever. The need for extreme backup performance to accommodate more data has necessitated the move from tape backup to more expensive disk backup. The result is that secondary disk data reduction has become an unwanted necessity.

De-duplication of redundant files can be achieved at the source or target. Each approach, however, has drawbacks.

New data streams need to be compared with an ever-growing history of stored data. Source-side de-duplication technology reduces the amount of data sent over the wire, but it can impact performance on backup clients because of the need to scan the data for changes. Target-side de-duplication does nothing to change the behavior of the backup client or limit sent data, although it does reduce the amount of disk resources required.
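For readers unfamiliar with how de-duplication works under the hood, this simplified sketch stores each unique chunk only once; it assumes fixed-size chunking, SHA-256 hashing, a binary input stream, and a plain dictionary as the chunk store, whereas real products typically use variable-size chunking and purpose-built indexes:

import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks (illustrative)

def dedupe_store(stream, chunk_store):
    """Write a backup stream into chunk_store, keeping one copy per unique chunk.
    Returns the list of chunk hashes needed to reassemble the stream."""
    recipe = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:   # new data: store it once
            chunk_store[digest] = chunk
        recipe.append(digest)           # duplicate chunks cost only a reference
    return recipe

Whether this hashing and lookup happens on the backup client (source-side) or on the backup appliance (target-side) is exactly the trade-off described above.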

A hybrid approach combining efficient data protection software with target-side de-duplication can help organizations achieve the full benefits of enterprise de-duplication.

#5: Strive for administrative ease-of-use

Few users have a 100% virtualized environment. Therefore, a data protection solution that behaves the same in virtual and physical environments is desirable.

A data protection solution where a backup agent is installed on each VM can help ease the transition from physical to virtual. When evaluating solutions, it is vital to consider the entire backup lifecycle. If data sets need to be archived to tape, a de-duplication device may not allow easy transfer of data to archive media. This might then require a secondary set of backup jobs to pull data off the device and transfer it to tape, greatly increasing management overhead.

Ease of use can be realized with features such as unified platform support, embedded archiving, and centralized scheduling, reporting, and maintenance – all from a single pane of glass.

A Holistic View of Virtualization

Planning at all levels is required to maximize the value of a virtualization investment. Data protection is a key component of a comprehensive physical-to-virtual (P2V) or virtual-to-virtual (V2V) migration plan.

The five imperatives recommended herein can help significantly improve organizations’ long-term ROI around performance and hardware efficiencies, and accelerate the benefits of virtualization. To complete this holistic vision, organizations must demand easy-to-use data protection solutions that rate highly on all five of the imperatives. Decision makers who follow these best practices can thereby avoid the common data protection pitfalls that plague many server virtualization initiatives.


Peter Eicher is a senior product manager for Syncsort, a provider of performance data integration and data protection software. A frequent participant in industry events, he helps spread the word about the most effective means to protect and recover data. Eicher is a 16-year software industry veteran who has previously held product management and marketing roles at Microsoft, Lucent Technologies, Ascend Communications, and FalconStor Software.
