EMC VNX Deduplication and Compression
Abstract
This white paper discusses the capacity efficiency technologies
delivered in the EMC® VNX™ series of storage platforms. High-
powered deduplication and compression capabilities for file and
block storage are delivered standard with the VNX Operating
Environment.
March 2011
Copyright © 2011 EMC Corporation. All Rights Reserved.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.
Audience
This white paper is for anyone interested in understanding the deduplication and
compression functionality that comes standard with the VNX series of storage
systems.
Technology introduction
There are many capacity-optimization technologies available. Each technology varies
in its efficacy based on the type of data being processed, amount of data, and data
access patterns. Deduplication systems, such as the EMC Avamar® and Data Domain®
offerings, are designed to process massive amounts of data at high speed. When
applied to backup data sets, these systems can reduce the required capacity by a
factor of tens or even hundreds relative to the data set's aggregate size. Avamar
and Data Domain serve the same basic need, backup to disk, but each
implementation provides unique benefits.
VNX systems are high-performing primary-storage devices for file and block data. File
data is accessed on the VNX system via the CIFS, NFS, FTP, or MPFS protocols. Block
data is accessed using the Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), or
Internet SCSI (iSCSI) protocols. Capacity optimization on these systems is an
asynchronous operation, occurring after new data is written, in an effort to maximize
server I/O performance. Table 1 compares each storage system category.
*Virtual machine OS image disks only, containing no user data. Virtual disks used for
data will be as compressible as the data stored on them.
Figure 1. Compression rates of common file types
When migrating from traditional systems to those utilizing capacity efficiency
technologies, the initial capacity savings can be much larger than the nominal data-
compression rate alone. This is due to the other optimizations used: single
instancing of file data and thin-LUN Virtual Provisioning for block data. Figure 2 shows
the efficiency of VNX capacity-optimized volumes over traditional LUNs.
Figure 2. [Chart not reproduced. Data series: VMware, Office, Binaries, Media.
Horizontal axis: % free space in traditional LUN, 0% to 50%.]
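The interaction between free space and compression described above can be illustrated with a short calculation. The sketch below is not taken from the VNX software. It assumes, for illustration only, that a thin LUN consumes capacity solely for written data and that compression shrinks that data by a fixed ratio. The function name and parameters are hypothetical.

```python
def capacity_efficiency(free_fraction: float, compression_ratio: float) -> float:
    """Return traditional-LUN capacity divided by optimized capacity.

    Illustrative assumptions (not from the white paper):
      * the traditional LUN reserves its full size, used or not;
      * the thin LUN consumes space only for the used fraction;
      * compression shrinks the used data by `compression_ratio` (e.g. 1.5).
    """
    if not 0.0 <= free_fraction < 1.0:
        raise ValueError("free_fraction must be in [0, 1)")
    if compression_ratio <= 0.0:
        raise ValueError("compression_ratio must be positive")
    used_fraction = 1.0 - free_fraction
    # Optimized capacity per unit of traditional capacity is
    # used_fraction / compression_ratio, so the efficiency is its inverse.
    return compression_ratio / used_fraction


if __name__ == "__main__":
    # Hypothetical 1.5:1 compression ratio at several free-space levels.
    for free in (0.0, 0.25, 0.5):
        ratio = capacity_efficiency(free, 1.5)
        print(f"{free:.0%} free space: {ratio:.2f}x efficiency")
```

Under these assumptions, the more free space a traditional LUN carries, the larger the initial saving after migration, even though the compression ratio itself is unchanged.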
When deduplication is invoked by a policy, files that have been deduplicated are
reduplicated if enough new data is written to them. Such a file is deduplicated
again once it next meets the policy's access-time and modify-time criteria.
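The policy behavior described above can be sketched as two checks: one for eligibility based on file age, and one for reduplication based on how much of the file has been rewritten. This is a minimal illustration, not the VNX implementation. The class, field names, and threshold values are all hypothetical, and the white paper does not state how "enough new data" is measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DedupPolicy:
    # All three thresholds are hypothetical placeholders.
    min_access_age: timedelta   # time since last access before a file qualifies
    min_modify_age: timedelta   # time since last modification before it qualifies
    rewrite_fraction: float     # share of the file rewritten that triggers reduplication


def is_eligible(policy: DedupPolicy, last_access: datetime,
                last_modify: datetime, now: datetime) -> bool:
    """A file qualifies only when both age criteria are met."""
    return (now - last_access >= policy.min_access_age
            and now - last_modify >= policy.min_modify_age)


def should_reduplicate(policy: DedupPolicy, bytes_written: int,
                       file_size: int) -> bool:
    """Assumed rule: reduplicate once the rewritten share reaches the threshold."""
    if file_size <= 0:
        return False
    return bytes_written / file_size >= policy.rewrite_fraction


if __name__ == "__main__":
    policy = DedupPolicy(timedelta(days=15), timedelta(days=15), 0.25)
    now = datetime(2011, 3, 31)
    idle_since = datetime(2011, 3, 1)
    print(is_eligible(policy, idle_since, idle_since, now))
    print(should_reduplicate(policy, bytes_written=300, file_size=1000))
```

After a reduplicated file is written, its modification time is recent, so under this sketch it stays ineligible until both age criteria are satisfied again.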
In VMware environments, file-level compression can be invoked within the EMC VSI
for VMware vSphere: Unified Storage Management plug-in. "Compression" is the term
used within vCenter, but it covers both compression and single instancing. Using the
plug-in, you can enable compression at the NFS datastore level, individual virtual
machine level, or virtual disk level. When you right-click on a cluster, host, datastore,
or VM, you are shown the compression options. When you enable compression on the
datastore, all virtual disks in the datastore are processed. When you enable it on a
virtual machine, all existing virtual disks associated with that VM are processed.
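The three scopes described above differ only in which virtual disks end up being processed. The following sketch models that selection. It does not use the plug-in's or vCenter's API. The classes and the function are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class VirtualMachine:
    name: str
    disks: List[str] = field(default_factory=list)  # virtual disk file names


@dataclass
class Datastore:
    name: str
    vms: List[VirtualMachine] = field(default_factory=list)


def disks_to_process(scope: Union[Datastore, VirtualMachine, str]) -> List[str]:
    """Return the virtual disks covered when compression is enabled on `scope`."""
    if isinstance(scope, Datastore):
        # Datastore level: every virtual disk in the datastore.
        return [disk for vm in scope.vms for disk in vm.disks]
    if isinstance(scope, VirtualMachine):
        # VM level: the disks that exist on the VM at the time of the call.
        return list(scope.disks)
    if isinstance(scope, str):
        # Virtual disk level: just the named disk.
        return [scope]
    raise TypeError("scope must be a Datastore, VirtualMachine, or disk name")


if __name__ == "__main__":
    web = VirtualMachine("web01", ["web01.vmdk", "web01_1.vmdk"])
    db = VirtualMachine("db01", ["db01.vmdk"])
    store = Datastore("nfs_ds1", [web, db])
    print(disks_to_process(store))
    print(disks_to_process(web))
    print(disks_to_process("db01.vmdk"))
```

The VM-level branch returns a copy of the current disk list, reflecting the statement that existing virtual disks are processed when compression is enabled on a virtual machine.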
References
The following white papers are available on EMC.com:
Achieving Storage Efficiency through EMC Celerra Data Deduplication
EMC Data Compression — A Detailed Review
EMC CLARiiON Virtual Provisioning – Applied Technology