
White Paper

EMC VNX DEDUPLICATION AND COMPRESSION


Maximizing effective capacity utilization

Abstract
This white paper discusses the capacity efficiency technologies
delivered in the EMC® VNX™ series of storage platforms. High-
powered deduplication and compression capabilities for file and
block storage are delivered standard with the VNX Operating
Environment.

March 2011
Copyright © 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of
its publication date. The information is subject to change
without notice.

The information in this publication is provided “as is”. EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.

VMware, VMware vCenter, and VMware View are registered
trademarks or trademarks of VMware, Inc. in the United States
and/or other jurisdictions. All other trademarks used herein are
the property of their respective owners.

Part Number h8198.1

EMC VNX Deduplication and Compression 2


Table of Contents
Executive summary
    Audience
Technology introduction
VNX data deduplication and compression
    Deploying VNX deduplication and compression for file data
    Deploying VNX compression for block data
Conclusion
References

Executive summary
Capacity-optimization technologies play a critical role in today’s environment where
companies need to do more with less. The EMC® VNX™ series of storage arrays is well
equipped to meet users’ needs in this regard. Intelligent and automated
deduplication and compression features are provided in the VNX Operating
Environment at no additional cost in the VNX5300™ and all higher models. (This
feature is not available in the lowest model, the VNX5100™.)
With VNX deduplication and compression, you can significantly increase storage
utilization for file and block data. In many cases, effective utilizations are increased
two to three times compared with traditional storage.
Management is simple and convenient. Once the capacity-optimization technologies
are turned on, the system intelligently manages capacity-optimization processes as
new data is written. With Unisphere™, you can manage block and file data from
within a single screen. In addition, you can deploy many of the features from VMware
vCenter™ via the EMC VSI for VMware vSphere™: Unified Storage Management
feature.
This white paper discusses the capacity-optimization capabilities of VNX series
systems, how they are best deployed, and how they fit in with other deduplication
technologies in your storage environment.

Audience
This white paper is for anyone interested in understanding the deduplication and
compression functionality that comes standard with the VNX series of storage
systems.

Technology introduction
There are many capacity-optimization technologies available. Each technology varies
in its efficacy based on the type of data being processed, amount of data, and data
access patterns. Deduplication systems, like the EMC Avamar® and Data Domain®
offerings, are designed to process massive amounts of data at high speed. When
applied to backup data sets, these systems can reduce required capacity by a factor
of tens or even hundreds relative to the data set’s aggregate size. Avamar and Data Domain serve
the same basic need—backup to disk—but each implementation provides unique
benefits.
VNX systems are high-performing primary-storage devices for file and block data. File
data is accessed on the VNX system via the CIFS, NFS, FTP, or MPFS protocols. Block
data is accessed using the Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), or
internet SCSI (iSCSI) protocols. Capacity optimization on these systems is an
asynchronous operation, occurring after new data is written, in an effort to maximize
server I/O performance. Table 1 compares each storage system category.

Table 1. High-level comparison of VNX deduplication and compression with Avamar
and Data Domain backup-to-disk systems

| VNX – Multipurpose storage platform with storage efficiency features | Avamar and Data Domain – Dedicated backup/archival storage platforms |
|---|---|
| Post-process data reduction: device is sized for original data size. Capacity is released gradually as it’s processed. | Inline data reduction: device is sized for reduced data. All incoming data is reduced immediately. |
| Relatively low to moderate deduplication processing throughput. | Very high deduplication processing throughput. |
| Low impact, and less aggressive capacity optimization: single instancing of files with compression. Compression for block data. | Most aggressive deduplication: variable block. |
| Capacity optimization is a low-priority task. | Deduplication is a high-priority task. |

VNX data deduplication and compression


VNX systems can increase capacity efficiency by as much as three times when
compared to traditional systems without advanced capacity efficiency features. VNX
achieves this through a combination of capacity efficiency technologies including
thin-LUN Virtual Provisioning™, compression, and file-level single instancing. All
deduplication and compression features discussed in this paper are available on the
VNX5300 and higher models. Deduplication features for file data are also available on
VNXe™ series systems.
VNX systems are built to handle the I/O demands of large numbers of Flash drives.
Performance-optimization features such as FAST Cache and FAST move the busiest
data onto the highest-performing drives, increasing the system’s IOPS-per-dollar
figure. However, Flash and high-speed SAS drives have a high cost per gigabyte. The
selected use of capacity efficiency features such as deduplication and compression
plays a complementary role in lowering overall cost by increasing effective utilization
rates.
VNX achieves capacity optimization in slightly different ways for file and block
data. Compression is just one element of the VNX capacity-optimization features for
both file and block data. It is a fundamental capacity efficiency technique
used in many solutions because it benefits most data types.
The efficiency benefits of VNX compression for several data types are shown in Figure
1.

[Bar chart: data savings, on a scale of 0% to 100%, for the data types Media, Binaries, Office, VMware*, and Text]

*Virtual machines’ OS image disks, no data. Virtual disks used for data will be as compressible as the
data stored on them.
Figure 1. Compression rates of common file types
When migrating from traditional systems to those utilizing capacity efficiency
technologies, the initial capacity savings can be much larger than the nominal data-
compression rate alone. This is due to the other optimizations used: single
instancing of file data and thin-LUN Virtual Provisioning for block data. Figure 2 shows
the efficiency of VNX capacity-optimized volumes over traditional LUNs.

[Chart: Relative capacity utilization of compressed and non-compressed data. Y-axis: efficiency savings multiplier (0.00 to 4.00). X-axis: % free space in traditional LUN (0% to 50%). Series: Text, VMware, Office, Binaries, Media, Traditional LUN]

Figure 2. Relative capacity efficiencies of traditional thick LUNs

Figure 2 represents a model of effective capacity utilizations. The graph shows how
different data types benefit from VNX capacity optimizations. These are compared
against the “Traditional LUN” case, which could be any data type on a volume without
capacity optimization. The x-axis represents how much of the user capacity (the
amount presented to servers) is free or unused space.
For example, assume a system with 1,000 GB usable capacity has 600 GB of data,
which equates to 60 percent capacity utilization (40 percent free space). In the case
of block storage, a traditional LUN would “stovepipe” that unused capacity to the
assigned server. As Figure 2 illustrates, if the data were office files and capacity
optimization were used, effective capacity utilization would be increased 2.5x, to 150
percent. (This is shown by a dotted line in the chart.) Other data types that are more
compressible can deliver even higher effective utilizations.
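The arithmetic in this example can be reproduced with a short Python sketch. The function name and parameters are illustrative assumptions and not part of any VNX tool; the 2.5x multiplier is simply the value the text reads from Figure 2 for office files at 40 percent free space.

```python
def effective_utilization(usable_gb, data_gb, multiplier=1.0):
    """Return effective capacity utilization as a fraction of usable capacity.

    multiplier is the efficiency savings multiplier taken from Figure 2;
    1.0 corresponds to a traditional LUN with no capacity optimization.
    """
    if usable_gb <= 0:
        raise ValueError("usable_gb must be positive")
    raw_utilization = data_gb / usable_gb  # 600 / 1000 = 0.60 in the example
    return raw_utilization * multiplier


# Example from the text: 1,000 GB usable capacity holding 600 GB of office files.
traditional = effective_utilization(1000, 600)
optimized = effective_utilization(1000, 600, multiplier=2.5)
print(f"traditional: {traditional:.0%}, optimized: {optimized:.0%}")
```

Substituting the multiplier for another data type or free-space level from Figure 2 gives the corresponding effective utilization.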
Over 100 percent effective utilization means you can store more data than there is
usable capacity. This is possible because compression stores the data using less
capacity than it would otherwise consume. For file data, capacity savings from
compression and single instancing are returned to the VNX file system for use by
other files. For block data, thin-LUN Virtual Provisioning is used to return unused
capacity to the storage pool for use by other LUNs. The bottom line is that capacity
that would normally be allocated to servers, but not used, is available for other data
when capacity optimization is used.
The following sections introduce each of the technologies used for file and block
data. For more details on the architecture, implementation, and management of these
features, refer to the following white papers:
Achieving Storage Efficiency through EMC Celerra Data Deduplication
EMC Data Compression — A Detailed Review

Deploying VNX deduplication and compression for file data
The VNX Operating Environment for File offers several convenient methods for
managing deduplication. There are user-defined deduplication policies available in
Unisphere as well as integrated options within VMware® vCenter and Windows
Explorer.
User-defined policy attributes identify which files to deduplicate and compress. You
can set these controls at the file-system level as shown in Figure 3. You can select the
deduplication method, file attributes, and system resource threshold limits for
running the deduplication process itself. The default values are shown in Figure 3.

Figure 3. File deduplication policy


You can make filename comparisons against the pathname-exclude list case-sensitive for
NFS. You can also enable or disable CIFS compression via the Microsoft Windows
compression attribute. When this feature is enabled, Windows Explorer displays
compressed files in a different color than non-compressed files.
Options for the single-instance file compare are the default SHA-1 hash and byte-level
compare. The more rigorous byte method performs a byte-level compare for files that
have matching SHA-1 hashes. Byte-level compare requires more system resources,
which can increase the time needed to do a full scan.
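The two compare options can be illustrated with a minimal Python sketch. This is not the VNX implementation: it works on in-memory byte strings rather than a file system, and the function name and the byte_compare flag are hypothetical. It only shows the general idea of matching on SHA-1 first and optionally confirming with a byte-level compare.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files, byte_compare=False):
    """Return (kept_name, duplicate_name) pairs for single instancing.

    files maps a file name to its content as bytes. When byte_compare is
    True, files with matching SHA-1 digests are also compared byte for byte.
    """
    kept_by_digest = defaultdict(list)  # digest -> names of unique files kept
    pairs = []
    for name, content in files.items():
        digest = hashlib.sha1(content).hexdigest()
        match = None
        for kept in kept_by_digest[digest]:
            # The hash already matches; the byte check is the stricter option.
            if not byte_compare or files[kept] == content:
                match = kept
                break
        if match is None:
            kept_by_digest[digest].append(name)
        else:
            pairs.append((match, name))
    return pairs


sample = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
print(find_duplicates(sample, byte_compare=True))
```

The byte-level path reads the full content of both files a second time, which is consistent with the higher resource cost described above.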
You can set file attributes such as last access time, modification time, min/max file
size, and file extensions to target specific files for deduplication processing.
Minimum Scan Interval sets the minimum number of days between completing a scan
of the file system and starting a new scan. The SavVol High Water Mark sets the usage
level of the checkpoint save volume at which deduplication stops or fails to start. The
Backup Data High Water Mark determines which files are backed up in their current
space-reduced form, based on how compressible each file is. At the default 90 percent,
files that are compressed to 90 percent of their original size are backed up as
compressed files. CPU% High and CPU% Low water marks determine the throttle
mode for the deduplication process. It will run in full throttle mode if the CPU usage is
below the high water mark. If the CPU usage exceeds the high water mark during a
deduplication scan, the deduplication process throttles down until the CPU usage
reaches the low water mark. When the low water mark is hit, the deduplication
process ramps back up to full throttle.
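The high and low water mark behavior described above is a form of hysteresis, which the following Python sketch models. The threshold values, the mode names, and the treatment of readings exactly at a threshold are assumptions made for illustration; they are not the product defaults.

```python
def next_mode(mode, cpu_pct, low=40.0, high=75.0):
    """Return the next throttle mode, 'full' or 'throttled'.

    low and high are hypothetical CPU% water marks, not product defaults.
    """
    if mode == "full" and cpu_pct > high:
        return "throttled"  # CPU usage exceeded the high water mark
    if mode == "throttled" and cpu_pct <= low:
        return "full"       # CPU usage fell back to the low water mark
    return mode             # between the marks, keep the current mode


mode = "full"
history = []
for sample in [30, 80, 60, 40, 50]:
    mode = next_mode(mode, sample)
    history.append(mode)
print(history)
```

Using two thresholds rather than one keeps the process from switching modes repeatedly when CPU usage hovers near a single limit.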
The File System Properties dialog box in Figure 4 lets you enable and disable
deduplication; it also shows deduplication metrics. The deduplication policy can be
enabled, disabled, or suspended. Turning the process off initiates the reduplication
of any files that are already deduplicated. Lastly, detailed statistics are available
about the space saved at the file system level.

Figure 4. Deduplication statistics in Unisphere

When deduplication is invoked by a policy, files that have been deduplicated are
reduplicated if enough new data is written to the file. The file will then be
deduplicated again once the access and modify time criteria are met again.
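The access time, modification time, size, and extension criteria that a policy applies can be pictured as a simple filter. The Python sketch below is only an illustration: the class names, field names, and every threshold value are hypothetical placeholders, not VNX settings or defaults.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    days_since_access: int
    days_since_modify: int


@dataclass
class Policy:
    # All values below are placeholders chosen for illustration only.
    min_access_days: int = 30
    min_modify_days: int = 30
    min_size_bytes: int = 4 * 1024
    max_size_bytes: int = 100 * 1024**3
    excluded_extensions: tuple = (".tmp", ".log")


def is_candidate(f: FileInfo, p: Policy) -> bool:
    """Return True if the file meets every policy criterion."""
    if f.name.lower().endswith(p.excluded_extensions):
        return False
    if not (p.min_size_bytes <= f.size_bytes <= p.max_size_bytes):
        return False
    return (f.days_since_access >= p.min_access_days
            and f.days_since_modify >= p.min_modify_days)


policy = Policy()
print(is_candidate(FileInfo("report.doc", 1_000_000, 45, 60), policy))
print(is_candidate(FileInfo("report.doc", 1_000_000, 2, 60), policy))
```

In this model, a file that was reduplicated after new writes becomes a candidate again only once enough days have passed for both time criteria to hold.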
In VMware environments, file-level compression can be invoked within the EMC VSI
for VMware vSphere: Unified Storage Management plug-in. Compression is the term
used within vCenter, but this includes compression and single instancing. Using the
plug-in, you can enable compression at the NFS datastore level, individual virtual
machine level, or virtual disk level. When you right-click on a cluster, host, datastore,
or VM, you are shown the compression options. When you enable compression on the
datastore, all virtual disks in the datastore are processed. When you enable it on a
virtual machine, all existing virtual disks associated with that VM are processed.

Figure 5 shows the Properties dialog boxes for a datastore and a virtual machine with
compression enabled. Checkboxes for enable/disable as well as savings due to
compression are available in each dialog box.

Figure 5. Compression options and space savings in Datastore and VM Properties dialog boxes
When processing virtual disks, the file compression feature is aware of the virtual-
disk structure and will only process the .vmdk file. Swap and temp spaces are
excluded because it is not practical to process these files. This optimization allows
the virtual disks on NFS datastores compressed through the vCenter plug-in to remain
compressed, even when active. In this case, the system ingests new data into the
compressed file asynchronously to the write.
You can manage the compression of files and directories on CIFS shares in a similar
fashion to virtual disk files in vCenter. Compression in this case also includes single
instancing. Windows users can enable file compression at the share, directory, or
individual-file level from within Windows Explorer. Files compressed in this fashion
also remain compressed, with new changes ingested as necessary.
Figure 6 shows a compressed file, file1, displayed in blue within Windows Explorer.

Figure 6. A compressed file in Windows Explorer

Figure 7 shows the properties for file1. You can enable compression for an individual
file via the Advanced Attributes dialog box.

Figure 7. The properties of a compressed file, file1

With policy-based management, you have the freedom to define default
deduplication behavior. The additional functionality provided within vCenter and
Windows allows you to manage files explicitly over and above the general
deduplication policy.

Deploying VNX compression for block data


Within block storage, there is no notion of a “file,” so compression is a
practical approach to capacity optimization that offers significant space savings
for many data types. You can easily manage compression via Unisphere or
Navisphere® CLI at both the LUN level and the system level. Once enabled, the system
automatically manages the processing of new data based on the amount of new data
coming in compared to system-defined thresholds.
Block data compression is tightly integrated with Virtual Provisioning. When
compression is enabled on a LUN, the LUN becomes a thin LUN if it is not already. The
software automatically handles migration of non-thin LUNs to thin LUNs. As thin LUN
blocks are freed, they can be returned to the pool for eventual use by other LUNs in
the pool.
Note that LUNs with block compression enabled should not be used for VNX file
volumes. For file data, use VNX file deduplication and compression exclusively,
because it offers more granular control than block compression and lets the
system distinguish inactive data to process from active data.
Optimizations such as virtual disk file awareness are only available in the file
implementation. Block data compression is intended for relatively inactive data that
requires the high availability of the VNX system. Consider static data repositories or
copies of active data sets that users want to keep on highly available storage. Block
compression is fully compatible with replication software delivered in the VNX Local
and Remote Protection Suites, so there are many use cases where these products
may be used to create a compressed copy of a data set.
Figure 8 shows the LUN table of a VNX system in Unisphere. You can configure
optional columns for the LUN attributes “thin” and “compression.”

Figure 8. VNX LUN table in Unisphere


Many controls for block compression are available in the LUN Properties dialog box
under the Compression tab, as shown in Figure 9. In this dialog box, you can enable
and disable compression for the LUN.

Figure 9. LUN Properties dialog box, Compression tab
You can also set the compression rate here. In this case, rate refers to the speed at
which the compression process operates, not the level of compression effort. The
options are High, Medium (default), and Low. LUN-level Pause and Resume
capabilities are available on the right side of the dialog box.
User capacity is the capacity as presented to the server. Consumed capacity is the
amount of physical capacity allocated to the LUN. If compression is enabled, consumed
capacity reflects the combined result of thin provisioning and compression.
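The relationship between the two capacity figures can be expressed as a simple ratio. The Python sketch below is an illustration only; the function name and the sample values are assumptions, and the result combines the effect of thin provisioning and compression without separating them.

```python
def lun_space_savings(user_gb, consumed_gb):
    """Return the fraction of user capacity that is not physically consumed."""
    if user_gb <= 0:
        raise ValueError("user_gb must be positive")
    return 1.0 - consumed_gb / user_gb


# Hypothetical LUN: 500 GB presented to the server, 125 GB physically allocated.
print(f"{lun_space_savings(500, 125):.0%} of user capacity is not consumed")
```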
The Compressed LUNs Summary dialog box shown in Figure 10 provides a
consolidated view of block compression activity for all LUNs. It also provides system-
level compression control via the Pause button and Resume button at the bottom of
the dialog box.

Figure 10. Compressed LUNs Summary dialog box
When the Pause Feature option is used, all LUN-level compression operations are
paused. The exception is any traditional or thick LUNs making their initial transition to
thin for compression; you can cancel these operations in the Compression tab in the
LUN Properties dialog box. EMC recommends that you pause the compression feature
at the system level during known periods of high system utilization if
response-time-sensitive applications are running. Otherwise, the compression and
subsequent space reclamation processes will use CPU and cache resources that may
impact those applications.

Conclusion
VNX storage systems provide powerful capacity efficiency features that can improve
effective capacity utilization by up to three times compared with traditional storage
devices. These capacity-optimization features are included with the VNX Operating
Environment, at no additional cost. Deduplication and compression features for file
and block storage offer complementary capacity efficiency opportunities for all data
types in the primary storage systems.

References
The following white papers are available on EMC.com:
Achieving Storage Efficiency through EMC Celerra Data Deduplication
EMC Data Compression — A Detailed Review
EMC CLARiiON Virtual Provisioning – Applied Technology
