In the process of
losing its bidding battle with EMC for de-duplication market leader Data
Domain, Network Appliance (NetApp) has exposed its weakness in the
de-duplication (de-dupe) market sector. It had previously developed its own
technology (A-SIS) but evidently accepted that Data Domain provided a better bet. Data Domain initially preferred NetApp, which seemed to have the deal sewn up with an accepted bid of $1.5
billion; then EMC made a hostile bid of $1.8 billion and, when NetApp matched this,
EMC went higher. NetApp was undoubtedly prudent to walk away at that point.
Perhaps it did not do so badly (and I am not just referring to the $57 million break-up fee it receives from Data Domain and so, in effect, from EMC).
The de-dupe market
is far more complex than simply saying "Data Domain is the market leader so the
best." There is actually quite a choice of solutions suited to different
approach is fast and efficient, as well as intuitive in that the appliance sits
in the data stream and de-dupes ‘in-line' as the data is received with no delay
to the output. So it is free-standing and basically plug in and go to get
immediate results. Yet this hardly scratches the surface of the evolving market.
Some vendors' de-dupe appliances compete directly with Data Domain's and have been offered by EMC (so are likely to be dropped over time). Meanwhile, Falconstor's virtual tape library (VTL)
solution with de-duplication competes well against Data Domain's VTL option.
ExaGrid and Sepaton are two of the independent vendors providing post-process de-duplication—the compression carried out after the back-up, as this is not so performance-critical as in-line de-dupe. ExaGrid's approach looks like in-line to the user, as it backs up onto its own appliance 'in-line' and immediately de-dupes it out to the destination.
Both also cluster their appliances to gain greater
scalability. (Data Domain is expected to add this capability in due course but
is not there today.) Sepaton's approach
is also geared to specific back-up application types, so it can gain greater
compression for some formats by recognising and removing the applications'
headers and data markers from the data stream.
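By way of illustration, the sketch below uses an entirely made-up backup record layout (a fixed 16-byte per-job header in front of the payload) to show why recognising and stripping application headers lets identical payloads from different backup jobs fingerprint, and therefore de-dupe, identically. It is not Sepaton's actual mechanism, just the general idea.

```python
# Toy illustration of content-aware de-dupe. The record format here (a 16-byte
# per-job header followed by payload) is entirely made up; the point is that
# stripping the backup application's header lets identical payloads from
# different jobs produce identical fingerprints.
import hashlib

HEADER_LEN = 16  # hypothetical per-record header (job id, timestamp, etc.)

def fingerprint_naive(record: bytes) -> str:
    return hashlib.sha256(record).hexdigest()

def fingerprint_content_aware(record: bytes) -> str:
    return hashlib.sha256(record[HEADER_LEN:]).hexdigest()  # ignore the header

payload = b"unchanged file contents"
monday  = b"JOB=0001;T=MON..".ljust(HEADER_LEN) + payload
tuesday = b"JOB=0002;T=TUE..".ljust(HEADER_LEN) + payload

print(fingerprint_naive(monday) == fingerprint_naive(tuesday))                  # False
print(fingerprint_content_aware(monday) == fingerprint_content_aware(tuesday))  # True
```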
One vendor (also post-process) is unique in using content-aware compression, with algorithms that can de-dupe already compressed JPEG and MPEG file formats; none of the others so far makes any impression on these formats—a problem that is growing as graphics and video files become ever more common.
Then there is
CommVault's Simpana, which takes a more global approach, embedding de-dupe in all back-ups, remote replication and archiving—CommVault being so far the only vendor providing
de-dupe even for archive tape. NetApp itself was the first to offer de-dupe for
primary data with very little performance overhead. However, I can understand
some nervousness about playing with the integrity of primary files, as distinct from back-up copies.
From a legal and
security standpoint, there are a couple of basic de-dupe issues. One cannot
de-dupe encrypted data—but leaving it unencrypted in order to de-dupe it
obviously makes it more vulnerable to hack attacks. Then, fairly obviously,
de-dupe systems need to tamper with the stored data; yet some legal cases hinge
on the ‘real evidential weight' of the stored information so the tampering could
in theory be used to swing a case. So, de-dupe needs careful consideration by
those organisations for whom security or legal concerns are critical.
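The encryption point is easy to demonstrate. The sketch below uses a toy XOR 'cipher' (emphatically not real cryptography, and not any product's code) purely to show that identical plaintext blocks encrypted under fresh random IVs produce different ciphertexts, leaving a fingerprint-based de-dupe engine nothing to match.

```python
# Why de-dupe cannot see through encryption: identical plaintext blocks
# encrypted under fresh random IVs yield different ciphertexts, so a
# fingerprint-based engine finds no duplicates. The 'cipher' is a toy XOR
# keystream used only to make the point; it is NOT real cryptography.
import hashlib
import os

def toy_encrypt(block: bytes) -> bytes:
    iv = os.urandom(16)                                   # fresh random IV per write
    keystream = hashlib.sha256(iv).digest() * (len(block) // 32 + 1)
    return iv + bytes(b ^ k for b, k in zip(block, keystream))

block = b"identical plaintext block" * 100

plain_fps  = {hashlib.sha256(block).hexdigest() for _ in range(10)}
cipher_fps = {hashlib.sha256(toy_encrypt(block)).hexdigest() for _ in range(10)}

print("unique plaintext fingerprints: ", len(plain_fps))    # 1  -> de-dupes to one copy
print("unique ciphertext fingerprints:", len(cipher_fps))   # 10 -> nothing to de-dupe
```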
Finally, to my mind there is in any case a joker in the pack that may yet be played. Earlier this
month I wrote about companies specialising in IT infrastructure optimisation
including WAN optimisation—and it is no surprise that they use various advanced single instancing
(SI) and de-dupe techniques; some of these will reduce the size even of an
already de-duped back-up copy.
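As a rough sketch of the single-instancing idea those products rely on (the names and the in-memory 'remote cache' below are illustrative, not any particular vendor's protocol): the sender fingerprints each chunk, ships only chunks the far end has never seen, and sends references for the rest, so repeat transfers cost almost nothing.

```python
# Sketch of single instancing over a WAN link: only chunks the far end has not
# already stored cross the wire; everything else is sent as a reference.
# Chunk size and the in-memory cache are illustrative only.
import hashlib

CHUNK = 4096

remote_cache = {}   # fingerprint -> chunk, standing in for the far-end appliance

def send_over_wan(data: bytes) -> int:
    """Return how many payload bytes actually had to cross the link."""
    sent = 0
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in remote_cache:      # far end has never seen this chunk
            remote_cache[fp] = chunk
            sent += len(chunk)
        recipe.append(fp)               # references are (nearly) free
    return sent

backup = b"mostly unchanged data " * 10000
print(send_over_wan(backup))   # first transfer: most chunks cross the WAN
print(send_over_wan(backup))   # repeat transfer: 0 bytes cross the WAN
```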
So, one argument
looking to the future goes: "Who needs de-duplication appliances at all when
WAN optimisation has even better technology built in?" This, of course, assumes
the cost-benefit of installing such optimisation software and equipment would be the greater, with de-duplication then providing little or no extra benefit. (In
time that might become the case but it is not so just yet.)
NetApp clearly has
a few alternative companies it could go for—or partner with—assuming it
does not want to opt for further development of its own technology. Right now
these may seem like second choices, but they are really just alternative ways of skinning the cat, so to speak—and some of them are very sound.
NetApp will no
doubt think carefully about its strategy and could yet turn this into a success. Whether EMC proves to be good for Data Domain is another matter.
30th July 2009: 'Andy' said:
I do appreciate you mentioned NetApp's A-SIS technology was the first to market for primary file systems, but I think you fail to understand that Data Domain technology and NetApp's A-SIS technology are very different things.
Your opening comment demonstrates this: "In the process of losing its bidding battle with EMC for de-duplication market leader Data Domain, Network Appliance (NetApp) has exposed its weakness in the de-duplication (de-dupe) market sector. It had previously developed its own technology (A-SIS) but evidently accepted that Data Domain provided a better bet."
NetApp has a VTL that runs different software and is comparable to the Data Domain box, but again this is aimed at tertiary or backup systems - it is a far more appropriate platform to compare to Data Domain, and one could make a pretty good argument that Data Domain has the superior product. It is however totally different to A-SIS deduplication, which is a feature of NetApp's operating system for its primary and secondary systems. Terminology is always a killer with technology as we can be using the same words and mean very different things. While my terminology may be incorrect, what I mean by primary and secondary is storage systems hosting "live" file systems/LUNs that are accessed by production servers and clients. By tertiary I am referring to a backup file system - VTL or tape.
It's my understanding that the 35,000 plus number is the systems actually running deduplication in production; however, I can't officially comment on NetApp's behalf there. I am involved in the storage industry but am not a spokesman for them (although a fan, in case that wasn't obvious :) ).
As for reluctance towards deduplicating data for legal reasons, surely this applies to any tier of storage? If you are legally obligated to retain data, then deduplication in the backup space poses more of a risk from a regulatory point of view than that of a live filesystem. SOX and other regulations require backups of data for defined periods in the event that primary data becomes unavailable. This logically means that deduplication is potentially much more risky for backup data sets than primary or live file systems.
From a technology standpoint, it makes a lot of sense to store more data on fewer blocks, regardless of where the data resides, as long as the performance and availability requirements of the data are met.
The point of my (previous) comment, however, was not to argue the merits or shortcomings of deduplication, rather to point out the level of ignorance about deduplication in the plethora of commentary on the NetApp/EMC/Data Domain saga. My comments may come across as being a little terse; however, I felt compelled to respond to your article and hopefully point out the shortcomings I perceive in your commentary.