ILM: Panacea or Proprietary Poison?
Imagine a perfect data-storage environment: heterogeneous storage platforms linked effortlessly using a variety of network and channel protocols to form a stable, well-managed storage infrastructure. Atop this platform is an intelligent management system that automatically places data onto arrays and moves it over time, all the while overseeing its security and protection. Data flows smoothly and transparently in and out of this storage utility.
It's hard to resist the allure of information life-cycle management. But before you buy in, you need to distinguish the fantasy from the reality. Vendors' definitions of ILM vary drastically, as do their visions for implementation.
ILM's Mainframe Origins
If you've worked with mainframes, you've probably already been exposed to ILM's concepts. ILM is an intelligent engine that considers the inherent storage requirements of data generated by applications and end users. The ILM engine reads the data's "DNA" to determine which characteristics it inherited from the process that created it and, from those, what kind of primary storage it requires. If the data is critical and has inherited confidentiality or retention requirements from the application that generated it, the ILM engine recognizes this and places the data on those storage-infrastructure components that offer the appropriate protection and security services.
Similarly, if the data needs to be referenced or shared among a variety of applications and users, the ILM engine routes it to another set of storage components suited to store data that is often accessed but seldom modified. Data that is written once and then rarely if ever accessed is sent to either an archive or the trash.
This data placement occurs automatically, under the unblinking eye of the ILM software sentinel. The ILM engine continually monitors data requests, counting access frequency to determine whether the data should be moved to a more accessible or less accessible location.
Using an access-frequency counter is a much better method for discovering stale data than using the "date last modified" attribute assigned to the data by the server OS. A lot of data is touched but never changes, referenced but never rewritten. Even the "date last accessed" parameter that OS vendors have begun to capture doesn't effectively identify data that's ready for the archive: There's a difference between data accessed once since the last time you checked and data accessed 100 times. "Date last accessed" lacks the granularity of a true access-frequency counter, like the one they had in the mainframe shop.
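To make the distinction concrete, here is a minimal Python sketch. It is not any vendor's implementation: the `AccessTracker` class and its method names are hypothetical, invented for illustration. It records both a last-accessed timestamp and a running count, then shows two files that look identical by the first measure but very different by the second.

```python
from collections import Counter


class AccessTracker:
    """Hypothetical tracker: keeps a last-access time and an access count per file."""

    def __init__(self):
        self.counts = Counter()   # file -> number of accesses recorded
        self.last_access = {}     # file -> timestamp of most recent access

    def record(self, path, timestamp):
        self.counts[path] += 1
        self.last_access[path] = timestamp

    def stale(self, threshold):
        """Files accessed fewer than `threshold` times are archive candidates."""
        return sorted(p for p, n in self.counts.items() if n < threshold)


tracker = AccessTracker()
tracker.record("report.doc", timestamp=100)      # touched once
for t in range(1, 101):                          # touched 100 times
    tracker.record("orders.db", timestamp=t)

# Both files end up with the same "date last accessed"...
same_last_access = tracker.last_access["report.doc"] == tracker.last_access["orders.db"]
# ...but only the counter separates the stale file from the busy one.
candidates = tracker.stale(threshold=10)
```

In this toy run the timestamp alone cannot tell the two files apart, while the counter flags only `report.doc` as a candidate for the archive.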
The ILM engine, using data characteristics and access frequencies, moves data from platform to platform under the aegis of well-defined migration policies. In addition, the ILM engine considers the capabilities, costs and location of the underlying storage infrastructure, integrating these characteristics into policy definitions so data lands on the storage gear providing the best price-performance mix.
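One way to picture such a policy is as a lookup that picks the cheapest platform still offering every service the data requires. The Python sketch below is an assumption-laden illustration, not a description of any product: the `Tier` type, the tier names, the service labels and the cost figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float        # illustrative figures, not real prices
    services: frozenset       # services the tier offers, e.g. {"mirroring"}


def place(required_services, tiers):
    """Return the cheapest tier that offers every required service."""
    eligible = [t for t in tiers if required_services <= t.services]
    if not eligible:
        raise ValueError("no tier satisfies the requirements")
    return min(eligible, key=lambda t: t.cost_per_gb)


TIERS = [
    Tier("enterprise-array", 50.0, frozenset({"mirroring", "encryption", "fast-access"})),
    Tier("midrange-array", 20.0, frozenset({"mirroring", "fast-access"})),
    Tier("sata-array", 5.0, frozenset({"mirroring"})),
    Tier("tape-archive", 1.0, frozenset()),
]

critical = place({"mirroring", "encryption"}, TIERS)   # needs the most services
reference = place({"mirroring"}, TIERS)                # needs protection only
write_once = place(set(), TIERS)                       # no special requirements
```

With these made-up tiers, critical data stays on the expensive array, reference data lands on the low-cost disk and write-once data goes to tape.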
The old formula still holds true: Access to data once written to disk falls by 30 percent within a week and by 90 percent within a month. So why store infrequently accessed or updated data on your most expensive gear? The vendors say ILM will solve the problem of storage oversubscription with underutilization once and for all.
To help you assess the vendors' claims for yourself, we asked 16 companies for their definitions of ILM and for descriptions of their products. Our objective was to create a compendium of definitions and perspectives that serves as a one-stop reference to ILM offerings.
Advanced Digital Information Corp., Arkivio, Avamar Technologies, Hewlett-Packard, Hitachi Data Systems, IBM, NuView, Princeton Softech, Softek, StorageTek, Sun Microsystems, Troika Networks and Veritas Software Corp. all responded. You can find their complete responses here.
Computer Associates chose not to participate, insisting that it's too soon to talk about ILM. EMC, a vocal proponent of ILM, told us it intended to take part, but it never submitted a response. Instead of a questionnaire response, Rainfinity submitted an essay describing the need for building-block technologies to enable ILM.
We're passing no judgment on the vendors' relative strengths and weaknesses--our aim is simply to illuminate their ILM goals and visions. In the final analysis, ILM seems to mean whatever the vendor says it does. Moreover, in many cases, the term describes a proprietary approach to data management that locks the consumer into a particular vendor's technology for as long as the data must be retained. Our best recommendation is a familiar one: Caveat emptor!
Advanced Digital Information Corp.
Well-known in the tape automation world and a comparatively recent player in the SRM (storage-resource management) market, ADIC distinguishes DLM (data life-cycle management) from ILM.
The latter, the company notes, "uses applications that can look inside files, and use the content of them to help an end user determine their value and different use requirements. ILM applications are very specific, dealing with a particular application area (SEC regulations for e-mail retention, for example), and end users are likely to use several different ones. ILM can help users understand different requirements for files, but to actually manage map files to storage resources--to actually move data, protect it [and provide] access [to it]--ILM needs to use DLM."
For DLM, ADIC posits its own SRM product, StorNext Management Suite, as the ideal solution: "[StorNext] provides an automated system for managing files on storage resources. DLM matches data access, protection and retention policies to business requirements, and it provides a system for changing the way that data is treated over time as the business requirements change. DLM provides data-management value for a number of uses, one of which is providing a storage-management foundation for ILM."
ADIC's SRM software provides a policy engine that users can employ to define the storage requirements for discrete data sets. However, it's up to the user to know what data has which access, security, protection and retention requirements. This "unified storage" supports Windows and POSIX, and pricing starts at about $21,000 for a Windows system with four servers and 1 TB of managed storage, according to ADIC.
· Advanced Digital Information Corp., (800) 336-1233, (425) 881-8004. www.adic.com
Arkivio
Arkivio does an excellent job of explaining what it means by ILM. According to the vendor, "there are six key elements or functions of an ILM solution: data/storage discovery, logical organization of resources, classification/valuation schemes, creation of data-management policies, simulation of policy results, and evaluation (monitoring and reporting) of the entire ILM process." The company calls its Arkivio Auto-stor offering "the first storage-management product of its kind to integrate all these elements in one comprehensive ILM solution."
Auto-stor is designed to scan and collect detailed metadata on data files and storage resources, as well as statistics on storage utilization and data-usage patterns across heterogeneous file systems, without the help of server agents. It lets the administrator create volume groups based on common characteristics, such as cost, manufacturer/model, available capacity and storage type. Additionally, the user can define so-called file groups based on attributes such as size, owner, type and date last accessed or modified, each of which may be assigned a level of priority reflecting its value to the business. These resource groups are incorporated into policies using a wizard-based interface.
The solution still requires the administrator to bring to the setup an understanding of the specific storage requirements of data so they can be included in file groups. Also, the product doesn't offer access-frequency counting functionality that might help determine when data needs to be moved from platform to platform or to archive.
Arkivio has come up with a unique mechanism for automating the migration process based on observed and user-supplied information. Its PAE (Policy Automation Engine) computes a DVS (Data Value Score) for the file groups and an SVS (Storage Value Score) for the volume groups to determine the most appropriate files and volumes to include in data-management actions. DVS scores are calculated using the data's parameters, such as age, size, last modified, last accessed and assigned priority level. Similarly, SVS scores are calculated based on volume attributes, such as cost and the percentage of storage utilized.
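Arkivio did not disclose how the scores are actually computed, so the following Python sketch is only a guess at the general shape of such a scheme. The function names, weights and sample figures are hypothetical; the inputs are the parameters named above.

```python
def data_value_score(age_days, days_since_access, size_mb, priority):
    """Hypothetical DVS: priority raises value; age, idleness and size lower it."""
    score = 10.0 * priority
    score -= 0.05 * age_days
    score -= 0.10 * days_since_access
    score -= 0.001 * size_mb
    return score


def storage_value_score(cost_per_gb, percent_used):
    """Hypothetical SVS: costly, nearly full volumes score highest."""
    return cost_per_gb * (percent_used / 100.0)


file_groups = {
    "finance-current": data_value_score(age_days=10, days_since_access=1,
                                        size_mb=500, priority=5),
    "old-scans": data_value_score(age_days=900, days_since_access=400,
                                  size_mb=20000, priority=1),
}
volume_groups = {
    "tier1-array": storage_value_score(cost_per_gb=50.0, percent_used=92),
    "sata-array": storage_value_score(cost_per_gb=5.0, percent_used=40),
}

# Move the lowest-value file group off the highest-value volume group.
to_move = min(file_groups, key=file_groups.get)
relieve = max(volume_groups, key=volume_groups.get)
```

Under these invented weights, the old, idle, low-priority file group is the one selected to leave the expensive, nearly full volume.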
The product directly supports only Common Internet File System and Network File System files. However, it can indirectly support e-mail, using third-party e-mail archiving software.
Auto-stor's pricing is complicated. Prices start at about $6,000 per terabyte, but decline by 15 percent for every 10 TB added. You'll need Arkivio Central Server, as well as the support options appropriate for the file-system protocols you'll be using. Count on spending between $10,000 and $15,000 for the additional software.
· Arkivio, (877) 275-1700, (650) 237-6100. www.arkivio.com or email@example.com
Avamar Technologies
Avamar Technologies' Axion solution has been available for several years. Using a "commonality factoring" algorithm, the technology eliminates duplication to squeeze more data into a smaller space.
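Avamar's response did not spell out how commonality factoring works internally. As a stand-in, the Python sketch below shows the general idea of eliminating duplicates by hashing fixed-size chunks and storing each distinct chunk once. The function names and the tiny chunk size are assumptions chosen for readability, not details of Axion.

```python
import hashlib


def store(data, chunk_store, chunk_size=4):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Returns the list of chunk digests needed to rebuild the data.
    """
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)   # stored only if not already present
        recipe.append(digest)
    return recipe


def restore(recipe, chunk_store):
    """Rebuild the original bytes from the stored chunks."""
    return b"".join(chunk_store[d] for d in recipe)


chunks = {}
original = b"ABCDABCDABCDWXYZ"
recipe = store(original, chunks)
rebuilt = restore(recipe, chunks)
```

Here four chunks of input are held as two stored chunks, and the recipe of digests is enough to rebuild the original bytes.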
Axion doesn't support the characterization of back-end hardware in terms of capability or cost. This partly reflects the fact that until recently, the technology was a "hardware stovepipe"--meaning it worked only with Avamar RAINs (redundant arrays of independent nodes). However, the vendor plans to support heterogeneous back-end storage, and is building an API to let Axion interoperate with storage-management framework products.
Pricing is on a per-gigabyte basis. A perpetual license for Axion software to provide complete backup functionality is $20 per gigabyte; the list price for a perpetual license for complete replication functionality is $5 per gigabyte.
· Avamar Technologies, (949) 743-5100. www.avamar.com
Hewlett-Packard
HP's response to our questionnaire begins with the promising assertion, "In truth, ILM is a new marketing term--not a new technology concept." The vendor goes on to say it has offered products in the ILM space for more than a decade, providing support for a "set of processes and supporting tools that help companies manage business information from cradle to grave." HP's current focus is on providing "building blocks" such as an "operational data store" and a "reference data store."
While calling its suite of data-protection and recovery solutions the most advanced of its ILM building blocks, HP concedes that its "policy-management tools, hardware infrastructure and general improvements to the existing foundation" are still in development. The vendor gave us a list of partners, with product areas ranging from backup and archiving to health-care systems and ILM, to underscore its commitment to developing a "cradle to grave" capability in the future.
HP offers no technology for data-naming or access-frequency counting, or for storage-platform characterization by cost or capability. Instead, it claims to focus "on the business metrics that identify the areas of IT where we can offer a significant reduction in total cost of ownership."
The vendor declined to give us specific pricing information, explaining that the cost of any ILM solution will depend on the components and services selected.
· Hewlett-Packard Co., (650) 857-1501. www.hp.com
Hitachi Data Systems
As HDS sees it, ILM and DLM are distinct from each other, yet complementary. The vendor defines ILM as "the function(s) necessary to manage information through the work flow or business process from creation of an information object to its completed state. ... ILM typically exists at the application layer and is most often associated with document/content management and collaboration software, such as Documentum, FileNet and Open Text." HDS characterizes DLM as "the function(s) necessary to manage a data object or collection of objects within the storage environment based on the prescribed value or access requirements associated with the life cycle of the data object."
Sharing the IBM mainframe world's perception of ILM functionality, HDS recommends choosing your storage environment based on the relative importance of your data.
"As the activity against a valued data object decreases," the vendor says, "it may be moved to lower-cost, secondary or tertiary levels of storage to optimize the cost of storing that data object and to improve the efficiency of the primary storage. As storage leases expire, technology refreshes occur, ownership changes and business processes are replaced, new storage needs to be acquired and provisioned for migration or replication of the valued data."
HDS claims that structured, semistructured and unstructured data require different ILM solutions that will need to be "ported to our DLM solution." HDS also sidesteps the issues of access-frequency counting, storage-platform characterization and policy articulation.
· Hitachi Data Systems Corp., (408) 970-1000. www.hds.com
IBM
IBM, a company with a self-described "deep history and leadership in ILM," concedes that the term now has "a broad range of definitions." The core objective, IBM says, is to "align the value of data with business priorities" in terms of access needs, costs and retention requirements.
With its DB2 Content Manager, IBM aims to provide a single repository for all manageable content. The vendor advocates an object-oriented framework to tie all applications, including IBM Tivoli Storage Manager for Hierarchical Storage Management, into the OO repository.
IBM's Tivoli Storage Manager has predictable capabilities for classifying data by file name and date last accessed, for example. However, the product lacks an access-frequency counting or data-naming function as such.
The company offers a data-retention solution for regulatory compliance, priced at $141,600 for 3.5 TB.
· IBM Corp., (888) 839-9289. www.ibm.com
NuView
NuView offers two ILM products, but they can be used only with file data in Windows environments.
As its StorageX product reveals, NuView favors a global name space. StorageX
virtualizes the physical location of files and presents a consistent directory
tree to end users. With a global name space in place, StorageX lets data move
behind the scenes without the use of HSM (Hierarchical Storage Management)
StorageX won't directly assist in identifying data-storage requirements. However, it does let administrators create customized "logical" data groups, organized by department, user, project, location, tier of storage and so on. Access-frequency counting is limited to "last access date," and no facilities are offered for characterizing hardware targets, though the product lets administrators define storage pools based on the parameters they specify.
StorageX costs $2,000 per node. Pricing for File Lifecycle Manager varies with the type of Network Appliance Filer on which it's deployed.
· NuView, (281) 497-0620. www.NuView.com
Princeton Softech
Princeton Softech dominates the database archive management space with a 56 percent market share, according to Gartner Group. Its Active Archive Management tools and consulting services let users develop effective strategies for "data and application classification, followed by data-archive business specifications and data-access requirements," the vendor claims. "Built into these sessions are an assessment of customers' data environments and implementation plans."
The company provided no pricing information.
· Princeton Softech, (800) 457-7060, (609) 627-5500. www.princetonsoftech.com
Softek
The vendor's Softek Lifecycle Management product consists of Storage Manager and Storage Provisioner. Storage Manager lets users "profile data" and establish "quality of storage" definitions when moving selected data between storage tiers, Softek says. Sounds like "logical class" and "physical class" definitions in mainframe SMS.
As for access-frequency counting, Softek has designed a unique workaround.
"Softek Storage Manager," the vendor says, "identifies the last time a file or data set was accessed as its closest proxy for access frequency--files most recently accessed are also likely to be the most frequently accessed. By increasing the frequency of data scans, the storage admin can have a higher degree of granularity for the last access date and time for each file." It's a solid workaround to the problem of ferreting out what kind of attention data is receiving.
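The effect of scanning more often can be sketched in a few lines of Python. This is an illustration of the idea, not Softek's code: the function name and the snapshot format (one dictionary of last-access timestamps per scan) are assumptions, and the timestamps are simple day numbers.

```python
def changes_observed(snapshots):
    """Count, per file, how many scans saw a newer last-access time than the scan before.

    `snapshots` is a list of dicts mapping file name -> last-access timestamp,
    one dict per scan, oldest first. The count is a lower bound on real
    accesses, since several accesses between two scans look like one.
    """
    counts = {}
    for previous, current in zip(snapshots, snapshots[1:]):
        for name, atime in current.items():
            counts.setdefault(name, 0)
            if name in previous and atime > previous[name]:
                counts[name] += 1
    return counts


# The same week of activity, scanned weekly and scanned daily.
weekly = [{"busy.db": 0, "idle.doc": 0}, {"busy.db": 7, "idle.doc": 3}]
daily = [
    {"busy.db": 0, "idle.doc": 0},
    {"busy.db": 1, "idle.doc": 0},
    {"busy.db": 2, "idle.doc": 0},
    {"busy.db": 3, "idle.doc": 3},
    {"busy.db": 4, "idle.doc": 3},
    {"busy.db": 5, "idle.doc": 3},
    {"busy.db": 6, "idle.doc": 3},
    {"busy.db": 7, "idle.doc": 3},
]

weekly_view = changes_observed(weekly)
daily_view = changes_observed(daily)
```

Scanned weekly, the two files are indistinguishable; scanned daily, the busy file stands out. The result is still only a lower bound, because any number of accesses between two scans registers as a single change.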
The Softek solution tallies out to $39,000 for the Storage Management Server, which includes the console, repository, Action Set Engine, reporting and correlation engine, and $795 for each managed-server microagent. Pricing wasn't provided on Storage Provisioner, which comes into play if you're doing Softek-style ILM across a SAN.
· Softek, (877) 887-4562, (408) 746-7638. www.softekfujitsu.com.
StorageTek
From the vendor that claims to have coined the term "ILM" comes this definition: "ILM is a sustainable storage strategy that requires balancing the cost of storing and managing information with its changing value over time, providing a practical methodology for aligning storage costs with business priorities." StorageTek claims to offer all the elements of an end-to-end solution, though it's still working on a policy-based data-classification software product and a virtual-tape product for open systems.
Then, the response takes an interesting turn as StorageTek wraps itself in the Common Information Model flag and states that "anything requiring multivendor ILM solutions will be CIM-based." The vendor says it's working with the Storage Networking Industry Association to become CIM-compliant. This boilerplate repeats several times in the document with respect to a data-naming scheme, a storage-platform characterization and the interoperability of StorageTek products with other industry offerings.
StorageTek didn't provide pricing on its homogeneous solution, but it claims to be competitive.
· StorageTek, (800) 877-9220, (303) 673-5151. www.storagetek.com
Sun Microsystems
Sun begins its response with the observation that there's no such thing as ILM in a box: "ILM is an entire set of packages (hardware, software and practices) that represent the aggregate of technology enabling data management. These run from simple media maintenance and backup through complete system-managed data placement and enterprise-continuity offerings. Some of the key cornerstones of Sun's ILM strategy are the StorEdge SAM-FS and QFS software."
Sun StorEdge SAM-FS, the vendor's HSM software, provides management-policy and migration functionality, whereas Sun StorEdge QFS is a universal file system that lets you consolidate file data into one shared metadata repository. SAM-FS, according to Sun, provides basic data-characterization services. However, data-migration policies hinge not on access frequency, but on traditional data parameters, including size, name, directory location (or subdirectory), create date, last-modified date and last-touched date. "Based on these parameters, multiple sets of actions can be defined for SAM-FS to take," Sun says.
Sun claims its product, which doesn't support iSCSI or DAFS (Direct Access File System), is difficult to price. "There is no formula for calculating the cost of an ILM environment," the vendor says, "as cost is dependent on the size of the environment implemented, the performance levels required and so on. Cost will vary per user. However, a preconfigured solution could range from 1.5 cents per MB for the smallest environment to approximately 0.5 cents per MB or less for the largest environments. Many many variables go into designing a cost-effective ILM infrastructure, so the final cost points will, by definition, vary."
· Sun Microsystems, (800) 555-9SUN, (650) 960-1300. www.sun.com
Troika Networks
Troika's response states that the vendor doesn't provide ILM per se, but offers an enabling technology for hosting and supporting DLM and ILM data movements across a SAN.
· Troika Networks, (805) 371-1377. www.TroikaNetworks.com
Veritas Software Corp.
For some reason, Veritas elected to partner with Network Appliance in responding to our questionnaire. This is doubly curious, because the mention of anything NetApp is difficult to find in the response. Veritas uses only the term "data life-cycle management" in its document to describe "the process of managing business data throughout its life cycle from conception until disposal, across different storage solutions and within the constraints of the business process."
Citing the absence of a uniform storage infrastructure, which Veritas concludes is cost-prohibitive for any midsize-to-large organization (interesting, in light of NetApp's participation in this response), and the fact that the criticality of any given data may change over time, the company concludes that a DLM system is required that will move data over time to different classes and types of storage solutions.
Veritas goes to great pains to differentiate DLM from HSM: "HSM primarily focuses on optimizing data availability in a virtual online model (across a hierarchy of storage, typically disk and tape), whereas DLM takes all other aspects of the data's life cycle in consideration, too, including the protection levels, data retention and destruction of data."
Enter Veritas Data Lifecycle Manager, which lets companies "solve their problems of data growth, compliance, data security, data organization and resource utilization by automating the management of data--from creation through disposal--according to defined policies. In addition, Veritas Data Lifecycle Manager provides powerful, high-speed search and index technology that reduces the time and cost of electronic records discovery." But there's a catch: The solution is limited to files (NTFS only) and semistructured data such as e-mail (Microsoft Exchange only).
Setting up the solution requires administrators (with the assistance of Veritas Consulting Services) to hand-pick data to be included into specific policies. Storage targets are also hand-picked and grouped into "data stores," Veritas' version of physical classes.
Veritas offers no performance- or cost-profiling features to characterize hardware platforms. It offers an SRM suite as a means to collect performance data so decisions can be made regarding where to target data at different stages of its life cycle.
Pricing is offered on a scenario basis: a single-processor Veritas Data Lifecycle Manager server plus two managed file servers and one managed Exchange Server is $8,900. This doesn't include the cost of hardware itself, the cost of consulting services to help define policies or the cost of the SRM tools that will be needed to collect rudimentary performance data on hardware. Nor does it include the cost of any NetApp gear.
· Veritas, (800) 327-2232, (650) 527-8000. www.veritas.com
Leveraging the hype around regulatory compliance, vendors are pitching information life-cycle management as a magic formula for just about everything from reining in storage costs to building a utility storage infrastructure. Missing from their products, however, is one or more of the key ILM enablers: an automated data-naming scheme, an access-frequency counter and a way to characterize storage platforms on a cost/performance basis--all of which are required for policy-based management engines to determine which data goes where. And absent from the messaging around ILM is a consistent definition of what, exactly, ILM means.
To counter the confusion, we asked 16 vendors to define ILM and describe how they're aiming to bring it about. Here's what they told us.
The concept of Information Lifecycle Management (ILM) dates back nearly 30 years to IBM mainframe computing. IBM's Systems Managed Storage (DFSMS) technology, working in concert with its Hierarchical Storage Management (DFHSM) technology, provided mechanisms for accomplishing four "core tasks" of lifecycle management:
1. Classification of data into groups or 'logical classes' based on the
storage requirements of the data itself.
It could be argued that the mainframe world was just getting started with true ILM when Y2K concerns compelled many companies to migrate their business apps off of the mainframe and into the distributed-computing environment. Unfortunately, this application migration was not accompanied by a transition of ILM or other storage-management capabilities from the mainframe space.
While there has been some flirtation with Hierarchical Storage Management (HSM) in the distributed space, these initiatives by software vendors (including IBM) largely failed to catch on, owing to difficulties in supporting HSM operations over bandwidth-constrained LANs used to interconnect heterogeneous server-attached storage arrays.
Most HSM products migrated files from disk to tape archives across the corporate LAN based on simple criteria such as file age. The software left behind a placeholder or 'stub' in place of the migrated file that could be used to recall the file from an archive if it was requested for use by an application or end user. The burden placed on busy LANs, together with the delay in file retrieval, cast a pall on HSM generally.
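The stub mechanism can be sketched in Python as follows. This is a simplified illustration under stated assumptions: real HSM products recalled files transparently at the file-system level, whereas here the recall is an explicit function call, and the `.stub` suffix, function names and directory layout are hypothetical.

```python
import os
import shutil
import tempfile
import time

STUB_SUFFIX = ".stub"   # hypothetical naming convention for placeholders


def migrate_old_files(primary, archive, max_age_seconds, now=None):
    """Move files older than max_age_seconds to the archive, leaving a stub behind."""
    now = time.time() if now is None else now
    migrated = []
    for name in sorted(os.listdir(primary)):
        path = os.path.join(primary, name)
        if name.endswith(STUB_SUFFIX) or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) > max_age_seconds:
            target = os.path.join(archive, name)
            shutil.move(path, target)
            with open(path + STUB_SUFFIX, "w") as stub:
                stub.write(target)            # the stub records where the file went
            migrated.append(name)
    return migrated


def recall(primary, name):
    """Bring a migrated file back when an application or user asks for it."""
    stub_path = os.path.join(primary, name + STUB_SUFFIX)
    with open(stub_path) as stub:
        archived = stub.read()
    path = os.path.join(primary, name)
    shutil.move(archived, path)
    os.remove(stub_path)
    return path


primary, archive = tempfile.mkdtemp(), tempfile.mkdtemp()
old = os.path.join(primary, "old.txt")
with open(old, "w") as f:
    f.write("quarterly figures")
os.utime(old, (0, 0))                         # pretend the file dates from 1970
with open(os.path.join(primary, "new.txt"), "w") as f:
    f.write("today's notes")

moved = migrate_old_files(primary, archive, max_age_seconds=30 * 24 * 3600)
after_migration = sorted(os.listdir(primary))
restored_path = recall(primary, "old.txt")
```

Even in this toy form, the cost the article describes is visible: every recall is a full copy back from the archive, which on a busy LAN with a tape archive meant both network load and a wait for the user.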
Only recently, with the advent of networked storage, have HSM and ILM come back into vogue. Faster LANs and the possibility of off-LAN data transfers across back-end Fibre Channel fabrics or Gigabit Ethernet-based IP SANs have, in the view of vendors at least, reinvigorated interest in these technologies.
And just in time, apparently, when you listen to the reasons posited by vendors for adopting ILM today. In these days of 'do more with less,' some argue, ILM is a must-have because it provides the key to obtaining better capacity utilization efficiency from a company's storage infrastructure. By migrating data across multiple tiers of storage, based on data requirements and the capabilities and cost characteristics of each storage tier, companies can 'buy back' their most expensive storage for their most critical data.
This is both an extension of the original business case made by storage resource management (SRM) and Hierarchical Storage Management (HSM) software vendors for their products, and of the product pitch of a growing number of vendors of low-cost Serial ATA disk arrays, sometimes called 'ghetto RAIDs.' Those additional tiers of storage can be low-cost arrays capable of hosting data that rarely changes but must be kept online.
ILM can automate the movement of the appropriate data to these low-cost platforms, enabling companies to optimize their expensive platforms, defer expensive capacity upgrades, and keep big iron in play for a longer period of time.
Another business case made by the industry in support of ILM is regulatory compliance. The thinnest arguments are based on Sarbanes-Oxley (technically, the Public Company Accounting Reform and Investor Protection Act of 2002), which has less to do with data storage than personnel and processes. Stronger arguments are based on SEC rules in the broker/trader world and HIPAA requirements in the healthcare industry.
While narrowly focused on specific industries, these regulations require data to be stored for a fairly lengthy period of time--longer, in fact, than the service life of most magnetic media. They also impose certain accessibility and privacy requirements on data handling. Theoretically, ILM is needed to help to move data around to fresh media from time to time and ensure that it is provisioned with appropriate protection and access capabilities.
The downside is that most ILM schemes lack some or most of the functions of classic mainframe ILM. Some solutions are a simple recasting of SRM capacity management, or HSM functionality, or even traditional backup and archive software under a new moniker. Others are content-management systems, pure and simple. In the StoragePipeline poll, the preponderance of users seemed to agree that ILM is not a solution purchased from a single vendor, but a collection of building blocks--software and hardware--that may need to be purchased from numerous vendors.
At a low level, storage equipment must be able to deliver the services (RAID levels, performance, security) required by data. It must also avail itself of an effective management scheme in order to provide a solid foundation on which ILM can be based. SRM tools, management frameworks, and even CIM are posited by most ILM solution providers as a prerequisite for lifecycle management.
Moving data effectively across platform tiers is the job of special switches, according to Troika Networks and Rainfinity. Troika says that its 'application switches track statistics on which data is being accessed most often. These statistics are tracked by the Troika Application Switch hardware and firmware, without affecting application performance or data access speeds. This information is provided through programming interfaces to third-party software, which can use this data to determine their own migration policies.' This may indeed be an enabler of access-frequency information required by true ILM.
What Troika does for Fibre Channel fabrics, Rainfinity endeavors to do for IP networks--specifically for all the files that reside on IP-attached NAS and file servers. The company has recently announced a new product, GridSwitch, to address an increasingly common problem in network-accessed storage: migration.
To move files from NFS-mounted storage on one server (NAS or direct attached) to NFS-mounted storage on another server as part of a D/ILM scheme, you would need to know which clients mount the file system on the server in question, quiesce those clients for the amount of time required to move all the files, then manually re-point all of the servers toward the new repository. Rainfinity says that this is a new task that has been visited upon storage managers and server administrators with the penetration of NAS and the advent of ILM.
Enter GridSwitch, an appliance that is plugged into a set of ports on your existing switch and brought online when you need to do a cross-platform migration of your data. A virtual LAN is established between the network-attached storage platforms and persists until you shut it off. Files are moved across this VLAN, while the appliance handles any requests from clients for file access, redirecting them to the new platform if that's where the files are.
Over time, most clients will remount the storage at its new location. Once this has happened, the GridSwitch can be taken out of band. The technology makes the entire process of moving data about in your network transparent to the user and simpler for the administrator.
NuView and other advocates of global namespace technology, including StorageTek and IBM, take a different approach to simplifying behind-the-scenes storage changes--that of presenting a consistent set of familiar directory names to end users. This masks the complexity of the actual storage environments from view, while facilitating back-end storage reshuffling to support ILM-style data placement.
Which brings us to the continuing bifurcation of data into block and file types: Most ILM strategies must account for both data types (plus a third: e-mail), if they want to capture all of a company's data in a lifecycle scheme. OuterBay Technologies, with its encapsulated archive technology, moves less-frequently accessed structured data in databases into archival repositories or second-tier 'near on-line' storage to manage database expansion, creating what it terms 'application data lifecycle management.'
The company recently hit the charts with a much-ballyhooed partnership with EMC. Its competitor, Princeton Softech, the market-share leader in this space if you believe the analysts, partners with Network Appliance.
ILM is, in the final analysis, a new and an old way of thinking about storage. It is new in the sense that it forces IT professionals to question preexisting concepts. Storage is not a "repository" where data goes to sleep; it must be construed as a true network across which data is constantly in motion. We have to start dealing with that fact, or storage costs will drive companies into bankruptcy.
Copyright © 2004 CMP Media LLC. | STORAGE PIPELINE All rights reserved.
Copyright © 2008 Art Beckman. All rights reserved.
Last Modified: March 9, 2008