The problems backing up big databases
Jon William Toigo
10 Feb 2004
Rating: 3.75 (out of 5)

According to the University of California at Berkeley, the fastest growing subset of business data is not files, but block data contained in relational database management systems. Anyone who has ever worked with backup/restore knows the hassles of backing up databases to and restoring them from tape -- especially big databases. So, Berkeley's insight is not exactly cause for celebration. But, there are still some issues with database backup that need exploring.
Here are some of the larger issues: Do bigger (say, multi-terabyte) databases spell death for tape, which chugs along at only 2 TB per hour under ideal laboratory conditions? Do such grand data constructs force companies into a disk-to-disk or mirroring strategy, and perhaps into a SAN topology, as some vendors suggest? And does a big database shatter the concept of "backup windows" once and for all? After all, you need to quiesce a database before you copy its data to tape or disk, and copying all of the data in a huge database demands a fairly lengthy quiescence period, perhaps a lengthier one than your business can tolerate.
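The backup-window question above is easy to put in numbers. A back-of-the-envelope sketch, using the article's 2 TB/hour ideal-lab tape rate; the database sizes are purely illustrative:

```python
# The article's ideal-lab tape throughput; real-world rates run lower.
TAPE_RATE_TB_PER_HOUR = 2.0

def quiesce_hours(db_size_tb, rate=TAPE_RATE_TB_PER_HOUR):
    """Hours the database must stay quiesced for a full copy to tape."""
    return db_size_tb / rate

for size_tb in (2, 10, 50):
    print(f"{size_tb} TB database -> {quiesce_hours(size_tb):.1f} hour window")
```

Even at the laboratory rate, a 10 TB database implies a five-hour quiescence period, which is the heart of the problem.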
These are all good questions that are finally getting some attention as storage vendors jockey for position in the burgeoning "Information Lifecycle Management" space. In December, and then again in late January, EMC Corporation made some much-covered moves to ally first with Campbell, CA-based OuterBay Technologies and then with Oracle itself, to obtain tools and skills for paring down the contents of huge databases -- ostensibly, to migrate older, non-changing data in the DB to second-tier disk platforms.


These new friendships make sense, of course, within the context of EMC's "reference data" philosophy. According to EMC, the world is full of often-accessed but rarely modified data that needs to stay online for reference purposes, and it is not cost-effective to host such data on your most expensive, highest-performance gear. That seems like a sound observation.
EMC is seeking to apply this philosophy to big databases and to develop an enabling strategy that disaster recovery and business continuity planners have been seeking for years. The strategy is simple: confronted by a really big database, might it not be possible to "pre-stage" the lion's share of the DB (the non-changing part) at the recovery center? There, in the event of an interruption, the "pre-staged" data could be loaded from tape to disk in the time it takes the IT staff to travel to the emergency recovery center or hot site. With a viable data segregation and pre-staging methodology, recovery personnel could carry only backups of the changing data components of the DB to the hot site, then load them on top of the already-restored non-changing or reference DB components. In short order, you would be ready for processing.
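The segregation step at the heart of this strategy can be sketched in a few lines: split rows into a non-changing "reference" set (pre-staged at the hot site ahead of time) and a small "active" set (carried on tape at recovery time), keyed on how recently each row last changed. A minimal sketch, assuming each record carries a last-modified timestamp and using an illustrative 90-day cutoff; the field names and threshold are assumptions, not any vendor's method:

```python
from datetime import datetime, timedelta

# Rows untouched for longer than REFERENCE_AGE are treated as
# non-changing "reference" data; the 90-day cutoff is illustrative.
REFERENCE_AGE = timedelta(days=90)

def segregate(rows, now):
    """Split (key, payload, last_modified) rows into (reference, active)."""
    reference, active = [], []
    for key, payload, last_modified in rows:
        if now - last_modified > REFERENCE_AGE:
            reference.append((key, payload, last_modified))
        else:
            active.append((key, payload, last_modified))
    return reference, active

rows = [
    (1, "closed order", datetime(2003, 6, 1)),   # untouched for months
    (2, "open order", datetime(2004, 2, 9)),     # changed yesterday
]
reference, active = segregate(rows, now=datetime(2004, 2, 10))
# Row 1 lands in the reference set (pre-stage it at the hot site in
# advance); row 2 is the only data carried to the site at recovery time.
```

The hard part, as the rest of the article argues, is not this mechanical split but getting the schema designed so that a reliable last-modified signal exists at all.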
The scenario has appeal for the preponderance of firms that already have investments in tape technology and for whom the cost of mirroring is too great to justify. Plus, to the delight of StorageTek, Quantum, ADIC, Overland, Sony, Breece Hill, Spectra Logic, and many others, it has the additional value of keeping tape library vendors in profit.
The question is whether the enabling technology that EMC and others are exploring to carve "reference data" out of databases is feasible given the diversity and uniqueness of databases in play today. The answer is maybe.

Don't fight your DBA
Jon William Toigo
17 Feb 2004
Rating: 4.67 (out of 5)

In part one of this tip, Jon William Toigo discussed some issues associated with backing up large-scale databases, and offered insight into what one company planned to do about it through the use of reference data segregation and a pre-staging methodology. Part two gets to the root of the problems associated with large-scale backups.

The root of the problem

Database administrators and designers have had the ability for many years to construct their databases so that "reference data" could be neatly tucked away into well-defined subset constructs. Comparatively few have built this functionality into their DB architecture, however. Why? The explanation is the same as the explanation for why so many n-tier client-server applications lack common middleware standards, a design factor that inhibits their recoverability: No one asked them to.
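When the subset construct does exist, the recovery side of the strategy is straightforward: load the pre-staged reference rows first, then overlay the small active set carried to the hot site. A toy sketch of that layering, with illustrative row shapes (the names are mine, not from any DBMS):

```python
def rebuild(reference_rows, active_rows):
    """Layer active rows on top of reference rows, keyed by primary key."""
    table = {key: payload for key, payload in reference_rows}
    table.update(dict(active_rows))  # recent changes win on key collision
    return table

reference = [(1, "closed order"), (2, "archived invoice")]  # pre-staged
active = [(2, "reopened invoice"), (3, "new order")]        # carried to site
table = rebuild(reference, active)
# table == {1: "closed order", 2: "reopened invoice", 3: "new order"}
```

Note that the overlay must win on key collisions (row 2 above): a "reference" row that turns out to have changed since pre-staging has to be replaced by the fresher copy.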
Generally speaking, DBAs have a bad rap. They often take it on the chin from storage guys who view them as out-and-out resource hogs. Storage administrators frequently complain that the DBA doesn't understand storage resource management. He mismanages the resources he has and often requests much more capacity than he actually needs, compromising capacity allocation efficiency strategies. At the end of the day, most storage guys throw up their hands in disgust and just give the DBA whatever he wants, especially if his application is mission critical.

Disaster recovery planners have adopted an even more laissez-faire approach, simply accepting whatever instructions the DBA gives them regarding the capacity and platform requirements for database recovery. DBAs almost always want real-time mirroring or low-delta journaling systems to safeguard their assets. From their perspective, it is the simplest way to cover their data stores, regardless of whether it is also the most expensive and inflexible approach.

What has always been missing is a collaborative strategy that would give storage managers and DR planners chairs at the application and database development tables. Without their input at the earliest design phases and throughout the design review process, the management and recovery criteria for database and application design typically go unstated and are not provided for in the resulting product.

Of course, the idea of introducing personnel from storage and DRP into the database design process will likely raise the hairs on the necks of DBAs everywhere. Database and application designers have their own lingo and diagrammatic conventions, most of which seem alien to non-DBAs. Anyone who doesn't talk the talk can't communicate effectively with the DBA, let alone specify requirements in terms and language that the DBA will understand.

Some retraining might help to bridge the gaps. But, to really address the systemic problems, a complete retooling of IT professional disciplines is in order: combine the data protection skills and knowledge of the DRP guy with the storage administration skills and knowledge of the storage guy with the database design and administration skills and knowledge of a database guy and you will produce the "data management professional." But that would require chimeric gene splicing in the extreme and would probably violate the Harvard protocols on genetic engineering.

Bottom Line

In the absence of such sweeping systemic and procedural changes, solving the problems of large-scale database backup will require a conscientious effort to get the DBAs and data protection folk talking to one another so they can come up with recoverable designs. In the final analysis, this is probably a more fruitful approach than trying to find a silver bullet technology for ferreting out all the cells from all the columns and all the rows that seem to have the characteristics of reference data.

All Rights Reserved, Copyright 2000 - 2004, TechTarget
