A company’s most important asset – besides its people – is its data. Most of this data resides in some sort of database and most companies’ IT resources are spent on enhancing the database’s performance or capacity; allocating money for backup and recovery typically comes second, third or even fourth. Many database backup solutions are untested and insufficient for protecting the critical data – an especially crucial problem in the data-intensive world of computer graphics and electronic publishing. Accordingly, backing up a database can be the most daunting challenge for many IT departments.
Here are a few concepts that may be utilized to implement an effective database backup solution.
Backing up a database ranges from a very quick, simple task to a very time-consuming, complex endeavor. There are a couple of good ways to back up a database and one you should avoid at all costs.
The one to avoid – the absolute worst way to back up a database – is to leave it up and running and execute a normal file-level backup. Even if the backup software will back up database files that are open, the resulting backup will be almost completely useless. A database needs all of its files to be in a consistent, known state when backed up; during the course of a normal backup, the individual files may be backed up minutes or hours apart. The files end up so far out of sync with each other that it is nearly impossible to bring the database up from such a copy. This method is mentioned only so that it is never attempted.
Depending upon your individual requirements, the two viable ways to back up your database are what we call "cold" backup and "hot" backup.
• Cold Backup
By far, the easiest, most straightforward and cost-effective way to back up a database is via a cold backup, which is performed when the database is completely shut down. During this period, end users and applications will not be able to access the data in the database. As long as the database can have some downtime (enough to perform the backup), a cold backup is the best method for ensuring that the data is in a recoverable state.
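As a minimal sketch, the cold-backup sequence can be modeled in a few lines of Python. `StubDatabase` and its `stop`/`start` methods are hypothetical stand-ins for whatever shutdown and startup commands your database actually provides:

```python
import pathlib
import shutil
import tempfile

# Hypothetical database handle -- stop/start stand in for the real
# shutdown and startup commands of your database (assumption).
class StubDatabase:
    def __init__(self):
        self.running = True
    def stop(self):
        self.running = False
    def start(self):
        self.running = True

def cold_backup(db, datafiles, backup_dir):
    db.stop()                    # no users or applications during this window
    try:
        for f in datafiles:      # every file is copied in one consistent state
            shutil.copy2(f, pathlib.Path(backup_dir) / pathlib.Path(f).name)
    finally:
        db.start()               # restart the database even if a copy fails

# Usage: back up one datafile into a scratch directory.
src_dir = pathlib.Path(tempfile.mkdtemp())
dst_dir = pathlib.Path(tempfile.mkdtemp())
datafile = src_dir / "users.dbf"
datafile.write_text("consistent data")
db = StubDatabase()
cold_backup(db, [datafile], dst_dir)
```

The `try`/`finally` guard reflects the operational priority: however the copy goes, the database must come back up afterward.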
• Hot Backup
For databases that cannot afford any downtime, the answer is a hot backup, performed while the database is still running; users and applications continue to access the database while the data is being backed up. To accomplish this type of backup, the database must, of course, support a hot-backup mode. The process consists of placing the database in hot-backup mode, initiating the backup and then taking the database out of hot-backup mode when the backup is complete. While in hot-backup mode, applications and users may experience some level of performance degradation; for many databases, this performance hit is acceptable as long as the database stays up.
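The enter-mode/copy/exit-mode sequence lends itself to a guard that always restores normal operation. The sketch below is illustrative Python, not a real driver API; the `begin_backup`/`end_backup` methods stand in for whatever hot-backup commands your particular database provides:

```python
from contextlib import contextmanager

# Hypothetical database handle; begin_backup/end_backup are stand-ins
# for the database's own hot-backup commands (assumption).
class Database:
    def __init__(self):
        self.mode = "normal"
    def begin_backup(self):
        self.mode = "hot-backup"   # datafile writes become backup-safe
    def end_backup(self):
        self.mode = "normal"

@contextmanager
def hot_backup(db):
    db.begin_backup()
    try:
        yield db                   # copy the datafiles inside this block
    finally:
        db.end_backup()            # leave hot-backup mode even on failure

db = Database()
with hot_backup(db):
    in_backup_mode = db.mode       # users keep reading and writing here
```

Wrapping the mode switch in a context manager guarantees the database leaves hot-backup mode even if the copy step fails partway through.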
• Proprietary Backup
The leading backup software vendors provide "agents" that interface with the major databases. These agents provide the ability to perform tasks such as incremental backups and tablespace backups. The agents simplify the backup process and give the database administrator more flexibility when performing a recovery.
The ultimate goal of a backup is to create a usable copy of the database. This copy is normally used for archival purposes (e.g., storing the backup tapes offsite) and data recovery (e.g., by keeping a copy of the archived tape onsite for quick access or storing it in a tape library for even faster access). There are, essentially, three ways to move data from the database to a tape device.
The fastest way to move data from the database server to a tape library is to attach the library directly to the database server. The major challenge when performing a backup is to keep the drives moving. Attaching the library to the database server allows the drives to perform near or at their rated speeds. Unfortunately, dedicating a tape library to a database server is not a very efficient way to deploy a backup solution.
Attaching a tape library to a storage area network (SAN) enables all of the servers attached to the SAN to utilize the tape library as if it were their own direct-attached library. Because a SAN adds virtually no overhead, performance for each server attached to the SAN will be very similar to the direct-attached model (every server on the SAN actually thinks that the library is directly attached to it). Multiple servers can access the library at the same time, allowing for an efficient use of the library.
To share the library with the maximum number of servers, the tape library can be attached to a dedicated backup server. The performance of an individual backup over the local area network (LAN) to the backup server will be slower than backups to a direct-attached or SAN-attached library. The benefits are that more servers can back up to the library at the same time and every server on the LAN can utilize the library. The optimal way to deploy a library is to attach it to a SAN. The database servers should be attached to the SAN to achieve direct-attached performance. A dedicated backup server can also be attached to the SAN so that the rest of the data can be backed up over the LAN, through the backup server and to the SAN-attached tape library.
Many databases are so critical that they cannot be shut down (cold backup) or operate in a degraded mode (hot or proprietary backup) for the amount of time it takes to perform a backup. For these databases, the goal is to implement a "zero-impact" backup solution, where some sort of "snapshot" (a point-in-time copy) of the database is taken, then moved, typically over a SAN, to another server that backs it up. The process is to place the database into hot-backup mode, create the snapshot and take the database out of hot-backup mode. At this point, the database server is running at optimal performance and is no longer involved in the backup process. The snapshot can now be moved to another server on the SAN (typically a dedicated backup server) and backed up to a tape library either immediately or later.
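The zero-impact sequence can be sketched as follows. This is a miniature Python simulation: on real hardware the snapshot is a near-instant disk-array operation and the final step runs on a separate backup server over the SAN, but here both are plain file copies:

```python
import pathlib
import shutil
import tempfile

def zero_impact_backup(datafile, snapshot_dir, tape_dir):
    # 1. Place the database in hot-backup mode (omitted in this sketch).
    # 2. Create the snapshot -- near-instantaneous on real hardware.
    snap = pathlib.Path(snapshot_dir) / datafile.name
    shutil.copy2(datafile, snap)
    # 3. Take the database out of hot-backup mode; from here on, the
    #    production server plays no part in the backup.
    # 4. A dedicated backup server writes the snapshot to the tape library,
    #    either immediately or later.
    tape = pathlib.Path(tape_dir) / (datafile.name + ".bak")
    shutil.copy2(snap, tape)
    return tape

# Usage with scratch directories standing in for snapshot storage and tape.
work = pathlib.Path(tempfile.mkdtemp())
snap_dir = work / "snap"
snap_dir.mkdir()
tape_dir = work / "tape"
tape_dir.mkdir()
datafile = work / "orders.dbf"
datafile.write_text("point-in-time state")
tape_copy = zero_impact_backup(datafile, snap_dir, tape_dir)
```

The key property is in steps 3 and 4: once the snapshot exists, everything downstream reads the snapshot, not the production database.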
There are a variety of ways to create a snapshot. We’ll take a look at the three most common here.
• Split-Mirror Snapshots
A split-mirror snapshot creates an instantaneous second copy of data that can be used later to recover data. This is accomplished by mirroring disk drives and then breaking the mirror, traditionally within the disk subsystem. The major drawback of this type of snapshot is that there must be enough storage capacity to keep this "third copy" of the data. The benefit is that the data being backed up is on completely separate drives from the production data.
• Snap-Copy Snapshots
A snap-copy snapshot, like a split-mirror snapshot, is a complete copy of the data. The major difference is timing: a split mirror creates an instantaneous copy, so the data is available immediately, while a snap-copy copies the data to another area of storage, which may take from a few minutes to hours. This is a potential performance problem for a database server: while the copy is taking place, the disk-copy process may take I/O cycles away from the database. As with a split mirror, each snap-copy requires enough storage capacity to hold an exact copy of the original data, and the data being backed up is on completely separate drives from the production data.
• Pointer-Based Snapshots
A pointer-based snapshot is not an exact copy of the data but a set of pointers to the original data. When a block of original data is written for the first time after the snapshot is taken, its original contents are first copied to an area of disk reserved for the snapshot, and the snapshot's pointer for that block is redirected to the preserved copy; this process is called "copy on first write." Subsequent writes to the same block are not copied to the reserved area because the original contents have already been preserved. Because only the changed blocks are copied, the reserved area requires just a fraction of the original disk space. The obvious advantages of a pointer-based snapshot are that it takes very little storage capacity and is instantaneous. The potential drawback is that the backup, once initiated, reads the original production disks, which may cause a performance issue for the database.
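The copy-on-first-write mechanics can be illustrated with a toy Python model, where a "volume" is simply a list of blocks. Real implementations live in the volume manager or disk array; this sketch only shows the pointer logic:

```python
# Toy copy-on-first-write snapshot over a volume modeled as a list of blocks.
class CowSnapshot:
    def __init__(self, volume):
        self.volume = volume       # live production blocks
        self.preserved = {}        # block index -> original contents

    def write(self, index, data):
        # First write to a block since the snapshot: preserve the original.
        if index not in self.preserved:
            self.preserved[index] = self.volume[index]
        self.volume[index] = data  # later writes to this block copy nothing

    def read_snapshot(self, index):
        # Pointer lookup: preserved copy if the block has changed,
        # otherwise read straight from the original volume.
        return self.preserved.get(index, self.volume[index])

vol = ["a", "b", "c"]
snap = CowSnapshot(vol)
snap.write(1, "B1")                # first write: block "b" is preserved
snap.write(1, "B2")                # second write: no additional copy
```

After the two writes, `snap.read_snapshot(1)` still returns the original `"b"`, production sees `"B2"`, and the reserved area holds only the one changed block, which is why the snapshot consumes a fraction of the original capacity.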
Many database administrators choose to run a standby database server in their environment. A standby database server is another server that runs a copy of the production database. Periodically, the appropriate log files are copied from the production database server to the standby server and applied to the standby database. Once the logs have been applied, the standby database is an exact copy of the production database as of the point the files were copied. The standby database is typically used for disaster recovery purposes; savvy database administrators will also use it to perform backups. After a set of logs has been applied, the standby database can be backed up to tape (either hot or cold). This type of backup has absolutely no impact on the production database server, and it is as reliable as the other backup methods discussed above.
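The log-shipping cycle can be reduced to a toy Python model, with a "log" simplified to a list of key/value changes; a real standby replays the database's own transaction log files, but the shape of the cycle is the same:

```python
# Toy model of log shipping: replaying the production server's log on the
# standby brings it to the same state, and the standby -- not production --
# is what gets backed up.
def apply_logs(database, logs):
    for key, value in logs:
        database[key] = value
    return database

production = {"acct1": 100}
standby = dict(production)             # standby starts as an exact copy

log = [("acct1", 250), ("acct2", 75)]  # activity on the production server
apply_logs(production, log)            # changes happen on production...
apply_logs(standby, log)               # ...and are replayed on the standby

backup = dict(standby)                 # back up the standby: zero impact
```

Because the backup reads only the standby's copy, the production server spends no cycles on it, which is the property the paragraph above describes.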
The choices available to back up a database are much more attractive today than they were even a few years ago. Relatively small companies can now back up their databases with the same confidence and competence that, in the past, could only be realized in mainframe environments. One major point that should be made is that no matter how good a backup is, it is only as good as the recovery it can provide. Database backup and recovery processes should be well defined, well documented and well tested.