Tar and Taper for Linux
This article describes backing up files on a Linux system. Two programs are described: tar and taper. The first program is available from the Free Software Foundation under the GNU license and is included with most distributions of Linux. The second program is written by the author of this article and provides a more user-friendly interface. It is also available under the GNU license and thus is freely available. Note that this article is not meant to be a full reference for either package, but merely an introduction to get you started. For full details, see the documentation that comes with each package.
Nearly every form and clone of Unix (as well as other operating systems) comes with some version of tar. It is a standard program, and archives made on one machine should generally be readable on other machines. The real problem with tar is that there is virtually no user interface at all: all operations must be done via command-line switches.
tar can make backups to a hard disk file or to a tape drive as well as over an NFS link (which we won't cover here). The files to be backed up can be compressed using GNU gzip (or compress).
To make a backup, the basic form is:
$ tar [options] files_to_backup_or_restore
The most commonly used options are:
- c
Creates a new archive.
- z
Compresses the archive using GNU gzip.
- Z
Compresses the archive using compress.
- f name
Use name as the archive file or device. The default is documented as /dev/rmt0, although some people have changed this so that the default is /dev/nst0, /dev/tape, or even standard input. It is usually safer to explicitly give the device name of your tape drive all the time.
- r
Append files to existing archive. Note that if you use ftape, this option will not work because of a limitation in the current ftape driver.
- u
Append files to existing archive but only if they are newer than the files already in the archive. Once again, if you use ftape, this option will not work.
Thus, to create a compressed backup of your /etc directory in a file called etc_backup.tar, you would do:
$ tar czf etc_backup.tar /etc
Note that all subdirectories under /etc will be backed up as well.
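You can try these commands on a scratch tree before pointing them at /etc. The following is a minimal sketch that assumes GNU tar and a POSIX shell. The directory and file names (demo_etc, demo_backup.tar.gz) are made up for the demonstration. The sketch creates a compressed archive and then lists it with the t option, which is described below, to confirm that the subdirectory was included.

```shell
# Sketch: create a gzip-compressed archive of a scratch tree, then list it.
# demo_etc and demo_backup.tar.gz are made-up names for this demonstration.
set -e
work=$(mktemp -d)
cd "$work"

mkdir -p demo_etc/sub
echo "127.0.0.1 localhost" > demo_etc/hosts
echo "nameserver 10.0.0.1" > demo_etc/sub/resolv.conf

# c = create, z = compress with gzip, f = name of the archive file
tar czf demo_backup.tar.gz demo_etc

# t = list the contents; demo_etc/sub was picked up automatically
tar tzf demo_backup.tar.gz
```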
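If you now want to add the contents of /usr/local/etc, note that tar cannot append to a compressed archive (GNU tar refuses to use r or u on one). An archive you intend to add to must therefore be created without the z option:
$ tar cf etc_backup.tar /etc
$ tar rf etc_backup.tar /usr/local/etc
Suppose that you have now made some changes to some of the files. You can do:
$ tar uf etc_backup.tar /etc /usr/local/etc
and tar will go through the archive and append only those files that are newer than the copies already stored in it. The older copies remain in the archive, and on extraction the later copy overwrites the earlier one.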
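You can exercise the append and update options in the same way. This is a minimal sketch that assumes GNU tar. GNU tar will not append to or update a gzip-compressed archive, so the archive here is created without z. The names conf, extra and backup.tar are hypothetical. The touch -t command gives the files a fixed, whole-second timestamp in the past, so that only the file edited afterwards counts as newer.

```shell
# Sketch: append (r) and update (u) on an uncompressed archive.
# conf, extra and backup.tar are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"

mkdir conf extra
echo "v1" > conf/app.conf
echo "x" > extra/other.conf
# Give both files a fixed, whole-second timestamp in the past.
touch -t 200001010000 conf/app.conf extra/other.conf

tar cf backup.tar conf          # c: create the archive
tar rf backup.tar extra         # r: append another directory to it

echo "v2" > conf/app.conf       # edit one file; its mtime is now the current time
tar uf backup.tar conf extra    # u: append files newer than their archived copies

# conf/app.conf is now stored more than once
tar tf backup.tar
```

Because u appends rather than replaces, the archive grows with each update. When the archive is extracted, the later copy of conf/app.conf overwrites the earlier one.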
The above examples apply to backing up to a file on the hard disk. Backing up to a tape drive simply involves giving the filename of the tape device, usually /dev/ftape for floppy tape drives and /dev/st0 for SCSI tape drives.
The two options relevant to restoring files are:
- x
Extracts files from the archive. If no filenames are specified, all the files in the archive are extracted.
- t
Lists the table of contents: prints the names of the files that would be extracted, but does not actually extract them.
Thus, to restore the contents of the backup in the above example, you would do:
$ tar xzf etc_backup.tar
Note that tar does not put the files back where they came from, but rather creates a new tree based on the current directory. For example, if you were in the /usr/home/john directory when you issued the above command, you will find that a new subdirectory /usr/home/john/etc has been created and all the files are in that subdirectory. If you wish to restore the files to where they came from, change to the root directory first and give tar the full path of the archive (here assuming you created it in /usr/home/john):
$ cd /
$ tar xzf /usr/home/john/etc_backup.tar
Note that doing this is very dangerous, since old files are overwritten without warning. This can have dire consequences if not used properly. It is often much safer to restore in your home directory or /tmp and then copy the files to their correct location after you have checked that nothing horrible will happen.
To restore an individual file or directory, simply specify the name after all the tar arguments. For example, to restore just the hosts and passwd files:
$ tar xzf etc_backup.tar etc/hosts etc/passwd
Note that the full pathname (excluding the leading /, which tar explicitly does not store) needs to be specified.
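Here is a minimal sketch of the safe routine suggested above. It restores a single member into a scratch directory, compares it with the live copy, and only then leaves you to copy it into place. It assumes GNU tar. The names etc, etc_backup.tar.gz and restore are throwaway stand-ins for the real /etc and its archive. As noted above, the member is named etc/hosts, without the leading /.

```shell
# Sketch: restore one member away from the live files, inspect, then copy.
# etc, etc_backup.tar.gz and restore are throwaway stand-ins.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-ins for /etc and its backup
mkdir etc
echo "127.0.0.1 localhost" > etc/hosts
echo "root:x:0:0:root:/root:/bin/sh" > etc/passwd
tar czf etc_backup.tar.gz etc

# Restore just one member into a scratch directory
mkdir restore
cd restore
tar xzf ../etc_backup.tar.gz etc/hosts   # member name has no leading "/"

# Inspect before copying anything into place
if cmp -s etc/hosts ../etc/hosts; then
  echo "restored etc/hosts matches the live copy"
fi
# When satisfied, copy it over by hand, e.g.: cp etc/hosts ../etc/hosts
```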
Most people who write software for Linux and make it available to others distribute it via tar files. For example, say that you have downloaded a new game, best_game-1.3.tar.gz. To install (restore) this game in your home directory:
$ cd
$ tar xzf best_game-1.3.tar.gz
The game and all its applicable files will then be restored. If the author of the program has followed normal convention, all the files will be in a directory called best_game-1.3. Note that when restoring files from an unknown source, it is a very good idea to restore the files in your home directory, examine the files, and then when sure everything is correct, move them to the location suggested by the author. This way, you will avoid inadvertent file overwrites. It is also best to first use the t option to see whether the author has put the files in a subdirectory. If not, make a subdirectory and use it:
$ cd
$ mkdir worst_game-0.1
$ cd worst_game-0.1
$ tar xzf ../worst_game-0.1.tar.gz
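You can wrap the check-then-extract routine just described in a small shell function. This is a sketch only. unpack_safely is a hypothetical helper, not part of tar. It assumes a gzip-compressed archive whose name ends in .tar.gz, and it treats the archive as well behaved if every member lies under a single top-level directory.

```shell
# unpack_safely: hypothetical helper, not part of tar.
# If every member of ARCHIVE sits under one top-level directory, extract it
# as-is; otherwise make a directory named after the archive and extract there.
unpack_safely() {
  archive=$1
  case $archive in
    /*) abs=$archive ;;
    *)  abs=$PWD/$archive ;;
  esac
  name=$(basename "$archive" .tar.gz)
  # First path component of every member, duplicates removed
  tops=$(tar tzf "$abs" | sed 's|^\./||; s|/.*||' | sort -u)
  ntops=$(printf '%s\n' "$tops" | wc -l)
  if [ "$ntops" -eq 1 ] && tar tzf "$abs" | grep -q /; then
    # Author followed the convention: everything is in one directory
    tar xzf "$abs"
  else
    # Loose files: give them a directory of their own
    mkdir -p "$name"
    (cd "$name" && tar xzf "$abs")
  fi
}

# Demonstration with two made-up archives in a scratch directory
work=$(mktemp -d)
cd "$work"
mkdir best_game-1.3 && echo hi > best_game-1.3/README
tar czf best_game-1.3.tar.gz best_game-1.3 && rm -r best_game-1.3
echo a > a.txt && echo b > b.txt
tar czf worst_game-0.1.tar.gz a.txt b.txt && rm a.txt b.txt

unpack_safely best_game-1.3.tar.gz     # unpacks into best_game-1.3/
unpack_safely worst_game-0.1.tar.gz    # creates worst_game-0.1/ first
```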
Some of the other options that tar supports are:
- M
Tells tar to use multi-volume archives. If tar comes to the end of the floppy or tape, it will not abort with an error, but prompt for insertion of a new floppy or tape, which it calls a “volume”. Each volume contains a stand-alone archive file and you don't need all the volumes to extract files, but if a file is split across two volumes, you will need to extract that file with the -xM option. Note that some tape devices, such as DATs, do not work with this option.
- N DATE
Tells tar to operate only on files that are newer than DATE. Thus, you can tell tar to back up only files that are newer than a certain date. DATE is specified in the same format produced by the date command. Normally, directly before doing one backup, you use the date command to record the date and time when the backup was made, like this:
$ date > last_backup
Then the next time you are backing up, you can back up only files that have been changed during or since the last backup by including the option -N "`cat last_backup`" in tar's command line.
- T FILENAME
Tells tar that a list of files to back up or restore is in FILENAME. For example, tar czf /dev/ftape -T LIST_FILES would create an archive containing the files named in LIST_FILES. LIST_FILES is simply a plain text file with one filename on each line.
- v
Verbose mode: tar lists each file as it processes it.
- h
When tar comes across a link, it normally stores details about that link. If this option is given, tar will actually store the file pointed to by the link and pretend the link doesn't exist. Use this with caution since you can end up with many different copies of the same file.
- W
Causes tar to verify the archive after it has written it. It will not work on tape drives that cannot rewind, and GNU tar also refuses to verify a compressed archive.
- P
Normally tar strips the leading / from a pathname so that when you restore, the file is restored in a directory relative to the current one (see the above example with /usr/home/john). By specifying this option, the file is restored from where it was backed up. Use this with caution since you can inadvertently overwrite a file on your hard disk.
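Here is a sketch of the N routine described in the list, together with T. It assumes GNU tar. The names data, last_backup, LIST_FILES and the archive files are hypothetical. Two details differ from the plain date > last_backup shown above, and both are assumptions made for robustness. First, the date is written in the C locale and in UTC, so that tar can parse it back reliably. Second, short sleeps keep the file timestamps on either side of the recorded second.

```shell
# Sketch: incremental backup with -N and a date stamp, then -T with a list.
# data, last_backup, LIST_FILES and the archive names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir data
echo old > data/old.txt
touch -t 200001010000 data/old.txt   # pretend this file is long unchanged
# GNU tar's -N also checks the status-change time, which touch just
# updated, so wait until that is strictly before the recorded stamp.
sleep 1

# Directly before the full backup, record the date, then take the backup.
LC_ALL=C date -u > last_backup
tar czf full.tar.gz data

sleep 1
echo new > data/new.txt              # changed after the full backup

# Next time: store only files changed since the recorded date.
tar czf incr.tar.gz -N "$(cat last_backup)" data

# -T: archive exactly the files named, one per line, in LIST_FILES.
printf '%s\n' data/old.txt data/new.txt > LIST_FILES
tar czf listed.tar.gz -T LIST_FILES
```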
More details can be found in the tar man page as well as in the info files that come with the tar sources.
Although most backups are done on tape drives, it is sometimes useful to use your floppy drive (which can be very slow and frustrating!). To use your floppy, give tar the filename /dev/fd0 for the first floppy (drive A: in DOS parlance) and /dev/fd1 for the second floppy (drive B:). So, to make a backup of your /etc directory to “drive A:”
$ tar czf /dev/fd0 /etc
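Note that the existing contents of the floppy will be totally overwritten. When using floppy drives, the -M option is very handy, since after one floppy is full, tar will automatically prompt for the next floppy. GNU tar will not create a compressed multi-volume archive, though, so leave out the z option when you use -M.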
If you use ftape, you cannot append files to an existing tar backup. Therefore, if you make a backup that occupies only 10 MB, you will have wasted the rest of the tape. Fortunately, you can use the rest of the tape with the mt program. After tar has written a backup to the tape, it writes two EOF marks on the tape. Schematically, your tape now looks like:
[ backup 1 ][ EOF ][ EOF ]
You can tell mt to advance to these EOF marks by
$ mt -f /dev/nftape fsf 1
Note that you must use the non-rewinding device (/dev/nftape). With the rewinding device, the tape is rewound automatically as soon as mt has repositioned it, and your positioning is lost. The fsf 1 is an mt command that advances the tape past the first EOF mark.
Your tape will now be positioned at the end of the first backup, and you can create another backup at this position:
$ tar czf /dev/ftape files_for_backup_2
Because you used the rewinding device, the tape will rewind after the tar backup has been written. Your tape will look like this:
[ backup 1 ][ EOF ][ backup 2 ][ EOF ][ EOF ]
Should you wish to add a third backup, you can do the following:
$ mt -f /dev/nftape fsf 2
$ tar czf /dev/ftape files_for_backup_3
The fsf 2 advances past two EOF marks, and hence past the first two backups. Your tape will look like:
[ backup 1 ][ EOF ][ backup 2 ][ EOF ][ backup 3 ][ EOF ][ EOF ]
and will be positioned at the beginning of the tape, because you used the rewinding device to do the backup.
If you wish to restore files from the first backup, you can do the following:
$ mt -f /dev/ftape rewind
$ tar xzf /dev/ftape files_from_backup_1
This first line ensures that the tape has been rewound. The second line then restores files from the first backup. If you wish to get files from the second backup, do the following:
$ mt -f /dev/ftape rewind
$ mt -f /dev/nftape fsf 1
$ tar xzf /dev/ftape file_from_backup_2
Similarly, to restore files from the third backup:
$ mt -f /dev/ftape rewind
$ mt -f /dev/nftape fsf 2
$ tar xzf /dev/ftape files_from_backup_3
You can continue adding backups until your tape is full. Note that strictly speaking, you do not need to rewind the tape each time—if you are sure that the tape is rewound, you can skip the rewind command; however, I would recommend that you always rewind prior to use to avoid problems. If you are already at the beginning of the tape, the rewind command will not take any time at all. You are likely to get into all sorts of bother if you assume that the tape is rewound when it is not. You can find out where your tape actually is by:
$ mt -f /dev/nftape status
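The rewind, fsf and tar sequences above follow a fixed pattern, so you can wrap them in shell functions. This is a sketch under stated assumptions. tape_restore, tape_append, run and the DRY_RUN, REWIND_DEV and NOREWIND_DEV variables are hypothetical names. The device defaults are the ftape names used in this article. The functions always rewind first, as recommended above. With DRY_RUN set, they only print the commands. This is a safe way to check the arithmetic (backup N needs fsf N-1) before touching a tape.

```shell
# Hypothetical helpers around mt and tar for several backups on one tape.
# Override REWIND_DEV / NOREWIND_DEV for drives other than ftape.
REWIND_DEV=${REWIND_DEV:-/dev/ftape}
NOREWIND_DEV=${NOREWIND_DEV:-/dev/nftape}

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# tape_restore N [files...]: restore from backup number N (1 = first).
tape_restore() {
  n=$1; shift
  run mt -f "$REWIND_DEV" rewind
  # Skip the N-1 backups in front, on the non-rewinding device
  if [ "$n" -gt 1 ]; then
    run mt -f "$NOREWIND_DEV" fsf $((n - 1))
  fi
  run tar xzf "$REWIND_DEV" "$@"
}

# tape_append N files...: write a new backup after N existing ones.
tape_append() {
  n=$1; shift
  run mt -f "$REWIND_DEV" rewind
  if [ "$n" -gt 0 ]; then
    run mt -f "$NOREWIND_DEV" fsf "$n"
  fi
  run tar czf "$REWIND_DEV" "$@"
}

# Dry run: show what restoring from the third backup would do.
(DRY_RUN=1; tape_restore 3 files_from_backup_3)
```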
As you can imagine, maintaining backups this way is awkward and very time-consuming since if you do not know what is on the third backup, you have to rewind your tape, advance to the third backup and then read what is there.
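Another problem with tar is that, in compression mode, tar compresses the whole archive as a single stream and writes that to the tape, rather than compressing each file separately. This leads to a very serious problem. If your tape somehow becomes corrupted, you lose all the files after the point where the corruption occurred, because the rest of the compressed stream can no longer be decoded.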
Because of the above, and because of tar's command-line user-interface, the author was tempted to fill the gap with a user-friendly backup program—hence, taper was born...
Although tar is a very powerful and flexible program, it has no friendly user interface. taper provides many of the same features as tar, but provides a nice user-friendly interface.
See below for locations of the taper sources. At the time this article was written, the current version was 5.4. taper requires a recent version (1.9.6 or greater) of ncurses that supports “forms”, which can be retrieved from any GNU mirror site or from the primary GNU ftp site, prep.ai.mit.edu. It is easy to configure, build, and install for Linux; installation instructions are included in the INSTALL file. As of this writing, the current version of ncurses is 1.9.7a and can be found in the file ncurses-1.9.7a.tar.gz.
To build and install the latest ncurses, I did the following as root:
$ cd /usr/local/sr
$ tar xzf ncurses-1.9.7a.tar.gz
$ cd ncurses-1.9.7a
$ ./configure --with-normal --with-shared --with-debug --disable-termcap
$ make
$ make install
The steps you need to take may be slightly different; the INSTALL file gives plenty of details. The --with-shared option is especially important if you have ELF libraries and don't remove previous versions—otherwise, compiling taper may not work.
To make a binary of taper, issue the following commands:
$ tar xzf taper-xx.xx.tar.gz
where xx.xx is the version that you have obtained. Next, you may need to edit the Makefile. For example, if you are using a SCSI drive, remove the # that is in front of the line -DHAVE_SCSI; if you are using the new zftape driver, remove the # that is in front of the line -DUSING_ZFTAPE. See the Makefile for more details. Then type:
$ make clean
$ make all
$ make install
By default, the programs will have been installed in /sbin and the manual page in /usr/man/cat1. You can change these if you wish by appropriately editing the Makefile. If you do not have write permission to these directories, the files will remain in the current directory and will not be copied across.
Some terminology first:
- archive
This describes all the files on a tape (or hard disk file). You can restore files from an archive or add files to an archive.
- volume
Each archive is divided into one or more volumes. Each time you back up a set of files to an archive, a new volume is created. For example, if in one session you back up /etc, then the archive contains one volume. If you then add /usr/local/etc to the archive, another volume is created, so that you still have one archive but two volumes. Note that if you back up /etc and /usr/local/etc in one backup session, only one volume is created.
- preferences
It is possible to customize various aspects of taper—everything from screen colours, to how taper behaves when it encounters soft links. Each customizable option is called a preference.
- file set
It is possible to store commonly selected groups of files in a file set. For example, you may periodically want to back up only /etc, /usr/home/user1 and /usr/local/etc. Rather than having to explicitly select those directories every time you wish to make a backup, you can select them once and then save them to a file set. Subsequently, when you are making backups, you need only load in the file set to automatically select those directories.
- incremental backup
taper supports two modes of backup, full backup and incremental backup. With full backup mode, all the files and directories you select are backed up. In incremental mode, taper is intelligent. If the file you have selected for backup already exists on an archive, taper will back it up only if the file has changed. If the file hasn't changed, it won't be backed up. This makes backing up large directories very quick and easy since only the changed files are backed up.
- most recent restore
It is possible that you will have many copies of the same file on an archive—from old versions to the most recent version. taper can automatically detect which is the most recent and restore only that; you need not manually determine and select it.
When you create an archive, taper stores all the information about files on that archive (such as filename, file size, backup time, etc.) into a file called the archive information file. This file is stored on the hard disk in a directory (default is ~/.taper_info). In future, when accessing this archive, taper uses the archive information file to quickly gain access to all details about the archive—this speeds up performance since the tape doesn't need to be accessed.
The downside of this is that you need to make sure that this information file does not get deleted or corrupted (don't despair if it does, though, since you can recreate it). Also, if you wish to restore files on a different machine, you have to make sure that you either reconstruct the info file on the new machine, or take a copy of the info file with you on a floppy (or via an ftp transfer).
Each archive created is allocated a unique archive ID and you use that in future for accessing the archive if you don't have the tape handy.
You need to tell taper the names of the backup devices (both rewinding and non-rewinding). You can do that by giving the command-line options -f (or --rewinding-device) for the rewinding device and -n (or --non-rewinding-device) for the non-rewinding device. Alternatively, you can start taper and then specify the names using the Global preferences option.
If you compiled with the -DHAVE_SCSI option, the default names are /dev/st0 (rewinding) and /dev/nst0 (non-rewinding). If you compiled with -DUSING_ZFTAPE, the default names are /dev/qftape (rewinding) and /dev/nqftape (non-rewinding). Otherwise, the default names are /dev/ftape (rewinding) and /dev/nftape (non-rewinding). It is also possible to set default names using environment variables: TAPE is the name of the rewinding device and NTAPE is the name of the non-rewinding device. Alternatively, you can use a preference file to set defaults (see below).
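For example, a SCSI user could put something like the following in a shell start-up file such as ~/.profile. The choice of file is an assumption; any place that sets the environment before taper starts will do. The device names are the SCSI defaults listed above.

```shell
# Default tape devices for taper, set through the environment.
TAPE=/dev/st0      # rewinding device
NTAPE=/dev/nst0    # non-rewinding device
export TAPE NTAPE

# The same choice made on the command line instead:
#   taper -f /dev/st0 -n /dev/nst0
```

Be aware that GNU tar also reads TAPE. When no f option is given, tar uses the value of TAPE as the archive name, so this setting changes the default for plain tar commands as well.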
Start taper, by typing:
$ taper
You will then be presented with the main taper window. There are three main modules—backup, restore and mkinfo, as well as preference management options. Select the backup option.
If an archive exists, you will be asked whether you wish to append files to it, or whether you wish to overwrite it. As with all dialog boxes, the space-bar toggles between the options, and ENTER will select the currently highlighted option.
If the archive doesn't exist, you will be prompted for the archive title.
Next you will be prompted for the volume title.
You will then be presented with a screen with three panels. The top left shows the current directory on the hard disk, the top right shows what's currently on the archive and the bottom panel is used to show which files have been selected for backup. At the top of the screen is the archive ID and archive title.
To move between panels, press the TAB key. To get help on keys, press H.
You now need to select which files and directories you wish to backup. Use the cursor keys to move around the directory. Pressing ENTER when the highlight is on a directory will move into that directory.
When you find a file or directory you wish to back up, press S. The file/directory will then be sized and moved to the bottom window—if you selected a directory, taper will check with you that you really want to back it up. Press ENTER to confirm. To disable the confirmation, change this in global preferences (Prompt Directories).
In the bottom window, the file/directory will be printed as well as its size. Also, to the left of this, there will be an I or F. This indicates that the file/directory will be backed up in incremental mode (I) or full backup mode (F). To toggle between F and I modes, press S when the highlight is on the selected file or directory.
When you select directories, all directories under that directory are recursively included.
If you wish to deselect a file, move the cursor to the bottom window (using TAB) and then move the highlight to the file/directory you wish to deselect. Press D and the file/directory will be deselected.
If you select a file (e.g., /usr/home/john/xyz) and then select the directory in which the file resides (e.g., /usr/home/john), taper automatically recognizes that the file has been selected twice and will put brackets around the file (/usr/home/john/xyz) to tell you so. When doing the backup, the file will be backed up only once.
When you have finished selecting, press F and taper will commence the backup. Pressing Q at any time will abort the backup.
Select restore from the taper main menu.
You will then be presented with a list of all the archives taper knows about. They are sorted in archive ID order and the archive title is also printed. The highlight will be on the archive that is currently in the tape drive. Move the highlight onto the archive that you wish to restore from and press ENTER.
You will then be presented with three panels. The top left shows the files and directories currently on the archive, the top right shows a summary of the whole archive and the bottom panel is used to show the directories and files selected for restoring.
Use the cursor keys to move the highlight to select which files you wish to restore—pressing S selects the file/directory the highlight is currently on. Directories are recursively selected.
When you have selected a file/directory, it is transferred to the bottom window. In a similar way to backup, restore will put brackets around files selected twice.
In the select window, after the filename, the volume number is printed. This will show either a volume number or M. If M appears, taper is operating in most recent restore mode and will restore only the most recent copy of that file. If a volume number is displayed, the file will be restored from that volume, regardless of how recent the file in that volume is. You can toggle between the two modes by pressing S in the select window.
To deselect a file, position the highlight on the file you wish to deselect and press D.
When you have finished selecting files for restore, press F and taper will commence the restore. Pressing Q will abort the restore operation.
If you lose the archive information file (it gets deleted, corrupted or you try to restore on another machine and forget to take the archive information file with you), you can reconstruct the info file. Simply put the tape in the drive and select mkinfo from the taper main menu.
Below is a list of the more common preferences. For a complete list, see the man page.
- compress
Tells taper whether to compress files it writes to an archive. Default is TRUE.
- log file
Where taper logs activity to. Default is ~/.taper_log.
- log level
The level of logging from 0 (no logging) to 3 (verbose). Default is 2.
- prompt directories
Whether taper confirms before selecting directories in restore and backup. Default is FALSE.
- incremental backup
Whether incremental backup is used as a default. Default is TRUE.
- most recent restore
Whether, by default, taper should restore the most recent file or the file the user specifies. Default is TRUE.
- exclude compress
Certain types of files can bypass the compression facility; e.g., by default, taper doesn't try to compress .gz or .gif files. This preference specifies which files taper should not try to compress. The preference is simply a string listing the file suffixes you wish to exclude, separated by spaces (the default is .gz .gif .Z).
- exclude files
Certain types of files can be excluded from the archive, even if explicitly specified; e.g., by default, taper doesn't back up .o files. This preference specifies which files to exclude automatically. The format is the same as for the “exclude compress” preference, and the default is .o ~
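To make the suffix-list format concrete, here is a sketch of how a list such as .gz .gif .Z or .o ~ can be matched against file names in the shell. matches_suffix is a hypothetical helper written for illustration. It is not taper's code, and taper's actual matching rules may differ, for example in case sensitivity.

```shell
# matches_suffix FILE "LIST": hypothetical helper, not taper's own code.
# Succeeds if FILE ends with any entry of the space-separated LIST.
matches_suffix() {
  file=$1
  list=$2
  for suffix in $list; do
    case $file in
      *"$suffix") return 0 ;;   # quoted, so the suffix is matched literally
    esac
  done
  return 1
}

# Classify a few names using the two default lists quoted above.
for f in notes.txt image.gif main.o draft~; do
  if matches_suffix "$f" ".o ~"; then
    echo "$f: excluded"
  elif matches_suffix "$f" ".gz .gif .Z"; then
    echo "$f: stored uncompressed"
  else
    echo "$f: compressed"
  fi
done
```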
You can save your preferences to customize your particular setup. There are two ways to save them: to a preference file, or to a command-line file that can then be used to start taper in the future. Simply select the appropriate option from the main menu.
taper looks for a preference file in the following order:
1. The filename given by the -p (--preference-file) option on the command line
2. The filename given by the environment variable TAPER_PREFS
3. The file ~/.taper_prefs
4. The file /usr/local/etc/.taper_prefs
5. Internal defaults
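You can mirror this search order in a few lines of shell, for instance to check which file a given setup would pick up. This is a sketch. find_taper_prefs is hypothetical, and it assumes that a candidate counts only if it is an existing, readable file; the order above does not spell this out.

```shell
# find_taper_prefs [FILE]: hypothetical helper mirroring taper's search order.
# FILE stands for the argument given with -p. Assumption: a candidate is
# used only if it is a readable file.
find_taper_prefs() {
  for candidate in "$1" "$TAPER_PREFS" "$HOME/.taper_prefs" \
                   /usr/local/etc/.taper_prefs; do
    if [ -n "$candidate" ] && [ -r "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "internal defaults"
}

find_taper_prefs    # which file would be read with no -p option
```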
Some tape drives write zeros to the beginning of a tape and this can cause confusion with taper, which thinks it has reached the end of the tape when it detects zeros. To find out if your tape drive does this, put a tape in the drive and run the testzero program—note that the tape will be overwritten. taper will test your tape and print the result on the screen.
If taper says that your drive writes leading zeros, you will have to run mktape on every tape before you can use it with taper.
People using floppy tape drives have to both format and erase tapes. These users must either format tapes using DOS, OS/2 or Windows or buy pre-formatted tapes. Erasing tapes is done automatically by taper.
People with SCSI tape drives generally don't need to format tapes. Some SCSI tape drives don't even need erasing (e.g., DAT). To tell taper not to erase tapes before using them, change the erase tape option in the backup preferences menu and save the preferences. Run mktape on all tapes before you use them so that taper doesn't think they are bad.
If you have a SCSI drive and are not sure whether you need to erase tapes, tell taper not to erase tapes and see what happens.
Linux has support for a /proc file system. This is a directory that looks like a normal directory, but actually contains information about the current machine state. It is useful for programs like ps which can read this directory and print the information contained in it.
However, we obviously don't want to back up this directory. You can tell taper to automatically exclude this directory. Run the which_device program:
$ ./which_device /proc
and the device on which the /proc directory is mounted is printed. Most probably this is 1, in which case you don't need to do anything because this is taper's default. If it is not 1, tell taper this via the backup options or via the command line (--proc-device num).
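If which_device is not at hand, GNU stat can print the device number of /proc directly. This sketch rests on an assumption the article does not confirm: that which_device reports the same st_dev value that stat's %d format prints. Compare the two outputs before relying on it.

```shell
# Device number of the file system holding /proc, and of / for comparison.
# GNU stat's %d prints the device number (st_dev) in decimal.
# Assumption: which_device reports this same st_dev value.
proc_dev=$(stat -c %d /proc)
root_dev=$(stat -c %d /)
echo "/proc is on device $proc_dev; / is on device $root_dev"
if [ "$proc_dev" != 1 ]; then
  echo "taper would need: --proc-device $proc_dev"
fi
```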
If you have many files to back up, taper can end up using quite a bit of memory. If you wish to minimize the amount of memory taper uses while running, edit your Makefile and uncomment the line (i.e., remove the # in front of the line):
DEFINES=-DMEMORY_TIGHT
Now, when taper runs, it will be quite memory efficient. The downside (of course, there's a downside!) is that performance will not be as good. However, on most machines, you will not notice a performance degradation until you reach 2,000-3,000 files.
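You can also script the uncommenting step. This minimal sketch assumes sed and a Makefile in which the line appears as #DEFINES=-DMEMORY_TIGHT, possibly with blanks after the #. uncomment_line is a hypothetical helper. The demonstration runs on a stand-in Makefile in a scratch directory rather than on taper's real one, whose layout may differ.

```shell
# uncomment_line PATTERN FILE: hypothetical helper.
# Removes a leading "#" (and any blanks after it) from the line that starts
# with PATTERN. -i.bak edits in place and keeps the original as FILE.bak.
uncomment_line() {
  pattern=$1
  file=$2
  sed -i.bak "s|^#[[:space:]]*\($pattern\)|\1|" "$file"
}

# Demonstration on a stand-in Makefile
work=$(mktemp -d)
cd "$work"
printf '%s\n' 'CC = gcc' '#DEFINES=-DMEMORY_TIGHT' > Makefile
uncomment_line 'DEFINES=-DMEMORY_TIGHT' Makefile
cat Makefile
```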
There is a special preference that applies only to SCSI drives: the --block-size option. The SCSI kernel tape driver expects data to be presented to it in blocks of a maximum size (the default is 32K). For this reason, taper writes data in blocks of 28K by default. However, should you wish to change that, you can do so with the --block-size option; for example, some tape drives may perform better if data is written in blocks of, say, 64K. Note that the block size must be less than the SCSI kernel tape driver's maximum size, so a 64K block size requires a driver whose maximum has been raised above the 32K default.
You can change this option for non-SCSI drives, but it won't really affect performance.
There is no distinction between file sets made in restore and file sets made in backup—they can be used interchangeably.
To make a file set, enter backup or restore and select the files and directories you wish to designate as your file set. Then press B and taper will prompt you for a name to give to the file set. After you enter the name, the file set will be saved.
Next time you wish to back up this particular file set, press L in backup or restore, and taper will show you a list of the file sets it knows about. Select one using the arrow keys and ENTER. This particular file set will then be loaded.
taper was designed to make backing up your Linux file system easy and painless. The traditional Unix utilities, tar and cpio, are very powerful, but they are not very user friendly. With Linux becoming more popular with non-hackers, another backup solution was badly needed. I hope taper fills this gap.
There are times, however, when you should use tar rather than taper. They are:
When you will be doing backups on one UN*X system and restores on another—e.g., you make a backup on your Linux system and you restore on a Xenix system. As yet, taper has not addressed cross-platform archive compatibility—it may work, but it is not guaranteed. If you do wish to use taper to do this, test it thoroughly first.
If you need to do remote file accessing—e.g., need to access files on host:/directory. taper does not support this yet, and it may be a while before it is added.
Software developers distributing their programs as source files are still better off using tar because to distribute as taper files means also having to distribute the archive information file, which the end-user would have to place in the ~/.taper_info directory—another step confusing to novices.
Unless you are in one of those situations, taper should be adequate for most of your needs. It is certainly easier to use than tar and cpio.
As this product is under development, suggestions, bug-fixes, comments, etc. are all welcome. Similarly, short messages saying that taper works for your system are greatly appreciated since it gives me an idea of how many people are using taper and what sort of hardware it works on. This can allow me to help other people who have similar hardware.
Yusuf Nagree is a part time doctor and a full time Linux hacker (aargh—sorry, full time doctor and part time Linux hacker). He has been a computer buff since his dad bought him a ZX-80 in 1980 and has had various computers over the years. Bored with DOS, OS/2 and Windows, the aspect of Linux he finds most enjoyable is the community spirit and general willingness to help and share knowledge and experience.