Project 28 Archive Files
“How do I squash a directory of files into a single compressed file?”
This project covers the tar command and shows you how to use it to combine a collection of files into a single archive file, how to retrieve those files from an archive, and how to use tar as a file-backup tool.
Make an Archive
Many files can be combined into an archive file for easy distribution or storage. An archive can contain anything from a few named files to a whole directory hierarchy. We’ll take a look at creating archives by using the tar command and see how to compress the archive. Then we’ll do the reverse: decompressing the archive and extracting files from it.
Let’s make an archive of the files in the directory week1 by using GNU tar, which is the version of tar supplied with Mac OS X. As arguments, tar requires a function followed by function modifiers. To create a new archive file, specify function c for create and modifier f directly followed by a filename for the archive. You may also include modifier v for verbose, which tells tar to list files and directories as they are added to the archive. (Preceding the function and its modifiers with a dash [ - ] is optional.)
$ tar cvf week1.tar week1
week1/
week1/friday.ws
week1/monday.ws
week1/thursday.ws
week1/tuesday.ws
week1/wednesday.ws
We retrieve files from the archive (extract files) and write them to the current directory by specifying function x. When an archive is extracted, tar automatically creates directories as needed to match each extracted file’s pathname. If a file’s target directory already exists, the file will be extracted into that directory and will overwrite any existing file that shares its name.
$ tar xvf week1.tar week1
week1/
week1/friday.ws
week1/monday.ws
week1/thursday.ws
week1/tuesday.ws
week1/wednesday.ws
To view archive contents, specify function t (for table of contents).
$ tar tf week1.tar
...
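The create, list, and extract functions can be tried safely in a scratch directory. What follows is a minimal sketch, assuming a POSIX shell with mktemp available. The file contents and the restore directory are made up for illustration, and -C (change directory before extracting) is an option not otherwise used in this section.

```shell
# Work in a throwaway directory so nothing real is overwritten.
scratch=$(mktemp -d)
cd "$scratch"

# Build a small week1 directory (file contents are placeholders).
mkdir week1
for day in monday tuesday wednesday; do
    echo "tip for $day" > "week1/$day.ws"
done

# c = create, f = the archive filename follows.
tar cf week1.tar week1

# t = table of contents; capture the member list.
members=$(tar tf week1.tar)
echo "$members"

# x = extract; -C restore writes the files below restore/.
mkdir restore
tar xf week1.tar -C restore

# The extracted copy should match the original.
roundtrip=failed
diff -r week1 restore/week1 && roundtrip=ok
echo "round trip: $roundtrip"
```

Extracting below a separate directory avoids overwriting the originals while you experiment.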
The tar command is inherently recursive. Applying it to a directory archives the directory’s contents and those of all its subdirectories.
$ tar cvf Sites.tar ~/Sites
tar: Removing leading `/' from member names
...
Users/saruman/Sites/jan/
Users/saruman/Sites/jan/images/
Users/saruman/Sites/jan/images/background/
Users/saruman/Sites/jan/images/background/.DS_Store
Users/saruman/Sites/jan/images/background/shade-left-b.png
...
The strange comment Removing leading `/' from member names is explained in “Understand tar and Pathnames,” later in this project.
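To see the recursion for yourself without touching a real Sites directory, here is a small sketch that builds a nested hierarchy in a scratch directory. The directory and file names are illustrative only, and mktemp is assumed to be available.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# A small nested hierarchy (names are illustrative).
mkdir -p Sites/jan/images/background
echo "<html></html>" > Sites/index.html
echo "placeholder" > Sites/jan/images/background/shade.png

# Naming only the top directory archives everything beneath it.
tar cf Sites.tar Sites

# Count the archive members that lie in the deepest subdirectory.
deep=$(tar tf Sites.tar | grep -c "^Sites/jan/images/background/")
echo "members under background/: $deep"
```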
Compress and Uncompress
To compress and uncompress tar archives, we could apply gzip and friends to the archive files manually, but built-in tar functions spare us that effort. Various modifiers instruct tar to pass archives to gzip, bzip2, or compress automatically:
- To gzip/gunzip a file, specify modifier z or --gzip. The standard extension for a tar-gzipped file is .tgz.
- To bzip2/bunzip2 a file, specify modifier j or --bzip2. The standard extension for a tar-bzipped file is .tbz2 or .tbz.
- To use the older compress, specify modifier Z or --compress. The standard extension for a tar-compressed file is .taZ.
With one of these modifiers, tar compresses the archive as it is created and uncompresses it before extracting files.
We archive and compress with gzip, using either
$ tar czf week1.tgz week1
$ tar cf week1.tgz --gzip week1
Let’s check that the archive is in fact compressed by using the file command.
$ file week1.tgz
week1.tgz: gzip compressed data, from Unix
To uncompress, use either
$ tar xzf week1.tgz
$ tar xf week1.tgz --gzip
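The following sketch compares the built-in modifier z with the manual approach of running gzip on a finished archive. It assumes gzip and mktemp are installed; the names manual.tar, builtin.list, and manual.list are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir week1
echo "tip for monday" > week1/monday.ws

# Let tar call gzip itself (modifier z).
tar czf week1.tgz week1

# The manual equivalent: archive first, then compress separately.
tar cf manual.tar week1
gzip manual.tar            # replaces manual.tar with manual.tar.gz

# gzip -t tests integrity; it succeeds only for valid gzip data.
builtin_ok=no; manual_ok=no; same_members=no
gzip -t week1.tgz && builtin_ok=yes
gzip -t manual.tar.gz && manual_ok=yes

# Both archives list the same members when read with modifier z.
tar tzf week1.tgz > builtin.list
tar tzf manual.tar.gz > manual.list
cmp -s builtin.list manual.list && same_members=yes
echo "same members: $same_members"
```

Either route produces a gzip-compressed archive; the modifier simply saves a step.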
We archive and compress with bzip2 by using either
$ tar cjf week1.tbz2 week1
$ tar cf week1.tbz2 --bzip2 week1
Let’s check, again using file.
$ file week1.tbz2
week1.tbz2: bzip2 compressed data, block size = 900k
To uncompress, use either
$ tar xjf week1.tbz2
$ tar xf week1.tbz2 --bzip2
Understand tar and Pathnames
It’s important to understand the significance that pathnames have when an archive is extracted. It’s also important to understand the different behaviors of tar toward relative and absolute pathnames.
Relative Pathnames
A tar archive includes the relative pathname of each file, from the current directory to the directory being archived. Previously, we archived the directory week1 from the directory that contained it (tips). This time, we’ll move up one level, out of tips, and archive by specifying tips/week1. Compare this with the example at the start of the project.
$ cd ..
$ tar cf week1.tar tips/week1
$ tar tf week1.tar
tips/week1/
tips/week1/friday.ws
tips/week1/monday.ws
...
You’ll notice that each pathname now includes tips/. When the archive is extracted, it is written back to tips/week1/ below the current directory, not directly to week1/. As long as you extract from the directory in which you created the archive, the files return to the point in the directory hierarchy from which they were archived.
Note that if you were to extract this archive from within tips, rather than from the directory above it where the archive was created, the files would be written to tips/week1 below the current directory (that is, to tips/tips/week1).
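The effect of the starting directory on member names can be checked with a short experiment. This is a sketch under stated assumptions: it runs in a scratch directory created with mktemp, the archive names with-prefix.tar and no-prefix.tar are invented, and it uses the -C option (change directory before reading the files that follow), which this section does not otherwise cover.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p tips/week1
echo "tip for monday" > tips/week1/monday.ws

# Archived from the parent directory: member names start with tips/.
tar cf with-prefix.tar tips/week1
first_with=$(tar tf with-prefix.tar | head -n 1)

# -C tips makes tar change into tips before reading week1,
# so member names start with week1/ instead.
tar cf no-prefix.tar -C tips week1
first_without=$(tar tf no-prefix.tar | head -n 1)

echo "$first_with"
echo "$first_without"
```

Using -C is an alternative to changing directory yourself before creating the archive.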
Absolute Pathnames
If we specify an absolute pathname to tar, the leading slash character is dropped to make the pathname relative.
$ tar cf week1.tar /Users/saruman/Development/tips/week1
tar: Removing leading `/' from member names
$ tar tf week1.tar
Users/saruman/Development/tips/week1/
Users/saruman/Development/tips/week1/friday.ws
...
To restore the files to their original locations, change to the root directory before extracting the archive.
$ cd /
$ tar xvf /path/to/week1.tar
If you do not move to the root directory, the entire pathname of Users/saruman/Development/tips/week1/ will be created below the current directory as the archive is extracted. If you really do want absolute pathnames in the archive, specify option -P or --absolute-names when creating the archive and when extracting from the archive.
Why is the leading / stripped? If it were not, the archive would always be written starting from the root directory, creating all other directories needed to match the archive pathname. At best, sending the absolute-pathname archive /Users/saruman to a friend would force him to create a directory called /Users/saruman that he doesn’t need. At worst, if your friend lacks the permissions needed to create that directory, he will not be able to extract the archive.
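A quick way to confirm that the leading slash is dropped, without extracting anything below the real root directory, is sketched here. It assumes mktemp is available; the restore directory is invented for the example, and -C (extract below the named directory) is an option not otherwise used in this section.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/tips/week1" "$scratch/restore"
echo "tip for monday" > "$scratch/tips/week1/monday.ws"

# Archive by absolute pathname; tar warns on standard error
# (discarded here) and stores the names without the leading /.
tar cf "$scratch/week1.tar" "$scratch/tips/week1" 2>/dev/null

# No member name should begin with a slash.
absolute=$(tar tf "$scratch/week1.tar" | grep -c "^/")
echo "members with a leading slash: $absolute"

# Extract below restore/ rather than below /.
tar xf "$scratch/week1.tar" -C "$scratch/restore"
restored="$scratch/restore$scratch/tips/week1/monday.ws"
ls -l "$restored"
```

The whole original pathname is recreated below restore/, which is the behavior described above for extracting anywhere other than the root directory.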
Make Incremental Backups
We can use tar to make a backup of a directory and write the archive to CD or DVD. Alternatively, you might place the archive on an external drive or mounted server; in this case, a neat trick uses the tar function update (u) to update the archive periodically. Updating an archive considers only those files that have changed since they were archived, appending them to the end of the archive. This is usually quicker than creating a new archive from scratch. Note that GNU tar cannot update a compressed archive, so leave an archive that you plan to update uncompressed.
In the following example, we create an archive of the directory week1 and then change a couple of files with the vim text editor.
$ tar cf week1.tar week1
$ vim week1/tuesday.ws
$ vim week1/wednesday.ws
Next, we update the archive by using the function u. The modifier v gives reassurance that the changed files are detected and added to the archive.
$ tar uvf week1.tar week1
week1/
week1/tuesday.ws
week1/wednesday.ws
Editing and updating again:
$ vim week1/tuesday.ws
$ tar uvf week1.tar week1
week1/
week1/tuesday.ws
If we examine the archive, all the original files, plus the two sets of updates, will be shown.
$ tar tf week1.tar
week1/
week1/friday.ws
week1/monday.ws
week1/saturday.ws
week1/thursday.ws
week1/tuesday.ws
week1/wednesday.ws
week1/
week1/tuesday.ws
week1/wednesday.ws
week1/
week1/tuesday.ws
When the archive is extracted, tar processes the members in order, so each later copy overwrites the one before it: the earlier versions of tuesday.ws and wednesday.ws are replaced by the latest versions.
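The update cycle can be rehearsed in a scratch directory before you rely on it for real backups. The sketch below makes a few assumptions: mktemp is available, echo stands in for editing with vim, and touch -t backdates the original files so that the edited file is clearly newer than its archived copy. The restore directory and the -C option are additions for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir week1 restore
echo "first draft" > week1/monday.ws
echo "first draft" > week1/tuesday.ws

# Backdate the files so the later edit is clearly newer
# (tar compares modification times when updating).
touch -t 202001010000 week1/monday.ws week1/tuesday.ws
tar cf week1.tar week1

# Edit one file; its modification time becomes the current time.
echo "second draft" > week1/tuesday.ws

# u = update: append only files newer than their archived copies.
tar uf week1.tar week1

# tuesday.ws should now be stored twice, monday.ws only once.
tuesday_copies=$(tar tf week1.tar | grep -c "tuesday.ws")
monday_copies=$(tar tf week1.tar | grep -c "monday.ws")
echo "tuesday.ws: $tuesday_copies, monday.ws: $monday_copies"

# Members are extracted in order, so the last copy wins.
tar xf week1.tar -C restore
latest=$(cat restore/week1/tuesday.ws)
echo "$latest"
```

Because every update appends rather than replaces, the archive grows with each changed file; creating a fresh archive from time to time keeps its size in check.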