(tar)cpio


Comparison of `tar' and `cpio'
==============================

     *(This message will disappear once this node is revised.)*

   Here is a summary of differences between `tar' and `cpio'.  The
accuracy of the following information has not been verified.  The
following people contributed to this section, mainly through a survey
conducted in 1991.  The remainder of this section does not otherwise
try to relate topics to people.

     Bent Bertelsen          dmdata@login.dkuug.dk
     David Hoopes            talgras!david
     Guy Harris              guy@auspex.com
     Kai Petzke              wpp@marie.physik.tu-berlin.de
     Kristen Nielsen         dmdata@login.dkuug.dk
     Leslie Mikesell         les@chinet.chi.il.us

FIXME: Reorganize the following material

   `tar' handles symbolic links in the form in which it comes in BSD.
`cpio' doesn't handle symbolic links in the form in which it comes in
System V prior to SVR4, and some vendors may have added symlinks to
their systems without enhancing `cpio' to know about them.  Others may
have enhanced it in a way other than the way I did it at Sun, which was
adopted by AT&T (and which is, I think, also present in the `cpio' that
Berkeley picked up from AT&T and put into a later BSD release; I think
I gave them my changes).

   (SVR4 does some funny stuff with `tar'; basically, its `cpio' can
handle `tar' format input, and write it on output, and it probably
handles symbolic links.  They may not have bothered doing anything to
enhance `tar' as a result.)

   `cpio' handles special files; traditional `tar' doesn't.

   `tar' comes with V7, System III, System V, and BSD source; `cpio'
comes only with System III, System V, and later BSD (4.3-tahoe and
later).

   `tar''s way of handling multiple hard links to a file can handle
file systems that support 32-bit i-numbers (e.g., the BSD file system).
`cpio''s way requires you to play some games: in its "binary" format,
i-numbers are only 16 bits, and in its "portable ASCII" format, they're
18 bits, so it would have to play games with the "file system ID" field
of the header to make sure that the file system ID/i-number pairs of
different files were always different.  I don't know which versions of
`cpio', if any, play those games.  Those that don't might get confused,
think two files are the same file when they're not, and make hard links
between them.
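
   The collision described above can be illustrated with a minimal
Python sketch.  It is not taken from any `cpio' source; the helper
names `link_key' and `distinct_keys' and the device and i-number
values are made up for illustration.  It only assumes that an archiver
recognises hard links by comparing (file system ID, i-number) pairs
after the i-number has been cut down to the width of the header field.

```python
def link_key(dev, ino, ino_bits):
    """Return the (device, i-number) pair as an archive header with an
    ino_bits-wide i-number field would record it."""
    return (dev, ino & ((1 << ino_bits) - 1))


def distinct_keys(files, ino_bits):
    """Count how many different keys a list of (dev, ino) pairs yields."""
    return len({link_key(dev, ino, ino_bits) for dev, ino in files})


# Two unrelated files on one file system; the i-numbers are made up
# so that they differ only above bit 15.
files = [(5, 70000), (5, 70000 + (1 << 16))]

for bits in (16, 18, 32):
    print(bits, "bit i-numbers:", distinct_keys(files, bits), "distinct keys")
```

   With a 16-bit field the two files end up with the same key, so an
archiver that does nothing more would take them for links to one file;
with 18 or 32 bits this particular pair stays distinct.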

   `tar''s way of handling multiple hard links to a file places only one
copy of the file on the tape, but the name attached to that copy is the
*only* one you can use to retrieve the file; `cpio''s way puts one copy
for every link, but you can retrieve it using any of the names.

     >What type of checksum (if any) is used, and how is it
     >calculated?

   See the attached manual pages for `tar' and `cpio' format.  `tar'
uses a checksum which is the sum of all the bytes in the `tar' header
for a file, with the checksum field itself counted as blanks; `cpio'
uses no checksum in its traditional formats.
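
   As a rough illustration of the `tar' header checksum, here is a
minimal Python sketch.  It assumes the usual 512-byte header with the
checksum stored as octal digits at byte offsets 148 through 155, and
it treats those eight bytes as blanks while summing.  The function
names are hypothetical; the sample header is produced with Python's
standard `tarfile' module only to have something to check.

```python
import tarfile

HEADER_SIZE = 512
CHKSUM_START, CHKSUM_END = 148, 156  # checksum field within the header


def tar_checksum(header):
    """Sum all header bytes, counting the checksum field as blanks."""
    if len(header) != HEADER_SIZE:
        raise ValueError("a tar header is exactly 512 bytes")
    blanked = header[:CHKSUM_START] + b" " * 8 + header[CHKSUM_END:]
    return sum(blanked)


def checksum_ok(header):
    """Compare the stored octal checksum with the computed one."""
    field = header[CHKSUM_START:CHKSUM_END].decode("ascii", "replace")
    digits = field.strip(" \0")
    try:
        stored = int(digits, 8)
    except ValueError:
        return False
    return stored == tar_checksum(header)


# Build one header for an empty file named hello.txt and verify it.
header = tarfile.TarInfo("hello.txt").tobuf(format=tarfile.USTAR_FORMAT)
print("intact header verifies:", checksum_ok(header))
print("damaged header verifies:", checksum_ok(b"X" + header[1:]))
```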

     >If anyone knows why `cpio' was made when `tar' was present
     >at the Unix scene,

   It wasn't.  `cpio' first showed up in PWB/UNIX 1.0; no
generally-available version of UNIX had `tar' at the time.  I don't
know whether any version that was generally available *within AT&T* had
`tar', or, if so, whether the people within AT&T who did `cpio' knew
about it.

   On restore, if there is corruption on a tape, `tar' will stop at
that point, while `cpio' will skip over it and try to restore the rest
of the files.

   The main difference is just in the command syntax and header format.

   `tar' is a little more tape-oriented in that everything is blocked
to start on a block boundary.

     >Is there any difference between the two of them in the ability
     >to recover crashed archives?  (Is there any chance of recovering
     >crashed archives at all?)

   Theoretically it should be easier under `tar' since the blocking
lets you find a header with some variation of `dd skip=NN'.  However,
modern versions and variants of `cpio' have an option to just search
for the next file header after an error, with a reasonable chance of
re-syncing.  Note, though, that lots of tape driver software won't
allow you to continue past a media error, which should be the only
reason for getting out of sync unless a file changed size while you
were writing the archive.
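
   The idea of searching for the next file header can be sketched as
follows.  This is only an illustration, not how any particular `cpio'
does it: it assumes the portable ASCII format, whose 76-byte header
starts with the magic string `070707' and consists entirely of octal
digits, and the function name `next_header' and the sample data are
made up.

```python
ODC_MAGIC = b"070707"   # magic of the portable ASCII cpio header
ODC_HEADER_LEN = 76     # fixed part of that header, all octal digits


def next_header(data, start=0):
    """Return the offset of the next plausible header at or after
    start, or -1 if none is found."""
    pos = data.find(ODC_MAGIC, start)
    while pos != -1:
        header = data[pos:pos + ODC_HEADER_LEN]
        if (len(header) == ODC_HEADER_LEN
                and all(byte in b"01234567" for byte in header)):
            return pos
        # The magic occurred by chance inside file data; keep looking.
        pos = data.find(ODC_MAGIC, pos + 1)
    return -1


# Damaged data containing a stray magic string, then a dummy header
# (all fields zero) followed by a file name.
junk = b"xx070707 not a header xx"
dummy_header = ODC_MAGIC + b"0" * (ODC_HEADER_LEN - len(ODC_MAGIC))
data = junk + dummy_header + b"name\0"

print("resynchronised at offset", next_header(data))
```

   A real implementation would also want to sanity-check the decoded
fields (for example the name and file sizes) before trusting a match.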

     >If anyone knows why `cpio' was made when `tar' was present
     >at the Unix scene, please tell me about this too.

   Probably because it is more media efficient (by not blocking
everything and using only the space needed for the headers where `tar'
always uses 512 bytes per file header) and it knows how to archive
special files.
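
   To put rough numbers on the space argument, here is a small Python
sketch.  It assumes the portable ASCII `cpio' format (a 76-byte header
followed by the NUL-terminated name and the unpadded data) and a `tar'
member made of one 512-byte header plus data padded to a multiple of
512 bytes.  It ignores end-of-archive markers and any blocking factor,
and the function names and the sample file are hypothetical.

```python
TAR_BLOCK = 512
CPIO_ODC_HEADER = 76


def tar_member_bytes(size):
    """One 512-byte header plus the data rounded up to whole blocks."""
    padded = -(-size // TAR_BLOCK) * TAR_BLOCK
    return TAR_BLOCK + padded


def cpio_odc_member_bytes(name, size):
    """Fixed header, the name with its trailing NUL, then the data."""
    return CPIO_ODC_HEADER + len(name.encode()) + 1 + size


name, size = "notes.txt", 100
print("tar: ", tar_member_bytes(size), "bytes")
print("cpio:", cpio_odc_member_bytes(name, size), "bytes")
```

   For this 100-byte file the sketch gives 1024 bytes for `tar' and 186
bytes for `cpio'; the gap narrows as files get larger.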

   You might want to look at the freely available alternatives.  The
major ones are `afio', GNU `tar', and `pax', each of which has its
own extensions with some backwards compatibility.

   Sparse files were `tar'red as sparse files (which you can easily
test, because the resulting archive gets smaller, and GNU `cpio' can no
longer read it).

