Submitted-by: Dan Bernstein <brns...@nyu.edu> Posting-number: Volume 24, Issue 73 Archive-name: yabbawhap/part01 [ The file PATENT gives a nice summary of the issues. --r$ ] yabba applies Y compression to its input; unyabba decompresses the result. whap applies AP compression to its input; unwhap decompresses the result. whap and unwhap run at about the same speed as UNIX compress and uncompress, which use LZW coding; yabba and unyabba are two to three times slower. AP and Y compression are typically 10-20% more effective than LZW compression in the same amount of memory. Y coding, unlike LZW coding and AP coding, is unpatented. It should be possible to use these programs on any reasonable C platform, though they were originally designed on a BSD UNIX system. #! /bin/sh # This is a shell archive. Remove anything before this line, then feed it # into a shell via "sh file" or similar. To overwrite existing files, # type "sh file -c". # The tool that generated this appeared in the comp.sources.unix newsgroup; # send mail to comp-sources-u...@uunet.uu.net if you want that tool. # Contents: README Makefile ycoding.4b # Wrapped by rs...@litchi.bbn.com on Wed Mar 20 17:09:21 1991 PATH=/bin:/usr/bin:/usr/ucb ; export PATH echo If this archive is complete, you will see the following message: echo ' "shar: End of archive 1 (of 4)."' if test -f 'README' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'README'\" else echo shar: Extracting \"'README'\" \(8502 characters\) sed "s/^X//" >'README' <<'END_OF_FILE' Xyabbawhap - Y and AP compression filters X Xyabba applies Y compression to its input; unyabba decompresses the Xresult. whap applies AP compression to its input; unwhap decompresses Xthe result. whap and unwhap run at about the same speed as UNIX compress Xand uncompress, which use LZW coding; yabba and unyabba are two to three Xtimes slower. AP and Y compression are typically 10-20% more effective Xthan LZW compression in the same amount of memory. Y coding, unlike LZW Xcoding and AP coding, is unpatented. It should be possible to use these Xprograms on any reasonable C platform, though they were originally Xdesigned on a BSD UNIX system. X Xyabbawhap version 1.00, March 19, 1991. XPlaced into the public domain by Daniel J. Bernstein. X XThanks to the gamma testers for their comments, criticism, and code: X X Eirik Fuller <ei...@theory.tn.cornell.edu> X Marc Andreessen <andrees...@uimrl7.mrl.uiuc.edu> X Loren J. Rittle <cs32...@ux1.cso.uiuc.edu> X Jean-loup Gailly <jl...@chorus.fr> X Terje Malmedal <malme...@dhmolde.no> X Lennart Augustsson <augus...@cs.chalmers.se> X John C. Schultz <schu...@halley.serc.3m.com> X Dave Gudeman <gude...@cs.arizona.edu> X Colin Plumb <ccpl...@rose.waterloo.edu> X Ozan Yigit <o...@nexus.yorku.ca> X Benno Tietz <ti...@cs.bonn.edu> X Alexios Zavras <z...@theseas.ntua.gr> X Hans Henrik Eriksen <h...@ifi.uio.no> X Frank Wales <fr...@grep.co.uk> X Paul A. Houle <pahs...@jupiter.nmt.edu> X Graham Thoal <gt...@ed.ac.uk> X Robert Kelley <r...@sequent.com> X X XOrganization of README: X X1. Files X2. Requirements X3. How to configure X4. How to compile X5. How to install X6. TODO list X X X1. Files: X XBLURB advertisement XCHANGES list of changes since first distributed version XREADME this file XFORMLETTER form letter to send to the author XPATENTS some notes on compression patents XQUESTIONS questions and answers about yabbawhap XFILES file list Xsysconf script to test for certain system features Xcheckconf.c tool to ease Makefile configuration XMakefile compilation commands Xtry script to test yabba and unyabba and compare to compress Xtryap script to test whap and unwhap and compare to compress Xtryapy combination of try and tryap XINSTALL script to install the programs Xyabba.1 man page for yabba and whap Xunyabba.1 man page for unyabba and unwhap Xhuptrie.h huptrie library Xbitout.h bit output library Xpercent.h percent library, computes 100a/b without overflow Xtexts.h various messages printed by the programs Xyw.c main code for yabba and whap Xunwhap.c main code for unwhap Xunyabba.c main code for unyabba Xbitout.c bit output code Xpercent.c percent code Xtexts.c texts code Xycoding.4b Y Coding, paper draft 4b Xycoding.uu yabba'd, uuencoded version of ycoding.4b X X X X2. Requirements X XYou should be able to adapt yabbawhap to practically any C platform X(with 8-bit characters). However, this package was designed on a UNIX Xsystem. The compressors do not take file names; they only act as Xfilters. The support files are also oriented towards UNIX. X Xyabbawhap has been reported to work on the following systems: X X Sun 4/280, SunOS 4.0.3 (1.00) X Sun 4/280, SunOS 4.0.3 (0.98) X Sun 4/280, SunOS 4.0.3, gcc (0.98) X Sun 4/330, SunOS 4.0.3 (0.95) X Sun 4/330, SunOS 4.0.3c (0.98) X SPARC?, SunOS 4.1 (0.95) X Sun 3/?, SunOS 4.1 (0.95) X Sun 3/160, SunOS 4.1 (1.00) X Sun 3/480, SunOS 4.1 (0.98) X SPARCStation SLC, SunOS 4.1 (0.95) X Sun 3/50, SunOS 4.1.1, gcc (0.98) X Sun 3/60, SunOS 4.1.1, gcc (0.98) X Sun 3/60, SunOS 4.1.1, gcc 1.39 (0.95) X SparcStation 2, SunOS 4.1.1 (0.98) X DECStation 5400, Ultrix 4.1 (1.00) X DECStation 5000/200, Ultrix 4.1 (0.95) X DECStation 5000/200, Ultrix 4.1, gcc (0.98) X DECStation 5000/200, Ultrix 4.1 (0.98) X may need -Olimit 2500 to compile with DEC's ANSI C compiler V2.10 X DECStation 3100, Ultrix 4.0 (0.95) X DECStation 3100, Ultrix 4.0 (0.98) X DECSystem 5820, Ultrix 4.1 (0.98) X DECSystem 5820, Ultrix 4.1 (1.00) X VAX 11/780, BSD 4.3 (0.95) X VAX ?, BSD 4.3, gcc 1.39 (0.95) X Tek 4316, Utek 4.1, gcc 1.39 (0.95) X need -DBRAINDAMAGED X Sequent S811, Dynix 3.0.17 (0.95) X Sequent Symmetry, Dynix 3.0.17.9 (0.95) X Sequent ?, BSD 4.2? (0.95) X Apollo DN3500, DOMAIN/OS SR10.2, cc -A cpu,3000 -W0,-opt,2 (1.00) X Convex C1-XP, Convex UNIX 9.0, cc -pcc (1.00) X need -pcc for the new compiler; -UPTRS faster than -DPTRS X IBM RS/6000, AIX 3.1 (0.95) X HP 9000s300, HP/UX 7.0 (0.98) X -O rather than -O2, optimizer appears buggy anyway X NeXT, NeXT Mach 1.0 (0.98) X Astronautics ZS, ZSUnix 1.2 (1.00) X Amiga, AmigaOS 1.3.2 (0.95) X X(Under gcc, always use -O -fstrength-reduce in place of -O2 in CCOPTS. XDon't even bother with -finline-functions or the other function-call Xoptimizations.) X XIf your machine isn't in this list, and you get the programs working, X*please* send a note to me at brns...@nyu.edu on the Internet describing Xwhat you had to do to make the programs compile. (Of course, please also Xlet me know if you have trouble, or if you have comments, questions, or Xsuggestions.) I'd rather be flooded with reports and be able to compile Xa more comprehensive list than have no feedback because everyone assumes Xsomeone else has talked to me first. You can use FORMLETTER if you want. XThanks for being a good sport. X X X X3. How to configure X XFirst, run the sysconf shell script. It will try to figure out whether Xyou have bzero(), memset(), and a certain putchar() bug, and will modify XMakefile accordingly. X XNext, make checkconf. Then run checkconf to see a few facts about your Xcurrent configuration. You can give it options, like -DNODEMAX=21000, Xand it will instantly show you how that change will affect the size of Xthe programs if you add it to CCOPTS in the Makefile. It will also make Xsure that various constraints are met. checkconf -H shows a help screen. X XNext, read through the option descriptions in the Makefile, or print out Xa copy and peruse it at your leisure. You can configure yabba, unyabba, Xwhap, and unwhap in several different ways to change compression size, Xspeed, and power. No single configuration is right for every job. X XFinally, armed with checkconf and the option descriptions, decide how Xyou want to configure the programs. Change Makefile appropriately, and Xremake checkconf just in case you want to experiment with changes later. X XIf you want to get through configuration as quickly as possible, run X X % ./sysconf X Xand press return when it asks whether it should make and run checkconf. XBut don't complain to me about teething trouble if you haven't read Xthrough all of README and Makefile, as well as the checkconf output. X XTwo big caveats: 1. ZEROFILLED should be off on practically any non-UNIX Xoperating system. 2. If you increase NODEMAX, you probably want to set X-DNODENUM=65533 unless you know your recipients have compiled with the Xsame high value of NODEMAX. X X X X4. How to compile X XJust make. You'll get yabba and unyabba. If your optimizer dies, first Xtry defining -DOPTCANT5 in the Makefile. If that doesn't work, try X-DOPTCANT2. If that doesn't work, try -DOPTCANT1, or lower the Xoptimization level. X XTo test the programs, run the ``try'' shell script with a filename Xargument. % ./try yw.shar, for instance. try will give you times and Xresults for yabba, unyabba, compress, and uncompress on the file. (Note Xthat it does not test for nonexistent files or symbolic links.) X XIf you want to compile whap and unwhap, beware! AP coding is patented. X(Then again, LZW coding is patented, and people use compress all the Xtime.) You should understand the information in PATENTS first. Then if Xyou're curious to see how well AP coding can do, make AP. You can then Xrun the tryap and tryapy shell scripts the same way as try. X XAnother test you can run is to uudecode ycoding.uu and unyabba -m9999 Xthe result, ycoding.Y. You should get a perfect copy of ycoding.4b. X X X5. How to install X XBy default, yabba and unyabba are installed in /usr/local/bin; whap and Xunwhap are installed in /usr/local/bin; yabba.1 and unyabba.1 are Xinstalled in /usr/man/man1. If you want to change these defaults, edit XINSTALL. Then run it from a root shell; it will check every action with Xyou before proceeding. X X X6. TODO list X X-E errs X-f filterfile X (Please don't bug me about having yabba take a filename argument X until you've seen what -f does.) X-F Xflexible -m? report size? (tnx dg) Xmake no-header and random-bits independent? Xbetter RESET defaults? Xput compression in function? will happen with -f Xrename as fwhap, fyabba, funwhap, fyabba? no Xuse array trie for top level or two? END_OF_FILE # end of 'README' fi if test -f 'Makefile' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'Makefile'\" else echo shar: Extracting \"'Makefile'\" \(9240 characters\) sed "s/^X//" >'Makefile' <<'END_OF_FILE' XCC=cc XCCOPTS=-O2 -DTYPE=int -DPTRS -DBZERO -DRANDINIT="getpid()" X X# These options would be best to compile ``large'' yabba and unyabba X# on a Sun or other generic 32-bit UNIX machine: X# CCOPTS=-O2 -DTYPE=int -DPTRS -DBZERO -DZEROFILLED -DRANDINIT="getpid()" X# On a System V machine, -DMEMZERO should be used instead of -DBZERO. X# If you want more speed, add -DMOD=131072. X# For large compressions, try -DNODEMAX=500000 -DMOD=1048576 -DNODENUM=65533. X# On a small machine, these options might be more appropriate: X# CCOPTS=-O2 -DTYPE=short -DOPTCANT5 X# To see what the options mean and find out what other options are X# available, keep reading. X X# You should already have run the sysconf script (at least on any UNIX box) X# to test for system features. sysconf may have modified the CCOPTS line X# on line 2 of this Makefile. X X# To check your configuration decisions before compiling, type X# % make checkconf X# % ./checkconf X# or something equivalent on non-UNIX systems, and read through X# checkconf's output. You can give checkconf any -D or -U options to X# see what effect changes will have. On a UNIX machine, you can use X# the ``try'' shell script to test yabba and unyabba and see how they X# measure up against compress and uncompress in speed and effectiveness. X X# TYPE and PTRS have the biggest effect on how yabba and unyabba operate. X# X# TYPE is the type that yabba and unyabba will use for indexing. It X# should almost certainly be either short or int. If PTRS is defined, X# yabba (and unyabba and whap, but not unwhap) will use pointers rather X# than integers wherever possible. On almost all machines yabba runs X# faster with -DPTRS. More precisely: X# X# -UPTRS -DTYPE=short: Use shorts. This is ``small'' yabba and unyabba. X# -UPTRS -DTYPE=int: Use ints everywhere. On large machines this is X# usually quite a bit faster than the small version, but also uses X# twice as much memory. X# -DPTRS -DTYPE=short: small unwhap, but use pointers in yabba X# wherever possible. This typically doubles yabba's memory use. X# -DPTRS -DTYPE=int: Use ints everywhere, with pointers wherever X# possible. This is ``large'' yabba and unyabba. On large machines it is X# the fastest configuration, but uses twice as much memory as ``small.'' X# X# If you want to generate very large-block compressions (see below) X# on a small machine where int is 16 bits, you may want to try compiling X# with -DPTRS -DTYPE=long. X X# NODEMAX and NODENUM control the compression block size. X# X# Any adaptive dictionary compressor has a block size. It can keep X# track of this much input, or this much output, or this many strings X# at once, and compress by looking for patterns in the input data. X# But eventually there's too much to keep track of, and the compressor X# has to stop adapting to the patterns. X# X# In the LZW-based UNIX ``compress'' program, for instance, the block X# size is stated as the number of bits that the coder can use to X# identify strings in its dictionary. Once it runs out of identifying X# numbers, it can't adapt any further. X# X# In this compressor, the natural block size is the size of input. X# NODENUM sets the number of bytes of input that yabba can adapt to at X# once. Past that it will stick to the current dictionary as long as the X# patterns seem useful, then clear the strings and start adapting all X# over again. X# X# The user can set NODENUM dynamically with -m. NODEMAX (minimum 512, X# default 65533---NODEMAX + 2 must fit into TYPE) sets the maximum X# possible value; several fixed arrays in yabba and unyabba are X# dimensioned with size NODEMAX. NODENUM defaults to NODEMAX. X# X# NODENUM and NODEMAX also control the size of input expected by unyabba. X# unyabba *must* be given the same -m parameter as yabba was given for X# compression. That means that if you compress on one machine where X# yabba has NODENUM set to 60000, you won't be able to decompress on X# another machine where unyabba has NODEMAX set to 20000. -m has the X# same effect on portability that -b does for compress (though -m is X# expressed in somewhat more tractable units). X X# MOD (default 65536) is the size of a hash table kept by yabba. It must X# be a power of two. It should be on a similar order of magnitude as X# NODEMAX---if it is much too small, the hash table will be too crowded, X# and if it is much too big, the table will waste memory. X X# BITBUFSIZE only affects yabba (at the moment). It is the size of two X# output buffers kept by the bit-packing coroutines. It defaults to 1000. X X# HASHTYPE (default TYPE) is the type used for storing hash values in X# yabba. It has little effect on whap memory use, so on large machines X# you may want to set -DHASHTYPE=int even for small whap. However, it X# does affect memory use in yabba and unyabba, so -DHASHTYPE=short may X# be useful. In any case, MOD - 1 must fit into unsigned HASHTYPE. X X# RESETNUM (default 8192) and RESETFUZZ (default 30, can be negative) X# control how yabba decides when to clear its dictionary. After NODENUM or X# the -m limit is reached, yabba checks every RESETNUM input characters X# whether compression is still worth it. RESETFUZZ has some vaguely X# defined effect on the meaning of ``worth it'': the higher RESETFUZZ X# is, the longer yabba holds out before clearing the old patterns. The user X# can change RESETFUZZ and RESETNUM dynamically, so don't worry about them X# too much. X X# Under -r, yabba needs some reasonably random value from the environment. X# If you define an integer RANDINIT, yabba will initialize its generator X# based on that value. Otherwise it has to stick to a default sequence, X# which does not add as much security. (If you can't think up a good X# definition for RANDINIT, open up your phone book and pick a number.) X X# Many operating systems have brain-damaged putc(): if you put a X# (perfectly valid) 255 byte, putc() will return EOF, indicating an error. X# This is the case under Ultrix 3.1 and Convex UNIX 8.0, for instance. X# You MUST define -DBRAINDAMAGED if sysconf tells you to; otherwise yabba X# and unyabba will crash mysteriously. Also complain to your vendor. X X# If your compiler blows up on yw.c, you may want to set -DOPTCANT5. X# This will change certain heavily unrolled loops to lightly unrolled. X# If that still doesn't help, try -DOPTCANT2; then -DOPTCANT1. X X# If the internal representation of the NULL pointer on your machine is X# 0, you can make yabba run slightly faster with -DBZERO or -DMEMZERO. X# (checkconf will tell you if this is true.) If you have bzero(), use X# -DBZERO. If you have ANSI memset(), use -DMEMZERO. If you have X# neither, you're out of luck. MEMZERO is ignored under -DBZERO. X X# If you have a machine with particularly fast memory access (perhaps X# certain microcomputers), you might try defining -DHASHPTRS. This will X# use some more memory but may run faster. I have not seen any machines X# where -DHASHPTRS helps. X X# Finally, you should set -DZEROFILLED if your operating system X# guarantees that static arrays are initialized to zeros. Don't set it X# under -DPTRS if NULL pointers don't have internal representation 0. X# ZEROFILLED just makes yabba start up a bit faster. (checkconf warns you X# if it notices a non-zero-filled array, but its test isn't guaranteed.) X# UNIX systems generally guarantee -DZEROFILLED. X X# All of the above comments apply equally to whap as to yabba except X# where otherwise noted. X Xdefault: all X XAP: whap unwhap X Xall: yabba unyabba X Xcheckconf: checkconf.c Makefile X $(CC) $(CCOPTS) -o checkconf checkconf.c X Xyabba: yabba.o bitout.o percent.o ytexts.o Makefile X $(CC) $(CCOPTS) -o yabba yabba.o bitout.o percent.o ytexts.o X Xunyabba: unyabba.o percent.o ytexts.o Makefile X $(CC) $(CCOPTS) -o unyabba unyabba.o percent.o ytexts.o X Xwhap: whap.o bitout.o percent.o wtexts.o Makefile X @echo 'Warning! If you use AP coding except for instruction and amusement,' X @echo 'you may be infringing on a patent.' X @echo 'If you understand this, press return:' X @read foo; X $(CC) $(CCOPTS) -DWHAP -o whap whap.o bitout.o percent.o wtexts.o X Xunwhap: unwhap.o percent.o wtexts.o Makefile X $(CC) $(CCOPTS) -DWHAP -o unwhap unwhap.o percent.o wtexts.o X Xbitout.o: bitout.c bitout.h Makefile X $(CC) $(CCOPTS) -c bitout.c X Xpercent.o: percent.c percent.h Makefile X $(CC) $(CCOPTS) -c percent.c X Xytexts.o: texts.c texts.h Makefile X $(CC) $(CCOPTS) -c texts.c X mv texts.o ytexts.o X Xwtexts.o: texts.c texts.h Makefile X $(CC) $(CCOPTS) -DWHAP -c texts.c X mv texts.o wtexts.o X Xyabba.o: yw.c bitout.h percent.h texts.h huptrie.h Makefile X $(CC) $(CCOPTS) -c yw.c X mv yw.o yabba.o X Xwhap.o: yw.c bitout.h percent.h texts.h huptrie.h Makefile X $(CC) $(CCOPTS) -DWHAP -c yw.c X mv yw.o whap.o X Xunyabba.o: unyabba.c bitout.h percent.h texts.h huptrie.h Makefile X $(CC) $(CCOPTS) -c unyabba.c X Xunwhap.o: unwhap.c percent.h texts.h Makefile X $(CC) $(CCOPTS) -DWHAP -c unwhap.c X Xinstall: X @echo 'Run INSTALL in a root shell.' X exit 1 X Xclean: X @echo 'Why do you want to make clean, anyway?' X @echo 'If you changed the Makefile, it knows it should remake.' X @echo 'If you want to save space, do make shar and then remove everything.' X rm -f unwhap.o unyabba.o whap.o yabba.o yw.o texts.o bitout.o percent.o wtexts.o ytexts.o checkconf whap unwhap yabba unyabba X Xshar: X shar `cat FILES` > yw.shar X chmod 400 yw.shar END_OF_FILE if test 9240 -ne `wc -c <'Makefile'`; then echo shar: \"'Makefile'\" unpacked with wrong size! fi # end of 'Makefile' fi if test -f 'ycoding.4b' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'ycoding.4b'\" else echo shar: Extracting \"'ycoding.4b'\" \(31910 characters\) sed "s/^X//" >'ycoding.4b' <<'END_OF_FILE' XY coding X XDaniel J. Bernstein X XCopyright 1991. All rights reserved. XDraft 4b, March 19, 1991. X XThis is a draft. That means it's supposed to be unreadable, misleading, Xboring, useless, and generally wrong. Any deviations from the norm are Xaccidents---happy accidents, but accidents nonetheless. End of warning. X X X----- 1. Introduction X X--- LZW coding X XFix an alphabet A, and take a string I over that alphabet. Construct a X``dictionary'' D---a one-to-one mapping from strings to integers---as Xfollows: X X 0. Start by mapping each symbol of A to a unique integer. X 1. Find the longest prefix p of I contained in D. (Rather, contained X in the domain of D. It is convenient to ignore this distinction.) X 2. Take the next symbol c after p in I. X 3. Add pc to the dictionary, mapping to another unique integer. X 4. Strip p from the front of I, so that I begins with c. X 5. Repeat from step 1 until I is empty (i.e., until step 2 fails). X XFor example, say A is the set abcdefghijklmnopqrstuvwxyz, and say I is Xthe string yabbadabbadabbadoo. D starts with all single characters from XA. Now the longest match between I and D is I's first character, y; and Xthe charater after that is a. So we add ya to the dictionary and strip y Xfrom the front of I. X XNow I is abbadabbadabbadoo, and D has all single characters plus ya. The Xlongest match between D and I is a, so we add ab to the dictionary and Xremove the a. We continue in this manner until I is empty: X X I match add new dictionary X yabbadabbadabbadoo y ya ya (plus all single characters) X abbadabbadabbadoo a ab ya ab X bbadabbadabbadoo b bb ya ab bb X badabbadabbadoo b ba ya ab bb ba X adabbadabbadoo a ad ya ab bb ba ad X dabbadabbadoo d da ya ab bb ba ad da X abbadabbadoo ab abb ya ab bb ba ad da abb X badabbadoo ba bad ya ab bb ba ad da abb bad X dabbadoo da dab ya ab bb ba ad da abb bad dab X bbadoo bb bba ya ab bb ba ad da abb bad dab bba X adoo ad ado ya ab bb ba ad da abb bad dab bba ado X oo o oo ya ab bb ba ad da abb bad dab bba ado oo X o o X XWhile we construct the dictionary we can output the value of each match Xunder the dictionary. This produces a sequence of numbers which, as we Xwill see below, is sufficient to reconstruct the original string. This Xmapping from strings to sequences of numbers is called LZW coding. X XTypically the numbers in the dictionary are chosen sequentially, from 0 Xto |A| - 1 for the initial single-character strings and then from |A| on Xup. In the above example, the matches are y a b b a d ab ba da bb ad o o, Xwhich have numbers 24 0 1 1 0 3 27 29 31 28 30 14 14. That sequence is Xthe result of yabbadabbadabbadoo under LZW coding. X X X--- LZW decoding X XHow do we reconstruct the original string if we're given the coded Xsequence? The secret is to reconstruct the dictionary. X X 0. Start by mapping each symbol of A to a unique integer. Set I to the X empty string. Read a number from the input, and set p to the X corresponding single-character string. X 1. Append p to I. X 2. Read a number from the input, and let c be the first character of X the corresponding string in the dictionary. X 3. Add pc to the dictionary, mapping to the next unique integer. X 4. Set p to the dictionary string corresponding to the number read. X 5. Repeat from step 1 until end-of-input (i.e., until step 2 fails). X XFor example, take the input sequence 24 0 1 1 0 3 27 29 31 28 30 14 14. XD starts with all single characters from A; p starts as the single Xcharacter y; and I starts empty. X XNow p is appended to I, so that I contains y. The 0 in the input means Xthat the next character of I is an a; and ya is added to the dictionary. Xp is then set to a. X XNext, p is appended to I, so that I contains ya. The 1 in the input Xmeans that the next character of I is b; and ab is added to the Xdictionary. p is then set to b. We continue this way through the entire Xinput: X X input c added new p I X 24 y y X 0 a (ya,26) a ya X 1 b (ab,27) b yab X 1 b (bb,28) b yabb X 0 a (ba,29) a yabba X 3 d (ad,30) d yabbad X 27 *a* (da,31) ab yabbadab 27 is ab, so *a* is the 1st char X 29 _b_ (abb,32) ba yabbadabba 29 is ba, so _b_ is the 1st char X 31 d (bad,33) da yabbadabbada X 28 b (dab,34) bb yabbadabbadabb X 30 a (bba,35) ad yabbadabbadabbad X 14 o (ado,36) o yabbadabbadabbado X 14 o (oo,37) o yabbadabbadabbadoo X XNotice the slightly twisted statement of steps 2 through 4. p is set to Xthe dictionary string corresponding to the number read from the input, Xbut first c is set to the first character of that string, and the old pc Xis added to the dictionary. This roundabout presentation is necessary Xbecause the number on the input may be the number of the very last Xdictionary string added---and that last string depends on the first Xcharacter of the current string. This overlap is always safe but is one Xof the first problems to look out for in a new implementation. X X X--- So who cares about this LZW stuff anyway? X XWhat's the point of LZW coding? When the input string has lots of the Xsame characters or sequences of characters, the dictionary will rapidly Xgrow to include those sequences. Then a single number in the output Xsequence might represent several characters of input, and will use less Xspace. X XFor example, take the simplest natural encoding of a string over our Xlowercase alphabet A. There are 26 symbols, so we can represent each Xwith 5 bits. Our string yabbadabbadabbadoo has 18 characters and takes X90 bits under this encoding. X XOn the other hand, the string after encoding has just 13 numbers: X24 0 1 1 0 3 27 29 31 28 30 14 14. For these values there are 26 27 28 X29 30 31 32 33 34 35 36 37 38 choices respectively, so they take 5 5 5 5 X5 5 5 6 6 6 6 6 6 bits in the natural way, for a total of just 71 bits. X XMost strings that come up in practice have similar redundancy, and LZW Xcoding will produce an output that takes less space than its input. This Xis why LZW coding is usually called LZW compression. It often reduces Xlonger strings to 50% or even 30% of their original size, so that they Xtake a fraction as much disk space and communication time as they would Xin their uncompressed form. X X X--- A bit of jargon X XMany audio and video compression methods throw away some information. XThey don't reconstruct exactly the original pictures and sounds, but Xnobody really notices. These are called inexact, or sometimes lossy, Xcompressors. In contrast, LZW always lets you get back exactly the Xoriginal string. It is an exact, or lossless, compressor. X XAny compressor that splits its input into substrings and codes the Xstrings with a dictionary is called (logically enough) a dictionary Xcompressor. X XA substring of an input string---particularly a substring that is coded Xas a single output number---is called a ``phrase.'' In LZW, one string Xis added to the dictionary per phrase. X XSome compression methods use fixed tables: ``the'' becomes a special, Xsingle character, ``of'' becomes another, etc. They're called static, or Xnonadaptive, compressors. Others, like LZW, build their dictionaries Xdynamically to adapt to any input; LZW is an example of an adaptive Xcompressor. Somewhere in between are compressors that make two passes Xover the input, building a dictionary the first time through and then Xproducing output the second time through. They're called semiadaptive. X XOne useless theoretical property of LZW is that it is universal. This Xmeans that on a certain wide class of inputs it will give as close as Xyou want to the best possible compression of any method. The longer your Xstrings are, the closer it comes to optimal. Universality doesn't make Xone whit of difference in practice---to get within 1% of optimal with XLZW you'd have to start compressing the planet---but it generally Xindicates that a method is both simple and trustworthy. Non-universal Xcompressors are usually much more complicated, and will fail miserably Xon inputs they weren't designed specifically to handle. X X X X----- 2. Implementing LZW X X--- Encoding X XThe most natural way to find the longest match between I and strings in XD is one character at a time. Start with p as the first character of I X(or as the null string if that is more convenient). Read the next Xcharacter, c. If pc is in the dictionary, set p to pc and repeat. XOtherwise p and c are set. X XSo the dictionary has to support two basic operations: searching for pc Xif p is known to be there already, and adding pc if it is not there. XMost implementors use some form of trie---array trie, list trie, hash Xtrie, huptrie---for this job. The array trie, as in Woods' ``hyper-LZ'' X[9], offers extreme speed at the expense of |A| (typically 256) words Xof memory for each string in the trie. The huptrie and straight hash Xtrie use only a small amount of memory per string but still allow Xreasonably fast searches in the average case. We will not consider these Xstructures in further detail. X XIn any case at most a fixed number of operations happen for every input Xcharacter, so LZW coding takes linear time no matter how long the input Xis. Unfortunately, computers don't have infinite memory. Implementations Xtypically limit the dictionary to some size, usually between 1024 and X65536 strings. When the dictionary reaches that size it can either Xremain fixed for the rest of the input, or it can be cleared and start Xfrom single characters again. The second strategy effectively breaks the Xinput into ``blocks'' with independent dictionaries. If the input X``feels'' different in different blocks, blocking will produce good Xresults, because it will weed useless strings out of the dictionary. X XThe ``compress'' program [8] introduced an important blocking variant. XInstead of clearing the dictionary as soon as it reaches its maximum Xsize, compress periodically checks how well it is doing. It will only Xclear the dictionary when its compression ratio deteriorates. This way Xthe blocks adapt to the input. Note that all of these blocking Xtechniques require space for a special output code to clear the Xdictionary. X XAnother variant is LRU replacement: the least-recently-used strings are Xremoved from the dictionary at some opportune time. This provides a Xsmoother transition between blocks than clearing the dictionary. XUnfortunately, it is difficult to find good heuristics for when to Xremove what strings. Blocking is still more of an art than a science. X X X--- Preparing for encryption X XIt is very useful to compress a message before encrypting it, as Xcompression removes much of the statistical regularity of straight text. XHowever, the usual coding leaves a lot of redundancy. If the dictionary Xhas 33 strings, for example, and 6 bits are used to represent a choice Xamong those strings, then the high bit will almost always be 0. These Xhigh bits come in predictable locations, so an attacker can almost Xalways figure out the key given a long enough stretch of compressed Xtext. X XTo prevent this, the compressor should introduce random bits as follows: XGiven a choice between 0 and 40, 0 through 22 are encoded as either Xthemselves or from 41 through 63. The choice should be made with a Xhigh-delay random number generator such as a subtractive generator. X23 through 40 are always encoded as themselves. This method greatly Xreduces the amount of extra information available per bit without Xappreciably slowing down the coding. Because random bits are used at Xunpredictable times, an attacker cannot easily take advantage of the Xdeterminism of the pseudo-random generator. X XMost implementations also add a recognizable header to compressed text. XThis header should be removed before encryption. X X X--- Decoding X XLZW decoding is even easier than encoding: the dictionary does not have Xto support searching. The easiest (and generally fastest) method is to Xkeep I in memory as it is reconstructed, and to keep track of which Xsubstring of I a given dictionary number corresponds to. To add pc to Xthe dictionary means to add the pair (pointer to start of pc within I, Xlength of pc) at the right spot in an array indexed by dictionary Xnumbers. There are methods which take slightly less memory than this, Xbut they are slower. X X X X----- 3. MW and AP coding X X--- MW: Adapting better X XLZW adapts to its input relatively slowly: strings in the dictionary Xonly get one character longer at a time. If LZW is fed a million Xconsecutive x's, its longest dictionary string will only have around X1400 x's, and it will only achieve around 1000:1 compression. Is there Xany better way for it to notice the redundancy? X XThe answer is yes. Instead of adding p plus one character of the next Xphrase to the dictionary, we can add p plus the entire next phrase to Xthe dictionary. This defines MW coding. For example: X X I match added to dictionary X yabbadabbadabbadoo y X abbadabbadabbadoo a ya (concatenation of last two matches) X bbadabbadabbadoo b ab X badabbadabbadoo b bb X adabbadabbadoo a ba X dabbadabbadoo d ad X abbadabbadoo ab dab X badabbadoo ba abba X dabbadoo dab badab X badoo ba dabba X doo d bad X oo o do X o o X XEven such a short example shows how quickly MW begins to adapt. By the Xfifteenth character ``dabba'' is already established as a dictionary Xphrase. On the other hand, MW will miss shorter phrases where there is Xless redundancy, so it generally performs only somewhat better than LZW Xin practice. In this example it outputs 13 numbers, just like LZW. But Xgiven a million x's it will end up with a dictionary string of length Xaround half a million, and it will output just 30 bytes. X X X X X--- Problems of MW X XMW lacks some properties of LZW that we don't realize are so helpful Xuntil they are taken away. Most importantly, not every prefix of a Xdictionary string is in the dictionary. This means that the Xcharacter-at-a-time search method we saw for LZW doesn't work. Instead, Xevery prefix of a string in the dictionary must be added to the trie, Xand every node in the trie must be given a tag saying whether it is in Xthe dictionary or not. Furthermore, finding the longest string may Xrequire backtracking: if the dictionary contains xxxx and xxxxxxxx, we Xdon't know until we've read to the eighth character of xxxxxxxy that we Xhave to choose the shorter string. This seems to imply that MW coding Xrequires backtracking and hence is fundamentally slower than LZW coding. X(Decoding can be made reasonably fast, though.) X XAnother problem is that a string may be added to the dictionary twice. XThis rules out certain implementations and requires extra care in Xothers. It also makes MW a little less safe than LZW before encryption. X X X--- AP: Adapting faster X XThere is a natural way to preserve the good adaptation of MW while Xeliminating its need for backtracking: instead of just concatenating the Xlast two phrases and putting the result into the dictionary, put all Xprefixes of the concatenation into the dictionary. (More precisely, if S Xand T are the last two matches, add St to the dictionary for every Xnonempty prefix t of T, including T itself.) This defines AP coding. For Xexample: X X I match added to dictionary X yabbadabbadabbadoo y X abbadabbadabbadoo a ya X bbadabbadabbadoo b ab X badabbadabbadoo b bb X adabbadabbadoo a ba X dabbadabbadoo d ad X abbadabbadoo ab da, dab (d + all prefixes of ab) X badabbadoo ba abb, abba (ab + all prefixes of ba) X dabbadoo dab bad, bada, badab X badoo ba dabb, dabba X doo d bad X oo o do X o o X XSince AP adds more strings to the dictionary, it takes more bits to Xrepresent a choice. However, it does provide a fuller range of matches Xfor the input string, and in practice it achieves slightly better Xcompression than MW in quite a bit less time. X X X----- 4. Y coding X X--- Completing the square X XLZW adds one dictionary string per phrase and increments strings by one Xcharacter at a time. MW adds one dictionary string per phrase and Xincrements strings by several characters at a time. AP adds one Xdictionary string per input character and increments strings by several Xcharacters at a time. These properties define three broad classes of Xmethods and point naturally to a fourth: coders that add one dictionary Xstring per input character and increment strings by one character at a Xtime. An example of such a method is Y coding. (It is worth noting at Xthis point that LZ77 variants are characterized by adding several Xdictionary strings per input character.) X X X--- The incomprehensible definition X XY coding is defined as follows: The dictionary starts with all single Xcharacters. One string pc starting at each input position is added to Xthe dictionary, where p is in the dictionary before that character and Xpc is not. X XTo put it differently, ``yowie'' appears in the dictionary as soon as Xthe regular expression yo.*yow.*yowi.*yowie matches the text. It is Xadded at the final e. So yabbadabbadabbadoo leads to the dictionary ya, Xab, bb, ba, ad, da, abb, bba, ada, dab, abba, bbad, bado, ado, oo. X XWe haven't defined the coding yet. While we build the dictionary, we Xkeep track of a current longest match (initially empty). Right after Xreading an input character and before integrating it into the Xdictionary, we see whether the longest match plus that character is Xstill in the dictionary *constructed before the start of that longest Xmatch*. If so, we use that as the new longest match, and proceed. XOtherwise, we output the number of the longest match, and set the Xlongest match to just that new character. X XTo decode, we read these matches, one by one. We take each one as a Xsequence of characters as defined by the dictionary, and output that Xsequence. Meanwhile we take each character and feed it as input to the Xdictionary-building routine. X X X--- A comprehensible example X XWe can run through the string keeping track of dictionary matches, as Xfollows: X X I add current matches X abcabcabcabcabcabcabcx a X bcabcabcabcabcabcabcx ab b X cabcabcabcabcabcabcx bc c X abcabcabcabcabcabcx ca a X bcabcabcabcabcabcx ab b X cabcabcabcabcabcx abc bc c X abcabcabcabcabcx bca ca a X bcabcabcabcabcx cab ab b X cabcabcabcabcx _____ abc bc c X abcabcabcabcx abca / \ \___ bca ca a X bcabcabcabcx bcab cab ab b \____ \_/ X cabcabcabcx cabc abc bc c \__/ X abcabcabcx abca bca ca a X bcabcabcx abcab bcab cab ab b X cabcabcx bcabc cabc abc bc c X abcabcx cabca abca bca ca a X bcabcx ,-----------------abcab bcab cab ab b X cabcx abcabc `--bcabc cabc abc bc c X abcx bcabca cabca abca bca ca a X bcx cabcab abcab bcab cab ab b X cx abcabc bcabc cabc abc bc c X x abcabcx x X bcabcx X cabcx X abcx X bcx X cx X XThe strings on the right side are the current matches-so-far after each Xinput character. We start with no matches. When we read a character, we Xappend it to each match so far; if the results are in the dictionary, we Xmake them the new matches, and add the character as a match by itself. XIf any of them are not in the dictionary we take them out of the list Xand add them to the dictionary. X XBefore reading the fifth c, for example, the matches are bcab, cab, ab, Xand b. We append c to each one: bcabc doesn't match, so we add it to the Xdictionary. The rest are still in the dictionary, so our new list is Xcabc, abc, bc, and a lone c. And so on. X XWhen the x is read, the current list is abcab bcab cab ab b. Now none of Xabcabx bcabx cabx abx bx are in the dictionary, so we add all of them, Xand the list becomes just a single x. X X X--- The crucial observation X XIt is not clear so far that Y is a linear-time algorithm: each input Xcharacter demands additions to several matches. However, *every Xsubstring of a dictionary string is in the dictionary*. This is clear Xfrom the regular-expression characterization of dictionary strings: if Xyo.*yow.*yowi.*yowie matches the text, then ow.*owi.*owie (for example) Xcertainly does as well. X XSay the current match list is wonka onka nka ka a, and we read the Xcharacter x. The next list has to be one of the following: X X wonkax onkax nkax kax ax x X onkax nkax kax ax x X nkax kax ax x X kax ax x X ax x X x X XIf onkax matches, for example, then nkax must match as well, and so must Xkax and ax and x. X XSo the match list will always consist of one string and its suffixes. XHence we can keep track of just the longest string in the match list. XFor example, with the input string oompaoompapaoompaoompapa: X X input test add match X o o X o oo oo o X m om om m X p mp mp p X a pa pa a X o ao ao o X o oo oo X m oom oom om X p omp omp mp X a mpa mpa pa X p pap pap X ap p X a pa pa X o pao pao ao X o aoo aoo oo X m oom oom X p oomp oomp omp X a ompa ompa mpa X o mpao mpao X pao ao X o aoo aoo X m aoom aoom oom X p oomp oomp X a oompa oompa ompa X p ompap ompap X mpap pap X a papa papa X apa pa X X X--- A comprehensible definition X XHere is another definition of how to construct the Y dictionary, based Xon the chart above. X X 0. Start with the dictionary mapping every character of A to a unique X integer. Set m to the empty string. X 1. Append the next character c of I to m. X 2. If m is not in the dictionary, then add m to the dictionary, remove X the first character of m, and repeat this step. X 3. Repeat from step 1 until the input is exhausted. X XWe must be careful in defining output: the output is not synchronized Xwith the dictionary additions, and we must be careful not to have the Xlongest output match overlap itself. To this end we split the dictionary Xinto two parts, the first part safe to use for the output, the second Xpart not safe. X X 0. Start with S mapping each single character to a unique integer; X T empty; m empty; and o empty. (S and T are the two parts of the X dictionary.) X 1. Read a character c. If oc is in S, set o to oc; otherwise output X S(o), set o to c, add T to S, and remove everything from T. X 2. While mc is not in S or T, add mc to T (mapping to the next X available integer), and chop off the first character of m. X 3. After m is short enough that mc is in the dictionary, set m to mc. X 4. Repeat as long as there are enough input characters (i.e., until X step 1 fails). X 5. Output S(o) and quit. X XHere's how to decode: X X 0. Initialize (D,m) as above. X 1. Read D(o) from the input and take the inverse under D to find o. X 2. As long as o is not the empty string, find the first character c of X o, and update (D,m) as above. Also output c and chop it off from X the front of o. X 3. Repeat from step 1 until the input is exhausted. X XThe coding only requires two fast operations on strings in the Xdictionary: testing whether sc is there if s's position is known, and Xfinding s's position given cs's position. The decoding only requires the Xsame operations plus finding the first character of a string given its Xspot in the dictionary. X X X--- Some properties of Y coding X XWe propose ``per-phrase dictionary'' to describe the dictionaries Xconstructed by LZW and MW, ``per-character dictionary'' for AP and Y. XWe also propose ``phrase-increment'' to describe MW and AP, X``character-increment'' for LZW and Y. More precisely, a per-phrase Xdictionary is one which contains one string per output phrase, while a Xper-character dictionary contains one string per input character. Each Xstring in a character-increment dictionary is exactly one character Xlonger than a string added at a previous step; in contrast, a string in Xa phrase-increment dictionary can be much longer than any of its Xprefixes formerly in the dictionary. According to this terminology, XY is an exact character-increment per-character dictionary compressor. X XY has a similar feel to LZJ, which has a dictionary consisting of all Xunique substrings up to a certain length in a block of the input. XHowever, Y can adapt much more effectively to redundant text. X X X X----- 5. Results X XThe author has implemented Y coding [2] along with, for instruction and Xamusement, AP coding. In this section we survey various implementation Xissues and compare the effectiveness of the coding methods explained Xhere. X XThe author's implementation, yabbawhap, uses a huptrie [3] for the Xdictionary. yabba keeps track of the separate dictionaries S and T for Y Xcoding by remembering the highest position within D that is ``safe'' Xinside S; all higher positions are in T. X Xyabbawhap uses compress's strategy of adaptive blocking. The user may Xvary most aspects of compression dynamically, including two parameters Xof the heuristic used to control blocking. X XHere are results of these methods on the Bell-Witten-Cleary text corpus, Xalong with a highly redundant 180489-byte ``makelog'' added by the Xauthor to show an extreme case. X X __bib__book1__book2____geo_makelog___news___obj1___obj2_paper1_paper2_ XZ12 54094 394419 325147 78026 34757 230765 16528 160099 29433 40881 XZ14 46817 357387 281354 77696 25647 202594 14048 138521 25077 37196 XZ16__46528_332056_250759__77777___25647_182121__14048_128659__25077__36161_ XMW___41040_336793_230862__80296____8299_168652__13273_109266__22749__34234_ XAP2 47056 389702 297205 84582 20925 219665 13824 134547 26937 39415 XAP6 40770 338046 261270 79471 14762 190502 13824 123323 22413 34637 XAP3__40311_322178_228978__80106____8821_167896__13825_113296__22414__33320_ XY2 46882 363339 287110 80817 28159 212617 13858 141783 26131 38037 XY6 40874 320622 256578 76275 21220 185097 13858 125900 22452 33671 XY3___40456_306813_229851__76695___14411_168287__13859_114323__22453__32733_ X X paper3_paper4_paper5_paper6_____pic__progc__progl__progp__trans_ XZ12 23567 7091 6670 22333 66236 21825 31987 22936 46185 XZ14 22163 6957 6580 18695 63277 19143 27116 19209 39605 XZ16__22163___6957___6580__18695___62215__19143__27148__19209__38240_ XMW___21495___6697___6169__16899___65102__16976__22223__15095__27742_ XAP2 22293 6595 6146 19770 74061 18868 27191 17962 38781 XAP6 20869 6595 6146 16786 68349 16691 22716 15138 30415 XAP3__20870___6596___6147__16787___67980__16692__22451__15139__28056_ XY2 21609 6443 6033 19418 71952 18897 27607 19429 40444 XY6 20355 6443 6033 16677 66117 17063 23625 16616 33026 XY3___20356___6444___6034__16678___65377__17064__23512__16617__31300_ X XZ12 is LZW with 12 bits (i.e., a dictionary of at most 4096 strings), Xusing compress -b12. MW is MW using the author's squeeze program [4], Xwith a dictionary of at most 300000 strings (roughly equivalent to X1500000 input characters). AP2 is AP with an input block size of 21000, Xusing whap -m21000; AP6 has a block size of 65533, and AP3 has a block Xsize of 300000. Y is like AP but using yabba. X XY is notably effective upon book1 and news. X XXXX what else do people want to see in this section? X X X X----- 6. Conclusion X X--- Life goes on X XY coding is not much more complex than LZW coding and produces generally Xbetter results. It is one of the most effective non-Huffman-based XLZ78-style compressors. X X X--- How to achieve fame and fortune X XIt may be possible to implement Y so that the decoding does not have to Xdo all the work of the coding. In particular, three-fourths of the Xdictionary strings are typically not used at all; they should not have Xto be handled during decoding. Even if Y cannot be sped up, there is Xprobably some character-increment per-character dictionary compressor Xthat achieves similar results to Y but runs as quickly as LZW or AP. XThis is a area for future research. X XWe have not discussed different ways to code the output numbers. Huffman Xcoding and arithmetic coding both produce quite respectable improvements Xover and above the compression of the basic Y method. Little is known Xabout the best way to code a sequence from a dynamically expanding Xalphabet; it is a bit counterintuitive that semiadaptive Huffman coding Xupon an output Y sequence can produce worse results than adaptive XHuffman coding. X XIt would be interesting to compare a straight modeller with Y plus a XHuffman coder to see which produces better results. X X X--- Acknowledgments X XThe author would like to thank James Woods for his support and Xencouragement, as well as for many copies of papers; Richard Stallman, Xfor motivating the author to find Y coding; and Herbert J. Bernstein for Xgeneral comments and for suggesting the order of presentation of this Xmaterial. X X X--- So who are these LZWMWAPY people anyway? X XLZW stands for Lempel, Ziv, Welch. In 1977 Ziv and Lempel discovered the Xfirst in what are now called the LZ family of dictionary compressors. XThe original LZ algorithms were like LZW, but transmitted both p and c Xand then skipped past pc. They also started with a dictionary of just Xthe null string. (For detailed surveys of various LZ methods, the reader Xshould consult [1], [5], [7], [11].) Welch popularized the LZ variant, Xnow called LZW, in which the extra character was not transmitted but Xbecame the first character of the next phrase. X XMiller and Wegman independently discovered several LZ variants in 1984, Xincluding LZW and MW. In fact, Seery and Ziv had in 1978 [6] proposed an XLZ variant where two adjacent phrases were concatenated; this is related Xto MW in approximately the same way that LZ is related to LZW. (Seery Xand Ziv also introduced an important improvement for binary alphabets: Xif 01101 and 01100 are in the dictionary, there is no way that 0110 can Xpossibly be a longest match, so it can effectively be removed from the Xdictionary.) X XAP stands for ``all prefixes.'' It was originally discovered by Storer Xin 1988 (?) and independently rediscovered by this author in 1990. X XY actually does stand for yabbadabbadoo. The author discovered Y coding Xon December 26, 1990; he used yabbadabbadabbadoo as an example in Xexplaining the method to Woods the next day. Woods replied [10] ``I'll Xhave to look at your yabbadabbadoo code again to see how it differs from XLZR.'' The author promptly adopted ``Y coding'' as the official name. X(Ross Williams has suggested [12] that ``LZDB coding'' might be more Xappropriate.) X X X--- References X XYeah, yeah, I'm still writing up proper citations. XXX X X[1] Bell, Witten, Cleary, compression article, 1989 X[2] Bernstein, yabbawhap program, 1990-1991 X[3] Bernstein, huptrie article, 1990 X[4] Bernstein, squeeze program, 1989 X[5] Miller and Wegman, compression article, 1987 X[6] Seery and Ziv, LZ78 extensions article, 1978 X[7] Storer, compression book, 1988 X[8] Woods et al., compress program, 1985 X[9] Woods, private communication, 1990 X[10] Woods, private communication, 1990 X[11] Williams, compression book, 1991 X[12] Williams, private communication, 1991 END_OF_FILE if test 31910 -ne `wc -c <'ycoding.4b'`; then echo shar: \"'ycoding.4b'\" unpacked with wrong size! fi # end of 'ycoding.4b' fi echo shar: End of archive 1 \(of 4\). cp /dev/null ark1isdone MISSING="" for I in 1 2 3 4 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 4 archives. rm -f ark[1-9]isdone else echo You still must unpack the following archives: echo " " ${MISSING} fi exit 0 exit 0 # Just in case...