Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!dcdwest!ittvax!decvax!
genrad!mit-eddie!godot!harvard!seismo!hao!ames!jaw
From: j...@ames.UUCP (James A. Woods)
Newsgroups: net.unix-wizards
Subject: Lempel-Ziv data compression -- from now on, consider it standard.
Message-ID: <750@ames.UUCP>
Date: Sun, 6-Jan-85 00:24:26 EST
Article-I.D.: ames.750
Posted: Sun Jan  6 00:24:26 1985
Date-Received: Tue, 8-Jan-85 04:44:11 EST
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 58


#  We will release no code before its time. -- USENET compression group.

     These annals have already witnessed a discussion of the elegant
and very effective LZ algorithm for general-purpose data compression
as embodied in 'compress'.  Yes, the one that provides 15-25% better
compression than either the Bell 'pack' or the UCB 'compact', is
speed-competitive with 'pack', and is an order of magnitude faster
than 'compact'.
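
     For the newly curious, the kernel of the method fits on a page.
The sketch below is illustrative, not the compress source: it greedily
extends the current match one character at a time and assigns a new
code to each string it has not seen before.  The linear table scan and
the textual code output are simplifications -- the real program packs
variable-width binary codes and finds strings much faster, as described
further on.

    #include <stdio.h>

    #define MAXCODES 4096               /* 12-bit codes in this sketch */

    /* Every new string is an old string plus one character, so the
       table need only record (code of prefix string, final character). */
    static int prefix[MAXCODES];
    static int suffix[MAXCODES];
    static int ncodes = 256;            /* codes 0..255 are the bytes */

    /* return the code for (pfx, c), or -1 if not yet in the table;
       the linear scan is for clarity only */
    static int lookup(int pfx, int c)
    {
        int i;
        for (i = 256; i < ncodes; i++)
            if (prefix[i] == pfx && suffix[i] == c)
                return i;
        return -1;
    }

    int main(void)
    {
        int c, code, pfx = getchar();

        if (pfx == EOF)
            return 0;
        while ((c = getchar()) != EOF) {
            if ((code = lookup(pfx, c)) >= 0)
                pfx = code;             /* extend the current match */
            else {
                printf("%d ", pfx);     /* emit code (as text here) */
                if (ncodes < MAXCODES) {
                    prefix[ncodes] = pfx;   /* remember the new string */
                    suffix[ncodes] = c;
                    ncodes++;
                }
                pfx = c;
            }
        }
        printf("%d\n", pfx);            /* flush the last match */
        return 0;
    }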

     Only now it is in actual use on many machines, including all manner
of VAXen, 68000 boxes, PDP-11s, Perkin-Elmer machines, and even lowly
IBM PCs.  Compress is an integral part of news batching over various
2.10.2 news links and is now, I understand, used by Lauren Weinstein for
his admirable satellite 'stargate' experiment.

     On behalf of the internationally distributed implementors' group,
I advertise the compression software (version 3.0) now appearing in
mod.sources.  The RCS notes will indicate that principal contributors
include Spencer Thomas of Utah, Jim McKie of Amsterdam, Joe Orost of
Perkin-Elmer, Ken Turkowski of CADLINC, and a certain person with
rather mandibular initials.  I'm a little surprised that this communal
effort has worked so well, and at the same time admit that the code
might have come out differently had it been the product of a single
organization.
Berkeley would do well to pick this one up, and although one Bell Labs
staffer on our mailing list known to be involved with an LZ program
chose not to provide input, you can be sure that at least Murray Hill
knows of the superior nature of this method.  Well, Bell, you just
can't move fast enough anymore...

     The current public offering is quite a bit different from the
original release in its basic approach, resulting in a 30-40% speed gain.
Chaining has given way to open-address hashing, table lookup has
proven effective in several places in the code (especially for caching
the output codes of low-density raster images), "block" compression
has shown its worth on long files, and diagnostics have been made more
robust.
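
     To make the hashing change concrete, here is a sketch of an
open-address (double-hashing) string-table probe.  The table size,
key packing, and probe constants here are illustrative -- the real
code tunes them per machine -- but the point stands: a collision costs
a few array probes instead of a pointer chase down a chain.

    #include <stdio.h>

    #define HSIZE 5003                  /* a prime table size */
    #define EMPTY (-1L)

    static long keytab[HSIZE];          /* packed (char, prefix) keys */
    static int  codetab[HSIZE];         /* code assigned to each string */

    static void hinit(void)
    {
        int i;
        for (i = 0; i < HSIZE; i++)
            keytab[i] = EMPTY;
    }

    /* Probe for the string (pfx, c).  Return its code if present;
       otherwise claim the free slot for 'newcode' and return -1.
       (A full program would also clear the table when it fills.) */
    static int hfind(int pfx, int c, int newcode)
    {
        long key  = ((long) c << 16) | pfx;
        int  i    = ((c << 4) ^ pfx) % HSIZE;   /* first probe */
        int  step = (i == 0) ? 1 : HSIZE - i;   /* secondary step */

        while (keytab[i] != EMPTY) {
            if (keytab[i] == key)
                return codetab[i];              /* found it */
            if ((i -= step) < 0)                /* miss: rehash */
                i += HSIZE;
        }
        keytab[i]  = key;
        codetab[i] = newcode;
        return -1;
    }

    int main(void)
    {
        hinit();
        hfind('a', 'b', 256);                   /* enter "ab" as 256 */
        printf("%d\n", hfind('a', 'b', 0));     /* prints 256 */
        return 0;
    }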

     It is high time to stamp out those obsolete Huffman coders
(are you listening, Kermit packet compactors?).  I would not be surprised
if Macintosh files could benefit from this algorithm, and indeed,
statistics posted to net.graphics show an advantage in applying
LZ coding after run-length and quadtree encoding.
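
     That pipeline is easy to try on your own rasters.  The toy
run-length front end below emits (count, byte) pairs -- the pair format
and the 'rle' name are mine for illustration, not any standard -- and
its output is then handed to compress:

    #include <stdio.h>

    /* toy run-length coder: emit each run as a (count, byte) pair */
    int main(void)
    {
        int c, prev, count = 1;

        if ((prev = getchar()) == EOF)
            return 0;
        while ((c = getchar()) != EOF) {
            if (c == prev && count < 255)
                count++;
            else {
                putchar(count);         /* flush the finished run */
                putchar(prev);
                prev = c;
                count = 1;
            }
        }
        putchar(count);                 /* flush the final run */
        putchar(prev);
        return 0;
    }

as in:

    rle < image | compress > image.Z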

     Again, thank you all, and especially Abraham Lempel and Jacob Ziv.

     -- James A. Woods  {hplabs,hao,ihnp4}!ames!jaw  (jaw@riacs)

P.S.
     The new version is compatible with the old, in that older-format
files can be decompressed with the new block algorithm.  However, sites
that currently do compressed news batching should remember to add the
-C flag to their csendbatch script when they install 3.0, and to remove
it for increased efficiency once ALL feeds from the site have installed
the new version.

     On our machines at Ames, we have also rigged 'pack' and 'compact' to
act as "diodes" which decompress only -- compress is THAT good.  We have
shrunk typical text by half, statistical data by a factor of three, and
sparse databases by a factor of eight to ten.