Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!dcdwest!ittvax!decvax!genrad!mit-eddie!godot!harvard!seismo!hao!ames!jaw
From: j...@ames.UUCP (James A. Woods)
Newsgroups: net.unix-wizards
Subject: Lempel-Ziv data compression -- from now on, consider it standard.
Message-ID: <750@ames.UUCP>
Date: Sun, 6-Jan-85 00:24:26 EST
Article-I.D.: ames.750
Posted: Sun Jan 6 00:24:26 1985
Date-Received: Tue, 8-Jan-85 04:44:11 EST
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 58

#  We will release no code before its time.  -- USENET compression group.

These annals have already witnessed a discussion of the elegant and very
effective LZ algorithm for general-purpose data compression, as embodied
in 'compress'.  Yes, the one which provides 15-25% better compression than
either the Bell 'pack' or the UCB 'compact', is speed-competitive with
'pack', and is an order of magnitude faster than 'compact'.  Only now it
is in actual use on many machines, including all manner of VAXen, 68000
boxes, PDP-11s, Perkin-Elmer machines, and even lowly IBM PCs.  Compress
is an integral part of news batching over various 2.10.2 news links and
is now, I understand, used by Lauren Weinstein for his admirable satellite
'stargate' experiment.

On behalf of the internationally distributed implementors' group, I
advertise the compression software (version 3.0) now appearing in
mod.sources.  The RCS notes will indicate that principal contributors
include Spencer Thomas of Utah, Jim McKie of Amsterdam, Joe Orost of
Perkin-Elmer, Ken Turkowski of CADLINC, and a certain person with rather
mandibular initials.  I'm a little surprised that this communal effort has
worked well, and at the same time admit that the code might have come out
differently had it been the product of a single organization.  Berkeley
would do well to pick this one up, and although one Bell Labs staffer on
our mailing list known to be involved with an LZ program chose not to
provide input, you can be sure that at least Murray Hill knows of the
superior nature of this method.  Well, Bell, you just can't move fast
enough anymore...

The current public offering is quite a bit different from the original
release in its basic approach, resulting in a 30-40% speed gain.  Chaining
has given way to open-address hashing, table lookup has proven effective
in several places in the code (especially for caching the output codes of
low-density raster images), "block" compression has shown its worth on
long files, and the diagnostics have been made more robust.  (A toy
rendition of the hashing probe appears after the postscript, for the
curious.)  It is high time to stamp out those obsolete Huffman coders
(are you listening, Kermit packet compactors?).  I would not be surprised
if Macintosh files could benefit from this algorithm, and indeed,
statistics posted to net.graphics show an advantage in applying LZ coding
after run-length and quadtree encoding.

Again, thank you all, and especially Abraham Lempel and Jacob Ziv.

--
James A. Woods    {hplabs,hao,ihnp4}!ames!jaw    (jaw@riacs)

P.S.  The new version is compatible with the old, in that older-format
files can be decompressed with the new block algorithm.  However, sites
which currently do compressed news batching should remember to add the -C
flag to their csendbatch script concurrent with installing 3.0, to be
removed (for increased efficiency) once ALL feeds from the site have
installed the new version.
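For those wondering how one decompressor swallows both formats, the
three-byte file header tells the story.  The fragment below is a sketch
only, not the distribution source (the function name and the error
handling are mine), but the layout it reads -- two magic bytes, then a
byte carrying the maximum code size in its low bits and a block-compression
flag in its high bit -- is the form the new compress writes:

/*
 * Sketch only -- not the distribution source; the function name and the
 * error handling are mine.  A version-3 decompressor can accept both old
 * and new ".Z" files by inspecting the three-byte header: two magic
 * bytes, then a byte whose low five bits give the maximum code size and
 * whose high bit marks "block" compression.  Old-format files simply
 * leave that bit clear, so the same routine decodes them.
 */
#include <stdio.h>

#define MAGIC_0     0037        /* first magic byte                       */
#define MAGIC_1     0235        /* second magic byte                      */
#define BIT_MASK    0x1f        /* low bits: maximum code length, in bits */
#define BLOCK_MASK  0x80        /* high bit: block compression in effect  */

int
check_header(fp, maxbits, block_compress)
FILE *fp;
int *maxbits, *block_compress;
{
    int b0, b1, b2;

    b0 = getc(fp);
    b1 = getc(fp);
    b2 = getc(fp);
    if (b0 != MAGIC_0 || b1 != MAGIC_1 || b2 == EOF)
        return -1;                          /* not compressed data        */
    *maxbits = b2 & BIT_MASK;               /* e.g. 12 on a PDP-11, 16 on a VAX */
    *block_compress = b2 & BLOCK_MASK;      /* zero for old-format files  */
    return 0;
}

An old-format file simply arrives with the block bit clear, so the same
decoder accepts it; the -C flag just holds the new compress to that older
form until every downstream feed has caught up.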
On our machines at Ames, we have also rigged 'pack' and 'compact' to act as "diodes" which decompress only -- compress is THAT good. We have shrunk typical text in half, statistical data by a factor of three, and sparse databases by a factor of eight to ten.
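As promised above, here is a toy rendition of the open-address hashing
that replaced the chained string table.  It is an illustration, not the
mod.sources code: HSIZE, the key packing, and the probe arithmetic are
simplified stand-ins.  The shape is the point -- one flat pair of arrays,
a primary hash, and a secondary displacement when a slot is occupied,
with no chains to allocate or walk:

/*
 * Toy illustration only -- not the code posted to mod.sources.  The
 * string table maps a (prefix code, next byte) pair to the code that was
 * assigned to that string.  Open addressing keeps everything in two flat
 * arrays, probed in place, instead of hash chains.  HSIZE, the key
 * packing, and the hash itself are stand-ins here.
 */
#define HSIZE   5003                    /* a prime comfortably above 2^12 */
#define EMPTY   (-1L)

static long htab[HSIZE];                /* packed (prefix << 8 | byte) keys;
                                           flooded with EMPTY before use  */
static unsigned short codetab[HSIZE];   /* code assigned to each stored pair */

/*
 * Return the code for the pair (prefix, c), or -1 if it is absent, in
 * which case *slot is the open position where the caller may insert it.
 * The real program stops adding entries (or clears the table) before it
 * can fill, so the probe always terminates.
 */
int
probe(prefix, c, slot)
int prefix, c, *slot;
{
    long key = ((long) prefix << 8) | c;
    int i = ((c << 4) ^ prefix) % HSIZE;        /* primary hash           */
    int d = (i == 0) ? 1 : HSIZE - i;           /* secondary displacement */

    while (htab[i] != EMPTY) {
        if (htab[i] == key)
            return codetab[i];                  /* string already in table */
        if ((i -= d) < 0)                       /* open-address rehash     */
            i += HSIZE;
    }
    *slot = i;
    return -1;
}

Roughly speaking, "block" mode empties this table again (after emitting a
reserved CLEAR code) whenever the compression ratio starts to sag, which
is what lets the program keep adapting over long files.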