Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!houxm!houxz!vax135!cornell!uw-beaver!tektronix!hplabs!ames!jaw
From: j...@ames.UUCP (James A. Woods)
Newsgroups: net.unix-wizards,net.research
Subject: Lempel-Ziv Compression Made Swift (method of J. A. Woods), part 1
Message-ID: <425@ames.UUCP>
Date: Thu, 26-Jul-84 00:39:02 EDT
Article-I.D.: ames.425
Posted: Thu Jul 26 00:39:02 1984
Date-Received: Sat, 21-Jul-84 03:39:50 EDT
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 109

#  "It's fresh squozen daily."

        -- official slogan of the Ames fast file finder
           (yet another data compression technique)

     Herein we report on a simple and fleet implementation of the LZW
algorithm now making the rounds on UNIX.  For equivalent compression rates
(2:1 for unix-wizards, 3:1 and more for numerical data), it runs at least
1.6 times faster for large files than the code offered by Spencer Thomas.
This is a C language comparison only; a greater speed differential would be
expected if both versions were written in machine language.  This method may
be impractical for output codes of length greater than 13 bits, though,
fortunately, the basic algorithm offers only diminishing returns beyond this.

     Like Mr. Thomas, I adapted the Welch implementation to a different data
structure, believing the hashing method reported in the June IEEE Computer
to be "too slow".  Essentially, the code below is a direct-access lookup of
the alternate string table pictured in the article.  There are 2 ** BITS
codes which serve as prefixes to be associated with any of 256 ASCII
characters.  For the typical BITS == 12, this looks unwieldy, but it is
actually workable on a 32-bit processor with the enhancements I describe.
A first pass took the form:

        short table [(1 << BITS) * 256];        /* 2 megabytes for BITS = 12 */

        squash ( )
        {
                register int K, i, p, prefix;
                register long code;

                bzero ( (char *) table, sizeof table ); /* clear 2 MB, but see below */
                for ( i = 0; i < 256; i++ )
                        table [i] = i;
                prefix = getchar ( );

                while ( (K = getchar ( )) != EOF ) {
                        code = (long) ((K << BITS) + prefix);
                        /* test for code in "string" table */
                        if ( (p = table [code]) != 0 )
                                prefix = p;
                        else {                          /* put code in table */
                                output ( prefix );
                                prefix = K;
                                if ( i < (1 << BITS) )  /* unless we ran out of codes */
                                        table [code] = (short) i++;
                        }
                }
                output ( prefix );                      /* last one */
        }

     The table initialization time (is bzero() the fastest way to do this in
C?) is disconcerting -- some 6 seconds of system time (2 MB) on our 11/750.
This is unacceptable for small files.  It turns out that, using a variant of
the sparse array initialization trick discussed by J. Bentley (CACM, 8/83),
we can associate a bit vector with the large code table, and save time by
testing a bit in this map before each read access to the table, and by
setting the bit at each write.  The macros provided in M. D. McIlroy's now
defunct V7 "spell" code can be used for these bit tests:

        #define SHIFT   5
        #define get(h)  (bits[h>>SHIFT] & (1<<((long)h&((1<<SHIFT)-1))))
        #define set(h)  (bits[h>>SHIFT] |= 1<<((long)h&((1<<SHIFT)-1)))

How's that for spiff?  It compiles into just a few VAX instructions and
beats the mod operator hands down.  If you give up, look at p. 47 in K & R.
Though probabilistic Bloom filtering for spelling error detection fades
away, the bit macros live on!  Integrating this into the fragment above is
omitted here for clarity, but a sketch follows.
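     For concreteness, here is one way the bit map might be folded into the
fragment above.  This is only a sketch, not the code timed below: the main()
wrapper, the printf() stand-in for Thomas's output() routine, and the exact
placement of the set() calls are assumptions of the illustration.

        #include <stdio.h>
        #include <string.h>             /* memset(); 4.2 BSD spells it bzero() */

        #define BITS    12
        #define SHIFT   5
        #define get(h)  (bits[(h)>>SHIFT] &  (1L<<((long)(h) & ((1<<SHIFT)-1))))
        #define set(h)  (bits[(h)>>SHIFT] |= (1L<<((long)(h) & ((1<<SHIFT)-1))))

        short table [(1 << BITS) * 256];        /* 2 MB for BITS = 12, never cleared */
        long bits [((1 << BITS) * 256) >> SHIFT]; /* one validity bit per entry: 128 KB */

        static void output (int code)           /* stand-in for Thomas's code emitter */
        {
                printf ("%d\n", code);
        }

        static void squash (void)
        {
                register int K, i, p, prefix;
                register long code;

                memset (bits, 0, sizeof bits);  /* clear 128 KB of bits, not 2 MB of table */
                for (i = 0; i < 256; i++) {     /* codes 0..255 are the single characters */
                        table [i] = i;
                        set (i);
                }
                prefix = getchar ();

                while ((K = getchar ()) != EOF) {
                        code = (long) ((K << BITS) + prefix);
                        /* an entry counts only if its validity bit is on */
                        if (get (code) && (p = table [code]) != 0)
                                prefix = p;
                        else {                  /* new string: emit prefix, maybe assign a code */
                                output (prefix);
                                prefix = K;
                                if (i < (1 << BITS)) {
                                        table [code] = (short) i++;
                                        set (code);
                                }
                        }
                }
                output (prefix);                /* last one */
        }

        int main (void)
        {
                squash ();
                return 0;
        }

The bit map is 1/16 the size of the table, so even clearing it is cheap
compared with the 6 seconds quoted above.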
     Another subtle point:  the indexing

        code = (long) ((K << BITS) + prefix);

was originally, in keeping with the spirit of Welch's alternate string table
exposition,

        code = (long) ((prefix << BITS) + K);

But we discovered that (on a virtual memory machine, at least) the first form
easily halves (!) system and user time (and space) for text files, since, in
the absence of parity-marked characters, the array accesses cluster neatly in
the top half (one meg here) of the array.  Just for a lark, I also wired in
special-case code to keep prefix / character 040 (space) pairs in their own
fast storage, since net English [sic] (and also numerical tables) runs about
15% space; a sketch of what that might look like follows the timings below.
With these modifications, and using Thomas's exact output() routine but with
a doubled fwrite() buffer size, the following obtains:

        Input:    200 KB unix-wizards
        Machine:  VAX 11/750, BSD 4.2

        program                    user     system     %
                                   time     time       compression

        pack                       15.2       4.4      34.0    Sys V.1 port
        compress (16 bit)          45.2      13.2      59.6
        compact                   221.2       4.9      34.0
        compress (12 bit)          35.0       1.4      43.1    assembler output()
        compress (13 bit)          35.7       1.8      48.3        function
        squash (12 bit)            18.4       2.9      43.1           "
        squash (13 bit)            19.1       4.5      48.3           "

     Comments, largely of a philosophical nature, are continued in part 2 ...
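     (Before moving on: the special-case storage for space pairs was not
shown above, so the fragment below is only a guess at its shape.  The
space_table[] name and its placement at the top of the while loop in the
earlier squash() sketch are assumptions; only the 15%-space observation is
the author's.)

        #define SPACE   040                     /* ASCII space */

        short space_table [1 << BITS];  /* 8 KB: one slot per prefix code; clear it
                                           alongside the bit map, cheaply */

        /* inside the while loop, before the general 2 MB table lookup: */

                if (K == SPACE) {
                        if ((p = space_table [prefix]) != 0)
                                prefix = p;     /* (prefix, ' ') already has a code */
                        else {
                                output (prefix);
                                if (i < (1 << BITS))
                                        space_table [prefix] = (short) i++;  /* index by the OLD prefix */
                                prefix = SPACE;
                        }
                        continue;               /* skip the big table entirely */
                }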
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!houxm!houxz!vax135!cornell!uw-beaver!tektronix!hplabs!ames!jaw
From: j...@ames.UUCP (James A. Woods)
Newsgroups: net.unix-wizards,net.research
Subject: Lempel-Ziv Compression Made Swift, part 2
Message-ID: <426@ames.UUCP>
Date: Thu, 26-Jul-84 00:40:42 EDT
Article-I.D.: ames.426
Posted: Thu Jul 26 00:40:42 1984
Date-Received: Sat, 21-Jul-84 03:40:34 EDT
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 83

#  "They squeeze the orange and throw away the skin."

        -- Voltaire, 1751 (letter referring to a quarrel
           with Frederick the Great)

     Last episode, I remarked on an alternative implementation of LZ coding,
proposing a rather memory-intensive structure that is inherently faster than
hashing or linked lists.  The tests, by the way, were done on a medium-loaded
(10 users) 4 megabyte 11/750 running 4.2 BSD.  For all of the program runs,
there were only a few (< 7) page faults across all compressor types.  Lesson:
don't let the high I/O usage stats reported by the "time" command scare
you -- the bottom line is real time.

     What is the use of FAST compression, you say?  Phone lines only run at
9600 baud or so.  We've got a > 50% method in LZW, so why mess with it?
Good questions.  But there are applications beyond squeezing blood out of
net.flame over modems.  Even here, Thomas's code won't work on small address
space machines for N > 12.  This brings up the point that maybe Lempel-Ziv
shouldn't be wired down into a utility until it works for the PDP 11 (or
IBM PC/XT!) uucp sites at reasonable rates.  Maybe someone should look into
hashing with open addressing instead of memory-consuming chaining -- there
are lots of interesting time/space tradeoffs to investigate (a rough sketch
of the idea is appended after part 2).  I rejected hashing initially because
of a slow malloc(), and because computing good hash functions in software
generally requires more time than scanning characters or links (cf. the
discussion of hashing vs. tries [for that nifty "on-line" spelling checker]
in Aho/Ullman's Data Structures and Algorithms).  But "tries" have their own
problems ...

     Anyway, I think there's a niche for fast code, and, at a minimum, I
intend to #ifdef my speedups into Spencer's code along with mcvax!jim's
portability mods (oops, I hope he used masks rather than assuming the
existence of a barrel shifter), so a VAX (or 68K/32032/RISC) can communicate
with the dinosaurs at BITS = 12.  While we're at it, why not use the Berkeley
"compact" command line interface (filename extensions for friendly in-place
file compression in addition to pipe capability), and its name (we all agree
that the compact algorithm is a turkey)?  That way, we don't need more manual
pages for compress/squeeze/squish/squash -- just do surgery on compact's guts
and give it back to UCB.  Also, if we're going to let users play with
compression rates, that magic number had better carry the BITS parameter, for
idiot-proof decompression.  As for benchmark files, try the 200K
/usr/dict/words, or the System V version with the extra "carburetor" but no
"tofu".

     Some other applications:

     (1)  Macintosh screen compression.  Run-length coding accessed from the
Toolbox gets pictures down to 10K or so, but maybe some form of LZ can
postprocess this stuff even further for those tiny diskettes.

     (2)  Backup tapes.  Welch reports 2-3X savings system-wide.  Let's see:
"tar/dump" goes about 3 meg / minute = 50K bytes / second at 6250 density.
Hmm, we don't have a time payoff unless we're running a mainframe mismatched
with slow tapes.  But it'd still be fun to wire an assembly-coded version
(no memory contention running single-user, after all) into dump/restore.
One tape is still better than two, albeit sluggish.  I predict the LZ method
will be put (invisibly) into tape controllers by the Hardy boys.

     (3)  Scientific data.  Lempel-Ziv buys 3:1 for ANOVA columnar
statistics, and certain random-access data bases (containing mostly zeroes)
run at 8:1!  We never did want those removable drives the division is
pushing, anyway.

     (4)  Slow networks.  What does 4.2 TCP/IP over Ethernet do -- 50K bytes
/ second?  Or have that Cray/Denelcor UNIX ship compressed data to its
frontend, or to its slow disks.

--  James A. Woods   {dual,hplabs,hao}!ames!jaw   (j...@riacs.ARPA)

P.S.  Data compression has obsessed me since my days as a kid in grad school,
where I did music coding (adaptive transform coding on an 11/40 with no
floating point, running UNIX Version (not System) 5).  If you're going for
optimality, try Chaitin/Kolmogorov universal machine complexity theory
(except it's worse than NP-complete, being undecidable).  Some variants of
linear LZ are indeed NP-complete (Storer/Szymanski [author of pack?!?] at
Bell Labs, are you there?).  But for practicality, hats off to Lempel and
Ziv.  For the first time in 20-odd years, we have something to beat the
pants off of Huffman codes (perhaps the most famous master's thesis of 1952).
In a way, it's like besting Bob Beamon's long jump.  Tres recherche!
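     As a rough illustration of the open-addressing idea floated in part 2
(an editorial sketch only: the 5003-entry table, the hash, and the printf()
stand-in for output() are invented for the example, and nothing here was
benchmarked above), the (prefix, character) pairs can be probed linearly in
a table small enough for a 16-bit data space:

        #include <stdio.h>

        #define BITS      12
        #define MAXCODE   (1 << BITS)
        #define HASHSIZE  5003          /* a prime comfortably above MAXCODE */
        #define EMPTY     (-1L)

        static long  keys  [HASHSIZE];  /* packed (prefix << 8 | character), or EMPTY */
        static short codes [HASHSIZE];  /* code assigned to that pair */

        /* linear probing: return the slot holding key, or the empty slot where it belongs */
        static int probe (long key)
        {
                int h = (int) (key % HASHSIZE);

                while (keys[h] != EMPTY && keys[h] != key)
                        if (++h == HASHSIZE)
                                h = 0;
                return h;
        }

        int main (void)
        {
                int K, prefix, next = 256, slot, i;
                long key;

                for (i = 0; i < HASHSIZE; i++)
                        keys[i] = EMPTY;        /* ~20 KB to clear, not 2 MB */

                if ((prefix = getchar ()) == EOF)
                        return 0;
                while ((K = getchar ()) != EOF) {
                        key = ((long) prefix << 8) | K;
                        slot = probe (key);
                        if (keys[slot] != EMPTY)
                                prefix = codes[slot];   /* string already has a code */
                        else {
                                printf ("%d\n", prefix);        /* stand-in for output() */
                                if (next < MAXCODE) {           /* table never fills HASHSIZE */
                                        keys[slot] = key;
                                        codes[slot] = (short) next++;
                                }
                                prefix = K;
                        }
                }
                printf ("%d\n", prefix);
                return 0;
        }

Whether this beats the 2 MB direct lookup on a VAX is exactly the sort of
time/space tradeoff part 2 suggests investigating.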