Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site looking.UUCP
Path: utzoo!watmath!looking!brad
From: b...@looking.UUCP (Brad Templeton)
Newsgroups: net.news
Subject: What a good news compressor should do
Message-ID: <138@looking.UUCP>
Date: Tue, 20-Mar-84 00:00:00 EST
Article-I.D.: looking.138
Posted: Tue Mar 20 00:00:00 1984
Date-Received: Wed, 21-Mar-84 03:35:38 EST
References: <149@forcm5.UUCP>
Organization: Looking Glass Software, Waterloo, Ont
Lines: 21

A good news compressor would save a lot of people a lot of money, at the
expense of a fair amount of CPU time.  Great for VAXes with CPU to burn at
night; not such a good idea for my Onyx.

There are various Huffman encoders out there; ask around for one in the
public domain and I am sure you will find one.  They're pretty easy to
write.  But a news encoder could do a lot better than a typical Huffman
coder.

First of all, the news header can be reduced a great deal in size.  Header
prefixes, such as "Relay-Version", could be reduced to almost nothing.
Dates and such can be reduced easily.

Then, if you know you are compressing English text, you can do a lot by
keeping a small dictionary of words - say the most common 1000 - which you
keep in RAM.  Of course, everybody has to keep the same list.  You could
add a few hundred common site names too.

Overall, I could see this cutting the news phone bill down to 1/3 of what
it is now.  If your site pays for news, it is worth your time and the
salary you get for it to write such code for your company!
--
Brad Templeton - Waterloo, Ontario (519) 886-7304
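[A minimal sketch of the shared-dictionary scheme Brad describes, under the
assumption both sides hold an identical word list.  The tiny word list,
the escape byte, and the function names here are illustrative inventions,
not part of any real news batcher:]

```python
# Sketch: both sender and receiver keep the same word list, so a common
# word (or header prefix, or site name) travels as a 2-byte token instead
# of its letters.  COMMON is a stand-in; a real list would hold ~1000
# frequent English words plus header field names and site names.
COMMON = ["the", "and", "news", "Newsgroups", "utzoo", "Relay-Version"]
ESCAPE = 0xFF  # token marker; assumes raw text never contains this byte

ENCODE = {w: i for i, w in enumerate(COMMON)}

def shrink(text):
    out = bytearray()
    for word in text.split(" "):
        if word in ENCODE:
            out += bytes([ESCAPE, ENCODE[word]])  # 2-byte token
        else:
            out += word.encode("ascii")           # literal word
        out += b" "
    return bytes(out[:-1])

def expand(data):
    words = []
    for chunk in data.split(b" "):
        if chunk[:1] == bytes([ESCAPE]):
            words.append(COMMON[chunk[1]])
        else:
            words.append(chunk.decode("ascii"))
    return " ".join(words)
```

(A real coder would have to keep token bytes from colliding with the space
delimiter once the list grows past 32 entries; this sketch ignores that.)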
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ames-lm.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!harpo!seismo!hao!ames-lm!jaw
From: j...@ames-lm.UUCP (James A. Woods)
Newsgroups: net.news
Subject: Re: What a good news compressor should do
Message-ID: <186@ames-lm.UUCP>
Date: Fri, 23-Mar-84 19:26:37 EST
Article-I.D.: ames-lm.186
Posted: Fri Mar 23 19:26:37 1984
Date-Received: Sun, 25-Mar-84 10:20:09 EST
References: <149@forcm5.UUCP>, <138@looking.UUCP>
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 36

#  Fools ignore complexity.  Pragmatists suffer it.  Some can avoid it.
   Geniuses remove it.
      -- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. 1982

News compression to save phone costs is a fine idea whose time has come.
It's nice that we software types can do something about transmission rates,
rather than simply wait for higher-baud modems.

CPU time is NOT something to worry about -- for 100K-byte news batches,
Bell's 'pack' code consumes only 15-20 CPU seconds (11/750 running 4.2
BSD), certainly less time than the 'uucp' and 'rnews' overhead.
Berkeley's 'compact' is a different story, as it consumes an order of
magnitude more time.  This is because of its "on-line" algorithm (cf.
Gallager's Nov. 1978 paper in IEEE Trans. Info. Theory), which recomputes
the Huffman tables on the fly.  It's a sexy idea, in that a prepass to
gather entropy statistics is unnecessary, but it still loses to 'pack'.
The method may be useful when built into modem hardware, which works with
arbitrary-length data streams instead of files (does anyone know if this
is what makes the Chung modem tick?).

Both programs squeeze 35-40% from news.  It's agreed that the word entropy
of English is higher (a very ad hoc compressor (50-60% for text) was
published in Byte a few years back).  Certainly headers can shrink by
adding field names to the dictionary.
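[For contrast with 'compact''s on-line method, here is a rough sketch of
the two-pass scheme 'pack' uses: a prepass counts symbol frequencies, a
Huffman tree is built once, and the data is then encoded against that
fixed table.  The real 'pack' file format and header are not reproduced;
this only illustrates the algorithm:]

```python
# Two-pass (static) Huffman sketch: count frequencies first, build the
# code table once, then encode.  Heap entries carry a unique tiebreak
# integer so equal frequencies never fall through to comparing dicts.
import heapq
from collections import Counter

def huffman_codes(data):
    heap = [(n, i, {sym: ""})
            for i, (sym, n) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol input
        return {next(iter(heap[0][2])): "0"}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two lowest-frequency trees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]                       # {symbol: bit-string}

def encode(data, codes):
    return "".join(codes[b] for b in data)
```

The prepass is why 'pack' needs a whole file up front, and why the
stream-oriented adaptive method is the one that could live inside a modem.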
But why not start out with the standard, speedy, debugged 'pack'/'unpack',
especially since it would add only a few lines (fewer, if done at the
shell level) to the (un)batcher software?  A public-domain 'pack' was done
by Steve Zucker of Rand in the late 70's; Univ. of New South Wales made
mods for distribution by Usenix.

I'm currently wondering why the Bell version takes longer to unpack than
to pack, especially when decoding is conceptually simpler than encoding.
--
James A. Woods
{dual,hplabs,hao,research}!ames-lm!jaw
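[A sketch of the shell-level hookup James suggests: the sending batcher
pipes its batch through the compressor before handing it to uucp, and the
receiving side expands before rnews.  'pack'/'unpack' would be the 1984
tools; gzip stands in here only so the pipeline is runnable, and the file
names are illustrative:]

```shell
# Sending side: compress the outgoing batch (stand-in for 'pack').
echo "Path: utzoo!watmath!looking!brad" > batch
gzip -c batch > batch.z

# Receiving side: expand before feeding rnews (stand-in for 'unpack').
gzip -dc batch.z > batch.out
cmp -s batch batch.out && echo "round trip OK"
```

Two extra pipeline stages like these are the "few lines" in question;
everything else in the batcher stays untouched.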