Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site looking.UUCP
Path: utzoo!watmath!looking!brad
From: b...@looking.UUCP (Brad Templeton)
Newsgroups: net.news
Subject: What a good news compressor should do
Message-ID: <138@looking.UUCP>
Date: Tue, 20-Mar-84 00:00:00 EST
Article-I.D.: looking.138
Posted: Tue Mar 20 00:00:00 1984
Date-Received: Wed, 21-Mar-84 03:35:38 EST
References: <149@forcm5.UUCP>
Organization: Looking Glass Software, Waterloo, Ont
Lines: 21

A good news compressor would save a lot of people a lot of money, at
the expense of a fair amount of cpu time.  Great for vaxes with cpu
to burn at night, not a good idea for my Onyx.  There are various
huffman encoders out there, you could ask around for one in public domain,
I am sure you will find one.  They're pretty easy to write.

But a news encoder could do a lot better than a typical huffman.  First
of all, the news header can be reduced a great deal in size.  All Header
prefixes, such as "Relay-Version", could be reduced to almost nothing.
Dates and such can be reduced easily.

Then if you know you are talking about English text, you can do a lot
by keeping a small dictionary of words - say the most common 1000 which
you keep in ram.  Of course, everybody has to keep the same list.
You could add a few hundred common site names too.

Overall, I could see this cutting the news phone bill down to 1/3 of what
it is now.   If your site pays for news, it is worth your time and the
salary you get for it to write such code for your company!
-- 
	Brad Templeton - Waterloo, Ontario (519) 886-7304

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ames-lm.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!harpo!seismo!hao!ames-lm!jaw
From: j...@ames-lm.UUCP (James A. Woods)
Newsgroups: net.news
Subject: Re: What a good news compressor should do
Message-ID: <186@ames-lm.UUCP>
Date: Fri, 23-Mar-84 19:26:37 EST
Article-I.D.: ames-lm.186
Posted: Fri Mar 23 19:26:37 1984
Date-Received: Sun, 25-Mar-84 10:20:09 EST
References: <149@forcm5.UUCP>, <138@looking.UUCP>
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 36

#	Fools ignore complexity.  Pragmatists suffer it.
	Some can avoid it.  Geniuses remove it.
  
			-- Perlis's Programming Proverb #58,
		  	   SIGPLAN Notices, Sept. l982

     News compression to save phone costs is a fine idea whose time has come.
It's nice that we software types can do something about transmission rates,
rather than simply wait for higher baud modems.  CPU time is NOT something
to worry about -- for 100K byte news batches, Bell's 'pack' code only
consumes 15-20 CPU seconds (11/750 running 4.2 BSD) -- certainly less time
than the 'uucp' and 'rnews' overhead.

     Berkeley's 'compact' is a different story, as it consumes an
order-of-magnitude more time.  This is because of the "on-line" algorithm
(c.f. Gallager's Nov. 1978 paper in IEEE Trans. Info. Theory), which recomputes
the Huffman tables on-the-fly.  It's a sexy idea, in that a prepass to
gather entropy statistics is unnecessary, but it still loses to "pack".
The method may be useful when built into modem hardware, which works with
arbitrary-length data streams instead of files (does anyone know if this is
what makes the Chung modem tick?).  Both programs squeeze 35-40% from news.

     It's agreed that the word entropy of English is higher (a very ad hoc
compressor (50-60% for text) was published in Byte a few years back.)
Certainly headers can shrink by adding field names to the dictionary.
But why not start out with the standard, speedy, debugged "pack/unpack",
especially since it would add only a few lines (less, if done on the shell
level) to the (un)batcher software?

     A public domain 'pack' was done by Steve Zucker of Rand in the late
70's;  Univ. New South Wales made mods for distribution by Usenix.
I'm currently wondering why the Bell version takes longer to unpack than
to pack, especially when decoding is conceptually simpler than encoding.

	-- James A. Woods  {dual,hplabs,hao,research}!ames-lm!jaw