Anyone want to gamma-test a compressor?

Path: gmdzi!unido!mcsun!uunet!samsung!think.com!hsdndev!cmcl2!
kramden.acf.nyu.edu!brnstnd
From: brns...@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: gnu.misc.discuss
Subject: Anyone want to gamma-test a compressor?
Message-ID: <812:Feb1720:37:5291@kramden.acf.nyu.edu>
Date: 17 Feb 91 20:37:52 GMT
Organization: IR
Lines: 106

I now have version 0.95 of a compression package implementing AP and Y
coding, two non-Huffman LZ78-style dictionary coders. Both compressors
produce somewhat smaller output than compress. AP runs at basically the
same speed as compress. Y is quite a bit slower but appears to be new
(and hence unpatented). Enclosed are BLURB and PATENTS from the package.

I plan to send the entire package to comp.sources.misc by the end of the
month. In the meantime, I'm looking for gamma testers. Gamma test means
the package has survived beta test and I think it's reasonably complete,
but it still needs more portability testing, and it's not too late to
add features if I've missed any. If you're interested in the package and
you're willing to put in some time testing it and telling me what you
think, send a note to brns...@nyu.edu.

---Dan

This package includes yabba and unyabba, portable implementations of Y
coding. Y coding is unpatented---unlike the LZW coding used by compress
and the LZ78 variants used by most other dictionary compressors. Y also
gives somewhat better compression than LZW.

As an extra bonus, this package also includes whap and unwhap, portable
implementations of AP coding. AP coding is patented, but like Y it is
more effective than LZW. whap runs about as fast as compress and, in its
default configuration, uses less memory; unwhap runs as fast as
uncompress and uses less than half as much memory.

The entire package is public-domain. You can copy my Y code into any
application without fear of copyrights or patents. Be aware, however,
that the current implementation is probably not too close to optimal,
and I am actively searching for a better method. If AP were not patented
I would recommend that you use it instead. For more information see
PATENTS in the package.

PATENTS:

After someone patents a product or method that he invented, he can sue
anyone who makes, uses, or sells the invention. You've probably heard
that Welch at Unisys patented LZW compression some years back, and so
Unisys can sue anyone who uses UNIX compress or any similar program.
You may not have heard that Miller and Wegman at IBM independently
discovered several compression methods, including LZW---and sent their
patent application to the Patent & Trademark Office before Welch did.
So it appears that IBM controls LZW.

How can Unisys patent something that IBM already patented? And how can
either of them patent what are obviously mathematical algorithms, when
mathematical algorithms aren't supposed to be patentable? The answer is
that the Patent & Trademark Office doesn't let mathematicians and
computer scientists become patent examiners. So none of their examiners
can recognize mathematics without a court order, and even then they have
trouble. They also can't tell that two patents cover exactly the same
algorithm when the wording is slightly different.

Practically every good compression method in the last ten years has been
patented. Modem and disk manufacturers can't ignore the possibility of
government-approved monopolies on compressors: there's too much money at
stake. So they apply for one patent after another, twisting the wording
as much as possible so that the patent examiners don't realize they're
patenting mathematics. By and large, they succeed.

On December 26, 1990, I discovered an apparently new compression method,
Y coding. I haven't found a way to make Y run as fast as compress (yet),
but it's not too slow for general use, and it produces better results
overall than any other non-Huffman dictionary compressor I've seen. I am
sick and tired of having to tiptoe through this minefield of patents; I
want a method that produces good results and can be used freely by
anyone. Well, now I have it. Everything in Y coding is public domain. No
copyrights, no patents, no protection at all, on the code or on the
algorithms or on the file format or on anything else. I can't be totally
sure, of course, that I'm the first to discover Y, but everyone I've
talked to thinks it's new. Use it as you will.

Several months before I found Y coding, I found what I now know to be
Storer's AP coding. At first I thought it was new, and I started putting
together an implementation. It's quite a bit faster than my Y
implementation---even faster than compress on some machines---and uses
less memory. But it has turned out to be patented. I've included the
code in my Y package because I don't want my work to go to waste, but it
can't be used freely. You can use AP for what's sometimes called
``experimental use''; the precise wording in most court cases is ``for
the sole purpose of satisfying philosophical taste or curiosity, or for
instruction and amusement.'' Patents never apply to that sort of use.

If you like AP's efficiency and want to start using it for more than
satisfying your philosophical curiosity, you might be tempted to ask
Storer for a license. Don't give in! First, his patent---any compression
patent---is on very shaky ground, and as it hasn't been acknowledged in
the industry, it's even weaker. Second, you can challenge the patent by
sending the Patent & Trademark Office about $2000 and an explanation of
why AP is a mathematical algorithm. Now you may not have the money or
lawyers or time or energy to do this successfully, but there are a lot
of people who feel just as restricted. You should check up on the League
for Programming Freedom; if you don't subscribe to their political goals
but still want to get rid of compression patents, send me a note. In any
case this will be cheaper on everyone than paying Storer for something
he shouldn't even control. Third, Y isn't that much slower than AP, and
it compresses just as effectively. Chances are good that I or someone
else will find a faster implementation soon. Finally, we may be able to
convince Storer to donate his patent to the public good. If you want to
appeal to his morals, send a note to sto...@cs.brandeis.edu, copy to
brns...@nyu.edu.

Disclaimer: I'm not a lawyer. I'm just an outraged citizen.

---Dan Bernstein, brns...@nyu.edu

Path: gmdzi!unido!mcsun!uunet!bonnie.concordia.ca!news-server.csri.toronto.edu!
helios.physics.utoronto.ca!ists!yunexus!oz
From: o...@yunexus.yorku.ca (Ozan Yigit)
Newsgroups: gnu.misc.discuss
Subject: data compression notes [Re: Anyone want ...]
Message-ID: <21698@yunexus.YorkU.CA>
Date: 19 Feb 91 16:26:08 GMT
References: <812:Feb1720:37:5291@kramden.acf.nyu.edu>
Sender: n...@yunexus.YorkU.CA
Organization: York U. Communications Research & Development
Lines: 36

In article <See Ref> Dan Bernstein (brns...@nyu.edu) writes:

>Practically every good compression method in the last ten years has been
>patented.

Ross Williams' SAKDC [1] (Swiss Army Knife Data Compression) algorithm is
not patented. I verified this with the author, he does not believe in
patenting algorithms.

>On December 26, 1990, I discovered an apparently new compression method,
>Y coding.

That is nice. I think you should publish the algorithm (and a through
analysis of it) in some respectable journal. That would provide you with
the appropriate feedback, allow wider dissamination of your algorithm in
general, and highlight one potential solution to the problem of patented
algorithms: invent a better one.

>... I am
>sick and tired of having to tiptoe through this minefield of patents; I
>want a method that produces good results and can be used freely by
>anyone.

Coincidentally, this month's DDJ contains some stuff on this topic,
including a compressor by Mark R. Nelson (model-2) that is 25-30% better
than compress. There is also a Data Compression Contest hosted by DDJ: who
knows what will turn up.

oz
---
[1] Williams, Ross N., Adaptive Data Compression,
    Kluwer Publishers, Boston, 1991.
---
Yellow pages is evil. All traces of   | Internet: o...@nexus.yorku.ca
it must be vanguished from the face   | Uucp: utzoo/utai!yunexus!oz
of the earth.	   -- Eriks Rugelis   | Phone: 1+ (416) 736 5257 ..

Path: gmdzi!unido!fauern!ira.uka.de!sol.ctr.columbia.edu!
zaphod.mps.ohio-state.edu!rpi!uupsi!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brns...@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: gnu.misc.discuss
Subject: Re: data compression notes [Re: Anyone want ...]
Message-ID: <2022:Feb2012:34:0191@kramden.acf.nyu.edu>
Date: 20 Feb 91 12:34:01 GMT
References: <812:Feb1720:37:5291@kramden.acf.nyu.edu> <21698@yunexus.YorkU.CA>
Organization: IR
Lines: 18

In article <21...@yunexus.YorkU.CA> o...@yunexus.yorku.ca (Ozan Yigit) writes:
> That is nice. I think you should publish the algorithm (and a through
> analysis of it) in some respectable journal.

I'm working on a paper.

> Coincidentally, this month's DDJ contains some stuff on this topic,
> including a compressor by Mark R. Nelson (model-2) that is 25-30% better
> than compress.

But how fast is it? How much memory does it use? Take, for example, the
important case of people transferring news between IBM PCs with size-64K
memory segments. compress works with 12 bits, or about a 6K input block
size on average. yabba works with a 12K input block size. whap works
with a 20K input block size. This alone gives a big improvement. Can
Nelson's modeller be used in limited memory?

---Dan

Path: gmdzi!unido!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: p...@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: gnu.misc.discuss
Subject: Re: data compression notes [Re: Anyone want ...]
Message-ID: <PCG.91Feb25215626@odin.cs.aber.ac.uk>
Date: 25 Feb 91 21:56:26 GMT
References: <812:Feb1720:37:5291@kramden.acf.nyu.edu> <21698@yunexus.YorkU.CA>
	<2022:Feb2012:34:0191@kramden.acf.nyu.edu>
Sender: p...@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 48
Nntp-Posting-Host: odin
In-reply-to: brnstnd@kramden.acf.nyu.edu's message of 20 Feb 91 12:34:01 GMT

On 20 Feb 91 12:34:01 GMT, brns...@kramden.acf.nyu.edu (Dan Bernstein) said:

brnstnd> In article <21...@yunexus.YorkU.CA> o...@yunexus.yorku.ca (Ozan
brnstnd> Yigit) writes:

oz> That is nice. I think you should publish the algorithm (and a
oz> through analysis of it) in some respectable journal.

brnstnd> I'm working on a paper.

Why not patent it? :-)

Seriously, as the FSF might also use patents in the same way as it uses
copyrights, to ensure that certain designs can circulate freely instead
of being hoarded.

Suppose that you do not patent your algorithm; somebody else later can
rediscover and patent it, and then *you* have to sue the somebody and
prove that you had prior art. If you preempt this by getting a patent
and then licensing it for free (to everybody that promises to use it
only in copyleft sources, or under similar conditions), this becomes
much more difficult, because at the very least your work will turn up in
a patent search and subsequent hoarders will have their claim rejected
by the Patent Office (hopefully).

Naturally there is a snag: while copylefting a source costs nothing,
freepatenting an algorithm or a device costs thousands of dollars.

Maybe the FSF could decide to do something about this for some possibly
important ideas, like establishing a fund and a panel to decide which of
the technologies submitted to the FSF should be patented drawing fropm
that fund's money.

The idea appeals to me a lot, because then the FSF could accumulate a
portfolio of freepatents, and if it became suitably attractive, patent
portfolio swaps could be arranged with large corporations, thus freeing
even more patents.

I for one would surely contribute some patentable technology to the FSF;
I have posted in the past an algorithm for compiling multiple
assignments, and the design of a reverse map MMU, in an attempt to
prevent others to patent them eventually. It would have been much better
I think to let the FSF patent them and then let everybody use them. I
think that Dan Bernstein and others could be contributors too.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber...@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: p...@cs.aber.ac.uk

Path: gmdzi!unido!mcsun!uunet!wuarchive!zaphod.mps.ohio-state.edu!rpi!uupsi!
cmcl2!kramden.acf.nyu.edu!brnstnd
From: brns...@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: gnu.misc.discuss
Subject: Re: data compression notes [Re: Anyone want ...]
Message-ID: <23019:Feb2609:37:3191@kramden.acf.nyu.edu>
Date: 26 Feb 91 09:37:31 GMT
References: <21698@yunexus.YorkU.CA> <2022:Feb2012:34:0191@kramden.acf.nyu.edu> 
<PCG.91Feb25215626@odin.cs.aber.ac.uk>
Organization: IR
Lines: 112

In article <PCG.91Feb25215...@odin.cs.aber.ac.uk> p...@cs.aber.ac.uk 
(Piercarlo Grandi) writes:
> Suppose that you do not patent your algorithm; somebody else later can
> rediscover and patent it, and then *you* have to sue the somebody and
> prove that you had prior art.

No, there's no point. The courts have said [riffle riffle riffle]
``A patent could not be infringed by the practice of any prior art.''
(Peerless Equipment Co. [now there's a name!] v. W.H. Miner, Inc.,
C.C.A. Ill.1938, 93 F.2d 98, certiorari denied 58 S.Ct. 611, 303 U.S. 
641, 82 L.Ed. 1113.)

Even better: ``It is neither wrong nor immoral for a manufacturer
deliberately to create an article on basis of prior art in order to
avoid infringement.'' (Reiner v. I. Leon Co., C.A.N.Y. 1963, 324 F.2d
648.) All you have to do is find a previous publication of the invention
and you're home free.

An interesting twist is that the courts usually won't even consider
whether the patent in question is valid. They'll just watch a dozen
experts testify that you're working within the prior art, and boom, the
case is over. Normally they would first consider the validity of the
patent, and only then consider whether you've infringed it. (This is
still backwards in some circuits.)

I am told, by the way, that electronic publication does constitute
publication for the purposes of patent law (though not necessarily
copyright law). However, evidence from a computer generally isn't
admissible in court, so anyone who saved a copy of my first announcement
of Y coding should print out a copy, sign and date it, and preferably
notarize it. Here's another copy just in case. (Can never be too
careful... [grin])

---Dan

Path: kramden.acf.nyu.edu!brnstnd
From: brns...@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: gnu.misc.discuss
Subject: Re: compress algorithm patent
Message-ID: <6617:Jan2723:16:0...@kramden.acf.nyu.edu>
Date: Sun Jan 27 23:16:09 GMT 1991
References: <CARLTON.91Jan26141...@husc8.harvard.edu>
Organization: IR
Lines: 68

In article <CARLTON.91Jan26141...@husc8.harvard.edu> carl...@husc8.harvard.edu 
(david carlton) writes:
> does anybody know what
> happened with that?

Well, yes. The question is who wants to spend the hours telling you. :-)

The best non-Huffman variant of LZ/LZW/MW is Storer's AP coding, which I
independently discovered and implemented last year. After I found out
that Storer had discovered the method first (and patented it), I tried
to negotiate with him, but he doesn't seem interested in helping people
if he doesn't make some money off the deal.

I now have a compression program that produces typically 10-20% better
results than compress, can be compiled to use under half as much memory
as compress for decompression and the same amount for compression, runs
at basically the same speed, is easier to configure, compiles and runs
on 16- and 32-bit machines under practically any sane C implementation,
and can prep its output for encryption.

If James Storer agrees to let everyone use the program without
restriction, I will place it into the public domain and distribute it
through USENET; since RMS was interested in my squeeze program and since
the new program is better in almost all respects, I imagine he'll pick
it up for GNU. Send a polite note to sto...@cs.brandeis.edu if you would
like to see this happen.

In the meantime, I'm not sure that even a few hundred messages would
convince Storer, so here's a method that appears to be unpatented. For
the record, I found Y coding December 26. It gives slightly better
results than LZW and codes somewhat faster than AP. On the other hand,
decompression is about as slow as compression and uses more memory.

Y coding: Start with a dictionary of all single characters. At each
input position, add a new string to the dictionary. The new string is
the longest match between the current position and the dictionary, plus
one further character in the current position.

Constructing the dictionary and printing output are entirely
independent. To do the latter, simply output a code for the longest
match against the dictionary constructed so far, advance past that
match, and repeat. (``So far'' means taking into account only strings
constructed without having to look at the start of the current match.)

For example, take the input string yabbadabbadabbadoo!!!!. These strings
are added to the dictionary, one starting at each input position: ya,
ab, bb, ba, ad, da, abb, bba, bad, ada, dab, abba, bbad, bado, ado, do,
oo, o!, !!, !!!. One way to look at this generation is that bbad is
added to the dictionary as soon as bb.*bba.*bbad appears in the input.

Now you read off the coding as follows: Read y. y matches the dictionary
constructed up to the previous character, so keep going. Read a. ya
doesn't match the dictionary constructed up to the previous character,
so output the code for y. Read b. Output a. etc. The final outputs are
y, a, b, b, a, d, ab, ba, da, bba, d, o, o, !, !, !!.

Another example: pickypickypickypickypicky! codes as p, i, c, k, y, pi,
ck, yp, ick, ypi, ckyp, icky, !. You can see that strings of any length
can be added to the dictionary rather quickly, so although this has a
similar flavor to LZJ it is certainly not equivalent.

Anyway, I place Y coding into the public domain; unless somebody else
has snuck a patent in somewhere, you're free to use the method in any
program. I'm working on rewriting my AP program for Y coding, though I
think AP is somewhat better, and I don't particularly want to optimize
yet another compressor if Storer's willing to allow use of the current
one.

---Dan