grep replacement

Path: utzoo!utgpu!water!watmath!clyde!bellcore!faline!thumper!ulysses!andante!
alice!andrew
From: and...@alice.UUCP
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: grep replacement
Summary: proposal for a replacement for grep/egrep/fgrep
Message-ID: <7882@alice.UUCP>
Date: 23 May 88 15:22:02 GMT
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 23
Posted: Mon May 23 11:22:02 1988

	Al Aho and I are designing a replacement for grep, egrep and fgrep.
The question is what flags should it support and what kind of patterns
should it handle? (Assume the existence of flags to make it compatible
with grep, egrep and fgrep.)
	The proposed flags are the V9 flags:
-f file	pattern is (`cat file`)
-v	print nonmatching
-i	ignore alphabetic case
-n	print line number
-x	the pattern used is ^pattern$
-c	print count only
-l	print filenames only
-b	print block numbers
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-s	no output; just status
-e expr	use expr as the pattern

The patterns are as for egrep, supplemented by back-referencing
as in \{pattern\}\1.
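
As a rough illustration of what back-referencing adds to egrep-style patterns, here is a minimal sketch using Python's `re` module. Python writes the remembered group as `( )` rather than the `\{ \}` proposed above, and the pattern, the sample strings and the name `has_doubled` are made up for the example.

```python
import re

# A parenthesised group followed by \1 matches the same text twice in a row.
# Python spells the group ( ... ); the proposal above spells it \{ ... \}.
doubled = re.compile(r"(ab+)\1")

def has_doubled(line):
    """Return True if the line contains some 'ab+' run immediately repeated."""
    return doubled.search(line) is not None
```

With this pattern, `has_doubled("xabbabby")` is true because `abb` is followed by another `abb`, while `has_doubled("xabbaby")` is false.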

please send your comments about flags or patterns to research!andrew

Path: utzoo!attcan!uunet!husc6!mailrus!ames!elroy!cogswell!alan
From: a...@cogswell.Jpl.Nasa.Gov (Alan S. Mazer)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Summary: Context please
Message-ID: <6866@elroy.Jpl.Nasa.Gov>
Date: 26 May 88 00:04:47 GMT
References: <7882@alice.UUCP> <5630@umn-cs.cs.umn.edu>
Sender: n...@elroy.Jpl.Nasa.Gov
Lines: 7

One thing I would _love_ is to be able to find the context of what I've
found, for example, to find the two (n?) surrounding lines.  I have wanted
to do this many times and there is no good way.

	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
			elroy!a...@csvax.caltech.edu	   could go wrong?"

Path: utzoo!attcan!uunet!mcvax!unido!rubmez!frei
From: f...@rubmez.UUCP (Matthias Frei )
Newsgroups: comp.unix.wizards
Subject: Re: grep replacement
Message-ID: <136@rubmez.UUCP>
Date: 30 May 88 11:04:42 GMT
Organization: MEZ, RUB, Bochum, FRG
Lines: 28
Posted: Mon May 30 12:04:42 1988
In-Reply-To: your article <7882@alice.UUCP>

> 	Al Aho and I are designing a replacement for grep, egrep and fgrep.
> The question is what flags should it support and what kind of patterns
> should it handle? (Assume the existence of flags to make it compatible
> with grep, egrep and fgrep.)

Hi,
Some applications need to divert a file into two parts.
One should contain all lines matching any of the patterns, the other
all lines not matching any of them.
So I want the following flags:

	-d	divert the file
		"matches" to stdout
		"nomatches" to stderr
	-r	exchange stdout and stderr, if -d is given  

Will you post your new grep to the net? (I hope so)

Thanks in Advance for a nice new tool

	Matthias Frei

--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  f...@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!mit-eddie!uw-beaver!
uw-june!uw-entropy!dataio!pilchuck!ssc!happym!kent
From: k...@happym.UUCP (Kent Forschmiedt)
Newsgroups: comp.unix.wizards
Subject: Re: grep replacement
Message-ID: <449@happym.UUCP>
Date: 2 Jun 88 02:35:46 GMT
References: <136@rubmez.UUCP>
Reply-To: k...@happym.UUCP (Kent Forschmiedt)
Organization: Happy Man Corp.
Lines: 24

In article <1...@rubmez.UUCP> f...@rubmez.UUCP (Matthias Frei ) writes:
>I want following flags:
>
>	-d	divert the file
>		"matches" to stdout
>		"nomatches" to stderr
>	-r	exchange stdout and stderr, if -d is given  

I second the vote - just today I did one of these:

grep $PATTERN file > afile
grep -v $PATTERN file > anotherfile

Note, however, that -v will serve for the suggested -r.
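
For comparison with the two grep runs above, a single-pass split might look like the following Python sketch. It is not a grep flag and not anything from the proposed program; the function name `divert` and its `swap` parameter (standing in for the suggested -r) are hypothetical.

```python
import re
import sys

def divert(lines, pattern, matched=sys.stdout, unmatched=sys.stderr, swap=False):
    """Write matching lines to one stream and the rest to the other, in one pass."""
    regex = re.compile(pattern)
    if swap:  # plays the role of the suggested -r
        matched, unmatched = unmatched, matched
    for line in lines:
        # Each input line goes to exactly one of the two streams.
        (matched if regex.search(line) else unmatched).write(line)
```

Called as `divert(sys.stdin, pattern)` from a script, the shell redirections `> afile 2> anotherfile` would then collect both halves from one read of the input.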

>Will you post Your new grep to the net ? (I hope so)

From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
Unix, and none of us humans will see it until sysVr6, and only then 
if we are lucky!! 
-- 
--
	Kent Forschmiedt -- k...@happym.UUCP, tikal!camco!happym!kent
	Happy Man Corporation  206-282-9598

Path: utzoo!dciem!nrcaer!scs!spl1!laidbak!att!osu-cis!killer!tness7!bellcore!
faline!thumper!ulysses!andante!alice!andrew
From: and...@alice.UUCP
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <7944@alice.UUCP>
Date: 3 Jun 88 16:58:39 GMT
Article-I.D.: alice.7944
References: <136@rubmez.UUCP> <449@happym.UUCP>
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 20
Summary: the right way to do context and where's the source?

In article <4...@happym.UUCP>, k...@happym.UUCP writes:
> From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
> Unix, and none of us humans will see it until sysVr6, and only then 
> if we are lucky!! 

Context:
	the right thing to do is to write a context program that takes
input looking like "filename:linenumber:goo" and prints whatever context you like.
we can then take this crap out of grep and diff and make it generally available
for use with programs like the C compiler and eqn and so on. It can also do
the right thing with folding together nearby lines. At least one good first
cut has been put on the net but a C program sounds easy enough to do.
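
A minimal sketch of such a context program, written in Python rather than C and assuming the input is exactly "filename:linenumber:text" (a line with fewer than two colons raises ValueError here). The names `parse_hits`, `fold` and `context`, and the `before`/`after` parameters, are inventions of this sketch, not part of any posted tool.

```python
from collections import defaultdict

def parse_hits(hit_lines):
    """Collect {filename: sorted line numbers} from 'filename:linenumber:text' input."""
    hits = defaultdict(set)
    for hit in hit_lines:
        name, number, _rest = hit.split(":", 2)
        hits[name].add(int(number))
    return {name: sorted(numbers) for name, numbers in hits.items()}

def fold(numbers, before, after):
    """Turn hit line numbers into merged, inclusive (first, last) ranges."""
    ranges = []
    for n in numbers:
        first, last = max(1, n - before), n + after
        if ranges and first <= ranges[-1][1] + 1:
            # Overlapping or adjacent to the previous range: fold them together.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], last))
        else:
            ranges.append((first, last))
    return ranges

def context(hit_lines, before=2, after=2, opener=open):
    """Yield 'filename:linenumber:text' for every line inside a folded range."""
    for name, numbers in parse_hits(hit_lines).items():
        ranges = fold(numbers, before, after)
        with opener(name) as source:   # the named file is read a second time
            for lineno, text in enumerate(source, start=1):
                if any(first <= lineno <= last for first, last in ranges):
                    stripped = text.rstrip("\n")
                    yield f"{name}:{lineno}:{stripped}"
```

Because the sketch reopens each named file, it only works when the original input still exists under that name.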

Source:
	the software i write is publicly available because it matters to me.
it was a hassle but mk and fio are available to everybody for reasonable cost
(< $125 commercial, nearly free educational). i am trying hard to do the
same for the new grep. it will be in V10, it will be in plan9, and should be
in SVR4 (the joint sun-at&t release).

Path: utzoo!utgpu!water!watmath!clyde!bellcore!faline!thumper!ulysses!andante!
mit-eddie!bbn!uwmcsd1!ig!agate!pasteur!ames!mailrus!nrl-cmf!cmcl2!brl-adm!brl-smoke!gwyn
From: g...@brl-smoke.UUCP
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <8012@brl-smoke.ARPA>
Date: 4 Jun 88 21:28:19 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP>
Reply-To: g...@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 6

In article <7...@alice.UUCP> and...@alice.UUCP writes:
>	the right thing to do is to write a context program that takes
>input looking like "filename:linenumber:goo" and prints whatever context ...

Heavens -- a tool user.  I thought that only Neanderthals were still alive.
I guess Bell Labs escaped the plague.

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!ukma!husc6!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <23133@bu-cs.BU.EDU>
Date: 5 Jun 88 01:37:09 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> <8012@brl-smoke.ARPA>
Organization: Boston U. Comp. Sci.
Lines: 17
In-reply-to: gwyn@brl-smoke.ARPA's message of 4 Jun 88 21:28:19 GMT

From: g...@brl-smoke.ARPA (Doug Gwyn )
>In article <7...@alice.UUCP> and...@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Heavens -- a tool user.  I thought that only Neanderthals were still alive.
>I guess Bell Labs escaped the plague.

Almost, unless the original input was produced by a pipeline, in which
case this (putative) post-processor can't help unless you tee the mess
to a temp file, yup, mess is the right word.

Or maybe only us Neanderthals are interested in tools which work on
pipes? Have they gone out of style?

	-Barry "Ulak of Org" Shein, Boston University

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!mit-eddie!ll-xn!
ames!nrl-cmf!cmcl2!brl-adm!brl-smoke!gwyn
From: g...@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <8022@brl-smoke.ARPA>
Date: 5 Jun 88 03:30:46 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8012@brl-smoke.ARPA> <23133@bu-cs.BU.EDU>
Reply-To: g...@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 22

In article <23...@bu-cs.BU.EDU> b...@bu-cs.BU.EDU (Barry Shein) writes:
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

The proposed tool would be very handy on ordinary text files,
but it is hard to see a use for it on pipes.  Or, getting back
to context-grep, what good would it do to show context from a
pipe?  To do anything with the information (other than stare
at it), you'd need to produce it again.  There might be some
use for context-{grep,diff,...} on a stream, but if a separate
context tool will satisfy 99% of the need, as I think it would,
as well as provide this capability for other commands "for free",
it would be a better approach than hacking context into other
commands.

By the way, I hope the new grep when asked to always produce
the filename will use "-" for stdin's name, and the context
tool would also follow the same convention.  Even though the
Research systems have /dev/stdin, other sites may not, and
anyway (as we've just seen) stdin isn't really a definite
object.

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!njin!princeton!udel!
gatech!ncar!boulder!sunybcs!bingvaxu!leah!itsgw!sun.soe.clarkson.edu!nelson
From: nel...@sun.soe.clarkson.edu (Russ Nelson)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <1030@sun.soe.clarkson.edu>
Date: 5 Jun 88 03:38:55 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8012@brl-smoke.ARPA> <23133@bu-cs.BU.EDU>
Reply-To: nel...@sun.soe.clarkson.edu (Russ Nelson)
Followup-To: comp.unix.wizards
Organization: Clarkson University, Potsdam, NY
Lines: 19

In article <23...@bu-cs.BU.EDU> b...@bu-cs.BU.EDU (Barry Shein) writes:
>In article <7...@alice.UUCP> and...@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

How about:

alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

or something like that?  Does that offend tool-users sensibilities?
*Do* Neanderthals have any sensibilities?
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nel...@clutx.clarkson.edu"; }

Path: utzoo!dciem!nrcaer!scs!spl1!laidbak!att!mtunx!pacbell!lll-tis!
helios.ee.lbl.gov!pasteur!ucbvax!decwrl!purdue!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <23142@bu-cs.BU.EDU>
Date: 5 Jun 88 15:24:23 GMT
Article-I.D.: bu-cs.23142
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8012@brl-smoke.ARPA> <23133@bu-cs.BU.EDU> <8022@brl-smoke.ARPA>
Organization: Boston U. Comp. Sci.
Lines: 82
In-reply-to: gwyn@brl-smoke.ARPA's message of 5 Jun 88 03:30:46 GMT

From: g...@brl-smoke.ARPA (Doug Gwyn )
>In article <23...@bu-cs.BU.EDU> b...@bu-cs.BU.EDU (Barry Shein) writes:
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>The proposed tool would be very handy on ordinary text files,
>but it is hard to see a use for it on pipes.  Or, getting back
>to context-grep, what good would it do to show context from a
>pipe?  To do anything with the information (other than stare
>at it), you'd need to produce it again.

What else are context displays for except to stare at (or save in a
file for later staring)?

Are the resultant contexts often the input to other programs? (I know
that 'patch' can take a context input but that's irrelevant, it hardly
needs nor prefers a context diff to my knowledge, it's just being
accommodating so humans can look at the context diff if something
botches.)

Actually, I can answer that in the context of the original suggestion.

The motivation for a context comes in two major flavors:

	A) To stare at (the surrounding context gives a human some
	hint of the context in which the text appeared)

	B) Because the context really represents a multi-line (eg)
	record, such as pulling out every termcap or terminfo entry
	which contains some property but desiring the result to contain
	the entire multiline entry so it could be re-used to create a
	new file.

In either case it's independent of whether the data is coming from a
pipe (as it should be.) Its pipeness may be caused by something as
simple as the data being grabbed across the network (rsh HOST cat foo | ...).

Anyhow, I think it's bad in general to demand the reasoning of why a
selection operator should work in a pipe, it just should (although I
have presented a reasonable argument.) That's what tools are all about.

>There might be some
>use for context-{grep,diff,...} on a stream, but if a separate
>context tool will satisfy 99% of the need, as I think it would,
>as well as provide this capability for other commands "for free",
>it would be a better approach than hacking context into other
>commands.

I think claiming that 99% of the use won't need pipes is unsound, it
should just work with a pipe and any tool which requires passing the
file name and then re-positioning the file just won't, it's violating
a fundamental design concept by doing this (not that in rare cases
this might not be necessary, but I don't see where this is one of them
unless you use the circular argument of it "must be a separate
program".)

The reasoning for adding it to grep would be:

	a) Grep already has its finger on the context, it's right
	there (or could be), why re-process the entire stream/file
	just to get it printed? Grep found the context, why find it
	again?

	b) The context suggestions are merely logical generalizations
	of what grep already does, print the context of a match
	(it just happens to now limit that to exactly one line.) Nothing
	new conceptually is being added, only generalized.
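
Point (a) can be sketched as a one-pass filter that never needs a file name or a second read, so it works on a pipe. This is a Python illustration under assumed names (`grep_context`, `before`, `after`), not the behaviour of any existing grep.

```python
import re
from collections import deque

def grep_context(lines, pattern, before=2, after=2):
    """Yield (lineno, line) for matches plus surrounding lines, reading once."""
    regex = re.compile(pattern)
    pending = deque(maxlen=before)   # most recent lines not yet printed
    remaining = 0                    # trailing lines still owed after a match
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line):
            yield from pending       # flush the buffered leading context
            pending.clear()
            yield lineno, line
            remaining = after
        elif remaining > 0:
            yield lineno, line
            remaining -= 1
        else:
            pending.append((lineno, line))
```

The only state kept is the last `before` lines and a counter, which is the sense in which the matcher already has the context in hand.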

In fact, if I were to write this context-display tool my first thought
would be to just use grep and try to emit unique patterns (a la TAGS
files) which grep can then re-scan. But grep doesn't quite cut it w/o
this little generalization. I think we're going in circles and this
post-processor is nothing more than a special case of grep or perhaps
cat or sed the way it was proposed (why not just generate sed commands
to list the lines if that's all you want?)

Anyhow, at least we're back to the technical issues and away from
calling anyone who disagrees Neanderthals...

	-Barry Shein, Boston University

Path: utzoo!dciem!nrcaer!scs!spl1!laidbak!att!mtunx!pacbell!lll-tis!
helios.ee.lbl.gov!pasteur!ucbvax!decwrl!purdue!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <23143@bu-cs.BU.EDU>
Date: 5 Jun 88 15:28:40 GMT
Article-I.D.: bu-cs.23143
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8012@brl-smoke.ARPA> <23133@bu-cs.BU.EDU> <1030@sun.soe.clarkson.edu>
Organization: Boston U. Comp. Sci.
Lines: 24
In-reply-to: nelson@sun.soe.clarkson.edu's message of 5 Jun 88 03:38:55 GMT

From: nel...@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>How about:
>
>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>
>or something like that?  Does that offend tool-users sensibilities?
>*Do* Neanderthals have any sensibilities?

I don't understand, the way to avoid having to tee it into temp
files is to tee it into temp files?

Given that sort of solution we can eliminate pipes entirely from unix,
was that your point? That pipes are fundamentally useless and can
always be eliminated via use of intermediate temp files?

It begs the question, burying it in a little syntactic sugar with an
alias command doesn't solve the problem.

	-Barry Shein, Boston University

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!gatech!ncar!
oddjob!mimsy!chris
From: ch...@mimsy.UUCP (Chris Torek)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement and /dev/stdin
Message-ID: <11821@mimsy.UUCP>
Date: 5 Jun 88 21:41:14 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8022@brl-smoke.ARPA>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 22

In article <8...@brl-smoke.ARPA> g...@brl-smoke.ARPA (Doug Gwyn ) writes:
>By the way, I hope the new grep when asked to always produce
>the filename will use "-" for stdin's name, and the context
>tool would also follow the same convention.  Even though the
>Research systems have /dev/stdin, other sites may not,

Why not?  We (ch...@mimsy.umd.edu and f...@mimsy.umd.edu) have posted
an implementation at least twice.  (Still could not get Berkeley to
include it in 4.3-tahoe, alas; maybe 4.4....)  The implementation was
easy in 4.1BSD, and not hard in 4.2 and 4.3BSD, so it should be easy in
any pre-networking Unix, and not hard in the networking Unices.  (It
only got harder because Fred wanted to open, not dup, the appropriate
descriptor, and that is not possible for sockets or [presumably] streams.
I believe the V8 /dev/stdin dups fd 0.)

>and anyway (as we've just seen) stdin isn't really a definite
>object.

Neither is `-'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	ch...@mimsy.umd.edu	Path:	uunet!mimsy!chris

Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!gatech!ncar!
oddjob!mimsy!chris
From: ch...@mimsy.UUCP (Chris Torek)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: aside on patch and context diffs
Message-ID: <11822@mimsy.UUCP>
Date: 5 Jun 88 21:47:19 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<23142@bu-cs.BU.EDU>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 13

In article <23...@bu-cs.BU.EDU> b...@bu-cs.BU.EDU (Barry Shein) writes:
>... 'patch' can take a context input but that's irrelevant, it hardly
>needs nor prefers a context diff to my knowledge, it's just being
>accommodating so humans can look at the context diff if something
>botches.

There is another very good reason to use context diffs with patch,
and that is that a one-line change (e.g., fixing a comment) can break
a non-context diff too easily.  (Also, I like to scan the diffs myself
before applying them; it catches a number of bugs handily.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	ch...@mimsy.umd.edu	Path:	uunet!mimsy!chris

Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ll-xn!mit-eddie!uw-beaver!
cornell!batcomputer!sun.soe.clarkson.edu!nelson
From: nel...@sun.soe.clarkson.edu (Russ Nelson)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <1037@sun.soe.clarkson.edu>
Date: 6 Jun 88 15:18:29 GMT
References: <136@rubmez.UUCP> <449@happym.UUCP> <7944@alice.UUCP> 
<8012@brl-smoke.ARPA> <23133@bu-cs.BU.EDU> <1030@sun.soe.clarkson.edu> 
<23143@bu-cs.BU.EDU>
Reply-To: nel...@sun.soe.clarkson.edu (Russ Nelson)
Organization: Clarkson University, Potsdam, NY
Lines: 20

In article <23...@bu-cs.BU.EDU> b...@bu-cs.BU.EDU (Barry Shein) writes:
>From: nel...@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>I don't understand, the way to avoid having to tee it into temp
>files is to tee it into temp files?

No.  There is no way to avoid teeing it into a temp file.  Such is
life with pipes.  If you want context then you need to save it.  My
alias is perfectly consistent with the tool-using philosophy.  Yes,
it's a kludge, but that's the only way to save context in a single-stream
pipe philosophy.  I remember reading a paper in which multiple streams
going hither and yon were proposed, but the syntax was gothic at best.
I like being able to say this:

bsd:	sort | with_context grep rfoo | more
sysv:	sort | with_context grep foo | more
	Because sysv doesn't have the r* utilities, of course :-)
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nel...@clutx.clarkson.edu"; }

Path: utzoo!attcan!uunet!seismo!rick
From: r...@seismo.CSS.GOV (Rick Adams)
Newsgroups: comp.unix.questions
Subject: Re: grep replacement
Summary: -h
Message-ID: <44366@beno.seismo.CSS.GOV>
Date: 6 Jun 88 20:40:05 GMT
References: <7882@alice.UUCP> <2450011@hpsal2.HP.COM> <54818@sun.uucp> 
<10264@ncc.Nexus.CA>
Organization: Center for Seismic Studies, Arlington, VA
Lines: 10

7th Edition grep had a -h flag to not print the filenames on a grep.

4BSD still has a -h flag.

System 5 doesn't have a -h flag.

(Another example of how System 5 is superior to BSD... and V7...)

---rick

Path: utzoo!attcan!uunet!husc6!bu-cs!tower
From: to...@bu-cs.BU.EDU (Leonard H. Tower Jr.)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Summary: try GNU Emacs' M-x grep RET
Message-ID: <23158@bu-cs.BU.EDU>
Date: 6 Jun 88 21:44:34 GMT
References: <7882@alice.UUCP> <5630@umn-cs.cs.umn.edu> <6866@elroy.Jpl.Nasa.Gov>
Reply-To: to...@bu-it.bu.edu (Leonard H. Tower Jr.)
Followup-To: comp.unix.wizards
Organization: Distributed Systems Group, Boston University,
       111 Cummington Street, Boston, MA  02215, USA  +1 (617) 353-2780
Lines: 24
X-Home: 36 Porter Street, Somerville, MA  02143, USA  +1 (617) 623-7739
X-UUCP-Path: ..!harvard!bu-cs!tower

In article <6...@elroy.Jpl.Nasa.Gov> a...@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

GNU Emacs has a command that will walk you through each match of a
grep run and show you the context around it:

   grep:
   Run grep, with user-specified args, and collect output in a buffer.
   While grep runs asynchronously, you can use the C-x ` command
   to find the text that grep hits refer to.

M-x grep RET to invoke it.  I suspect other Unix Emacs have a similar
feature.

Information on how to obtain GNU Emacs, other GNU software, or the GNU
project itself is available from:

	g...@prep.ai.mit.edu

enjoy -len

Path: utzoo!utgpu!water!watmath!clyde!att!mtunx!rutgers!gatech!ncar!
oddjob!mimsy!eneevax!umd5!brl-adm!brl-smoke!gwyn
From: g...@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.unix.questions
Subject: Re: grep replacement
Message-ID: <8032@brl-smoke.ARPA>
Date: 7 Jun 88 08:59:56 GMT
References: <7882@alice.UUCP> <2450011@hpsal2.HP.COM> <54818@sun.uucp> 
<10264@ncc.Nexus.CA> <44366@beno.seismo.CSS.GOV>
Reply-To: g...@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 10

In article <44...@beno.seismo.CSS.GOV> r...@seismo.CSS.GOV (Rick Adams) writes:
>7th Edition grep had a -h flag to not print the filenames on a grep.
>4BSD still has a -h flag.
>System 5 doesn't have a -h flag.
>(Another example of how System 5 is superior to BSD... and V7...)

Maybe the AT&T folks figured that their customers were smart enough
to type "cat files ... | grep".  I've never had the need for a -h
flag, but I sure would like for the -H (ALWAYS print filename)
option to be the default instead of the current variable algorithm.

Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!pasteur!ucbvax!decwrl!
labrea!rutgers!bellcore!faline!thumper!ulysses!andante!alice!andrew
From: and...@alice.UUCP
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Summary: responses to the request for comments
Message-ID: <7962@alice.UUCP>
Date: 10 Jun 88 18:34:00 GMT
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 143


	The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly and others
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation. After that, I summarise the current flags.

1) named character classes, e.g. \alpha, \digit.
	i think this is a hokey idea and dismissed it as unnecessary crud
	but then found out it is part of the proposed regular expression
	stuff for posix. it may creep in but i hope not.

2) matching multi-line patterns (\n as part of pattern)
	this actually requires a lot of infrastructure support and thought.
	i prefer to leave that to other more powerful programs such as sam.

3) print lines with context.
	the second most requested feature but i'm not doing it. this is
	just the job for sed. to be consistent, we just took the context
	crap out of diff too. this is actually reasonable; showing context
	is the job for a separate tool (pipeline difficulties apart).

4) print one (first matching) line and go on to the next file.
	most of the justification for this seemed to be scanning
	mail and/or netnews articles for the subject line; neither
	of which gets any sympathy from me. but it is easy to do
	and doesn't add an option; we add a new option (say -1)
	and remove -s. -1 is just like -s except it prints the matching line.
	then the old grep -s pattern is now grep -1 pattern > /dev/null
	and within epsilon of being as efficient.

5) divert matching lines onto one fd, nonmatching onto another.
	sorry, run grep twice.

6) print the Nth occurrence of the pattern (N is number or list).
	it may be possible to think of a real reason for this (i couldn't)
	but the answer is no.

7) -w (pattern matches only words)
	the most requested feature. well, it turns out that -x (exact)
	is there because doug mcilroy wanted to match words against a dictionary.
	it seems to have no other use. Therefore, -x is being dropped
	(after all, it only costs a quick edit to do it yourself) and is
	replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).

8) grep should work on binary files and kanji.
	that it should work on kanji or any character set is a given
	(at least, any character set supported by the system V international
	character set stuff). binary files will work too modulo the
	following constraint: lines (between \n's) have to fit in a
	buffer (current size 64K). violations are an error (exit 2).

9) -b has bogus units.
	agreed. -b now is in bytes.

10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
	-x (and -w) is enough. sorry.

11) recursively descend through argument lists
	no. find | xargs is going to have to do.

12) read filenames on standard input
	no. xargs will have to do.

13) should be as fast as bm.
	no worries. in fact, our egrep is 3x faster than bm. i intend to be
	competitive with woods' egrep. it should also be as fast as fgrep for
	multiple keywords. the new grep incorporates boyer-moore
	as a degenerate case of Commentz-Walter, a faster replacement
	for the fgrep algorithm.

14) -lv (files that don't have any matching lines)
	-lv means print names of files that have any nonmatching lines
	(useful, say, for checking input syntax). -L will mean print
	names of files without selected lines.

15) print the part of the line that matched.
	no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
	the current name for the new command is gre (aho chose it).
	after a while, it will become our grep. there will be a -G
	flag to take patterns a la old grep and a -F to take
	patterns a la fgrep (that is, no metacharacters except \n == |).
	gre is close enough to egrep to not matter.

17) fewer limits.
	so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)
	we foresee no need for any other limit. for example, the
	current gre acts like fgrep. it is 4 times faster than
	fgrep and has no limits; we can gre -f /usr/dict/words
	(72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
	get real. go back to your macintosh or pyramid. gre will just grep
	files, not understand them.

19) handle patterns occurring multiple times per line
	this is ill-defined (how many times does aaaa occur in a line of 20 'a's?
	in order of decreasing correctness, the answers are >=1, 17, 5).
	For the cases people mentioned (words), pipe it thru
	tr to put the words one per line.

20) why use \{\} instead of \(\)?
	this is not yet resolved (mcilroy&ritchie vs aho&pike&me).
	grouping is an orthogonal issue to subexpressions so why
	use the same parentheses? the latest suggestion (by ritchie)
	is to allow both \(\) and \{\} as grouping operators but
	the \3 would only count one type (say \(\)). this would be much
	better for complicated patterns with much grouping.

21) subroutine versions of the pattern matching stuff.
	in a deep sense, the new grep will have no pattern matching code in it.
	all the pattern matching code will be in libc with a uniform
	interface. the boyer-moore and commentz-walter routines have been
	done. the other two are egrep and back-referencing egrep.
	lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
	a unix without /dev/stdin is largely bogus but as a sop to the poor
	barstards having to work on BSD, gre will support -
	as stdin (at least for a while).
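
The two numeric answers in item 19 can be checked with a short Python sketch: 17 counts every starting position, 5 counts occurrences that share no characters. The function names are made up for the illustration.

```python
def count_overlapping(text, word):
    """Count every starting position at which word occurs."""
    return sum(text.startswith(word, i) for i in range(len(text) - len(word) + 1))

def count_disjoint(text, word):
    """Count left-to-right occurrences that do not share characters."""
    return text.count(word)   # str.count never counts overlapping matches
```

For a line of 20 'a's and the word `aaaa`, the first function gives 17 and the second gives 5; the third answer, >=1, is simply whether the line is selected at all.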

Thus, the current proposal is the following flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

-f file	pattern is (`cat file`)
-v	nonmatching lines are 'selected'
-i	ignore alphabetic case
-n	print line number
-c	print count of selected lines only
-l	print filenames which have a selected line
-L	print filenames which do not have a selected line
-b	print byte offset of line begin
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-w	pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1	print only first selected line per file
-e expr	use expr as the pattern
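
The -w entry above can be tried out with Python's `re` module, which accepts the same bracket expressions. This is only a sketch: the helper name `word_pattern` is hypothetical, and the extra parentheses around `pattern` are an addition of the sketch so that an alternation such as `a|b` stays grouped; the expansion as posted does not show them.

```python
import re

NONWORD = "[^_a-zA-Z0-9]"

def word_pattern(pattern):
    """Wrap pattern the way the -w entry in the list above spells it out."""
    # The inner ( ) is this sketch's assumption, not part of the posted expansion.
    return f"(^|{NONWORD})({pattern})($|{NONWORD})"
```

For example, `re.search(word_pattern("cat"), "the cat sat")` finds a match, while the same pattern finds nothing in "concatenate" or in "cat_2", since the underscore counts as a word character.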

Andrew Hume
research!andrew

Path: utzoo!attcan!uunet!seismo!keith
From: ke...@seismo.CSS.GOV (Keith Bostic)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <44370@beno.seismo.CSS.GOV>
Date: 13 Jun 88 19:12:20 GMT
References: <7962@alice.UUCP>
Organization: Center for Seismic Studies, Arlington, VA
Lines: 31

In article <7...@alice.UUCP>, and...@alice.UUCP writes:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	barstards having to work on BSD, gre will support -
> 	as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

     -- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
	moment.  For those who haven't heard of it, V9 is a research version
	of UN*X developed and in use at the Computing Science Research Center,
	a part of AT&T Bell Laboratories, and available to a small number of
	universities.  It was preceded by V8, which, interestingly enough, was
	built on top of 4.1BSD.

     -- System V does not support "/dev/stdin".

     -- The next full release of BSD will contain "/dev/stdin" and friends.
	It is not part of the 4.3-tahoe release because it requires changes
	to stdio.  I do not expect, however, commands that currently support
	the "-" syntax to change, for compatibility reasons.  V9 itself
	continues to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least
bearing some distant relationship to the facts.

Keith Bostic

Path: utzoo!attcan!uunet!husc6!uwvax!oddjob!mimsy!chris
From: ch...@mimsy.UUCP (Chris Torek)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <11957@mimsy.UUCP>
Date: 14 Jun 88 03:54:41 GMT
References: <7962@alice.UUCP> <44370@beno.seismo.CSS.GOV>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 38

In article <44...@beno.seismo.CSS.GOV> ke...@seismo.CSS.GOV
[at seismo?!?] (Keith Bostic) writes:
>    -- The next full release of BSD will contain "/dev/stdin" and friends.
>	It is not part of the 4.3-tahoe release because it requires changes
>	to stdio.

Well, only because

	freopen("/dev/stdin", "r", stdin)

unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin,
which means that stdin is gone before it can grab it again.  When I
`fixed' this here it broke /usr/ucb/head and I had to fix the fix!

The sequence needed is messy:

	old = fileno(fp);
	new = open(...);
	if (new < 0) {
		close(old);	/* maybe it was EMFILE */
		new = open(...);/* (could test errno too) */
		if (new < 0)
			return error;
	}
	if (new != old) {
		if (dup2(new, old) >= 0)	/* move it back */
			close(new);
		else {
			close(old);
			fileno(fp) = new;
		}
	}

Not using dup2 means that freopen(stderr) might make fileno(stderr)
something other than 2, which breaks at least perror().
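
The core of that sequence (open first, then move the new descriptor back onto the old number) can be sketched with Python's `os` module, which wraps the same system calls. The function name `reopen_onto` is hypothetical, and the EMFILE retry from the C fragment above is left out to keep the sketch short.

```python
import os

def reopen_onto(path, flags, old_fd):
    """Open path and make it available under the descriptor number old_fd."""
    new_fd = os.open(path, flags)      # open first, so old_fd survives a failure
    if new_fd != old_fd:
        os.dup2(new_fd, old_fd)        # old_fd now refers to the new file
        os.close(new_fd)               # drop the temporary descriptor number
    return old_fd
```

Because the descriptor number never changes, code that assumes a fixed number (2 for stderr, say) keeps working after the reopen.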
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	ch...@mimsy.umd.edu	Path:	uunet!mimsy!chris