Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!necntc!ames!ucbcad!ucbvax!decvax!decwrl!reid
From: r...@decwrl.UUCP (Brian Reid)
Newsgroups: comp.sources.d,news.groups
Subject: arbitron program (v2.4.2--last updated 4 June 1987)
Message-ID: <11731@decwrl.DEC.COM>
Date: Thu, 1-Oct-87 08:51:07 EDT
Article-I.D.: decwrl.11731
Posted: Thu Oct  1 08:51:07 1987
Date-Received: Mon, 5-Oct-87 04:32:06 EDT
Sender: r...@decwrl.DEC.COM
Organization: DEC Western Research Laboratory
Lines: 234

This is the source for the "arbitron" program that is used to produce the
data for the monthly USENET readership surveys in news.lists. It is posted to
this newsgroup because there is no unmoderated sources newsgroup any more.

#! /bin/sh
# @(#)arbitron	2.4.2	06/05/87
# arbitron -- this program produces rating sweeps for USENET.
#
# Usage: arbitron
#
# To use this program, edit the "configuration" section below so that the
# information is correct for your site, and then run it. It will produce a
# readership survey for your machine and mail that survey to decwrl, with
# a cc to you.
#
# To participate in the international monthly ratings sweeps, 
# run "arbitron" every month. I will run the statistics program on the last
# day of each month; it will include any report that has reached it by that
# time. To make sure your site's data is included, run the survey program no
# later than the 20th day of each month.
#
# Brian Reid, DEC Western Research Lab, reid@decwrl
# Updated and bugfixed by 
#	Spencer Thomas, U.of Utah
#	Geoff Kuenning, SAH Consulting
# Updated to work with 2.10.1 and older news systems by
#	Lindsay Cleveland, AT&T Technologies/Bell Labs
# Made to work with 16-bit address spaces by
#	Andy Walker, Maths Dept., University of Nottingham, UK
# Nagging Bourne shell bug fixed by
#	Tom Donahue, Rabbit Software Corp
#
# Note that the results of this program are dependent on the rate at which
# you expire news.  If you are a small site that expires news rapidly, the
# results may indicate fewer active readers than you actually have.
#
###########################################################################
# Configuration information. Edit this section to reflect your site data. #
TMPDIR=/tmp
NEWS=/usr/lib/news
SPOOL=/usr/spool/news

# Make a crude stab at determining the system type. If your installation has
# only one type of system, you can edit out the "if" statement and just turn
# this into an assignment statement of the correct value.
if [ -d /usr/ucb ]
then
    STYPE="bsd"
else
    STYPE="usg"
fi

# Range of /etc/passwd UIDs that represent actual people (rather than
# maintenance accounts or daemons or whatever)
lowUID=5
highUID=9999

# If you aren't running a distributed news system (nntpd & rrn, usually),
# leave NEWSHOST blank. Else set it to the name of the host from which you
# can rcp a copy of the active file.
NEWSHOST=

# uucp path: {ihnp4, decvax, ucbvax}!decwrl!netsurvey
# summarypath="netsur...@decwrl.dec.com $USER"
summarypath="ihnp4!decwrl!netsurvey $USER"

# We need to find the uucp name of your host. If this code doesn't work,
# then just put it in literally like this:
#	hostname="ihnp4"

case $STYPE in
	bsd) cmd='hostname || uuname -l';;
	usg) cmd='uname -n || uuname -l || hostname';;
	*)   cmd='uuname -l';;
esac;

hostname=`sh -c "$cmd" 2>&-`

PATH=$NEWS:/usr/local/bin:/usr/ucb:/usr/bin:/bin
############################################################################
export PATH
# ---------------------------------------------------------------------------
trap "rm -f $TMPDIR/arb.*.$$; exit" 0 1 2 3 15
set `date`
dat="$2$6"
destination="${MAILER-mail} $summarypath"

################################
# Here are several expressions, each of which figures out approximately how
# many people use this machine. Comment out all but 1 of them; pick the one
# you like best. Initially the most universal but least reliable of them is
# uncommented.
# # ###### Scheme #1: fast but usually returns too big a number
nusers=`awk -F: "BEGIN {N=0}\\$3>=$lowUID && \\$3<=$highUID{N=N+1}END{print N}" </etc/passwd`

# # ###### Scheme #2 (works with BSD systems)
#nusers=`last | sort -u +0 -1 | wc -l`

# # ###### Scheme #3 (works with USG systems)
#nusers=`who /etc/wtmp | sort -u +0 -1 | wc -l`

################################
#
# Set up awk scripts;  these are too large to pass as arguments on most
# systems.
#
# This awk script generates the actual output report.
# We use 'sed' to substitute in the shell variables to save ourselves
# endless hassle trying to find quoting/backslashing problems.
#
# The input to this script consists of two types of lines (pre-sorted):
#
#	(1) Active-file lines.  These have four fields:  newsgroup name,
#	    last article number, first existing article, 'y' or 'n'
#	    to allow/disallow posting.
#			mod.mac 00001 00001 y
#
#	(2) .newsrc-derived lines.  These have three fields:  the newsgroup
#	    name, the user name and the articles-read information.  The latter
#	    can be arbitrarily complex.  It can also be arbitrarily long;
#	    this can potentially break either awk or sed, in which
#	    case the script will not work.
#			mod.map joe 1-199
#
#	The script uses the type 1 lines to define the newsgroups
#	and their active article ranges.  The .newsrc (type 2) lines are
#	then used to deduce which users are reading that group (a group
#	is being read if the last article seen is in that group's active
#	article range).
#
sed "/^#/d
     s/NUSERS/$nusers/g
     s/HOSTNAME/$hostname/g
     s/DATE/$dat/g" > $TMPDIR/arb.fmt.$$ << 'DOG'
# makereport -- utility for "arbitron". Early versions were copied from a
# similar script distributed with "subscribers.sh" by Blonder, McCreery, and
# Herron.
#
	BEGIN	{ rdrcount = 0 ; reader = "" ; grpcount = 0 ; realusers = 0}
#
# Active file line:  dispose of previous group (if any), record group, and
# record first and last article numbers.  Set group's reader count to none.
	NF == 4 { if (grpname != "") {
			printf("%d %s\n",grpcount, grpname)
		  }
		  grpname = $1
		  grpfirst = $3
		  grplast = $2
		  grpcount = 0
		}
#
# .newsrc line.  Break out the final number, which is the last article that
# has actually been read.  This is a pretty good indicator of the person's
# true interest in the group.  If 'lastread' for the group is a current
# (unexpired) article, record a reader for that group.  Finally, record
# the user as a "real" user of the news system.
#
	NF == 3 { if ($1 != grpname) next;
		  n1 = split($3, n2, "-")
		  n3 = split(n2[n1], n4, ",")
		  lastread = n4[n3]
		  if ((grpfirst != grplast) && (lastread >= grpfirst) && (lastread <= grplast)) {
			grpcount++
			if (realuser[$2] != 1) {
			    realuser[$2] = 1
			    realusers++
			}
		  }
		}
#
# End of file.  Print the report in 2 columns.
	END	{ printf("9999 Host\t\t%s\n","HOSTNAME")
		  printf("9998 Users\t\t%d\n",NUSERS)
		  printf("9997 NetReaders\t%d\n",realusers)
		  printf("9996 ReportDate\t%s\n","DATE")
		  printf("9995 SystemType\tnews-arbitron-2.4\n")
# For reorganized network, report a group even if nobody reads it. This will
# help us keep track of where the groups propagate.
		  printf("%d %s\n",grpcount, grpname)
		}
DOG

cat >$TMPDIR/arb.pwd.$$ <<'MOUSE'
BEGIN	{ seen["/"]=1; seen[""] = 1; }
	{ if (seen[$6]!=1) {
		printf("if [ -r %s/.newsrc ] ; then ", $6)
		printf("sed -n '/: [0-9]/s/:/ %s/p' <%s/.newsrc; fi\n",$1,$6)
		seen[$6]=1;
	  }
}
MOUSE

# First, make sure we have an active file
if [ -z "$NEWSHOST" ]
then ACTIVE=$NEWS/active
else ACTIVE=$TMPDIR/arb.active.$$
     rcp $NEWSHOST:$NEWS/active $ACTIVE
fi

if [ ! -s $ACTIVE ]
then
    echo arbitron: ACTIVE file missing or empty. Cannot continue. >&2
    exit 1
fi

# Next, get the list of .newsrc files with duplicates and unreadable files
# removed.
awk -F: -f $TMPDIR/arb.pwd.$$ </etc/passwd | sh >$TMPDIR/arb.tmp.$$

# Check to make sure that we found some
if [ -s $TMPDIR/arb.tmp.$$ ]
then # See if "active" file has 4 fields or only two (pre-2.10.2)
     set `sed 1q < $ACTIVE`
     if [ $# -eq 2 ]
     then egrep  '^[a-z]*\.' $ACTIVE |
	  while read group last
	  do dir=`echo "$group" | sed 's;\.;/;g'`
	     first=`ls $SPOOL/$dir | grep '^[0-9][0-9]*$' | sort -n | sed 1q`
	     # Use the last article if the spool directory is empty; a plain
	     # test also works in old Bourne shells that lack ${var:-word}.
	     if [ -z "$first" ]; then first=$last; fi
	     echo "$group $last $first X"
	  done
     else egrep '^[a-z]*\.' $ACTIVE
     fi |
     sort - $TMPDIR/arb.tmp.$$ |
     awk -f $TMPDIR/arb.fmt.$$ |
     sort -nr |
     sed '/^$/d
	  s/^999[0-9] //' |
     $destination
else echo Unable to find any readable .newsrc files >&2
     exit 1
fi
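
The heart of the report is the makereport test: a user counts as a reader of
a group if the last article number in that user's .newsrc entry still falls
inside the group's unexpired article range. The stand-alone sketch below tries
that test on made-up data; it is not part of arbitron. It assumes the
2.10.2-style active line (group, last, first, flag), and the group name,
article numbers, and the users "joe" and "ann" are all hypothetical.

```shell
#!/bin/sh
# Stand-alone sketch of arbitron's "is this user still reading?" test.
# All sample data below is hypothetical.

# Active-file line, 2.10.2+ layout: group, last article, first article, flag.
active='comp.unix.wizards 00450 00300 y'

# Per-user .newsrc entries, rewritten the way arbitron's generated
# commands do it: "group: ranges" becomes "group user ranges".
joe=`echo 'comp.unix.wizards: 1-250,260-420' | sed -n '/: [0-9]/s/:/ joe/p'`
ann=`echo 'comp.unix.wizards: 1-120' | sed -n '/: [0-9]/s/:/ ann/p'`

result=`printf '%s\n%s\n%s\n' "$active" "$joe" "$ann" | awk '
NF == 4 { grp = $1; last = $2 + 0; first = $3 + 0; next }
NF == 3 && $1 == grp {
    # The final number in the read-list is the last article read.
    n1 = split($3, a, "-")
    n3 = split(a[n1], b, ",")
    lastread = b[n3] + 0
    # Same range test as makereport (a one-article group never counts).
    if (first != last && lastread >= first && lastread <= last)
        status = "reader"
    else
        status = "lapsed"
    printf("%s %s %d %s\n", $1, $2, lastread, status)
}'`
echo "$result"
```

Run with sh, it prints one line per user: joe, whose last-read article 420
lies in the 300-450 range, is counted as a reader; ann, who stopped at 120,
is not.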

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rick
From: r...@seismo.CSS.GOV (Rick Adams)
Newsgroups: comp.sources.d,news.groups
Subject: Re: arbitron program (v2.4.2--last updated 4 June 1987)
Message-ID: <44103@beno.seismo.CSS.GOV>
Date: Sun, 4-Oct-87 01:38:37 EDT
Article-I.D.: beno.44103
Posted: Sun Oct  4 01:38:37 1987
Date-Received: Wed, 7-Oct-87 02:47:30 EDT
References: <11731@decwrl.DEC.COM> <919@hao.UCAR.EDU> <14015@oddjob.UChicago.EDU>
Organization: Center for Seismic Studies, Arlington, VA
Lines: 8
Summary: alt.sources

Yet one has to wonder: If alt.sources is "proof" of the need/demand for
unmoderated sources, why does one feel obligated to post to a group
specifically created for discussion only?

Is this an admission that alt.sources is inadequate? (Or maybe inappropriate?)
You would think it would at least be cross posted...

--rick

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!genrad!decvax!decwrl!reid
From: r...@decwrl.dec.com (Brian Reid)
Newsgroups: comp.sources.d,news.groups
Subject: Re: arbitron program (v2.4.2--last updated 4 June 1987)
Message-ID: <74@bacchus.DEC.COM>
Date: Fri, 9-Oct-87 02:39:27 EDT
Article-I.D.: bacchus.74
Posted: Fri Oct  9 02:39:27 1987
Date-Received: Sun, 11-Oct-87 12:16:31 EDT
References: <11731@decwrl.DEC.COM> <919@hao.UCAR.EDU> <14015@oddjob.UChicago.EDU> <44103@beno.seismo.CSS.GOV>
Reply-To: r...@decwrl.UUCP (Brian Reid)
Organization: DEC Western Research
Lines: 23

In article <44...@beno.seismo.CSS.GOV> r...@seismo.CSS.GOV (Rick Adams) writes:
>Yet one has to wonder: If alt.sources is "proof" of the need/demand for
>unmoderated sources, why does one feel obligated to post to a group
>specifically created for discussion only?
>
>Is this an admission that alt.sources is inadequate? (Or maybe inappropriate?)
>You would think it would at least be cross posted...

Hey, guys, woof woof. This is not posted by me, it is posted by crontab using
a shell script I wrote back before alt.sources existed. So I forgot to update
it to include alt.sources. I still intend to crosspost to comp.sources.d just
because alt.sources doesn't go everywhere and it is to everyone's benefit for
as many sites as possible to run the arbitron script.

But I'm delighted to see the discussion. Sure, some people are annoyed, but
everything annoys *somebody*. My goal is to get as many people as possible to
install and run the script, and any technique that works to draw attention to
it is fine with me.

Also it's ridiculous to have sources that are posted automatically every
month be sent to a moderator, and it is equally ridiculous to have sources
that are posted automatically every month be in a newsgroup that is archived
in thousands of sites all over the world.

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!rutgers!mcnc!xanth!kyle
From: k...@xanth.UUCP (Kyle Jones)
Newsgroups: comp.sources.d
Subject: Re: arbitron program (v2.4.2--last updated 4 June 1987)
Message-ID: <2713@xanth.UUCP>
Date: Fri, 9-Oct-87 21:00:57 EDT
Article-I.D.: xanth.2713
Posted: Fri Oct  9 21:00:57 1987
Date-Received: Mon, 12-Oct-87 05:37:08 EDT
References: <11731@decwrl.DEC.COM> <919@hao.UCAR.EDU> <14015@oddjob.UChicago.EDU> <74@bacchus.DEC.COM>
Lines: 15

In article <7...@bacchus.DEC.COM>, r...@decwrl.dec.com (Brian Reid) writes:
> Also it's ridiculous to have sources that are posted automatically every
> month be sent to a moderator, and it is equally ridiculous to have sources
> that are posted automatically every month be in a newsgroup that is archived
> in thousands of sites all over the world.

Agreed, but why post the arbitron program EVERY month?  It would be to
everyone's benefit if more sites ran pathalias and smail, but we
certainly wouldn't want these packages posted every month.

kyle jones  <k...@odu.edu>  old dominion university, norfolk, va  usa

p.s. I feel the same way about the USENET maps.  Once every six
     months would be often enough, since new or changed map entries
     can be posted to news.config or news.newsite .

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!aurora!ames!lll-tis!ptsfa!ihnp4!homxb!mtuxo!mtune!codas!usfvax2!ateng!chip
From: c...@ateng.UUCP (Chip Salzenberg)
Newsgroups: comp.sources.d
Subject: Periodic repostings (Was: arbitron program)
Message-ID: <37@ateng.UUCP>
Date: Mon, 12-Oct-87 15:10:31 EDT
Article-I.D.: ateng.37
Posted: Mon Oct 12 15:10:31 1987
Date-Received: Wed, 14-Oct-87 05:39:31 EDT
References: <11731@decwrl.DEC.COM> <919@hao.UCAR.EDU> <14015@oddjob.UChicago.EDU> <74@bacchus.DEC.COM> <2713@xanth.UUCP>
Reply-To: c...@ateng.UUCP (Chip Salzenberg)
Organization: A.T. Engineering, Tampa, FL
Lines: 22

>[...] why post the arbitron program EVERY month?  It would be to
>everyone's benefit if more sites ran pathalias and smail, but we
>certainly wouldn't want these packages posted every month.
>
>p.s. I feel the same way about the USENET maps.

My machine cannot subscribe to comp.mail.maps because of its volume.  But
once every six months I could accept.

I see two issues:

	1.  Should useful programs constantly be reposted?
		My opinion:  Usually not.  (At least arbitron is small.)

	2.  How often should updated UUCP maps be posted?
		My opinion:  Every three to six months.

-- 
Chip Salzenberg         "c...@ateng.UUCP"  or  "{uunet,usfvax2}!ateng!chip"
A.T. Engineering        My employer's opinions are not mine, but these are.
   "Gentlemen, your work today has been outstanding.  I intend to recommend
   you all for promotion -- in whatever fleet we end up serving."   - JTK

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!necntc!ames!sri-spam!sri-unix!ctnews!pyramid!fmsrl7!nucleus!netsys!len
From: l...@netsys.UUCP (Len Rose)
Newsgroups: comp.sources.d
Subject: Re: Periodic repostings (Was: arbitron program)
Message-ID: <1487@netsys.UUCP>
Date: Tue, 13-Oct-87 19:00:33 EDT
Article-I.D.: netsys.1487
Posted: Tue Oct 13 19:00:33 1987
Date-Received: Fri, 16-Oct-87 01:40:10 EDT
References: <11731@decwrl.DEC.COM> <919@hao.UCAR.EDU> <14015@oddjob.UChicago.EDU> <74@bacchus.DEC.COM> <2713@xanth.UUCP> <37@ateng.UUCP>
Reply-To: l...@netsys.UUCP (Len Rose)
Organization: NetSys Public Access Network
Lines: 28


It would be nice to see them (the maps) once in a while. Most sites I know
can't get the complete set, because some site in their path to a backbone
cannot handle the large amount of traffic the maps would generate.

I, like many others, end up getting them from archive sites.

Maybe it would be a good idea to have a few sites in the country act as uucp
map archive centers, in which case the responsibility (and cost) of
transferring this massive amount of data would fall on the sites that actually
need it. Then we could transport diffs on the net, or update packages designed
to make maintaining the net topography easy.

Not only would this save the net in general a lot of money, but it might also
make it easier for all sites to obtain the bloody maps. This would only be
true if enough sites were willing to perform this service. I am sure that many
of the anonymous source archive sites are already providing the maps, but a
lot would be gained by making it "official".

Has this been discussed before? If so, then my apologies.

 

Len Rose -* Netsys Public Access Network *- The East Coast Machine
301-540-3656,3657,3658,3659    3B2/Unix SV3.0