4.2 abrupt halts

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!security!genrad!grkermit!masscomp!clyde!...@BRL-VGR.ARPA
From: roode%u...@BRL-VGR.ARPA
Newsgroups: net.unix-wizards
Subject: 4.2 abrupt halts
Message-ID: <15161@sri-arpa.UUCP>
Date: Thu, 5-Jan-84 19:24:19 EST
Article-I.D.: sri-arpa.15161
Posted: Thu Jan  5 19:24:19 1984
Date-Received: Sun, 8-Jan-84 00:53:55 EST
Lines: 44

From:  Dana Roode <roode%uci-750a@BRL-VGR.ARPA>

We are experiencing mysterious halts on our 750 system.  They began only
after we installed 4.2BSD; otherwise we would swear they were caused by
hardware.  The system will be running fine and then, out of nowhere,
it halts:

	800202CA 04

The documentation says the "04" halt code indicates "interrupt stack not
valid or unable to read SCB".  The address corresponds to "_dumpsys+.9e" 
in our kernel, which appears to be a harmless "pushaf" of an argument 
for printf.  Of course the fact that we are in "dumpsys" probably 
indicates we were trying to crash anyway, but why, I don't know.  
Nothing appears on the console before the halt, and the system does not 
try to continue despite the fact that the console switch is in its 
normal "restart" position.
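
For readers following the address arithmetic, here is a minimal sketch of
how a halt PC can be mapped to "symbol+offset".  It assumes the ".9e" in
"_dumpsys+.9e" is a hexadecimal offset, which would put the start of
_dumpsys at 800202CA - 9E = 8002022C.  The table and the resolve() helper
are hypothetical; the two neighbouring symbols are invented purely so the
lookup has something to search, and a real table would come from the
kernel's namelist.

```python
import bisect

# Hypothetical symbol table, sorted by start address.
# Only _dumpsys is derived from the post (0x800202CA - 0x9E, assuming the
# offset is hex); the other two entries are made up for illustration.
SYMBOLS = [
    (0x80020100, "_example_before"),
    (0x8002022C, "_dumpsys"),
    (0x80020400, "_example_after"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return (name, offset) for the closest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    # bisect_right gives the index of the first start > addr,
    # so the entry just before it is the containing symbol.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address is below the first symbol")
    start, name = symbols[i]
    return name, addr - start


name, offset = resolve(0x800202CA)
print("%s+%x" % (name, offset))  # prints "_dumpsys+9e"
```

This only names the routine that contains the halt address; it says nothing
about what caused the entry into dumpsys in the first place.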

After some of these crashes, we were unable to reboot at all without 
powering the CPU on and off.  We would type the boot command to the front
end and receive a micro verify check failure (single "%" or "%O").  This
led us to believe we had a hardware problem.  DEC replaced our L0002
CPU module, which they said included the microcode hardware involved.  We
have had another abrupt halt since then, but this time the system
responded properly to a boot command.

	Has anyone seen a problem like this one with 4.2? (or 4.1?)

	Hardware or software?

	If there was an original problem that triggered entry into the
	dumpsys routine, how do we find what the problem was, given that
	nothing is printing on the console?

Since we urgently need to get our system back on its feet, please send
a copy of all replies directly to me.

	Thanks,

		Dana Roode
		University of California, Irvine
		roode.uci@rand-relay  -or-
		ucbvax!ucivax!roode

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83 (MC830713); site erix.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!harpo!decvax!mcvax!enea!erix!mike
From: mike@erix.UUCP (Mike Williams)
Newsgroups: net.unix-wizards
Subject: 4.2 died on our VAX.
Message-ID: <301@erix.UUCP>
Date: Wed, 14-Mar-84 09:00:12 EST
Article-I.D.: erix.301
Posted: Wed Mar 14 09:00:12 1984
Date-Received: Thu, 15-Mar-84 07:15:47 EST
Organization: L M Ericsson, Stockholm, Sweden
Lines: 16

Normally when UNIX panics we get a dump which we sometimes look at to
see what happened.

The other day our VAX just died. It continued to run, echoed text from
terminals etc. but just hung if you tried to give it any commands. It
turned out that one of our disk controllers was playing up. We swap and 
page on two disks (hp0 and hp1) and hp1 just gave up. This was quickly
repaired and we were up again after an hour.

Is there any way to force a dump in these conditions? Why did the VAX
just play dumb?

Mike Williams
{decvax,philabs}!mcvax!enea!erix!mike
or