Tech Insider					     Technology and Trends


			      USENET Archives

From: Miquel van Smoorenburg <miqu...@cistron.nl>
Subject: Big SMP machine hangs often [debug included]
Date: 2000/05/17
Message-ID: <fa.d8edb3v.1lk258q@ifi.uio.no>#1/1
X-Deja-AN: 624615735
Original-Date: Wed, 17 May 2000 19:06:43 +0200
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <20000517190643.A9020@cistron.nl>
To: linux-ker...@vger.rutgers.edu
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-NCC-RegID: nl.cistron
Organization: Internet mailing list
Mime-Version: 1.0
User-Agent: Mutt/1.0i
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

[Apologies if you see this twice, but after 3 hours I still haven't
 seen my original posting on this list]

We sold a customer a big AMI Megaplex (4xPIII/500, 2GB RAM) server
but as soon as they put any load on it, they see the following
problems:

- sometimes the I/O subsystem "hangs" for 10-20 seconds
- every few days the server just hangs. Doesn't respond to pings, nothing.
  We need to press the RESET switch....

Config is:

- AMI Megaplex 4xPIII/500 2 GB RAM
- AMI MegaRaid EC9F:1.24 controller with 4 18 GB disks in RAID5 mode
- Linux 2.2.13 kernel compiled with gcc 2.7.2.3 and SMP / 2GB support

Today the machine hung again, but it did still respond to SYSRQ, so
I got the following debug output. I'd appreciate it if someone could
take a look and say if this is something that 2.2.14/2.2.15 should
solve, or that it is something else. It looks like the kernel gets
stuck in add_timer/timer_bh somehow. Note also the strange "buffer hashed"
output.

Unfortunately there is no way to force an OOPS using sysrq right now,
so I do not have a complete stack trace.

[short System.map fragment]
80110cf8 T add_timer		<-----
80110e94 T mod_timer
8011102c T del_timer
80111084 T schedule_timeout

80111f38 T update_one_process
8011200c t update_process_times
80112014 t timer_bh		<-----
801123ac T do_timer
80112400 T sys_alarm
[.....]

SysRq: Show Regs

EIP: 0010:[<8011234c>] EFLAGS: 00000283
EAX: aa448ae0 EBX: aa4489c0 ECX: 801d6584 EDX: aa448af4
ESI: 8015d060 EDI: 00000001 EBP: 81e97cc8 DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<8011234c>] EFLAGS: 00000283
EAX: fbdeb72c EBX: aa4489c0 ECX: 801d6558 EDX: aa448af4
ESI: 8015d060 EDI: 00000001 EBP: 81e97cc8 DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<80110e78>] EFLAGS: 00000246
EAX: 801d664c EBX: 00000246 ECX: aa448af4 EDX: 000000ec
ESI: 8015d060 EDI: 00000001 EBP: 81e97c9c DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<8011234c>] EFLAGS: 00000287
EAX: f67045ac EBX: aa4489c0 ECX: 801d63c0 EDX: aa448af4
ESI: 8015d060 EDI: 00000001 EBP: 81e97cc8 DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<80110e78>] EFLAGS: 00000246
EAX: 801d6504 EBX: 00000246 ECX: aa448af4 EDX: 0000009a
ESI: 8015d060 EDI: 00000001 EBP: 81e97c9c DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<80110e78>] EFLAGS: 00000246
EAX: 801d6318 EBX: 00000246 ECX: aa448af4 EDX: 0000001f
ESI: 8015d060 EDI: 00000001 EBP: 81e97c9c DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000

SysRq: Show Regs

EIP: 0010:[<8011234c>] EFLAGS: 00000283
EAX: aa448ae0 EBX: aa4489c0 ECX: 801d665c EDX: aa448af4
ESI: 8015d060 EDI: 00000001 EBP: 81e97cc8 DS: 0018 ES: 0018
CR0: 8005003b CR2: 2acae000 CR3: 3a30f000
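
The two EIP values in the dumps above can be matched against the System.map fragment by finding the nearest symbol at or below each address. The following is a minimal sketch of that lookup, assuming the listed symbols are adjacent in the full System.map (the fragment is abbreviated, so a static function between them cannot be ruled out). The helper names parse_system_map and resolve are hypothetical and exist only for this illustration.

```python
import bisect

# Entries copied from the System.map fragment in the post.
SYSTEM_MAP = """\
80110cf8 T add_timer
80110e94 T mod_timer
8011102c T del_timer
80111084 T schedule_timeout
80111f38 T update_one_process
8011200c t update_process_times
80112014 t timer_bh
801123ac T do_timer
80112400 T sys_alarm
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        addr, _sym_type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, eip):
    """Return (name, offset) of the nearest symbol at or below eip, or None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, eip - addr


SYMBOLS = parse_system_map(SYSTEM_MAP)

# The only two EIP values that appear in the seven register dumps.
for eip in (0x8011234C, 0x80110E78):
    name, offset = resolve(SYMBOLS, eip)
    print(f"{eip:08x} -> {name}+0x{offset:x}")
```

Under that assumption, 8011234c resolves to timer_bh+0x338 and 80110e78 to add_timer+0x180, which agrees with the two symbols marked with arrows in the fragment.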

SysRq: Show Memory
Mem-info:
Free pages:     71316kB
 ( Free: 17829 (256 512 768)
NonDMA: 8179*4kB 3695*8kB 489*16kB 5*32kB 2*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB = 70772kB)
DMA: 0*4kB 0*8kB 2*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB = 544kB)
Swap cache: add 6339, delete 6330, find 329846/339376
Free swap:      129880kB
507904 pages of RAM
5456 reserved pages
52779 pages shared
9 pages swap cached
361640 pages in file cache
361649 pages in page cache
24 pages in page table cache
Buffer memory:  282680kB
Buffer heads:   71303
Buffer blocks:  71267
Buffer hashed:  -3666981
   CLEAN: 70927 buffers, 83 used (last=70758), 0 locked, 0 protected, 0 dirty
  LOCKED: 74 buffers, 0 used (last=0), 0 locked, 0 protected, 0 dirty
   DIRTY: 190 buffers, 9 used (last=177), 0 locked, 0 protected, 190 dirty
Networking buffers in use         : 780
Total network buffer allocations   : 2201836468
Total failed network buffer allocs : 0
IP fragment buffer size         : 0
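
The rest of the memory dump can be checked for internal consistency by redoing its arithmetic. This is a small sketch that uses only numbers from the output above and assumes the 4 kB page size of i386. The variable names are made up for the example. It does not explain the negative "Buffer hashed" value; it only shows that the free-list and page-cache totals add up.

```python
PAGE_KB = 4  # assumed i386 page size in kB

# Free-list entries from the dump: block size in kB -> number of free blocks.
non_dma = {4: 8179, 8: 3695, 16: 489, 32: 5, 64: 2,
           128: 1, 256: 1, 512: 0, 1024: 0, 2048: 0}
dma = {4: 0, 8: 0, 16: 2, 32: 2, 64: 1,
       128: 1, 256: 1, 512: 0, 1024: 0, 2048: 0}


def total_kb(free_lists):
    """Sum block size times block count over all free lists."""
    return sum(size * count for size, count in free_lists.items())


non_dma_kb = total_kb(non_dma)   # dump reports 70772kB
dma_kb = total_kb(dma)           # dump reports 544kB
free_kb = non_dma_kb + dma_kb    # dump reports "Free pages: 71316kB"
free_pages_kb = 17829 * PAGE_KB  # "Free: 17829" pages, converted to kB

ram_mb = 507904 * PAGE_KB // 1024  # "507904 pages of RAM" in MB
page_cache = 361640 + 9            # file cache pages + swap cached pages

print(non_dma_kb, dma_kb, free_kb, free_pages_kb, ram_mb, page_cache)
```

All of these agree with the figures printed by the kernel (the page cache line reports 361649 pages), so "Buffer hashed" is the only counter in this output that is visibly inconsistent.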

Mike.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Subject: Re: Big SMP machine hangs often [debug included]
Date: 2000/05/17
Message-ID: <fa.fiiqv8v.12min8s@ifi.uio.no>#1/1
X-Deja-AN: 624650271
Original-Date: Wed, 17 May 2000 19:43:53 +0100 (BST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <E12s8nT-00005F-00@the-village.bc.nu>
Content-Transfer-Encoding: 7bit
References: <fa.d8edb3v.1lk258q@ifi.uio.no>
To: miqu...@cistron.nl (Miquel van Smoorenburg)
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

> - AMI MegaRaid EC9F:1.24 controller with 4 18 GB disks in RAID5 mode
> - Linux 2.2.13 kernel compiled with gcc 2.7.2.3 and SMP / 2GB support

Get at least the 1.07b MegaRAID driver and also the 3.10 or higher firmware.

> solve, or that it is something else. It looks like the kernel gets
> stuck in add_timer/timer_bh somehow. Note also the strange "buffer hashed"
> output.

Continually adding a timer that expires now or in the past can cause this
hang. Try 2.2.15 and the delack-5 patch from Andrea. Andrea also wrote a
patch that stops timers queued this way from hanging the box, so you can
debug it.

Alan
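
The failure mode described in this reply can be illustrated with a toy model. The sketch below is not the 2.2 kernel timer code (which uses jiffies and a timer wheel); TimerQueue, run_expired, and both handlers are hypothetical names for a simplified simulation. It assumes only that the timer run loop keeps going until no pending timer has an expiry at or before the current time, and it adds an artificial max_calls cap so the example terminates.

```python
import heapq
import itertools


class TimerQueue:
    """Toy pending-timer list ordered by expiry time."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so handlers are never compared

    def add_timer(self, expires, handler):
        heapq.heappush(self._heap, (expires, next(self._seq), handler))

    def run_expired(self, now, max_calls=1000):
        """Run every timer with expires <= now.

        Returns (calls, drained). drained is False if the artificial
        max_calls cap was hit while expired timers were still pending.
        """
        calls = 0
        while self._heap and self._heap[0][0] <= now:
            if calls == max_calls:
                return calls, False
            _expires, _seq, handler = heapq.heappop(self._heap)
            handler(self, now)
            calls += 1
        return calls, True


def rearm_in_future(queue, now):
    # Well-behaved handler: the next expiry is strictly after "now".
    queue.add_timer(now + 1, rearm_in_future)


def rearm_for_now(queue, now):
    # Misbehaving handler: re-queues itself for "now", so it is
    # immediately expired again on the next pass of the loop.
    queue.add_timer(now, rearm_for_now)


good = TimerQueue()
good.add_timer(10, rearm_in_future)
good_result = good.run_expired(now=10)

bad = TimerQueue()
bad.add_timer(10, rearm_for_now)
bad_result = bad.run_expired(now=10)

print("future re-arm:", good_result)
print("re-arm for now:", bad_result)
```

In this model the well-behaved handler runs once and the loop drains, while the handler that re-arms itself for the current time keeps the loop busy until the cap stops it. Without such a cap the loop would never return, which is the kind of hang the reply attributes to timers queued for now or the past.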



