From: ur...@usa.net
Subject: 2.2.13ac3 crashes under high network load
Date: 1999/12/15
Message-ID: <837hq1$hat$1@nnrp1.deja.com>
X-Deja-AN: 560862255
X-Http-Proxy: 1.0 x35.deja.com:80 (Squid/1.1.22) for client 199.95.209.163
Organization: Deja.com - Before you buy.
X-Article-Creation-Date: Wed Dec 15 08:00:02 1999 GMT
X-MyDeja-Info: XMYDJUIDray450
Newsgroups: comp.os.linux.networking
X-Http-User-Agent: Mozilla (X11; I; Linux 2.0.32 i586)

Alan C/Dave M/Ingo M/Dan K et al. ...

I'm trying to build an "ultra high-performance" webserver
which needs to handle at least 500 (preferably 1000) sustained
hits/second (my entire webserver farm already needs to handle
in excess of 1 billion hits/day, soon to be 2 billion/day).
Yes, I've read the C10K page at http://www.kegel.com/c10k.html
and tried some of the experimental webservers, most of which
are way too experimental (and lack features I need :).
The issue isn't that Apache isn't cutting it, but rather
some kernel (and/or network driver) issue, I think ...

Here is my hardware and software configuration:

Compaq 6400R (rackmount):
  dual PIII-550 MHz processors
  2048 MByte ECC memory
  Intel EtherExpress Pro/100 NIC

Linux 2.2.13:
  with Alan Cox's "2.2.13ac3" super-patch applied
  using Donald Becker's eepro100.c:v1.09l (8/7/99)
  "BIGMEM" option enabled in kernel configuration
  using the raid0145 patch (raidtools-0.90), included in the ac3 patchset
  system on a single 9.1G Ultra-Wide 10K SCSI drive
  web content on RAID-0 across 2 identical UWSCSI drives (raidtab sketch below)
  some sysctl parameters modified via a script (see below)

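For completeness, the RAID-0 content array uses a raidtools-0.90
/etc/raidtab roughly like the sketch below (device names and chunk
size here are placeholders, not necessarily what I actually run),
followed by mkraid /dev/md0 and a normal mke2fs:

  raiddev /dev/md0
      raid-level              0
      nr-raid-disks           2
      persistent-superblock   1
      chunk-size              64
      device                  /dev/sdb1
      raid-disk               0
      device                  /dev/sdc1
      raid-disk               1
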
I have the above setup on 10 servers running Apache 1.3.9
(modified with an updated version of his TOP_FUEL patch)
in a cluster, managed by a Cisco LocalDirector 420. Under
test conditions I can pull content from Apache fast enough
to saturate the 100 Mbit/s full-duplex [switched] interface
on each webserver (according to `ab`, anyhow).
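
For reference, the kind of ab run I use for that test looks
roughly like this (URL and concurrency are just placeholders):

  ab -k -n 100000 -c 256 http://10.0.0.1/some-static-file.html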

HOWEVER, after 2-3 days of uptime, handling on average
between 50 and 200 hits/second (per server), which means
from 100 to 400 (and up) open connections per server,
these machines start to fail. Usually this happens when
the number of active TCP/IP connections is high, say
around 1000 for a brief period. (I did have to increase
the route-cache limits in /proc/sys/net/ipv4, per a
posting from Alexey Kuznetsov in fa.linux.kernel, else I
was getting tons of "dst_cache_overflow" errors.)

The server becomes unreachable (cannot be PING'ed),
but the console still shows output (I finally figured out
how to disable the damned screen-blanker/powersave via a
little script run during init, also pasted below). First
I see some messages about an INODE already being cleared
(RAID-related problems?), and shortly after, different
messages start looping on the console:

  <[8010b37d]> <[80162067]> <[80162140]> \
  <[8016f168]> <[801511d4]> <[80161fa6]>

  wait_on_bh, CPU 3:
  irq:	0 [0 0]
  bh:	1 [0 0]

(repeats about once per second)

I've compiled the 'Magic SysRq' feature into the kernel,
so upon pressing <ALT> + <SysRq> + P, I see the following
(hand-copied, so there might be a few mistakes!):

  >SysRq: Show Regs

  EIP: 0010:[<8010b38c>]  EFLAGS: 00000202
  EAX: 00000001  EBX: fae00e80  ECX: f0aeff14  EDX: aee2c21d
  ESI: 00000000  EDI: f0aee000  EBP: f0aee000  DS: 0018  ES: 0018
  CR0: 8005003b  CR2: 2aac1000  CR3: 69e5f000
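
In case anyone wants to reproduce this: the kernel is built with
CONFIG_MAGIC_SYSRQ, and depending on the distro's init scripts
the sysrq sysctl may also need switching on, i.e. something like:

  echo 1 > /proc/sys/kernel/sysrq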

Pressing <ALT> + <SysRq> + M shows the usual memory
statistics; nothing unusual about usage and no swap in use.
I did some searches on DejaNews about wait_on_bh and found
references from January and March 1999, from David Miller
and Alan Cox, saying this was SMP-related ("wait on
bottom half") and fixed in 2.2.3 (?). Apparently it's not
really fixed :/

On other servers in the same cluster, I'm not seeing the
"wait_on_bh" error above, but rather some resource-starvation
issue (I think); the console displays the usual /etc/issue
message and a login: prompt, but when I try to log in (as any
user) a message flashes by [too fast to read] and I'm kicked
back to the login prompt again. I suspect the message is
about "Resource temporarily unavailable".
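
If this really is some kind of handle starvation, a quick sanity
check would be something like (on 2.2, file-nr shows the
allocated / free / max file handle counts):

  cat /proc/sys/fs/file-nr    # allocated / free / max file handles
  cat /proc/sys/fs/inode-nr   # allocated / free inodes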

On one of these machines which would not let me log in,
after reducing the network load (bled off some traffic)
I could finally log in interactively. I went ahead and
pressed <ALT> + <SysRq> + P on all the machines stuck
at the login screen, and almost all of them were at EIP
"80107a71" (cpu_idle, according to my System.map).
The output from <ALT> + <SysRq> + M was more interesting:
some of these servers had NEGATIVE "buffer hash" values
(e.g. "buffer hash: -191435", "buffer hash: -178450", etc.).
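
For anyone wondering how I resolve an EIP: System.map is sorted
and the addresses are fixed-width lowercase hex, so a plain string
compare in awk finds the nearest symbol at or below a given
address (adjust the path to wherever your System.map lives):

  EIP=80107a71
  sort /boot/System.map | \
      awk -v eip="$EIP" '$1 <= eip { sym = $3 " at " $1 } END { print sym }'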

In all cases I was able to sync via <ALT> + <SysRq> + S
(on some servers I had to <ALT> + <SysRq> + E first, though)
and unmount (<SysRq> + U) / reboot (<SysRq> + B) in most
of the crashes as well.

For the time being, I've edited /etc/lilo.conf and appended
"nosmp noapic" to the active kernel entry to force non-SMP
mode; I'll run the webserver cluster that way for several
days and see if the same problems occur ...
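
The relevant lilo.conf entry now looks roughly like this (the
image/label/root values here are just illustrative), after which
/sbin/lilo has to be re-run:

  image=/boot/vmlinuz-2.2.13ac3
      label=linux
      root=/dev/sda1
      read-only
      append="nosmp noapic"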

Any help/patches/advice on the above is greatly appreciated.
Thanks in advance ...

RW

----------------------------------------------------------------------
/etc/rc.d/init.d/noblank
----------------------------------------------------------------------
#!/bin/sh
#
# Prevents situations where kernel crashes,
# but we cannot see any console error messages
# because the screen was blanked earlier.

IFS=' '
TTYS='1 2 3 4 5 6'

for n in ${TTYS}
do
    # send the escape sequences straight to each console; set TERM
    # explicitly, since init scripts may run without a terminal type
    TERM=linux /usr/bin/setterm -blank 0       > /dev/tty${n}
    TERM=linux /usr/bin/setterm -powersave off > /dev/tty${n}
done

exit 0

---------------------------------------------------------------------
/etc/rc.d/init.d/proctune
----------------------------------------------------------------------
#!/bin/bash
#
# /etc/rc.d/init.d/proctune.linuxcare
#
# chkconfig: 345 80 20
# description: SysCtl (proc) tunings from LinuxCare \
#              based on research of Jim Dennis
#
# processname: proctune.linuxcare

# Source function library.
. /etc/rc.d/init.d/functions

#
#       See how we were called.
#
case "$1" in
  start)
        echo "Running proctune.linuxcare:"
        #
        echo -n "  /proc/sys/fs/file-max  . . . . . . . . . . . "
        echo '16384' > /proc/sys/fs/file-max
        cat /proc/sys/fs/file-max
        #
        echo -n "  /proc/sys/fs/inode-max . . . . . . . . . . . "
        echo '65536' > /proc/sys/fs/inode-max
        cat /proc/sys/fs/inode-max
        #
        echo -n "  /proc/sys/net/ipv4/ip_local_port_range . . . "
        echo "32768 65535" > /proc/sys/net/ipv4/ip_local_port_range
        cat /proc/sys/net/ipv4/ip_local_port_range
        #
        echo -n "  /proc/sys/net/ipv4/route/gc_elasticity . . . "
        echo '2' > /proc/sys/net/ipv4/route/gc_elasticity
        cat /proc/sys/net/ipv4/route/gc_elasticity
        #
        echo -n "  /proc/sys/net/ipv4/route/gc_min_interval . . "
        echo '1' > /proc/sys/net/ipv4/route/gc_min_interval
        #echo '0' > /proc/sys/net/ipv4/route/gc_min_interval
        cat /proc/sys/net/ipv4/route/gc_min_interval
        #
        #echo -n "  /proc/sys/net/ipv4/route/gc_thresh . . . . ."
        #echo '256' > /proc/sys/net/ipv4/route/gc_thresh
        #echo '512' > /proc/sys/net/ipv4/route/gc_thresh
        #cat /proc/sys/net/ipv4/route/gc_thresh
        #
        echo -n "  /proc/sys/net/ipv4/route/max_size  . . . . . "
        #echo '4096' > /proc/sys/net/ipv4/route/max_size
        echo '8192' > /proc/sys/net/ipv4/route/max_size
        cat /proc/sys/net/ipv4/route/max_size
        #
        echo -n "  /proc/sys/vm/bdflush . . . . . . . . . . . . "
        echo '98 100 128 256 15 500 1884 2 2' > /proc/sys/vm/bdflush
        cat /proc/sys/vm/bdflush
        #
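        # (as I understand it, the three buffermem/pagecache fields are
        #  min / borrow / max percent of memory for each cache)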
        echo -n "  /proc/sys/vm/buffermem . . . . . . . . . . . "
        echo '90 10 98' > /proc/sys/vm/buffermem
        cat /proc/sys/vm/buffermem
        #
        #echo -n "  /proc/sys/vm/overcommit_memory . . . . . . ."
        #echo '1' > /proc/sys/vm/overcommit_memory
        #cat /proc/sys/vm/overcommit_memory
        #
        echo -n "  /proc/sys/vm/page-cluster  . . . . . . . . . "
        echo '5' > /proc/sys/vm/page-cluster
        cat /proc/sys/vm/page-cluster
        #
        echo -n "  /proc/sys/vm/pagecache . . . . . . . . . . . "
        echo '80 30 95' > /proc/sys/vm/pagecache
        cat /proc/sys/vm/pagecache
        #
        ;;
  stop)
        :
        ;;
  *)
        echo "Usage: /etc/rc.d/init.d/proctune.linuxcare {start|stop}"
        exit 1
esac

exit 0
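
In case anyone wants to drop these scripts in: on a Red Hat-style
init setup, the proctune script gets registered with chkconfig per
the header above (adjust the name to whatever the file is called):

  chkconfig --add proctune
  chkconfig --list proctune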

