Linux Activists Mailing List Archives
From: Matt Welsh <mdw@TC.Cornell.EDU>
Subject: New FAQ on kernel panics
Date: Sat, 16 Jan 1993 00:28:40 +0200
Linus, can we get a Q/A in the next FAQ about how to track down kernel
panic messages with nm? I see a lot of these, and if people would be so
kind as to tell us where the kernel is panicking (instead of just giving us
the panic message) it would help a lot. Basically, a Q/A describing the
contents of a standard panic message, what it means, etc. would be great.
From: torvalds@cs.Helsinki.FI (Linus Torvalds)
Subject: Re: New FAQ on kernel panics
Date: Sat, 16 Jan 1993 02:19:37 +0200
Matt Welsh: "New FAQ on kernel panics" (Jan 15, 17:28):
> Linus, can we get a Q/A in the next FAQ about how to track down kernel
> panic messages with nm? I see a lot of these, and if people would be so
> kind as to tell us where the kernel is panicking (instead of just giving us
> the panic message) it would help a lot. Basically, a Q/A describing the
> contents of a standard panic message, what it means, etc. would be great.
Ok. Hope somebody can make a FAQ out of this...
The panic message essentially consists of:
- possible debugging messages printed out by the code before the panic:
these may be important to tell more closely why the kernel decided to panic.
- one line of "Kernel panic: " and a small reason string (eg "unable to
- if the panic happened while running the swapper task (aka idle task,
aka dummy task), you finally get a line that tells you that the
kernel is unable to sync any devices ("In swapper task - not
syncing"). This generally means that the panic happened in an
interrupt handler, as the idle task should never really panic on its own.
After a kernel panic, the machine is essentially dead: keyboard
interrupts may still be working, so that you can switch VC's and press
ctrl-alt-del, but no tasks are running.
More interesting than the actual panic message is usually the debugging
messages prior to a panic. The debugging messages can happen without
the panic, but most debug messages that remain are pretty severe, so a
panic may be likely. The most interesting debugging messages have the
following form:
- possible extended explanation (eg "unable to handle kernel paging
request at address xxxxx").
- one line of reason + possible error code. This can look like
"General protection fault: 0000" or similar: it tells which exception
happened, and gives the error code. The error code is mostly zero,
but can sometimes be non-zero, which usually makes them more
- the place the error was reported, in the form "EIP: 0008:xxxxxxxx".
This is important: it should be used to later check up in which
kernel routine the error happened. The 0008 tells that it happened
in the kernel code segment (it can be something else, but it probably
shouldn't happen), and the "xxxxxxxx" is the offset of the offending
instruction.
- the value of the 'fs' segment at the time of the exception: this is
usually 0017, and isn't really interesting any more (it's a leftover
from much earlier debugging sessions).
- the base and limit of the current code segment. These too are mostly
leftovers from older kernel versions: in the current kernels these
are unlikely to have anything important in them (but do report them
anyway for completeness).
- the pid of the current process and the value of the task register at
this point. Not generally of any importance.
- ten hexadecimal values representing the offending instruction. These
can be used to hand-disassemble what the offending instruction was,
and sometimes help pinpoint it a bit more easily than just telling
where it happened. This is useful.
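
As a rough illustration of the "EIP: 0008:xxxxxxxx" line described above, the
sketch below pulls the segment selector and offset out of a pasted report. It
assumes the line has exactly the form given in the message; the names
parse_eip, EIP_RE and KERNEL_CODE_SEGMENT and the sample report text are made
up for this example and are not part of any kernel tool.

```python
import re

# Assumed format, taken from the description above: "EIP: 0008:xxxxxxxx",
# i.e. a 4-hex-digit segment selector, a colon, and an 8-hex-digit offset.
EIP_RE = re.compile(r"EIP:\s*([0-9a-fA-F]{4}):([0-9a-fA-F]{8})")

# The selector the message says to expect for kernel code.
KERNEL_CODE_SEGMENT = 0x0008


def parse_eip(report):
    """Return (segment, offset) from the first EIP line in a report, or None."""
    match = EIP_RE.search(report)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


# Hypothetical report text, invented purely for illustration.
sample_report = "General protection fault: 0000\nEIP: 0008:00012345\n"

segment, offset = parse_eip(sample_report)
in_kernel_code = segment == KERNEL_CODE_SEGMENT
```

The offset is what gets looked up in the kernel namelist later on; the
selector check only flags reports where the fault was not in the kernel code
segment.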
When doing a panic report (or a report of just a "normal" kernel error
without an actual panic), the thing to do is:
(a) write down the above debugging info exactly. Especially the EIP
and instruction hex-dump values are important, and need to be
correct for any kind of debugging.
(b) find out where the exception happened. With earlier kernels (0.12
and below), the address was generally enough for me: all the
kernels were generally the same, and I could look at my kernel
binary to find out where the error occurred. With newer kernels
that is no longer possible, so the person who reports the error
will have to pinpoint it a bit closer with respect to his
particular kernel version.
There are several ways to find out where the error happened, but the
simplest one is generally the following:
- get the kernel namelist with 'nm' and sort it according to address.
This is most easily done with the commands
# nm /usr/src/linux/tools/system | sort > namelist
where you have to make sure that the tools/system file actually
corresponds with the kernel that panicked.
- search for the place that seems to contain the offending
instructions. 'grep' is not really an option, as the exact address
is unlikely to be in the output of 'nm', so you'll have to eyeball
it. This is easy enough in an editor or using 'less'.
- send along about 10 lines of the nm output from around the offending
instruction. Assuming the EIP value reported by the panic was
00012345, the output of nm that is interesting might look like this:
00011fd4 T _sys_ssetmask
00011ff4 T _sys_sigpending
00012024 T _sys_sigsuspend
00012084 T _sys_signal
00012114 T _sys_sigaction
00012204 T _do_signal
000124ac T _kernel_mktime
000124ac t gcc2_compiled.
000124ac t mktime.o
00012560 t _get_long
00012560 t gcc2_compiled.
where the 00012345 address is in the _do_signal() function that seems
to extend from 00012204 to 000124ac. Note the "seems" - I prefer to
have a couple of lines of context around the offending place as that
can help pinpoint it a bit more: there may be static functions in the
kernel between the two addresses that won't show up in the namelist
or similar. Also, sending a couple of lines of context means that
bogus lines can safely be ignored (things like the "gcc2_compiled"
and "mktime.o" in the example). But don't try to prune out the bogus
lines yourself unless you know that you know what you are doing.
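
The eyeballing step above can also be sketched in code. The example below
assumes a namelist in the three-column "address type name" form shown in the
message, and finds the last symbol at or below the reported EIP together with
a few lines of context on either side. The function names parse_namelist and
context_around are hypothetical, and the result is only a candidate: as noted
above, static functions and bogus entries mean the context lines should still
be sent along unpruned.

```python
import bisect


def parse_namelist(text):
    """Parse 'address type name' lines into (addr, type, name) tuples sorted by address."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and entries without an address
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # first column is not a hex address
        entries.append((addr, parts[1], parts[2]))
    entries.sort(key=lambda entry: entry[0])  # stable: equal addresses keep their order
    return entries


def context_around(entries, eip, before=5, after=5):
    """Return (candidate, context) for the last symbol at or below eip."""
    addrs = [entry[0] for entry in entries]
    idx = bisect.bisect_right(addrs, eip) - 1
    if idx < 0:
        return None, []  # eip lies below every known symbol
    lo = max(0, idx - before)
    hi = min(len(entries), idx + after + 1)
    return entries[idx], entries[lo:hi]


# The namelist excerpt quoted in the message above.
sample_namelist = """\
00011fd4 T _sys_ssetmask
00011ff4 T _sys_sigpending
00012024 T _sys_sigsuspend
00012084 T _sys_signal
00012114 T _sys_sigaction
00012204 T _do_signal
000124ac T _kernel_mktime
000124ac t gcc2_compiled.
000124ac t mktime.o
00012560 t _get_long
00012560 t gcc2_compiled.
"""

entries = parse_namelist(sample_namelist)
candidate, context = context_around(entries, 0x00012345)
```

With the example EIP of 00012345 the candidate is the _do_signal entry, which
matches the reading given above. When several entries share one address, the
candidate is simply the last of them, which is one more reason to report the
surrounding lines rather than a single name.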
So, the result of it all? A bug-report with only the register dumps and
no other info is generally pretty useless - although if it also tells
what was going on that resulted in the error the bug might still be
possible to find. Together with a pinpoint where it happened, it's
generally much easier then to find exactly what went wrong, and fix it.
There are some circumstances where even all the above information won't
help: under some circumstances (a kernel jump to a nonexistent address
etc), the debugging info is simply bogus and not enough. So always try
to make the bugreport as complete as possible: if you can re-create the
error so that somebody else also can test it, please include that kind
of info ("if I do this, then that, then the kernel will crash with this
message").
From: email@example.com (Eric Youngdale)
Subject: New FAQ on kernel panics
Date: Sat, 16 Jan 1993 07:59:32 +0200
> - send along about 10 lines of the nm output from around the offending
> instruction. Assuming the EIP value reported by the panic was
> 00012345, the output of nm that is interesting might look like this:
> 00011fd4 T _sys_ssetmask
> 00011ff4 T _sys_sigpending
> 00012024 T _sys_sigsuspend
> 00012084 T _sys_signal
> 00012114 T _sys_sigaction
> 00012204 T _do_signal
Here is an item that is probably only of interest to kernel hackers.
The beta version of gas, 1.93, is capable of generating listing files. It will
print out the offsets of each instruction, the hexadecimal bytes that each
instruction is translated to, and the source code that was responsible for
the assembly code. I am enclosing a short example at the end of this message.
To generate the listings, you need to use the '-a' switch, and the
listing goes to stdout.
GAS LISTING seagate.s page 50
857:seagate.c **** printk("disconnecting..");
1557 .stabd 68,0,857
1558 0f42 68790600 pushl $LC25
1559 0f47 E8B4F0FF call _printk
1560 0f4c 83C404 addl $4,%esp
858:seagate.c **** if(!SCint) panic("SCint == NULL after REQ_MSGIN1");
1561 .stabd 68,0,858
1562 0f4f 833D5612 cmpl $0,_SCint
1563 0f56 7510 jne L110
1564 0f58 68890600 pushl $LC26
1565 0f5d E89EF0FF call _panic
1566 0f62 83C404 addl $4,%esp
1567 0f65 909090 .align 4,0x90
859:seagate.c **** current_data = data; /* WDE add */
1569 .stabd 68,0,859
1570 0f68 8B4DF8 movl -8(%ebp),%ecx
1571 0f6b 890D7C12 movl %ecx,_current_data
860:seagate.c **** current_bufflen = len; /* WDE add */
1572 .stabd 68,0,860
1573 0f71 8B55FC movl -4(%ebp),%edx
1574 0f74 89158012 movl %edx,_current_bufflen
861:seagate.c **** #if (DEBUG & (PHASE_RESELECT | PHASE_MSGIN))
862:seagate.c **** printk("scsi%d : disconnected.\n", hostno);
863:seagate.c **** #endif
864:seagate.c **** done=1;
1575 .stabd 68,0,864
1576 0f7a C745EC01 movl $1,-20(%ebp)
865:seagate.c **** break;
1577 .stabd 68,0,865
1578 0f81 E9CA0000 jmp L108
1579 0f86 9090 .align 4,0x90
866:seagate.c **** case COMMAND_COMPLETE :
867:seagate.c **** #if (DEBUG & PHASE_MSGIN)
868:seagate.c **** printk("scsi%d : command complete.\n", hostno);
869:seagate.c **** #endif
870:seagate.c **** if(!SCint) panic("SCint == NULL after REQ_MSGIN2");
1581 .stabd 68,0,870
1582 0f88 833D5612 cmpl $0,_SCint
1583 0f8f 750F jne L112
1584 0f91 68A80600 pushl $LC27
1585 0f96 E865F0FF call _panic
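
To show how a listing like the one above might be used, the sketch below
indexes it by instruction offset and reports which source line a given offset
falls under. It assumes the layout seen in the example (source lines of the
form "857:seagate.c **** ...", instruction lines of the form "1558 0f42
68790600 pushl $LC25") and that the offset within the object file is already
known. The names index_listing and source_line_for are invented for this
illustration and are not part of gas.

```python
import re

# "857:seagate.c **** printk(...)" -> source line number, file name, source text
SOURCE_RE = re.compile(r"^\s*(\d+):(\S+)\s+\*{4}\s?(.*)$")
# "1558 0f42 68790600 pushl $LC25" -> offset, leading bytes, assembly text
INSN_RE = re.compile(r"^\s*\d+\s+([0-9a-fA-F]{4})\s+([0-9a-fA-F]+)\s+(.*)$")


def index_listing(text):
    """Return (offset, source_line_number, assembly) rows in listing order."""
    rows = []
    current_line = None
    for line in text.splitlines():
        src = SOURCE_RE.match(line)
        if src:
            current_line = int(src.group(1))
            continue
        insn = INSN_RE.match(line)
        if insn:  # lines such as ".stabd" carry no offset and are skipped
            rows.append((int(insn.group(1), 16), current_line, insn.group(3).strip()))
    return rows


def source_line_for(rows, offset):
    """Return the row with the largest offset that does not exceed the given one."""
    best = None
    for row in rows:
        if row[0] <= offset and (best is None or row[0] >= best[0]):
            best = row
    return best


# A few lines copied from the listing above.
sample_listing = """\
 857:seagate.c **** printk("disconnecting..");
1557 .stabd 68,0,857
1558 0f42 68790600 pushl $LC25
1559 0f47 E8B4F0FF call _printk
1560 0f4c 83C404 addl $4,%esp
 858:seagate.c **** if(!SCint) panic("SCint == NULL after REQ_MSGIN1");
1561 .stabd 68,0,858
1562 0f4f 833D5612 cmpl $0,_SCint
1563 0f56 7510 jne L110
"""

rows = index_listing(sample_listing)
hit = source_line_for(rows, 0x0F49)
```

Because the lookup takes the nearest instruction at or below the offset, an
offset in the middle of a multi-byte instruction still resolves to that
instruction and its source line.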