linux-kernel.vger.kernel.org archive mirror
From: Keith Owens <kaos@ocs.com.au>
To: Ryan Sweet <rsweet@atos-group.nl>
Cc: linux-kernel@vger.kernel.org
Subject: Re: random reboots of diskless nodes - 2.4.7 (fwd)
Date: Tue, 16 Oct 2001 14:58:37 +1000
Message-ID: <20123.1003208317@kao2.melbourne.sgi.com>
In-Reply-To: Your message of "Tue, 16 Oct 2001 02:28:46 +0200." <Pine.LNX.4.30.0110160228000.18043-100000@core-0>

On Tue, 16 Oct 2001 02:28:46 +0200 (CEST), 
Ryan Sweet <rsweet@atos-group.nl> wrote:
>Questions:
>- what the heck can I do to isolate the problem?

Debugger over a serial console.

>- why would the system re-boot instead of hanging on whatever caused it to
>crash (ie, why don't I see an oops message?)

Probably a triple fault on ix86, which forces a reboot.  That is, a
fault was detected, trying to report that fault caused a second fault,
and handling the second fault caused a third.  Say goodnight, Dick.
The other main possibility is a hardware or software watchdog that
thinks the system has hung and is forcing a reboot.  Do you have one of
those?
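
To make the escalation concrete, here is a minimal sketch of the chain
described above.  It is a toy model in Python, not kernel code: the
function name and the two boolean parameters are hypothetical, and the
real ix86 rules for when a second exception becomes a double fault are
more detailed than this.

```python
def fault_outcome(first_handler_ok: bool, double_fault_handler_ok: bool) -> str:
    """Toy model of ix86 fault escalation (hypothetical helper, not a kernel API).

    first_handler_ok:        the handler for the original fault ran to completion
    double_fault_handler_ok: the double fault handler ran to completion
    """
    if first_handler_ok:
        # Normal case: the fault is reported, e.g. as an oops on the console.
        return "oops reported"
    if double_fault_handler_ok:
        # Reporting the first fault itself faulted, but the second-level
        # handler could still run.
        return "double fault reported"
    # A third fault while handling the double fault: the CPU shuts down
    # and the machine resets, so nothing is printed.
    return "triple fault: reset, no message"
```

Only the last branch matches a silent reboot, which is why the absence
of an oops message points at either a triple fault or a watchdog.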

>- how can I tell the system not to re-boot when it crashes (or is it
>crashing at all???)

If it is a triple fault, you have to catch the error before the third
fault.  Tricky.

>- is it worth trying all the newer kernel versions (this does not sound
>very appealing, especially given the troubles reported with 2.4.10 and
>also the split over which vm to use, etc..., also the changelogs don't
>really point to anything that appears to precisely describe my problem)?

Maybe.  OTOH if you wait until you capture some diagnostics it will
give you a better indication if the later kernels actually fix the
problem.

>- if I patch with kgdb and use a null modem connection from the gateway to
>run gdb can I expect to gain any useful info from a backtrace?

It is definitely worth trying kgdb or kdb[1] over a serial console.  I
am biased towards kdb (I maintain it) but either is worth a go.

Unfortunately the most common triple fault is a kernel stack overflow,
and the ix86 kernel design has no way to recover from that error.  The
error handler needs stack space to report anything, and both kgdb and
kdb need stack space as well.  If you suspect stack overflow, look at
the IKD patch[2].  It has code to warn about potential stack overflows
before they are completely out of hand.
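
The idea behind an early stack warning can be sketched as follows.
This is not the IKD code, only a Python model of the arithmetic, and
every name in it is hypothetical.  It assumes an 8 KiB, 8 KiB-aligned
per-task kernel stack that grows downward; the reserved area at the
bottom of the stack and the warning margin are illustrative parameters,
not values taken from any patch.

```python
THREAD_SIZE = 8192  # assumption: 8 KiB, 8 KiB-aligned kernel stack per task


def stack_bytes_left(sp: int, thread_size: int = THREAD_SIZE, reserved: int = 0) -> int:
    """Bytes between the stack pointer and the bottom of the stack area.

    reserved is the number of bytes at the bottom of the area that must
    not be overwritten (hypothetical parameter, 0 by default).
    """
    base = sp & ~(thread_size - 1)  # round sp down to the start of the area
    return sp - base - reserved


def stack_is_low(sp: int, margin: int = 1024) -> bool:
    """True when fewer than margin bytes remain (margin is illustrative)."""
    return stack_bytes_left(sp) < margin
```

A check like this has to run while there is still room to print the
warning; once the stack has actually overflowed, neither the check nor
a debugger has any stack left to run on.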

[1] ftp://oss.sgi.com/projects/kdb/download/ix86, old for 2.4.7.
[2] ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/ikd/

