All of lore.kernel.org
 help / color / mirror / Atom feed
From: l.genoni@oltrelinux.com
To: linux-kernel@vger.kernel.org
Subject: Re: System crash after "No irq handler for vector" linux 2.6.19
Date: Tue, 23 Jan 2007 11:19:10 +0100 (CET)	[thread overview]
Message-ID: <Pine.LNX.4.64.0701231118290.22876@Phoenix.oltrelinux.com> (raw)

reproduced.
it took more or less one hour to reproduce it.  I could reproduce it olny
running also irqbalance 0.55 and commenting out the sleep 1. The message 
in
syslog is the same and then, after a few seconds I think, KABOM! system 
crash
and reboot.

I tested also a similar system that has 4 dual core CPU Opteron 2600MHZ. 
On
this system (linux sees 8 CPU, but it is the same kernel, same gcc, same
config, same glibc, same active services) I could not reproduce it even
running irqbalance 0.55 in almost 1 hour. Maybe I could reproduce it 
waiting
for more time, but my users need to do their work, so I could not have a
longer test window. So on 16 CPU I had the crash, on 8 CPU I had no crash.

I need to give back the system to the users, so if you need other tests,
please, tell me as soon.

thanx

Luigi Genoni

On Monday 22 January 2007 18:14, Eric W. Biederman wrote:
> "Luigi Genoni" <luigi.genoni@pirelli.com> writes:
> > (e-mail resent because not delivered using my other e-mail account)
> >
> > Hi,
> > this night a linux server 8 dual core CPU Optern 2600Mhz crashed just
> > after giving this message
> >
> > Jan 22 04:48:28 frey kernel: do_IRQ: 1.98 No irq handler for vector
>
> Ok.  This indicates that the hardware is doing something we didn't 
expect.
> We don't know which irq the hardware was trying to deliver when it
> sent vector 0x98 to cpu 1.
>
> > I have no other logs, and I eventually lost the OOPS since I have no 
net
> > console setled up.
>
> If you had an oops it may have meant the above message was a secondary
> symptom.  Groan.  If it stayed up long enough to give an OOPS then
> there is a chance the above message appearing only once had nothing
> to do with the actual crash.
>
> How long had the system been up?
>
> > As I said sistem is running linux 2.6.19 compiled with gcc 4.1.1 for 
AMD
> > Opteron (attached see .config), no kernel preemption excepted the BKL
> > preemption. glibc 2.4.
> >
> > System has 16 GB RAM and 8 dual core Opteron 2600Mhz.
> >
> > I am running irqbalance 0.55.
> >
> > any hints on what has happened?
>
> Three guesses.
>
> - A race triggered by irq migration (but I would expect more people to 
be
> yelling). The code path where that message comes from is new in 2.6.19 
so
> it may not have had all of the bugs found yet :(
> - A weird hardware or BIOS setup.
> - A secondary symptom triggered by some other bug.
>
> If this winds up being reproducible we should be able to track it down.
> If not this may end up in the files of crap something bad happened that
> we don't understand.
>
> The one condition I know how to test for (if you are willing) is an
> irq migration race.  Simply by triggering irq migration much more often,
> and thus increasing our chances of hitting a problem.
>
> Stopping irqbalance and running something like:
> for irq in 0 24 28 29 44 45 60 68 ; do
>       while :; do
>               for mask in 1 2 4 8 10 20 40 80 100 200 400 800 1000 2000 
4000 8000 ; do
>                       echo mask > /proc/irq/$irq/smp_affinity
>                       sleep 1
>               done
>       done &
> done
>
> Should force every irq to migrate once a second, and removing the sleep 
1
> is even harsher, although we max at one irq migration by irq received.
>
> If some variation of the above loop does not trigger the do_IRQ ??? No 
irq
> handler for vector message chances are it isn't a race in irq migration.
>
> If we can rule out the race scenario it will at least put us in the 
right
> direction for guessing what went wrong with your box.
>
> Eric

-- 


             reply	other threads:[~2007-01-23 10:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-23 10:19 l.genoni [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-02-01 13:33 System crash after "No irq handler for vector" linux 2.6.19 Chris Rankin
2007-01-23 10:35 l.genoni
     [not found] <200701221116.13154.luigi.genoni@pirelli.com>
2007-01-22 17:14 ` Eric W. Biederman
     [not found]   ` <200701231051.32945.luigi.genoni@pirelli.com>
2007-01-23 12:18     ` Eric W. Biederman
     [not found]       ` <Pine.LNX.4.64.0701232052330.32111@baldios.it.pirelli.com>
2007-01-31  8:39         ` Eric W. Biederman
     [not found]           ` <200701311549.22512.luigi.genoni@pirelli.com>
2007-02-01  5:59             ` Eric W. Biederman
2007-02-01  7:20             ` Eric W. Biederman
     [not found]               ` <200702021848.55921.luigi.genoni@pirelli.com>
2007-02-02 18:02                 ` Eric W. Biederman
     [not found]                   ` <200702021905.39922.luigi.genoni@pirelli.com>
2007-02-02 18:32                     ` Eric W. Biederman
2007-02-03  0:40                     ` Eric W. Biederman
2007-01-22 10:06 l.genoni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0701231118290.22876@Phoenix.oltrelinux.com \
    --to=l.genoni@oltrelinux.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.