linux-kernel.vger.kernel.org archive mirror
From: TeJun Huh <tejun@aratech.co.kr>
To: Stephan von Krawczynski <skraw@ithnet.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Race condition in 2.4 tasklet handling (cli() broken?)
Date: Sun, 24 Aug 2003 00:13:15 +0900
Message-ID: <20030823151315.GA6781@atj.dyndns.org>
In-Reply-To: <20030823122813.0c90e241.skraw@ithnet.com>

On Sat, Aug 23, 2003 at 12:28:13PM +0200, Stephan von Krawczynski wrote:
> 
> If we follow your analysis and say it is broken, do you have a suggestion/patch
> how to fix both? I am willing to try your proposals, as it seems I am one of
> very few who really experience stability issues on SMP with the current
> implementation.
> 

 Hello, Stephan.

 The race conditions I'm mentioning in this thread are not likely to
cause real trouble.  The first one does not make any difference on
x86, and AFAIK bh isn't used extensively anymore so the second one
isn't very relevant either.  Only the race condition mentioned in the
other thread is of relevance, if any is :-(.

 We've also been suffering from random lockups (2.4.21 with various
patches including in-kernel irqbalancing) which show symptoms somewhat
different from the usual kernel deadlocks or panics.  We've seen lockups
on several different machines.  All the machines were SMP and quite
busy with high-volume network traffic and a lot of disk I/O.  A
lockup takes from a week to a month(!) to occur.  Even though
they take very long, they occur fairly reliably.

 I had a chance to examine a locked-up machine (dual 3GHz Xeon w/HT).  I
could turn the keyboard LEDs on and off (so the keyboard irq was working)
but the console didn't come back from its blanked state.  The kernel was
compiled with sysrq and I tried many sysrqs but the console remained blank.
After a while, I pressed the sysrq reboot key and it rebooted.
Fortunately, the kernel log file did contain all the output from the
sysrqs and I could do a little bit of post-mortem analysis.

 The first weird thing was the timestamps.  Time seemed to have stopped
for a few hours between the lockup and the first sysrq request.
Then time started to advance again after the first sysrq request.  (NMI
watchdog was on)

 The process list showed that a server process was stuck inside the
kernel, but the stuck position was very weird.  It was freeing a socket
after receiving FIN.  The eip was stuck at the same place over several
sysrqs, and the instruction at the eip was a plain ADD right after a
CALL instruction to kfree.  I think there is one more frame above
what's shown but I don't know how sysrq prints stack traces for other
cpus so I'm not sure.

 To gather more information, we hooked up a machine with kdb and are
waiting for the lockup to occur again.  My personal feeling is that
the race conditions I've mentioned are not the causes of the lockups
we're suffering from.

 It would be helpful if you could tell us more about your lockups.  Have
you tried sysrq, NMI watchdog, kdb or kgdb?

-- 
tejun
