From: Andrea Arcangeli <andrea@suse.de>
To: linas@austin.ibm.com
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: PATCH: Race in 2.6.0-test2 timer code
Date: Thu, 31 Jul 2003 00:06:04 +0200 [thread overview]
Message-ID: <20030730220604.GA23835@dualathlon.random> (raw)
In-Reply-To: <20030730161810.B28284@forte.austin.ibm.com>
On Wed, Jul 30, 2003 at 04:18:10PM -0500, linas@austin.ibm.com wrote:
> On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote:
> >
> > so if it was really the itimer, it had to be on ppc s390 or ia64, not on
> > x86. I never reproduced this myself. I will ask more info on bugzilla,
> > because I thought it was x86 but maybe it wasn't. As said in the
> > previous email, only non x86 archs can run the timer irq on a cpu
> > different than the one where it was inserted.
>
> I think I started the thread, so let me backtrack. I've got several
> ppc64 boxes running 2.4.21. They hang, they don't crash. Sometimes
> they're pingable, but that's all. The little yellow button is wired
> up to KDB, so I can break in and poke around and see what's happening.
>
> Here's what I find:
>
> __run_timers() is stuck in a infinite loop, because there's a
> timer on the base. However, this timer has timer->list.net == NULL,
> and so it can never be removed. (As a side effect, other CPU's get
> stuck in various places, sitting in spinlocks, etc. which is why the
> machine goes unresponsive) So the question becomes "how did this
> pointer get trashed?"
>
> I thought I saw an "obvious race" in the 2.4 code, and it looked
> identical to the 2.6 code, and thus the email went out. Now, I'm
> not so sure; I'm confused.
the 2.4 code calls into run_all_timers for non x86 archs, that should
go away, and be replaced with calls of the local_run_timers in the per cpu
timer interrupts (not the global timer irq). Infact I wouldn't trust calling
run_all_timers anyways, that might be the racy thing, you're probably the first
one calling it.
>
> However, given that its timer->list.net that is getting clobbered,
> then locking all occurrances of list_add and list_del should cure the
> problem. I'm guessing that Andrea's patch to add a timer->lock
> will protect all of the list_add's, list_del's that matter.
you definitely need that patch on non x86 archs with the run_all_timers or the
race that I pointed out today will trigger (as agreed by Ingo too), so you
should try to reproduce with it applied.
I've also seen another patch floating around that looks like this:
@@ -214,8 +214,14 @@ repeat:
spin_unlock(&old_base->lock);
goto repeat;
}
- } else
+ } else {
spin_lock(&new_base->lock);
+ if (timer->base != old_base) {
+ spin_unlock(&new_base->lock);
+ printk ("duuude found and fixed bug locking mod
timer\n");
+ goto repeat;
+ }
+
but I don't think we need even this other one in 2.4 (2.4 never touches the
timer->base from the del_timer*):
/*
* Subtle, we rely on timer->base being always
* valid and being updated atomically.
*/
if (timer->base != old_base) {
spin_unlock(&new_base->lock);
spin_unlock(&old_base->lock);
goto repeat;
}
since we're serialized by the timer->lock in mod_timer, and del_timer
won't affect the timer->base in 2.4. And add_timer is forbidden to run at the
same time of mod_timer.
So I think we should change the code this way instead of applying the
above patch that looks wrong:
--- 2.4.22pre7aa1/kernel/timer.c.~1~ 2003-07-19 02:34:09.000000000 +0200
+++ 2.4.22pre7aa1/kernel/timer.c 2003-07-30 23:43:03.000000000 +0200
@@ -210,15 +210,7 @@ repeat:
spin_lock(&old_base->lock);
spin_lock(&new_base->lock);
}
- /*
- * Subtle, we rely on timer->base being always
- * valid and being updated atomically.
- */
- if (timer->base != old_base) {
- spin_unlock(&new_base->lock);
- spin_unlock(&old_base->lock);
- goto repeat;
- }
+ BUG_ON(timer->base != old_base);
} else
spin_lock(&new_base->lock);
Assuming you were running 2.4.21 w/o the extended timer->lock to add_timer and
del_timer, the only mistery at the moment is how can have lcm@us.ibm.com
reproduced a race in setitimer on x86 and get it fixed with the patch I postd
to l-k earlier today that extends the same timer->lock of mod_timer, to
add_timer and del_timer* too.
I mean, that patch is perfectly safe, looks very sane, is strictly necessary on
all non x86* and ia64 archs, but it shouldn't be necessary on x86 as discussed
today with Ingo and Andrew.
I will keep that patch applied because this is 2.4 and it fixes stuff in
practice, but still we must be missing something about this code.
Andrea
next prev parent reply other threads:[~2003-07-30 22:06 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-07-29 15:41 PATCH: Race in 2.6.0-test2 timer code linas
2003-07-29 20:56 ` Andrew Morton
2003-07-30 5:57 ` Ingo Molnar
2003-07-30 6:36 ` Andrew Morton
2003-07-30 7:07 ` Ingo Molnar
2003-07-30 7:34 ` Andrea Arcangeli
2003-07-30 7:34 ` Ingo Molnar
2003-07-30 8:28 ` Andrea Arcangeli
2003-07-30 10:31 ` Ingo Molnar
2003-07-30 11:16 ` Andrea Arcangeli
2003-07-30 11:49 ` Ingo Molnar
2003-07-30 12:34 ` Andrea Arcangeli
2003-07-30 21:18 ` linas
2003-07-30 22:06 ` Andrea Arcangeli [this message]
2003-07-30 22:17 ` Andrea Arcangeli
2003-07-31 7:04 ` Ingo Molnar
2003-07-30 21:19 ` Andrea Arcangeli
2003-07-30 23:43 ` linas
2003-07-30 23:56 ` Andrea Arcangeli
2003-07-30 23:54 ` Andrew Morton
2003-07-31 0:16 ` Andrea Arcangeli
2003-07-31 17:23 ` linas
2003-08-01 6:27 ` Ingo Molnar
2003-07-30 7:40 ` Ingo Molnar
2003-07-30 8:37 ` Andrea Arcangeli
2003-07-30 10:34 ` Ingo Molnar
2003-07-30 10:51 ` Andrew Morton
2003-07-30 11:28 ` Andrea Arcangeli
2003-07-30 11:22 ` Andrea Arcangeli
2003-07-30 20:05 ` linas
2003-07-31 6:50 ` Ingo Molnar
2003-07-31 22:56 ` linas
2003-08-01 6:23 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030730220604.GA23835@dualathlon.random \
--to=andrea@suse.de \
--cc=akpm@osdl.org \
--cc=linas@austin.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).