* PATCH: Race in 2.6.0-test2 timer code
@ 2003-07-29 15:41 linas
  2003-07-29 20:56 ` Andrew Morton
  0 siblings, 1 reply; 33+ messages in thread
From: linas @ 2003-07-29 15:41 UTC (permalink / raw)
  To: linux-kernel


PATCH: Race in 2.6.0-test2 timer code

Hi,

I have been chasing a pointer corruption bug in the timer code, and
found the following race in the mod_timer() routine.  I am still 
testing to see if this fixes my bug ... 

--linas



--- linux-2.6.0-test2/kernel/timer.c.orig	2003-07-27 12:07:34.000000000 -0500
+++ linux-2.6.0-test2/kernel/timer.c	2003-07-29 10:22:13.000000000 -0500
@@ -258,8 +258,13 @@ repeat:
 			spin_unlock(&old_base->lock);
 			goto repeat;
 		}
-	} else
+	} else {
 		spin_lock(&new_base->lock);
+		if (timer->base != old_base) {
+			spin_unlock(&new_base->lock);
+			goto repeat;
+		}
+	}
 
 	/*
 	 * Delete the previous timeout (if there was any), and install

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-29 15:41 PATCH: Race in 2.6.0-test2 timer code linas
@ 2003-07-29 20:56 ` Andrew Morton
  2003-07-30  5:57   ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2003-07-29 20:56 UTC (permalink / raw)
  To: linas; +Cc: linux-kernel

linas@austin.ibm.com wrote:
>
> I have been chasing a pointer corruption bug in the timer code, and
> found the following race in the mod_timer() routine.  I am still 
> testing to see if this fixes my bug ... 

Andrea says that we need to take the per-timer lock in add_timer() and
del_timer(), but I haven't yet got around to working out why.

Does this (untested) patch fix whatever race it is which you are observing?


 25-akpm/kernel/timer.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff -puN kernel/timer.c~timer-locking-fixes kernel/timer.c
--- 25/kernel/timer.c~timer-locking-fixes	Tue Jul 29 13:52:21 2003
+++ 25-akpm/kernel/timer.c	Tue Jul 29 13:54:55 2003
@@ -167,10 +167,12 @@ void add_timer(struct timer_list *timer)
 
 	check_timer(timer);
 
-	spin_lock_irqsave(&base->lock, flags);
+	spin_lock_irqsave(&timer->lock, flags);
+	spin_lock(&base->lock);
 	internal_add_timer(base, timer);
 	timer->base = base;
-	spin_unlock_irqrestore(&base->lock, flags);
+	spin_unlock(&base->lock);
+	spin_unlock_irqrestore(&timer->lock, flags);
 	put_cpu();
 }
 
@@ -190,10 +192,12 @@ void add_timer_on(struct timer_list *tim
 
 	check_timer(timer);
 
-	spin_lock_irqsave(&base->lock, flags);
+	spin_lock_irqsave(&timer->lock, flags);
+	spin_lock(&base->lock);
 	internal_add_timer(base, timer);
 	timer->base = base;
-	spin_unlock_irqrestore(&base->lock, flags);
+	spin_unlock(&base->lock);
+	spin_unlock_irqrestore(&timer->lock, flags);
 }
 
 /***
@@ -298,19 +302,22 @@ int del_timer(struct timer_list *timer)
 	tvec_base_t *base;
 
 	check_timer(timer);
-
+	spin_lock_irqsave(&timer->lock, flags);
 repeat:
  	base = timer->base;
-	if (!base)
+	if (!base) {
+		spin_unlock_irqrestore(&timer->lock, flags);
 		return 0;
-	spin_lock_irqsave(&base->lock, flags);
+	}
+	spin_lock(&base->lock);
 	if (base != timer->base) {
-		spin_unlock_irqrestore(&base->lock, flags);
+		spin_unlock(&base->lock);
 		goto repeat;
 	}
 	list_del(&timer->entry);
 	timer->base = NULL;
-	spin_unlock_irqrestore(&base->lock, flags);
+	spin_unlock(&base->lock);
+	spin_unlock_irqrestore(&timer->lock, flags);
 
 	return 1;
 }

_


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-29 20:56 ` Andrew Morton
@ 2003-07-30  5:57   ` Ingo Molnar
  2003-07-30  6:36     ` Andrew Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30  5:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linas, linux-kernel


On Tue, 29 Jul 2003, Andrew Morton wrote:

> Andrea says that we need to take the per-timer lock in add_timer() and
> del_timer(), but I haven't yet got around to working out why.

this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() /
del_timer() use without using external SMP synchronization. (There's one
special timer use variant involving del_timer_sync() that was safe in 2.4
but is unsafe in 2.6, see below.)

and i don't think Linas' patch is correct either - how can the timer base
change under us? We are holding the timer spinlock.

What i'd propose is the attached (tested, against 2.6.0-test2) patch
instead. It unifies the functionality of add_timer() and mod_timer(), and
makes any combination of the timer API calls completely SMP-safe.  
del_timer() is still not using the timer lock.

this patch fixes the only timer bug in 2.6 i'm aware of: the
del_timer_sync() + add_timer() combination in kernel/itimer.c is buggy.
This was correct code in 2.4, because there it was safe to do an
add_timer() from the timer handler itself, parallel to a del_timer_sync().  
If we want to make this safe in 2.6 too (which i think we want to) then we
have to make add_timer() almost equivalent to mod_timer(), locking-wise.
And once we are at this point i think it's much cleaner to actually make
add_timer() a variant of mod_timer(). (There's no locking cost for
add_timer(), only the cost of an extra branch. And we've removed another
commonly used function from the icache.)
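
For reference, the pattern in question looks roughly like this (paraphrased
from kernel/itimer.c, details trimmed - not the literal code):

	/* the timer handler re-arms itself from timer (softirq) context: */
	static void it_real_fn(unsigned long __data)
	{
		struct task_struct *p = (struct task_struct *) __data;

		send_sig(SIGALRM, p, 1);
		if (p->it_real_incr) {
			p->real_timer.expires = jiffies + p->it_real_incr;
			add_timer(&p->real_timer);
		}
	}

	/* while do_setitimer(), possibly on another CPU, does roughly: */
	del_timer_sync(&p->real_timer);
	p->real_timer.expires = expires;	/* computed from the new value */
	add_timer(&p->real_timer);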

Linas, could you please give this patch a go, does it make a difference to
your timer list corruption problem? I've booted it on SMP and UP as well.

	Ingo

--- linux/include/linux/timer.h.orig	
+++ linux/include/linux/timer.h	
@@ -60,11 +60,30 @@ static inline int timer_pending(const st
 	return timer->base != NULL;
 }
 
-extern void add_timer(struct timer_list * timer);
 extern void add_timer_on(struct timer_list *timer, int cpu);
 extern int del_timer(struct timer_list * timer);
+extern int __mod_timer(struct timer_list *timer, unsigned long expires);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
-  
+
+/***
+ * add_timer - start a timer
+ * @timer: the timer to be added
+ *
+ * The kernel will do a ->function(->data) callback from the
+ * timer interrupt at the ->expired point in the future. The
+ * current time is 'jiffies'.
+ *
+ * The timer's ->expired, ->function (and if the handler uses it, ->data)
+ * fields must be set prior calling this function.
+ *
+ * Timers with an ->expired field in the past will be executed in the next
+ * timer tick.
+ */
+static inline void add_timer(struct timer_list * timer)
+{
+	__mod_timer(timer, timer->expires);
+}
+ 
 #ifdef CONFIG_SMP
   extern int del_timer_sync(struct timer_list * timer);
 #else
--- linux/kernel/timer.c.orig	
+++ linux/kernel/timer.c	
@@ -144,34 +144,62 @@ static void internal_add_timer(tvec_base
 	list_add_tail(&timer->entry, vec);
 }
 
-/***
- * add_timer - start a timer
- * @timer: the timer to be added
- *
- * The kernel will do a ->function(->data) callback from the
- * timer interrupt at the ->expired point in the future. The
- * current time is 'jiffies'.
- *
- * The timer's ->expired, ->function (and if the handler uses it, ->data)
- * fields must be set prior calling this function.
- *
- * Timers with an ->expired field in the past will be executed in the next
- * timer tick. It's illegal to add an already pending timer.
- */
-void add_timer(struct timer_list *timer)
+int __mod_timer(struct timer_list *timer, unsigned long expires)
 {
-	tvec_base_t *base = &get_cpu_var(tvec_bases);
-  	unsigned long flags;
-  
-  	BUG_ON(timer_pending(timer) || !timer->function);
+	tvec_base_t *old_base, *new_base;
+	unsigned long flags;
+	int ret = 0;
+
+	BUG_ON(!timer->function);
 
 	check_timer(timer);
 
-	spin_lock_irqsave(&base->lock, flags);
-	internal_add_timer(base, timer);
-	timer->base = base;
-	spin_unlock_irqrestore(&base->lock, flags);
-	put_cpu_var(tvec_bases);
+	spin_lock_irqsave(&timer->lock, flags);
+	new_base = &__get_cpu_var(tvec_bases);
+repeat:
+	old_base = timer->base;
+
+	/*
+	 * Prevent deadlocks via ordering by old_base < new_base.
+	 */
+	if (old_base && (new_base != old_base)) {
+		if (old_base < new_base) {
+			spin_lock(&new_base->lock);
+			spin_lock(&old_base->lock);
+		} else {
+			spin_lock(&old_base->lock);
+			spin_lock(&new_base->lock);
+		}
+		/*
+		 * The timer base might have been cancelled while we were
+		 * trying to take the lock(s):
+		 */
+		if (timer->base != old_base) {
+			spin_unlock(&new_base->lock);
+			spin_unlock(&old_base->lock);
+			goto repeat;
+		}
+	} else
+		spin_lock(&new_base->lock);
+
+	/*
+	 * Delete the previous timeout (if there was any), and install
+	 * the new one:
+	 */
+	if (old_base) {
+		list_del(&timer->entry);
+		ret = 1;
+	}
+	timer->expires = expires;
+	internal_add_timer(new_base, timer);
+	timer->base = new_base;
+
+	if (old_base && (new_base != old_base))
+		spin_unlock(&old_base->lock);
+	spin_unlock(&new_base->lock);
+	spin_unlock_irqrestore(&timer->lock, flags);
+
+	return ret;
 }
 
 /***
@@ -179,7 +207,7 @@ void add_timer(struct timer_list *timer)
  * @timer: the timer to be added
  * @cpu: the CPU to start it on
  *
- * This is not very scalable on SMP.
+ * This is not very scalable on SMP. Double adds are not possible.
  */
 void add_timer_on(struct timer_list *timer, int cpu)
 {
@@ -217,10 +245,6 @@ void add_timer_on(struct timer_list *tim
  */
 int mod_timer(struct timer_list *timer, unsigned long expires)
 {
-	tvec_base_t *old_base, *new_base;
-	unsigned long flags;
-	int ret = 0;
-
 	BUG_ON(!timer->function);
 
 	check_timer(timer);
@@ -233,52 +257,7 @@ int mod_timer(struct timer_list *timer, 
 	if (timer->expires == expires && timer_pending(timer))
 		return 1;
 
-	spin_lock_irqsave(&timer->lock, flags);
-	new_base = &__get_cpu_var(tvec_bases);
-repeat:
-	old_base = timer->base;
-
-	/*
-	 * Prevent deadlocks via ordering by old_base < new_base.
-	 */
-	if (old_base && (new_base != old_base)) {
-		if (old_base < new_base) {
-			spin_lock(&new_base->lock);
-			spin_lock(&old_base->lock);
-		} else {
-			spin_lock(&old_base->lock);
-			spin_lock(&new_base->lock);
-		}
-		/*
-		 * The timer base might have been cancelled while we were
-		 * trying to take the lock(s):
-		 */
-		if (timer->base != old_base) {
-			spin_unlock(&new_base->lock);
-			spin_unlock(&old_base->lock);
-			goto repeat;
-		}
-	} else
-		spin_lock(&new_base->lock);
-
-	/*
-	 * Delete the previous timeout (if there was any), and install
-	 * the new one:
-	 */
-	if (old_base) {
-		list_del(&timer->entry);
-		ret = 1;
-	}
-	timer->expires = expires;
-	internal_add_timer(new_base, timer);
-	timer->base = new_base;
-
-	if (old_base && (new_base != old_base))
-		spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
-	spin_unlock_irqrestore(&timer->lock, flags);
-
-	return ret;
+	return __mod_timer(timer, expires);
 }
 
 /***



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  5:57   ` Ingo Molnar
@ 2003-07-30  6:36     ` Andrew Morton
  2003-07-30  7:07       ` Ingo Molnar
  2003-07-30 20:05     ` linas
  2003-07-31 22:56     ` linas
  2 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2003-07-30  6:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linas, linux-kernel, Andrea Arcangeli

Ingo Molnar <mingo@elte.hu> wrote:
>
> 
> On Tue, 29 Jul 2003, Andrew Morton wrote:
> 
> > Andrea says that we need to take the per-timer lock in add_timer() and
> > del_timer(), but I haven't yet got around to working out why.
> 
> this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() /
> del_timer() use without using external SMP synchronization. (There's one
> special timer use variant involving del_timer_sync() that was safe in 2.4
> but is unsafe in 2.6, see below.)
> 

Well Andrea did mention a problem with the interval timers.  But I am not
aware of the exact details of the race which he found, and I don't
understand why del_timer() and add_timer() would be needing the per-timer
locks.

You need to export __mod_timer to modules btw.

--- 25/kernel/ksyms.c~timer-race-fixes	2003-07-29 23:27:05.000000000 -0700
+++ 25-akpm/kernel/ksyms.c	2003-07-29 23:27:49.000000000 -0700
@@ -405,8 +405,6 @@ EXPORT_SYMBOL(proc_doulongvec_ms_jiffies
 EXPORT_SYMBOL(proc_doulongvec_minmax);
 
 /* interrupt handling */
-EXPORT_SYMBOL(add_timer);
-EXPORT_SYMBOL(del_timer);
 EXPORT_SYMBOL(request_irq);
 EXPORT_SYMBOL(free_irq);
 
@@ -433,7 +431,10 @@ EXPORT_SYMBOL(probe_irq_off);
 #ifdef CONFIG_SMP
 EXPORT_SYMBOL(del_timer_sync);
 #endif
+EXPORT_SYMBOL(add_timer);
+EXPORT_SYMBOL(del_timer);
 EXPORT_SYMBOL(mod_timer);
+EXPORT_SYMBOL(__mod_timer);
 
 #ifdef HAVE_DISABLE_HLT
 EXPORT_SYMBOL(disable_hlt);

_


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  6:36     ` Andrew Morton
@ 2003-07-30  7:07       ` Ingo Molnar
  2003-07-30  7:34         ` Andrea Arcangeli
  0 siblings, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30  7:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linas, linux-kernel, Andrea Arcangeli


On Tue, 29 Jul 2003, Andrew Morton wrote:

> Well Andrea did mention a problem with the interval timers.  But I am
> not aware of the exact details of the race which he found, and I don't
> understand why del_timer() and add_timer() would be needing the
> per-timer locks.

hmm ... indeed i cannot see the 2.6 itimer race anymore. Andrea, can you
see any SMP races in the current 2.6 timer code?

(in any case, i still think it would be safer to 'upgrade' the add_timer()  
interface to be SMP-safe and to allow double-adds - but not for any bug
reason anymore.)

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  7:34         ` Andrea Arcangeli
@ 2003-07-30  7:34           ` Ingo Molnar
  2003-07-30  8:28             ` Andrea Arcangeli
  2003-07-30  7:40           ` Ingo Molnar
  1 sibling, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30  7:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel


On Wed, 30 Jul 2003, Andrea Arcangeli wrote:

> The thing triggered simply by running setitimer in one function, while
> the it_real_fn was running in the other cpu. I don't see how 2.6 can
> have fixed this, it_real_fn can still trivially call add_timer while
> you run inside do_setitimer in 2.6 too. [...]

This is not a race that can happen. itimer does this:

	del_timer_sync();
	add_timer();

how can the add_timer() still happen while it_real_fn is still running on 
another CPU?

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  7:07       ` Ingo Molnar
@ 2003-07-30  7:34         ` Andrea Arcangeli
  2003-07-30  7:34           ` Ingo Molnar
  2003-07-30  7:40           ` Ingo Molnar
  0 siblings, 2 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30  7:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 09:07:59AM +0200, Ingo Molnar wrote:
> 
> On Tue, 29 Jul 2003, Andrew Morton wrote:
> 
> > Well Andrea did mention a problem with the interval timers.  But I am
> > not aware of the exact details of the race which he found, and I don't
> > understand why del_timer() and add_timer() would be needing the
> > per-timer locks.
> 
> hmm ... indeed i cannot see the 2.6 itimer race anymore. Andrea, can you
> see any SMP races in the current 2.6 timer code?
> 
> (in any case, i still think it would be safer to 'upgrade' the add_timer()  
> interface to be SMP-safe and to allow double-adds - but not for any bug
> reason anymore.)

The thing triggered simply by running setitimer in one function, while
the it_real_fn was running in the other cpu. I don't see how 2.6 can
have fixed this, it_real_fn can still trivially call add_timer while
you run inside do_setitimer in 2.6 too. It's simply that 2.6 isn't
running in production yet, so a bug that didn't show up in years of 2.5
shows up within a month on 2.4. But even if you somehow serialized
it_real_fn against do_setitimer (and I don't see any external
serialization in 2.6 in current bkcvs) I wouldn't trust 2.6 drivers not
to execute del_timer_sync in parallel anyways etc.  That's why I didn't
search too hard for more 2.6 timer bugs, and I still suggested
applying the locking even though I only knew about a single bug. The real
scalability optimization is probably only the fast path check in
mod_timer, and the cacheline bouncing avoidance in the timer global data
structures like the cascading queues; the per-timer lock should do well,
and I'd feel much safer with the whole timer API being smp safe w/o
requiring driver developers to add complicated external brainer locking.
This will provide a much more friendly abstraction.
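
(to make concrete the kind of external locking I mean - sketch only, the
driver, the lock and the function are made up:)

	static spinlock_t my_timer_lock = SPIN_LOCK_UNLOCKED;
	static struct timer_list my_timer;

	static void my_rearm(unsigned long expires)
	{
		unsigned long flags;

		/* the driver-private lock is what makes the pending check
		   and the add_timer() atomic against the other users of
		   my_timer (the handler, the del_timer path, ...) */
		spin_lock_irqsave(&my_timer_lock, flags);
		if (!timer_pending(&my_timer)) {
			my_timer.expires = expires;
			add_timer(&my_timer);
		}
		spin_unlock_irqrestore(&my_timer_lock, flags);
	}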

I would be definitely against putting external locking in
do_setitimer/it_real_fn in 2.6 since, just like you missed that place, there
can still be tons of other buggy drivers out there that don't trigger
simply because the user base is still too small.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  7:34         ` Andrea Arcangeli
  2003-07-30  7:34           ` Ingo Molnar
@ 2003-07-30  7:40           ` Ingo Molnar
  2003-07-30  8:37             ` Andrea Arcangeli
  1 sibling, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30  7:40 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel


On Wed, 30 Jul 2003, Andrea Arcangeli wrote:

> [...] I'd feel much safer with the whole timer API being smp safe w/o
> requiring driver developers to add complicated external brainer locking.
> This will provide a much more friendly abstraction.

i agree with the goal, but your patch does not achieve this. Your patch
does not make double-add_timer() safe for example. As far as i can see
your 2.6 patch only introduces additional overhead.

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  7:34           ` Ingo Molnar
@ 2003-07-30  8:28             ` Andrea Arcangeli
  2003-07-30 10:31               ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 09:34:40AM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > The thing triggered simply by running setitimer in one function, while
> > the it_real_fn was running in the other cpu. I don't see how 2.6 can
> > have fixed this, it_real_fn can still trivially call add_timer while
> > you run inside do_setitimer in 2.6 too. [...]
> 
> This is not a race that can happen. itimer does this:
> 
> 	del_timer_sync();
> 	add_timer();
> 
> how can the add_timer() still happen while it_real_fn is still running on 
> another CPU?

it's not add_timer against add_timer in this case, it's del_timer_sync
against add_timer.


	cpu0			cpu1
	------------		--------------------

	do_setitimer
				it_real_fn
	del_timer_sync		add_timer	-> crash

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  7:40           ` Ingo Molnar
@ 2003-07-30  8:37             ` Andrea Arcangeli
  2003-07-30 10:34               ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30  8:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 09:40:14AM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > [...] I'd feel much safer with the whole timer API being smp safe w/o
> > requiring driver developers to add complicated external brainer locking.
> > This will provide a much more friendly abstraction.
> 
> i agree with the goal, but your patch does not achieve this. Your patch
> does not make double-add_timer() safe for example. As far as i can see
> your 2.6 patch only introduces additional overhead.

I never did any 2.6 patch, so it may be a different thing you've
seen, not what I applied to 2.4. In fact even the 2.4 patch isn't from
me.

This is the fix I applied to my 2.4 tree and it definitely fixes the
problem, including add_timer against add_timer. And it fixes the bug in
practice: it usually took a few hours to crash, now it can run for days
stable and I never heard complaints again.

--- linux-2.4.19.SuSE-orig/kernel/timer.c	2003-07-02 13:53:55.000000000 -0700
+++ linux-2.4.19.SuSE-timer/kernel/timer.c	2003-07-02 18:32:08.000000000 -0700
@@ -144,17 +144,22 @@ void add_timer(timer_t *timer)
 
 	CHECK_BASE(base);
 	CHECK_BASE(timer->base);
-	spin_lock_irqsave(&base->lock, flags);
-	if (timer_pending(timer))
-		goto bug;
-	internal_add_timer(base, timer);
-	timer->base = base;
-	spin_unlock_irqrestore(&base->lock, flags);
-	return;
-bug:
-	spin_unlock_irqrestore(&base->lock, flags);
-	printk("bug: kernel timer added twice at %p.\n",
-			__builtin_return_address(0));
+
+	local_irq_save(flags);
+	while (unlikely(test_and_set_bit(0, &timer->lock)))
+		while (test_bit(0, &timer->lock));
+
+	spin_lock(&base->lock);
+	if (timer_pending(timer)) {
+		printk("bug: kernel timer added twice at %p.\n",
+				__builtin_return_address(0));
+	} else {
+		internal_add_timer(base, timer);
+		timer->base = base;
+	}
+	spin_unlock(&base->lock);
+	clear_bit(0, &timer->lock);
+	local_irq_restore(flags);
 }

 static inline int detach_timer(timer_t *timer)
@@ -244,16 +249,24 @@ int del_timer(timer_t * timer)
 	CHECK_BASE(timer->base);
 	if (!timer->base)
 		return 0;
+
+	local_irq_save(flags);
+	while (unlikely(test_and_set_bit(0, &timer->lock)))
+		while (test_bit(0, &timer->lock));
+
 repeat:
  	base = timer->base;
-	spin_lock_irqsave(&base->lock, flags);
+	spin_lock(&base->lock);
 	if (base != timer->base) {
-		spin_unlock_irqrestore(&base->lock, flags);
+		spin_unlock(&base->lock);
 		goto repeat;
 	}
 	ret = detach_timer(timer);
 	timer->list.next = timer->list.prev = NULL;
-	spin_unlock_irqrestore(&base->lock, flags);
+	spin_unlock(&base->lock);
+
+	clear_bit(0, &timer->lock);
+	local_irq_restore(flags);

 	return ret;
 }
@@ -274,34 +287,48 @@ void sync_timers(void)

 int del_timer_sync(timer_t * timer)
 {
+	unsigned long flags;
 	tvec_base_t * base;
 	int ret = 0;

 	CHECK_BASE(timer->base);
 	if (!timer->base)
 		return 0;
+
+	local_irq_save(flags);
+	while (unlikely(test_and_set_bit(0, &timer->lock)))
+		while (test_bit(0, &timer->lock));
+
 	for (;;) {
-		unsigned long flags;
 		int running;
-
 repeat:
 	 	base = timer->base;
-		spin_lock_irqsave(&base->lock, flags);
+		spin_lock(&base->lock);
 		if (base != timer->base) {
-			spin_unlock_irqrestore(&base->lock, flags);
+			spin_unlock(&base->lock);
 			goto repeat;
 		}
 		ret += detach_timer(timer);
 		timer->list.next = timer->list.prev = 0;
 		running = timer_is_running(base, timer);
-		spin_unlock_irqrestore(&base->lock, flags);
+		spin_unlock(&base->lock);

 		if (!running)
 			break;
+
+		clear_bit(0, &timer->lock);
+		local_irq_restore(flags);

 		timer_synchronize(base, timer);
+
+		local_irq_save(flags);
+		while (unlikely(test_and_set_bit(0, &timer->lock)))
+			while (test_bit(0, &timer->lock));
 	}

+	clear_bit(0, &timer->lock);
+	local_irq_restore(flags);
+
 	return ret;
 }
 #endif

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  8:28             ` Andrea Arcangeli
@ 2003-07-30 10:31               ` Ingo Molnar
  2003-07-30 11:16                 ` Andrea Arcangeli
  2003-07-30 23:43                 ` linas
  0 siblings, 2 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30 10:31 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel


On Wed, 30 Jul 2003, Andrea Arcangeli wrote:

> 
> 	cpu0			cpu1
> 	------------		--------------------
> 
> 	do_setitimer
> 				it_real_fn
> 	del_timer_sync		add_timer	-> crash

would you mind to elaborate the precise race? I cannot see how the above
sequence could crash on current 2.6:

del_timer_sync() takes the base pointer in timer->base and locks it, then
checks whether the timer has this base or not and repeats the sequence if
not. add_timer() locks the current CPU's base, then installs the timer and
sets timer->base. Where's the race?
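
(for reference, the two paths in question, condensed from 2.6.0-test2
kernel/timer.c:)

	/* add_timer(): the list is only touched under the local base lock */
	base = &get_cpu_var(tvec_bases);
	spin_lock_irqsave(&base->lock, flags);
	internal_add_timer(base, timer);
	timer->base = base;
	spin_unlock_irqrestore(&base->lock, flags);
	put_cpu_var(tvec_bases);

	/* del_timer() (which del_timer_sync() builds on): the list_del()
	 * only happens after base->lock is taken and timer->base has been
	 * re-checked under it */
repeat:
	base = timer->base;
	if (!base)
		return 0;
	spin_lock_irqsave(&base->lock, flags);
	if (base != timer->base) {
		spin_unlock_irqrestore(&base->lock, flags);
		goto repeat;
	}
	list_del(&timer->entry);
	timer->base = NULL;
	spin_unlock_irqrestore(&base->lock, flags);
	return 1;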

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  8:37             ` Andrea Arcangeli
@ 2003-07-30 10:34               ` Ingo Molnar
  2003-07-30 10:51                 ` Andrew Morton
  2003-07-30 11:22                 ` Andrea Arcangeli
  0 siblings, 2 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30 10:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel


On Wed, 30 Jul 2003, Andrea Arcangeli wrote:

> I never did any 2.6 patch, so it may be a different thing you've
> seen, not what I applied to 2.4. In fact even the 2.4 patch isn't from
> me.

the 2.4 timer-scalability patches do have a problem: due to TIMER_BH the
actual timer expiry can happen on a different CPU, which opens up a
del_timer()/add_timer() race in the itimer code. Your patch closes that
hole.
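
(very roughly, the problem path - names as in the 2.4 timer-scalability
code, but this is not the literal code:)

	/* per-CPU timer tick: if it hit on top of another interrupt it
	 * cannot run the timers directly and defers to the bottom half */
	if (in_interrupt())
		mark_bh(TIMER_BH);

	/* TIMER_BH was set up with init_bh(TIMER_BH, run_all_timers), and a
	 * global bottom half may run on a different CPU than the one the
	 * timer was queued on - that's where the itimer expiry migrates */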

But on 2.6 the timer will run precisely on the CPU it was added, so i
think the race is not possible.

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 10:34               ` Ingo Molnar
@ 2003-07-30 10:51                 ` Andrew Morton
  2003-07-30 11:28                   ` Andrea Arcangeli
  2003-07-30 11:22                 ` Andrea Arcangeli
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2003-07-30 10:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: andrea, linas, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> But on 2.6 the timer will run precisely on the CPU it was added, so i
>  think the race is not possible.

well there is add_timer_on()...

I still don't see the race in the itimer code actually.  On return
from del_timer_sync() we know that the timer is not pending, even
if it_real_fn() tried to re-add it.

ie: why does the below "crash"?


Andrea Arcangeli <andrea@suse.de> wrote:
>
> 	cpu0			cpu1
>  	------------		--------------------
> 
>  	do_setitimer
>  				it_real_fn
>  	del_timer_sync		add_timer	-> crash


(Does the timer_pending() test in del_timer_sync() need some
barriers btw?)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 10:31               ` Ingo Molnar
@ 2003-07-30 11:16                 ` Andrea Arcangeli
  2003-07-30 11:49                   ` Ingo Molnar
  2003-07-30 23:43                 ` linas
  1 sibling, 1 reply; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 11:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > 
> > 	cpu0			cpu1
> > 	------------		--------------------
> > 
> > 	do_setitimer
> > 				it_real_fn
> > 	del_timer_sync		add_timer	-> crash
> 
> would you mind to elaborate the precise race? I cannot see how the above
> sequence could crash on current 2.6:
> 
> del_timer_sync() takes the base pointer in timer->base and locks it, then
> checks whether the timer has this base or not and repeats the sequence if
> not. add_timer() locks the current CPU's base, then installs the timer and
> sets timer->base. Where's the race?

not sure why you mention timer->base, you know the base lock means
nothing if the two sides run on two different cpus (if it meant anything, it
would mean we'd still suffer the cacheline bouncing with timer ops running on
different cpus). To make it clearer I edited timer.c, deleting all the
spin_lock(&base->lock) calls so the base->lock won't make it look like it can
serialize anything. I deleted the local_irq_save too since it doesn't
matter either for this example (also assume preemption is turned off so I don't
need to add the preempt disable in place of the local irq save).

so for this example we have this:

int del_timer(struct timer_list *timer)
{
	tvec_base_t *base;

repeat:
 	base = timer->base;
	if (!base)
		return 0;
	if (base != timer->base) {
		goto repeat;
	}
	list_del(&timer->entry);
	timer->base = NULL;

	return 1;
}

against this:

void add_timer(struct timer_list *timer)
{
	tvec_base_t *base = &get_cpu_var(tvec_bases);
  
  	BUG_ON(timer_pending(timer) || !timer->function);

	internal_add_timer(base, timer);
	timer->base = base;
	put_cpu_var(tvec_bases);
}


in del_timer, list_del can be reordered after the timer->base = NULL,
the C compiler can do that. so list_del will run at the same time of
internal_add_timer(base, timer) that does the list_add_tail.

list_del and list_add_tail running at the same time will crash.
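
i.e. with no lock and no barrier the compiler is free to turn the tail of
the stripped-down del_timer above into the equivalent of:

	timer->base = NULL;		/* store moved up ... */
	list_del(&timer->entry);	/* ... so the other cpu may already
					   have seen base == NULL, passed the
					   pending check in add_timer, and its
					   list_add_tail now runs against this
					   list_del */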

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 10:34               ` Ingo Molnar
  2003-07-30 10:51                 ` Andrew Morton
@ 2003-07-30 11:22                 ` Andrea Arcangeli
  1 sibling, 0 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 11:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 12:34:39PM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > I never did any 2.6 patch, so it may be a different thing you've
> > seen, not what I applied to 2.4. In fact even the 2.4 patch isn't from
> > me.
> 
> the 2.4 timer-scalability patches do have a problem: due to TIMER_BH the
> actual timer expiry can happen on a different CPU, which opens up a
> del_timer()/add_timer() race in the itimer code. Your patch closes that
> hole.
> 
> But on 2.6 the timer will run precisely on the CPU it was added, so i
> think the race is not possible.

sure, the timer will run on the CPU it was added on, and the setitimer will
run on a random cpu depending on where the process has been rescheduled to
between the last setitimer and the current setitimer.  (it's not the
timer moving, it's the process!)

The only way I can imagine this race not triggering in 2.6 and my old
2.4-aa is by using cpu bindings: of course if you bind the app that
triggers the crash to a single cpu, there's no risk of triggering this.

Also note, the 2.4 code I included is different from what you mention:
I'm definitely enforcing that the timer always runs on exactly the same
cpu it was added on. That's why it needs the hooks in the local timer
interrupts of x86 and x86-64, and I added it a few weeks ago to ia64 too,
near the per-process accounting, and it can't run the timers from the
global timer irq anymore. Currently only IA64, AMD64 and x86 provide
the feature; other archs still fall back to the global timer irq and
they behave as you described, and for them the scalability is lower of
course. But on the archs where this bug triggered it was behaving 100%
like 2.6 since it was x86.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 10:51                 ` Andrew Morton
@ 2003-07-30 11:28                   ` Andrea Arcangeli
  0 siblings, 0 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 11:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linas, linux-kernel

On Wed, Jul 30, 2003 at 03:51:40AM -0700, Andrew Morton wrote:
> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > But on 2.6 the timer will run precisely on the CPU it was added, so i
> >  think the race is not possible.
> 
> well there is add_timer_on()...
> 
> I still don't see the race in the itimer code actually.  On return
> from del_timer_sync() we know that the timer is not pending, even
> if it_real_fn() tried to re-add it.
> 
> ie: why does the below "crash"?
> 
> 
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > 	cpu0			cpu1
> >  	------------		--------------------
> > 
> >  	do_setitimer
> >  				it_real_fn
> >  	del_timer_sync		add_timer	-> crash
> 
> 
> (Does the timer_pending() test in del_timer_sync() need some
> barriers btw?)

it might be possible to use ordered writes on one side and ordered reads
on the other side to fix this instead of a spinlock. I suggested using my
spinlock-by-hand idea to fix it in 2.4 (like I previously did with
mod_timer), but we might try to do something more efficient in 2.6 if
you have some idea. I don't think it matters much anyways since the
cacheline wouldn't be exclusive anyways if we get into the above path,
and the above isn't the common path, but maybe it does. I think the
unified way of locking mod_timer/add_timer/del_timer that I'm currently
using is simple and clean, but if you see any significant performance
advantage we can change it of course.
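
(roughly what I mean by ordered writes / ordered reads - just a sketch of
the idea, not a patch:)

	/* writer, add_timer style: publish the list linkage before ->base */
	internal_add_timer(base, timer);
	smp_wmb();
	timer->base = base;

	/* reader, timer_pending() style: read ->base first, and only after
	 * the barrier trust what it implies about the list linkage */
	base = timer->base;
	smp_rmb();
	if (!base)
		return 0;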

If my last email where I analyzed the problem in more detail is not
clear or you see fault please let me know of course.

thanks,

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 11:16                 ` Andrea Arcangeli
@ 2003-07-30 11:49                   ` Ingo Molnar
  2003-07-30 12:34                     ` Andrea Arcangeli
  0 siblings, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2003-07-30 11:49 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel


On Wed, 30 Jul 2003, Andrea Arcangeli wrote:

> in del_timer, list_del can be reordered after the timer->base = NULL,
> the C compiler can do that. so list_del will run at the same time of
> internal_add_timer(base, timer) that does the list_add_tail.

no, it cannot run at the same time. The add_timer() will first lock the
current CPU's base, before touching the list. Any parallel del_timer() can
only do the list_del() if it first has locked timer->base. timer->base can
only have the base of the CPU where it_real_fn is running, or be NULL. In
the NULL case del_timer() wont do a thing but return. In the other case
the timer->base value observed by the del_timer()-executing CPU will be
the same base as where it_real_fn is running, so both the add_timer() and
the del_timer() will serialize on the same base => no parallel list
handling possible. How the compiler (or even the CPU, on non-x86) orders
the writes within the locked section is irrelevant in this scenario.

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 11:49                   ` Ingo Molnar
@ 2003-07-30 12:34                     ` Andrea Arcangeli
  2003-07-30 21:18                       ` linas
  2003-07-30 21:19                       ` Andrea Arcangeli
  0 siblings, 2 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 12:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 01:49:52PM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > in del_timer, list_del can be reordered after the timer->base = NULL,
> > the C compiler can do that. so list_del will run at the same time of
> > internal_add_timer(base, timer) that does the list_add_tail.
> 
> no, it cannot run at the same time. The add_timer() will first lock the
> current CPU's base, before touching the list. Any parallel del_timer() can
> only do the list_del() if it first has locked timer->base. timer->base can
> only have the base of the CPU where it_real_fn is running, or be NULL. In
> the NULL case del_timer() wont do a thing but return. In the other case
> the timer->base value observed by the del_timer()-executing CPU will be
> the same base as where it_real_fn is running, so both the add_timer() and
> the del_timer() will serialize on the same base => no parallel list
> handling possible. How the compiler (or even the CPU, on non-x86) orders
> the writes within the locked section is irrelevant in this scenario.

so if it was really the itimer, it had to be on ppc s390 or ia64, not on
x86.  I never reproduced this myself. I will ask more info on bugzilla,
because I thought it was x86 but maybe it wasn't. As said in the
previous email, only non x86 archs can run the timer irq on a cpu
different than the one where it was inserted.

As Andrew noted, the same race could happen in 2.6 with add_timer_on.
But apparently you're right that the setitimer race couldn't trigger on 2.6
or 2.4-aa x86.

Still, it could be something else related to add_timer/del_timer_sync
that broke in drivers, reproducible on x86 too. As I said, I
didn't debug or reproduce it myself. It simply looked correct to allow
add_timer to run in parallel with del_timer_sync (no matter which cpu
they're running on; if add_timer runs from inside the timer itself, of
course it can't trigger because the base won't change in add_timer and
del_timer will be a reader falling in the same base, but that's a kind
of special case, and it fails too if you use add_timer_on).

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  5:57   ` Ingo Molnar
  2003-07-30  6:36     ` Andrew Morton
@ 2003-07-30 20:05     ` linas
  2003-07-31  6:50       ` Ingo Molnar
  2003-07-31 22:56     ` linas
  2 siblings, 1 reply; 33+ messages in thread
From: linas @ 2003-07-30 20:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

On Wed, Jul 30, 2003 at 07:57:32AM +0200, Ingo Molnar wrote:
> 
> and i don't think Linas' patch is correct either - how can the timer base
> change under us? We are holding the timer spinlock.

I deduced the need for the patch by looking at the locking code immediately
above it, which takes pains to get old_base and new_base correctly locked.
My patch was to handle the case of old_base being NULL, which seemed to be 
unhandled.  

Comments & disclaimers:
-- I don't quite have the 'big picture' for the timer code yet,
   so please excuse any thinko's below.
-- I don't know under what cases timer->base can be NULL, but clearly
   there are checks for it being NULL, ergo...
-- I then see a race where timer->base is NULL, and so old_base is NULL,
   then some other CPU sets timer->base, then we get the lock
   on new_base, and blindly assume timer->base is NULL, which it no longer
   is.

A bit more graphically ... 

timer->base = NULL;  /* starting condition */

cpu 1                                   cpu 2
--------                                ---------
mod_timer()  {                          

old_base = timer->base;
if (old_base &&  ) { /* not taken */
}
else
.                                       spin_lock(&cpu2_base->lock);
.                                       timer->base = cpu2_base;
.                                       spin_unlock(&cpu2_base->lock);
.
spin_lock(&new_base->lock);
                                                                                
/* Uhh oh, timer->base is not null,
   in fact its cpu2 base, and we aren't
   checking for this before we clobber it.
 */


I don't see what prevents the above from happening.  And if the
above really can't happen, then I also don't understand why the fancy
old_base/new_base locking code is required at all.


> What i'd propose is the attached (tested, against 2.6.0-test2) patch

Due to technical difficulties at this end, I'm having trouble testing
any patches.  The guy who knows how to run the test case to reproduce
this is out sick, it's a big hairy testcase with all sorts of stuff
going on, and I can't make it go.  It can take 8+ hours to hit it.
I'll need a few days ... 

The other machine that hits it is a 6-way smp powerpc running nothing 
but 6 copies of setiathome, but it takes 48+ hours to reproduce there, 
so testing is even slower there ...

--linas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 12:34                     ` Andrea Arcangeli
@ 2003-07-30 21:18                       ` linas
  2003-07-30 22:06                         ` Andrea Arcangeli
  2003-07-30 21:19                       ` Andrea Arcangeli
  1 sibling, 1 reply; 33+ messages in thread
From: linas @ 2003-07-30 21:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote:
> 
> so if it was really the itimer, it had to be on ppc s390 or ia64, not on
> x86.  I never reproduced this myself. I will ask more info on bugzilla,
> because I thought it was x86 but maybe it wasn't. As said in the
> previous email, only non x86 archs can run the timer irq on a cpu
> different than the one where it was inserted.

I think I started the thread, so let me backtrack.  I've got several
ppc64 boxes running 2.4.21.    They hang, they don't crash.  Sometimes
they're pingable, but that's all.  The little yellow button is wired 
up to KDB, so I can break in and poke around and see what's happening.

Here's what I find:

__run_timers() is stuck in an infinite loop, because there's a
timer on the base.  However, this timer has timer->list.next == NULL,
and so it can never be removed.  (As a side effect, other CPUs get
stuck in various places, sitting in spinlocks, etc., which is why the
machine goes unresponsive.)  So the question becomes "how did this
pointer get trashed?"

I thought I saw an "obvious race" in the 2.4 code, and it looked
identical to the 2.6 code, and thus the email went out.  Now, I'm
not so sure; I'm confused.

However, given that it's timer->list.next that is getting clobbered,
locking all occurrences of list_add and list_del should cure the
problem.   I'm guessing that Andrea's patch to add a timer->lock
will protect all of the list_add's and list_del's that matter.

Given that I don't yet understand how timer->list.next got clobbered,
I can't yet say 'yes/no this fixes it'.

--linas


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 12:34                     ` Andrea Arcangeli
  2003-07-30 21:18                       ` linas
@ 2003-07-30 21:19                       ` Andrea Arcangeli
  1 sibling, 0 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 21:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel

On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote:
> x86.  I never reproduced this myself. I will ask more info on bugzilla,

I've the confirmation it was reproduced on x86, so the timer should have
been running in the same cpu where it was queued (i.e. the base should
have been the same for both del_timer_sync and add_timer). So at the
moment it's not clear what race that patch fixed, but nevertheless I
feel much safer to keep this obviously safe additional locking applied
until we know for sure ;) And I've the confirmation that it makes the
2.4 kernel stable again.

FYI, if I don't answer to further emails on this in the next days, it's
because I'll be on vacations the next 14 days.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 21:18                       ` linas
@ 2003-07-30 22:06                         ` Andrea Arcangeli
  2003-07-30 22:17                           ` Andrea Arcangeli
  0 siblings, 1 reply; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 22:06 UTC (permalink / raw)
  To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Wed, Jul 30, 2003 at 04:18:10PM -0500, linas@austin.ibm.com wrote:
> On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote:
> > 
> > so if it was really the itimer, it had to be on ppc s390 or ia64, not on
> > x86.  I never reproduced this myself. I will ask more info on bugzilla,
> > because I thought it was x86 but maybe it wasn't. As said in the
> > previous email, only non x86 archs can run the timer irq on a cpu
> > different than the one where it was inserted.
> 
> I think I started the thread, so let me backtrack.  I've got several
> ppc64 boxes running 2.4.21.    They hang, they don't crash.  Sometimes
> they're pingable, but that's all.  The little yellow button is wired 
> up to KDB, so I can break in and poke around and see what's happening.
> 
> Here's what I find:
> 
> __run_timers() is stuck in an infinite loop, because there's a
> timer on the base.  However, this timer has timer->list.next == NULL,
> and so it can never be removed.  (As a side effect, other CPUs get
> stuck in various places, sitting in spinlocks, etc., which is why the
> machine goes unresponsive.)  So the question becomes "how did this
> pointer get trashed?"
> 
> I thought I saw an "obvious race" in the 2.4 code, and it looked
> identical to the 2.6 code, and thus the email went out.  Now, I'm
> not so sure; I'm confused.  

the 2.4 code calls into run_all_timers for non-x86 archs; that should
go away and be replaced with calls to local_run_timers in the per-cpu
timer interrupts (not the global timer irq). In fact I wouldn't trust calling
run_all_timers anyways, that might be the racy thing, you're probably the first
one calling it.

> 
> However, given that it's timer->list.next that is getting clobbered,
> locking all occurrences of list_add and list_del should cure the
> problem.   I'm guessing that Andrea's patch to add a timer->lock
> will protect all of the list_add's and list_del's that matter.

you definitely need that patch on non x86 archs with the run_all_timers or the
race that I pointed out today will trigger (as agreed by Ingo too), so you
should try to reproduce with it applied.

I've also seen another patch floating around that looks like this:

@@ -214,8 +214,14 @@ repeat:
                        spin_unlock(&old_base->lock);
                        goto repeat;
                }
-       } else
+       } else {
                spin_lock(&new_base->lock);
+               if (timer->base != old_base) {
+                       spin_unlock(&new_base->lock);
+                       printk ("duuude found and fixed bug locking mod timer\n");
+                       goto repeat;
+               }
+       }

but I don't think we need even this other one in 2.4 (2.4 never touches the
timer->base from the del_timer*):

                /*
                 * Subtle, we rely on timer->base being always
                 * valid and being updated atomically.
                 */
                if (timer->base != old_base) {
                        spin_unlock(&new_base->lock);
                        spin_unlock(&old_base->lock);
                        goto repeat;
                }

since we're serialized by the timer->lock in mod_timer, and del_timer
won't affect the timer->base in 2.4. And add_timer is forbidden to run at the
same time as mod_timer.

So I think we should change the code this way instead of applying the
above patch that looks wrong:

--- 2.4.22pre7aa1/kernel/timer.c.~1~    2003-07-19 02:34:09.000000000 +0200
+++ 2.4.22pre7aa1/kernel/timer.c        2003-07-30 23:43:03.000000000 +0200
@@ -210,15 +210,7 @@ repeat:
                        spin_lock(&old_base->lock);
                        spin_lock(&new_base->lock);
                }
-               /*
-                * Subtle, we rely on timer->base being always
-                * valid and being updated atomically.
-                */
-               if (timer->base != old_base) {
-                       spin_unlock(&new_base->lock);
-                       spin_unlock(&old_base->lock);
-                       goto repeat;
-               }
+               BUG_ON(timer->base != old_base);
        } else
                spin_lock(&new_base->lock);


Assuming you were running 2.4.21 w/o the extended timer->lock in add_timer and
del_timer, the only mystery at the moment is how lcm@us.ibm.com could have
reproduced a race in setitimer on x86 and got it fixed with the patch I posted
to l-k earlier today that extends the same timer->lock of mod_timer to
add_timer and del_timer* too.

I mean, that patch is perfectly safe, looks very sane, and is strictly necessary
on all archs other than x86* and ia64, but it shouldn't be necessary on x86 as
discussed today with Ingo and Andrew.

I will keep that patch applied because this is 2.4 and it fixes stuff in
practice, but still we must be missing something about this code.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 22:06                         ` Andrea Arcangeli
@ 2003-07-30 22:17                           ` Andrea Arcangeli
  2003-07-31  7:04                             ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 22:17 UTC (permalink / raw)
  To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Thu, Jul 31, 2003 at 12:06:04AM +0200, Andrea Arcangeli wrote:
> practice, but still we must be missing something about this code.

ah, finally I see how the timer->lock can have made the kernel
stable again!

run_all_timers can still definitely run on x86 too if the local cpu
timer runs on top of an irq handler:

	if (in_interrupt())
		goto out_mark;
out_mark:
	mark_bh(TIMER_BH);

	init_bh(TIMER_BH, run_all_timers);

(still, ppc will be an order of magnitude less stable than x86,
since ppc only calls run_all_timers, so you don't need two races to
trigger at the same time to crash)

ok, so my current code is the right one and it's not needed in 2.6 since
2.6 always runs inside a softirq and it never falls back to
run_all_timers.

So the best fix would be to nuke the run_all_timers thing from 2.4 too.

For now I'm only going to take the risk adding the BUG_ON in mod_timer
and to keep the timer->lock everywhere to make run_all_timers safe.

Now you should only make sure that your 2.4.21 gets stable too with the
fix I applied today (and please add the BUG_ON(old_base != timer->base)
in mod_timer too so I won't be the only tester of that code ;)

In short the stack traces I described today were all right but only for
2.4, and not for 2.6.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 10:31               ` Ingo Molnar
  2003-07-30 11:16                 ` Andrea Arcangeli
@ 2003-07-30 23:43                 ` linas
  2003-07-30 23:56                   ` Andrea Arcangeli
  1 sibling, 1 reply; 33+ messages in thread
From: linas @ 2003-07-30 23:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrea Arcangeli, Andrew Morton, linux-kernel, olh, olof, linas

Hi,

OK, I finally spent some time studying the timer code and am starting 
to grasp it.  I think I finally understand the bug.  Andrea's timer->lock 
patch will fix it for 2.4 and everybody says it can't happen in 2.6 so
ok.


On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote:
> So the best fix would be to nuke the run_all_timers thing from 2.4 too.

Yes.

>and to keep the timer->lock everywhere to make run_all_timers safe.

Or do that instead, yes.

> In short the stack traces I described today were all right but only for
> 2.4, and not for 2.6.

I see the bug in 2.4.  

On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote:
> 
> On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> 
> > 
> > 	cpu0			cpu1
> > 	------------		--------------------
> > 
> > 	do_setitimer
> > 				it_real_fn
> > 	del_timer_sync		add_timer	-> crash
> 
> would you mind to elaborate the precise race? I cannot see how the above
> sequence could crash on current 2.6:
 
I don't know enough about how 2.6 works to say, 
but here's more detail on what happened in 2.4:


cpu 0                                              cpu 3
-------                                          ---------
                                      previously,  timer->base = cpu 3 base
                                          (its task-struct->real_timer)
bh_action() {
__run_timers() {                          sys_setitimer() {
  it_real_fn() // called as fn()              del_timer_sync() {
    add_timer() {                               spinlock (cpu3 base)
      spinlock (cpu0 base)
        timer->base = cpu0 base                    detach_timer (timer)
          list_add(&timer->list)
                                                   timer->list.next = NULL;
  and now timer is vectored in but can't be unlinked.

And so either of Andrea's fixes should fix this race.

--linas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 23:56                   ` Andrea Arcangeli
@ 2003-07-30 23:54                     ` Andrew Morton
  2003-07-31  0:16                       ` Andrea Arcangeli
  2003-07-31 17:23                     ` linas
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2003-07-30 23:54 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linas, mingo, linux-kernel, olh, olof

Andrea Arcangeli <andrea@suse.de> wrote:
>
> 2.6 will crash like we did in 2.4 only if something calls
> add_timer_on from timer context (of course with a cpuid !=
> smp_processor_id()); that would be fixed by the timer->lock everywhere
> that we have in 2.4 right now.  (but there's no add_timer_on in 2.4
> anyways)

add_timer_on() was added specifically for slab bringup.  If we need extra
locking to cope with it then the best solution would probably be to rename
it to add_timer_on_dont_use_this_for_anything_else().

But if we are going to rely on timer handlers only ever running on the
adding CPU for locking purposes then can we please have a big comment
somewhere describing what's going on?  It's very subtle...
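
Something along these lines at the top of kernel/timer.c would do (just a
sketch of the points made in this thread):

	/*
	 * Locking notes:
	 *
	 * - a timer's handler runs from the timer softirq on the CPU whose
	 *   per-CPU base the timer was queued on; there is no TIMER_BH-style
	 *   global queue that can run it on another CPU behind our back.
	 *
	 * - mod_timer() takes timer->lock and may therefore be used against
	 *   a timer whose handler is running (and re-arming it) concurrently.
	 *
	 * - add_timer()/del_timer() take no per-timer lock; the
	 *   del_timer_sync() + re-arm-from-the-handler pattern is only safe
	 *   because of the per-CPU property above.
	 *
	 * - add_timer_on() queues on another CPU's base and gets no such
	 *   guarantee; don't use it from timer context or without external
	 *   serialization.
	 */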


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 23:43                 ` linas
@ 2003-07-30 23:56                   ` Andrea Arcangeli
  2003-07-30 23:54                     ` Andrew Morton
  2003-07-31 17:23                     ` linas
  0 siblings, 2 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-30 23:56 UTC (permalink / raw)
  To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, olh, olof

On Wed, Jul 30, 2003 at 06:43:18PM -0500, linas@austin.ibm.com wrote:
> Hi,
> 
> OK, I finally spent some time studying the timer code and am starting 
> to grasp it.  I think I finally understand the bug.  Andrea's timer->lock 
> patch will fix it for 2.4 and everybody says it can't happen in 2.6 so
> ok.
> 
> 
> On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote:
> > So the best fix would be to nuke the run_all_timers thing from 2.4 too.
> 
> Yes.
> 
> >and to keep the timer->lock everywhere to make run_all_timers safe.
> 
> Or do that instead, yes.
> 
> > In short the stack traces I described today were all right but only for
> > 2.4, and not for 2.6.
> 
> I see the bug in 2.4.  
> 
> On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote:
> > 
> > On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
> > 
> > > 
> > > 	cpu0			cpu1
> > > 	------------		--------------------
> > > 
> > > 	do_setitimer
> > > 				it_real_fn
> > > 	del_timer_sync		add_timer	-> crash
> > 
> > would you mind to elaborate the precise race? I cannot see how the above
> > sequence could crash on current 2.6:
>  
> I don't know enough about how 2.6 works to say, 
> but here's more detail on what happened in 2.4:
> 
> 
> cpu 0                                              cpu 3
> -------                                          ---------
>                                       previously,  timer->base = cpu 3 base
>                                           (its task-struct->real_timer)
> bh_action() {
> __run_timers() {                          sys_setitimer() {
>   it_real_fn() // called as fn()              del_timer_sync() {
>     add_timer() {                               spinlock (cpu3 base)
>       spinlock (cpu0 base)
>         timer->base = cpu0 base                    detach_timer (timer)
>           list_add(&timer->list)
>                                                    timer->list.next = NULL;
>   and now timer is vectored in but can't be unlinked.
> 
> And so either of Andrea's fixes should fix this race.

exactly. If you want to nuke the run_all_timers from the 2.4 backport,
feel free; then we could drop the additional locking from
add_timer/del_timer* and leave it only in mod_timer like 2.6 does. That
will avoid cacheline bouncing in the small window when the local timer
irq runs on top of an irq handler. In the meantime the kernel is already
solid (again ;).

2.6 will crash the kernel like 2.4 did only if it calls
add_timer_on from timer context (of course with a cpuid !=
smp_processor_id()); that would be fixed by the timer->lock everywhere
that we have in 2.4 right now.  (but there's no add_timer_on in 2.4
anyway)

BTW, it's reassuring that it wasn't a lack of 2.6 stress testing; I apologize
for claiming that.

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 23:54                     ` Andrew Morton
@ 2003-07-31  0:16                       ` Andrea Arcangeli
  0 siblings, 0 replies; 33+ messages in thread
From: Andrea Arcangeli @ 2003-07-31  0:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linas, mingo, linux-kernel, olh, olof

On Wed, Jul 30, 2003 at 04:54:18PM -0700, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > 2.6 will crash the kernel like 2.4 did only if it calls
> > add_timer_on from timer context (of course with a cpuid !=
> > smp_processor_id()); that would be fixed by the timer->lock everywhere
> > that we have in 2.4 right now.  (but there's no add_timer_on in 2.4
> > anyway)
> 
> add_timer_on() was added specifically for slab bringup.  If we need extra
> locking to cope with it then the best solution would probably be to rename
> it to add_timer_on_dont_use_this_for_anything_else().

yes. I wasn't actually suggesting to add locking everywhere just for
this. It sounds reasonable to leave it unsafe from timer context.

> But if we are going to rely on timer handlers only ever running on the
> adding CPU for locking purposes then can we please have a big comment
> somewhere describing what's going on?  It's very subtle...

agreed, it definitely deserves the big fat comment somewhere ;)

thanks,

Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 20:05     ` linas
@ 2003-07-31  6:50       ` Ingo Molnar
  0 siblings, 0 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-07-31  6:50 UTC (permalink / raw)
  To: linas; +Cc: Andrew Morton, linux-kernel


On Wed, 30 Jul 2003 linas@austin.ibm.com wrote:

> cpu 1                                   cpu 2
> --------                                ---------
> mod_timer()  {                          
> 
> old_base = timer->base;
> if (old_base &&  ) { /* not taken */
> }
> else
> .                                       spin_lock(&cpu2_base->lock);


this race is not possible on 2.6. You are forgetting:

        spin_lock_irqsave(&timer->lock, flags);

which serializes the full mod_timer() operation. Ok?
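
the locking skeleton is roughly this (paraphrasing from memory, not the
exact source; the detach/re-add in the middle is elided):

	spin_lock_irqsave(&timer->lock, flags);	/* one mod_timer() per timer at a time */
	new_base = &__get_cpu_var(tvec_bases);	/* this CPU's base */
repeat:
	old_base = timer->base;
	if (old_base && old_base != new_base) {
		/* take both base locks, lower address first, to avoid
		 * ABBA deadlock between two CPUs migrating timers */
		if (old_base < new_base) {
			spin_lock(&new_base->lock);
			spin_lock(&old_base->lock);
		} else {
			spin_lock(&old_base->lock);
			spin_lock(&new_base->lock);
		}
		if (timer->base != old_base) {	/* changed while we spun, retry */
			spin_unlock(&new_base->lock);
			spin_unlock(&old_base->lock);
			goto repeat;
		}
	} else
		spin_lock(&new_base->lock);

	/* ... detach from the old list if pending, set the new expiry,
	 * internal_add_timer() onto new_base, timer->base = new_base,
	 * then drop the base lock(s) and timer->lock ... */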

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 22:17                           ` Andrea Arcangeli
@ 2003-07-31  7:04                             ` Ingo Molnar
  0 siblings, 0 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-07-31  7:04 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linas, Andrew Morton, linux-kernel


On Thu, 31 Jul 2003, Andrea Arcangeli wrote:

> ah, finally I see how the timer->lock can have made the kernel
> stable again!
> 
> run_all_timers can still definitely run on x86 too if the local cpu
> timer runs on top of an irq handler:

as i told you ...

i've added the TIMER_BH thing to the 2.4 patch to ensure timer-fn
single-threadedness.

this issue has been fixed in 2.6 by the removal of TIMER_BH.

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30 23:56                   ` Andrea Arcangeli
  2003-07-30 23:54                     ` Andrew Morton
@ 2003-07-31 17:23                     ` linas
  2003-08-01  6:27                       ` Ingo Molnar
  1 sibling, 1 reply; 33+ messages in thread
From: linas @ 2003-07-31 17:23 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, olof

On Thu, Jul 31, 2003 at 01:56:07AM +0200, Andrea Arcangeli wrote:
> On Wed, Jul 30, 2003 at 06:43:18PM -0500, linas@austin.ibm.com wrote:
> > On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote:
> > > So the best fix would be to nuke the run_all_timers thing from 2.4 too.
> > Yes.
[...]
> > And so either of Andrea's fixes should fix this race.
> 
> exactly. If you want to nuke the run_all_timers from the 2.4 backport,
> feel free; then we could drop the additional locking from
> add_timer/del_timer* and leave it only in mod_timer like 2.6 does, which
> will avoid cacheline bouncing in the small window when the local timer
> irq runs on top of an irq handler. In the meantime the kernel is already
> solid (again ;).

OK, I looked at removing run_all_timers; it doesn't seem too hard. 

I would need to (a rough sketch of the hookup follows the list): 
-- add TIMER_SOFTIRQ to interrupt.h,
-- add open_softirq(run_timer_softirq) to timer.c's init_timers()
-- move the guts of run_local_timers() to run_timer_softirq()
-- remove the bh locks in the above, not yet sure about other locks
-- remove TIMER_BH everywhere.  Or rather, remove it for those
   arches that support cpu-local timer interrupts (currently x86 & friends, 
   soon hopefully ppc64, I attach it below, in case other arches want to 
   play with this).
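
Very roughly, something like this (untested sketch; the names are
borrowed from the 2.6 code and may not match what 2.4 wants):

static void run_timer_softirq(struct softirq_action *a)
{
	/* what the TIMER_BH handler does today: run this cpu's
	 * expired timers */
	__run_timers(&__get_cpu_var(tvec_bases));
}

void __init init_timers(void)
{
	/* ... existing init ... */
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
}

/* and in the per-cpu timer interrupt path, instead of mark_bh(TIMER_BH): */
void run_local_timers(void)
{
	raise_softirq(TIMER_SOFTIRQ);
}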

Is that right?
Should I do this patch or will people lose interest in it?
Who should I send it to?  Who wants to review it?

> BTW, it's reassuring that it wasn't a lack of 2.6 stress testing; I apologize
> for claiming that.

Me too, sorry for making noise before figuring it all out. But then, 
I remember asking a college physics professor for a recommendation, 
and he didn't know me well.  "You sat in the back of the class and
didn't ask any questions, unlike so-n-so (the #1 student in the class), 
who sat in front and asked a lot of questions."  I told him that was 
because I knew the answers to all of the questions being asked
(I was trying to distance myself from the stupid people up front).
He wasn't impressed by my reply; it hurt pretty bad.

--------------------------------------------------------------------
Re the per-cpu local timer patch: unless I'm badly mistaken, it's pretty 
trivial, right?

Index: kernel/timer.c
===================================================================
RCS file: /home/linas/cvsroot/linux24/kernel/timer.c,v
retrieving revision 1.1.1.1.4.1
diff -u -p -u -r1.1.1.1.4.1 timer.c
--- kernel/timer.c   15 Jul 2003 18:43:52 -0000 1.1.1.1.4.1
+++ kernel/timer.c   31 Jul 2003 15:30:26 -0000
@@ -764,7 +764,7 @@ void do_timer(struct pt_regs *regs)
   /* SMP process accounting uses the local APIC timer */
                                                                                
   update_process_times(user_mode(regs));
-#if defined(CONFIG_X86) || defined(CONFIG_IA64) /* x86-64 is also included by CONFIG_X86 */
+#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(CONFIG_PPC64) /* x86-64 is also included by CONFIG_X86 */
   mark_bh(TIMER_BH);
 #endif
 #endif
@@ -772,7 +772,7 @@ void do_timer(struct pt_regs *regs)
    * Right now only x86-SMP calls run_local_timers() from a
    * per-CPU interrupt.
    */
-#if !defined(CONFIG_X86) && !defined(CONFIG_IA64) /* x86-64 is also included by CONFIG_X86 */
+#if !defined(CONFIG_X86) && !defined(CONFIG_IA64) && !defined(CONFIG_PPC64) /* x86-64 is also included by CONFIG_X86 */
   mark_bh(TIMER_BH);
 #endif
   update_times();
Index: arch/ppc64/kernel/smp.c
===================================================================
RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kernel/smp.c,v
retrieving revision 1.2.4.1
diff -u -p -u -r1.2.4.1 smp.c
--- arch/ppc64/kernel/smp.c   15 Jul 2003 18:41:56 -0000 1.2.4.1
+++ arch/ppc64/kernel/smp.c   31 Jul 2003 15:21:35 -0000
@@ -398,6 +398,8 @@ void smp_local_timer_interrupt(struct pt
      update_process_times(user_mode(regs));
      (get_paca()->prof_counter)=get_paca()->prof_multiplier;
   }
+
+  run_local_timers();
 }
                                                                                
 void smp_message_recv(int msg, struct pt_regs *regs)



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-30  5:57   ` Ingo Molnar
  2003-07-30  6:36     ` Andrew Morton
  2003-07-30 20:05     ` linas
@ 2003-07-31 22:56     ` linas
  2003-08-01  6:23       ` Ingo Molnar
  2 siblings, 1 reply; 33+ messages in thread
From: linas @ 2003-07-31 22:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel

Hi Mingo,

On Wed, Jul 30, 2003 at 07:57:32AM +0200, Ingo Molnar wrote:
> 
> On Tue, 29 Jul 2003, Andrew Morton wrote:
> > Andrea says that we need to take the per-timer lock in add_timer() and
> > del_timer(), but I haven't yet got around to working out why.
> 
> this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() /
> del_timer() use without using external SMP synchronization. (There's one
> special timer use variant involving del_timer_sync() that was safe in 2.4
> but is unsafe in 2.6, see below.)

I don't understand why you don't like this, since your patch seems to 
achieve the same results as Andrea's patch (he uses timer->lock to 
serialize add_timer() and del_timer(), whereas your patch modifies
add_timer so that it grabs locks on old_base as well as new_base;
either approach should fix the add_timer/del_timer race.)

> What i'd propose is the attached (tested, against 2.6.0-test2) patch
> instead. It unifies the functionality of add_timer() and mod_timer(), and
> makes any combination of the timer API calls completely SMP-safe.  
> del_timer() is still not using the timer lock.
> 
> this patch fixes the only timer bug in 2.6 i'm aware of: the
> del_timer_sync() + add_timer() combination in kernel/itimer.c is buggy.
> This was correct code in 2.4, because there it was safe to do an
> add_timer() from the timer handler itself, parallel to a del_timer_sync().  
> If we want to make this safe in 2.6 too (which i think we want to) then we
> have to make add_timer() almost equivalent to mod_timer(), locking-wise.
> And once we are at this point i think it's much cleaner to actually make
> add_timer() a variant of mod_timer(). (There's no locking cost for
> add_timer(), only the cost of an extra branch. And we've removed another
> commonly used function from the icache.)

Well, I'm confused by this a bit too, now.  I saw this bug in 2.4 and I 
thought that Andrea was implying that it couldn't happen in 2.6. 
He seemed to be saying that the del_timer_sync() + add_timer() race
can happen only in 2.4, where add_timer() is running on the 'wrong' cpu 
because it got there through the evil run_all_timers()/TIMER_BH.  Since 
there's no run_all_timers() in 2.6, he seemed to imply that the race 
'couldn't happen'.  Is he right?  

So, to add to my 'stupid question' quota of the day: the only way that
a timer could run on a CPU other than the one it was added on would be for 
a softirq to somehow get moved to a different cpu, and that is impossible,
right?

> Linas, could you please give this patch a go, does it make a difference to
> your timer list corruption problem? I've booted it on SMP and UP as well.

Still trying ... but after reading it, it looks like it will fix my 2.4 bug.  

--linas


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-31 22:56     ` linas
@ 2003-08-01  6:23       ` Ingo Molnar
  0 siblings, 0 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-08-01  6:23 UTC (permalink / raw)
  To: linas; +Cc: Andrew Morton, linux-kernel


On Thu, 31 Jul 2003 linas@austin.ibm.com wrote:

> Well, I'm confused by this a bit too, now.  I saw this bug in 2.4 and I
> thought that Andrea was implying that it couldn't happen in 2.6. He
> seemed to be saying that the del_timer_sync() + add_timer() race can
> happen only in 2.4, where add_timer() is running on the 'wrong' cpu
> because it got there through the evil run_all_timers()/TIMER_BH.  Since
> there's no run_all_timers() in 2.6, he seemed to imply that the race
> 'couldn't happen'.  Is he right?

it is correct, but it was me convincing Andrea about this, not the other
way around :-) Pls. re-read the email thread. My point still stands:

>> (in any case, i still think it would be safer to 'upgrade' the
>> add_timer() interface to be SMP-safe and to allow double-adds - but not
>> for any bug reason anymore.)

this should also make backporting easier.
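
i.e. roughly this (just a sketch, not the final patch; __mod_timer is
the internal helper name i'm assuming):

	void add_timer(struct timer_list *timer)
	{
		/* all the SMP locking lives in one place; a double-add
		 * simply becomes a re-mod of the pending timer */
		__mod_timer(timer, timer->expires);
	}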

	Ingo



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PATCH: Race in 2.6.0-test2 timer code
  2003-07-31 17:23                     ` linas
@ 2003-08-01  6:27                       ` Ingo Molnar
  0 siblings, 0 replies; 33+ messages in thread
From: Ingo Molnar @ 2003-08-01  6:27 UTC (permalink / raw)
  To: linas; +Cc: Andrea Arcangeli, Andrew Morton, linux-kernel, olof


On Thu, 31 Jul 2003 linas@austin.ibm.com wrote:

> OK, I looked at removing run_all_timers; it doesn't seem too hard.
> 
> I would need to: 
> -- add TIMER_SOFTIRQ to interrupt.h,
> -- add open_softirq(run_timer_softirq) to timer.c's init_timers()
> -- move the guts of run_local_timers() to run_timer_softirq()
> -- remove the bh locks in the above, not yet sure about other locks
> -- remove TIMER_BH everywhere.  Or rather, remove it for those
>    arches that support cpu-local timer interrupts (currently x86 & friends, 
>    soon hopefully ppc64, I attach it below, in case other arches want to 
>    play with this).
> 
> Is that right?

no. In 2.4 there are (and/or can be) all sorts of assumptions about
TIMER_BH being serialized with other bh contexts (eg. the serial bh);
that's why i added the TIMER_BH logic to the 2.4 timer-scalability patch.  
You cannot just remove TIMER_BH. The way we did it in 2.5 was to remove
_all_ bhs and thus all assumptions about serialization. This is not an
option for 2.4 in any case.

	Ingo


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2003-08-01  6:28 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-29 15:41 PATCH: Race in 2.6.0-test2 timer code linas
2003-07-29 20:56 ` Andrew Morton
2003-07-30  5:57   ` Ingo Molnar
2003-07-30  6:36     ` Andrew Morton
2003-07-30  7:07       ` Ingo Molnar
2003-07-30  7:34         ` Andrea Arcangeli
2003-07-30  7:34           ` Ingo Molnar
2003-07-30  8:28             ` Andrea Arcangeli
2003-07-30 10:31               ` Ingo Molnar
2003-07-30 11:16                 ` Andrea Arcangeli
2003-07-30 11:49                   ` Ingo Molnar
2003-07-30 12:34                     ` Andrea Arcangeli
2003-07-30 21:18                       ` linas
2003-07-30 22:06                         ` Andrea Arcangeli
2003-07-30 22:17                           ` Andrea Arcangeli
2003-07-31  7:04                             ` Ingo Molnar
2003-07-30 21:19                       ` Andrea Arcangeli
2003-07-30 23:43                 ` linas
2003-07-30 23:56                   ` Andrea Arcangeli
2003-07-30 23:54                     ` Andrew Morton
2003-07-31  0:16                       ` Andrea Arcangeli
2003-07-31 17:23                     ` linas
2003-08-01  6:27                       ` Ingo Molnar
2003-07-30  7:40           ` Ingo Molnar
2003-07-30  8:37             ` Andrea Arcangeli
2003-07-30 10:34               ` Ingo Molnar
2003-07-30 10:51                 ` Andrew Morton
2003-07-30 11:28                   ` Andrea Arcangeli
2003-07-30 11:22                 ` Andrea Arcangeli
2003-07-30 20:05     ` linas
2003-07-31  6:50       ` Ingo Molnar
2003-07-31 22:56     ` linas
2003-08-01  6:23       ` Ingo Molnar
