* PATCH: Race in 2.6.0-test2 timer code @ 2003-07-29 15:41 linas 2003-07-29 20:56 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: linas @ 2003-07-29 15:41 UTC (permalink / raw) To: linux-kernel PATCH: Race in 2.6.0-test2 timer code Hi, I have been chasing a pointer corruption bug in the timer code, and found the following race in the mod_timer() routine. I am still testing to see if this fixes my bug ... --linas --- linux-2.6.0-test2/kernel/timer.c.orig 2003-07-27 12:07:34.000000000 -0500 +++ linux-2.6.0-test2/kernel/timer.c 2003-07-29 10:22:13.000000000 -0500 @@ -258,8 +258,13 @@ repeat: spin_unlock(&old_base->lock); goto repeat; } - } else + } else { spin_lock(&new_base->lock); + if (timer->base != old_base) { + spin_unlock(&new_base->lock); + goto repeat; + } + } /* * Delete the previous timeout (if there was any), and install ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-29 15:41 PATCH: Race in 2.6.0-test2 timer code linas @ 2003-07-29 20:56 ` Andrew Morton 2003-07-30 5:57 ` Ingo Molnar 0 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2003-07-29 20:56 UTC (permalink / raw) To: linas; +Cc: linux-kernel linas@austin.ibm.com wrote: > > I have been chasing a pointer corruption bug in the timer code, and > found the following race in the mod_timer() routine. I am still > testing to see if this fixes my bug ... Andrea says that we need to take the per-timer lock in add_timer() and del_timer(), but I haven't yet got around to working out why. Does this (untested) patch fix whatever race it is which you are observing? 25-akpm/kernel/timer.c | 25 ++++++++++++++++--------- 1 files changed, 16 insertions(+), 9 deletions(-) diff -puN kernel/timer.c~timer-locking-fixes kernel/timer.c --- 25/kernel/timer.c~timer-locking-fixes Tue Jul 29 13:52:21 2003 +++ 25-akpm/kernel/timer.c Tue Jul 29 13:54:55 2003 @@ -167,10 +167,12 @@ void add_timer(struct timer_list *timer) check_timer(timer); - spin_lock_irqsave(&base->lock, flags); + spin_lock_irqsave(&timer->lock, flags); + spin_lock(&base->lock); internal_add_timer(base, timer); timer->base = base; - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); + spin_unlock_irqrestore(&timer->lock, flags); put_cpu(); } @@ -190,10 +192,12 @@ void add_timer_on(struct timer_list *tim check_timer(timer); - spin_lock_irqsave(&base->lock, flags); + spin_lock_irqsave(&timer->lock, flags); + spin_lock(&base->lock); internal_add_timer(base, timer); timer->base = base; - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); + spin_unlock_irqrestore(&timer->lock, flags); } /*** @@ -298,19 +302,22 @@ int del_timer(struct timer_list *timer) tvec_base_t *base; check_timer(timer); - + spin_lock_irqsave(&timer->lock, flags); repeat: base = timer->base; - if (!base) + if (!base) { + spin_unlock_irqrestore(&timer->lock, 
flags); return 0; - spin_lock_irqsave(&base->lock, flags); + } + spin_lock(&base->lock); if (base != timer->base) { - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); goto repeat; } list_del(&timer->entry); timer->base = NULL; - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); + spin_unlock_irqrestore(&timer->lock, flags); return 1; } _ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-29 20:56 ` Andrew Morton @ 2003-07-30 5:57 ` Ingo Molnar 2003-07-30 6:36 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 5:57 UTC (permalink / raw) To: Andrew Morton; +Cc: linas, linux-kernel On Tue, 29 Jul 2003, Andrew Morton wrote: > Andrea says that we need to take the per-timer lock in add_timer() and > del_timer(), but I haven't yet got around to working out why. this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() / del_timer() use without using external SMP synchronization. (There's one special timer use variant involving del_timer_sync() that was safe in 2.4 but is unsafe in 2.6, see below.) and i dont think Linas' patch is correct either - how can the timer base change under us? We are holding the timer spinlock. What i'd propose is the attached (tested, against 2.6.0-test2) patch instead. It unifies the functionality of add_timer() and mod_timer(), and makes any combination of the timer API calls completely SMP-safe. del_timer() is still not using the timer lock. this patch fixes the only timer bug in 2.6 i'm aware of: the del_timer_sync() + add_timer() combination in kernel/itimer.c is buggy. This was correct code in 2.4, because there it was safe to do an add_timer() from the timer handler itself, parallel to a del_timer_sync(). If we want to make this safe in 2.6 too (which i think we want to) then we have to make add_timer() almost equivalent to mod_timer(), locking-wise. And once we are at this point i think it's much cleaner to actually make add_timer() a variant of mod_timer(). (There's no locking cost for add_timer(), only the cost of an extra branch. And we've removed another commonly used function from the icache.) Linas, could you please give this patch a go, does it make a difference to your timer list corruption problem? I've booted it on SMP and UP as well. 
Ingo --- linux/include/linux/timer.h.orig +++ linux/include/linux/timer.h @@ -60,11 +60,30 @@ static inline int timer_pending(const st return timer->base != NULL; } -extern void add_timer(struct timer_list * timer); extern void add_timer_on(struct timer_list *timer, int cpu); extern int del_timer(struct timer_list * timer); +extern int __mod_timer(struct timer_list *timer, unsigned long expires); extern int mod_timer(struct timer_list *timer, unsigned long expires); - + +/*** + * add_timer - start a timer + * @timer: the timer to be added + * + * The kernel will do a ->function(->data) callback from the + * timer interrupt at the ->expired point in the future. The + * current time is 'jiffies'. + * + * The timer's ->expired, ->function (and if the handler uses it, ->data) + * fields must be set prior calling this function. + * + * Timers with an ->expired field in the past will be executed in the next + * timer tick. + */ +static inline void add_timer(struct timer_list * timer) +{ + __mod_timer(timer, timer->expires); +} + #ifdef CONFIG_SMP extern int del_timer_sync(struct timer_list * timer); #else --- linux/kernel/timer.c.orig +++ linux/kernel/timer.c @@ -144,34 +144,62 @@ static void internal_add_timer(tvec_base list_add_tail(&timer->entry, vec); } -/*** - * add_timer - start a timer - * @timer: the timer to be added - * - * The kernel will do a ->function(->data) callback from the - * timer interrupt at the ->expired point in the future. The - * current time is 'jiffies'. - * - * The timer's ->expired, ->function (and if the handler uses it, ->data) - * fields must be set prior calling this function. - * - * Timers with an ->expired field in the past will be executed in the next - * timer tick. It's illegal to add an already pending timer. 
- */ -void add_timer(struct timer_list *timer) +int __mod_timer(struct timer_list *timer, unsigned long expires) { - tvec_base_t *base = &get_cpu_var(tvec_bases); - unsigned long flags; - - BUG_ON(timer_pending(timer) || !timer->function); + tvec_base_t *old_base, *new_base; + unsigned long flags; + int ret = 0; + + BUG_ON(!timer->function); check_timer(timer); - spin_lock_irqsave(&base->lock, flags); - internal_add_timer(base, timer); - timer->base = base; - spin_unlock_irqrestore(&base->lock, flags); - put_cpu_var(tvec_bases); + spin_lock_irqsave(&timer->lock, flags); + new_base = &__get_cpu_var(tvec_bases); +repeat: + old_base = timer->base; + + /* + * Prevent deadlocks via ordering by old_base < new_base. + */ + if (old_base && (new_base != old_base)) { + if (old_base < new_base) { + spin_lock(&new_base->lock); + spin_lock(&old_base->lock); + } else { + spin_lock(&old_base->lock); + spin_lock(&new_base->lock); + } + /* + * The timer base might have been cancelled while we were + * trying to take the lock(s): + */ + if (timer->base != old_base) { + spin_unlock(&new_base->lock); + spin_unlock(&old_base->lock); + goto repeat; + } + } else + spin_lock(&new_base->lock); + + /* + * Delete the previous timeout (if there was any), and install + * the new one: + */ + if (old_base) { + list_del(&timer->entry); + ret = 1; + } + timer->expires = expires; + internal_add_timer(new_base, timer); + timer->base = new_base; + + if (old_base && (new_base != old_base)) + spin_unlock(&old_base->lock); + spin_unlock(&new_base->lock); + spin_unlock_irqrestore(&timer->lock, flags); + + return ret; } /*** @@ -179,7 +207,7 @@ void add_timer(struct timer_list *timer) * @timer: the timer to be added * @cpu: the CPU to start it on * - * This is not very scalable on SMP. + * This is not very scalable on SMP. Double adds are not possible. 
*/ void add_timer_on(struct timer_list *timer, int cpu) { @@ -217,10 +245,6 @@ void add_timer_on(struct timer_list *tim */ int mod_timer(struct timer_list *timer, unsigned long expires) { - tvec_base_t *old_base, *new_base; - unsigned long flags; - int ret = 0; - BUG_ON(!timer->function); check_timer(timer); @@ -233,52 +257,7 @@ int mod_timer(struct timer_list *timer, if (timer->expires == expires && timer_pending(timer)) return 1; - spin_lock_irqsave(&timer->lock, flags); - new_base = &__get_cpu_var(tvec_bases); -repeat: - old_base = timer->base; - - /* - * Prevent deadlocks via ordering by old_base < new_base. - */ - if (old_base && (new_base != old_base)) { - if (old_base < new_base) { - spin_lock(&new_base->lock); - spin_lock(&old_base->lock); - } else { - spin_lock(&old_base->lock); - spin_lock(&new_base->lock); - } - /* - * The timer base might have been cancelled while we were - * trying to take the lock(s): - */ - if (timer->base != old_base) { - spin_unlock(&new_base->lock); - spin_unlock(&old_base->lock); - goto repeat; - } - } else - spin_lock(&new_base->lock); - - /* - * Delete the previous timeout (if there was any), and install - * the new one: - */ - if (old_base) { - list_del(&timer->entry); - ret = 1; - } - timer->expires = expires; - internal_add_timer(new_base, timer); - timer->base = new_base; - - if (old_base && (new_base != old_base)) - spin_unlock(&old_base->lock); - spin_unlock(&new_base->lock); - spin_unlock_irqrestore(&timer->lock, flags); - - return ret; + return __mod_timer(timer, expires); } /*** ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 5:57 ` Ingo Molnar @ 2003-07-30 6:36 ` Andrew Morton 2003-07-30 7:07 ` Ingo Molnar 2003-07-30 20:05 ` linas 2003-07-31 22:56 ` linas 2 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2003-07-30 6:36 UTC (permalink / raw) To: Ingo Molnar; +Cc: linas, linux-kernel, Andrea Arcangeli Ingo Molnar <mingo@elte.hu> wrote: > > > On Tue, 29 Jul 2003, Andrew Morton wrote: > > > Andrea says that we need to take the per-timer lock in add_timer() and > > del_timer(), but I haven't yet got around to working out why. > > this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() / > del_timer() use without using external SMP synchronization. (There's one > special timer use variant involving del_timer_sync() that was safe in 2.4 > but is unsafe in 2.6, see below.) > Well Andrea did mention a problem with the interval timers. But I am not aware of the exact details of the race which he found, and I don't understand why del_timer() and add_timer() would be needing the per-timer locks. You need to export __mod_timer to modules btw. --- 25/kernel/ksyms.c~timer-race-fixes 2003-07-29 23:27:05.000000000 -0700 +++ 25-akpm/kernel/ksyms.c 2003-07-29 23:27:49.000000000 -0700 @@ -405,8 +405,6 @@ EXPORT_SYMBOL(proc_doulongvec_ms_jiffies EXPORT_SYMBOL(proc_doulongvec_minmax); /* interrupt handling */ -EXPORT_SYMBOL(add_timer); -EXPORT_SYMBOL(del_timer); EXPORT_SYMBOL(request_irq); EXPORT_SYMBOL(free_irq); @@ -433,7 +431,10 @@ EXPORT_SYMBOL(probe_irq_off); #ifdef CONFIG_SMP EXPORT_SYMBOL(del_timer_sync); #endif +EXPORT_SYMBOL(add_timer); +EXPORT_SYMBOL(del_timer); EXPORT_SYMBOL(mod_timer); +EXPORT_SYMBOL(__mod_timer); #ifdef HAVE_DISABLE_HLT EXPORT_SYMBOL(disable_hlt); _ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 6:36 ` Andrew Morton @ 2003-07-30 7:07 ` Ingo Molnar 2003-07-30 7:34 ` Andrea Arcangeli 0 siblings, 1 reply; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 7:07 UTC (permalink / raw) To: Andrew Morton; +Cc: linas, linux-kernel, Andrea Arcangeli On Tue, 29 Jul 2003, Andrew Morton wrote: > Well Andrea did mention a problem with the interval timers. But I am > not aware of the exact details of the race which he found, and I don't > understand why del_timer() and add_timer() would be needing the > per-timer locks. hmm ... indeed i cannot see the 2.6 itimer race anymore. Andrea, can you see any SMP races in the current 2.6 timer code? (in any case, i still think it would be safer to 'upgrade' the add_timer() interface to be SMP-safe and to allow double-adds - but not for any bug reason anymore.) Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 7:07 ` Ingo Molnar @ 2003-07-30 7:34 ` Andrea Arcangeli 2003-07-30 7:34 ` Ingo Molnar 2003-07-30 7:40 ` Ingo Molnar 0 siblings, 2 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 7:34 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 09:07:59AM +0200, Ingo Molnar wrote: > > On Tue, 29 Jul 2003, Andrew Morton wrote: > > > Well Andrea did mention a problem with the interval timers. But I am > > not aware of the exact details of the race which he found, and I don't > > understand why del_timer() and add_timer() would be needing the > > per-timer locks. > > hmm ... indeed i cannot see the 2.6 itimer race anymore. Andrea, can you > see any SMP races in the current 2.6 timer code? > > (in any case, i still think it would be safer to 'upgrade' the add_timer() > interface to be SMP-safe and to allow double-adds - but not for any bug > reason anymore.) The thing triggered simply by running setitimer on one cpu while it_real_fn was running on the other cpu. I don't see how 2.6 can have fixed this; it_real_fn can still trivially call add_timer while you run inside do_setitimer in 2.6 too. It's simply that 2.6 isn't running in production yet, so a bug that didn't show up in years of 2.5 shows up in a month of 2.4. But even if you somehow serialized it_real_fn against do_setitimer (and I don't see any external serialization in 2.6 in current bkcvs), I wouldn't trust 2.6 drivers not to execute del_timer_sync in parallel anyway; that's why I didn't search too hard for more 2.6 timer bugs, and I still suggested applying the locking despite only knowing about a single bug.
The real scalability optimizations are probably only the fast-path check in mod_timer and the cacheline-bouncing avoidance in the global timer data structures like the cascading queues; the per-timer lock should do well, and I'd feel much safer with the whole timer API being SMP-safe without requiring driver developers to add complicated external locking. This will provide a much friendlier abstraction. I would definitely be against putting external locking in do_setitimer/it_real_fn in 2.6 since, just as you missed that place, there can still be tons of other buggy drivers out there that don't trigger simply because the user base is still too small. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 7:34 ` Andrea Arcangeli @ 2003-07-30 7:34 ` Ingo Molnar 2003-07-30 8:28 ` Andrea Arcangeli 2003-07-30 7:40 ` Ingo Molnar 1 sibling, 1 reply; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 7:34 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > The thing triggered simply by running setitimer in one function, while > the it_real_fn was running in the other cpu. I don't see how 2.6 can > have fixed this, it_real_fun can still trivially call add_timer while > you run inside do_setitimer in 2.6 too. [...] This is not a race that can happen. itimer does this: del_timer_sync(); add_timer(); how can the add_timer() still happen while it_real_fn is still running on another CPU? Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 7:34 ` Ingo Molnar @ 2003-07-30 8:28 ` Andrea Arcangeli 2003-07-30 10:31 ` Ingo Molnar 0 siblings, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 8:28 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 09:34:40AM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > The thing triggered simply by running setitimer in one function, while > > the it_real_fn was running in the other cpu. I don't see how 2.6 can > > have fixed this, it_real_fun can still trivially call add_timer while > > you run inside do_setitimer in 2.6 too. [...] > > This is not a race that can happen. itimer does this: > > del_timer_sync(); > add_timer(); > > how can the add_timer() still happen while it_real_fn is still running on > another CPU? it's not add_timer against add_timer in this case, it's del_timer_sync against add_timer.

cpu0                 cpu1
------------         --------------------
do_setitimer         it_real_fn
del_timer_sync       add_timer -> crash

Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 8:28 ` Andrea Arcangeli @ 2003-07-30 10:31 ` Ingo Molnar 2003-07-30 11:16 ` Andrea Arcangeli 2003-07-30 23:43 ` linas 0 siblings, 2 replies; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 10:31 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > cpu0 cpu1 > ------------ -------------------- > > do_setitimer > it_real_fn > del_timer_sync add_timer -> crash would you mind to elaborate the precise race? I cannot see how the above sequence could crash on current 2.6: del_timer_sync() takes the base pointer in timer->base and locks it, then checks whether the timer has this base or not and repeats the sequence if not. add_timer() locks the current CPU's base, then installs the timer and sets timer->base. Where's the race? Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 10:31 ` Ingo Molnar @ 2003-07-30 11:16 ` Andrea Arcangeli 2003-07-30 11:49 ` Ingo Molnar 0 siblings, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 11:16 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > > > cpu0 cpu1 > > ------------ -------------------- > > > > do_setitimer > > it_real_fn > > del_timer_sync add_timer -> crash > > would you mind to elaborate the precise race? I cannot see how the above > sequence could crash on current 2.6: > > del_timer_sync() takes the base pointer in timer->base and locks it, then > checks whether the timer has this base or not and repeats the sequence if > not. add_timer() locks the current CPU's base, then installs the timer and > sets timer->base. Where's the race? Not sure why you mention timer->base; you know the base lock means nothing if the two operations run on two different cpus (if it meant anything, we'd still suffer the cacheline bouncing with timer ops running on different cpus). To make it clearer I'll edit timer.c, deleting all the spin_lock(&base->lock) calls so the base->lock won't make it look like it can serialize anything. I delete the local_irqsave too since it doesn't matter either for this example (also assume preemption is turned off, so I don't need to add the preempt disable in place of the local irq save).
so for this example we have this:

int del_timer(struct timer_list *timer)
{
	tvec_base_t *base;
repeat:
	base = timer->base;
	if (!base)
		return 0;
	if (base != timer->base) {
		goto repeat;
	}
	list_del(&timer->entry);
	timer->base = NULL;
	return 1;
}

against this:

void add_timer(struct timer_list *timer)
{
	tvec_base_t *base = &get_cpu_var(tvec_bases);

	BUG_ON(timer_pending(timer) || !timer->function);

	internal_add_timer(base, timer);
	timer->base = base;
	put_cpu_var(tvec_bases);
}

in del_timer, list_del can be reordered after the timer->base = NULL; the C compiler can do that. So list_del can run at the same time as internal_add_timer(base, timer), which does the list_add_tail. list_del and list_add_tail running at the same time will crash. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 11:16 ` Andrea Arcangeli @ 2003-07-30 11:49 ` Ingo Molnar 2003-07-30 12:34 ` Andrea Arcangeli 0 siblings, 1 reply; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 11:49 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > in del_timer, list_del can be reordered after the timer->base = NULL, > the C compiler can do that. so list_del will run at the same time of > internal_add_timer(base, timer) that does the list_add_tail. no, it cannot run at the same time. The add_timer() will first lock the current CPU's base, before touching the list. Any parallel del_timer() can only do the list_del() if it first has locked timer->base. timer->base can only have the base of the CPU where it_real_fn is running, or be NULL. In the NULL case del_timer() wont do a thing but return. In the other case the timer->base value observed by the del_timer()-executing CPU will be the same base as where it_real_fn is running, so both the add_timer() and the del_timer() will serialize on the same base => no parallel list handling possible. How the compiler (or even the CPU, on non-x86) orders the writes within the locked section is irrelevant in this scenario. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 11:49 ` Ingo Molnar @ 2003-07-30 12:34 ` Andrea Arcangeli 2003-07-30 21:18 ` linas 2003-07-30 21:19 ` Andrea Arcangeli 0 siblings, 2 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 12:34 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 01:49:52PM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > in del_timer, list_del can be reordered after the timer->base = NULL, > > the C compiler can do that. so list_del will run at the same time of > > internal_add_timer(base, timer) that does the list_add_tail. > > no, it cannot run at the same time. The add_timer() will first lock the > current CPU's base, before touching the list. Any parallel del_timer() can > only do the list_del() if it first has locked timer->base. timer->base can > only have the base of the CPU where it_real_fn is running, or be NULL. In > the NULL case del_timer() wont do a thing but return. In the other case > the timer->base value observed by the del_timer()-executing CPU will be > the same base as where it_real_fn is running, so both the add_timer() and > the del_timer() will serialize on the same base => no parallel list > handling possible. How the compiler (or even the CPU, on non-x86) orders > the writes within the locked section is irrelevant in this scenario. so if it was really the itimer, it had to be on ppc, s390 or ia64, not on x86. I never reproduced this myself. I will ask for more info on bugzilla, because I thought it was x86 but maybe it wasn't. As said in the previous email, only non-x86 archs can run the timer irq on a cpu different from the one where it was inserted. As Andrew noted, the same race could happen in 2.6 with add_timer_on. But apparently you're right that the setitimer race couldn't trigger on 2.6 or 2.4-aa x86.
Still, it could be something else that broke, related to add_timer/del_timer_sync in drivers, reproducible on x86 too. As said, I didn't debug or reproduce it myself. It simply looked correct to allow add_timer to run in parallel with del_timer_sync, no matter which cpu they're running on. (If add_timer runs from inside the timer handler itself, of course it can't trigger, because the base won't change in add_timer and del_timer will be a reader falling on the same base; but that's a special case, and it fails anyway if you use add_timer_on.) Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 12:34 ` Andrea Arcangeli @ 2003-07-30 21:18 ` linas 2003-07-30 22:06 ` Andrea Arcangeli 2003-07-30 21:19 ` Andrea Arcangeli 1 sibling, 1 reply; 33+ messages in thread From: linas @ 2003-07-30 21:18 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Ingo Molnar, Andrew Morton, linux-kernel On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote: > > so if it was really the itimer, it had to be on ppc s390 or ia64, not on > x86. I never reproduced this myself. I will ask more info on bugzilla, > because I thought it was x86 but maybe it wasn't. As said in the > previous email, only non x86 archs can run the timer irq on a cpu > different than the one where it was inserted. I think I started the thread, so let me backtrack. I've got several ppc64 boxes running 2.4.21. They hang, they don't crash. Sometimes they're pingable, but that's all. The little yellow button is wired up to KDB, so I can break in and poke around and see what's happening. Here's what I find: __run_timers() is stuck in an infinite loop, because there's a timer on the base. However, this timer has timer->list.next == NULL, and so it can never be removed. (As a side effect, other CPUs get stuck in various places, sitting in spinlocks, etc., which is why the machine goes unresponsive.) So the question becomes "how did this pointer get trashed?" I thought I saw an "obvious race" in the 2.4 code, and it looked identical to the 2.6 code, and thus the email went out. Now, I'm not so sure; I'm confused. However, given that it's timer->list.next that is getting clobbered, locking all occurrences of list_add and list_del should cure the problem. I'm guessing that Andrea's patch to add a timer->lock will protect all of the list_adds and list_dels that matter. Given that I don't yet understand how timer->list.next got clobbered, I can't yet say 'yes/no this fixes it'. --linas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 21:18 ` linas @ 2003-07-30 22:06 ` Andrea Arcangeli 2003-07-30 22:17 ` Andrea Arcangeli 0 siblings, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 22:06 UTC (permalink / raw) To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel On Wed, Jul 30, 2003 at 04:18:10PM -0500, linas@austin.ibm.com wrote: > On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote: > > > > so if it was really the itimer, it had to be on ppc s390 or ia64, not on > > x86. I never reproduced this myself. I will ask more info on bugzilla, > > because I thought it was x86 but maybe it wasn't. As said in the > > previous email, only non x86 archs can run the timer irq on a cpu > > different than the one where it was inserted. > > I think I started the thread, so let me backtrack. I've got several > ppc64 boxes running 2.4.21. They hang, they don't crash. Sometimes > they're pingable, but that's all. The little yellow button is wired > up to KDB, so I can break in and poke around and see what's happening. > > Here's what I find: > > __run_timers() is stuck in an infinite loop, because there's a > timer on the base. However, this timer has timer->list.next == NULL, > and so it can never be removed. (As a side effect, other CPUs get > stuck in various places, sitting in spinlocks, etc. which is why the > machine goes unresponsive) So the question becomes "how did this > pointer get trashed?" > > I thought I saw an "obvious race" in the 2.4 code, and it looked > identical to the 2.6 code, and thus the email went out. Now, I'm > not so sure; I'm confused. the 2.4 code calls into run_all_timers for non-x86 archs; that should go away, and be replaced with calls to local_run_timers in the per-cpu timer interrupts (not the global timer irq). In fact I wouldn't trust calling run_all_timers anyway; that might be the racy thing, you're probably the first one calling it.
> > However, given that its timer->list.net that is getting clobbered, > then locking all occurrances of list_add and list_del should cure the > problem. I'm guessing that Andrea's patch to add a timer->lock > will protect all of the list_add's, list_del's that matter. you definitely need that patch on non x86 archs with the run_all_timers or the race that I pointed out today will trigger (as agreed by Ingo too), so you should try to reproduce with it applied. I've also seen another patch floating around that looks like this: @@ -214,8 +214,14 @@ repeat: spin_unlock(&old_base->lock); goto repeat; } - } else + } else { spin_lock(&new_base->lock); + if (timer->base != old_base) { + spin_unlock(&new_base->lock); + printk ("duuude found and fixed bug locking mod timer\n"); + goto repeat; + } + but I don't think we need even this other one in 2.4 (2.4 never touches the timer->base from the del_timer*): /* * Subtle, we rely on timer->base being always * valid and being updated atomically. */ if (timer->base != old_base) { spin_unlock(&new_base->lock); spin_unlock(&old_base->lock); goto repeat; } since we're serialized by the timer->lock in mod_timer, and del_timer won't affect the timer->base in 2.4. And add_timer is forbidden to run at the same time of mod_timer. So I think we should change the code this way instead of applying the above patch that looks wrong: --- 2.4.22pre7aa1/kernel/timer.c.~1~ 2003-07-19 02:34:09.000000000 +0200 +++ 2.4.22pre7aa1/kernel/timer.c 2003-07-30 23:43:03.000000000 +0200 @@ -210,15 +210,7 @@ repeat: spin_lock(&old_base->lock); spin_lock(&new_base->lock); } - /* - * Subtle, we rely on timer->base being always - * valid and being updated atomically. 
- */ - if (timer->base != old_base) { - spin_unlock(&new_base->lock); - spin_unlock(&old_base->lock); - goto repeat; - } + BUG_ON(timer->base != old_base); } else spin_lock(&new_base->lock); Assuming you were running 2.4.21 w/o the extended timer->lock to add_timer and del_timer, the only mystery at the moment is how lcm@us.ibm.com can have reproduced a race in setitimer on x86 and got it fixed with the patch I posted to l-k earlier today that extends the same timer->lock of mod_timer to add_timer and del_timer* too. I mean, that patch is perfectly safe, looks very sane, is strictly necessary on all non x86* and ia64 archs, but it shouldn't be necessary on x86 as discussed today with Ingo and Andrew. I will keep that patch applied because this is 2.4 and it fixes stuff in practice, but still we must be missing something about this code. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 22:06 ` Andrea Arcangeli @ 2003-07-30 22:17 ` Andrea Arcangeli 2003-07-31 7:04 ` Ingo Molnar 0 siblings, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 22:17 UTC (permalink / raw) To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel On Thu, Jul 31, 2003 at 12:06:04AM +0200, Andrea Arcangeli wrote: > practice, but still we must be missing something about this code. ah, finally I see how the timer->lock can have made the kernel stable again! run_all_timers can still definitely run on x86 too if the local cpu timer runs on top of an irq handler:

	if (in_interrupt())
		goto out_mark;

out_mark:
	mark_bh(TIMER_BH);

	init_bh(TIMER_BH, run_all_timers);

(still, on ppc it will be an order of magnitude less stable than on x86, since ppc only calls run_all_timers, so you don't need two races to trigger at the same time to crash) ok, so my current code is the right one, and it's not needed in 2.6 since 2.6 always runs inside a softirq and never falls back to run_all_timers. So the best fix would be to nuke the run_all_timers thing from 2.4 too. For now I'm only going to take the risk of adding the BUG_ON in mod_timer and to keep the timer->lock everywhere to make run_all_timers safe. Now you should only make sure that your 2.4.21 gets stable too with the fix I applied today (and please add the BUG_ON(old_base != timer->base) in mod_timer too so I won't be the only tester of that code ;) In short the stack traces I described today were all right, but only for 2.4, and not for 2.6. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 22:17 ` Andrea Arcangeli @ 2003-07-31 7:04 ` Ingo Molnar 0 siblings, 0 replies; 33+ messages in thread From: Ingo Molnar @ 2003-07-31 7:04 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linas, Andrew Morton, linux-kernel On Thu, 31 Jul 2003, Andrea Arcangeli wrote: > ah, finally I see how can the timer->lock can have made the kernel > stable again! > > run_all_timers can still definitely run on x86 too if the local cpu > timer runs on top of an irq handler: as i told you ... i've added the TIMER_BH thing to the 2.4 patch to ensure timer-fn single-threadedness. this issue has been fixed in 2.6 by the removal of TIMER_BH. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 12:34 ` Andrea Arcangeli 2003-07-30 21:18 ` linas @ 2003-07-30 21:19 ` Andrea Arcangeli 1 sibling, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 21:19 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 02:34:58PM +0200, Andrea Arcangeli wrote: > x86. I never reproduced this myself. I will ask more info on bugzilla, I have confirmation that it was reproduced on x86, so the timer should have been running on the same cpu where it was queued (i.e. the base should have been the same for both del_timer_sync and add_timer). So at the moment it's not clear what race that patch fixed, but nevertheless I feel much safer keeping this obviously safe additional locking applied until we know for sure ;) And I have confirmation that it makes the 2.4 kernel stable again. FYI, if I don't answer further emails on this in the next days, it's because I'll be on vacation for the next 14 days. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 10:31 ` Ingo Molnar 2003-07-30 11:16 ` Andrea Arcangeli @ 2003-07-30 23:43 ` linas 2003-07-30 23:56 ` Andrea Arcangeli 1 sibling, 1 reply; 33+ messages in thread From: linas @ 2003-07-30 23:43 UTC (permalink / raw) To: Ingo Molnar Cc: Andrea Arcangeli, Andrew Morton, linux-kernel, olh, olof, linas Hi, OK, I finally spent some time studying the timer code and am starting to grasp it. I think I finally understand the bug. Andrea's timer->lock patch will fix it for 2.4 and everybody says it can't happen in 2.6 so ok. On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote: > So the best fix would be to nuke the run_all_timers thing from 2.4 too. Yes. >and to keep the timer->lock everywhere to make run_all_timers safe. Or do that instead, yes. > In short the stack traces I described today were all right but only for > 2.4, and not for 2.6. I see the bug in 2.4. On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > > > cpu0 cpu1 > > ------------ -------------------- > > > > do_setitimer > > it_real_fn > > del_timer_sync add_timer -> crash > > would you mind to elaborate the precise race? I cannot see how the above > sequence could crash on current 2.6: I don't know enough about how 2.6 works to say, but here's more detail on what happened in 2.4: cpu 0 cpu 3 ------- --------- previously, timer->base = cpu 3 base (its task-struct->real_timer) bh_action() { __run_timers() { sys_setitimer() { it_real_fn() // called as fn() del_timer_sync() { add_timer() { spinlock (cpu3 base) spinlock (cpu0 base) timer->base = cpu0 base detach_timer (timer) list_add(&timer->list) timer->list.next = NULL; and now timer is vectored in but can't be unlinked. And so either of Andrea's fixes should fix this race. --linas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 23:43 ` linas @ 2003-07-30 23:56 ` Andrea Arcangeli 2003-07-30 23:54 ` Andrew Morton 2003-07-31 17:23 ` linas 0 siblings, 2 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 23:56 UTC (permalink / raw) To: linas; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, olh, olof On Wed, Jul 30, 2003 at 06:43:18PM -0500, linas@austin.ibm.com wrote: > Hi, > > OK, I finally spent some time studying the timer code and am starting > to grasp it. I think I finally understand the bug. Andrea's timer->lock > patch will fix it for 2.4 and everybody says it can't happen in 2.6 so > ok. > > > On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote: > > So the best fix would be to nuke the run_all_timers thing from 2.4 too. > > Yes. > > >and to keep the timer->lock everywhere to make run_all_timers safe. > > Or do that instead, yes. > > > In short the stack traces I described today were all right but only for > > 2.4, and not for 2.6. > > I see the bug in 2.4. > > On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote: > > > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > > > > > > cpu0 cpu1 > > > ------------ -------------------- > > > > > > do_setitimer > > > it_real_fn > > > del_timer_sync add_timer -> crash > > > > would you mind to elaborate the precise race? I cannot see how the above > > sequence could crash on current 2.6: > > I don't know enough about how 2.6 works to say, > but here's more detail on what happened in 2.4: > > > cpu 0 cpu 3 > ------- --------- > previously, timer->base = cpu 3 base > (its task-struct->real_timer) > bh_action() { > __run_timers() { sys_setitimer() { > it_real_fn() // called as fn() del_timer_sync() { > add_timer() { spinlock (cpu3 base) > spinlock (cpu0 base) > timer->base = cpu0 base detach_timer (timer) > list_add(&timer->list) > timer->list.next = NULL; > and now timer is vectored in but can't be unlinked. 
> > And so either of Andrea's fixes should fix this race. exactly. If you want to nuke the run_all_timers from the 2.4 backport feel free, then we could drop the additional locking from add_timer/del_timer* and leave it only in mod_timer like 2.6 does, that will avoid cacheline bouncing in the small window when the local timer irq runs on top of an irq handler. In the meantime the kernel is already solid (again ;). 2.6 will crash like 2.4 did only if it calls add_timer_on from timer context (of course with a cpuid != smp_processor_id()), and that would be fixed by the timer->lock everywhere that we have in 2.4 right now. (but there's no add_timer_on in 2.4 anyways) BTW, it's reassuring it wasn't lack of 2.6 stress testing, I apologize for claiming that. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 23:56 ` Andrea Arcangeli @ 2003-07-30 23:54 ` Andrew Morton 2003-07-31 0:16 ` Andrea Arcangeli 2003-07-31 17:23 ` linas 1 sibling, 1 reply; 33+ messages in thread From: Andrew Morton @ 2003-07-30 23:54 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linas, mingo, linux-kernel, olh, olof Andrea Arcangeli <andrea@suse.de> wrote: > > 2.6 will kernel crash like we did in 2.4 only if it calls > add_timer_on from timer context (of course with a cpuid != than the > smp_processor_id()), that would be fixed by the timer->lock everywhere > that we've in 2.4 right now. (but there's no add_timer_on in 2.4 > anyways) add_timer_on() was added specifically for slab bringup. If we need extra locking to cope with it then the best solution would probably be to rename it to add_timer_on_dont_use_this_for_anything_else(). But if we are going to rely on timer handlers only ever running on the adding CPU for locking purposes then can we please have a big comment somewhere describing what's going on? It's very subtle... ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 23:54 ` Andrew Morton @ 2003-07-31 0:16 ` Andrea Arcangeli 0 siblings, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-31 0:16 UTC (permalink / raw) To: Andrew Morton; +Cc: linas, mingo, linux-kernel, olh, olof On Wed, Jul 30, 2003 at 04:54:18PM -0700, Andrew Morton wrote: > Andrea Arcangeli <andrea@suse.de> wrote: > > > > 2.6 will kernel crash like we did in 2.4 only if it calls > > add_timer_on from timer context (of course with a cpuid != than the > > smp_processor_id()), that would be fixed by the timer->lock everywhere > > that we've in 2.4 right now. (but there's no add_timer_on in 2.4 > > anyways) > > add_timer_on() was added specifically for slab bringup. If we need extra > locking to cope with it then the best solution would probably be to rename > it to add_timer_on_dont_use_this_for_anything_else(). yes. I wasn't actually suggesting to add locking everywhere just for this. It sounds reasonable to leave it unsafe from timer context. > But if we are going to rely on timer handlers only ever running on the > adding CPU for locking purposes then can we please have a big comment > somewhere describing what's going on? It's very subtle... agreed, it definitely deserves the big fat comment somewhere ;) thanks, Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 23:56 ` Andrea Arcangeli 2003-07-30 23:54 ` Andrew Morton @ 2003-07-31 17:23 ` linas 2003-08-01 6:27 ` Ingo Molnar 1 sibling, 1 reply; 33+ messages in thread From: linas @ 2003-07-31 17:23 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, olof On Thu, Jul 31, 2003 at 01:56:07AM +0200, Andrea Arcangeli wrote: > On Wed, Jul 30, 2003 at 06:43:18PM -0500, linas@austin.ibm.com wrote: > > On Thu, Jul 31, 2003 at 12:17:17AM +0200, Andrea Arcangeli wrote: > > > So the best fix would be to nuke the run_all_timers thing from 2.4 too. > > Yes. [...] > > And so either of Andrea's fixes should fix this race. > > exactly. If you want to nuke the run_all_timers from the 2.4 backport > feel free, then we could drop the additional locking from > add_timer/del_timer* and leave it only in mod_timer like 2.6 does, that > will avoid cacheline bouncing in the small window when the local timer > irq run on top of an irq handler. In the meantime the kernel is already > solid (again ;). OK, I looked at removing run_all_timers, it doesn't seem too hard. I would need to: -- add TIMER_SOFTIRQ to interrupts.h, -- add open_softirq (run_timer_softirq) to timer.c init_timer() -- move guts of run_local_timers() to run_timer_softirq() -- remove bh locks in above, not yet sure about other locks -- remove TIMER_BH everywhere. Or rather, remove it for those arches that support cpu-local timer interrupts (currently x86 & friends, soon hopefully ppc64, I attach it below, in case other arches want to play with this). Is that right? Should I do this patch or will people lose interest in it? Who should I send it to? Who wants to review it? > BTW, it's reassuring it wasn't lack of 2.6 stress testing, I apologize > for claiming that. Me too, sorry for making noise before figuring it all out. But then, I remember asking a college physics professor for a recommendation, and he didn't know me well.
"You sat in the back of the class and didn't ask any questions, unlike so-n-so, (the #1 student in the class) who sat in front and asked a lot of questions." I told him that that was because I knew the answers to all of the questions that were being asked. (I was trying to distance myself from the stupid people up front). He wasn't impressed by my reply, I hurt pretty bad. -------------------------------------------------------------------- Re the per-cpu, local timer patch, unless I'm badly mistaken, its pretty trivial, right:? Index: kernel/timer.c =================================================================== RCS file: /home/linas/cvsroot/linux24/kernel/timer.c,v retrieving revision 1.1.1.1.4.1 diff -u -p -u -r1.1.1.1.4.1 timer.c --- kernel/timer.c 15 Jul 2003 18:43:52 -0000 1.1.1.1.4.1 +++ kernel/timer.c 31 Jul 2003 15:30:26 -0000 @@ -764,7 +764,7 @@ void do_timer(struct pt_regs *regs) /* SMP process accounting uses the local APIC timer */ update_process_times(user_mode(regs)); -#if defined(CONFIG_X86) || defined(CONFIG_IA64) /* x86-64 is also included by CONFIG_X86 */ +#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(CONFIG_PPC64) /* x86-64 is also included by CONFIG_X86 */ mark_bh(TIMER_BH); #endif #endif @@ -772,7 +772,7 @@ void do_timer(struct pt_regs *regs) * Right now only x86-SMP calls run_local_timers() from a * per-CPU interrupt. 
*/ -#if !defined(CONFIG_X86) && !defined(CONFIG_IA64) /* x86-64 is also included by CONFIG_X86 */ +#if !defined(CONFIG_X86) && !defined(CONFIG_IA64) && !defined(CONFIG_PPC64) /* x86-64 is also included by CONFIG_X86 */ mark_bh(TIMER_BH); #endif update_times(); Index: arch/ppc64/kernel/smp.c =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kernel/smp.c,v retrieving revision 1.2.4.1 diff -u -p -u -r1.2.4.1 smp.c --- arch/ppc64/kernel/smp.c 15 Jul 2003 18:41:56 -0000 1.2.4.1 +++ arch/ppc64/kernel/smp.c 31 Jul 2003 15:21:35 -0000 @@ -398,6 +398,8 @@ void smp_local_timer_interrupt(struct pt update_process_times(user_mode(regs)); (get_paca()->prof_counter)=get_paca()->prof_multiplier; } + + run_local_timers(); } void smp_message_recv(int msg, struct pt_regs *regs) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-31 17:23 ` linas @ 2003-08-01 6:27 ` Ingo Molnar 0 siblings, 0 replies; 33+ messages in thread From: Ingo Molnar @ 2003-08-01 6:27 UTC (permalink / raw) To: linas; +Cc: Andrea Arcangeli, Andrew Morton, linux-kernel, olof On Thu, 31 Jul 2003 linas@austin.ibm.com wrote: > OK, I looked at removing run_all_timers, it doesn't seem too hard. > > I would need to: > -- add TIMER_SOFTIRQ to interrupts.h, > -- add open_softirq (run_timer_softirq) to timer.c init_timer() > -- move guts of run_local_timers() to run_timer_softirq() > -- remove bh locks in above, not yet sure about other locks > -- remove TIMER_BH everywhere. Or rather, remove it for those > arches that support cpu-local timer interupts (curently x86 & freinds, > soon hopefully ppc64, I attach it below, in case other arches want to > play with this). > > Is that right? no. In 2.4 there are (and/or can be) all sorts of assumptions about TIMER_BH being serialized with other bh contexts (eg. the serial bh), that's why i added the TIMER_BH logic to the 2.4 timer-scalability patch. You cannot just remove TIMER_BH. The way we did it in 2.5 was to remove _all_ bhs and thus all assumptions about serialization. This is not an option for 2.4 in any case. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 7:34 ` Andrea Arcangeli 2003-07-30 7:34 ` Ingo Molnar @ 2003-07-30 7:40 ` Ingo Molnar 2003-07-30 8:37 ` Andrea Arcangeli 1 sibling, 1 reply; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 7:40 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > [...] I'd feel much safer with the whole timer API being smp safe w/o > requiring driver developers to add complicated external brainer locking. > This will provide a much more friendly abstraction. i agree with the goal, but your patch does not achieve this. Your patch does not make double-add_timer() safe for example. As far as i can see your 2.6 patch only introduces additional overhead. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 7:40 ` Ingo Molnar @ 2003-07-30 8:37 ` Andrea Arcangeli 2003-07-30 10:34 ` Ingo Molnar 0 siblings, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 8:37 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 09:40:14AM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > [...] I'd feel much safer with the whole timer API being smp safe w/o > > requiring driver developers to add complicated external brainer locking. > > This will provide a much more friendly abstraction. > > i agree with the goal, but your patch does not achieve this. Your patch > does not make double-add_timer() safe for example. As far as i can see > your 2.6 patch only introduces additional overhead. I never did any 2.6 patch, so it may be a different thing from what you've seen, not what I applied to 2.4. In fact even the 2.4 patch isn't from me. This is the fix I applied to my 2.4 tree and it definitely fixes the problem, including add_timer against add_timer. And it fixes the bug in practice: it usually took a few hours to crash, now it can run stable for days and I never heard complaints again.
--- linux-2.4.19.SuSE-orig/kernel/timer.c 2003-07-02 13:53:55.000000000 -0700 +++ linux-2.4.19.SuSE-timer/kernel/timer.c 2003-07-02 18:32:08.000000000 -0700 @@ -144,17 +144,22 @@ void add_timer(timer_t *timer) CHECK_BASE(base); CHECK_BASE(timer->base); - spin_lock_irqsave(&base->lock, flags); - if (timer_pending(timer)) - goto bug; - internal_add_timer(base, timer); - timer->base = base; - spin_unlock_irqrestore(&base->lock, flags); - return; -bug: - spin_unlock_irqrestore(&base->lock, flags); - printk("bug: kernel timer added twice at %p.\n", - __builtin_return_address(0)); + + local_irq_save(flags); + while (unlikely(test_and_set_bit(0, &timer->lock))) + while (test_bit(0, &timer->lock)); + + spin_lock(&base->lock); + if (timer_pending(timer)) { + printk("bug: kernel timer added twice at %p.\n", + __builtin_return_address(0)); + } else { + internal_add_timer(base, timer); + timer->base = base; + } + spin_unlock(&base->lock); + clear_bit(0, &timer->lock); + local_irq_restore(flags); } static inline int detach_timer(timer_t *timer) @@ -244,16 +249,24 @@ int del_timer(timer_t * timer) CHECK_BASE(timer->base); if (!timer->base) return 0; + + local_irq_save(flags); + while (unlikely(test_and_set_bit(0, &timer->lock))) + while (test_bit(0, &timer->lock)); + repeat: base = timer->base; - spin_lock_irqsave(&base->lock, flags); + spin_lock(&base->lock); if (base != timer->base) { - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); goto repeat; } ret = detach_timer(timer); timer->list.next = timer->list.prev = NULL; - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); + + clear_bit(0, &timer->lock); + local_irq_restore(flags); return ret; } @@ -274,34 +287,48 @@ void sync_timers(void) int del_timer_sync(timer_t * timer) { + unsigned long flags; tvec_base_t * base; int ret = 0; CHECK_BASE(timer->base); if (!timer->base) return 0; + + local_irq_save(flags); + while (unlikely(test_and_set_bit(0, &timer->lock))) + while (test_bit(0, 
&timer->lock)); + for (;;) { - unsigned long flags; int running; - repeat: base = timer->base; - spin_lock_irqsave(&base->lock, flags); + spin_lock(&base->lock); if (base != timer->base) { - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); goto repeat; } ret += detach_timer(timer); timer->list.next = timer->list.prev = 0; running = timer_is_running(base, timer); - spin_unlock_irqrestore(&base->lock, flags); + spin_unlock(&base->lock); if (!running) break; + + clear_bit(0, &timer->lock); + local_irq_restore(flags); timer_synchronize(base, timer); + + local_irq_save(flags); + while (unlikely(test_and_set_bit(0, &timer->lock))) + while (test_bit(0, &timer->lock)); } + clear_bit(0, &timer->lock); + local_irq_restore(flags); + return ret; } #endif Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 8:37 ` Andrea Arcangeli @ 2003-07-30 10:34 ` Ingo Molnar 2003-07-30 10:51 ` Andrew Morton 2003-07-30 11:22 ` Andrea Arcangeli 0 siblings, 2 replies; 33+ messages in thread From: Ingo Molnar @ 2003-07-30 10:34 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, linas, linux-kernel On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > I never did any 2.6 patch, so it maybe a different thing what you've > seen, not what I applied to 2.4. Infact even the 2.4 patch isn't from > me. the 2.4 timer-scalability patches do have a problem: due to TIMER_BH the actual timer expiry can happen on a different CPU, which opens up a del_timer()/add_timer() race in the itimer code. Your patch closes that hole. But on 2.6 the timer will run precisely on the CPU it was added, so i think the race is not possible. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 10:34 ` Ingo Molnar @ 2003-07-30 10:51 ` Andrew Morton 2003-07-30 11:28 ` Andrea Arcangeli 2003-07-30 11:22 ` Andrea Arcangeli 1 sibling, 1 reply; 33+ messages in thread From: Andrew Morton @ 2003-07-30 10:51 UTC (permalink / raw) To: Ingo Molnar; +Cc: andrea, linas, linux-kernel Ingo Molnar <mingo@elte.hu> wrote: > > But on 2.6 the timer will run precisely on the CPU it was added, so i > think the race is not possible. well there is add_timer_on()... I still don't see the race in the itimer code actually. On return from del_timer_sync() we know that the timer is not pending, even if it_real_fn() tried to re-add it. ie: why does the below "crash"? Andrea Arcangeli <andrea@suse.de> wrote: > > cpu0 cpu1 > ------------ -------------------- > > do_setitimer > it_real_fn > del_timer_sync add_timer -> crash (Does the timer_pending() test in del_timer_sync() need some barriers btw?) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 10:51 ` Andrew Morton @ 2003-07-30 11:28 ` Andrea Arcangeli 0 siblings, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 11:28 UTC (permalink / raw) To: Andrew Morton; +Cc: Ingo Molnar, linas, linux-kernel On Wed, Jul 30, 2003 at 03:51:40AM -0700, Andrew Morton wrote: > Ingo Molnar <mingo@elte.hu> wrote: > > > > But on 2.6 the timer will run precisely on the CPU it was added, so i > > think the race is not possible. > > well there is add_timer_on()... > > I still don't see the race in the itimer code actually. On return > from del_timer_sync() we know that the timer is not pending, even > if it_real_fn() tried to re-add it. > > ie: why does the below "crash"? > > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > cpu0 cpu1 > > ------------ -------------------- > > > > do_setitimer > > it_real_fn > > del_timer_sync add_timer -> crash > > > (Does the timer_pending() test in del_timer_sync() needs some > barriers btw?) it might be possible to use ordered writes on one side and ordered reads on the other side to fix this instead of a spinlock. I suggested to use my spinlock-by-hand idea to fix it in 2.4 (like I previously did with mod_timer), but we might try to do something more efficient in 2.6 if you've some idea. I don't think it matters much anyways since the cacheline wouldn't be exclusive anyways if we get into the above path, and the above isn't the common path, but maybe it does. I think the unified way of locking with mod_timer/add_timer/del_timer I'm currently using is simple and clean, but if you see any significant performance advantage we can change it of course. If my last email where I analyzed the problem in more detail is not clear or you see a fault in it, please let me know of course. thanks, Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 10:34 ` Ingo Molnar 2003-07-30 10:51 ` Andrew Morton @ 2003-07-30 11:22 ` Andrea Arcangeli 1 sibling, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2003-07-30 11:22 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linas, linux-kernel On Wed, Jul 30, 2003 at 12:34:39PM +0200, Ingo Molnar wrote: > > On Wed, 30 Jul 2003, Andrea Arcangeli wrote: > > > I never did any 2.6 patch, so it maybe a different thing what you've > > seen, not what I applied to 2.4. Infact even the 2.4 patch isn't from > > me. > > the 2.4 timer-scalability patches do have a problem: due to TIMER_BH the > actual timer expiry can happen on a different CPU, which opens up a > del_timer()/add_timer() race in the itimer code. Your patch closes that > hole. > > But on 2.6 the timer will run precisely on the CPU it was added, so i > think the race is not possible. sure, the timer will run on the CPU it was added, but the setitimer will run on a random cpu depending on where the process has been rescheduled to between the last setitimer and the current setitimer. (it's not the timer moving, it's the process!) The only way I can imagine this race not triggering in 2.6 and my old 2.4-aa is by using cpu bindings, of course: if you bind the app that triggers the crash to a single cpu, there's no risk of triggering this. Also note, the 2.4 code I included is different from what you mention: I'm definitely enforcing that the timer always runs exactly on the cpu it was added on. That's why it needs the hooks in the local timer interrupts of x86 and x86-64 (and I added it a few weeks ago to ia64 too, near the per-process accounting), and it can't run the timers from the global timer irq anymore. Currently only IA64, AMD64 and x86 provide the feature; other archs still fall back to the global timer irq and they behave as you described, and for them the scalability is lower of course.
But on the archs where this bug triggered it was behaving 100% like 2.6 since it was x86. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 5:57 ` Ingo Molnar 2003-07-30 6:36 ` Andrew Morton @ 2003-07-30 20:05 ` linas 2003-07-31 6:50 ` Ingo Molnar 2003-07-31 22:56 ` linas 2 siblings, 1 reply; 33+ messages in thread From: linas @ 2003-07-30 20:05 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel On Wed, Jul 30, 2003 at 07:57:32AM +0200, Ingo Molnar wrote: > > and i dont think Linas' patch is correct either - how can the timer base > change under us? We are holding the timer spinlock. I deduced the need for the patch by looking at the locking code immediately above it, which takes pains to get old_base and new_base correctly locked. My patch was to handle the case of old_base being NULL, which seemed to be unhandled. Comments & disclaimers: -- I don't quite have the 'big picture' for the timer code yet, so please excuse any thinko's below. -- I don't know under what cases timer->base can be NULL, but clearly there are checks for it being NULL, ergo... -- I then see a race where timer->base is NULL, and so old_base is NULL, then some other CPU sets timer->base, then we get the lock on new_base, and blandly assume timer->base is NULL, which it no longer is. A bit more graphically ... timer->base = NULL; /* starting condition */ cpu 1 cpu 2 -------- --------- mod_timer() { old_base = timer->base; if (old_base && ) { /* not taken */ } else . spin_lock(&cpu2_base->lock); . timer->base = cpu2_base; . spin_unlock(&cpu2_base->lock); . spin_lock(&new_base->lock); /* Uhh oh, timer->base is not null, in fact its cpu2 base, and we aren't checking for this before we clobber it. */ I don't see what prevents the above from happening. And if the above really can't happen, then I also don't understand why the fancy old_base, new-base locking code is required at all. > What i'd propose is the attached (tested, against 2.6.0-test2) patch Due to technical difficulties at this end, I'm having trouble testing any patches. 
The guy who knows how to run the test case to reproduce this is out sick, it's a big hairy testcase with all sorts of stuff going on, and I can't make it go. It can take 8+ hours to hit it. I'll need a few days ... The other machine that hits it is a 6-way smp powerpc running nothing but 6 copies of setiathome, but it takes 48+ hours to reproduce there, so testing is even slower there ... --linas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 20:05 ` linas @ 2003-07-31 6:50 ` Ingo Molnar 0 siblings, 0 replies; 33+ messages in thread From: Ingo Molnar @ 2003-07-31 6:50 UTC (permalink / raw) To: linas; +Cc: Andrew Morton, linux-kernel On Wed, 30 Jul 2003 linas@austin.ibm.com wrote: > cpu 1 cpu 2 > -------- --------- > mod_timer() { > > old_base = timer->base; > if (old_base && ) { /* not taken */ > } > else > . spin_lock(&cpu2_base->lock); this race is not possible on 2.6. You are forgetting: spin_lock_irqsave(&timer->lock, flags); which serializes the full mod_timer() operation. Ok? Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-30 5:57 ` Ingo Molnar 2003-07-30 6:36 ` Andrew Morton 2003-07-30 20:05 ` linas @ 2003-07-31 22:56 ` linas 2003-08-01 6:23 ` Ingo Molnar 2 siblings, 1 reply; 33+ messages in thread From: linas @ 2003-07-31 22:56 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel Hi Mingo, On Wed, Jul 30, 2003 at 07:57:32AM +0200, Ingo Molnar wrote: > > On Tue, 29 Jul 2003, Andrew Morton wrote: > > Andrea says that we need to take the per-timer lock in add_timer() and > > del_timer(), but I haven't yet got around to working out why. > > this makes no sense - in 2.6 (and in 2.4) there's no safe add_timer() / > del_timer() use without using external SMP synchronization. (There's one > special timer use variant involving del_timer_sync() that was safe in 2.4 > but is unsafe in 2.6, see below.) I don't understand why you don't like this, since your patch seems to achieve the same results as Andrea's patch (he uses timer->lock to serialize add_timer() and del_timer(), whereas your patch modifies add_timer so that it grabs locks on old_base as well as new_base; either approach should fix the add_timer/del_timer race.) > What i'd propose is the attached (tested, against 2.6.0-test2) patch > instead. It unifies the functionality of add_timer() and mod_timer(), and > makes any combination of the timer API calls completely SMP-safe. > del_timer() is still not using the timer lock. > > this patch fixes the only timer bug in 2.6 i'm aware of: the > del_timer_sync() + add_timer() combination in kernel/itimer.c is buggy. > This was correct code in 2.4, because there it was safe to do an > add_timer() from the timer handler itself, parallel to a del_timer_sync(). > If we want to make this safe in 2.6 too (which i think we want to) then we > have to make add_timer() almost equivalent to mod_timer(), locking-wise. > And once we are at this point i think it's much cleaner to actually make > add_timer() a variant of mod_timer().
(There's no locking cost for > add_timer(), only the cost of an extra branch. And we've removed another > commonly used function from the icache.) Well, I'm confused by this a bit too, now. I saw this bug in 2.4 and I thought that Andrea was implying that it couldn't happen in 2.6. He seemed to be saying that the del_timer_sync() + add_timer() race can happen only in 2.4, where add_timer() is running on the 'wrong' cpu because it got there through the evil run_all_timers()/TIMER_BH. Since there's no run_all_timers() in 2.6, he seemed to imply that the race 'couldn't happen'. Is he right? So, to add to my 'stupid question' quota of the day: the only way that a timer could run on a CPU other than what it was added on would be for a softirq to somehow get moved to a different cpu, and that is impossible, right? > Linas, could you please give this patch a go, does it make a difference to > your timer list corruption problem? I've booted it on SMP and UP as well. Still trying ... but after reading it, it looks like it will fix my 2.4 bug. --linas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PATCH: Race in 2.6.0-test2 timer code 2003-07-31 22:56 ` linas @ 2003-08-01 6:23 ` Ingo Molnar 0 siblings, 0 replies; 33+ messages in thread From: Ingo Molnar @ 2003-08-01 6:23 UTC (permalink / raw) To: linas; +Cc: Andrew Morton, linux-kernel On Thu, 31 Jul 2003 linas@austin.ibm.com wrote: > Well, I'm confused by this a bit too, now. I saw this bug in 2.4 and I > thought that Andrea was implying that it couldn't happen in 2.6. He > seemed to be saying that the del_timer_sync() + add_timer() race can > happen only in 2.4, where add_timer() is running on the 'wrong' cpu > bacuase it got there through the evil run_all_timers()/TIMER_BH. Since > there's no run_all_timers() in 2.6, he seemed to imply that the race > 'couldn't happen'. Is he right? it is correct, but it was me convincing Andrea about this, not the other way around :-) Pls. re-read the email thread. My point still stands: >> (in any case, i still think it would be safer to 'upgrade' the >> add_timer() interface to be SMP-safe and to allow double-adds - but not >> for any bug reason anymore.) this should also make backporting easier. Ingo ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2003-08-01 6:28 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-07-29 15:41 PATCH: Race in 2.6.0-test2 timer code linas 2003-07-29 20:56 ` Andrew Morton 2003-07-30 5:57 ` Ingo Molnar 2003-07-30 6:36 ` Andrew Morton 2003-07-30 7:07 ` Ingo Molnar 2003-07-30 7:34 ` Andrea Arcangeli 2003-07-30 7:34 ` Ingo Molnar 2003-07-30 8:28 ` Andrea Arcangeli 2003-07-30 10:31 ` Ingo Molnar 2003-07-30 11:16 ` Andrea Arcangeli 2003-07-30 11:49 ` Ingo Molnar 2003-07-30 12:34 ` Andrea Arcangeli 2003-07-30 21:18 ` linas 2003-07-30 22:06 ` Andrea Arcangeli 2003-07-30 22:17 ` Andrea Arcangeli 2003-07-31 7:04 ` Ingo Molnar 2003-07-30 21:19 ` Andrea Arcangeli 2003-07-30 23:43 ` linas 2003-07-30 23:56 ` Andrea Arcangeli 2003-07-30 23:54 ` Andrew Morton 2003-07-31 0:16 ` Andrea Arcangeli 2003-07-31 17:23 ` linas 2003-08-01 6:27 ` Ingo Molnar 2003-07-30 7:40 ` Ingo Molnar 2003-07-30 8:37 ` Andrea Arcangeli 2003-07-30 10:34 ` Ingo Molnar 2003-07-30 10:51 ` Andrew Morton 2003-07-30 11:28 ` Andrea Arcangeli 2003-07-30 11:22 ` Andrea Arcangeli 2003-07-30 20:05 ` linas 2003-07-31 6:50 ` Ingo Molnar 2003-07-31 22:56 ` linas 2003-08-01 6:23 ` Ingo Molnar