* [PATCH] BUG(): sched.c: Line 944 @ 2002-09-16 18:48 Robert Love 2002-09-16 19:01 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-16 18:48 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel Linus, The current in_atomic() check fails with kernel preemption enabled since we set preempt_count to PREEMPT_ACTIVE in preempt_schedule(). We need to additionally check whether PREEMPT_ACTIVE is set. There is also still the issue that bugging out is a bit drastic and a hindrance to debugging; but I will tackle that later. For now, please apply this so we can at least boot with preemption enabled. Patch is against 2.5.35-bk. Robert Love diff -urN linux-2.5.35/kernel/sched.c linux/kernel/sched.c --- linux-2.5.35/kernel/sched.c Sun Sep 15 22:18:24 2002 +++ linux/kernel/sched.c Mon Sep 16 14:43:54 2002 @@ -940,8 +940,7 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); + BUG_ON(in_atomic() && preempt_count() != PREEMPT_ACTIVE); #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); @@ -959,7 +958,7 @@ * if entering off of a kernel preemption go straight * to picking the next task. */ - if (unlikely(preempt_count() & PREEMPT_ACTIVE)) + if (unlikely(preempt_count() == PREEMPT_ACTIVE)) goto pick_next_task; switch (prev->state) { ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 18:48 [PATCH] BUG(): sched.c: Line 944 Robert Love @ 2002-09-16 19:01 ` Linus Torvalds 2002-09-16 21:14 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-16 19:01 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 16 Sep 2002, Robert Love wrote: > > The current in_atomic() check fails with kernel preemption enabled since > we set preempt_count to PREEMPT_ACTIVE in preempt_schedule(). > > We need to additionally check whether PREEMPT_ACTIVE is set. Would it not be a lot better to just mask off PREEMPT_ACTIVE() instead of checking for it explicitly. The in_interrupt() etc stuff already effectively do this by masking off the HARDIRQ_MASK etc. I would prefer a patch to hardirq.h that just adds a #define to make preempt_count() not contain PREEMPT_ACTIVE - and make the PREEMPT_ACTIVE checks be a totally separate check (logic: it's not a count, so it shouldn't show up in preempt_count()) > There is also still the issue that bugging out is a bit drastic and a > hindrance to debugging; but I will tackle that later. For now, please > apply this so we can at least boot with preemption enabled. I certainly wouldn't mind the DEBUG/WARNING/FATAL infrastructure discussed earlier.. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 19:01 ` Linus Torvalds @ 2002-09-16 21:14 ` Robert Love 2002-09-16 21:41 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-16 21:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Mon, 2002-09-16 at 15:01, Linus Torvalds wrote: > Would it not be a lot better to just mask off PREEMPT_ACTIVE() instead of > checking for it explicitly. > > The in_interrupt() etc stuff already effectively do this by masking off > the HARDIRQ_MASK etc. I would prefer a patch to hardirq.h that just adds a > #define to make preempt_count() not contain PREEMPT_ACTIVE - and make the > PREEMPT_ACTIVE checks be a totally separate check (logic: it's not a > count, so it shouldn't show up in preempt_count()) I liked this idea, and was working on implementing it when I ran into a few roadblocks. Your ideas are welcome. First, "preempt_count()" is used as an l-value in a lot of places, i.e. look at all the "preempt_count() += foo" in the IRQ code. We cannot mask things out of it. Thus, I then looked into doing a separate function for the raw value, say an "atomic_count()" ... the code just looked ugly mixing "atomic_count()" and "preempt_count()" for no apparent reason. Third, PREEMPT_ACTIVE actually _is_ part of the count. It helps assure us a task is not preempted repeatedly. If we did not have it, we would have to bump preempt_count on preemption. So we still need it in the preempt_count(). Simplest solution is to: #define in_atomic() \ (preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked()) although I still dislike the masking just to make the schedule() code-path cleaner. Oh, and there is another problem: printk() from schedule() implicitly calls wake_up(). My machine dies even with just a printk() and not a BUG()... I suspect there may be some SMP issue in that whole mess too, because setting oops_in_progress prior did not help. Comments? Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 21:14 ` Robert Love @ 2002-09-16 21:41 ` Linus Torvalds 2002-09-16 22:15 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-16 21:41 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 16 Sep 2002, Robert Love wrote: > > I liked this idea, and was working on implementing it when I ran into a > few roadblocks. Your ideas are welcome. > > First, "preempt_count()" is used as an l-value in a lot of places, i.e. > look at all the "preempt_count() += foo" in the IRQ code. We cannot > mask things out of it. Ok. Let's just make the masking explicit in in_atomic() then, like you suggest: > Simplest solution is to: > > #define in_atomic() \ > (preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked()) > > although I still dislike the masking just to make the schedule() > code-path cleaner. I don't think this is a scheduler cleanliness issue: it's a consistency issue. If "in_interrupt()" and friends do not care about PREEMPT_ACTIVE, then neither should "in_atomic()". The fact that the scheduler test gets cleaned up is secondary - although it is obviously a result of being consistent. > Oh, and there is another problem: printk() from schedule() implicitly > calls wake_up(). My machine dies even with just a printk() and not a > BUG()... I suspect there may be some SMP issue in that whole mess too, > because setting oops_in_progress prior did not help. Hmm.. It will call wake_up() because it will try to wake up any klogd. What's the problem? Calling wake_up() should be fine from there.. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 21:41 ` Linus Torvalds @ 2002-09-16 22:15 ` Robert Love 2002-09-16 22:26 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-16 22:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Mon, 2002-09-16 at 17:41, Linus Torvalds wrote: > Ok. Let's just make the masking explicit in in_atomic() then, like you > suggest: OK. But, ugh, more fun: we preempt_disable() in do_exit(). Every exiting task hits the test. My syslog is huge. At least for now, can we please revert the check to in_interrupt() ? Robert Love diff -urN linux-2.5.35/kernel/sched.c linux/kernel/sched.c --- linux-2.5.35/kernel/sched.c Sun Sep 15 22:18:24 2002 +++ linux/kernel/sched.c Mon Sep 16 18:17:06 2002 @@ -940,7 +940,7 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) + if (unlikely(in_interrupt())) BUG(); #if CONFIG_DEBUG_HIGHMEM ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 22:15 ` Robert Love @ 2002-09-16 22:26 ` Linus Torvalds 2002-09-16 23:15 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-16 22:26 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 16 Sep 2002, Robert Love wrote: > > But, ugh, more fun: we preempt_disable() in do_exit(). Every exiting > task hits the test. My syslog is huge. > > At least for now, can we please revert the check to in_interrupt() ? I really think the test is correct, and if we revert it now, we certainly won't be able to re-introduce it later when we're closer to 2.6. So if the in_atomic() change is enough to fix everything but do_exit(), then how about just making do_exit() use PREEMPT_ACTIVE instead? Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 22:26 ` Linus Torvalds @ 2002-09-16 23:15 ` Robert Love 2002-09-16 23:45 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-16 23:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Mon, 2002-09-16 at 18:26, Linus Torvalds wrote: > On 16 Sep 2002, Robert Love wrote: > > > > At least for now, can we please revert the check to in_interrupt() ? > > I really think the test is correct, and if we revert it now, we certainly > won't be able to re-introduce it later when we're closer to 2.6. > > So if the in_atomic() change is enough to fix everything but do_exit(), > then how about just making do_exit() use PREEMPT_ACTIVE instead? Nope. If PREEMPT_ACTIVE is set, schedule() assumes the task is being preempted and skips certain logic e.g. deactivate_task() (this is the same code that lets us safely preempt a TASK_ZOMBIE). Result is death before init even executes. Ugh... Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 23:15 ` Robert Love @ 2002-09-16 23:45 ` Linus Torvalds 2002-09-16 23:58 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-16 23:45 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 16 Sep 2002, Robert Love wrote: > > Nope. If PREEMPT_ACTIVE is set, schedule() assumes the task is being > preempted and skips certain logic e.g. deactivate_task() (this is the > same code that lets us safely preempt a TASK_ZOMBIE). Ahhah! I know. You just make lock_depth 0 when you exit, without actually taking the kernel lock. Which fools the logic into accepting a preempt-disable, since it thinks that the preempt disable is due to holding the kernel lock. Add a big comment and you're all done. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 23:45 ` Linus Torvalds @ 2002-09-16 23:58 ` Robert Love 2002-09-17 5:56 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-16 23:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Mon, 2002-09-16 at 19:45, Linus Torvalds wrote: > Ahhah! I know. You just make lock_depth 0 when you exit, without actually > taking the kernel lock. Which fools the logic into accepting a > preempt-disable, since it thinks that the preempt disable is due to > holding the kernel lock. I was this -> <- close to celebrating. Not so fast, smarty. What about release_kernel_lock() ? It sees task->lock_depth>=0 and calls spin_unlock() on a lock that it does not hold. Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-16 23:58 ` Robert Love @ 2002-09-17 5:56 ` Linus Torvalds 2002-09-17 8:12 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-17 5:56 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 16 Sep 2002, Robert Love wrote: > > I was this -> <- close to celebrating. Not so fast, smarty. You forget - I'm not only a smarty, I'm sick and twisted too. > What about release_kernel_lock() ? > > It sees task->lock_depth>=0 and calls spin_unlock() on a lock that it > does not hold. We have a simple rule: - task->lock_depth = -1 means "no lock held" - task->lock_depth >= 0 means "BKL really held" ... but what does "task->lock_depth < -1" mean? Yup: "validly nonpreemptable". So then you have: #define kernel_locked() (current->lock_depth >= 0) #define in_atomic() (preempt_count() & ~PREEMPT_ACTIVE) != (current->lock_depth != -1)) and you're all set - just set lock_depth to -2 when you exit to show that you don't hold the BKL, but that you are validly not preemtable. I get the award for having the most disgusting ideas. Linus "but it works and is efficient" Torvalds ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 5:56 ` Linus Torvalds @ 2002-09-17 8:12 ` Robert Love 2002-09-17 8:51 ` Robert Love 2002-09-17 8:59 ` Robert Love 0 siblings, 2 replies; 34+ messages in thread From: Robert Love @ 2002-09-17 8:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Tue, 2002-09-17 at 01:56, Linus Torvalds wrote: > We have a simple rule: > - task->lock_depth = -1 means "no lock held" > - task->lock_depth >= 0 means "BKL really held" > > ... but what does "task->lock_depth < -1" mean? > > Yup: "validly nonpreemptable". > > So then you have: > > #define kernel_locked() (current->lock_depth >= 0) > > #define in_atomic() (preempt_count() & ~PREEMPT_ACTIVE) != (current->lock_depth != -1)) > > and you're all set - just set lock_depth to -2 when you exit to show that > you don't hold the BKL, but that you are validly not preemtable. > > I get the award for having the most disgusting ideas. Yah, you do. Good job though. I implemented exactly what you detailed, with one change: we need to check kernel_locked() before setting lock_depth because it is valid to exit() while holding the BKL. Aside from kernel code that does it intentionally, crashed code (e.g. modules) would have the same problem. Anyhow, this works. Finally. Except the printk() still causes my system to lockup, but that is another (tomorrow's) issue. Patch is against current BK. Thanks for the help. Robert Love diff -urN linux-2.5.35/include/asm-i386/hardirq.h linux/include/asm-i386/hardirq.h --- linux-2.5.35/include/asm-i386/hardirq.h Sun Sep 15 22:18:46 2002 +++ linux/include/asm-i386/hardirq.h Tue Sep 17 03:20:00 2002 @@ -77,7 +77,8 @@ #define irq_enter() (preempt_count() += HARDIRQ_OFFSET) #if CONFIG_PREEMPT -# define in_atomic() (preempt_count() != kernel_locked()) +# define in_atomic() \ + ((preempt_count() & ~PREEMPT_ACTIVE) != (current->lock_depth != -1)) # define IRQ_EXIT_OFFSET (HARDIRQ_OFFSET-1) #else # define in_atomic() (preempt_count() != 0) diff -urN linux-2.5.35/kernel/exit.c linux/kernel/exit.c --- linux-2.5.35/kernel/exit.c Tue Sep 17 03:18:53 2002 +++ linux/kernel/exit.c Tue Sep 17 03:55:12 2002 @@ -637,6 +637,21 @@ preempt_disable(); if (current->exit_signal == -1) release_task(current); + + /* + * This little bit of genius comes from the twisted mind of Linus. + * We need exit() to be atomic but we also want a debugging check + * in schedule() to whine if we are atomic. The wickedness is in + * these rules: + * - task->lock_depth = -2 means "validly nonpreemptable" + * - task->lock_depth = -1 means "BKL not held" + * - task->lock_depth >= 0 means "BKL held" + * release_kernel_lock and kernel_locked() check >=0, and + * in_atomic() checks != -1... the "fake BKL" will "cancel out" + * the preempt_disable() above and everyone is happy. + */ + if (unlikely(!kernel_locked())) + tsk->lock_depth = -2; schedule(); BUG(); /* diff -urN linux-2.5.35/kernel/sched.c linux/kernel/sched.c --- linux-2.5.35/kernel/sched.c Sun Sep 15 22:18:24 2002 +++ linux/kernel/sched.c Tue Sep 17 03:50:04 2002 @@ -940,8 +940,10 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); + if (unlikely(in_atomic())) { + printk(KERN_ERR "error: scheduling while non-atomic!\n"); + dump_stack(); + } #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); @@ -959,7 +961,7 @@ * if entering off of a kernel preemption go straight * to picking the next task. */ - if (unlikely(preempt_count() & PREEMPT_ACTIVE)) + if (unlikely(preempt_count() == PREEMPT_ACTIVE)) goto pick_next_task; switch (prev->state) { ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 8:12 ` Robert Love @ 2002-09-17 8:51 ` Robert Love 2002-09-17 8:59 ` Robert Love 1 sibling, 0 replies; 34+ messages in thread From: Robert Love @ 2002-09-17 8:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Tue, 2002-09-17 at 04:12, Robert Love wrote: > + if (unlikely(!kernel_locked())) > + tsk->lock_depth = -2; Eh, this should be likely()... Robert Love diff -urN linux-2.5.35/include/asm-i386/hardirq.h linux/include/asm-i386/hardirq.h --- linux-2.5.35/include/asm-i386/hardirq.h Sun Sep 15 22:18:46 2002 +++ linux/include/asm-i386/hardirq.h Tue Sep 17 03:20:00 2002 @@ -77,7 +77,8 @@ #define irq_enter() (preempt_count() += HARDIRQ_OFFSET) #if CONFIG_PREEMPT -# define in_atomic() (preempt_count() != kernel_locked()) +# define in_atomic() \ + ((preempt_count() & ~PREEMPT_ACTIVE) != (current->lock_depth != -1)) # define IRQ_EXIT_OFFSET (HARDIRQ_OFFSET-1) #else # define in_atomic() (preempt_count() != 0) diff -urN linux-2.5.35/kernel/exit.c linux/kernel/exit.c --- linux-2.5.35/kernel/exit.c Tue Sep 17 03:18:53 2002 +++ linux/kernel/exit.c Tue Sep 17 03:55:12 2002 @@ -637,6 +637,21 @@ preempt_disable(); if (current->exit_signal == -1) release_task(current); + + /* + * This little bit of genius comes from the twisted mind of Linus. + * We need exit() to be atomic but we also want a debugging check + * in schedule() to whine if we are atomic. The wickedness is in + * these rules: + * - task->lock_depth = -2 means "validly nonpreemptable" + * - task->lock_depth = -1 means "BKL not held" + * - task->lock_depth >= 0 means "BKL held" + * release_kernel_lock and kernel_locked() check >=0, and + * in_atomic() checks != -1... the "fake BKL" will "cancel out" + * the preempt_disable() above and the world is happy. + */ + if (likely(!kernel_locked())) + tsk->lock_depth = -2; schedule(); BUG(); /* diff -urN linux-2.5.35/kernel/sched.c linux/kernel/sched.c --- linux-2.5.35/kernel/sched.c Sun Sep 15 22:18:24 2002 +++ linux/kernel/sched.c Tue Sep 17 03:50:04 2002 @@ -940,8 +940,10 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); + if (unlikely(in_atomic())) { + printk(KERN_ERR "error: scheduling while non-atomic!\n"); + dump_stack(); + } #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); @@ -959,7 +961,7 @@ * if entering off of a kernel preemption go straight * to picking the next task. */ - if (unlikely(preempt_count() & PREEMPT_ACTIVE)) + if (unlikely(preempt_count() == PREEMPT_ACTIVE)) goto pick_next_task; switch (prev->state) { ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 8:12 ` Robert Love 2002-09-17 8:51 ` Robert Love @ 2002-09-17 8:59 ` Robert Love 2002-09-17 9:57 ` Ingo Molnar ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Robert Love @ 2002-09-17 8:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Tue, 2002-09-17 at 04:12, Robert Love wrote: > I implemented exactly what you detailed, with one change: we need to > check kernel_locked() before setting lock_depth because it is valid to > exit() while holding the BKL. Aside from kernel code that does it > intentionally, crashed code (e.g. modules) would have the same problem. fsck, this is non-ending... obviously, this is insufficient: preempt_count will equal two if the BKL was previously held and in_atomic() will trip. This should work: if (likely(!kernel_locked()) tsk->lock_depth = -2; else { /* compensate for BKL; we still cannot preempt */ preempt_enable_no_resched(); } look sane? Now, remind me why this is all worth it... Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 8:59 ` Robert Love @ 2002-09-17 9:57 ` Ingo Molnar 2002-09-17 18:27 ` Robert Love 2002-09-17 14:10 ` Steven Cole 2002-09-17 15:27 ` Linus Torvalds 2 siblings, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 9:57 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, linux-kernel On 17 Sep 2002, Robert Love wrote: [...] > Now, remind me why this is all worth it... having preemption support that 1) is correct 2) works? We *must* use the schedule() check to debug preemption bugs, or we wont have usable preemption in 2.6, i dont really understand why your are not happy that we have such a great tool. In fact we should also add other debugging bits, like 'check for !0 preemption count in smp_processor_id()' , and the underflow checks that caught the IDE bug. These are all bits that help the elimination of preemption bugs which are also often SMP bugs, on plain UP boxes. Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 9:57 ` Ingo Molnar @ 2002-09-17 18:27 ` Robert Love 2002-09-17 18:46 ` Ingo Molnar 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-17 18:27 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Tue, 2002-09-17 at 05:57, Ingo Molnar wrote: > We *must* use the schedule() check to debug preemption bugs, or we wont > have usable preemption in 2.6, i dont really understand why your are not > happy that we have such a great tool. In fact we should also add other > debugging bits, like 'check for !0 preemption count in smp_processor_id()' > , and the underflow checks that caught the IDE bug. These are all bits > that help the elimination of preemption bugs which are also often SMP > bugs, on plain UP boxes. Hm, sorry if I sound like I do not want something "so great". I do. I just do not ever want to compromise the existing code... I would much prefer to say "wow we cannot do this cleanly now, let's wait until we figure out a clean way". Anyhow, one of us is confused. How can this in_atomic() test _ever_ catch a preemption bug? We cannot enter the scheduler off kernel preemption unless preempt_count==0. This is a test to catch bugs in other parts of the kernel, e.g. where code explicitly calls schedule() while holding a lock. Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 18:27 ` Robert Love @ 2002-09-17 18:46 ` Ingo Molnar 0 siblings, 0 replies; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 18:46 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, linux-kernel On 17 Sep 2002, Robert Love wrote: > [...] How can this in_atomic() test _ever_ catch a preemption bug? We > cannot enter the scheduler off kernel preemption unless > preempt_count==0. This is a test to catch bugs in other parts of the > kernel, e.g. where code explicitly calls schedule() while holding a > lock. you are right, i was confusing this with the older check for disabled interrupt in preempt_schedule() [which i'd still find useful]. The smp_processor_id() test catches true preemption bugs. So does preempt_count() underflow detection. i do agree with Alan - there can be nothing bad in trying to fix all that non-preempt-aware code right now, before it becomes too late. Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 8:59 ` Robert Love 2002-09-17 9:57 ` Ingo Molnar @ 2002-09-17 14:10 ` Steven Cole 2002-09-17 18:29 ` Robert Love 2002-09-17 15:27 ` Linus Torvalds 2 siblings, 1 reply; 34+ messages in thread From: Steven Cole @ 2002-09-17 14:10 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 02:59, Robert Love wrote: > On Tue, 2002-09-17 at 04:12, Robert Love wrote: > > > I implemented exactly what you detailed, with one change: we need to > > check kernel_locked() before setting lock_depth because it is valid to > > exit() while holding the BKL. Aside from kernel code that does it > > intentionally, crashed code (e.g. modules) would have the same problem. > > fsck, this is non-ending... obviously, this is insufficient: > preempt_count will equal two if the BKL was previously held and > in_atomic() will trip. > > This should work: > > if (likely(!kernel_locked()) > tsk->lock_depth = -2; > else { > /* compensate for BKL; we still cannot preempt */ > preempt_enable_no_resched(); > } > Robert, I applied the modified patch you sent at 04:51 plus the above and had to change dump_stack() to show_stack(NULL) in sched.c due to kernel/kernel.o: In function `schedule': kernel/kernel.o(.text+0xf13): undefined reference to `dump_stack' I used plain vanilla 2.5.35 to patch against. I booted that so-patched kernel and it got much further than before, up to where syslogd was able to write some stuff to /var/log/messages. There were many instances of "error: scheduling while non-atomic!", and the traces were similar. Here is a decoded one: [steven@spc5 linux-2.5.35]$ ksymoops -K -L -O -v vmlinux -m System.map <non-atomic.txt ksymoops 2.4.4 on i686 2.5.35. Options used -v vmlinux (specified) -K (specified) -L (specified) -O (specified) -m System.map (specified) f7739f50 c0282f00 f7771960 c0337f20 c011c51b c1b0eec0 c1b71ba0 f7738000 f7738000 f7738000 f78d4da0 c011d08b f78eb380 c012d67f f78d01c0 f78eb380 f78eb560 f78d01c0 f78d01dc 40015000 bffffc48 c012d6c5 4213030c 00000004 Call Trace: [<c011c51b>] [<c011d08b>] [<c012d67f>] [<c012d6c5>] [<c010918f>] Warning (Oops_read): Code line not seen, dumping what data is available Trace; c011c51b <put_files_struct+bb/d0> Trace; c011d08b <do_exit+35b/370> Trace; c012d67f <do_munmap+11f/130> Trace; c012d6c5 <sys_munmap+35/60> Trace; c010918f <syscall_call+7/b> Hope this helps, Steven ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 14:10 ` Steven Cole @ 2002-09-17 18:29 ` Robert Love 2002-09-17 18:42 ` Steven Cole 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-17 18:29 UTC (permalink / raw) To: Steven Cole; +Cc: Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 10:10, Steven Cole wrote: > I booted that so-patched kernel and it got much further than before, > up to where syslogd was able to write some stuff to /var/log/messages. That is what is happening to me... if you trace when you lockup, its due to the printk. My machines in these cases are only livelocked, I can still ping etc. > Trace; c011c51b <put_files_struct+bb/d0> > Trace; c011d08b <do_exit+35b/370> > Trace; c012d67f <do_munmap+11f/130> > Trace; c012d6c5 <sys_munmap+35/60> > Trace; c010918f <syscall_call+7/b> Ugh this looks like the exit path which should not be triggered. Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 18:29 ` Robert Love @ 2002-09-17 18:42 ` Steven Cole 0 siblings, 0 replies; 34+ messages in thread From: Steven Cole @ 2002-09-17 18:42 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 12:29, Robert Love wrote: > On Tue, 2002-09-17 at 10:10, Steven Cole wrote: > > > I booted that so-patched kernel and it got much further than before, > > up to where syslogd was able to write some stuff to /var/log/messages. > > That is what is happening to me... if you trace when you lockup, its due > to the printk. My machines in these cases are only livelocked, I can > still ping etc. > > > Trace; c011c51b <put_files_struct+bb/d0> > > Trace; c011d08b <do_exit+35b/370> > > Trace; c012d67f <do_munmap+11f/130> > > Trace; c012d6c5 <sys_munmap+35/60> > > Trace; c010918f <syscall_call+7/b> > > Ugh this looks like the exit path which should not be triggered. > I grabbed the 2.5.35-bk3 patch provided by Jeff Garzik so that I could test something even more current and applied your patches to that (one hunk was already there in -bk3). I ran several of the stack dumps through ksymoops. Perhaps some of these might point to something interesting. Steven ksymoops 2.4.4 on i686 2.5.35. Options used -v vmlinux (specified) -K (specified) -L (specified) -O (specified) -m System.map (specified) f773df4c c01167a6 c02834a0 f77801c0 c0337f20 c011c4eb c1b0eec0 f7936c60 f773c000 f773c000 f773c000 f77021a0 c011d05b 00000000 0000a9db f77021a0 c0122340 f7744760 f78cd2e0 f7744760 f773c000 bffffd08 c0122522 4213030c Call Trace: [<c01167a6>] [<c011c4eb>] [<c011d05b>] [<c0122340>] [<c0122522>] [<c010918f>] f773df4c c01167a6 c02834a0 f77801c0 c0337f20 c011c4eb c1b0eec0 f7936ee0 f773c000 f773c000 f773c000 f77021a0 c011d05b c022305b 00000003 bfffea00 0000001c 00000000 0804b3c8 00000014 00000003 bfffea00 0000001c 4213030c Call Trace: [<c01167a6>] [<c011c4eb>] [<c011d05b>] [<c022305b>] [<c010918f>] f7739f4c c01167a6 c02834a0 f77801c0 c0337f20 c011c4eb c1b0eec0 f7936820 f7738000 f7738000 f7738000 f772a800 c011d05b 00000000 f7738000 bffff7a4 f7739fb0 0807e210 04000000 42029098 00000000 00000000 00000000 4213030c Call Trace: [<c01167a6>] [<c011c4eb>] [<c011d05b>] [<c010918f>] f7789f4c c01167a6 c02834a0 f7c5e760 c0337f20 c011c4eb c1b0eec0 c1b71da0 f7788000 f7788000 f7788000 f7ea8d60 c011d05b 0000006e 00000000 00000005 c0222fb0 00000005 f77fa1a0 fffffff7 00000005 bffffb28 c0142fe3 4213030c Call Trace: [<c01167a6>] [<c011c4eb>] [<c011d05b>] [<c0222fb0>] [<c0142fe3>] [<c010918f>] Warning (Oops_read): Code line not seen, dumping what data is available Trace; c01167a6 <schedule+36/3d0> Trace; c011c4eb <put_files_struct+bb/d0> Trace; c011d05b <do_exit+35b/370> Trace; c0122340 <process_timeout+0/10> Trace; c0122522 <sys_nanosleep+122/1a0> Trace; c010918f <syscall_call+7/b> Trace; c01167a6 <schedule+36/3d0> Trace; c011c4eb <put_files_struct+bb/d0> Trace; c011d05b <do_exit+35b/370> Trace; c022305b <sys_socketcall+13b/200> Trace; c010918f <syscall_call+7/b> Trace; c01167a6 <schedule+36/3d0> Trace; c011c4eb <put_files_struct+bb/d0> Trace; c011d05b <do_exit+35b/370> Trace; c010918f <syscall_call+7/b> Trace; c01167a6 <schedule+36/3d0> Trace; c011c4eb <put_files_struct+bb/d0> Trace; c011d05b <do_exit+35b/370> Trace; c0222fb0 <sys_socketcall+90/200> Trace; c0142fe3 <sys_write+33/40> Trace; c010918f <syscall_call+7/b> 1 warning issued. Results may not be reliable. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 8:59 ` Robert Love 2002-09-17 9:57 ` Ingo Molnar 2002-09-17 14:10 ` Steven Cole @ 2002-09-17 15:27 ` Linus Torvalds 2002-09-17 15:40 ` Linus Torvalds 2 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-17 15:27 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel On 17 Sep 2002, Robert Love wrote: > > Now, remind me why this is all worth it... I really think we need this to have a stable system. Alan still thinks preempt is unstable, and this helps counter some of that - by adding sanity checks that all the counters are doing the right thing. Also, it's one more reason to support preemption in the first place. Having lower latency is all fine, but not everybody cares and clearly preemption adds its own overhead. Having the preemption infrastructure add its own set of help (ie helping find not only preempt-related bugs, but spinlock bugs too) makes preempt all the more useful. In particular, with preempt we can add atomicity debugging to "put_user()" and friends, to verify that nobody tries to do user-mode accesses when they aren't allowed to - another use for "in_atomic()". But that's requires a functioning in_atomic() that isn't too limited to be generally used.. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 15:27 ` Linus Torvalds @ 2002-09-17 15:40 ` Linus Torvalds 2002-09-17 15:53 ` Ingo Molnar 2002-09-17 15:54 ` Ingo Molnar 0 siblings, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2002-09-17 15:40 UTC (permalink / raw) To: Robert Love; +Cc: linux-kernel, Ingo Molnar On Tue, 17 Sep 2002, Linus Torvalds wrote: > > I really think we need this to have a stable system. Alan still thinks > preempt is unstable, and this helps counter some of that - by adding > sanity checks that all the counters are doing the right thing. That said, the exit() crap does end up being a bit unreadable. How about just splitting up schedule() into the "normal schedule" and the "exit()" case. In particular, there are a few other things that are illegal for the non-exit case anyway, and that we migth as well check for. There are also potentially things we might want to do in the exit case that we don't want to do in the hot-path. So we could just rename the current core "schedule()" functionality as "static void do_schedule()", and then have void schedule(void) { BUG_ON(in_atomic()); BUG_ON(current->state & TASK_ZOMBIE); do_schedule(); } void exit_schedule(void) { BUG_ON(!(current->state & TASK_ZOMBIE)); do_schedule(); } which is simpler than playing games with preempt_count == -2, and also allows us more sanity checking than we have right now (the TASK_ZOMBIE thing is also a exit-specific thing, and there may well be others too). Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 15:40 ` Linus Torvalds @ 2002-09-17 15:53 ` Ingo Molnar 2002-09-17 16:18 ` Linus Torvalds 2002-09-17 15:54 ` Ingo Molnar 1 sibling, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 15:53 UTC (permalink / raw) To: Linus Torvalds; +Cc: Robert Love, linux-kernel On Tue, 17 Sep 2002, Linus Torvalds wrote: > void schedule(void) > { > BUG_ON(in_atomic()); > BUG_ON(current->state & TASK_ZOMBIE); > do_schedule(); > } > > void exit_schedule(void) > { > BUG_ON(!(current->state & TASK_ZOMBIE)); > do_schedule(); > } i'd rather do it the other way around - ie. do a: if (unlikely(current->state == TASK_ZOMBIE)) { ... exit checks ... } else { ... normal checks ... } this test is cheaper than a function call. I'd say the ZOMBIE check alone is not significant enough to introduce a function split in the hotpath. This check is also easier to remove later on. TASK_ZOMBIE is only ever allowed to be set for exiting tasks. [and if TASK_ZOMBIE handling is broken then it will trigger the normal atomic tests so it's not like we are silently ignoring those errors.] Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 15:53 ` Ingo Molnar @ 2002-09-17 16:18 ` Linus Torvalds 2002-09-17 16:26 ` Ingo Molnar 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2002-09-17 16:18 UTC (permalink / raw) To: Ingo Molnar; +Cc: Robert Love, linux-kernel On Tue, 17 Sep 2002, Ingo Molnar wrote: > > this test is cheaper than a function call. Ehh.. On most architectures an unconditional direct function call is _faster_ than a conditional test that does not predict all that well. So the "test is faster than a function call" is not a very good optimization, I suspect. In fact, we'd probably be better off trying to factor out more out of schedule() rather than making it even bigger. We've done that before: the timeout stuff was done that way, and meant that normal reschedules no longer need to care about timeouts. That made things simpler. On the other hand, we do have other ways to test the preempt count inside the scheduler. In particular, we might just move the "in_atomic()" check a few lines downwards, at which point we've released the kernel lock and explicitly disabled preemption, so at that point the test should be even simpler with fewer conditionals.. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 16:18 ` Linus Torvalds @ 2002-09-17 16:26 ` Ingo Molnar 2002-09-17 18:47 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 16:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Robert Love, linux-kernel On Tue, 17 Sep 2002, Linus Torvalds wrote: > On the other hand, we do have other ways to test the preempt count > inside the scheduler. In particular, we might just move the > "in_atomic()" check a few lines downwards, at which point we've released > the kernel lock and explicitly disabled preemption, so at that point the > test should be even simpler with fewer conditionals.. indeed ... Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 16:26 ` Ingo Molnar @ 2002-09-17 18:47 ` Robert Love 2002-09-17 18:57 ` Ingo Molnar 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-17 18:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Tue, 2002-09-17 at 12:26, Ingo Molnar wrote: > On Tue, 17 Sep 2002, Linus Torvalds wrote: > > > On the other hand, we do have other ways to test the preempt count > > inside the scheduler. In particular, we might just move the > > "in_atomic()" check a few lines downwards, at which point we've released > > the kernel lock and explicitly disabled preemption, so at that point the > > test should be even simpler with fewer conditionals.. > > indeed ... OK so do we want to do (a): (moved down to after the preempt_disable() and release_kernel_lock()) if (likely(current->state != TASK_ZOMBIE) if (unlikely((preempt_count() & ~PREEMPT_ACTIVE) != 1)) ... or go with (b) where we split schedule() into schedule(), exit_schedule(), and do_schedule(). Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 18:47 ` Robert Love @ 2002-09-17 18:57 ` Ingo Molnar 2002-09-17 19:23 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 18:57 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, linux-kernel On 17 Sep 2002, Robert Love wrote: > OK so do we want to do (a): > > (moved down to after the preempt_disable() and release_kernel_lock()) > > if (likely(current->state != TASK_ZOMBIE) > if (unlikely((preempt_count() & ~PREEMPT_ACTIVE) != 1)) > ... > > or go with (b) where we split schedule() into schedule(), > exit_schedule(), and do_schedule(). i'd do (a). current->state is to be used anyway, and the default-untaken first branch should be cheap. Plus by moving things down the splitup of the function would create more code duplication than necessery i think. Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 18:57 ` Ingo Molnar @ 2002-09-17 19:23 ` Robert Love 2002-09-17 19:54 ` Steven Cole 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-17 19:23 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Tue, 2002-09-17 at 14:57, Ingo Molnar wrote: > i'd do (a). current->state is to be used anyway, and the default-untaken > first branch should be cheap. Plus by moving things down the splitup of > the function would create more code duplication than necessery i think. Note by moving it down, the only gain over keeping it at the top is not having to check for the BKL... Anyhow, I would appreciate it if you could give this a try (with kernel preemption enabled)... any comments are appreciated. (Note you need a 2.5.35-bk release to get the dump_stack(). Otherwise use show_trace(0).) Robert Love diff -urN linux-2.5.35/kernel/sched.c linux/kernel/sched.c --- linux-2.5.35/kernel/sched.c Sun Sep 15 22:18:24 2002 +++ linux/kernel/sched.c Tue Sep 17 15:24:08 2002 @@ -940,9 +940,6 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); - #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); #endif @@ -950,8 +947,20 @@ preempt_disable(); prev = current; rq = this_rq(); - release_kernel_lock(prev); + + /* + * Test if we are atomic. Since do_exit() needs to call into + * schedule() atomically, we ignore that for now. Otherwise, + * whine if we are scheduling when we should not be. + */ + if (likely(current->state != TASK_ZOMBIE)) { + if (unlikely((preempt_count() & ~PREEMPT_ACTIVE) != 1)) { + printk(KERN_ERR "scheduling while non-atomic!\n"); + dump_stack(); + } + } + prev->sleep_timestamp = jiffies; spin_lock_irq(&rq->lock); @@ -959,7 +968,7 @@ * if entering off of a kernel preemption go straight * to picking the next task. */ - if (unlikely(preempt_count() & PREEMPT_ACTIVE)) + if (unlikely(preempt_count() == PREEMPT_ACTIVE)) goto pick_next_task; switch (prev->state) { ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 19:23 ` Robert Love @ 2002-09-17 19:54 ` Steven Cole 2002-09-17 20:06 ` Robert Love 0 siblings, 1 reply; 34+ messages in thread From: Steven Cole @ 2002-09-17 19:54 UTC (permalink / raw) To: Robert Love; +Cc: Ingo Molnar, Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 13:23, Robert Love wrote: > On Tue, 2002-09-17 at 14:57, Ingo Molnar wrote: > > > i'd do (a). current->state is to be used anyway, and the default-untaken > > first branch should be cheap. Plus by moving things down the splitup of > > the function would create more code duplication than necessery i think. > > Note by moving it down, the only gain over keeping it at the top is not > having to check for the BKL... > > Anyhow, I would appreciate it if you could give this a try (with kernel > preemption enabled)... any comments are appreciated. > > (Note you need a 2.5.35-bk release to get the dump_stack(). Otherwise > use show_trace(0).) > > Robert Love > I applied that patch to 2.5.35-bk3 and with PREEMPT enabled. And it booted without any of the usual complaints with preempt and the in_atomic check. But then, I ran 1) dbench 1 OK 2) dbench 2 OK 3) dbench 3 blam! Running dbench 3 resulted in the dbench clients hanging and being unkillable with kill -9 in the D state. steven 1046 0.0 0.0 1440 472 ? D 13:46 0:00 ./dbench 3 steven 1047 0.0 0.0 1440 420 ? D 13:46 0:00 ./dbench 3 steven 1048 0.0 0.0 1440 472 ? D 13:46 0:00 ./dbench 3 I can ssh and enter my user password, but it hangs after that with no bash prompt. Other ssh sessions which I started previously are still responsive. Test box is 2-way pIII, kernel SMP. Steven ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 19:54 ` Steven Cole @ 2002-09-17 20:06 ` Robert Love 2002-09-17 20:32 ` Steven Cole 2002-09-17 20:58 ` Steven Cole 0 siblings, 2 replies; 34+ messages in thread From: Robert Love @ 2002-09-17 20:06 UTC (permalink / raw) To: Steven Cole; +Cc: Ingo Molnar, Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 15:54, Steven Cole wrote: Thank you for the testing, Steven. > Running dbench 3 resulted in the dbench clients hanging and being > unkillable with kill -9 in the D state. Hrm, I cannot reproduce this. I just successfully completed a `dbench 16'. Can you find where they are hanging? You can get a trace via sysrq. You can also see where they are in the kernel via the wchan field of ps: "ps -ewo user,pid,priority,%cpu,stat,command,wchan" is a favorite of mine. Sure it does not happen with a stock kernel (no preempt)? What if you replace the printk() and dump_stack() in schedule() with a no-op (but not something that will optimize away the conditional, i.e. try a cpu_relax()). Oh, is the previous patch fully backed out? None of that do_exit muck anymore, right? > Test box is 2-way pIII, kernel SMP. I too am SMP with kernel preemption, dual Athlon MP. Regards, Robert Love ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 20:06 ` Robert Love @ 2002-09-17 20:32 ` Steven Cole 2002-09-17 20:58 ` Steven Cole 1 sibling, 0 replies; 34+ messages in thread From: Steven Cole @ 2002-09-17 20:32 UTC (permalink / raw) To: Robert Love; +Cc: Ingo Molnar, Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 14:06, Robert Love wrote: > On Tue, 2002-09-17 at 15:54, Steven Cole wrote: > > Thank you for the testing, Steven. > > > Running dbench 3 resulted in the dbench clients hanging and being > > unkillable with kill -9 in the D state. > > Hrm, I cannot reproduce this. I just successfully completed a `dbench > 16'. Can you find where they are hanging? You can get a trace via > sysrq. You can also see where they are in the kernel via the wchan > field of ps: "ps -ewo user,pid,priority,%cpu,stat,command,wchan" is a > favorite of mine. I rebooted the box, but had to SYSRQ-S, SYSRQ-B, due to shutdown -r now hanging up. On reboot, the init scripts froze after "starting atd", but fortunately this was after sshd, so I ssh'd to the box several times, and did your ps command before hanging it with dbench 3 (dbench 1 and 2 were successful again). I have that ps output if you want it. I could NOT do that, or even ls, after the dbench 3 hang. When I shutdown, SYSRQ-S only partially synced (just one partition). > > Sure it does not happen with a stock kernel (no preempt)? I'll try 2.5.35-bk3 (no other patches) without preempt shortly. I'll report any failures, but no news is good news. > > What if you replace the printk() and dump_stack() in schedule() with a > no-op (but not something that will optimize away the conditional, i.e. > try a cpu_relax()). I'll try that if I get the time. But I wasn't getting any dump info in dmesg for those boots. > > Oh, is the previous patch fully backed out? None of that do_exit muck > anymore, right? Yep, plain 2.5.35, then patched with 2.5.35-bk3, then with your patch posted at 15:23. Nothing else. > > > Test box is 2-way pIII, kernel SMP. > > I too am SMP with kernel preemption, dual Athlon MP. Steven ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 20:06 ` Robert Love 2002-09-17 20:32 ` Steven Cole @ 2002-09-17 20:58 ` Steven Cole 2002-09-18 4:44 ` Robert Love 1 sibling, 1 reply; 34+ messages in thread From: Steven Cole @ 2002-09-17 20:58 UTC (permalink / raw) To: Robert Love; +Cc: Ingo Molnar, Linus Torvalds, Linux Kernel On Tue, 2002-09-17 at 14:06, Robert Love wrote: > On Tue, 2002-09-17 at 15:54, Steven Cole wrote: > > Thank you for the testing, Steven. > > > Running dbench 3 resulted in the dbench clients hanging and being > > unkillable with kill -9 in the D state. > > Hrm, I cannot reproduce this. I just successfully completed a `dbench > 16'. Can you find where they are hanging? You can get a trace via > sysrq. You can also see where they are in the kernel via the wchan > field of ps: "ps -ewo user,pid,priority,%cpu,stat,command,wchan" is a > favorite of mine. Sorry, it hung so badly that it didn't respond to that. > > Sure it does not happen with a stock kernel (no preempt)? I just began testing plain vanilla 2.5.35-bk3 without preempt, and the box has run up to 24 clients so far. So far so good. > > What if you replace the printk() and dump_stack() in schedule() with a > no-op (but not something that will optimize away the conditional, i.e. > try a cpu_relax()). I'll try that in a bit. Steven ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 20:58 ` Steven Cole @ 2002-09-18 4:44 ` Robert Love 2002-09-18 14:08 ` Steven Cole 0 siblings, 1 reply; 34+ messages in thread From: Robert Love @ 2002-09-18 4:44 UTC (permalink / raw) To: Steven Cole; +Cc: Ingo Molnar, Linux Kernel On Tue, 2002-09-17 at 16:58, Steven Cole wrote: > Sorry, it hung so badly that it didn't respond to that. I fixed the hang. If you notice the problem, please do not laugh. The attached patch, against 2.5.36, should work fine... Robert Love diff -urN linux-2.5.36/kernel/sched.c linux/kernel/sched.c --- linux-2.5.36/kernel/sched.c Tue Sep 17 20:58:48 2002 +++ linux/kernel/sched.c Wed Sep 18 00:41:09 2002 @@ -940,9 +940,6 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); - #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); #endif @@ -950,8 +947,20 @@ preempt_disable(); prev = current; rq = this_rq(); - release_kernel_lock(prev); + + /* + * Test if we are atomic. Since do_exit() needs to call into + * schedule() atomically, we ignore that for now. Otherwise, + * whine if we are scheduling when we should not be. + */ + if (likely(current->state != TASK_ZOMBIE)) { + if (unlikely((preempt_count() & ~PREEMPT_ACTIVE) != 1)) { + printk(KERN_ERR "scheduling while non-atomic!\n"); + dump_stack(); + } + } + prev->sleep_timestamp = jiffies; spin_lock_irq(&rq->lock); ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-18 4:44 ` Robert Love @ 2002-09-18 14:08 ` Steven Cole 0 siblings, 0 replies; 34+ messages in thread From: Steven Cole @ 2002-09-18 14:08 UTC (permalink / raw) To: Robert Love; +Cc: Ingo Molnar, Linux Kernel On Tue, 2002-09-17 at 22:44, Robert Love wrote: > On Tue, 2002-09-17 at 16:58, Steven Cole wrote: > > > Sorry, it hung so badly that it didn't respond to that. > > I fixed the hang. If you notice the problem, please do not laugh. > > The attached patch, against 2.5.36, should work fine... > > Robert Love Thanks. That works fine here. Now that I can run with PREEMPT again, I'll leave it enabled. It does seem to improve interactive feel under heavy load, but without an easily identifiable metric for that you know what Rik would say. Steven ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] BUG(): sched.c: Line 944 2002-09-17 15:40 ` Linus Torvalds 2002-09-17 15:53 ` Ingo Molnar @ 2002-09-17 15:54 ` Ingo Molnar 1 sibling, 0 replies; 34+ messages in thread From: Ingo Molnar @ 2002-09-17 15:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Robert Love, linux-kernel ie. for the time being, something like: --- linux/kernel/sched.c.orig Tue Sep 17 17:53:31 2002 +++ linux/kernel/sched.c Tue Sep 17 17:54:10 2002 @@ -940,8 +940,9 @@ struct list_head *queue; int idx; - if (unlikely(in_atomic())) - BUG(); + if (likely(current->state != TASK_ZOMBIE)) + if (unlikely(in_atomic())) + BUG(); #if CONFIG_DEBUG_HIGHMEM check_highmem_ptes(); ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2002-09-18 14:07 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-09-16 18:48 [PATCH] BUG(): sched.c: Line 944 Robert Love 2002-09-16 19:01 ` Linus Torvalds 2002-09-16 21:14 ` Robert Love 2002-09-16 21:41 ` Linus Torvalds 2002-09-16 22:15 ` Robert Love 2002-09-16 22:26 ` Linus Torvalds 2002-09-16 23:15 ` Robert Love 2002-09-16 23:45 ` Linus Torvalds 2002-09-16 23:58 ` Robert Love 2002-09-17 5:56 ` Linus Torvalds 2002-09-17 8:12 ` Robert Love 2002-09-17 8:51 ` Robert Love 2002-09-17 8:59 ` Robert Love 2002-09-17 9:57 ` Ingo Molnar 2002-09-17 18:27 ` Robert Love 2002-09-17 18:46 ` Ingo Molnar 2002-09-17 14:10 ` Steven Cole 2002-09-17 18:29 ` Robert Love 2002-09-17 18:42 ` Steven Cole 2002-09-17 15:27 ` Linus Torvalds 2002-09-17 15:40 ` Linus Torvalds 2002-09-17 15:53 ` Ingo Molnar 2002-09-17 16:18 ` Linus Torvalds 2002-09-17 16:26 ` Ingo Molnar 2002-09-17 18:47 ` Robert Love 2002-09-17 18:57 ` Ingo Molnar 2002-09-17 19:23 ` Robert Love 2002-09-17 19:54 ` Steven Cole 2002-09-17 20:06 ` Robert Love 2002-09-17 20:32 ` Steven Cole 2002-09-17 20:58 ` Steven Cole 2002-09-18 4:44 ` Robert Love 2002-09-18 14:08 ` Steven Cole 2002-09-17 15:54 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).