From: Peter Zijlstra <peterz@infradead.org>
To: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>, Frank Rowand <frank.rowand@am.sony.com>,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM
Date: Wed, 25 May 2011 19:08:55 +0200
Message-ID: <1306343335.21578.65.camel@twins>
In-Reply-To: <1306272750.2497.79.camel@laptop>

On Tue, 2011-05-24 at 23:32 +0200, Peter Zijlstra wrote:
> On Tue, 2011-05-24 at 19:13 +0100, Marc Zyngier wrote:
> > Peter,
> >
> > I've experienced all kind of lock-ups on ARM SMP platforms recently, and
> > finally tracked it down to the following patch:
> >
> > e4a52bcb9a18142d79e231b6733cabdbf2e67c1f [sched: Remove rq->lock from the first half of ttwu()].
> >
> > Even on moderate load, the machine locks up, often silently, and
> > sometimes with a few messages like:
> > INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 1, t=12002 jiffies)
> >
> > Another side effect of this patch is that the load average is always 0,
> > whatever load I throw at the system.
> >
> > Reverting the sched changes up to that patch (included) gives me a
> > working system again, which happily survives parallel kernel
> > compilations without complaining.
> >
> > My knowledge of the scheduler being rather limited, I haven't been able
> > to pinpoint the exact problem (though it probably have something to do
> > with __ARCH_WANT_INTERRUPTS_ON_CTXSW being defined on ARM). The enclosed
> > patch somehow papers over the load average problem, but the system ends
> > up locking up anyway:
>
> Hurm.. I'll try and make x86 use __ARCH_WANT_INTERRUPTS_ON_CTXSW, IIRC
> Ingo once said that that is possible and try to see if I can reproduce.
> No clear ideas atm.

So I checked out that particular commit and build with the below patch
on-top.

grep __ARCH_WANT /proc/sched_debug did indeed return those strings so
I'm assuming CPP did its magic and I'm indeed running a kernel that
enables IRQs around context switches.

The sad news however is that a make -j8 (on a dual core) seems to result
in a kernel image, not an oops.

Ooh, shiny, whilst typing this I got an NMI-watchdog error reporting me
that CPU1 got stuck in try_to_wake_up(), so it looks like I can indeed
reproduce some funnies.

/me goes dig in.

---
 arch/x86/include/asm/system.h |    2 ++
 kernel/sched_debug.c          |    7 +++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 12569e6..56103bb 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -10,6 +10,8 @@
 #include <linux/kernel.h>
 #include <linux/irqflags.h>
 
+#define __ARCH_WANT_INTERRUPTS_ON_CTXSW
+
 /* entries in ARCH_DLINFO: */
 #if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64)
 # define AT_VECTOR_SIZE_ARCH 2
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 3669bec6..18b4ace 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -335,6 +335,13 @@ static int sched_debug_show(struct seq_file *m, void *v)
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
 
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_INTERRUPTS_ON_CTXSW\n");
+#endif
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_UNLOCKED_CTXSW\n");
+#endif
+
 #define P(x) \
 	SEQ_printf(m, "%-40s: %Ld\n", #x, (long long)(x))
 #define PN(x) \
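
[Editor's sketch, not part of the original mail.] The sched_debug.c hunk
above relies on a simple trick: emit a marker string only when a
preprocessor macro is defined, then grep for it at run time to confirm
what was actually compiled in. The same idea can be tried in user space,
roughly as below. This is only an illustration under stated assumptions:
report_ctxsw_flags() is a hypothetical helper name, the #define stands in
for the arch header, and strncat() replaces the kernel's SEQ_printf().

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stands in for the line the patch adds to asm/system.h. */
#define __ARCH_WANT_INTERRUPTS_ON_CTXSW

/*
 * Hypothetical helper: write the name of every macro that was defined at
 * compile time into buf, one per line, and return how many were reported.
 * Mirrors the #ifdef-guarded SEQ_printf() calls in the patch.
 */
int report_ctxsw_flags(char *buf, size_t len)
{
	int n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
	strncat(buf, "__ARCH_WANT_INTERRUPTS_ON_CTXSW\n", len - strlen(buf) - 1);
	n++;
#endif
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
	strncat(buf, "__ARCH_WANT_UNLOCKED_CTXSW\n", len - strlen(buf) - 1);
	n++;
#endif
	return n;
}
```

Calling report_ctxsw_flags() from a small main() and printing the buffer
gives the user-space equivalent of the grep on /proc/sched_debug; removing
the #define makes the corresponding line disappear, which is the check the
mail describes.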