From: Peter Zijlstra <peterz@infradead.org>
To: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Frank Rowand <frank.rowand@am.sony.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [BUG]  "sched: Remove rq->lock from the first half of ttwu()"  locks up on ARM
Date: Wed, 25 May 2011 19:08:55 +0200
Message-ID: <1306343335.21578.65.camel@twins>
In-Reply-To: <1306272750.2497.79.camel@laptop>

On Tue, 2011-05-24 at 23:32 +0200, Peter Zijlstra wrote:
> On Tue, 2011-05-24 at 19:13 +0100, Marc Zyngier wrote:
> > Peter,
> > 
> > I've experienced all kinds of lock-ups on ARM SMP platforms recently, and
> > finally tracked it down to the following patch:
> > 
> > e4a52bcb9a18142d79e231b6733cabdbf2e67c1f [sched: Remove rq->lock from the first half of ttwu()].
> > 
> > Even on moderate load, the machine locks up, often silently, and
> > sometimes with a few messages like:
> > INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 1, t=12002 jiffies)
> > 
> > Another side effect of this patch is that the load average is always 0,
> > whatever load I throw at the system.
> > 
> > Reverting the sched changes up to and including that patch gives me a
> > working system again, which happily survives parallel kernel
> > compilations without complaining.
> > 
> > As my knowledge of the scheduler is rather limited, I haven't been able
> > to pinpoint the exact problem (though it probably has something to do
> > with __ARCH_WANT_INTERRUPTS_ON_CTXSW being defined on ARM). The enclosed
> > patch somehow papers over the load average problem, but the system ends
> > up locking up anyway:
> 
> Hurm.. I'll try to make x86 use __ARCH_WANT_INTERRUPTS_ON_CTXSW (IIRC
> Ingo once said that is possible) and see if I can reproduce it.
> No clear ideas atm.

So I checked out that particular commit and built it with the patch below
on top. grep __ARCH_WANT /proc/sched_debug did indeed return those
strings, so I'm assuming CPP did its magic and I'm indeed running a
kernel that enables IRQs around context switches.
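The check above uses a simple trick. A compile-time #ifdef prints the macro's name at runtime, so you can confirm what the preprocessor actually saw. This is the same #ifdef/SEQ_printf pattern as the sched_debug.c hunk in the patch below. A minimal userspace sketch of the idea follows. It is not kernel code. The function report_ctxsw_flags() is hypothetical, and the #define at the top only stands in for the arch header change (arch/x86/include/asm/system.h) from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the arch header: remove this line to simulate an
 * architecture that keeps IRQs disabled across the context switch. */
#define __ARCH_WANT_INTERRUPTS_ON_CTXSW

/*
 * Hypothetical helper: write the name of every context-switch flag the
 * preprocessor saw as defined into buf, one per line (same idea as the
 * sched_debug.c hunk). Returns the number of flags reported.
 */
int report_ctxsw_flags(char *buf, size_t len)
{
	int n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
	strncat(buf, "__ARCH_WANT_INTERRUPTS_ON_CTXSW\n", len - strlen(buf) - 1);
	n++;
#endif
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
	strncat(buf, "__ARCH_WANT_UNLOCKED_CTXSW\n", len - strlen(buf) - 1);
	n++;
#endif
	return n;
}
```

Calling report_ctxsw_flags() with a 128-byte buffer and printing it with fputs(buf, stdout) gives the userspace equivalent of grepping /proc/sched_debug. The #ifdef is resolved at build time, so the output reflects the configuration the binary was actually compiled with.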

The sad news, however, is that a make -j8 (on a dual core) seems to result
in a kernel image rather than an oops.

Ooh, shiny, whilst typing this I got an NMI-watchdog error reporting
that CPU1 got stuck in try_to_wake_up(), so it looks like I can indeed
reproduce some funnies.

/me goes dig in.

---
 arch/x86/include/asm/system.h |    2 ++
 kernel/sched_debug.c          |    7 +++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 12569e6..56103bb 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -10,6 +10,8 @@
 #include <linux/kernel.h>
 #include <linux/irqflags.h>
 
+#define __ARCH_WANT_INTERRUPTS_ON_CTXSW
+
 /* entries in ARCH_DLINFO: */
 #if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64)
 # define AT_VECTOR_SIZE_ARCH 2
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 3669bec6..18b4ace 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -335,6 +335,13 @@ static int sched_debug_show(struct seq_file *m, void *v)
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
 
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_INTERRUPTS_ON_CTXSW\n");
+#endif
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_UNLOCKED_CTXSW\n");
+#endif
+
 #define P(x) \
 	SEQ_printf(m, "%-40s: %Ld\n", #x, (long long)(x))
 #define PN(x) \
