From: Frank Rowand <frank.rowand@am.sony.com>
To: "paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>
Cc: "Rowand, Frank" <Frank_Rowand@sonyusa.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>, <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>,
	Venkatesh Pallipadi <venki@google.com>
Subject: Re: [ANNOUNCE] 3.0.1-rt11
Date: Tue, 6 Sep 2011 19:53:31 -0700
Message-ID: <4E66DCAB.8090801@am.sony.com>
In-Reply-To: <20110826235507.GJ2342@linux.vnet.ibm.com>

On 08/26/11 16:55, Paul E. McKenney wrote:
> On Wed, Aug 24, 2011 at 04:58:49PM -0700, Frank Rowand wrote:
>> On 08/13/11 03:53, Peter Zijlstra wrote:
>>>
>>> Whee, I can skip release announcements too!
>>>
>>> So no the subject ain't no mistake its not, 3.0.1-rt11 is there for the
>>> grabs.

< snip >

>> I have a consistent (every boot) hang on boot.  With a few
>> hacks to get console output, I get the
>>
>>   rcu_preempt_state detected stalls on CPUs/tasks

< snip >

>> This is an ARM NaviEngine (out of tree, so I also have applied
>> a series of patches for platform support).
>>
>> CONFIG_PREEMPT_RT_FULL is set.  Full config is attached.

I have also replicated the problem on the ARM RealView (in tree) and
without the RT patches.

> 
> Hmmm...  The last few that I have seen that looked like this were
> due to my messing up rcutorture so that the RCU-boost testing kthreads
> ran CPU-bound at real-time priority.
> 
> Is it possible that something similar is happening on your system?
> 
>                                                         Thanx, Paul

The problem turned out to be that the allowed cpus mask for ksoftirqd on
the secondary processors was set to all possible cpus.  As a result, the
RCU softirq never executed on cpu 2.

I'll test the following patch on 3.1 tomorrow.

-Frank Rowand


Symptom: rcu stall

The problem was that ksoftirqd was woken on the secondary processors before
the secondary processors were online.  Because the target cpu was offline,
the wakeup took the scheduler's fallback path, which reset the thread's
allowed cpus to all possible cpus:

   wake_up_process()
      try_to_wake_up()
         select_task_rq()
            if (... || !cpu_online(cpu))
               select_fallback_rq(task_cpu(p), p)
                  ...
                  /* No more Mr. Nice Guy. */
                  dest_cpu = cpuset_cpus_allowed_fallback(p)
                     do_set_cpus_allowed(p, cpu_possible_mask)
                        #  Thus ksoftirqd can now run on any cpu...
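
The call chain above can be approximated in user space.  This is only a toy
sketch, not kernel code: struct toy_task, the toy_* helpers, NR_CPUS = 4 and
the bitmask representation are all made up for illustration, and the real
select_fallback_rq() tries more options before widening the mask.

```c
#include <stdbool.h>

#define NR_CPUS 4u
#define ALL_CPUS ((1u << NR_CPUS) - 1u)

/* Toy stand-in for the task_struct fields that matter here. */
struct toy_task {
	unsigned int cpus_allowed;	/* bitmask of cpus the task may run on */
	unsigned int cpu;		/* cpu the task is currently assigned to */
};

static unsigned int online_mask = 1u << 0;	/* only the boot cpu is online */

static bool toy_cpu_online(unsigned int cpu)
{
	return (online_mask >> cpu) & 1u;
}

/* Toy kthread_bind(): restrict the task to a single cpu. */
static void toy_kthread_bind(struct toy_task *p, unsigned int cpu)
{
	p->cpu = cpu;
	p->cpus_allowed = 1u << cpu;
}

/*
 * Toy select_fallback_rq(): if no allowed cpu is online, widen the mask to
 * every possible cpu (the "No more Mr. Nice Guy" step), then pick the
 * lowest-numbered cpu that is both allowed and online.
 */
static unsigned int toy_select_fallback_rq(struct toy_task *p)
{
	unsigned int cpu;

	if (!(p->cpus_allowed & online_mask))
		p->cpus_allowed = ALL_CPUS;	/* the binding is lost here */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (((p->cpus_allowed & online_mask) >> cpu) & 1u)
			return cpu;
	return 0;
}

/* Toy wakeup: keep the assigned cpu if it is online, else fall back. */
static unsigned int toy_wake_up(struct toy_task *p)
{
	if (!toy_cpu_online(p->cpu))
		p->cpu = toy_select_fallback_rq(p);
	return p->cpu;
}
```

In this model, binding a task to cpu 2 and calling toy_wake_up() while only
cpu 0 is online leaves cpus_allowed equal to ALL_CPUS.  Setting bit 2 in
online_mask before the wakeup keeps the single-cpu binding intact.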


Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/softirq.c |   19 	14 +	5 -	0 !
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: b/kernel/softirq.c
===================================================================
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -55,6 +55,7 @@ EXPORT_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd_pending_online);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -862,28 +863,36 @@ static int __cpuinit cpu_callback(struct
 			return notifier_from_errno(PTR_ERR(p));
 		}
 		kthread_bind(p, hotcpu);
-  		per_cpu(ksoftirqd, hotcpu) = p;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = p;
  		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+		per_cpu(ksoftirqd, hotcpu) =
+			per_cpu(ksoftirqd_pending_online, hotcpu);
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		wake_up_process(per_cpu(ksoftirqd, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(ksoftirqd, hotcpu))
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
+		if (!p)
 			break;
 		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(ksoftirqd, hotcpu),
-			     cpumask_any(cpu_online_mask));
+		kthread_bind(p, cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
 		static const struct sched_param param = {
 			.sched_priority = MAX_RT_PRIO-1
 		};
 
-		p = per_cpu(ksoftirqd, hotcpu);
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
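
For readers who want the idea without the hotplug notifier around it, here
is a user-space sketch of the handoff the patch makes.  It assumes the early
wakeups reach the thread through the per-cpu ksoftirqd pointer, which this
message does not state explicitly.  struct toy_thread and the toy_* functions
are hypothetical names, and plain arrays stand in for the per-cpu variables.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

struct toy_thread {
	bool woken;
};

static struct toy_thread *ksoftirqd[NR_CPUS];		/* published threads */
static struct toy_thread *pending_online[NR_CPUS];	/* created, cpu not online yet */

/* CPU_UP_PREPARE: stash the new thread where wakeups cannot find it. */
static void toy_up_prepare(unsigned int cpu, struct toy_thread *t)
{
	pending_online[cpu] = t;
}

/* Stand-in for a softirq raise that wakes the published thread, if any. */
static bool toy_wakeup_softirqd(unsigned int cpu)
{
	struct toy_thread *t = ksoftirqd[cpu];

	if (!t)
		return false;	/* nothing published yet, so nothing to wake */
	t->woken = true;
	return true;
}

/* CPU_ONLINE: publish the thread, clear the stash, then wake it. */
static void toy_online(unsigned int cpu)
{
	ksoftirqd[cpu] = pending_online[cpu];
	pending_online[cpu] = NULL;
	toy_wakeup_softirqd(cpu);
}
```

Between toy_up_prepare() and toy_online() a wakeup attempt finds a NULL
pointer and does nothing, so in this model the thread is first woken only
after its cpu is online.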

