From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"Jacob Pan" <jacob.jun.pan@linux.intel.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Ross Green" <rgkernel@gmail.com>,
	"John Stultz" <john.stultz@linaro.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	lkml <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
	rostedt <rostedt@goodmis.org>,
	"David Howells" <dhowells@redhat.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Darren Hart" <dvhart@linux.intel.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Mon, 28 Mar 2016 17:28:14 -0700
Message-ID: <20160329002814.GB13058@linux.vnet.ibm.com>
In-Reply-To: <20160329002518.GA13058@linux.vnet.ibm.com>

On Mon, Mar 28, 2016 at 05:25:18PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 28, 2016 at 06:08:41AM -0700, Paul E. McKenney wrote:
> > On Mon, Mar 28, 2016 at 08:25:47AM +0200, Peter Zijlstra wrote:
> > > On Sun, Mar 27, 2016 at 02:06:41PM -0700, Paul E. McKenney wrote:
> 
> [ . . . ]
> 
> > > > OK, so I should instrument migration_call() if I get the repro rate up?
> > > 
> > > Can do, maybe try the below first. (yes I know how long it all takes :/)
> > 
> > OK, will run this today, then run calibration for last night's run this
> > evening.
> 
> And there was one failure out of ten runs.  If last night's failure rate
> was typical (7 of 24), then I believe we can be about 87% confident that
> this change helped.  That isn't all that confident, but...

And, as Murphy would have it, the instrumentation didn't trigger.  I just
got the usual stall-warning messages with a starving RCU grace-period
kthread.

							Thanx, Paul

> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> So what to run tonight?
> 
> The sanest approach would be to run stock in order to get a baseline
> failure rate.  It is tempting to run more of Peter's patch, but part of
> the problem is that we don't know the current baseline.
> 
> So baseline it is...
> 
> 							Thanx, Paul
> 
> > Speaking of which, last night's run (disabling TIF_POLLING_NRFLAG)
> > consisted of 24 two-hour runs.  Six of them had hard hangs, and another
> > had a hang that eventually unhung of its own accord.  I believe that this
> > is significantly fewer failures than from a stock kernel, but I could
> > be wrong, and it will take some serious testing to give statistical
> > confidence for whatever conclusion is correct.
> > 
> > > > > The other interesting case would be resched_cpu(), which uses
> > > > > set_nr_and_not_polling() to kick a remote cpu to call schedule(). It
> > > > > atomically sets TIF_NEED_RESCHED and returns true if TIF_POLLING_NRFLAG
> > > > > was not set. In that case, an IPI is sent.
> > > > > 
> > > > > This assumes the idle 'exit' path will do the same as the IPI does; and
> > > > > if you look at cpu_idle_loop() it does indeed do both
> > > > > preempt_fold_need_resched() and sched_ttwu_pending().
> > > > > 
> > > > > Note that one cannot rely on irq_enter()/irq_exit() being called for the
> > > > > scheduler IPI.
> > > > 
> > > > OK, thank you for the info!  Any specific debug actions?
> > > 
> > > Dunno, something like the below should bring visibility into the
> > > (lockless) wake_list thingy.
> > > 
> > > So these trace_printk()s should happen between trace_sched_waking() and
> > > trace_sched_wakeup() (I've not fully read the thread, but ISTR you had
> > > some traces with these here thingies on).
> > > 
> > > ---
> > >  arch/x86/include/asm/bitops.h | 6 ++++--
> > >  kernel/sched/core.c           | 9 +++++++++
> > >  2 files changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> > > index 7766d1cf096e..5345784d5e41 100644
> > > --- a/arch/x86/include/asm/bitops.h
> > > +++ b/arch/x86/include/asm/bitops.h
> > > @@ -112,11 +112,13 @@ clear_bit(long nr, volatile unsigned long *addr)
> > >  	if (IS_IMMEDIATE(nr)) {
> > >  		asm volatile(LOCK_PREFIX "andb %1,%0"
> > >  			: CONST_MASK_ADDR(nr, addr)
> > > -			: "iq" ((u8)~CONST_MASK(nr)));
> > > +			: "iq" ((u8)~CONST_MASK(nr))
> > > +			: "memory");
> > >  	} else {
> > >  		asm volatile(LOCK_PREFIX "btr %1,%0"
> > >  			: BITOP_ADDR(addr)
> > > -			: "Ir" (nr));
> > > +			: "Ir" (nr)
> > > +			: "memory");
> > >  	}
> > >  }
> > 
> > Is the above addition of "memory" strictly for the debug below, or is
> > it also a potential fix?
> > 
> > Starting it up regardless, but figured I should ask!
> > 
> > 							Thanx, Paul
> > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 0b21e7a724e1..b446f73c530d 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1669,6 +1669,7 @@ void sched_ttwu_pending(void)
> > >  	while (llist) {
> > >  		p = llist_entry(llist, struct task_struct, wake_entry);
> > >  		llist = llist_next(llist);
> > > +		trace_printk("waking %d\n", p->pid);
> > >  		ttwu_do_activate(rq, p, 0);
> > >  	}
> > > 
> > > @@ -1719,6 +1720,7 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu)
> > >  	struct rq *rq = cpu_rq(cpu);
> > > 
> > >  	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
> > > +		trace_printk("queued %d for waking on %d\n", p->pid, cpu);
> > >  		if (!set_nr_if_polling(rq->idle))
> > >  			smp_send_reschedule(cpu);
> > >  		else
> > > @@ -5397,10 +5399,17 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> > >  		migrate_tasks(rq);
> > >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
> > >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> > > +
> > > +		/* really bad m'kay */
> > > +		WARN_ON(!llist_empty(&rq->wake_list));
> > > +
> > >  		break;
> > > 
> > >  	case CPU_DEAD:
> > >  		calc_load_migrate(rq);
> > > +
> > > +		/* more bad */
> > > +		WARN_ON(!llist_empty(&rq->wake_list));
> > >  		break;
> > >  #endif
> > >  	}
> > > 

