From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel test robot <xiaolong.ye@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@01.org, frederic@kernel.org
Subject: Re: [lkp-robot] [torture] b151f93a71: INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
Date: Wed, 29 Nov 2017 14:38:06 -0800
Message-ID: <20171129223806.GA15615@linux.vnet.ibm.com>
In-Reply-To: <20171129220703.GA12908@linux.vnet.ibm.com>

On Wed, Nov 29, 2017 at 02:07:03PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 29, 2017 at 11:08:19AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 28, 2017 at 01:08:10PM -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 28, 2017 at 12:46:19PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Nov 28, 2017 at 09:35:54AM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Nov 28, 2017 at 06:10:08PM +0100, Thomas Gleixner wrote:
> > > > > > On Tue, 28 Nov 2017, Paul E. McKenney wrote:
> > > > > > > On Tue, Nov 28, 2017 at 05:47:35PM +0100, Thomas Gleixner wrote:
> > > > > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > > > > > > index db774b0f217e..a3321bb565db 100644
> > > > > > > --- a/kernel/time/timer.c
> > > > > > > +++ b/kernel/time/timer.c
> > > > > > > @@ -1803,7 +1803,7 @@ signed long __sched schedule_timeout(signed long timeout)
> > > > > > >  		idx = timer_get_idx(&timer.timer);
> > > > > > >  		idx_now = calc_wheel_index(j, base->clk);
> > > > > > >  		raw_spin_unlock_irqrestore(&base->lock, flags);
> > > > > > > -		pr_info("%s: Waylayed timer idx: %u idx_now: %u\n", __func__, idx, idx_now);
> > > > > > > +		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %u idx_now: %u\n", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now);
> > > > > > 
> > > > > > Please print idx and idx_now as hex values. It's simpler to decode that way.
> > > > > 
> > > > > Here you go!  Starting tests at this end, focusing on TREE01 and TREE04.
> > > > > BTW, TREE04 doesn't do any CPU hotplug, providing a counterexample to
> > > > > my long-held assumption that this only happened in the presence of CPU
> > > > > hotplug operations.
> > > > 
> > > > And here is output with changes discussed on IRC.  TREE04 managed to
> > > > have not one but two overlapping RCU CPU stall warnings, one for RCU-bh
> > > > and the second for RCU-sched.  TREE04 and TREE04.  HZ=1000.
> > > 
> > > And here is the full patch, in all its lack of aesthetic appeal.
> > 
> > And here is the list of waylaid timers from last night's testing.  The big
> > pile of them from TREE01 at the end is due to wakeups from kthread_stop(),
> > I am guessing.  The TREE04 run only had two of them, but they seem reliable
> > enough that I just might be able to bisect.  I will try that.
> 
> And it converged to 5c4991e24c69 ("sched/isolation: Split out new
> CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL"), which is a bit
> hard to believe.  Please see below for the log.  I will be retesting
> some of the allegedly good commits, just in case.

And the bisection really did converge here.  It appears that splitting
CONFIG_CPU_ISOLATION from CONFIG_NO_HZ_FULL isn't fully cooked.
Adding CONFIG_CPU_ISOLATION=y to my tests causes them to pass, but I
get the eternal wait for a three-jiffy timeout given CONFIG_NO_HZ_FULL=y
and CONFIG_CPU_ISOLATION=n.  Both cases use CONFIG_NO_HZ_FULL_ALL=y.
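
For reference, the two combinations above can be written as rcutorture-style
config fragments.  This is only a sketch of the options named in this message:
the rest of each scenario's configuration is not shown here, and
CONFIG_NO_HZ_FULL=y in the passing case is an assumption based on
CONFIG_NO_HZ_FULL_ALL depending on it.

```
# Failing combination: the three-jiffy timeout never completes.
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_CPU_ISOLATION=n

# Passing combination: same as above, but with CPU isolation enabled.
# (CONFIG_NO_HZ_FULL=y is assumed here, see the note above.)
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_CPU_ISOLATION=y
```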

Adding Frederic for his perspective.

							Thanx, Paul
