From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Ross Green" <rgkernel@gmail.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"John Stultz" <john.stultz@linaro.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
	josh@joshtriplett.org, rostedt <rostedt@goodmis.org>,
	"David Howells" <dhowells@redhat.com>,
	"Eric Dumazet" <edumazet@google.com>,
	dvhart@linux.intel.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Thu, 18 Feb 2016 20:22:53 -0800
Message-ID: <20160219042253.GJ6719@linux.vnet.ibm.com>
In-Reply-To: <1568248905.2264.1455837260992.JavaMail.zimbra@efficios.com>

On Thu, Feb 18, 2016 at 11:14:21PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2016, at 6:51 AM, Ross Green rgkernel@gmail.com wrote:
> 
> > On Thu, Feb 18, 2016 at 10:19 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> >> On Wed, Feb 17, 2016 at 12:28:29PM -0800, Paul E. McKenney wrote:
> >>> On Wed, Feb 17, 2016 at 08:45:54PM +0100, Peter Zijlstra wrote:
> >>> > On Wed, Feb 17, 2016 at 11:28:17AM -0800, Paul E. McKenney wrote:
> >>> > > On Tue, Feb 16, 2016 at 09:45:49PM -0800, Paul E. McKenney wrote:
> >>> > > > On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
> >>> > > > > Continued testing with the latest linux-4.5-rc3 release.
> >>> > > > >
> >>> > > > > Please find attached a copy of traces from dmesg:
> >>> > > > >
> >>> > > > > There is a lot more debug and trace data so hopefully this will shed
> >>> > > > > some light on what might be happening here.
> >>> > > > >
> >>> > > > > My testing remains the same: run a series of simple benchmarks, let
> >>> > > > > them run to completion, and then leave the system idling with just a
> >>> > > > > few daemons running.
> >>> > > > >
> >>> > > > > The self-detected stalls in this instance turned up after a day's run time.
> >>> > > > > There were NO heavy artificial computational loads on the machine.
> >>> > > >
> >>> > > > It does indeed look quiet on that dmesg for a good long time.
> >>> > > >
> >>> > > > The following insanely crude not-for-mainline hack -might- be producing
> >>> > > > good results in my testing.  It will take some time before I can claim
> >>> > > > statistically different results.  But please feel free to give it a go
> >>> > > > in the meantime.  (Thanks to Al Viro for pointing me in this direction.)
> >>> >
> >>> > Your case was special in that it was hotplug triggering it, right?
> >>>
> >>> Yes, it has thus far only shown up with CPU hotplug enabled.
> >>>
> >>> > I was auditing the hotplug paths involved when I fell ill two weeks ago,
> >>> > and have not really made any progress on that because of that :/
> >>>
> >>> I have always said that being sick is bad for one's health, but I didn't
> >>> realize that it could be bad for the kernel's health as well.  ;-)
> >>>
> >>> > I'll go have another look. I had a vague feeling for a race back then,
> >>> > let's see if I can still remember how..
> >>>
> >>> I believe that I can -finally- get an ftrace_dump() to happen within
> >>> 10-20 milliseconds of the problem, which just might be soon enough
> >>> after the problem to gather some useful information.  I am currently
> >>> testing this theory with "ftrace trace_event=sched_waking,sched_wakeup"
> >>> boot arguments on a two-hour run.
> >>
> >> And apparently another way to greatly reduce the probability of this
> >> bug occurring is to enable ftrace.  :-/
> >>
> >> Will try longer runs.
> >>
> >>                                                         Thanx, Paul
> >>
> >>> If this works out, what would be a useful set of trace events for me
> >>> to capture?
> >>>
> >>>                                                       Thanx, Paul
> >>
> > 
> > Well, I managed to catch this one on linux-4.5-rc4.
> > 
> > Took over 3 days and 7 hours to hit.
> > 
> > Same test as before, boot, run a series of simple benchmarks and then
> > let the machine just idle away.
> > 
> > As before, the reported stall, AND everything keeps on running as if
> > nothing had happened.
> > 
> > I notice in the task dump that the swapper is running on
> > both CPUs.
> > 
> > Does that make any sense?
> > There is around 3% of memory actually used.
> > 
> > Anyway, please find attached a copy of the dmesg output.
> > 
> > Hope this helps a few people fill in the missing pieces here.
> 
> What seems weird here is that all code paths in the loop
> perform a WRITE_ONCE(rsp->gp_activity, jiffies), which
> implies progress in each case:
> 
> - rcu_gp_init() does it,
> - both branches in the QS forcing loop do it, either
>   through rcu_gp_fqs(), or directly.
> 
> This means the thread is really stalled, and the backtrace
> shows those threads are stalled on the following wait:
> 
>                         ret = wait_event_interruptible_timeout(rsp->gp_wq,
>                                         rcu_gp_fqs_check_wake(rsp, &gf), j);
> 
> Since this is a *_timeout wait, whose timeout is bounded
> by "j" jiffies, which is in turn bounded by the "HZ" value,
> we should really not stay there too long, even if we are
> not awakened by whatever is supposed to awaken us.

Completely agreed on this seeming weird.  ;-)

> So unless I'm missing something, it would look like
> schedule_timeout() is missing its timeout there.
> 
> Perhaps we only experience this missed timeout here
> because typically there is always a wakeup coming sooner
> or later on relatively busy systems. This one is idle
> for quite a while.
> 
> Thoughts ?

I can also make this happen (infrequently) on a busy system with
rcutorture, but only with frequent CPU hotplugging.  Ross is making
it happen with pure idle.

I did manage to make this fail with ftrace running, but thus far
have not been able to get a trace that actually includes any
activity for the grace-period kthread.  Working on tightening
up the tests...

						Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Regards,
> > 
> > Ross Green
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
