From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Chatre, Reinette" <reinette.chatre@intel.com>,
	"Jacob Pan" <jacob.jun.pan@linux.intel.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Ross Green" <rgkernel@gmail.com>,
	"John Stultz" <john.stultz@linaro.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
	rostedt <rostedt@goodmis.org>,
	"David Howells" <dhowells@redhat.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Darren Hart" <dvhart@linux.intel.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Sat, 26 Mar 2016 22:22:57 +0000 (UTC)
Message-ID: <706246733.37102.1459030977316.JavaMail.zimbra@efficios.com>
In-Reply-To: <20160326184940.GA23851@linux.vnet.ibm.com>

----- On Mar 26, 2016, at 2:49 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Sat, Mar 26, 2016 at 08:28:16AM -0700, Paul E. McKenney wrote:
>> On Sat, Mar 26, 2016 at 12:29:31PM +0000, Mathieu Desnoyers wrote:
>> > ----- On Mar 25, 2016, at 5:46 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
>> > 
>> > > On Fri, Mar 25, 2016 at 09:24:14PM +0000, Chatre, Reinette wrote:
>> > >> Hi Paul,
>> > >> 
>> > >> On 2016-03-23, Paul E. McKenney wrote:
>> > >> > Please boot with the following parameters:
>> > >> > 
>> > >> > 	rcu_tree.rcu_kick_kthreads ftrace trace_event=sched_waking,sched_wakeup,sched_wake_idle_without_ipi
>> > >> 
>> > >> With these parameters I expected more details to show up in the kernel logs but
>> > >> cannot find any. Even so, today I left the machine running again and when this
>> > >> happened I think I was able to capture the trace data for the event. Please
>> > >> find attached the trace information for the kernel message below. Since the
>> > >> complete trace file is very big I trimmed it to show the time around this event
>> > >> - hopefully this will contain the information you need. I would also like to
>> > >> provide some additional information. The system on which I see these events had
>> > >> a time that was _very_ wrong. I noticed that this issue occurs when
>> > >> systemd-timesyncd was one of the tasks calling the functions of interest to your
>> > >> tracing and am wondering if a very out-of-sync time in the process of being
>> > >> corrected could be the cause of this issue? As an experiment I ensured the
>> > >> system time was accurate before leaving the system idle overnight and I did not
>> > >> see the issue the next morning.
>> > > 
>> > > Ah!  Yes, a sudden jump in time or a disagreement about the time among
>> > > different components of the system can definitely cause these symptoms.
>> > > We have sometimes seen these problems occur when a pair of CPUs have
>> > > wildly different ideas about what time it is, for example.  Please let
>> > > me know how it goes.
>> > > 
>> > > Also, in your trace, there are no sched_waking events for the rcu_preempt
>> > > process that are not immediately followed by sched_wakeup, so your trace
>> > > isn't showing the problem that I am seeing.
>> > 
>> > This is interesting.
>> > 
>> > Perhaps we could try with those commits reverted?
>> > 
>> > commit e3baac47f0e82c4be632f4f97215bb93bf16b342
>> > Author: Peter Zijlstra <peterz@infradead.org>
>> > Date:   Wed Jun 4 10:31:18 2014 -0700
>> > 
>> >     sched/idle: Optimize try-to-wake-up IPI
>> > 
>> > commit fd99f91aa007ba255aac44fe6cf21c1db398243a
>> > Author: Peter Zijlstra <peterz@infradead.org>
>> > Date:   Wed Apr 9 15:35:08 2014 +0200
>> > 
>> >     sched/idle: Avoid spurious wakeup IPIs
>> > 
>> > They appeared in 3.16.
>> 
>> At this point, I am up for trying pretty much anything.  ;-)
>> 
>> Will give it a go.
> 
> And those certainly don't revert cleanly!  Would patching the kernel
> to remove the definition of TIF_POLLING_NRFLAG be useful?  Or, more
> to the point, is there some other course of action that would be more
> useful?  At this point, the test times are measured in weeks...

Indeed, patching the kernel to remove the TIF_POLLING_NRFLAG
definition would have an effect similar to reverting those two
commits.

Since testing takes a while, we could take a more aggressive
approach to reproducing a possible race condition: we
could re-implement the _TIF_POLLING_NRFLAG vs _TIF_NEED_RESCHED
dance, along with the ttwu pending lock-less list (llist) queue,
within a dummy test module, with custom data structures, and
stress-test the invariants. We could also create a Promela
model of these IPI-skip optimisations to validate
progress: whenever a wakeup is requested, the target CPU
should always eventually run the scheduler, even if no
further wakeup arrives.
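
To make the first approach a bit more concrete, here is a rough
userspace sketch of the kind of custom data structures such a test
could use. It is only loosely modeled on the scheduler code, not
copied from it: every model_* name, the MODEL_* flag bits, and the
counters that stand in for the wake list and the IPI are assumptions
made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's thread_info flag bits. */
#define MODEL_NEED_RESCHED	(1u << 0)
#define MODEL_POLLING_NRFLAG	(1u << 1)

/* Hypothetical per-CPU state; not the kernel's real data structures. */
struct model_cpu {
	atomic_uint flags;	/* flag word of the idle task */
	atomic_uint pending;	/* wakeups queued for this CPU */
	atomic_bool ipi;	/* an IPI was sent and is not yet handled */
	unsigned int handled;	/* wakeups processed so far */
};

/*
 * Set NEED_RESCHED only if the target is polling on its flag word.
 * Returns false when the caller has to fall back to an IPI.
 */
static bool model_set_nr_if_polling(struct model_cpu *cpu)
{
	unsigned int val = atomic_load(&cpu->flags);

	for (;;) {
		if (!(val & MODEL_POLLING_NRFLAG))
			return false;
		if (val & MODEL_NEED_RESCHED)
			return true;
		/* On failure, val is reloaded with the current flags. */
		if (atomic_compare_exchange_weak(&cpu->flags, &val,
						 val | MODEL_NEED_RESCHED))
			return true;
	}
}

/* Waker side: queue the wakeup; only the first queuer kicks the target. */
static void model_queue_wakeup(struct model_cpu *cpu)
{
	if (atomic_fetch_add(&cpu->pending, 1) != 0)
		return;
	if (!model_set_nr_if_polling(cpu))
		atomic_store(&cpu->ipi, true);
}

/* Target side: process every queued wakeup. */
static void model_drain(struct model_cpu *cpu)
{
	cpu->handled += atomic_exchange(&cpu->pending, 0);
}

/* Target side: start polling on the flag word. */
static void model_idle_enter(struct model_cpu *cpu)
{
	atomic_fetch_or(&cpu->flags, MODEL_POLLING_NRFLAG);
}

/* Target side: stop polling, drain the queue, then clear NEED_RESCHED. */
static void model_idle_exit(struct model_cpu *cpu)
{
	atomic_fetch_and(&cpu->flags, ~MODEL_POLLING_NRFLAG);
	model_drain(cpu);
	atomic_fetch_and(&cpu->flags, ~MODEL_NEED_RESCHED);
}

/* Target side: the IPI handler also drains the queue. */
static void model_handle_ipi(struct model_cpu *cpu)
{
	atomic_store(&cpu->ipi, false);
	model_drain(cpu);
}

/* Progress invariant: a queued wakeup always has a kick outstanding. */
static void model_check_progress(struct model_cpu *cpu)
{
	bool kicked = (atomic_load(&cpu->flags) & MODEL_NEED_RESCHED) ||
		      atomic_load(&cpu->ipi);

	assert(atomic_load(&cpu->pending) == 0 || kicked);
}
```

A stress test would presumably run model_queue_wakeup() from several
waker threads against one thread cycling through model_idle_enter()
and model_idle_exit(), and call model_check_progress() at points
where the state is known to be stable. The sketch itself does not
demonstrate any race; it only shows the shape of the invariant.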

Each of the two approaches proposed above might be a significant
endeavor, and would only validate my specific hunch. So it might
be a good idea to just let a test run for a few weeks with
TIF_POLLING_NRFLAG disabled meanwhile.

Thoughts?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
