From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Chatre, Reinette" <reinette.chatre@intel.com>
Cc: "Jacob Pan" <jacob.jun.pan@linux.intel.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Ross Green" <rgkernel@gmail.com>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"John Stultz" <john.stultz@linaro.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	"dipankar@in.ibm.com" <dipankar@in.ibm.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	rostedt <rostedt@goodmis.org>,
	"David Howells" <dhowells@redhat.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Darren Hart" <dvhart@linux.intel.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Tue, 22 Mar 2016 10:40:11 -0700
Message-ID: <20160322174011.GM4287@linux.vnet.ibm.com>
In-Reply-To: <0D818C7A2259ED42912C1E04120FDE26712E27C8@ORSMSX111.amr.corp.intel.com>

On Tue, Mar 22, 2016 at 04:35:32PM +0000, Chatre, Reinette wrote:
> Hi Paul,

Hello, Reinette!

> On 2016-03-21, Paul E. McKenney wrote:
> > On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> >> On Fri, 18 Mar 2016 16:56:41 -0700
> >> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >>> On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> >>>> On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> >>>> We're seeing a similar stall (~60 seconds) on an x86 development
> >>>> system here.  Any luck tracking down the cause of this?  If not, any
> >>>> suggestions for traces that might be helpful?
> >>> 
> >>> The dmesg containing the stall, the kernel version, and the .config
> >>> would be helpful!  Working on a torture test specific to this bug...

And thank you for the .config.  Your kernel version looks to be 4.5.0.

> >> +Reinette, she has the system that can reproduce the issue. I
> >> believe she is having some other problems with it at the moment. But
> >> the .config should be available. Version is v4.5.
> > 
> > A couple of additional questions:
> > 
> > 1.	Is the test running on bare metal or virtualized?  If the
> > 	latter, what is the host?
> 
> Bare metal.

OK, you are ahead of me.  Mine is virtualized.

> > 2.	Does the workload involve CPU hotplug?
> 
> No.

Again, you are ahead of me.  Mine makes extremely heavy use of CPU hotplug.

> > 3.	Are you seeing things like this in dmesg?
> > 
> > 	"rcu_preempt kthread starved for 21033 jiffies"
> > 	"rcu_sched kthread starved for 32103 jiffies"
> > 	"rcu_bh kthread starved for 84031 jiffies"
> > 
> > 	If not, you are probably facing some other bug, and should
> > 	proceed debugging as described in Documentation/RCU/stallwarn.txt.
> 
> Below is a sample of what I see as captured with v4.5. The kernel configuration is attached.
> 
> [  135.456197] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  135.457729]  3-...: (0 ticks this GP) idle=722/0/0 softirq=5532/5532 fqs=0 
> [  135.459604]  (detected by 2, t=60004 jiffies, g=2105, c=2104, q=165)
> [  135.461318] Task dump for CPU 3:
> [  135.461321] swapper/3       R  running task        0     0      1 0x00200000
> [  135.461325]  00000078560040e5 ffff88017846fed0 ffffffff818af2cc ffff880100000000
> [  135.461330]  0000000600000003 ffff880178470000 ffff880072f32200 ffffffff822dcec0
> [  135.461334]  ffff88017846c000 ffff88017846c000 ffff88017846fee0 ffffffff818af517
> [  135.461338] Call Trace:
> [  135.461345]  [<ffffffff818af2cc>] ? cpuidle_enter_state+0xfc/0x310
> [  135.461349]  [<ffffffff818af517>] ? cpuidle_enter+0x17/0x20
> [  135.461353]  [<ffffffff811515aa>] ? call_cpuidle+0x2a/0x40
> [  135.461355]  [<ffffffff8115197d>] ? cpu_startup_entry+0x28d/0x360
> [  135.461360]  [<ffffffff8108c874>] ? start_secondary+0x114/0x140
> [  135.461365] rcu_preempt kthread starved for 60004 jiffies! g2105 c2104 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1

And yes, it looks like you are seeing the same bug that I am tracing.

The kthread is blocked in schedule_timeout_interruptible().  Given the
default configuration, this would have a three-jiffy timeout.

You set CONFIG_RCU_CPU_STALL_TIMEOUT=60, which matches the 60004 jiffies
above.  Is that value due to a distro setting or something?  Mainline
uses CONFIG_RCU_CPU_STALL_TIMEOUT=21.
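
To make the arithmetic explicit, here is a minimal sketch that pulls the
jiffies count out of the "kthread starved" line and converts it to seconds.
HZ=1000 is an assumption (the .config is not reproduced in this message),
chosen because it makes 60004 jiffies line up with a 60-second stall timeout.
The function and constant names are made up for this example.

```python
import re

# Assumption: HZ=1000, which is what makes 60004 jiffies match a
# 60-second CONFIG_RCU_CPU_STALL_TIMEOUT.  Check CONFIG_HZ in the .config.
ASSUMED_HZ = 1000

# Timeout of the grace-period kthread's sleep, in jiffies, as stated above
# for the default configuration.
FQS_TIMEOUT_JIFFIES = 3

STARVED_RE = re.compile(r"(rcu_\w+) kthread starved for (\d+) jiffies")


def parse_starved(line):
    """Return (flavor, jiffies) from a 'kthread starved' line, else None."""
    m = STARVED_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def jiffies_to_seconds(jiffies, hz=ASSUMED_HZ):
    """Convert a jiffies count to seconds for the given tick rate."""
    return jiffies / hz


line = ("[  135.461365] rcu_preempt kthread starved for 60004 jiffies! "
        "g2105 c2104 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1")
flavor, jiffies = parse_starved(line)
print(flavor, "starved for", jiffies_to_seconds(jiffies), "seconds")
# Number of three-jiffy timeouts that fit into the starvation interval.
print("timeouts that should have fired:", jiffies // FQS_TIMEOUT_JIFFIES)
```

In other words, under these assumptions the kthread slept through roughly
twenty thousand timeouts that should each have awakened it.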

> [  135.463965] rcu_preempt     S ffff88017844fd68     0     7      2 0x00000000
> [  135.463969]  ffff88017844fd68 ffff88017dd8cc80 ffff880177ff0000 ffff880178443b80
> [  135.463973]  ffff880178450000 ffff88017844fda0 ffff88017dd8cc80 ffff88017dd8cc80
> [  135.463977]  0000000000000003 ffff88017844fd80 ffffffff81ab031f 0000000100031504
> [  135.463981] Call Trace:
> [  135.463986]  [<ffffffff81ab031f>] schedule+0x3f/0xa0
> [  135.463989]  [<ffffffff81ab42d7>] schedule_timeout+0x127/0x270
> [  135.463993]  [<ffffffff81171a50>] ? detach_if_pending+0x120/0x120
> [  135.463997]  [<ffffffff8116da5d>] rcu_gp_kthread+0x6bd/0xa30
> [  135.464000]  [<ffffffff81151390>] ? wake_atomic_t_function+0x70/0x70
> [  135.464003]  [<ffffffff8116d3a0>] ? force_qs_rnp+0x1b0/0x1b0
> [  135.464006]  [<ffffffff8112f846>] kthread+0xe6/0x100
> [  135.464009]  [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190
> [  135.464012]  [<ffffffff81ab5c0f>] ret_from_fork+0x3f/0x70
> [  135.464015]  [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190

How long does it take to reproduce this?  If it reproduces in minutes
or hours, could you please boot with the following on the kernel command
line and dump the trace buffer shortly after the stall?

ftrace trace_event=sched_waking,sched_wakeup,sched_wake_idle_without_ipi

If dumping manually shortly after the stall is at all non-trivial
(for example, if your reproduction time is many minutes or hours),
I can supply some patches that automate this.  Or you can pick
them up from -rcu:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

Branch rcu/dev has these patches (and much else besides).
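
For the manual route, a minimal sketch of saving the trace buffer to a file
follows.  It assumes debugfs is mounted at /sys/kernel/debug, so that the
buffer is readable at /sys/kernel/debug/tracing/trace, and that the script
runs as root.  The helper name is made up, and this is not a substitute for
the automation patches mentioned above.

```python
import shutil

# Assumption: debugfs is mounted in the usual place.  Adjust if not.
TRACE_FILE = "/sys/kernel/debug/tracing/trace"


def dump_trace(dest, src=TRACE_FILE):
    """Copy the current ftrace buffer to dest and return dest.

    Reading the real trace file normally requires root.
    """
    with open(src, "r") as fin, open(dest, "w") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling dump_trace("rcu-stall-trace.txt") shortly after the stall message
shows up in dmesg would capture whatever is still in the buffer at that point.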

							Thanx, Paul

PS:  In case you are curious, when I enable those tracepoints, it
     shows me that the timer is firing every three jiffies, as it
     should, but that something happens between the sched_waking
     and the IPI handler that should actually do the wakeup.
     However, adding the traces significantly slows reproduction,
     so I am writing a stress test specific to this bug to try to
     speed things up, hopefully allowing more tracing to be added
     while still retaining non-geologic reproduction times.
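
     One way to look for that pattern in a dumped trace is sketched
     below: it reports each sched_waking that was never followed by a
     sched_wakeup for the same pid.  The sample lines are made up in
     the general style of ftrace text output, not taken from a real
     trace, and the function name is invented for this example.

```python
import re

# Picks out the event name and pid from ftrace text output.  The trailing
# ": " keeps sched_wakeup_new and sched_wake_idle_without_ipi from matching.
EVENT_RE = re.compile(r"\b(sched_waking|sched_wakeup): .*?\bpid=(\d+)")


def unmatched_wakings(lines):
    """Return, per pid, the last sched_waking line that no later
    sched_wakeup for that pid answered."""
    pending = {}  # pid -> most recent sched_waking line still unanswered
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        event, pid = m.group(1), int(m.group(2))
        if event == "sched_waking":
            pending[pid] = line
        else:
            # A sched_wakeup completes any pending waking for this pid.
            pending.pop(pid, None)
    return list(pending.values())


# Made-up sample lines, for illustration only.
sample = [
    "<idle>-0 [002] d.h. 135.100000: sched_waking: comm=rcu_preempt pid=7 prio=120 target_cpu=003",
    "<idle>-0 [003] dNh. 135.100010: sched_wakeup: comm=rcu_preempt pid=7 prio=120 target_cpu=003",
    "<idle>-0 [002] d.h. 135.103000: sched_waking: comm=rcu_preempt pid=7 prio=120 target_cpu=003",
]
for line in unmatched_wakings(sample):
    print(line)
```

     A sched_waking at the very end of the buffer may simply not have
     been completed yet, so any hit needs to be checked against the
     timestamps of the stall itself.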
