From: Jeffrey Hugo <jhugo@codeaurora.org>
To: paulmck@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	pprakash@codeaurora.org, Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Jens Axboe <axboe@kernel.dk>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Richard Cochran <rcochran@linutronix.de>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Richard Weinberger <richard@nod.at>
Subject: Re: [BUG] Deadlock due to interactions of block, RCU, and cpu offline
Date: Mon, 27 Mar 2017 12:02:27 -0600
Message-ID: <d23f3f77-8158-24a4-727d-123ec526dffa@codeaurora.org>
In-Reply-To: <20170326232843.GA3637@linux.vnet.ibm.com>

Hi Paul.

Thanks for the quick reply.

On 3/26/2017 5:28 PM, Paul E. McKenney wrote:
> On Sun, Mar 26, 2017 at 05:10:40PM -0600, Jeffrey Hugo wrote:

>> It is a race between this work running, and the cpu offline processing.
>
> One quick way to test this assumption is to build a kernel with Kconfig
> options CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y.  This will
> cause call_rcu_sched() to queue the work to a kthread, which can migrate
> to some other CPU.  If your analysis is correct, this should avoid
> the deadlock.  (Note that the deadlock should be fixed in any case,
> just a diagnostic assumption-check procedure.)

I enabled CONFIG_RCU_EXPERT=y, CONFIG_RCU_NOCB_CPU=y, and
CONFIG_RCU_NOCB_CPU_ALL=y in my build.  So far I've only had time for
one test run.  The issue still reproduced, although it took a fair bit
longer to do so.  An initial look at the data indicates that the work
is still not running.  One odd observation: the two threads are no
longer blocked on the same queue, but on different ones.

Let me look at this more and see what is going on now.


>> What is the opinion of the domain experts?
>
> I do hope that we can come up with a better fix.  No offense intended,
> as coming up with -any- fix in the CPU-hotplug domain is not to be
> denigrated, but this looks to be at best quite fragile.
>
> 							Thanx, Paul
>

None taken.  I'm not particularly attached to the current fix.  I agree, 
it does appear to be quite fragile.

I'm still not sure what a better solution would be, though.  Maybe the
RCU framework could somehow flush the work during CPU offline?  It would
need to ensure that no further work is queued after that point, which
seems like it might be tricky to synchronize.  I don't know enough about
the workings of RCU to even attempt to implement that.
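To make the synchronization concern a bit more concrete, here is a toy
user-space model in C.  It is only a sketch under stated assumptions: it
is not kernel code, it is not how RCU is actually structured, and every
name in it (toy_queue_cb, toy_cpu_offline, toy_run_cbs, and so on) is
hypothetical.  What it shows is one way to close the "work queued after
the flush" window: marking the CPU offline and moving its pending
callbacks happen while holding the same lock that the enqueue path
takes, so once the flush completes, nothing further can land on the
dying CPU's queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; all names are hypothetical, not kernel APIs. */
#define TOY_NR_CPUS 2
#define TOY_MAX_CBS 16

typedef void (*toy_cb_fn)(void);

struct toy_cpu_queue {
	pthread_mutex_t lock;
	bool online;
	toy_cb_fn cbs[TOY_MAX_CBS];
	size_t n;
};

static struct toy_cpu_queue toy_queues[TOY_NR_CPUS];

static void toy_init(void)
{
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		pthread_mutex_init(&toy_queues[i].lock, NULL);
		toy_queues[i].online = true;
		toy_queues[i].n = 0;
	}
}

/* Queue a callback on @cpu; fails (-1) once @cpu is marked offline. */
static int toy_queue_cb(int cpu, toy_cb_fn fn)
{
	struct toy_cpu_queue *q = &toy_queues[cpu];
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	if (q->online && q->n < TOY_MAX_CBS) {
		q->cbs[q->n++] = fn;
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/*
 * Mark @cpu offline and move its pending callbacks to @target inside one
 * critical section, so nothing can be queued on @cpu after the move.
 * Both locks are taken in index order to avoid an ABBA deadlock.
 */
static int toy_cpu_offline(int cpu, int target)
{
	struct toy_cpu_queue *src = &toy_queues[cpu];
	struct toy_cpu_queue *dst = &toy_queues[target];
	struct toy_cpu_queue *first = cpu < target ? src : dst;
	struct toy_cpu_queue *second = cpu < target ? dst : src;
	int ret = -1;

	if (cpu == target)
		return -1;
	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
	if (dst->online && dst->n + src->n <= TOY_MAX_CBS) {
		for (size_t i = 0; i < src->n; i++)
			dst->cbs[dst->n++] = src->cbs[i];
		src->n = 0;
		src->online = false;
		ret = 0;
	}
	pthread_mutex_unlock(&second->lock);
	pthread_mutex_unlock(&first->lock);
	return ret;
}

/* Drain and invoke every callback queued on @cpu; returns how many ran. */
static size_t toy_run_cbs(int cpu)
{
	struct toy_cpu_queue *q = &toy_queues[cpu];
	toy_cb_fn local[TOY_MAX_CBS];
	size_t n;

	pthread_mutex_lock(&q->lock);
	n = q->n;
	for (size_t i = 0; i < n; i++)
		local[i] = q->cbs[i];
	q->n = 0;
	pthread_mutex_unlock(&q->lock);

	for (size_t i = 0; i < n; i++)
		local[i]();	/* invoke outside the lock */
	return n;
}

/* Hypothetical example callback: just counts invocations. */
static int toy_demo_count;
static void toy_demo_cb(void)
{
	toy_demo_count++;
}
```

In this model, a caller whose toy_queue_cb() fails would have to pick
some other online CPU instead.  Deciding that safely, and fitting it
into the real CPU-hotplug state machine, is the part I suspect is
actually hard in the kernel.  Build with something like "cc -pthread".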

In any case, it seems like some more analysis is needed based on the
latest data.

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
