From: Tejun Heo <tj@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>, Oleg Nesterov <oleg@redhat.com>
Subject: Re: regression 4.4: deadlock in with cgroup percpu_rwsem
Date: Tue, 26 Jan 2016 10:28:46 -0500
Message-ID: <20160126152846.GO3628@mtj.duckdns.org>
In-Reply-To: <20160126145157.GA31177@lst.de>

Hello, Christoph.

On Tue, Jan 26, 2016 at 03:51:57PM +0100, Christoph Hellwig wrote:
> > That's interesting.  Can you please elaborate on how kill and exit
> > interact to make things complex?
> 
> That we need to first call kill to tear down the reference, then we get
> a release callback which is in the calling context of the last
> percpu_ref_put, but will need to call percpu_ref_exit from process context
> again.  This means if any percpu_ref_put is from non-process context

Hmmm... why do you need to call percpu_ref_exit() from process
context?  All it does is free the percpu counter and reset the
state, both of which can be done from any context.
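
To make the point above concrete, here is a minimal userspace sketch of the
kill / last-put / release / exit sequence. It is a toy model, not kernel
code: every toy_* name is hypothetical, a plain C11 atomic stands in for the
percpu counter, and the RCU grace period that the real percpu_ref_kill()
involves is left out. It only illustrates that, if exit merely frees the
counter storage and resets state, the release callback can call it directly
in whatever context the last put happened, with no work item in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a percpu_ref. All toy_* names are hypothetical. */
struct toy_ref {
	atomic_long count;      /* starts at 1: the initial reference */
	long *counter_storage;  /* stands in for the percpu counter memory */
	bool dead;              /* set once the ref has been killed */
	void (*release)(struct toy_ref *ref);
};

static int toy_ref_init(struct toy_ref *ref,
			void (*release)(struct toy_ref *ref))
{
	ref->counter_storage = calloc(4, sizeof(long));
	if (!ref->counter_storage)
		return -1;
	atomic_init(&ref->count, 1);
	ref->dead = false;
	ref->release = release;
	return 0;
}

static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

/* The release callback runs in the context of whoever drops the last ref. */
static void toy_ref_put(struct toy_ref *ref)
{
	if (atomic_fetch_sub(&ref->count, 1) == 1)
		ref->release(ref);
}

/* Mark the ref dead and drop the initial reference taken at init time. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->dead = true;
	toy_ref_put(ref);
}

/* Only frees the counter storage and resets state; it never sleeps. */
static void toy_ref_exit(struct toy_ref *ref)
{
	assert(atomic_load(&ref->count) == 0);
	free(ref->counter_storage);
	ref->counter_storage = NULL;
}

/* Release callback that tears down inline instead of scheduling work. */
static void toy_release(struct toy_ref *ref)
{
	toy_ref_exit(ref);
}
```

In this model a caller would take an extra reference with toy_ref_get(),
call toy_ref_kill(), and see the storage freed only when the final
toy_ref_put() runs toy_release().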

> we will always need a work_struct or similar to schedule the final
> percpu_ref_exit.  Except when..

I don't think that's true.

> > > be a percpu_ref_exit_sync that kills the ref and waits for all references
> > > to go away synchronously.
> > 
> > That shouldn't be difficult to implement.  One minor concern is that
> > it's almost guaranteed that there will be cases where the
> > synchronicity is exposed to userland.  Anyways, can you please
> > describe the use case?
> 
> We use this completion scheme where the percpu_ref_exit is done from
> the same context as the percpu_ref_kill which previously waits for
> the last reference drop.  But for these cases exposing the synchronicity
> to the caller (including userland) actually is intentional.
> 
> My use case is a new storage target, broadly similar to the SCSI target,
> which happens to exhibit the same behavior.  In that case we only want
> to return from the teardown function when all I/O on a 'queue' of sorts
> has finished, for example during module removal.

It'd most likely end up doing synchronous destruction in a loop with
each iteration involving a full RCU grace period.  If there can be a
lot of devices, it can add up to a substantial amount of time.  Maybe
it's okay here but I've already been bitten several times by the exact
same issue.
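
A back-of-the-envelope model of the concern above, as a sketch under stated
assumptions: each synchronous destruction waits for exactly one grace period,
and the grace-period length passed in is an illustrative input, not a
measurement. The second function is shown only for comparison; it assumes an
idealized teardown in which every ref is killed first so that all of them
share a single grace period, which is not something proposed in this thread.
Both function names are hypothetical.

```c
/* Hypothetical cost model: one full grace period per destroyed device. */
static unsigned long serial_teardown_ms(unsigned long nr_devices,
					unsigned long grace_period_ms)
{
	return nr_devices * grace_period_ms;
}

/*
 * Idealized comparison: all refs are killed up front and share a single
 * grace period, so the wait does not grow with the number of devices.
 */
static unsigned long batched_teardown_ms(unsigned long nr_devices,
					 unsigned long grace_period_ms)
{
	return nr_devices ? grace_period_ms : 0;
}
```

With 1000 devices and an assumed 20 ms grace period, the serial loop waits
20000 ms in total, while the idealized batched variant waits 20 ms.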

Thanks.

-- 
tejun
