From: Dennis Zhou <dennis@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>,
	cl@linux.com, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, zhouchengming@bytedance.com,
	songmuchun@bytedance.com
Subject: Re: [PATCH] percpu_ref: call wake_up_all() after percpu_ref_put() completes
Date: Fri, 8 Apr 2022 12:19:32 -0700
Message-ID: <YlCKxBufsHgexguy@fedora>
In-Reply-To: <YlBzsakUloG4nS7W@slm.duckdns.org>

On Fri, Apr 08, 2022 at 07:41:05AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 07, 2022 at 06:33:35PM +0800, Qi Zheng wrote:
> > In the percpu_ref_call_confirm_rcu(), we call the wake_up_all()
> > before calling percpu_ref_put(), which will cause the value of
> > percpu_ref to be unstable when percpu_ref_switch_to_atomic_sync()
> > returns.
> > 
> > 	CPU0				CPU1
> > 
> > percpu_ref_switch_to_atomic_sync(&ref)
> > --> percpu_ref_switch_to_atomic(&ref)
> >     --> percpu_ref_get(ref);	/* put after confirmation */
> > 	call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
> > 
> > 					percpu_ref_switch_to_atomic_rcu
> > 					--> percpu_ref_call_confirm_rcu
> > 					    --> data->confirm_switch = NULL;
> > 						wake_up_all(&percpu_ref_switch_waitq);
> > 
> >     /* here waiting to wake up */
> >     wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
> > 						(A)percpu_ref_put(ref);
> > /* The value of &ref is unstable! */
> > percpu_ref_is_zero(&ref)
> > 						(B)percpu_ref_put(ref);
> > 
> > As shown above, assuming that the counts on each cpu add up to 0 before
> > calling percpu_ref_switch_to_atomic_sync(), we expect that after switching
> > to atomic mode, percpu_ref_is_zero() can return true. But actually it will
> > return different values in the two cases of A and B, which is not what
> > we expected.
> > 
> > Maybe the original purpose of percpu_ref_switch_to_atomic_sync() is
> > just to ensure that the conversion to atomic mode is completed, but it
> > should not return with an extra reference count.
> > 
> > Calling wake_up_all() after percpu_ref_put() ensures that the value of
> > percpu_ref is stable after percpu_ref_switch_to_atomic_sync() returns.
> > So just do it.
> > 
> > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > ---
> >  lib/percpu-refcount.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> > index af9302141bcf..b11b4152c8cd 100644
> > --- a/lib/percpu-refcount.c
> > +++ b/lib/percpu-refcount.c
> > @@ -154,13 +154,14 @@ static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
> >  
> >  	data->confirm_switch(ref);
> >  	data->confirm_switch = NULL;
> > -	wake_up_all(&percpu_ref_switch_waitq);
> >  
> >  	if (!data->allow_reinit)
> >  		__percpu_ref_exit(ref);
> >  
> >  	/* drop ref from percpu_ref_switch_to_atomic() */
> >  	percpu_ref_put(ref);
> > +
> > +	wake_up_all(&percpu_ref_switch_waitq);
> 
> The interface, at least originally, doesn't give any guarantee over whether
> there's gonna be a residual reference on it or not. There's nothing
> necessarily wrong with guaranteeing that but it's rather unusual and given
> that putting the base ref in a percpu_ref is a special "kill" operation and
> a ref in percpu mode always returns %false on is_zero(), I'm not quite sure
> how such semantics would be useful. Do you care to explain the use case with
> concrete examples?

block/blk-pm.c has:
        percpu_ref_switch_to_atomic_sync(&q->q_usage_counter);
        if (percpu_ref_is_zero(&q->q_usage_counter))
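
To make the concern with that pattern concrete, here is a minimal
single-threaded userspace model of it. This is a sketch, not kernel code:
model_ref, run_callback_until() and waiter_sees_zero() are hypothetical
names, and the model assumes the only outstanding reference is the
temporary one taken by percpu_ref_switch_to_atomic(). It replays the steps
of the RCU callback in order and lets the waiter check at a chosen point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_ref {
	long count;           /* stands in for the atomic count */
	bool confirm_pending; /* stands in for data->confirm_switch != NULL */
};

enum callback_step { STEP_CLEAR_CONFIRM, STEP_PUT, STEP_DONE };

/* Replay the RCU callback up to, but not including, 'stop'. */
static void run_callback_until(struct model_ref *ref, enum callback_step stop)
{
	if (stop > STEP_CLEAR_CONFIRM)
		ref->confirm_pending = false;	/* data->confirm_switch = NULL */
	if (stop > STEP_PUT)
		ref->count--;			/* percpu_ref_put(ref) */
}

/*
 * Models the check that follows the sync switch: returns true if a waiter
 * that resumes when the callback has reached 'stop' observes a zero count.
 */
static bool waiter_sees_zero(enum callback_step stop)
{
	/* Assumption: only the switch's temporary reference is outstanding. */
	struct model_ref ref = { .count = 1, .confirm_pending = true };

	run_callback_until(&ref, stop);
	assert(!ref.confirm_pending);	/* waiter is only past wait_event() here */
	return ref.count == 0;
}
```

In this model waiter_sees_zero(STEP_PUT) is false and
waiter_sees_zero(STEP_DONE) is true, which matches cases (A) and (B) in the
diagram above: the answer depends on where the put lands relative to the
check.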

> 
> Also, the proposed patch is racy. There's nothing preventing
> percpu_ref_switch_to_atomic_sync() from waking up early between
> confirm_switch clearing and the wake_up_all, so the above change doesn't
> guarantee what it tries to guarantee. For that, you'd have to move
> confirm_switch clearing *after* percpu_ref_put() but then, you'd be
> accessing the ref after its final ref is put which can lead to
> use-after-free.
> 

Sadly, that's my bad; I missed that.
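
For reference, the window described above can be sketched with a small
userspace model of the patched ordering (clear confirm_switch, put, then
wake). It is only an illustration: patched_state and the helpers below are
hypothetical names, and it again assumes only the switch's temporary
reference is outstanding. The one kernel fact it relies on is that
wait_event() evaluates its condition before sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched callback order; not kernel code. */
struct patched_state {
	bool confirm_pending;	/* data->confirm_switch != NULL */
	long count;		/* only the switch's temporary reference */
	bool woken;		/* wake_up_all() has been called */
};

/* Patched order: clear confirm_switch, put, then wake. Run the first n steps. */
static struct patched_state patched_after_steps(int n)
{
	struct patched_state s = { .confirm_pending = true, .count = 1, .woken = false };

	assert(n >= 0 && n <= 3);
	if (n >= 1)
		s.confirm_pending = false;
	if (n >= 2)
		s.count--;
	if (n >= 3)
		s.woken = true;
	return s;
}

/*
 * wait_event() checks its condition before sleeping, so a waiter that
 * arrives late gets through on the condition alone, woken or not.
 */
static bool waiter_gets_through(const struct patched_state *s)
{
	return !s->confirm_pending;
}

/* True if a waiter can get through while the temporary ref is still held. */
static bool patched_order_has_window(void)
{
	for (int n = 0; n <= 3; n++) {
		struct patched_state s = patched_after_steps(n);

		if (waiter_gets_through(&s) && s.count != 0)
			return true;
	}
	return false;
}
```

After the first step the condition is already true while the count is still
nonzero and no wake-up has been issued, so moving wake_up_all() alone does
not close the window in this model.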

> In fact, the whole premise seems wrong. The switching needs a reference to
> the percpu_ref because it is accessing it asynchronously. The switching side
> doesn't know when the ref is gonna go away once it puts its reference and
> thus can't signal that they're done after putting their reference.
> 

I read it as two usages of percpu_ref. The first ties a lifetime to an
object; the second is just a raw reference counter, which is how md and
request_queue use it.

In the first use case, I don't think it makes any sense to call
percpu_ref_switch_to_atomic_sync(). And if you did, wouldn't going from
percpu_ref_switch_to_atomic_sync() to percpu_ref_is_zero() either be a
use-after-free or always return false?

I feel like the 2nd use case is fair game though because if you're using
percpu_ref_switch_to_atomic_*(), the lifetime of percpu_ref has to be
guaranteed outside of the kill callback.

> We *can* make that work by putting the whole thing in its own critical
> section so that we can make confirm_switch clearing atomic with the possibly
> final put, but that's gonna add some complexity and begs the question why
> we'd need such a thing.
> 
> Andrew, I don't think the patch as proposed makes much sense. Maybe it'd be
> better to keep it out of the tree for the time being?
> 
> Thanks.
> 

Thanks,
Dennis
