From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Tejun Heo <tj@kernel.org>
Cc: dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	zhouchengming@bytedance.com, songmuchun@bytedance.com
Subject: Re: [PATCH] percpu_ref: call wake_up_all() after percpu_ref_put() completes
Date: Mon, 11 Apr 2022 15:19:23 +0800
Message-ID: <e5dbcdfe-c011-cf26-07c8-71dd720eb16a@bytedance.com>
In-Reply-To: <7213fc3b-27f5-373a-0786-0ca9441b9e7e@bytedance.com>



On 2022/4/9 8:40 AM, Qi Zheng wrote:
> 
> 
> On 2022/4/9 1:41 AM, Tejun Heo wrote:
>> Hello,
>>
>> On Thu, Apr 07, 2022 at 06:33:35PM +0800, Qi Zheng wrote:
>>> In percpu_ref_call_confirm_rcu(), we call wake_up_all() before calling
>>> percpu_ref_put(), which causes the value of the percpu_ref to be
>>> unstable when percpu_ref_switch_to_atomic_sync() returns.
>>>
>>>     CPU0                CPU1
>>>
>>> percpu_ref_switch_to_atomic_sync(&ref)
>>> --> percpu_ref_switch_to_atomic(&ref)
>>>      --> percpu_ref_get(ref);    /* put after confirmation */
>>>     call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
>>>
>>>                     percpu_ref_switch_to_atomic_rcu
>>>                     --> percpu_ref_call_confirm_rcu
>>>                         --> data->confirm_switch = NULL;
>>>                         wake_up_all(&percpu_ref_switch_waitq);
>>>
>>>      /* here waiting to wake up */
>>>      wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
>>>                         (A)percpu_ref_put(ref);
>>> /* The value of &ref is unstable! */
>>> percpu_ref_is_zero(&ref)
>>>                         (B)percpu_ref_put(ref);
>>>
>>> As shown above, assuming that the counts on each cpu add up to 0 before
>>> calling percpu_ref_switch_to_atomic_sync(), we expect percpu_ref_is_zero()
>>> to return true after the switch to atomic mode. But it actually returns
>>> different values in cases A and B above, which is not what we expect.
>>>
>>> Maybe the original purpose of percpu_ref_switch_to_atomic_sync() is just
>>> to ensure that the switch to atomic mode has completed, but it should not
>>> return while an extra reference count is still held.
>>>
>>> Calling wake_up_all() after percpu_ref_put() ensures that the value of
>>> percpu_ref is stable after percpu_ref_switch_to_atomic_sync() returns.
>>> So just do it.
>>>
>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>> ---
>>>   lib/percpu-refcount.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
>>> index af9302141bcf..b11b4152c8cd 100644
>>> --- a/lib/percpu-refcount.c
>>> +++ b/lib/percpu-refcount.c
>>> @@ -154,13 +154,14 @@ static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
>>>       data->confirm_switch(ref);
>>>       data->confirm_switch = NULL;
>>> -    wake_up_all(&percpu_ref_switch_waitq);
>>>       if (!data->allow_reinit)
>>>           __percpu_ref_exit(ref);
>>>       /* drop ref from percpu_ref_switch_to_atomic() */
>>>       percpu_ref_put(ref);
>>> +
>>> +    wake_up_all(&percpu_ref_switch_waitq);
>>
>> The interface, at least originally, doesn't give any guarantee over
>> whether there's gonna be a residual reference on it or not. There's
>> nothing necessarily wrong with guaranteeing that but it's rather
>> unusual and given that putting the base ref in a percpu_ref is a
>> special "kill" operation and a ref in percpu mode always returns
>> %false on is_zero(), I'm not quite sure how such semantics would be
>> useful. Do you care to explain the use case with concrete examples?
> 
> There are currently two users of percpu_ref_switch_to_atomic_sync(), and
> both follow the pattern in the example above: one is mddev->writes_pending
> in drivers/md/md.c and the other is q->q_usage_counter in block/blk-pm.c.
> 
> The former discards the initial reference count after percpu_ref_init(),
> and the latter kills the initial reference count (by calling
> percpu_ref_kill() in blk_freeze_queue_start()) before
> percpu_ref_switch_to_atomic_sync(). Both seem to expect the percpu_ref
> to be stable when percpu_ref_switch_to_atomic_sync() returns.
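
For concreteness, the pattern both callers follow looks roughly like the
sketch below (illustrative only, not code copied from either driver;
example_quiesce() is a made-up name):

static bool example_quiesce(struct percpu_ref *ref)
{
	/* ref was set up earlier with percpu_ref_init() */

	/* drop/kill the initial reference so the count can reach zero */
	percpu_ref_kill(ref);

	/* wait for the switch to atomic mode to complete */
	percpu_ref_switch_to_atomic_sync(ref);

	/*
	 * The caller expects a stable answer here, but the temporary
	 * reference taken for the switch may not have been dropped yet,
	 * so this can still return false even though no user holds a
	 * reference anymore.
	 */
	return percpu_ref_is_zero(ref);
}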
> 
>>
>> Also, the proposed patch is racy. There's nothing preventing
>> percpu_ref_switch_to_atomic_sync() from waking up early between
>> confirm_switch clearing and the wake_up_all, so the above change doesn't
>> guarantee what it tries to guarantee. For that, you'd have to move
>> confirm_switch clearing *after* percpu_ref_put() but then, you'd be
>> accessing the ref after its final ref is put which can lead to
>> use-after-free.
>>
> 
> Oh sorry, my bad, I missed that.
> 
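
To spell out the window for the record: with the ordering in the patch
(wake_up_all() moved after percpu_ref_put()), the waiter's first check of
the wait_event() condition can already succeed between the clearing of
confirm_switch and the put, so it never needs the wakeup at all, e.g.:

    CPU0 (percpu_ref_switch_to_atomic_sync)   CPU1 (percpu_ref_call_confirm_rcu)

                                              data->confirm_switch = NULL;
    wait_event(percpu_ref_switch_waitq,
               !ref->data->confirm_switch);
    /* condition already true on the first
       check, no wakeup needed */
    percpu_ref_is_zero(&ref);   /* switch-side reference not dropped yet */
                                              percpu_ref_put(ref);
                                              wake_up_all(&percpu_ref_switch_waitq);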
>> In fact, the whole premise seems wrong. The switching needs a
>> reference to the percpu_ref because it is accessing it asynchronously.
>> The switching side doesn't know when the ref is gonna go away once it
>> puts its reference and thus can't signal that they're done after
>> putting their reference.
>>
>> We *can* make that work by putting the whole thing in its own critical
>> section so that we can make confirm_switch clearing atomic with the
>> possibly final put, but that's gonna add some complexity and begs the
>> question why we'd need such a thing.
> 
> How about moving the last percpu_ref_put() out of
> percpu_ref_switch_to_atomic_rcu() in the sync case, like below? This may
> not be very elegant, though.
> 
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index d73a1c08c3e3..07f92e7e3e19 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -98,6 +98,7 @@ struct percpu_ref_data {
>          percpu_ref_func_t       *confirm_switch;
>          bool                    force_atomic:1;
>          bool                    allow_reinit:1;
> +       bool                    sync;
>          struct rcu_head         rcu;
>          struct percpu_ref       *ref;
>   };
> @@ -123,7 +124,8 @@ int __must_check percpu_ref_init(struct percpu_ref *ref,
>                                   gfp_t gfp);
>   void percpu_ref_exit(struct percpu_ref *ref);
>   void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
> -                                percpu_ref_func_t *confirm_switch);
> +                                percpu_ref_func_t *confirm_switch,
> +                                bool sync);
>   void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref);
>   void percpu_ref_switch_to_percpu(struct percpu_ref *ref);
>   void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index af9302141bcf..2a9d777bcf35 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -99,6 +99,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
>          data->release = release;
>          data->confirm_switch = NULL;
>          data->ref = ref;
> +       data->sync = false;
>          ref->data = data;
>          return 0;
>   }
> @@ -146,21 +147,30 @@ void percpu_ref_exit(struct percpu_ref *ref)
>   }
>   EXPORT_SYMBOL_GPL(percpu_ref_exit);
> 
> +static inline void percpu_ref_switch_to_atomic_post(struct percpu_ref *ref)
> +{
> +       struct percpu_ref_data *data = ref->data;
> +
> +       if (!data->allow_reinit)
> +               __percpu_ref_exit(ref);
> +
> +       /* drop ref from percpu_ref_switch_to_atomic() */
> +       percpu_ref_put(ref);
> +}
> +
>   static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
>   {
>          struct percpu_ref_data *data = container_of(rcu,
>                          struct percpu_ref_data, rcu);
>          struct percpu_ref *ref = data->ref;
> +       bool need_put = !data->sync;
> 
>          data->confirm_switch(ref);
>          data->confirm_switch = NULL;
>          wake_up_all(&percpu_ref_switch_waitq);
> 
> -       if (!data->allow_reinit)
> -               __percpu_ref_exit(ref);
> -
> -       /* drop ref from percpu_ref_switch_to_atomic() */
> -       percpu_ref_put(ref);
> +       if (need_put)
> +               percpu_ref_switch_to_atomic_post(ref);
>   }
> 
>   static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
> @@ -302,12 +312,14 @@ static void __percpu_ref_switch_mode(struct percpu_ref *ref,
>    * switching to atomic mode, this function can be called from any context.
>    */
>   void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
> -                                percpu_ref_func_t *confirm_switch)
> +                                percpu_ref_func_t *confirm_switch,
> +                                bool sync)
>   {
>          unsigned long flags;
> 
>          spin_lock_irqsave(&percpu_ref_switch_lock, flags);
> 
> +       ref->data->sync = sync;
>          ref->data->force_atomic = true;
>          __percpu_ref_switch_mode(ref, confirm_switch);
> 
> @@ -325,8 +337,9 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic);
>    */
>   void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref)
>   {
> -       percpu_ref_switch_to_atomic(ref, NULL);
> +       percpu_ref_switch_to_atomic(ref, NULL, true);
>          wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
> +       percpu_ref_switch_to_atomic_post(ref);
>   }
>   EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync);

Hi all, is this fix OK? I can send a v2 if it looks good.

Thanks,
Qi

> 
>>
>> Andrew, I don't think the patch as proposed makes much sense. Maybe
>> it'd be better to keep it out of the tree for the time being?
>>
>> Thanks.
>>
> 

-- 
Thanks,
Qi
