All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miaohe Lin <linmiaohe@huawei.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: <akpm@linux-foundation.org>, <hannes@cmpxchg.org>,
	<mhocko@suse.com>, <iamjoonsoo.kim@lge.com>, <vbabka@suse.cz>,
	<alex.shi@linux.alibaba.com>, <willy@infradead.org>,
	<minchan@kernel.org>, <richard.weiyang@gmail.com>,
	<hughd@google.com>, <tim.c.chen@linux.intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap
Date: Tue, 13 Apr 2021 20:39:24 +0800	[thread overview]
Message-ID: <46a51c49-2887-0c1a-bcf3-e1ebe9698ebf@huawei.com> (raw)
In-Reply-To: <87zgy4ufr3.fsf@yhuang6-desk1.ccr.corp.intel.com>

On 2021/4/12 15:24, Huang, Ying wrote:
> "Huang, Ying" <ying.huang@intel.com> writes:
> 
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>
>>> We will use percpu-refcount to serialize against concurrent swapoff. This
>>> patch adds the percpu_ref support for later fixup.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  include/linux/swap.h |  2 ++
>>>  mm/swapfile.c        | 25 ++++++++++++++++++++++---
>>>  2 files changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>> index 144727041e78..849ba5265c11 100644
>>> --- a/include/linux/swap.h
>>> +++ b/include/linux/swap.h
>>> @@ -240,6 +240,7 @@ struct swap_cluster_list {
>>>   * The in-memory structure used to track swap areas.
>>>   */
>>>  struct swap_info_struct {
>>> +	struct percpu_ref users;	/* serialization against concurrent swapoff */
>>>  	unsigned long	flags;		/* SWP_USED etc: see above */
>>>  	signed short	prio;		/* swap priority of this type */
>>>  	struct plist_node list;		/* entry in swap_active_head */
>>> @@ -260,6 +261,7 @@ struct swap_info_struct {
>>>  	struct block_device *bdev;	/* swap device or bdev of swap file */
>>>  	struct file *swap_file;		/* seldom referenced */
>>>  	unsigned int old_block_size;	/* seldom referenced */
>>> +	struct completion comp;		/* seldom referenced */
>>>  #ifdef CONFIG_FRONTSWAP
>>>  	unsigned long *frontswap_map;	/* frontswap in-use, one bit per page */
>>>  	atomic_t frontswap_pages;	/* frontswap pages in-use counter */
>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>> index 149e77454e3c..724173cd7d0c 100644
>>> --- a/mm/swapfile.c
>>> +++ b/mm/swapfile.c
>>> @@ -39,6 +39,7 @@
>>>  #include <linux/export.h>
>>>  #include <linux/swap_slots.h>
>>>  #include <linux/sort.h>
>>> +#include <linux/completion.h>
>>>  
>>>  #include <asm/tlbflush.h>
>>>  #include <linux/swapops.h>
>>> @@ -511,6 +512,15 @@ static void swap_discard_work(struct work_struct *work)
>>>  	spin_unlock(&si->lock);
>>>  }
>>>  
>>> +static void swap_users_ref_free(struct percpu_ref *ref)
>>> +{
>>> +	struct swap_info_struct *si;
>>> +
>>> +	si = container_of(ref, struct swap_info_struct, users);
>>> +	complete(&si->comp);
>>> +	percpu_ref_exit(&si->users);
>>
>> Because percpu_ref_exit() is used, we cannot use percpu_ref_tryget() in
>> get_swap_device(), better to add comments there.
> 
> I just noticed that the comments of percpu_ref_tryget_live() says,
> 
>  * This function is safe to call as long as @ref is between init and exit.
> 
> While we need to call get_swap_device() almost at any time, so it's
> better to avoid to call percpu_ref_exit() at all.  This will waste some
> memory, but we need to follow the API definition to avoid potential
> issues in the long term.

I have to admit that I'am not really familiar with percpu_ref. So I read the
implementation code of the percpu_ref and found percpu_ref_tryget_live() could
be called after exit now. But you're right we need to follow the API definition
to avoid potential issues in the long term.

> 
> And we need to call percpu_ref_init() before insert the swap_info_struct
> into the swap_info[].

If we remove the call to percpu_ref_exit(), we should not use percpu_ref_init()
here because *percpu_ref->data is assumed to be NULL* in percpu_ref_init() while
this is not the case as we do not call percpu_ref_exit(). Maybe percpu_ref_reinit()
or percpu_ref_resurrect() will do the work.

One more thing, how could I distinguish the killed percpu_ref from newly allocated one?
It seems percpu_ref_is_dying is only safe to call when @ref is between init and exit.
Maybe I could do this in alloc_swap_info()?

> 
>>> +}
>>> +
>>>  static void alloc_cluster(struct swap_info_struct *si, unsigned long idx)
>>>  {
>>>  	struct swap_cluster_info *ci = si->cluster_info;
>>> @@ -2500,7 +2510,7 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
>>>  	 * Guarantee swap_map, cluster_info, etc. fields are valid
>>>  	 * between get/put_swap_device() if SWP_VALID bit is set
>>>  	 */
>>> -	synchronize_rcu();
>>> +	percpu_ref_reinit(&p->users);
>>
>> Although the effect is same, I think it's better to use
>> percpu_ref_resurrect() here to improve code readability.
> 
> Check the original commit description for commit eb085574a752 "mm, swap:
> fix race between swapoff and some swap operations" and discussion email
> thread as follows again,
> 
> https://lore.kernel.org/linux-mm/20171219053650.GB7829@linux.vnet.ibm.com/
> 
> I found that the synchronize_rcu() here is to avoid to call smp_rmb() or
> smp_load_acquire() in get_swap_device().  Now we will use
> percpu_ref_tryget_live() in get_swap_device(), so we will need to add
> the necessary memory barrier, or make sure percpu_ref_tryget_live() has
> ACQUIRE semantics.  Per my understanding, we need to change
> percpu_ref_tryget_live() for that.
> 

Do you mean the below scene is possible?

cpu1
swapon()
  ...
  percpu_ref_init
  ...
  setup_swap_info
  /* smp_store_release() is inside percpu_ref_reinit */
  percpu_ref_reinit
  ...

cpu2
get_swap_device()
  /* ignored  smp_rmb() */
  percpu_ref_tryget_live
  ...

There is indeed missing smp_rmb() in percpu_ref_tryget_live. So I think the above
scene possible and we should fix this.

>>>  	spin_lock(&swap_lock);
>>>  	spin_lock(&p->lock);
>>>  	_enable_swap_info(p);
>>> @@ -2621,11 +2631,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>>  	p->flags &= ~SWP_VALID;		/* mark swap device as invalid */
>>>  	spin_unlock(&p->lock);
>>>  	spin_unlock(&swap_lock);
>>> +
>>> +	percpu_ref_kill(&p->users);
>>>  	/*
>>>  	 * wait for swap operations protected by get/put_swap_device()
>>>  	 * to complete
>>>  	 */
>>> -	synchronize_rcu();
>>> +	wait_for_completion(&p->comp);
>>
>> Better to move percpu_ref_kill() after the comments.  And maybe revise
>> the comments.
> 
> After reading the original commit description as above, I found that we
> need synchronize_rcu() here to protect the accessing to the swap cache
> data structure.  Because there's call_rcu() during percpu_ref_kill(), it
> appears OK to keep the synchronize_rcu() here.  And we need to revise
> the comments to make it clear what is protected by which operation.
> 

Per my understanding, percpu_ref->data->release is called only after the refcnt
reaches 0, including a full grace period has elapsed or refcnt won't be 0.
wait_for_completion() is used for waiting the last refcnt being released. So
synchronize_rcu() is not necessary here?

> Best Regards,

Many thanks for precious comments! :)

> Huang, Ying
> 
> [snip]
> .
> 


  reply	other threads:[~2021-04-13 12:39 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 13:08 [PATCH 0/5] close various race windows for swap Miaohe Lin
2021-04-08 13:08 ` [PATCH 1/5] mm/swapfile: add percpu_ref support " Miaohe Lin
2021-04-12  3:30   ` Huang, Ying
2021-04-12  3:30     ` Huang, Ying
2021-04-12  6:59     ` Miaohe Lin
2021-04-12  7:24     ` Huang, Ying
2021-04-12  7:24       ` Huang, Ying
2021-04-13 12:39       ` Miaohe Lin [this message]
2021-04-14  1:17         ` Huang, Ying
2021-04-14  1:17           ` Huang, Ying
2021-04-14  1:58           ` Miaohe Lin
2021-04-14  2:06             ` Huang, Ying
2021-04-14  2:06               ` Huang, Ying
2021-04-14  3:44               ` Dennis Zhou
2021-04-14  3:59                 ` Huang, Ying
2021-04-14  3:59                   ` Huang, Ying
2021-04-14  4:05                   ` Dennis Zhou
2021-04-14  5:44                     ` Huang, Ying
2021-04-14  5:44                       ` Huang, Ying
2021-04-14 14:53                       ` Dennis Zhou
2021-04-15  3:16                         ` Miaohe Lin
2021-04-15  4:20                           ` Dennis Zhou
2021-04-15  9:17                             ` Miaohe Lin
2021-04-15  5:24                         ` Huang, Ying
2021-04-15  5:24                           ` Huang, Ying
2021-04-15 14:31                           ` Dennis Zhou
2021-04-16  0:54                             ` Huang, Ying
2021-04-16  0:54                               ` Huang, Ying
2021-04-16  2:27                             ` Miaohe Lin
2021-04-16  6:25                               ` Huang, Ying
2021-04-16  6:25                                 ` Huang, Ying
2021-04-16  8:30                                 ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 2/5] swap: fix do_swap_page() race with swapoff Miaohe Lin
2021-04-08 21:34   ` Tim Chen
2021-04-09  8:42     ` Miaohe Lin
2021-04-09 17:17       ` Tim Chen
2021-04-10  3:17         ` Miaohe Lin
2021-04-12  1:44           ` Huang, Ying
2021-04-12  1:44             ` Huang, Ying
2021-04-12  3:24             ` Miaohe Lin
2021-04-08 21:37   ` kernel test robot
2021-04-09  8:46     ` Miaohe Lin
2021-04-08 22:56   ` kernel test robot
2021-04-13  1:27   ` Huang, Ying
2021-04-13  1:27     ` Huang, Ying
2021-04-13 19:24     ` Tim Chen
2021-04-14  1:04       ` Huang, Ying
2021-04-14  1:04         ` Huang, Ying
2021-04-14  2:20         ` Miaohe Lin
2021-04-14 16:13         ` Tim Chen
2021-04-15  3:19           ` Miaohe Lin
2021-04-14  2:55     ` Miaohe Lin
2021-04-14  3:07       ` Huang, Ying
2021-04-14  3:07         ` Huang, Ying
2021-04-14  3:27         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 3/5] mm/swap_state: fix get_shadow_from_swap_cache() " Miaohe Lin
2021-04-13  1:33   ` Huang, Ying
2021-04-13  1:33     ` Huang, Ying
2021-04-14  2:42     ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 4/5] mm/swap_state: fix potential faulted in race in swap_ra_info() Miaohe Lin
2021-04-09  8:50   ` Huang, Ying
2021-04-09  8:50     ` Huang, Ying
2021-04-09  9:00     ` Miaohe Lin
2021-04-12  0:55       ` Huang, Ying
2021-04-12  0:55         ` Huang, Ying
2021-04-12  3:17         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 5/5] mm/swap_state: fix swap_cluster_readahead() race with swapoff Miaohe Lin
2021-04-13  1:36   ` Huang, Ying
2021-04-13  1:36     ` Huang, Ying
2021-04-14  2:43     ` Miaohe Lin
2021-04-08 14:55 ` [PATCH 0/5] close various race windows for swap riteshh
2021-04-09  8:01   ` Miaohe Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46a51c49-2887-0c1a-bcf3-e1ebe9698ebf@huawei.com \
    --to=linmiaohe@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.