All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miaohe Lin <linmiaohe@huawei.com>
To: Tim Chen <tim.c.chen@linux.intel.com>, <akpm@linux-foundation.org>
Cc: <hannes@cmpxchg.org>, <mhocko@suse.com>, <iamjoonsoo.kim@lge.com>,
	<vbabka@suse.cz>, <alex.shi@linux.alibaba.com>,
	<willy@infradead.org>, <minchan@kernel.org>,
	<richard.weiyang@gmail.com>, <ying.huang@intel.com>,
	<hughd@google.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>
Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff
Date: Sat, 10 Apr 2021 11:17:29 +0800	[thread overview]
Message-ID: <d2a26fb0-caef-04b7-e163-88bcda358b0d@huawei.com> (raw)
In-Reply-To: <995a130b-f07a-4771-1fe3-477d2f3c1e8e@linux.intel.com>

On 2021/4/10 1:17, Tim Chen wrote:
> 
> 
> On 4/9/21 1:42 AM, Miaohe Lin wrote:
>> On 2021/4/9 5:34, Tim Chen wrote:
>>>
>>>
>>> On 4/8/21 6:08 AM, Miaohe Lin wrote:
>>>> When I was investigating the swap code, I found the below possible race
>>>> window:
>>>>
>>>> CPU 1					CPU 2
>>>> -----					-----
>>>> do_swap_page
>>>>   synchronous swap_readpage
>>>>     alloc_page_vma
>>>> 					swapoff
>>>> 					  release swap_file, bdev, or ...
>>>
>>
>> Many thanks for quick review and reply!
>>
>>> Perhaps I'm missing something.  The release of swap_file, bdev etc
>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents
>>> if I read the swapoff code correctly.
>> Agree. Let's look this more close:
>> CPU1								CPU2
>> -----								-----
>> swap_readpage
>>   if (data_race(sis->flags & SWP_FS_OPS)) {
>> 								swapoff
>> 								  p->swap_file = NULL;
>>     struct file *swap_file = sis->swap_file;
>>     struct address_space *mapping = swap_file->f_mapping;[oops!]
>> 								  ...
>> 								  p->flags = 0;
>>     ...
>>
>> Does this make sense for you?
> 
> p->swapfile = NULL happens after the 
> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff().
> 
> So I don't think the sequence you illustrated on CPU2 is in the right order.
> That said, without get_swap_device/put_swap_device in swap_readpage, you could
> potentially blow pass synchronize_rcu() on CPU2 and causes a problem.  so I think
> the problematic race looks something like the following:
> 
> 
> CPU1								CPU2
> -----								-----
> swap_readpage
>   if (data_race(sis->flags & SWP_FS_OPS)) {
> 								swapoff
> 								  p->flags = &= ~SWP_VALID;
> 								  ..
> 								  synchronize_rcu();
> 								  ..
> 								  p->swap_file = NULL;
>     struct file *swap_file = sis->swap_file;
>     struct address_space *mapping = swap_file->f_mapping;[oops!]
> 								  ...
>     ...
> 

Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks!

> By adding get_swap_device/put_swap_device, then the race is fixed.
> 
> 
> CPU1								CPU2
> -----								-----
> swap_readpage
>   get_swap_device()
>   ..
>   if (data_race(sis->flags & SWP_FS_OPS)) {
> 								swapoff
> 								  p->flags = &= ~SWP_VALID;
> 								  ..
>     struct file *swap_file = sis->swap_file;
>     struct address_space *mapping = swap_file->f_mapping;[valid value]
>   ..
>   put_swap_device()
> 								  synchronize_rcu();
> 								  ..
> 								  p->swap_file = NULL;
> 
> 
>>
>>>>
>>>>       swap_readpage
>>>> 	check sis->flags is ok
>>>> 	  access swap_file, bdev...[oops!]
>>>> 					    si->flags = 0
>>>
>>> This happens after we clear the si->flags
>>> 					synchronize_rcu()
>>> 					release swap_file, bdev, in destroy_swap_extents()
>>>
>>> So I think if we have get_swap_device/put_swap_device in do_swap_page,
>>> it should fix the race you've pointed out here.  
>>> Then synchronize_rcu() will wait till we have completed do_swap_page and
>>> call put_swap_device.
>>
>> Right, get_swap_device/put_swap_device could fix this race. __But__ rcu_read_lock()
>> in get_swap_device() could disable preempt and do_swap_page() may take a really long
>> time because it involves I/O. It may not be acceptable to disable preempt for such a
>> long time. :(
> 
> I can see that it is not a good idea to hold rcu read lock for a long
> time over slow file I/O operation, which will be the side effect of
> introducing get/put_swap_device to swap_readpage.  So using percpu_ref
> will then be preferable for synchronization once we introduce 
> get/put_swap_device into swap_readpage.
> 

The sis->bdev should also be protected by get/put_swap_device. It has the similar
issue. And swap_slot_free_notify (called from callback end_swap_bio_read) would
race with swapoff too. So I use get/put_swap_device to protect swap_readpage until
file I/O operation is completed.

Thanks again!

> Tim
> .
> 

  reply	other threads:[~2021-04-10  3:17 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 13:08 [PATCH 0/5] close various race windows for swap Miaohe Lin
2021-04-08 13:08 ` [PATCH 1/5] mm/swapfile: add percpu_ref support " Miaohe Lin
2021-04-12  3:30   ` Huang, Ying
2021-04-12  3:30     ` Huang, Ying
2021-04-12  6:59     ` Miaohe Lin
2021-04-12  7:24     ` Huang, Ying
2021-04-12  7:24       ` Huang, Ying
2021-04-13 12:39       ` Miaohe Lin
2021-04-14  1:17         ` Huang, Ying
2021-04-14  1:17           ` Huang, Ying
2021-04-14  1:58           ` Miaohe Lin
2021-04-14  2:06             ` Huang, Ying
2021-04-14  2:06               ` Huang, Ying
2021-04-14  3:44               ` Dennis Zhou
2021-04-14  3:59                 ` Huang, Ying
2021-04-14  3:59                   ` Huang, Ying
2021-04-14  4:05                   ` Dennis Zhou
2021-04-14  5:44                     ` Huang, Ying
2021-04-14  5:44                       ` Huang, Ying
2021-04-14 14:53                       ` Dennis Zhou
2021-04-15  3:16                         ` Miaohe Lin
2021-04-15  4:20                           ` Dennis Zhou
2021-04-15  9:17                             ` Miaohe Lin
2021-04-15  5:24                         ` Huang, Ying
2021-04-15  5:24                           ` Huang, Ying
2021-04-15 14:31                           ` Dennis Zhou
2021-04-16  0:54                             ` Huang, Ying
2021-04-16  0:54                               ` Huang, Ying
2021-04-16  2:27                             ` Miaohe Lin
2021-04-16  6:25                               ` Huang, Ying
2021-04-16  6:25                                 ` Huang, Ying
2021-04-16  8:30                                 ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 2/5] swap: fix do_swap_page() race with swapoff Miaohe Lin
2021-04-08 21:34   ` Tim Chen
2021-04-09  8:42     ` Miaohe Lin
2021-04-09 17:17       ` Tim Chen
2021-04-10  3:17         ` Miaohe Lin [this message]
2021-04-12  1:44           ` Huang, Ying
2021-04-12  1:44             ` Huang, Ying
2021-04-12  3:24             ` Miaohe Lin
2021-04-08 21:37   ` kernel test robot
2021-04-09  8:46     ` Miaohe Lin
2021-04-08 22:56   ` kernel test robot
2021-04-13  1:27   ` Huang, Ying
2021-04-13  1:27     ` Huang, Ying
2021-04-13 19:24     ` Tim Chen
2021-04-14  1:04       ` Huang, Ying
2021-04-14  1:04         ` Huang, Ying
2021-04-14  2:20         ` Miaohe Lin
2021-04-14 16:13         ` Tim Chen
2021-04-15  3:19           ` Miaohe Lin
2021-04-14  2:55     ` Miaohe Lin
2021-04-14  3:07       ` Huang, Ying
2021-04-14  3:07         ` Huang, Ying
2021-04-14  3:27         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 3/5] mm/swap_state: fix get_shadow_from_swap_cache() " Miaohe Lin
2021-04-13  1:33   ` Huang, Ying
2021-04-13  1:33     ` Huang, Ying
2021-04-14  2:42     ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 4/5] mm/swap_state: fix potential faulted in race in swap_ra_info() Miaohe Lin
2021-04-09  8:50   ` Huang, Ying
2021-04-09  8:50     ` Huang, Ying
2021-04-09  9:00     ` Miaohe Lin
2021-04-12  0:55       ` Huang, Ying
2021-04-12  0:55         ` Huang, Ying
2021-04-12  3:17         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 5/5] mm/swap_state: fix swap_cluster_readahead() race with swapoff Miaohe Lin
2021-04-13  1:36   ` Huang, Ying
2021-04-13  1:36     ` Huang, Ying
2021-04-14  2:43     ` Miaohe Lin
2021-04-08 14:55 ` [PATCH 0/5] close various race windows for swap riteshh
2021-04-09  8:01   ` Miaohe Lin
2021-04-08 20:46 [PATCH 2/5] swap: fix do_swap_page() race with swapoff kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d2a26fb0-caef-04b7-e163-88bcda358b0d@huawei.com \
    --to=linmiaohe@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.