From: Yu Zhao <yuzhao@google.com>
To: "Huang, Ying" <ying.huang@intel.com>, Miaohe Lin <linmiaohe@huawei.com>
Cc: Linux-MM <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Shakeel Butt <shakeelb@google.com>,
Alex Shi <alex.shi@linux.alibaba.com>,
Minchan Kim <minchan@kernel.org>
Subject: Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage
Date: Mon, 29 Mar 2021 23:47:13 -0600
Message-ID: <CAOUHufYMjhQagKfdjekxr62bsB83UJvArddUsuwTXoo-5jA45A@mail.gmail.com>
In-Reply-To: <87h7kt9ufw.fsf@yhuang6-desk1.ccr.corp.intel.com>
On Mon, Mar 29, 2021 at 9:44 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Miaohe Lin <linmiaohe@huawei.com> writes:
>
> > On 2021/3/30 9:57, Huang, Ying wrote:
> >> Hi, Miaohe,
> >>
> >> Miaohe Lin <linmiaohe@huawei.com> writes:
> >>
> >>> Hi all,
> > >>> I am investigating the swap code, and I found the following possible race window:
> >>>
> > >>> CPU 1                                   CPU 2
> > >>> -----                                   -----
> > >>> do_swap_page
> > >>>   skip swapcache case (synchronous swap_readpage)
> > >>>   alloc_page_vma
> > >>>                                         swapoff
> > >>>                                           release swap_file, bdev, or ...
> > >>>   swap_readpage
> > >>>     check sis->flags is ok
> > >>>       access swap_file, bdev or ...[oops!]
> > >>>                                           si->flags = 0
> >>>
> > >>> The swapcache case is OK because swapoff will wait on the page lock of the swapcache page.
> > >>> Can this really happen, or am I missing something?
> > >>> Any reply would be greatly appreciated. Thanks! :)
> >>
> >> This appears possible. Even for the swapcache case, we can't guarantee that the
> >
> > Many thanks for the reply!
> >
> >> swap entry obtained from the page table is always valid either. The
> >
> > The page table may change at any time, so we may do some useless work.
> > But the pte_same() check can handle these races correctly as long as they
> > do not result in an oops.
> >
> >> underlying swap device can be swapped off at the same time. So we use
> >> get/put_swap_device() for that. Maybe we need something similar here.
> >
> > Using get/put_swap_device() to guard swap_readpage() against swapoff sounds
> > really bad, as swap_readpage() may take a really long time. Also, such a race may
> > not be very harmful, because swapoff is usually done only at system shutdown.
> > I cannot figure out a simple and robust way to fix this. Any suggestions, or
> > could anyone help get rid of this race?
>
> Some reference counting on the swap device can prevent the swap device from
> being swapped off. To minimize the performance overhead on the hot path,
> it appears we can use percpu_ref.
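Huang's percpu_ref suggestion can be illustrated with a minimal userspace sketch. It assumes a single C11 atomic counter as a stand-in for percpu_ref. The kernel's percpu_ref_tryget_live(), percpu_ref_put(), and percpu_ref_kill() keep the get/put fast path per-CPU, and this model does not reproduce that. Every model_* name and the file_released field are hypothetical. In terms of the race diagram above, the reader pins the device before checking it. Swapoff first stops admitting new readers, then waits for existing pins to drop before it tears down swap_file/bdev.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: one shared atomic
 * counter stands in for percpu_ref, and "file_released" stands in for
 * swapoff tearing down swap_file/bdev.
 */
struct model_swap_device {
	atomic_bool live;   /* cleared when swapoff starts (cf. percpu_ref_kill) */
	atomic_long users;  /* readers currently using the device */
	bool file_released; /* set once teardown is safe */
};

static void model_swapon(struct model_swap_device *d)
{
	atomic_init(&d->live, true);
	atomic_init(&d->users, 0);
	d->file_released = false;
}

/* Reader side (cf. percpu_ref_tryget_live): pin the device or fail. */
static bool model_tryget(struct model_swap_device *d)
{
	/*
	 * Publish the reference before checking liveness. With the default
	 * seq_cst ordering, either swapoff's later load of users observes
	 * this increment, or our load below observes live == false.
	 */
	atomic_fetch_add(&d->users, 1);
	if (!atomic_load(&d->live)) {
		atomic_fetch_sub(&d->users, 1);
		return false;
	}
	return true;
}

/* Reader side (cf. percpu_ref_put): drop the pin after the I/O. */
static void model_put(struct model_swap_device *d)
{
	long old = atomic_fetch_sub(&d->users, 1);

	assert(old > 0);
	(void)old;
}

/* swapoff step 1: refuse new readers. */
static void model_swapoff_kill(struct model_swap_device *d)
{
	atomic_store(&d->live, false);
}

/* swapoff step 2: release resources only when no reader holds a pin. */
static bool model_swapoff_try_release(struct model_swap_device *d)
{
	if (atomic_load(&d->users) != 0)
		return false;
	d->file_released = true;
	return true;
}

/* Blocking swapoff: kill, then spin until in-flight readers drain. */
static void model_swapoff(struct model_swap_device *d)
{
	model_swapoff_kill(d);
	while (!model_swapoff_try_release(d))
		sched_yield();
}
```

For example, a reader that called model_tryget() before model_swapoff_kill() keeps model_swapoff_try_release() returning false until it calls model_put(). A reader arriving after the kill is refused and never touches the released file. In this model, a long swap_readpage() only delays swapoff, which is the cost Miaohe raised. Because swapoff is rare, moving that cost to the swapoff side and away from the fault path is the appeal of percpu_ref. A real implementation would presumably sleep on a completion instead of spinning with sched_yield().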
Hi,

I've been seeing crashes when testing the latest kernels with:

  stress-ng --class vm -a 20 -t 600s --temp-path /tmp

I haven't had time to look into them yet:
DEBUG_VM:
BUG: unable to handle page fault for address: ffff905c33c9a000
Call Trace:
get_swap_pages+0x278/0x590
get_swap_page+0x1ab/0x280
add_to_swap+0x7d/0x130
shrink_page_list+0xf84/0x25f0
reclaim_pages+0x313/0x430
madvise_cold_or_pageout_pte_range+0x95c/0xaa0
KASAN:
==================================================================
BUG: KASAN: slab-out-of-bounds in __frontswap_store+0xc9/0x2e0
Read of size 8 at addr ffff88901f646f18 by task stress-ng-mrema/31329
CPU: 2 PID: 31329 Comm: stress-ng-mrema Tainted: G S I L 5.12.0-smp-DEV #2
Call Trace:
dump_stack+0xff/0x165
print_address_description+0x81/0x390
__kasan_report+0x154/0x1b0
? __frontswap_store+0xc9/0x2e0
? __frontswap_store+0xc9/0x2e0
kasan_report+0x47/0x60
kasan_check_range+0x2f3/0x340
__kasan_check_read+0x11/0x20
__frontswap_store+0xc9/0x2e0
swap_writepage+0x52/0x80
pageout+0x489/0x7f0
shrink_page_list+0x1b11/0x2c90
reclaim_pages+0x6ca/0x930
madvise_cold_or_pageout_pte_range+0x1260/0x13a0
Allocated by task 16813:
____kasan_kmalloc+0xb0/0xe0
__kasan_kmalloc+0x9/0x10
__kmalloc_node+0x52/0x70
kvmalloc_node+0x50/0x90
__se_sys_swapon+0x353a/0x4860
__x64_sys_swapon+0x5b/0x70
The buggy address belongs to the object at ffff88901f640000
which belongs to the cache kmalloc-32k of size 32768
The buggy address is located 28440 bytes inside of
32768-byte region [ffff88901f640000, ffff88901f648000)
The buggy address belongs to the page:
page:0000000032d23e33 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101f640
head:0000000032d23e33 order:4 compound_mapcount:0 compound_pincount:0
flags: 0x400000000010200(slab|head)
raw: 0400000000010200 ffffea00062b8408 ffffea000a6e9008 ffff888100040300
raw: 0000000000000000 ffff88901f640000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff88901f646e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88901f646e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88901f646f00: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
                            ^
 ffff88901f646f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88901f647000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Relevant config options I could think of:
CONFIG_MEMCG_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_ZSWAP=y
Thread overview: 9+ messages
2021-03-29 13:18 [Question] Is there a race window between swapoff vs synchronous swap_readpage Miaohe Lin
2021-03-30 1:57 ` Huang, Ying
2021-03-30 3:15 ` Miaohe Lin
2021-03-30 3:44 ` Huang, Ying
2021-03-30 5:47 ` Yu Zhao [this message]
2021-03-30 6:57 ` Huang, Ying
2021-03-30 7:27 ` Yu Zhao
2021-04-12 3:12 ` Miaohe Lin
2021-03-30 11:21 ` Miaohe Lin