From: "Huang, Ying" <ying.huang@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Yang Shi <shy828301@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Miaohe Lin <linmiaohe@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH] mm,shmem: Fix a typo in shmem_swapin_page()
Date: Tue, 03 Aug 2021 16:14:38 +0800
Message-ID: <8735rr54i9.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <YPs6cQo7iG1JcOn8@casper.infradead.org> (Matthew Wilcox's message of "Fri, 23 Jul 2021 22:53:53 +0100")

Matthew Wilcox <willy@infradead.org> writes:

> On Fri, Jul 23, 2021 at 01:23:07PM -0700, Hugh Dickins wrote:
>> I was wary because, if the (never observed) race to be fixed is in
>> swap_cluster_readahead(), why was shmem_swapin_page() being patched?
>> Not explained in its commit message, probably a misunderstanding of
>> how mm/shmem.c already manages races (and prefers not to be involved
>> in swap_info_struct stuff).
>>
>> But why do I now say it's bad? Because even if you correct the EINVAL
>> to -EINVAL, that's an unexpected error: -EEXIST is common, -ENOMEM is
>> not surprising, -ENOSPC can need consideration, but -EIO and anything
>> else just end up as SIGBUS when faulting (or as error from syscall).
>> So, 2efa33fc7f6e converts a race with swapoff to SIGBUS: not good,
>> and I think much more likely than the race to be fixed (since
>> swapoff's percpu_ref_kill() rightly comes before synchronize_rcu()).
>
> Yes, I think a lot more thought was needed here. And I would have
> preferred to start with a reproducer instead of "hey, this could
> happen". Maybe something like booting a 1GB VM, adding two 2GB swap
> partitions, swapon(partition A); run a 2GB memhog and then
>
> loop:
> swapon(part B);
> swapoff(part A);
> swapon(part A);
> swapoff(part B);
>
> to make this happen.
>
> but if it does happen, why would returning EINVAL be the right thing
> to do? We've swapped it out. It must be on swap somewhere, or we've
> really messed up. So I could see there being a race where we get
> preempted between looking up the swap entry and calling get_swap_device().
> But if that does happen, then the page gets brought in, and potentially
> reswapped to the other swap device.
>
> So returning -EEXIST here would actually work. That forces a re-lookup
> in the page cache, so we'll get the new swap entry that tells us which
> swap device the page is now on.

Yes, -EEXIST is the right error code. We already use it in
shmem_swapin_page() to deal with race conditions like this one.

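To make the retry behaviour concrete, here is a minimal user-space sketch
of the "return -EEXIST, then look the entry up again" pattern. It is only a
model under stated assumptions, not the mm/shmem.c code: struct slot,
fake_swapin() and fake_getpage() are hypothetical names, and the swapoff
race is simulated with a flag.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one page-cache slot that holds a swap entry. */
struct slot {
	unsigned long entry;     /* swap entry currently stored in the slot */
	int device_gone;         /* nonzero: swapoff raced with the lookup */
	unsigned long new_entry; /* entry the page was reswapped to */
};

/* Model of the swapin step: a stale entry yields -EEXIST, never -EINVAL. */
static int fake_swapin(struct slot *s, unsigned long looked_up)
{
	if (s->device_gone) {
		/* The page was brought in and reswapped to another device. */
		s->entry = s->new_entry;
		s->device_gone = 0;
		return -EEXIST;
	}
	if (looked_up != s->entry)
		return -EEXIST;
	return 0;
}

/*
 * Model of the caller: on -EEXIST, look the entry up again and retry.
 * Returns the number of attempts on success, or a negative errno.
 */
static int fake_getpage(struct slot *s)
{
	int attempts = 0;
	int err;

	assert(s);
	do {
		unsigned long looked_up = s->entry; /* (re-)lookup */

		attempts++;
		err = fake_swapin(s, looked_up);
	} while (err == -EEXIST && attempts < 8); /* arbitrary retry cap */

	return err ? err : attempts;
}
```

In this model a slot with device_gone set succeeds on the second attempt
with the new entry, instead of surfacing an error that would end up as
SIGBUS.
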
> But I REALLY REALLY REALLY want a reproducer. Right now, I have a hard
> time believing this, or any of the other races can really happen.

I think the race is only theoretical too. Firstly, swapoff is a rare
operation in practice; secondly, the race window is really small.

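For anyone who wants to try the loop suggested above, a rough sketch using
swapon(2) and swapoff(2) follows. It assumes partition A is already active
as swap and a memhog is running; toggle_swap() and its dry_run parameter
are hypothetical, and the partition paths are whatever the test VM uses. I
have not verified that this actually triggers the race.

```c
#include <errno.h>
#include <sys/swap.h>

/* Thin wrappers so the loop can be exercised without touching swap. */
static int do_swapon(const char *path, int dry_run)
{
	return dry_run ? 0 : swapon(path, 0);
}

static int do_swapoff(const char *path, int dry_run)
{
	return dry_run ? 0 : swapoff(path);
}

/*
 * Repeat: swapon(B); swapoff(A); swapon(A); swapoff(B).
 * Returns the number of calls made, or a negative errno on the first
 * failure. Needs CAP_SYS_ADMIN unless dry_run is nonzero.
 */
static int toggle_swap(const char *part_a, const char *part_b,
		       int iterations, int dry_run)
{
	int calls = 0;

	for (int i = 0; i < iterations; i++) {
		if (do_swapon(part_b, dry_run))
			return -errno;
		calls++;
		if (do_swapoff(part_a, dry_run))
			return -errno;
		calls++;
		if (do_swapon(part_a, dry_run))
			return -errno;
		calls++;
		if (do_swapoff(part_b, dry_run))
			return -errno;
		calls++;
	}
	return calls;
}
```

Each swapoff has to bring the swapped-out pages back in (and, under memory
pressure, push them out to the other device), which is the window the race
would need.
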
Best Regards,
Huang, Ying
>> 2efa33fc7f6e was intending to fix a race introduced by two-year-old
>> 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested
>> or not"), which added a call to inode_read_congested(). Certainly
>> relying on si->swap_file->f_mapping->host there was new territory:
>> whether actually racy I'm not sure offhand - I've forgotten whether
>> synchronize_rcu() waits for preempted tasks or not.
>>
>> But if it is racy, then I wonder if the right fix might be to revert
>> 8fd2e0b505d1 too. Convincing numbers were offered for it, but I'm
>> puzzled: because Matthew has in the past noted that the block layer
>> broke and further broke bdi congestion tracking (I don't know the
>> relevant release numbers), so I don't understand how checking
>> inode_read_congested() is actually useful there nowadays.
>
> It might be useful for NFS? I don't think congestion is broken there
> (except how does the NFS client have any idea whether the server is
> congested or not?)