From: Rafael Aquini <aquini@redhat.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
Date: Thu, 24 Sep 2020 02:30:38 -0400
Message-ID: <20200924063038.GD1023012@optiplex-lnx>
In-Reply-To: <877dsjessq.fsf@yhuang-dev.intel.com>
On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini <aquini@redhat.com> writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change cluster_info->flags.
>
> I don't think so. We shouldn't run into this situation in the first place. So the
> "fix" hides the real bug instead of fixing it. Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>
Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned:
split_huge_page_to_list() asserts (in debug builds) that *page is locked,
and later checks whether *head bears the SwapCache flag.
deferred_split_scan(), OTOH, doesn't hand down the compound head locked,
but the 2nd page in the group instead.
This doesn't necessarily mean it's a problem, though, but it might help
in hitting the issue.
> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split scan
> > can transiently race with swapcache insertion, and the fact that when you run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL because
> > that's one possible case, and it clearly fails to do so.
>
> If there's a race, we should fix the race. But the code path for
> swapcache insertion is,
>
> add_to_swap()
>   get_swap_page() /* Return if fails to allocate */
>   add_to_swap_cache()
>     SetPageSwapCache()
>
> While the code path to split THP is,
>
> split_huge_page_to_list()
>   if PageSwapCache()
>     split_swap_cluster()
>
> Both code paths are protected by the page lock. So there should be some
> other reasons to trigger the bug.
As mentioned above, no, they don't seem to be protected (at least, not on
the same page, depending on the case). While add_to_swap() ensures the page
lock is held on the compound head, split_huge_page_to_list() does not.
> And again, for HDD, a THP shouldn't have PageSwapCache() set in the
> first place. If so, the bug is that the flag is set and we should fix
> the setting.
>
I fail to follow your claim here. Where is the guarantee, in the code, that
you'll never have a compound head in the swapcache?
> > Run a workload that causes multiple THP COW, and add a memory hogger to create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
>
> Do you have a script to reproduce the bug?
>
Nope. A convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANA are seeing it
occasionally.
> > Regardless of whether you find further bugs or not, this patch is needed to
> > correct a blunt coding mistake.
>
> As above. I don't agree with that.
>
It's OK to disagree; split_swap_cluster() still misses the cluster_info NULL
check, though.