linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: khugepaged: check if file page is on LRU after locking page
Date: Wed, 15 Sep 2021 16:10:18 -0700	[thread overview]
Message-ID: <CAHbLzkrpWF=WXsn20-1oeRGch1L-HPAAyNXZpojC+RXHopFYfw@mail.gmail.com> (raw)
In-Reply-To: <CAHbLzkoyEcKMwg04SRWtWaMZCO3HLpP2BA2_kv3ASuGN6=tE2Q@mail.gmail.com>

On Wed, Sep 15, 2021 at 4:00 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Sep 15, 2021 at 10:48 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Sep 15, 2021 at 4:49 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 11:37:16AM -0700, Yang Shi wrote:
> > > > The khugepaged does check if the page is on LRU or not but it doesn't
> > > > hold page lock.  And it doesn't check this again after holding page
> > > > lock.  So it may race with some others, e.g. reclaimer, migration, etc.
> > > > All of them isolates page from LRU then lock the page then do something.
> > > >
> > > > But it could pass the refcount check done by khugepaged to proceed
> > > > collapse.  Typically such race is not fatal.  But if the page has been
> > > > isolated from LRU before khugepaged it likely means the page may be not
> > > > suitable for collapse for now.
> > > >
> > > > The other more fatal case is the following patch will keep the poisoned
> > > > page in page cache for shmem, so khugepaged may collapse a poisoned page
> > > > since the refcount check could pass.  3 refcounts come from:
> > > >   - hwpoison
> > > >   - page cache
> > > >   - khugepaged
> > > >
> > > > Since it is not on LRU so no refcount is incremented from LRU isolation.
> > > >
> > > > This is definitely not expected.  Checking if it is on LRU or not after
> > > > holding page lock could help serialize against hwpoison handler.
> > > >
> > > > But there is still a small race window between setting hwpoison flag and
> > > > bump refcount in hwpoison handler.  It could be closed by checking
> > > > hwpoison flag in khugepaged, however this race seems unlikely to happen
> > > > in real life workload.  So just check LRU flag for now to avoid
> > > > over-engineering.
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  mm/khugepaged.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 045cc579f724..bdc161dc27dc 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -1808,6 +1808,12 @@ static void collapse_file(struct mm_struct *mm,
> > > >                       goto out_unlock;
> > > >               }
> > > >
> > > > +             /* The hwpoisoned page is off LRU but in page cache */
> > > > +             if (!PageLRU(page)) {
> > > > +                     result = SCAN_PAGE_LRU;
> > > > +                     goto out_unlock;
> > > > +             }
> > > > +
> > > >               if (isolate_lru_page(page)) {
> > >
> > > isolate_lru_page() should catch the case, no? TestClearPageLRU would fail
> > > and we get here.
> >
> > Hmm... you are definitely right. How could I miss this point.
> >
> > It might be because of I messed up the page state by some tests which
> > may do hole punch then reread the same index. That could drop the
> > poisoned page then collapse succeed. But I'm not sure. Anyway I didn't
> > figure out how the poisoned page could be collapsed. It seems
> > impossible. I will drop this patch.
>
> I think I figured out the problem. This problem happened after the
> page cache split patch and if the hwpoisoned page is not head page. It
> is because THP split will unfreeze the refcount of tail pages to 2
> (restore refcount from page cache) then dec refcount to 1. The
> refcount pin from hwpoison is gone and it is still on LRU. Then
> khugepged locked the page before hwpoison, the refcount is expected to
> khugepaged.
>
> The worse thing is it seems this problem is applicable to anonymous
> page too. Once the anonymous THP is split by hwpoison the pin from
> hwpoison is gone too the refcount is 1 (comes from PTE map). Then
> khugepaged could collapse it to huge page again. It may incur data
> corruption.
>
> And the poisoned page may be freed back to buddy since the lost refcount pin.
>
> If the poisoned page is head page, the code is fine since hwpoison
> doesn't put the refcount for head page after split.
>
> The fix is simple, just keep the refcount pin for hwpoisoned subpage.

Err... wait... I just realized I missed the below code block:

if (subpage == page)
        continue;

It skips the subpage passed to split_huge_page() so the refcount pin
from the caller for this subpage is kept. And hwpoison doesn't put it.
So it seems fine.

>
> >
> > >
> > > >                       result = SCAN_DEL_PAGE_LRU;
> > > >                       goto out_unlock;
> > > > --
> > > > 2.26.2
> > > >
> > > >
> > >
> > > --
> > >  Kirill A. Shutemov


  reply	other threads:[~2021-09-15 23:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-14 18:37 [RFC PATCH 0/4] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-09-14 18:37 ` [PATCH 1/4] mm: filemap: check if any subpage is hwpoisoned for PMD page fault Yang Shi
2021-09-15 11:46   ` Kirill A. Shutemov
2021-09-15 17:28     ` Yang Shi
2021-09-14 18:37 ` [PATCH 2/4] mm: khugepaged: check if file page is on LRU after locking page Yang Shi
2021-09-15 11:49   ` Kirill A. Shutemov
2021-09-15 17:48     ` Yang Shi
2021-09-15 23:00       ` Yang Shi
2021-09-15 23:10         ` Yang Shi [this message]
2021-09-14 18:37 ` [PATCH 3/4] mm: shmem: don't truncate page if memory failure happens Yang Shi
2021-09-21  9:49   ` Naoya Horiguchi
2021-09-21 19:34     ` Yang Shi
2021-09-14 18:37 ` [PATCH 4/4] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
2021-09-21  9:50   ` Naoya Horiguchi
2021-09-21 19:46     ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHbLzkrpWF=WXsn20-1oeRGch1L-HPAAyNXZpojC+RXHopFYfw@mail.gmail.com' \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).