linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Peter Xu" <peterx@redhat.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [v2 PATCH 1/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
Date: Thu, 23 Sep 2021 10:15:45 -0700	[thread overview]
Message-ID: <CAHbLzkqb-6a7c=C8WF0G0X2yCey=t7OoL-oW2Y0CpM0MpgJbBg@mail.gmail.com> (raw)
In-Reply-To: <20210923143901.mdc6rejuh7hmr5vh@box.shutemov.name>

On Thu, Sep 23, 2021 at 7:39 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Wed, Sep 22, 2021 at 08:28:26PM -0700, Yang Shi wrote:
> > When handling shmem page fault the THP with corrupted subpage could be PMD
> > mapped if certain conditions are satisfied.  But kernel is supposed to
> > send SIGBUS when trying to map hwpoisoned page.
> >
> > There are two paths which may do PMD map: fault around and regular fault.
> >
> > Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> > the thing was even worse in fault around path.  The THP could be PMD mapped as
> > long as the VMA fits regardless what subpage is accessed and corrupted.  After
> > this commit as long as head page is not corrupted the THP could be PMD mapped.
> >
> > In the regulat fault path the THP could be PMD mapped as long as the corrupted
>
> s/regulat/regular/
>
> > page is not accessed and the VMA fits.
> >
> > This loophole could be fixed by iterating every subpage to check if any
> > of them is hwpoisoned or not, but it is somewhat costly in page fault path.
> >
> > So introduce a new page flag called HasHWPoisoned on the first tail page.  It
> > indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
> > is found hwpoisoned by memory failure and cleared when the THP is freed or
> > split.
> >
> > Cc: <stable@vger.kernel.org>
> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
>
> ...
>
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index dae481293b5d..740b7afe159a 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3195,12 +3195,14 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> >       }
> >
> >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > -         if (!ret) {
> > -                 /* The page is mapped successfully, reference consumed. */
> > -                 unlock_page(page);
> > -                 return true;
> > -         }
> > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > +             if (ret == VM_FAULT_FALLBACK)
> > +                     goto out;
>
> Hm.. What? I don't get it. Who will establish page table in the pmd then?

Aha, yeah. It should jump to the below PMD populate section. Will fix
it in the next version.

>
> > +             if (!ret) {
> > +                     /* The page is mapped successfully, reference consumed. */
> > +                     unlock_page(page);
> > +                     return true;
> > +             }
> >       }
> >
> >       if (pmd_none(*vmf->pmd)) {
> > @@ -3220,6 +3222,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> >               return true;
> >       }
> >
> > +out:
> >       return false;
> >  }
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 5e9ef0fc261e..0574b1613714 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2426,6 +2426,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >       /* lock lru list/PageCompound, ref frozen by page_ref_freeze */
> >       lruvec = lock_page_lruvec(head);
> >
> > +     ClearPageHasHWPoisoned(head);
> > +
>
> Do we serialize the new flag with lock_page() or what? I mean what
> prevents the flag being set again after this point, but before
> ClearPageCompound()?

No, not in this patch. But I think we could use refcount. THP split
would freeze refcount and the split is guaranteed to succeed after
that point, so refcount can be checked in memory failure. The
SetPageHasHWPoisoned() call could be moved to __get_hwpoison_page()
when get_unless_page_zero() bumps the refcount successfully. If the
refcount is zero it means the THP is under split or being freed, we
don't care about these two cases.

The THP might be mapped before this flag is set, but the process will
be killed later, so it seems fine.

>
> --
>  Kirill A. Shutemov

  reply	other threads:[~2021-09-23 17:16 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-23  3:28 [RFC v2 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-09-23  3:28 ` [v2 PATCH 1/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
2021-09-23 14:39   ` Kirill A. Shutemov
2021-09-23 17:15     ` Yang Shi [this message]
2021-09-23 20:39       ` Yang Shi
2021-09-24  9:26         ` Kirill A. Shutemov
2021-09-24 16:44           ` Yang Shi
2021-09-23  3:28 ` [v2 PATCH 2/5] mm: hwpoison: refactor refcount check handling Yang Shi
2021-09-23  3:28 ` [v2 PATCH 3/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
2021-09-23  3:28 ` [v2 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
2021-09-23  3:28 ` [v2 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHbLzkqb-6a7c=C8WF0G0X2yCey=t7OoL-oW2Y0CpM0MpgJbBg@mail.gmail.com' \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).