linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* File THP and HWPoison
@ 2021-03-16 14:09 Matthew Wilcox
  2021-03-16 16:36 ` Andi Kleen
  2021-03-18 14:08 ` Kirill A. Shutemov
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-03-16 14:09 UTC (permalink / raw)
  To: linux-mm, Kirill A. Shutemov, Hugh Dickins, Andi Kleen

If we get a memory failure in the middle of a file THP, I think we handle
it poorly.

int memory_failure(unsigned long pfn, int flags)
...
        if (TestSetPageHWPoison(p)) {
...
        orig_head = hpage = compound_head(p);
...
        if (PageTransHuge(hpage)) {
                if (try_to_split_thp_page(p, "Memory Failure") < 0) {
                        action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
                        return -EBUSY;
                }

static int try_to_split_thp_page(struct page *page, const char *msg)
{
        lock_page(page);
        if (!PageAnon(page) || unlikely(split_huge_page(page))) {
                unsigned long pfn = page_to_pfn(page);

                unlock_page(page);
                if (!PageAnon(page))
                        pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
                else
                        pr_info("%s: %#lx: thp split failed\n", msg, pfn);
                put_page(page);
                return -EBUSY;

So (for some reason) we don't even try to split a file THP.  But then,
if we take a page fault on a file THP:

static struct page *next_uptodate_page(struct page *page,
...
                if (PageHWPoison(page))
                        goto skip;
(... but we're only testing the head page here, which isn't necessarily
the one which got the error ...)

        if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
            vm_fault_t ret = do_set_pmd(vmf, page);

So we now map the PMD-sized page into userspace, even though it has a
HWPoison in it.

I think there are two things that we should be doing:

1. Attempt to split THPs which are file-backed.  That makes most of this
problem disappear because there won't be THPs with HWPoison, mostly.

2. When the THP fails to split, use a spare page flag to indicate that
the THP contains a HWPoison bit in one of its subpages.  There are a
lot of PF_SECOND flags available for this purpose.

but I know almost nothing about the memory-failure subsystem and I'm
still learning all the complexities of THPs, so it's entirely possible
I've overlooked something important.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-16 14:09 File THP and HWPoison Matthew Wilcox
@ 2021-03-16 16:36 ` Andi Kleen
  2021-03-16 16:52   ` Matthew Wilcox
  2021-03-18 14:08 ` Kirill A. Shutemov
  1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2021-03-16 16:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Kirill A. Shutemov, Hugh Dickins

> So we now map the PMD-sized page into userspace, even though it has a
> HWPoison in it.

Yes that's bad. It might panic the system.

> 
> I think there are two things that we should be doing:
> 
> 1. Attempt to split THPs which are file-backed.  That makes most of this
> problem disappear because there won't be THPs with HWPoison, mostly.

That seems simple and sensible.

> 2. When the THP fails to split, use a spare page flag to indicate that
> the THP contains a HWPoison bit in one of its subpages.  There are a
> lot of PF_SECOND flags available for this purpose.

Why should it fail? I thought splitting always succeeds.

-Andi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-16 16:36 ` Andi Kleen
@ 2021-03-16 16:52   ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-03-16 16:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, Kirill A. Shutemov, Hugh Dickins

On Tue, Mar 16, 2021 at 09:36:49AM -0700, Andi Kleen wrote:
> > So we now map the PMD-sized page into userspace, even though it has a
> > HWPoison in it.
> 
> Yes that's bad. It might panic the system.
> 
> > 
> > I think there are two things that we should be doing:
> > 
> > 1. Attempt to split THPs which are file-backed.  That makes most of this
> > problem disappear because there won't be THPs with HWPoison, mostly.
> 
> That seems simple and sensible.
> 
> > 2. When the THP fails to split, use a spare page flag to indicate that
> > the THP contains a HWPoison bit in one of its subpages.  There are a
> > lot of PF_SECOND flags available for this purpose.
> 
> Why should it fail? I thought splitting always succeeds.

If somebody has a temporary reference to it, splitting will fail.
eg a process is currently doing a read() from it, so filemap_read()
has a reference on the page.  This is why THPs support a delayed split
(currently used under memory pressure, but we could do something using
that delayed list to schedule the page for splitting later if it has
HWPoison).


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-16 14:09 File THP and HWPoison Matthew Wilcox
  2021-03-16 16:36 ` Andi Kleen
@ 2021-03-18 14:08 ` Kirill A. Shutemov
  2021-03-18 14:57   ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2021-03-18 14:08 UTC (permalink / raw)
  To: Matthew Wilcox, Naoya Horiguchi
  Cc: linux-mm, Kirill A. Shutemov, Hugh Dickins, Andi Kleen

On Tue, Mar 16, 2021 at 02:09:47PM +0000, Matthew Wilcox wrote:
> If we get a memory failure in the middle of a file THP, I think we handle
> it poorly.
> 
> int memory_failure(unsigned long pfn, int flags)
> ...
>         if (TestSetPageHWPoison(p)) {
> ...
>         orig_head = hpage = compound_head(p);
> ...
>         if (PageTransHuge(hpage)) {
>                 if (try_to_split_thp_page(p, "Memory Failure") < 0) {
>                         action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>                         return -EBUSY;
>                 }
> 
> static int try_to_split_thp_page(struct page *page, const char *msg)
> {
>         lock_page(page);
>         if (!PageAnon(page) || unlikely(split_huge_page(page))) {
>                 unsigned long pfn = page_to_pfn(page);
> 
>                 unlock_page(page);
>                 if (!PageAnon(page))
>                         pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
>                 else
>                         pr_info("%s: %#lx: thp split failed\n", msg, pfn);
>                 put_page(page);
>                 return -EBUSY;
> 
> So (for some reason) we don't even try to split a file THP.  But then,
> if we take a page fault on a file THP:
> 
> static struct page *next_uptodate_page(struct page *page,
> ...
>                 if (PageHWPoison(page))
>                         goto skip;
> (... but we're only testing the head page here, which isn't necessarily
> the one which got the error ...)
> 
>         if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
>             vm_fault_t ret = do_set_pmd(vmf, page);
> 
> So we now map the PMD-sized page into userspace, even though it has a
> HWPoison in it.
> 
> I think there are two things that we should be doing:
> 
> 1. Attempt to split THPs which are file-backed.  That makes most of this
> problem disappear because there won't be THPs with HWPoison, mostly.

+Naoya. Could you give more context here?

> 2. When the THP fails to split, use a spare page flag to indicate that
> the THP contains a HWPoison bit in one of its subpages.  There are a
> lot of PF_SECOND flags available for this purpose.
> 
> but I know almost nothing about the memory-failure subsystem and I'm
> still learning all the complexities of THPs, so it's entirely possible
> I've overlooked something important.

I wounder if it would be cleaner to switch PG_hwpoison to PF_HEAD: if
split failed we posion whole compound page. Yes, we will waste more
memory, but it makes it much cleaner for user: just check if the page is
poisoned.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-18 14:08 ` Kirill A. Shutemov
@ 2021-03-18 14:57   ` Matthew Wilcox
  2021-03-18 17:25     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2021-03-18 14:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Naoya Horiguchi, linux-mm, Kirill A. Shutemov, Hugh Dickins, Andi Kleen

On Thu, Mar 18, 2021 at 05:08:43PM +0300, Kirill A. Shutemov wrote:
> On Tue, Mar 16, 2021 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > If we get a memory failure in the middle of a file THP, I think we handle
> > it poorly.
> > 
> > int memory_failure(unsigned long pfn, int flags)
> > ...
> >         if (TestSetPageHWPoison(p)) {
> > ...
> >         orig_head = hpage = compound_head(p);
> > ...
> >         if (PageTransHuge(hpage)) {
> >                 if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> >                         action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> >                         return -EBUSY;
> >                 }
> > 
> > static int try_to_split_thp_page(struct page *page, const char *msg)
> > {
> >         lock_page(page);
> >         if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> >                 unsigned long pfn = page_to_pfn(page);
> > 
> >                 unlock_page(page);
> >                 if (!PageAnon(page))
> >                         pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> >                 else
> >                         pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> >                 put_page(page);
> >                 return -EBUSY;
> > 
> > So (for some reason) we don't even try to split a file THP.  But then,
> > if we take a page fault on a file THP:
> > 
> > static struct page *next_uptodate_page(struct page *page,
> > ...
> >                 if (PageHWPoison(page))
> >                         goto skip;
> > (... but we're only testing the head page here, which isn't necessarily
> > the one which got the error ...)
> > 
> >         if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> >             vm_fault_t ret = do_set_pmd(vmf, page);
> > 
> > So we now map the PMD-sized page into userspace, even though it has a
> > HWPoison in it.
> > 
> > I think there are two things that we should be doing:
> > 
> > 1. Attempt to split THPs which are file-backed.  That makes most of this
> > problem disappear because there won't be THPs with HWPoison, mostly.
> 
> +Naoya. Could you give more context here?

I did some git archaeology and found this check was introduced in
7f6bf39bbdd1 ("mm/hwpoison: fix panic due to split huge zero page") where
it wasn't intended to catch _file_ pages at all, but the zero page.
I suspect that nobody thought to look at this when introducing THP
for shmem.

> > 2. When the THP fails to split, use a spare page flag to indicate that
> > the THP contains a HWPoison bit in one of its subpages.  There are a
> > lot of PF_SECOND flags available for this purpose.
> > 
> > but I know almost nothing about the memory-failure subsystem and I'm
> > still learning all the complexities of THPs, so it's entirely possible
> > I've overlooked something important.
> 
> I wounder if it would be cleaner to switch PG_hwpoison to PF_HEAD: if
> split failed we posion whole compound page. Yes, we will waste more
> memory, but it makes it much cleaner for user: just check if the page is
> poisoned.

I think that's a poor quality implementation ... it'd cause processes
to die that weren't even touching the page that had hwpoison.  Using
a PF_SECOND bit lets us do the check as cheaply as if we made hwpoison
PF_HEAD.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-18 14:57   ` Matthew Wilcox
@ 2021-03-18 17:25     ` HORIGUCHI NAOYA(堀口 直也)
  2021-03-18 18:05       ` Yang Shi
  0 siblings, 1 reply; 7+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-03-18 17:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, Kirill A. Shutemov, Hugh Dickins,
	Andi Kleen

On Thu, Mar 18, 2021 at 02:57:16PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 18, 2021 at 05:08:43PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 16, 2021 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > If we get a memory failure in the middle of a file THP, I think we handle
> > > it poorly.
> > > 
> > > int memory_failure(unsigned long pfn, int flags)
> > > ...
> > >         if (TestSetPageHWPoison(p)) {
> > > ...
> > >         orig_head = hpage = compound_head(p);
> > > ...
> > >         if (PageTransHuge(hpage)) {
> > >                 if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > >                         action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > >                         return -EBUSY;
> > >                 }
> > > 
> > > static int try_to_split_thp_page(struct page *page, const char *msg)
> > > {
> > >         lock_page(page);
> > >         if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> > >                 unsigned long pfn = page_to_pfn(page);
> > > 
> > >                 unlock_page(page);
> > >                 if (!PageAnon(page))
> > >                         pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> > >                 else
> > >                         pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> > >                 put_page(page);
> > >                 return -EBUSY;
> > > 
> > > So (for some reason) we don't even try to split a file THP.  But then,
> > > if we take a page fault on a file THP:
> > > 
> > > static struct page *next_uptodate_page(struct page *page,
> > > ...
> > >                 if (PageHWPoison(page))
> > >                         goto skip;
> > > (... but we're only testing the head page here, which isn't necessarily
> > > the one which got the error ...)
> > > 
> > >         if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > >             vm_fault_t ret = do_set_pmd(vmf, page);
> > > 
> > > So we now map the PMD-sized page into userspace, even though it has a
> > > HWPoison in it.
> > > 
> > > I think there are two things that we should be doing:
> > > 
> > > 1. Attempt to split THPs which are file-backed.  That makes most of this
> > > problem disappear because there won't be THPs with HWPoison, mostly.
> > 
> > +Naoya. Could you give more context here?

Recently, I tried to address the problem on
https://lore.kernel.org/linux-mm/20210209062128.453814-1-nao.horiguchi@gmail.com/
but the patch was found incorrect because the related page table entries disappeared
after split_huge_page() succeeded.  I thought I'm going to study more, but
didn't make it this week because I looked at other review requests.

A pmd mapping for anonymous thp is replaced with 512 pte mappings by
split_huge_page(), so I'm wondering why we don't do the same for shmem thp.

> 
> I did some git archaeology and found this check was introduced in
> 7f6bf39bbdd1 ("mm/hwpoison: fix panic due to split huge zero page") where
> it wasn't intended to catch _file_ pages at all, but the zero page.
> I suspect that nobody thought to look at this when introducing THP
> for shmem.

Yes, 7f6bf39bbdd1 was worked before thp page cache, so we did not consider
it at that time.

Thanks,
Naoya Horiguchi

> 
> > > 2. When the THP fails to split, use a spare page flag to indicate that
> > > the THP contains a HWPoison bit in one of its subpages.  There are a
> > > lot of PF_SECOND flags available for this purpose.
> > > 
> > > but I know almost nothing about the memory-failure subsystem and I'm
> > > still learning all the complexities of THPs, so it's entirely possible
> > > I've overlooked something important.
> > 
> > I wounder if it would be cleaner to switch PG_hwpoison to PF_HEAD: if
> > split failed we posion whole compound page. Yes, we will waste more
> > memory, but it makes it much cleaner for user: just check if the page is
> > poisoned.
> 
> I think that's a poor quality implementation ... it'd cause processes
> to die that weren't even touching the page that had hwpoison.  Using
> a PF_SECOND bit lets us do the check as cheaply as if we made hwpoison
> PF_HEAD.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: File THP and HWPoison
  2021-03-18 17:25     ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-03-18 18:05       ` Yang Shi
  0 siblings, 0 replies; 7+ messages in thread
From: Yang Shi @ 2021-03-18 18:05 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Matthew Wilcox, Kirill A. Shutemov, linux-mm, Kirill A. Shutemov,
	Hugh Dickins, Andi Kleen

On Thu, Mar 18, 2021 at 10:25 AM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Thu, Mar 18, 2021 at 02:57:16PM +0000, Matthew Wilcox wrote:
> > On Thu, Mar 18, 2021 at 05:08:43PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Mar 16, 2021 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > If we get a memory failure in the middle of a file THP, I think we handle
> > > > it poorly.
> > > >
> > > > int memory_failure(unsigned long pfn, int flags)
> > > > ...
> > > >         if (TestSetPageHWPoison(p)) {
> > > > ...
> > > >         orig_head = hpage = compound_head(p);
> > > > ...
> > > >         if (PageTransHuge(hpage)) {
> > > >                 if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > >                         action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > >                         return -EBUSY;
> > > >                 }
> > > >
> > > > static int try_to_split_thp_page(struct page *page, const char *msg)
> > > > {
> > > >         lock_page(page);
> > > >         if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> > > >                 unsigned long pfn = page_to_pfn(page);
> > > >
> > > >                 unlock_page(page);
> > > >                 if (!PageAnon(page))
> > > >                         pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> > > >                 else
> > > >                         pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> > > >                 put_page(page);
> > > >                 return -EBUSY;
> > > >
> > > > So (for some reason) we don't even try to split a file THP.  But then,
> > > > if we take a page fault on a file THP:
> > > >
> > > > static struct page *next_uptodate_page(struct page *page,
> > > > ...
> > > >                 if (PageHWPoison(page))
> > > >                         goto skip;
> > > > (... but we're only testing the head page here, which isn't necessarily
> > > > the one which got the error ...)
> > > >
> > > >         if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > >             vm_fault_t ret = do_set_pmd(vmf, page);
> > > >
> > > > So we now map the PMD-sized page into userspace, even though it has a
> > > > HWPoison in it.
> > > >
> > > > I think there are two things that we should be doing:
> > > >
> > > > 1. Attempt to split THPs which are file-backed.  That makes most of this
> > > > problem disappear because there won't be THPs with HWPoison, mostly.
> > >
> > > +Naoya. Could you give more context here?
>
> Recently, I tried to address the problem on
> https://lore.kernel.org/linux-mm/20210209062128.453814-1-nao.horiguchi@gmail.com/
> but the patch was found incorrect because the related page table entries disappeared
> after split_huge_page() succeeded.  I thought I'm going to study more, but
> didn't make it this week because I looked at other review requests.
>
> A pmd mapping for anonymous thp is replaced with 512 pte mappings by
> split_huge_page(), so I'm wondering why we don't do the same for shmem thp.

We have to install 512 ptes for anonymous pages otherwise they are
gone. But it is unnecessary for page cache pages, even though the ptes
are gone they are still in xarray and can be found by page fault
later.

IMHO we should be able to pass in one more parameter to teach
split_huge_page to reinstall ptes for page cache for some cases.

>
> >
> > I did some git archaeology and found this check was introduced in
> > 7f6bf39bbdd1 ("mm/hwpoison: fix panic due to split huge zero page") where
> > it wasn't intended to catch _file_ pages at all, but the zero page.
> > I suspect that nobody thought to look at this when introducing THP
> > for shmem.
>
> Yes, 7f6bf39bbdd1 was worked before thp page cache, so we did not consider
> it at that time.
>
> Thanks,
> Naoya Horiguchi
>
> >
> > > > 2. When the THP fails to split, use a spare page flag to indicate that
> > > > the THP contains a HWPoison bit in one of its subpages.  There are a
> > > > lot of PF_SECOND flags available for this purpose.
> > > >
> > > > but I know almost nothing about the memory-failure subsystem and I'm
> > > > still learning all the complexities of THPs, so it's entirely possible
> > > > I've overlooked something important.
> > >
> > > I wounder if it would be cleaner to switch PG_hwpoison to PF_HEAD: if
> > > split failed we posion whole compound page. Yes, we will waste more
> > > memory, but it makes it much cleaner for user: just check if the page is
> > > poisoned.
> >
> > I think that's a poor quality implementation ... it'd cause processes
> > to die that weren't even touching the page that had hwpoison.  Using
> > a PF_SECOND bit lets us do the check as cheaply as if we made hwpoison
> > PF_HEAD.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-18 18:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-16 14:09 File THP and HWPoison Matthew Wilcox
2021-03-16 16:36 ` Andi Kleen
2021-03-16 16:52   ` Matthew Wilcox
2021-03-18 14:08 ` Kirill A. Shutemov
2021-03-18 14:57   ` Matthew Wilcox
2021-03-18 17:25     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-18 18:05       ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).