linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Tony Luck <tony.luck@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Jue Wang <juew@google.com>, Greg Thelen <gthelen@google.com>,
	Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp
Date: Wed, 10 Mar 2021 23:22:18 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.11.2103102214170.7159@eggly.anvils> (raw)
In-Reply-To: <20210209062128.453814-1-nao.horiguchi@gmail.com>

On Tue, 9 Feb 2021, Naoya Horiguchi wrote:

> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Currently hwpoison code checks PageAnon() for thp and refuses to handle
> errors on non-anonymous thps (just for historical reason).  We now
> support non-anonymou thp like shmem one, so this patch suggests to enable
> to handle shmem thps. Fortunately, we already have can_split_huge_page()
> to check if a give thp is splittable, so this patch relies on it.

Fortunately? I don't understand. Why call can_split_huge_page()
at all, instead of simply trying split_huge_page() directly?
And could it do better than -EBUSY when split_huge_page() fails?

> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks for trying to add shmem+file THP support, but I think this
does not work as intended - Andrew, if Naoya agrees, please drop from
mmotm for now, the fixup needed will be more than a line or two.

I'm not much into memory-failure myself, but Jue discovered that the
SIGBUS never arrives: because split_huge_page() on a shmem or file
THP unmaps all its pmds and ptes, and (unlike with anon) leaves them
unmapped - in normal circumstances, to be faulted back on demand.
So the page_mapped() check in hwpoison_user_mappings() fails,
and the intended SIGBUS is not delivered.

(Or, is it acceptable that the SIGBUS is not delivered to those who
have the huge page mapped: should it get delivered later, to anyone
who faults back in the bad 4k?)

We believe the tokill list has to be set up earlier, before
split_huge_page() is called, then passed in to hwpoison_user_mappings().

Sorry, we don't have a proper patch for that right now, but I expect
you can see what needs to be done.  But something we found on the way,
we do have a patch for: add_to_kill() uses page_address_in_vma(), but
that has not been used on file THP tails before - fix appended at the
end below, so as not to waste your time on that bit.

> ---
>  mm/memory-failure.c | 34 +++++++++-------------------------
>  1 file changed, 9 insertions(+), 25 deletions(-)
> 
> diff --git v5.11-rc6/mm/memory-failure.c v5.11-rc6_patched/mm/memory-failure.c
> index e9481632fcd1..8c1575de0507 100644
> --- v5.11-rc6/mm/memory-failure.c
> +++ v5.11-rc6_patched/mm/memory-failure.c
> @@ -956,20 +956,6 @@ static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
>  
> -	if (!PageHuge(head) && PageTransHuge(head)) {
> -		/*
> -		 * Non anonymous thp exists only in allocation/free time. We
> -		 * can't handle such a case correctly, so let's give it up.
> -		 * This should be better than triggering BUG_ON when kernel
> -		 * tries to touch the "partially handled" page.
> -		 */
> -		if (!PageAnon(head)) {
> -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> -				page_to_pfn(page));
> -			return 0;
> -		}
> -	}
> -
>  	if (get_page_unless_zero(head)) {
>  		if (head == compound_head(page))
>  			return 1;
> @@ -1197,21 +1183,19 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
> -	lock_page(page);
> -	if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> -		unsigned long pfn = page_to_pfn(page);
> +	struct page *head;
>  
> +	lock_page(page);
> +	head = compound_head(page);
> +	if (PageTransHuge(head) && can_split_huge_page(head, NULL) &&
> +	    !split_huge_page(page)) {
>  		unlock_page(page);
> -		if (!PageAnon(page))
> -			pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> -		else
> -			pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> -		put_page(page);
> -		return -EBUSY;
> +		return 0;
>  	}
> +	pr_info("%s: %#lx: thp split failed\n", msg, page_to_pfn(page));
>  	unlock_page(page);
> -
> -	return 0;
> +	put_page(page);
> +	return -EBUSY;
>  }
>  
>  static int memory_failure_hugetlb(unsigned long pfn, int flags)
> -- 
> 2.25.1

[PATCH] mm: fix page_address_in_vma() on file THP tails
From: Jue Wang <juew@google.com>

Anon THP tails were already supported, but memory-failure now needs to use
page_address_in_vma() on file THP tails, which its page->mapping check did
not permit: fix it.

Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/rmap.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- 5.12-rc2/mm/rmap.c	2021-02-28 16:58:57.950450151 -0800
+++ linux/mm/rmap.c	2021-03-10 20:29:21.591475177 -0800
@@ -717,11 +717,11 @@ unsigned long page_address_in_vma(struct
 		if (!vma->anon_vma || !page__anon_vma ||
 		    vma->anon_vma->root != page__anon_vma->root)
 			return -EFAULT;
-	} else if (page->mapping) {
-		if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping)
-			return -EFAULT;
-	} else
+	} else if (!vma->vm_file) {
+		return -EFAULT;
+	} else if (vma->vm_file->f_mapping != compound_head(page)->mapping) {
 		return -EFAULT;
+	}
 	address = __vma_address(page, vma);
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
 		return -EFAULT;

  parent reply	other threads:[~2021-03-11  7:23 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-09  6:21 [PATCH v1] mm, hwpoison: enable error handling on shmem thp Naoya Horiguchi
2021-02-09 19:46 ` Andrew Morton
2021-02-09 23:51   ` Naoya Horiguchi
2021-02-11  8:06 ` Oscar Salvador
2021-03-11  7:22 ` Hugh Dickins [this message]
2021-03-11 14:45   ` Matthew Wilcox
     [not found]   ` <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp>
2021-03-11 19:32     ` Jue Wang
2021-03-11 22:00       ` Hugh Dickins
2021-03-18 18:33         ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2103102214170.7159@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=juew@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=nao.horiguchi@gmail.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).