All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y
@ 2022-11-23 19:54 Mike Kravetz
  2022-11-23 19:54 ` [PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Mike Kravetz
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

This is a request for adding the following patches to stable 5.10.y.

Poisoned shmem and hugetlb pages are removed from the pagecache.
Subsequent access to the offset in the file results in a NEW zero
filled page.  Application code does not get notified of the data
loss, and the only 'clue' is a message in the system log.  Data
loss has been experienced by real users.

This was addressed upstream.  Most commits were marked for backports,
but some were not.  This was discussed here [1] and here [2].

Patches apply cleanly to v5.4.224 and pass tests checking for this
specific data loss issue.  LTP mm tests show no regressions.

All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
required a small bit of change to apply correctly: mostly for context.

linux-mm Cc'ed as it would be great to get at least an ACK from others
familiar with this issue.

[1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
[2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/

James Houghton (1):
  hugetlbfs: don't delete error page from pagecache

Yang Shi (5):
  mm: hwpoison: remove the unnecessary THP check
  mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  mm: hwpoison: refactor refcount check handling
  mm: hwpoison: handle non-anonymous THP correctly
  mm: shmem: don't truncate page if memory failure happens

 fs/hugetlbfs/inode.c       |  13 ++--
 include/linux/page-flags.h |  23 ++++++
 mm/huge_memory.c           |   2 +
 mm/hugetlb.c               |   4 +
 mm/memory-failure.c        | 153 ++++++++++++++++++++++++-------------
 mm/memory.c                |   9 +++
 mm/page_alloc.c            |   4 +-
 mm/shmem.c                 |  51 +++++++++++--
 8 files changed, 191 insertions(+), 68 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/6] mm: hwpoison: remove the unnecessary THP check
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2022-11-23 19:54 ` [PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Mike Kravetz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: Yang Shi <shy828301@gmail.com>

commit c7cb42e94473aafe553c0f2a3d8ca904599399ed upstream

When handling THP hwpoison checked if the THP is in allocation or free
stage since hwpoison may mistreat it as hugetlb page.  After commit
415c64c1453a ("mm/memory-failure: split thp earlier in memory error
handling") the problem has been fixed, so this check is no longer
needed.  Remove it.  The side effect of the removal is hwpoison may
report unsplit THP instead of unknown error for shmem THP.  It seems not
like a big deal.

The following patch "mm: filemap: check if THP has hwpoisoned subpage
for PMD page fault" depends on this, which fixes shmem THP with
hwpoisoned subpage(s) are mapped PMD wrongly.  So this patch needs to be
backported to -stable as well.

Link: https://lkml.kernel.org/r/20211020210755.23964-2-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/memory-failure.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index aef267c6a724..ae8b60e5f939 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -956,20 +956,6 @@ static int get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
-	if (!PageHuge(head) && PageTransHuge(head)) {
-		/*
-		 * Non anonymous thp exists only in allocation/free time. We
-		 * can't handle such a case correctly, so let's give it up.
-		 * This should be better than triggering BUG_ON when kernel
-		 * tries to touch the "partially handled" page.
-		 */
-		if (!PageAnon(head)) {
-			pr_err("Memory failure: %#lx: non anonymous thp\n",
-				page_to_pfn(page));
-			return 0;
-		}
-	}
-
 	if (get_page_unless_zero(head)) {
 		if (head == compound_head(page))
 			return 1;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
  2022-11-23 19:54 ` [PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2022-11-23 19:54 ` [PATCH 3/6] mm: hwpoison: refactor refcount check handling Mike Kravetz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: Yang Shi <shy828301@gmail.com>

commit eac96c3efdb593df1a57bb5b95dbe037bfa9a522 upstream

When handling shmem page fault the THP with corrupted subpage could be
PMD mapped if certain conditions are satisfied.  But kernel is supposed
to send SIGBUS when trying to map hwpoisoned page.

There are two paths which may do PMD map: fault around and regular
fault.

Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") the thing was even worse in fault around path.  The THP
could be PMD mapped as long as the VMA fits regardless what subpage is
accessed and corrupted.  After this commit as long as head page is not
corrupted the THP could be PMD mapped.

In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.

This loophole could be fixed by iterating every subpage to check if any
of them is hwpoisoned or not, but it is somewhat costly in page fault
path.

So introduce a new page flag called HasHWPoisoned on the first tail
page.  It indicates the THP has hwpoisoned subpage(s).  It is set if any
subpage of THP is found hwpoisoned by memory failure and after the
refcount is bumped successfully, then cleared when the THP is freed or
split.

The soft offline path doesn't need this since soft offline handler just
marks a subpage hwpoisoned when the subpage is migrated successfully.
But shmem THP didn't get split then migrated at all.

Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/page-flags.h | 23 +++++++++++++++++++++++
 mm/huge_memory.c           |  2 ++
 mm/memory-failure.c        | 14 ++++++++++++++
 mm/memory.c                |  9 +++++++++
 mm/page_alloc.c            |  4 +++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4f6ba9379112..3f431736cfc0 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -169,6 +169,15 @@ enum pageflags {
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_workingset,
 
+#ifdef CONFIG_MEMORY_FAILURE
+	/*
+	 * Compound pages. Stored in first tail page's flags.
+	 * Indicates that at least one subpage is hwpoisoned in the
+	 * THP.
+	 */
+	PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -588,6 +597,20 @@ static inline void ClearPageCompound(struct page *page)
 }
 #endif
 
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by hwpoison handler.  Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+	TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+	TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
 #define PG_head_mask ((1UL << PG_head))
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cb7b0aead709..cda0ea6c8623 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2464,6 +2464,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	ClearPageHasHWPoisoned(head);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ae8b60e5f939..b6ab841d1db7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1367,6 +1367,20 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	if (PageTransHuge(hpage)) {
+		/*
+		 * The flag must be set after the refcount is bumped
+		 * otherwise it may race with THP split.
+		 * And the flag can't be set in get_hwpoison_page() since
+		 * it is called by soft offline too and it is just called
+		 * for !MF_COUNT_INCREASE.  So here seems to be the best
+		 * place.
+		 *
+		 * Don't need care about the above error handling paths for
+		 * get_hwpoison_page() since they handle either free page
+		 * or unhandlable page.  The refcount is bumped iff the
+		 * page is a valid handlable page.
+		 */
+		SetPageHasHWPoisoned(hpage);
 		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
 			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			return -EBUSY;
diff --git a/mm/memory.c b/mm/memory.c
index cbc0a163d705..1e9f96caabbe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3814,6 +3814,15 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (compound_order(page) != HPAGE_PMD_ORDER)
 		return ret;
 
+	/*
+	 * Just backoff if any subpage of a THP is corrupted otherwise
+	 * the corrupted page may mapped by PMD silently to escape the
+	 * check.  This kind of THP just can be PTE mapped.  Access to
+	 * the corrupted subpage should trigger SIGBUS as expected.
+	 */
+	if (unlikely(PageHasHWPoisoned(page)))
+		return ret;
+
 	/*
 	 * Archs like ppc64 need additonal space to store information
 	 * related to pte entry. Use the preallocated table for that.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a56f2b9df5a0..8eb745920cac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1232,8 +1232,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
-		if (compound)
+		if (compound) {
 			ClearPageDoubleMap(page);
+			ClearPageHasHWPoisoned(page);
+		}
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_pages_check(page, page + i);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/6] mm: hwpoison: refactor refcount check handling
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
  2022-11-23 19:54 ` [PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Mike Kravetz
  2022-11-23 19:54 ` [PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2022-11-23 19:54 ` [PATCH 4/6] mm: hwpoison: handle non-anonymous THP correctly Mike Kravetz
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: Yang Shi <shy828301@gmail.com>

commit dd0f230a0a80ff396c7ce587f16429f2a8131344 upstream.

Memory failure will report failure if the page still has extra pinned
refcount other than from hwpoison after the handler is done.  Actually
the check is not necessary for all handlers, so move the check into
specific handlers.  This would make the following keeping shmem page in
page cache patch easier.

There may be expected extra pin for some cases, for example, when the
page is dirty and in swapcache.

Link: https://lkml.kernel.org/r/20211020210755.23964-5-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/memory-failure.c | 107 +++++++++++++++++++++++++++++---------------
 1 file changed, 72 insertions(+), 35 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b6ab841d1db7..0f77ec1985dc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -650,12 +650,42 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
 	return ret;
 }
 
+struct page_state {
+	unsigned long mask;
+	unsigned long res;
+	enum mf_action_page_type type;
+	int (*action)(struct page_state *ps, struct page *p);
+};
+
+/*
+ * Return true if page is still referenced by others, otherwise return
+ * false.
+ *
+ * The extra_pins is true when one extra refcount is expected.
+ */
+static bool has_extra_refcount(struct page_state *ps, struct page *p,
+			       bool extra_pins)
+{
+	int count = page_count(p) - 1;
+
+	if (extra_pins)
+		count -= 1;
+
+	if (count > 0) {
+		pr_err("%#lx: %s still referenced by %d users\n",
+		       page_to_pfn(p), action_page_types[ps->type], count);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Error hit kernel page.
  * Do nothing, try to be lucky and not touch this instead. For a few cases we
  * could be more sophisticated.
  */
-static int me_kernel(struct page *p, unsigned long pfn)
+static int me_kernel(struct page_state *ps, struct page *p)
 {
 	return MF_IGNORED;
 }
@@ -663,18 +693,19 @@ static int me_kernel(struct page *p, unsigned long pfn)
 /*
  * Page in unknown state. Do nothing.
  */
-static int me_unknown(struct page *p, unsigned long pfn)
+static int me_unknown(struct page_state *ps, struct page *p)
 {
-	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+	pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
 	return MF_FAILED;
 }
 
 /*
  * Clean (or cleaned) page cache page.
  */
-static int me_pagecache_clean(struct page *p, unsigned long pfn)
+static int me_pagecache_clean(struct page_state *ps, struct page *p)
 {
 	struct address_space *mapping;
+	int ret;
 
 	delete_from_lru_cache(p);
 
@@ -705,7 +736,12 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 *
 	 * Open: to take i_mutex or not for this? Right now we don't.
 	 */
-	return truncate_error_page(p, pfn, mapping);
+	ret = truncate_error_page(p, page_to_pfn(p), mapping);
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
+	return ret;
 }
 
 /*
@@ -713,7 +749,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
  * Issues: when the error hit a hole page the error is not properly
  * propagated.
  */
-static int me_pagecache_dirty(struct page *p, unsigned long pfn)
+static int me_pagecache_dirty(struct page_state *ps, struct page *p)
 {
 	struct address_space *mapping = page_mapping(p);
 
@@ -757,7 +793,7 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
 		mapping_set_error(mapping, -EIO);
 	}
 
-	return me_pagecache_clean(p, pfn);
+	return me_pagecache_clean(ps, p);
 }
 
 /*
@@ -779,26 +815,38 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
  * Clean swap cache pages can be directly isolated. A later page fault will
  * bring in the known good data from disk.
  */
-static int me_swapcache_dirty(struct page *p, unsigned long pfn)
+static int me_swapcache_dirty(struct page_state *ps, struct page *p)
 {
+	bool extra_pins = false;
+	int ret;
+
 	ClearPageDirty(p);
 	/* Trigger EIO in shmem: */
 	ClearPageUptodate(p);
 
-	if (!delete_from_lru_cache(p))
-		return MF_DELAYED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
+
+	if (ret == MF_DELAYED)
+		extra_pins = true;
+
+	if (has_extra_refcount(ps, p, extra_pins))
+		ret = MF_FAILED;
+
+	return ret;
 }
 
-static int me_swapcache_clean(struct page *p, unsigned long pfn)
+static int me_swapcache_clean(struct page_state *ps, struct page *p)
 {
+	int ret;
+
 	delete_from_swap_cache(p);
 
-	if (!delete_from_lru_cache(p))
-		return MF_RECOVERED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
+	return ret;
 }
 
 /*
@@ -807,7 +855,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
  * - Error on hugepage is contained in hugepage unit (not in raw page unit.)
  *   To narrow down kill region to one page, we need to break up pmd.
  */
-static int me_huge_page(struct page *p, unsigned long pfn)
+static int me_huge_page(struct page_state *ps, struct page *p)
 {
 	int res = 0;
 	struct page *hpage = compound_head(p);
@@ -818,7 +866,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 
 	mapping = page_mapping(hpage);
 	if (mapping) {
-		res = truncate_error_page(hpage, pfn, mapping);
+		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 	} else {
 		unlock_page(hpage);
 		/*
@@ -833,6 +881,9 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		lock_page(hpage);
 	}
 
+	if (has_extra_refcount(ps, p, false))
+		res = MF_FAILED;
+
 	return res;
 }
 
@@ -858,12 +909,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 #define slab		(1UL << PG_slab)
 #define reserved	(1UL << PG_reserved)
 
-static struct page_state {
-	unsigned long mask;
-	unsigned long res;
-	enum mf_action_page_type type;
-	int (*action)(struct page *p, unsigned long pfn);
-} error_states[] = {
+static struct page_state error_states[] = {
 	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
@@ -923,18 +969,9 @@ static int page_action(struct page_state *ps, struct page *p,
 			unsigned long pfn)
 {
 	int result;
-	int count;
 
-	result = ps->action(p, pfn);
+	result = ps->action(ps, p);
 
-	count = page_count(p) - 1;
-	if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
-		count--;
-	if (count > 0) {
-		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
-		       pfn, action_page_types[ps->type], count);
-		result = MF_FAILED;
-	}
 	action_result(pfn, ps->type, result);
 
 	/* Could do more checks here if page looks ok */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/6] mm: hwpoison: handle non-anonymous THP correctly
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
                   ` (2 preceding siblings ...)
  2022-11-23 19:54 ` [PATCH 3/6] mm: hwpoison: refactor refcount check handling Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2022-11-23 19:54 ` [PATCH 5/6] mm: shmem: don't truncate page if memory failure happens Mike Kravetz
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: Yang Shi <shy828301@gmail.com>

commit 4966455d9100236fd6dd72b0cd00818435fdb25d upstream.

Currently hwpoison doesn't handle non-anonymous THP, but since v4.8 THP
support for tmpfs and read-only file cache has been added.  They could
be offlined by split THP, just like anonymous THP.

Link: https://lkml.kernel.org/r/20211020210755.23964-7-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory-failure.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0f77ec1985dc..1d37de089008 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1154,14 +1154,11 @@ static int identify_page_state(unsigned long pfn, struct page *p,
 static int try_to_split_thp_page(struct page *page, const char *msg)
 {
 	lock_page(page);
-	if (!PageAnon(page) || unlikely(split_huge_page(page))) {
+	if (unlikely(split_huge_page(page))) {
 		unsigned long pfn = page_to_pfn(page);
 
 		unlock_page(page);
-		if (!PageAnon(page))
-			pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
-		else
-			pr_info("%s: %#lx: thp split failed\n", msg, pfn);
+		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
 		put_page(page);
 		return -EBUSY;
 	}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/6] mm: shmem: don't truncate page if memory failure happens
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
                   ` (3 preceding siblings ...)
  2022-11-23 19:54 ` [PATCH 4/6] mm: hwpoison: handle non-anonymous THP correctly Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2022-11-23 19:54 ` [PATCH 6/6] hugetlbfs: don't delete error page from pagecache Mike Kravetz
  2023-02-20 11:38 ` [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Shuai Xue
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: Yang Shi <shy828301@gmail.com>

commit a7605426666196c5a460dd3de6f8dac1d3c21f00 upstream.

The current behavior of memory failure is to truncate the page cache
regardless of dirty or clean.  If the page is dirty the later access
will get the obsolete data from disk without any notification to the
users.  This may cause silent data loss.  It is even worse for shmem
since shmem is in-memory filesystem, truncating page cache means
discarding data blocks.  The later read would return all zero.

The right approach is to keep the corrupted page in page cache, any
later access would return error for syscalls or SIGBUS for page fault,
until the file is truncated, hole punched or removed.  The regular
storage backed filesystems would be more complicated so this patch is
focused on shmem.  This also unblock the support for soft offlining
shmem THP.

[akpm@linux-foundation.org: coding style fixes]
[arnd@arndb.de: fix uninitialized variable use in me_pagecache_clean()]
  Link: https://lkml.kernel.org/r/20211022064748.4173718-1-arnd@kernel.org
[Fix invalid pointer dereference in shmem_read_mapping_page_gfp() with a
 slight different implementation from what Ajay Garg <ajaygargnsit@gmail.com>
 and Muchun Song <songmuchun@bytedance.com> proposed and reworked the
 error handling of shmem_write_begin() suggested by Linus]
  Link: https://lore.kernel.org/linux-mm/20211111084617.6746-1-ajaygargnsit@gmail.com/

Link: https://lkml.kernel.org/r/20211020210755.23964-6-shy828301@gmail.com
Link: https://lkml.kernel.org/r/20211116193247.21102-1-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ajay Garg <ajaygargnsit@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Andy Lavr <andy.lavr@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/memory-failure.c | 10 ++++++++-
 mm/shmem.c          | 51 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1d37de089008..7d96be8e93b7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/page-isolation.h>
+#include <linux/shmem_fs.h>
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -705,6 +706,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
 static int me_pagecache_clean(struct page_state *ps, struct page *p)
 {
 	struct address_space *mapping;
+	bool extra_pins;
 	int ret;
 
 	delete_from_lru_cache(p);
@@ -731,6 +733,12 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 		return MF_FAILED;
 	}
 
+	/*
+	 * The shmem page is kept in page cache instead of truncating
+	 * so is expected to have an extra refcount after error-handling.
+	 */
+	extra_pins = shmem_mapping(mapping);
+
 	/*
 	 * Truncation is a bit tricky. Enable it per file system for now.
 	 *
@@ -738,7 +746,7 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 	 */
 	ret = truncate_error_page(p, page_to_pfn(p), mapping);
 
-	if (has_extra_refcount(ps, p, false))
+	if (has_extra_refcount(ps, p, extra_pins))
 		ret = MF_FAILED;
 
 	return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index d3d8c5e7a296..8b539033dfe2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2521,6 +2521,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	pgoff_t index = pos >> PAGE_SHIFT;
+	int ret = 0;
 
 	/* i_mutex is held by caller */
 	if (unlikely(info->seals & (F_SEAL_GROW |
@@ -2531,7 +2532,19 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			return -EPERM;
 	}
 
-	return shmem_getpage(inode, index, pagep, SGP_WRITE);
+	ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
+
+	if (ret)
+		return ret;
+
+	if (PageHWPoison(*pagep)) {
+		unlock_page(*pagep);
+		put_page(*pagep);
+		*pagep = NULL;
+		return -EIO;
+	}
+
+	return 0;
 }
 
 static int
@@ -2618,6 +2631,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			if (sgp == SGP_CACHE)
 				set_page_dirty(page);
 			unlock_page(page);
+
+			if (PageHWPoison(page)) {
+				put_page(page);
+				error = -EIO;
+				break;
+			}
 		}
 
 		/*
@@ -3210,7 +3229,8 @@ static const char *shmem_get_link(struct dentry *dentry,
 		page = find_get_page(inode->i_mapping, 0);
 		if (!page)
 			return ERR_PTR(-ECHILD);
-		if (!PageUptodate(page)) {
+		if (PageHWPoison(page) ||
+		    !PageUptodate(page)) {
 			put_page(page);
 			return ERR_PTR(-ECHILD);
 		}
@@ -3218,6 +3238,13 @@ static const char *shmem_get_link(struct dentry *dentry,
 		error = shmem_getpage(inode, 0, &page, SGP_READ);
 		if (error)
 			return ERR_PTR(error);
+		if (!page)
+			return ERR_PTR(-ECHILD);
+		if (PageHWPoison(page)) {
+			unlock_page(page);
+			put_page(page);
+			return ERR_PTR(-ECHILD);
+		}
 		unlock_page(page);
 	}
 	set_delayed_call(done, shmem_put_link, page);
@@ -3866,6 +3893,13 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }
 
+/* Keep the page in page cache instead of truncating it */
+static int shmem_error_remove_page(struct address_space *mapping,
+				struct page *page)
+{
+	return 0;
+}
+
 static const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
@@ -3876,7 +3910,7 @@ static const struct address_space_operations shmem_aops = {
 #ifdef CONFIG_MIGRATION
 	.migratepage	= migrate_page,
 #endif
-	.error_remove_page = generic_error_remove_page,
+	.error_remove_page = shmem_error_remove_page,
 };
 
 static const struct file_operations shmem_file_operations = {
@@ -4316,9 +4350,14 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 	error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE,
 				  gfp, NULL, NULL, NULL);
 	if (error)
-		page = ERR_PTR(error);
-	else
-		unlock_page(page);
+		return ERR_PTR(error);
+
+	unlock_page(page);
+	if (PageHWPoison(page)) {
+		put_page(page);
+		return ERR_PTR(-EIO);
+	}
+
 	return page;
 #else
 	/*
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 6/6] hugetlbfs: don't delete error page from pagecache
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
                   ` (4 preceding siblings ...)
  2022-11-23 19:54 ` [PATCH 5/6] mm: shmem: don't truncate page if memory failure happens Mike Kravetz
@ 2022-11-23 19:54 ` Mike Kravetz
  2023-02-20 11:38 ` [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Shuai Xue
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-11-23 19:54 UTC (permalink / raw)
  To: stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton

From: James Houghton <jthoughton@google.com>

commit 8625147cafaa9ba74713d682f5185eb62cb2aedb upstream.

This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.

Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache.  That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned.  As [1] states,
this is effectively memory corruption.

The fix is to leave the page in the page cache.  If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses.  For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.

[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")

Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 13 ++++++-------
 mm/hugetlb.c         |  4 ++++
 mm/memory-failure.c  |  5 ++++-
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a2f43f1a85f8..580fcf26a48f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -361,6 +361,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			unlock_page(page);
 
+			if (PageHWPoison(page)) {
+				put_page(page);
+				retval = -EIO;
+				break;
+			}
+
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
@@ -994,13 +1000,6 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 static int hugetlbfs_error_remove_page(struct address_space *mapping,
 				struct page *page)
 {
-	struct inode *inode = mapping->host;
-	pgoff_t index = page->index;
-
-	remove_huge_page(page);
-	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
-		hugetlb_fix_reserve_counts(inode);
-
 	return 0;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d8c63d79af20..6c99b217a03e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4775,6 +4775,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
 	spin_lock(ptl);
 
+	ret = -EIO;
+	if (PageHWPoison(page))
+		goto out_release_unlock;
+
 	/*
 	 * Recheck the i_size after holding PT lock to make sure not
 	 * to leave any page mapped (as page_mapped()) beyond the end
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7d96be8e93b7..f0cfd7d9c425 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -868,6 +868,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 	int res = 0;
 	struct page *hpage = compound_head(p);
 	struct address_space *mapping;
+	bool extra_pins = false;
 
 	if (!PageHuge(hpage))
 		return MF_DELAYED;
@@ -875,6 +876,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 	mapping = page_mapping(hpage);
 	if (mapping) {
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+		/* The page is kept in page cache. */
+		extra_pins = true;
 	} else {
 		unlock_page(hpage);
 		/*
@@ -889,7 +892,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		lock_page(hpage);
 	}
 
-	if (has_extra_refcount(ps, p, false))
+	if (has_extra_refcount(ps, p, extra_pins))
 		res = MF_FAILED;
 
 	return res;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y
  2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
                   ` (5 preceding siblings ...)
  2022-11-23 19:54 ` [PATCH 6/6] hugetlbfs: don't delete error page from pagecache Mike Kravetz
@ 2023-02-20 11:38 ` Shuai Xue
  6 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2023-02-20 11:38 UTC (permalink / raw)
  To: Mike Kravetz, stable, linux-mm
  Cc: Yang Shi, Naoya Horiguchi, James Houghton, Oscar Salvador,
	Miaohe Lin, Muchun Song, Andrew Morton



On 2022/11/24 AM3:54, Mike Kravetz wrote:
> This is a request for adding the following patches to stable 5.10.y.
> 
> Poisoned shmem and hugetlb pages are removed from the pagecache.
> Subsequent access to the offset in the file results in a NEW zero
> filled page.  Application code does not get notified of the data
> loss, and the only 'clue' is a message in the system log.  Data
> loss has been experienced by real users.
> 
> This was addressed upstream.  Most commits were marked for backports,
> but some were not.  This was discussed here [1] and here [2].
> 
> Patches apply cleanly to v5.4.224 and pass tests checking for this
> specific data loss issue.  LTP mm tests show no regressions.
> 
> All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
> required a small bit of change to apply correctly: mostly for context.
> 
> linux-mm Cc'ed as it would be great to get at least an ACK from others
> familiar with this issue.
> 
> [1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
> [2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/
> 
> James Houghton (1):
>   hugetlbfs: don't delete error page from pagecache
> 
> Yang Shi (5):
>   mm: hwpoison: remove the unnecessary THP check
>   mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>   mm: hwpoison: refactor refcount check handling
>   mm: hwpoison: handle non-anonymous THP correctly
>   mm: shmem: don't truncate page if memory failure happens
> 
>  fs/hugetlbfs/inode.c       |  13 ++--
>  include/linux/page-flags.h |  23 ++++++
>  mm/huge_memory.c           |   2 +
>  mm/hugetlb.c               |   4 +
>  mm/memory-failure.c        | 153 ++++++++++++++++++++++++-------------
>  mm/memory.c                |   9 +++
>  mm/page_alloc.c            |   4 +-
>  mm/shmem.c                 |  51 +++++++++++--
>  8 files changed, 191 insertions(+), 68 deletions(-)
> 

Hi, folks

Thank you for your effort. Data loss will break the data consistency of
end users and it is critical to notify users.

I tried to apply this patch set to 5.10.168 stable release[1] and run
mm_regression[3] test cases following steps[4] provided by Naoya. All
four cases passed.

	#./run.sh project summary -p
	Project Name: debug
	PASS mm/hwpoison/shmem_link/link-hard.auto3
	PASS mm/hwpoison/shmem_link/link-sym.auto3
	PASS mm/hwpoison/shmem_rw/thp-always.auto3
	PASS mm/hwpoison/shmem_rw/thp-never.auto3
	Progress: 4 / 4 (100%)

Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>

Cheers,
Shuai

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.10.168
[2] https://github.com/nhoriguchi/mm_regression
[3] https://lore.kernel.org/stable/20221116235842.GA62826@u2004/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-02-20 11:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-23 19:54 [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Mike Kravetz
2022-11-23 19:54 ` [PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Mike Kravetz
2022-11-23 19:54 ` [PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Mike Kravetz
2022-11-23 19:54 ` [PATCH 3/6] mm: hwpoison: refactor refcount check handling Mike Kravetz
2022-11-23 19:54 ` [PATCH 4/6] mm: hwpoison: handle non-anonymous THP correctly Mike Kravetz
2022-11-23 19:54 ` [PATCH 5/6] mm: shmem: don't truncate page if memory failure happens Mike Kravetz
2022-11-23 19:54 ` [PATCH 6/6] hugetlbfs: don't delete error page from pagecache Mike Kravetz
2023-02-20 11:38 ` [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y Shuai Xue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.