linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
@ 2021-09-30 21:53 Yang Shi
  2021-09-30 21:53 ` [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
                   ` (5 more replies)
  0 siblings, 6 replies; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel


When discussing the patch that splits page cache THP in order to offline the
poisoned page, Noaya mentioned there is a bigger problem [1] that prevents this
from working since the page cache page will be truncated if uncorrectable
errors happen.  By looking this deeper it turns out this approach (truncating
poisoned page) may incur silent data loss for all non-readonly filesystems if
the page is dirty.  It may be worse for in-memory filesystem, e.g. shmem/tmpfs
since the data blocks are actually gone.

To solve this problem we could keep the poisoned dirty page in page cache then
notify the users on any later access, e.g. page fault, read/write, etc.  The
clean page could be truncated as is since they can be reread from disk later on.

The consequence is the filesystems may find poisoned page and manipulate it as
healthy page since all the filesystems actually don't check if the page is
poisoned or not in all the relevant paths except page fault.  In general, we
need make the filesystems be aware of poisoned page before we could keep the
poisoned page in page cache in order to solve the data loss problem.

To make filesystems be aware of poisoned page we should consider:
- The page should be not written back: clearing dirty flag could prevent from
  writeback.
- The page should not be dropped (it shows as a clean page) by drop caches or
  other callers: the refcount pin from hwpoison could prevent from invalidating
  (called by cache drop, inode cache shrinking, etc), but it doesn't avoid
  invalidation in DIO path.
- The page should be able to get truncated/hole punched/unlinked: it works as it
  is.
- Notify users when the page is accessed, e.g. read/write, page fault and other
  paths (compression, encryption, etc).

The scope of the last one is huge since almost all filesystems need do it once
a page is returned from page cache lookup.  There are a couple of options to
do it:

1. Check hwpoison flag for every path, the most straightforward way.
2. Return NULL for poisoned page from page cache lookup, the most callsites
   check if NULL is returned, this should have least work I think.  But the
   error handling in filesystems just return -ENOMEM, the error code will incur
   confusion to the users obviously.
3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), but this
   will involve significant amount of code change as well since all the paths
   need check if the pointer is ERR or not just like option #1.

I did prototype for both #1 and #3, but it seems #3 may require more changes
than #1.  For #3 ERR_PTR will be returned so all the callers need to check the
return value otherwise invalid pointer may be dereferenced, but not all callers
really care about the content of the page, for example, partial truncate which
just sets the truncated range in one page to 0.  So for such paths it needs
additional modification if ERR_PTR is returned.  And if the callers have their
own way to handle the problematic pages we need to add a new FGP flag to tell
FGP functions to return the pointer to the page.

It may happen very rarely, but once it happens the consequence (data corruption)
could be very bad and it is very hard to debug.  It seems this problem had been
slightly discussed before, but seems no action was taken at that time. [2]

As the aforementioned investigation, it needs huge amount of work to solve
the potential data loss for all filesystems.  But it is much easier for
in-memory filesystems and such filesystems actually suffer more than others
since even the data blocks are gone due to truncating.  So this patchset starts
from shmem/tmpfs by taking option #1.

Patch #1: cleanup, depended by patch #2
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug
Patch #2: refactor and preparation.
Patch #4: keep the poisoned page in page cache and handle such case for all
          the paths.
Patch #5: the previous patches unblock page cache THP split, so this patch
          add page cache THP split support.

I didn't receive too many comments for patch #3 ~ #5, so may consider separate
the bug fixes (patch #1 and #2) from others to make them merged sooner.  This
version still includes all 5 patches.


Changelog
v2 --> v3:
  * Incorporated the comments from Kirill.
  * Reordered the series to reflect the right dependency (patch #3 from v2
    is patch #1 in this revision, patch #1 from v2 is patch #2 in this
    revision).
  * After the reorder, patch #2 depends on patch #1 and both need to be
    backported to -stable.
v1 --> v2:
  * Incorporated the suggestion from Kirill to use a new page flag to
    indicate there is hwpoisoned subpage(s) in a THP. (patch #1)
  * Dropped patch #2 of v1.
  * Refctored the page refcount check logic of hwpoison per Naoya. (patch #2)
  * Removed unnecessary THP check per Naoya. (patch #3)
  * Incorporated the other comments for shmem from Naoya. (patch #4)


Yang Shi (5):
      mm: hwpoison: remove the unnecessary THP check
      mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
      mm: hwpoison: refactor refcount check handling
      mm: shmem: don't truncate page if memory failure happens
      mm: hwpoison: handle non-anonymous THP correctly

 include/linux/page-flags.h |  19 +++++++++++
 mm/filemap.c               |  12 +++----
 mm/huge_memory.c           |   2 ++
 mm/memory-failure.c        | 129 +++++++++++++++++++++++++++++++++++++++++++--------------------------
 mm/memory.c                |   9 +++++
 mm/page_alloc.c            |   4 ++-
 mm/shmem.c                 |  31 +++++++++++++++--
 mm/userfaultfd.c           |   5 +++
 8 files changed, 153 insertions(+), 58 deletions(-)


[1] https://lore.kernel.org/linux-mm/CAHbLzkqNPBh_sK09qfr4yu4WTFOzRy+MKj+PA7iG-adzi9zGsg@mail.gmail.com/T/#m0e959283380156f1d064456af01ae51fdff91265
[2] https://lore.kernel.org/lkml/20210318183350.GT3420@casper.infradead.org/


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
@ 2021-09-30 21:53 ` Yang Shi
  2021-10-06  2:35   ` Yang Shi
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

When handling THP hwpoison checked if the THP is in allocation or free
stage since hwpoison may mistreat it as hugetlb page.  After
commit 415c64c1453a ("mm/memory-failure: split thp earlier in memory error
handling") the problem has been fixed, so this check is no longer
needed.  Remove it.  The side effect of the removal is hwpoison may
report unsplit THP instead of unknown error for shmem THP.  It seems not
like a big deal.

The following patch depends on this, which fixes shmem THP with
hwpoisoned subpage(s) are mapped PMD wrongly.  So this patch needs to be
backported to -stable as well.

Cc: <stable@vger.kernel.org>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 54879c339024..ed28eba50f98 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1147,20 +1147,6 @@ static int __get_hwpoison_page(struct page *page)
 	if (!HWPoisonHandlable(head))
 		return -EBUSY;
 
-	if (PageTransHuge(head)) {
-		/*
-		 * Non anonymous thp exists only in allocation/free time. We
-		 * can't handle such a case correctly, so let's give it up.
-		 * This should be better than triggering BUG_ON when kernel
-		 * tries to touch the "partially handled" page.
-		 */
-		if (!PageAnon(head)) {
-			pr_err("Memory failure: %#lx: non anonymous thp\n",
-				page_to_pfn(page));
-			return 0;
-		}
-	}
-
 	if (get_page_unless_zero(head)) {
 		if (head == compound_head(page))
 			return 1;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
  2021-09-30 21:53 ` [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
@ 2021-09-30 21:53 ` Yang Shi
  2021-10-01  7:23   ` Naoya Horiguchi
                     ` (4 more replies)
  2021-09-30 21:53 ` [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling Yang Shi
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

When handling shmem page fault the THP with corrupted subpage could be PMD
mapped if certain conditions are satisfied.  But kernel is supposed to
send SIGBUS when trying to map hwpoisoned page.

There are two paths which may do PMD map: fault around and regular fault.

Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
the thing was even worse in fault around path.  The THP could be PMD mapped as
long as the VMA fits regardless what subpage is accessed and corrupted.  After
this commit as long as head page is not corrupted the THP could be PMD mapped.

In the regular fault path the THP could be PMD mapped as long as the corrupted
page is not accessed and the VMA fits.

This loophole could be fixed by iterating every subpage to check if any
of them is hwpoisoned or not, but it is somewhat costly in page fault path.

So introduce a new page flag called HasHWPoisoned on the first tail page.  It
indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
is found hwpoisoned by memory failure and cleared when the THP is freed or
split.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Cc: <stable@vger.kernel.org>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/page-flags.h | 19 +++++++++++++++++++
 mm/filemap.c               | 12 ++++++------
 mm/huge_memory.c           |  2 ++
 mm/memory-failure.c        |  6 +++++-
 mm/memory.c                |  9 +++++++++
 mm/page_alloc.c            |  4 +++-
 6 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a558d67ee86f..a357b41b3057 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -171,6 +171,11 @@ enum pageflags {
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_workingset,
 
+#ifdef CONFIG_MEMORY_FAILURE
+	/* Compound pages. Stored in first tail page's flags */
+	PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -668,6 +673,20 @@ PAGEFLAG_FALSE(DoubleMap)
 	TESTSCFLAG_FALSE(DoubleMap)
 #endif
 
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasPoisoned indicates that at least on subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by hwpoison handler.  Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+	TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+	TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
 /*
  * Check if a page is currently marked HWPoisoned. Note that this check is
  * best effort only and inherently racy: there is no way to synchronize with
diff --git a/mm/filemap.c b/mm/filemap.c
index dae481293b5d..2acc2b977f66 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 	}
 
 	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
-	    vm_fault_t ret = do_set_pmd(vmf, page);
-	    if (!ret) {
-		    /* The page is mapped successfully, reference consumed. */
-		    unlock_page(page);
-		    return true;
-	    }
+		vm_fault_t ret = do_set_pmd(vmf, page);
+		if (!ret) {
+			/* The page is mapped successfully, reference consumed. */
+			unlock_page(page);
+			return true;
+		}
 	}
 
 	if (pmd_none(*vmf->pmd)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e9ef0fc261e..0574b1613714 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2426,6 +2426,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
 	lruvec = lock_page_lruvec(head);
 
+	ClearPageHasHWPoisoned(head);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond EOF: drop them from page cache */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ed28eba50f98..a79a38374a14 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
 		return -EBUSY;
 
 	if (get_page_unless_zero(head)) {
-		if (head == compound_head(page))
+		if (head == compound_head(page)) {
+			if (PageTransHuge(head))
+				SetPageHasHWPoisoned(head);
+
 			return 1;
+		}
 
 		pr_info("Memory failure: %#lx cannot catch tail\n",
 			page_to_pfn(page));
diff --git a/mm/memory.c b/mm/memory.c
index adf9b9ef8277..c52be6d6b605 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3906,6 +3906,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (compound_order(page) != HPAGE_PMD_ORDER)
 		return ret;
 
+	/*
+	 * Just backoff if any subpage of a THP is corrupted otherwise
+	 * the corrupted page may mapped by PMD silently to escape the
+	 * check.  This kind of THP just can be PTE mapped.  Access to
+	 * the corrupted subpage should trigger SIGBUS as expected.
+	 */
+	if (unlikely(PageHasHWPoisoned(page)))
+		return ret;
+
 	/*
 	 * Archs like ppc64 need additional space to store information
 	 * related to pte entry. Use the preallocated table for that.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..7f37652f0287 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1312,8 +1312,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
-		if (compound)
+		if (compound) {
 			ClearPageDoubleMap(page);
+			ClearPageHasHWPoisoned(page);
+		}
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_pages_check(page, page + i);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
  2021-09-30 21:53 ` [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
@ 2021-09-30 21:53 ` Yang Shi
  2021-10-06 22:01   ` Peter Xu
  2021-09-30 21:53 ` [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

Memory failure will report failure if the page still has extra pinned
refcount other than from hwpoison after the handler is done.  Actually
the check is not necessary for all handlers, so move the check into
specific handlers.  This would make the following keeping shmem page in
page cache patch easier.

Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 93 +++++++++++++++++++++++++++++++--------------
 1 file changed, 64 insertions(+), 29 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a79a38374a14..562bcf335bd2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -806,12 +806,44 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
 	return ret;
 }
 
+struct page_state {
+	unsigned long mask;
+	unsigned long res;
+	enum mf_action_page_type type;
+
+	/* Callback ->action() has to unlock the relevant page inside it. */
+	int (*action)(struct page_state *ps, struct page *p);
+};
+
+/*
+ * Return true if page is still referenced by others, otherwise return
+ * false.
+ *
+ * The dec is true when one extra refcount is expected.
+ */
+static bool has_extra_refcount(struct page_state *ps, struct page *p,
+			       bool dec)
+{
+	int count = page_count(p) - 1;
+
+	if (dec)
+		count -= 1;
+
+	if (count > 0) {
+		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
+		       page_to_pfn(p), action_page_types[ps->type], count);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Error hit kernel page.
  * Do nothing, try to be lucky and not touch this instead. For a few cases we
  * could be more sophisticated.
  */
-static int me_kernel(struct page *p, unsigned long pfn)
+static int me_kernel(struct page_state *ps, struct page *p)
 {
 	unlock_page(p);
 	return MF_IGNORED;
@@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
 /*
  * Page in unknown state. Do nothing.
  */
-static int me_unknown(struct page *p, unsigned long pfn)
+static int me_unknown(struct page_state *ps, struct page *p)
 {
-	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+	pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
 	unlock_page(p);
 	return MF_FAILED;
 }
@@ -830,7 +862,7 @@ static int me_unknown(struct page *p, unsigned long pfn)
 /*
  * Clean (or cleaned) page cache page.
  */
-static int me_pagecache_clean(struct page *p, unsigned long pfn)
+static int me_pagecache_clean(struct page_state *ps, struct page *p)
 {
 	int ret;
 	struct address_space *mapping;
@@ -867,9 +899,13 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 *
 	 * Open: to take i_rwsem or not for this? Right now we don't.
 	 */
-	ret = truncate_error_page(p, pfn, mapping);
+	ret = truncate_error_page(p, page_to_pfn(p), mapping);
 out:
 	unlock_page(p);
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
@@ -878,7 +914,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
  * Issues: when the error hit a hole page the error is not properly
  * propagated.
  */
-static int me_pagecache_dirty(struct page *p, unsigned long pfn)
+static int me_pagecache_dirty(struct page_state *ps, struct page *p)
 {
 	struct address_space *mapping = page_mapping(p);
 
@@ -922,7 +958,7 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
 		mapping_set_error(mapping, -EIO);
 	}
 
-	return me_pagecache_clean(p, pfn);
+	return me_pagecache_clean(ps, p);
 }
 
 /*
@@ -944,9 +980,10 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
  * Clean swap cache pages can be directly isolated. A later page fault will
  * bring in the known good data from disk.
  */
-static int me_swapcache_dirty(struct page *p, unsigned long pfn)
+static int me_swapcache_dirty(struct page_state *ps, struct page *p)
 {
 	int ret;
+	bool dec = false;
 
 	ClearPageDirty(p);
 	/* Trigger EIO in shmem: */
@@ -954,10 +991,17 @@ static int me_swapcache_dirty(struct page *p, unsigned long pfn)
 
 	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
 	unlock_page(p);
+
+	if (ret == MF_DELAYED)
+		dec = true;
+
+	if (has_extra_refcount(ps, p, dec))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
-static int me_swapcache_clean(struct page *p, unsigned long pfn)
+static int me_swapcache_clean(struct page_state *ps, struct page *p)
 {
 	int ret;
 
@@ -965,6 +1009,10 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
 
 	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
 	unlock_page(p);
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
@@ -974,7 +1022,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
  * - Error on hugepage is contained in hugepage unit (not in raw page unit.)
  *   To narrow down kill region to one page, we need to break up pmd.
  */
-static int me_huge_page(struct page *p, unsigned long pfn)
+static int me_huge_page(struct page_state *ps, struct page *p)
 {
 	int res;
 	struct page *hpage = compound_head(p);
@@ -985,7 +1033,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 
 	mapping = page_mapping(hpage);
 	if (mapping) {
-		res = truncate_error_page(hpage, pfn, mapping);
+		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 		unlock_page(hpage);
 	} else {
 		res = MF_FAILED;
@@ -1003,6 +1051,9 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		}
 	}
 
+	if (has_extra_refcount(ps, p, false))
+		res = MF_FAILED;
+
 	return res;
 }
 
@@ -1028,14 +1079,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 #define slab		(1UL << PG_slab)
 #define reserved	(1UL << PG_reserved)
 
-static struct page_state {
-	unsigned long mask;
-	unsigned long res;
-	enum mf_action_page_type type;
-
-	/* Callback ->action() has to unlock the relevant page inside it. */
-	int (*action)(struct page *p, unsigned long pfn);
-} error_states[] = {
+static struct page_state error_states[] = {
 	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
@@ -1095,19 +1139,10 @@ static int page_action(struct page_state *ps, struct page *p,
 			unsigned long pfn)
 {
 	int result;
-	int count;
 
 	/* page p should be unlocked after returning from ps->action().  */
-	result = ps->action(p, pfn);
+	result = ps->action(ps, p);
 
-	count = page_count(p) - 1;
-	if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
-		count--;
-	if (count > 0) {
-		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
-		       pfn, action_page_types[ps->type], count);
-		result = MF_FAILED;
-	}
 	action_result(pfn, ps->type, result);
 
 	/* Could do more checks here if page looks ok */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
                   ` (2 preceding siblings ...)
  2021-09-30 21:53 ` [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling Yang Shi
@ 2021-09-30 21:53 ` Yang Shi
  2021-10-01  7:05   ` Naoya Horiguchi
  2021-10-12  1:57   ` Peter Xu
  2021-09-30 21:53 ` [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
  2021-10-13  2:40 ` [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Peter Xu
  5 siblings, 2 replies; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

The current behavior of memory failure is to truncate the page cache
regardless of dirty or clean.  If the page is dirty the later access
will get the obsolete data from disk without any notification to the
users.  This may cause silent data loss.  It is even worse for shmem
since shmem is in-memory filesystem, truncating page cache means
discarding data blocks.  The later read would return all zero.

The right approach is to keep the corrupted page in page cache, any
later access would return error for syscalls or SIGBUS for page fault,
until the file is truncated, hole punched or removed.  The regular
storage backed filesystems would be more complicated so this patch
is focused on shmem.  This also unblock the support for soft
offlining shmem THP.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 10 +++++++++-
 mm/shmem.c          | 31 +++++++++++++++++++++++++++++--
 mm/userfaultfd.c    |  5 +++++
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 562bcf335bd2..176883cd080f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -57,6 +57,7 @@
 #include <linux/ratelimit.h>
 #include <linux/page-isolation.h>
 #include <linux/pagewalk.h>
+#include <linux/shmem_fs.h>
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -866,6 +867,7 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 {
 	int ret;
 	struct address_space *mapping;
+	bool dec;
 
 	delete_from_lru_cache(p);
 
@@ -894,6 +896,12 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 		goto out;
 	}
 
+	/*
+	 * The shmem page is kept in page cache instead of truncating
+	 * so need decrement the refcount from page cache.
+	 */
+	dec = shmem_mapping(mapping);
+
 	/*
 	 * Truncation is a bit tricky. Enable it per file system for now.
 	 *
@@ -903,7 +911,7 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 out:
 	unlock_page(p);
 
-	if (has_extra_refcount(ps, p, false))
+	if (has_extra_refcount(ps, p, dec))
 		ret = MF_FAILED;
 
 	return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 88742953532c..75c36b6a405a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	pgoff_t index = pos >> PAGE_SHIFT;
+	int ret = 0;
 
 	/* i_rwsem is held by caller */
 	if (unlikely(info->seals & (F_SEAL_GROW |
@@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			return -EPERM;
 	}
 
-	return shmem_getpage(inode, index, pagep, SGP_WRITE);
+	ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
+
+	if (*pagep) {
+		if (PageHWPoison(*pagep)) {
+			unlock_page(*pagep);
+			put_page(*pagep);
+			ret = -EIO;
+		}
+	}
+
+	return ret;
 }
 
 static int
@@ -2555,6 +2566,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			unlock_page(page);
 		}
 
+		if (page && PageHWPoison(page)) {
+			error = -EIO;
+			break;
+		}
+
 		/*
 		 * We must evaluate after, since reads (unlike writes)
 		 * are called without i_rwsem protection against truncate
@@ -3772,6 +3788,13 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }
 
+/* Keep the page in page cache instead of truncating it */
+static int shmem_error_remove_page(struct address_space *mapping,
+				   struct page *page)
+{
+	return 0;
+}
+
 const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
@@ -3782,7 +3805,7 @@ const struct address_space_operations shmem_aops = {
 #ifdef CONFIG_MIGRATION
 	.migratepage	= migrate_page,
 #endif
-	.error_remove_page = generic_error_remove_page,
+	.error_remove_page = shmem_error_remove_page,
 };
 EXPORT_SYMBOL(shmem_aops);
 
@@ -4193,6 +4216,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 		page = ERR_PTR(error);
 	else
 		unlock_page(page);
+
+	if (PageHWPoison(page))
+		page = ERR_PTR(-EIO);
+
 	return page;
 #else
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7a9008415534..b688d5327177 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -233,6 +233,11 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 		goto out;
 	}
 
+	if (PageHWPoison(page)) {
+		ret = -EIO;
+		goto out_release;
+	}
+
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
 				       page, false, wp_copy);
 	if (ret)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
                   ` (3 preceding siblings ...)
  2021-09-30 21:53 ` [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
@ 2021-09-30 21:53 ` Yang Shi
  2021-10-01  7:06   ` Naoya Horiguchi
  2021-10-13  2:40 ` [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Peter Xu
  5 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-09-30 21:53 UTC (permalink / raw)
  To: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx, osalvador, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

Currently hwpoison doesn't handle non-anonymous THP, but since v4.8 THP
support for tmpfs and read-only file cache has been added.  They could
be offlined by split THP, just like anonymous THP.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 176883cd080f..88866bf4f4a9 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1447,14 +1447,11 @@ static int identify_page_state(unsigned long pfn, struct page *p,
 static int try_to_split_thp_page(struct page *page, const char *msg)
 {
 	lock_page(page);
-	if (!PageAnon(page) || unlikely(split_huge_page(page))) {
+	if (unlikely(split_huge_page(page))) {
 		unsigned long pfn = page_to_pfn(page);
 
 		unlock_page(page);
-		if (!PageAnon(page))
-			pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
-		else
-			pr_info("%s: %#lx: thp split failed\n", msg, pfn);
+		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
 		put_page(page);
 		return -EBUSY;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-09-30 21:53 ` [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
@ 2021-10-01  7:05   ` Naoya Horiguchi
  2021-10-01 21:08     ` Yang Shi
  2021-10-12  1:57   ` Peter Xu
  1 sibling, 1 reply; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-01  7:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx,
	osalvador, akpm, linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> The current behavior of memory failure is to truncate the page cache
> regardless of dirty or clean.  If the page is dirty the later access
> will get the obsolete data from disk without any notification to the
> users.  This may cause silent data loss.  It is even worse for shmem
> since shmem is in-memory filesystem, truncating page cache means
> discarding data blocks.  The later read would return all zero.
> 
> The right approach is to keep the corrupted page in page cache, any
> later access would return error for syscalls or SIGBUS for page fault,
> until the file is truncated, hole punched or removed.  The regular
> storage backed filesystems would be more complicated so this patch
> is focused on shmem.  This also unblock the support for soft
> offlining shmem THP.
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
...
> @@ -894,6 +896,12 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
>  		goto out;
>  	}
>  
> +	/*
> +	 * The shmem page is kept in page cache instead of truncating
> +	 * so need decrement the refcount from page cache.
> +	 */

This comment seems to me confusing because no refcount is decremented here.
What the variable dec tries to do is to give the expected value of the
refcount of the error page after successfull erorr handling, which differs
according to the page state before error handling, so dec adjusts it.

How about the below?

+	/*
+	 * The shmem page is kept in page cache instead of truncating
+	 * so is expected to have an extra refcount after error-handling.
+	 */

> +	dec = shmem_mapping(mapping);
> +
>  	/*
>  	 * Truncation is a bit tricky. Enable it per file system for now.
>  	 *
...
> @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  			return -EPERM;
>  	}
>  
> -	return shmem_getpage(inode, index, pagep, SGP_WRITE);
> +	ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> +
> +	if (*pagep) {
> +		if (PageHWPoison(*pagep)) {

Unless you plan to add some code in the near future, how about merging
these two if sentences?

	if (*pagep && PageHWPoison(*pagep)) {

Thanks,
Naoya Horiguchi

> +			unlock_page(*pagep);
> +			put_page(*pagep);
> +			ret = -EIO;
> +		}
> +	}
> +
> +	return ret;
>  }
>  
>  static int

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly
  2021-09-30 21:53 ` [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
@ 2021-10-01  7:06   ` Naoya Horiguchi
  2021-10-01 21:09     ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-01  7:06 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx,
	osalvador, akpm, linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:11PM -0700, Yang Shi wrote:
> Currently hwpoison doesn't handle non-anonymous THP, but since v4.8 THP
> support for tmpfs and read-only file cache has been added.  They could
> be offlined by split THP, just like anonymous THP.
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
@ 2021-10-01  7:23   ` Naoya Horiguchi
  2021-10-01 21:07     ` Yang Shi
  2021-10-04 14:06   ` Kirill A. Shutemov
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-01  7:23 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx,
	osalvador, akpm, linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> When handling shmem page fault the THP with corrupted subpage could be PMD
> mapped if certain conditions are satisfied.  But kernel is supposed to
> send SIGBUS when trying to map hwpoisoned page.
> 
> There are two paths which may do PMD map: fault around and regular fault.
> 
> Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> the thing was even worse in fault around path.  The THP could be PMD mapped as
> long as the VMA fits regardless what subpage is accessed and corrupted.  After
> this commit as long as head page is not corrupted the THP could be PMD mapped.
> 
> In the regular fault path the THP could be PMD mapped as long as the corrupted
> page is not accessed and the VMA fits.
> 
> This loophole could be fixed by iterating every subpage to check if any
> of them is hwpoisoned or not, but it is somewhat costly in page fault path.
> 
> So introduce a new page flag called HasHWPoisoned on the first tail page.  It
> indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
> is found hwpoisoned by memory failure and cleared when the THP is freed or
> split.
> 
> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> Cc: <stable@vger.kernel.org>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
...
> @@ -668,6 +673,20 @@ PAGEFLAG_FALSE(DoubleMap)
>  	TESTSCFLAG_FALSE(DoubleMap)
>  #endif
>  
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> +/*
> + * PageHasPoisoned indicates that at least on subpage is hwpoisoned in the

Maybe you meant as follow?

+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-01  7:23   ` Naoya Horiguchi
@ 2021-10-01 21:07     ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-01 21:07 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Fri, Oct 1, 2021 at 12:23 AM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > When handling shmem page fault the THP with corrupted subpage could be PMD
> > mapped if certain conditions are satisfied.  But kernel is supposed to
> > send SIGBUS when trying to map hwpoisoned page.
> >
> > There are two paths which may do PMD map: fault around and regular fault.
> >
> > Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> > the thing was even worse in fault around path.  The THP could be PMD mapped as
> > long as the VMA fits regardless what subpage is accessed and corrupted.  After
> > this commit as long as head page is not corrupted the THP could be PMD mapped.
> >
> > In the regular fault path the THP could be PMD mapped as long as the corrupted
> > page is not accessed and the VMA fits.
> >
> > This loophole could be fixed by iterating every subpage to check if any
> > of them is hwpoisoned or not, but it is somewhat costly in page fault path.
> >
> > So introduce a new page flag called HasHWPoisoned on the first tail page.  It
> > indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
> > is found hwpoisoned by memory failure and cleared when the THP is freed or
> > split.
> >
> > Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> > Cc: <stable@vger.kernel.org>
> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> ...
> > @@ -668,6 +673,20 @@ PAGEFLAG_FALSE(DoubleMap)
> >       TESTSCFLAG_FALSE(DoubleMap)
> >  #endif
> >
> > +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> > +/*
> > + * PageHasPoisoned indicates that at least on subpage is hwpoisoned in the
>
> Maybe you meant as follow?
>
> + * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the

Yeah, thanks for catching it. It is a typo because the flag was called
PageHasPoisoned. But "poisoned" seems ambiguous for some cases since,
for example, some memory sanitizers use "poisoned", so I renamed it to
PageHasHWPoisoned to make it less ambiguous.

>
> Thanks,
> Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-01  7:05   ` Naoya Horiguchi
@ 2021-10-01 21:08     ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-01 21:08 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Fri, Oct 1, 2021 at 12:05 AM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> > The current behavior of memory failure is to truncate the page cache
> > regardless of dirty or clean.  If the page is dirty the later access
> > will get the obsolete data from disk without any notification to the
> > users.  This may cause silent data loss.  It is even worse for shmem
> > since shmem is in-memory filesystem, truncating page cache means
> > discarding data blocks.  The later read would return all zero.
> >
> > The right approach is to keep the corrupted page in page cache, any
> > later access would return error for syscalls or SIGBUS for page fault,
> > until the file is truncated, hole punched or removed.  The regular
> > storage backed filesystems would be more complicated so this patch
> > is focused on shmem.  This also unblock the support for soft
> > offlining shmem THP.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> ...
> > @@ -894,6 +896,12 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
> >               goto out;
> >       }
> >
> > +     /*
> > +      * The shmem page is kept in page cache instead of truncating
> > +      * so need decrement the refcount from page cache.
> > +      */
>
> This comment seems to me confusing because no refcount is decremented here.
> What the variable dec tries to do is to give the expected value of the
> refcount of the error page after successfull erorr handling, which differs
> according to the page state before error handling, so dec adjusts it.
>
> How about the below?
>
> +       /*
> +        * The shmem page is kept in page cache instead of truncating
> +        * so is expected to have an extra refcount after error-handling.
> +        */

Thanks for the suggestion, yes, it seems better.

>
> > +     dec = shmem_mapping(mapping);
> > +
> >       /*
> >        * Truncation is a bit tricky. Enable it per file system for now.
> >        *
> ...
> > @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> >                       return -EPERM;
> >       }
> >
> > -     return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > +     ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> > +
> > +     if (*pagep) {
> > +             if (PageHWPoison(*pagep)) {
>
> Unless you plan to add some code in the near future, how about merging
> these two if sentences?
>
>         if (*pagep && PageHWPoison(*pagep)) {

Sure.

>
> Thanks,
> Naoya Horiguchi
>
> > +                     unlock_page(*pagep);
> > +                     put_page(*pagep);
> > +                     ret = -EIO;
> > +             }
> > +     }
> > +
> > +     return ret;
> >  }
> >
> >  static int

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly
  2021-10-01  7:06   ` Naoya Horiguchi
@ 2021-10-01 21:09     ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-01 21:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Fri, Oct 1, 2021 at 12:06 AM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:11PM -0700, Yang Shi wrote:
> > Currently hwpoison doesn't handle non-anonymous THP, but since v4.8 THP
> > support for tmpfs and read-only file cache has been added.  They could
> > be offlined by split THP, just like anonymous THP.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thank you!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
  2021-10-01  7:23   ` Naoya Horiguchi
@ 2021-10-04 14:06   ` Kirill A. Shutemov
  2021-10-04 18:17     ` Yang Shi
  2021-10-06 20:15   ` Peter Xu
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Kirill A. Shutemov @ 2021-10-04 14:06 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx,
	osalvador, akpm, linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index dae481293b5d..2acc2b977f66 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
>  	}
>  
>  	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> -	    vm_fault_t ret = do_set_pmd(vmf, page);
> -	    if (!ret) {
> -		    /* The page is mapped successfully, reference consumed. */
> -		    unlock_page(page);
> -		    return true;
> -	    }
> +		vm_fault_t ret = do_set_pmd(vmf, page);
> +		if (!ret) {
> +			/* The page is mapped successfully, reference consumed. */
> +			unlock_page(page);
> +			return true;
> +		}
>  	}
>  
>  	if (pmd_none(*vmf->pmd)) {

Hm. Is it unrelated whitespace fix?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-04 14:06   ` Kirill A. Shutemov
@ 2021-10-04 18:17     ` Yang Shi
  2021-10-04 19:41       ` Kirill A. Shutemov
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-04 18:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index dae481293b5d..2acc2b977f66 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> >       }
> >
> >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > -         if (!ret) {
> > -                 /* The page is mapped successfully, reference consumed. */
> > -                 unlock_page(page);
> > -                 return true;
> > -         }
> > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > +             if (!ret) {
> > +                     /* The page is mapped successfully, reference consumed. */
> > +                     unlock_page(page);
> > +                     return true;
> > +             }
> >       }
> >
> >       if (pmd_none(*vmf->pmd)) {
>
> Hm. Is it unrelated whitespace fix?

It is a coding style clean up. I thought it may be overkilling to have
a separate patch. Do you prefer separate one?

>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-04 18:17     ` Yang Shi
@ 2021-10-04 19:41       ` Kirill A. Shutemov
  2021-10-04 20:13         ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Kirill A. Shutemov @ 2021-10-04 19:41 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Oct 04, 2021 at 11:17:29AM -0700, Yang Shi wrote:
> On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index dae481293b5d..2acc2b977f66 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> > >       }
> > >
> > >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > > -         if (!ret) {
> > > -                 /* The page is mapped successfully, reference consumed. */
> > > -                 unlock_page(page);
> > > -                 return true;
> > > -         }
> > > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > > +             if (!ret) {
> > > +                     /* The page is mapped successfully, reference consumed. */
> > > +                     unlock_page(page);
> > > +                     return true;
> > > +             }
> > >       }
> > >
> > >       if (pmd_none(*vmf->pmd)) {
> >
> > Hm. Is it unrelated whitespace fix?
> 
> It is a coding style clean up. I thought it may be overkilling to have
> a separate patch. Do you prefer separate one?

Maybe. I tried to find what changed here. It's confusing.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-04 19:41       ` Kirill A. Shutemov
@ 2021-10-04 20:13         ` Yang Shi
  2021-10-06 19:54           ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-04 20:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Oct 4, 2021 at 12:41 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Mon, Oct 04, 2021 at 11:17:29AM -0700, Yang Shi wrote:
> > On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >
> > > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index dae481293b5d..2acc2b977f66 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> > > >       }
> > > >
> > > >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > > > -         if (!ret) {
> > > > -                 /* The page is mapped successfully, reference consumed. */
> > > > -                 unlock_page(page);
> > > > -                 return true;
> > > > -         }
> > > > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > > > +             if (!ret) {
> > > > +                     /* The page is mapped successfully, reference consumed. */
> > > > +                     unlock_page(page);
> > > > +                     return true;
> > > > +             }
> > > >       }
> > > >
> > > >       if (pmd_none(*vmf->pmd)) {
> > >
> > > Hm. Is it unrelated whitespace fix?
> >
> > It is a coding style clean up. I thought it may be overkilling to have
> > a separate patch. Do you prefer separate one?
>
> Maybe. I tried to find what changed here. It's confusing.

Yeah, maybe. Anyway I will separate the real big fix and the cleanup
into two patches. This may be helpful for backporting too.

>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check
  2021-09-30 21:53 ` [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
@ 2021-10-06  2:35   ` Yang Shi
  2021-10-06  4:00     ` Naoya Horiguchi
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-06  2:35 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton
  Cc: Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

Hi Naoya,

Any comment on this patch and patch #3? I'd prefer to fix more
comments for a new version.

Thanks,
Yang


On Thu, Sep 30, 2021 at 2:53 PM Yang Shi <shy828301@gmail.com> wrote:
>
> When handling THP hwpoison checked if the THP is in allocation or free
> stage since hwpoison may mistreat it as hugetlb page.  After
> commit 415c64c1453a ("mm/memory-failure: split thp earlier in memory error
> handling") the problem has been fixed, so this check is no longer
> needed.  Remove it.  The side effect of the removal is hwpoison may
> report unsplit THP instead of unknown error for shmem THP.  It seems not
> like a big deal.
>
> The following patch depends on this, which fixes shmem THP with
> hwpoisoned subpage(s) are mapped PMD wrongly.  So this patch needs to be
> backported to -stable as well.
>
> Cc: <stable@vger.kernel.org>
> Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/memory-failure.c | 14 --------------
>  1 file changed, 14 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 54879c339024..ed28eba50f98 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1147,20 +1147,6 @@ static int __get_hwpoison_page(struct page *page)
>         if (!HWPoisonHandlable(head))
>                 return -EBUSY;
>
> -       if (PageTransHuge(head)) {
> -               /*
> -                * Non anonymous thp exists only in allocation/free time. We
> -                * can't handle such a case correctly, so let's give it up.
> -                * This should be better than triggering BUG_ON when kernel
> -                * tries to touch the "partially handled" page.
> -                */
> -               if (!PageAnon(head)) {
> -                       pr_err("Memory failure: %#lx: non anonymous thp\n",
> -                               page_to_pfn(page));
> -                       return 0;
> -               }
> -       }
> -
>         if (get_page_unless_zero(head)) {
>                 if (head == compound_head(page))
>                         return 1;
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check
  2021-10-06  2:35   ` Yang Shi
@ 2021-10-06  4:00     ` Naoya Horiguchi
  2021-10-06 17:56       ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-06  4:00 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Matthew Wilcox, Peter Xu, Oscar Salvador, Andrew Morton,
	Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

> Any comment on this patch and patch #3? I'd prefer to fix more
> comments for a new version.

No. Both 1/5 and 3/5 look fine to me. So ...

> On Thu, Sep 30, 2021 at 2:53 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > When handling THP hwpoison checked if the THP is in allocation or free
> > stage since hwpoison may mistreat it as hugetlb page.  After
> > commit 415c64c1453a ("mm/memory-failure: split thp earlier in memory error
> > handling") the problem has been fixed, so this check is no longer
> > needed.  Remove it.  The side effect of the removal is hwpoison may
> > report unsplit THP instead of unknown error for shmem THP.  It seems not
> > like a big deal.
> >
> > The following patch depends on this, which fixes shmem THP with
> > hwpoisoned subpage(s) are mapped PMD wrongly.  So this patch needs to be
> > backported to -stable as well.
> >
> > Cc: <stable@vger.kernel.org>
> > Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Patch 3/5 already has my Signed-off-by, so I think it can be considered
as acked by me.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check
  2021-10-06  4:00     ` Naoya Horiguchi
@ 2021-10-06 17:56       ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-06 17:56 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Tue, Oct 5, 2021 at 9:00 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> > Any comment on this patch and patch #3? I'd prefer to fix more
> > comments for a new version.
>
> No. Both 1/5 and 3/5 look fine to me. So ...
>
> > On Thu, Sep 30, 2021 at 2:53 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > When handling THP hwpoison checked if the THP is in allocation or free
> > > stage since hwpoison may mistreat it as hugetlb page.  After
> > > commit 415c64c1453a ("mm/memory-failure: split thp earlier in memory error
> > > handling") the problem has been fixed, so this check is no longer
> > > needed.  Remove it.  The side effect of the removal is hwpoison may
> > > report unsplit THP instead of unknown error for shmem THP.  It seems not
> > > like a big deal.
> > >
> > > The following patch depends on this, which fixes shmem THP with
> > > hwpoisoned subpage(s) are mapped PMD wrongly.  So this patch needs to be
> > > backported to -stable as well.
> > >
> > > Cc: <stable@vger.kernel.org>
> > > Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Patch 3/5 already has my Signed-off-by, so I think it can be considered
> as acked by me.

Aha, thanks. I will add your ack to patch 3/5 too.

>
> Thanks,
> Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-04 20:13         ` Yang Shi
@ 2021-10-06 19:54           ` Peter Xu
  2021-10-06 23:41             ` Yang Shi
  2021-10-08  9:35             ` Kirill A. Shutemov
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-06 19:54 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kirill A. Shutemov,
	HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Mon, Oct 04, 2021 at 01:13:07PM -0700, Yang Shi wrote:
> On Mon, Oct 4, 2021 at 12:41 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > On Mon, Oct 04, 2021 at 11:17:29AM -0700, Yang Shi wrote:
> > > On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > >
> > > > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > index dae481293b5d..2acc2b977f66 100644
> > > > > --- a/mm/filemap.c
> > > > > +++ b/mm/filemap.c
> > > > > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> > > > >       }
> > > > >
> > > > >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > > > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > -         if (!ret) {
> > > > > -                 /* The page is mapped successfully, reference consumed. */
> > > > > -                 unlock_page(page);
> > > > > -                 return true;
> > > > > -         }
> > > > > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > +             if (!ret) {
> > > > > +                     /* The page is mapped successfully, reference consumed. */
> > > > > +                     unlock_page(page);
> > > > > +                     return true;
> > > > > +             }
> > > > >       }
> > > > >
> > > > >       if (pmd_none(*vmf->pmd)) {
> > > >
> > > > Hm. Is it unrelated whitespace fix?
> > >
> > > It is a coding style clean up. I thought it may be overkilling to have
> > > a separate patch. Do you prefer separate one?
> >
> > Maybe. I tried to find what changed here. It's confusing.
> 
> Yeah, maybe. Anyway I will separate the real big fix and the cleanup
> into two patches. This may be helpful for backporting too.

Or maybe we just don't touch it until there's need for a functional change?  I
feel it a pity to lose the git blame info for reindent-only patches, but no
strong opinion, because I know many people don't think the same and I'm fine
with either ways.

Another side note: perhaps a comment above pageflags enum on PG_has_hwpoisoned
would be nice?  I saw that we've got a bunch of those already.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
  2021-10-01  7:23   ` Naoya Horiguchi
  2021-10-04 14:06   ` Kirill A. Shutemov
@ 2021-10-06 20:15   ` Peter Xu
  2021-10-06 23:57     ` Yang Shi
  2021-10-06 20:18   ` Peter Xu
  2021-11-01 19:05   ` Naresh Kamboju
  4 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-06 20:15 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, osalvador, akpm,
	linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
>  		return -EBUSY;
>  
>  	if (get_page_unless_zero(head)) {
> -		if (head == compound_head(page))
> +		if (head == compound_head(page)) {
> +			if (PageTransHuge(head))
> +				SetPageHasHWPoisoned(head);
> +
>  			return 1;
> +		}
>  
>  		pr_info("Memory failure: %#lx cannot catch tail\n",
>  			page_to_pfn(page));

Sorry for the late comments.

I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
sounds like a pure helper to get a refcount out of a sane hwpoisoned page.  I'm
afraid there can be side effect that we set this without being noticed, so I'm
also wondering we should keep it in memory_failure().

Quotting comments for get_hwpoison_page():

 * get_hwpoison_page() takes a page refcount of an error page to handle memory
 * error on it, after checking that the error page is in a well-defined state
 * (defined as a page-type we can successfully handle the memor error on it,
 * such as LRU page and hugetlb page).

For example, I see that both unpoison_memory() and soft_offline_page() will
call it too, does it mean that we'll also set the bits e.g. even when we want
to inject an unpoison event too?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
                     ` (2 preceding siblings ...)
  2021-10-06 20:15   ` Peter Xu
@ 2021-10-06 20:18   ` Peter Xu
  2021-10-07  2:49     ` Yang Shi
  2021-11-01 19:05   ` Naresh Kamboju
  4 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-06 20:18 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, osalvador, akpm,
	linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> +#ifdef CONFIG_MEMORY_FAILURE
> +	/* Compound pages. Stored in first tail page's flags */
> +	PG_has_hwpoisoned = PG_mappedtodisk,
> +#endif

Sorry one more comment I missed: would PG_hwpoisoned_subpage better?  It's just
that "has_hwpoison" can be directly read as "this page has been hwpoisoned",
which sounds too close to PG_hwpoisoned.  No strong opinion either.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling
  2021-09-30 21:53 ` [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling Yang Shi
@ 2021-10-06 22:01   ` Peter Xu
  2021-10-07  2:47     ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-06 22:01 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, osalvador, akpm,
	linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:09PM -0700, Yang Shi wrote:
> +/*
> + * Return true if page is still referenced by others, otherwise return
> + * false.
> + *
> + * The dec is true when one extra refcount is expected.
> + */
> +static bool has_extra_refcount(struct page_state *ps, struct page *p,
> +			       bool dec)

Nit: would it be nicer to keep using things like "extra_pins", so we pass in 1
for swapcache dirty case and 0 for the rest?  Then it'll also match with most
of the similar cases in e.g. huge_memory.c (please try grep "extra_pins" there).

> +{
> +	int count = page_count(p) - 1;
> +
> +	if (dec)
> +		count -= 1;
> +
> +	if (count > 0) {
> +		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
> +		       page_to_pfn(p), action_page_types[ps->type], count);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Error hit kernel page.
>   * Do nothing, try to be lucky and not touch this instead. For a few cases we
>   * could be more sophisticated.
>   */
> -static int me_kernel(struct page *p, unsigned long pfn)
> +static int me_kernel(struct page_state *ps, struct page *p)

Not sure whether it's intended, but some of the action() hooks do not call the
refcount check now while in the past they'll all do.  Just to double check
they're expected, like this one and me_unknown().

>  {
>  	unlock_page(p);
>  	return MF_IGNORED;
> @@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
>  /*
>   * Page in unknown state. Do nothing.
>   */
> -static int me_unknown(struct page *p, unsigned long pfn)
> +static int me_unknown(struct page_state *ps, struct page *p)
>  {
> -	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
> +	pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
>  	unlock_page(p);
>  	return MF_FAILED;
>  }

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 19:54           ` Peter Xu
@ 2021-10-06 23:41             ` Yang Shi
  2021-10-07 16:14               ` Peter Xu
  2021-10-08  9:35             ` Kirill A. Shutemov
  1 sibling, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-06 23:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: Kirill A. Shutemov,
	HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 6, 2021 at 12:54 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Oct 04, 2021 at 01:13:07PM -0700, Yang Shi wrote:
> > On Mon, Oct 4, 2021 at 12:41 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >
> > > On Mon, Oct 04, 2021 at 11:17:29AM -0700, Yang Shi wrote:
> > > > On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > > >
> > > > > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > > index dae481293b5d..2acc2b977f66 100644
> > > > > > --- a/mm/filemap.c
> > > > > > +++ b/mm/filemap.c
> > > > > > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> > > > > >       }
> > > > > >
> > > > > >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > > > > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > > -         if (!ret) {
> > > > > > -                 /* The page is mapped successfully, reference consumed. */
> > > > > > -                 unlock_page(page);
> > > > > > -                 return true;
> > > > > > -         }
> > > > > > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > > +             if (!ret) {
> > > > > > +                     /* The page is mapped successfully, reference consumed. */
> > > > > > +                     unlock_page(page);
> > > > > > +                     return true;
> > > > > > +             }
> > > > > >       }
> > > > > >
> > > > > >       if (pmd_none(*vmf->pmd)) {
> > > > >
> > > > > Hm. Is it unrelated whitespace fix?
> > > >
> > > > It is a coding style clean up. I thought it may be overkilling to have
> > > > a separate patch. Do you prefer separate one?
> > >
> > > Maybe. I tried to find what changed here. It's confusing.
> >
> > Yeah, maybe. Anyway I will separate the real big fix and the cleanup
> > into two patches. This may be helpful for backporting too.
>
> Or maybe we just don't touch it until there's need for a functional change?  I
> feel it a pity to lose the git blame info for reindent-only patches, but no
> strong opinion, because I know many people don't think the same and I'm fine
> with either ways.

TBH I really don't think keeping old "git blame" info should be an
excuse to avoid any coding style cleanup.

>
> Another side note: perhaps a comment above pageflags enum on PG_has_hwpoisoned
> would be nice?  I saw that we've got a bunch of those already.

I was thinking about that, but it seems PG_double_map doesn't have
comment there either so I didn't add.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 20:15   ` Peter Xu
@ 2021-10-06 23:57     ` Yang Shi
  2021-10-07 16:06       ` Peter Xu
  2021-10-07 21:28       ` Yang Shi
  0 siblings, 2 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-06 23:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 6, 2021 at 1:15 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
> >               return -EBUSY;
> >
> >       if (get_page_unless_zero(head)) {
> > -             if (head == compound_head(page))
> > +             if (head == compound_head(page)) {
> > +                     if (PageTransHuge(head))
> > +                             SetPageHasHWPoisoned(head);
> > +
> >                       return 1;
> > +             }
> >
> >               pr_info("Memory failure: %#lx cannot catch tail\n",
> >                       page_to_pfn(page));
>
> Sorry for the late comments.
>
> I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
> sounds like a pure helper to get a refcount out of a sane hwpoisoned page.  I'm
> afraid there can be side effect that we set this without being noticed, so I'm
> also wondering we should keep it in memory_failure().
>
> Quotting comments for get_hwpoison_page():
>
>  * get_hwpoison_page() takes a page refcount of an error page to handle memory
>  * error on it, after checking that the error page is in a well-defined state
>  * (defined as a page-type we can successfully handle the memor error on it,
>  * such as LRU page and hugetlb page).
>
> For example, I see that both unpoison_memory() and soft_offline_page() will
> call it too, does it mean that we'll also set the bits e.g. even when we want
> to inject an unpoison event too?

unpoison_memory() should be not a problem since it will just bail out
once THP is met as the comment says:

/*
* unpoison_memory() can encounter thp only when the thp is being
* worked by memory_failure() and the page lock is not held yet.
* In such case, we yield to memory_failure() and make unpoison fail.
*/


And I think we should set the flag for soft offline too, right? The
soft offline does set the hwpoison flag for the corrupted sub page and
doesn't split file THP, so it should be captured by page fault as
well. And yes for poison injection.

But your comment reminds me that get_hwpoison_page() is just called
when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could
escape. This needs to be covered too.

BTW, I did the test with MADV_HWPOISON, but I didn't test this change
(moving flag set after get_page_unless_zero()) since I thought it was
just a trivial change and did overlook this case.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling
  2021-10-06 22:01   ` Peter Xu
@ 2021-10-07  2:47     ` Yang Shi
  2021-10-07 16:18       ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-07  2:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 6, 2021 at 3:02 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:09PM -0700, Yang Shi wrote:
> > +/*
> > + * Return true if page is still referenced by others, otherwise return
> > + * false.
> > + *
> > + * The dec is true when one extra refcount is expected.
> > + */
> > +static bool has_extra_refcount(struct page_state *ps, struct page *p,
> > +                            bool dec)
>
> Nit: would it be nicer to keep using things like "extra_pins", so we pass in 1
> for swapcache dirty case and 0 for the rest?  Then it'll also match with most
> of the similar cases in e.g. huge_memory.c (please try grep "extra_pins" there).

Thanks for the suggestion. Yes, it makes some sense to me. And the
code comments in patch 4/5 does says (the suggested version by Naoya):

/*
 * The shmem page is kept in page cache instead of truncating
 * so is expected to have an extra refcount after error-handling.
 */

Will rename it in the new version.

>
> > +{
> > +     int count = page_count(p) - 1;
> > +
> > +     if (dec)
> > +             count -= 1;
> > +
> > +     if (count > 0) {
> > +             pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
> > +                    page_to_pfn(p), action_page_types[ps->type], count);
> > +             return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  /*
> >   * Error hit kernel page.
> >   * Do nothing, try to be lucky and not touch this instead. For a few cases we
> >   * could be more sophisticated.
> >   */
> > -static int me_kernel(struct page *p, unsigned long pfn)
> > +static int me_kernel(struct page_state *ps, struct page *p)
>
> Not sure whether it's intended, but some of the action() hooks do not call the
> refcount check now while in the past they'll all do.  Just to double check
> they're expected, like this one and me_unknown().

Yeah, it is intentional. Before this change all me_* handlers did
check refcount even though it was not necessary, for example,
me_kernel() and me_unknown().

>
> >  {
> >       unlock_page(p);
> >       return MF_IGNORED;
> > @@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
> >  /*
> >   * Page in unknown state. Do nothing.
> >   */
> > -static int me_unknown(struct page *p, unsigned long pfn)
> > +static int me_unknown(struct page_state *ps, struct page *p)
> >  {
> > -     pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
> > +     pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
> >       unlock_page(p);
> >       return MF_FAILED;
> >  }
>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 20:18   ` Peter Xu
@ 2021-10-07  2:49     ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-07  2:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 6, 2021 at 1:18 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +     /* Compound pages. Stored in first tail page's flags */
> > +     PG_has_hwpoisoned = PG_mappedtodisk,
> > +#endif
>
> Sorry one more comment I missed: would PG_hwpoisoned_subpage better?  It's just
> that "has_hwpoison" can be directly read as "this page has been hwpoisoned",
> which sounds too close to PG_hwpoisoned.  No strong opinion either.  Thanks,

TBH, I don't see too much difference IMHO. Someone may interpret it to
"this" subpage is hwpoisoned rather than at least one subpage of THP
is hwpoisoned.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 23:57     ` Yang Shi
@ 2021-10-07 16:06       ` Peter Xu
  2021-10-07 18:19         ` Yang Shi
  2021-10-07 21:28       ` Yang Shi
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-07 16:06 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 06, 2021 at 04:57:38PM -0700, Yang Shi wrote:
> > For example, I see that both unpoison_memory() and soft_offline_page() will
> > call it too, does it mean that we'll also set the bits e.g. even when we want
> > to inject an unpoison event too?
> 
> unpoison_memory() should be not a problem since it will just bail out
> once THP is met as the comment says:
> 
> /*
> * unpoison_memory() can encounter thp only when the thp is being
> * worked by memory_failure() and the page lock is not held yet.
> * In such case, we yield to memory_failure() and make unpoison fail.
> */

But I still think setting the subpage-hwpoison bit hides too deep there, it'll
be great we can keep get_hwpoison_page() as simple as a safe version of getting
the refcount of the page we want.  Or we'd still better touch up the comment
above get_hwpoison_page() to show that side effect.

> 
> 
> And I think we should set the flag for soft offline too, right? The

I'm not familiar with either memory failure or soft offline, so far it looks
right to me.  However..

> soft offline does set the hwpoison flag for the corrupted sub page and
> doesn't split file THP,

.. I believe this will become not true after your patch 5, right?

> so it should be captured by page fault as well. And yes for poison injection.

One more thing: besides thp split and page free, do we need to conditionally
drop the HasHwpoisoned bit when received an unpoison event?

If my understanding is correct, we may need to scan all the subpages there, to
make sure HasHwpoisoned bit reflects the latest status for the thp in question.

> 
> But your comment reminds me that get_hwpoison_page() is just called
> when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could
> escape. This needs to be covered too.

Right, maybe that's also a clue that we shouldn't set the new page flag within
get_hwpoison_page(), since get_hwpoison_page() is actually well coupled with
MF_COUNT_INCREASED and all of them are only about refcounting of the pages.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 23:41             ` Yang Shi
@ 2021-10-07 16:14               ` Peter Xu
  2021-10-07 18:28                 ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-07 16:14 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kirill A. Shutemov,
	HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 06, 2021 at 04:41:35PM -0700, Yang Shi wrote:
> > Or maybe we just don't touch it until there's need for a functional change?  I
> > feel it a pity to lose the git blame info for reindent-only patches, but no
> > strong opinion, because I know many people don't think the same and I'm fine
> > with either ways.
> 
> TBH I really don't think keeping old "git blame" info should be an
> excuse to avoid any coding style cleanup.

Sure.

> 
> >
> > Another side note: perhaps a comment above pageflags enum on PG_has_hwpoisoned
> > would be nice?  I saw that we've got a bunch of those already.
> 
> I was thinking about that, but it seems PG_double_map doesn't have
> comment there either so I didn't add.

IMHO that means we may just need even more documentations? :)

I won't ask for documenting doublemap bit in this series, but I just don't
think it's a good excuse to not provide documentations if we still can.
Especially to me PageHasHwpoisoned looks really so like PageHwpoisoned, so
it'll be still very nice to have some good document along with the patch it's
introduced.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling
  2021-10-07  2:47     ` Yang Shi
@ 2021-10-07 16:18       ` Peter Xu
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-07 16:18 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 06, 2021 at 07:47:20PM -0700, Yang Shi wrote:
> Yeah, it is intentional. Before this change all me_* handlers did
> check refcount even though it was not necessary, for example,
> me_kernel() and me_unknown().

Would you mind add some explanation into the commit message on what kind of
pages dropped the refcount check, and why they can be dropped, when you respin?
Thanks a lot.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-07 16:06       ` Peter Xu
@ 2021-10-07 18:19         ` Yang Shi
  2021-10-07 20:27           ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-07 18:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 7, 2021 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Oct 06, 2021 at 04:57:38PM -0700, Yang Shi wrote:
> > > For example, I see that both unpoison_memory() and soft_offline_page() will
> > > call it too, does it mean that we'll also set the bits e.g. even when we want
> > > to inject an unpoison event too?
> >
> > unpoison_memory() should be not a problem since it will just bail out
> > once THP is met as the comment says:
> >
> > /*
> > * unpoison_memory() can encounter thp only when the thp is being
> > * worked by memory_failure() and the page lock is not held yet.
> > * In such case, we yield to memory_failure() and make unpoison fail.
> > */
>
> But I still think setting the subpage-hwpoison bit hides too deep there, it'll
> be great we can keep get_hwpoison_page() as simple as a safe version of getting
> the refcount of the page we want.  Or we'd still better touch up the comment
> above get_hwpoison_page() to show that side effect.
>
> >
> >
> > And I think we should set the flag for soft offline too, right? The
>
> I'm not familiar with either memory failure or soft offline, so far it looks
> right to me.  However..
>
> > soft offline does set the hwpoison flag for the corrupted sub page and
> > doesn't split file THP,
>
> .. I believe this will become not true after your patch 5, right?

But THP split may fail, right?

>
> > so it should be captured by page fault as well. And yes for poison injection.
>
> One more thing: besides thp split and page free, do we need to conditionally
> drop the HasHwpoisoned bit when received an unpoison event?

It seems not to me, as the above comment from unpoison_memory() says
unpoison can encounter thp only when the thp is being worked by
memory_failure() and the page lock is not held yet. So it just bails
out.

In addition, unpoison just works for software injected errors, not
real hardware failure.

>
> If my understanding is correct, we may need to scan all the subpages there, to
> make sure HasHwpoisoned bit reflects the latest status for the thp in question.
>
> >
> > But your comment reminds me that get_hwpoison_page() is just called
> > when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could
> > escape. This needs to be covered too.
>
> Right, maybe that's also a clue that we shouldn't set the new page flag within
> get_hwpoison_page(), since get_hwpoison_page() is actually well coupled with
> MF_COUNT_INCREASED and all of them are only about refcounting of the pages.

Yeah, maybe, as long as there is not early bail out in some error
handling paths.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-07 16:14               ` Peter Xu
@ 2021-10-07 18:28                 ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-07 18:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Kirill A. Shutemov,
	HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 7, 2021 at 9:14 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Oct 06, 2021 at 04:41:35PM -0700, Yang Shi wrote:
> > > Or maybe we just don't touch it until there's need for a functional change?  I
> > > feel it a pity to lose the git blame info for reindent-only patches, but no
> > > strong opinion, because I know many people don't think the same and I'm fine
> > > with either ways.
> >
> > TBH I really don't think keeping old "git blame" info should be an
> > excuse to avoid any coding style cleanup.
>
> Sure.
>
> >
> > >
> > > Another side note: perhaps a comment above pageflags enum on PG_has_hwpoisoned
> > > would be nice?  I saw that we've got a bunch of those already.
> >
> > I was thinking about that, but it seems PG_double_map doesn't have
> > comment there either so I didn't add.
>
> IMHO that means we may just need even more documentations? :)
>
> I won't ask for documenting doublemap bit in this series, but I just don't
> think it's a good excuse to not provide documentations if we still can.
> Especially to me PageHasHwpoisoned looks really so like PageHwpoisoned, so
> it'll be still very nice to have some good document along with the patch it's
> introduced.

OK, I could add more comments for this flag in the enum. It should be
just a duplicate of the comment right before the PAGEFLAG definition.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-07 18:19         ` Yang Shi
@ 2021-10-07 20:27           ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-07 20:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 7, 2021 at 11:19 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Oct 7, 2021 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Oct 06, 2021 at 04:57:38PM -0700, Yang Shi wrote:
> > > > For example, I see that both unpoison_memory() and soft_offline_page() will
> > > > call it too, does it mean that we'll also set the bits e.g. even when we want
> > > > to inject an unpoison event too?
> > >
> > > unpoison_memory() should be not a problem since it will just bail out
> > > once THP is met as the comment says:
> > >
> > > /*
> > > * unpoison_memory() can encounter thp only when the thp is being
> > > * worked by memory_failure() and the page lock is not held yet.
> > > * In such case, we yield to memory_failure() and make unpoison fail.
> > > */
> >
> > But I still think setting the subpage-hwpoison bit hides too deep there, it'll
> > be great we can keep get_hwpoison_page() as simple as a safe version of getting
> > the refcount of the page we want.  Or we'd still better touch up the comment
> > above get_hwpoison_page() to show that side effect.
> >
> > >
> > >
> > > And I think we should set the flag for soft offline too, right? The
> >
> > I'm not familiar with either memory failure or soft offline, so far it looks
> > right to me.  However..
> >
> > > soft offline does set the hwpoison flag for the corrupted sub page and
> > > doesn't split file THP,
> >
> > .. I believe this will become not true after your patch 5, right?
>
> But THP split may fail, right?
>
> >
> > > so it should be captured by page fault as well. And yes for poison injection.
> >
> > One more thing: besides thp split and page free, do we need to conditionally
> > drop the HasHwpoisoned bit when received an unpoison event?
>
> It seems not to me, as the above comment from unpoison_memory() says
> unpoison can encounter thp only when the thp is being worked by
> memory_failure() and the page lock is not held yet. So it just bails
> out.
>
> In addition, unpoison just works for software injected errors, not
> real hardware failure.
>
> >
> > If my understanding is correct, we may need to scan all the subpages there, to
> > make sure HasHwpoisoned bit reflects the latest status for the thp in question.
> >
> > >
> > > But your comment reminds me that get_hwpoison_page() is just called
> > > when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could
> > > escape. This needs to be covered too.
> >
> > Right, maybe that's also a clue that we shouldn't set the new page flag within
> > get_hwpoison_page(), since get_hwpoison_page() is actually well coupled with
> > MF_COUNT_INCREASED and all of them are only about refcounting of the pages.
>
> Yeah, maybe, as long as there is not early bail out in some error
> handling paths.

It seems fine to move setting the flag out of get_hwpoison_page() to
right before splitting THP so that both MF_COUNT_INCREASED and
!MF_COUNT_INCREASED could be covered.


>
> >
> > --
> > Peter Xu
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 23:57     ` Yang Shi
  2021-10-07 16:06       ` Peter Xu
@ 2021-10-07 21:28       ` Yang Shi
  2021-10-12  0:55         ` Peter Xu
  1 sibling, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-07 21:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 6, 2021 at 4:57 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Oct 6, 2021 at 1:15 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
> > >               return -EBUSY;
> > >
> > >       if (get_page_unless_zero(head)) {
> > > -             if (head == compound_head(page))
> > > +             if (head == compound_head(page)) {
> > > +                     if (PageTransHuge(head))
> > > +                             SetPageHasHWPoisoned(head);
> > > +
> > >                       return 1;
> > > +             }
> > >
> > >               pr_info("Memory failure: %#lx cannot catch tail\n",
> > >                       page_to_pfn(page));
> >
> > Sorry for the late comments.
> >
> > I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
> > sounds like a pure helper to get a refcount out of a sane hwpoisoned page.  I'm
> > afraid there can be side effect that we set this without being noticed, so I'm
> > also wondering we should keep it in memory_failure().
> >
> > Quotting comments for get_hwpoison_page():
> >
> >  * get_hwpoison_page() takes a page refcount of an error page to handle memory
> >  * error on it, after checking that the error page is in a well-defined state
> >  * (defined as a page-type we can successfully handle the memor error on it,
> >  * such as LRU page and hugetlb page).
> >
> > For example, I see that both unpoison_memory() and soft_offline_page() will
> > call it too, does it mean that we'll also set the bits e.g. even when we want
> > to inject an unpoison event too?
>
> unpoison_memory() should be not a problem since it will just bail out
> once THP is met as the comment says:
>
> /*
> * unpoison_memory() can encounter thp only when the thp is being
> * worked by memory_failure() and the page lock is not held yet.
> * In such case, we yield to memory_failure() and make unpoison fail.
> */
>
>
> And I think we should set the flag for soft offline too, right? The
> soft offline does set the hwpoison flag for the corrupted sub page and
> doesn't split file THP, so it should be captured by page fault as
> well. And yes for poison injection.

Err... I must be blind. The soft offline does *NOT* set hwpoison flag
for any page. So your comment does stand. The flag should be set
outside get_hwpoison_page().

>
> But your comment reminds me that get_hwpoison_page() is just called
> when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could
> escape. This needs to be covered too.
>
> BTW, I did the test with MADV_HWPOISON, but I didn't test this change
> (moving flag set after get_page_unless_zero()) since I thought it was
> just a trivial change and did overlook this case.
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-06 19:54           ` Peter Xu
  2021-10-06 23:41             ` Yang Shi
@ 2021-10-08  9:35             ` Kirill A. Shutemov
  2021-10-11 22:57               ` Peter Xu
  1 sibling, 1 reply; 59+ messages in thread
From: Kirill A. Shutemov @ 2021-10-08  9:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Yang Shi, HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 06, 2021 at 03:54:16PM -0400, Peter Xu wrote:
> On Mon, Oct 04, 2021 at 01:13:07PM -0700, Yang Shi wrote:
> > On Mon, Oct 4, 2021 at 12:41 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >
> > > On Mon, Oct 04, 2021 at 11:17:29AM -0700, Yang Shi wrote:
> > > > On Mon, Oct 4, 2021 at 7:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > > >
> > > > > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > > index dae481293b5d..2acc2b977f66 100644
> > > > > > --- a/mm/filemap.c
> > > > > > +++ b/mm/filemap.c
> > > > > > @@ -3195,12 +3195,12 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
> > > > > >       }
> > > > > >
> > > > > >       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
> > > > > > -         vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > > -         if (!ret) {
> > > > > > -                 /* The page is mapped successfully, reference consumed. */
> > > > > > -                 unlock_page(page);
> > > > > > -                 return true;
> > > > > > -         }
> > > > > > +             vm_fault_t ret = do_set_pmd(vmf, page);
> > > > > > +             if (!ret) {
> > > > > > +                     /* The page is mapped successfully, reference consumed. */
> > > > > > +                     unlock_page(page);
> > > > > > +                     return true;
> > > > > > +             }
> > > > > >       }
> > > > > >
> > > > > >       if (pmd_none(*vmf->pmd)) {
> > > > >
> > > > > Hm. Is it unrelated whitespace fix?
> > > >
> > > > It is a coding style clean up. I thought it may be overkilling to have
> > > > a separate patch. Do you prefer separate one?
> > >
> > > Maybe. I tried to find what changed here. It's confusing.
> > 
> > Yeah, maybe. Anyway I will separate the real big fix and the cleanup
> > into two patches. This may be helpful for backporting too.
> 
> Or maybe we just don't touch it until there's need for a functional change?  I
> feel it a pity to lose the git blame info for reindent-only patches,

JFYI, git blame -w ignores whitespace changes :P

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-08  9:35             ` Kirill A. Shutemov
@ 2021-10-11 22:57               ` Peter Xu
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-11 22:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Yang Shi, HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Fri, Oct 08, 2021 at 12:35:29PM +0300, Kirill A. Shutemov wrote:
> JFYI, git blame -w ignores whitespace changes :P

Thanks, Kirill. :)

I must confess in reality I didn't encounter white-space changes a lot, but
mostly on moving the code around either e.g. by putting things into, or out of,
"if/for" blocks, or moving code between files.

I used git-blame a lot not to looking for people to blame but to dig history of
code changes, not sure about how others do that.  So maybe it's just that I
didn't do it right beforfe, and I'll be more than glad to learn if there's more
tricks like the "-w" one (which I don't know before..).

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-07 21:28       ` Yang Shi
@ 2021-10-12  0:55         ` Peter Xu
  2021-10-12  1:44           ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-12  0:55 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 07, 2021 at 02:28:35PM -0700, Yang Shi wrote:
> On Wed, Oct 6, 2021 at 4:57 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Oct 6, 2021 at 1:15 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> > > > @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
> > > >               return -EBUSY;
> > > >
> > > >       if (get_page_unless_zero(head)) {
> > > > -             if (head == compound_head(page))
> > > > +             if (head == compound_head(page)) {
> > > > +                     if (PageTransHuge(head))
> > > > +                             SetPageHasHWPoisoned(head);
> > > > +
> > > >                       return 1;
> > > > +             }
> > > >
> > > >               pr_info("Memory failure: %#lx cannot catch tail\n",
> > > >                       page_to_pfn(page));
> > >
> > > Sorry for the late comments.
> > >
> > > I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
> > > sounds like a pure helper to get a refcount out of a sane hwpoisoned page.  I'm
> > > afraid there can be side effect that we set this without being noticed, so I'm
> > > also wondering we should keep it in memory_failure().
> > >
> > > Quotting comments for get_hwpoison_page():
> > >
> > >  * get_hwpoison_page() takes a page refcount of an error page to handle memory
> > >  * error on it, after checking that the error page is in a well-defined state
> > >  * (defined as a page-type we can successfully handle the memor error on it,
> > >  * such as LRU page and hugetlb page).
> > >
> > > For example, I see that both unpoison_memory() and soft_offline_page() will
> > > call it too, does it mean that we'll also set the bits e.g. even when we want
> > > to inject an unpoison event too?
> >
> > unpoison_memory() should be not a problem since it will just bail out
> > once THP is met as the comment says:
> >
> > /*
> > * unpoison_memory() can encounter thp only when the thp is being
> > * worked by memory_failure() and the page lock is not held yet.
> > * In such case, we yield to memory_failure() and make unpoison fail.
> > */
> >
> >
> > And I think we should set the flag for soft offline too, right? The
> > soft offline does set the hwpoison flag for the corrupted sub page and
> > doesn't split file THP, so it should be captured by page fault as
> > well. And yes for poison injection.
> 
> Err... I must be blind. The soft offline does *NOT* set hwpoison flag
> for any page. So your comment does stand. The flag should be set
> outside get_hwpoison_page().

I saw that page_handle_poison() sets it, so perhaps we do need to take care of
soft offline?  Though I still think the extra bit should be set outside of the
get_hwpoison_page() function.

Another thing is I noticed soft_offline_in_use_page() will still ignore file
backed split.  I'm not sure whether it means we'd better also handle that case
as well, so shmem thp can be split there too?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-12  0:55         ` Peter Xu
@ 2021-10-12  1:44           ` Peter Xu
  2021-10-12 18:02             ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-12  1:44 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> Another thing is I noticed soft_offline_in_use_page() will still ignore file
> backed split.  I'm not sure whether it means we'd better also handle that case
> as well, so shmem thp can be split there too?

Please ignore this paragraph - I somehow read "!PageHuge(page)" as
"PageAnon(page)"...  So I think patch 5 handles soft offline too.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-09-30 21:53 ` [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
  2021-10-01  7:05   ` Naoya Horiguchi
@ 2021-10-12  1:57   ` Peter Xu
  2021-10-12 19:17     ` Yang Shi
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-12  1:57 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, osalvador, akpm,
	linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 88742953532c..75c36b6a405a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  	struct inode *inode = mapping->host;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	pgoff_t index = pos >> PAGE_SHIFT;
> +	int ret = 0;
>  
>  	/* i_rwsem is held by caller */
>  	if (unlikely(info->seals & (F_SEAL_GROW |
> @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  			return -EPERM;
>  	}
>  
> -	return shmem_getpage(inode, index, pagep, SGP_WRITE);
> +	ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> +
> +	if (*pagep) {
> +		if (PageHWPoison(*pagep)) {
> +			unlock_page(*pagep);
> +			put_page(*pagep);
> +			ret = -EIO;
> +		}
> +	}
> +
> +	return ret;
>  }
>  
>  static int
> @@ -2555,6 +2566,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  			unlock_page(page);
>  		}
>  
> +		if (page && PageHWPoison(page)) {
> +			error = -EIO;
> +			break;
> +		}
> +
>  		/*
>  		 * We must evaluate after, since reads (unlike writes)
>  		 * are called without i_rwsem protection against truncate

[...]

> @@ -4193,6 +4216,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>  		page = ERR_PTR(error);
>  	else
>  		unlock_page(page);
> +
> +	if (PageHWPoison(page))
> +		page = ERR_PTR(-EIO);
> +
>  	return page;
>  #else
>  	/*
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 7a9008415534..b688d5327177 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -233,6 +233,11 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
>  		goto out;
>  	}
>  
> +	if (PageHWPoison(page)) {
> +		ret = -EIO;
> +		goto out_release;
> +	}
> +
>  	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
>  				       page, false, wp_copy);
>  	if (ret)
> -- 
> 2.26.2
> 

These are shmem_getpage_gfp() call sites:

  shmem_getpage[151]             return shmem_getpage_gfp(inode, index, pagep, sgp,
  shmem_fault[2112]              err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
  shmem_read_mapping_page_gfp[4188] error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE,

These are further shmem_getpage() call sites:

  collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
  shmem_undo_range[965]          shmem_getpage(inode, start - 1, &page, SGP_READ);
  shmem_undo_range[980]          shmem_getpage(inode, end, &page, SGP_READ);
  shmem_write_begin[2467]        return shmem_getpage(inode, index, pagep, SGP_WRITE);
  shmem_file_read_iter[2544]     error = shmem_getpage(inode, index, &page, sgp);
  shmem_fallocate[2733]          error = shmem_getpage(inode, index, &page, SGP_FALLOC);
  shmem_symlink[3079]            error = shmem_getpage(inode, 0, &page, SGP_WRITE);
  shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);
  mcontinue_atomic_pte[235]      ret = shmem_getpage(inode, pgoff, &page, SGP_READ);

Wondering whether this patch covered all of them.

This also reminded me that whether we should simply fail shmem_getpage_gfp()
directly, then all above callers will get a proper failure, rather than we do
PageHWPoison() check everywhere?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-12  1:44           ` Peter Xu
@ 2021-10-12 18:02             ` Yang Shi
  2021-10-12 22:10               ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-12 18:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > backed split.  I'm not sure whether it means we'd better also handle that case
> > as well, so shmem thp can be split there too?
>
> Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> "PageAnon(page)"...  So I think patch 5 handles soft offline too.

Yes, exactly. And even though the split is failed (or file THP didn't
get split before patch 5/5), soft offline would just return -EBUSY
instead of calling __soft_offline_page->page_handle_poison(). So
page_handle_poison() should not see THP at all.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-12  1:57   ` Peter Xu
@ 2021-10-12 19:17     ` Yang Shi
  2021-10-12 22:26       ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-12 19:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Mon, Oct 11, 2021 at 6:57 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 88742953532c..75c36b6a405a 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> >       struct inode *inode = mapping->host;
> >       struct shmem_inode_info *info = SHMEM_I(inode);
> >       pgoff_t index = pos >> PAGE_SHIFT;
> > +     int ret = 0;
> >
> >       /* i_rwsem is held by caller */
> >       if (unlikely(info->seals & (F_SEAL_GROW |
> > @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> >                       return -EPERM;
> >       }
> >
> > -     return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > +     ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> > +
> > +     if (*pagep) {
> > +             if (PageHWPoison(*pagep)) {
> > +                     unlock_page(*pagep);
> > +                     put_page(*pagep);
> > +                     ret = -EIO;
> > +             }
> > +     }
> > +
> > +     return ret;
> >  }
> >
> >  static int
> > @@ -2555,6 +2566,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >                       unlock_page(page);
> >               }
> >
> > +             if (page && PageHWPoison(page)) {
> > +                     error = -EIO;
> > +                     break;
> > +             }
> > +
> >               /*
> >                * We must evaluate after, since reads (unlike writes)
> >                * are called without i_rwsem protection against truncate
>
> [...]
>
> > @@ -4193,6 +4216,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
> >               page = ERR_PTR(error);
> >       else
> >               unlock_page(page);
> > +
> > +     if (PageHWPoison(page))
> > +             page = ERR_PTR(-EIO);
> > +
> >       return page;
> >  #else
> >       /*
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 7a9008415534..b688d5327177 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -233,6 +233,11 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
> >               goto out;
> >       }
> >
> > +     if (PageHWPoison(page)) {
> > +             ret = -EIO;
> > +             goto out_release;
> > +     }
> > +
> >       ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> >                                      page, false, wp_copy);
> >       if (ret)
> > --
> > 2.26.2
> >
>
> These are shmem_getpage_gfp() call sites:
>
>   shmem_getpage[151]             return shmem_getpage_gfp(inode, index, pagep, sgp,
>   shmem_fault[2112]              err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
>   shmem_read_mapping_page_gfp[4188] error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE,
>
> These are further shmem_getpage() call sites:
>
>   collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
>   shmem_undo_range[965]          shmem_getpage(inode, start - 1, &page, SGP_READ);
>   shmem_undo_range[980]          shmem_getpage(inode, end, &page, SGP_READ);
>   shmem_write_begin[2467]        return shmem_getpage(inode, index, pagep, SGP_WRITE);
>   shmem_file_read_iter[2544]     error = shmem_getpage(inode, index, &page, sgp);
>   shmem_fallocate[2733]          error = shmem_getpage(inode, index, &page, SGP_FALLOC);
>   shmem_symlink[3079]            error = shmem_getpage(inode, 0, &page, SGP_WRITE);
>   shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);
>   mcontinue_atomic_pte[235]      ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
>
> Wondering whether this patch covered all of them.

No, it doesn't need. Not all places care about hwpoison page, for
example, truncate, hole punch, etc. Only the APIs which return the
data back to userspace or write back to disk need care about if the
data is corrupted or not since. This has been elaborated in the cover
letter.

>
> This also reminded me that whether we should simply fail shmem_getpage_gfp()
> directly, then all above callers will get a proper failure, rather than we do
> PageHWPoison() check everywhere?

Actually I did a prototype for this approach by returning
ERR_PTR(-EIO). But all the callers have to check this return value
even though the callers don't care about hwpoison page since all the
callers (not only shmem, but also all other filesystems) just check if
page is NULL but not check if it is an error pointer. This actually
incur more changes. It sounds not optimal IMHO. So I just treat
hwpoison as other flags, for example, Uptodate, and have callers check
it when necessary.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-12 18:02             ` Yang Shi
@ 2021-10-12 22:10               ` Peter Xu
  2021-10-13  2:48                 ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-12 22:10 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 11:02:09AM -0700, Yang Shi wrote:
> On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > > backed split.  I'm not sure whether it means we'd better also handle that case
> > > as well, so shmem thp can be split there too?
> >
> > Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> > "PageAnon(page)"...  So I think patch 5 handles soft offline too.
> 
> Yes, exactly. And even though the split is failed (or file THP didn't
> get split before patch 5/5), soft offline would just return -EBUSY
> instead of calling __soft_offline_page->page_handle_poison(). So
> page_handle_poison() should not see THP at all.

I see, so I'm trying to summarize myself on what I see now with the new logic..

I think the offline code handles hwpoison differently as it sets PageHWPoison
at the end of the process, IOW if anything failed during the offline process
the hwpoison bit is not set.

That's different from how the memory failure path is handling this, as in that
case the hwpoison bit on the subpage is set firstly, e.g. before split thp.  I
believe that's also why memory failure requires the extra sub-page-hwpoison bit
while offline code shouldn't need to: because for soft offline split happens
before setting hwpoison so we just won't ever see a "poisoned file thp", while
for memory failure it could happen, and the sub-page-hwpoison will be a temp
bit anyway only exist for a very short period right after we set hwpoison on
the small page but before we split the thp.

Am I right above?

I feel like __soft_offline_page() still has some code that assumes "thp can be
there", e.g. iiuc after your change to allow file thp split, "hpage" will
always be the same as "page" then in that function, and isolate_page() does not
need to pass in a pagelist pointer too as it'll always be handling a small page
anyway.  But maybe they're fine to be there for now as they'll just work as
before, I think, so just raise it up.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-12 19:17     ` Yang Shi
@ 2021-10-12 22:26       ` Peter Xu
  2021-10-13  3:00         ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-12 22:26 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 12:17:33PM -0700, Yang Shi wrote:
> On Mon, Oct 11, 2021 at 6:57 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 88742953532c..75c36b6a405a 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > >       struct inode *inode = mapping->host;
> > >       struct shmem_inode_info *info = SHMEM_I(inode);
> > >       pgoff_t index = pos >> PAGE_SHIFT;
> > > +     int ret = 0;
> > >
> > >       /* i_rwsem is held by caller */
> > >       if (unlikely(info->seals & (F_SEAL_GROW |
> > > @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > >                       return -EPERM;
> > >       }
> > >
> > > -     return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > > +     ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> > > +
> > > +     if (*pagep) {
> > > +             if (PageHWPoison(*pagep)) {
> > > +                     unlock_page(*pagep);
> > > +                     put_page(*pagep);
> > > +                     ret = -EIO;
> > > +             }
> > > +     }
> > > +
> > > +     return ret;
> > >  }
> > >
> > >  static int
> > > @@ -2555,6 +2566,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >                       unlock_page(page);
> > >               }
> > >
> > > +             if (page && PageHWPoison(page)) {
> > > +                     error = -EIO;
> > > +                     break;
> > > +             }
> > > +
> > >               /*
> > >                * We must evaluate after, since reads (unlike writes)
> > >                * are called without i_rwsem protection against truncate
> >
> > [...]
> >
> > > @@ -4193,6 +4216,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
> > >               page = ERR_PTR(error);
> > >       else
> > >               unlock_page(page);
> > > +
> > > +     if (PageHWPoison(page))
> > > +             page = ERR_PTR(-EIO);
> > > +
> > >       return page;
> > >  #else
> > >       /*
> > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > index 7a9008415534..b688d5327177 100644
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
> > > @@ -233,6 +233,11 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
> > >               goto out;
> > >       }
> > >
> > > +     if (PageHWPoison(page)) {
> > > +             ret = -EIO;
> > > +             goto out_release;
> > > +     }
> > > +
> > >       ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> > >                                      page, false, wp_copy);
> > >       if (ret)
> > > --
> > > 2.26.2
> > >
> >
> > These are shmem_getpage_gfp() call sites:
> >
> >   shmem_getpage[151]             return shmem_getpage_gfp(inode, index, pagep, sgp,
> >   shmem_fault[2112]              err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
> >   shmem_read_mapping_page_gfp[4188] error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE,
> >
> > These are further shmem_getpage() call sites:
> >
> >   collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
> >   shmem_undo_range[965]          shmem_getpage(inode, start - 1, &page, SGP_READ);
> >   shmem_undo_range[980]          shmem_getpage(inode, end, &page, SGP_READ);
> >   shmem_write_begin[2467]        return shmem_getpage(inode, index, pagep, SGP_WRITE);
> >   shmem_file_read_iter[2544]     error = shmem_getpage(inode, index, &page, sgp);
> >   shmem_fallocate[2733]          error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> >   shmem_symlink[3079]            error = shmem_getpage(inode, 0, &page, SGP_WRITE);
> >   shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);
> >   mcontinue_atomic_pte[235]      ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
> >
> > Wondering whether this patch covered all of them.
> 
> No, it doesn't need. Not all places care about hwpoison page, for
> example, truncate, hole punch, etc. Only the APIs which return the
> data back to userspace or write back to disk need care about if the
> data is corrupted or not since. This has been elaborated in the cover
> letter.

I see, sorry I missed that.  However I still have two entries unsure in above
list that this patch didn't cover (besides fault path, truncate, ...):

  collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
  shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);

IIUC the 1st one is when we want to collapse a file thp, should we stop the
attempt if we see a hwpoison small page?

The 2nd one should be where we had a symlink shmem file and the 1st page which
stores the link got corrupted.  Should we fail the get_link() then?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
  2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
                   ` (4 preceding siblings ...)
  2021-09-30 21:53 ` [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
@ 2021-10-13  2:40 ` Peter Xu
  2021-10-13  3:09   ` Yang Shi
  5 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-13  2:40 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, osalvador, akpm,
	linux-mm, linux-fsdevel, linux-kernel

On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote:
> Yang Shi (5):
>       mm: hwpoison: remove the unnecessary THP check
>       mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>       mm: hwpoison: refactor refcount check handling
>       mm: shmem: don't truncate page if memory failure happens
>       mm: hwpoison: handle non-anonymous THP correctly

Today I just noticed one more thing: unpoison path has (unpoison_memory):

	if (page_mapping(page)) {
		unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
				 pfn, &unpoison_rs);
		return 0;
	}

I _think_ it was used to make sure we ignore page that was not successfully
poisoned/offlined before (for anonymous), so raising this question up on
whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned
for debugging purposes.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-12 22:10               ` Peter Xu
@ 2021-10-13  2:48                 ` Yang Shi
  2021-10-13  3:01                   ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-13  2:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 3:10 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 11:02:09AM -0700, Yang Shi wrote:
> > On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > > > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > > > backed split.  I'm not sure whether it means we'd better also handle that case
> > > > as well, so shmem thp can be split there too?
> > >
> > > Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> > > "PageAnon(page)"...  So I think patch 5 handles soft offline too.
> >
> > Yes, exactly. And even though the split is failed (or file THP didn't
> > get split before patch 5/5), soft offline would just return -EBUSY
> > instead of calling __soft_offline_page->page_handle_poison(). So
> > page_handle_poison() should not see THP at all.
>
> I see, so I'm trying to summarize myself on what I see now with the new logic..
>
> I think the offline code handles hwpoison differently as it sets PageHWPoison
> at the end of the process, IOW if anything failed during the offline process
> the hwpoison bit is not set.
>
> That's different from how the memory failure path is handling this, as in that
> case the hwpoison bit on the subpage is set firstly, e.g. before split thp.  I
> believe that's also why memory failure requires the extra sub-page-hwpoison bit
> while offline code shouldn't need to: because for soft offline split happens
> before setting hwpoison so we just won't ever see a "poisoned file thp", while
> for memory failure it could happen, and the sub-page-hwpoison will be a temp
> bit anyway only exist for a very short period right after we set hwpoison on
> the small page but before we split the thp.
>
> Am I right above?

Yeah, you are right. I noticed this too, only successfully migrated
page is marked as hwpoison. But TBH I'm not sure why it does this way.
Naoya may know. Anyway, THP doesn't get migrated if it can't be split,
so PageHasHWPoisoned doesn't apply, right?

>
> I feel like __soft_offline_page() still has some code that assumes "thp can be
> there", e.g. iiuc after your change to allow file thp split, "hpage" will
> always be the same as "page" then in that function, and isolate_page() does not
> need to pass in a pagelist pointer too as it'll always be handling a small page
> anyway.  But maybe they're fine to be there for now as they'll just work as
> before, I think, so just raise it up.

That compound_head() call seems to be for hugetlb since isolating
hugetlb needs to pass in the head page IIUC. For the pagelist, I think
it is just because migrate_pages() requires a list as the second
parameter.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-12 22:26       ` Peter Xu
@ 2021-10-13  3:00         ` Yang Shi
  2021-10-13  3:06           ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-13  3:00 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 3:27 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 12:17:33PM -0700, Yang Shi wrote:
> > On Mon, Oct 11, 2021 at 6:57 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, Sep 30, 2021 at 02:53:10PM -0700, Yang Shi wrote:
> > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > index 88742953532c..75c36b6a405a 100644
> > > > --- a/mm/shmem.c
> > > > +++ b/mm/shmem.c
> > > > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > > >       struct inode *inode = mapping->host;
> > > >       struct shmem_inode_info *info = SHMEM_I(inode);
> > > >       pgoff_t index = pos >> PAGE_SHIFT;
> > > > +     int ret = 0;
> > > >
> > > >       /* i_rwsem is held by caller */
> > > >       if (unlikely(info->seals & (F_SEAL_GROW |
> > > > @@ -2466,7 +2467,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > > >                       return -EPERM;
> > > >       }
> > > >
> > > > -     return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > > > +     ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> > > > +
> > > > +     if (*pagep) {
> > > > +             if (PageHWPoison(*pagep)) {
> > > > +                     unlock_page(*pagep);
> > > > +                     put_page(*pagep);
> > > > +                     ret = -EIO;
> > > > +             }
> > > > +     }
> > > > +
> > > > +     return ret;
> > > >  }
> > > >
> > > >  static int
> > > > @@ -2555,6 +2566,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > > >                       unlock_page(page);
> > > >               }
> > > >
> > > > +             if (page && PageHWPoison(page)) {
> > > > +                     error = -EIO;
> > > > +                     break;
> > > > +             }
> > > > +
> > > >               /*
> > > >                * We must evaluate after, since reads (unlike writes)
> > > >                * are called without i_rwsem protection against truncate
> > >
> > > [...]
> > >
> > > > @@ -4193,6 +4216,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
> > > >               page = ERR_PTR(error);
> > > >       else
> > > >               unlock_page(page);
> > > > +
> > > > +     if (PageHWPoison(page))
> > > > +             page = ERR_PTR(-EIO);
> > > > +
> > > >       return page;
> > > >  #else
> > > >       /*
> > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > index 7a9008415534..b688d5327177 100644
> > > > --- a/mm/userfaultfd.c
> > > > +++ b/mm/userfaultfd.c
> > > > @@ -233,6 +233,11 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
> > > >               goto out;
> > > >       }
> > > >
> > > > +     if (PageHWPoison(page)) {
> > > > +             ret = -EIO;
> > > > +             goto out_release;
> > > > +     }
> > > > +
> > > >       ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> > > >                                      page, false, wp_copy);
> > > >       if (ret)
> > > > --
> > > > 2.26.2
> > > >
> > >
> > > These are shmem_getpage_gfp() call sites:
> > >
> > >   shmem_getpage[151]             return shmem_getpage_gfp(inode, index, pagep, sgp,
> > >   shmem_fault[2112]              err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
> > >   shmem_read_mapping_page_gfp[4188] error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE,
> > >
> > > These are further shmem_getpage() call sites:
> > >
> > >   collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
> > >   shmem_undo_range[965]          shmem_getpage(inode, start - 1, &page, SGP_READ);
> > >   shmem_undo_range[980]          shmem_getpage(inode, end, &page, SGP_READ);
> > >   shmem_write_begin[2467]        return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > >   shmem_file_read_iter[2544]     error = shmem_getpage(inode, index, &page, sgp);
> > >   shmem_fallocate[2733]          error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> > >   shmem_symlink[3079]            error = shmem_getpage(inode, 0, &page, SGP_WRITE);
> > >   shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);
> > >   mcontinue_atomic_pte[235]      ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
> > >
> > > Wondering whether this patch covered all of them.
> >
> > No, it doesn't need. Not all places care about hwpoison page, for
> > example, truncate, hole punch, etc. Only the APIs which return the
> > data back to userspace or write back to disk need care about if the
> > data is corrupted or not since. This has been elaborated in the cover
> > letter.
>
> I see, sorry I missed that.  However I still have two entries unsure in above
> list that this patch didn't cover (besides fault path, truncate, ...):
>
>   collapse_file[1735]            if (shmem_getpage(mapping->host, index, &page,
>   shmem_get_link[3120]           error = shmem_getpage(inode, 0, &page, SGP_READ);
>
> IIUC the 1st one is when we want to collapse a file thp, should we stop the
> attempt if we see a hwpoison small page?

The page refcount could stop collapsing hwpoison page. One could argue
khugepaged could bail out earlier by checking hwpoison flag, but it is
definitely not a must do. So it relies on refcount now.

>
> The 2nd one should be where we had a symlink shmem file and the 1st page which
> stores the link got corrupted.  Should we fail the get_link() then?

Thanks for catching this. Yeah, it seems this one is overlooked. I
didn't know that reading symlink needed to copy the first page. Will
fix it in the next version.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13  2:48                 ` Yang Shi
@ 2021-10-13  3:01                   ` Peter Xu
  2021-10-13  3:27                     ` Yang Shi
  2021-10-14  6:54                     ` Naoya Horiguchi
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-13  3:01 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 07:48:39PM -0700, Yang Shi wrote:
> On Tue, Oct 12, 2021 at 3:10 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 11:02:09AM -0700, Yang Shi wrote:
> > > On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > > > > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > > > > backed split.  I'm not sure whether it means we'd better also handle that case
> > > > > as well, so shmem thp can be split there too?
> > > >
> > > > Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> > > > "PageAnon(page)"...  So I think patch 5 handles soft offline too.
> > >
> > > Yes, exactly. And even though the split is failed (or file THP didn't
> > > get split before patch 5/5), soft offline would just return -EBUSY
> > > instead of calling __soft_offline_page->page_handle_poison(). So
> > > page_handle_poison() should not see THP at all.
> >
> > I see, so I'm trying to summarize myself on what I see now with the new logic..
> >
> > I think the offline code handles hwpoison differently as it sets PageHWPoison
> > at the end of the process, IOW if anything failed during the offline process
> > the hwpoison bit is not set.
> >
> > That's different from how the memory failure path is handling this, as in that
> > case the hwpoison bit on the subpage is set firstly, e.g. before split thp.  I
> > believe that's also why memory failure requires the extra sub-page-hwpoison bit
> > while offline code shouldn't need to: because for soft offline split happens
> > before setting hwpoison so we just won't ever see a "poisoned file thp", while
> > for memory failure it could happen, and the sub-page-hwpoison will be a temp
> > bit anyway only exist for a very short period right after we set hwpoison on
> > the small page but before we split the thp.
> >
> > Am I right above?
> 
> Yeah, you are right. I noticed this too, only successfully migrated
> page is marked as hwpoison. But TBH I'm not sure why it does this way.

My wild guess is that unlike memory failures, soft offline is best-effort. Say,
the data on the page is still consistent, so even if offline failed for some
reason we shouldn't stop the program from execution.  That's not true for
memory failures via MCEs, afaict, as the execution could read/write wrong data
and that'll be a serious mistake, so we set hwpoison 1st there first before
doing anything else, making sure "this page is broken" message delivered and
user app won't run with risk.

But yeah it'll be great if Naoya could help confirm that.

> Naoya may know. Anyway, THP doesn't get migrated if it can't be split,
> so PageHasHWPoisoned doesn't apply, right?

Right, that matches my current understanding of the code, so the extra bit is
perhaps not useful for soft offline case.

But this also reminded me that shouldn't we be with the page lock already
during the process of "setting hwpoison-subpage bit, split thp, clear
hwpoison-subpage bit"?  If it's only the small window that needs protection,
while when looking up the shmem pagecache we always need to take the page lock
too, then it seems already safe even without the extra bit?  Hmm?

> 
> >
> > I feel like __soft_offline_page() still has some code that assumes "thp can be
> > there", e.g. iiuc after your change to allow file thp split, "hpage" will
> > always be the same as "page" then in that function, and isolate_page() does not
> > need to pass in a pagelist pointer too as it'll always be handling a small page
> > anyway.  But maybe they're fine to be there for now as they'll just work as
> > before, I think, so just raise it up.
> 
> That compound_head() call seems to be for hugetlb since isolating
> hugetlb needs to pass in the head page IIUC. For the pagelist, I think
> it is just because migrate_pages() requires a list as the second
> parameter.

Fair enough.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-13  3:00         ` Yang Shi
@ 2021-10-13  3:06           ` Peter Xu
  2021-10-13  3:29             ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-13  3:06 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 08:00:31PM -0700, Yang Shi wrote:
> The page refcount could stop collapsing hwpoison page. One could argue
> khugepaged could bail out earlier by checking hwpoison flag, but it is
> definitely not a must do. So it relies on refcount now.

I suppose you mean the page_ref_freeze() in collapse_file()?  Yeah that seems
to work too.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
  2021-10-13  2:40 ` [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Peter Xu
@ 2021-10-13  3:09   ` Yang Shi
  2021-10-13  3:24     ` Peter Xu
  2021-10-14  6:54     ` Naoya Horiguchi
  0 siblings, 2 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-13  3:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote:
> > Yang Shi (5):
> >       mm: hwpoison: remove the unnecessary THP check
> >       mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> >       mm: hwpoison: refactor refcount check handling
> >       mm: shmem: don't truncate page if memory failure happens
> >       mm: hwpoison: handle non-anonymous THP correctly
>
> Today I just noticed one more thing: unpoison path has (unpoison_memory):
>
>         if (page_mapping(page)) {
>                 unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
>                                  pfn, &unpoison_rs);
>                 return 0;
>         }
>
> I _think_ it was used to make sure we ignore page that was not successfully
> poisoned/offlined before (for anonymous), so raising this question up on
> whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned
> for debugging purposes.

Yes, not only mapping, the refcount check is not right if page cache
page is kept in page cache instead of being truncated after this
series. But actually unpoison has been broken since commit
0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make
get_hwpoison_page() call get_any_page()"). And Naoya said in the
commit "unpoison_memory() is also unchanged because it's broken and
need thorough fixes (will be done later)."

I do have some fixes in my tree to unblock tests and fix unpoison for
this series (just make it work for testing). Naoya may have some ideas
in mind and it is just a debugging feature so I don't think it must be
fixed in this series. It could be done later. I could add a TODO
section in the cover letter to make this more clear.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
  2021-10-13  3:09   ` Yang Shi
@ 2021-10-13  3:24     ` Peter Xu
  2021-10-14  6:54     ` Naoya Horiguchi
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-13  3:24 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 08:09:24PM -0700, Yang Shi wrote:
> On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote:
> > > Yang Shi (5):
> > >       mm: hwpoison: remove the unnecessary THP check
> > >       mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> > >       mm: hwpoison: refactor refcount check handling
> > >       mm: shmem: don't truncate page if memory failure happens
> > >       mm: hwpoison: handle non-anonymous THP correctly
> >
> > Today I just noticed one more thing: unpoison path has (unpoison_memory):
> >
> >         if (page_mapping(page)) {
> >                 unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
> >                                  pfn, &unpoison_rs);
> >                 return 0;
> >         }
> >
> > I _think_ it was used to make sure we ignore page that was not successfully
> > poisoned/offlined before (for anonymous), so raising this question up on
> > whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned
> > for debugging purposes.
> 
> Yes, not only mapping, the refcount check is not right if page cache
> page is kept in page cache instead of being truncated after this
> series. But actually unpoison has been broken since commit
> 0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make
> get_hwpoison_page() call get_any_page()"). And Naoya said in the
> commit "unpoison_memory() is also unchanged because it's broken and
> need thorough fixes (will be done later)."
> 
> I do have some fixes in my tree to unblock tests and fix unpoison for
> this series (just make it work for testing). Naoya may have some ideas
> in mind and it is just a debugging feature so I don't think it must be
> fixed in this series. It could be done later. I could add a TODO
> section in the cover letter to make this more clear.

I see, that sounds good enough to me.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13  3:01                   ` Peter Xu
@ 2021-10-13  3:27                     ` Yang Shi
  2021-10-13  3:41                       ` Peter Xu
  2021-10-14  6:54                     ` Naoya Horiguchi
  1 sibling, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-13  3:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 8:01 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 07:48:39PM -0700, Yang Shi wrote:
> > On Tue, Oct 12, 2021 at 3:10 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Tue, Oct 12, 2021 at 11:02:09AM -0700, Yang Shi wrote:
> > > > On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > > > > > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > > > > > backed split.  I'm not sure whether it means we'd better also handle that case
> > > > > > as well, so shmem thp can be split there too?
> > > > >
> > > > > Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> > > > > "PageAnon(page)"...  So I think patch 5 handles soft offline too.
> > > >
> > > > Yes, exactly. And even though the split is failed (or file THP didn't
> > > > get split before patch 5/5), soft offline would just return -EBUSY
> > > > instead of calling __soft_offline_page->page_handle_poison(). So
> > > > page_handle_poison() should not see THP at all.
> > >
> > > I see, so I'm trying to summarize myself on what I see now with the new logic..
> > >
> > > I think the offline code handles hwpoison differently as it sets PageHWPoison
> > > at the end of the process, IOW if anything failed during the offline process
> > > the hwpoison bit is not set.
> > >
> > > That's different from how the memory failure path is handling this, as in that
> > > case the hwpoison bit on the subpage is set firstly, e.g. before split thp.  I
> > > believe that's also why memory failure requires the extra sub-page-hwpoison bit
> > > while offline code shouldn't need to: because for soft offline split happens
> > > before setting hwpoison so we just won't ever see a "poisoned file thp", while
> > > for memory failure it could happen, and the sub-page-hwpoison will be a temp
> > > bit anyway only exist for a very short period right after we set hwpoison on
> > > the small page but before we split the thp.
> > >
> > > Am I right above?
> >
> > Yeah, you are right. I noticed this too, only successfully migrated
> > page is marked as hwpoison. But TBH I'm not sure why it does this way.
>
> My wild guess is that unlike memory failures, soft offline is best-effort. Say,
> the data on the page is still consistent, so even if offline failed for some
> reason we shouldn't stop the program from execution.  That's not true for
> memory failures via MCEs, afaict, as the execution could read/write wrong data
> and that'll be a serious mistake, so we set hwpoison 1st there first before
> doing anything else, making sure "this page is broken" message delivered and
> user app won't run with risk.

Makes sense to me.

>
> But yeah it'll be great if Naoya could help confirm that.
>
> > Naoya may know. Anyway, THP doesn't get migrated if it can't be split,
> > so PageHasHWPoisoned doesn't apply, right?
>
> Right, that matches my current understanding of the code, so the extra bit is
> perhaps not useful for soft offline case.
>
> But this also reminded me that shouldn't we be with the page lock already
> during the process of "setting hwpoison-subpage bit, split thp, clear
> hwpoison-subpage bit"?  If it's only the small window that needs protection,
> while when looking up the shmem pagecache we always need to take the page lock
> too, then it seems already safe even without the extra bit?  Hmm?

I don't quite get your point. Do you mean memory_failure()? If so the
answer is no, outside the page lock. And the window may be indefinite
since file THP doesn't get split before this series and the split may
fail even after this series.

>
> >
> > >
> > > I feel like __soft_offline_page() still has some code that assumes "thp can be
> > > there", e.g. iiuc after your change to allow file thp split, "hpage" will
> > > always be the same as "page" then in that function, and isolate_page() does not
> > > need to pass in a pagelist pointer too as it'll always be handling a small page
> > > anyway.  But maybe they're fine to be there for now as they'll just work as
> > > before, I think, so just raise it up.
> >
> > That compound_head() call seems to be for hugetlb since isolating
> > hugetlb needs to pass in the head page IIUC. For the pagelist, I think
> > it is just because migrate_pages() requires a list as the second
> > parameter.
>
> Fair enough.
>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens
  2021-10-13  3:06           ` Peter Xu
@ 2021-10-13  3:29             ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-10-13  3:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 8:07 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 08:00:31PM -0700, Yang Shi wrote:
> > The page refcount could stop collapsing hwpoison page. One could argue
> > khugepaged could bail out earlier by checking hwpoison flag, but it is
> > definitely not a must do. So it relies on refcount now.
>
> I suppose you mean the page_ref_freeze() in collapse_file()?  Yeah that seems
> to work too.  Thanks,

Yes, exactly.

>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13  3:27                     ` Yang Shi
@ 2021-10-13  3:41                       ` Peter Xu
  2021-10-13 21:42                         ` Yang Shi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Xu @ 2021-10-13  3:41 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 08:27:06PM -0700, Yang Shi wrote:
> > But this also reminded me that shouldn't we be with the page lock already
> > during the process of "setting hwpoison-subpage bit, split thp, clear
> > hwpoison-subpage bit"?  If it's only the small window that needs protection,
> > while when looking up the shmem pagecache we always need to take the page lock
> > too, then it seems already safe even without the extra bit?  Hmm?
> 
> I don't quite get your point. Do you mean memory_failure()? If so the
> answer is no, outside the page lock. And the window may be indefinite
> since file THP doesn't get split before this series and the split may
> fail even after this series.

What I meant is that we could extend the page_lock in try_to_split_thp_page()
to cover setting hwpoison-subpage too (and it of course covers the clearing if
thp split succeeded, as that's part of the split process).  But yeah it's a
good point that the split may fail, so the extra bit seems still necessary.

Maybe that'll be something worth mentioning in the commit message too?  The
commit message described very well on the overhead of looping over 512 pages,
however the reader can easily overlook the real reason for needing this bit -
IMHO it's really for the thp split failure case, as we could also mention that
if thp split won't fail, page lock should be suffice (imho).  We could also
mention about why soft offline does not need that extra bit, which seems not
obvious as well, so imho good material for commit messages.

Sorry to have asked for a lot of commit message changes; I hope they make sense.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13  3:41                       ` Peter Xu
@ 2021-10-13 21:42                         ` Yang Shi
  2021-10-13 23:13                           ` Peter Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Yang Shi @ 2021-10-13 21:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 8:41 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 08:27:06PM -0700, Yang Shi wrote:
> > > But this also reminded me that shouldn't we be with the page lock already
> > > during the process of "setting hwpoison-subpage bit, split thp, clear
> > > hwpoison-subpage bit"?  If it's only the small window that needs protection,
> > > while when looking up the shmem pagecache we always need to take the page lock
> > > too, then it seems already safe even without the extra bit?  Hmm?
> >
> > I don't quite get your point. Do you mean memory_failure()? If so the
> > answer is no, outside the page lock. And the window may be indefinite
> > since file THP doesn't get split before this series and the split may
> > fail even after this series.
>
> What I meant is that we could extend the page_lock in try_to_split_thp_page()
> to cover setting hwpoison-subpage too (and it of course covers the clearing if
> thp split succeeded, as that's part of the split process).  But yeah it's a
> good point that the split may fail, so the extra bit seems still necessary.
>
> Maybe that'll be something worth mentioning in the commit message too?  The
> commit message described very well on the overhead of looping over 512 pages,
> however the reader can easily overlook the real reason for needing this bit -
> IMHO it's really for the thp split failure case, as we could also mention that
> if thp split won't fail, page lock should be suffice (imho).  We could also

Not only for THP split failure case. Before this series, shmem THP
does't get split at all. And this commit is supposed to be backported
to the older versions, so saying "page lock is sufficient" is not
precise and confusing.

> mention about why soft offline does not need that extra bit, which seems not
> obvious as well, so imho good material for commit messages.

It would be nice to mention soft offline case.

>
> Sorry to have asked for a lot of commit message changes; I hope they make sense.

Thanks a lot for all the great suggestions.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13 21:42                         ` Yang Shi
@ 2021-10-13 23:13                           ` Peter Xu
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Xu @ 2021-10-13 23:13 UTC (permalink / raw)
  To: Yang Shi
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 13, 2021 at 02:42:42PM -0700, Yang Shi wrote:
> On Tue, Oct 12, 2021 at 8:41 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 08:27:06PM -0700, Yang Shi wrote:
> > > > But this also reminded me that shouldn't we be with the page lock already
> > > > during the process of "setting hwpoison-subpage bit, split thp, clear
> > > > hwpoison-subpage bit"?  If it's only the small window that needs protection,
> > > > while when looking up the shmem pagecache we always need to take the page lock
> > > > too, then it seems already safe even without the extra bit?  Hmm?
> > >
> > > I don't quite get your point. Do you mean memory_failure()? If so the
> > > answer is no, outside the page lock. And the window may be indefinite
> > > since file THP doesn't get split before this series and the split may
> > > fail even after this series.
> >
> > What I meant is that we could extend the page_lock in try_to_split_thp_page()
> > to cover setting hwpoison-subpage too (and it of course covers the clearing if
> > thp split succeeded, as that's part of the split process).  But yeah it's a
> > good point that the split may fail, so the extra bit seems still necessary.
> >
> > Maybe that'll be something worth mentioning in the commit message too?  The
> > commit message described very well on the overhead of looping over 512 pages,
> > however the reader can easily overlook the real reason for needing this bit -
> > IMHO it's really for the thp split failure case, as we could also mention that
> > if thp split won't fail, page lock should be suffice (imho).  We could also
> 
> Not only for THP split failure case. Before this series, shmem THP
> does't get split at all. And this commit is supposed to be backported
> to the older versions, so saying "page lock is sufficient" is not
> precise and confusing.

Sure, please feel free to use any wording you prefer as long as the other side
of the lock besides the performance impact could be mentioned.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
  2021-10-13  3:09   ` Yang Shi
  2021-10-13  3:24     ` Peter Xu
@ 2021-10-14  6:54     ` Naoya Horiguchi
  1 sibling, 0 replies; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-14  6:54 UTC (permalink / raw)
  To: Yang Shi
  Cc: Peter Xu, HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Tue, Oct 12, 2021 at 08:09:24PM -0700, Yang Shi wrote:
> On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote:
> > > Yang Shi (5):
> > >       mm: hwpoison: remove the unnecessary THP check
> > >       mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> > >       mm: hwpoison: refactor refcount check handling
> > >       mm: shmem: don't truncate page if memory failure happens
> > >       mm: hwpoison: handle non-anonymous THP correctly
> >
> > Today I just noticed one more thing: unpoison path has (unpoison_memory):
> >
> >         if (page_mapping(page)) {
> >                 unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
> >                                  pfn, &unpoison_rs);
> >                 return 0;
> >         }
> >
> > I _think_ it was used to make sure we ignore page that was not successfully
> > poisoned/offlined before (for anonymous), so raising this question up on
> > whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned
> > for debugging purposes.
> 
> Yes, not only mapping, the refcount check is not right if page cache
> page is kept in page cache instead of being truncated after this
> series. But actually unpoison has been broken since commit
> 0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make
> get_hwpoison_page() call get_any_page()"). And Naoya said in the
> commit "unpoison_memory() is also unchanged because it's broken and
> need thorough fixes (will be done later)."
> 

Yeah, I have a patch but still not make it merged to upstream.
Sorry about that...

> I do have some fixes in my tree to unblock tests and fix unpoison for
> this series (just make it work for testing). Naoya may have some ideas
> in mind and it is just a debugging feature so I don't think it must be
> fixed in this series. It could be done later. I could add a TODO
> section in the cover letter to make this more clear.

Yes, it's fine to me that you leave this unsolved in this patchset.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-10-13  3:01                   ` Peter Xu
  2021-10-13  3:27                     ` Yang Shi
@ 2021-10-14  6:54                     ` Naoya Horiguchi
  1 sibling, 0 replies; 59+ messages in thread
From: Naoya Horiguchi @ 2021-10-14  6:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Yang Shi, HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Oscar Salvador,
	Andrew Morton, Linux MM, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

On Wed, Oct 13, 2021 at 11:01:33AM +0800, Peter Xu wrote:
> On Tue, Oct 12, 2021 at 07:48:39PM -0700, Yang Shi wrote:
> > On Tue, Oct 12, 2021 at 3:10 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Tue, Oct 12, 2021 at 11:02:09AM -0700, Yang Shi wrote:
> > > > On Mon, Oct 11, 2021 at 6:44 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Mon, Oct 11, 2021 at 08:55:26PM -0400, Peter Xu wrote:
> > > > > > Another thing is I noticed soft_offline_in_use_page() will still ignore file
> > > > > > backed split.  I'm not sure whether it means we'd better also handle that case
> > > > > > as well, so shmem thp can be split there too?
> > > > >
> > > > > Please ignore this paragraph - I somehow read "!PageHuge(page)" as
> > > > > "PageAnon(page)"...  So I think patch 5 handles soft offline too.
> > > >
> > > > Yes, exactly. And even though the split is failed (or file THP didn't
> > > > get split before patch 5/5), soft offline would just return -EBUSY
> > > > instead of calling __soft_offline_page->page_handle_poison(). So
> > > > page_handle_poison() should not see THP at all.
> > >
> > > I see, so I'm trying to summarize myself on what I see now with the new logic..
> > >
> > > I think the offline code handles hwpoison differently as it sets PageHWPoison
> > > at the end of the process, IOW if anything failed during the offline process
> > > the hwpoison bit is not set.
> > >
> > > That's different from how the memory failure path is handling this, as in that
> > > case the hwpoison bit on the subpage is set firstly, e.g. before split thp.  I
> > > believe that's also why memory failure requires the extra sub-page-hwpoison bit
> > > while offline code shouldn't need to: because for soft offline split happens
> > > before setting hwpoison so we just won't ever see a "poisoned file thp", while
> > > for memory failure it could happen, and the sub-page-hwpoison will be a temp
> > > bit anyway only exist for a very short period right after we set hwpoison on
> > > the small page but before we split the thp.
> > >
> > > Am I right above?
> > 
> > Yeah, you are right. I noticed this too, only successfully migrated
> > page is marked as hwpoison. But TBH I'm not sure why it does this way.
> 
> My wild guess is that unlike memory failures, soft offline is best-effort. Say,
> the data on the page is still consistent, so even if offline failed for some
> reason we shouldn't stop the program from execution.  That's not true for
> memory failures via MCEs, afaict, as the execution could read/write wrong data
> and that'll be a serious mistake, so we set hwpoison 1st there first before
> doing anything else, making sure "this page is broken" message delivered and
> user app won't run with risk.
> 
> But yeah it'll be great if Naoya could help confirm that.

Yes, these descriptions are totally correct, how PG_hwpoison flag is set
is different between hwpoison/soft-offline mechanisms from the beginning.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
                     ` (3 preceding siblings ...)
  2021-10-06 20:18   ` Peter Xu
@ 2021-11-01 19:05   ` Naresh Kamboju
  2021-11-01 19:26     ` Yang Shi
  4 siblings, 1 reply; 59+ messages in thread
From: Naresh Kamboju @ 2021-11-01 19:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, hughd, kirill.shutemov, willy, peterx,
	osalvador, akpm, linux-mm, linux-fsdevel, linux-kernel,
	Linus Torvalds, Greg Kroah-Hartman, Sasha Levin, Anders Roxell,
	lkft-triage

Hi Yang,

On Fri, 1 Oct 2021 at 03:23, Yang Shi <shy828301@gmail.com> wrote:
>
> When handling shmem page fault the THP with corrupted subpage could be PMD
> mapped if certain conditions are satisfied.  But kernel is supposed to
> send SIGBUS when trying to map hwpoisoned page.
>
> There are two paths which may do PMD map: fault around and regular fault.
>
> Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> the thing was even worse in fault around path.  The THP could be PMD mapped as
> long as the VMA fits regardless what subpage is accessed and corrupted.  After
> this commit as long as head page is not corrupted the THP could be PMD mapped.
>
> In the regular fault path the THP could be PMD mapped as long as the corrupted
> page is not accessed and the VMA fits.
>
> This loophole could be fixed by iterating every subpage to check if any
> of them is hwpoisoned or not, but it is somewhat costly in page fault path.
>
> So introduce a new page flag called HasHWPoisoned on the first tail page.  It
> indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
> is found hwpoisoned by memory failure and cleared when the THP is freed or
> split.
>
> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> Cc: <stable@vger.kernel.org>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  include/linux/page-flags.h | 19 +++++++++++++++++++
>  mm/filemap.c               | 12 ++++++------
>  mm/huge_memory.c           |  2 ++
>  mm/memory-failure.c        |  6 +++++-
>  mm/memory.c                |  9 +++++++++
>  mm/page_alloc.c            |  4 +++-
>  6 files changed, 44 insertions(+), 8 deletions(-)

When CONFIG_MEMORY_FAILURE not set
we get these build failures.

Regression found on x86_64 and i386 gcc-11 builds
Following build warnings / errors reported on Linux mainline master.

metadata:
    git_describe: v5.15-559-g19901165d90f
    git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    git_short_log: 19901165d90f (\"Merge tag
'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block\")
    target_arch: x86_64
    toolchain: gcc-11


In file included from include/linux/mmzone.h:22,
                 from include/linux/gfp.h:6,
                 from include/linux/slab.h:15,
                 from include/linux/crypto.h:20,
                 from arch/x86/kernel/asm-offsets.c:9:
include/linux/page-flags.h:806:29: error: macro "PAGEFLAG_FALSE"
requires 2 arguments, but only 1 given
  806 | PAGEFLAG_FALSE(HasHWPoisoned)
      |                             ^
include/linux/page-flags.h:411: note: macro "PAGEFLAG_FALSE" defined here
  411 | #define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname,
lname)   \
      |
include/linux/page-flags.h:807:39: error: macro "TESTSCFLAG_FALSE"
requires 2 arguments, but only 1 given
  807 |         TESTSCFLAG_FALSE(HasHWPoisoned)
      |                                       ^
include/linux/page-flags.h:414: note: macro "TESTSCFLAG_FALSE" defined here
  414 | #define TESTSCFLAG_FALSE(uname, lname)
         \
      |
include/linux/page-flags.h:806:1: error: unknown type name 'PAGEFLAG_FALSE'
  806 | PAGEFLAG_FALSE(HasHWPoisoned)
      | ^~~~~~~~~~~~~~
include/linux/page-flags.h:807:25: error: expected ';' before 'static'
  807 |         TESTSCFLAG_FALSE(HasHWPoisoned)
      |                         ^
      |                         ;
......
  815 | static inline bool is_page_hwpoison(struct page *page)
      | ~~~~~~
make[2]: *** [scripts/Makefile.build:121: arch/x86/kernel/asm-offsets.s] Error 1

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

build link:
-----------
https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/build.log

build config:
-------------
https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/config

# To install tuxmake on your system globally
# sudo pip3 install -U tuxmake

tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-11
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/config

link:
https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/tuxmake_reproducer.sh

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  2021-11-01 19:05   ` Naresh Kamboju
@ 2021-11-01 19:26     ` Yang Shi
  0 siblings, 0 replies; 59+ messages in thread
From: Yang Shi @ 2021-11-01 19:26 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Peter Xu,
	Oscar Salvador, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List,
	Linus Torvalds, Greg Kroah-Hartman, Sasha Levin, Anders Roxell,
	lkft-triage

On Mon, Nov 1, 2021 at 12:05 PM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> Hi Yang,
>
> On Fri, 1 Oct 2021 at 03:23, Yang Shi <shy828301@gmail.com> wrote:
> >
> > When handling shmem page fault the THP with corrupted subpage could be PMD
> > mapped if certain conditions are satisfied.  But kernel is supposed to
> > send SIGBUS when trying to map hwpoisoned page.
> >
> > There are two paths which may do PMD map: fault around and regular fault.
> >
> > Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> > the thing was even worse in fault around path.  The THP could be PMD mapped as
> > long as the VMA fits regardless what subpage is accessed and corrupted.  After
> > this commit as long as head page is not corrupted the THP could be PMD mapped.
> >
> > In the regular fault path the THP could be PMD mapped as long as the corrupted
> > page is not accessed and the VMA fits.
> >
> > This loophole could be fixed by iterating every subpage to check if any
> > of them is hwpoisoned or not, but it is somewhat costly in page fault path.
> >
> > So introduce a new page flag called HasHWPoisoned on the first tail page.  It
> > indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of THP
> > is found hwpoisoned by memory failure and cleared when the THP is freed or
> > split.
> >
> > Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> > Cc: <stable@vger.kernel.org>
> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  include/linux/page-flags.h | 19 +++++++++++++++++++
> >  mm/filemap.c               | 12 ++++++------
> >  mm/huge_memory.c           |  2 ++
> >  mm/memory-failure.c        |  6 +++++-
> >  mm/memory.c                |  9 +++++++++
> >  mm/page_alloc.c            |  4 +++-
> >  6 files changed, 44 insertions(+), 8 deletions(-)
>
> When CONFIG_MEMORY_FAILURE not set
> we get these build failures.

Thanks for catching this. It is because Willy's page folio series
changed the definition of PAGEFLAG_FALSE macro. But patch was new in
5.15-rc7, so his series doesn't cover this.

The below patch should be able to fix it:

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d8623d6e1141..981341a3c3c4 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -803,8 +803,8 @@ PAGEFLAG_FALSE(DoubleMap, double_map)
 PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
        TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
 #else
-PAGEFLAG_FALSE(HasHWPoisoned)
-       TESTSCFLAG_FALSE(HasHWPoisoned)
+PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
+       TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #endif

 /*

I will prepare a formal patch for 5.16.

>
> Regression found on x86_64 and i386 gcc-11 builds
> Following build warnings / errors reported on Linux mainline master.
>
> metadata:
>     git_describe: v5.15-559-g19901165d90f
>     git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>     git_short_log: 19901165d90f (\"Merge tag
> 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block\")
>     target_arch: x86_64
>     toolchain: gcc-11
>
>
> In file included from include/linux/mmzone.h:22,
>                  from include/linux/gfp.h:6,
>                  from include/linux/slab.h:15,
>                  from include/linux/crypto.h:20,
>                  from arch/x86/kernel/asm-offsets.c:9:
> include/linux/page-flags.h:806:29: error: macro "PAGEFLAG_FALSE"
> requires 2 arguments, but only 1 given
>   806 | PAGEFLAG_FALSE(HasHWPoisoned)
>       |                             ^
> include/linux/page-flags.h:411: note: macro "PAGEFLAG_FALSE" defined here
>   411 | #define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname,
> lname)   \
>       |
> include/linux/page-flags.h:807:39: error: macro "TESTSCFLAG_FALSE"
> requires 2 arguments, but only 1 given
>   807 |         TESTSCFLAG_FALSE(HasHWPoisoned)
>       |                                       ^
> include/linux/page-flags.h:414: note: macro "TESTSCFLAG_FALSE" defined here
>   414 | #define TESTSCFLAG_FALSE(uname, lname)
>          \
>       |
> include/linux/page-flags.h:806:1: error: unknown type name 'PAGEFLAG_FALSE'
>   806 | PAGEFLAG_FALSE(HasHWPoisoned)
>       | ^~~~~~~~~~~~~~
> include/linux/page-flags.h:807:25: error: expected ';' before 'static'
>   807 |         TESTSCFLAG_FALSE(HasHWPoisoned)
>       |                         ^
>       |                         ;
> ......
>   815 | static inline bool is_page_hwpoison(struct page *page)
>       | ~~~~~~
> make[2]: *** [scripts/Makefile.build:121: arch/x86/kernel/asm-offsets.s] Error 1
>
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> build link:
> -----------
> https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/build.log
>
> build config:
> -------------
> https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/config
>
> # To install tuxmake on your system globally
> # sudo pip3 install -U tuxmake
>
> tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-11
> --kconfig defconfig --kconfig-add
> https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/config
>
> link:
> https://builds.tuxbuild.com/20KPBpXK6K0bKSIKAIKfwlBq7O4/tuxmake_reproducer.sh
>
> --
> Linaro LKFT
> https://lkft.linaro.org

^ permalink raw reply related	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2021-11-01 19:26 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30 21:53 [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-09-30 21:53 ` [v3 PATCH 1/5] mm: hwpoison: remove the unnecessary THP check Yang Shi
2021-10-06  2:35   ` Yang Shi
2021-10-06  4:00     ` Naoya Horiguchi
2021-10-06 17:56       ` Yang Shi
2021-09-30 21:53 ` [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
2021-10-01  7:23   ` Naoya Horiguchi
2021-10-01 21:07     ` Yang Shi
2021-10-04 14:06   ` Kirill A. Shutemov
2021-10-04 18:17     ` Yang Shi
2021-10-04 19:41       ` Kirill A. Shutemov
2021-10-04 20:13         ` Yang Shi
2021-10-06 19:54           ` Peter Xu
2021-10-06 23:41             ` Yang Shi
2021-10-07 16:14               ` Peter Xu
2021-10-07 18:28                 ` Yang Shi
2021-10-08  9:35             ` Kirill A. Shutemov
2021-10-11 22:57               ` Peter Xu
2021-10-06 20:15   ` Peter Xu
2021-10-06 23:57     ` Yang Shi
2021-10-07 16:06       ` Peter Xu
2021-10-07 18:19         ` Yang Shi
2021-10-07 20:27           ` Yang Shi
2021-10-07 21:28       ` Yang Shi
2021-10-12  0:55         ` Peter Xu
2021-10-12  1:44           ` Peter Xu
2021-10-12 18:02             ` Yang Shi
2021-10-12 22:10               ` Peter Xu
2021-10-13  2:48                 ` Yang Shi
2021-10-13  3:01                   ` Peter Xu
2021-10-13  3:27                     ` Yang Shi
2021-10-13  3:41                       ` Peter Xu
2021-10-13 21:42                         ` Yang Shi
2021-10-13 23:13                           ` Peter Xu
2021-10-14  6:54                     ` Naoya Horiguchi
2021-10-06 20:18   ` Peter Xu
2021-10-07  2:49     ` Yang Shi
2021-11-01 19:05   ` Naresh Kamboju
2021-11-01 19:26     ` Yang Shi
2021-09-30 21:53 ` [v3 PATCH 3/5] mm: hwpoison: refactor refcount check handling Yang Shi
2021-10-06 22:01   ` Peter Xu
2021-10-07  2:47     ` Yang Shi
2021-10-07 16:18       ` Peter Xu
2021-09-30 21:53 ` [v3 PATCH 4/5] mm: shmem: don't truncate page if memory failure happens Yang Shi
2021-10-01  7:05   ` Naoya Horiguchi
2021-10-01 21:08     ` Yang Shi
2021-10-12  1:57   ` Peter Xu
2021-10-12 19:17     ` Yang Shi
2021-10-12 22:26       ` Peter Xu
2021-10-13  3:00         ` Yang Shi
2021-10-13  3:06           ` Peter Xu
2021-10-13  3:29             ` Yang Shi
2021-09-30 21:53 ` [v3 PATCH 5/5] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
2021-10-01  7:06   ` Naoya Horiguchi
2021-10-01 21:09     ` Yang Shi
2021-10-13  2:40 ` [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Peter Xu
2021-10-13  3:09   ` Yang Shi
2021-10-13  3:24     ` Peter Xu
2021-10-14  6:54     ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).