linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory()
@ 2021-10-25 23:04 Naoya Horiguchi
  2021-10-25 23:05 ` [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison Naoya Horiguchi
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:04 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, linux-kernel

Hi,

I updated unpoison fix patchset (sorry for long blank time since v1).

Main purpose of this series is to sync unpoison code to recent changes
around how hwpoison code takes page refcount.  Unpoison should work or
simply fail (without crash) if impossible.

The recent works of keeping hwpoison pages in shmem pagecache introduce
a new state of hwpoisoned pages, but unpoison for such pages is not
supported yet with this series.

It seems that soft-offline and unpoison can be used as general purpose
page offline/online mechanism (not in the context of memory error). I
think that we need some additional works to realize it because currently
soft-offline and unpoison are assumed not to happen so frequently
(print out too many messages for aggressive usecases). But anyway this
could be another interesting next topic.

v1: https://lore.kernel.org/linux-mm/20210614021212.223326-1-nao.horiguchi@gmail.com/

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (4):
      mm/hwpoison: mf_mutex for soft offline and unpoison
      mm/hwpoison: remove race consideration
      mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
      mm/hwpoison: fix unpoison_memory()

 include/linux/mm.h         |   3 +-
 include/linux/page-flags.h |   4 ++
 include/ras/ras_event.h    |   2 -
 mm/memory-failure.c        | 166 ++++++++++++++++++++++++++++-----------------
 mm/page_alloc.c            |  23 +++++++
 5 files changed, 130 insertions(+), 68 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison
  2021-10-25 23:04 [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
@ 2021-10-25 23:05 ` Naoya Horiguchi
  2021-10-27  1:32   ` Yang Shi
  2021-10-25 23:05 ` [PATCH v2 2/4] mm/hwpoison: remove race consideration Naoya Horiguchi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:05 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Originally mf_mutex is introduced to serialize multiple MCE events, but
it's also helpful to exclude races among  soft_offline_page() and
unpoison_memory().  So apply mf_mutex to them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2:
- add mutex_unlock() in "page already poisoned" path in soft_offline_page().
  (Thanks to Ding Hui)
---
 mm/memory-failure.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fa9dda95a2a2..97297edfbd8e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1628,6 +1628,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+static DEFINE_MUTEX(mf_mutex);
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1654,7 +1656,6 @@ int memory_failure(unsigned long pfn, int flags)
 	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
-	static DEFINE_MUTEX(mf_mutex);
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1978,6 +1979,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *page;
 	struct page *p;
 	int freeit = 0;
+	int ret = 0;
 	unsigned long flags = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
@@ -1988,28 +1990,30 @@ int unpoison_memory(unsigned long pfn)
 	p = pfn_to_page(pfn);
 	page = compound_head(p);
 
+	mutex_lock(&mf_mutex);
+
 	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (page_count(page) > 1) {
 		unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (page_mapped(page)) {
 		unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (page_mapping(page)) {
 		unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	/*
@@ -2020,7 +2024,7 @@ int unpoison_memory(unsigned long pfn)
 	if (!PageHuge(page) && PageTransHuge(page)) {
 		unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (!get_hwpoison_page(p, flags)) {
@@ -2028,7 +2032,7 @@ int unpoison_memory(unsigned long pfn)
 			num_poisoned_pages_dec();
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	lock_page(page);
@@ -2050,7 +2054,9 @@ int unpoison_memory(unsigned long pfn)
 	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
 		put_page(page);
 
-	return 0;
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
+	return ret;
 }
 EXPORT_SYMBOL(unpoison_memory);
 
@@ -2231,9 +2237,12 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return -EIO;
 	}
 
+	mutex_lock(&mf_mutex);
+
 	if (PageHWPoison(page)) {
 		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
 		put_ref_page(ref_page);
+		mutex_unlock(&mf_mutex);
 		return 0;
 	}
 
@@ -2251,5 +2260,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 		}
 	}
 
+	mutex_unlock(&mf_mutex);
+
 	return ret;
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] mm/hwpoison: remove race consideration
  2021-10-25 23:04 [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
  2021-10-25 23:05 ` [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison Naoya Horiguchi
@ 2021-10-25 23:05 ` Naoya Horiguchi
  2021-10-27  1:04   ` Yang Shi
  2021-10-25 23:14 ` [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE Naoya Horiguchi
  2021-10-25 23:16 ` [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
  3 siblings, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:05 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Now memory_failure() and unpoison_memory() are protected by mf_mutex,
so no need to explicitly check races between them.  So remove them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 37 -------------------------------------
 1 file changed, 37 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 97297edfbd8e..a47b741ca04b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	lock_page(head);
 	page_flags = head->flags;
 
-	if (!PageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
-		num_poisoned_pages_dec();
-		unlock_page(head);
-		put_page(head);
-		return 0;
-	}
-
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1789,16 +1781,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	page_flags = p->flags;
 
-	/*
-	 * unpoison always clear PG_hwpoison inside page lock
-	 */
-	if (!PageHWPoison(p)) {
-		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
-		num_poisoned_pages_dec();
-		unlock_page(p);
-		put_page(p);
-		goto unlock_mutex;
-	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
@@ -2016,17 +1998,6 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	/*
-	 * unpoison_memory() can encounter thp only when the thp is being
-	 * worked by memory_failure() and the page lock is not held yet.
-	 * In such case, we yield to memory_failure() and make unpoison fail.
-	 */
-	if (!PageHuge(page) && PageTransHuge(page)) {
-		unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
-				 pfn, &unpoison_rs);
-		goto unlock_mutex;
-	}
-
 	if (!get_hwpoison_page(p, flags)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
@@ -2035,20 +2006,12 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	lock_page(page);
-	/*
-	 * This test is racy because PG_hwpoison is set outside of page lock.
-	 * That's acceptable because that won't trigger kernel panic. Instead,
-	 * the PG_hwpoison page will be caught and isolated on the entrance to
-	 * the free buddy page pool.
-	 */
 	if (TestClearPageHWPoison(page)) {
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 pfn, &unpoison_rs);
 		num_poisoned_pages_dec();
 		freeit = 1;
 	}
-	unlock_page(page);
 
 	put_page(page);
 	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
  2021-10-25 23:04 [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
  2021-10-25 23:05 ` [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison Naoya Horiguchi
  2021-10-25 23:05 ` [PATCH v2 2/4] mm/hwpoison: remove race consideration Naoya Horiguchi
@ 2021-10-25 23:14 ` Naoya Horiguchi
  2021-10-27  1:05   ` Yang Shi
  2021-10-25 23:16 ` [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
  3 siblings, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, linux-kernel

(I failed to send patch 3/4 and 4/4 due to the ratelimit of linux.dev,
so I switched mail server...)

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

These action_page_types are no longer used, so remove them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/mm.h      | 2 --
 include/ras/ras_event.h | 2 --
 mm/memory-failure.c     | 2 --
 3 files changed, 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a3229f609856..71d886470d71 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3246,7 +3246,6 @@ enum mf_action_page_type {
 	MF_MSG_KERNEL_HIGH_ORDER,
 	MF_MSG_SLAB,
 	MF_MSG_DIFFERENT_COMPOUND,
-	MF_MSG_POISONED_HUGE,
 	MF_MSG_HUGE,
 	MF_MSG_FREE_HUGE,
 	MF_MSG_NON_PMD_HUGE,
@@ -3261,7 +3260,6 @@ enum mf_action_page_type {
 	MF_MSG_CLEAN_LRU,
 	MF_MSG_TRUNCATED_LRU,
 	MF_MSG_BUDDY,
-	MF_MSG_BUDDY_2ND,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
 	MF_MSG_UNKNOWN,
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 0bdbc0d17d2f..d0337a41141c 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -358,7 +358,6 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
 	EM ( MF_MSG_SLAB, "kernel slab page" )				\
 	EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
-	EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" )	\
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
 	EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" )		\
@@ -373,7 +372,6 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
 	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
 	EM ( MF_MSG_BUDDY, "free buddy page" )				\
-	EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" )		\
 	EM ( MF_MSG_DAX, "dax page" )					\
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a47b741ca04b..09f079987928 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -723,7 +723,6 @@ static const char * const action_page_types[] = {
 	[MF_MSG_KERNEL_HIGH_ORDER]	= "high-order kernel page",
 	[MF_MSG_SLAB]			= "kernel slab page",
 	[MF_MSG_DIFFERENT_COMPOUND]	= "different compound page after locking",
-	[MF_MSG_POISONED_HUGE]		= "huge page already hardware poisoned",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
 	[MF_MSG_NON_PMD_HUGE]		= "non-pmd-sized huge page",
@@ -738,7 +737,6 @@ static const char * const action_page_types[] = {
 	[MF_MSG_CLEAN_LRU]		= "clean LRU page",
 	[MF_MSG_TRUNCATED_LRU]		= "already truncated LRU page",
 	[MF_MSG_BUDDY]			= "free buddy page",
-	[MF_MSG_BUDDY_2ND]		= "free buddy page (2nd try)",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
 	[MF_MSG_UNKNOWN]		= "unknown page",
-- 
2.25.1












From: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: 
Cc: 
Bcc: nao.horiguchi@gmail.com
Subject: 
Reply-To: 



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-25 23:04 [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2021-10-25 23:14 ` [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE Naoya Horiguchi
@ 2021-10-25 23:16 ` Naoya Horiguchi
  2021-10-25 23:27   ` [PATCH RESEND " Naoya Horiguchi
  2021-10-27  4:00   ` [PATCH " Yang Shi
  3 siblings, 2 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

After recent soft-offline rework, error pages can be taken off from
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation.  Moreover, due to the recent change on
__get_hwpoison_page(), get_page_unless_zero() is hardly called for
hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero
(meaning to fail to grab page refcount) and unpoison just clears
PG_hwpoison without releasing a refcount.  That does not lead to a
critical issue like kernel panic, but unpoisoned pages never get back to
buddy (leaked permanently), which is not good.

To (partially) fix this, we need to identify "taken off" pages from
other types of hwpoisoned pages.  We can't use refcount or page flags
for this purpose, so a pseudo flag is defined by hacking ->private
field.  Someone might think that put_page() is enough to cancel
taken-off pages, but the normal free path contains some operations not
suitable for the current purpose, and can fire VM_BUG_ON().

Note that unpoison_memory() is now supposed to be cancel hwpoison events
injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
not by MCE injection, so please don't try to use unpoison when testing
with MCE injection.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2:
- unpoison_memory() returns as commented
- explicitly avoids unpoisoning slab pages
- separates internal pinning function into __get_unpoison_page()
---
 include/linux/mm.h         |   1 +
 include/linux/page-flags.h |   4 ++
 mm/memory-failure.c        | 104 ++++++++++++++++++++++++++++++-------
 mm/page_alloc.c            |  23 ++++++++
 4 files changed, 113 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71d886470d71..c7ad3fdfee7c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3219,6 +3219,7 @@ enum mf_flags {
 	MF_ACTION_REQUIRED = 1 << 1,
 	MF_MUST_KILL = 1 << 2,
 	MF_SOFT_OFFLINE = 1 << 3,
+	MF_UNPOISON = 1 << 4,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b78f137acc62..8add006535f6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
+#define MAGIC_HWPOISON	0x4857504f49534f4e
+extern void SetPageHWPoisonTakenOff(struct page *page);
+extern void ClearPageHWPoisonTakenOff(struct page *page);
 extern bool take_page_off_buddy(struct page *page);
+extern bool take_page_back_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison, hwpoison)
 #define __PG_HWPOISON 0
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 09f079987928..a6f80a670012 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
+static inline bool PageHWPoisonTakenOff(struct page *page)
+{
+	return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
+}
+
+void SetPageHWPoisonTakenOff(struct page *page)
+{
+	set_page_private(page, MAGIC_HWPOISON);
+}
+
+void ClearPageHWPoisonTakenOff(struct page *page)
+{
+	if (PageHWPoison(page))
+		set_page_private(page, 0);
+}
+
 /*
  * Return true if a page type of a given page is supported by hwpoison
  * mechanism (while handling could fail), otherwise false.  This function
@@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+static int __get_unpoison_page(struct page *page)
+{
+	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	/*
+	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
+	 * but also isolated from buddy freelist, so need to identify the
+	 * state and have to cancel both operations to unpoison.
+	 */
+	if (PageHWPoisonTakenOff(head))
+		return -EHWPOISON;
+
+	return get_page_unless_zero(head) ? 1 : 0;
+}
+
 /**
  * get_hwpoison_page() - Get refcount for memory error handling
  * @p:		Raw error page (hit by memory error)
@@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
  * extra care for the error page's state (as done in __get_hwpoison_page()),
  * and has some retry logic in get_any_page().
  *
+ * When called from unpoison_memory(), the caller should already ensure that
+ * the given page has PG_hwpoison. So it's never reused for other page
+ * allocations, and __get_unpoison_page() never races with them.
+ *
  * Return: 0 on failure,
  *         1 on success for in-use pages in a well-defined state,
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
- *         operations like allocation and free.
+ *         operations like allocation and free,
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
 	int ret;
 
 	zone_pcp_disable(page_zone(p));
-	ret = get_any_page(p, flags);
+	if (flags & MF_UNPOISON)
+		ret = __get_unpoison_page(p);
+	else
+		ret = get_any_page(p, flags);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
+static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
+{
+	if (TestClearPageHWPoison(p)) {
+		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+				 page_to_pfn(p), rs);
+		num_poisoned_pages_dec();
+		return 0;
+	}
+	return -EBUSY;
+}
+
+static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
+					  struct page *p)
+{
+	if (take_page_back_buddy(p) && !clear_page_hwpoison(rs, p))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
 {
 	struct page *page;
 	struct page *p;
-	int freeit = 0;
-	int ret = 0;
-	unsigned long flags = 0;
+	int ret = -EBUSY;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!get_hwpoison_page(p, flags)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
-		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
-				 pfn, &unpoison_rs);
+	if (PageSlab(page))
 		goto unlock_mutex;
-	}
 
-	if (TestClearPageHWPoison(page)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 pfn, &unpoison_rs);
-		num_poisoned_pages_dec();
-		freeit = 1;
-	}
+	ret = get_hwpoison_page(p, MF_UNPOISON);
+	if (!ret) {
+		ret = clear_page_hwpoison(&unpoison_rs, p);
+	} else if (ret < 0) {
+		if (ret == -EHWPOISON) {
+			ret = unpoison_taken_off_page(&unpoison_rs, p);
+		} else
+			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
+					 pfn, &unpoison_rs);
+	} else {
+		int freeit = clear_page_hwpoison(&unpoison_rs, p);
 
-	put_page(page);
-	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
 		put_page(page);
+		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
+			put_page(page);
+			ret = 0;
+		}
+	}
 
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4ea590646f89..b6e4cbb44c54 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9466,6 +9466,7 @@ bool take_page_off_buddy(struct page *page)
 			del_page_from_free_list(page_head, zone, page_order);
 			break_down_buddy_pages(zone, page_head, page, 0,
 						page_order, migratetype);
+			SetPageHWPoisonTakenOff(page);
 			if (!is_migrate_isolate(migratetype))
 				__mod_zone_freepage_state(zone, -1, migratetype);
 			ret = true;
@@ -9477,4 +9478,26 @@ bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
+
+/*
+ * Cancel takeoff done by take_page_off_buddy().
+ */
+bool take_page_back_buddy(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long flags;
+	int migratetype = get_pfnblock_migratetype(page, pfn);
+	bool ret = false;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	if (put_page_testzero(page)) {
+		ClearPageHWPoisonTakenOff(page);
+		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
+		ret = true;
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
 #endif
-- 
2.25.1



On Tue, Oct 26, 2021 at 08:04:59AM +0900, Naoya Horiguchi wrote:
> Hi,
> 
> I updated unpoison fix patchset (sorry for long blank time since v1).
> 
> Main purpose of this series is to sync unpoison code to recent changes
> around how hwpoison code takes page refcount.  Unpoison should work or
> simply fail (without crash) if impossible.
> 
> The recent works of keeping hwpoison pages in shmem pagecache introduce
> a new state of hwpoisoned pages, but unpoison for such pages is not
> supported yet with this series.
> 
> It seems that soft-offline and unpoison can be used as general purpose
> page offline/online mechanism (not in the context of memory error). I
> think that we need some additional works to realize it because currently
> soft-offline and unpoison are assumed not to happen so frequently
> (print out too many messages for aggressive usecases). But anyway this
> could be another interesting next topic.
> 
> v1: https://lore.kernel.org/linux-mm/20210614021212.223326-1-nao.horiguchi@gmail.com/
> 
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
> 
> Naoya Horiguchi (4):
>       mm/hwpoison: mf_mutex for soft offline and unpoison
>       mm/hwpoison: remove race consideration
>       mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
>       mm/hwpoison: fix unpoison_memory()
> 
>  include/linux/mm.h         |   3 +-
>  include/linux/page-flags.h |   4 ++
>  include/ras/ras_event.h    |   2 -
>  mm/memory-failure.c        | 166 ++++++++++++++++++++++++++++-----------------
>  mm/page_alloc.c            |  23 +++++++
>  5 files changed, 130 insertions(+), 68 deletions(-)

To: 
Cc: 
Bcc: nao.horiguchi@gmail.com
Subject: 
Reply-To: 



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-25 23:16 ` [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
@ 2021-10-25 23:27   ` Naoya Horiguchi
  2021-10-27  1:26     ` Ding Hui
  2021-10-27  4:00   ` [PATCH " Yang Shi
  1 sibling, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-25 23:27 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi,
	Peter Xu, Naoya Horiguchi, Naoya Horiguchi, linux-kernel

(Please ignore previous patch 4/4 which leaves the replied message at
the end of body, this was due to my wrong manual edit, sorry about noise.
I resend 4/4.)

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

After recent soft-offline rework, error pages can be taken off from
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation.  Moreover, due to the recent change on
__get_hwpoison_page(), get_page_unless_zero() is hardly called for
hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero
(meaning to fail to grab page refcount) and unpoison just clears
PG_hwpoison without releasing a refcount.  That does not lead to a
critical issue like kernel panic, but unpoisoned pages never get back to
buddy (leaked permanently), which is not good.

To (partially) fix this, we need to identify "taken off" pages from
other types of hwpoisoned pages.  We can't use refcount or page flags
for this purpose, so a pseudo flag is defined by hacking ->private
field.  Someone might think that put_page() is enough to cancel
taken-off pages, but the normal free path contains some operations not
suitable for the current purpose, and can fire VM_BUG_ON().

Note that unpoison_memory() is now supposed to be cancel hwpoison events
injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
not by MCE injection, so please don't try to use unpoison when testing
with MCE injection.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2:
- unpoison_memory() returns as commented
- explicitly avoids unpoisoning slab pages
- separates internal pinning function into __get_unpoison_page()
---
 include/linux/mm.h         |   1 +
 include/linux/page-flags.h |   4 ++
 mm/memory-failure.c        | 104 ++++++++++++++++++++++++++++++-------
 mm/page_alloc.c            |  23 ++++++++
 4 files changed, 113 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71d886470d71..c7ad3fdfee7c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3219,6 +3219,7 @@ enum mf_flags {
 	MF_ACTION_REQUIRED = 1 << 1,
 	MF_MUST_KILL = 1 << 2,
 	MF_SOFT_OFFLINE = 1 << 3,
+	MF_UNPOISON = 1 << 4,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b78f137acc62..8add006535f6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
+#define MAGIC_HWPOISON	0x4857504f49534f4e
+extern void SetPageHWPoisonTakenOff(struct page *page);
+extern void ClearPageHWPoisonTakenOff(struct page *page);
 extern bool take_page_off_buddy(struct page *page);
+extern bool take_page_back_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison, hwpoison)
 #define __PG_HWPOISON 0
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 09f079987928..a6f80a670012 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
+static inline bool PageHWPoisonTakenOff(struct page *page)
+{
+	return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
+}
+
+void SetPageHWPoisonTakenOff(struct page *page)
+{
+	set_page_private(page, MAGIC_HWPOISON);
+}
+
+void ClearPageHWPoisonTakenOff(struct page *page)
+{
+	if (PageHWPoison(page))
+		set_page_private(page, 0);
+}
+
 /*
  * Return true if a page type of a given page is supported by hwpoison
  * mechanism (while handling could fail), otherwise false.  This function
@@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+static int __get_unpoison_page(struct page *page)
+{
+	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	/*
+	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
+	 * but also isolated from buddy freelist, so need to identify the
+	 * state and have to cancel both operations to unpoison.
+	 */
+	if (PageHWPoisonTakenOff(head))
+		return -EHWPOISON;
+
+	return get_page_unless_zero(head) ? 1 : 0;
+}
+
 /**
  * get_hwpoison_page() - Get refcount for memory error handling
  * @p:		Raw error page (hit by memory error)
@@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
  * extra care for the error page's state (as done in __get_hwpoison_page()),
  * and has some retry logic in get_any_page().
  *
+ * When called from unpoison_memory(), the caller should already ensure that
+ * the given page has PG_hwpoison. So it's never reused for other page
+ * allocations, and __get_unpoison_page() never races with them.
+ *
  * Return: 0 on failure,
  *         1 on success for in-use pages in a well-defined state,
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
- *         operations like allocation and free.
+ *         operations like allocation and free,
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
 	int ret;
 
 	zone_pcp_disable(page_zone(p));
-	ret = get_any_page(p, flags);
+	if (flags & MF_UNPOISON)
+		ret = __get_unpoison_page(p);
+	else
+		ret = get_any_page(p, flags);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
+static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
+{
+	if (TestClearPageHWPoison(p)) {
+		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+				 page_to_pfn(p), rs);
+		num_poisoned_pages_dec();
+		return 0;
+	}
+	return -EBUSY;
+}
+
+static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
+					  struct page *p)
+{
+	if (take_page_back_buddy(p) && !clear_page_hwpoison(rs, p))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
 {
 	struct page *page;
 	struct page *p;
-	int freeit = 0;
-	int ret = 0;
-	unsigned long flags = 0;
+	int ret = -EBUSY;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!get_hwpoison_page(p, flags)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
-		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
-				 pfn, &unpoison_rs);
+	if (PageSlab(page))
 		goto unlock_mutex;
-	}
 
-	if (TestClearPageHWPoison(page)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 pfn, &unpoison_rs);
-		num_poisoned_pages_dec();
-		freeit = 1;
-	}
+	ret = get_hwpoison_page(p, MF_UNPOISON);
+	if (!ret) {
+		ret = clear_page_hwpoison(&unpoison_rs, p);
+	} else if (ret < 0) {
+		if (ret == -EHWPOISON) {
+			ret = unpoison_taken_off_page(&unpoison_rs, p);
+		} else
+			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
+					 pfn, &unpoison_rs);
+	} else {
+		int freeit = clear_page_hwpoison(&unpoison_rs, p);
 
-	put_page(page);
-	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
 		put_page(page);
+		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
+			put_page(page);
+			ret = 0;
+		}
+	}
 
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4ea590646f89..b6e4cbb44c54 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9466,6 +9466,7 @@ bool take_page_off_buddy(struct page *page)
 			del_page_from_free_list(page_head, zone, page_order);
 			break_down_buddy_pages(zone, page_head, page, 0,
 						page_order, migratetype);
+			SetPageHWPoisonTakenOff(page);
 			if (!is_migrate_isolate(migratetype))
 				__mod_zone_freepage_state(zone, -1, migratetype);
 			ret = true;
@@ -9477,4 +9478,26 @@ bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
+
+/*
+ * Cancel takeoff done by take_page_off_buddy().
+ */
+bool take_page_back_buddy(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long flags;
+	int migratetype = get_pfnblock_migratetype(page, pfn);
+	bool ret = false;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	if (put_page_testzero(page)) {
+		ClearPageHWPoisonTakenOff(page);
+		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
+		ret = true;
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] mm/hwpoison: remove race consideration
  2021-10-25 23:05 ` [PATCH v2 2/4] mm/hwpoison: remove race consideration Naoya Horiguchi
@ 2021-10-27  1:04   ` Yang Shi
  2021-10-27  1:18     ` Naoya Horiguchi
  0 siblings, 1 reply; 15+ messages in thread
From: Yang Shi @ 2021-10-27  1:04 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Mon, Oct 25, 2021 at 4:06 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Now memory_failure() and unpoison_memory() are protected by mf_mutex,
> so no need to explicitly check races between them.  So remove them.

It seems this patch could be folded into patch #1. Some "unlock_mutex"
were added by patch #1 then they were removed by this patch
immediately, it seems a bit of a waste and this patch is actually the
by-product of patch #1.

>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/memory-failure.c | 37 -------------------------------------
>  1 file changed, 37 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 97297edfbd8e..a47b741ca04b 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>         lock_page(head);
>         page_flags = head->flags;
>
> -       if (!PageHWPoison(head)) {
> -               pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> -               num_poisoned_pages_dec();
> -               unlock_page(head);
> -               put_page(head);
> -               return 0;
> -       }
> -
>         /*
>          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>          * simply disable it. In order to make it work properly, we need
> @@ -1789,16 +1781,6 @@ int memory_failure(unsigned long pfn, int flags)
>          */
>         page_flags = p->flags;
>
> -       /*
> -        * unpoison always clear PG_hwpoison inside page lock
> -        */
> -       if (!PageHWPoison(p)) {
> -               pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> -               num_poisoned_pages_dec();
> -               unlock_page(p);
> -               put_page(p);
> -               goto unlock_mutex;
> -       }
>         if (hwpoison_filter(p)) {
>                 if (TestClearPageHWPoison(p))
>                         num_poisoned_pages_dec();
> @@ -2016,17 +1998,6 @@ int unpoison_memory(unsigned long pfn)
>                 goto unlock_mutex;
>         }
>
> -       /*
> -        * unpoison_memory() can encounter thp only when the thp is being
> -        * worked by memory_failure() and the page lock is not held yet.
> -        * In such case, we yield to memory_failure() and make unpoison fail.
> -        */
> -       if (!PageHuge(page) && PageTransHuge(page)) {
> -               unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
> -                                pfn, &unpoison_rs);
> -               goto unlock_mutex;
> -       }
> -
>         if (!get_hwpoison_page(p, flags)) {
>                 if (TestClearPageHWPoison(p))
>                         num_poisoned_pages_dec();
> @@ -2035,20 +2006,12 @@ int unpoison_memory(unsigned long pfn)
>                 goto unlock_mutex;
>         }
>
> -       lock_page(page);
> -       /*
> -        * This test is racy because PG_hwpoison is set outside of page lock.
> -        * That's acceptable because that won't trigger kernel panic. Instead,
> -        * the PG_hwpoison page will be caught and isolated on the entrance to
> -        * the free buddy page pool.
> -        */
>         if (TestClearPageHWPoison(page)) {
>                 unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
>                                  pfn, &unpoison_rs);
>                 num_poisoned_pages_dec();
>                 freeit = 1;
>         }
> -       unlock_page(page);
>
>         put_page(page);
>         if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
  2021-10-25 23:14 ` [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE Naoya Horiguchi
@ 2021-10-27  1:05   ` Yang Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Yang Shi @ 2021-10-27  1:05 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Mon, Oct 25, 2021 at 4:14 PM Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
>
> (I failed to send patch 3/4 and 4/4 due to the ratelimit of linux.dev,
> so I switched mail server...)
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> These action_page_types are no longer used, so remove them.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Acked-by: Yang Shi <shy828301@gmail.com>

> ---
>  include/linux/mm.h      | 2 --
>  include/ras/ras_event.h | 2 --
>  mm/memory-failure.c     | 2 --
>  3 files changed, 6 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a3229f609856..71d886470d71 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3246,7 +3246,6 @@ enum mf_action_page_type {
>         MF_MSG_KERNEL_HIGH_ORDER,
>         MF_MSG_SLAB,
>         MF_MSG_DIFFERENT_COMPOUND,
> -       MF_MSG_POISONED_HUGE,
>         MF_MSG_HUGE,
>         MF_MSG_FREE_HUGE,
>         MF_MSG_NON_PMD_HUGE,
> @@ -3261,7 +3260,6 @@ enum mf_action_page_type {
>         MF_MSG_CLEAN_LRU,
>         MF_MSG_TRUNCATED_LRU,
>         MF_MSG_BUDDY,
> -       MF_MSG_BUDDY_2ND,
>         MF_MSG_DAX,
>         MF_MSG_UNSPLIT_THP,
>         MF_MSG_UNKNOWN,
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 0bdbc0d17d2f..d0337a41141c 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -358,7 +358,6 @@ TRACE_EVENT(aer_event,
>         EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )       \
>         EM ( MF_MSG_SLAB, "kernel slab page" )                          \
>         EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
> -       EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" )      \
>         EM ( MF_MSG_HUGE, "huge page" )                                 \
>         EM ( MF_MSG_FREE_HUGE, "free huge page" )                       \
>         EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" )           \
> @@ -373,7 +372,6 @@ TRACE_EVENT(aer_event,
>         EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )                       \
>         EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )       \
>         EM ( MF_MSG_BUDDY, "free buddy page" )                          \
> -       EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" )            \
>         EM ( MF_MSG_DAX, "dax page" )                                   \
>         EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
>         EMe ( MF_MSG_UNKNOWN, "unknown page" )
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a47b741ca04b..09f079987928 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -723,7 +723,6 @@ static const char * const action_page_types[] = {
>         [MF_MSG_KERNEL_HIGH_ORDER]      = "high-order kernel page",
>         [MF_MSG_SLAB]                   = "kernel slab page",
>         [MF_MSG_DIFFERENT_COMPOUND]     = "different compound page after locking",
> -       [MF_MSG_POISONED_HUGE]          = "huge page already hardware poisoned",
>         [MF_MSG_HUGE]                   = "huge page",
>         [MF_MSG_FREE_HUGE]              = "free huge page",
>         [MF_MSG_NON_PMD_HUGE]           = "non-pmd-sized huge page",
> @@ -738,7 +737,6 @@ static const char * const action_page_types[] = {
>         [MF_MSG_CLEAN_LRU]              = "clean LRU page",
>         [MF_MSG_TRUNCATED_LRU]          = "already truncated LRU page",
>         [MF_MSG_BUDDY]                  = "free buddy page",
> -       [MF_MSG_BUDDY_2ND]              = "free buddy page (2nd try)",
>         [MF_MSG_DAX]                    = "dax page",
>         [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
>         [MF_MSG_UNKNOWN]                = "unknown page",
> --
> 2.25.1
>
>
>
>
>
>
>
>
>
>
>
>
> From: Naoya Horiguchi <nao.horiguchi@gmail.com>
> To:
> Cc:
> Bcc: nao.horiguchi@gmail.com
> Subject:
> Reply-To:
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] mm/hwpoison: remove race consideration
  2021-10-27  1:04   ` Yang Shi
@ 2021-10-27  1:18     ` Naoya Horiguchi
  0 siblings, 0 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-27  1:18 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Tue, Oct 26, 2021 at 06:04:13PM -0700, Yang Shi wrote:
> On Mon, Oct 25, 2021 at 4:06 PM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >
> > Now memory_failure() and unpoison_memory() are protected by mf_mutex,
> > so no need to explicitly check races between them.  So remove them.
> 
> It seems this patch could be folded into patch #1. Some "unlock_mutex"
> were added by patch #1 then they were removed by this patch
> immediately, it seems a bit of a waste and this patch is actually the
> by-product of patch #1.

OK, I'll do this in the next post.

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-25 23:27   ` [PATCH RESEND " Naoya Horiguchi
@ 2021-10-27  1:26     ` Ding Hui
  2021-10-27  2:29       ` Naoya Horiguchi
  0 siblings, 1 reply; 15+ messages in thread
From: Ding Hui @ 2021-10-27  1:26 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko,
	Tony Luck, Aneesh Kumar K.V, Miaohe Lin, Yang Shi, Peter Xu,
	Naoya Horiguchi, Naoya Horiguchi, linux-kernel

On 2021/10/26 7:27, Naoya Horiguchi wrote:
> (Please ignore previous patch 4/4 which leaves the replied message at
> the end of body, this was due to my wrong manual edit, sorry about noise.
> I resend 4/4.)
> 
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> After recent soft-offline rework, error pages can be taken off from
> buddy allocator, but the existing unpoison_memory() does not properly
> undo the operation.  Moreover, due to the recent change on
> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero

Sorry, can you explain that __get_hwpoison_page() likely go which branch 
to return zero for hwpoisoned pages ?

not returns -EBUSY because HWPoisonHandlable() is false ?

> (meaning to fail to grab page refcount) and unpoison just clears
> PG_hwpoison without releasing a refcount.  That does not lead to a
> critical issue like kernel panic, but unpoisoned pages never get back to
> buddy (leaked permanently), which is not good.
> 
> To (partially) fix this, we need to identify "taken off" pages from
> other types of hwpoisoned pages.  We can't use refcount or page flags
> for this purpose, so a pseudo flag is defined by hacking ->private
> field.  Someone might think that put_page() is enough to cancel
> taken-off pages, but the normal free path contains some operations not
> suitable for the current purpose, and can fire VM_BUG_ON().
> 
> Note that unpoison_memory() is now supposed to be cancel hwpoison events
> injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
> not by MCE injection, so please don't try to use unpoison when testing
> with MCE injection.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v2:
> - unpoison_memory() returns as commented
> - explicitly avoids unpoisoning slab pages
> - separates internal pinning function into __get_unpoison_page()
> ---
>   include/linux/mm.h         |   1 +
>   include/linux/page-flags.h |   4 ++
>   mm/memory-failure.c        | 104 ++++++++++++++++++++++++++++++-------
>   mm/page_alloc.c            |  23 ++++++++
>   4 files changed, 113 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 71d886470d71..c7ad3fdfee7c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3219,6 +3219,7 @@ enum mf_flags {
>   	MF_ACTION_REQUIRED = 1 << 1,
>   	MF_MUST_KILL = 1 << 2,
>   	MF_SOFT_OFFLINE = 1 << 3,
> +	MF_UNPOISON = 1 << 4,
>   };
>   extern int memory_failure(unsigned long pfn, int flags);
>   extern void memory_failure_queue(unsigned long pfn, int flags);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index b78f137acc62..8add006535f6 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
>   PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>   TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
>   #define __PG_HWPOISON (1UL << PG_hwpoison)
> +#define MAGIC_HWPOISON	0x4857504f49534f4e
> +extern void SetPageHWPoisonTakenOff(struct page *page);
> +extern void ClearPageHWPoisonTakenOff(struct page *page);
>   extern bool take_page_off_buddy(struct page *page);
> +extern bool take_page_back_buddy(struct page *page);
>   #else
>   PAGEFLAG_FALSE(HWPoison, hwpoison)
>   #define __PG_HWPOISON 0
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 09f079987928..a6f80a670012 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
>   	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
>   }
>   
> +static inline bool PageHWPoisonTakenOff(struct page *page)
> +{
> +	return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
> +}
> +
> +void SetPageHWPoisonTakenOff(struct page *page)
> +{
> +	set_page_private(page, MAGIC_HWPOISON);
> +}
> +
> +void ClearPageHWPoisonTakenOff(struct page *page)
> +{
> +	if (PageHWPoison(page))
> +		set_page_private(page, 0);
> +}
> +
>   /*
>    * Return true if a page type of a given page is supported by hwpoison
>    * mechanism (while handling could fail), otherwise false.  This function
> @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
>   	return ret;
>   }
>   
> +static int __get_unpoison_page(struct page *page)
> +{
> +	struct page *head = compound_head(page);
> +	int ret = 0;
> +	bool hugetlb = false;
> +
> +	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	if (hugetlb)
> +		return ret;
> +
> +	/*
> +	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> +	 * but also isolated from buddy freelist, so need to identify the
> +	 * state and have to cancel both operations to unpoison.
> +	 */
> +	if (PageHWPoisonTakenOff(head))
> +		return -EHWPOISON;
> +
> +	return get_page_unless_zero(head) ? 1 : 0;
> +}
> +
>   /**
>    * get_hwpoison_page() - Get refcount for memory error handling
>    * @p:		Raw error page (hit by memory error)
> @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
>    * extra care for the error page's state (as done in __get_hwpoison_page()),
>    * and has some retry logic in get_any_page().
>    *
> + * When called from unpoison_memory(), the caller should already ensure that
> + * the given page has PG_hwpoison. So it's never reused for other page
> + * allocations, and __get_unpoison_page() never races with them.
> + *
>    * Return: 0 on failure,
>    *         1 on success for in-use pages in a well-defined state,
>    *         -EIO for pages on which we can not handle memory errors,
>    *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
> - *         operations like allocation and free.
> + *         operations like allocation and free,
> + *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
>    */
>   static int get_hwpoison_page(struct page *p, unsigned long flags)
>   {
>   	int ret;
>   
>   	zone_pcp_disable(page_zone(p));
> -	ret = get_any_page(p, flags);
> +	if (flags & MF_UNPOISON)
> +		ret = __get_unpoison_page(p);
> +	else
> +		ret = get_any_page(p, flags);
>   	zone_pcp_enable(page_zone(p));
>   
>   	return ret;
> @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
>   		pr_info(fmt, pfn);			\
>   })
>   
> +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> +{
> +	if (TestClearPageHWPoison(p)) {
> +		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> +				 page_to_pfn(p), rs);
> +		num_poisoned_pages_dec();
> +		return 0;
> +	}
> +	return -EBUSY;
> +}
> +
> +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> +					  struct page *p)
> +{
> +	if (take_page_back_buddy(p) && !clear_page_hwpoison(rs, p))
> +		return 0;
> +	else
> +		return -EBUSY;
> +}
> +
>   /**
>    * unpoison_memory - Unpoison a previously poisoned page
>    * @pfn: Page number of the to be unpoisoned page
> @@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
>   {
>   	struct page *page;
>   	struct page *p;
> -	int freeit = 0;
> -	int ret = 0;
> -	unsigned long flags = 0;
> +	int ret = -EBUSY;
>   	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>   					DEFAULT_RATELIMIT_BURST);
>   
> @@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
>   		goto unlock_mutex;
>   	}
>   
> -	if (!get_hwpoison_page(p, flags)) {
> -		if (TestClearPageHWPoison(p))
> -			num_poisoned_pages_dec();
> -		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> -				 pfn, &unpoison_rs);
> +	if (PageSlab(page))

As I reported before [1], get refcount to a PageTable(page) is not safe, 
maybe let huge_pmd_unshare() go to wrong branch, so can you also avoid 
PageTable(page) here ?

[1]: 
https://lore.kernel.org/linux-mm/271d0f41-0599-9d5d-0555-47189f476243@sangfor.com.cn/

>   		goto unlock_mutex;
> -	}
>   
> -	if (TestClearPageHWPoison(page)) {
> -		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> -				 pfn, &unpoison_rs);
> -		num_poisoned_pages_dec();
> -		freeit = 1;
> -	}
> +	ret = get_hwpoison_page(p, MF_UNPOISON);
> +	if (!ret) {
> +		ret = clear_page_hwpoison(&unpoison_rs, p);

in this case, the page p has zero refcount, with HWPoison flag, but 
without MAGIC_HWPOISON, where it come from ? race condition or normal 
case ? Is there any examples please ?

> +	} else if (ret < 0) {
> +		if (ret == -EHWPOISON) {
> +			ret = unpoison_taken_off_page(&unpoison_rs, p);
> +		} else
> +			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> +					 pfn, &unpoison_rs);
> +	} else {
> +		int freeit = clear_page_hwpoison(&unpoison_rs, p);
>   
> -	put_page(page);
> -	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
>   		put_page(page);
> +		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
> +			put_page(page);
> +			ret = 0;
> +		}
> +	}
>   
>   unlock_mutex:
>   	mutex_unlock(&mf_mutex);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4ea590646f89..b6e4cbb44c54 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9466,6 +9466,7 @@ bool take_page_off_buddy(struct page *page)
>   			del_page_from_free_list(page_head, zone, page_order);
>   			break_down_buddy_pages(zone, page_head, page, 0,
>   						page_order, migratetype);
> +			SetPageHWPoisonTakenOff(page);
>   			if (!is_migrate_isolate(migratetype))
>   				__mod_zone_freepage_state(zone, -1, migratetype);
>   			ret = true;
> @@ -9477,4 +9478,26 @@ bool take_page_off_buddy(struct page *page)
>   	spin_unlock_irqrestore(&zone->lock, flags);
>   	return ret;
>   }
> +
> +/*
> + * Cancel takeoff done by take_page_off_buddy().
> + */
> +bool take_page_back_buddy(struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long flags;
> +	int migratetype = get_pfnblock_migratetype(page, pfn);
> +	bool ret = false;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +	if (put_page_testzero(page)) {
> +		ClearPageHWPoisonTakenOff(page);
> +		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
> +		ret = true;
> +	}
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +
> +	return ret;
> +}
>   #endif
> 


-- 
Thanks,
- Ding Hui


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison
  2021-10-25 23:05 ` [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison Naoya Horiguchi
@ 2021-10-27  1:32   ` Yang Shi
  2021-10-27  2:31     ` Naoya Horiguchi
  0 siblings, 1 reply; 15+ messages in thread
From: Yang Shi @ 2021-10-27  1:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Mon, Oct 25, 2021 at 4:06 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Originally mf_mutex is introduced to serialize multiple MCE events, but
> it's also helpful to exclude races among  soft_offline_page() and
> unpoison_memory().  So apply mf_mutex to them.

My understanding is it is not that useful to make unpoison run
parallel with memory_failure() and soft offline, so they can be
serialized by mf_mutex and we could make the memory failure handler
and soft offline simpler.

If the above statement is correct, could you please tweak this commit
log to reflect it with patch #2 squashed into this patch?

> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v2:
> - add mutex_unlock() in "page already poisoned" path in soft_offline_page().
>   (Thanks to Ding Hui)
> ---
>  mm/memory-failure.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fa9dda95a2a2..97297edfbd8e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1628,6 +1628,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>         return rc;
>  }
>
> +static DEFINE_MUTEX(mf_mutex);
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -1654,7 +1656,6 @@ int memory_failure(unsigned long pfn, int flags)
>         int res = 0;
>         unsigned long page_flags;
>         bool retry = true;
> -       static DEFINE_MUTEX(mf_mutex);
>
>         if (!sysctl_memory_failure_recovery)
>                 panic("Memory failure on page %lx", pfn);
> @@ -1978,6 +1979,7 @@ int unpoison_memory(unsigned long pfn)
>         struct page *page;
>         struct page *p;
>         int freeit = 0;
> +       int ret = 0;
>         unsigned long flags = 0;
>         static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>                                         DEFAULT_RATELIMIT_BURST);
> @@ -1988,28 +1990,30 @@ int unpoison_memory(unsigned long pfn)
>         p = pfn_to_page(pfn);
>         page = compound_head(p);
>
> +       mutex_lock(&mf_mutex);
> +
>         if (!PageHWPoison(p)) {
>                 unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         if (page_count(page) > 1) {
>                 unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         if (page_mapped(page)) {
>                 unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         if (page_mapping(page)) {
>                 unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         /*
> @@ -2020,7 +2024,7 @@ int unpoison_memory(unsigned long pfn)
>         if (!PageHuge(page) && PageTransHuge(page)) {
>                 unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         if (!get_hwpoison_page(p, flags)) {
> @@ -2028,7 +2032,7 @@ int unpoison_memory(unsigned long pfn)
>                         num_poisoned_pages_dec();
>                 unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>                                  pfn, &unpoison_rs);
> -               return 0;
> +               goto unlock_mutex;
>         }
>
>         lock_page(page);
> @@ -2050,7 +2054,9 @@ int unpoison_memory(unsigned long pfn)
>         if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
>                 put_page(page);
>
> -       return 0;
> +unlock_mutex:
> +       mutex_unlock(&mf_mutex);
> +       return ret;
>  }
>  EXPORT_SYMBOL(unpoison_memory);
>
> @@ -2231,9 +2237,12 @@ int soft_offline_page(unsigned long pfn, int flags)
>                 return -EIO;
>         }
>
> +       mutex_lock(&mf_mutex);
> +
>         if (PageHWPoison(page)) {
>                 pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
>                 put_ref_page(ref_page);
> +               mutex_unlock(&mf_mutex);
>                 return 0;
>         }
>
> @@ -2251,5 +2260,7 @@ int soft_offline_page(unsigned long pfn, int flags)
>                 }
>         }
>
> +       mutex_unlock(&mf_mutex);
> +
>         return ret;
>  }
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-27  1:26     ` Ding Hui
@ 2021-10-27  2:29       ` Naoya Horiguchi
  0 siblings, 0 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-27  2:29 UTC (permalink / raw)
  To: Ding Hui
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Oscar Salvador, Michal Hocko, Tony Luck, Aneesh Kumar K.V,
	Miaohe Lin, Yang Shi, Peter Xu, Naoya Horiguchi, linux-kernel

On Wed, Oct 27, 2021 at 09:26:08AM +0800, Ding Hui wrote:
> On 2021/10/26 7:27, Naoya Horiguchi wrote:
> > (Please ignore previous patch 4/4 which leaves the replied message at
> > the end of body, this was due to my wrong manual edit, sorry about noise.
> > I resend 4/4.)
> > 
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > After recent soft-offline rework, error pages can be taken off from
> > buddy allocator, but the existing unpoison_memory() does not properly
> > undo the operation.  Moreover, due to the recent change on
> > __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> > hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero
> 
> Sorry, can you explain that __get_hwpoison_page() likely go which branch to
> return zero for hwpoisoned pages ?
> 
> not returns -EBUSY because HWPoisonHandlable() is false ?

Sorry, you're right, the return value mentioned here was changed since v1,
so I should have updated the description.

...
> > @@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
> >   {
> >   	struct page *page;
> >   	struct page *p;
> > -	int freeit = 0;
> > -	int ret = 0;
> > -	unsigned long flags = 0;
> > +	int ret = -EBUSY;
> >   	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> >   					DEFAULT_RATELIMIT_BURST);
> > @@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
> >   		goto unlock_mutex;
> >   	}
> > -	if (!get_hwpoison_page(p, flags)) {
> > -		if (TestClearPageHWPoison(p))
> > -			num_poisoned_pages_dec();
> > -		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> > -				 pfn, &unpoison_rs);
> > +	if (PageSlab(page))
> 
> As I reported before [1], get refcount to a PageTable(page) is not safe,
> maybe let huge_pmd_unshare() go to wrong branch, so can you also avoid
> PageTable(page) here ?
> 
> [1]: https://lore.kernel.org/linux-mm/271d0f41-0599-9d5d-0555-47189f476243@sangfor.com.cn/

Sure, I'll add it.

> 
> >   		goto unlock_mutex;
> > -	}
> > -	if (TestClearPageHWPoison(page)) {
> > -		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> > -				 pfn, &unpoison_rs);
> > -		num_poisoned_pages_dec();
> > -		freeit = 1;
> > -	}
> > +	ret = get_hwpoison_page(p, MF_UNPOISON);
> > +	if (!ret) {
> > +		ret = clear_page_hwpoison(&unpoison_rs, p);
> 
> in this case, the page p has zero refcount, with HWPoison flag, but without
> MAGIC_HWPOISON, where it come from ? race condition or normal case ? Is
> there any examples please ?

memory_failure() could set HWPoison flag at any time, so if a memory error happens
on a page which a just being freed, that could be linked to buddy freelist with
HWPoison flag.

Another example is hugetlb's case, if memory_failrue() is called for a free hugetlb
page and somehow dissolve_free_huge_page() fails in handling path, the hugepage
remains with HWPoison flag set and refcount 0.
BTW, considering this case, I found that clear_page_hwpoison() should be called
for "page" instead of "p", which will be fixed in the next version.

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison
  2021-10-27  1:32   ` Yang Shi
@ 2021-10-27  2:31     ` Naoya Horiguchi
  0 siblings, 0 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-27  2:31 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Tue, Oct 26, 2021 at 06:32:36PM -0700, Yang Shi wrote:
> On Mon, Oct 25, 2021 at 4:06 PM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >
> > Originally mf_mutex is introduced to serialize multiple MCE events, but
> > it's also helpful to exclude races among  soft_offline_page() and
> > unpoison_memory().  So apply mf_mutex to them.
> 
> My understanding is it is not that useful to make unpoison run
> parallel with memory_failure() and soft offline, so they can be
> serialized by mf_mutex and we could make the memory failure handler
> and soft offline simpler.

Thank you for the suggestion, this sounds correct and more specific.

> 
> If the above statement is correct, could you please tweak this commit
> log to reflect it with patch #2 squashed into this patch?

Sure, I'm thinking of revising like below:

  Originally mf_mutex is introduced to serialize multiple MCE events, but
  it is not that useful to allow unpoison to run in parallel with memory_failure()
  and soft offline.  So apply mf_they to soft offline and unpoison.
  The memory failure handler and soft offline handler get simpler with this.

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-25 23:16 ` [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
  2021-10-25 23:27   ` [PATCH RESEND " Naoya Horiguchi
@ 2021-10-27  4:00   ` Yang Shi
  2021-10-27 11:58     ` Naoya Horiguchi
  1 sibling, 1 reply; 15+ messages in thread
From: Yang Shi @ 2021-10-27  4:00 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Mon, Oct 25, 2021 at 4:16 PM Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> After recent soft-offline rework, error pages can be taken off from
> buddy allocator, but the existing unpoison_memory() does not properly
> undo the operation.  Moreover, due to the recent change on
> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero
> (meaning to fail to grab page refcount) and unpoison just clears
> PG_hwpoison without releasing a refcount.  That does not lead to a
> critical issue like kernel panic, but unpoisoned pages never get back to
> buddy (leaked permanently), which is not good.
>
> To (partially) fix this, we need to identify "taken off" pages from
> other types of hwpoisoned pages.  We can't use refcount or page flags
> for this purpose, so a pseudo flag is defined by hacking ->private
> field.  Someone might think that put_page() is enough to cancel
> taken-off pages, but the normal free path contains some operations not
> suitable for the current purpose, and can fire VM_BUG_ON().
>
> Note that unpoison_memory() is now supposed to be cancel hwpoison events
> injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
> not by MCE injection, so please don't try to use unpoison when testing
> with MCE injection.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v2:
> - unpoison_memory() returns as commented
> - explicitly avoids unpoisoning slab pages
> - separates internal pinning function into __get_unpoison_page()
> ---
>  include/linux/mm.h         |   1 +
>  include/linux/page-flags.h |   4 ++
>  mm/memory-failure.c        | 104 ++++++++++++++++++++++++++++++-------
>  mm/page_alloc.c            |  23 ++++++++
>  4 files changed, 113 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 71d886470d71..c7ad3fdfee7c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3219,6 +3219,7 @@ enum mf_flags {
>         MF_ACTION_REQUIRED = 1 << 1,
>         MF_MUST_KILL = 1 << 2,
>         MF_SOFT_OFFLINE = 1 << 3,
> +       MF_UNPOISON = 1 << 4,
>  };
>  extern int memory_failure(unsigned long pfn, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int flags);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index b78f137acc62..8add006535f6 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
>  PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>  TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
>  #define __PG_HWPOISON (1UL << PG_hwpoison)
> +#define MAGIC_HWPOISON 0x4857504f49534f4e
> +extern void SetPageHWPoisonTakenOff(struct page *page);
> +extern void ClearPageHWPoisonTakenOff(struct page *page);
>  extern bool take_page_off_buddy(struct page *page);
> +extern bool take_page_back_buddy(struct page *page);
>  #else
>  PAGEFLAG_FALSE(HWPoison, hwpoison)
>  #define __PG_HWPOISON 0
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 09f079987928..a6f80a670012 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
>         return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
>  }
>
> +static inline bool PageHWPoisonTakenOff(struct page *page)
> +{
> +       return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
> +}
> +
> +void SetPageHWPoisonTakenOff(struct page *page)
> +{
> +       set_page_private(page, MAGIC_HWPOISON);
> +}
> +
> +void ClearPageHWPoisonTakenOff(struct page *page)
> +{
> +       if (PageHWPoison(page))
> +               set_page_private(page, 0);
> +}
> +
>  /*
>   * Return true if a page type of a given page is supported by hwpoison
>   * mechanism (while handling could fail), otherwise false.  This function
> @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
>         return ret;
>  }
>
> +static int __get_unpoison_page(struct page *page)
> +{
> +       struct page *head = compound_head(page);
> +       int ret = 0;
> +       bool hugetlb = false;
> +
> +       ret = get_hwpoison_huge_page(head, &hugetlb);
> +       if (hugetlb)
> +               return ret;
> +
> +       /*
> +        * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> +        * but also isolated from buddy freelist, so need to identify the
> +        * state and have to cancel both operations to unpoison.
> +        */
> +       if (PageHWPoisonTakenOff(head))
> +               return -EHWPOISON;

I don't think we could see compound page here, but checking head page
might be confusing since private is per subpage, so it might be better
to use page instead of head.

> +
> +       return get_page_unless_zero(head) ? 1 : 0;
> +}
> +
>  /**
>   * get_hwpoison_page() - Get refcount for memory error handling
>   * @p:         Raw error page (hit by memory error)
> @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
>   * extra care for the error page's state (as done in __get_hwpoison_page()),
>   * and has some retry logic in get_any_page().
>   *
> + * When called from unpoison_memory(), the caller should already ensure that
> + * the given page has PG_hwpoison. So it's never reused for other page
> + * allocations, and __get_unpoison_page() never races with them.
> + *
>   * Return: 0 on failure,
>   *         1 on success for in-use pages in a well-defined state,
>   *         -EIO for pages on which we can not handle memory errors,
>   *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
> - *         operations like allocation and free.
> + *         operations like allocation and free,
> + *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
>   */
>  static int get_hwpoison_page(struct page *p, unsigned long flags)
>  {
>         int ret;
>
>         zone_pcp_disable(page_zone(p));
> -       ret = get_any_page(p, flags);
> +       if (flags & MF_UNPOISON)
> +               ret = __get_unpoison_page(p);
> +       else
> +               ret = get_any_page(p, flags);
>         zone_pcp_enable(page_zone(p));
>
>         return ret;
> @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
>                 pr_info(fmt, pfn);                      \
>  })
>
> +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> +{
> +       if (TestClearPageHWPoison(p)) {
> +               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> +                                page_to_pfn(p), rs);
> +               num_poisoned_pages_dec();
> +               return 0;
> +       }
> +       return -EBUSY;

I don't quite get why -EBUSY is returned. TestClear returns the old
value, so returning 0 means the flag was cleared before. It is fine,
right? I don't see why we have to return different values.

> +}
> +
> +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> +                                         struct page *p)
> +{
> +       if (take_page_back_buddy(p) && !clear_page_hwpoison(rs, p))

If clear_page_hwpoison() is void, it can be moved into take_page_back_buddy().

> +               return 0;
> +       else
> +               return -EBUSY;
> +}
> +
>  /**
>   * unpoison_memory - Unpoison a previously poisoned page
>   * @pfn: Page number of the to be unpoisoned page
> @@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
>  {
>         struct page *page;
>         struct page *p;
> -       int freeit = 0;
> -       int ret = 0;
> -       unsigned long flags = 0;
> +       int ret = -EBUSY;
>         static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>                                         DEFAULT_RATELIMIT_BURST);
>
> @@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
>                 goto unlock_mutex;
>         }
>
> -       if (!get_hwpoison_page(p, flags)) {
> -               if (TestClearPageHWPoison(p))
> -                       num_poisoned_pages_dec();
> -               unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> -                                pfn, &unpoison_rs);
> +       if (PageSlab(page))
>                 goto unlock_mutex;
> -       }
>
> -       if (TestClearPageHWPoison(page)) {
> -               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> -                                pfn, &unpoison_rs);
> -               num_poisoned_pages_dec();
> -               freeit = 1;
> -       }
> +       ret = get_hwpoison_page(p, MF_UNPOISON);
> +       if (!ret) {
> +               ret = clear_page_hwpoison(&unpoison_rs, p);
> +       } else if (ret < 0) {
> +               if (ret == -EHWPOISON) {
> +                       ret = unpoison_taken_off_page(&unpoison_rs, p);
> +               } else
> +                       unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> +                                        pfn, &unpoison_rs);
> +       } else {
> +               int freeit = clear_page_hwpoison(&unpoison_rs, p);
>
> -       put_page(page);
> -       if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
>                 put_page(page);
> +               if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
> +                       put_page(page);
> +                       ret = 0;
> +               }
> +       }
>
>  unlock_mutex:
>         mutex_unlock(&mf_mutex);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4ea590646f89..b6e4cbb44c54 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9466,6 +9466,7 @@ bool take_page_off_buddy(struct page *page)
>                         del_page_from_free_list(page_head, zone, page_order);
>                         break_down_buddy_pages(zone, page_head, page, 0,
>                                                 page_order, migratetype);
> +                       SetPageHWPoisonTakenOff(page);
>                         if (!is_migrate_isolate(migratetype))
>                                 __mod_zone_freepage_state(zone, -1, migratetype);
>                         ret = true;
> @@ -9477,4 +9478,26 @@ bool take_page_off_buddy(struct page *page)
>         spin_unlock_irqrestore(&zone->lock, flags);
>         return ret;
>  }
> +
> +/*
> + * Cancel takeoff done by take_page_off_buddy().
> + */
> +bool take_page_back_buddy(struct page *page)

put_page_back_buddy() sounds more natural?

> +{
> +       struct zone *zone = page_zone(page);
> +       unsigned long pfn = page_to_pfn(page);
> +       unsigned long flags;
> +       int migratetype = get_pfnblock_migratetype(page, pfn);
> +       bool ret = false;
> +
> +       spin_lock_irqsave(&zone->lock, flags);
> +       if (put_page_testzero(page)) {
> +               ClearPageHWPoisonTakenOff(page);
> +               __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
> +               ret = true;
> +       }
> +       spin_unlock_irqrestore(&zone->lock, flags);
> +
> +       return ret;
> +}
>  #endif
> --
> 2.25.1
>
>
>
> On Tue, Oct 26, 2021 at 08:04:59AM +0900, Naoya Horiguchi wrote:
> > Hi,
> >
> > I updated unpoison fix patchset (sorry for long blank time since v1).
> >
> > Main purpose of this series is to sync unpoison code to recent changes
> > around how hwpoison code takes page refcount.  Unpoison should work or
> > simply fail (without crash) if impossible.
> >
> > The recent works of keeping hwpoison pages in shmem pagecache introduce
> > a new state of hwpoisoned pages, but unpoison for such pages is not
> > supported yet with this series.
> >
> > It seems that soft-offline and unpoison can be used as general purpose
> > page offline/online mechanism (not in the context of memory error). I
> > think that we need some additional works to realize it because currently
> > soft-offline and unpoison are assumed not to happen so frequently
> > (print out too many messages for aggressive usecases). But anyway this
> > could be another interesting next topic.
> >
> > v1: https://lore.kernel.org/linux-mm/20210614021212.223326-1-nao.horiguchi@gmail.com/
> >
> > Thanks,
> > Naoya Horiguchi
> > ---
> > Summary:
> >
> > Naoya Horiguchi (4):
> >       mm/hwpoison: mf_mutex for soft offline and unpoison
> >       mm/hwpoison: remove race consideration
> >       mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
> >       mm/hwpoison: fix unpoison_memory()
> >
> >  include/linux/mm.h         |   3 +-
> >  include/linux/page-flags.h |   4 ++
> >  include/ras/ras_event.h    |   2 -
> >  mm/memory-failure.c        | 166 ++++++++++++++++++++++++++++-----------------
> >  mm/page_alloc.c            |  23 +++++++
> >  5 files changed, 130 insertions(+), 68 deletions(-)
>
> To:
> Cc:
> Bcc: nao.horiguchi@gmail.com
> Subject:
> Reply-To:
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory()
  2021-10-27  4:00   ` [PATCH " Yang Shi
@ 2021-10-27 11:58     ` Naoya Horiguchi
  0 siblings, 0 replies; 15+ messages in thread
From: Naoya Horiguchi @ 2021-10-27 11:58 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux MM, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Ding Hui, Tony Luck, Aneesh Kumar K.V, Miaohe Lin,
	Peter Xu, Naoya Horiguchi, Linux Kernel Mailing List

On Tue, Oct 26, 2021 at 09:00:37PM -0700, Yang Shi wrote:
> On Mon, Oct 25, 2021 at 4:16 PM Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >
> > After recent soft-offline rework, error pages can be taken off from
> > buddy allocator, but the existing unpoison_memory() does not properly
> > undo the operation.  Moreover, due to the recent change on
> > __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> > hwpoisoned pages.  So __get_hwpoison_page() highly likely returns zero
> > (meaning to fail to grab page refcount) and unpoison just clears
> > PG_hwpoison without releasing a refcount.  That does not lead to a
> > critical issue like kernel panic, but unpoisoned pages never get back to
> > buddy (leaked permanently), which is not good.
> >
> > To (partially) fix this, we need to identify "taken off" pages from
> > other types of hwpoisoned pages.  We can't use refcount or page flags
> > for this purpose, so a pseudo flag is defined by hacking ->private
> > field.  Someone might think that put_page() is enough to cancel
> > taken-off pages, but the normal free path contains some operations not
> > suitable for the current purpose, and can fire VM_BUG_ON().
> >
> > Note that unpoison_memory() is now supposed to be cancel hwpoison events
> > injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
> > not by MCE injection, so please don't try to use unpoison when testing
> > with MCE injection.
> >
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > ChangeLog v2:
> > - unpoison_memory() returns as commented
> > - explicitly avoids unpoisoning slab pages
> > - separates internal pinning function into __get_unpoison_page()
> > ---
...
> > @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
> >         return ret;
> >  }
> >
> > +static int __get_unpoison_page(struct page *page)
> > +{
> > +       struct page *head = compound_head(page);
> > +       int ret = 0;
> > +       bool hugetlb = false;
> > +
> > +       ret = get_hwpoison_huge_page(head, &hugetlb);
> > +       if (hugetlb)
> > +               return ret;
> > +
> > +       /*
> > +        * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> > +        * but also isolated from buddy freelist, so need to identify the
> > +        * state and have to cancel both operations to unpoison.
> > +        */
> > +       if (PageHWPoisonTakenOff(head))
> > +               return -EHWPOISON;
> 
> I don't think we could see compound page here, but checking head page
> might be confusing since private is per subpage, so it might be better
> to use page instead of head.

OK, I'll do this (and pass page to get_page_unless_zero() too).

> > +
> > +       return get_page_unless_zero(head) ? 1 : 0;
> > +}
> > +
> >  /**
> >   * get_hwpoison_page() - Get refcount for memory error handling
> >   * @p:         Raw error page (hit by memory error)
> > @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
> >   * extra care for the error page's state (as done in __get_hwpoison_page()),
> >   * and has some retry logic in get_any_page().
> >   *
> > + * When called from unpoison_memory(), the caller should already ensure that
> > + * the given page has PG_hwpoison. So it's never reused for other page
> > + * allocations, and __get_unpoison_page() never races with them.
> > + *
> >   * Return: 0 on failure,
> >   *         1 on success for in-use pages in a well-defined state,
> >   *         -EIO for pages on which we can not handle memory errors,
> >   *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
> > - *         operations like allocation and free.
> > + *         operations like allocation and free,
> > + *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
> >   */
> >  static int get_hwpoison_page(struct page *p, unsigned long flags)
> >  {
> >         int ret;
> >
> >         zone_pcp_disable(page_zone(p));
> > -       ret = get_any_page(p, flags);
> > +       if (flags & MF_UNPOISON)
> > +               ret = __get_unpoison_page(p);
> > +       else
> > +               ret = get_any_page(p, flags);
> >         zone_pcp_enable(page_zone(p));
> >
> >         return ret;
> > @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
> >                 pr_info(fmt, pfn);                      \
> >  })
> >
> > +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> > +{
> > +       if (TestClearPageHWPoison(p)) {
> > +               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> > +                                page_to_pfn(p), rs);
> > +               num_poisoned_pages_dec();
> > +               return 0;
> > +       }
> > +       return -EBUSY;
> 
> I don't quite get why -EBUSY is returned. TestClear returns the old
> value, so returning 0 means the flag was cleared before. It is fine,
> right? I don't see why we have to return different values.

I think clear_page_hwpoison()'s return value is used in the path where
get_hwpoison_page(MF_UNPOISON) returns 1.  But as you mentioned -EBUSY might
not be a good return value.  And I noticed that freeit's semantics is wrongly
reversed, that's just a bug, so I'll fix it.

> 
> > +}
> > +
> > +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> > +                                         struct page *p)
> > +{
> > +       if (take_page_back_buddy(p) && !clear_page_hwpoison(rs, p))
> 
> If clear_page_hwpoison() is void, it can be moved into take_page_back_buddy().
> 
> > +               return 0;
> > +       else
> > +               return -EBUSY;
> > +}
> > +
> >  /**
> >   * unpoison_memory - Unpoison a previously poisoned page
> >   * @pfn: Page number of the to be unpoisoned page
> > @@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
> >  {
> >         struct page *page;
> >         struct page *p;
> > -       int freeit = 0;
> > -       int ret = 0;
> > -       unsigned long flags = 0;
> > +       int ret = -EBUSY;
> >         static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> >                                         DEFAULT_RATELIMIT_BURST);
> >
> > @@ -1996,24 +2059,27 @@ int unpoison_memory(unsigned long pfn)
> >                 goto unlock_mutex;
> >         }
> >
> > -       if (!get_hwpoison_page(p, flags)) {
> > -               if (TestClearPageHWPoison(p))
> > -                       num_poisoned_pages_dec();
> > -               unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> > -                                pfn, &unpoison_rs);
> > +       if (PageSlab(page))
> >                 goto unlock_mutex;
> > -       }
> >
> > -       if (TestClearPageHWPoison(page)) {
> > -               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> > -                                pfn, &unpoison_rs);
> > -               num_poisoned_pages_dec();
> > -               freeit = 1;
> > -       }
> > +       ret = get_hwpoison_page(p, MF_UNPOISON);
> > +       if (!ret) {
> > +               ret = clear_page_hwpoison(&unpoison_rs, p);
> > +       } else if (ret < 0) {
> > +               if (ret == -EHWPOISON) {
> > +                       ret = unpoison_taken_off_page(&unpoison_rs, p);
> > +               } else
> > +                       unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> > +                                        pfn, &unpoison_rs);
> > +       } else {
> > +               int freeit = clear_page_hwpoison(&unpoison_rs, p);
> >
> > -       put_page(page);
> > -       if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
> >                 put_page(page);
> > +               if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
> > +                       put_page(page);
> > +                       ret = 0;
> > +               }
> > +       }
> >
> >  unlock_mutex:
> >         mutex_unlock(&mf_mutex);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 4ea590646f89..b6e4cbb44c54 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -9466,6 +9466,7 @@ bool take_page_off_buddy(struct page *page)
> >                         del_page_from_free_list(page_head, zone, page_order);
> >                         break_down_buddy_pages(zone, page_head, page, 0,
> >                                                 page_order, migratetype);
> > +                       SetPageHWPoisonTakenOff(page);
> >                         if (!is_migrate_isolate(migratetype))
> >                                 __mod_zone_freepage_state(zone, -1, migratetype);
> >                         ret = true;
> > @@ -9477,4 +9478,26 @@ bool take_page_off_buddy(struct page *page)
> >         spin_unlock_irqrestore(&zone->lock, flags);
> >         return ret;
> >  }
> > +
> > +/*
> > + * Cancel takeoff done by take_page_off_buddy().
> > + */
> > +bool take_page_back_buddy(struct page *page)
> 
> put_page_back_buddy() sounds more natural?

Thanks for the suggestion, sounds nice, so I'll rename it.

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-10-27 11:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-25 23:04 [PATCH v2 0/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
2021-10-25 23:05 ` [PATCH v2 1/4] mm/hwpoison: mf_mutex for soft offline and unpoison Naoya Horiguchi
2021-10-27  1:32   ` Yang Shi
2021-10-27  2:31     ` Naoya Horiguchi
2021-10-25 23:05 ` [PATCH v2 2/4] mm/hwpoison: remove race consideration Naoya Horiguchi
2021-10-27  1:04   ` Yang Shi
2021-10-27  1:18     ` Naoya Horiguchi
2021-10-25 23:14 ` [PATCH 3/4] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE Naoya Horiguchi
2021-10-27  1:05   ` Yang Shi
2021-10-25 23:16 ` [PATCH v2 4/4] mm/hwpoison: fix unpoison_memory() Naoya Horiguchi
2021-10-25 23:27   ` [PATCH RESEND " Naoya Horiguchi
2021-10-27  1:26     ` Ding Hui
2021-10-27  2:29       ` Naoya Horiguchi
2021-10-27  4:00   ` [PATCH " Yang Shi
2021-10-27 11:58     ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).