All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug
@ 2022-10-07  1:07 Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-07  1:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

Hi,

This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison.
In this patchset, memory hotplug handles hwpoison pages like below:

  - hwpoison pages should not prevent memory hotremove,
  - memory block with hwpoison pages should not be onlined.

Changes in this version:

  - applied comments for previous versions (thank you, reviewers),
  - fixed a few build errors (sorry for making noises on linux-next),
  - unpoison_memory() properly cancels per-memblk hwpoison counter,
  - rebased onto akpm/mm-unstable on Oct 4.

Thanks,
Naoya Horiguchi

v1: https://lore.kernel.org/linux-mm/20220427042841.678351-1-naoya.horiguchi@linux.dev/T
v2: https://lore.kernel.org/linux-mm/20220905062137.1455537-1-naoya.horiguchi@linux.dev/T
v3-5: https://lore.kernel.org/linux-mm/20220921091359.25889-1-naoya.horiguchi@linux.dev/T
---
Summary:

Naoya Horiguchi (4):
      mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
      mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
      mm/hwpoison: pass pfn to num_poisoned_pages_*()
      mm/hwpoison: introduce per-memory_block hwpoison counter

 arch/parisc/kernel/pdt.c |  5 ++--
 drivers/base/memory.c    | 40 +++++++++++++++++++++++++++
 include/linux/hugetlb.h  | 10 ++++---
 include/linux/memory.h   |  3 ++
 include/linux/mm.h       | 29 ++++++++++++++++++--
 include/linux/swapops.h  | 24 ++--------------
 mm/hugetlb.c             |  9 +++---
 mm/internal.h            |  8 ------
 mm/memory-failure.c      | 71 +++++++++++++++++++++++++++---------------------
 mm/sparse.c              |  2 --
 10 files changed, 125 insertions(+), 76 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-07  1:07 [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
@ 2022-10-07  1:07 ` Naoya Horiguchi
  2022-10-13 14:17   ` Oscar Salvador
  2022-10-15  1:58   ` Miaohe Lin
  2022-10-07  1:07 ` [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-07  1:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

HWPoisoned page is not supposed to be accessed once marked, but currently
such accesses can happen during memory hotremove because do_migrate_range()
can be called before dissolve_free_huge_pages() is called.

Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
migrated.  This should be done in hugetlb_lock to avoid race against
isolate_hugetlb().

get_hwpoison_huge_page() needs to have a flag to show it's called from
unpoison to take refcount of hwpoisoned hugepages, so add it.

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v3 -> v6:
- introduce migratable_cleared to remember that HPageMigratable is
  cleared in error handling.  It's needed to cancel when an error event
  is filtered by hwpoison_filter(). (Thanks to Miaohe)

ChangeLog v2 -> v3
- move to the approach of clearing HPageMigratable instead of shifting
  dissolve_free_huge_pages.
---
 include/linux/hugetlb.h | 10 ++++++----
 include/linux/mm.h      |  6 ++++--
 mm/hugetlb.c            |  9 +++++----
 mm/memory-failure.c     | 21 +++++++++++++++++----
 4 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 32d45e96a894..19b99ff7fea0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -183,8 +183,9 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
 int isolate_hugetlb(struct page *page, struct list_head *list);
-int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
-int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+				bool *migratable_cleared);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
@@ -391,12 +392,13 @@ static inline int isolate_hugetlb(struct page *page, struct list_head *list)
 	return -EBUSY;
 }
 
-static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
 {
 	return 0;
 }
 
-static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+					bool *migratable_cleared)
 {
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..3264bf993ad8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3277,9 +3277,11 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
-extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+					bool *migratable_cleared);
 #else
-static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+					bool *migratable_cleared)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 63fe47a0240a..0e482dfaf92e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7253,7 +7253,7 @@ int isolate_hugetlb(struct page *page, struct list_head *list)
 	return ret;
 }
 
-int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
 {
 	int ret = 0;
 
@@ -7263,7 +7263,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
 		*hugetlb = true;
 		if (HPageFreed(page))
 			ret = 0;
-		else if (HPageMigratable(page))
+		else if (HPageMigratable(page) || unpoison)
 			ret = get_page_unless_zero(page);
 		else
 			ret = -EBUSY;
@@ -7272,12 +7272,13 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
 	return ret;
 }
 
-int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+				bool *migratable_cleared)
 {
 	int ret;
 
 	spin_lock_irq(&hugetlb_lock);
-	ret = __get_huge_page_for_hwpoison(pfn, flags);
+	ret = __get_huge_page_for_hwpoison(pfn, flags, migratable_cleared);
 	spin_unlock_irq(&hugetlb_lock);
 	return ret;
 }
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..d4fef56c0438 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(head, &hugetlb, false);
 	if (hugetlb)
 		return ret;
 
@@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(head, &hugetlb, true);
 	if (hugetlb)
 		return ret;
 
@@ -1785,7 +1785,8 @@ void hugetlb_clear_page_hwpoison(struct page *hpage)
  *   -EBUSY        - the hugepage is busy (try to retry)
  *   -EHWPOISON    - the hugepage is already hwpoisoned
  */
-int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
+				 bool *migratable_cleared)
 {
 	struct page *page = pfn_to_page(pfn);
 	struct page *head = compound_head(page);
@@ -1815,6 +1816,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 		goto out;
 	}
 
+	/*
+	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
+	 * from being migrated by memory hotremove.
+	 */
+	if (count_increased) {
+		*migratable_cleared = true;
+		ClearHPageMigratable(head);
+	}
+
 	return ret;
 out:
 	if (count_increased)
@@ -1834,10 +1844,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	struct page *p = pfn_to_page(pfn);
 	struct page *head;
 	unsigned long page_flags;
+	bool migratable_cleared = false;
 
 	*hugetlb = 1;
 retry:
-	res = get_huge_page_for_hwpoison(pfn, flags);
+	res = get_huge_page_for_hwpoison(pfn, flags, &migratable_cleared);
 	if (res == 2) { /* fallback to normal page handling */
 		*hugetlb = 0;
 		return 0;
@@ -1862,6 +1873,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 
 	if (hwpoison_filter(p)) {
 		hugetlb_clear_page_hwpoison(head);
+		if (migratable_cleared)
+			SetHPageMigratable(head);
 		unlock_page(head);
 		if (res == 1)
 			put_page(head);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-10-07  1:07 [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
@ 2022-10-07  1:07 ` Naoya Horiguchi
  2022-10-13 14:31   ` Oscar Salvador
  2022-10-07  1:07 ` [PATCH v6 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  3 siblings, 1 reply; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-07  1:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

These interfaces will be used by drivers/base/memory.c by later patch, so as a
preparatory work move them to more common header file visible to the file.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
ChangeLog v3 -> v6:
- remove static in definition of num_poisoned_pages_inc() to fix build error.

ChangeLog v2 -> v3:
- added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
---
 arch/parisc/kernel/pdt.c |  3 +--
 include/linux/mm.h       |  5 +++++
 include/linux/swapops.h  | 24 ++----------------------
 mm/memory-failure.c      | 10 ++++++++++
 4 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
index e391b175f5ec..fdc880e2575a 100644
--- a/arch/parisc/kernel/pdt.c
+++ b/arch/parisc/kernel/pdt.c
@@ -18,8 +18,7 @@
 #include <linux/kthread.h>
 #include <linux/initrd.h>
 #include <linux/pgtable.h>
-#include <linux/swap.h>
-#include <linux/swapops.h>
+#include <linux/mm.h>
 
 #include <asm/pdc.h>
 #include <asm/pdcpat.h>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3264bf993ad8..782ab29bb1fb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3279,12 +3279,17 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared);
+extern void num_poisoned_pages_inc(void);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared)
 {
 	return 0;
 }
+
+static inline void num_poisoned_pages_inc(void)
+{
+}
 #endif
 
 #ifndef arch_memory_failure
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a91dd08e107b..3e58a812399a 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -581,8 +581,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
 
 #ifdef CONFIG_MEMORY_FAILURE
 
-extern atomic_long_t num_poisoned_pages __read_mostly;
-
 /*
  * Support for hardware poisoned pages
  */
@@ -610,17 +608,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 	return p;
 }
 
-static inline void num_poisoned_pages_inc(void)
-{
-	atomic_long_inc(&num_poisoned_pages);
-}
-
-static inline void num_poisoned_pages_sub(long i)
-{
-	atomic_long_sub(i, &num_poisoned_pages);
-}
-
-#else  /* CONFIG_MEMORY_FAILURE */
+#else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
 {
@@ -636,15 +624,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
 }
-
-static inline void num_poisoned_pages_inc(void)
-{
-}
-
-static inline void num_poisoned_pages_sub(long i)
-{
-}
-#endif  /* CONFIG_MEMORY_FAILURE */
+#endif
 
 static inline int non_swap_entry(swp_entry_t entry)
 {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d4fef56c0438..323e0a21833f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,16 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
+inline void num_poisoned_pages_inc(void)
+{
+	atomic_long_inc(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_sub(long i)
+{
+	atomic_long_sub(i, &num_poisoned_pages);
+}
+
 /*
  * Return values:
  *   1:   the page is dissolved (if needed) and taken off from buddy,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*()
  2022-10-07  1:07 [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
@ 2022-10-07  1:07 ` Naoya Horiguchi
  2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  3 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-07  1:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

No functional change.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
ChangeLog v2 -> v3:
- added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
---
 arch/parisc/kernel/pdt.c |  2 +-
 include/linux/mm.h       |  4 ++--
 mm/memory-failure.c      | 14 +++++++-------
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
index fdc880e2575a..80943a00e245 100644
--- a/arch/parisc/kernel/pdt.c
+++ b/arch/parisc/kernel/pdt.c
@@ -231,7 +231,7 @@ void __init pdc_pdt_init(void)
 
 		/* mark memory page bad */
 		memblock_reserve(pdt_entry[i] & PAGE_MASK, PAGE_SIZE);
-		num_poisoned_pages_inc();
+		num_poisoned_pages_inc(addr >> PAGE_SHIFT);
 	}
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 782ab29bb1fb..17119dbf8fad 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3279,7 +3279,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared);
-extern void num_poisoned_pages_inc(void);
+extern void num_poisoned_pages_inc(unsigned long pfn);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared)
@@ -3287,7 +3287,7 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return 0;
 }
 
-static inline void num_poisoned_pages_inc(void)
+static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 323e0a21833f..c19a301f16fc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,12 +74,12 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-inline void num_poisoned_pages_inc(void)
+inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
 }
 
-static inline void num_poisoned_pages_sub(long i)
+static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
 }
@@ -125,7 +125,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 	if (release)
 		put_page(page);
 	page_ref_inc(page);
-	num_poisoned_pages_inc();
+	num_poisoned_pages_inc(page_to_pfn(page));
 
 	return true;
 }
@@ -1194,7 +1194,7 @@ static void action_result(unsigned long pfn, enum mf_action_page_type type,
 {
 	trace_memory_failure_event(pfn, type, result);
 
-	num_poisoned_pages_inc();
+	num_poisoned_pages_inc(pfn);
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 }
@@ -1741,7 +1741,7 @@ static int hugetlb_set_page_hwpoison(struct page *hpage, struct page *page)
 		llist_add(&raw_hwp->node, head);
 		/* the first error event will be counted in action_result(). */
 		if (ret)
-			num_poisoned_pages_inc();
+			num_poisoned_pages_inc(page_to_pfn(page));
 	} else {
 		/*
 		 * Failed to save raw error info.  We no longer trace all
@@ -2419,7 +2419,7 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
-		num_poisoned_pages_sub(count);
+		num_poisoned_pages_sub(pfn, count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
 	}
@@ -2643,5 +2643,5 @@ void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 		}
 	}
 	if (total)
-		num_poisoned_pages_sub(total);
+		num_poisoned_pages_sub(0, total);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-07  1:07 [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2022-10-07  1:07 ` [PATCH v6 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
@ 2022-10-07  1:07 ` Naoya Horiguchi
  2022-10-07  4:34   ` HORIGUCHI NAOYA(堀口 直也)
                     ` (2 more replies)
  3 siblings, 3 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-07  1:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently PageHWPoison flag does not behave well when experiencing memory
hotremove/hotplug.  Any data field in struct page is unreliable when the
associated memory is offlined, and the current mechanism can't tell whether
a memory block is onlined because a new memory devices is installed or
because previous failed offline operations are undone.  Especially if
there's a hwpoisoned memory, it's unclear what the best option is.

So introduce a new mechanism to make struct memory_block remember that
a memory block has hwpoisoned memory inside it. And make any online event
fail if the onlining memory block contains hwpoison.  struct memory_block
is freed and reallocated over ACPI-based hotremove/hotplug, but not over
sysfs-based hotremove/hotplug.  So the new counter can distinguish these
cases.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
---
ChangeLog v5 -> v6:
- fix build errors over memblk_nr_poison_inc() and memblk_nr_poison_sub(),
- pass "struct memory_block *" to memblk_nr_poison() instead of pfn,
- removed clear_hwpoisoned_pages() and call num_poisoned_pages_sub() directly.
- add static keyword to the definition of memblk_nr_poison().
- Mioahe added Reviewed-by for v5, but I have some non trivial changes in
  v6, so let me hold to add it.
- unpoison_memory() properly cancels per-memblk hwpoison counter.

ChangeLog v4 -> v5:
- add Reported-by of lkp bot,
- check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in introduced #ifdefs,
  intending to fix "undefined reference" errors in aarch64.

ChangeLog v3 -> v4:
- fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
  by using memblk_nr_poison() to access to the member ->nr_hwpoison
---
 drivers/base/memory.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  3 +++
 include/linux/mm.h     | 18 ++++++++++++++++++
 mm/internal.h          |  8 --------
 mm/memory-failure.c    | 36 +++++++++++-------------------------
 mm/sparse.c            |  2 --
 6 files changed, 72 insertions(+), 35 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 9aa0da991cfb..5d00d8a14c79 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -175,6 +175,17 @@ int memory_notify(unsigned long val, void *v)
 	return blocking_notifier_call_chain(&memory_chain, val, v);
 }
 
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+void memblk_nr_poison_inc(unsigned long pfn);
+void memblk_nr_poison_sub(unsigned long pfn, long i);
+static unsigned long memblk_nr_poison(struct memory_block *mem);
+#else
+static inline unsigned long memblk_nr_poison(struct memory_block *mem)
+{
+	return 0;
+}
+#endif
+
 static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -183,6 +194,9 @@ static int memory_block_online(struct memory_block *mem)
 	struct zone *zone;
 	int ret;
 
+	if (memblk_nr_poison(mem))
+		return -EHWPOISON;
+
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);
 
@@ -864,6 +878,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 		mem = find_memory_block_by_id(block_id);
 		if (WARN_ON_ONCE(!mem))
 			continue;
+		num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
 		unregister_memory_block_under_nodes(mem);
 		remove_memory_block(mem);
 	}
@@ -1164,3 +1179,28 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	}
 	return ret;
 }
+
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+void memblk_nr_poison_inc(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_inc(&mem->nr_hwpoison);
+}
+
+void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_sub(i, &mem->nr_hwpoison);
+}
+
+static unsigned long memblk_nr_poison(struct memory_block *mem)
+{
+	return atomic_long_read(&mem->nr_hwpoison);
+}
+#endif
diff --git a/include/linux/memory.h b/include/linux/memory.h
index aa619464a1df..ad8cd9bb3239 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -85,6 +85,9 @@ struct memory_block {
 	unsigned long nr_vmemmap_pages;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+	atomic_long_t nr_hwpoison;
+#endif
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17119dbf8fad..f80269e90772 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3280,6 +3280,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared);
 extern void num_poisoned_pages_inc(unsigned long pfn);
+extern void num_poisoned_pages_sub(unsigned long pfn, long i);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared)
@@ -3290,6 +3291,23 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
+
+static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
+{
+}
+#endif
+
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+extern void memblk_nr_poison_inc(unsigned long pfn);
+extern void memblk_nr_poison_sub(unsigned long pfn, long i);
+#else
+static inline void memblk_nr_poison_inc(unsigned long pfn)
+{
+}
+
+static inline void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+}
 #endif
 
 #ifndef arch_memory_failure
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..42ba8b96cab5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -708,14 +708,6 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
-#ifdef CONFIG_MEMORY_FAILURE
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c19a301f16fc..4f5921590b76 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -77,11 +77,14 @@ static bool hw_memory_failure __read_mostly = false;
 inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
+	memblk_nr_poison_inc(pfn);
 }
 
-static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
+inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
+	if (pfn != -1UL)
+		memblk_nr_poison_sub(pfn, i);
 }
 
 /*
@@ -1706,6 +1709,8 @@ static unsigned long __free_raw_hwp_pages(struct page *hpage, bool move_flag)
 
 		if (move_flag)
 			SetPageHWPoison(p->page);
+		else
+			num_poisoned_pages_sub(page_to_pfn(p->page), 1);
 		kfree(p);
 		count++;
 	}
@@ -2337,6 +2342,7 @@ int unpoison_memory(unsigned long pfn)
 	int ret = -EBUSY;
 	int freeit = 0;
 	unsigned long count = 1;
+	bool huge = false;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -2385,6 +2391,7 @@ int unpoison_memory(unsigned long pfn)
 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
 		if (PageHuge(p)) {
+			huge = true;
 			count = free_raw_hwp_pages(page, false);
 			if (count == 0) {
 				ret = -EBUSY;
@@ -2400,6 +2407,7 @@ int unpoison_memory(unsigned long pfn)
 					 pfn, &unpoison_rs);
 	} else {
 		if (PageHuge(p)) {
+			huge = true;
 			count = free_raw_hwp_pages(page, false);
 			if (count == 0) {
 				ret = -EBUSY;
@@ -2419,7 +2427,8 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
-		num_poisoned_pages_sub(pfn, count);
+		if (!huge)
+			num_poisoned_pages_sub(pfn, 1);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
 	}
@@ -2622,26 +2631,3 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 	return ret;
 }
-
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-	int i, total = 0;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			total++;
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-	if (total)
-		num_poisoned_pages_sub(0, total);
-}
diff --git a/mm/sparse.c b/mm/sparse.c
index e5a8a3a0edd7..2779b419ef2a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -926,8 +926,6 @@ void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
 {
-	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
-			nr_pages - map_offset);
 	section_deactivate(pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
@ 2022-10-07  4:34   ` HORIGUCHI NAOYA(堀口 直也)
  2022-10-13  8:33   ` Oscar Salvador
  2022-10-15  2:28   ` Miaohe Lin
  2 siblings, 0 replies; 17+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-07  4:34 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Fri, Oct 07, 2022 at 10:07:06AM +0900, Naoya Horiguchi wrote:
...
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 9aa0da991cfb..5d00d8a14c79 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -175,6 +175,17 @@ int memory_notify(unsigned long val, void *v)
>  	return blocking_notifier_call_chain(&memory_chain, val, v);
>  }
>  
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)

> +void memblk_nr_poison_inc(unsigned long pfn);
> +void memblk_nr_poison_sub(unsigned long pfn, long i);

Sorry again, these prototypes should not be necessary. I'll remove these
when I need resubmit the patch series.

It seems that scripts/checkpatch.pl shows the following warning by these.

  WARNING: externs should be avoided in .c files
  #59: FILE: drivers/base/memory.c:180:
  +void memblk_nr_poison_sub(unsigned long pfn, long i);

This disappears by removing the prototypes.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  2022-10-07  4:34   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-10-13  8:33   ` Oscar Salvador
  2022-10-13 10:09     ` Naoya Horiguchi
  2022-10-15  2:28   ` Miaohe Lin
  2 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2022-10-13  8:33 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Fri, Oct 07, 2022 at 10:07:06AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Currently PageHWPoison flag does not behave well when experiencing memory
> hotremove/hotplug.  Any data field in struct page is unreliable when the
> associated memory is offlined, and the current mechanism can't tell whether
> a memory block is onlined because a new memory devices is installed or
> because previous failed offline operations are undone.  Especially if
> there's a hwpoisoned memory, it's unclear what the best option is.
> 
> So introduce a new mechanism to make struct memory_block remember that
> a memory block has hwpoisoned memory inside it. And make any online event
> fail if the onlining memory block contains hwpoison.  struct memory_block
> is freed and reallocated over ACPI-based hotremove/hotplug, but not over
> sysfs-based hotremove/hotplug.  So the new counter can distinguish these
> cases.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>

I glanzed over it and looks good overall.
Have a small question though:

> @@ -864,6 +878,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
>  		mem = find_memory_block_by_id(block_id);
>  		if (WARN_ON_ONCE(!mem))
>  			continue;
> +		num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));

Why does num_poisoned_pages_sub() have to make this distinction (!-1 == -1)
for the hot-remove stage?


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-13  8:33   ` Oscar Salvador
@ 2022-10-13 10:09     ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-13 10:09 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Thu, Oct 13, 2022 at 10:33:45AM +0200, Oscar Salvador wrote:
> On Fri, Oct 07, 2022 at 10:07:06AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > Currently PageHWPoison flag does not behave well when experiencing memory
> > hotremove/hotplug.  Any data field in struct page is unreliable when the
> > associated memory is offlined, and the current mechanism can't tell whether
> > a memory block is onlined because a new memory devices is installed or
> > because previous failed offline operations are undone.  Especially if
> > there's a hwpoisoned memory, it's unclear what the best option is.
> > 
> > So introduce a new mechanism to make struct memory_block remember that
> > a memory block has hwpoisoned memory inside it. And make any online event
> > fail if the onlining memory block contains hwpoison.  struct memory_block
> > is freed and reallocated over ACPI-based hotremove/hotplug, but not over
> > sysfs-based hotremove/hotplug.  So the new counter can distinguish these
> > cases.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> 
> I glanzed over it and looks good overall.
> Have a small question though:

Thank you for looking.

> 
> > @@ -864,6 +878,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> >  		mem = find_memory_block_by_id(block_id);
> >  		if (WARN_ON_ONCE(!mem))
> >  			continue;
> > +		num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
> 
> Why does num_poisoned_pages_sub() have to make this distinction (!-1 == -1)
> for the hot-remove stage?

The first argument is used to find memory_block including the given pfn.
And in the above context remove_memory_block_devices() already has the
pointer "mem", so recalcurating it looked to me not necessary.  Moreover,
this code is about to free the memory_block so updating the counter inside
it can be avoided.  This is just a tiny optimization, and there can be
better option.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
@ 2022-10-13 14:17   ` Oscar Salvador
  2022-10-15  1:58   ` Miaohe Lin
  1 sibling, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2022-10-13 14:17 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Fri, Oct 07, 2022 at 10:07:03AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> HWPoisoned page is not supposed to be accessed once marked, but currently
> such accesses can happen during memory hotremove because do_migrate_range()
> can be called before dissolve_free_huge_pages() is called.
> 
> Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
> migrated.  This should be done in hugetlb_lock to avoid race against
> isolate_hugetlb().
> 
> get_hwpoison_huge_page() needs to have a flag to show it's called from
> unpoison to take refcount of hwpoisoned hugepages, so add it.
> 
> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

I could not spot any red flags:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
> ChangeLog v3 -> v6:
> - introduce migratable_cleared to remember that HPageMigratable is
>   cleared in error handling.  It's needed to cancel when an error event
>   is filtered by hwpoison_filter(). (Thanks to Miaohe)
> 
> ChangeLog v2 -> v3
> - move to the approach of clearing HPageMigratable instead of shifting
>   dissolve_free_huge_pages.
> ---
>  include/linux/hugetlb.h | 10 ++++++----
>  include/linux/mm.h      |  6 ++++--
>  mm/hugetlb.c            |  9 +++++----
>  mm/memory-failure.c     | 21 +++++++++++++++++----
>  4 files changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 32d45e96a894..19b99ff7fea0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -183,8 +183,9 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>  						long freed);
>  int isolate_hugetlb(struct page *page, struct list_head *list);
> -int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
> -int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
> +int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				bool *migratable_cleared);
>  void putback_active_hugepage(struct page *page);
>  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
>  void free_huge_page(struct page *page);
> @@ -391,12 +392,13 @@ static inline int isolate_hugetlb(struct page *page, struct list_head *list)
>  	return -EBUSY;
>  }
>  
> -static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
>  {
>  	return 0;
>  }
>  
> -static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared)
>  {
>  	return 0;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8bbcccbc5565..3264bf993ad8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3277,9 +3277,11 @@ extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  #ifdef CONFIG_MEMORY_FAILURE
> -extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> +extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared);
>  #else
> -static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared)
>  {
>  	return 0;
>  }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 63fe47a0240a..0e482dfaf92e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7253,7 +7253,7 @@ int isolate_hugetlb(struct page *page, struct list_head *list)
>  	return ret;
>  }
>  
> -int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
>  {
>  	int ret = 0;
>  
> @@ -7263,7 +7263,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  		*hugetlb = true;
>  		if (HPageFreed(page))
>  			ret = 0;
> -		else if (HPageMigratable(page))
> +		else if (HPageMigratable(page) || unpoison)
>  			ret = get_page_unless_zero(page);
>  		else
>  			ret = -EBUSY;
> @@ -7272,12 +7272,13 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  	return ret;
>  }
>  
> -int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				bool *migratable_cleared)
>  {
>  	int ret;
>  
>  	spin_lock_irq(&hugetlb_lock);
> -	ret = __get_huge_page_for_hwpoison(pfn, flags);
> +	ret = __get_huge_page_for_hwpoison(pfn, flags, migratable_cleared);
>  	spin_unlock_irq(&hugetlb_lock);
>  	return ret;
>  }
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 145bb561ddb3..d4fef56c0438 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, false);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, true);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1785,7 +1785,8 @@ void hugetlb_clear_page_hwpoison(struct page *hpage)
>   *   -EBUSY        - the hugepage is busy (try to retry)
>   *   -EHWPOISON    - the hugepage is already hwpoisoned
>   */
> -int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				 bool *migratable_cleared)
>  {
>  	struct page *page = pfn_to_page(pfn);
>  	struct page *head = compound_head(page);
> @@ -1815,6 +1816,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
> +	 * from being migrated by memory hotremove.
> +	 */
> +	if (count_increased) {
> +		*migratable_cleared = true;
> +		ClearHPageMigratable(head);
> +	}
> +
>  	return ret;
>  out:
>  	if (count_increased)
> @@ -1834,10 +1844,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  	struct page *p = pfn_to_page(pfn);
>  	struct page *head;
>  	unsigned long page_flags;
> +	bool migratable_cleared = false;
>  
>  	*hugetlb = 1;
>  retry:
> -	res = get_huge_page_for_hwpoison(pfn, flags);
> +	res = get_huge_page_for_hwpoison(pfn, flags, &migratable_cleared);
>  	if (res == 2) { /* fallback to normal page handling */
>  		*hugetlb = 0;
>  		return 0;
> @@ -1862,6 +1873,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  
>  	if (hwpoison_filter(p)) {
>  		hugetlb_clear_page_hwpoison(head);
> +		if (migratable_cleared)
> +			SetHPageMigratable(head);
>  		unlock_page(head);
>  		if (res == 1)
>  			put_page(head);
> -- 
> 2.25.1
> 
> 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-10-07  1:07 ` [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
@ 2022-10-13 14:31   ` Oscar Salvador
  2022-10-14  6:38     ` Naoya Horiguchi
  0 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2022-10-13 14:31 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Fri, Oct 07, 2022 at 10:07:04AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> These interfaces will be used by drivers/base/memory.c by later patch, so as a
> preparatory work move them to more common header file visible to the file.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> ChangeLog v3 -> v6:
> - remove static in definition of num_poisoned_pages_inc() to fix build error.
> 
> ChangeLog v2 -> v3:
> - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
> ---
>  arch/parisc/kernel/pdt.c |  3 +--
>  include/linux/mm.h       |  5 +++++
>  include/linux/swapops.h  | 24 ++----------------------
>  mm/memory-failure.c      | 10 ++++++++++
>  4 files changed, 18 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
> index e391b175f5ec..fdc880e2575a 100644
> --- a/arch/parisc/kernel/pdt.c
> +++ b/arch/parisc/kernel/pdt.c
> @@ -18,8 +18,7 @@
>  #include <linux/kthread.h>
>  #include <linux/initrd.h>
>  #include <linux/pgtable.h>
> -#include <linux/swap.h>
> -#include <linux/swapops.h>
> +#include <linux/mm.h>

I am probably missing something.
num_poisoned_pages_* functions are in swapops.h, but why are you removing swap.h as well?

Also, reading the changelog it sounded like both functions would be in mm.h,
but actually only the _inc part is.

The rest looks good to me.
 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-10-13 14:31   ` Oscar Salvador
@ 2022-10-14  6:38     ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-10-14  6:38 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-mm, Andrew Morton, Miaohe Lin, David Hildenbrand,
	Mike Kravetz, Yang Shi, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Thu, Oct 13, 2022 at 04:31:53PM +0200, Oscar Salvador wrote:
> On Fri, Oct 07, 2022 at 10:07:04AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > These interfaces will be used by drivers/base/memory.c by later patch, so as a
> > preparatory work move them to more common header file visible to the file.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> > ChangeLog v3 -> v6:
> > - remove static in definition of num_poisoned_pages_inc() to fix build error.
> > 
> > ChangeLog v2 -> v3:
> > - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
> > ---
> >  arch/parisc/kernel/pdt.c |  3 +--
> >  include/linux/mm.h       |  5 +++++
> >  include/linux/swapops.h  | 24 ++----------------------
> >  mm/memory-failure.c      | 10 ++++++++++
> >  4 files changed, 18 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
> > index e391b175f5ec..fdc880e2575a 100644
> > --- a/arch/parisc/kernel/pdt.c
> > +++ b/arch/parisc/kernel/pdt.c
> > @@ -18,8 +18,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/initrd.h>
> >  #include <linux/pgtable.h>
> > -#include <linux/swap.h>
> > -#include <linux/swapops.h>
> > +#include <linux/mm.h>
> 
> I am probably missing something.
> num_poisoned_pages_* functions are in swapops.h, but why are you removing swap.h as well?

This file included swap.h and swapops.h together to use num_poisoned_pages_inc()
by commit 0e5a7ff6e36a ("parisc: Report bad pages as HardwareCorrupted"),
so I thought these may be updated together.

> 
> Also, reading the changelog it sounded like both functions would be in mm.h,
> but actually only the _inc part is.

> > ChangeLog v2 -> v3:
> > - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE

Yeah, important part of this log is "in #ifdef CONFIG_MEMORY_FAILURE", but
this might not be clear from my writing.  Sorry about that, I'll care about
making change log clearer from now.  This change log will not included when
merged to mainline, so this hopefully will not confuse anyone.

> 
> The rest looks good to me.

Thank you.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
  2022-10-13 14:17   ` Oscar Salvador
@ 2022-10-15  1:58   ` Miaohe Lin
  2022-10-17  7:24     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 17+ messages in thread
From: Miaohe Lin @ 2022-10-15  1:58 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/10/7 9:07, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> HWPoisoned page is not supposed to be accessed once marked, but currently
> such accesses can happen during memory hotremove because do_migrate_range()
> can be called before dissolve_free_huge_pages() is called.
> 
> Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
> migrated.  This should be done in hugetlb_lock to avoid race against
> isolate_hugetlb().
> 
> get_hwpoison_huge_page() needs to have a flag to show it's called from
> unpoison to take refcount of hwpoisoned hugepages, so add it.
> 
> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Sorry for late respond. I was spending a busy week. :) And thanks for your work, Naoya.

> ---
> ChangeLog v3 -> v6:
> - introduce migratable_cleared to remember that HPageMigratable is
>   cleared in error handling.  It's needed to cancel when an error event
>   is filtered by hwpoison_filter(). (Thanks to Miaohe)
> 
> ChangeLog v2 -> v3
> - move to the approach of clearing HPageMigratable instead of shifting
>   dissolve_free_huge_pages.
> ---
>  include/linux/hugetlb.h | 10 ++++++----
>  include/linux/mm.h      |  6 ++++--
>  mm/hugetlb.c            |  9 +++++----
>  mm/memory-failure.c     | 21 +++++++++++++++++----
>  4 files changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 32d45e96a894..19b99ff7fea0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -183,8 +183,9 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>  						long freed);
>  int isolate_hugetlb(struct page *page, struct list_head *list);
> -int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
> -int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
> +int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				bool *migratable_cleared);
>  void putback_active_hugepage(struct page *page);
>  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
>  void free_huge_page(struct page *page);
> @@ -391,12 +392,13 @@ static inline int isolate_hugetlb(struct page *page, struct list_head *list)
>  	return -EBUSY;
>  }
>  
> -static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
>  {
>  	return 0;
>  }
>  
> -static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared)
>  {
>  	return 0;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8bbcccbc5565..3264bf993ad8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3277,9 +3277,11 @@ extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  #ifdef CONFIG_MEMORY_FAILURE
> -extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> +extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared);
>  #else
> -static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +					bool *migratable_cleared)
>  {
>  	return 0;
>  }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 63fe47a0240a..0e482dfaf92e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7253,7 +7253,7 @@ int isolate_hugetlb(struct page *page, struct list_head *list)
>  	return ret;
>  }
>  
> -int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
>  {
>  	int ret = 0;
>  
> @@ -7263,7 +7263,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  		*hugetlb = true;
>  		if (HPageFreed(page))
>  			ret = 0;
> -		else if (HPageMigratable(page))
> +		else if (HPageMigratable(page) || unpoison)
>  			ret = get_page_unless_zero(page);
>  		else
>  			ret = -EBUSY;
> @@ -7272,12 +7272,13 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  	return ret;
>  }
>  
> -int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				bool *migratable_cleared)
>  {
>  	int ret;
>  
>  	spin_lock_irq(&hugetlb_lock);
> -	ret = __get_huge_page_for_hwpoison(pfn, flags);
> +	ret = __get_huge_page_for_hwpoison(pfn, flags, migratable_cleared);
>  	spin_unlock_irq(&hugetlb_lock);
>  	return ret;
>  }
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 145bb561ddb3..d4fef56c0438 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, false);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, true);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1785,7 +1785,8 @@ void hugetlb_clear_page_hwpoison(struct page *hpage)
>   *   -EBUSY        - the hugepage is busy (try to retry)
>   *   -EHWPOISON    - the hugepage is already hwpoisoned
>   */
> -int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> +				 bool *migratable_cleared)
>  {
>  	struct page *page = pfn_to_page(pfn);
>  	struct page *head = compound_head(page);
> @@ -1815,6 +1816,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
> +	 * from being migrated by memory hotremove.
> +	 */
> +	if (count_increased) {
> +		*migratable_cleared = true;
> +		ClearHPageMigratable(head);

I think I might be nitpicking... But it seems ClearHPageMigratable is not enough here.
  1. In MF_COUNT_INCREASED case, we don't know whether HPageMigratable is set.
  2. Even if HPageMigratable is set, there might be a race window before we clear HPageMigratable?
So "*migratable_cleared = TestClearHPageMigratable" might be better? But I might be wrong.

With above fixed (if it's really a problem), this patch looks good to me.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks,
Miaohe Lin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  2022-10-07  4:34   ` HORIGUCHI NAOYA(堀口 直也)
  2022-10-13  8:33   ` Oscar Salvador
@ 2022-10-15  2:28   ` Miaohe Lin
  2022-10-17 11:43     ` HORIGUCHI NAOYA(堀口 直也)
  2 siblings, 1 reply; 17+ messages in thread
From: Miaohe Lin @ 2022-10-15  2:28 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/10/7 9:07, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Currently PageHWPoison flag does not behave well when experiencing memory
> hotremove/hotplug.  Any data field in struct page is unreliable when the
> associated memory is offlined, and the current mechanism can't tell whether
> a memory block is onlined because a new memory devices is installed or
> because previous failed offline operations are undone.  Especially if
> there's a hwpoisoned memory, it's unclear what the best option is.
> 
> So introduce a new mechanism to make struct memory_block remember that
> a memory block has hwpoisoned memory inside it. And make any online event
> fail if the onlining memory block contains hwpoison.  struct memory_block
> is freed and reallocated over ACPI-based hotremove/hotplug, but not over
> sysfs-based hotremove/hotplug.  So the new counter can distinguish these
> cases.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
> ChangeLog v5 -> v6:
> - fix build errors over memblk_nr_poison_inc() and memblk_nr_poison_sub(),
> - pass "struct memory_block *" to memblk_nr_poison() instead of pfn,
> - removed clear_hwpoisoned_pages() and call num_poisoned_pages_sub() directly.
> - add static keyword to the definition of memblk_nr_poison().
> - Mioahe added Reviewed-by for v5, but I have some non trivial changes in
>   v6, so let me hold to add it.
> - unpoison_memory() properly cancels per-memblk hwpoison counter.
> 
> ChangeLog v4 -> v5:
> - add Reported-by of lkp bot,
> - check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in introduced #ifdefs,
>   intending to fix "undefined reference" errors in aarch64.
> 
> ChangeLog v3 -> v4:
> - fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
>   by using memblk_nr_poison() to access to the member ->nr_hwpoison
> ---
>  drivers/base/memory.c  | 40 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/memory.h |  3 +++
>  include/linux/mm.h     | 18 ++++++++++++++++++
>  mm/internal.h          |  8 --------
>  mm/memory-failure.c    | 36 +++++++++++-------------------------
>  mm/sparse.c            |  2 --
>  6 files changed, 72 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 9aa0da991cfb..5d00d8a14c79 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -175,6 +175,17 @@ int memory_notify(unsigned long val, void *v)
>  	return blocking_notifier_call_chain(&memory_chain, val, v);
>  }
>  
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> +void memblk_nr_poison_inc(unsigned long pfn);
> +void memblk_nr_poison_sub(unsigned long pfn, long i);
> +static unsigned long memblk_nr_poison(struct memory_block *mem);
> +#else
> +static inline unsigned long memblk_nr_poison(struct memory_block *mem)
> +{
> +	return 0;
> +}
> +#endif
> +
>  static int memory_block_online(struct memory_block *mem)
>  {
>  	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> @@ -183,6 +194,9 @@ static int memory_block_online(struct memory_block *mem)
>  	struct zone *zone;
>  	int ret;
>  
> +	if (memblk_nr_poison(mem))
> +		return -EHWPOISON;
> +
>  	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
>  				  start_pfn, nr_pages);
>  
> @@ -864,6 +878,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
>  		mem = find_memory_block_by_id(block_id);
>  		if (WARN_ON_ONCE(!mem))
>  			continue;
> +		num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
>  		unregister_memory_block_under_nodes(mem);
>  		remove_memory_block(mem);
>  	}
> @@ -1164,3 +1179,28 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>  	}
>  	return ret;
>  }
> +
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> +void memblk_nr_poison_inc(unsigned long pfn)
> +{
> +	const unsigned long block_id = pfn_to_block_id(pfn);
> +	struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> +	if (mem)
> +		atomic_long_inc(&mem->nr_hwpoison);
> +}
> +
> +void memblk_nr_poison_sub(unsigned long pfn, long i)
> +{
> +	const unsigned long block_id = pfn_to_block_id(pfn);
> +	struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> +	if (mem)
> +		atomic_long_sub(i, &mem->nr_hwpoison);
> +}
> +
> +static unsigned long memblk_nr_poison(struct memory_block *mem)
> +{
> +	return atomic_long_read(&mem->nr_hwpoison);
> +}
> +#endif
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index aa619464a1df..ad8cd9bb3239 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -85,6 +85,9 @@ struct memory_block {
>  	unsigned long nr_vmemmap_pages;
>  	struct memory_group *group;	/* group (if any) for this block */
>  	struct list_head group_next;	/* next block inside memory group */
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> +	atomic_long_t nr_hwpoison;
> +#endif
>  };
>  
>  int arch_get_memory_phys_device(unsigned long start_pfn);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 17119dbf8fad..f80269e90772 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3280,6 +3280,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>  					bool *migratable_cleared);
>  extern void num_poisoned_pages_inc(unsigned long pfn);
> +extern void num_poisoned_pages_sub(unsigned long pfn, long i);

The prototype of this function is: *inline* void num_poisoned_pages_sub(unsigned long pfn, long i).
The combination of inline and extern looks weird to me. Is this a common use case?

Anyway, this patch looks good to me. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks,
Miaohe Lin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-15  1:58   ` Miaohe Lin
@ 2022-10-17  7:24     ` HORIGUCHI NAOYA(堀口 直也)
  2022-10-17 13:29       ` Miaohe Lin
  0 siblings, 1 reply; 17+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-17  7:24 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Sat, Oct 15, 2022 at 09:58:09AM +0800, Miaohe Lin wrote:
> On 2022/10/7 9:07, Naoya Horiguchi wrote:
...
> > @@ -1785,7 +1785,8 @@ void hugetlb_clear_page_hwpoison(struct page *hpage)
> >   *   -EBUSY        - the hugepage is busy (try to retry)
> >   *   -EHWPOISON    - the hugepage is already hwpoisoned
> >   */
> > -int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> > +				 bool *migratable_cleared)
> >  {
> >  	struct page *page = pfn_to_page(pfn);
> >  	struct page *head = compound_head(page);
> > @@ -1815,6 +1816,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  		goto out;
> >  	}
> >  
> > +	/*
> > +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
> > +	 * from being migrated by memory hotremove.
> > +	 */
> > +	if (count_increased) {
> > +		*migratable_cleared = true;
> > +		ClearHPageMigratable(head);
> 
> I think I might be nitpicking... But it seems ClearHPageMigratable is not enough here.
>   1. In MF_COUNT_INCREASED case, we don't know whether HPageMigratable is set.
>   2. Even if HPageMigratable is set, there might be a race window before we clear HPageMigratable?

Maybe this race is what I mentioned in
https://lore.kernel.org/linux-mm/20220928012647.GA597297@u2004.lan/
(the second scenario).  My stance to this case is that the hugepage is not
hwpoisoned even if some hardware (not visible to kernel) issue is in it,
so hwpoison handler can/may not cope with the race.
I guess that this could be handled by applying memcpy_mcsafe() mechanism
to page migration, but I'm not sure of the feasibility.

> So "*migratable_cleared = TestClearHPageMigratable" might be better? But I might be wrong.

Thanks, this seems work for 1 (I need testing to check it), so I'll do this
in the next post.

> 
> With above fixed (if it's really a problem), this patch looks good to me.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thank you very much.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-15  2:28   ` Miaohe Lin
@ 2022-10-17 11:43     ` HORIGUCHI NAOYA(堀口 直也)
  2022-10-17 13:31       ` Miaohe Lin
  0 siblings, 1 reply; 17+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-17 11:43 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Sat, Oct 15, 2022 at 10:28:00AM +0800, Miaohe Lin wrote:
> On 2022/10/7 9:07, Naoya Horiguchi wrote:
...
> > diff --git a/include/linux/memory.h b/include/linux/memory.h
> > index aa619464a1df..ad8cd9bb3239 100644
> > --- a/include/linux/memory.h
> > +++ b/include/linux/memory.h
> > @@ -85,6 +85,9 @@ struct memory_block {
> >  	unsigned long nr_vmemmap_pages;
> >  	struct memory_group *group;	/* group (if any) for this block */
> >  	struct list_head group_next;	/* next block inside memory group */
> > +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> > +	atomic_long_t nr_hwpoison;
> > +#endif
> >  };
> >  
> >  int arch_get_memory_phys_device(unsigned long start_pfn);
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 17119dbf8fad..f80269e90772 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3280,6 +3280,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
> >  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> >  					bool *migratable_cleared);
> >  extern void num_poisoned_pages_inc(unsigned long pfn);
> > +extern void num_poisoned_pages_sub(unsigned long pfn, long i);
> 
> The prototype of this function is: *inline* void num_poisoned_pages_sub(unsigned long pfn, long i).
> The combination of inline and extern looks weird to me. Is this a common use case?

No, it seems not.  I can find a few place of such a comination like task_curr()
and raise_softirq_irqoff(), but as long as I understand, there's little meaning
(showing explicitly but redundant) to add extern keyword to functions in shared
header files. So I think of dropping the extern keyword.

> 
> Anyway, this patch looks good to me. Thanks.
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thank you.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-17  7:24     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-10-17 13:29       ` Miaohe Lin
  0 siblings, 0 replies; 17+ messages in thread
From: Miaohe Lin @ 2022-10-17 13:29 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On 2022/10/17 15:24, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sat, Oct 15, 2022 at 09:58:09AM +0800, Miaohe Lin wrote:
>> On 2022/10/7 9:07, Naoya Horiguchi wrote:
> ...
>>> @@ -1785,7 +1785,8 @@ void hugetlb_clear_page_hwpoison(struct page *hpage)
>>>   *   -EBUSY        - the hugepage is busy (try to retry)
>>>   *   -EHWPOISON    - the hugepage is already hwpoisoned
>>>   */
>>> -int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>> +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>>> +				 bool *migratable_cleared)
>>>  {
>>>  	struct page *page = pfn_to_page(pfn);
>>>  	struct page *head = compound_head(page);
>>> @@ -1815,6 +1816,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  		goto out;
>>>  	}
>>>  
>>> +	/*
>>> +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
>>> +	 * from being migrated by memory hotremove.
>>> +	 */
>>> +	if (count_increased) {
>>> +		*migratable_cleared = true;
>>> +		ClearHPageMigratable(head);
>>
>> I think I might be nitpicking... But it seems ClearHPageMigratable is not enough here.
>>   1. In MF_COUNT_INCREASED case, we don't know whether HPageMigratable is set.
>>   2. Even if HPageMigratable is set, there might be a race window before we clear HPageMigratable?
> 
> Maybe this race is what I mentioned in
> https://lore.kernel.org/linux-mm/20220928012647.GA597297@u2004.lan/
> (the second scenario).  My stance to this case is that the hugepage is not
> hwpoisoned even if some hardware (not visible to kernel) issue is in it,
> so hwpoison handler can/may not cope with the race.
> I guess that this could be handled by applying memcpy_mcsafe() mechanism
> to page migration, but I'm not sure of the feasibility.

Thanks Naoya. memcpy_mcsafe() might be a good idea to handle hwpoison with the memory copy in
page migration path. And [1] is doing the similar thing. If this mechanism is applicable, then
we could handle more memory failure scene. ;)

[1] https://lore.kernel.org/linux-mm/20221010160142.1087120-1-jiaqiyan@google.com/

> 
>> So "*migratable_cleared = TestClearHPageMigratable" might be better? But I might be wrong.
> 
> Thanks, this seems work for 1 (I need testing to check it), so I'll do this
> in the next post.

Many thanks for your work.

Thanks,
Miaohe Lin

> 
>>
>> With above fixed (if it's really a problem), this patch looks good to me.
>>
>> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Thank you very much.
> 
> - Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-10-17 11:43     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-10-17 13:31       ` Miaohe Lin
  0 siblings, 0 replies; 17+ messages in thread
From: Miaohe Lin @ 2022-10-17 13:31 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On 2022/10/17 19:43, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sat, Oct 15, 2022 at 10:28:00AM +0800, Miaohe Lin wrote:
>> On 2022/10/7 9:07, Naoya Horiguchi wrote:
> ...
>>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>>> index aa619464a1df..ad8cd9bb3239 100644
>>> --- a/include/linux/memory.h
>>> +++ b/include/linux/memory.h
>>> @@ -85,6 +85,9 @@ struct memory_block {
>>>  	unsigned long nr_vmemmap_pages;
>>>  	struct memory_group *group;	/* group (if any) for this block */
>>>  	struct list_head group_next;	/* next block inside memory group */
>>> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
>>> +	atomic_long_t nr_hwpoison;
>>> +#endif
>>>  };
>>>  
>>>  int arch_get_memory_phys_device(unsigned long start_pfn);
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 17119dbf8fad..f80269e90772 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -3280,6 +3280,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
>>>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>>>  					bool *migratable_cleared);
>>>  extern void num_poisoned_pages_inc(unsigned long pfn);
>>> +extern void num_poisoned_pages_sub(unsigned long pfn, long i);
>>
>> The prototype of this function is: *inline* void num_poisoned_pages_sub(unsigned long pfn, long i).
>> The combination of inline and extern looks weird to me. Is this a common use case?
> 
> No, it seems not.  I can find a few place of such a comination like task_curr()
> and raise_softirq_irqoff(), but as long as I understand, there's little meaning
> (showing explicitly but redundant) to add extern keyword to functions in shared
> header files. So I think of dropping the extern keyword.

That looks fine to me. My Reviewed-by tag still applies. Thanks Naoya.

Thanks,
Miaohe Lin

> 
>>
>> Anyway, this patch looks good to me. Thanks.
>> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Thank you.
> 
> - Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-10-17 13:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-07  1:07 [PATCH v6 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
2022-10-07  1:07 ` [PATCH v6 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
2022-10-13 14:17   ` Oscar Salvador
2022-10-15  1:58   ` Miaohe Lin
2022-10-17  7:24     ` HORIGUCHI NAOYA(堀口 直也)
2022-10-17 13:29       ` Miaohe Lin
2022-10-07  1:07 ` [PATCH v6 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
2022-10-13 14:31   ` Oscar Salvador
2022-10-14  6:38     ` Naoya Horiguchi
2022-10-07  1:07 ` [PATCH v6 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
2022-10-07  1:07 ` [PATCH v6 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
2022-10-07  4:34   ` HORIGUCHI NAOYA(堀口 直也)
2022-10-13  8:33   ` Oscar Salvador
2022-10-13 10:09     ` Naoya Horiguchi
2022-10-15  2:28   ` Miaohe Lin
2022-10-17 11:43     ` HORIGUCHI NAOYA(堀口 直也)
2022-10-17 13:31       ` Miaohe Lin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.