linux-mm.kvack.org archive mirror
* [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug
@ 2022-09-21  9:13 Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-21  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

Hi,

This patchset tries to solve issues at the intersection of memory_hotplug,
hugetlb and hwpoison.  Based on Miaohe's review of v2 (thank you!), 1/4 takes
another approach to prevent hwpoisoned hugepages from being migrated (i.e.
the corrupted data from being accessed) during memory hotremove.

In this patchset, memory hotplug handles hwpoison pages as follows:

  - hwpoison pages should not prevent memory hotremove,
  - a memory block containing hwpoison pages should not be onlined (see
    the sketch below).
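
A minimal sketch of the check behind the second point, based on patch 4/4
(memblk_nr_poison() is the per-memory_block helper introduced there; the
wrapper below is purely illustrative and not part of the patches):

	/* illustrative only: whether onlining this block should be refused */
	static bool memory_block_has_hwpoison(struct memory_block *mem)
	{
		unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);

		/* non-zero means the block still contains hwpoisoned pages */
		return memblk_nr_poison(start_pfn) != 0;
	}

memory_block_online() returns -EHWPOISON in that case, so such a block is
never onlined while hwpoisoned pages remain in it.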

Any comments and feedback would be appreciated.

Thanks,
Naoya Horiguchi

v1: https://lore.kernel.org/linux-mm/20220427042841.678351-1-naoya.horiguchi@linux.dev/T
v2: https://lore.kernel.org/linux-mm/20220905062137.1455537-1-naoya.horiguchi@linux.dev/T
---
Summary:

Naoya Horiguchi (4):
      mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
      mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
      mm/hwpoison: pass pfn to num_poisoned_pages_*()
      mm/hwpoison: introduce per-memory_block hwpoison counter

 arch/parisc/kernel/pdt.c |  5 ++--
 drivers/base/memory.c    | 36 +++++++++++++++++++++++++++++
 include/linux/hugetlb.h  |  4 ++--
 include/linux/memory.h   |  3 +++
 include/linux/mm.h       | 13 +++++++++++
 include/linux/swapops.h  | 24 ++------------------
 mm/hugetlb.c             |  4 ++--
 mm/internal.h            |  8 -------
 mm/memory-failure.c      | 59 +++++++++++++++++++++++++++---------------------
 mm/sparse.c              |  2 --
 10 files changed, 93 insertions(+), 65 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-09-21  9:13 [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
@ 2022-09-21  9:13 ` Naoya Horiguchi
  2022-09-24 11:43   ` Miaohe Lin
  2022-09-21  9:13 ` [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-21  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

A hwpoisoned page is not supposed to be accessed once marked, but currently
such accesses can happen during memory hotremove because do_migrate_range()
can be called before dissolve_free_huge_pages() is called.

Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
migrated.  This should be done under hugetlb_lock to avoid racing with
isolate_hugetlb().

get_hwpoison_huge_page() needs a flag to show that it is called from
unpoison in order to take a refcount on hwpoisoned hugepages, so add one.
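
A simplified sketch of the call ordering in the offline path described
above (only the relevant calls are shown, details elided):

	offline_pages()
	  do_migrate_range()           <-- may migrate a hwpoisoned hugepage,
	                                   i.e. access the corrupted data
	  dissolve_free_huge_pages()   <-- free hugepages are dissolved only
	                                   after the migration step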

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2 -> v3
- move to the approach of clearing HPageMigratable instead of shifting
  dissolve_free_huge_pages.
---
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            |  4 ++--
 mm/memory-failure.c     | 12 ++++++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..18229402c6d7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -181,7 +181,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
 int isolate_hugetlb(struct page *page, struct list_head *list);
-int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
 int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
@@ -425,7 +425,7 @@ static inline int isolate_hugetlb(struct page *page, struct list_head *list)
 	return -EBUSY;
 }
 
-static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bcaf66defc5..d3b83c570b56 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7257,7 +7257,7 @@ int isolate_hugetlb(struct page *page, struct list_head *list)
 	return ret;
 }
 
-int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
 {
 	int ret = 0;
 
@@ -7267,7 +7267,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
 		*hugetlb = true;
 		if (HPageFreed(page))
 			ret = 0;
-		else if (HPageMigratable(page))
+		else if (HPageMigratable(page) || unpoison)
 			ret = get_page_unless_zero(page);
 		else
 			ret = -EBUSY;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..5942e1c0407e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(head, &hugetlb, false);
 	if (hugetlb)
 		return ret;
 
@@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(head, &hugetlb, true);
 	if (hugetlb)
 		return ret;
 
@@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 		goto out;
 	}
 
+	/*
+	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
+	 * from being migrated by memory hotremove.
+	 */
+	if (count_increased)
+		ClearHPageMigratable(head);
+
 	return ret;
 out:
 	if (count_increased)
@@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 
 	if (hwpoison_filter(p)) {
 		hugetlb_clear_page_hwpoison(head);
+		SetHPageMigratable(head);
 		unlock_page(head);
 		if (res == 1)
 			put_page(head);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-09-21  9:13 [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
@ 2022-09-21  9:13 ` Naoya Horiguchi
  2022-09-24 11:53   ` Miaohe Lin
  2022-09-21  9:13 ` [PATCH v3 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  3 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-21  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

These interfaces will be used by drivers/base/core.c in a later patch, so as
preparatory work move them to a more common header file that is visible to
that file.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2 -> v3:
- added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
---
 arch/parisc/kernel/pdt.c |  3 +--
 include/linux/mm.h       |  5 +++++
 include/linux/swapops.h  | 24 ++----------------------
 mm/memory-failure.c      | 10 ++++++++++
 4 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
index e391b175f5ec..fdc880e2575a 100644
--- a/arch/parisc/kernel/pdt.c
+++ b/arch/parisc/kernel/pdt.c
@@ -18,8 +18,7 @@
 #include <linux/kthread.h>
 #include <linux/initrd.h>
 #include <linux/pgtable.h>
-#include <linux/swap.h>
-#include <linux/swapops.h>
+#include <linux/mm.h>
 
 #include <asm/pdc.h>
 #include <asm/pdcpat.h>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c2277f5aba9e..80a2d800f272 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3279,11 +3279,16 @@ extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+extern void num_poisoned_pages_inc(void);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
 	return 0;
 }
+
+static inline void num_poisoned_pages_inc(void)
+{
+}
 #endif
 
 #ifndef arch_memory_failure
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a91dd08e107b..3e58a812399a 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -581,8 +581,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
 
 #ifdef CONFIG_MEMORY_FAILURE
 
-extern atomic_long_t num_poisoned_pages __read_mostly;
-
 /*
  * Support for hardware poisoned pages
  */
@@ -610,17 +608,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 	return p;
 }
 
-static inline void num_poisoned_pages_inc(void)
-{
-	atomic_long_inc(&num_poisoned_pages);
-}
-
-static inline void num_poisoned_pages_sub(long i)
-{
-	atomic_long_sub(i, &num_poisoned_pages);
-}
-
-#else  /* CONFIG_MEMORY_FAILURE */
+#else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
 {
@@ -636,15 +624,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
 }
-
-static inline void num_poisoned_pages_inc(void)
-{
-}
-
-static inline void num_poisoned_pages_sub(long i)
-{
-}
-#endif  /* CONFIG_MEMORY_FAILURE */
+#endif
 
 static inline int non_swap_entry(swp_entry_t entry)
 {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5942e1c0407e..aa6ce685b863 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,16 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
+static inline void num_poisoned_pages_inc(void)
+{
+	atomic_long_inc(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_sub(long i)
+{
+	atomic_long_sub(i, &num_poisoned_pages);
+}
+
 /*
  * Return values:
  *   1:   the page is dissolved (if needed) and taken off from buddy,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*()
  2022-09-21  9:13 [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
@ 2022-09-21  9:13 ` Naoya Horiguchi
  2022-09-21  9:13 ` [PATCH v3 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
  3 siblings, 0 replies; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-21  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

No functional change intended.  The pfn argument is not used yet; a later
patch will use it to maintain a per-memory_block hwpoison counter.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
ChangeLog v2 -> v3:
- added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
---
 arch/parisc/kernel/pdt.c |  2 +-
 include/linux/mm.h       |  4 ++--
 mm/memory-failure.c      | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
index fdc880e2575a..80943a00e245 100644
--- a/arch/parisc/kernel/pdt.c
+++ b/arch/parisc/kernel/pdt.c
@@ -231,7 +231,7 @@ void __init pdc_pdt_init(void)
 
 		/* mark memory page bad */
 		memblock_reserve(pdt_entry[i] & PAGE_MASK, PAGE_SIZE);
-		num_poisoned_pages_inc();
+		num_poisoned_pages_inc(addr >> PAGE_SHIFT);
 	}
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80a2d800f272..2bb5d1596041 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3279,14 +3279,14 @@ extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
-extern void num_poisoned_pages_inc(void);
+extern void num_poisoned_pages_inc(unsigned long pfn);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
 	return 0;
 }
 
-static inline void num_poisoned_pages_inc(void)
+static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index aa6ce685b863..a069d43bc87f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,12 +74,12 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static inline void num_poisoned_pages_inc(void)
+static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
 }
 
-static inline void num_poisoned_pages_sub(long i)
+static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
 }
@@ -125,7 +125,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 	if (release)
 		put_page(page);
 	page_ref_inc(page);
-	num_poisoned_pages_inc();
+	num_poisoned_pages_inc(page_to_pfn(page));
 
 	return true;
 }
@@ -1194,7 +1194,7 @@ static void action_result(unsigned long pfn, enum mf_action_page_type type,
 {
 	trace_memory_failure_event(pfn, type, result);
 
-	num_poisoned_pages_inc();
+	num_poisoned_pages_inc(pfn);
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 }
@@ -1741,7 +1741,7 @@ static int hugetlb_set_page_hwpoison(struct page *hpage, struct page *page)
 		llist_add(&raw_hwp->node, head);
 		/* the first error event will be counted in action_result(). */
 		if (ret)
-			num_poisoned_pages_inc();
+			num_poisoned_pages_inc(page_to_pfn(page));
 	} else {
 		/*
 		 * Failed to save raw error info.  We no longer trace all
@@ -2414,7 +2414,7 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
-		num_poisoned_pages_sub(count);
+		num_poisoned_pages_sub(pfn, count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
 	}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-09-21  9:13 [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2022-09-21  9:13 ` [PATCH v3 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
@ 2022-09-21  9:13 ` Naoya Horiguchi
  2022-09-23  8:26   ` [PATCH v4 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter Naoya Horiguchi
  3 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-21  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently the PageHWPoison flag does not behave well when experiencing
memory hotremove/hotplug.  Any data field in struct page is unreliable
when the associated memory is offlined, and the current mechanism can't
tell whether a memory section is onlined because a new memory device is
installed or because previously failed offline operations are being
undone.  Especially if there is hwpoisoned memory, it's unclear what the
best option is.

So introduce a new mechanism to make struct memory_block remember that
a memory block has hwpoisoned memory inside it, and make any online event
fail if the memory block being onlined contains hwpoison.  struct
memory_block is freed and reallocated over ACPI-based hotremove/hotplug,
but not over sysfs-based hotremove/hotplug, so it's desirable to implement
the hwpoison counter on this struct.

Note that clear_hwpoisoned_pages() is relocated to be called earlier than
before, just before unregistering struct memory_block.  Otherwise, the
per-memory_block hwpoison counter is freed and we fail to adjust the
global hwpoison counter properly.
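
Roughly, the life cycle of the new counter looks like below (based on the
hunks in this patch; error handling and details elided):

	memory_failure() etc.
	  num_poisoned_pages_inc(pfn)
	    atomic_long_inc(&num_poisoned_pages)  /* global counter, as before */
	    memblk_nr_poison_inc(pfn)             /* new per-memory_block counter */

	memory_block_online()
	  returns -EHWPOISON if the block's nr_hwpoison is non-zero

	remove_memory_block_devices()
	  clear_hwpoisoned_pages(<block's nr_hwpoison>)  /* adjust global counter */
	  remove_memory_block()                          /* per-block counter goes
	                                                    away with the struct */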

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 drivers/base/memory.c  | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  3 +++
 include/linux/mm.h     |  8 ++++++++
 mm/internal.h          |  8 --------
 mm/memory-failure.c    | 31 ++++++++++---------------------
 mm/sparse.c            |  2 --
 6 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 9aa0da991cfb..c9bde4c4ffdf 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
 	struct zone *zone;
 	int ret;
 
+	if (atomic_long_read(&mem->nr_hwpoison))
+		return -EHWPOISON;
+
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);
 
@@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 		mem = find_memory_block_by_id(block_id);
 		if (WARN_ON_ONCE(!mem))
 			continue;
+		clear_hwpoisoned_pages(atomic_long_read(&mem->nr_hwpoison));
 		unregister_memory_block_under_nodes(mem);
 		remove_memory_block(mem);
 	}
@@ -1164,3 +1168,35 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	}
 	return ret;
 }
+
+#ifdef CONFIG_MEMORY_FAILURE
+
+void memblk_nr_poison_inc(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_inc(&mem->nr_hwpoison);
+}
+
+void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_sub(i, &mem->nr_hwpoison);
+}
+
+unsigned long memblk_nr_poison(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		return atomic_long_read(&mem->nr_hwpoison);
+	return 0;
+}
+
+#endif
diff --git a/include/linux/memory.h b/include/linux/memory.h
index aa619464a1df..74e6b3ad947f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -85,6 +85,9 @@ struct memory_block {
 	unsigned long nr_vmemmap_pages;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
+#ifdef CONFIG_MEMORY_FAILURE
+	atomic_long_t nr_hwpoison;
+#endif
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2bb5d1596041..2fe42bb9a517 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3280,6 +3280,10 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 extern void num_poisoned_pages_inc(unsigned long pfn);
+extern void memblk_nr_poison_inc(unsigned long pfn);
+extern void memblk_nr_poison_sub(unsigned long pfn, long i);
+extern unsigned long memblk_nr_poison(unsigned long pfn);
+extern void clear_hwpoisoned_pages(long nr_poison);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
@@ -3289,6 +3293,10 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
+
+static inline void clear_hwpoisoned_pages(long nr_poison)
+{
+}
 #endif
 
 #ifndef arch_memory_failure
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..42ba8b96cab5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -708,14 +708,6 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
-#ifdef CONFIG_MEMORY_FAILURE
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a069d43bc87f..03479895086d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,14 +74,17 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static inline void num_poisoned_pages_inc(unsigned long pfn)
+void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
+	memblk_nr_poison_inc(pfn);
 }
 
 static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
+	if (pfn != -1UL)
+		memblk_nr_poison_sub(pfn, i);
 }
 
 /*
@@ -2414,6 +2417,10 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
+		/*
+		 * TODO: per-memory_block counter might break when the page
+		 * size to be unpoisoned is larger than a memory_block.
+		 */
 		num_poisoned_pages_sub(pfn, count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
@@ -2618,25 +2625,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 	return ret;
 }
 
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+void clear_hwpoisoned_pages(long nr_poison)
 {
-	int i, total = 0;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			total++;
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-	if (total)
-		num_poisoned_pages_sub(total);
+	num_poisoned_pages_sub(-1UL, nr_poison);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index e5a8a3a0edd7..2779b419ef2a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -926,8 +926,6 @@ void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
 {
-	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
-			nr_pages - map_offset);
 	section_deactivate(pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter
  2022-09-21  9:13 ` [PATCH v3 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
@ 2022-09-23  8:26   ` Naoya Horiguchi
  2022-09-23 14:12     ` [PATCH v5 " Naoya Horiguchi
  0 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-23  8:26 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently the PageHWPoison flag does not behave well when experiencing
memory hotremove/hotplug.  Any data field in struct page is unreliable
when the associated memory is offlined, and the current mechanism can't
tell whether a memory section is onlined because a new memory device is
installed or because previously failed offline operations are being
undone.  Especially if there is hwpoisoned memory, it's unclear what the
best option is.

So introduce a new mechanism to make struct memory_block remember that
a memory block has hwpoisoned memory inside it, and make any online event
fail if the memory block being onlined contains hwpoison.  struct
memory_block is freed and reallocated over ACPI-based hotremove/hotplug,
but not over sysfs-based hotremove/hotplug, so it's desirable to implement
the hwpoison counter on this struct.

Note that clear_hwpoisoned_pages() is relocated to be called earlier than
before, just before unregistering struct memory_block.  Otherwise, the
per-memory_block hwpoison counter is freed and we fail to adjust the
global hwpoison counter properly.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v3 -> v4:
- fixed build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
  by using memblk_nr_poison() to access the member ->nr_hwpoison
---
 drivers/base/memory.c  | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  3 +++
 include/linux/mm.h     |  8 ++++++++
 mm/internal.h          |  8 --------
 mm/memory-failure.c    | 31 ++++++++++---------------------
 mm/sparse.c            |  2 --
 6 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 9aa0da991cfb..f470bbfc68d0 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
 	struct zone *zone;
 	int ret;
 
+	if (memblk_nr_poison(start_pfn))
+		return -EHWPOISON;
+
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);
 
@@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 		mem = find_memory_block_by_id(block_id);
 		if (WARN_ON_ONCE(!mem))
 			continue;
+		clear_hwpoisoned_pages(memblk_nr_poison(start));
 		unregister_memory_block_under_nodes(mem);
 		remove_memory_block(mem);
 	}
@@ -1164,3 +1168,40 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	}
 	return ret;
 }
+
+#ifdef CONFIG_MEMORY_FAILURE
+
+void memblk_nr_poison_inc(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_inc(&mem->nr_hwpoison);
+}
+
+void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_sub(i, &mem->nr_hwpoison);
+}
+
+unsigned long memblk_nr_poison(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		return atomic_long_read(&mem->nr_hwpoison);
+	return 0;
+}
+
+#else
+unsigned long memblk_nr_poison(unsigned long pfn)
+{
+	return 0;
+}
+#endif
diff --git a/include/linux/memory.h b/include/linux/memory.h
index aa619464a1df..74e6b3ad947f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -85,6 +85,9 @@ struct memory_block {
 	unsigned long nr_vmemmap_pages;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
+#ifdef CONFIG_MEMORY_FAILURE
+	atomic_long_t nr_hwpoison;
+#endif
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2bb5d1596041..5445943bbb4b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3280,6 +3280,9 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 extern void num_poisoned_pages_inc(unsigned long pfn);
+extern void memblk_nr_poison_inc(unsigned long pfn);
+extern void memblk_nr_poison_sub(unsigned long pfn, long i);
+extern void clear_hwpoisoned_pages(long nr_poison);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
@@ -3289,7 +3292,12 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
+
+static inline void clear_hwpoisoned_pages(long nr_poison)
+{
+}
 #endif
+extern unsigned long memblk_nr_poison(unsigned long pfn);
 
 #ifndef arch_memory_failure
 static inline int arch_memory_failure(unsigned long pfn, int flags)
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..42ba8b96cab5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -708,14 +708,6 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
-#ifdef CONFIG_MEMORY_FAILURE
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a069d43bc87f..03479895086d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,14 +74,17 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static inline void num_poisoned_pages_inc(unsigned long pfn)
+void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
+	memblk_nr_poison_inc(pfn);
 }
 
 static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
+	if (pfn != -1UL)
+		memblk_nr_poison_sub(pfn, i);
 }
 
 /*
@@ -2414,6 +2417,10 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
+		/*
+		 * TODO: per-memory_block counter might break when the page
+		 * size to be unpoisoned is larger than a memory_block.
+		 */
 		num_poisoned_pages_sub(pfn, count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
@@ -2618,25 +2625,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 	return ret;
 }
 
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+void clear_hwpoisoned_pages(long nr_poison)
 {
-	int i, total = 0;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			total++;
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-	if (total)
-		num_poisoned_pages_sub(total);
+	num_poisoned_pages_sub(-1UL, nr_poison);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index e5a8a3a0edd7..2779b419ef2a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -926,8 +926,6 @@ void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
 {
-	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
-			nr_pages - map_offset);
 	section_deactivate(pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.37.3.518.g79f2338b37



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter
  2022-09-23  8:26   ` [PATCH v4 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter Naoya Horiguchi
@ 2022-09-23 14:12     ` Naoya Horiguchi
  2022-09-24 12:27       ` Miaohe Lin
  2022-09-26  8:05       ` David Hildenbrand
  0 siblings, 2 replies; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-23 14:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

There seems to be another build error on aarch64 with MEMORY_HOTPLUG disabled
(https://lore.kernel.org/lkml/20220923110144.GA1413812@ik1-406-35019.vs.sakura.ne.jp/),
so let me revise this patch again to handle it.

- Naoya Horiguchi

---
From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Date: Fri, 23 Sep 2022 22:51:20 +0900
Subject: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter

Currently the PageHWPoison flag does not behave well when experiencing
memory hotremove/hotplug.  Any data field in struct page is unreliable
when the associated memory is offlined, and the current mechanism can't
tell whether a memory section is onlined because a new memory device is
installed or because previously failed offline operations are being
undone.  Especially if there is hwpoisoned memory, it's unclear what the
best option is.

So introduce a new mechanism to make struct memory_block remember that
a memory block has hwpoisoned memory inside it, and make any online event
fail if the memory block being onlined contains hwpoison.  struct
memory_block is freed and reallocated over ACPI-based hotremove/hotplug,
but not over sysfs-based hotremove/hotplug, so it's desirable to implement
the hwpoison counter on this struct.

Note that clear_hwpoisoned_pages() is relocated to be called earlier than
before, just before unregistering struct memory_block.  Otherwise, the
per-memory_block hwpoison counter is freed and we fail to adjust the
global hwpoison counter properly.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
---
ChangeLog v4 -> v5:
- add Reported-by of lkp bot,
- check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in the introduced #ifdefs,
  intending to fix "undefined reference" errors on aarch64.

ChangeLog v3 -> v4:
- fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
  by using memblk_nr_poison() to access the member ->nr_hwpoison
---
 drivers/base/memory.c  | 34 ++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  3 +++
 include/linux/mm.h     | 24 ++++++++++++++++++++++++
 mm/internal.h          |  8 --------
 mm/memory-failure.c    | 31 ++++++++++---------------------
 mm/sparse.c            |  2 --
 6 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 9aa0da991cfb..99e0e789616c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
 	struct zone *zone;
 	int ret;
 
+	if (memblk_nr_poison(start_pfn))
+		return -EHWPOISON;
+
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);
 
@@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 		mem = find_memory_block_by_id(block_id);
 		if (WARN_ON_ONCE(!mem))
 			continue;
+		clear_hwpoisoned_pages(memblk_nr_poison(start));
 		unregister_memory_block_under_nodes(mem);
 		remove_memory_block(mem);
 	}
@@ -1164,3 +1168,33 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	}
 	return ret;
 }
+
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+void memblk_nr_poison_inc(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_inc(&mem->nr_hwpoison);
+}
+
+void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		atomic_long_sub(i, &mem->nr_hwpoison);
+}
+
+unsigned long memblk_nr_poison(unsigned long pfn)
+{
+	const unsigned long block_id = pfn_to_block_id(pfn);
+	struct memory_block *mem = find_memory_block_by_id(block_id);
+
+	if (mem)
+		return atomic_long_read(&mem->nr_hwpoison);
+	return 0;
+}
+#endif
diff --git a/include/linux/memory.h b/include/linux/memory.h
index aa619464a1df..ad8cd9bb3239 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -85,6 +85,9 @@ struct memory_block {
 	unsigned long nr_vmemmap_pages;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+	atomic_long_t nr_hwpoison;
+#endif
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2bb5d1596041..936864d6f8be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3280,6 +3280,7 @@ extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 extern void num_poisoned_pages_inc(unsigned long pfn);
+extern void clear_hwpoisoned_pages(long nr_poison);
 #else
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
@@ -3289,6 +3290,29 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
+
+static inline void clear_hwpoisoned_pages(long nr_poison)
+{
+}
+#endif
+
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
+extern void memblk_nr_poison_inc(unsigned long pfn);
+extern void memblk_nr_poison_sub(unsigned long pfn, long i);
+extern unsigned long memblk_nr_poison(unsigned long pfn);
+#else
+static inline void memblk_nr_poison_inc(unsigned long pfn)
+{
+}
+
+static inline void memblk_nr_poison_sub(unsigned long pfn, long i)
+{
+}
+
+static inline unsigned long memblk_nr_poison(unsigned long pfn)
+{
+	return 0;
+}
 #endif
 
 #ifndef arch_memory_failure
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..42ba8b96cab5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -708,14 +708,6 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
-#ifdef CONFIG_MEMORY_FAILURE
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a069d43bc87f..03479895086d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,14 +74,17 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static inline void num_poisoned_pages_inc(unsigned long pfn)
+void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
+	memblk_nr_poison_inc(pfn);
 }
 
 static inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 {
 	atomic_long_sub(i, &num_poisoned_pages);
+	if (pfn != -1UL)
+		memblk_nr_poison_sub(pfn, i);
 }
 
 /*
@@ -2414,6 +2417,10 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
+		/*
+		 * TODO: per-memory_block counter might break when the page
+		 * size to be unpoisoned is larger than a memory_block.
+		 */
 		num_poisoned_pages_sub(pfn, count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
@@ -2618,25 +2625,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 	return ret;
 }
 
-void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+void clear_hwpoisoned_pages(long nr_poison)
 {
-	int i, total = 0;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			total++;
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-	if (total)
-		num_poisoned_pages_sub(total);
+	num_poisoned_pages_sub(-1UL, nr_poison);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index e5a8a3a0edd7..2779b419ef2a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -926,8 +926,6 @@ void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
 {
-	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
-			nr_pages - map_offset);
 	section_deactivate(pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.37.3.518.g79f2338b37



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-09-21  9:13 ` [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
@ 2022-09-24 11:43   ` Miaohe Lin
  2022-09-28  1:26     ` Naoya Horiguchi
  0 siblings, 1 reply; 19+ messages in thread
From: Miaohe Lin @ 2022-09-24 11:43 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/9/21 17:13, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> A hwpoisoned page is not supposed to be accessed once marked, but currently
> such accesses can happen during memory hotremove because do_migrate_range()
> can be called before dissolve_free_huge_pages() is called.
> 
> Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
> migrated.  This should be done under hugetlb_lock to avoid racing with
> isolate_hugetlb().
> 
> get_hwpoison_huge_page() needs a flag to show that it is called from
> unpoison in order to take a refcount on hwpoisoned hugepages, so add one.
> 
> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks for your work, Naoya. Maybe something to improve below.

> ---
> ChangeLog v2 -> v3
> - move to the approach of clearing HPageMigratable instead of shifting
>   dissolve_free_huge_pages.
> ---
>  include/linux/hugetlb.h |  4 ++--
>  mm/hugetlb.c            |  4 ++--
>  mm/memory-failure.c     | 12 ++++++++++--
>  3 files changed, 14 insertions(+), 6 deletions(-)
> 

<snip>

> @@ -7267,7 +7267,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  		*hugetlb = true;
>  		if (HPageFreed(page))
>  			ret = 0;
> -		else if (HPageMigratable(page))
> +		else if (HPageMigratable(page) || unpoison)

Is unpoison_memory() expected to restore the HPageMigratable flag as well ?

>  			ret = get_page_unless_zero(page);
>  		else
>  			ret = -EBUSY;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 145bb561ddb3..5942e1c0407e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, false);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(head, &hugetlb, true);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
> +	 * from being migrated by memory hotremove.
> +	 */
> +	if (count_increased)
> +		ClearHPageMigratable(head);

I believe this can prevent hwpoisoned hugepages from being migrated, though there may still be some windows.

> +
>  	return ret;
>  out:
>  	if (count_increased)
> @@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  
>  	if (hwpoison_filter(p)) {
>  		hugetlb_clear_page_hwpoison(head);
> +		SetHPageMigratable(head);

Would we set HPageMigratable flag for free hugetlb pages here? IIUC, they're not expected to have this flag set.

Thanks,
Miaohe Lin

>  		unlock_page(head);
>  		if (res == 1)
>  			put_page(head);
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-09-21  9:13 ` [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
@ 2022-09-24 11:53   ` Miaohe Lin
  2022-09-28  2:05     ` Naoya Horiguchi
  0 siblings, 1 reply; 19+ messages in thread
From: Miaohe Lin @ 2022-09-24 11:53 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/9/21 17:13, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> These interfaces will be used by drivers/base/core.c in a later patch, so as
> preparatory work move them to a more common header file that is visible to
> that file.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v2 -> v3:
> - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
> ---
>  arch/parisc/kernel/pdt.c |  3 +--
>  include/linux/mm.h       |  5 +++++
>  include/linux/swapops.h  | 24 ++----------------------
>  mm/memory-failure.c      | 10 ++++++++++
>  4 files changed, 18 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
> index e391b175f5ec..fdc880e2575a 100644
> --- a/arch/parisc/kernel/pdt.c
> +++ b/arch/parisc/kernel/pdt.c
> @@ -18,8 +18,7 @@
>  #include <linux/kthread.h>
>  #include <linux/initrd.h>
>  #include <linux/pgtable.h>
> -#include <linux/swap.h>

Is the header file "linux/swap.h" already unneeded before the code change? It seems there's
no code change in that file.

> -#include <linux/swapops.h>
> +#include <linux/mm.h>
>  
>  #include <asm/pdc.h>
>  #include <asm/pdcpat.h>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c2277f5aba9e..80a2d800f272 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3279,11 +3279,16 @@ extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  #ifdef CONFIG_MEMORY_FAILURE
>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> +extern void num_poisoned_pages_inc(void);
>  #else
>  static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  {
>  	return 0;
>  }
> +
> +static inline void num_poisoned_pages_inc(void)
> +{
> +}
>  #endif
>  
>  #ifndef arch_memory_failure
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index a91dd08e107b..3e58a812399a 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -581,8 +581,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
>  
>  #ifdef CONFIG_MEMORY_FAILURE
>  
> -extern atomic_long_t num_poisoned_pages __read_mostly;
> -
>  /*
>   * Support for hardware poisoned pages
>   */
> @@ -610,17 +608,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
>  	return p;
>  }
>  
> -static inline void num_poisoned_pages_inc(void)
> -{
> -	atomic_long_inc(&num_poisoned_pages);
> -}
> -
> -static inline void num_poisoned_pages_sub(long i)
> -{
> -	atomic_long_sub(i, &num_poisoned_pages);
> -}
> -
> -#else  /* CONFIG_MEMORY_FAILURE */
> +#else
>  
>  static inline swp_entry_t make_hwpoison_entry(struct page *page)
>  {
> @@ -636,15 +624,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
>  {
>  	return NULL;
>  }
> -
> -static inline void num_poisoned_pages_inc(void)
> -{
> -}
> -
> -static inline void num_poisoned_pages_sub(long i)
> -{
> -}
> -#endif  /* CONFIG_MEMORY_FAILURE */
> +#endif
>  
>  static inline int non_swap_entry(swp_entry_t entry)
>  {
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5942e1c0407e..aa6ce685b863 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -74,6 +74,16 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>  
>  static bool hw_memory_failure __read_mostly = false;
>  
> +static inline void num_poisoned_pages_inc(void)

This function is defined as "static inline" here, while it is declared as
"extern void num_poisoned_pages_inc(void)" in the header file. Is this expected?

Thanks,
Miaohe Lin

> +{
> +	atomic_long_inc(&num_poisoned_pages);
> +}
> +
> +static inline void num_poisoned_pages_sub(long i)
> +{
> +	atomic_long_sub(i, &num_poisoned_pages);
> +}
> +
>  /*
>   * Return values:
>   *   1:   the page is dissolved (if needed) and taken off from buddy,
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter
  2022-09-23 14:12     ` [PATCH v5 " Naoya Horiguchi
@ 2022-09-24 12:27       ` Miaohe Lin
  2022-10-07  0:47         ` HORIGUCHI NAOYA(堀口 直也)
  2022-09-26  8:05       ` David Hildenbrand
  1 sibling, 1 reply; 19+ messages in thread
From: Miaohe Lin @ 2022-09-24 12:27 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/9/23 22:12, Naoya Horiguchi wrote:
> There seems to be another build error on aarch64 with MEMORY_HOTPLUG disabled
> (https://lore.kernel.org/lkml/20220923110144.GA1413812@ik1-406-35019.vs.sakura.ne.jp/),
> so let me revise this patch again to handle it.
> 
> - Naoya Horiguchi
> 
> ---
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Date: Fri, 23 Sep 2022 22:51:20 +0900
> Subject: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
> 
> Currently the PageHWPoison flag does not behave well when experiencing
> memory hotremove/hotplug.  Any data field in struct page is unreliable
> when the associated memory is offlined, and the current mechanism can't
> tell whether a memory section is onlined because a new memory device is
> installed or because previously failed offline operations are being
> undone.  Especially if there is hwpoisoned memory, it's unclear what the
> best option is.
> 
> So introduce a new mechanism to make struct memory_block remember that
> a memory block has hwpoisoned memory inside it, and make any online event
> fail if the memory block being onlined contains hwpoison.  struct
> memory_block is freed and reallocated over ACPI-based hotremove/hotplug,
> but not over sysfs-based hotremove/hotplug, so it's desirable to implement
> the hwpoison counter on this struct.
> 
> Note that clear_hwpoisoned_pages() is relocated to be called earlier than
> before, just before unregistering struct memory_block.  Otherwise, the
> per-memory_block hwpoison counter is freed and we fail to adjust the
> global hwpoison counter properly.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>

LGTM with some nits below. Thanks.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

> ---
> ChangeLog v4 -> v5:
> - add Reported-by of lkp bot,
> - check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in the introduced #ifdefs,
>   intending to fix "undefined reference" errors on aarch64.
> 
> ChangeLog v3 -> v4:
> - fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
>   by using memblk_nr_poison() to access the member ->nr_hwpoison
> ---
>  drivers/base/memory.c  | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/memory.h |  3 +++
>  include/linux/mm.h     | 24 ++++++++++++++++++++++++
>  mm/internal.h          |  8 --------
>  mm/memory-failure.c    | 31 ++++++++++---------------------
>  mm/sparse.c            |  2 --
>  6 files changed, 71 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 9aa0da991cfb..99e0e789616c 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
>  	struct zone *zone;
>  	int ret;
>  
> +	if (memblk_nr_poison(start_pfn))
> +		return -EHWPOISON;
> +
>  	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
>  				  start_pfn, nr_pages);
>  
> @@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
>  		mem = find_memory_block_by_id(block_id);
>  		if (WARN_ON_ONCE(!mem))
>  			continue;
> +		clear_hwpoisoned_pages(memblk_nr_poison(start));

clear_hwpoisoned_pages() doesn't seem to be a proper name now, since the PageHWPoison info is kept. But this should be trivial.

>  		unregister_memory_block_under_nodes(mem);
>  		remove_memory_block(mem);
>  	}
> @@ -1164,3 +1168,33 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>  	}
>  	return ret;
>  }
> +
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> +void memblk_nr_poison_inc(unsigned long pfn)
> +{
> +	const unsigned long block_id = pfn_to_block_id(pfn);
> +	struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> +	if (mem)
> +		atomic_long_inc(&mem->nr_hwpoison);
> +}
> +
> +void memblk_nr_poison_sub(unsigned long pfn, long i)
> +{
> +	const unsigned long block_id = pfn_to_block_id(pfn);
> +	struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> +	if (mem)
> +		atomic_long_sub(i, &mem->nr_hwpoison);
> +}
> +
> +unsigned long memblk_nr_poison(unsigned long pfn)

memblk_nr_poison() is only used inside this file. Make it static?

Thanks,
Miaohe Lin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter counter
  2022-09-23 14:12     ` [PATCH v5 " Naoya Horiguchi
  2022-09-24 12:27       ` Miaohe Lin
@ 2022-09-26  8:05       ` David Hildenbrand
  2022-10-07  0:52         ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2022-09-26  8:05 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, Miaohe Lin, Mike Kravetz, Yang Shi,
	Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

>   /*
> @@ -2414,6 +2417,10 @@ int unpoison_memory(unsigned long pfn)
>   unlock_mutex:
>   	mutex_unlock(&mf_mutex);
>   	if (!ret || freeit) {
> +		/*
> +		 * TODO: per-memory_block counter might break when the page
> +		 * size to be unpoisoned is larger than a memory_block.
> +		 */

Hmm, but that happens easily, e.g., with a 1 GiB hugetlb page and a 128 MiB
memory section/block size. What would be the right thing to do here? The
TODO should rather spell that out instead of just stating the problem.
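
(For concreteness, with a 128 MiB memory block size a 1 GiB hugetlb page
spans 1 GiB / 128 MiB = 8 memory blocks, while num_poisoned_pages_sub(pfn,
count) only adjusts the nr_hwpoison counter of the single block containing
@pfn, so the counters of the other blocks covered by that hugepage are left
untouched.)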

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-09-24 11:43   ` Miaohe Lin
@ 2022-09-28  1:26     ` Naoya Horiguchi
  2022-09-28  9:32       ` Miaohe Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-28  1:26 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Sat, Sep 24, 2022 at 07:43:16PM +0800, Miaohe Lin wrote:
> On 2022/9/21 17:13, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > A hwpoisoned page is not supposed to be accessed once marked, but currently
> > such accesses can happen during memory hotremove because do_migrate_range()
> > can be called before dissolve_free_huge_pages() is called.
> > 
> > Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
> > migrated.  This should be done under hugetlb_lock to avoid racing with
> > isolate_hugetlb().
> > 
> > get_hwpoison_huge_page() needs a flag to show that it is called from
> > unpoison in order to take a refcount on hwpoisoned hugepages, so add one.
> > 
> > Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Thanks for your work, Naoya. Maybe something to improve below.
> 
> > ---
> > ChangeLog v2 -> v3
> > - move to the approach of clearing HPageMigratable instead of shifting
> >   dissolve_free_huge_pages.
> > ---
> >  include/linux/hugetlb.h |  4 ++--
> >  mm/hugetlb.c            |  4 ++--
> >  mm/memory-failure.c     | 12 ++++++++++--
> >  3 files changed, 14 insertions(+), 6 deletions(-)
> > 
> 
> <snip>
> 
> > @@ -7267,7 +7267,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> >  		*hugetlb = true;
> >  		if (HPageFreed(page))
> >  			ret = 0;
> > -		else if (HPageMigratable(page))
> > +		else if (HPageMigratable(page) || unpoison)
> 
> Is unpoison_memory() expected to restore the HPageMigratable flag as well ?

No, it isn't. When unpoison_memory() unpoisons a hugepage, the hugepage
is sent back to the free hugepage pool, so I think there's no need to
restore HPageMigratable for it.

> 
> >  			ret = get_page_unless_zero(page);
> >  		else
> >  			ret = -EBUSY;
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 145bb561ddb3..5942e1c0407e 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
> >  	int ret = 0;
> >  	bool hugetlb = false;
> >  
> > -	ret = get_hwpoison_huge_page(head, &hugetlb);
> > +	ret = get_hwpoison_huge_page(head, &hugetlb, false);
> >  	if (hugetlb)
> >  		return ret;
> >  
> > @@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
> >  	int ret = 0;
> >  	bool hugetlb = false;
> >  
> > -	ret = get_hwpoison_huge_page(head, &hugetlb);
> > +	ret = get_hwpoison_huge_page(head, &hugetlb, true);
> >  	if (hugetlb)
> >  		return ret;
> >  
> > @@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  		goto out;
> >  	}
> >  
> > +	/*
> > +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
> > +	 * from being migrated by memory hotremove.
> > +	 */
> > +	if (count_increased)
> > +		ClearHPageMigratable(head);
> 
> I believe this can prevent hwpoisoned hugepages from being migrated though there still be some windows.

I'm not sure of "still racy" part, so could you elaborate it?
Main scenario this patch tries to handle is like below:

  CPU 0                                   CPU 1                                   
  get_huge_page_for_hwpoison                                                      
    // take hugetlb_lock                                                          
    __get_huge_page_for_hwpoison                                                  
                                          scan_movable_pages                      
                                            if HPageMigratable                    
                                              goto found                          
                                          do_migrate_range                        
      if HPageMigratable                                                          
        get_page_unless_zero                                                      
      hugetlb_set_page_hwpoison                                                   
      ClearHPageMigratable                                                        
    // release hugetlb_lock                                                       
                                            isolate_hugetlb                       
                                              // take hugetlb_lock                
                                              if !HPageMigratable                 
                                                // fails to isolate the hwpoisoned hugetlb.

Maybe the following could be possible.

  CPU 0                                   CPU 1
                                          scan_movable_pages
                                            if HPageMigratable
                                              goto found
                                          do_migrate_range
                                            isolate_hugetlb
                                              // the hugetlb is isolated,
                                              // but it's not hwpoisoned yet.
  get_huge_page_for_hwpoison
    // take hugetlb_lock
    __get_huge_page_for_hwpoison
      if HPageMigratable
        get_page_unless_zero
      hugetlb_set_page_hwpoison
      ClearHPageMigratable
    // release hugetlb_lock

In this case, the hugepage may already be broken but is not marked as
hwpoisoned yet, so the problem is not detectable.

> 
> > +
> >  	return ret;
> >  out:
> >  	if (count_increased)
> > @@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >  
> >  	if (hwpoison_filter(p)) {
> >  		hugetlb_clear_page_hwpoison(head);
> > +		SetHPageMigratable(head);
> 
> Would we set HPageMigratable flag for free hugetlb pages here? IIUC, they're not expected to have this flag set.

Thank you, you're right.  This should be done in the "if (res == 1)" block.
(hwpoison_filter often bothers me ...)
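
Something like this (rough, untested sketch just to show the placement; the
surrounding code of try_memory_failure_hugetlb() is omitted):

	if (hwpoison_filter(p)) {
		hugetlb_clear_page_hwpoison(head);
		/*
		 * Restore HPageMigratable only in the refcount-taken case
		 * (res == 1); free hugepages are left as they are.
		 */
		if (res == 1)
			SetHPageMigratable(head);
	}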

> 
> Thanks,
> Miaohe Lin

Thank you very much!
- Naoya Horiguchi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-09-24 11:53   ` Miaohe Lin
@ 2022-09-28  2:05     ` Naoya Horiguchi
  2022-09-28  7:56       ` Miaohe Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2022-09-28  2:05 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On Sat, Sep 24, 2022 at 07:53:15PM +0800, Miaohe Lin wrote:
> On 2022/9/21 17:13, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > These interfaces will be used by drivers/base/core.c by later patch, so as a
> > preparatory work move them to more common header file visible to the file.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > ChangeLog v2 -> v3:
> > - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
> > ---
> >  arch/parisc/kernel/pdt.c |  3 +--
> >  include/linux/mm.h       |  5 +++++
> >  include/linux/swapops.h  | 24 ++----------------------
> >  mm/memory-failure.c      | 10 ++++++++++
> >  4 files changed, 18 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
> > index e391b175f5ec..fdc880e2575a 100644
> > --- a/arch/parisc/kernel/pdt.c
> > +++ b/arch/parisc/kernel/pdt.c
> > @@ -18,8 +18,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/initrd.h>
> >  #include <linux/pgtable.h>
> > -#include <linux/swap.h>
> 
> Is header file "linux/swap.h" already unneeded before the code change? It seems there's
> no code change in that file.

Maybe yes.  I updated this line too because it was introduced together
with swapops.h by the following commit.

  commit 0e5a7ff6e36ad58933d076ddcac36ff14d014692
  Author: Helge Deller <deller@gmx.de>
  Date:   Fri Jul 24 19:17:52 2020 +0200
  
      parisc: Report bad pages as HardwareCorrupted

> 
> > -#include <linux/swapops.h>
> > +#include <linux/mm.h>
> >  
> >  #include <asm/pdc.h>
> >  #include <asm/pdcpat.h>
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index c2277f5aba9e..80a2d800f272 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3279,11 +3279,16 @@ extern atomic_long_t num_poisoned_pages __read_mostly;
> >  extern int soft_offline_page(unsigned long pfn, int flags);
> >  #ifdef CONFIG_MEMORY_FAILURE
> >  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> > +extern void num_poisoned_pages_inc(void);
> >  #else
> >  static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void num_poisoned_pages_inc(void)
> > +{
> > +}
> >  #endif
> >  
> >  #ifndef arch_memory_failure
> > diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> > index a91dd08e107b..3e58a812399a 100644
> > --- a/include/linux/swapops.h
> > +++ b/include/linux/swapops.h
> > @@ -581,8 +581,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
> >  
> >  #ifdef CONFIG_MEMORY_FAILURE
> >  
> > -extern atomic_long_t num_poisoned_pages __read_mostly;
> > -
> >  /*
> >   * Support for hardware poisoned pages
> >   */
> > @@ -610,17 +608,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
> >  	return p;
> >  }
> >  
> > -static inline void num_poisoned_pages_inc(void)
> > -{
> > -	atomic_long_inc(&num_poisoned_pages);
> > -}
> > -
> > -static inline void num_poisoned_pages_sub(long i)
> > -{
> > -	atomic_long_sub(i, &num_poisoned_pages);
> > -}
> > -
> > -#else  /* CONFIG_MEMORY_FAILURE */
> > +#else
> >  
> >  static inline swp_entry_t make_hwpoison_entry(struct page *page)
> >  {
> > @@ -636,15 +624,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
> >  {
> >  	return NULL;
> >  }
> > -
> > -static inline void num_poisoned_pages_inc(void)
> > -{
> > -}
> > -
> > -static inline void num_poisoned_pages_sub(long i)
> > -{
> > -}
> > -#endif  /* CONFIG_MEMORY_FAILURE */
> > +#endif
> >  
> >  static inline int non_swap_entry(swp_entry_t entry)
> >  {
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 5942e1c0407e..aa6ce685b863 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -74,6 +74,16 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >  
> >  static bool hw_memory_failure __read_mostly = false;
> >  
> > +static inline void num_poisoned_pages_inc(void)
> 
> This function is defined as "static inline" while it's "extern void num_poisoned_pages_inc(void)"
> in the header file. Is this expected?

No. 4/4 effectively fixes it, but I should've done this in this patch.
Thank you,
- Naoya Horiguchi

> 
> Thanks,
> Miaohe Lin
> 
> > +{
> > +	atomic_long_inc(&num_poisoned_pages);
> > +}
> > +
> > +static inline void num_poisoned_pages_sub(long i)
> > +{
> > +	atomic_long_sub(i, &num_poisoned_pages);
> > +}
> > +
> >  /*
> >   * Return values:
> >   *   1:   the page is dissolved (if needed) and taken off from buddy,
> > 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c
  2022-09-28  2:05     ` Naoya Horiguchi
@ 2022-09-28  7:56       ` Miaohe Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Miaohe Lin @ 2022-09-28  7:56 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/9/28 10:05, Naoya Horiguchi wrote:
> On Sat, Sep 24, 2022 at 07:53:15PM +0800, Miaohe Lin wrote:
>> On 2022/9/21 17:13, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> These interfaces will be used by drivers/base/core.c by later patch, so as a
>>> preparatory work move them to more common header file visible to the file.
>>>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>> ChangeLog v2 -> v3:
>>> - added declaration of num_poisoned_pages_inc() in #ifdef CONFIG_MEMORY_FAILURE
>>> ---
>>>  arch/parisc/kernel/pdt.c |  3 +--
>>>  include/linux/mm.h       |  5 +++++
>>>  include/linux/swapops.h  | 24 ++----------------------
>>>  mm/memory-failure.c      | 10 ++++++++++
>>>  4 files changed, 18 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/parisc/kernel/pdt.c b/arch/parisc/kernel/pdt.c
>>> index e391b175f5ec..fdc880e2575a 100644
>>> --- a/arch/parisc/kernel/pdt.c
>>> +++ b/arch/parisc/kernel/pdt.c
>>> @@ -18,8 +18,7 @@
>>>  #include <linux/kthread.h>
>>>  #include <linux/initrd.h>
>>>  #include <linux/pgtable.h>
>>> -#include <linux/swap.h>
>>
>> Is header file "linux/swap.h" already unneeded before the code change? It seems there's
>> no code change in that file.
> 
> Maybe yes.  I updated this line too because it's introduced together
> swapops.h by the following commit.
> 
>   commit 0e5a7ff6e36ad58933d076ddcac36ff14d014692
>   Author: Helge Deller <deller@gmx.de>
>   Date:   Fri Jul 24 19:17:52 2020 +0200
>   
>       parisc: Report bad pages as HardwareCorrupted
> 

I see. Many thanks for your explanation.

>>
>>> -#include <linux/swapops.h>
>>> +#include <linux/mm.h>
>>>  
>>>  #include <asm/pdc.h>
>>>  #include <asm/pdcpat.h>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index c2277f5aba9e..80a2d800f272 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -3279,11 +3279,16 @@ extern atomic_long_t num_poisoned_pages __read_mostly;
>>>  extern int soft_offline_page(unsigned long pfn, int flags);
>>>  #ifdef CONFIG_MEMORY_FAILURE
>>>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>>> +extern void num_poisoned_pages_inc(void);
>>>  #else
>>>  static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  {
>>>  	return 0;
>>>  }
>>> +
>>> +static inline void num_poisoned_pages_inc(void)
>>> +{
>>> +}
>>>  #endif
>>>  
>>>  #ifndef arch_memory_failure
>>> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
>>> index a91dd08e107b..3e58a812399a 100644
>>> --- a/include/linux/swapops.h
>>> +++ b/include/linux/swapops.h
>>> @@ -581,8 +581,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
>>>  
>>>  #ifdef CONFIG_MEMORY_FAILURE
>>>  
>>> -extern atomic_long_t num_poisoned_pages __read_mostly;
>>> -
>>>  /*
>>>   * Support for hardware poisoned pages
>>>   */
>>> @@ -610,17 +608,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
>>>  	return p;
>>>  }
>>>  
>>> -static inline void num_poisoned_pages_inc(void)
>>> -{
>>> -	atomic_long_inc(&num_poisoned_pages);
>>> -}
>>> -
>>> -static inline void num_poisoned_pages_sub(long i)
>>> -{
>>> -	atomic_long_sub(i, &num_poisoned_pages);
>>> -}
>>> -
>>> -#else  /* CONFIG_MEMORY_FAILURE */
>>> +#else
>>>  
>>>  static inline swp_entry_t make_hwpoison_entry(struct page *page)
>>>  {
>>> @@ -636,15 +624,7 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
>>>  {
>>>  	return NULL;
>>>  }
>>> -
>>> -static inline void num_poisoned_pages_inc(void)
>>> -{
>>> -}
>>> -
>>> -static inline void num_poisoned_pages_sub(long i)
>>> -{
>>> -}
>>> -#endif  /* CONFIG_MEMORY_FAILURE */
>>> +#endif
>>>  
>>>  static inline int non_swap_entry(swp_entry_t entry)
>>>  {
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 5942e1c0407e..aa6ce685b863 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -74,6 +74,16 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>>  
>>>  static bool hw_memory_failure __read_mostly = false;
>>>  
>>> +static inline void num_poisoned_pages_inc(void)
>>
>> This function is defined as "static inline" while it's "extern void num_poisoned_pages_inc(void)"
>> in the header file. Is this expected?
> 
> No. 4/4 effectively fixes it, but I should've done this in this patch.
> Thank you,
> - Naoya Horiguchi

Then this patch looks good to me. Thanks.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks,
Miaohe Lin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-09-28  1:26     ` Naoya Horiguchi
@ 2022-09-28  9:32       ` Miaohe Lin
  2022-10-07  0:45         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 19+ messages in thread
From: Miaohe Lin @ 2022-09-28  9:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Yang Shi, Oscar Salvador, Muchun Song, Jane Chu, Naoya Horiguchi,
	linux-kernel

On 2022/9/28 9:26, Naoya Horiguchi wrote:
> On Sat, Sep 24, 2022 at 07:43:16PM +0800, Miaohe Lin wrote:
>> On 2022/9/21 17:13, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> HWPoisoned page is not supposed to be accessed once marked, but currently
>>> such accesses can happen during memory hotremove because do_migrate_range()
>>> can be called before dissolve_free_huge_pages() is called.
>>>
>>> Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
>>> migrated.  This should be done in hugetlb_lock to avoid race against
>>> isolate_hugetlb().
>>>
>>> get_hwpoison_huge_page() needs to have a flag to show it's called from
>>> unpoison to take refcount of hwpoisoned hugepages, so add it.
>>>
>>> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> Thanks for your work, Naoya. Maybe something to improve below.
>>
>>> ---
>>> ChangeLog v2 -> v3
>>> - move to the approach of clearing HPageMigratable instead of shifting
>>>   dissolve_free_huge_pages.
>>> ---
>>>  include/linux/hugetlb.h |  4 ++--
>>>  mm/hugetlb.c            |  4 ++--
>>>  mm/memory-failure.c     | 12 ++++++++++--
>>>  3 files changed, 14 insertions(+), 6 deletions(-)
>>>
>>
>> <snip>
>>
>>> @@ -7267,7 +7267,7 @@ int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>>>  		*hugetlb = true;
>>>  		if (HPageFreed(page))
>>>  			ret = 0;
>>> -		else if (HPageMigratable(page))
>>> +		else if (HPageMigratable(page) || unpoison)
>>
>> Is unpoison_memory() expected to restore the HPageMigratable flag as well ?
> 
> No it isn't. When unpoison_memory() unpoisons a hugepage, the hugepage
> is sent back to free hugepage pool, so I think that there's no need to
> restore HPageMigratable for it.

I tend to agree with you.

> 
>>
>>>  			ret = get_page_unless_zero(page);
>>>  		else
>>>  			ret = -EBUSY;
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 145bb561ddb3..5942e1c0407e 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1244,7 +1244,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>>>  	int ret = 0;
>>>  	bool hugetlb = false;
>>>  
>>> -	ret = get_hwpoison_huge_page(head, &hugetlb);
>>> +	ret = get_hwpoison_huge_page(head, &hugetlb, false);
>>>  	if (hugetlb)
>>>  		return ret;
>>>  
>>> @@ -1334,7 +1334,7 @@ static int __get_unpoison_page(struct page *page)
>>>  	int ret = 0;
>>>  	bool hugetlb = false;
>>>  
>>> -	ret = get_hwpoison_huge_page(head, &hugetlb);
>>> +	ret = get_hwpoison_huge_page(head, &hugetlb, true);
>>>  	if (hugetlb)
>>>  		return ret;
>>>  
>>> @@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  		goto out;
>>>  	}
>>>  
>>> +	/*
>>> +	 * Clearing HPageMigratable for hwpoisoned hugepages to prevent them
>>> +	 * from being migrated by memory hotremove.
>>> +	 */
>>> +	if (count_increased)
>>> +		ClearHPageMigratable(head);
>>
>> I believe this can prevent hwpoisoned hugepages from being migrated though there still be some windows.
> 
> I'm not sure of "still racy" part, so could you elaborate it?
> Main scenario this patch tries to handle is like below:
> 
>   CPU 0                                   CPU 1                                   
>   get_huge_page_for_hwpoison                                                      
>     // take hugetlb_lock                                                          
>     __get_huge_page_for_hwpoison                                                  
>                                           scan_movable_pages                      
>                                             if HPageMigratable                    
>                                               goto found                          
>                                           do_migrate_range                        
>       if HPageMigratable                                                          
>         get_page_unless_zero                                                      
>       hugetlb_set_page_hwpoison                                                   
>       ClearHPageMigratable                                                        
>     // release hugetlb_lock                                                       
>                                             isolate_hugetlb                       
>                                               // take hugetlb_lock                
>                                               if !HPageMigratable                 
>                                                 // fails to isolate the hwpoisoned hugetlb.
> 
> Maybe the following could be possible.
> 
>   CPU 0                                   CPU 1
>                                           scan_movable_pages
>                                             if HPageMigratable
>                                               goto found
>                                           do_migrate_range
>                                             isolate_hugetlb
>                                               // the hugetlb is isolated,
>                                               // but it's not hwpoisoned yet.
>   get_huge_page_for_hwpoison
>     // take hugetlb_lock
>     __get_huge_page_for_hwpoison
>       if HPageMigratable
>         get_page_unless_zero
>       hugetlb_set_page_hwpoison
>       ClearHPageMigratable
>     // release hugetlb_lock

Yes, this is what I mean. For already isolated hugetlb pages, the HPageMigratable flag can't help much.

> 
> In this case, the hugepage is maybe broken but not marked as hwpoison yet,
> so it's not detectable.
> 
>>
>>> +
>>>  	return ret;
>>>  out:
>>>  	if (count_increased)
>>> @@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>>>  
>>>  	if (hwpoison_filter(p)) {
>>>  		hugetlb_clear_page_hwpoison(head);
>>> +		SetHPageMigratable(head);
>>
>> Would we set HPageMigratable flag for free hugetlb pages here? IIUC, they're not expected to have this flag set.
> 
> Thank you, you're right.  This should be done in "if (res == 1)" block.

If res == 1, it means the hugetlb page refcount was incremented. But it seems this does not necessarily
mean HPageMigratable was cleared by __get_huge_page_for_hwpoison(), e.g. when the hugetlb page was
already isolated. If so, we might end up setting the HPageMigratable flag back for already isolated hugetlb pages?

> (hwpoison_filter often bothers me ...)

Agree, hwpoison_filter() makes things more complicated. ;)

Thanks,
Miaohe Lin

> 
>>
>> Thanks,
>> Miaohe Lin
> 
> Thank you very much!
> - Naoya Horiguchi
> .
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-09-28  9:32       ` Miaohe Lin
@ 2022-10-07  0:45         ` HORIGUCHI NAOYA(堀口 直也)
  2022-10-08  2:33           ` Miaohe Lin
  0 siblings, 1 reply; 19+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-07  0:45 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Wed, Sep 28, 2022 at 05:32:12PM +0800, Miaohe Lin wrote:
> On 2022/9/28 9:26, Naoya Horiguchi wrote:
> >>> @@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
...
> >>> @@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >>>  
> >>>  	if (hwpoison_filter(p)) {
> >>>  		hugetlb_clear_page_hwpoison(head);
> >>> +		SetHPageMigratable(head);
> >>
> >> Would we set HPageMigratable flag for free hugetlb pages here? IIUC, they're not expected to have this flag set.
> > 
> > Thank you, you're right.  This should be done in "if (res == 1)" block.
> 
> If res == 1, it means hugetlb page refcnt is incremented. But it seems this does not necessarily mean
> HPageMigratable is cleared by __get_huge_page_for_hwpoison() if the hugetlb page is already isolated.
> If so, we might set HPageMigratable flag back for already isolated hugetlb pages?

# sorry for my late reply, I was busy with personal matters these days...

Yes, that could happen (and also in the case where MF_COUNT_INCREASED is
set).  We need to store whether the HPageMigratable flag was cleared or not
in __get_huge_page_for_hwpoison().  I'll add a parameter to
__get_huge_page_for_hwpoison() to report the flag change back to the caller.
But I also think that there are a few internal states during error handling,
so it might be good to add some structure like "struct hwpoison_control"
to carry such internal states across the related functions (not in this series).
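
Roughly like this (sketch only, untested, and the actual v6 change may look
different), with a hypothetical "bool *migratable_cleared" parameter added to
__get_huge_page_for_hwpoison() and threaded through from its callers:

	/*
	 * Tell the caller whether this call is what cleared HPageMigratable,
	 * so that the hwpoison_filter() path restores the flag only in that
	 * case.
	 */
	if (count_increased && HPageMigratable(head)) {
		ClearHPageMigratable(head);
		*migratable_cleared = true;
	}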

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-09-24 12:27       ` Miaohe Lin
@ 2022-10-07  0:47         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 19+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-07  0:47 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Sat, Sep 24, 2022 at 08:27:35PM +0800, Miaohe Lin wrote:
> On 2022/9/23 22:12, Naoya Horiguchi wrote:
> > There seems another build error in aarch64 with MEMORY_HOTPLUG disabled.
> > https://lore.kernel.org/lkml/20220923110144.GA1413812@ik1-406-35019.vs.sakura.ne.jp/
> > , so let me revise this patch again to handle it.
> > 
> > - Naoya Horiguchi
> > 
> > ---
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Date: Fri, 23 Sep 2022 22:51:20 +0900
> > Subject: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
> > 
> > Currently PageHWPoison flag does not behave well when experiencing memory
> > hotremove/hotplug.  Any data field in struct page is unreliable when the
> > associated memory is offlined, and the current mechanism can't tell whether
> > a memory section is onlined because a new memory devices is installed or
> > because previous failed offline operations are undone.  Especially if
> > there's a hwpoisoned memory, it's unclear what the best option is.
> > 
> > So introduce a new mechanism to make struct memory_block remember that
> > a memory block has hwpoisoned memory inside it. And make any online event
> > fail if the onlined memory block contains hwpoison.  struct memory_block
> > is freed and reallocated over ACPI-based hotremove/hotplug, but not over
> > sysfs-based hotremove/hotplug.  So it's desirable to implement hwpoison
> > counter on this struct.
> > 
> > Note that clear_hwpoisoned_pages() is relocated to be called earlier than
> > now, just before unregistering struct memory_block.  Otherwise, the
> > per-memory_block hwpoison counter is freed and we fail to adjust global
> > hwpoison counter properly.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> 
> LGTM with some nits below. Thanks.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thank you.

> 
> > ---
> > ChangeLog v4 -> v5:
> > - add Reported-by of lkp bot,
> > - check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in introduced #ifdefs,
> >   intending to fix "undefined reference" errors in aarch64.
> > 
> > ChangeLog v3 -> v4:
> > - fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
> >   by using memblk_nr_poison() to access to the member ->nr_hwpoison
> > ---
> >  drivers/base/memory.c  | 34 ++++++++++++++++++++++++++++++++++
> >  include/linux/memory.h |  3 +++
> >  include/linux/mm.h     | 24 ++++++++++++++++++++++++
> >  mm/internal.h          |  8 --------
> >  mm/memory-failure.c    | 31 ++++++++++---------------------
> >  mm/sparse.c            |  2 --
> >  6 files changed, 71 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 9aa0da991cfb..99e0e789616c 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
> >  	struct zone *zone;
> >  	int ret;
> >  
> > +	if (memblk_nr_poison(start_pfn))
> > +		return -EHWPOISON;
> > +
> >  	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
> >  				  start_pfn, nr_pages);
> >  
> > @@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> >  		mem = find_memory_block_by_id(block_id);
> >  		if (WARN_ON_ONCE(!mem))
> >  			continue;
> > +		clear_hwpoisoned_pages(memblk_nr_poison(start));
> 
> clear_hwpoisoned_pages seems not a proper name now? PageHWPoison info is kept now. But this should be trivial.
> 

Right, I think that the name num_poisoned_pages_sub() is clear enough, so
I'll open-code this function (i.e. drop the clear_hwpoisoned_pages() wrapper
and call num_poisoned_pages_sub() directly).
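
So the hunk above would become something like this (sketch, assuming the
pfn-taking variant of num_poisoned_pages_sub() from patch 3/4):

		mem = find_memory_block_by_id(block_id);
		if (WARN_ON_ONCE(!mem))
			continue;
		/* open-coded: drop this block's share from the global counter */
		num_poisoned_pages_sub(start, memblk_nr_poison(start));
		unregister_memory_block_under_nodes(mem);
		remove_memory_block(mem);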

> >  		unregister_memory_block_under_nodes(mem);
> >  		remove_memory_block(mem);
> >  	}
> > @@ -1164,3 +1168,33 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> >  	}
> >  	return ret;
> >  }
> > +
> > +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> > +void memblk_nr_poison_inc(unsigned long pfn)
> > +{
> > +	const unsigned long block_id = pfn_to_block_id(pfn);
> > +	struct memory_block *mem = find_memory_block_by_id(block_id);
> > +
> > +	if (mem)
> > +		atomic_long_inc(&mem->nr_hwpoison);
> > +}
> > +
> > +void memblk_nr_poison_sub(unsigned long pfn, long i)
> > +{
> > +	const unsigned long block_id = pfn_to_block_id(pfn);
> > +	struct memory_block *mem = find_memory_block_by_id(block_id);
> > +
> > +	if (mem)
> > +		atomic_long_sub(i, &mem->nr_hwpoison);
> > +}
> > +
> > +unsigned long memblk_nr_poison(unsigned long pfn)
> 
> memblk_nr_poison() is only used inside this file. Make it static?

Thanks, I'll make it static.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
  2022-09-26  8:05       ` David Hildenbrand
@ 2022-10-07  0:52         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 19+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-10-07  0:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, Miaohe Lin,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On Mon, Sep 26, 2022 at 10:05:05AM +0200, David Hildenbrand wrote:
> >   /*
> > @@ -2414,6 +2417,10 @@ int unpoison_memory(unsigned long pfn)
> >   unlock_mutex:
> >   	mutex_unlock(&mf_mutex);
> >   	if (!ret || freeit) {
> > +		/*
> > +		 * TODO: per-memory_block counter might break when the page
> > +		 * size to be unpoisoned is larger than a memory_block.
> > +		 */
> 
> Hmm, but that happens easily e.g., with 1 GiB hugetlb page and 128 MiB
> memory section/block size. What would be the right thing to do here? The
> TODO should rather spell that out instead of just stating the problem.

What we need here is to cancel the per-memory_block hwpoison counts in
each memory block associated with the hugepage being unpoisoned.
I found that this can be done with a few additional lines of code, so
v6 will contain them.  Then this TODO comment is no longer needed.
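
The idea is roughly the following (untested sketch, names and details will
differ in v6): walk the raw error pages recorded for the hugepage and give
the decrement to the memory block that each of them actually belongs to:

	struct raw_hwp_page *p;

	llist_for_each_entry(p, raw_hwp_list_head(head)->first, node)
		memblk_nr_poison_sub(page_to_pfn(p->page), 1);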

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
  2022-10-07  0:45         ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-10-08  2:33           ` Miaohe Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Miaohe Lin @ 2022-10-08  2:33 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Yang Shi, Oscar Salvador, Muchun Song, Jane Chu,
	linux-kernel

On 2022/10/7 8:45, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Sep 28, 2022 at 05:32:12PM +0800, Miaohe Lin wrote:
>> On 2022/9/28 9:26, Naoya Horiguchi wrote:
>>>>> @@ -1815,6 +1815,13 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> ...
>>>>> @@ -1862,6 +1869,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>>>>>  
>>>>>  	if (hwpoison_filter(p)) {
>>>>>  		hugetlb_clear_page_hwpoison(head);
>>>>> +		SetHPageMigratable(head);
>>>>
>>>> Would we set HPageMigratable flag for free hugetlb pages here? IIUC, they're not expected to have this flag set.
>>>
>>> Thank you, you're right.  This should be done in "if (res == 1)" block.
>>
>> If res == 1, it means hugetlb page refcnt is incremented. But it seems this does not necessarily mean
>> HPageMigratable is cleared by __get_huge_page_for_hwpoison() if the hugetlb page is already isolated.
>> If so, we might set HPageMigratable flag back for already isolated hugetlb pages?
> 
> # sorry for my late reply, I was busy with personal matters these days...

That's all right. I was back from my vacation too. Thanks for your work. :)

> 
> Yes, that could happen (and also in the case where MF_COUNT_INCREASED is
> set).  We need store whether HPageMigratable flag is cleared or not in
> __get_huge_page_for_hwpoison().  I'll add a parameter to
> __get_huge_page_for_hwpoison() to return the flag change to the caller.
> But I also think that there're a few internal states during error handling,
> so it might be good to add some structure like "struct hwpoison_control"
> to save such internal states over related functions (not in this series).

That sounds like a good idea.

Thanks,
Miaohe Lin

> 
> Thanks,
> Naoya Horiguchi
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2022-09-21  9:13 [PATCH v3 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug Naoya Horiguchi
2022-09-21  9:13 ` [PATCH v3 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage Naoya Horiguchi
2022-09-24 11:43   ` Miaohe Lin
2022-09-28  1:26     ` Naoya Horiguchi
2022-09-28  9:32       ` Miaohe Lin
2022-10-07  0:45         ` HORIGUCHI NAOYA(堀口 直也)
2022-10-08  2:33           ` Miaohe Lin
2022-09-21  9:13 ` [PATCH v3 2/4] mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c Naoya Horiguchi
2022-09-24 11:53   ` Miaohe Lin
2022-09-28  2:05     ` Naoya Horiguchi
2022-09-28  7:56       ` Miaohe Lin
2022-09-21  9:13 ` [PATCH v3 3/4] mm/hwpoison: pass pfn to num_poisoned_pages_*() Naoya Horiguchi
2022-09-21  9:13 ` [PATCH v3 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
2022-09-23  8:26   ` [PATCH v4 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter Naoya Horiguchi
2022-09-23 14:12     ` [PATCH v5 " Naoya Horiguchi
2022-09-24 12:27       ` Miaohe Lin
2022-10-07  0:47         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-26  8:05       ` David Hildenbrand
2022-10-07  0:52         ` HORIGUCHI NAOYA(堀口 直也)
