linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] memory-failure: fix hwpoison_filter
@ 2022-05-09 10:56 zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 1/5] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

v1 -> v2:
- move clear_hwpoisoned_pages() near the definitions of
  hwpoison_filter*.
- fix typo.
- remove "mm/memofy-failure.c: optimize hwpoison_filter".
- call hwpoison_filter() after get_hwpoison_page().
- disable hwpoison filter during removing
- simplify num_poisoned_pages_inc/dec

Also add background of this work:
As well known, the memory failure mechanism handles memory corrupted event, and try to send SIGBUS to the user process which uses this corrupted page.

For the virtualization case, QEMU catches SIGBUS and tries to inject MCE into the guest, and the guest handles memory failure again. Thus the guest gets the minimal effect from hardware memory corruption.

The further step I'm working on:
1, try to modify code to decrease poisoned pages in a single place (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).

2, try to use page_handle_poison() to handle SetPageHWPoison() and num_poisoned_pages_inc() together. It would be best to call num_poisoned_pages_inc() in a single place too.

3, introduce memory failure notifier list in memory-failure.c: notify the corrupted PFN to someone who registers this list.
If I can complete [1] and [2] part, [3] will be quite easy(just call notifier list after increasing poisoned page).

4, introduce memory recover VQ for memory balloon device, and registers memory failure notifier list. During the guest kernel handles memory failure, balloon device gets notified by memory failure notifier list, and tells the host to recover the corrupted PFN(GPA) by the new VQ.

5, host side remaps the corrupted page(HVA), and tells the guest side to unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA) dynamically.

Thanks to Naoya & David for the suggestions!

v1:
- move clear_hwpoisoned_pages() from sparse.c to memory-failure.c.
- simplify num_poisoned_pages_dec().
- call hwpoison_filter() early in memory_failure().
- add hwpoison_filter for soft offline.

zhenwei pi (5):
  mm/memory-failure.c: move clear_hwpoisoned_pages
  mm/memory-failure.c: simplify num_poisoned_pages_dec
  mm/memory-failure.c: add hwpoison_filter for soft offline
  mm/hwpoison: disable hwpoison filter during removing
  mm/memory-failure.c: simplify num_poisoned_pages_inc/dec

 mm/hwpoison-inject.c |  1 +
 mm/internal.h        | 11 ++++++
 mm/memory-failure.c  | 85 ++++++++++++++++++++++++--------------------
 mm/page_alloc.c      |  1 -
 mm/sparse.c          | 27 --------------
 5 files changed, 59 insertions(+), 66 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/5] mm/memory-failure.c: move clear_hwpoisoned_pages
  2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
@ 2022-05-09 10:56 ` zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 2/5] mm/memory-failure.c: simplify num_poisoned_pages_dec zhenwei pi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

clear_hwpoisoned_pages() clears HWPoison flag and decreases the number
of poisoned pages, this actually works as part of memory failure.

Move this function from sparse.c to memory-failure.c, finally there
is no CONFIG_MEMORY_FAILURE in sparse.c.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/internal.h       | 11 +++++++++++
 mm/memory-failure.c | 21 +++++++++++++++++++++
 mm/sparse.c         | 27 ---------------------------
 3 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index cf16280ce132..84dd6aa7ba97 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -634,6 +634,9 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
 }
 #endif
 
+/*
+ * mm/memory-failure.c
+ */
 extern int hwpoison_filter(struct page *p);
 
 extern u32 hwpoison_filter_dev_major;
@@ -643,6 +646,14 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
+#ifdef CONFIG_MEMORY_FAILURE
+void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
+#else
+static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+}
+#endif
+
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27760c19bad7..46d9fb612dcc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2401,3 +2401,24 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 	return ret;
 }
+
+void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+	int i;
+
+	/*
+	 * A further optimization is to have per section refcounted
+	 * num_poisoned_pages.  But that would need more space per memmap, so
+	 * for now just do a quick global check to speed up this routine in the
+	 * absence of bad pages.
+	 */
+	if (atomic_long_read(&num_poisoned_pages) == 0)
+		return;
+
+	for (i = 0; i < nr_pages; i++) {
+		if (PageHWPoison(&memmap[i])) {
+			num_poisoned_pages_dec();
+			ClearPageHWPoison(&memmap[i]);
+		}
+	}
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index 952f06d8f373..e983c38fac8f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -916,33 +916,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	return 0;
 }
 
-#ifdef CONFIG_MEMORY_FAILURE
-static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-	int i;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			num_poisoned_pages_dec();
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-}
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/5] mm/memory-failure.c: simplify num_poisoned_pages_dec
  2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 1/5] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
@ 2022-05-09 10:56 ` zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline zhenwei pi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

Don't decrease the number of poisoned pages in page_alloc.c, let the
memory-failure.c do inc/dec poisoned pages only.

Also simplify unpoison_memory(), only decrease the number of
poisoned pages when:
 - TestClearPageHWPoison() succeed
 - put_page_back_buddy succeed

After decreasing, print necessary log.

Finally, remove clear_page_hwpoison() and unpoison_taken_off_page().

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 37 +++++++++----------------------------
 mm/page_alloc.c     |  1 -
 2 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 46d9fb612dcc..ece05858568f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2101,28 +2101,6 @@ core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
-static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
-{
-	if (TestClearPageHWPoison(p)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 page_to_pfn(p), rs);
-		num_poisoned_pages_dec();
-		return 1;
-	}
-	return 0;
-}
-
-static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
-					  struct page *p)
-{
-	if (put_page_back_buddy(p)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 page_to_pfn(p), rs);
-		return 0;
-	}
-	return -EBUSY;
-}
-
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -2140,6 +2118,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *page;
 	struct page *p;
 	int ret = -EBUSY;
+	int freeit = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -2180,18 +2159,15 @@ int unpoison_memory(unsigned long pfn)
 
 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
-		if (clear_page_hwpoison(&unpoison_rs, page))
-			ret = 0;
-		else
-			ret = -EBUSY;
+		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	} else if (ret < 0) {
 		if (ret == -EHWPOISON) {
-			ret = unpoison_taken_off_page(&unpoison_rs, p);
+			ret = put_page_back_buddy(p) ? 0 : -EBUSY;
 		} else
 			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
 					 pfn, &unpoison_rs);
 	} else {
-		int freeit = clear_page_hwpoison(&unpoison_rs, p);
+		freeit = !!TestClearPageHWPoison(p);
 
 		put_page(page);
 		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
@@ -2202,6 +2178,11 @@ int unpoison_memory(unsigned long pfn)
 
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
+	if (!ret || freeit) {
+		num_poisoned_pages_dec();
+		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+				 page_to_pfn(p), &unpoison_rs);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(unpoison_memory);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..d710846ef653 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9625,7 +9625,6 @@ bool put_page_back_buddy(struct page *page)
 		ClearPageHWPoisonTakenOff(page);
 		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
 		if (TestClearPageHWPoison(page)) {
-			num_poisoned_pages_dec();
 			ret = true;
 		}
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline
  2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 1/5] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 2/5] mm/memory-failure.c: simplify num_poisoned_pages_dec zhenwei pi
@ 2022-05-09 10:56 ` zhenwei pi
  2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-09 10:56 ` [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing zhenwei pi
  2022-05-09 10:56 ` [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec zhenwei pi
  4 siblings, 1 reply; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

hwpoison_filter is missing in the soft offline path, this leads an
issue: after enabling the corrupt filter, the user process still has
a chance to inject hwpoison fault by
madvise(addr, len, MADV_SOFT_OFFLINE) at PFN which is expected to
reject.

Also do a minor change in comment of memory_failure().

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ece05858568f..ed280ef5473d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1762,7 +1762,7 @@ static DEFINE_MUTEX(mf_mutex);
  * enabled and no spinlocks hold.
  *
  * Return: 0 for successfully handled the memory error,
- *         -EOPNOTSUPP for memory_filter() filtered the error event,
+ *         -EOPNOTSUPP for hwpoison_filter() filtered the error event,
  *         < 0(except -EOPNOTSUPP) on failure.
  */
 int memory_failure(unsigned long pfn, int flags)
@@ -2317,7 +2317,9 @@ static void put_ref_page(struct page *page)
  * @pfn: pfn to soft-offline
  * @flags: flags. Same as memory_failure().
  *
- * Returns 0 on success, otherwise negated errno.
+ * Returns 0 on success
+ *         -EOPNOTSUPP for hwpoison_filter() filtered the error event
+ *         < 0 otherwise negated errno.
  *
  * Soft offline a page, by migration or invalidation,
  * without killing anything. This is for the case when
@@ -2368,6 +2370,16 @@ int soft_offline_page(unsigned long pfn, int flags)
 	ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
 	put_online_mems();
 
+	if (hwpoison_filter(page)) {
+		if (ret > 0)
+			put_page(page);
+		else
+			put_ref_page(ref_page);
+
+		mutex_unlock(&mf_mutex);
+		return -EOPNOTSUPP;
+	}
+
 	if (ret > 0) {
 		ret = soft_offline_in_use_page(page);
 	} else if (ret == 0) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing
  2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
                   ` (2 preceding siblings ...)
  2022-05-09 10:56 ` [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline zhenwei pi
@ 2022-05-09 10:56 ` zhenwei pi
  2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-09 10:56 ` [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec zhenwei pi
  4 siblings, 1 reply; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

hwpoison filter is enabled by hwpoison-inject module, after removing
this module, hwpoison filter still works. What is worse, user can not
find the debugfs entries to know this.

Disable the hwpoison filter during removing hwpoison-inject module.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/hwpoison-inject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index bb0cea5468cb..5c0cddd81505 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -65,6 +65,7 @@ DEFINE_DEBUGFS_ATTRIBUTE(unpoison_fops, NULL, hwpoison_unpoison, "%lli\n");
 
 static void pfn_inject_exit(void)
 {
+	hwpoison_filter_enable = 0;
 	debugfs_remove_recursive(hwpoison_dir);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec
  2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
                   ` (3 preceding siblings ...)
  2022-05-09 10:56 ` [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing zhenwei pi
@ 2022-05-09 10:56 ` zhenwei pi
  2022-05-11  8:20   ` HORIGUCHI NAOYA(堀口 直也)
  4 siblings, 1 reply; 10+ messages in thread
From: zhenwei pi @ 2022-05-09 10:56 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

Originally, do num_poisoned_pages_inc() in memory failure routine,
use num_poisoned_pages_dec() to rollback the number if filtered/
cancelled.

Suggested by Naoya, do num_poisoned_pages_inc() only in
action_result(), this make this clear and simple.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ed280ef5473d..2d590cba412c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1133,6 +1133,7 @@ static void action_result(unsigned long pfn, enum mf_action_page_type type,
 {
 	trace_memory_failure_event(pfn, type, result);
 
+	num_poisoned_pages_inc();
 	pr_err("Memory failure: %#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 }
@@ -1588,8 +1589,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		goto out;
 	}
 
-	num_poisoned_pages_inc();
-
 	/*
 	 * Handling free hugepage.  The possible race with hugepage allocation
 	 * or demotion can be prevented by PageHWPoison flag.
@@ -1815,7 +1814,6 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	hpage = compound_head(p);
-	num_poisoned_pages_inc();
 
 	/*
 	 * We need/can do nothing about count=0 pages.
@@ -1839,7 +1837,6 @@ int memory_failure(unsigned long pfn, int flags)
 					/* We lost the race, try again */
 					if (retry) {
 						ClearPageHWPoison(p);
-						num_poisoned_pages_dec();
 						retry = false;
 						goto try_again;
 					}
@@ -1915,8 +1912,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	if (PageCompound(p)) {
 		if (retry) {
-			if (TestClearPageHWPoison(p))
-				num_poisoned_pages_dec();
+			ClearPageHWPoison(p);
 			unlock_page(p);
 			put_page(p);
 			flags &= ~MF_COUNT_INCREASED;
@@ -1938,8 +1934,7 @@ int memory_failure(unsigned long pfn, int flags)
 	page_flags = p->flags;
 
 	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
+		TestClearPageHWPoison(p);
 		unlock_page(p);
 		put_page(p);
 		res = -EOPNOTSUPP;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec
  2022-05-11  8:20   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-11  8:18     ` zhenwei pi
  0 siblings, 0 replies; 10+ messages in thread
From: zhenwei pi @ 2022-05-11  8:18 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, linux-mm, linux-kernel



On 5/11/22 16:20, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, May 09, 2022 at 06:56:41PM +0800, zhenwei pi wrote:
>> Originally, do num_poisoned_pages_inc() in memory failure routine,
>> use num_poisoned_pages_dec() to rollback the number if filtered/
>> cancelled.
>>
>> Suggested by Naoya, do num_poisoned_pages_inc() only in
>> action_result(), this make this clear and simple.
>>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> 
> I found that action_result(MF_MSG_UNKNOWN) in try_memory_failure_hugetlb()
> does not follow the rule, so that could break the counter.
> I don't think this is the issue in your patch, so I'm fine with this patch.
> 
> I'll submit a fix later for this, which will add hugetlb_set_page_hwpoison()
> in the path.  That will have a bit non-trivial change because we need do
> this in hugeltb_lock.
> 
> Anyway, thank you for this work.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

A million thanks to you!

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline
  2022-05-09 10:56 ` [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline zhenwei pi
@ 2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-11  8:18 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel

On Mon, May 09, 2022 at 06:56:39PM +0800, zhenwei pi wrote:
> hwpoison_filter is missing in the soft offline path, this leads an
> issue: after enabling the corrupt filter, the user process still has
> a chance to inject hwpoison fault by
> madvise(addr, len, MADV_SOFT_OFFLINE) at PFN which is expected to
> reject.
> 
> Also do a minor change in comment of memory_failure().
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Looks good to me, thank you.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing
  2022-05-09 10:56 ` [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing zhenwei pi
@ 2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-11  8:18 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel

On Mon, May 09, 2022 at 06:56:40PM +0800, zhenwei pi wrote:
> hwpoison filter is enabled by hwpoison-inject module, after removing
> this module, hwpoison filter still works. What is worse, user can not
> find the debugfs entries to know this.
> 
> Disable the hwpoison filter during removing hwpoison-inject module.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec
  2022-05-09 10:56 ` [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec zhenwei pi
@ 2022-05-11  8:20   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-11  8:18     ` zhenwei pi
  0 siblings, 1 reply; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-11  8:20 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel

On Mon, May 09, 2022 at 06:56:41PM +0800, zhenwei pi wrote:
> Originally, do num_poisoned_pages_inc() in memory failure routine,
> use num_poisoned_pages_dec() to rollback the number if filtered/
> cancelled.
> 
> Suggested by Naoya, do num_poisoned_pages_inc() only in
> action_result(), this make this clear and simple.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

I found that action_result(MF_MSG_UNKNOWN) in try_memory_failure_hugetlb()
does not follow the rule, so that could break the counter.
I don't think this is the issue in your patch, so I'm fine with this patch.

I'll submit a fix later for this, which will add hugetlb_set_page_hwpoison()
in the path.  That will have a bit non-trivial change because we need do
this in hugeltb_lock.

Anyway, thank you for this work.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-11  8:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09 10:56 [PATCH v2 0/5] memory-failure: fix hwpoison_filter zhenwei pi
2022-05-09 10:56 ` [PATCH v2 1/5] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
2022-05-09 10:56 ` [PATCH v2 2/5] mm/memory-failure.c: simplify num_poisoned_pages_dec zhenwei pi
2022-05-09 10:56 ` [PATCH v2 3/5] mm/memory-failure.c: add hwpoison_filter for soft offline zhenwei pi
2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
2022-05-09 10:56 ` [PATCH v2 4/5] mm/hwpoison: disable hwpoison filter during removing zhenwei pi
2022-05-11  8:18   ` HORIGUCHI NAOYA(堀口 直也)
2022-05-09 10:56 ` [PATCH v2 5/5] mm/memory-failure.c: simplify num_poisoned_pages_inc/dec zhenwei pi
2022-05-11  8:20   ` HORIGUCHI NAOYA(堀口 直也)
2022-05-11  8:18     ` zhenwei pi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).