linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] memory-failure: fix hwpoison_filter
@ 2022-04-29 14:22 zhenwei pi
  2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: zhenwei pi @ 2022-04-29 14:22 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi

Hi,
There are two parts in this series:
"move clear_hwpoisoned_pages" and "simplify num_poisoned_pages_dec"
do not fix bugs, just make memory failure easy to maintain.

"optimize hwpoison_filter" and "add hwpoison_filter for soft offline"
try to fix hwpoison_filter.

zhenwei pi (4):
  mm/memory-failure.c: move clear_hwpoisoned_pages
  mm/memofy-failure.c:: simplify num_poisoned_pages_dec
  mm/memofy-failure.c: optimize hwpoison_filter
  mm/memofy-failure.c: add hwpoison_filter for soft offline

 mm/internal.h       | 11 ++++++
 mm/memory-failure.c | 81 ++++++++++++++++++++++++---------------------
 mm/page_alloc.c     |  1 -
 mm/sparse.c         | 27 ---------------
 4 files changed, 54 insertions(+), 66 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages
  2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
@ 2022-04-29 14:22 ` zhenwei pi
  2022-05-06  8:55   ` Naoya Horiguchi
  2022-04-29 14:22 ` [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec zhenwei pi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-04-29 14:22 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi, Wu Fengguang

clear_hwpoisoned_pages() clears HWPoison flag and decreases the number
of poisoned pages, this actually works as part of memory failure.

Move this function from sparse.c to memory-failure.c, finally there
is no CONFIG_MEMORY_FAILURE in sparse.c.

Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/internal.h       | 11 +++++++++++
 mm/memory-failure.c | 21 +++++++++++++++++++++
 mm/sparse.c         | 27 ---------------------------
 3 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index cf16280ce132..e8add8df4e0f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -758,4 +758,15 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
 
 DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
+/*
+ * mm/memory-failure.c
+ */
+#ifdef CONFIG_MEMORY_FAILURE
+void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
+#else
+static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+}
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27760c19bad7..46d9fb612dcc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2401,3 +2401,24 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 	return ret;
 }
+
+void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+	int i;
+
+	/*
+	 * A further optimization is to have per section refcounted
+	 * num_poisoned_pages.  But that would need more space per memmap, so
+	 * for now just do a quick global check to speed up this routine in the
+	 * absence of bad pages.
+	 */
+	if (atomic_long_read(&num_poisoned_pages) == 0)
+		return;
+
+	for (i = 0; i < nr_pages; i++) {
+		if (PageHWPoison(&memmap[i])) {
+			num_poisoned_pages_dec();
+			ClearPageHWPoison(&memmap[i]);
+		}
+	}
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index 952f06d8f373..e983c38fac8f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -916,33 +916,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	return 0;
 }
 
-#ifdef CONFIG_MEMORY_FAILURE
-static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-	int i;
-
-	/*
-	 * A further optimization is to have per section refcounted
-	 * num_poisoned_pages.  But that would need more space per memmap, so
-	 * for now just do a quick global check to speed up this routine in the
-	 * absence of bad pages.
-	 */
-	if (atomic_long_read(&num_poisoned_pages) == 0)
-		return;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (PageHWPoison(&memmap[i])) {
-			num_poisoned_pages_dec();
-			ClearPageHWPoison(&memmap[i]);
-		}
-	}
-}
-#else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
-{
-}
-#endif
-
 void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec
  2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
  2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
@ 2022-04-29 14:22 ` zhenwei pi
  2022-05-06  8:55   ` Naoya Horiguchi
  2022-04-29 14:22 ` [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter zhenwei pi
  2022-04-29 14:22 ` [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline zhenwei pi
  3 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-04-29 14:22 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi, Wu Fengguang

Don't descrease the number of poisoned pages in page_alloc.c, let the
memory-failure.c do inc/dec poisoned pages only.

Also simplify unpoison_memory(), only descrease the number of
poisoned pages when:
 - TestClearPageHWPoison() succeed
 - put_page_back_buddy succeed

After descreasing, print necessary log.

Finally, remove clear_page_hwpoison() and unpoison_taken_off_page().

Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 37 +++++++++----------------------------
 mm/page_alloc.c     |  1 -
 2 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 46d9fb612dcc..ece05858568f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2101,28 +2101,6 @@ core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
-static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
-{
-	if (TestClearPageHWPoison(p)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 page_to_pfn(p), rs);
-		num_poisoned_pages_dec();
-		return 1;
-	}
-	return 0;
-}
-
-static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
-					  struct page *p)
-{
-	if (put_page_back_buddy(p)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 page_to_pfn(p), rs);
-		return 0;
-	}
-	return -EBUSY;
-}
-
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -2140,6 +2118,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *page;
 	struct page *p;
 	int ret = -EBUSY;
+	int freeit = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -2180,18 +2159,15 @@ int unpoison_memory(unsigned long pfn)
 
 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
-		if (clear_page_hwpoison(&unpoison_rs, page))
-			ret = 0;
-		else
-			ret = -EBUSY;
+		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	} else if (ret < 0) {
 		if (ret == -EHWPOISON) {
-			ret = unpoison_taken_off_page(&unpoison_rs, p);
+			ret = put_page_back_buddy(p) ? 0 : -EBUSY;
 		} else
 			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
 					 pfn, &unpoison_rs);
 	} else {
-		int freeit = clear_page_hwpoison(&unpoison_rs, p);
+		freeit = !!TestClearPageHWPoison(p);
 
 		put_page(page);
 		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
@@ -2202,6 +2178,11 @@ int unpoison_memory(unsigned long pfn)
 
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
+	if (!ret || freeit) {
+		num_poisoned_pages_dec();
+		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+				 page_to_pfn(p), &unpoison_rs);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(unpoison_memory);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..d710846ef653 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9625,7 +9625,6 @@ bool put_page_back_buddy(struct page *page)
 		ClearPageHWPoisonTakenOff(page);
 		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
 		if (TestClearPageHWPoison(page)) {
-			num_poisoned_pages_dec();
 			ret = true;
 		}
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
  2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
  2022-04-29 14:22 ` [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec zhenwei pi
@ 2022-04-29 14:22 ` zhenwei pi
  2022-05-06  8:59   ` Naoya Horiguchi
  2022-04-29 14:22 ` [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline zhenwei pi
  3 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-04-29 14:22 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi, Wu Fengguang

In the memory failure procedure, hwpoison_filter has higher priority,
if memory_filter() filters the error event, there is no need to do
the further work.

Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ece05858568f..a6a27c8b800f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1800,6 +1800,11 @@ int memory_failure(unsigned long pfn, int flags)
 		goto unlock_mutex;
 	}
 
+	if (hwpoison_filter(p)) {
+		res = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+
 try_again:
 	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
 	if (hugetlb)
@@ -1937,15 +1942,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	page_flags = p->flags;
 
-	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
-		unlock_page(p);
-		put_page(p);
-		res = -EOPNOTSUPP;
-		goto unlock_mutex;
-	}
-
 	/*
 	 * __munlock_pagevec may clear a writeback page's LRU flag without
 	 * page_lock. We need wait writeback completion for this page or it
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline
  2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
                   ` (2 preceding siblings ...)
  2022-04-29 14:22 ` [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter zhenwei pi
@ 2022-04-29 14:22 ` zhenwei pi
  2022-05-06  8:59   ` Naoya Horiguchi
  3 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-04-29 14:22 UTC (permalink / raw)
  To: akpm, naoya.horiguchi; +Cc: linux-mm, linux-kernel, zhenwei pi, Wu Fengguang

hwpoison_filter is missing in the soft offline path, this leads an
issue: after enabling the corrupt filter, the user process still has
a chance to inject hwpoison fault by
madvise(addr, len, MADV_SOFT_OFFLINE) at PFN which is expected to
reject.

Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 mm/memory-failure.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a6a27c8b800f..6564f5a34658 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2313,7 +2313,9 @@ static void put_ref_page(struct page *page)
  * @pfn: pfn to soft-offline
  * @flags: flags. Same as memory_failure().
  *
- * Returns 0 on success, otherwise negated errno.
+ * Returns 0 on success
+ *         -EOPNOTSUPP for memory_filter() filtered the error event
+ *         < 0 otherwise negated errno.
  *
  * Soft offline a page, by migration or invalidation,
  * without killing anything. This is for the case when
@@ -2350,6 +2352,11 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return -EIO;
 	}
 
+	if (hwpoison_filter(page)) {
+		put_ref_page(ref_page);
+		return -EOPNOTSUPP;
+	}
+
 	mutex_lock(&mf_mutex);
 
 	if (PageHWPoison(page)) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages
  2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
@ 2022-05-06  8:55   ` Naoya Horiguchi
  0 siblings, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2022-05-06  8:55 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang

On Fri, Apr 29, 2022 at 10:22:03PM +0800, zhenwei pi wrote:
> clear_hwpoisoned_pages() clears HWPoison flag and decreases the number
> of poisoned pages, this actually works as part of memory failure.
> 
> Move this function from sparse.c to memory-failure.c, finally there
> is no CONFIG_MEMORY_FAILURE in sparse.c.
> 
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  mm/internal.h       | 11 +++++++++++
>  mm/memory-failure.c | 21 +++++++++++++++++++++
>  mm/sparse.c         | 27 ---------------------------
>  3 files changed, 32 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index cf16280ce132..e8add8df4e0f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -758,4 +758,15 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
>  
>  DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
>  
> +/*
> + * mm/memory-failure.c
> + */
> +#ifdef CONFIG_MEMORY_FAILURE
> +void clear_hwpoisoned_pages(struct page *memmap, int nr_pages);
> +#else
> +static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
> +{
> +}
> +#endif

This code may be put near the definitions of hwpoison_filter* in mm/internal.h
because they are also called from mm/memory-failure.

Otherwise, looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec
  2022-04-29 14:22 ` [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec zhenwei pi
@ 2022-05-06  8:55   ` Naoya Horiguchi
  0 siblings, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2022-05-06  8:55 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang

On Fri, Apr 29, 2022 at 10:22:04PM +0800, zhenwei pi wrote:
> Don't descrease the number of poisoned pages in page_alloc.c, let the
        ^^^^
        decrease?

> memory-failure.c do inc/dec poisoned pages only.
> 
> Also simplify unpoison_memory(), only descrease the number of
                                        ^^^^
                                        same typo

> poisoned pages when:
>  - TestClearPageHWPoison() succeed
>  - put_page_back_buddy succeed
> 
> After descreasing, print necessary log.
        ^^^^
        ditto

> 
> Finally, remove clear_page_hwpoison() and unpoison_taken_off_page().
> 
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Otherwise, the changes look good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-04-29 14:22 ` [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter zhenwei pi
@ 2022-05-06  8:59   ` Naoya Horiguchi
  2022-05-06 13:38     ` zhenwei pi
  0 siblings, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2022-05-06  8:59 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang

On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
> In the memory failure procedure, hwpoison_filter has higher priority,
> if memory_filter() filters the error event, there is no need to do
> the further work.

Could you clarify what problem you are trying to solve (what does
"optimize" mean in this context or what is the benefit)?

Now hwpoison_filter() can be called both with *and* without taking page refcount.
It's mainly called *with* taking page refcount in order to make sure that the
actual handling process is executed only for pages that meet a given condition.
IOW, it's important to prevent pages which do not meet the condition from going
ahead to further steps (false-positive is not permitted).  So this type of
callsite should not be omittable.

As for the other case, hwpoison_filter() is also called in hwpoison_inject()
*without* taking page refcount.  This actually has a different nuance and
intended to speculatively filter the injection events before setting
PageHWPoison flag to reduce the noise due to setting PG_hwpoison temporary.
The point is that it's not intended here to filter precisely and this callsite
is omittable.

So in my understanding, we need keep hwpoison_filter() after taking page
refcount as we do now.  Maybe optionally and additionally calling
hwpoison_filter() at the beginning of memory_failure() might be possible,
but I'm not sure yet how helpful...

Thanks,
Naoya Horiguchi

> 
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  mm/memory-failure.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ece05858568f..a6a27c8b800f 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1800,6 +1800,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		goto unlock_mutex;
>  	}
>  
> +	if (hwpoison_filter(p)) {
> +		res = -EOPNOTSUPP;
> +		goto unlock_mutex;
> +	}
> +
>  try_again:
>  	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
>  	if (hugetlb)
> @@ -1937,15 +1942,6 @@ int memory_failure(unsigned long pfn, int flags)
>  	 */
>  	page_flags = p->flags;
>  
> -	if (hwpoison_filter(p)) {
> -		if (TestClearPageHWPoison(p))
> -			num_poisoned_pages_dec();
> -		unlock_page(p);
> -		put_page(p);
> -		res = -EOPNOTSUPP;
> -		goto unlock_mutex;
> -	}
> -
>  	/*
>  	 * __munlock_pagevec may clear a writeback page's LRU flag without
>  	 * page_lock. We need wait writeback completion for this page or it
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline
  2022-04-29 14:22 ` [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline zhenwei pi
@ 2022-05-06  8:59   ` Naoya Horiguchi
  0 siblings, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2022-05-06  8:59 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang

On Fri, Apr 29, 2022 at 10:22:06PM +0800, zhenwei pi wrote:
> hwpoison_filter is missing in the soft offline path, this leads an
> issue: after enabling the corrupt filter, the user process still has
> a chance to inject hwpoison fault by
> madvise(addr, len, MADV_SOFT_OFFLINE) at PFN which is expected to
> reject.

The motivation is fine to me. Thank you for finding this.

> 
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  mm/memory-failure.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a6a27c8b800f..6564f5a34658 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2313,7 +2313,9 @@ static void put_ref_page(struct page *page)
>   * @pfn: pfn to soft-offline
>   * @flags: flags. Same as memory_failure().
>   *
> - * Returns 0 on success, otherwise negated errno.
> + * Returns 0 on success
> + *         -EOPNOTSUPP for memory_filter() filtered the error event

Using word hwpoison_filter() rather than memory_filter() seems better to me.

> + *         < 0 otherwise negated errno.
>   *
>   * Soft offline a page, by migration or invalidation,
>   * without killing anything. This is for the case when
> @@ -2350,6 +2352,11 @@ int soft_offline_page(unsigned long pfn, int flags)
>  		return -EIO;
>  	}
>  
> +	if (hwpoison_filter(page)) {
> +		put_ref_page(ref_page);
> +		return -EOPNOTSUPP;
> +	}
> +

Based on the assumption behind hwpoison_filter(), calling it after
get_hwpoison_page() would be better?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-05-06  8:59   ` Naoya Horiguchi
@ 2022-05-06 13:38     ` zhenwei pi
  2022-05-06 16:28       ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-05-06 13:38 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang,
	Michael S. Tsirkin, David Hildenbrand, Jason Wang,
	virtualization



On 5/6/22 16:59, Naoya Horiguchi wrote:
> On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
>> In the memory failure procedure, hwpoison_filter has higher priority,
>> if memory_filter() filters the error event, there is no need to do
>> the further work.
> 
> Could you clarify what problem you are trying to solve (what does
> "optimize" mean in this context or what is the benefit)?
> 

OK. The background of this work:
As well known, the memory failure mechanism handles memory corrupted 
event, and try to send SIGBUS to the user process which uses this 
corrupted page.

For the virtualization case, QEMU catches SIGBUS and tries to inject MCE 
into the guest, and the guest handles memory failure again. Thus the 
guest gets the minimal effect from hardware memory corruption.

The further step I'm working on:
1, try to modify code to decrease poisoned pages in a single place 
(mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).

2, try to use page_handle_poison() to handle SetPageHWPoison() and 
num_poisoned_pages_inc() together. It would be best to call 
num_poisoned_pages_inc() in a single place too. I'm not sure if this is 
possible or not, please correct me if I misunderstand.

3, introduce memory failure notifier list in memory-failure.c: notify 
the corrupted PFN to someone who registers this list.
If I can complete [1] and [2] part, [3] will be quite easy(just call 
notifier list after increasing poisoned page).

4, introduce memory recover VQ for memory balloon device, and registers 
memory failure notifier list. During the guest kernel handles memory 
failure, balloon device gets notified by memory failure notifier list, 
and tells the host to recover the corrupted PFN(GPA) by the new VQ.

5, host side remaps the corrupted page(HVA), and tells the guest side to 
unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA) 
dynamically.

Because [4] and [5] are related to balloon device, also CC Michael, 
David and Jason.

> Now hwpoison_filter() can be called both with *and* without taking page refcount.
> It's mainly called *with* taking page refcount in order to make sure that the
> actual handling process is executed only for pages that meet a given condition.
> IOW, it's important to prevent pages which do not meet the condition from going
> ahead to further steps (false-positive is not permitted).  So this type of
> callsite should not be omittable.
> 
> As for the other case, hwpoison_filter() is also called in hwpoison_inject()
> *without* taking page refcount.  This actually has a different nuance and
> intended to speculatively filter the injection events before setting
> PageHWPoison flag to reduce the noise due to setting PG_hwpoison temporary.
> The point is that it's not intended here to filter precisely and this callsite
> is omittable.
> 
> So in my understanding, we need keep hwpoison_filter() after taking page
> refcount as we do now.  Maybe optionally and additionally calling
> hwpoison_filter() at the beginning of memory_failure() might be possible,
> but I'm not sure yet how helpful...
> 
> Thanks,
> Naoya Horiguchi
> 
>>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>>   mm/memory-failure.c | 14 +++++---------
>>   1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ece05858568f..a6a27c8b800f 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1800,6 +1800,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		goto unlock_mutex;
>>   	}
>>   
>> +	if (hwpoison_filter(p)) {
>> +		res = -EOPNOTSUPP;
>> +		goto unlock_mutex;
>> +	}
>> +
>>   try_again:
>>   	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
>>   	if (hugetlb)
>> @@ -1937,15 +1942,6 @@ int memory_failure(unsigned long pfn, int flags)
>>   	 */
>>   	page_flags = p->flags;
>>   
>> -	if (hwpoison_filter(p)) {
>> -		if (TestClearPageHWPoison(p))
>> -			num_poisoned_pages_dec();
>> -		unlock_page(p);
>> -		put_page(p);
>> -		res = -EOPNOTSUPP;
>> -		goto unlock_mutex;
>> -	}
>> -
>>   	/*
>>   	 * __munlock_pagevec may clear a writeback page's LRU flag without
>>   	 * page_lock. We need wait writeback completion for this page or it
>> -- 
>> 2.20.1
>>

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-05-06 13:38     ` zhenwei pi
@ 2022-05-06 16:28       ` David Hildenbrand
  2022-05-07  0:28         ` zhenwei pi
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2022-05-06 16:28 UTC (permalink / raw)
  To: zhenwei pi, Naoya Horiguchi
  Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang,
	Michael S. Tsirkin, Jason Wang, virtualization

On 06.05.22 15:38, zhenwei pi wrote:
> 
> 
> On 5/6/22 16:59, Naoya Horiguchi wrote:
>> On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
>>> In the memory failure procedure, hwpoison_filter has higher priority,
>>> if memory_filter() filters the error event, there is no need to do
>>> the further work.
>>
>> Could you clarify what problem you are trying to solve (what does
>> "optimize" mean in this context or what is the benefit)?
>>
> 
> OK. The background of this work:
> As well known, the memory failure mechanism handles memory corrupted 
> event, and try to send SIGBUS to the user process which uses this 
> corrupted page.
> 
> For the virtualization case, QEMU catches SIGBUS and tries to inject MCE 
> into the guest, and the guest handles memory failure again. Thus the 
> guest gets the minimal effect from hardware memory corruption.
> 
> The further step I'm working on:
> 1, try to modify code to decrease poisoned pages in a single place 
> (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).
> 
> 2, try to use page_handle_poison() to handle SetPageHWPoison() and 
> num_poisoned_pages_inc() together. It would be best to call 
> num_poisoned_pages_inc() in a single place too. I'm not sure if this is 
> possible or not, please correct me if I misunderstand.
> 
> 3, introduce memory failure notifier list in memory-failure.c: notify 
> the corrupted PFN to someone who registers this list.
> If I can complete [1] and [2] part, [3] will be quite easy(just call 
> notifier list after increasing poisoned page).
> 
> 4, introduce memory recover VQ for memory balloon device, and registers 
> memory failure notifier list. During the guest kernel handles memory 
> failure, balloon device gets notified by memory failure notifier list, 
> and tells the host to recover the corrupted PFN(GPA) by the new VQ.

Most probably you might want to do that asynchronously, and once the
callback succeeds, un-poison the page.

> 
> 5, host side remaps the corrupted page(HVA), and tells the guest side to 
> unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA) 
> dynamically.

I think QEMU already does that during reboots. Now it would be triggered
by the guest for individual pages.

> 
> Because [4] and [5] are related to balloon device, also CC Michael, 
> David and Jason.

Doesn't sound too crazy for me, although it's a shame that we always
have to use virtio-balloon for such fairly balloon-unrelated things.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-05-06 16:28       ` David Hildenbrand
@ 2022-05-07  0:28         ` zhenwei pi
  2022-05-07  8:20           ` Naoya Horiguchi
  0 siblings, 1 reply; 14+ messages in thread
From: zhenwei pi @ 2022-05-07  0:28 UTC (permalink / raw)
  To: David Hildenbrand, Naoya Horiguchi
  Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, Wu Fengguang,
	Michael S. Tsirkin, Jason Wang, virtualization


On 5/7/22 00:28, David Hildenbrand wrote:
> On 06.05.22 15:38, zhenwei pi wrote:
>>
>>
>> On 5/6/22 16:59, Naoya Horiguchi wrote:
>>> On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
>>>> In the memory failure procedure, hwpoison_filter has higher priority,
>>>> if memory_filter() filters the error event, there is no need to do
>>>> the further work.
>>>
>>> Could you clarify what problem you are trying to solve (what does
>>> "optimize" mean in this context or what is the benefit)?
>>>
>>
>> OK. The background of this work:
>> As well known, the memory failure mechanism handles memory corrupted
>> event, and try to send SIGBUS to the user process which uses this
>> corrupted page.
>>
>> For the virtualization case, QEMU catches SIGBUS and tries to inject MCE
>> into the guest, and the guest handles memory failure again. Thus the
>> guest gets the minimal effect from hardware memory corruption.
>>
>> The further step I'm working on:
>> 1, try to modify code to decrease poisoned pages in a single place
>> (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).
>>
>> 2, try to use page_handle_poison() to handle SetPageHWPoison() and
>> num_poisoned_pages_inc() together. It would be best to call
>> num_poisoned_pages_inc() in a single place too. I'm not sure if this is
>> possible or not, please correct me if I misunderstand.
>>
>> 3, introduce memory failure notifier list in memory-failure.c: notify
>> the corrupted PFN to someone who registers this list.
>> If I can complete [1] and [2] part, [3] will be quite easy(just call
>> notifier list after increasing poisoned page).
>>
>> 4, introduce memory recover VQ for memory balloon device, and registers
>> memory failure notifier list. During the guest kernel handles memory
>> failure, balloon device gets notified by memory failure notifier list,
>> and tells the host to recover the corrupted PFN(GPA) by the new VQ.
> 
> Most probably you might want to do that asynchronously, and once the
> callback succeeds, un-poison the page.
> 
Yes!

>>
>> 5, host side remaps the corrupted page(HVA), and tells the guest side to
>> unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA)
>> dynamically.
> 
> I think QEMU already does that during reboots. Now it would be triggered
> by the guest for individual pages.
> 
Yes, currently QEMU supports to un-poison corrupted pages during 
reset/reboot. We can reuse some code to do the work in this case, this 
allows a VM to fix corrupted pages as soon as possible(also no need to 
reset/reboot).

>>
>> Because [4] and [5] are related to balloon device, also CC Michael,
>> David and Jason.
> 
> Doesn't sound too crazy for me, although it's a shame that we always
> have to use virtio-balloon for such fairly balloon-unrelated things.
> 
Thanks!

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-05-07  0:28         ` zhenwei pi
@ 2022-05-07  8:20           ` Naoya Horiguchi
  2022-05-07  9:19             ` zhenwei pi
  0 siblings, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2022-05-07  8:20 UTC (permalink / raw)
  To: zhenwei pi
  Cc: David Hildenbrand, akpm, naoya.horiguchi, linux-mm, linux-kernel,
	Wu Fengguang, Michael S. Tsirkin, Jason Wang, virtualization

On Sat, May 07, 2022 at 08:28:05AM +0800, zhenwei pi wrote:
> 
> On 5/7/22 00:28, David Hildenbrand wrote:
> > On 06.05.22 15:38, zhenwei pi wrote:
> > > 
> > > 
> > > On 5/6/22 16:59, Naoya Horiguchi wrote:
> > > > On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
> > > > > In the memory failure procedure, hwpoison_filter has higher priority,
> > > > > if memory_filter() filters the error event, there is no need to do
> > > > > the further work.
> > > > 
> > > > Could you clarify what problem you are trying to solve (what does
> > > > "optimize" mean in this context or what is the benefit)?
> > > > 
> > > 
> > > OK. The background of this work:
> > > As well known, the memory failure mechanism handles memory corrupted
> > > event, and try to send SIGBUS to the user process which uses this
> > > corrupted page.
> > > 
> > > For the virtualization case, QEMU catches SIGBUS and tries to inject MCE
> > > into the guest, and the guest handles memory failure again. Thus the
> > > guest gets the minimal effect from hardware memory corruption.
> > > 
> > > The further step I'm working on:
> > > 1, try to modify code to decrease poisoned pages in a single place
> > > (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).

This is fine to me.

> > > 
> > > 2, try to use page_handle_poison() to handle SetPageHWPoison() and
> > > num_poisoned_pages_inc() together. It would be best to call
> > > num_poisoned_pages_inc() in a single place too. I'm not sure if this is
> > > possible or not, please correct me if I misunderstand.

SetPageHWPoison() can be cancelled in memory_failure(), so simply bundling
it with num_poisoned_pages_inc() might not be optimal.  I think that
action_result() is supposed to be called when memory error handling is
effective (not filtered, not cancelled). So moving num_poisoned_pages_inc()
(and notification code in your plan) into this function might be good.

> > > 
> > > 3, introduce memory failure notifier list in memory-failure.c: notify
> > > the corrupted PFN to someone who registers this list.
> > > If I can complete [1] and [2] part, [3] will be quite easy(just call
> > > notifier list after increasing poisoned page).
> > > 
> > > 4, introduce memory recover VQ for memory balloon device, and registers
> > > memory failure notifier list. During the guest kernel handles memory
> > > failure, balloon device gets notified by memory failure notifier list,
> > > and tells the host to recover the corrupted PFN(GPA) by the new VQ.
> > 
> > Most probably you might want to do that asynchronously, and once the
> > callback succeeds, un-poison the page.
> > 
> Yes!
> 
> > > 
> > > 5, host side remaps the corrupted page(HVA), and tells the guest side to
> > > unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA)
> > > dynamically.
> > 
> > I think QEMU already does that during reboots. Now it would be triggered
> > by the guest for individual pages.
> > 
> Yes, currently QEMU supports to un-poison corrupted pages during
> reset/reboot. We can reuse some code to do the work in this case, this
> allows a VM to fix corrupted pages as soon as possible(also no need to
> reset/reboot).

So this finally allows to replace broken page mapped to guest with
a healthy page without rebooting the guest. That sounds helpful.

Thanks,
Naoya Horiguchi

> 
> > > 
> > > Because [4] and [5] are related to balloon device, also CC Michael,
> > > David and Jason.
> > 
> > Doesn't sound too crazy for me, although it's a shame that we always
> > have to use virtio-balloon for such fairly balloon-unrelated things.
> > 
> Thanks!
> 
> -- 
> zhenwei pi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
  2022-05-07  8:20           ` Naoya Horiguchi
@ 2022-05-07  9:19             ` zhenwei pi
  0 siblings, 0 replies; 14+ messages in thread
From: zhenwei pi @ 2022-05-07  9:19 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: David Hildenbrand, akpm, naoya.horiguchi, linux-mm, linux-kernel,
	Wu Fengguang, Michael S. Tsirkin, Jason Wang, virtualization

On 5/7/22 16:20, Naoya Horiguchi wrote:
> On Sat, May 07, 2022 at 08:28:05AM +0800, zhenwei pi wrote:
>>
>> On 5/7/22 00:28, David Hildenbrand wrote:
>>> On 06.05.22 15:38, zhenwei pi wrote:
>>>>
>>>>
>>>> On 5/6/22 16:59, Naoya Horiguchi wrote:
>>>>> On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
>>>>>> In the memory failure procedure, hwpoison_filter has higher priority,
>>>>>> if memory_filter() filters the error event, there is no need to do
>>>>>> the further work.
>>>>>
>>>>> Could you clarify what problem you are trying to solve (what does
>>>>> "optimize" mean in this context or what is the benefit)?
>>>>>
>>>>
>>>> OK. The background of this work:
>>>> As well known, the memory failure mechanism handles memory corrupted
>>>> event, and try to send SIGBUS to the user process which uses this
>>>> corrupted page.
>>>>
>>>> For the virtualization case, QEMU catches SIGBUS and tries to inject MCE
>>>> into the guest, and the guest handles memory failure again. Thus the
>>>> guest gets the minimal effect from hardware memory corruption.
>>>>
>>>> The further step I'm working on:
>>>> 1, try to modify code to decrease poisoned pages in a single place
>>>> (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).
> 
> This is fine to me.
> 
>>>>
>>>> 2, try to use page_handle_poison() to handle SetPageHWPoison() and
>>>> num_poisoned_pages_inc() together. It would be best to call
>>>> num_poisoned_pages_inc() in a single place too. I'm not sure if this is
>>>> possible or not, please correct me if I misunderstand.
> 
> SetPageHWPoison() can be cancelled in memory_failure(), so simply bundling
> it with num_poisoned_pages_inc() might not be optimal.  I think that
> action_result() is supposed to be called when memory error handling is
> effective (not filtered, not cancelled). So moving num_poisoned_pages_inc()
> (and notification code in your plan) into this function might be good.
> 
OK, I'll remove this patch(mm/memofy-failure.c: optimize 
hwpoison_filter) from this series, and fix the other 3 patches in the v2 
version. Then try to implement/test as your suggestion in another series.

>>>>
>>>> 3, introduce memory failure notifier list in memory-failure.c: notify
>>>> the corrupted PFN to someone who registers this list.
>>>> If I can complete [1] and [2] part, [3] will be quite easy(just call
>>>> notifier list after increasing poisoned page).
>>>>
>>>> 4, introduce memory recover VQ for memory balloon device, and registers
>>>> memory failure notifier list. During the guest kernel handles memory
>>>> failure, balloon device gets notified by memory failure notifier list,
>>>> and tells the host to recover the corrupted PFN(GPA) by the new VQ.
>>>
>>> Most probably you might want to do that asynchronously, and once the
>>> callback succeeds, un-poison the page.
>>>
>> Yes!
>>
>>>>
>>>> 5, host side remaps the corrupted page(HVA), and tells the guest side to
>>>> unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA)
>>>> dynamically.
>>>
>>> I think QEMU already does that during reboots. Now it would be triggered
>>> by the guest for individual pages.
>>>
>> Yes, currently QEMU supports to un-poison corrupted pages during
>> reset/reboot. We can reuse some code to do the work in this case, this
>> allows a VM to fix corrupted pages as soon as possible(also no need to
>> reset/reboot).
> 
> So this finally allows to replace broken page mapped to guest with
> a healthy page without rebooting the guest. That sounds helpful.
> 
> Thanks,
> Naoya Horiguchi
> 
Yes, it's my plan. Thanks for your suggestions!

>>
>>>>
>>>> Because [4] and [5] are related to balloon device, also CC Michael,
>>>> David and Jason.
>>>
>>> Doesn't sound too crazy for me, although it's a shame that we always
>>> have to use virtio-balloon for such fairly balloon-unrelated things.
>>>
>> Thanks!
>>
>> -- 
>> zhenwei pi

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-07  9:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
2022-05-06  8:55   ` Naoya Horiguchi
2022-04-29 14:22 ` [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec zhenwei pi
2022-05-06  8:55   ` Naoya Horiguchi
2022-04-29 14:22 ` [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter zhenwei pi
2022-05-06  8:59   ` Naoya Horiguchi
2022-05-06 13:38     ` zhenwei pi
2022-05-06 16:28       ` David Hildenbrand
2022-05-07  0:28         ` zhenwei pi
2022-05-07  8:20           ` Naoya Horiguchi
2022-05-07  9:19             ` zhenwei pi
2022-04-29 14:22 ` [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline zhenwei pi
2022-05-06  8:59   ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).