linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Enhance soft hwpoison handling and injection
@ 2024-05-10  6:25 Jane Chu
  2024-05-10  6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:25 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

This series aim at the following enhancement -
- Let one hwpoison injector, that is, madvise(MADV_HWPOISON) to behave
  more like as if a real UE occurred. Because the other two injectors
  such as hwpoison-inject and the 'einj' on x86 can't, and it seems to
  me we need a better simulation to real UE scenario.
- For years, if the kernel is unable to unmap a hwpoisoned page, it send
  a SIGKILL instead of SIGBUS to prevent user process from potentially
  accessing the page again. But in doing so, the user process also lose
  important information: vaddr, for recovery.  Fortunately, the kernel
  already has code to kill process re-accessing a hwpoisoned page, so
  remove the '!unmap_success' check.
- Right now, if a thp page under GUP longterm pin is hwpoisoned, and
  kernel cannot split the thp page, memory-failure simply ignores
  the UE and returns.  That's not ideal, it could deliver a SIGBUS with
  useful information for userspace recovery.

Changes in v2:
  - rebased to mm-stable as of 5/8/2024
  - added RB by Oscar Salvador
  - comments from Oscar on patch 1-of-3: clarify changelog
  - comments from Miahe Lin on patch 3-of-3: remove unnecessary user page
    checking and remove incorrect put_page() in kill_procs_now().
    Invoke kill_procs_now() regardless MF_ACTIN_REQUIRED is set or not,
    moved hwpoison_filter() higher up.
  - added two patches 3-of-5 and 4-of-5 

Jane Chu (5):
  mm/memory-failure: try to send SIGBUS even if unmap failed
  mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  mm/memory-failure: improve memory failure action_result messages
  mm/memory-failure: move hwpoison_filter() higher up
  mm/memory-failure: send SIGBUS in the event of thp split fail

 include/linux/mm.h      |  2 +
 include/ras/ras_event.h |  2 +
 mm/madvise.c            |  2 +-
 mm/memory-failure.c     | 89 +++++++++++++++++++++++++++--------------
 4 files changed, 65 insertions(+), 30 deletions(-)

-- 
2.39.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
@ 2024-05-10  6:25 ` Jane Chu
  2024-05-11  7:01   ` Miaohe Lin
  2024-05-10  6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:25 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

For years when it comes down to kill a process due to hwpoison,
a SIGBUS is delivered only if unmap has been successful.
Otherwise, a SIGKILL is delivered. And the reason for that is
to prevent the involved process from accessing the hwpoisoned
page again.

Since then a lot has changed, a hwpoisoned page is marked and
upon being re-accessed, the memory-failure handler invokes
kill_accessing_process() to kill the process immediately.
So let's take out the '!unmap_success' factor and try to deliver
SIGBUS if possible.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 16ada4fb02b7..739311e121af 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -514,22 +514,15 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
  *
  * Only do anything when FORCEKILL is set, otherwise just free the
  * list (this is used for clean pages which do not need killing)
- * Also when FAIL is set do a force kill because something went
- * wrong earlier.
  */
-static void kill_procs(struct list_head *to_kill, int forcekill, bool fail,
+static void kill_procs(struct list_head *to_kill, int forcekill,
 		unsigned long pfn, int flags)
 {
 	struct to_kill *tk, *next;
 
 	list_for_each_entry_safe(tk, next, to_kill, nd) {
 		if (forcekill) {
-			/*
-			 * In case something went wrong with munmapping
-			 * make sure the process doesn't catch the
-			 * signal and then access the memory. Just kill it.
-			 */
-			if (fail || tk->addr == -EFAULT) {
+			if (tk->addr == -EFAULT) {
 				pr_err("%#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
 				       pfn, tk->tsk->comm, tk->tsk->pid);
 				do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
@@ -1660,7 +1653,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
 	 */
 	forcekill = folio_test_dirty(folio) || (flags & MF_MUST_KILL) ||
 		    !unmap_success;
-	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
+	kill_procs(&tokill, forcekill, pfn, flags);
 
 	return unmap_success;
 }
@@ -1724,7 +1717,7 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
 		unmap_mapping_range(mapping, start, size, 0);
 	}
 
-	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
+	kill_procs(to_kill, flags & MF_MUST_KILL, pfn, flags);
 }
 
 /*
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
  2024-05-10  6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
@ 2024-05-10  6:25 ` Jane Chu
  2024-05-15 11:44   ` Oscar Salvador
  2024-05-10  6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:25 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
a synchrous way in a sense, the injector is also a process under
test, and should it have the poisoned page mapped in its address
space, it should legitimately get killed as much as in a real UE
situation.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c8ba3f3eb54d..d8a01d7b2860 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1147,7 +1147,7 @@ static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
-			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
+			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
 			if (ret == -EOPNOTSUPP)
 				ret = 0;
 		}
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages
  2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
  2024-05-10  6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
  2024-05-10  6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
@ 2024-05-10  6:26 ` Jane Chu
  2024-05-16  9:46   ` Oscar Salvador
  2024-05-10  6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
  2024-05-10  6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
  4 siblings, 1 reply; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:26 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
Attemped to document the definition of various action names, and made a few
adjustment to the action_result() calls.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 include/linux/mm.h      |  2 ++
 include/ras/ras_event.h |  2 ++
 mm/memory-failure.c     | 28 +++++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..b4598c6a393a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4111,6 +4111,7 @@ enum mf_action_page_type {
 	MF_MSG_DIFFERENT_COMPOUND,
 	MF_MSG_HUGE,
 	MF_MSG_FREE_HUGE,
+	MF_MSG_GET_HWPOISON,
 	MF_MSG_UNMAP_FAILED,
 	MF_MSG_DIRTY_SWAPCACHE,
 	MF_MSG_CLEAN_SWAPCACHE,
@@ -4124,6 +4125,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
+	MF_MSG_ALREADY_POISONED,
 	MF_MSG_UNKNOWN,
 };
 
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index c011ea236e9b..b3f6832a94fe 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -360,6 +360,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
+	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
 	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
 	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
 	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
@@ -373,6 +374,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_BUDDY, "free buddy page" )				\
 	EM ( MF_MSG_DAX, "dax page" )					\
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 739311e121af..62133c10fb51 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -879,6 +879,20 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 	return ret > 0 ? -EHWPOISON : -EFAULT;
 }
 
+/*
+ * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
+ *   something, then caught in a race condition which renders the effort sort
+ *   of moot. This is perhaps the most sever.
+ * MF_FAILED - The m-f() handler might have killed the process, but it can't
+ *   do much to the poisoned page other than marking it poisoned, due to
+ *   conditions such as extra pin, unmap failure, etc.
+ * MF_DELAYED - The poisoned page has been unmapped but not completely isolated,
+ *   such as page might remain in LRU. But an attempt by userspace process
+ *   to access the page again will hit page fault which will kill the process.
+ * MF_RECOVERED - The poisoned page has been completely isolated, via unmap,
+ *   taking the page out of the buddy system, or hole punching out of the
+ *   mapping.
+ */
 static const char *action_name[] = {
 	[MF_IGNORED] = "Ignored",
 	[MF_FAILED] = "Failed",
@@ -893,6 +907,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_DIFFERENT_COMPOUND]	= "different compound page after locking",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
+	[MF_MSG_GET_HWPOISON]		= "get hwpoison page",
 	[MF_MSG_UNMAP_FAILED]		= "unmapping failed page",
 	[MF_MSG_DIRTY_SWAPCACHE]	= "dirty swapcache page",
 	[MF_MSG_CLEAN_SWAPCACHE]	= "clean swapcache page",
@@ -906,6 +921,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_ALREADY_POISONED]	= "already poisoned",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
 {
 	pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
 	unlock_page(p);
-	return MF_FAILED;
+	return MF_IGNORED;
 }
 
 /*
@@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		if (flags & MF_ACTION_REQUIRED) {
 			folio = page_folio(p);
 			res = kill_accessing_process(current, folio_pfn(folio), flags);
+			return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
 		}
 		return res;
 	} else if (res == -EBUSY) {
@@ -2062,7 +2079,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 			flags |= MF_NO_RETRY;
 			goto retry;
 		}
-		return action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
+		return action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 	}
 
 	folio = page_folio(p);
@@ -2097,7 +2114,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 
 	if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
 		folio_unlock(folio);
-		return action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
+		return action_result(pfn, MF_MSG_UNMAP_FAILED, MF_FAILED);
 	}
 
 	return identify_page_state(pfn, p, page_flags);
@@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)
 			res = kill_accessing_process(current, pfn, flags);
 		if (flags & MF_COUNT_INCREASED)
 			put_page(p);
+		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
 		goto unlock_mutex;
 	}
 
@@ -2267,7 +2285,7 @@ int memory_failure(unsigned long pfn, int flags)
 			}
 			goto unlock_mutex;
 		} else if (res < 0) {
-			res = action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
+			res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 			goto unlock_mutex;
 		}
 	}
@@ -2363,7 +2381,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * Abort on fail: __filemap_remove_folio() assumes unmapped page.
 	 */
 	if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
-		res = action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
+		res = action_result(pfn, MF_MSG_UNMAP_FAILED, MF_FAILED);
 		goto unlock_page;
 	}
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up
  2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
                   ` (2 preceding siblings ...)
  2024-05-10  6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
@ 2024-05-10  6:26 ` Jane Chu
  2024-05-11  8:29   ` Miaohe Lin
  2024-05-16 10:11   ` Oscar Salvador
  2024-05-10  6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
  4 siblings, 2 replies; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:26 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

Move hwpoison_filter() higher up as there is no need to spend a lot
cycles only to find out later that the page is supposed to be skipped
for hwpoison handling.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/memory-failure.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 62133c10fb51..2fa884d8b5a3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2236,6 +2236,13 @@ int memory_failure(unsigned long pfn, int flags)
 		goto unlock_mutex;
 	}
 
+	if (hwpoison_filter(p)) {
+		if (flags & MF_COUNT_INCREASED)
+			put_page(p);
+		res = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+
 try_again:
 	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
 	if (hugetlb)
@@ -2354,14 +2361,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	page_flags = folio->flags;
 
-	if (hwpoison_filter(p)) {
-		ClearPageHWPoison(p);
-		folio_unlock(folio);
-		folio_put(folio);
-		res = -EOPNOTSUPP;
-		goto unlock_mutex;
-	}
-
 	/*
 	 * __munlock_folio() may clear a writeback folio's LRU flag without
 	 * the folio lock. We need to wait for writeback completion for this
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
                   ` (3 preceding siblings ...)
  2024-05-10  6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
@ 2024-05-10  6:26 ` Jane Chu
  2024-05-16 12:47   ` Oscar Salvador
  4 siblings, 1 reply; 15+ messages in thread
From: Jane Chu @ 2024-05-10  6:26 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

When handle hwpoison in a RDMA longterm pinned thp page,
try_to_split_thp_page() will fail. And at this point, there is
little else the kernel could do except sending a SIGBUS to
the user process, thus give it a chance to recover.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/memory-failure.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2fa884d8b5a3..15bb1c0c42e8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p,
 	return page_action(ps, p, pfn);
 }
 
-static int try_to_split_thp_page(struct page *page)
+static int try_to_split_thp_page(struct page *page, bool release)
 {
 	int ret;
 
@@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page)
 	ret = split_huge_page(page);
 	unlock_page(page);
 
-	if (unlikely(ret))
+	if (ret && release)
 		put_page(page);
 
 	return ret;
@@ -2177,6 +2177,24 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+/*
+ * The calling condition is as such: thp split failed, page might have
+ * been RDMA pinned, not much can be done for recovery.
+ * But a SIGBUS should be delivered with vaddr provided so that the user
+ * application has a chance to recover. Also, application processes'
+ * election for MCE early killed will be honored.
+ */
+static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
+				struct folio *folio)
+{
+	LIST_HEAD(tokill);
+
+	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
+	kill_procs(&tokill, true, pfn, flags);
+
+	return -EHWPOISON;
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags)
 		 * page is a valid handlable page.
 		 */
 		folio_set_has_hwpoisoned(folio);
-		if (try_to_split_thp_page(p) < 0) {
-			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
+		if (try_to_split_thp_page(p, false) < 0) {
+			pr_err("%#lx: thp split failed\n", pfn);
+			res = kill_procs_now(p, pfn, flags, folio);
+			put_page(p);
+			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
 			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
@@ -2688,7 +2709,7 @@ static int soft_offline_in_use_page(struct page *page)
 	};
 
 	if (!huge && folio_test_large(folio)) {
-		if (try_to_split_thp_page(page)) {
+		if (try_to_split_thp_page(page, true)) {
 			pr_info("soft offline: %#lx: thp split failed\n", pfn);
 			return -EBUSY;
 		}
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-10  6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
@ 2024-05-11  7:01   ` Miaohe Lin
  0 siblings, 0 replies; 15+ messages in thread
From: Miaohe Lin @ 2024-05-11  7:01 UTC (permalink / raw)
  To: Jane Chu, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

On 2024/5/10 14:25, Jane Chu wrote:
> For years when it comes down to kill a process due to hwpoison,
> a SIGBUS is delivered only if unmap has been successful.
> Otherwise, a SIGKILL is delivered. And the reason for that is
> to prevent the involved process from accessing the hwpoisoned
> page again.
> 
> Since then a lot has changed, a hwpoisoned page is marked and
> upon being re-accessed, the memory-failure handler invokes
> kill_accessing_process() to kill the process immediately.
> So let's take out the '!unmap_success' factor and try to deliver
> SIGBUS if possible.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

LGTM. Thanks.
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up
  2024-05-10  6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
@ 2024-05-11  8:29   ` Miaohe Lin
  2024-05-20 18:15     ` Jane Chu
  2024-05-16 10:11   ` Oscar Salvador
  1 sibling, 1 reply; 15+ messages in thread
From: Miaohe Lin @ 2024-05-11  8:29 UTC (permalink / raw)
  To: Jane Chu, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

On 2024/5/10 14:26, Jane Chu wrote:
> Move hwpoison_filter() higher up as there is no need to spend a lot
> cycles only to find out later that the page is supposed to be skipped
> for hwpoison handling.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/memory-failure.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 62133c10fb51..2fa884d8b5a3 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2236,6 +2236,13 @@ int memory_failure(unsigned long pfn, int flags)
>  		goto unlock_mutex;
>  	}
>  
> +	if (hwpoison_filter(p)) {
> +		if (flags & MF_COUNT_INCREASED)
> +			put_page(p);
> +		res = -EOPNOTSUPP;
> +		goto unlock_mutex;
> +	}

It might not be a good idea to do hwpoison_filter() here. We don't hold extra page refcnt
yet, so the page state will be really unstable. Or am I miss something?
Thanks.
.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-10  6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
@ 2024-05-15 11:44   ` Oscar Salvador
  0 siblings, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2024-05-15 11:44 UTC (permalink / raw)
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Fri, May 10, 2024 at 12:25:59AM -0600, Jane Chu wrote:
> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
> a synchrous way in a sense, the injector is also a process under
> test, and should it have the poisoned page mapped in its address
> space, it should legitimately get killed as much as in a real UE
> situation.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>

You should also mention that this aligns with what the madvise man page
mentions:

"... This operation may result in the calling process receiving a SIGBUS
and the page being unmapped..."

Reviewed-by: Oscar Salvador <oalvador@suse.de>

> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c8ba3f3eb54d..d8a01d7b2860 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1147,7 +1147,7 @@ static int madvise_inject_error(int behavior,
>  		} else {
>  			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				 pfn, start);
> -			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
> +			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
>  			if (ret == -EOPNOTSUPP)
>  				ret = 0;
>  		}
> -- 
> 2.39.3
> 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages
  2024-05-10  6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
@ 2024-05-16  9:46   ` Oscar Salvador
  2024-05-20 18:30     ` Jane Chu
  0 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2024-05-16  9:46 UTC (permalink / raw)
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Fri, May 10, 2024 at 12:26:00AM -0600, Jane Chu wrote:
> Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
> Attemped to document the definition of various action names, and made a few
> adjustment to the action_result() calls.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
...  
> +/*
> + * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
"or if it "
> + *   something, then caught in a race condition which renders the effort sort
"it was caught"

I would also add to MF_IGNORED that we mark the page hwpoisoned anyway.
  
> @@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
>  {
>  	pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
>  	unlock_page(p);
> -	return MF_FAILED;
> +	return MF_IGNORED;
>  }

I was confused because I saw you replaced all MF_MSG_UNKNOWN, so I
wondered how we can end up here until I saw this is a catch-all in case
we fail to make sense of the page->flags.
While you are improving this, I would suggest to add a little comment
above the function explaining how we can reach it.

>  /*
> @@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  		if (flags & MF_ACTION_REQUIRED) {
>  			folio = page_folio(p);
>  			res = kill_accessing_process(current, folio_pfn(folio), flags);
> +			return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);

I do not understand why are you doing this.

First of all, why is this considered a failure? We did not fail this
time, did we? We went right ahead and kill the process which was
re-accessing the hwpoisoned page. Is that considered a failure?

Second, you are know supressing -EHWPOISON with whatever action_result()
will gives us, which judging from the actual code would be -EBUSY?
I do not think that that is right, and we should be returning
-EHWPOISON. Or whatever error code kill_accessing_process() gives us.


> @@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)
>  			res = kill_accessing_process(current, pfn, flags);
>  		if (flags & MF_COUNT_INCREASED)
>  			put_page(p);
> +		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);

This is not coherent with what you did in try_memory_failure_hugetlb
for MF_MSG_ALREADY_POISONED, I __think__ that in there we should be
doing the same as we do here.

 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up
  2024-05-10  6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
  2024-05-11  8:29   ` Miaohe Lin
@ 2024-05-16 10:11   ` Oscar Salvador
  1 sibling, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2024-05-16 10:11 UTC (permalink / raw)
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Fri, May 10, 2024 at 12:26:01AM -0600, Jane Chu wrote:
> Move hwpoison_filter() higher up as there is no need to spend a lot
> cycles only to find out later that the page is supposed to be skipped
> for hwpoison handling.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/memory-failure.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 62133c10fb51..2fa884d8b5a3 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2236,6 +2236,13 @@ int memory_failure(unsigned long pfn, int flags)
>  		goto unlock_mutex;
>  	}
>  
> +	if (hwpoison_filter(p)) {
> +		if (flags & MF_COUNT_INCREASED)
> +			put_page(p);
> +		res = -EOPNOTSUPP;
> +		goto unlock_mutex;
> +	}

Now, it is true that doing this might not be optimal for the reasons
explained by Miaohe, but the whole hwpoison_filter() thing is only used
by the hwpoison-inject code AFAICS, which is just for testing purposes,
so I do not think there is any harm in lifting the check.

But no real strong opinion here.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-10  6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
@ 2024-05-16 12:47   ` Oscar Salvador
  2024-05-20 18:48     ` Jane Chu
  0 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2024-05-16 12:47 UTC (permalink / raw)
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote:
> When handle hwpoison in a RDMA longterm pinned thp page,
> try_to_split_thp_page() will fail. And at this point, there is
> little else the kernel could do except sending a SIGBUS to
> the user process, thus give it a chance to recover.

Well, it does need to be a RDMA longterm pinned, right?
Anything holding an extra refcount can already make us bite the dust, so
I would not make it that specific.


> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/memory-failure.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2fa884d8b5a3..15bb1c0c42e8 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  	return page_action(ps, p, pfn);
>  }
>  
> -static int try_to_split_thp_page(struct page *page)
> +static int try_to_split_thp_page(struct page *page, bool release)
>  {
>  	int ret;
>  
> @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page)
>  	ret = split_huge_page(page);
>  	unlock_page(page);
>  
> -	if (unlikely(ret))
> +	if (ret && release)
>  		put_page(page);

I would document whhen and when not we can release the page.
E.g: we cannot release it if there are still processes mapping the thp.


> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
> +				struct folio *folio)
> +{
> +	LIST_HEAD(tokill);
> +
> +	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> +	kill_procs(&tokill, true, pfn, flags);
> +
> +	return -EHWPOISON;

You are returning -EHWPOISON here,

> +}
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		 * page is a valid handlable page.
>  		 */
>  		folio_set_has_hwpoisoned(folio);
> -		if (try_to_split_thp_page(p) < 0) {
> -			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> +		if (try_to_split_thp_page(p, false) < 0) {
> +			pr_err("%#lx: thp split failed\n", pfn);
> +			res = kill_procs_now(p, pfn, flags, folio);
> +			put_page(p);
> +			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);

just to overwrite it here with action_result(). Which one do we need?
I think we would need -EBUSY here, right? So I would drop the retcode
from kill_procs_now.

Also, do we want the extra pr_err() here.
action_result() will already provide us the pfn and the
action_page_types which will be "unsplit thp". Is not that clear enough?

I would drop that.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up
  2024-05-11  8:29   ` Miaohe Lin
@ 2024-05-20 18:15     ` Jane Chu
  0 siblings, 0 replies; 15+ messages in thread
From: Jane Chu @ 2024-05-20 18:15 UTC (permalink / raw)
  To: Miaohe Lin, nao.horiguchi, akpm, osalvador, linux-mm, linux-kernel

On 5/11/2024 1:29 AM, Miaohe Lin wrote:

> On 2024/5/10 14:26, Jane Chu wrote:
>> Move hwpoison_filter() higher up as there is no need to spend a lot
>> cycles only to find out later that the page is supposed to be skipped
>> for hwpoison handling.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/memory-failure.c | 15 +++++++--------
>>   1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 62133c10fb51..2fa884d8b5a3 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2236,6 +2236,13 @@ int memory_failure(unsigned long pfn, int flags)
>>   		goto unlock_mutex;
>>   	}
>>   
>> +	if (hwpoison_filter(p)) {
>> +		if (flags & MF_COUNT_INCREASED)
>> +			put_page(p);
>> +		res = -EOPNOTSUPP;
>> +		goto unlock_mutex;
>> +	}
> It might not be a good idea to do hwpoison_filter() here. We don't hold extra page refcnt
> yet, so the page state will be really unstable. Or am I miss something?

I agree with you.

It  looks like hwpoison_filter_flags() in particular needs a stable page 
in order to retrieve

a wholesome KPF_ flags set that at any time, although the flags could 
change immediately

afterwards, they won't be torn flags. For that, it looks like the folio 
should be locked as well.

thanks!

-jane

> Thanks.
> .

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages
  2024-05-16  9:46   ` Oscar Salvador
@ 2024-05-20 18:30     ` Jane Chu
  0 siblings, 0 replies; 15+ messages in thread
From: Jane Chu @ 2024-05-20 18:30 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/16/2024 2:46 AM, Oscar Salvador wrote:

> On Fri, May 10, 2024 at 12:26:00AM -0600, Jane Chu wrote:
>> Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
>> Attemped to document the definition of various action names, and made a few
>> adjustment to the action_result() calls.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ...
>> +/*
>> + * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
> "or if it"
>> + *   something, then caught in a race condition which renders the effort sort
> "it was caught"
>
> I would also add to MF_IGNORED that we mark the page hwpoisoned anyway.
Thanks!  Will fix, and add comments.
>    
>> @@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
>>   {
>>   	pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
>>   	unlock_page(p);
>> -	return MF_FAILED;
>> +	return MF_IGNORED;
>>   }
> I was confused because I saw you replaced all MF_MSG_UNKNOWN, so I
> wondered how we can end up here until I saw this is a catch-all in case
> we fail to make sense of the page->flags.
> While you are improving this, I would suggest to add a little comment
> above the function explaining how we can reach it.
Yes, it's a catch-all versus something that suppose to happen, will add 
comments.
>
>>   /*
>> @@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>>   		if (flags & MF_ACTION_REQUIRED) {
>>   			folio = page_folio(p);
>>   			res = kill_accessing_process(current, folio_pfn(folio), flags);
>> +			return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> I do not understand why are you doing this.
>
> First of all, why is this considered a failure? We did not fail this
> time, did we? We went right ahead and kill the process which was
> re-accessing the hwpoisoned page. Is that considered a failure?

Given that the goal for the m-f() handler is to isolate the poisoned 
page, so in this case,

even if the kernel has killed the page, nothing else could be done to 
prevent the page from

being accessed and triggering another MCE, which is, I suppose more 
sever than triggering a page fault.

>
> Second, you are know supressing -EHWPOISON with whatever action_result()
> will gives us, which judging from the actual code would be -EBUSY?
> I do not think that that is right, and we should be returning
> -EHWPOISON. Or whatever error code kill_accessing_process() gives us.
>
>
>> @@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)
>>   			res = kill_accessing_process(current, pfn, flags);
>>   		if (flags & MF_COUNT_INCREASED)
>>   			put_page(p);
>> +		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> This is not coherent with what you did in try_memory_failure_hugetlb
> for MF_MSG_ALREADY_POISONED, I __think__ that in there we should be
> doing the same as we do here.
>
>   

Good catch, thanks.  In both cases, 'res' from kill_accessing_process() 
should be returned.

Thanks!

-jane

>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-16 12:47   ` Oscar Salvador
@ 2024-05-20 18:48     ` Jane Chu
  0 siblings, 0 replies; 15+ messages in thread
From: Jane Chu @ 2024-05-20 18:48 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/16/2024 5:47 AM, Oscar Salvador wrote:

> On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote:
>> When handle hwpoison in a RDMA longterm pinned thp page,
>> try_to_split_thp_page() will fail. And at this point, there is
>> little else the kernel could do except sending a SIGBUS to
>> the user process, thus give it a chance to recover.
> Well, it does need to be a RDMA longterm pinned, right?
> Anything holding an extra refcount can already make us bite the dust, so
> I would not make it that specific.

How about let me just mention RDMA longterm pin as one of the use cases?

To be honest, it is the only known case to me, and not all FOLL_LONGTERM 
pin

lead to THP split failure.

>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/memory-failure.c | 31 ++++++++++++++++++++++++++-----
>>   1 file changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 2fa884d8b5a3..15bb1c0c42e8 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>   	return page_action(ps, p, pfn);
>>   }
>>   
>> -static int try_to_split_thp_page(struct page *page)
>> +static int try_to_split_thp_page(struct page *page, bool release)
>>   {
>>   	int ret;
>>   
>> @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page)
>>   	ret = split_huge_page(page);
>>   	unlock_page(page);
>>   
>> -	if (unlikely(ret))
>> +	if (ret && release)
>>   		put_page(page);
> I would document whhen and when not we can release the page.
> E.g: we cannot release it if there are still processes mapping the thp.
Sure.
>
>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>> +				struct folio *folio)
>> +{
>> +	LIST_HEAD(tokill);
>> +
>> +	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>> +	kill_procs(&tokill, true, pfn, flags);
>> +
>> +	return -EHWPOISON;
> You are returning -EHWPOISON here,

Yes, indeed.

>> +}
>> +
>>   /**
>>    * memory_failure - Handle memory failure of a page.
>>    * @pfn: Page Number of the corrupted page
>> @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		 * page is a valid handlable page.
>>   		 */
>>   		folio_set_has_hwpoisoned(folio);
>> -		if (try_to_split_thp_page(p) < 0) {
>> -			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>> +		if (try_to_split_thp_page(p, false) < 0) {
>> +			pr_err("%#lx: thp split failed\n", pfn);
>> +			res = kill_procs_now(p, pfn, flags, folio);
>> +			put_page(p);
>> +			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
> just to overwrite it here with action_result(). Which one do we need?
> I think we would need -EBUSY here, right? So I would drop the retcode
> from kill_procs_now.

The overwrite was wrong, it should return -EHWPOISON to indicate to the 
caller (such as kill_me_maybe)

that no further action against the process is needed since the m-f() 
handler has killed the process.

>
> Also, do we want the extra pr_err() here.
> action_result() will already provide us the pfn and the
> action_page_types which will be "unsplit thp". Is not that clear enough?
>
> I would drop that.

Agreed, will drop the extra print.

thanks!

-jane

>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-20 18:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-10  6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
2024-05-10  6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
2024-05-11  7:01   ` Miaohe Lin
2024-05-10  6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
2024-05-15 11:44   ` Oscar Salvador
2024-05-10  6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
2024-05-16  9:46   ` Oscar Salvador
2024-05-20 18:30     ` Jane Chu
2024-05-10  6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
2024-05-11  8:29   ` Miaohe Lin
2024-05-20 18:15     ` Jane Chu
2024-05-16 10:11   ` Oscar Salvador
2024-05-10  6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
2024-05-16 12:47   ` Oscar Salvador
2024-05-20 18:48     ` Jane Chu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).