linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page
@ 2022-06-14  4:38 zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  0 siblings, 2 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

v3 -> v4:
- Add debug entry "hwpoisoned-pages" to show the number of hwpoisoned
  pages.
- Disable unpoison when a real HW memory failure occurs.

v2 -> v3:
- David pointed out that virt_to_kpte() is broken (no pmd_large() test
  on a PMD), so drop this API from this patch and walk kmap instead.

v1 -> v2:
- this change gets protected by mf_mutex
- use -EOPNOTSUPP instead of -EPERM 

v1:
- check the KPTE to avoid unpoisoning a hardware-corrupted page

zhenwei pi (2):
  mm/memory-failure: introduce "hwpoisoned-pages" entry
  mm/memory-failure: disable unpoison once hw error happens

 Documentation/vm/hwpoison.rst |  7 ++++++-
 mm/hwpoison-inject.c          | 25 ++++++++++++++++++++++++-
 mm/memory-failure.c           |  1 +
 3 files changed, 31 insertions(+), 2 deletions(-)

-- 
2.20.1



* [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
@ 2022-06-14  4:38 ` zhenwei pi
  2022-06-14  5:12   ` Muchun Song
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  2022-06-14  4:38 ` [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  1 sibling, 2 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

Add a new debug entry to show the number of hwpoisoned pages. Also
use module_get/module_put to manage this kernel module, so that the
module cannot be removed unless hwpoisoned-pages is zero.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 Documentation/vm/hwpoison.rst |  4 ++++
 mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c742de1769d1..c832a8b192d4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -155,6 +155,10 @@ Testing
 	flag bits are defined in include/linux/kernel-page-flags.h and
 	documented in Documentation/admin-guide/mm/pagemap.rst
 
+  hwpoisoned-pages
+	The number of hwpoisoned pages. The hwpoison kernel module can not be
+	removed unless this count is zero.
+
 * Architecture specific MCE injector
 
   x86 has mce-inject, mce-test
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 5c0cddd81505..9e522ecedeef 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -10,6 +10,7 @@
 #include "internal.h"
 
 static struct dentry *hwpoison_dir;
+static atomic_t hwpoisoned_pages;
 
 static int hwpoison_inject(void *data, u64 val)
 {
@@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
 	err = memory_failure(pfn, 0);
+	if (!err) {
+		WARN_ON(!try_module_get(THIS_MODULE));
+		atomic_inc(&hwpoisoned_pages);
+	}
+
 	return (err == -EOPNOTSUPP) ? 0 : err;
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
 {
+	int ret;
+
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	return unpoison_memory(val);
+	ret = unpoison_memory(val);
+	if (!ret) {
+		atomic_dec(&hwpoisoned_pages);
+		module_put(THIS_MODULE);
+	}
+
+	return ret;
 }
 
 DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
@@ -99,6 +113,9 @@ static int pfn_inject_init(void)
 	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
 			   &hwpoison_filter_flags_value);
 
+	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
+			   &hwpoisoned_pages);
+
 #ifdef CONFIG_MEMCG
 	debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
 			   &hwpoison_filter_memcg);
-- 
2.20.1



* [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens
  2022-06-14  4:38 [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
@ 2022-06-14  4:38 ` zhenwei pi
  1 sibling, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on an x86 platform once a hardware memory error occurs.

Unpoisoning a hardware-corrupted page only puts the page back into the
buddy allocator, so the kernel can later access the page through a
*NOT PRESENT* KPTE. That access triggers a BUG.

Do not allow unpoisoning a hardware-corrupted page in unpoison_memory(),
to avoid a BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30

As suggested by David and Naoya, disable the unpoison mechanism once a
real HW error happens.

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 Documentation/vm/hwpoison.rst | 3 ++-
 mm/hwpoison-inject.c          | 6 ++++++
 mm/memory-failure.c           | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c832a8b192d4..ac439381cad4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
   unpoison-pfn
 	Software-unpoison page at PFN echoed into this file. This way
 	a page can be reused again.  This only works for Linux
-	injected failures, not for real memory failures.
+	injected failures, not for real memory failures. Once any hardware
+	memory failure happens, the feature is disabled.
 
   Note these injection interfaces are not stable and might change between
   kernel versions
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 9e522ecedeef..787d2daf41e8 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -7,6 +7,7 @@
 #include <linux/swap.h>
 #include <linux/pagemap.h>
 #include <linux/hugetlb.h>
+#include <linux/swapops.h>
 #include "internal.h"
 
 static struct dentry *hwpoison_dir;
@@ -65,6 +66,11 @@ static int hwpoison_unpoison(void *data, u64 val)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (atomic_read(&hwpoisoned_pages) != atomic_long_read(&num_poisoned_pages)) {
+		pr_info("Unpoison is disabled after hardware memory failure happened\n");
+		return -EOPNOTSUPP;
+	}
+
 	ret = unpoison_memory(val);
 	if (!ret) {
 		atomic_dec(&hwpoisoned_pages);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..a3e6bd4b5528 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -68,6 +68,7 @@ int sysctl_memory_failure_early_kill __read_mostly = 0;
 int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
+EXPORT_SYMBOL_GPL(num_poisoned_pages);
 
 static bool __page_handle_poison(struct page *page)
 {
-- 
2.20.1



* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
@ 2022-06-14  5:12   ` Muchun Song
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 8+ messages in thread
From: Muchun Song @ 2022-06-14  5:12 UTC (permalink / raw)
  To: zhenwei pi
  Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, david, linmiaohe

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also
> use module_get/module_put to manage this kernel module, so that the
> module cannot be removed unless hwpoisoned-pages is zero.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>  
> +  hwpoisoned-pages

This looks a bit weird to me. IIUC, this is the number of **software**
poisoned pages rather than **hardware** ones, so the prefix "hw" may not
be suitable.  How about "poisoned-pages" (a little simplified),
"poisoned-pfns" (keeps the name consistent with "corrupt-pfn" and
"unpoison-pfn") or "swpoisoned-pages" (sw means software)?

> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>  
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>  
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>  
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));

__module_get() is enough since we already hold a refcount at open time.
This WARN_ON() will not be triggered unless something unexpected happens.

> +		atomic_inc(&hwpoisoned_pages);
> +	}
> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>  
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>  
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +			   &hwpoisoned_pages);
> +
>  #ifdef CONFIG_MEMCG
>  	debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
>  			   &hwpoison_filter_memcg);
> -- 
> 2.20.1
> 
> 


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
  2022-06-14  5:12   ` Muchun Song
@ 2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  2022-06-14  7:13     ` David Hildenbrand
  1 sibling, 1 reply; 8+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-14  7:09 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel, david, linmiaohe

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also
> use module_get/module_put to manage this kernel module, so that the
> module cannot be removed unless hwpoisoned-pages is zero.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>  
> +  hwpoisoned-pages
> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>  
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>  
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>  
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));
> +		atomic_inc(&hwpoisoned_pages);
> +	}

There are a few other interfaces that generate a "software-simulated memory
error" event, i.e. madvise_inject_error() and hard_offline_page_store(), so
you need to handle those code paths as well.
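
To make the concern concrete, here is a minimal userspace model (not kernel
code). It assumes, as I read this series, that only the debugfs injector bumps
the module-local counter while every injection path bumps the global one. All
model_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Models the global num_poisoned_pages counter. */
static long model_num_poisoned_pages;
/* Models the module-local hwpoisoned_pages counter added by this patch. */
static int model_hwpoisoned_pages;

/* Injection through the hwpoison-inject debugfs file: both counters move. */
static void model_inject_via_debugfs(void)
{
	model_num_poisoned_pages++;
	model_hwpoisoned_pages++;
}

/* Software injection through another path: only the global counter moves. */
static void model_inject_via_other_path(void)
{
	model_num_poisoned_pages++;
}

/* Models the counter comparison that patch 2/2 adds before unpoisoning. */
static int model_unpoison_check(void)
{
	if (model_hwpoisoned_pages != model_num_poisoned_pages)
		return -EOPNOTSUPP;
	return 0;
}

/* Walks through the scenario described in the review comment. */
static void model_counter_demo(void)
{
	model_inject_via_debugfs();
	assert(model_unpoison_check() == 0);

	/* No real hardware error happened, yet unpoison is now refused. */
	model_inject_via_other_path();
	assert(model_unpoison_check() == -EOPNOTSUPP);
}
```

In this model a purely software-injected error from a second path is
indistinguishable from a real hardware error, which is why the other callers
need to be covered too.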

> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>  
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>  
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +			   &hwpoisoned_pages);

I'm not sure how useful this interface is from userspace (controlling a test
process with this?).  Do we really need to expose this to userspace?


TBH I feel that another approach like below is more desirable:

  - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
  - set the flag when calling memory_failure() from the three callers
    mentioned above,
  - define a global variable (typed bool) in mm/memory-failure.c to show that
    the system has experienced a real hardware memory error event,
  - once memory_failure() is called without MF_SW_SIMULATED, the new global
    bool variable is set, and afterward unpoison_memory() always fails with
    -EOPNOTSUPP.
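
The steps above can be sketched as a small userspace model. This is only an
illustration of the latch-once behaviour, not a kernel patch: the flag value,
the variable name hw_memory_failure and the model_* functions are all
assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag name from the proposal; the bit value here is arbitrary. */
enum model_mf_flags {
	MF_SW_SIMULATED = 1 << 0,
};

/* Set once a failure arrives without MF_SW_SIMULATED; never cleared. */
static bool hw_memory_failure;

/* Stand-in for memory_failure(): only records where the event came from. */
static int model_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	if (!(flags & MF_SW_SIMULATED))
		hw_memory_failure = true;
	return 0;
}

/* Stand-in for unpoison_memory(): refuses once a real error was seen. */
static int model_unpoison_memory(unsigned long pfn)
{
	(void)pfn;
	if (hw_memory_failure)
		return -EOPNOTSUPP;
	return 0;
}

/* Walks through the intended behaviour. */
static void model_sw_simulated_demo(void)
{
	/* A software-injected error can still be unpoisoned. */
	model_memory_failure(0x1000, MF_SW_SIMULATED);
	assert(model_unpoison_memory(0x1000) == 0);

	/* A real (non-simulated) error disables unpoison for good. */
	model_memory_failure(0x2000, 0);
	assert(model_unpoison_memory(0x1000) == -EOPNOTSUPP);
}
```

The point of the single bool is that no per-page or per-module accounting is
needed: the decision depends only on whether any caller ever omitted the flag.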

Thanks,
Naoya Horiguchi


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-06-14  7:13     ` David Hildenbrand
  2022-06-14  7:23       ` [External] " zhenwei pi
  2022-06-14  8:19       ` Miaohe Lin
  0 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2022-06-14  7:13 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), zhenwei pi
  Cc: akpm, linux-mm, linux-kernel, linmiaohe

	   &hwpoisoned_pages);
> 
> I'm not sure how useful this interface is from userspace (controlling a test
> process with this?).  Do we really need to expose this to userspace?
> 
> 
> TBH I feel that another approach like below is more desirable:
> 
>   - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>   - set the flag when calling memory_failure() from the three callers
>     mentioned above,
>   - define a global variable (typed bool) in mm/memory-failure.c to show that
>     the system has experienced a real hardware memory error event,
>   - once memory_failure() is called without MF_SW_SIMULATED, the new global
>     bool variable is set, and afterward unpoison_memory() always fails with
>     -EOPNOTSUPP.

Exactly what I had in mind.

-- 
Thanks,

David / dhildenb



* Re: [External] Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:13     ` David Hildenbrand
@ 2022-06-14  7:23       ` zhenwei pi
  2022-06-14  8:19       ` Miaohe Lin
  1 sibling, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  7:23 UTC (permalink / raw)
  To: David Hildenbrand, HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, linux-mm, linux-kernel, linmiaohe



On 6/14/22 15:13, David Hildenbrand wrote:
> 	   &hwpoisoned_pages);
>>
>> I'm not sure how useful this interface is from userspace (controlling a test
>> process with this?).  Do we really need to expose this to userspace?
>>
>>
>> TBH I feel that another approach like below is more desirable:
>>
>>    - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>>    - set the flag when calling memory_failure() from the three callers
>>      mentioned above,
>>    - define a global variable (typed bool) in mm/memory-failure.c to show that
>>      the system has experienced a real hardware memory error event,
>>    - once memory_failure() is called without MF_SW_SIMULATED, the new global
>>      bool variable is set, and afterward unpoison_memory() always fails with
>>      -EOPNOTSUPP.
> 
> Exactly what I had in mind.
> 

Sure, I'll send a new version later! Thanks!

-- 
zhenwei pi


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:13     ` David Hildenbrand
  2022-06-14  7:23       ` [External] " zhenwei pi
@ 2022-06-14  8:19       ` Miaohe Lin
  1 sibling, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-06-14  8:19 UTC (permalink / raw)
  To: David Hildenbrand, HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, linux-mm, linux-kernel, zhenwei pi

On 2022/6/14 15:13, David Hildenbrand wrote:
> 	   &hwpoisoned_pages);
>>
>> I'm not sure how useful this interface is from userspace (controlling a test
>> process with this?).  Do we really need to expose this to userspace?
>>
>>
>> TBH I feel that another approach like below is more desirable:
>>
>>   - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>>   - set the flag when calling memory_failure() from the three callers
>>     mentioned above,
>>   - define a global variable (typed bool) in mm/memory-failure.c to show that
>>     the system has experienced a real hardware memory error event,
>>   - once memory_failure() is called without MF_SW_SIMULATED, the new global
>>     bool variable is set, and afterward unpoison_memory() always fails with
>>     -EOPNOTSUPP.
> 
> Exactly what I had in mind.

This approach should be more straightforward. ;)

> 


