* [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page
@ 2022-06-14  4:38 zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  0 siblings, 2 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

v3 -> v4:
- Add debug entry "hwpoisoned-pages" to show the number of hwpoisoned
  pages.
- Disable unpoison when a real HW memory failure occurs.

v2 -> v3:
- David pointed out that virt_to_kpte() is broken (no pmd_large() test
  on a PMD), so drop this API from this patch and walk kmap instead.

v1 -> v2:
- this change gets protected by mf_mutex
- use -EOPNOTSUPP instead of -EPERM 

v1:
- check the KPTE to avoid unpoisoning a hardware corrupted page

zhenwei pi (2):
  mm/memory-failure: introduce "hwpoisoned-pages" entry
  mm/memory-failure: disable unpoison once hw error happens

 Documentation/vm/hwpoison.rst |  7 ++++++-
 mm/hwpoison-inject.c          | 25 ++++++++++++++++++++++++-
 mm/memory-failure.c           |  1 +
 3 files changed, 31 insertions(+), 2 deletions(-)

-- 
2.20.1



* [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
@ 2022-06-14  4:38 ` zhenwei pi
  2022-06-14  5:12   ` Muchun Song
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  2022-06-14  4:38 ` [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  1 sibling, 2 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

Add a new debug entry to show the number of hwpoisoned pages. Also use
try_module_get()/module_put() to manage this kernel module's refcount, so
that the module cannot be removed unless hwpoisoned-pages is zero.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 Documentation/vm/hwpoison.rst |  4 ++++
 mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c742de1769d1..c832a8b192d4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -155,6 +155,10 @@ Testing
 	flag bits are defined in include/linux/kernel-page-flags.h and
 	documented in Documentation/admin-guide/mm/pagemap.rst
 
+  hwpoisoned-pages
+	The number of hwpoisoned pages. The hwpoison kernel module can not be
+	removed unless this count is zero.
+
 * Architecture specific MCE injector
 
   x86 has mce-inject, mce-test
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 5c0cddd81505..9e522ecedeef 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -10,6 +10,7 @@
 #include "internal.h"
 
 static struct dentry *hwpoison_dir;
+static atomic_t hwpoisoned_pages;
 
 static int hwpoison_inject(void *data, u64 val)
 {
@@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
 	err = memory_failure(pfn, 0);
+	if (!err) {
+		WARN_ON(!try_module_get(THIS_MODULE));
+		atomic_inc(&hwpoisoned_pages);
+	}
+
 	return (err == -EOPNOTSUPP) ? 0 : err;
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
 {
+	int ret;
+
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	return unpoison_memory(val);
+	ret = unpoison_memory(val);
+	if (!ret) {
+		atomic_dec(&hwpoisoned_pages);
+		module_put(THIS_MODULE);
+	}
+
+	return ret;
 }
 
 DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
@@ -99,6 +113,9 @@ static int pfn_inject_init(void)
 	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
 			   &hwpoison_filter_flags_value);
 
+	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
+			   &hwpoisoned_pages);
+
 #ifdef CONFIG_MEMCG
 	debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
 			   &hwpoison_filter_memcg);
-- 
2.20.1



* [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens
  2022-06-14  4:38 [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
@ 2022-06-14  4:38 ` zhenwei pi
  1 sibling, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  4:38 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: linux-mm, linux-kernel, david, linmiaohe, zhenwei pi

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on an x86 platform once hardware memory corruption occurs.

Unpoisoning a hardware corrupted page only puts the page back into the
buddy allocator, so the kernel may later access the page through a
*NOT PRESENT* KPTE. This leads to a BUG when the page is accessed.

Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
to avoid a BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30
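
The addresses in the trace above can be tied back to the unpoisoned PFN with a little arithmetic. The following is only a userspace sketch under stated assumptions: 4 KiB pages, the default (non-randomized) x86-64 4-level direct-map and vmemmap bases, and a 64-byte struct page. None of these constants are stated in this thread, and all MODEL_* names and model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not taken from this thread. */
#define MODEL_PAGE_SHIFT        12                      /* 4 KiB pages */
#define MODEL_DIRECT_MAP_BASE   0xffff888000000000ULL   /* default direct-map base */
#define MODEL_VMEMMAP_BASE      0xffffea0000000000ULL   /* default vmemmap base */
#define MODEL_STRUCT_PAGE_SIZE  64ULL                   /* assumed sizeof(struct page) */

/* Direct-map virtual address of the first byte of the page frame. */
static uint64_t model_pfn_to_kaddr(uint64_t pfn)
{
	return MODEL_DIRECT_MAP_BASE + (pfn << MODEL_PAGE_SHIFT);
}

/* Address of the struct page that describes the page frame. */
static uint64_t model_pfn_to_page(uint64_t pfn)
{
	return MODEL_VMEMMAP_BASE + pfn * MODEL_STRUCT_PAGE_SIZE;
}

/* Check the model against the values printed in the oops. */
void model_oops_selfcheck(void)
{
	/* "Software-unpoisoned page 0x61234" vs. the faulting address in CR2/RDI */
	assert(model_pfn_to_kaddr(0x61234) == 0xffff888061234000ULL);
	/* The same PFN vs. the struct page pointer held in RDX/RBP */
	assert(model_pfn_to_page(0x61234) == 0xffffea0001848d00ULL);
}
```

Under these assumptions the faulting address is the direct-map alias of the page that was just unpoisoned, which is consistent with clear_page_erms() writing to a page whose KPTE is not present.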

As suggested by David and Naoya, disable the unpoison mechanism once a
real HW error happens.

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 Documentation/vm/hwpoison.rst | 3 ++-
 mm/hwpoison-inject.c          | 6 ++++++
 mm/memory-failure.c           | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c832a8b192d4..ac439381cad4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
   unpoison-pfn
 	Software-unpoison page at PFN echoed into this file. This way
 	a page can be reused again.  This only works for Linux
-	injected failures, not for real memory failures.
+	injected failures, not for real memory failures. Once any hardware
+	memory failure happens, the feature is disabled.
 
   Note these injection interfaces are not stable and might change between
   kernel versions
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 9e522ecedeef..787d2daf41e8 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -7,6 +7,7 @@
 #include <linux/swap.h>
 #include <linux/pagemap.h>
 #include <linux/hugetlb.h>
+#include <linux/swapops.h>
 #include "internal.h"
 
 static struct dentry *hwpoison_dir;
@@ -65,6 +66,11 @@ static int hwpoison_unpoison(void *data, u64 val)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (atomic_read(&hwpoisoned_pages) != atomic_long_read(&num_poisoned_pages)) {
+		pr_info("Unpoison is disabled after hardware memory failure happened\n");
+		return -EOPNOTSUPP;
+	}
+
 	ret = unpoison_memory(val);
 	if (!ret) {
 		atomic_dec(&hwpoisoned_pages);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..a3e6bd4b5528 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -68,6 +68,7 @@ int sysctl_memory_failure_early_kill __read_mostly = 0;
 int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
+EXPORT_SYMBOL_GPL(num_poisoned_pages);
 
 static bool __page_handle_poison(struct page *page)
 {
-- 
2.20.1



* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
@ 2022-06-14  5:12   ` Muchun Song
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 8+ messages in thread
From: Muchun Song @ 2022-06-14  5:12 UTC (permalink / raw)
  To: zhenwei pi
  Cc: akpm, naoya.horiguchi, linux-mm, linux-kernel, david, linmiaohe

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also use
> try_module_get()/module_put() to manage this kernel module's refcount, so
> that the module cannot be removed unless hwpoisoned-pages is zero.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>  
> +  hwpoisoned-pages

A bit weird to me. IIUC, this is the number of **software** poisoned
pages rather than **hardware** ones, so the prefix "hw" may not be suitable.
How about "poisoned-pages" (a little simplified), "poisoned-pfns" (keeps the
name consistent with "corrupt-pfn" and "unpoison-pfn") or "swpoisoned-pages"
(sw means software)?

> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>  
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>  
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>  
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));

__module_get() is enough since we already hold a refcount at open time.
This WARN_ON() will not be triggered unless something unexpected happens.

> +		atomic_inc(&hwpoisoned_pages);
> +	}
> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>  
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>  
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +			   &hwpoisoned_pages);
> +
>  #ifdef CONFIG_MEMCG
>  	debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
>  			   &hwpoison_filter_memcg);
> -- 
> 2.20.1
> 
> 


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  4:38 ` [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry zhenwei pi
  2022-06-14  5:12   ` Muchun Song
@ 2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
  2022-06-14  7:13     ` David Hildenbrand
  1 sibling, 1 reply; 8+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-14  7:09 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel, david, linmiaohe

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also use
> try_module_get()/module_put() to manage this kernel module's refcount, so
> that the module cannot be removed unless hwpoisoned-pages is zero.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>  
> +  hwpoisoned-pages
> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>  
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>  
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>  
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));
> +		atomic_inc(&hwpoisoned_pages);
> +	}

There are a few other interfaces that generate a "software-simulated memory
error" event, i.e. madvise_inject_error() and hard_offline_page_store(), so
you need to handle those code paths as well.
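
A minimal userspace sketch of why this matters for the counter comparison in patch 2/2. It assumes, as a simplification, that every path reaching memory_failure() bumps the global count while only the debugfs injector bumps the module-local one. All model_* names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

static long model_num_poisoned_pages;  /* stands in for the global num_poisoned_pages */
static int model_injected_pages;       /* stands in for the module-local hwpoisoned_pages */

/* Any caller that reaches memory_failure() bumps the global count. */
static void model_memory_failure(void)
{
	model_num_poisoned_pages++;
}

/* The debugfs corrupt-pfn path in v4 also bumps the module counter. */
static void model_hwpoison_inject(void)
{
	model_memory_failure();
	model_injected_pages++;
}

/* A madvise()-style injection never touches the module counter. */
static void model_madvise_inject(void)
{
	model_memory_failure();
}

/* The v4 check: unpoison is allowed only while both counters agree. */
static bool model_unpoison_allowed(void)
{
	return model_injected_pages == model_num_poisoned_pages;
}

void model_counter_selfcheck(void)
{
	model_hwpoison_inject();
	assert(model_unpoison_allowed());

	/* No hardware error happened, yet unpoison is now refused. */
	model_madvise_inject();
	assert(!model_unpoison_allowed());
}
```

In this model a purely software-simulated error injected outside the module is indistinguishable from a real hardware error.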

> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>  
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>  
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +			   &hwpoisoned_pages);

I'm not sure how useful this interface is from userspace (controlling a test
process with it?).  Do we really need to expose this to userspace?


TBH I feel that another approach like below is more desirable:

  - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
  - set the flag when calling memory_failure() from the three callers
    mentioned above,
  - define a global variable (typed bool) in mm/memory-failure.c to show that
    the system has experienced a real hardware memory error event,
  - once memory_failure() is called without MF_SW_SIMULATED, the new global
    bool variable is set, and afterward unpoison_memory() always fails with
    -EOPNOTSUPP.
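
The steps above can be sketched as a small userspace model. This is only an illustration under assumptions: the flag value, the variable name and the model_* functions are hypothetical, and the real memory_failure()/unpoison_memory() do far more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value; the real enum mf_flags layout is not shown here. */
enum model_mf_flags {
	MODEL_MF_SW_SIMULATED = 1 << 0,
};

/* Set on the first failure reported without the flag; never cleared. */
static bool model_hw_memory_failure;

static int model_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;	/* page handling itself is out of scope for this model */
	if (!(flags & MODEL_MF_SW_SIMULATED))
		model_hw_memory_failure = true;
	return 0;
}

static int model_unpoison_memory(unsigned long pfn)
{
	(void)pfn;
	if (model_hw_memory_failure)
		return -EOPNOTSUPP;	/* a real HW error was seen: refuse for good */
	return 0;
}

void model_flag_selfcheck(void)
{
	/* Software-simulated errors leave unpoison available. */
	model_memory_failure(0x61234, MODEL_MF_SW_SIMULATED);
	assert(model_unpoison_memory(0x61234) == 0);

	/* One real hardware error disables unpoison for every page. */
	model_memory_failure(0x100, 0);
	assert(model_unpoison_memory(0x61234) == -EOPNOTSUPP);
}
```

A sticky bool avoids keeping two counters in sync and needs no new userspace interface.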

Thanks,
Naoya Horiguchi


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:09   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-06-14  7:13     ` David Hildenbrand
  2022-06-14  7:23       ` [External] " zhenwei pi
  2022-06-14  8:19       ` Miaohe Lin
  0 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2022-06-14  7:13 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), zhenwei pi
  Cc: akpm, linux-mm, linux-kernel, linmiaohe

	   &hwpoisoned_pages);
> 
> I'm not sure how useful this interface is from userspace (controlling a test
> process with it?).  Do we really need to expose this to userspace?
> 
> 
> TBH I feel that another approach like below is more desirable:
> 
>   - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>   - set the flag when calling memory_failure() from the three callers
>     mentioned above,
>   - define a global variable (typed bool) in mm/memory-failure.c to show that
>     the system has experienced a real hardware memory error event,
>   - once memory_failure() is called without MF_SW_SIMULATED, the new global
>     bool variable is set, and afterward unpoison_memory always fails with
>     -EOPNOTSUPP.

Exactly what I had in mind.

-- 
Thanks,

David / dhildenb



* Re: [External] Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:13     ` David Hildenbrand
@ 2022-06-14  7:23       ` zhenwei pi
  2022-06-14  8:19       ` Miaohe Lin
  1 sibling, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2022-06-14  7:23 UTC (permalink / raw)
  To: David Hildenbrand, HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, linux-mm, linux-kernel, linmiaohe



On 6/14/22 15:13, David Hildenbrand wrote:
> 	   &hwpoisoned_pages);
>>
>> I'm not sure how useful this interface is from userspace (controlling a test
>> process with it?).  Do we really need to expose this to userspace?
>>
>>
>> TBH I feel that another approach like below is more desirable:
>>
>>    - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>>    - set the flag when calling memory_failure() from the three callers
>>      mentioned above,
>>    - define a global variable (typed bool) in mm/memory-failure.c to show that
>>      the system has experienced a real hardware memory error event,
>>    - once memory_failure() is called without MF_SW_SIMULATED, the new global
>>      bool variable is set, and afterward unpoison_memory always fails with
>>      -EOPNOTSUPP.
> 
> Exactly what I had in mind.
> 

Sure, I'll send a new version later! Thanks!

-- 
zhenwei pi


* Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry
  2022-06-14  7:13     ` David Hildenbrand
  2022-06-14  7:23       ` [External] " zhenwei pi
@ 2022-06-14  8:19       ` Miaohe Lin
  1 sibling, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-06-14  8:19 UTC (permalink / raw)
  To: David Hildenbrand, HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, linux-mm, linux-kernel, zhenwei pi

On 2022/6/14 15:13, David Hildenbrand wrote:
> 	   &hwpoisoned_pages);
>>
>> I'm not sure how useful this interface is from userspace (controlling a test
>> process with it?).  Do we really need to expose this to userspace?
>>
>>
>> TBH I feel that another approach like below is more desirable:
>>
>>   - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>>   - set the flag when calling memory_failure() from the three callers
>>     mentioned above,
>>   - define a global variable (typed bool) in mm/memory-failure.c to show that
>>     the system has experienced a real hardware memory error event,
>>   - once memory_failure() is called without MF_SW_SIMULATED, the new global
>>     bool variable is set, and afterward unpoison_memory always fails with
>>     -EOPNOTSUPP.
> 
> Exactly what I had in mind.

This approach should be more straightforward. ;)

> 


