linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/1] mm/memory-failure: don't allow to unpoison hw corrupted page
@ 2022-06-15  2:00 zhenwei pi
  2022-06-15  2:00 ` [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  0 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2022-06-15  2:00 UTC (permalink / raw)
  To: pizhenwei, naoya.horiguchi, akpm
  Cc: linux-mm, linux-kernel, david, linmiaohe, gregkh

v4 -> v5:
- Add mf_flags 'MF_SW_SIMULATED' to distinguish SW/HW memory failure,
  and use a global variable to record HW memory failure, once HW
  memory failure happens, disable unpoison.

v3 -> v4:
- Add debug entry "hwpoisoned-pages" to show the number of hwpoisoned
  pages.
- Disable unpoison when a read HW memory failure occurs.

v2 -> v3:
- David pointed out that virt_to_kpte() is broken(no pmd_large() test
  on a PMD), so drop this API in this patch, walk kmap instead.

v1 -> v2:
- this change gets protected by mf_mutex
- use -EOPNOTSUPP instead of -EPERM 

v1:
- check KPTE to avoid to unpoison hardware corrupted page

zhenwei pi (1):
  mm/memory-failure: disable unpoison once hw error happens

 Documentation/vm/hwpoison.rst |  3 ++-
 drivers/base/memory.c         |  2 +-
 include/linux/mm.h            |  1 +
 mm/hwpoison-inject.c          |  2 +-
 mm/madvise.c                  |  2 +-
 mm/memory-failure.c           | 12 ++++++++++++
 6 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  2:00 [PATCH v5 0/1] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
@ 2022-06-15  2:00 ` zhenwei pi
  2022-06-15  4:23   ` Oscar Salvador
  2022-06-15  8:15   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 2 replies; 9+ messages in thread
From: zhenwei pi @ 2022-06-15  2:00 UTC (permalink / raw)
  To: pizhenwei, naoya.horiguchi, akpm
  Cc: linux-mm, linux-kernel, david, linmiaohe, gregkh

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on a x86 platform once hardware memory corrupts.

Unpoisoning a hardware corrupted page puts page back buddy only,
the kernel has a chance to access the page with *NOT PRESENT* KPTE.
This leads BUG during accessing on the corrupted KPTE.

Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 Documentation/vm/hwpoison.rst |  3 ++-
 drivers/base/memory.c         |  2 +-
 include/linux/mm.h            |  1 +
 mm/hwpoison-inject.c          |  2 +-
 mm/madvise.c                  |  2 +-
 mm/memory-failure.c           | 12 ++++++++++++
 6 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c742de1769d1..b9d5253c1305 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
   unpoison-pfn
 	Software-unpoison page at PFN echoed into this file. This way
 	a page can be reused again.  This only works for Linux
-	injected failures, not for real memory failures.
+	injected failures, not for real memory failures. Once any hardware
+	memory failure happens, this feature is disabled.
 
   Note these injection interfaces are not stable and might change between
   kernel versions
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 084d67fd55cc..bc60c9cd3230 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -558,7 +558,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
 	if (kstrtoull(buf, 0, &pfn) < 0)
 		return -EINVAL;
 	pfn >>= PAGE_SHIFT;
-	ret = memory_failure(pfn, 0);
+	ret = memory_failure(pfn, MF_SW_SIMULATED);
 	if (ret == -EOPNOTSUPP)
 		ret = 0;
 	return ret ? ret : count;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..4346e51484ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3232,6 +3232,7 @@ enum mf_flags {
 	MF_MUST_KILL = 1 << 2,
 	MF_SOFT_OFFLINE = 1 << 3,
 	MF_UNPOISON = 1 << 4,
+	MF_SW_SIMULATED = 1 << 5,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 5c0cddd81505..65e242b5a432 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -48,7 +48,7 @@ static int hwpoison_inject(void *data, u64 val)
 
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-	err = memory_failure(pfn, 0);
+	err = memory_failure(pfn, MF_SW_SIMULATED);
 	return (err == -EOPNOTSUPP) ? 0 : err;
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..0316bbc6441b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1112,7 +1112,7 @@ static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
-			ret = memory_failure(pfn, MF_COUNT_INCREASED);
+			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
 			if (ret == -EOPNOTSUPP)
 				ret = 0;
 		}
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..385b5e99bfc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
+static bool hw_memory_failure;
+
 static bool __page_handle_poison(struct page *page)
 {
 	int ret;
@@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)
 
 	mutex_lock(&mf_mutex);
 
+	if (!(flags & MF_SW_SIMULATED))
+		hw_memory_failure = true;
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
 		res = arch_memory_failure(pfn, flags);
@@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
 
 	mutex_lock(&mf_mutex);
 
+	if (hw_memory_failure) {
+		unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
+				 pfn, &unpoison_rs);
+		ret = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+
 	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  2:00 ` [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
@ 2022-06-15  4:23   ` Oscar Salvador
  2022-06-15  5:18     ` zhenwei pi
  2022-06-15  8:15   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2022-06-15  4:23 UTC (permalink / raw)
  To: zhenwei pi
  Cc: naoya.horiguchi, akpm, linux-mm, linux-kernel, david, linmiaohe, gregkh

On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
> cleared on a x86 platform once hardware memory corrupts.
> 
> Unpoisoning a hardware corrupted page puts page back buddy only,
> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
> This leads BUG during accessing on the corrupted KPTE.
> 
> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
> happens to avoid BUG like this:
> 
>  Unpoison: Software-unpoisoned page 0x61234
>  BUG: unable to handle page fault for address: ffff888061234000
>  #PF: supervisor write access in kernel mode
>  #PF: error_code(0x0002) - not-present page
>  PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
>  Oops: 0002 [#1] PREEMPT SMP NOPTI
>  CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
>  RIP: 0010:clear_page_erms+0x7/0x10
>  Code: ...
>  RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
>  RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
>  RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
>  RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
>  R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
>  R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
>  FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   prep_new_page+0x151/0x170
>   get_page_from_freelist+0xca0/0xe20
>   ? sysvec_apic_timer_interrupt+0xab/0xc0
>   ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
>   __alloc_pages+0x17e/0x340
>   __folio_alloc+0x17/0x40
>   vma_alloc_folio+0x84/0x280
>   __handle_mm_fault+0x8d4/0xeb0
>   handle_mm_fault+0xd5/0x2a0
>   do_user_addr_fault+0x1d0/0x680
>   ? kvm_read_and_reset_apf_flags+0x3b/0x50
>   exc_page_fault+0x78/0x170
>   asm_exc_page_fault+0x27/0x30
> 
> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
...
...
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b85661cbdc4a..385b5e99bfc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>  
>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>  
> +static bool hw_memory_failure;
> +
>  static bool __page_handle_poison(struct page *page)
>  {
>  	int ret;
> @@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)
>  
>  	mutex_lock(&mf_mutex);
>  
> +	if (!(flags & MF_SW_SIMULATED))
> +		hw_memory_failure = true;
> +
>  	p = pfn_to_online_page(pfn);
>  	if (!p) {
>  		res = arch_memory_failure(pfn, flags);
> @@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
>  
>  	mutex_lock(&mf_mutex);
>  
> +	if (hw_memory_failure) {
> +		unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
> +				 pfn, &unpoison_rs);
> +		ret = -EOPNOTSUPP;
> +		goto unlock_mutex;
> +	}

If we disable this, I would move this at the beginning of the function.
We do not really care whether the pfn is valid or getting the head of
the page, etc.
So, unless I'm missing something, this should be enough?

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 385b5e99bfc1..ece15f07dee7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2100,6 +2100,12 @@ int unpoison_memory(unsigned long pfn)
        static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
                                        DEFAULT_RATELIMIT_BURST);
 
+       if (hw_memory_failure) {
+                unpoison_pr_info("Unpoison: Disabled after HW memory failure\n",
+                                 &unpoison_rs);
+               ret -EOPNOTSUPP;
+        }
+
        if (!pfn_valid(pfn))
                return -ENXIO;
 

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  4:23   ` Oscar Salvador
@ 2022-06-15  5:18     ` zhenwei pi
  2022-06-15  5:52       ` Oscar Salvador
  0 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2022-06-15  5:18 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: naoya.horiguchi, akpm, linux-mm, linux-kernel, david, linmiaohe, gregkh



On 6/15/22 12:23, Oscar Salvador wrote:
> On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
>> cleared on a x86 platform once hardware memory corrupts.
>>
>> Unpoisoning a hardware corrupted page puts page back buddy only,
>> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
>> This leads BUG during accessing on the corrupted KPTE.
>>
>> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
>> happens to avoid BUG like this:
>>
>>   Unpoison: Software-unpoisoned page 0x61234
>>   BUG: unable to handle page fault for address: ffff888061234000
>>   #PF: supervisor write access in kernel mode
>>   #PF: error_code(0x0002) - not-present page
>>   PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
>>   Oops: 0002 [#1] PREEMPT SMP NOPTI
>>   CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
>>   RIP: 0010:clear_page_erms+0x7/0x10
>>   Code: ...
>>   RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
>>   RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
>>   RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
>>   RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
>>   R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
>>   R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
>>   FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   PKRU: 55555554
>>   Call Trace:
>>    <TASK>
>>    prep_new_page+0x151/0x170
>>    get_page_from_freelist+0xca0/0xe20
>>    ? sysvec_apic_timer_interrupt+0xab/0xc0
>>    ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
>>    __alloc_pages+0x17e/0x340
>>    __folio_alloc+0x17/0x40
>>    vma_alloc_folio+0x84/0x280
>>    __handle_mm_fault+0x8d4/0xeb0
>>    handle_mm_fault+0xd5/0x2a0
>>    do_user_addr_fault+0x1d0/0x680
>>    ? kvm_read_and_reset_apf_flags+0x3b/0x50
>>    exc_page_fault+0x78/0x170
>>    asm_exc_page_fault+0x27/0x30
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
> ...
> ...
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b85661cbdc4a..385b5e99bfc1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>   
>>   atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>   
>> +static bool hw_memory_failure;
>> +
>>   static bool __page_handle_poison(struct page *page)
>>   {
>>   	int ret;
>> @@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)
>>   
>>   	mutex_lock(&mf_mutex);
>>   
>> +	if (!(flags & MF_SW_SIMULATED))
>> +		hw_memory_failure = true;
>> +
>>   	p = pfn_to_online_page(pfn);
>>   	if (!p) {
>>   		res = arch_memory_failure(pfn, flags);
>> @@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
>>   
>>   	mutex_lock(&mf_mutex);
>>   
>> +	if (hw_memory_failure) {
>> +		unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
>> +				 pfn, &unpoison_rs);
>> +		ret = -EOPNOTSUPP;
>> +		goto unlock_mutex;
>> +	}
> 
> If we disable this, I would move this at the beginning of the function.
> We do not really care whether the pfn is valid or getting the head of
> the page, etc.
> So, unless I'm missing something, this should be enough?
> 

Hi,

Because memory_failure() may be called by hardware error randomly, 
hw_memory_failure should be protected by mf_mutex to avoid this case:
int unpoison_memory(unsigned long pfn)
{
     ...
     if (hw_memory_failure) {
     }
     ... --> memory_failure() happens, and mark hw_memory_failure as true
     mutex_lock(&mf_mutex);
}

> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 385b5e99bfc1..ece15f07dee7 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2100,6 +2100,12 @@ int unpoison_memory(unsigned long pfn)
>          static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>                                          DEFAULT_RATELIMIT_BURST);
>   
> +       if (hw_memory_failure) {
> +                unpoison_pr_info("Unpoison: Disabled after HW memory failure\n",
> +                                 &unpoison_rs);
> +               ret -EOPNOTSUPP;
> +        }
> +
>          if (!pfn_valid(pfn))
>                  return -ENXIO;
>   
> 

-- 
zhenwei pi


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  5:18     ` zhenwei pi
@ 2022-06-15  5:52       ` Oscar Salvador
  2022-06-15  7:39         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2022-06-15  5:52 UTC (permalink / raw)
  To: zhenwei pi
  Cc: naoya.horiguchi, akpm, linux-mm, linux-kernel, david, linmiaohe, gregkh

On Wed, Jun 15, 2022 at 01:18:23PM +0800, zhenwei pi wrote:
> Hi,
> 
> Because memory_failure() may be called by hardware error randomly,
> hw_memory_failure should be protected by mf_mutex to avoid this case:
> int unpoison_memory(unsigned long pfn)
> {
>     ...
>     if (hw_memory_failure) {
>     }
>     ... --> memory_failure() happens, and mark hw_memory_failure as true
>     mutex_lock(&mf_mutex);

Yeah, I am aware of that.
But once memory_failure() sets hw_memory_failure to true, it does not really matter
whether unpoison_memory() checks that while holding or not the lock, does it?

Note that it does not really matter in the end, but I am just curious whether
there is any strong impediment to that. 


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  5:52       ` Oscar Salvador
@ 2022-06-15  7:39         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-15  7:39 UTC (permalink / raw)
  To: Oscar Salvador, zhenwei pi
  Cc: akpm, linux-mm, linux-kernel, david, linmiaohe, gregkh

On Wed, Jun 15, 2022 at 07:52:06AM +0200, Oscar Salvador wrote:
> On Wed, Jun 15, 2022 at 01:18:23PM +0800, zhenwei pi wrote:
> > Hi,
> > 
> > Because memory_failure() may be called by hardware error randomly,
> > hw_memory_failure should be protected by mf_mutex to avoid this case:
> > int unpoison_memory(unsigned long pfn)
> > {
> >     ...
> >     if (hw_memory_failure) {
> >     }
> >     ... --> memory_failure() happens, and mark hw_memory_failure as true
> >     mutex_lock(&mf_mutex);

I think that this race can cause the reported problem (hw_memory_failure is
unreliable outside mf_mutex), so we need put the check in mf_mutex for the proper fix.

Thanks,
Naoya Horiguchi

> 
> Yeah, I am aware of that.
> But once memory_failure() sets hw_memory_failure to true, it does not really matter
> whether unpoison_memory() checks that while holding or not the lock, does it?
> 
> Note that it does not really matter in the end, but I am just curious whether
> there is any strong impediment to that. 
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  2:00 ` [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
  2022-06-15  4:23   ` Oscar Salvador
@ 2022-06-15  8:15   ` HORIGUCHI NAOYA(堀口 直也)
  2022-06-15  8:21     ` David Hildenbrand
  1 sibling, 1 reply; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-15  8:15 UTC (permalink / raw)
  To: zhenwei pi; +Cc: akpm, linux-mm, linux-kernel, david, linmiaohe, gregkh

On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
> cleared on a x86 platform once hardware memory corrupts.
> 
> Unpoisoning a hardware corrupted page puts page back buddy only,
> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
> This leads BUG during accessing on the corrupted KPTE.
> 
> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
> happens to avoid BUG like this:
...

> 
> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Cc to stable?
I think that the current approach seems predictable to me than earlier versions,
so I can agree with sending this to stable a little more confidently.

> ---
>  Documentation/vm/hwpoison.rst |  3 ++-
>  drivers/base/memory.c         |  2 +-
>  include/linux/mm.h            |  1 +
>  mm/hwpoison-inject.c          |  2 +-
>  mm/madvise.c                  |  2 +-
>  mm/memory-failure.c           | 12 ++++++++++++
>  6 files changed, 18 insertions(+), 4 deletions(-)
> 

...

> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b85661cbdc4a..385b5e99bfc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>  
>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>  
> +static bool hw_memory_failure;

Could you set the initial value explicitly?  Using a default value is good,
but doing as the surrounding code do is better for consistency.  And this
variable can be updated only once, so adding __read_mostly macro is also fine.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  8:15   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-06-15  8:21     ` David Hildenbrand
  2022-06-15  8:43       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2022-06-15  8:21 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), zhenwei pi
  Cc: akpm, linux-mm, linux-kernel, linmiaohe, gregkh

On 15.06.22 10:15, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
>> cleared on a x86 platform once hardware memory corrupts.
>>
>> Unpoisoning a hardware corrupted page puts page back buddy only,
>> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
>> This leads BUG during accessing on the corrupted KPTE.
>>
>> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
>> happens to avoid BUG like this:
> ...
> 
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> 
> Cc to stable?
> I think that the current approach seems predictable to me than earlier versions,
> so I can agree with sending this to stable a little more confidently.
> 
>> ---
>>  Documentation/vm/hwpoison.rst |  3 ++-
>>  drivers/base/memory.c         |  2 +-
>>  include/linux/mm.h            |  1 +
>>  mm/hwpoison-inject.c          |  2 +-
>>  mm/madvise.c                  |  2 +-
>>  mm/memory-failure.c           | 12 ++++++++++++
>>  6 files changed, 18 insertions(+), 4 deletions(-)
>>
> 
> ...
> 
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b85661cbdc4a..385b5e99bfc1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>  
>>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>  
>> +static bool hw_memory_failure;
> 
> Could you set the initial value explicitly?  Using a default value is good,
> but doing as the surrounding code do is better for consistency.  And this
> variable can be updated only once, so adding __read_mostly macro is also fine.

No strong opinion. __read_mostly makes sense, but I assume we don't
really care about performance that much when dealing with HW errors.

With or without changes around this initialization

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens
  2022-06-15  8:21     ` David Hildenbrand
@ 2022-06-15  8:43       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-15  8:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: zhenwei pi, akpm, linux-mm, linux-kernel, linmiaohe, gregkh

On Wed, Jun 15, 2022 at 10:21:09AM +0200, David Hildenbrand wrote:
> On 15.06.22 10:15, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
...
> > 
> > ...
> > 
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index b85661cbdc4a..385b5e99bfc1 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
> >>  
> >>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >>  
> >> +static bool hw_memory_failure;
> > 
> > Could you set the initial value explicitly?  Using a default value is good,
> > but doing as the surrounding code do is better for consistency.  And this
> > variable can be updated only once, so adding __read_mostly macro is also fine.
> 
> No strong opinion. __read_mostly makes sense, but I assume we don't
> really care about performance that much when dealing with HW errors.

That's right, mm/memory-failure.c should be mostly performance insensitive.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-06-15  8:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15  2:00 [PATCH v5 0/1] mm/memory-failure: don't allow to unpoison hw corrupted page zhenwei pi
2022-06-15  2:00 ` [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens zhenwei pi
2022-06-15  4:23   ` Oscar Salvador
2022-06-15  5:18     ` zhenwei pi
2022-06-15  5:52       ` Oscar Salvador
2022-06-15  7:39         ` HORIGUCHI NAOYA(堀口 直也)
2022-06-15  8:15   ` HORIGUCHI NAOYA(堀口 直也)
2022-06-15  8:21     ` David Hildenbrand
2022-06-15  8:43       ` HORIGUCHI NAOYA(堀口 直也)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).