linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
@ 2023-08-19 10:22 Shuai Xue
  2023-08-19 11:49 ` Helge Deller
  2023-08-21 10:50 ` Will Deacon
  0 siblings, 2 replies; 13+ messages in thread
From: Shuai Xue @ 2023-08-19 10:22 UTC (permalink / raw)
  To: catalin.marinas, will, James.Bottomley, deller, dave.hansen,
	luto, peterz, tglx, mingo, bp, x86, hpa
  Cc: linux-arm-kernel, linux-kernel, linux-parisc

When a process tries to access a page that is already offline, the
kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
signal is typically handled by a registered sigbus handler in the
process. However, if the process does not have a registered sigbus
handler, it is important for end users to be informed about what
happened.

To address this, add an error message similar to those implemented on
the x86, powerpc, and parisc platforms.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 arch/arm64/mm/fault.c  | 2 ++
 arch/parisc/mm/fault.c | 5 ++---
 arch/x86/mm/fault.c    | 3 +--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 3fe516b32577..38e2186882bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
 		unsigned int lsb;
 
+		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+		       current->comm, current->pid, far);
 		lsb = PAGE_SHIFT;
 		if (fault & VM_FAULT_HWPOISON_LARGE)
 			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index a4c7c7630f48..6b096b47e149 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -395,9 +395,8 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 #ifdef CONFIG_MEMORY_FAILURE
 		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
 			unsigned int lsb = 0;
-			printk(KERN_ERR
-	"MCE: Killing %s:%d due to hardware memory corruption fault at %08lx\n",
-			tsk->comm, tsk->pid, address);
+			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %08lx\n",
+				tsk->comm, tsk->pid, address);
 			/*
 			 * Either small page or large page may be poisoned.
 			 * In other words, VM_FAULT_HWPOISON_LARGE and
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e8711b2cafaf..7266509cca54 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -962,8 +962,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 		struct task_struct *tsk = current;
 		unsigned lsb = 0;
 
-		pr_err(
-	"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
 			tsk->comm, tsk->pid, address);
 		if (fault & VM_FAULT_HWPOISON_LARGE)
 			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-19 10:22 [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus Shuai Xue
@ 2023-08-19 11:49 ` Helge Deller
  2023-08-21 10:50 ` Will Deacon
  1 sibling, 0 replies; 13+ messages in thread
From: Helge Deller @ 2023-08-19 11:49 UTC (permalink / raw)
  To: Shuai Xue, catalin.marinas, will, James.Bottomley, dave.hansen,
	luto, peterz, tglx, mingo, bp, x86, hpa
  Cc: linux-arm-kernel, linux-kernel, linux-parisc

On 8/19/23 12:22, Shuai Xue wrote:
> When a process tries to access a page that is already offline, the
> kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
> signal is typically handled by a registered sigbus handler in the
> process. However, if the process does not have a registered sigbus
> handler, it is important for end users to be informed about what
> happened.
>
> To address this, add an error message similar to those implemented on
> the x86, powerpc, and parisc platforms.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>   arch/arm64/mm/fault.c  | 2 ++
>   arch/parisc/mm/fault.c | 5 ++---
>   arch/x86/mm/fault.c    | 3 +--
>   3 files changed, 5 insertions(+), 5 deletions(-)

Acked-by: Helge Deller <deller@gmx.de> # parisc



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-19 10:22 [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus Shuai Xue
  2023-08-19 11:49 ` Helge Deller
@ 2023-08-21 10:50 ` Will Deacon
  2023-08-21 11:59   ` Helge Deller
  2023-08-22  1:15   ` Shuai Xue
  1 sibling, 2 replies; 13+ messages in thread
From: Will Deacon @ 2023-08-21 10:50 UTC (permalink / raw)
  To: Shuai Xue
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen, luto,
	peterz, tglx, mingo, bp, x86, hpa, linux-arm-kernel,
	linux-kernel, linux-parisc

On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
> When a process tries to access a page that is already offline

How does this happen?

> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
> signal is typically handled by a registered sigbus handler in the
> process. However, if the process does not have a registered sigbus
> handler, it is important for end users to be informed about what
> happened.
> 
> To address this, add an error message similar to those implemented on
> the x86, powerpc, and parisc platforms.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  arch/arm64/mm/fault.c  | 2 ++
>  arch/parisc/mm/fault.c | 5 ++---
>  arch/x86/mm/fault.c    | 3 +--
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 3fe516b32577..38e2186882bd 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>  		unsigned int lsb;
>  
> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +		       current->comm, current->pid, far);
>  		lsb = PAGE_SHIFT;
>  		if (fault & VM_FAULT_HWPOISON_LARGE)
>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));

Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
and there's plenty of code in memory-failure.c for handling poisoned pages
reported by e.g. GHES. I don't think dumping extra messages in dmesg from
the arch code really adds anything.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-21 10:50 ` Will Deacon
@ 2023-08-21 11:59   ` Helge Deller
  2023-08-22  1:15   ` Shuai Xue
  1 sibling, 0 replies; 13+ messages in thread
From: Helge Deller @ 2023-08-21 11:59 UTC (permalink / raw)
  To: Will Deacon, Shuai Xue
  Cc: catalin.marinas, James.Bottomley, dave.hansen, luto, peterz,
	tglx, mingo, bp, x86, hpa, linux-arm-kernel, linux-kernel,
	linux-parisc

On 8/21/23 12:50, Will Deacon wrote:
> On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
>> When a process tries to access a page that is already offline
>
> How does this happen?
>
>> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
>> signal is typically handled by a registered sigbus handler in the
>> process. However, if the process does not have a registered sigbus
>> handler, it is important for end users to be informed about what
>> happened.
>>
>> To address this, add an error message similar to those implemented on
>> the x86, powerpc, and parisc platforms.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>   arch/arm64/mm/fault.c  | 2 ++
>>   arch/parisc/mm/fault.c | 5 ++---
>>   arch/x86/mm/fault.c    | 3 +--
>>   3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 3fe516b32577..38e2186882bd 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>   	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>   		unsigned int lsb;
>>
>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>> +		       current->comm, current->pid, far);
>>   		lsb = PAGE_SHIFT;
>>   		if (fault & VM_FAULT_HWPOISON_LARGE)
>>   			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>
> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
> and there's plenty of code in memory-failure.c for handling poisoned pages
> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
> the arch code really adds anything.

I added the parisc hunk in commit 606f95e42558 due to the memory fault injections by the LTP
testsuite (madvise07). Not sure if there were any other kernel messages when this happened.

Helge

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-21 10:50 ` Will Deacon
  2023-08-21 11:59   ` Helge Deller
@ 2023-08-22  1:15   ` Shuai Xue
  2023-08-28  1:41     ` Shuai Xue
  1 sibling, 1 reply; 13+ messages in thread
From: Shuai Xue @ 2023-08-22  1:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen, luto,
	peterz, tglx, mingo, bp, x86, hpa, linux-arm-kernel,
	linux-kernel, linux-parisc



On 2023/8/21 18:50, Will Deacon wrote:
> On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
>> When a process tries to access a page that is already offline
> 
> How does this happen?

Case 1:

When a process consume poison, it will trigger a Synchronous External Abort.
The memory-failure.c on arm64 platform does not work as expected, it does NOT
send sigbus to the process consumed poison because GHES handle all notification
as Action Option event. Process will not be collected to be killed unless explicitly
set early kill in mm/memory-failure.c:collect_procs(). Even worse, QEMU relies on
SIGBUS and its code to perform vSEA injection into the Guest Kernel.

The memory-failure.c only offlines the page which unmaps the page from process.
and the process will start from the last PC so that a page fault occurs. Then
a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.

By the way, I have sent a patch set to solve the memory failure workflow in GHES[1]
collect some reviewed-tags, but there has been no response since the last version
sent 4 months ago. Could you please help to review it? Thank you.

https://lore.kernel.org/all/20230606074238.97166-1-xueshuai@linux.alibaba.com/

Case 2:

When a poison page shared by many processes, the memory-failure.c unmap the poison page
from all processes and only send sigbus to the process which accessing the poison page.
The poison page may be accessed by other processes later and it triggers a page fault,
then a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.


> 
>> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
>> signal is typically handled by a registered sigbus handler in the
>> process. However, if the process does not have a registered sigbus
>> handler, it is important for end users to be informed about what
>> happened.
>>
>> To address this, add an error message similar to those implemented on
>> the x86, powerpc, and parisc platforms.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>  arch/arm64/mm/fault.c  | 2 ++
>>  arch/parisc/mm/fault.c | 5 ++---
>>  arch/x86/mm/fault.c    | 3 +--
>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 3fe516b32577..38e2186882bd 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>  		unsigned int lsb;
>>  
>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>> +		       current->comm, current->pid, far);
>>  		lsb = PAGE_SHIFT;
>>  		if (fault & VM_FAULT_HWPOISON_LARGE)
>>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> 
> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
> and there's plenty of code in memory-failure.c for handling poisoned pages
> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
> the arch code really adds anything.

I see the show_unhandled_signals() will dump the stack but it rely on
/proc/sys/debug/exception-trace be set.

The memory failure is the top issue in our production cloud and also other hyperscalers.
We have received complaints from our operations engineers and end users that processes
are being inexplicably killed :(. Could you please consider add a message?

Thank you.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-22  1:15   ` Shuai Xue
@ 2023-08-28  1:41     ` Shuai Xue
  2023-08-30 22:18       ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Xue @ 2023-08-28  1:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen, luto,
	peterz, tglx, mingo, bp, x86, hpa, linux-arm-kernel,
	linux-kernel, linux-parisc



On 2023/8/22 09:15, Shuai Xue wrote:
> 
> 
> On 2023/8/21 18:50, Will Deacon wrote:
>> On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
>>> When a process tries to access a page that is already offline
>>
>> How does this happen?
> 
> Case 1:
> 
> When a process consume poison, it will trigger a Synchronous External Abort.
> The memory-failure.c on arm64 platform does not work as expected, it does NOT
> send sigbus to the process consumed poison because GHES handle all notification
> as Action Option event. Process will not be collected to be killed unless explicitly
> set early kill in mm/memory-failure.c:collect_procs(). Even worse, QEMU relies on
> SIGBUS and its code to perform vSEA injection into the Guest Kernel.
> 
> The memory-failure.c only offlines the page which unmaps the page from process.
> and the process will start from the last PC so that a page fault occurs. Then
> a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.
> 
> By the way, I have sent a patch set to solve the memory failure workflow in GHES[1]
> collect some reviewed-tags, but there has been no response since the last version
> sent 4 months ago. Could you please help to review it? Thank you.
> 
> https://lore.kernel.org/all/20230606074238.97166-1-xueshuai@linux.alibaba.com/
> 
> Case 2:
> 
> When a poison page shared by many processes, the memory-failure.c unmap the poison page
> from all processes and only send sigbus to the process which accessing the poison page.
> The poison page may be accessed by other processes later and it triggers a page fault,
> then a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.
> 
> 
>>
>>> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
>>> signal is typically handled by a registered sigbus handler in the
>>> process. However, if the process does not have a registered sigbus
>>> handler, it is important for end users to be informed about what
>>> happened.
>>>
>>> To address this, add an error message similar to those implemented on
>>> the x86, powerpc, and parisc platforms.
>>>
>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>> ---
>>>  arch/arm64/mm/fault.c  | 2 ++
>>>  arch/parisc/mm/fault.c | 5 ++---
>>>  arch/x86/mm/fault.c    | 3 +--
>>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>> index 3fe516b32577..38e2186882bd 100644
>>> --- a/arch/arm64/mm/fault.c
>>> +++ b/arch/arm64/mm/fault.c
>>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>>  		unsigned int lsb;
>>>  
>>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>> +		       current->comm, current->pid, far);
>>>  		lsb = PAGE_SHIFT;
>>>  		if (fault & VM_FAULT_HWPOISON_LARGE)
>>>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>>
>> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
>> and there's plenty of code in memory-failure.c for handling poisoned pages
>> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
>> the arch code really adds anything.
> 
> I see the show_unhandled_signals() will dump the stack but it rely on
> /proc/sys/debug/exception-trace be set.
> 
> The memory failure is the top issue in our production cloud and also other hyperscalers.
> We have received complaints from our operations engineers and end users that processes
> are being inexplicably killed :(. Could you please consider add a message?
> 
> Thank you.
> 
> Best Regards,
> Shuai

I apologize that the two cases mentioned above failed to convince you. Let me provide an example of
an online outage ticket to help you better understand the challenges I faced in a production environment.
The only log collected from console is pasted bellow:

[2023-06-22 21:55:31.352444][5975153.869131] Memory failure: 0x7740c0: recovery action for dirty page: Recovered
[2023-06-22 21:55:31.352446][5975153.878190] EDAC MC0: 1 UE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:31.778747][5975154.296056] EDAC MC0: 2 CE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:32.204051][5975154.721742] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:32.759710][5975155.280550] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:32.817890][5975155.337168] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:33.786742][5975156.304515] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxx xxx)
[2023-06-22 21:55:33.843280][5975156.361284] EDAC MC0: 3 CE memorERROR: C00000002:V030xxxx I0 D6476950-xxxx 7731xxxx
[2023-06-22 21:55:34.218603]y read error on CPU_SrcID#0_MC#0_Chan#1_DIMM#0 (xxxx)
[2023-06-22 21:55:34.269969]ERROR: C00000002:V030xxx I0 D6476950-xxxx 7731xxxx
[2023-06-24 00:40:54.482177]ERROR: C00000002:V030xxx I0 A8CEC941-xxxx 7731xxxx

(For security reasons, I have concealed some information.)

Typically, a temporary virtual team consisting of operations, firmware, and OS engineers will analyze this log.

It is really hard to infer exactly what is going on and where the problem occurred. Does the kernel send a SIGBUS
to handle the memory failure as expected? I am not able to provide a conclusion that is 100% certain but a pr_err
message would help.

Thank you.

Best Regards
Shuai.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-28  1:41     ` Shuai Xue
@ 2023-08-30 22:18       ` Will Deacon
  2023-08-31  3:29         ` Shuai Xue
  0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2023-08-30 22:18 UTC (permalink / raw)
  To: Shuai Xue
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen, luto,
	peterz, tglx, mingo, bp, x86, hpa, linux-arm-kernel,
	linux-kernel, linux-parisc

On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote:
> On 2023/8/22 09:15, Shuai Xue wrote:
> > On 2023/8/21 18:50, Will Deacon wrote:
> >>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> >>> index 3fe516b32577..38e2186882bd 100644
> >>> --- a/arch/arm64/mm/fault.c
> >>> +++ b/arch/arm64/mm/fault.c
> >>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> >>>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
> >>>  		unsigned int lsb;
> >>>  
> >>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> >>> +		       current->comm, current->pid, far);
> >>>  		lsb = PAGE_SHIFT;
> >>>  		if (fault & VM_FAULT_HWPOISON_LARGE)
> >>>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> >>
> >> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
> >> and there's plenty of code in memory-failure.c for handling poisoned pages
> >> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
> >> the arch code really adds anything.
> > 
> > I see the show_unhandled_signals() will dump the stack but it rely on
> > /proc/sys/debug/exception-trace be set.
> > 
> > The memory failure is the top issue in our production cloud and also other hyperscalers.
> > We have received complaints from our operations engineers and end users that processes
> > are being inexplicably killed :(. Could you please consider add a message?

I don't have any objection to logging this stuff somehow, I'm just not
convinced that the console is the best place for that information in 2023.
Is there really nothing better?

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-30 22:18       ` Will Deacon
@ 2023-08-31  3:29         ` Shuai Xue
  2023-08-31  9:06           ` Helge Deller
  2023-08-31 16:56           ` Luck, Tony
  0 siblings, 2 replies; 13+ messages in thread
From: Shuai Xue @ 2023-08-31  3:29 UTC (permalink / raw)
  To: Will Deacon, Luck, Tony
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen, luto,
	peterz, tglx, mingo, bp, x86, hpa, linux-arm-kernel,
	linux-kernel, linux-parisc



On 2023/8/31 06:18, Will Deacon wrote:
> On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote:
>> On 2023/8/22 09:15, Shuai Xue wrote:
>>> On 2023/8/21 18:50, Will Deacon wrote:
>>>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>>>> index 3fe516b32577..38e2186882bd 100644
>>>>> --- a/arch/arm64/mm/fault.c
>>>>> +++ b/arch/arm64/mm/fault.c
>>>>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>>>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>>>>  		unsigned int lsb;
>>>>>  
>>>>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>>>> +		       current->comm, current->pid, far);
>>>>>  		lsb = PAGE_SHIFT;
>>>>>  		if (fault & VM_FAULT_HWPOISON_LARGE)
>>>>>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>>>>
>>>> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
>>>> and there's plenty of code in memory-failure.c for handling poisoned pages
>>>> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
>>>> the arch code really adds anything.
>>>
>>> I see the show_unhandled_signals() will dump the stack but it rely on
>>> /proc/sys/debug/exception-trace be set.
>>>
>>> The memory failure is the top issue in our production cloud and also other hyperscalers.
>>> We have received complaints from our operations engineers and end users that processes
>>> are being inexplicably killed :(. Could you please consider add a message?
> 
> I don't have any objection to logging this stuff somehow, I'm just not
> convinced that the console is the best place for that information in 2023.
> Is there really nothing better?
> 

Hi, Will,

I agree that console might not the better place, but it still plays an important role.
IMO the most direct idea for end user to check what happened is to check by viewing
the dmesg. In addition, we deployed some log store service collects all cluster dmesg
from /var/log/kern.

Do you have any better choice?

+ @Tony for ERST
I found that after /dev/mcelog driver deprecated, both x86 and ARM64 platform does not
support to collect MCE record of previous boot in persistent storage via APEI ERST.
I propose to add a mechanism to do it for rasdaemon. Do you have any suggestion?

Thank you.
Best Regards,
Shuai


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-31  3:29         ` Shuai Xue
@ 2023-08-31  9:06           ` Helge Deller
  2023-09-04 10:40             ` Shuai Xue
  2023-08-31 16:56           ` Luck, Tony
  1 sibling, 1 reply; 13+ messages in thread
From: Helge Deller @ 2023-08-31  9:06 UTC (permalink / raw)
  To: Shuai Xue, Will Deacon, Luck, Tony
  Cc: catalin.marinas, James.Bottomley, dave.hansen, luto, peterz,
	tglx, mingo, bp, x86, hpa, linux-arm-kernel, linux-kernel,
	linux-parisc

On 8/31/23 05:29, Shuai Xue wrote:
> On 2023/8/31 06:18, Will Deacon wrote:
>> On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote:
>>> On 2023/8/22 09:15, Shuai Xue wrote:
>>>> On 2023/8/21 18:50, Will Deacon wrote:
>>>>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>>>>> index 3fe516b32577..38e2186882bd 100644
>>>>>> --- a/arch/arm64/mm/fault.c
>>>>>> +++ b/arch/arm64/mm/fault.c
>>>>>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>>>>   	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>>>>>   		unsigned int lsb;
>>>>>>
>>>>>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>>>>> +		       current->comm, current->pid, far);
>>>>>>   		lsb = PAGE_SHIFT;
>>>>>>   		if (fault & VM_FAULT_HWPOISON_LARGE)
>>>>>>   			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>>>>>
>>>>> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
>>>>> and there's plenty of code in memory-failure.c for handling poisoned pages
>>>>> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
>>>>> the arch code really adds anything.
>>>>
>>>> I see the show_unhandled_signals() will dump the stack but it rely on
>>>> /proc/sys/debug/exception-trace be set.
>>>>
>>>> The memory failure is the top issue in our production cloud and also other hyperscalers.
>>>> We have received complaints from our operations engineers and end users that processes
>>>> are being inexplicably killed :(. Could you please consider add a message?
>>
>> I don't have any objection to logging this stuff somehow, I'm just not
>> convinced that the console is the best place for that information in 2023.
>> Is there really nothing better?

> I agree that console might not the better place, but it still plays an important role.
> IMO the most direct idea for end user to check what happened is to check by viewing
> the dmesg. In addition, we deployed some log store service collects all cluster dmesg
> from /var/log/kern.

Right, pr_err() is not just console.
It ends up in the syslog, which ends up in a lot of places, e.g. through syslog forwarding.
Most monitoring tools monitor the syslog as well.

So, IMHO pr_err() is the right thing.

Helge



>
> Hi, Will,
>



>
> Do you have any better choice?
>
> + @Tony for ERST
> I found that after /dev/mcelog driver deprecated, both x86 and ARM64 platform does not
> support to collect MCE record of previous boot in persistent storage via APEI ERST.
> I propose to add a mechanism to do it for rasdaemon. Do you have any suggestion?
>
> Thank you.
> Best Regards,
> Shuai
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-31  3:29         ` Shuai Xue
  2023-08-31  9:06           ` Helge Deller
@ 2023-08-31 16:56           ` Luck, Tony
  2023-09-04 10:39             ` Shuai Xue
  1 sibling, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2023-08-31 16:56 UTC (permalink / raw)
  To: Shuai Xue, Will Deacon
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen,
	Lutomirski, Andy, peterz, tglx, mingo, bp, x86, hpa,
	linux-arm-kernel, linux-kernel, linux-parisc

> + @Tony for ERST
> I found that after /dev/mcelog driver deprecated, both x86 and ARM64 platform does not
> support to collect MCE record of previous boot in persistent storage via APEI ERST.
> I propose to add a mechanism to do it for rasdaemon. Do you have any suggestion?

APEI ERST stored records should be available in /sys/fs/pstore after the system reboots.

Can rasdaemon collect them from there?

-Tony

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-31 16:56           ` Luck, Tony
@ 2023-09-04 10:39             ` Shuai Xue
  0 siblings, 0 replies; 13+ messages in thread
From: Shuai Xue @ 2023-09-04 10:39 UTC (permalink / raw)
  To: Luck, Tony, Will Deacon
  Cc: catalin.marinas, James.Bottomley, deller, dave.hansen,
	Lutomirski, Andy, peterz, tglx, mingo, bp, x86, hpa,
	linux-arm-kernel, linux-kernel, linux-parisc



On 2023/9/1 00:56, Luck, Tony wrote:
>> + @Tony for ERST
>> I found that after /dev/mcelog driver deprecated, both x86 and ARM64 platform does not
>> support to collect MCE record of previous boot in persistent storage via APEI ERST.
>> I propose to add a mechanism to do it for rasdaemon. Do you have any suggestion?
> 
> APEI ERST stored records should be available in /sys/fs/pstore after the system reboots.
> 
> Can rasdaemon collect them from there?

I am afraid not yet. I will give it a shot and try to send a patch set.

> 
> -Tony

Thank you.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-08-31  9:06           ` Helge Deller
@ 2023-09-04 10:40             ` Shuai Xue
  2023-10-12  8:16               ` Shuai Xue
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Xue @ 2023-09-04 10:40 UTC (permalink / raw)
  To: Helge Deller, Will Deacon, Luck, Tony
  Cc: catalin.marinas, James.Bottomley, dave.hansen, luto, peterz,
	tglx, mingo, bp, x86, hpa, linux-arm-kernel, linux-kernel,
	linux-parisc



On 2023/8/31 17:06, Helge Deller wrote:
> On 8/31/23 05:29, Shuai Xue wrote:
>> On 2023/8/31 06:18, Will Deacon wrote:
>>> On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote:
>>>> On 2023/8/22 09:15, Shuai Xue wrote:
>>>>> On 2023/8/21 18:50, Will Deacon wrote:
>>>>>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>>>>>> index 3fe516b32577..38e2186882bd 100644
>>>>>>> --- a/arch/arm64/mm/fault.c
>>>>>>> +++ b/arch/arm64/mm/fault.c
>>>>>>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>>>>>       } else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>>>>>>           unsigned int lsb;
>>>>>>>
>>>>>>> +        pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>>>>>> +               current->comm, current->pid, far);
>>>>>>>           lsb = PAGE_SHIFT;
>>>>>>>           if (fault & VM_FAULT_HWPOISON_LARGE)
>>>>>>>               lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>>>>>>
>>>>>> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
>>>>>> and there's plenty of code in memory-failure.c for handling poisoned pages
>>>>>> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
>>>>>> the arch code really adds anything.
>>>>>
>>>>> I see the show_unhandled_signals() will dump the stack but it rely on
>>>>> /proc/sys/debug/exception-trace be set.
>>>>>
>>>>> The memory failure is the top issue in our production cloud and also other hyperscalers.
>>>>> We have received complaints from our operations engineers and end users that processes
>>>>> are being inexplicably killed :(. Could you please consider add a message?
>>>
>>> I don't have any objection to logging this stuff somehow, I'm just not
>>> convinced that the console is the best place for that information in 2023.
>>> Is there really nothing better?
> 
>> I agree that console might not the better place, but it still plays an important role.
>> IMO the most direct idea for end user to check what happened is to check by viewing
>> the dmesg. In addition, we deployed some log store service collects all cluster dmesg
>> from /var/log/kern.
> 
> Right, pr_err() is not just console.
> It ends up in the syslog, which ends up in a lot of places, e.g. through syslog forwarding.
> Most monitoring tools monitor the syslog as well.
> 
> So, IMHO pr_err() is the right thing.
> 
> Helge
> 

Totally agreed.

Thank you.

Best Regards,
Shuai


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus
  2023-09-04 10:40             ` Shuai Xue
@ 2023-10-12  8:16               ` Shuai Xue
  0 siblings, 0 replies; 13+ messages in thread
From: Shuai Xue @ 2023-10-12  8:16 UTC (permalink / raw)
  To: Helge Deller, Will Deacon, Luck, Tony
  Cc: catalin.marinas, James.Bottomley, dave.hansen, luto, peterz,
	tglx, mingo, bp, x86, hpa, linux-arm-kernel, linux-kernel,
	linux-parisc



On 2023/9/4 18:40, Shuai Xue wrote:
> 
> 
> On 2023/8/31 17:06, Helge Deller wrote:
>> On 8/31/23 05:29, Shuai Xue wrote:
>>> On 2023/8/31 06:18, Will Deacon wrote:
>>>> On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote:
>>>>> On 2023/8/22 09:15, Shuai Xue wrote:
>>>>>> On 2023/8/21 18:50, Will Deacon wrote:
>>>>>>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>>>>>>> index 3fe516b32577..38e2186882bd 100644
>>>>>>>> --- a/arch/arm64/mm/fault.c
>>>>>>>> +++ b/arch/arm64/mm/fault.c
>>>>>>>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>>>>>>       } else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>>>>>>>           unsigned int lsb;
>>>>>>>>
>>>>>>>> +        pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>>>>>>> +               current->comm, current->pid, far);
>>>>>>>>           lsb = PAGE_SHIFT;
>>>>>>>>           if (fault & VM_FAULT_HWPOISON_LARGE)
>>>>>>>>               lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>>>>>>>
>>>>>>> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
>>>>>>> and there's plenty of code in memory-failure.c for handling poisoned pages
>>>>>>> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
>>>>>>> the arch code really adds anything.
>>>>>>
>>>>>> I see the show_unhandled_signals() will dump the stack but it rely on
>>>>>> /proc/sys/debug/exception-trace be set.
>>>>>>
>>>>>> The memory failure is the top issue in our production cloud and also other hyperscalers.
>>>>>> We have received complaints from our operations engineers and end users that processes
>>>>>> are being inexplicably killed :(. Could you please consider add a message?
>>>>
>>>> I don't have any objection to logging this stuff somehow, I'm just not
>>>> convinced that the console is the best place for that information in 2023.
>>>> Is there really nothing better?
>>
>>> I agree that console might not the better place, but it still plays an important role.
>>> IMO the most direct idea for end user to check what happened is to check by viewing
>>> the dmesg. In addition, we deployed some log store service collects all cluster dmesg
>>> from /var/log/kern.
>>
>> Right, pr_err() is not just console.
>> It ends up in the syslog, which ends up in a lot of places, e.g. through syslog forwarding.
>> Most monitoring tools monitor the syslog as well.
>>
>> So, IMHO pr_err() is the right thing.
>>
>> Helge
>>
> 
> Totally agreed.
> 
> Thank you.
> 
> Best Regards,
> Shuai
> 


Hi, Will,

Based our discussion in this thread, are you happy to queue this patch?

Thank you.
Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-10-12  8:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-19 10:22 [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus Shuai Xue
2023-08-19 11:49 ` Helge Deller
2023-08-21 10:50 ` Will Deacon
2023-08-21 11:59   ` Helge Deller
2023-08-22  1:15   ` Shuai Xue
2023-08-28  1:41     ` Shuai Xue
2023-08-30 22:18       ` Will Deacon
2023-08-31  3:29         ` Shuai Xue
2023-08-31  9:06           ` Helge Deller
2023-09-04 10:40             ` Shuai Xue
2023-10-12  8:16               ` Shuai Xue
2023-08-31 16:56           ` Luck, Tony
2023-09-04 10:39             ` Shuai Xue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).