* [PATCH 1/2] powerpc/book3s: Display more info for MCE error console log.
@ 2017-03-28 13:45 Mahesh J Salgaonkar
  2017-03-28 13:45 ` [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode Mahesh J Salgaonkar
  2017-04-06 13:06 ` [1/2] powerpc/book3s: Display more info for MCE error console log Michael Ellerman
  0 siblings, 2 replies; 5+ messages in thread
From: Mahesh J Salgaonkar @ 2017-03-28 13:45 UTC
  To: linuxppc-dev, Michael Ellerman; +Cc: Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

For D-side errors, we print the data load/store address that caused the
machine check as 'Effective address'. In addition to printing the NIP,
also print the kernel function name.

After this patch the MCE console log would look like:

[  291.444281] Severe Machine check interrupt [Recovered]
[  291.444477]   NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel]
[  291.444707]   Initiator: CPU
[  291.444761]   Error type: SLB [Parity]
[  291.444793]     Effective address: d000000026de0000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/mce.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 399aeaf..e82d4ee 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -311,7 +311,8 @@ void machine_check_print_event_info(struct machine_check_event *evt)
 	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
 	       evt->disposition == MCE_DISPOSITION_RECOVERED ?
 	       "Recovered" : "Not recovered");
-	printk("%s  NIP: %016llx\n", level, evt->srr0);
+	printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
+							(void *)evt->srr0);
 	printk("%s  Initiator: %s\n", level,
 	       evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
 	switch (evt->error_type) {


* [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode.
  2017-03-28 13:45 [PATCH 1/2] powerpc/book3s: Display more info for MCE error console log Mahesh J Salgaonkar
@ 2017-03-28 13:45 ` Mahesh J Salgaonkar
  2017-03-30  0:09   ` Nicholas Piggin
  2017-04-06 13:06 ` [1/2] powerpc/book3s: Display more info for MCE error console log Michael Ellerman
  1 sibling, 1 reply; 5+ messages in thread
From: Mahesh J Salgaonkar @ 2017-03-28 13:45 UTC
  To: linuxppc-dev, Michael Ellerman; +Cc: Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

For an MCE that hits while in user mode, MSR(HV=1,PR=1), print the task
info in the console MCE error log. This will help identify the application
that stumbled upon the MCE error.

After this patch the MCE console log would look like:

[    1.268998] Severe Machine check interrupt [Recovered]
[    1.269024]   NIP: [0000000010039778] PID: 762 Comm: ebizzy
[    1.269048]   Initiator: CPU
[    1.269067]   Error type: SLB [Multihit]
[    1.269088]     Effective address: 0000000010039778

[    1.855084] Severe Machine check interrupt [Not recovered]
[    1.855111]   NIP: [0000000010039778] PID: 763 Comm: ebizzy
[    1.855135]   Initiator: CPU
[    1.855154]   Error type: UE [Page table walk ifetch]
[    1.855179]     Effective address: 0000000010039778
[    1.855210] ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h        |    3 ++-
 arch/powerpc/kernel/mce.c             |   12 +++++++++---
 arch/powerpc/platforms/powernv/opal.c |    2 +-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index e3498b4..0d865fe 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -207,7 +207,8 @@ extern void save_mce_event(struct pt_regs *regs, long handled,
 extern int get_mce_event(struct machine_check_event *mce, bool release);
 extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
-extern void machine_check_print_event_info(struct machine_check_event *evt);
+extern void machine_check_print_event_info(struct machine_check_event *evt,
+							bool user_mode);
 extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
 
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index e82d4ee..304cbaf 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -228,12 +228,13 @@ static void machine_check_process_queued_event(struct irq_work *work)
 	while (__this_cpu_read(mce_queue_count) > 0) {
 		index = __this_cpu_read(mce_queue_count) - 1;
 		machine_check_print_event_info(
-				this_cpu_ptr(&mce_event_queue[index]));
+				this_cpu_ptr(&mce_event_queue[index]), false);
 		__this_cpu_dec(mce_queue_count);
 	}
 }
 
-void machine_check_print_event_info(struct machine_check_event *evt)
+void machine_check_print_event_info(struct machine_check_event *evt,
+								bool user_mode)
 {
 	const char *level, *sevstr, *subtype;
 	static const char *mc_ue_types[] = {
@@ -311,8 +312,13 @@ void machine_check_print_event_info(struct machine_check_event *evt)
 	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
 	       evt->disposition == MCE_DISPOSITION_RECOVERED ?
 	       "Recovered" : "Not recovered");
-	printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
+	if (user_mode) {
+		printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
+				evt->srr0, current->pid, current->comm);
+	} else {
+		printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
 							(void *)evt->srr0);
+	}
 	printk("%s  Initiator: %s\n", level,
 	       evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
 	switch (evt->error_type) {
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index e0f856b..296c942 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -435,7 +435,7 @@ int opal_machine_check(struct pt_regs *regs)
 		       evt.version);
 		return 0;
 	}
-	machine_check_print_event_info(&evt);
+	machine_check_print_event_info(&evt, user_mode(regs));
 
 	if (opal_recover_mce(regs, &evt))
 		return 1;


* Re: [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode.
  2017-03-28 13:45 ` [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode Mahesh J Salgaonkar
@ 2017-03-30  0:09   ` Nicholas Piggin
  2017-03-30  7:04     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 5+ messages in thread
From: Nicholas Piggin @ 2017-03-30  0:09 UTC
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Michael Ellerman

On Tue, 28 Mar 2017 19:15:28 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> For an MCE that hits while in user mode, MSR(HV=1,PR=1), print the task
> info in the console MCE error log. This will help identify the application
> that stumbled upon the MCE error.

I think you may still want these details for a task that is currently in
the kernel. How about something like if (!in_interrupt())?


> @@ -311,8 +312,13 @@ void machine_check_print_event_info(struct machine_check_event *evt)
>  	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
>  	       evt->disposition == MCE_DISPOSITION_RECOVERED ?
>  	       "Recovered" : "Not recovered");
> -	printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
> +	if (user_mode) {
> +		printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
> +				evt->srr0, current->pid, current->comm);
> +	} else {
> +		printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
>  							(void *)evt->srr0);
> +	}
>  	printk("%s  Initiator: %s\n", level,
>  	       evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
>  	switch (evt->error_type) {


* Re: [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode.
  2017-03-30  0:09   ` Nicholas Piggin
@ 2017-03-30  7:04     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 5+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2017-03-30  7:04 UTC
  To: Nicholas Piggin; +Cc: linuxppc-dev, Michael Ellerman

On 03/30/2017 05:39 AM, Nicholas Piggin wrote:
> On Tue, 28 Mar 2017 19:15:28 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> For an MCE that hits while in user mode, MSR(HV=1,PR=1), print the task
>> info in the console MCE error log. This will help identify the application
>> that stumbled upon the MCE error.
> 
> I think you may still want these details for a task that is currently in
> the kernel. How about something like if (!in_interrupt())?

We queue up the MCE event to delay the printing of MCEs recovered in the
kernel, so we would have to record the task details in the MCE event
itself.

> 
> 
>> @@ -311,8 +312,13 @@ void machine_check_print_event_info(struct machine_check_event *evt)
>>  	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
>>  	       evt->disposition == MCE_DISPOSITION_RECOVERED ?
>>  	       "Recovered" : "Not recovered");
>> -	printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
>> +	if (user_mode) {
>> +		printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
>> +				evt->srr0, current->pid, current->comm);
>> +	} else {
>> +		printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
>>  							(void *)evt->srr0);
>> +	}
>>  	printk("%s  Initiator: %s\n", level,
>>  	       evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
>>  	switch (evt->error_type) {
> 


* Re: [1/2] powerpc/book3s: Display more info for MCE error console log.
  2017-03-28 13:45 [PATCH 1/2] powerpc/book3s: Display more info for MCE error console log Mahesh J Salgaonkar
  2017-03-28 13:45 ` [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode Mahesh J Salgaonkar
@ 2017-04-06 13:06 ` Michael Ellerman
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2017-04-06 13:06 UTC
  To: Mahesh Salgaonkar, linuxppc-dev; +Cc: Nicholas Piggin

On Tue, 2017-03-28 at 13:45:04 UTC, Mahesh Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> For D-side errors, we print the data load/store address that caused the
> machine check as 'Effective address'. In addition to printing the NIP,
> also print the kernel function name.
> 
> After this patch the MCE console log would look like:
> 
> [  291.444281] Severe Machine check interrupt [Recovered]
> [  291.444477]   NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel]
> [  291.444707]   Initiator: CPU
> [  291.444761]   Error type: SLB [Parity]
> [  291.444793]     Effective address: d000000026de0000
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5b1d6fc2d4d927852214f2a7e2a8eb

cheers
