All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/mce: check if event info is valid
@ 2021-08-06 13:23 Ganesh Goudar
  2021-09-17  8:54 ` Ganesh
  0 siblings, 1 reply; 4+ messages in thread
From: Ganesh Goudar @ 2021-08-06 13:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin

Check if the event info is valid before printing the
event information. When a fwnmi enabled nested kvm guest
hits a machine check exception L0 and L2 would generate
machine check event info, But L1 would not generate any
machine check event info as it won't go through 0x200
vector and prints some unwanted message.

To fix this, 'in_use' variable in machine check event info is
no more in use, rename it to 'valid' and check if the event
information is valid before logging the event information.

without this patch L1 would print following message for
exceptions encountered in L2, as event structure will be
empty in L1.

"Machine Check Exception, Unknown event version 0".

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/mce.h | 2 +-
 arch/powerpc/kernel/mce.c      | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 331d944280b8..3646f53f228f 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -113,7 +113,7 @@ enum MCE_LinkErrorType {
 
 struct machine_check_event {
 	enum MCE_Version	version:8;
-	u8			in_use;
+	u8			valid;
 	enum MCE_Severity	severity:8;
 	enum MCE_Initiator	initiator:8;
 	enum MCE_ErrorType	error_type:8;
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..b778394a06b5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -114,7 +114,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
 	mce->srr0 = nip;
 	mce->srr1 = regs->msr;
 	mce->gpr3 = regs->gpr[3];
-	mce->in_use = 1;
+	mce->valid = 1;
 	mce->cpu = get_paca()->paca_index;
 
 	/* Mark it recovered if we have handled it and MSR(RI=1). */
@@ -202,7 +202,7 @@ int get_mce_event(struct machine_check_event *mce, bool release)
 		if (mce)
 			*mce = *mc_evt;
 		if (release)
-			mc_evt->in_use = 0;
+			mc_evt->valid = 0;
 		ret = 1;
 	}
 	/* Decrement the count to free the slot. */
@@ -413,6 +413,9 @@ void machine_check_print_event_info(struct machine_check_event *evt,
 		"Probable Software error (some chance of hardware cause)",
 	};
 
+	if (!evt->valid)
+		return;
+
 	/* Print things out */
 	if (evt->version != MCE_V1) {
 		pr_err("Machine Check Exception, Unknown event version %d !\n",
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/mce: check if event info is valid
  2021-08-06 13:23 [PATCH] powerpc/mce: check if event info is valid Ganesh Goudar
@ 2021-09-17  8:54 ` Ganesh
  2021-10-07 12:09   ` Michael Ellerman
  0 siblings, 1 reply; 4+ messages in thread
From: Ganesh @ 2021-09-17  8:54 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, npiggin

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

On 8/6/21 6:53 PM, Ganesh Goudar wrote:

> Check if the event info is valid before printing the
> event information. When a fwnmi enabled nested kvm guest
> hits a machine check exception L0 and L2 would generate
> machine check event info, But L1 would not generate any
> machine check event info as it won't go through 0x200
> vector and prints some unwanted message.
>
> To fix this, 'in_use' variable in machine check event info is
> no more in use, rename it to 'valid' and check if the event
> information is valid before logging the event information.
>
> without this patch L1 would print following message for
> exceptions encountered in L2, as event structure will be
> empty in L1.
>
> "Machine Check Exception, Unknown event version 0".
>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---

Hi mpe, Any comments on this patch.


[-- Attachment #2: Type: text/html, Size: 1255 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/mce: check if event info is valid
  2021-09-17  8:54 ` Ganesh
@ 2021-10-07 12:09   ` Michael Ellerman
  2021-10-15  7:44     ` Nicholas Piggin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Ellerman @ 2021-10-07 12:09 UTC (permalink / raw)
  To: Ganesh, linuxppc-dev; +Cc: mahesh, npiggin

Ganesh <ganeshgr@linux.ibm.com> writes:
> On 8/6/21 6:53 PM, Ganesh Goudar wrote:
>
>> Check if the event info is valid before printing the
>> event information. When a fwnmi enabled nested kvm guest
>> hits a machine check exception L0 and L2 would generate
>> machine check event info, But L1 would not generate any
>> machine check event info as it won't go through 0x200
>> vector and prints some unwanted message.
>>
>> To fix this, 'in_use' variable in machine check event info is
>> no more in use, rename it to 'valid' and check if the event
>> information is valid before logging the event information.
>>
>> without this patch L1 would print following message for
>> exceptions encountered in L2, as event structure will be
>> empty in L1.
>>
>> "Machine Check Exception, Unknown event version 0".
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>
> Hi mpe, Any comments on this patch.

The variable rename is a bit of a distraction.

But ignoring that, how do we end up processing a machine_check_event
that has in_use = 0?

You don't give much detail on what call path you're talking about. I
guess we're coming in via the calls in the KVM code?

In the definition of kvm_vcpu_arch we have:

	struct machine_check_event mce_evt; /* Valid if trap == 0x200 */

And you said we're not going via 0x200 in L1. But so shouldn't we be
teaching the KVM code not to use mce_evt when trap is not 0x200?

cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/mce: check if event info is valid
  2021-10-07 12:09   ` Michael Ellerman
@ 2021-10-15  7:44     ` Nicholas Piggin
  0 siblings, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2021-10-15  7:44 UTC (permalink / raw)
  To: Ganesh, linuxppc-dev, Michael Ellerman; +Cc: mahesh

Excerpts from Michael Ellerman's message of October 7, 2021 10:09 pm:
> Ganesh <ganeshgr@linux.ibm.com> writes:
>> On 8/6/21 6:53 PM, Ganesh Goudar wrote:
>>
>>> Check if the event info is valid before printing the
>>> event information. When a fwnmi enabled nested kvm guest
>>> hits a machine check exception L0 and L2 would generate
>>> machine check event info, But L1 would not generate any
>>> machine check event info as it won't go through 0x200
>>> vector and prints some unwanted message.
>>>
>>> To fix this, 'in_use' variable in machine check event info is
>>> no more in use, rename it to 'valid' and check if the event
>>> information is valid before logging the event information.
>>>
>>> without this patch L1 would print following message for
>>> exceptions encountered in L2, as event structure will be
>>> empty in L1.
>>>
>>> "Machine Check Exception, Unknown event version 0".
>>>
>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>>> ---
>>
>> Hi mpe, Any comments on this patch.
> 
> The variable rename is a bit of a distraction.
> 
> But ignoring that, how do we end up processing a machine_check_event
> that has in_use = 0?
> 
> You don't give much detail on what call path you're talking about. I
> guess we're coming in via the calls in the KVM code?
> 
> In the definition of kvm_vcpu_arch we have:
> 
> 	struct machine_check_event mce_evt; /* Valid if trap == 0x200 */
> 
> And you said we're not going via 0x200 in L1. But so shouldn't we be
> teaching the KVM code not to use mce_evt when trap is not 0x200?

I'm not sure we want the MCE to skip the L1 either. It should match the 
L0 hypervisor behaviour as closely as reasonably possible.

We might have to teach the KVM pseries path to do something about the
0x200 before the common HV guest exit handler (which is where the L1
message comes from).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-10-15  7:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-06 13:23 [PATCH] powerpc/mce: check if event info is valid Ganesh Goudar
2021-09-17  8:54 ` Ganesh
2021-10-07 12:09   ` Michael Ellerman
2021-10-15  7:44     ` Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.