All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] powerpc/mce: Fix access error in mce handler
@ 2021-09-09  6:43 Ganesh Goudar
  2021-09-17  6:39 ` Daniel Axtens
  2021-09-19 12:20 ` Michael Ellerman
  0 siblings, 2 replies; 4+ messages in thread
From: Ganesh Goudar @ 2021-09-09  6:43 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin

We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be enabled
for an LPAR running with hash mmu else the kernel crashes.

After enabling translation in mce_handle_error() we used to
leave it enabled to avoid crashing here, but now with the
commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
returning from handler") we are restoring the MSR to disable
translation.

Hence to fix this enable the translation before queuing the work.

Without this change following trace is seen on injecting SLB
multihit in an LPAR running with hash mmu.

Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: G        OE     5.14.0-mce+ #137
NIP:  c000000000735d60 LR: c000000000318640 CTR: 0000000000000000
REGS: c00000001ebff9a0 TRAP: 0300   Tainted: G       OE      (5.14.0-mce+)
MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008228  XER: 00000001
CFAR: c00000000031863c DAR: c00000027fa8fe08 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000003186d0 c00000001ebffc40 c000000001b0df00 c0000000016337e8
GPR04: c0000000016337e8 c00000027fa8fe08 0000000000000023 c0000000016337f0
GPR08: 0000000000000023 c0000000012ffe08 0000000000000000 c008000001460240
GPR12: 0000000000000000 c00000001ec9a900 c00000002ac4bd00 0000000000000000
GPR16: 00000000000005a0 c0080000006b0000 c0080000006b05a0 c000000000ff3068
GPR20: c00000002ac4bbc0 0000000000000001 c00000002ac4bbc0 c008000001490298
GPR24: c008000001490108 c000000001636198 c008000001470090 c008000001470058
GPR28: 0000000000000510 c008000001000000 c008000008000019 0000000000000019
NIP [c000000000735d60] llist_add_batch+0x0/0x40
LR [c000000000318640] __irq_work_queue_local+0x70/0xc0
Call Trace:
[c00000001ebffc40] [c00000001ebffc0c] 0xc00000001ebffc0c (unreliable)
[c00000001ebffc60] [c0000000003186d0] irq_work_queue+0x40/0x70
[c00000001ebffc80] [c00000000004425c] machine_check_queue_event+0xbc/0xd0
[c00000001ebffcf0] [c00000000000838c] machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
v2: Change in commit message.
---
 arch/powerpc/kernel/mce.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..9d1e39d42e3e 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -249,6 +249,7 @@ void machine_check_queue_event(void)
 {
 	int index;
 	struct machine_check_event evt;
+	unsigned long msr;
 
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;
@@ -262,8 +263,19 @@ void machine_check_queue_event(void)
 	memcpy(&local_paca->mce_info->mce_event_queue[index],
 	       &evt, sizeof(evt));
 
-	/* Queue irq work to process this event later. */
-	irq_work_queue(&mce_event_process_work);
+	/* Queue irq work to process this event later. Before
+	 * queuing the work enable translation for non radix LPAR,
+	 * as irq_work_queue may try to access memory outside RMO
+	 * region.
+	 */
+	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+		msr = mfmsr();
+		mtmsr(msr | MSR_IR | MSR_DR);
+		irq_work_queue(&mce_event_process_work);
+		mtmsr(msr);
+	} else {
+		irq_work_queue(&mce_event_process_work);
+	}
 }
 
 void mce_common_process_ue(struct pt_regs *regs,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc/mce: Fix access error in mce handler
  2021-09-09  6:43 [PATCH v2] powerpc/mce: Fix access error in mce handler Ganesh Goudar
@ 2021-09-17  6:39 ` Daniel Axtens
  2021-09-17  8:40   ` Ganesh
  2021-09-19 12:20 ` Michael Ellerman
  1 sibling, 1 reply; 4+ messages in thread
From: Daniel Axtens @ 2021-09-17  6:39 UTC (permalink / raw)
  To: Ganesh Goudar, linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin

Hi Ganesh,

> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
>
> After enabling translation in mce_handle_error() we used to
> leave it enabled to avoid crashing here, but now with the
> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
> returning from handler") we are restoring the MSR to disable
> translation.
>
> Hence to fix this enable the translation before queuing the work.

[snip]

> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")

That patch changes arch/powerpc/powerpc/platforms/pseries/ras.c just
below this comment:

    /*
     * Enable translation as we will be accessing per-cpu variables
     * in save_mce_event() which may fall outside RMO region, also
     * leave it enabled because subsequently we will be queuing work
     * to workqueues where again per-cpu variables accessed, besides
     * fwnmi_release_errinfo() crashes when called in realmode on
     * pseries.
     * Note: All the realmode handling like flushing SLB entries for
     *       SLB multihit is done by now.
     */

That suggests per-cpu variables need protection. In your patch, you
enable translations just around irq_work_queue:

> +	/* Queue irq work to process this event later. Before
> +	 * queuing the work enable translation for non radix LPAR,
> +	 * as irq_work_queue may try to access memory outside RMO
> +	 * region.
> +	 */
> +	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
> +		msr = mfmsr();
> +		mtmsr(msr | MSR_IR | MSR_DR);
> +		irq_work_queue(&mce_event_process_work);
> +		mtmsr(msr);
> +	} else {
> +		irq_work_queue(&mce_event_process_work);
> +	}

However, just before that in the function, there are a few things that
access per-cpu variables via the local_paca, e.g.:

memcpy(&local_paca->mce_info->mce_event_queue[index],
       &evt, sizeof(evt));

Do we need to widen the window where translations are enabled in order
to protect accesses to local_paca?

Kind regards,
Daniel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc/mce: Fix access error in mce handler
  2021-09-17  6:39 ` Daniel Axtens
@ 2021-09-17  8:40   ` Ganesh
  0 siblings, 0 replies; 4+ messages in thread
From: Ganesh @ 2021-09-17  8:40 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev, mpe; +Cc: mahesh, npiggin

[-- Attachment #1: Type: text/plain, Size: 2655 bytes --]

On 9/17/21 12:09 PM, Daniel Axtens wrote:

> Hi Ganesh,
>
>> We queue an irq work for deferred processing of mce event
>> in realmode mce handler, where translation is disabled.
>> Queuing of the work may result in accessing memory outside
>> RMO region, such access needs the translation to be enabled
>> for an LPAR running with hash mmu else the kernel crashes.
>>
>> After enabling translation in mce_handle_error() we used to
>> leave it enabled to avoid crashing here, but now with the
>> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
>> returning from handler") we are restoring the MSR to disable
>> translation.
>>
>> Hence to fix this enable the translation before queuing the work.
> [snip]
>
>> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")
> That patch changes arch/powerpc/powerpc/platforms/pseries/ras.c just
> below this comment:
>
>      /*
>       * Enable translation as we will be accessing per-cpu variables
>       * in save_mce_event() which may fall outside RMO region, also
>       * leave it enabled because subsequently we will be queuing work
>       * to workqueues where again per-cpu variables accessed, besides
>       * fwnmi_release_errinfo() crashes when called in realmode on
>       * pseries.
>       * Note: All the realmode handling like flushing SLB entries for
>       *       SLB multihit is done by now.
>       */
>
> That suggests per-cpu variables need protection. In your patch, you
> enable translations just around irq_work_queue:

The comment is bit old, most of it doesn't make any sense now, yes per-cpu
variables cannot be accessed in realmode, but with commit 923b3cf00b3f
("powerpc/mce: Remove per cpu variables from MCE handlers") we moved all of
them to paca.

>> +	/* Queue irq work to process this event later. Before
>> +	 * queuing the work enable translation for non radix LPAR,
>> +	 * as irq_work_queue may try to access memory outside RMO
>> +	 * region.
>> +	 */
>> +	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
>> +		msr = mfmsr();
>> +		mtmsr(msr | MSR_IR | MSR_DR);
>> +		irq_work_queue(&mce_event_process_work);
>> +		mtmsr(msr);
>> +	} else {
>> +		irq_work_queue(&mce_event_process_work);
>> +	}
> However, just before that in the function, there are a few things that
> access per-cpu variables via the local_paca, e.g.:
>
> memcpy(&local_paca->mce_info->mce_event_queue[index],
>         &evt, sizeof(evt));
>
> Do we need to widen the window where translations are enabled in order
> to protect accesses to local_paca?

paca will be within Real Mode Area, so it can be accessed with translate off.


[-- Attachment #2: Type: text/html, Size: 3620 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc/mce: Fix access error in mce handler
  2021-09-09  6:43 [PATCH v2] powerpc/mce: Fix access error in mce handler Ganesh Goudar
  2021-09-17  6:39 ` Daniel Axtens
@ 2021-09-19 12:20 ` Michael Ellerman
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2021-09-19 12:20 UTC (permalink / raw)
  To: linuxppc-dev, Ganesh Goudar, mpe; +Cc: mahesh, npiggin

On Thu, 9 Sep 2021 12:13:30 +0530, Ganesh Goudar wrote:
> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
> 
> After enabling translation in mce_handle_error() we used to
> leave it enabled to avoid crashing here, but now with the
> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
> returning from handler") we are restoring the MSR to disable
> translation.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/mce: Fix access error in mce handler
      https://git.kernel.org/powerpc/c/3a1e92d0896e928ac2a5b58962d05a39afef2e23

cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-09-19 12:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-09  6:43 [PATCH v2] powerpc/mce: Fix access error in mce handler Ganesh Goudar
2021-09-17  6:39 ` Daniel Axtens
2021-09-17  8:40   ` Ganesh
2021-09-19 12:20 ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.