All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/pseries: Fix MCE handling on pseries
@ 2020-03-13 14:04 Ganesh Goudar
  2020-03-14  3:48 ` Nicholas Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Ganesh Goudar @ 2020-03-13 14:04 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: mahesh, npiggin

MCE handling on pSeries platform fails as recent rework to use common
code for pSeries and PowerNV in machine check error handling tries to
access per-cpu variables in realmode. The per-cpu variables may be
outside the RMO region on pSeries platform and needs translation to be
enabled for access. Just moving these per-cpu variable into RMO region
did'nt help because we queue some work to workqueues in real mode, which
again tries to touch per-cpu variables. Also fwnmi_release_errinfo()
cannot be called when translation is not enabled.

This patch fixes this by enabling translation in the exception handler
when all required real mode handling is done. This change only affects
the pSeries platform.

Without this fix below kernel crash is seen on injecting
SLB multihit:

BUG: Unable to handle kernel data access on read at 0xc00000027b205950
Faulting instruction address: 0xc00000000003b7e0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: mcetest_slb(OE+) af_packet(E) xt_tcpudp(E) ip6t_rpfilter(E) ip6t_REJECT(E) ipt_REJECT(E) xt_conntrack(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) xfs(E) ibmveth(E) vmx_crypto(E) gf128mul(E) uio_pdrv_genirq(E) uio(E) crct10dif_vpmsum(E) rtc_generic(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) sr_mod(E) sd_mod(E) cdrom(E) ibmvscsi(E) scsi_transport_srp(E) crc32c_vpmsum(E) dm_mod(E) sg(E) scsi_mod(E)
CPU: 34 PID: 8154 Comm: insmod Kdump: loaded Tainted: G OE 5.5.0-mahesh #1
NIP: c00000000003b7e0 LR: c0000000000f2218 CTR: 0000000000000000
REGS: c000000007dcb960 TRAP: 0300 Tainted: G OE (5.5.0-mahesh)
MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28002428 XER: 20040000
CFAR: c0000000000f2214 DAR: c00000027b205950 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000000f2218 c000000007dcbbf0 c000000001544800 c000000007dcbd70
GPR04: 0000000000000001 c000000007dcbc98 c008000000d00258 c0080000011c0000
GPR08: 0000000000000000 0000000300000003 c000000001035950 0000000003000048
GPR12: 000000027a1d0000 c000000007f9c000 0000000000000558 0000000000000000
GPR16: 0000000000000540 c008000001110000 c008000001110540 0000000000000000
GPR20: c00000000022af10 c00000025480fd70 c008000001280000 c00000004bfbb300
GPR24: c000000001442330 c00800000800000d c008000008000000 4009287a77000510
GPR28: 0000000000000000 0000000000000002 c000000001033d30 0000000000000001
NIP [c00000000003b7e0] save_mce_event+0x30/0x240
LR [c0000000000f2218] pseries_machine_check_realmode+0x2c8/0x4f0
Call Trace:
Instruction dump:
3c4c0151 38429050 7c0802a6 60000000 fbc1fff0 fbe1fff8 f821ffd1 3d42ffaf
3fc2ffaf e98d0030 394a1150 3bdef530 <7d6a62aa> 1d2b0048 2f8b0063 380b0001
---[ end trace 46fd63f36bbdd940 ]---

Fixes: 9ca766f9891d ("powerpc/64s/pseries: machine check convert to use common event code")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/kernel/exceptions-64s.S     | 12 ++++++++++++
 arch/powerpc/platforms/pseries/pseries.h |  1 +
 arch/powerpc/platforms/pseries/ras.c     | 11 +++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ffc15f4f079d..93d0fddd8074 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2405,3 +2405,15 @@ replay_interrupt_return:
 	blr
 
 _ASM_NOKPROBE_SYMBOL(__replay_interrupt)
+
+_GLOBAL(switch_to_virt_mode)
+	mflr	r10
+	mfmsr	r11
+	ori	r11,r11,MSR_IR|MSR_DR
+	/* Clear MSR_RI before setting SRR0 and SRR1 */
+	li	r12,0
+	mtmsrd	r12,1
+	mtspr	SPRN_SRR0,r10
+	mtspr	SPRN_SRR1,r11
+	RFI_TO_KERNEL
+	b	.
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 13fa370a87e4..768e3e0f137d 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -113,5 +113,6 @@ int dlpar_workqueue_init(void);
 
 void pseries_setup_rfi_flush(void);
 void pseries_lpar_read_hblkrm_characteristics(void);
+extern void switch_to_virt_mode(void);
 
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 1d7f973c647b..5d49d9d711da 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -683,6 +683,17 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 #endif
 
 out:
+	/*
+	 * Enable translation as we will be accessing per-cpu variables
+	 * in save_mce_event() which may fall outside RMO region, also
+	 * leave it enabled because subsequently we will be queuing work
+	 * to workqueues where again per-cpu variables accessed, besides
+	 * fwnmi_release_errinfo() crashes when called in realmode on
+	 * pseries.
+	 * Note: All the realmode handling like flushing SLB entries for
+	 *       SLB multihit is done by now.
+	 */
+	switch_to_virt_mode();
 	save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
 			&mce_err, regs->nip, eaddr, paddr);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-13 14:04 [PATCH] powerpc/pseries: Fix MCE handling on pseries Ganesh Goudar
@ 2020-03-14  3:48 ` Nicholas Piggin
  2020-03-16 11:47   ` Ganesh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2020-03-14  3:48 UTC (permalink / raw)
  To: Ganesh Goudar, linuxppc-dev, mpe; +Cc: mahesh

Ganesh Goudar's on March 14, 2020 12:04 am:
> MCE handling on pSeries platform fails as recent rework to use common
> code for pSeries and PowerNV in machine check error handling tries to
> access per-cpu variables in realmode. The per-cpu variables may be
> outside the RMO region on pSeries platform and needs translation to be
> enabled for access. Just moving these per-cpu variable into RMO region
> did'nt help because we queue some work to workqueues in real mode, which
> again tries to touch per-cpu variables.

Which queues are these? We should not be using Linux workqueues, but the
powerpc mce code which uses irq_work.

> Also fwnmi_release_errinfo()
> cannot be called when translation is not enabled.

Why not?

> This patch fixes this by enabling translation in the exception handler
> when all required real mode handling is done. This change only affects
> the pSeries platform.

Not supposed to do this, because we might not be in a state
where the MMU is ready to be turned on at this point.

I'd like to understand better which accesses are a problem, and whether
we can fix them all to be in the RMO.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-14  3:48 ` Nicholas Piggin
@ 2020-03-16 11:47   ` Ganesh
  2020-03-17 10:01     ` Nicholas Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Ganesh @ 2020-03-16 11:47 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: mahesh

[-- Attachment #1: Type: text/plain, Size: 4075 bytes --]



On 3/14/20 9:18 AM, Nicholas Piggin wrote:
> Ganesh Goudar's on March 14, 2020 12:04 am:
>> MCE handling on pSeries platform fails as recent rework to use common
>> code for pSeries and PowerNV in machine check error handling tries to
>> access per-cpu variables in realmode. The per-cpu variables may be
>> outside the RMO region on pSeries platform and needs translation to be
>> enabled for access. Just moving these per-cpu variable into RMO region
>> did'nt help because we queue some work to workqueues in real mode, which
>> again tries to touch per-cpu variables.
> Which queues are these? We should not be using Linux workqueues, but the
> powerpc mce code which uses irq_work.

Yes, irq work queues accesses memory outside RMO.
irq_work_queue()->__irq_work_queue_local()->[this_cpu_ptr(&lazy_list) | this_cpu_ptr(&raised_list)]

>> Also fwnmi_release_errinfo()
>> cannot be called when translation is not enabled.
> Why not?

It crashes when we try to get RTAS token for "ibm, nmi-interlock" device
tree node. But yes we can avoid it by storing it rtas_token somewhere but haven't
tried it, here is the backtrace I got when fwnmi_release_errinfo() called from
realmode handler.

[   70.856908] BUG: Unable to handle kernel data access on read at 0xc0000001ffffa8f8
[   70.856918] Faulting instruction address: 0xc000000000853920
[   70.856927] Oops: Kernel access of bad area, sig: 11 [#1]
[   70.856935] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   70.856943] Modules linked in: mcetest_slb(OE+) bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg pseries_rng ip_tables xfs libcrc32c sd_mod t10_pi ibmvscsi ibmveth scsi_transport_srp
[   70.856975] CPU: 13 PID: 6480 Comm: insmod Kdump: loaded Tainted: G           OE     5.6.0-rc2-ganesh+ #6
[   70.856985] NIP:  c000000000853920 LR: c000000000853a14 CTR: c0000000000376b0
[   70.856994] REGS: c000000007e4b870 TRAP: 0300   Tainted: G           OE      (5.6.0-rc2-ganesh+)
[   70.857003] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 88000422  XER: 00000009
[   70.857015] CFAR: c000000000853a10 DAR: c0000001ffffa8f8 DSISR: 40000000 IRQMASK: 1
[   70.857015] GPR00: c000000000853a14 c000000007e4bb00 c000000001372b00 c0000001ffffa8c8
[   70.857015] GPR04: c000000000cf8728 0000000000000000 0000000000000002 c008000000420810
[   70.857015] GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000000001
[   70.857015] GPR12: 0000000000000000 c000000007f92000 c0000001f8113d70 c00800000059070d
[   70.857015] GPR16: 00000000000004f8 c008000000421080 000000000000fff1 c008000000421038
[   70.857015] GPR20: c00000000125eb20 c000000000d1d1c8 c008000000590000 0000000000000000
[   70.857015] GPR24: 4000000000000510 c008000008000000 c0000000012355d8 c008000000420940
[   70.857015] GPR28: c008000008000011 0000000000000000 c000000000cf8728 c00000000169a098
[   70.857097] NIP [c000000000853920] __of_find_property+0x30/0xd0
[   70.857106] LR [c000000000853a14] of_find_property+0x54/0x90
[   70.857113] Call Trace:
[   70.857117] Instruction dump:
[   70.857124] 3c4c00b2 3842f210 2c230000 418200bc 7c0802a6 fba1ffe8 fbc1fff0 7cbd2b78
[   70.857136] fbe1fff8 7c9e2378 f8010010 f821ffc1 <ebe30030> 2fbf0000 409e0014 48000064
[   70.857152] ---[ end trace 13755f7502f3150b ]---
[   70.864199]
[   70.864226] Sending IPI to other CPUs
[   82.011761] ERROR: 15 cpu(s) not responding

>> This patch fixes this by enabling translation in the exception handler
>> when all required real mode handling is done. This change only affects
>> the pSeries platform.
> Not supposed to do this, because we might not be in a state
> where the MMU is ready to be turned on at this point.
>
> I'd like to understand better which accesses are a problem, and whether
> we can fix them all to be in the RMO.

I faced three such access problems,
  * accessing per-cpu data (like mce_event,mce_event_queue and mce_event_queue),
    we can move this inside RMO.
  * calling fwnmi_release_errinfo().
  * And queuing work to irq_work_queue, not sure how to fix this.

> Thanks,
> Nick


[-- Attachment #2: Type: text/html, Size: 5307 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-16 11:47   ` Ganesh
@ 2020-03-17 10:01     ` Nicholas Piggin
  2020-03-17 14:35       ` Ganesh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2020-03-17 10:01 UTC (permalink / raw)
  To: Ganesh, linuxppc-dev, mpe; +Cc: mahesh

Ganesh's on March 16, 2020 9:47 pm:
> 
> 
> On 3/14/20 9:18 AM, Nicholas Piggin wrote:
>> Ganesh Goudar's on March 14, 2020 12:04 am:
>>> MCE handling on pSeries platform fails as recent rework to use common
>>> code for pSeries and PowerNV in machine check error handling tries to
>>> access per-cpu variables in realmode. The per-cpu variables may be
>>> outside the RMO region on pSeries platform and needs translation to be
>>> enabled for access. Just moving these per-cpu variable into RMO region
>>> did'nt help because we queue some work to workqueues in real mode, which
>>> again tries to touch per-cpu variables.
>> Which queues are these? We should not be using Linux workqueues, but the
>> powerpc mce code which uses irq_work.
> 
> Yes, irq work queues accesses memory outside RMO.
> irq_work_queue()->__irq_work_queue_local()->[this_cpu_ptr(&lazy_list) | this_cpu_ptr(&raised_list)]

Hmm, okay.

>>> Also fwnmi_release_errinfo()
>>> cannot be called when translation is not enabled.
>> Why not?
> 
> It crashes when we try to get RTAS token for "ibm, nmi-interlock" device
> tree node. But yes we can avoid it by storing it rtas_token somewhere but haven't
> tried it, here is the backtrace I got when fwnmi_release_errinfo() called from
> realmode handler.

Okay, I actually had problems with that messing up soft-irq state too
and so I sent a patch to get rid of it, but that's the least of your
problems really.

>>> This patch fixes this by enabling translation in the exception handler
>>> when all required real mode handling is done. This change only affects
>>> the pSeries platform.
>> Not supposed to do this, because we might not be in a state
>> where the MMU is ready to be turned on at this point.
>>
>> I'd like to understand better which accesses are a problem, and whether
>> we can fix them all to be in the RMO.
> 
> I faced three such access problems,
>   * accessing per-cpu data (like mce_event,mce_event_queue and mce_event_queue),
>     we can move this inside RMO.
>   * calling fwnmi_release_errinfo().
>   * And queuing work to irq_work_queue, not sure how to fix this.

Yeah. The irq_work_queue one is the biggest problem.

This code "worked" prior to the series unifying pseries and powernv
machine check handlers, 9ca766f9891d ("powerpc/64s/pseries: machine 
check convert to use common event code") and friends. But it does in
basically the same way as your fix (i.e., it runs this early handler
in virtual mode), but that's not really the right fix.

Consider: you get a SLB multi hit on a kernel address due to hardware or 
software error. That access causes a MCE, but before the error can be 
decode to save and flush the SLB, you turn on relocation and that
causes another SLB multi hit...

I think the irq_work subsystem will have to be changed to use an array
unfortunately.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-17 10:01     ` Nicholas Piggin
@ 2020-03-17 14:35       ` Ganesh
  2020-03-20  2:41         ` Nicholas Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Ganesh @ 2020-03-17 14:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: mahesh

[-- Attachment #1: Type: text/plain, Size: 3197 bytes --]



On 3/17/20 3:31 PM, Nicholas Piggin wrote:
> Ganesh's on March 16, 2020 9:47 pm:
>>
>> On 3/14/20 9:18 AM, Nicholas Piggin wrote:
>>> Ganesh Goudar's on March 14, 2020 12:04 am:
>>>> MCE handling on pSeries platform fails as recent rework to use common
>>>> code for pSeries and PowerNV in machine check error handling tries to
>>>> access per-cpu variables in realmode. The per-cpu variables may be
>>>> outside the RMO region on pSeries platform and needs translation to be
>>>> enabled for access. Just moving these per-cpu variable into RMO region
>>>> did'nt help because we queue some work to workqueues in real mode, which
>>>> again tries to touch per-cpu variables.
>>> Which queues are these? We should not be using Linux workqueues, but the
>>> powerpc mce code which uses irq_work.
>> Yes, irq work queues accesses memory outside RMO.
>> irq_work_queue()->__irq_work_queue_local()->[this_cpu_ptr(&lazy_list) | this_cpu_ptr(&raised_list)]
> Hmm, okay.
>
>>>> Also fwnmi_release_errinfo()
>>>> cannot be called when translation is not enabled.
>>> Why not?
>> It crashes when we try to get RTAS token for "ibm, nmi-interlock" device
>> tree node. But yes we can avoid it by storing it rtas_token somewhere but haven't
>> tried it, here is the backtrace I got when fwnmi_release_errinfo() called from
>> realmode handler.
> Okay, I actually had problems with that messing up soft-irq state too
> and so I sent a patch to get rid of it, but that's the least of your
> problems really.
>
>>>> This patch fixes this by enabling translation in the exception handler
>>>> when all required real mode handling is done. This change only affects
>>>> the pSeries platform.
>>> Not supposed to do this, because we might not be in a state
>>> where the MMU is ready to be turned on at this point.
>>>
>>> I'd like to understand better which accesses are a problem, and whether
>>> we can fix them all to be in the RMO.
>> I faced three such access problems,
>>    * accessing per-cpu data (like mce_event,mce_event_queue and mce_event_queue),
>>      we can move this inside RMO.
>>    * calling fwnmi_release_errinfo().
>>    * And queuing work to irq_work_queue, not sure how to fix this.
> Yeah. The irq_work_queue one is the biggest problem.
>
> This code "worked" prior to the series unifying pseries and powernv
> machine check handlers, 9ca766f9891d ("powerpc/64s/pseries: machine
> check convert to use common event code") and friends. But it does in
> basically the same way as your fix (i.e., it runs this early handler
> in virtual mode), but that's not really the right fix.
>
> Consider: you get a SLB multi hit on a kernel address due to hardware or
> software error. That access causes a MCE, but before the error can be
> decode to save and flush the SLB, you turn on relocation and that
> causes another SLB multi hit...

We turn on relocation only after all the realmode handling/recovery is done
like SLB flush and reload, All we do after we turn relocation on is saving
mce event to array and queuing the work to irq_workqueue.
So we are good to turn it on here.

> I think the irq_work subsystem will have to be changed to use an array
> unfortunately.
>
> Thanks,
> Nick
>


[-- Attachment #2: Type: text/html, Size: 4803 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-17 14:35       ` Ganesh
@ 2020-03-20  2:41         ` Nicholas Piggin
  2020-03-20  9:09           ` Ganesh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2020-03-20  2:41 UTC (permalink / raw)
  To: Ganesh, linuxppc-dev, mpe; +Cc: mahesh

Ganesh's on March 18, 2020 12:35 am:
> 
> 
> On 3/17/20 3:31 PM, Nicholas Piggin wrote:
>> Ganesh's on March 16, 2020 9:47 pm:
>>>
>>> On 3/14/20 9:18 AM, Nicholas Piggin wrote:
>>>> Ganesh Goudar's on March 14, 2020 12:04 am:
>>>>> MCE handling on pSeries platform fails as recent rework to use common
>>>>> code for pSeries and PowerNV in machine check error handling tries to
>>>>> access per-cpu variables in realmode. The per-cpu variables may be
>>>>> outside the RMO region on pSeries platform and needs translation to be
>>>>> enabled for access. Just moving these per-cpu variable into RMO region
>>>>> did'nt help because we queue some work to workqueues in real mode, which
>>>>> again tries to touch per-cpu variables.
>>>> Which queues are these? We should not be using Linux workqueues, but the
>>>> powerpc mce code which uses irq_work.
>>> Yes, irq work queues accesses memory outside RMO.
>>> irq_work_queue()->__irq_work_queue_local()->[this_cpu_ptr(&lazy_list) | this_cpu_ptr(&raised_list)]
>> Hmm, okay.
>>
>>>>> Also fwnmi_release_errinfo()
>>>>> cannot be called when translation is not enabled.
>>>> Why not?
>>> It crashes when we try to get RTAS token for "ibm, nmi-interlock" device
>>> tree node. But yes we can avoid it by storing it rtas_token somewhere but haven't
>>> tried it, here is the backtrace I got when fwnmi_release_errinfo() called from
>>> realmode handler.
>> Okay, I actually had problems with that messing up soft-irq state too
>> and so I sent a patch to get rid of it, but that's the least of your
>> problems really.
>>
>>>>> This patch fixes this by enabling translation in the exception handler
>>>>> when all required real mode handling is done. This change only affects
>>>>> the pSeries platform.
>>>> Not supposed to do this, because we might not be in a state
>>>> where the MMU is ready to be turned on at this point.
>>>>
>>>> I'd like to understand better which accesses are a problem, and whether
>>>> we can fix them all to be in the RMO.
>>> I faced three such access problems,
>>>    * accessing per-cpu data (like mce_event,mce_event_queue and mce_event_queue),
>>>      we can move this inside RMO.
>>>    * calling fwnmi_release_errinfo().
>>>    * And queuing work to irq_work_queue, not sure how to fix this.
>> Yeah. The irq_work_queue one is the biggest problem.
>>
>> This code "worked" prior to the series unifying pseries and powernv
>> machine check handlers, 9ca766f9891d ("powerpc/64s/pseries: machine
>> check convert to use common event code") and friends. But it does in
>> basically the same way as your fix (i.e., it runs this early handler
>> in virtual mode), but that's not really the right fix.
>>
>> Consider: you get a SLB multi hit on a kernel address due to hardware or
>> software error. That access causes a MCE, but before the error can be
>> decode to save and flush the SLB, you turn on relocation and that
>> causes another SLB multi hit...
> 
> We turn on relocation only after all the realmode handling/recovery is done
> like SLB flush and reload, All we do after we turn relocation on is saving
> mce event to array and queuing the work to irq_workqueue.

Oh I see, fwnmi_release_errinfo is done after mce_handle_error, I didnt
read your comment closely!

That means the recovery is done with MSR[ME]=0, which means saving the
SLB entries can take a machine check which will turn into a checkstop,
or walking user page tables and loading memory to handle memory 
failures.

We really should release that immediately so we get ME back on.

> So we are good to turn it on here.

Possibly. I don't think it's generally a good idea enable relocation
from an interrupted relocation off context, but yeah this might be okay.

I think FWNMI mce needs to be fixed to not do this, and do the
nmi-interlock earlier, but for now your patch I guess improves things
significantly. So, okay let's go with it.

You should be able to just use mtmsrd to switch to virtual mode, so no
need for the asm code.

  mtmsr(mfmsr()|MSR_IR|MSR_DR);

Otherwise,

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/pseries: Fix MCE handling on pseries
  2020-03-20  2:41         ` Nicholas Piggin
@ 2020-03-20  9:09           ` Ganesh
  0 siblings, 0 replies; 7+ messages in thread
From: Ganesh @ 2020-03-20  9:09 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: mahesh

[-- Attachment #1: Type: text/plain, Size: 4256 bytes --]



On 3/20/20 8:11 AM, Nicholas Piggin wrote:
> Ganesh's on March 18, 2020 12:35 am:
>>
>> On 3/17/20 3:31 PM, Nicholas Piggin wrote:
>>> Ganesh's on March 16, 2020 9:47 pm:
>>>> On 3/14/20 9:18 AM, Nicholas Piggin wrote:
>>>>> Ganesh Goudar's on March 14, 2020 12:04 am:
>>>>>> MCE handling on pSeries platform fails as recent rework to use common
>>>>>> code for pSeries and PowerNV in machine check error handling tries to
>>>>>> access per-cpu variables in realmode. The per-cpu variables may be
>>>>>> outside the RMO region on pSeries platform and needs translation to be
>>>>>> enabled for access. Just moving these per-cpu variable into RMO region
>>>>>> did'nt help because we queue some work to workqueues in real mode, which
>>>>>> again tries to touch per-cpu variables.
>>>>> Which queues are these? We should not be using Linux workqueues, but the
>>>>> powerpc mce code which uses irq_work.
>>>> Yes, irq work queues accesses memory outside RMO.
>>>> irq_work_queue()->__irq_work_queue_local()->[this_cpu_ptr(&lazy_list) | this_cpu_ptr(&raised_list)]
>>> Hmm, okay.
>>>
>>>>>> Also fwnmi_release_errinfo()
>>>>>> cannot be called when translation is not enabled.
>>>>> Why not?
>>>> It crashes when we try to get RTAS token for "ibm, nmi-interlock" device
>>>> tree node. But yes we can avoid it by storing it rtas_token somewhere but haven't
>>>> tried it, here is the backtrace I got when fwnmi_release_errinfo() called from
>>>> realmode handler.
>>> Okay, I actually had problems with that messing up soft-irq state too
>>> and so I sent a patch to get rid of it, but that's the least of your
>>> problems really.
>>>
>>>>>> This patch fixes this by enabling translation in the exception handler
>>>>>> when all required real mode handling is done. This change only affects
>>>>>> the pSeries platform.
>>>>> Not supposed to do this, because we might not be in a state
>>>>> where the MMU is ready to be turned on at this point.
>>>>>
>>>>> I'd like to understand better which accesses are a problem, and whether
>>>>> we can fix them all to be in the RMO.
>>>> I faced three such access problems,
>>>>     * accessing per-cpu data (like mce_event,mce_event_queue and mce_event_queue),
>>>>       we can move this inside RMO.
>>>>     * calling fwnmi_release_errinfo().
>>>>     * And queuing work to irq_work_queue, not sure how to fix this.
>>> Yeah. The irq_work_queue one is the biggest problem.
>>>
>>> This code "worked" prior to the series unifying pseries and powernv
>>> machine check handlers, 9ca766f9891d ("powerpc/64s/pseries: machine
>>> check convert to use common event code") and friends. But it does in
>>> basically the same way as your fix (i.e., it runs this early handler
>>> in virtual mode), but that's not really the right fix.
>>>
>>> Consider: you get a SLB multi hit on a kernel address due to hardware or
>>> software error. That access causes a MCE, but before the error can be
>>> decode to save and flush the SLB, you turn on relocation and that
>>> causes another SLB multi hit...
>> We turn on relocation only after all the realmode handling/recovery is done
>> like SLB flush and reload, All we do after we turn relocation on is saving
>> mce event to array and queuing the work to irq_workqueue.
> Oh I see, fwnmi_release_errinfo is done after mce_handle_error, I didnt
> read your comment closely!
>
> That means the recovery is done with MSR[ME]=0, which means saving the
> SLB entries can take a machine check which will turn into a checkstop,
> or walking user page tables and loading memory to handle memory
> failures.
>
> We really should release that immediately so we get ME back on.
>
>> So we are good to turn it on here.
> Possibly. I don't think it's generally a good idea enable relocation
> from an interrupted relocation off context, but yeah this might be okay.
>
> I think FWNMI mce needs to be fixed to not do this, and do the
> nmi-interlock earlier, but for now your patch I guess improves things
> significantly. So, okay let's go with it.
>
> You should be able to just use mtmsrd to switch to virtual mode, so no
> need for the asm code.
>
>    mtmsr(mfmsr()|MSR_IR|MSR_DR);

Sure, Thanks

>
> Otherwise,
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
> Thanks,
> Nick


[-- Attachment #2: Type: text/html, Size: 6357 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-20 10:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 14:04 [PATCH] powerpc/pseries: Fix MCE handling on pseries Ganesh Goudar
2020-03-14  3:48 ` Nicholas Piggin
2020-03-16 11:47   ` Ganesh
2020-03-17 10:01     ` Nicholas Piggin
2020-03-17 14:35       ` Ganesh
2020-03-20  2:41         ` Nicholas Piggin
2020-03-20  9:09           ` Ganesh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.