linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR
@ 2019-06-21 22:55 Nicholas Piggin
  2019-06-29 11:37 ` Michael Ellerman
  2019-06-29 12:25 ` Michael Ellerman
  0 siblings, 2 replies; 3+ messages in thread
From: Nicholas Piggin @ 2019-06-21 22:55 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The early machine check runs in real mode, so locking is unnecessary.
Worse, the windup does not restore AMR, so this can result in a false
KUAP fault after a recoverable machine check hits inside a user copy
operation.

Fix this similarly to HMI by just avoiding the kuap lock in the
early machine check handler (it will be set by the late handler that
runs in virtual mode if that runs). If the virtual mode handler is
reached, it will lock and restore the AMR.

Fixes: 890274c2dc4c0 ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6b86055e5251..73ba246ca11d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -315,7 +315,7 @@ TRAMP_REAL_BEGIN(machine_check_common_early)
 	mfspr	r11,SPRN_DSISR		/* Save DSISR */
 	std	r11,_DSISR(r1)
 	std	r9,_CCR(r1)		/* Save CR in stackframe */
-	kuap_save_amr_and_lock r9, r10, cr1
+	/* We don't touch AMR here, we never go to virtual mode */
 	/* Save r9 through r13 from EXMC save area to stack frame. */
 	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
 	mfmsr	r11			/* get MSR value */
-- 
2.20.1



* Re: [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR
  2019-06-21 22:55 [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR Nicholas Piggin
@ 2019-06-29 11:37 ` Michael Ellerman
  2019-06-29 12:25 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2019-06-29 11:37 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

On Fri, 2019-06-21 at 22:55:54 UTC, Nicholas Piggin wrote:
> The early machine check runs in real mode, so locking is unnecessary.
> Worse, the windup does not restore AMR, so this can result in a false
> KUAP fault after a recoverable machine check hits inside a user copy
> operation.
> 
> Fix this similarly to HMI by just avoiding the kuap lock in the
> early machine check handler (it will be set by the late handler that
> runs in virtual mode if that runs). If the virtual mode handler is
> reached, it will lock and restore the AMR.
> 
> Fixes: 890274c2dc4c0 ("powerpc/64s: Implement KUAP for Radix MMU")
> Cc: Russell Currey <ruscur@russell.cc>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/e13e7cd4c0c1cc9984d9b6a8663e10d76b53f2aa

cheers


* Re: [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR
  2019-06-21 22:55 [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR Nicholas Piggin
  2019-06-29 11:37 ` Michael Ellerman
@ 2019-06-29 12:25 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2019-06-29 12:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> The early machine check runs in real mode, so locking is unnecessary.
> Worse, the windup does not restore AMR, so this can result in a false
> KUAP fault after a recoverable machine check hits inside a user copy
> operation.
>
> Fix this similarly to HMI by just avoiding the kuap lock in the
> early machine check handler (it will be set by the late handler that
> runs in virtual mode if that runs). If the virtual mode handler is
> reached, it will lock and restore the AMR.

For the archives, this is how I tested this.

Build with KUAP enabled and disassemble load_elf_binary(). In there is a
call to __copy_tofrom_user(), preceded by a write to AMR, e.g.:

c00000000045eec8:	a6 03 3d 7d 	mtspr   29,r9
c00000000045eecc:	2c 01 00 4c 	isync
c00000000045eed0:	78 93 44 7e 	mr      r4,r18
c00000000045eed4:	78 e3 83 7f 	mr      r3,r28
c00000000045eed8:	b1 c1 c3 4b 	bl      c00000000009b088 <__copy_tofrom_user+0x8>


Boot mambo using skiboot.tcl, then break into the mambo shell. Add a
breakpoint at the branch to __copy_tofrom_user():

  systemsim % b 0xc00000000045eed8
  breakpoint set at [0:0:0]: 0xc00000000045eed8 (0xC00000000045EED8) Enc:0x00000000 : INVALID

Continue, then run `ls` in the system shell; it should stop at the breakpoint:

  systemsim % c
  4439260000000: [0:0]: (PC:0x00007FFFB43B2F00) :      2.1 Mega-Inst/Sec :      2.1 Mega-Cycles/Sec [1 Zaps  0 PA-Zaps] *ON*  [0:0] pri=4 extra=0
  4440009381609: (7208208132): # ls
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl      $-0x3C3E50
  INFO: 4440936223969: (8135050536): ** Execution stopped: user (tcl),  **
  4440936223969: ** finished running 8135050536 instructions **

Print the AMR. It has been cleared, i.e. user access is unlocked for the copy:

  systemsim % p amr
  0x0000000000000000

Then inject a machine check exception and continue:

  systemsim % exc_mce
  systemsim % c
  4440936231861: (8135058428): [ 8673.510176] Disabling lock debugging due to kernel taint
  4440936246871: (8135073438): [ 8673.510205] MCE: CPU0: machine check (Warning) Host TLB Multihit [Recovered]
  4440936266680: (8135093247): [ 8673.510244] MCE: CPU0: NIP: [c00000000045eed8] load_elf_binary+0xef8/0x1970
  4440936282657: (8135109224): [ 8673.510275] MCE: CPU0: Probable Software error (some chance of hardware cause)
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl      $-0x3C3E50
  INFO: 4440936296116: (8135122683): ** Execution stopped: user (tcl),  **
  4440936296116: ** finished running 8135122683 instructions **

Now we're back at our breakpoint. Continue again and we should get an
oops due to a bad AMR fault:

  systemsim % c
  4440936301692: (8135128259): [ 8673.510312] ------------[ cut here ]------------
  4440936321016: (8135147583): [ 8673.510336] Bug: Write fault blocked by AMR!
  4440936331347: (8135157914): [ 8673.510350] WARNING: CPU: 0 PID: 95 at arch/powerpc/include/asm/book3s/64/kup-radix.h:102 __do_page_fault+0x604/0xe60
  4440936352510: (8135179077): [ 8673.510410] Modules linked in:
  4440936365222: (8135191789): [ 8673.510436] CPU: 0 PID: 95 Comm: ls Tainted: G   M              5.2.0-rc2-gcc-8.2.0 #273
  4440936383775: (8135210342): [ 8673.510473] NIP:  c0000000000716b4 LR: c0000000000716b0 CTR: c000000000ca88b0
  4440936401995: (8135228562): [ 8673.510508] REGS: c0000000ec883530 TRAP: 0700   Tainted: G   M               (5.2.0-rc2-gcc-8.2.0)
  4440936430641: (8135257208): [ 8673.510545] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28002422  XER: 20040000
  4440936498754: (8135325321): [ 8673.510597] CFAR: c00000000011b8e4 IRQMASK: 1 
  4440936505159: (8135331726): [ 8673.510597] GPR00: c0000000000716b0 c0000000ec8837c0 c0000000015f4900 0000000000000020 
  4440936515814: (8135342381): [ 8673.510597] GPR04: c000000001824550 0000000000000000 746c756166206574 64656b636f6c6220 
  4440936528594: (8135355161): [ 8673.510597] GPR08: 00000000fed30000 c000000001130de8 0000000000000000 9000000030001033 
  4440936541374: (8135367941): [ 8673.510597] GPR12: 0000000000002000 c0000000018e0000 0000000080000000 00007fffe2e3de09 
  4440936554154: (8135380721): [ 8673.510597] GPR16: c000000000ed2c50 0000000010000000 c000000000ed2c50 00000000100d3648 
  4440936564809: (8135391376): [ 8673.510597] GPR20: c0000000f0968b00 00000000100e3648 00007fff930a0000 0000000002000000 
  4440936577589: (8135404156): [ 8673.510597] GPR24: 0000000002000000 c0000000ee830600 0000000000000301 00007fffe2e3de09 
  4440936590369: (8135416936): [ 8673.510597] GPR28: 0000000000000000 000000000a000000 0000000000000000 c0000000ec883900 
  4440936611699: (8135438266): [ 8673.510918] NIP [c0000000000716b4] __do_page_fault+0x604/0xe60
  4440936628747: (8135455314): [ 8673.510951] LR [c0000000000716b0] __do_page_fault+0x600/0xe60
  4440936642325: (8135468892): [ 8673.510978] Call Trace:
  4440936655614: (8135482181): [ 8673.511000] [c0000000ec8837c0] [c0000000000716b0] __do_page_fault+0x600/0xe60 (unreliable)
  4440936677874: (8135504441): [ 8673.511045] [c0000000ec883890] [c00000000000b0d4] handle_page_fault+0x18/0x38
  4440936700658: (8135527225): [ 8673.511091] --- interrupt: 301 at __copy_tofrom_user_power7+0x230/0x7ac
  4440936709188: (8135535755): [ 8673.511091]     LR = load_elf_binary+0xefc/0x1970
  4440936728082: (8135554649): [ 8673.511142] [c0000000ec883b90] [c00000000045ee80] load_elf_binary+0xea0/0x1970 (unreliable)
  4440936750368: (8135576935): [ 8673.511187] [c0000000ec883c90] [c0000000003d2f88] search_binary_handler.part.12+0xb8/0x2b0
  4440936772446: (8135599013): [ 8673.511230] [c0000000ec883d20] [c0000000003d3934] __do_execve_file.isra.14+0x684/0xa10
  4440936793891: (8135620458): [ 8673.511272] [c0000000ec883df0] [c0000000003d41b8] sys_execve+0x38/0x50
  4440936813829: (8135640396): [ 8673.511311] [c0000000ec883e20] [c00000000000bdf4] system_call+0x5c/0x70
  4440936828817: (8135655384): [ 8673.511340] Instruction dump:
  4440936848134: (8135674701): [ 8673.511361] 60000000 2fb70000 e93f0168 419e0620 2fa90000 409cfba4 3c82ff8e 38846b88 
  4440936874244: (8135700811): [ 8673.511412] 3c62ff8e 38636c98 480aa1d1 60000000 <0fe00000> e80100e0 3b80000b eae10088 
  4440936891327: (8135717894): [ 8673.511464] ---[ end trace 0698ac8ff1068918 ]---
  4440938377906: (8137204473): Segmentation fault


Apply the fix, retest, and no oops is seen.

cheers
