From: Nicholas Piggin <npiggin@gmail.com>
To: Sachin Sant <sachinp@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
Date: Thu, 26 May 2022 14:50:04 +1000	[thread overview]
Message-ID: <1653536815.6wga69vb1l.astroid@bobo.none> (raw)
In-Reply-To: <BE8D6B08-1276-4BBB-8859-C7D48242D0DF@linux.ibm.com>

Excerpts from Sachin Sant's message of March 9, 2022 6:37 pm:
> 
> 
>> On 07-Mar-2022, at 8:21 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> When a synchronous interrupt[1] is taken in a local_irq_disable() region
>> which has MSR[EE]=1, the interrupt handler will enable MSR[EE] as part
>> of enabling MSR[RI], for performance and profiling reasons.
>> 
>> [1] Typically a hash fault, but in error cases this could be a page
>>    fault or facility unavailable as well.
>> 
>> If an asynchronous interrupt hits here and its masked handler requires
>> MSR[EE] to be cleared (it is a PACA_IRQ_MUST_HARD_MASK interrupt), then
>> MSR[EE] must remain disabled until that pending interrupt is replayed.
>> The problem is that the MSR of the original context has MSR[EE]=1, so
>> returning directly to that causes MSR[EE] to be enabled while the
>> interrupt is still pending.
>> 
>> This issue was hacked around in the interrupt return code by just
>> clearing the hard mask to avoid a warning, and taking the masked
>> interrupt again immediately in the return context, which would disable
>> MSR[EE]. However in the case of a pending PMI, it is possible that it is
>> not masked in the calling context so the full handler will be run while
>> there is a PMI pending, and this confuses the perf code and causes
>> warnings with its PMI pending management.
>> 
>> Fix this by removing the hack, and adjusting the return MSR if it has
>> MSR[EE]=1 and there is a PACA_IRQ_MUST_HARD_MASK interrupt pending.
>> 
>> Fixes: 4423eb5ae32e ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible")
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> arch/powerpc/kernel/interrupt.c    | 10 ---------
>> arch/powerpc/kernel/interrupt_64.S | 34 +++++++++++++++++++++++++++---
>> 2 files changed, 31 insertions(+), 13 deletions(-)
> 
> With this patch on top of powerpc/merge following rcu stalls are seen while
> running powerpc selftests (mitigation-patching) on P9. I don’t see this
> issue on P10.
> 
> [ 1841.248838] link-stack-flush: flush disabled.
> [ 1841.248905] count-cache-flush: software flush enabled.
> [ 1841.248911] link-stack-flush: software flush enabled.
> [ 1901.249668] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 1901.249703] rcu: 	12-...!: (5999 ticks this GP) idle=d0f/1/0x4000000000000002 softirq=37019/37027 fqs=0 
> [ 1901.249720] 	(t=6000 jiffies g=106273 q=1624)
> [ 1901.249729] rcu: rcu_sched kthread starved for 6000 jiffies! g106273 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=6
> [ 1901.249743] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 1901.249752] rcu: RCU grace-period kthread stack dump:
> [ 1901.249759] task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00000800
> [ 1901.249775] Call Trace:
> [ 1901.249781] [c0000000076ab870] [0000000000000001] 0x1 (unreliable)
> [ 1901.249795] [c0000000076aba60] [c00000000001e508] __switch_to+0x288/0x4a0
> [ 1901.249811] [c0000000076abac0] [c000000000d15950] __schedule+0x2c0/0x950
> [ 1901.249824] [c0000000076abb80] [c000000000d16048] schedule+0x68/0x130
> [ 1901.249836] [c0000000076abbb0] [c000000000d1df1c] schedule_timeout+0x25c/0x3f0
> [ 1901.249849] [c0000000076abc90] [c00000000021522c] rcu_gp_fqs_loop+0x2fc/0x3e0
> [ 1901.249863] [c0000000076abd40] [c00000000021a0fc] rcu_gp_kthread+0x13c/0x180
> [ 1901.249875] [c0000000076abdc0] [c00000000018ce94] kthread+0x124/0x130
> [ 1901.249887] [c0000000076abe10] [c00000000000cec0] ret_from_kernel_thread+0x5c/0x64
> [ 1901.249900] rcu: Stack dump where RCU GP kthread last ran:
> [ 1901.249908] Sending NMI from CPU 12 to CPUs 6:
> [ 1901.249944] NMI backtrace for cpu 6
> [ 1901.249957] CPU: 6 PID: 40 Comm: migration/6 Not tainted 5.17.0-rc6-00327-g782b30d101f6-dirty #3
> [ 1901.249971] Stopper: multi_cpu_stop+0x0/0x230 <- stop_machine_cpuslocked+0x188/0x1e0
> [ 1901.249987] NIP:  c000000000d14e0c LR: c000000000214280 CTR: c0000000002914f0
> [ 1901.249996] REGS: c00000000785b980 TRAP: 0500   Not tainted  (5.17.0-rc6-00327-g782b30d101f6-dirty)
> [ 1901.250007] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48002822  XER: 00000000
> [ 1901.250038] CFAR: 0000000000000000 IRQMASK: 0 
> [ 1901.250038] GPR00: c00000000029165c c00000000785bc20 c000000002a20000 0000000000000002 
> [ 1901.250038] GPR04: 0000000000000000 c0000009fb60ab80 c0000009fb60ab70 c00000000001e508 
> [ 1901.250038] GPR08: 0000000000000000 c0000009fb68f5a8 00000009f94c0000 000000000098967f 
> [ 1901.250038] GPR12: 0000000000000000 c00000001ec57a00 c00000000018cd78 c000000007234f80 
> [ 1901.250038] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [ 1901.250038] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
> [ 1901.250038] GPR24: 0000000000000002 0000000000000003 0000000000000000 c000000002a62138 
> [ 1901.250038] GPR28: c0000000ee70faf8 0000000000000001 c0000000ee70fb1c 0000000000000001 
> [ 1901.250157] NIP [c000000000d14e0c] rcu_dynticks_inc+0x1c/0x40
> [ 1901.250168] LR [c000000000214280] rcu_momentary_dyntick_idle+0x30/0x60
> [ 1901.250180] Call Trace:
> [ 1901.250185] [c00000000785bc20] [c000000007a96738] 0xc000000007a96738 (unreliable)
> [ 1901.250198] [c00000000785bc40] [c00000000029165c] multi_cpu_stop+0x16c/0x230
> [ 1901.250210] [c00000000785bcb0] [c000000000291244] cpu_stopper_thread+0xe4/0x240
> [ 1901.250223] [c00000000785bd60] [c000000000193214] smpboot_thread_fn+0x1e4/0x250
> [ 1901.250237] [c00000000785bdc0] [c00000000018ce94] kthread+0x124/0x130
> [ 1901.250249] [c00000000785be10] [c00000000000cec0] ret_from_kernel_thread+0x5c/0x64
> [ 1901.250262] Instruction dump:
> [ 1901.250269] f821ffa1 4b43b81d 00000000 00000000 00000000 3c4c01d1 3842b210 e94d0030 
> [ 1901.250290] 3d22ff7b 3929f5a8 7d295214 7c0004ac <7d404828> 7d4a1814 7d40492d 40c2fff4 
> [ 1901.250313] Sending NMI from CPU 12 to CPUs 0:
> [ 1901.250328] NMI backtrace for cpu 0
> 
> I have attached the output captured during the mitigation-patching test run.

Hey, I haven't been able to reproduce this. Are you just running make 
run_tests in the selftests/powerpc/security/ directory? Any particular
config?

Thanks,
Nick


Thread overview: 4+ messages
2022-03-07 14:51 [PATCH] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending Nicholas Piggin
2022-03-09  8:37 ` Sachin Sant
2022-05-26  4:50   ` Nicholas Piggin [this message]
2022-05-26  5:21     ` Sachin Sant
