All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
@ 2022-10-13  6:44 Nicholas Piggin
  2022-10-14 17:05 ` Guenter Roeck
  2022-10-14 23:21 ` Michael Ellerman
  0 siblings, 2 replies; 4+ messages in thread
From: Nicholas Piggin @ 2022-10-13  6:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Jason A . Donenfeld, Guenter Roeck, Sachin Sant, Nicholas Piggin

It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
I credit Sachin as well because he likely ran into it here,

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-March/240971.html

It was much more difficult to hit on his setup so when I asked him to
re-test with a later kernel and it didn't reproduce, I thought it
could have been unrelated. I stared at the patch for ages back then and
didn't see the bug. I should have known better :(

I'm pretty confident this is the hang problem. Reproducer is intermittent
for me, but I did catch it losing pending irqs here using debug code, so
this certainly is _a_ bug that can explain the symptoms.

Thanks,
Nick

 arch/powerpc/kernel/interrupt_64.S | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 904a5608cbe3..978a173eb339 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -538,7 +538,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
 	beq	.Lfast_kernel_interrupt_return_\srr\() // EE already disabled
 	lbz	r11,PACAIRQHAPPENED(r13)
 	andi.	r10,r11,PACA_IRQ_MUST_HARD_MASK
-	beq	1f // No HARD_MASK pending
+	beq	.Lfast_kernel_interrupt_return_\srr\() // No HARD_MASK pending
 
 	/* Must clear MSR_EE from _MSR */
 #ifdef CONFIG_PPC_BOOK3S
@@ -555,12 +555,23 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
 	b	.Lfast_kernel_interrupt_return_\srr\()
 
 .Linterrupt_return_\srr\()_soft_enabled:
+	/*
+	 * In the soft-enabled case, need to double-check that we have no
+	 * pending interrupts that might have come in before we reached the
+	 * restart section of code, and restart the exit so those can be
+	 * handled.
+	 *
+	 * If there are none, it is be possible that the interrupt still
+	 * has PACA_IRQ_HARD_DIS set, which needs to be cleared for the
+	 * interrupted context. This clear will not clobber a new pending
+	 * interrupt coming in, because we're in the restart section, so
+	 * such would return to the restart location.
+	 */
 #ifdef CONFIG_PPC_BOOK3S
 	lbz	r11,PACAIRQHAPPENED(r13)
 	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
 	bne-	interrupt_return_\srr\()_kernel_restart
 #endif
-1:
 	li	r11,0
 	stb	r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  2022-10-13  6:44 [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context Nicholas Piggin
@ 2022-10-14 17:05 ` Guenter Roeck
  2022-10-14 23:21 ` Michael Ellerman
  1 sibling, 0 replies; 4+ messages in thread
From: Guenter Roeck @ 2022-10-14 17:05 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Jason A . Donenfeld, linuxppc-dev, Sachin Sant

On Thu, Oct 13, 2022 at 04:44:18PM +1000, Nicholas Piggin wrote:
> It's possible for an interrupt returning to an irqs-disabled context to
> lose a pending soft-masked irq because it branches to part of the exit
> code for irqs-enabled contexts, which is meant to clear only the
> PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
> just looks like a simple thinko from a recent commit (if there was no
> hard mask pending, there would be no reason to clear it anyway).
> 
> This also adds comment to the code that actually does need to clear the
> flag.
> 
> Cc: Jason A. Donenfeld <Jason@zx2c4.com>
> Reported-by: Sachin Sant <sachinp@linux.ibm.com>
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

I thought I sent this before, but I am not sure if it got lost
since I don't see it in the powerpc patchwork.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter

> ---
> I credit Sachin as well because he likely ran into it here,
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-March/240971.html
> 
> It was much more difficult to hit on his setup so when I asked him to
> re-test with a later kernel and it didn't reproduce, I thought it
> could have been unrelated. I stared at the patch for ages back then and
> didn't see the bug. I should have known better :(
> 
> I'm pretty confident this is the hang problem. Reproducer is intermittent
> for me, but I did catch it losing pending irqs here using debug code, so
> this certainly is _a_ bug that can explain the symptoms.
> 
> Thanks,
> Nick
> 
>  arch/powerpc/kernel/interrupt_64.S | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> index 904a5608cbe3..978a173eb339 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -538,7 +538,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
>  	beq	.Lfast_kernel_interrupt_return_\srr\() // EE already disabled
>  	lbz	r11,PACAIRQHAPPENED(r13)
>  	andi.	r10,r11,PACA_IRQ_MUST_HARD_MASK
> -	beq	1f // No HARD_MASK pending
> +	beq	.Lfast_kernel_interrupt_return_\srr\() // No HARD_MASK pending
>  
>  	/* Must clear MSR_EE from _MSR */
>  #ifdef CONFIG_PPC_BOOK3S
> @@ -555,12 +555,23 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
>  	b	.Lfast_kernel_interrupt_return_\srr\()
>  
>  .Linterrupt_return_\srr\()_soft_enabled:
> +	/*
> +	 * In the soft-enabled case, need to double-check that we have no
> +	 * pending interrupts that might have come in before we reached the
> +	 * restart section of code, and restart the exit so those can be
> +	 * handled.
> +	 *
> +	 * If there are none, it is be possible that the interrupt still
> +	 * has PACA_IRQ_HARD_DIS set, which needs to be cleared for the
> +	 * interrupted context. This clear will not clobber a new pending
> +	 * interrupt coming in, because we're in the restart section, so
> +	 * such would return to the restart location.
> +	 */
>  #ifdef CONFIG_PPC_BOOK3S
>  	lbz	r11,PACAIRQHAPPENED(r13)
>  	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
>  	bne-	interrupt_return_\srr\()_kernel_restart
>  #endif
> -1:
>  	li	r11,0
>  	stb	r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS
>  

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  2022-10-13  6:44 [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context Nicholas Piggin
  2022-10-14 17:05 ` Guenter Roeck
@ 2022-10-14 23:21 ` Michael Ellerman
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2022-10-14 23:21 UTC (permalink / raw)
  To: linuxppc-dev, Nicholas Piggin
  Cc: Jason A . Donenfeld, Sachin Sant, Guenter Roeck

On Thu, 13 Oct 2022 16:44:18 +1000, Nicholas Piggin wrote:
> It's possible for an interrupt returning to an irqs-disabled context to
> lose a pending soft-masked irq because it branches to part of the exit
> code for irqs-enabled contexts, which is meant to clear only the
> PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
> just looks like a simple thinko from a recent commit (if there was no
> hard mask pending, there would be no reason to clear it anyway).
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
      https://git.kernel.org/powerpc/c/a4cb3651a174366cc85a677da9e3681fbe97fdae

cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
@ 2022-10-13 15:58 Guenter Roeck
  0 siblings, 0 replies; 4+ messages in thread
From: Guenter Roeck @ 2022-10-13 15:58 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Jason A . Donenfeld, linuxppc-dev, Sachin Sant

On Thu, Oct 13, 2022 at 04:44:18PM +1000, Nicholas Piggin wrote:
> It's possible for an interrupt returning to an irqs-disabled context to
> lose a pending soft-masked irq because it branches to part of the exit
> code for irqs-enabled contexts, which is meant to clear only the
> PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
> just looks like a simple thinko from a recent commit (if there was no
> hard mask pending, there would be no reason to clear it anyway).
> 
> This also adds comment to the code that actually does need to clear the
> flag.
> 
> Cc: Jason A. Donenfeld <Jason@zx2c4.com>
> Reported-by: Sachin Sant <sachinp@linux.ibm.com>
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Tested-by: Guenter Roeck <linux@roeck-us.net>

> ---
> I credit Sachin as well because he likely ran into it here,
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-March/240971.html
> 
> It was much more difficult to hit on his setup so when I asked him to
> re-test with a later kernel and it didn't reproduce, I thought it
> could have been unrelated. I stared at the patch for ages back then and
> didn't see the bug. I should have known better :(
> 
> I'm pretty confident this is the hang problem. Reproducer is intermittent
> for me, but I did catch it losing pending irqs here using debug code, so
> this certainly is _a_ bug that can explain the symptoms.
> 
> Thanks,
> Nick
> 
>  arch/powerpc/kernel/interrupt_64.S | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> index 904a5608cbe3..978a173eb339 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -538,7 +538,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
>  	beq	.Lfast_kernel_interrupt_return_\srr\() // EE already disabled
>  	lbz	r11,PACAIRQHAPPENED(r13)
>  	andi.	r10,r11,PACA_IRQ_MUST_HARD_MASK
> -	beq	1f // No HARD_MASK pending
> +	beq	.Lfast_kernel_interrupt_return_\srr\() // No HARD_MASK pending
>  
>  	/* Must clear MSR_EE from _MSR */
>  #ifdef CONFIG_PPC_BOOK3S
> @@ -555,12 +555,23 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
>  	b	.Lfast_kernel_interrupt_return_\srr\()
>  
>  .Linterrupt_return_\srr\()_soft_enabled:
> +	/*
> +	 * In the soft-enabled case, need to double-check that we have no
> +	 * pending interrupts that might have come in before we reached the
> +	 * restart section of code, and restart the exit so those can be
> +	 * handled.
> +	 *
> +	 * If there are none, it is be possible that the interrupt still
> +	 * has PACA_IRQ_HARD_DIS set, which needs to be cleared for the
> +	 * interrupted context. This clear will not clobber a new pending
> +	 * interrupt coming in, because we're in the restart section, so
> +	 * such would return to the restart location.
> +	 */
>  #ifdef CONFIG_PPC_BOOK3S
>  	lbz	r11,PACAIRQHAPPENED(r13)
>  	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
>  	bne-	interrupt_return_\srr\()_kernel_restart
>  #endif
> -1:
>  	li	r11,0
>  	stb	r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS
>  
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-10-14 23:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-13  6:44 [PATCH] powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context Nicholas Piggin
2022-10-14 17:05 ` Guenter Roeck
2022-10-14 23:21 ` Michael Ellerman
2022-10-13 15:58 Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.