* [PATCH] cxl: Prevent IRQ storm
@ 2017-04-26  6:40 Alastair D'Silva
  2017-04-26  6:56 ` Andrew Donnellan
  2017-04-26  9:23 ` Frederic Barrat
  0 siblings, 2 replies; 5+ messages in thread
From: Alastair D'Silva @ 2017-04-26  6:40 UTC
  To: linuxppc-dev; +Cc: frederic.barrat, andrew.donnellan, Alastair D'Silva

From: Alastair D'Silva <alastair@d-silva.org>

In some situations, a faulty AFU slice may create an interrupt storm,
rendering the machine unusable. Since these interrupts are informational
only, present the interrupt once, then mask it off to prevent it from
being retriggered until the card is reset.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
 drivers/misc/cxl/native.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 7ae7105..4e8010f 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -996,7 +996,7 @@ static void native_irq_wait(struct cxl_context *ctx)
 static irqreturn_t native_slice_irq_err(int irq, void *data)
 {
 	struct cxl_afu *afu = data;
-	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr;
+	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr, irq_mask;
 
 	/*
 	 * slice err interrupt is only used with full PSL (no XSL)
@@ -1014,6 +1014,10 @@ static irqreturn_t native_slice_irq_err(int irq, void *data)
 	dev_crit(&afu->dev, "AFU_ERR_An: 0x%.16llx\n", afu_error);
 	dev_crit(&afu->dev, "PSL_DSISR_An: 0x%.16llx\n", dsisr);
 
+	/* mask off the IRQ so it won't retrigger until the card is reset */
+	irq_mask = (serr & 0xff80000000000000ULL) >> 32;
+	serr |= irq_mask;
+
 	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
 
 	return IRQ_HANDLED;
-- 
2.9.3


* Re: [PATCH] cxl: Prevent IRQ storm
  2017-04-26  6:40 [PATCH] cxl: Prevent IRQ storm Alastair D'Silva
@ 2017-04-26  6:56 ` Andrew Donnellan
  2017-04-26  9:23 ` Frederic Barrat
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Donnellan @ 2017-04-26  6:56 UTC
  To: Alastair D'Silva, linuxppc-dev; +Cc: frederic.barrat, Alastair D'Silva

On 26/04/17 16:40, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
>
> In some situations, a faulty AFU slice may create an interrupt storm,
> rendering the machine unusable. Since these interrupts are informational
> only, present the interrupt once, then mask it off to prevent it from
> being retriggered until the card is reset.
>
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>  drivers/misc/cxl/native.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index 7ae7105..4e8010f 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -996,7 +996,7 @@ static void native_irq_wait(struct cxl_context *ctx)
>  static irqreturn_t native_slice_irq_err(int irq, void *data)
>  {
>  	struct cxl_afu *afu = data;
> -	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr;
> +	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr, irq_mask;
>
>  	/*
>  	 * slice err interrupt is only used with full PSL (no XSL)
> @@ -1014,6 +1014,10 @@ static irqreturn_t native_slice_irq_err(int irq, void *data)
>  	dev_crit(&afu->dev, "AFU_ERR_An: 0x%.16llx\n", afu_error);
>  	dev_crit(&afu->dev, "PSL_DSISR_An: 0x%.16llx\n", dsisr);
>
> +	/* mask off the IRQ so it won't retrigger until the card is reset */
> +	irq_mask = (serr & 0xff80000000000000ULL) >> 32;
> +	serr |= irq_mask;
> +
>  	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
>
>  	return IRQ_HANDLED;
>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


* Re: [PATCH] cxl: Prevent IRQ storm
  2017-04-26  6:40 [PATCH] cxl: Prevent IRQ storm Alastair D'Silva
  2017-04-26  6:56 ` Andrew Donnellan
@ 2017-04-26  9:23 ` Frederic Barrat
  2017-04-27  1:09   ` Alastair D'Silva
  1 sibling, 1 reply; 5+ messages in thread
From: Frederic Barrat @ 2017-04-26  9:23 UTC
  To: Alastair D'Silva, linuxppc-dev
  Cc: Alastair D'Silva, andrew.donnellan, frederic.barrat



On 26/04/2017 at 08:40, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
>
> In some situations, a faulty AFU slice may create an interrupt storm,
> rendering the machine unusable. Since these interrupts are informational
> only, present the interrupt once, then mask it off to prevent it from
> being retriggered until the card is reset.
>
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Patch looks good, thanks!
It doesn't apply cleanly on the 'next' tree due to the capi2 patchset 
though, so you should probably rebase on that tree. The bits have 
changed a bit on PSL9, but the approach still works (error type reported 
in the first byte, and the corresponding masking bits are still 
right-shifted by 32).
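
The masking step described above can be sketched outside the driver as follows. This is only an illustration, assuming the layout used by the patch (error status in the top bits of the register, each mask bit 32 positions lower); the names SERR_ERR_STATUS_BITS and serr_mask_raised() are hypothetical and do not exist in the cxl driver.

```c
#include <stdint.h>

/*
 * Hypothetical name: the error-status bits selected by the patch
 * (the constant 0xff80000000000000ULL is taken from the diff).
 */
#define SERR_ERR_STATUS_BITS 0xff80000000000000ULL

/*
 * Hypothetical helper: return serr with the mask bit set for every
 * error bit that is currently raised. Each mask bit sits 32 positions
 * below its error bit, so a right shift by 32 lines them up.
 */
static uint64_t serr_mask_raised(uint64_t serr)
{
	uint64_t irq_mask = (serr & SERR_ERR_STATUS_BITS) >> 32;

	return serr | irq_mask;
}
```

For example, with only the most significant error bit raised (1ULL << 63), the helper returns 0x8000000080000000, which is the value the handler would then write back to the register.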

   Fred

>  drivers/misc/cxl/native.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index 7ae7105..4e8010f 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -996,7 +996,7 @@ static void native_irq_wait(struct cxl_context *ctx)
>  static irqreturn_t native_slice_irq_err(int irq, void *data)
>  {
>  	struct cxl_afu *afu = data;
> -	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr;
> +	u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr, irq_mask;
>
>  	/*
>  	 * slice err interrupt is only used with full PSL (no XSL)
> @@ -1014,6 +1014,10 @@ static irqreturn_t native_slice_irq_err(int irq, void *data)
>  	dev_crit(&afu->dev, "AFU_ERR_An: 0x%.16llx\n", afu_error);
>  	dev_crit(&afu->dev, "PSL_DSISR_An: 0x%.16llx\n", dsisr);
>
> +	/* mask off the IRQ so it won't retrigger until the card is reset */
> +	irq_mask = (serr & 0xff80000000000000ULL) >> 32;
> +	serr |= irq_mask;
> +
>  	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
>
>  	return IRQ_HANDLED;
>


* Re: [PATCH] cxl: Prevent IRQ storm
  2017-04-26  9:23 ` Frederic Barrat
@ 2017-04-27  1:09   ` Alastair D'Silva
  2017-04-27  1:13     ` Andrew Donnellan
  0 siblings, 1 reply; 5+ messages in thread
From: Alastair D'Silva @ 2017-04-27  1:09 UTC
  To: Frederic Barrat, linuxppc-dev; +Cc: andrew.donnellan

On Wed, 2017-04-26 at 11:23 +0200, Frederic Barrat wrote:
> 
> On 26/04/2017 at 08:40, Alastair D'Silva wrote:
> > From: Alastair D'Silva <alastair@d-silva.org>
> > 
> > In some situations, a faulty AFU slice may create an interrupt
> > storm,
> > rendering the machine unusable. Since these interrupts are
> > informational
> > only, present the interrupt once, then mask it off to prevent it
> > from
> > being retriggered until the card is reset.
> > 
> > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > ---
> 
> Patch looks good, thanks!
> It doesn't apply cleanly on the 'next' tree due to the capi2
> patchset 
> though, so you should probably rebase on that tree. The bits have 
> changed a bit on PSL9, but the approach still works (error type
> reported 
> in the first byte, and the corresponding masking bits are still 
> right-shifted by 32).
> 

Hmm, both you & the documentation say 8 bits, but the code suggests 9:
#define CXL_PSL_SERR_An_afuto	(1ull << (63-0))
#define CXL_PSL_SERR_An_afudis	(1ull << (63-1))
#define CXL_PSL_SERR_An_afuov	(1ull << (63-2))
#define CXL_PSL_SERR_An_badsrc	(1ull << (63-3))
#define CXL_PSL_SERR_An_badctx	(1ull << (63-4))
#define CXL_PSL_SERR_An_llcmdis	(1ull << (63-5))
#define CXL_PSL_SERR_An_llcmdto	(1ull << (63-6))
#define CXL_PSL_SERR_An_afupar	(1ull << (63-7))
#define CXL_PSL_SERR_An_afudup	(1ull << (63-8))

Referenced from irq.c:cxl_afu_decode_psl_serr()

Thoughts?

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819


* Re: [PATCH] cxl: Prevent IRQ storm
  2017-04-27  1:09   ` Alastair D'Silva
@ 2017-04-27  1:13     ` Andrew Donnellan
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Donnellan @ 2017-04-27  1:13 UTC
  To: Alastair D'Silva, Frederic Barrat, linuxppc-dev

On 27/04/17 11:09, Alastair D'Silva wrote:
>> Patch looks good, thanks!
>> It doesn't apply cleanly on the 'next' tree due to the capi2
>> patchset
>> though, so you should probably rebase on that tree. The bits have
>> changed a bit on PSL9, but the approach still works (error type
>> reported
>> in the first byte, and the corresponding masking bits are still
>> right-shifted by 32).
>>
>
> Hmm, both you & the documentation say 8 bits, but the code suggests 9:
> #define CXL_PSL_SERR_An_afuto	(1ull << (63-0))
> #define CXL_PSL_SERR_An_afudis	(1ull << (63-1))
> #define CXL_PSL_SERR_An_afuov	(1ull << (63-2))
> #define CXL_PSL_SERR_An_badsrc	(1ull << (63-3))
> #define CXL_PSL_SERR_An_badctx	(1ull << (63-4))
> #define CXL_PSL_SERR_An_llcmdis	(1ull << (63-5))
> #define CXL_PSL_SERR_An_llcmdto	(1ull << (63-6))
> #define CXL_PSL_SERR_An_afupar	(1ull << (63-7))
> #define CXL_PSL_SERR_An_afudup	(1ull << (63-8))
>
> Referenced from irq.c:cxl_afu_decode_psl_serr()
>
> Thoughts?

The latest version of the (IBM internal) PSL8 workbook which I happen to 
have at hand lists bit 8 as "Reserved" in the bitfield diagram, but 
lists it as "afudup" in the description underneath, so I think it's safe 
to say it's the first 9 bits.
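
As a quick cross-check of the 8-versus-9 question, the nine defines quoted above can be ORed together and compared with the constant in the patch. This is a standalone sketch, not driver code: the defines are copied from the quoted text, while SERR_AN_ALL_ERRORS and count_set_bits() are hypothetical names introduced only for the illustration.

```c
#include <stdint.h>

/* Copied from the defines quoted in this thread. */
#define CXL_PSL_SERR_An_afuto	(1ull << (63-0))
#define CXL_PSL_SERR_An_afudis	(1ull << (63-1))
#define CXL_PSL_SERR_An_afuov	(1ull << (63-2))
#define CXL_PSL_SERR_An_badsrc	(1ull << (63-3))
#define CXL_PSL_SERR_An_badctx	(1ull << (63-4))
#define CXL_PSL_SERR_An_llcmdis	(1ull << (63-5))
#define CXL_PSL_SERR_An_llcmdto	(1ull << (63-6))
#define CXL_PSL_SERR_An_afupar	(1ull << (63-7))
#define CXL_PSL_SERR_An_afudup	(1ull << (63-8))

/* Hypothetical name: all nine error bits combined. */
#define SERR_AN_ALL_ERRORS \
	(CXL_PSL_SERR_An_afuto | CXL_PSL_SERR_An_afudis | \
	 CXL_PSL_SERR_An_afuov | CXL_PSL_SERR_An_badsrc | \
	 CXL_PSL_SERR_An_badctx | CXL_PSL_SERR_An_llcmdis | \
	 CXL_PSL_SERR_An_llcmdto | CXL_PSL_SERR_An_afupar | \
	 CXL_PSL_SERR_An_afudup)

/* The nine defines together match the mask constant used in the patch. */
_Static_assert(SERR_AN_ALL_ERRORS == 0xff80000000000000ULL,
	       "nine error bits should equal the patch's mask constant");

/* Hypothetical helper: count the bits set in v by clearing the lowest one each pass. */
static unsigned int count_set_bits(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}
```

Here count_set_bits(SERR_AN_ALL_ERRORS) yields 9, whereas a mask covering only the first byte (0xff00000000000000) would have 8 bits set and would leave afudup unmasked.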

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited
