All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] cxl: Keep IRQ mappings on context teardown
@ 2016-04-22  4:57 Michael Neuling
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Michael Neuling @ 2016-04-22  4:57 UTC (permalink / raw)
  To: mpe
  Cc: Frederic Barrat, imunsie, linuxppc-dev, mikey, Vaibhav Jain,
	Frank Haverkamp, Joerg-Stephan Vogt, Wolfgang Bolz1,
	andrew.donnellan

Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
allocate the mapping again, the generic code will give the same
mapping used last time.

Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
IRQ that is never EOIed. The lost of EOI results in all subsequent
mappings to the same HW IRQ never receiving an interrupt.

We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.

A fix to the generic code is being investigated also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org # 3.8
---
 drivers/misc/cxl/irq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index be646dc..8def455 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -203,7 +203,6 @@ unsigned int cxl_map_irq(struct cxl *adapter, irq_hw_number_t hwirq,
 void cxl_unmap_irq(unsigned int virq, void *cookie)
 {
 	free_irq(virq, cookie);
-	irq_dispose_mapping(virq);
 }
 
 int cxl_register_one_irq(struct cxl *adapter,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context
  2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
@ 2016-04-22  4:57 ` Michael Neuling
  2016-04-22  5:14   ` Andrew Donnellan
                     ` (3 more replies)
  2016-04-22  5:14 ` [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Andrew Donnellan
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 10+ messages in thread
From: Michael Neuling @ 2016-04-22  4:57 UTC (permalink / raw)
  To: mpe
  Cc: Frederic Barrat, imunsie, linuxppc-dev, mikey, Vaibhav Jain,
	Frank Haverkamp, Joerg-Stephan Vogt, Wolfgang Bolz1,
	andrew.donnellan

When detaching contexts, we may still have interrupts in the system
which are yet to be delivered to any CPU and be acked in the PSL.
This can result in a subsequent unrelated process getting an spurious
IRQ or an interrupt for a non-existent context.

This polls the PSL to ensure that the PSL is clear of IRQs for the
detached context, before removing the context from the idr.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/context.c |  7 +++++++
 drivers/misc/cxl/cxl.h     |  2 ++
 drivers/misc/cxl/native.c  | 31 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 10370f2..7edea9c 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -223,6 +223,13 @@ int __detach_context(struct cxl_context *ctx)
 		cxl_ops->link_ok(ctx->afu->adapter, ctx->afu));
 	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
 
+	/*
+	 * Wait until no further interrupts are presented by the PSL
+	 * for this context.
+	 */
+	if (cxl_ops->irq_wait)
+		cxl_ops->irq_wait(ctx);
+
 	/* release the reference to the group leader and mm handling pid */
 	put_pid(ctx->pid);
 	put_pid(ctx->glpid);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 38e21cf..73dc2a3 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -274,6 +274,7 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An     = {0x0A0};
 #define CXL_PSL_DSISR_An_PE (1ull << (63-4))  /* PSL Error (implementation specific) */
 #define CXL_PSL_DSISR_An_AE (1ull << (63-5))  /* AFU Error */
 #define CXL_PSL_DSISR_An_OC (1ull << (63-6))  /* OS Context Warning */
+#define CXL_PSL_DSISR_PENDING (CXL_PSL_DSISR_TRANS | CXL_PSL_DSISR_An_PE | CXL_PSL_DSISR_An_AE | CXL_PSL_DSISR_An_OC)
 /* NOTE: Bits 32:63 are undefined if DSISR[DS] = 1 */
 #define CXL_PSL_DSISR_An_M  DSISR_NOHPTE      /* PTE not found */
 #define CXL_PSL_DSISR_An_P  DSISR_PROTFAULT   /* Storage protection violation */
@@ -855,6 +856,7 @@ struct cxl_backend_ops {
 					u64 dsisr, u64 errstat);
 	irqreturn_t (*psl_interrupt)(int irq, void *data);
 	int (*ack_irq)(struct cxl_context *ctx, u64 tfc, u64 psl_reset_mask);
+	void (*irq_wait)(struct cxl_context *ctx);
 	int (*attach_process)(struct cxl_context *ctx, bool kernel,
 			u64 wed, u64 amr);
 	int (*detach_process)(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 387fcbd..ecf7557 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -14,6 +14,7 @@
 #include <linux/mutex.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <linux/delay.h>
 #include <asm/synch.h>
 #include <misc/cxl-base.h>
 
@@ -797,6 +798,35 @@ static irqreturn_t native_irq_multiplexed(int irq, void *data)
 	return fail_psl_irq(afu, &irq_info);
 }
 
+void native_irq_wait(struct cxl_context *ctx)
+{
+	u64 dsisr;
+	int timeout = 1000;
+	int ph;
+
+	/*
+	 * Wait until no further interrupts are presented by the PSL
+	 * for this context.
+	 */
+	while (timeout--) {
+		ph = cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) & 0xffff;
+		if (ph != ctx->pe)
+			return;
+		dsisr = cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An);
+		if ((dsisr & CXL_PSL_DSISR_PENDING) == 0)
+			return;
+		/*
+		 * We are waiting for the workqueue to process our
+		 * irq, so need to let that run here.
+		 */
+		msleep(1);
+	}
+
+	dev_warn(&ctx->afu->dev, "WARNING: waiting on DSI for PE %i"
+		 " DSISR %016llx!\n", ph, dsisr);
+	return;
+}
+
 static irqreturn_t native_slice_irq_err(int irq, void *data)
 {
 	struct cxl_afu *afu = data;
@@ -1076,6 +1106,7 @@ const struct cxl_backend_ops cxl_native_ops = {
 	.handle_psl_slice_error = native_handle_psl_slice_error,
 	.psl_interrupt = NULL,
 	.ack_irq = native_ack_irq,
+	.irq_wait = native_irq_wait,
 	.attach_process = native_attach_process,
 	.detach_process = native_detach_process,
 	.support_attributes = native_support_attributes,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] cxl: Keep IRQ mappings on context teardown
  2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
@ 2016-04-22  5:14 ` Andrew Donnellan
  2016-04-22  5:18 ` Ian Munsie
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Andrew Donnellan @ 2016-04-22  5:14 UTC (permalink / raw)
  To: Michael Neuling, mpe
  Cc: Joerg-Stephan Vogt, Frederic Barrat, Vaibhav Jain,
	Wolfgang Bolz1, imunsie, Frank Haverkamp, linuxppc-dev

On 22/04/16 14:57, Michael Neuling wrote:
> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
>
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> IRQ that is never EOIed. The lost of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
>
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
>
> A fix to the generic code is being investigated also.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8

Tested on top of 4.6-rc3 using the genwqe-echo test utility[0].

Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

[0] 
https://github.com/ibm-genwqe/genwqe-user/blob/master/tools/genwqe_echo.c

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
@ 2016-04-22  5:14   ` Andrew Donnellan
  2016-04-22  5:18   ` Ian Munsie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Andrew Donnellan @ 2016-04-22  5:14 UTC (permalink / raw)
  To: Michael Neuling, mpe
  Cc: Joerg-Stephan Vogt, Frederic Barrat, Vaibhav Jain,
	Wolfgang Bolz1, imunsie, Frank Haverkamp, linuxppc-dev

On 22/04/16 14:57, Michael Neuling wrote:
> When detaching contexts, we may still have interrupts in the system
> which are yet to be delivered to any CPU and be acked in the PSL.
> This can result in a subsequent unrelated process getting an spurious
> IRQ or an interrupt for a non-existent context.
>
> This polls the PSL to ensure that the PSL is clear of IRQs for the
> detached context, before removing the context from the idr.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Tested on top of 4.6-rc3 using the genwqe-echo test utility[0].

Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

[0] 
https://github.com/ibm-genwqe/genwqe-user/blob/master/tools/genwqe_echo.c

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] cxl: Keep IRQ mappings on context teardown
  2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
  2016-04-22  5:14 ` [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Andrew Donnellan
@ 2016-04-22  5:18 ` Ian Munsie
  2016-04-22  6:03 ` Vaibhav Jain
  2016-04-28  1:52 ` [1/2] " Michael Ellerman
  4 siblings, 0 replies; 10+ messages in thread
From: Ian Munsie @ 2016-04-22  5:18 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Michael Ellerman, Frederic Barrat, linuxppc-dev, Vaibhav Jain,
	Frank Haverkamp, Joerg-Stephan Vogt, Wolfgang Bolz1,
	andrew.donnellan

Acked-by: Ian Munsie <imunsie@au1.ibm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
  2016-04-22  5:14   ` Andrew Donnellan
@ 2016-04-22  5:18   ` Ian Munsie
  2016-04-22  6:05   ` Vaibhav Jain
  2016-04-28  1:52   ` [2/2] " Michael Ellerman
  3 siblings, 0 replies; 10+ messages in thread
From: Ian Munsie @ 2016-04-22  5:18 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Michael Ellerman, Joerg-Stephan Vogt, Frederic Barrat,
	Vaibhav Jain, Wolfgang Bolz1, andrew.donnellan, Frank Haverkamp,
	linuxppc-dev

Acked-by: Ian Munsie <imunsie@au1.ibm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] cxl: Keep IRQ mappings on context teardown
  2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
                   ` (2 preceding siblings ...)
  2016-04-22  5:18 ` Ian Munsie
@ 2016-04-22  6:03 ` Vaibhav Jain
  2016-04-28  1:52 ` [1/2] " Michael Ellerman
  4 siblings, 0 replies; 10+ messages in thread
From: Vaibhav Jain @ 2016-04-22  6:03 UTC (permalink / raw)
  To: Michael Neuling, mpe
  Cc: mikey, Joerg-Stephan Vogt, Frederic Barrat, Wolfgang Bolz1,
	imunsie, andrew.donnellan, Frank Haverkamp, linuxppc-dev

Michael Neuling <mikey@neuling.org> writes:

> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
>
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> IRQ that is never EOIed. The lost of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
>
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
>
> A fix to the generic code is being investigated also.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8

Tested on top of following skiboot patches that fix potential races in
phb.

http://patchwork.ozlabs.org/patch/581764/
http://patchwork.ozlabs.org/patch/581765/

Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
  2016-04-22  5:14   ` Andrew Donnellan
  2016-04-22  5:18   ` Ian Munsie
@ 2016-04-22  6:05   ` Vaibhav Jain
  2016-04-28  1:52   ` [2/2] " Michael Ellerman
  3 siblings, 0 replies; 10+ messages in thread
From: Vaibhav Jain @ 2016-04-22  6:05 UTC (permalink / raw)
  To: Michael Neuling, mpe
  Cc: Frederic Barrat, imunsie, linuxppc-dev, mikey, Frank Haverkamp,
	Joerg-Stephan Vogt, Wolfgang Bolz1, andrew.donnellan

Michael Neuling <mikey@neuling.org> writes:

> When detaching contexts, we may still have interrupts in the system
> which are yet to be delivered to any CPU and be acked in the PSL.
> This can result in a subsequent unrelated process getting an spurious
> IRQ or an interrupt for a non-existent context.
>
> This polls the PSL to ensure that the PSL is clear of IRQs for the
> detached context, before removing the context from the idr.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [1/2] cxl: Keep IRQ mappings on context teardown
  2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
                   ` (3 preceding siblings ...)
  2016-04-22  6:03 ` Vaibhav Jain
@ 2016-04-28  1:52 ` Michael Ellerman
  4 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2016-04-28  1:52 UTC (permalink / raw)
  To: Michael Neuling
  Cc: mikey, Joerg-Stephan Vogt, Frederic Barrat, Vaibhav Jain,
	Wolfgang Bolz1, imunsie, andrew.donnellan, Frank Haverkamp,
	linuxppc-dev

On Fri, 2016-22-04 at 04:57:48 UTC, Michael Neuling wrote:
> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
> 
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> IRQ that is never EOIed. The lost of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
> 
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
> 
> A fix to the generic code is being investigated also.
> 
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8
> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Acked-by: Ian Munsie <imunsie@au1.ibm.com>
> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d6776bba44d9752f6cdf640046

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [2/2] cxl: Poll for outstanding IRQs when detaching a context
  2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
                     ` (2 preceding siblings ...)
  2016-04-22  6:05   ` Vaibhav Jain
@ 2016-04-28  1:52   ` Michael Ellerman
  3 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2016-04-28  1:52 UTC (permalink / raw)
  To: Michael Neuling
  Cc: mikey, Joerg-Stephan Vogt, Frederic Barrat, Vaibhav Jain,
	Wolfgang Bolz1, imunsie, andrew.donnellan, Frank Haverkamp,
	linuxppc-dev

On Fri, 2016-22-04 at 04:57:49 UTC, Michael Neuling wrote:
> When detaching contexts, we may still have interrupts in the system
> which are yet to be delivered to any CPU and be acked in the PSL.
> This can result in a subsequent unrelated process getting an spurious
> IRQ or an interrupt for a non-existent context.
> 
> This polls the PSL to ensure that the PSL is clear of IRQs for the
> detached context, before removing the context from the idr.
> 
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Acked-by: Ian Munsie <imunsie@au1.ibm.com>
> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/2bc79ffcbb817873cc43d63118

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-28  1:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-22  4:57 [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Michael Neuling
2016-04-22  4:57 ` [PATCH 2/2] cxl: Poll for outstanding IRQs when detaching a context Michael Neuling
2016-04-22  5:14   ` Andrew Donnellan
2016-04-22  5:18   ` Ian Munsie
2016-04-22  6:05   ` Vaibhav Jain
2016-04-28  1:52   ` [2/2] " Michael Ellerman
2016-04-22  5:14 ` [PATCH 1/2] cxl: Keep IRQ mappings on context teardown Andrew Donnellan
2016-04-22  5:18 ` Ian Munsie
2016-04-22  6:03 ` Vaibhav Jain
2016-04-28  1:52 ` [1/2] " Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.