* [PATCH 1/2] cxl: Fix driver use count
@ 2017-08-28  8:47 Frederic Barrat
  2017-08-28  8:47 ` [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts Frederic Barrat
  2017-08-28  9:35 ` [PATCH 1/2] cxl: Fix driver use count Andrew Donnellan
  0 siblings, 2 replies; 6+ messages in thread
From: Frederic Barrat @ 2017-08-28  8:47 UTC (permalink / raw)
  To: mpe, linuxppc-dev, andrew.donnellan, clombard, vaibhav; +Cc: benh, alistair

cxl keeps a driver use count, which is used with the hash memory model
on p8 to know when to upgrade local TLBIs to global and to trigger
callbacks to manage the MMU for PSL8.

If a process opens a context and closes without attaching or fails the
attachment, the driver use count is never decremented. As a
consequence, TLB invalidations remain global, even if there are no
active cxl contexts.

We should increment the driver use count when the process is attaching
to the cxl adapter, and not on open. It's not needed before the
adapter starts using the context and the use count is decremented on
the detach path, so it makes more sense.

It affects only the user api. The kernel api is already doing The
Right Thing.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Fixes: 7bb5d91a4dda ("cxl: Rework context lifetimes")
---
 drivers/misc/cxl/api.c  | 4 ++++
 drivers/misc/cxl/file.c | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1a138c83f877..e0dfd1eadd70 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -336,6 +336,10 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 			mmput(ctx->mm);
 	}
 
+	/*
+	 * Increment driver use count. Enables global TLBIs for hash
+	 * and callacks to handle the segment table
+	 */
 	cxl_ctx_get();
 
 	if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) {
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 0761271d68c5..b76a491a485d 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -95,7 +95,6 @@ static int __afu_open(struct inode *inode, struct file *file, bool master)
 
 	pr_devel("afu_open pe: %i\n", ctx->pe);
 	file->private_data = ctx;
-	cxl_ctx_get();
 
 	/* indicate success */
 	rc = 0;
@@ -225,6 +224,12 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	if (ctx->mm)
 		mmput(ctx->mm);
 
+	/*
+	 * Increment driver use count. Enables global TLBIs for hash
+	 * and callacks to handle the segment table
+	 */
+	cxl_ctx_get();
+
 	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
 
 	if ((rc = cxl_ops->attach_process(ctx, false, work.work_element_descriptor,
@@ -233,6 +238,7 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 		cxl_adapter_context_put(ctx->afu->adapter);
 		put_pid(ctx->pid);
 		ctx->pid = NULL;
+		cxl_ctx_put();
 		cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-- 
2.11.0
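
The accounting this patch describes can be modelled in user space. The sketch below is a toy, not the driver code: `toy_ctx`, `toy_open()`, `toy_attach()`, `toy_close()` and `tlbi_must_be_global()` are hypothetical names, and a C11 atomic stands in for the counter behind cxl_ctx_get()/cxl_ctx_put(). It assumes that only an attached context holds a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the driver-wide use count. */
static atomic_int use_count;

static void ctx_get(void) { atomic_fetch_add(&use_count, 1); }
static void ctx_put(void) { atomic_fetch_sub(&use_count, 1); }

/* While the count is non-zero, TLBIs have to be global (hash/p8). */
static bool tlbi_must_be_global(void)
{
	return atomic_load(&use_count) > 0;
}

/* Hypothetical context: only tracks whether the attach succeeded. */
struct toy_ctx {
	bool attached;
};

/* open: with the patch applied, no reference is taken here. */
static void toy_open(struct toy_ctx *ctx)
{
	ctx->attached = false;
}

/* attach: take the reference, and drop it again if the attach fails. */
static int toy_attach(struct toy_ctx *ctx, bool hw_attach_ok)
{
	ctx_get();
	if (!hw_attach_ok) {
		ctx_put();	/* mirrors the cxl_ctx_put() added on the error path */
		return -1;
	}
	ctx->attached = true;
	return 0;
}

/* detach/close: only a context that attached gives a reference back. */
static void toy_close(struct toy_ctx *ctx)
{
	if (ctx->attached) {
		ctx_put();
		ctx->attached = false;
	}
}
```

With this pairing, an open followed by a close, or an open followed by a failed attach, leaves the count at zero, so invalidations can go back to being local once no context is attached.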


* [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts
  2017-08-28  8:47 [PATCH 1/2] cxl: Fix driver use count Frederic Barrat
@ 2017-08-28  8:47 ` Frederic Barrat
  2017-08-28 12:03   ` Benjamin Herrenschmidt
  2017-08-28  9:35 ` [PATCH 1/2] cxl: Fix driver use count Andrew Donnellan
  1 sibling, 1 reply; 6+ messages in thread
From: Frederic Barrat @ 2017-08-28  8:47 UTC (permalink / raw)
  To: mpe, linuxppc-dev, andrew.donnellan, clombard, vaibhav; +Cc: benh, alistair

The PSL and nMMU need to see all TLB invalidations for the memory
contexts used on the adapter. For the hash memory model, it is done by
making all TLBIs global as soon as the cxl driver is in use. For
radix, we need something similar, but we can refine and only convert
to global the invalidations for contexts actually used by the device.

So increment the 'active_cpus' count for the contexts attached to the
cxl adapter. As soon as there's more than 1 active cpu, the TLBIs for
the context become global. Active cpu count must be decremented when
detaching to restore locality if possible and to avoid overflowing the
counter.
    
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 309592589e30..6447c0df7ec4 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -77,6 +77,41 @@ extern void switch_cop(struct mm_struct *next);
 extern int use_cop(unsigned long acop, struct mm_struct *mm);
 extern void drop_cop(unsigned long acop, struct mm_struct *mm);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline void inc_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.active_cpus);
+}
+
+static inline void dec_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_dec(&mm->context.active_cpus);
+}
+
+static inline void mm_context_add_copro(struct mm_struct *mm)
+{
+	inc_mm_active_cpus(mm);
+}
+
+static inline void mm_context_remove_copro(struct mm_struct *mm)
+{
+	/*
+	 * Need to broadcast a global flush of the full mm before
+	 * decrementing active_cpus count, as the next TLBI may be
+	 * local and the nMMU and/or PSL need to be cleaned up.
+	 * Should be rare enough so that it's acceptable.
+	 */
+	flush_tlb_mm(mm);
+	dec_mm_active_cpus(mm);
+}
+#else
+static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
+static inline void dec_mm_active_cpus(struct mm_struct *mm) { }
+static inline void mm_context_add_copro(struct mm_struct *mm) { }
+static inline void mm_context_remove_copro(struct mm_struct *mm) { }
+#endif
+
+
 extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			       struct task_struct *tsk);
 
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 0f613bc63c50..d60a62bf4fc7 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -34,15 +34,6 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
 				   struct mm_struct *mm) { }
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
-static inline void inc_mm_active_cpus(struct mm_struct *mm)
-{
-	atomic_inc(&mm->context.active_cpus);
-}
-#else
-static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
-#endif
-
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index e0dfd1eadd70..33daf33e0e05 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/sched/mm.h>
+#include <linux/mmu_context.h>
 
 #include "cxl.h"
 
@@ -332,8 +333,11 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 		cxl_context_mm_count_get(ctx);
 
 		/* decrement the use count */
-		if (ctx->mm)
+		if (ctx->mm) {
 			mmput(ctx->mm);
+			/* make TLBIs for this context global */
+			mm_context_add_copro(ctx->mm);
+		}
 	}
 
 	/*
@@ -342,13 +346,25 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 	 */
 	cxl_ctx_get();
 
+	/*
+	 * Barrier is needed to make sure all TLBIs are global before
+	 * we attach and the context starts being used by the adapter.
+	 *
+	 * Needed after mm_context_add_copro() for radix and
+	 * cxl_ctx_get() for hash/p8
+	 */
+	smp_mb();
+
 	if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) {
 		put_pid(ctx->pid);
 		ctx->pid = NULL;
 		cxl_adapter_context_put(ctx->afu->adapter);
 		cxl_ctx_put();
-		if (task)
+		if (task) {
 			cxl_context_mm_count_put(ctx);
+			if (ctx->mm)
+				mm_context_remove_copro(ctx->mm);
+		}
 		goto out;
 	}
 
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 8c32040b9c09..12a41b2753f0 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/sched/mm.h>
+#include <linux/mmu_context.h>
 #include <asm/cputable.h>
 #include <asm/current.h>
 #include <asm/copro.h>
@@ -267,6 +268,8 @@ int __detach_context(struct cxl_context *ctx)
 
 	/* Decrease the mm count on the context */
 	cxl_context_mm_count_put(ctx);
+	if (ctx->mm)
+		mm_context_remove_copro(ctx->mm);
 	ctx->mm = NULL;
 
 	return 0;
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index b76a491a485d..411e83cbbd82 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -19,6 +19,7 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
+#include <linux/mmu_context.h>
 #include <asm/cputable.h>
 #include <asm/current.h>
 #include <asm/copro.h>
@@ -220,9 +221,12 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	/* ensure this mm_struct can't be freed */
 	cxl_context_mm_count_get(ctx);
 
-	/* decrement the use count */
-	if (ctx->mm)
+	if (ctx->mm) {
+		/* decrement the use count */
 		mmput(ctx->mm);
+		/* make TLBIs for this context global */
+		mm_context_add_copro(ctx->mm);
+	}
 
 	/*
 	 * Increment driver use count. Enables global TLBIs for hash
@@ -230,6 +234,15 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	 */
 	cxl_ctx_get();
 
+	/*
+	 * Barrier is needed to make sure all TLBIs are global before
+	 * we attach and the context starts being used by the adapter.
+	 *
+	 * Needed after mm_context_add_copro() for radix and
+	 * cxl_ctx_get() for hash/p8
+	 */
+	smp_mb();
+
 	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
 
 	if ((rc = cxl_ops->attach_process(ctx, false, work.work_element_descriptor,
@@ -240,6 +253,8 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 		ctx->pid = NULL;
 		cxl_ctx_put();
 		cxl_context_mm_count_put(ctx);
+		if (ctx->mm)
+			mm_context_remove_copro(ctx->mm);
 		goto out;
 	}
 

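The counting scheme in this patch can be illustrated with a small user-space model. Everything prefixed with `toy_` is hypothetical, C11 atomics replace the kernel's atomic_t, and a plain counter stands in for the broadcast flush. The fence only approximates the smp_mb() that the callers issue before attaching.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy mm context: active_cpus counts CPUs plus attached coprocessors. */
struct toy_mm {
	atomic_int active_cpus;
	int global_flushes;	/* number of broadcast flushes issued */
};

/* Invalidations may stay local only while at most one user is counted. */
static bool toy_mm_is_local(struct toy_mm *mm)
{
	return atomic_load(&mm->active_cpus) <= 1;
}

/* Attach side: count the coprocessor so later TLBIs become global. */
static void toy_add_copro(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->active_cpus, 1);
	/* Stand-in for smp_mb(): order the increment before the attach. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Detach side: flush globally first, then give the count back. */
static void toy_remove_copro(struct toy_mm *mm)
{
	/* The next invalidation may be local, so clean up the device now. */
	mm->global_flushes++;
	atomic_fetch_sub(&mm->active_cpus, 1);
}
```

The ordering in `toy_remove_copro()` is the point of the model: if the count were dropped first, a local invalidation could slip in before the device-side caches had been cleaned.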

* Re: [PATCH 1/2] cxl: Fix driver use count
  2017-08-28  8:47 [PATCH 1/2] cxl: Fix driver use count Frederic Barrat
  2017-08-28  8:47 ` [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts Frederic Barrat
@ 2017-08-28  9:35 ` Andrew Donnellan
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Donnellan @ 2017-08-28  9:35 UTC (permalink / raw)
  To: Frederic Barrat, mpe, linuxppc-dev, clombard, vaibhav; +Cc: benh, alistair

On 28/08/17 18:47, Frederic Barrat wrote:
> cxl keeps a driver use count, which is used with the hash memory model
> on p8 to know when to upgrade local TLBIs to global and to trigger
> callbacks to manage the MMU for PSL8.
> 
> If a process opens a context and closes without attaching or fails the
> attachment, the driver use count is never decremented. As a
> consequence, TLB invalidations remain global, even if there are no
> active cxl contexts.
> 
> We should increment the driver use count when the process is attaching
> to the cxl adapter, and not on open. It's not needed before the
> adapter starts using the context and the use count is decremented on
> the detach path, so it makes more sense.
> 
> It affects only the user api. The kernel api is already doing The
> Right Thing.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> Cc: stable@vger.kernel.org # v4.2+
> Fixes: 7bb5d91a4dda ("cxl: Rework context lifetimes")

A couple of comment typos.

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   drivers/misc/cxl/api.c  | 4 ++++
>   drivers/misc/cxl/file.c | 8 +++++++-
>   2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index 1a138c83f877..e0dfd1eadd70 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -336,6 +336,10 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
>   			mmput(ctx->mm);
>   	}
>   
> +	/*
> +	 * Increment driver use count. Enables global TLBIs for hash
> +	 * and callacks to handle the segment table

callbacks

> +	 */
>   	cxl_ctx_get();
>   
>   	if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) {
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index 0761271d68c5..b76a491a485d 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -95,7 +95,6 @@ static int __afu_open(struct inode *inode, struct file *file, bool master)
>   
>   	pr_devel("afu_open pe: %i\n", ctx->pe);
>   	file->private_data = ctx;
> -	cxl_ctx_get();
>   
>   	/* indicate success */
>   	rc = 0;
> @@ -225,6 +224,12 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
>   	if (ctx->mm)
>   		mmput(ctx->mm);
>   
> +	/*
> +	 * Increment driver use count. Enables global TLBIs for hash
> +	 * and callacks to handle the segment table

callbacks

> +	 */
> +	cxl_ctx_get();
> +
>   	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
>   
>   	if ((rc = cxl_ops->attach_process(ctx, false, work.work_element_descriptor,
> @@ -233,6 +238,7 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
>   		cxl_adapter_context_put(ctx->afu->adapter);
>   		put_pid(ctx->pid);
>   		ctx->pid = NULL;
> +		cxl_ctx_put();
>   		cxl_context_mm_count_put(ctx);
>   		goto out;
>   	}
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


* Re: [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts
  2017-08-28  8:47 ` [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts Frederic Barrat
@ 2017-08-28 12:03   ` Benjamin Herrenschmidt
  2017-08-28 17:37     ` Frederic Barrat
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-28 12:03 UTC (permalink / raw)
  To: Frederic Barrat, mpe, linuxppc-dev, andrew.donnellan, clombard, vaibhav
  Cc: alistair

On Mon, 2017-08-28 at 10:47 +0200, Frederic Barrat wrote:
> 
>     
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 309592589e30..6447c0df7ec4 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -77,6 +77,41 @@ extern void switch_cop(struct mm_struct *next);
>  extern int use_cop(unsigned long acop, struct mm_struct *mm);
>  extern void drop_cop(unsigned long acop, struct mm_struct *mm);
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> +{
> +	atomic_inc(&mm->context.active_cpus);
> +}
> +
> +static inline void dec_mm_active_cpus(struct mm_struct *mm)
> +{
> +	atomic_dec(&mm->context.active_cpus);
> +}
> +
> +static inline void mm_context_add_copro(struct mm_struct *mm)
> +{
> +	inc_mm_active_cpus(mm);
> +}
> +
> +static inline void mm_context_remove_copro(struct mm_struct *mm)
> +{
> +	/*
> +	 * Need to broadcast a global flush of the full mm before
> +	 * decrementing active_cpus count, as the next TLBI may be
> +	 * local and the nMMU and/or PSL need to be cleaned up.
> +	 * Should be rare enough so that it's acceptable.
> +	 */
> +	flush_tlb_mm(mm);
> +	dec_mm_active_cpus(mm);
> +}

You probably need to kill the pwc too. With my recent optimizations
flush_tlb_mm won't do that anymore. You need a bigger hammer (I don't
have the code at hand right now to tell you what exactly :-) Basically
something that does a RIC_FLUSH_ALL.
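
To make the concern concrete, here is a deliberately simplified model of an agent (nMMU or PSL) that caches both final translations and page-walk entries. The `toy_` names and the two-flag cache are assumptions made for illustration; the enum only borrows the RIC naming used above and is not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Flush scopes, named after the RIC values mentioned in the thread. */
enum toy_ric {
	TOY_RIC_FLUSH_TLB,	/* final translations only */
	TOY_RIC_FLUSH_PWC,	/* page walk cache only */
	TOY_RIC_FLUSH_ALL	/* both */
};

/* Toy translation agent with two independent caches. */
struct toy_agent {
	bool tlb_valid;	/* cached final translations present */
	bool pwc_valid;	/* cached intermediate page-walk entries present */
};

/* Apply a flush of the given scope to the agent. */
static void toy_flush(struct toy_agent *a, enum toy_ric ric)
{
	if (ric == TOY_RIC_FLUSH_TLB || ric == TOY_RIC_FLUSH_ALL)
		a->tlb_valid = false;
	if (ric == TOY_RIC_FLUSH_PWC || ric == TOY_RIC_FLUSH_ALL)
		a->pwc_valid = false;
}

/* The agent is safe to forget about only when nothing stale remains. */
static bool toy_agent_is_clean(const struct toy_agent *a)
{
	return !a->tlb_valid && !a->pwc_valid;
}
```

In this model a TLB-only flush leaves `pwc_valid` set, which is the stale state the comment above warns about; only the ALL scope leaves the agent clean.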



* Re: [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts
  2017-08-28 12:03   ` Benjamin Herrenschmidt
@ 2017-08-28 17:37     ` Frederic Barrat
  2017-08-28 20:49       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Barrat @ 2017-08-28 17:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, mpe, linuxppc-dev, andrew.donnellan,
	clombard, vaibhav
  Cc: alistair



On 28/08/2017 at 14:03, Benjamin Herrenschmidt wrote:
> On Mon, 2017-08-28 at 10:47 +0200, Frederic Barrat wrote:
>>
>>      
>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 309592589e30..6447c0df7ec4 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -77,6 +77,41 @@ extern void switch_cop(struct mm_struct *next);
>>   extern int use_cop(unsigned long acop, struct mm_struct *mm);
>>   extern void drop_cop(unsigned long acop, struct mm_struct *mm);
>>   
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
>> +{
>> +	atomic_inc(&mm->context.active_cpus);
>> +}
>> +
>> +static inline void dec_mm_active_cpus(struct mm_struct *mm)
>> +{
>> +	atomic_dec(&mm->context.active_cpus);
>> +}
>> +
>> +static inline void mm_context_add_copro(struct mm_struct *mm)
>> +{
>> +	inc_mm_active_cpus(mm);
>> +}
>> +
>> +static inline void mm_context_remove_copro(struct mm_struct *mm)
>> +{
>> +	/*
>> +	 * Need to broadcast a global flush of the full mm before
>> +	 * decrementing active_cpus count, as the next TLBI may be
>> +	 * local and the nMMU and/or PSL need to be cleaned up.
>> +	 * Should be rare enough so that it's acceptable.
>> +	 */
>> +	flush_tlb_mm(mm);
>> +	dec_mm_active_cpus(mm);
>> +}
> 
> You probably need to kill the pwc too. With my recent optimizations
> flush_tlb_mm won't do that anymore. You need a bigger hammer (I don't
> have the code at hand right now to tell you what exactly :-) Basically
> something that does a RIC_FLUSH_ALL.


Good point, I had missed the change. It looks like I now need to call 
radix__flush_all_mm(), which I would have to export outside of 
tlb-radix.c first.

Any problem with having a flush_all_mm() to complement a flush_tlb_mm()? 
It's tainted with radix, and the 2 would be equivalent on hash, but it 
would make things easy.

   Fred
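
A rough shape for the helper proposed above, as a user-space sketch only. `struct sketch_mm`, the `toy_` functions and the flush counters are hypothetical; in the kernel the two branches would presumably call radix__flush_all_mm() and the existing hash flush, but that wiring is an assumption and not something this thread settles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mm: records which MMU model it uses and what was flushed. */
struct sketch_mm {
	bool radix;		/* true for radix, false for hash */
	int tlb_flushes;	/* TLB-only flushes issued */
	int full_flushes;	/* TLB + page walk cache flushes issued */
};

/* Stand-ins for the existing per-model primitives. */
static void toy_hash_flush_tlb_mm(struct sketch_mm *mm)
{
	mm->tlb_flushes++;
}

static void toy_radix_flush_all_mm(struct sketch_mm *mm)
{
	mm->full_flushes++;
}

/*
 * Sketch of a generic flush_all_mm(): a full flush on radix, and the
 * plain mm flush on hash, where the two are described as equivalent.
 */
static void toy_flush_all_mm(struct sketch_mm *mm)
{
	if (mm->radix)
		toy_radix_flush_all_mm(mm);
	else
		toy_hash_flush_tlb_mm(mm);
}
```

Callers such as a detach path would then use the one helper without knowing which MMU model is active.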




* Re: [PATCH 2/2] cxl: Enable global TLBIs for cxl contexts
  2017-08-28 17:37     ` Frederic Barrat
@ 2017-08-28 20:49       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-28 20:49 UTC (permalink / raw)
  To: Frederic Barrat, mpe, linuxppc-dev, andrew.donnellan, clombard, vaibhav
  Cc: alistair

On Mon, 2017-08-28 at 19:37 +0200, Frederic Barrat wrote:
> Good point, I had missed the change. It looks like I now need to call 
> radix__flush_all_mm(), which I would have to export outside of 
> tlb-radix.c first.
> 
> Any problem with having a flush_all_mm() to complement a flush_tlb_mm()? 
> It's tainted with radix, and the 2 would be equivalent on hash, but it 
> would make things easy.

Yup something like that.

Cheers,
Ben.

