All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs
@ 2017-08-02 20:29 Frederic Barrat
  2017-08-02 20:29 ` [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Frederic Barrat @ 2017-08-02 20:29 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard, alistair, vaibhav

capi2 and opencapi require the TLB invalidations being sent for
addresses used on the cxl adapter or opencapi device to be global, as
there's a translation cache in the PSL (for capi2) or NPU (for
opencapi). The CAPP, on behalf of the PSL, and NPU snoop the power bus.

This is not new: for the hash memory model, as soon as the cxl driver
is active, all local TLBIs become global. We need a similar mechanism
for the radix memory model. This patch tries to improve things a bit
by flagging the contexts requiring global TLBIs, therefore limiting
the "upgrade" and not affecting contexts not used by the card.

A longer-term goal is to modify the current implementation for hash to
follow the same direction, i.e. identify contexts needing global
TLBIs, but that will be for later. It would be required to support
hash for opencapi.

Changelog:
v3:
 - convert from RFC to PATCH
 - mark contexts used by XSL (cxllib) as needed global invalidation
RFC v2:
 - address comments received
 - rename MM_CONTEXT_GLOBAL_TLBI -> MM_GLOBAL_TLBIE
 - add memory barriers to make sure the device doesn't miss any TLBI
 - also add barrier for the hash implemention to fix the same issue

Frederic Barrat (3):
  powerpc/mm: Add marker for contexts requiring global TLB invalidations
  cxl: Mark context requiring global TLBIs
  cxl: Add memory barrier to guarantee TLBI scope

 arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
 arch/powerpc/include/asm/tlb.h           | 27 +++++++++++++++++++++++----
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 arch/powerpc/mm/tlb-radix.c              |  8 ++++----
 arch/powerpc/mm/tlb_hash64.c             |  3 ++-
 drivers/misc/cxl/api.c                   | 12 ++++++++++--
 drivers/misc/cxl/cxllib.c                |  7 +++++++
 drivers/misc/cxl/file.c                  | 12 ++++++++++--
 include/misc/cxl-base.h                  | 22 +++++++++++++++++++---
 9 files changed, 94 insertions(+), 16 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-08-02 20:29 [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
@ 2017-08-02 20:29 ` Frederic Barrat
  2017-08-03  7:16   ` Balbir Singh
  2017-08-02 20:29 ` [PATCH v3 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Frederic Barrat @ 2017-08-02 20:29 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard, alistair, vaibhav

Introduce a new 'flags' attribute per context and define its first bit
to be a marker requiring all TLBIs for that context to be broadcasted
globally. Once that marker is set on a context, it cannot be removed.

Such a marker is useful for memory contexts used by devices behind the
NPU and CAPP/PSL. The NPU and the PSL keep their own translation cache
so they need to see all the TLBIs for those contexts.

Rename mm_is_thread_local() to mm_is_invalidation_local() to better
describe what it's doing.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
 arch/powerpc/include/asm/tlb.h           | 27 +++++++++++++++++++++++----
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 arch/powerpc/mm/tlb-radix.c              |  8 ++++----
 arch/powerpc/mm/tlb_hash64.c             |  3 ++-
 5 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5b4023c616f7..03d4515ecfa6 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -79,8 +79,12 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/* Bits definition for the context flags */
+#define MM_GLOBAL_TLBIE	0	/* TLBI must be global */
+
 typedef struct {
 	mm_context_id_t id;
+	unsigned long flags;
 	u16 user_psize;		/* page size index */
 
 	/* NPU NMMU context */
@@ -165,5 +169,19 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
+/*
+ * Mark the memory context as requiring global TLBIs, when used by
+ * GPUs or CAPI accelerators managing their own TLB or ERAT.
+ */
+static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
+{
+	set_bit(MM_GLOBAL_TLBIE, &ctx->flags);
+}
+
+static inline bool mm_context_get_global_tlbi(mm_context_t *ctx)
+{
+	return test_bit(MM_GLOBAL_TLBIE, &ctx->flags);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 609557569f65..f06dcac82097 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -69,10 +69,29 @@ static inline int mm_is_core_local(struct mm_struct *mm)
 			      topology_sibling_cpumask(smp_processor_id()));
 }
 
-static inline int mm_is_thread_local(struct mm_struct *mm)
+static inline int mm_is_invalidation_local(struct mm_struct *mm)
 {
-	return cpumask_equal(mm_cpumask(mm),
-			      cpumask_of(smp_processor_id()));
+	int rc;
+
+	rc = cpumask_equal(mm_cpumask(mm),
+			cpumask_of(smp_processor_id()));
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (rc) {
+		/*
+		 * Check if context requires global TLBI.
+		 *
+		 * We need to make sure the PTE update is happening
+		 * before reading the context global flag. Otherwise,
+		 * reading the flag may be re-ordered and happen
+		 * first, and we could end up in a situation where the
+		 * old PTE was seen by the NPU/PSL/device, but the
+		 * TLBI is local.
+		 */
+		mb();
+		rc = !mm_context_get_global_tlbi(&mm->context);
+	}
+#endif
+	return rc;
 }
 
 #else
@@ -81,7 +100,7 @@ static inline int mm_is_core_local(struct mm_struct *mm)
 	return 1;
 }
 
-static inline int mm_is_thread_local(struct mm_struct *mm)
+static inline int mm_is_invalidation_local(struct mm_struct *mm)
 {
 	return 1;
 }
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index a75f63833284..4dfe57f9c3b0 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -165,6 +165,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 		return index;
 
 	mm->context.id = index;
+	mm->context.flags = 0;
 #ifdef CONFIG_PPC_ICSWX
 	mm->context.cop_lockp = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
 	if (!mm->context.cop_lockp) {
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 16ae1bbe13f0..620ed7dced44 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -207,7 +207,7 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto no_context;
 
-	if (!mm_is_thread_local(mm))
+	if (!mm_is_invalidation_local(mm))
 		_tlbie_pid(pid, RIC_FLUSH_ALL);
 	else
 		_tlbiel_pid(pid, RIC_FLUSH_ALL);
@@ -233,7 +233,7 @@ void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned long addr)
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto no_context;
 
-	if (!mm_is_thread_local(mm))
+	if (!mm_is_invalidation_local(mm))
 		_tlbie_pid(pid, RIC_FLUSH_PWC);
 	else
 		tlbiel_pwc(pid);
@@ -252,7 +252,7 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto bail;
-	if (!mm_is_thread_local(mm))
+	if (!mm_is_invalidation_local(mm))
 		_tlbie_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
 	else
 		_tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
@@ -335,7 +335,7 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 {
 	unsigned long pid;
 	unsigned long addr;
-	int local = mm_is_thread_local(mm);
+	int local = mm_is_invalidation_local(mm);
 	unsigned long ap = mmu_get_ap(psize);
 	unsigned long page_size = 1UL << mmu_psize_defs[psize].shift;
 
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index b5b0fb97b9c0..618865cdc793 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -96,7 +96,8 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 	 * flush now and return.
 	 */
 	if (!batch->active) {
-		flush_hash_page(vpn, rpte, psize, ssize, mm_is_thread_local(mm));
+		flush_hash_page(vpn, rpte, psize, ssize,
+				mm_is_invalidation_local(mm));
 		put_cpu_var(ppc64_tlb_batch);
 		return;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/3] cxl: Mark context requiring global TLBIs
  2017-08-02 20:29 [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-08-02 20:29 ` [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
@ 2017-08-02 20:29 ` Frederic Barrat
  2017-08-03  7:22   ` Balbir Singh
  2017-08-02 20:29 ` [PATCH v3 3/3] cxl: Add memory barrier to guarantee TLBI scope Frederic Barrat
  2017-08-30 13:29 ` [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  3 siblings, 1 reply; 9+ messages in thread
From: Frederic Barrat @ 2017-08-02 20:29 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard, alistair, vaibhav

The PSL and XSL need to see all TLBIs pertinent to the memory contexts
used on the adapter. For the hash memory model, it is done by making
all TLBIs global as soon as the cxl driver is in use. For radix, we
need something similar, but we can refine and only convert to global
the invalidations for contexts actually used by the device.

So mark the contexts being attached to the cxl adapter as requiring
global TLBIs.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/api.c    | 12 ++++++++++--
 drivers/misc/cxl/cxllib.c |  7 +++++++
 drivers/misc/cxl/file.c   | 12 ++++++++++--
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1a138c83f877..d3f3fdede755 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -332,8 +332,17 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 		cxl_context_mm_count_get(ctx);
 
 		/* decrement the use count */
-		if (ctx->mm)
+		if (ctx->mm) {
 			mmput(ctx->mm);
+#ifdef CONFIG_PPC_BOOK3S_64
+			mm_context_set_global_tlbi(&ctx->mm->context);
+			/*
+			 * Barrier guarantees that the device will
+			 * receive all TLBIs from that point on
+			 */
+			wmb();
+#endif
+		}
 	}
 
 	cxl_ctx_get();
@@ -347,7 +356,6 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 			cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
 	ctx->status = STARTED;
 out:
 	mutex_unlock(&ctx->status_mutex);
diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
index 5dba23ca2e5f..00b8125752d9 100644
--- a/drivers/misc/cxl/cxllib.c
+++ b/drivers/misc/cxl/cxllib.c
@@ -198,6 +198,13 @@ int cxllib_get_PE_attributes(struct task_struct *task,
 		 * as XSL uses the memory context
 		 */
 		attr->pid = mm->context.id;
+#ifdef CONFIG_PPC_BOOK3S_64
+		mm_context_set_global_tlbi(&mm->context);
+		/*
+		 * barrier guarantees that XSL receives all invalidations
+		 */
+		wmb();
+#endif
 		mmput(mm);
 	} else {
 		attr->pid = 0;
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 0761271d68c5..efdf9483275c 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -222,8 +222,17 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	cxl_context_mm_count_get(ctx);
 
 	/* decrement the use count */
-	if (ctx->mm)
+	if (ctx->mm) {
 		mmput(ctx->mm);
+#ifdef CONFIG_PPC_BOOK3S_64
+		mm_context_set_global_tlbi(&ctx->mm->context);
+		/*
+		 * Barrier guarantees that the device will receive all
+		 * TLBIs from that point on
+		 */
+		wmb();
+#endif
+	}
 
 	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
 
@@ -236,7 +245,6 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 		cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
 	ctx->status = STARTED;
 	rc = 0;
 out:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 3/3] cxl: Add memory barrier to guarantee TLBI scope
  2017-08-02 20:29 [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-08-02 20:29 ` [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
  2017-08-02 20:29 ` [PATCH v3 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
@ 2017-08-02 20:29 ` Frederic Barrat
  2017-08-30 13:29 ` [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  3 siblings, 0 replies; 9+ messages in thread
From: Frederic Barrat @ 2017-08-02 20:29 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard, alistair, vaibhav

With the hash memory model, all TLBIs become global when the cxl
driver is active, i.e. as soon as one context is open.
It is theoretically possible to send a TLBI with the wrong scope as
there's currently no memory barrier between when the driver is marked
as in use, and attaching a context to the device, therefore we are
exposed to re-ordering. It is highly unlikely as the use count for the
driver is incremented on open() and the attachment to the device
happens on a different system call (ioctl)

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 include/misc/cxl-base.h | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index b2ebc91fe09a..25afe6bbe0a9 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -25,17 +25,33 @@ extern atomic_t cxl_use_count;
 
 static inline bool cxl_ctx_in_use(void)
 {
-       return (atomic_read(&cxl_use_count) != 0);
+	/*
+	 * This is called when sending an TLBI, to know whether it
+	 * should be global or local.
+	 *
+	 * We need to make sure the PTE update is happening before
+	 * reading the context global flag. Otherwise, reading the
+	 * flag may be re-ordered and happen first, and we could end
+	 * up in a situation where the old PTE is seen by the device,
+	 * but the TLBI is not global.
+	 */
+	mb();
+	return (atomic_read(&cxl_use_count) != 0);
 }
 
 static inline void cxl_ctx_get(void)
 {
-       atomic_inc(&cxl_use_count);
+	atomic_inc(&cxl_use_count);
+	/*
+	 * Barrier guarantees that the device will receive all TLBIs
+	 * from that point on
+	 */
+	wmb();
 }
 
 static inline void cxl_ctx_put(void)
 {
-       atomic_dec(&cxl_use_count);
+	atomic_dec(&cxl_use_count);
 }
 
 struct cxl_afu *cxl_afu_get(struct cxl_afu *afu);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-08-02 20:29 ` [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
@ 2017-08-03  7:16   ` Balbir Singh
  2017-08-22  3:11     ` Alistair Popple
  0 siblings, 1 reply; 9+ messages in thread
From: Balbir Singh @ 2017-08-03  7:16 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Aneesh Kumar KV,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	clombard, Alistair Popple, vaibhav

On Thu, Aug 3, 2017 at 6:29 AM, Frederic Barrat
<fbarrat@linux.vnet.ibm.com> wrote:
> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
>
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own translation cache
> so they need to see all the TLBIs for those contexts.
>
> Rename mm_is_thread_local() to mm_is_invalidation_local() to better
> describe what it's doing.

mm_is_tlb_local? Just nitpicking

>
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
>  arch/powerpc/include/asm/tlb.h           | 27 +++++++++++++++++++++++----
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  arch/powerpc/mm/tlb-radix.c              |  8 ++++----
>  arch/powerpc/mm/tlb_hash64.c             |  3 ++-
>  5 files changed, 48 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 5b4023c616f7..03d4515ecfa6 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -79,8 +79,12 @@ struct spinlock;
>  /* Maximum possible number of NPUs in a system. */
>  #define NV_MAX_NPUS 8
>
> +/* Bits definition for the context flags */
> +#define MM_GLOBAL_TLBIE        0       /* TLBI must be global */
> +
>  typedef struct {
>         mm_context_id_t id;
> +       unsigned long flags;
>         u16 user_psize;         /* page size index */
>
>         /* NPU NMMU context */
> @@ -165,5 +169,19 @@ extern void radix_init_pseries(void);
>  static inline void radix_init_pseries(void) { };
>  #endif
>
> +/*
> + * Mark the memory context as requiring global TLBIs, when used by
> + * GPUs or CAPI accelerators managing their own TLB or ERAT.
> + */
> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> +{
> +       set_bit(MM_GLOBAL_TLBIE, &ctx->flags);
> +}
> +
> +static inline bool mm_context_get_global_tlbi(mm_context_t *ctx)
> +{
> +       return test_bit(MM_GLOBAL_TLBIE, &ctx->flags);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..f06dcac82097 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -69,10 +69,29 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>                               topology_sibling_cpumask(smp_processor_id()));
>  }
>
> -static inline int mm_is_thread_local(struct mm_struct *mm)
> +static inline int mm_is_invalidation_local(struct mm_struct *mm)
>  {
> -       return cpumask_equal(mm_cpumask(mm),
> -                             cpumask_of(smp_processor_id()));
> +       int rc;
> +
> +       rc = cpumask_equal(mm_cpumask(mm),
> +                       cpumask_of(smp_processor_id()));
> +#ifdef CONFIG_PPC_BOOK3S_64
> +       if (rc) {
> +               /*
> +                * Check if context requires global TLBI.
> +                *
> +                * We need to make sure the PTE update is happening
> +                * before reading the context global flag. Otherwise,
> +                * reading the flag may be re-ordered and happen
> +                * first, and we could end up in a situation where the
> +                * old PTE was seen by the NPU/PSL/device, but the
> +                * TLBI is local.
> +                */
> +               mb();

smp_mb()?

> +               rc = !mm_context_get_global_tlbi(&mm->context);
> +       }

Otherwise looks good!

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] cxl: Mark context requiring global TLBIs
  2017-08-02 20:29 ` [PATCH v3 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
@ 2017-08-03  7:22   ` Balbir Singh
  2017-08-22  3:09     ` Alistair Popple
  0 siblings, 1 reply; 9+ messages in thread
From: Balbir Singh @ 2017-08-03  7:22 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Aneesh Kumar KV,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	clombard, Alistair Popple, vaibhav

On Thu, Aug 3, 2017 at 6:29 AM, Frederic Barrat
<fbarrat@linux.vnet.ibm.com> wrote:
> The PSL and XSL need to see all TLBIs pertinent to the memory contexts
> used on the adapter. For the hash memory model, it is done by making
> all TLBIs global as soon as the cxl driver is in use. For radix, we
> need something similar, but we can refine and only convert to global
> the invalidations for contexts actually used by the device.
>
> So mark the contexts being attached to the cxl adapter as requiring
> global TLBIs.
>

Looking at these bits, I'm wondering the previous code should check
for CONFIG_CXL in mm_is_invalidation_local? That would pretty much cover
BOOK3S_64

> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
>  drivers/misc/cxl/api.c    | 12 ++++++++++--
>  drivers/misc/cxl/cxllib.c |  7 +++++++
>  drivers/misc/cxl/file.c   | 12 ++++++++++--
>  3 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index 1a138c83f877..d3f3fdede755 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -332,8 +332,17 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
>                 cxl_context_mm_count_get(ctx);
>
>                 /* decrement the use count */
> -               if (ctx->mm)
> +               if (ctx->mm) {
>                         mmput(ctx->mm);
> +#ifdef CONFIG_PPC_BOOK3S_64
> +                       mm_context_set_global_tlbi(&ctx->mm->context);

Do we have CXL for non PPC_BOOK3S_64?

> +                       /*
> +                        * Barrier guarantees that the device will
> +                        * receive all TLBIs from that point on
> +                        */
> +                       wmb();

smp_wmb();

> +#endif
> +               }
>         }

The other comments are the same as this (in the snip'd code)

Balbir

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] cxl: Mark context requiring global TLBIs
  2017-08-03  7:22   ` Balbir Singh
@ 2017-08-22  3:09     ` Alistair Popple
  0 siblings, 0 replies; 9+ messages in thread
From: Alistair Popple @ 2017-08-22  3:09 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Balbir Singh, Frederic Barrat, clombard, vaibhav, Aneesh Kumar KV

On Thu, 3 Aug 2017 05:22:31 PM Balbir Singh wrote:
> On Thu, Aug 3, 2017 at 6:29 AM, Frederic Barrat
> <fbarrat@linux.vnet.ibm.com> wrote:
> > The PSL and XSL need to see all TLBIs pertinent to the memory contexts
> > used on the adapter. For the hash memory model, it is done by making
> > all TLBIs global as soon as the cxl driver is in use. For radix, we
> > need something similar, but we can refine and only convert to global
> > the invalidations for contexts actually used by the device.
> >
> > So mark the contexts being attached to the cxl adapter as requiring
> > global TLBIs.
> >
> 
> Looking at these bits, I'm wondering the previous code should check
> for CONFIG_CXL in mm_is_invalidation_local? That would pretty much cover
> BOOK3S_64

That won't work as we also need this for other nest MMU users which might
not have CONFIG_CXL set - for example the NPU the will use
mm_context_set_global_tlbi() for processes on the GPU.

Regards,

Alistair

> > Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> > ---
> >  drivers/misc/cxl/api.c    | 12 ++++++++++--
> >  drivers/misc/cxl/cxllib.c |  7 +++++++
> >  drivers/misc/cxl/file.c   | 12 ++++++++++--
> >  3 files changed, 27 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> > index 1a138c83f877..d3f3fdede755 100644
> > --- a/drivers/misc/cxl/api.c
> > +++ b/drivers/misc/cxl/api.c
> > @@ -332,8 +332,17 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
> >                 cxl_context_mm_count_get(ctx);
> >
> >                 /* decrement the use count */
> > -               if (ctx->mm)
> > +               if (ctx->mm) {
> >                         mmput(ctx->mm);
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +                       mm_context_set_global_tlbi(&ctx->mm->context);
> 
> Do we have CXL for non PPC_BOOK3S_64?
> 
> > +                       /*
> > +                        * Barrier guarantees that the device will
> > +                        * receive all TLBIs from that point on
> > +                        */
> > +                       wmb();
> 
> smp_wmb();
> 
> > +#endif
> > +               }
> >         }
> 
> The other comments are the same as this (in the snip'd code)
> 
> Balbir

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-08-03  7:16   ` Balbir Singh
@ 2017-08-22  3:11     ` Alistair Popple
  0 siblings, 0 replies; 9+ messages in thread
From: Alistair Popple @ 2017-08-22  3:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Balbir Singh, Frederic Barrat, clombard, vaibhav, Aneesh Kumar KV

For what it's worth this worked fine when testing with the NPU as well.

Tested-by: Alistair Popple <alistair@popple.id.au>

On Thu, 3 Aug 2017 05:16:35 PM Balbir Singh wrote:
> On Thu, Aug 3, 2017 at 6:29 AM, Frederic Barrat
> <fbarrat@linux.vnet.ibm.com> wrote:
> > Introduce a new 'flags' attribute per context and define its first bit
> > to be a marker requiring all TLBIs for that context to be broadcasted
> > globally. Once that marker is set on a context, it cannot be removed.
> >
> > Such a marker is useful for memory contexts used by devices behind the
> > NPU and CAPP/PSL. The NPU and the PSL keep their own translation cache
> > so they need to see all the TLBIs for those contexts.
> >
> > Rename mm_is_thread_local() to mm_is_invalidation_local() to better
> > describe what it's doing.
> 
> mm_is_tlb_local? Just nitpicking
> 
> >
> > Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
> >  arch/powerpc/include/asm/tlb.h           | 27 +++++++++++++++++++++++----
> >  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
> >  arch/powerpc/mm/tlb-radix.c              |  8 ++++----
> >  arch/powerpc/mm/tlb_hash64.c             |  3 ++-
> >  5 files changed, 48 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> > index 5b4023c616f7..03d4515ecfa6 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> > @@ -79,8 +79,12 @@ struct spinlock;
> >  /* Maximum possible number of NPUs in a system. */
> >  #define NV_MAX_NPUS 8
> >
> > +/* Bits definition for the context flags */
> > +#define MM_GLOBAL_TLBIE        0       /* TLBI must be global */
> > +
> >  typedef struct {
> >         mm_context_id_t id;
> > +       unsigned long flags;
> >         u16 user_psize;         /* page size index */
> >
> >         /* NPU NMMU context */
> > @@ -165,5 +169,19 @@ extern void radix_init_pseries(void);
> >  static inline void radix_init_pseries(void) { };
> >  #endif
> >
> > +/*
> > + * Mark the memory context as requiring global TLBIs, when used by
> > + * GPUs or CAPI accelerators managing their own TLB or ERAT.
> > + */
> > +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> > +{
> > +       set_bit(MM_GLOBAL_TLBIE, &ctx->flags);
> > +}
> > +
> > +static inline bool mm_context_get_global_tlbi(mm_context_t *ctx)
> > +{
> > +       return test_bit(MM_GLOBAL_TLBIE, &ctx->flags);
> > +}
> > +
> >  #endif /* __ASSEMBLY__ */
> >  #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
> > diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> > index 609557569f65..f06dcac82097 100644
> > --- a/arch/powerpc/include/asm/tlb.h
> > +++ b/arch/powerpc/include/asm/tlb.h
> > @@ -69,10 +69,29 @@ static inline int mm_is_core_local(struct mm_struct *mm)
> >                               topology_sibling_cpumask(smp_processor_id()));
> >  }
> >
> > -static inline int mm_is_thread_local(struct mm_struct *mm)
> > +static inline int mm_is_invalidation_local(struct mm_struct *mm)
> >  {
> > -       return cpumask_equal(mm_cpumask(mm),
> > -                             cpumask_of(smp_processor_id()));
> > +       int rc;
> > +
> > +       rc = cpumask_equal(mm_cpumask(mm),
> > +                       cpumask_of(smp_processor_id()));
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +       if (rc) {
> > +               /*
> > +                * Check if context requires global TLBI.
> > +                *
> > +                * We need to make sure the PTE update is happening
> > +                * before reading the context global flag. Otherwise,
> > +                * reading the flag may be re-ordered and happen
> > +                * first, and we could end up in a situation where the
> > +                * old PTE was seen by the NPU/PSL/device, but the
> > +                * TLBI is local.
> > +                */
> > +               mb();
> 
> smp_mb()?
> 
> > +               rc = !mm_context_get_global_tlbi(&mm->context);
> > +       }
> 
> Otherwise looks good!
> 
> Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs
  2017-08-02 20:29 [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
                   ` (2 preceding siblings ...)
  2017-08-02 20:29 ` [PATCH v3 3/3] cxl: Add memory barrier to guarantee TLBI scope Frederic Barrat
@ 2017-08-30 13:29 ` Frederic Barrat
  3 siblings, 0 replies; 9+ messages in thread
From: Frederic Barrat @ 2017-08-30 13:29 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: alistair, clombard, vaibhav

I'm dropping this series, as there was a recent change done in the 
memory context that I can reuse.
The follow up of the story is:
http://patchwork.ozlabs.org/patch/807570/

   Fred


Le 02/08/2017 à 22:29, Frederic Barrat a écrit :
> capi2 and opencapi require the TLB invalidations being sent for
> addresses used on the cxl adapter or opencapi device to be global, as
> there's a translation cache in the PSL (for capi2) or NPU (for
> opencapi). The CAPP, on behalf of the PSL, and NPU snoop the power bus.
> 
> This is not new: for the hash memory model, as soon as the cxl driver
> is active, all local TLBIs become global. We need a similar mechanism
> for the radix memory model. This patch tries to improve things a bit
> by flagging the contexts requiring global TLBIs, therefore limiting
> the "upgrade" and not affecting contexts not used by the card.
> 
> A longer-term goal is to modify the current implementation for hash to
> follow the same direction, i.e. identify contexts needing global
> TLBIs, but that will be for later. It would be required to support
> hash for opencapi.
> 
> Changelog:
> v3:
>   - convert from RFC to PATCH
>   - mark contexts used by XSL (cxllib) as needed global invalidation
> RFC v2:
>   - address comments received
>   - rename MM_CONTEXT_GLOBAL_TLBI -> MM_GLOBAL_TLBIE
>   - add memory barriers to make sure the device doesn't miss any TLBI
>   - also add barrier for the hash implemention to fix the same issue
> 
> Frederic Barrat (3):
>    powerpc/mm: Add marker for contexts requiring global TLB invalidations
>    cxl: Mark context requiring global TLBIs
>    cxl: Add memory barrier to guarantee TLBI scope
> 
>   arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
>   arch/powerpc/include/asm/tlb.h           | 27 +++++++++++++++++++++++----
>   arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>   arch/powerpc/mm/tlb-radix.c              |  8 ++++----
>   arch/powerpc/mm/tlb_hash64.c             |  3 ++-
>   drivers/misc/cxl/api.c                   | 12 ++++++++++--
>   drivers/misc/cxl/cxllib.c                |  7 +++++++
>   drivers/misc/cxl/file.c                  | 12 ++++++++++--
>   include/misc/cxl-base.h                  | 22 +++++++++++++++++++---
>   9 files changed, 94 insertions(+), 16 deletions(-)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-08-30 13:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02 20:29 [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
2017-08-02 20:29 ` [PATCH v3 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
2017-08-03  7:16   ` Balbir Singh
2017-08-22  3:11     ` Alistair Popple
2017-08-02 20:29 ` [PATCH v3 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
2017-08-03  7:22   ` Balbir Singh
2017-08-22  3:09     ` Alistair Popple
2017-08-02 20:29 ` [PATCH v3 3/3] cxl: Add memory barrier to guarantee TLBI scope Frederic Barrat
2017-08-30 13:29 ` [PATCH v3 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.