All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs
@ 2017-05-03 14:29 Frederic Barrat
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Frederic Barrat @ 2017-05-03 14:29 UTC (permalink / raw)
  To: alistair, aneesh.kumar, bsingharora, linuxppc-dev, mpe

capi2 and opencapi require the TLB invalidations being sent for
addresses used on the cxl adapter or opencapi device to be global, as
there's a translation cache in the PSL (for capi2) or NPU (for
opencapi). The CAPP (for PSL) and NPU snoop the power bus.

This is not new: for the hash memory model, as soon as the cxl driver
is active, all local TLBIs become global. We need a similar mechanism
for the radix memory model. This patch tries to improve things a bit
by flagging the contexts requiring global TLBIs, therefore limiting
the "upgrade" and not affecting contexts not used by the card.

Alistair: for nvlink2, it is my understanding that all the required
invalidations are already in place through software mmio/ATSD, i.e. this
patch is not useful for you.

Submitting as an RFC, since I don't get to touch mmu.h everyday and
would like to probe people's reaction.



Frederic Barrat (2):
  powerpc/mm: Add marker for contexts requiring global TLB invalidations
  cxl: Mark context requiring global TLBIs

 arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
 arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 drivers/misc/cxl/api.c                   |  5 ++++-
 drivers/misc/cxl/file.c                  |  5 ++++-
 5 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-03 14:29 [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
@ 2017-05-03 14:29 ` Frederic Barrat
  2017-05-04  6:41   ` Aneesh Kumar K.V
                     ` (2 more replies)
  2017-05-03 14:29 ` [RFC 2/2] cxl: Mark context requiring global TLBIs Frederic Barrat
  2017-05-05  5:28 ` [RFC 0/2] powerpc/mm: Mark memory contexts " Alistair Popple
  2 siblings, 3 replies; 12+ messages in thread
From: Frederic Barrat @ 2017-05-03 14:29 UTC (permalink / raw)
  To: alistair, aneesh.kumar, bsingharora, linuxppc-dev, mpe

Introduce a new 'flags' attribute per context and define its first bit
to be a marker requiring all TLBIs for that context to be broadcasted
globally. Once that marker is set on a context, it cannot be removed.

Such a marker is useful for memory contexts used by devices behind the
NPU and CAPP/PSL. The NPU and the PSL keep their own
translation cache so they need to see all the TLBIs for those
contexts.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
 arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 77529a3e3811..7b640ab1cbeb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -78,8 +78,12 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/* Bits definition for the context flags */
+#define MM_CONTEXT_GLOBAL_TLBI	1	/* TLBI must be global */
+
 typedef struct {
 	mm_context_id_t id;
+	unsigned long flags;
 	u16 user_psize;		/* page size index */
 
 	/* NPU NMMU context */
@@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
+static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
+{
+	set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 609557569f65..bd18ed083011 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
 
 static inline int mm_is_thread_local(struct mm_struct *mm)
 {
-	return cpumask_equal(mm_cpumask(mm),
-			      cpumask_of(smp_processor_id()));
+	int rc;
+
+	rc = cpumask_equal(mm_cpumask(mm),
+			cpumask_of(smp_processor_id()));
+#ifdef CONFIG_PPC_BOOK3S_64
+	rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
+#endif
+	return rc;
 }
 
 #else
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index c6dca2ae78ef..53983a0c220c 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -156,6 +156,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 		return index;
 
 	mm->context.id = index;
+	mm->context.flags = 0;
 #ifdef CONFIG_PPC_ICSWX
 	mm->context.cop_lockp = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
 	if (!mm->context.cop_lockp) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/2] cxl: Mark context requiring global TLBIs
  2017-05-03 14:29 [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
@ 2017-05-03 14:29 ` Frederic Barrat
  2017-05-04  7:39   ` Balbir Singh
  2017-05-05  5:28 ` [RFC 0/2] powerpc/mm: Mark memory contexts " Alistair Popple
  2 siblings, 1 reply; 12+ messages in thread
From: Frederic Barrat @ 2017-05-03 14:29 UTC (permalink / raw)
  To: alistair, aneesh.kumar, bsingharora, linuxppc-dev, mpe

The PSL needs to see all TLBI pertinent to the memory contexts used on
the cxl adapter. For the hash memory model, it was done by making all
TLBIs global as soon as the cxl driver is in us. For radix, we need
something similar, but we can refine and only convert to global the
invalidations for contexts actually used by the device.

So mark the contexts being attached to the cxl adapter as requiring
global TLBIs.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/api.c  | 5 ++++-
 drivers/misc/cxl/file.c | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1a138c83f877..86b2ad86fdee 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -347,7 +347,10 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 			cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (ctx->mm)
+		mm_context_set_global_tlbi(&ctx->mm->context);
+#endif
 	ctx->status = STARTED;
 out:
 	mutex_unlock(&ctx->status_mutex);
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 17b433f1ce23..c7ead488eee2 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -239,7 +239,10 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 		cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (ctx->mm)
+		mm_context_set_global_tlbi(&ctx->mm->context);
+#endif
 	ctx->status = STARTED;
 	rc = 0;
 out:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
@ 2017-05-04  6:41   ` Aneesh Kumar K.V
  2017-05-04 17:24     ` Frederic Barrat
  2017-05-04  7:25   ` Balbir Singh
  2017-05-04  9:42   ` Michael Ellerman
  2 siblings, 1 reply; 12+ messages in thread
From: Aneesh Kumar K.V @ 2017-05-04  6:41 UTC (permalink / raw)
  To: Frederic Barrat, alistair, bsingharora, linuxppc-dev, mpe

Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:

> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
>
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.

Can we also switch existing cxl_ctx_in_use() to this ?

-aneesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
  2017-05-04  6:41   ` Aneesh Kumar K.V
@ 2017-05-04  7:25   ` Balbir Singh
  2017-05-04  9:24     ` Michael Ellerman
  2017-05-04  9:42   ` Michael Ellerman
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2017-05-04  7:25 UTC (permalink / raw)
  To: Frederic Barrat, alistair, aneesh.kumar, linuxppc-dev, mpe

On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
> 
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
>  arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 77529a3e3811..7b640ab1cbeb 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -78,8 +78,12 @@ struct spinlock;
>  /* Maximum possible number of NPUs in a system. */
>  #define NV_MAX_NPUS 8
>  
> +/* Bits definition for the context flags */
> +#define MM_CONTEXT_GLOBAL_TLBI	1	/* TLBI must be global */
> +
>  typedef struct {
>  	mm_context_id_t id;
> +	unsigned long flags;

Should these flags be under #ifdef PPC_BOOK3S_64 as well? Not sure.

>  	u16 user_psize;		/* page size index */
>  
>  	/* NPU NMMU context */
> @@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
>  static inline void radix_init_pseries(void) { };
>  #endif
>  
> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> +{
> +	set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..bd18ed083011 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>  
>  static inline int mm_is_thread_local(struct mm_struct *mm)
>  {
> -	return cpumask_equal(mm_cpumask(mm),
> -			      cpumask_of(smp_processor_id()));
> +	int rc;
> +
> +	rc = cpumask_equal(mm_cpumask(mm),
> +			cpumask_of(smp_processor_id()));
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
> +#endif

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] cxl: Mark context requiring global TLBIs
  2017-05-03 14:29 ` [RFC 2/2] cxl: Mark context requiring global TLBIs Frederic Barrat
@ 2017-05-04  7:39   ` Balbir Singh
  2017-05-07 10:41     ` Frederic Barrat
  0 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2017-05-04  7:39 UTC (permalink / raw)
  To: Frederic Barrat, alistair, aneesh.kumar, linuxppc-dev, mpe

On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
> The PSL needs to see all TLBI pertinent to the memory contexts used on
> the cxl adapter. For the hash memory model, it was done by making all
> TLBIs global as soon as the cxl driver is in us. For radix, we need
> something similar, but we can refine and only convert to global the
> invalidations for contexts actually used by the device.
> 
> So mark the contexts being attached to the cxl adapter as requiring
> global TLBIs.
>
<snip> 
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	if (ctx->mm)
> +		mm_context_set_global_tlbi(&ctx->mm->context);

Just curious and wondering

Could we do mm_context_set_global_tlbi() before ->attach_process() that
way we won't need atomic tests (set_bit() and test_bit())? May be a memory
barrier would suffice. Not 100% sure, hence checking

Balbir Singh.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-04  7:25   ` Balbir Singh
@ 2017-05-04  9:24     ` Michael Ellerman
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2017-05-04  9:24 UTC (permalink / raw)
  To: Balbir Singh, Frederic Barrat, alistair, aneesh.kumar, linuxppc-dev

Balbir Singh <bsingharora@gmail.com> writes:

> On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
>> Introduce a new 'flags' attribute per context and define its first bit
>> to be a marker requiring all TLBIs for that context to be broadcasted
>> globally. Once that marker is set on a context, it cannot be removed.
>> 
>> Such a marker is useful for memory contexts used by devices behind the
>> NPU and CAPP/PSL. The NPU and the PSL keep their own
>> translation cache so they need to see all the TLBIs for those
>> contexts.
>> 
>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
>>  arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
>>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>>  3 files changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 77529a3e3811..7b640ab1cbeb 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -78,8 +78,12 @@ struct spinlock;
>>  /* Maximum possible number of NPUs in a system. */
>>  #define NV_MAX_NPUS 8
>>  
>> +/* Bits definition for the context flags */
>> +#define MM_CONTEXT_GLOBAL_TLBI	1	/* TLBI must be global */
>> +
>>  typedef struct {
>>  	mm_context_id_t id;
>> +	unsigned long flags;
>
> Should these flags be under #ifdef PPC_BOOK3S_64 as well? Not sure.

They shouldn't need to be, the whole file is Book3s 64 only.

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
  2017-05-04  6:41   ` Aneesh Kumar K.V
  2017-05-04  7:25   ` Balbir Singh
@ 2017-05-04  9:42   ` Michael Ellerman
  2017-05-07 11:15     ` Frederic Barrat
  2 siblings, 1 reply; 12+ messages in thread
From: Michael Ellerman @ 2017-05-04  9:42 UTC (permalink / raw)
  To: Frederic Barrat, alistair, aneesh.kumar, bsingharora, linuxppc-dev

Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:

> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
>
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
>  arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  3 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 77529a3e3811..7b640ab1cbeb 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -78,8 +78,12 @@ struct spinlock;
>  /* Maximum possible number of NPUs in a system. */
>  #define NV_MAX_NPUS 8
>  
> +/* Bits definition for the context flags */
> +#define MM_CONTEXT_GLOBAL_TLBI	1	/* TLBI must be global */

I think I'd prefer MM_GLOBAL_TLBIE, it's shorter and tlbie is the name
of the instruction so is something people can search for.

> @@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
>  static inline void radix_init_pseries(void) { };
>  #endif
>  
> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> +{
> +	set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
> +}

set_bit() and test_bit() are non-atomic, and unordered vs other loads
and stores.

So the caller will need to be careful they have a barrier between this
and whatever it is they do that creates mappings that might need to be
invalidated.

Similarly on the read side we should have a barrier between the store
that makes the PTE invalid and the load of the flag.

Which makes me think cxl_ctx_in_use() is buggy :/, hmm. But it's late so
hopefully I'm wrong :D

> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..bd18ed083011 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>  
>  static inline int mm_is_thread_local(struct mm_struct *mm)
>  {
> -	return cpumask_equal(mm_cpumask(mm),
> -			      cpumask_of(smp_processor_id()));
> +	int rc;
> +
> +	rc = cpumask_equal(mm_cpumask(mm),
> +			cpumask_of(smp_processor_id()));
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
> +#endif

The ifdef's a bit ugly, but I guess it's not worth putting it in a
static inline.

I'd be interested to see the generated code for this, and for the
reverse, ie. putting the test_bit() first, and doing an early return if
it's true. That way once the bit is set we can just skip the cpumask
comparison.

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-04  6:41   ` Aneesh Kumar K.V
@ 2017-05-04 17:24     ` Frederic Barrat
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Barrat @ 2017-05-04 17:24 UTC (permalink / raw)
  To: Aneesh Kumar K.V, alistair, bsingharora, linuxppc-dev, mpe



Le 04/05/2017 à 08:41, Aneesh Kumar K.V a écrit :
> Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:
>
>> Introduce a new 'flags' attribute per context and define its first bit
>> to be a marker requiring all TLBIs for that context to be broadcasted
>> globally. Once that marker is set on a context, it cannot be removed.
>>
>> Such a marker is useful for memory contexts used by devices behind the
>> NPU and CAPP/PSL. The NPU and the PSL keep their own
>> translation cache so they need to see all the TLBIs for those
>> contexts.
>
> Can we also switch existing cxl_ctx_in_use() to this ?


That was my initial intent. But in the hash code, when calling the 
tlbie, it seems that we no longer have the related context handy. Or did 
I miss it? so it would require quite a bit of changes.

So I've just focused on fixing the pb for radix for the time being. That 
being said, we'll have to update what we do for hash if we ever want to 
support opencapi with hash/powervm (which seems like a strong 
possibility for next year), as we could have more than one opencapi drivers.

   Fred

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs
  2017-05-03 14:29 [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
  2017-05-03 14:29 ` [RFC 2/2] cxl: Mark context requiring global TLBIs Frederic Barrat
@ 2017-05-05  5:28 ` Alistair Popple
  2 siblings, 0 replies; 12+ messages in thread
From: Alistair Popple @ 2017-05-05  5:28 UTC (permalink / raw)
  To: Frederic Barrat; +Cc: aneesh.kumar, bsingharora, linuxppc-dev, mpe

Hi Frederic,

On Wed, 3 May 2017 04:29:04 PM Frederic Barrat wrote:
> capi2 and opencapi require the TLB invalidations being sent for
> addresses used on the cxl adapter or opencapi device to be global, as
> there's a translation cache in the PSL (for capi2) or NPU (for
> opencapi). The CAPP (for PSL) and NPU snoop the power bus.
>
> This is not new: for the hash memory model, as soon as the cxl driver
> is active, all local TLBIs become global. We need a similar mechanism
> for the radix memory model. This patch tries to improve things a bit
> by flagging the contexts requiring global TLBIs, therefore limiting
> the "upgrade" and not affecting contexts not used by the card.
>
> Alistair: for nvlink2, it is my understanding that all the required
> invalidations are already in place through software mmio/ATSD, i.e. this
> patch is not useful for you.

Not quite true. I would like to drop the global TLBI from the MMU
notifier so will need this to invalidate entries the NMMU cache.

- Alistair

> Submitting as an RFC, since I don't get to touch mmu.h everyday and
> would like to probe people's reaction.
>
>
>
> Frederic Barrat (2):
>   powerpc/mm: Add marker for contexts requiring global TLB invalidations
>   cxl: Mark context requiring global TLBIs
>
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
>  arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  drivers/misc/cxl/api.c                   |  5 ++++-
>  drivers/misc/cxl/file.c                  |  5 ++++-
>  5 files changed, 26 insertions(+), 4 deletions(-)
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 2/2] cxl: Mark context requiring global TLBIs
  2017-05-04  7:39   ` Balbir Singh
@ 2017-05-07 10:41     ` Frederic Barrat
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Barrat @ 2017-05-07 10:41 UTC (permalink / raw)
  To: Balbir Singh, alistair, aneesh.kumar, linuxppc-dev, mpe



Le 04/05/2017 à 09:39, Balbir Singh a écrit :
> On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
>> The PSL needs to see all TLBI pertinent to the memory contexts used on
>> the cxl adapter. For the hash memory model, it was done by making all
>> TLBIs global as soon as the cxl driver is in us. For radix, we need
>> something similar, but we can refine and only convert to global the
>> invalidations for contexts actually used by the device.
>>
>> So mark the contexts being attached to the cxl adapter as requiring
>> global TLBIs.
>>
> <snip>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +	if (ctx->mm)
>> +		mm_context_set_global_tlbi(&ctx->mm->context);
>
> Just curious and wondering
>
> Could we do mm_context_set_global_tlbi() before ->attach_process() that
> way we won't need atomic tests (set_bit() and test_bit())? May be a memory
> barrier would suffice. Not 100% sure, hence checking


You're right, I need to move mm_context_set_global_tlbi() before the 
attach and have a write memory barrier.
If the attach fails, then the context will still be marked for global 
TLBIs (some other driver may also set the bit for a different reason). 
But I would expect the life expectancy of a process designed to use an 
accelerator and failing to attach to be pretty short.

I still think we need the atomic set_bit() though, but that's really for 
the future and if somebody introduces new bits in the context 'flags'.

   Fred


> Balbir Singh.
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-05-04  9:42   ` Michael Ellerman
@ 2017-05-07 11:15     ` Frederic Barrat
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Barrat @ 2017-05-07 11:15 UTC (permalink / raw)
  To: Michael Ellerman, alistair, aneesh.kumar, bsingharora, linuxppc-dev



Le 04/05/2017 à 11:42, Michael Ellerman a écrit :
> Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:
>
>> Introduce a new 'flags' attribute per context and define its first bit
>> to be a marker requiring all TLBIs for that context to be broadcasted
>> globally. Once that marker is set on a context, it cannot be removed.
>>
>> Such a marker is useful for memory contexts used by devices behind the
>> NPU and CAPP/PSL. The NPU and the PSL keep their own
>> translation cache so they need to see all the TLBIs for those
>> contexts.
>>
>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +++++++++
>>  arch/powerpc/include/asm/tlb.h           | 10 ++++++++--
>>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>>  3 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 77529a3e3811..7b640ab1cbeb 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -78,8 +78,12 @@ struct spinlock;
>>  /* Maximum possible number of NPUs in a system. */
>>  #define NV_MAX_NPUS 8
>>
>> +/* Bits definition for the context flags */
>> +#define MM_CONTEXT_GLOBAL_TLBI	1	/* TLBI must be global */
>
> I think I'd prefer MM_GLOBAL_TLBIE, it's shorter and tlbie is the name
> of the instruction so is something people can search for.


OK


>> @@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
>>  static inline void radix_init_pseries(void) { };
>>  #endif
>>
>> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
>> +{
>> +	set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
>> +}
>
> set_bit() and test_bit() are non-atomic, and unordered vs other loads
> and stores.
>
> So the caller will need to be careful they have a barrier between this
> and whatever it is they do that creates mappings that might need to be
> invalidated.

Agreed, I missed the barrier. So I need to set the flag, have a write 
memory barrier. Then, in the case of cxl, we can attach to the accelerator.


> Similarly on the read side we should have a barrier between the store
> that makes the PTE invalid and the load of the flag.

That one is more subtle, at least to me, but I think I now see what you 
mean. With no extra read barrier, we would be exposed to have the 
following order:

CPU1                CPU2                 device
                     read flag=>local
set global flag
write barrier
attach
                                          read PTE
                     update PTE
                     tlbiel               not seen, hence broken


Is that what you meant?
That would mean an extra read barrier for each tlbie.

>
> Which makes me think cxl_ctx_in_use() is buggy :/, hmm. But it's late so
> hopefully I'm wrong :D

Unfortunately, I think you're right. And we're missing the same 2 
barriers: a write barrier when cxl increments atomically the use count 
before attaching, and a read barrier like above.

   Fred

>
>> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
>> index 609557569f65..bd18ed083011 100644
>> --- a/arch/powerpc/include/asm/tlb.h
>> +++ b/arch/powerpc/include/asm/tlb.h
>> @@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>>
>>  static inline int mm_is_thread_local(struct mm_struct *mm)
>>  {
>> -	return cpumask_equal(mm_cpumask(mm),
>> -			      cpumask_of(smp_processor_id()));
>> +	int rc;
>> +
>> +	rc = cpumask_equal(mm_cpumask(mm),
>> +			cpumask_of(smp_processor_id()));
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +	rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
>> +#endif
>
> The ifdef's a bit ugly, but I guess it's not worth putting it in a
> static inline.
>
> I'd be interested to see the generated code for this, and for the
> reverse, ie. putting the test_bit() first, and doing an early return if
> it's true. That way once the bit is set we can just skip the cpumask
> comparison.
>
> cheers
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-05-07 11:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 14:29 [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
2017-05-03 14:29 ` [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
2017-05-04  6:41   ` Aneesh Kumar K.V
2017-05-04 17:24     ` Frederic Barrat
2017-05-04  7:25   ` Balbir Singh
2017-05-04  9:24     ` Michael Ellerman
2017-05-04  9:42   ` Michael Ellerman
2017-05-07 11:15     ` Frederic Barrat
2017-05-03 14:29 ` [RFC 2/2] cxl: Mark context requiring global TLBIs Frederic Barrat
2017-05-04  7:39   ` Balbir Singh
2017-05-07 10:41     ` Frederic Barrat
2017-05-05  5:28 ` [RFC 0/2] powerpc/mm: Mark memory contexts " Alistair Popple

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.