* [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper
@ 2017-07-24  4:27 Benjamin Herrenschmidt
  2017-07-24  4:27 ` [PATCH 2/6] powerpc/mm: Avoid double irq save/restore in activate_mm Benjamin Herrenschmidt
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

Makes switch_mm_irqs_off() a bit more readable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/mmu_context.h | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 94d93f256b94..603b0e5cdec0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -77,6 +77,26 @@ extern void switch_cop(struct mm_struct *next);
 extern int use_cop(unsigned long acop, struct mm_struct *mm);
 extern void drop_cop(unsigned long acop, struct mm_struct *mm);
 
+#if defined(CONFIG_PPC32)
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	/* 32-bit keeps track of the current PGDIR in the thread struct */
+	tsk->thread.pgdir = mm->pgd;
+}
+#elif defined(CONFIG_PPC_BOOK3E_64)
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	/* 64-bit Book3E keeps track of current PGD in the PACA */
+	get_paca()->pgd = mm->pgd;
+}
+#else
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm) { }
+#endif
+
+
 /*
  * switch_mm is the entry point called from the architecture independent
  * code in kernel/sched/core.c
@@ -93,15 +113,9 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
 		new_on_cpu = true;
 	}
 
-	/* 32-bit keeps track of the current PGDIR in the thread struct */
-#ifdef CONFIG_PPC32
-	tsk->thread.pgdir = next->pgd;
-#endif /* CONFIG_PPC32 */
+	/* Some subarchs need to track the PGD elsewhere */
+	switch_mm_pgdir(tsk, next);
 
-	/* 64-bit Book3E keeps track of current PGD in the PACA */
-#ifdef CONFIG_PPC_BOOK3E_64
-	get_paca()->pgd = next->pgd;
-#endif
 	/* Nothing else to do if we aren't actually switching */
 	if (prev == next)
 		return;
-- 
2.13.3


* [PATCH 2/6] powerpc/mm: Avoid double irq save/restore in activate_mm
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
@ 2017-07-24  4:27 ` Benjamin Herrenschmidt
  2017-07-24  4:28 ` [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered Benjamin Herrenschmidt
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

activate_mm() calls switch_mm(), which already does the irq
save/restore these days.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/mmu_context.h | 4 ----
 1 file changed, 4 deletions(-)
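
For reference, switch_mm() in the same header is roughly:

	static inline void switch_mm(struct mm_struct *prev,
				     struct mm_struct *next,
				     struct task_struct *tsk)
	{
		unsigned long flags;

		local_irq_save(flags);
		switch_mm_irqs_off(prev, next, tsk);
		local_irq_restore(flags);
	}

so the extra save/restore in activate_mm() was redundant.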

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 603b0e5cdec0..ed9a36ee3107 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -158,11 +158,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
  */
 static inline void activate_mm(struct mm_struct *prev, struct mm_struct *next)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
 	switch_mm(prev, next, current);
-	local_irq_restore(flags);
 }
 
 /* We don't currently use enter_lazy_tlb() for anything */
-- 
2.13.3


* [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
  2017-07-24  4:27 ` [PATCH 2/6] powerpc/mm: Avoid double irq save/restore in activate_mm Benjamin Herrenschmidt
@ 2017-07-24  4:28 ` Benjamin Herrenschmidt
  2017-07-24 11:20   ` Nicholas Piggin
  2017-08-23 12:01   ` [3/6] " Michael Ellerman
  2017-07-24  4:28 ` [PATCH 4/6] powerpc/mm: Use mm_is_thread_local() instead of open-coding Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/mmu_context.h | 1 +
 1 file changed, 1 insertion(+)
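
The ordering at stake is roughly the following (a sketch of the two
sides, not the exact code paths):

	CPU A: switch_mm_irqs_off()       CPU B: PTE invalidation
	---------------------------       -----------------------
	set this CPU in mm_cpumask(next)  store invalid PTE via pte_xchg()
	smp_mb()                          (sync after the cmpxchg)
	loads/stores that can bring       read mm_cpumask(mm) to choose
	translations into the TLB         local vs global TLB flush

Without the barrier on CPU A, the HW could bring a translation into
the TLB before CPU B observes the cpumask bit, and CPU B would then
flush locally only, leaving a stale entry in CPU A's TLB.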

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index ed9a36ee3107..ff1aeb2cd19f 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -110,6 +110,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
 	/* Mark this context has been used on the new CPU */
 	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
 		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
+		smp_mb();
 		new_on_cpu = true;
 	}
 
-- 
2.13.3


* [PATCH 4/6] powerpc/mm: Use mm_is_thread_local() instead of open-coding
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
  2017-07-24  4:27 ` [PATCH 2/6] powerpc/mm: Avoid double irq save/restore in activate_mm Benjamin Herrenschmidt
  2017-07-24  4:28 ` [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered Benjamin Herrenschmidt
@ 2017-07-24  4:28 ` Benjamin Herrenschmidt
  2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

We open-code testing for the mm being local to the current CPU
in a few places. Use our existing helper instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/hash_utils_64.c  | 6 ++----
 arch/powerpc/mm/hugetlbpage.c    | 3 +--
 arch/powerpc/mm/pgtable-hash64.c | 4 +---
 arch/powerpc/mm/tlb_hash64.c     | 7 ++-----
 4 files changed, 6 insertions(+), 14 deletions(-)
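
For reference, the existing helper in arch/powerpc/include/asm/tlb.h
is:

	static inline int mm_is_thread_local(struct mm_struct *mm)
	{
		return cpumask_equal(mm_cpumask(mm),
				      cpumask_of(smp_processor_id()));
	}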

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7a20669c19e7..943d9fe2b9a3 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1228,7 +1228,6 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
 	unsigned long vsid;
 	pte_t *ptep;
 	unsigned hugeshift;
-	const struct cpumask *tmp;
 	int rc, user_region = 0;
 	int psize, ssize;
 
@@ -1280,8 +1279,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
 	}
 
 	/* Check CPU locality */
-	tmp = cpumask_of(smp_processor_id());
-	if (user_region && cpumask_equal(mm_cpumask(mm), tmp))
+	if (user_region && mm_is_thread_local(mm))
 		flags |= HPTE_LOCAL_UPDATE;
 
 #ifndef CONFIG_PPC_64K_PAGES
@@ -1543,7 +1541,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
 #endif /* CONFIG_PPC_64K_PAGES */
 
 	/* Is that local to this CPU ? */
-	if (cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
+	if (mm_is_thread_local(mm))
 		update_flags |= HPTE_LOCAL_UPDATE;
 
 	/* Hash it in */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index e1bf5ca397fe..122844c0734a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -407,8 +407,7 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
 	batchp = &get_cpu_var(hugepd_freelist_cur);
 
 	if (atomic_read(&tlb->mm->mm_users) < 2 ||
-	    cpumask_equal(mm_cpumask(tlb->mm),
-			  cpumask_of(smp_processor_id()))) {
+	    mm_is_thread_local(tlb->mm)) {
 		kmem_cache_free(hugepte_cache, hugepte);
 		put_cpu_var(hugepd_freelist_cur);
 		return;
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 443a2c66a304..d89c637cb600 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -329,7 +329,6 @@ void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 	unsigned int psize;
 	unsigned long vsid;
 	unsigned long flags = 0;
-	const struct cpumask *tmp;
 
 	/* get the base page size,vsid and segment size */
 #ifdef CONFIG_DEBUG_VM
@@ -350,8 +349,7 @@ void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 		ssize = mmu_kernel_ssize;
 	}
 
-	tmp = cpumask_of(smp_processor_id());
-	if (cpumask_equal(mm_cpumask(mm), tmp))
+	if (mm_is_thread_local(mm))
 		flags |= HPTE_LOCAL_UPDATE;
 
 	return flush_hash_hugepage(vsid, addr, pmdp, psize, ssize, flags);
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index b5b0fb97b9c0..4dde13d7452d 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -138,13 +138,10 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
  */
 void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
 {
-	const struct cpumask *tmp;
-	int i, local = 0;
+	int i, local;
 
 	i = batch->index;
-	tmp = cpumask_of(smp_processor_id());
-	if (cpumask_equal(mm_cpumask(batch->mm), tmp))
-		local = 1;
+	local = mm_is_thread_local(batch->mm);
 	if (i == 1)
 		flush_hash_page(batch->vpn[0], batch->pte[0],
 				batch->psize, batch->ssize, local);
-- 
2.13.3


* [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
                   ` (2 preceding siblings ...)
  2017-07-24  4:28 ` [PATCH 4/6] powerpc/mm: Use mm_is_thread_local() instead of open-coding Benjamin Herrenschmidt
@ 2017-07-24  4:28 ` Benjamin Herrenschmidt
  2017-07-24 11:25   ` Nicholas Piggin
                     ` (2 more replies)
  2017-07-24  4:28 ` [PATCH 6/6] powerpc/mm: Make switch_mm_irqs_off() out of line Benjamin Herrenschmidt
  2017-08-24 12:37 ` [1/6] powerpc/mm: Move pgdir setting into a helper Michael Ellerman
  5 siblings, 3 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  3 +++
 arch/powerpc/include/asm/mmu_context.h   |  9 +++++++++
 arch/powerpc/include/asm/tlb.h           | 11 ++++++++++-
 arch/powerpc/mm/mmu_context_book3s64.c   |  2 ++
 4 files changed, 24 insertions(+), 1 deletion(-)
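
As an illustration of what this buys, a flush path modelled on the
radix code picks local vs global invalidation like so (a sketch; the
real call sites vary by MMU flavour):

	void flush_tlb_mm(struct mm_struct *mm)
	{
		unsigned long pid = mm->context.id;

		preempt_disable();
		if (mm_is_thread_local(mm))
			_tlbiel_pid(pid, RIC_FLUSH_TLB);  /* local tlbiel */
		else
			_tlbie_pid(pid, RIC_FLUSH_TLB);   /* broadcast tlbie */
		preempt_enable();
	}

With the counter, mm_is_thread_local() costs an atomic_read() plus a
single cpumask bit test instead of a whole-mask comparison.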

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1a220cdff923..c3b00e8ff791 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -83,6 +83,9 @@ typedef struct {
 	mm_context_id_t id;
 	u16 user_psize;		/* page size index */
 
+	/* Number of bits in the mm_cpumask */
+	atomic_t active_cpus;
+
 	/* NPU NMMU context */
 	struct npu_context *npu_context;
 
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index ff1aeb2cd19f..cf8f50cd4030 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
 				   struct mm_struct *mm) { }
 #endif
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline void inc_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.active_cpus);
+}
+#else
+static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
+#endif
 
 /*
  * switch_mm is the entry point called from the architecture independent
@@ -110,6 +118,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
 	/* Mark this context has been used on the new CPU */
 	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
 		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
+		inc_mm_active_cpus(next);
 		smp_mb();
 		new_on_cpu = true;
 	}
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 609557569f65..a7eabff27a0f 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -69,13 +69,22 @@ static inline int mm_is_core_local(struct mm_struct *mm)
 			      topology_sibling_cpumask(smp_processor_id()));
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline int mm_is_thread_local(struct mm_struct *mm)
+{
+	if (atomic_read(&mm->context.active_cpus) > 1)
+		return false;
+	return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
+}
+#else /* CONFIG_PPC_BOOK3S_64 */
 static inline int mm_is_thread_local(struct mm_struct *mm)
 {
 	return cpumask_equal(mm_cpumask(mm),
 			      cpumask_of(smp_processor_id()));
 }
+#endif /* !CONFIG_PPC_BOOK3S_64 */
 
-#else
+#else /* CONFIG_SMP */
 static inline int mm_is_core_local(struct mm_struct *mm)
 {
 	return 1;
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 8159f5219137..de17d3e714aa 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -174,6 +174,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	mm_iommu_init(mm);
 #endif
+	atomic_set(&mm->context.active_cpus, 0);
+
 	return 0;
 }
 
-- 
2.13.3


* [PATCH 6/6] powerpc/mm: Make switch_mm_irqs_off() out of line
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
                   ` (3 preceding siblings ...)
  2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
@ 2017-07-24  4:28 ` Benjamin Herrenschmidt
  2017-08-24 12:37 ` [1/6] powerpc/mm: Move pgdir setting into a helper Michael Ellerman
  5 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  4:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, npiggin, Benjamin Herrenschmidt

It's too big to be inline; there is no reason to keep it
that way.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

---
 arch/powerpc/include/asm/mmu_context.h | 73 ++----------------------------
 arch/powerpc/mm/Makefile               |  2 +-
 arch/powerpc/mm/mmu_context.c          | 81 ++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+), 71 deletions(-)
 create mode 100644 arch/powerpc/mm/mmu_context.c

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index cf8f50cd4030..ed949b2675cb 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -77,76 +77,9 @@ extern void switch_cop(struct mm_struct *next);
 extern int use_cop(unsigned long acop, struct mm_struct *mm);
 extern void drop_cop(unsigned long acop, struct mm_struct *mm);
 
-#if defined(CONFIG_PPC32)
-static inline void switch_mm_pgdir(struct task_struct *tsk,
-				   struct mm_struct *mm)
-{
-	/* 32-bit keeps track of the current PGDIR in the thread struct */
-	tsk->thread.pgdir = mm->pgd;
-}
-#elif defined(CONFIG_PPC_BOOK3E_64)
-static inline void switch_mm_pgdir(struct task_struct *tsk,
-				   struct mm_struct *mm)
-{
-	/* 64-bit Book3E keeps track of current PGD in the PACA */
-	get_paca()->pgd = mm->pgd;
-}
-#else
-static inline void switch_mm_pgdir(struct task_struct *tsk,
-				   struct mm_struct *mm) { }
-#endif
-
-#ifdef CONFIG_PPC_BOOK3S_64
-static inline void inc_mm_active_cpus(struct mm_struct *mm)
-{
-	atomic_inc(&mm->context.active_cpus);
-}
-#else
-static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
-#endif
-
-/*
- * switch_mm is the entry point called from the architecture independent
- * code in kernel/sched/core.c
- */
-static inline void switch_mm_irqs_off(struct mm_struct *prev,
-				      struct mm_struct *next,
-				      struct task_struct *tsk)
-{
-	bool new_on_cpu = false;
-
-	/* Mark this context has been used on the new CPU */
-	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
-		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
-		inc_mm_active_cpus(next);
-		smp_mb();
-		new_on_cpu = true;
-	}
-
-	/* Some subarchs need to track the PGD elsewhere */
-	switch_mm_pgdir(tsk, next);
-
-	/* Nothing else to do if we aren't actually switching */
-	if (prev == next)
-		return;
-
-	/* We must stop all altivec streams before changing the HW
-	 * context
-	 */
-#ifdef CONFIG_ALTIVEC
-	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		asm volatile ("dssall");
-#endif /* CONFIG_ALTIVEC */
-
-	if (new_on_cpu)
-		radix_kvm_prefetch_workaround(next);
-
-	/*
-	 * The actual HW switching method differs between the various
-	 * sub architectures. Out of line for now
-	 */
-	switch_mmu_context(prev, next, tsk);
-}
+extern void switch_mm_irqs_off(struct mm_struct *prev,
+			       struct mm_struct *next,
+			       struct task_struct *tsk);
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			     struct task_struct *tsk)
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index fa6a6ba34355..fb844d2f266e 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -8,7 +8,7 @@ ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
 
 obj-y				:= fault.o mem.o pgtable.o mmap.o \
 				   init_$(BITS).o pgtable_$(BITS).o \
-				   init-common.o
+				   init-common.o mmu_context.o
 obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o \
 				   tlb_nohash_low.o
 obj-$(CONFIG_PPC_BOOK3E)	+= tlb_low_$(BITS)e.o
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
new file mode 100644
index 000000000000..5f7b59c7ace5
--- /dev/null
+++ b/arch/powerpc/mm/mmu_context.c
@@ -0,0 +1,81 @@
+/*
+ *  Common implementation of switch_mm_irqs_off
+ *
+ *  Copyright IBM Corp. 2017
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/cpu.h>
+
+#include <asm/mmu_context.h>
+
+#if defined(CONFIG_PPC32)
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	/* 32-bit keeps track of the current PGDIR in the thread struct */
+	tsk->thread.pgdir = mm->pgd;
+}
+#elif defined(CONFIG_PPC_BOOK3E_64)
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm)
+{
+	/* 64-bit Book3E keeps track of current PGD in the PACA */
+	get_paca()->pgd = mm->pgd;
+}
+#else
+static inline void switch_mm_pgdir(struct task_struct *tsk,
+				   struct mm_struct *mm) { }
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline void inc_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.active_cpus);
+}
+#else
+static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
+#endif
+
+void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
+			struct task_struct *tsk)
+{
+	bool new_on_cpu = false;
+
+	/* Mark this context has been used on the new CPU */
+	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
+		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
+		inc_mm_active_cpus(next);
+		smp_mb();
+		new_on_cpu = true;
+	}
+
+	/* Some subarchs need to track the PGD elsewhere */
+	switch_mm_pgdir(tsk, next);
+
+	/* Nothing else to do if we aren't actually switching */
+	if (prev == next)
+		return;
+
+	/* We must stop all altivec streams before changing the HW
+	 * context
+	 */
+	if (cpu_has_feature(CPU_FTR_ALTIVEC))
+		asm volatile ("dssall");
+
+	if (new_on_cpu)
+		radix_kvm_prefetch_workaround(next);
+
+	/*
+	 * The actual HW switching method differs between the various
+	 * sub architectures. Out of line for now
+	 */
+	switch_mmu_context(prev, next, tsk);
+}
+
-- 
2.13.3


* Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-07-24  4:28 ` [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered Benjamin Herrenschmidt
@ 2017-07-24 11:20   ` Nicholas Piggin
  2017-07-24 20:54     ` Benjamin Herrenschmidt
  2017-08-11 11:06     ` Nicholas Piggin
  2017-08-23 12:01   ` [3/6] " Michael Ellerman
  1 sibling, 2 replies; 33+ messages in thread
From: Nicholas Piggin @ 2017-07-24 11:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

On Mon, 24 Jul 2017 14:28:00 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> There is no guarantee that the various isync's involved with
> the context switch will order the update of the CPU mask with
> the first TLB entry for the new context being loaded by the HW.
> 
> Be safe here and add a memory barrier to order any subsequent
> load/store which may bring entries into the TLB.
> 
> The corresponding barrier on the other side already exists as
> pte updates use pte_xchg() which uses __cmpxchg_u64 which has
> a sync after the atomic operation.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/include/asm/mmu_context.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index ed9a36ee3107..ff1aeb2cd19f 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -110,6 +110,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
>  	/* Mark this context has been used on the new CPU */
>  	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
>  		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
> +		smp_mb();
>  		new_on_cpu = true;
>  	}
>  

I think this is the right thing to do, but it should be commented.
Is hwsync the right barrier? (i.e., it will order the page table walk)

Thanks,
Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
@ 2017-07-24 11:25   ` Nicholas Piggin
  2017-07-24 13:46     ` Michael Ellerman
  2017-07-24 20:58     ` Benjamin Herrenschmidt
  2017-08-04 12:06   ` Frederic Barrat
  2017-08-21 17:27   ` Frederic Barrat
  2 siblings, 2 replies; 33+ messages in thread
From: Nicholas Piggin @ 2017-07-24 11:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

On Mon, 24 Jul 2017 14:28:02 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Instead of comparing the whole CPU mask every time, let's
> keep a counter of how many bits are set in the mask. Thus
> testing for a local mm only requires testing if that counter
> is 1 and the current CPU bit is set in the mask.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  3 +++
>  arch/powerpc/include/asm/mmu_context.h   |  9 +++++++++
>  arch/powerpc/include/asm/tlb.h           | 11 ++++++++++-
>  arch/powerpc/mm/mmu_context_book3s64.c   |  2 ++
>  4 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 1a220cdff923..c3b00e8ff791 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -83,6 +83,9 @@ typedef struct {
>  	mm_context_id_t id;
>  	u16 user_psize;		/* page size index */
>  
> +	/* Number of bits in the mm_cpumask */
> +	atomic_t active_cpus;
> +
>  	/* NPU NMMU context */
>  	struct npu_context *npu_context;
>  
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index ff1aeb2cd19f..cf8f50cd4030 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
>  				   struct mm_struct *mm) { }
>  #endif
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> +{
> +	atomic_inc(&mm->context.active_cpus);
> +}
> +#else
> +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> +#endif

This is a bit awkward. Can we just move the entire function to test
cpumask and set / increment into helper functions and define them
together with mm_is_thread_local, so it's all in one place?

The extra atomic does not need to be defined when it's not used either.

Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
If it's <= then it should be similar load and compare, no?
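
(With NR_CPUS <= BITS_PER_LONG the mask is a single word, so
cpumask_equal() should boil down to roughly

	mm_cpumask(mm)->bits[0] == cpumask_of(smp_processor_id())->bits[0]

i.e. one load and a compare, much like the counter check.)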

Looks like a good optimisation though.

Thanks,
Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24 11:25   ` Nicholas Piggin
@ 2017-07-24 13:46     ` Michael Ellerman
  2017-07-25  0:34       ` Nicholas Piggin
  2017-07-24 20:58     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Ellerman @ 2017-07-24 13:46 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

Nicholas Piggin <npiggin@gmail.com> writes:

> On Mon, 24 Jul 2017 14:28:02 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>> Instead of comparing the whole CPU mask every time, let's
>> keep a counter of how many bits are set in the mask. Thus
>> testing for a local mm only requires testing if that counter
>> is 1 and the current CPU bit is set in the mask.
...
>
> Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
> If it's <= then it should be similar load and compare, no?

Do we make a machine with that few CPUs? ;)

I don't think it's worth special-casing; all the distros run with much
much larger NR_CPUS than that.

cheers


* Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-07-24 11:20   ` Nicholas Piggin
@ 2017-07-24 20:54     ` Benjamin Herrenschmidt
  2017-08-11 11:06     ` Nicholas Piggin
  1 sibling, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24 20:54 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, aneesh.kumar

On Mon, 2017-07-24 at 21:20 +1000, Nicholas Piggin wrote:
> I think this is the right thing to do, but it should be commented.
> Is hwsync the right barrier? (i.e., it will order the page table walk)

This is an open question, I've asked the architects and HW guys and
waiting for an answer.

That said, are we really trying to order the page table walk or are
we trying to order any (speculative or not) load/store that may trigger
a page table update ?

Cheers,
Ben.


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24 11:25   ` Nicholas Piggin
  2017-07-24 13:46     ` Michael Ellerman
@ 2017-07-24 20:58     ` Benjamin Herrenschmidt
  2017-07-25  0:44       ` Nicholas Piggin
  1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24 20:58 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, aneesh.kumar

On Mon, 2017-07-24 at 21:25 +1000, Nicholas Piggin wrote:
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> > +{
> > +     atomic_inc(&mm->context.active_cpus);
> > +}
> > +#else
> > +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> > +#endif
> 
> This is a bit awkward. Can we just move the entire function to test
> cpumask and set / increment into helper functions and define them
> together with mm_is_thread_local, so it's all in one place?

I thought about it but then we have 2 variants, unless I start moving
the active_cpus into mm_context_t on all the 32-bit subarchs too, etc..

It gets messy either way.

> The extra atomic does not need to be defined when it's not used either.
> 
> Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
> If it's <= then it should be similar load and compare, no?

Right, we could.

> Looks like a good optimisation though.

Thx. It's a pre-req for further optimizations such as flushing the PID
when a single threaded process moves, so we don't have to constantly
scan the mask.

Cheers,
Ben.


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24 13:46     ` Michael Ellerman
@ 2017-07-25  0:34       ` Nicholas Piggin
  2017-07-25 12:00         ` Michael Ellerman
  0 siblings, 1 reply; 33+ messages in thread
From: Nicholas Piggin @ 2017-07-25  0:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Benjamin Herrenschmidt, linuxppc-dev, aneesh.kumar

On Mon, 24 Jul 2017 23:46:44 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > On Mon, 24 Jul 2017 14:28:02 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >  
> >> Instead of comparing the whole CPU mask every time, let's
> >> keep a counter of how many bits are set in the mask. Thus
> >> testing for a local mm only requires testing if that counter
> >> is 1 and the current CPU bit is set in the mask.  
> ...
> >
> > Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
> > If it's <= then it should be similar load and compare, no?  
> 
> Do we make a machine with that few CPUs? ;)
> 
> I don't think it's worth special-casing; all the distros run with much
> much larger NR_CPUS than that.

Not further special-casing, but just casing it based on NR_CPUS
rather than BOOK3S.

Thanks,
Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24 20:58     ` Benjamin Herrenschmidt
@ 2017-07-25  0:44       ` Nicholas Piggin
  2017-07-25  1:03         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 33+ messages in thread
From: Nicholas Piggin @ 2017-07-25  0:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

On Tue, 25 Jul 2017 06:58:46 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2017-07-24 at 21:25 +1000, Nicholas Piggin wrote:
> > > +#ifdef CONFIG_PPC_BOOK3S_64
> > > +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> > > +{
> > > +     atomic_inc(&mm->context.active_cpus);
> > > +}
> > > +#else
> > > +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> > > +#endif  
> > 
> > This is a bit awkward. Can we just move the entire function to test
> > cpumask and set / increment into helper functions and define them
> > together with mm_is_thread_local, so it's all in one place?  
> 
> I thought about it but then we have 2 variants, unless I start moving
> the active_cpus into mm_context_t on all the 32-bit subarchs too, etc..

The two variants are just cleaner versions of the two variants you
already introduced.

static inline bool mm_activate_cpu(struct mm_struct *mm)
{
    if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
        cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
#ifdef CONFIG_PPC_BOOK3S_64
        atomic_inc(&mm->context.active_cpus);
#endif
        smp_mb();
        return true;
    }
    return false;
}

I think it would be nicer to put something like that with
mm_is_thread_local etc definitions so you can see how it all works
in one place.

> It gets messy either way.
> 
> > The extra atomic does not need to be defined when it's not used either.
> > 
> > Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
> > If it's <= then it should be similar load and compare, no?  
> 
> Right, we could.
> 
> > Looks like a good optimisation though.  
> 
> Thx. It's a pre-req for further optimizations such as flushing the PID
> when a single threaded process moves, so we don't have to constantly
> scan the mask.

Yep, will be very interesting to see how much global tlbies can be
reduced.

Thanks,
Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-25  0:44       ` Nicholas Piggin
@ 2017-07-25  1:03         ` Benjamin Herrenschmidt
  2017-07-25 10:55           ` Nicholas Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-25  1:03 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, aneesh.kumar

On Tue, 2017-07-25 at 10:44 +1000, Nicholas Piggin wrote:
> The two variants are just cleaner versions of the two variants you
> already introduced.
> 
> static inline bool mm_activate_cpu(struct mm_struct *mm)
> {
>     if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
>         cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
> #ifdef CONFIG_PPC_BOOK3S_64
>         atomic_inc(&mm->context.active_cpus);
> #endif
>         smp_mb();
>         return true;
>     }
>     return false;
> }

Well the above is what I originally wrote, which Michael encouraged me
to turn into a helper ;-) I was removing ifdef's from switch_mm in
this series...

> I think it would be nicer to put something like that with
> mm_is_thread_local etc definitions so you can see how it all works
> in one place.
> 
> > It gets messy either way.
> > 
> > > The extra atomic does not need to be defined when it's not used either.
> > > 
> > > Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
> > > If it's <= then it should be similar load and compare, no?  
> > 
> > Right, we could.
> > 
> > > Looks like a good optimisation though.  
> > 
> > Thx. It's a pre-req for further optimizations such as flushing the PID
> > when a single threaded process moves, so we don't have to constantly
> > scan the mask.
> 
> Yep, will be very interesting to see how much global tlbies can be
> reduced.
> 
> Thanks,
> Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-25  1:03         ` Benjamin Herrenschmidt
@ 2017-07-25 10:55           ` Nicholas Piggin
  0 siblings, 0 replies; 33+ messages in thread
From: Nicholas Piggin @ 2017-07-25 10:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

On Tue, 25 Jul 2017 11:03:45 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Tue, 2017-07-25 at 10:44 +1000, Nicholas Piggin wrote:
> > The two variants are just cleaner versions of the two variants you
> > already introduced.
> > 
> > static inline bool mm_activate_cpu(struct mm_struct *mm)
> > {
> >     if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
> >         cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
> > #ifdef CONFIG_PPC_BOOK3S_64
> >         atomic_inc(&mm->context.active_cpus);
> > #endif
> >         smp_mb();
> >         return true;
> >     }
> >     return false;
> > }  
> 
> Well the above is what I originally wrote, which Michael encouraged me
> to turn into a helper ;-) I was removing ifdef's from switch_mm in
> this series...

Well I won't harp on about it if you guys prefer the increment helper.
Just the comment would be good. The rest of the series seems okay to
me.

Thanks,
Nick


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-25  0:34       ` Nicholas Piggin
@ 2017-07-25 12:00         ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2017-07-25 12:00 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Benjamin Herrenschmidt, linuxppc-dev, aneesh.kumar

Nicholas Piggin <npiggin@gmail.com> writes:

> On Mon, 24 Jul 2017 23:46:44 +1000
> Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> Nicholas Piggin <npiggin@gmail.com> writes:
>> 
>> > On Mon, 24 Jul 2017 14:28:02 +1000
>> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>> >  
>> >> Instead of comparing the whole CPU mask every time, let's
>> >> keep a counter of how many bits are set in the mask. Thus
>> >> testing for a local mm only requires testing if that counter
>> >> is 1 and the current CPU bit is set in the mask.  
>> ...
>> >
>> > Also does it make sense to define it based on NR_CPUS > BITS_PER_LONG?
>> > If it's <= then it should be similar load and compare, no?  
>> 
>> Do we make a machine with that few CPUs? ;)
>> 
>> I don't think it's worth special-casing; all the distros run with much
>> much larger NR_CPUS than that.
>
> Not further special-casing, but just casing it based on NR_CPUS
> rather than BOOK3S.

The problem is the mm_context_t is defined based on BookE vs BookS etc.
not based on NR_CPUS.

So we'd have to add the atomic_t to all mm_context_t's, but #ifdef'ed
based on NR_CPUS.

But then some platforms don't support SMP, so it's a waste there. The
existing cpumask check compiles to ~= nothing on UP.

cheers


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
  2017-07-24 11:25   ` Nicholas Piggin
@ 2017-08-04 12:06   ` Frederic Barrat
  2017-08-04 12:58     ` Benjamin Herrenschmidt
  2017-08-21 17:27   ` Frederic Barrat
  2 siblings, 1 reply; 33+ messages in thread
From: Frederic Barrat @ 2017-08-04 12:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev
  Cc: aneesh.kumar, npiggin, Michael Ellerman



On 24/07/2017 06:28, Benjamin Herrenschmidt wrote:
> Instead of comparing the whole CPU mask every time, let's
> keep a counter of how many bits are set in the mask. Thus
> testing for a local mm only requires testing if that counter
> is 1 and the current CPU bit is set in the mask.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu.h |  3 +++
>   arch/powerpc/include/asm/mmu_context.h   |  9 +++++++++
>   arch/powerpc/include/asm/tlb.h           | 11 ++++++++++-
>   arch/powerpc/mm/mmu_context_book3s64.c   |  2 ++
>   4 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 1a220cdff923..c3b00e8ff791 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -83,6 +83,9 @@ typedef struct {
>   	mm_context_id_t id;
>   	u16 user_psize;		/* page size index */
> 
> +	/* Number of bits in the mm_cpumask */
> +	atomic_t active_cpus;
> +
>   	/* NPU NMMU context */
>   	struct npu_context *npu_context;
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index ff1aeb2cd19f..cf8f50cd4030 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
>   				   struct mm_struct *mm) { }
>   #endif
> 
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> +{
> +	atomic_inc(&mm->context.active_cpus);
> +}
> +#else
> +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> +#endif
> 
>   /*
>    * switch_mm is the entry point called from the architecture independent
> @@ -110,6 +118,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
>   	/* Mark this context has been used on the new CPU */
>   	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
>   		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
> +		inc_mm_active_cpus(next);
>   		smp_mb();
>   		new_on_cpu = true;
>   	}
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..a7eabff27a0f 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -69,13 +69,22 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>   			      topology_sibling_cpumask(smp_processor_id()));
>   }
> 
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline int mm_is_thread_local(struct mm_struct *mm)
> +{
> +	if (atomic_read(&mm->context.active_cpus) > 1)
> +		return false;
> +	return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
> +}
> +#else /* CONFIG_PPC_BOOK3S_64 */


While working on something related (marking a memory context as needing 
global TLBIs if used behind an NPU or PSL):
http://patchwork.ozlabs.org/patch/796775/

Michael raised the point that the store for the pte update cannot be 
reordered with the load which decides the scope of the TLBI, and he 
convinced me that a memory barrier was required.

Couldn't we have the same problem here, where the atomic read is 
reordered with the store of the invalid PTE?

Thanks,

   Fred


>   static inline int mm_is_thread_local(struct mm_struct *mm)
>   {
>   	return cpumask_equal(mm_cpumask(mm),
>   			      cpumask_of(smp_processor_id()));
>   }
> +#endif /* !CONFIG_PPC_BOOK3S_64 */
> 
> -#else
> +#else /* CONFIG_SMP */
>   static inline int mm_is_core_local(struct mm_struct *mm)
>   {
>   	return 1;
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index 8159f5219137..de17d3e714aa 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -174,6 +174,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>   #ifdef CONFIG_SPAPR_TCE_IOMMU
>   	mm_iommu_init(mm);
>   #endif
> +	atomic_set(&mm->context.active_cpus, 0);
> +
>   	return 0;
>   }
> 


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-04 12:06   ` Frederic Barrat
@ 2017-08-04 12:58     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-04 12:58 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev; +Cc: aneesh.kumar, npiggin, Michael Ellerman

On Fri, 2017-08-04 at 14:06 +0200, Frederic Barrat wrote:
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +static inline int mm_is_thread_local(struct mm_struct *mm)
> > +{
> > +     if (atomic_read(&mm->context.active_cpus) > 1)
> > +             return false;
> > +     return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
> > +}
> > +#else /* CONFIG_PPC_BOOK3S_64 */
> 
> 
> While working on something related (mark memory context as needing 
> global TLBI if used behind a NPU or PSL):
> http://patchwork.ozlabs.org/patch/796775/
> 
> Michael raised the point that the store for the pte update cannot be 
> reordered with the load which decides the scope of the TLBI, and had 
> convinced me that a memory barrier was required.
> 
> Couldn't we have the same problem here, where the atomic read is 
> reordered with the store of the invalid PTE?

The store of the invalid PTE is done with pte_update(), which contains
a sync as far as I can tell.
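
For reference, pte_xchg() is roughly this (simplified, casts omitted):

	static inline bool pte_xchg(pte_t *ptep, pte_t old, pte_t new)
	{
		unsigned long *p = (unsigned long *)ptep;

		return pte_raw(old) ==
			__cmpxchg_u64(p, pte_raw(old), pte_raw(new));
	}

and __cmpxchg_u64() ends with PPC_ATOMIC_EXIT_BARRIER, i.e. a sync,
which is what orders the PTE store vs the subsequent cpumask read.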

Cheers,
Ben.


* Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-07-24 11:20   ` Nicholas Piggin
  2017-07-24 20:54     ` Benjamin Herrenschmidt
@ 2017-08-11 11:06     ` Nicholas Piggin
  2017-08-11 22:40       ` Benjamin Herrenschmidt
  2017-08-17 12:58       ` Michael Ellerman
  1 sibling, 2 replies; 33+ messages in thread
From: Nicholas Piggin @ 2017-08-11 11:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar, Michael Ellerman

On Mon, 24 Jul 2017 21:20:07 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Mon, 24 Jul 2017 14:28:00 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > There is no guarantee that the various isync's involved with
> > the context switch will order the update of the CPU mask with
> > the first TLB entry for the new context being loaded by the HW.
> > 
> > Be safe here and add a memory barrier to order any subsequent
> > load/store which may bring entries into the TLB.
> > 
> > The corresponding barrier on the other side already exists as
> > pte updates use pte_xchg() which uses __cmpxchg_u64 which has
> > a sync after the atomic operation.
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >  arch/powerpc/include/asm/mmu_context.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > index ed9a36ee3107..ff1aeb2cd19f 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -110,6 +110,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
> >  	/* Mark this context has been used on the new CPU */
> >  	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
> >  		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
> > +		smp_mb();
> >  		new_on_cpu = true;
> >  	}
> >    
> 
> I think this is the right thing to do, but it should be commented.
> Is hwsync the right barrier? (i.e., it will order the page table walk)

After some offline discussion, I think we have an agreement that
this is the right barrier, as it orders with the subsequent load
of next->context.id that the mtpid depends on (or slbmte for HPT).

So we should have a comment here to that effect, including the
pte_xchg comments from your changelog. Some comment (at least a
reference back to here) should be added at pte_xchg too please.
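
Something along these lines, perhaps:

	/*
	 * This full barrier orders the store to the cpumask above vs
	 * a subsequent load which allows this CPU/MMU to begin loading
	 * translations for "next" (the load of next->context.id that
	 * feeds mtpid, or slbmte for HPT). It pairs with the sync in
	 * pte_xchg() on the PTE invalidation side.
	 */
	smp_mb();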

Other than that your series seems good to me; if you repost it you
can add

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

This one out of the series is the bugfix so it should go to stable
as well, right?

Thanks,
Nick


* Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-08-11 11:06     ` Nicholas Piggin
@ 2017-08-11 22:40       ` Benjamin Herrenschmidt
  2017-08-17 12:58       ` Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-11 22:40 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, aneesh.kumar, Michael Ellerman

On Fri, 2017-08-11 at 21:06 +1000, Nicholas Piggin wrote:
> Other than that your series seems good to me if you repost it you
> can add
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> This one out of the series is the bugfix so it should go to stable
> as well, right?

Yup.

Ben.


* Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-08-11 11:06     ` Nicholas Piggin
  2017-08-11 22:40       ` Benjamin Herrenschmidt
@ 2017-08-17 12:58       ` Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2017-08-17 12:58 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

Nicholas Piggin <npiggin@gmail.com> writes:

> On Mon, 24 Jul 2017 21:20:07 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>
>> On Mon, 24 Jul 2017 14:28:00 +1000
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>> 
>> > There is no guarantee that the various isync's involved with
>> > the context switch will order the update of the CPU mask with
>> > the first TLB entry for the new context being loaded by the HW.
>> > 
>> > Be safe here and add a memory barrier to order any subsequent
>> > load/store which may bring entries into the TLB.
>> > 
>> > The corresponding barrier on the other side already exists as
>> > pte updates use pte_xchg() which uses __cmpxchg_u64 which has
>> > a sync after the atomic operation.
>> > 
>> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> > ---
>> >  arch/powerpc/include/asm/mmu_context.h | 1 +
>> >  1 file changed, 1 insertion(+)
>> > 
>> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> > index ed9a36ee3107..ff1aeb2cd19f 100644
>> > --- a/arch/powerpc/include/asm/mmu_context.h
>> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> > @@ -110,6 +110,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
>> >  	/* Mark this context has been used on the new CPU */
>> >  	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
>> >  		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
>> > +		smp_mb();
>> >  		new_on_cpu = true;
>> >  	}
>> >    
>> 
>> I think this is the right thing to do, but it should be commented.
>> Is hwsync the right barrier? (i.e., it will order the page table walk)
>
> After some offline discussion, I think we have an agreement that
> this is the right barrier, as it orders with the subsequent load
> of next->context.id that the mtpid depends on (or slbmte for HPT).
>
> So we should have a comment here to that effect, and including
> the pte_xchg comments from your changelog. Some comment (at least
> refer back to here) added at pte_xchg too please.
>
> Other than that your series seems good to me if you repost it you
> can add
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
> This one out of the series is the bugfix so it should go to stable
> as well, right?

So I'm waiting on a v2?

cheers


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
  2017-07-24 11:25   ` Nicholas Piggin
  2017-08-04 12:06   ` Frederic Barrat
@ 2017-08-21 17:27   ` Frederic Barrat
  2017-08-21 17:35     ` Benjamin Herrenschmidt
  2017-08-22  4:28     ` Michael Ellerman
  2 siblings, 2 replies; 33+ messages in thread
From: Frederic Barrat @ 2017-08-21 17:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin

Hi Ben,

On 24/07/2017 06:28, Benjamin Herrenschmidt wrote:
> Instead of comparing the whole CPU mask every time, let's
> keep a counter of how many bits are set in the mask. Thus
> testing for a local mm only requires testing if that counter
> is 1 and the current CPU bit is set in the mask.


I'm trying to see if we could merge this patch with what I'm trying to 
do to mark a context as requiring global TLBIs.
In http://patchwork.ozlabs.org/patch/796775/
I'm introducing a 'flags' per memory context, using one bit to say if 
the context needs global TLBIs.
The 2 could co-exist, just checking... Do you think about using the 
actual active_cpus count down the road, or is it just a matter of 
knowing if there are more than one active cpus?

Thanks,

   Fred



> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu.h |  3 +++
>   arch/powerpc/include/asm/mmu_context.h   |  9 +++++++++
>   arch/powerpc/include/asm/tlb.h           | 11 ++++++++++-
>   arch/powerpc/mm/mmu_context_book3s64.c   |  2 ++
>   4 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 1a220cdff923..c3b00e8ff791 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -83,6 +83,9 @@ typedef struct {
>   	mm_context_id_t id;
>   	u16 user_psize;		/* page size index */
> 
> +	/* Number of bits in the mm_cpumask */
> +	atomic_t active_cpus;
> +
>   	/* NPU NMMU context */
>   	struct npu_context *npu_context;
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index ff1aeb2cd19f..cf8f50cd4030 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
>   				   struct mm_struct *mm) { }
>   #endif
> 
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> +{
> +	atomic_inc(&mm->context.active_cpus);
> +}
> +#else
> +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> +#endif
> 
>   /*
>    * switch_mm is the entry point called from the architecture independent
> @@ -110,6 +118,7 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
>   	/* Mark this context has been used on the new CPU */
>   	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
>   		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
> +		inc_mm_active_cpus(next);
>   		smp_mb();
>   		new_on_cpu = true;
>   	}
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..a7eabff27a0f 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -69,13 +69,22 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>   			      topology_sibling_cpumask(smp_processor_id()));
>   }
> 
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline int mm_is_thread_local(struct mm_struct *mm)
> +{
> +	if (atomic_read(&mm->context.active_cpus) > 1)
> +		return false;
> +	return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
> +}
> +#else /* CONFIG_PPC_BOOK3S_64 */
>   static inline int mm_is_thread_local(struct mm_struct *mm)
>   {
>   	return cpumask_equal(mm_cpumask(mm),
>   			      cpumask_of(smp_processor_id()));
>   }
> +#endif /* !CONFIG_PPC_BOOK3S_64 */
> 
> -#else
> +#else /* CONFIG_SMP */
>   static inline int mm_is_core_local(struct mm_struct *mm)
>   {
>   	return 1;
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index 8159f5219137..de17d3e714aa 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -174,6 +174,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>   #ifdef CONFIG_SPAPR_TCE_IOMMU
>   	mm_iommu_init(mm);
>   #endif
> +	atomic_set(&mm->context.active_cpus, 0);
> +
>   	return 0;
>   }
> 


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-21 17:27   ` Frederic Barrat
@ 2017-08-21 17:35     ` Benjamin Herrenschmidt
  2017-08-22 13:18       ` Frederic Barrat
  2017-08-24 16:40       ` Frederic Barrat
  2017-08-22  4:28     ` Michael Ellerman
  1 sibling, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-21 17:35 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Mon, 2017-08-21 at 19:27 +0200, Frederic Barrat wrote:
> Hi Ben,
> 
> On 24/07/2017 06:28, Benjamin Herrenschmidt wrote:
> > Instead of comparing the whole CPU mask every time, let's
> > keep a counter of how many bits are set in the mask. Thus
> > testing for a local mm only requires testing if that counter
> > is 1 and the current CPU bit is set in the mask.
> 
> 
> I'm trying to see if we could merge this patch with what I'm trying to 
> do to mark a context as requiring global TLBIs.
> In http://patchwork.ozlabs.org/patch/796775/
> I'm introducing a 'flags' per memory context, using one bit to say if 
> the context needs global TLBIs.
> The 2 could co-exist, just checking... Do you think about using the 
> actual active_cpus count down the road, or is it just a matter of 
> knowing if there are more than one active cpus?

Or you could just increment my counter. Just make sure you increment
it at most once per CXL context and decrement when the context is gone.
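
Something like this on the CXL side, where dec_mm_active_cpus() would
be the obvious (not yet existing) mirror of the inc helper:

	/* when attaching a CXL context to the mm */
	inc_mm_active_cpus(ctx->mm);	/* forces global TLBIs for this mm */

	/* when the context is released */
	dec_mm_active_cpus(ctx->mm);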

Cheers,
Ben.


* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-21 17:27   ` Frederic Barrat
  2017-08-21 17:35     ` Benjamin Herrenschmidt
@ 2017-08-22  4:28     ` Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2017-08-22  4:28 UTC (permalink / raw)
  To: Frederic Barrat, Benjamin Herrenschmidt, linuxppc-dev
  Cc: aneesh.kumar, npiggin

Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:

> Hi Ben,
>
> Le 24/07/2017 à 06:28, Benjamin Herrenschmidt a écrit :
>> Instead of comparing the whole CPU mask every time, let's
>> keep a counter of how many bits are set in the mask. Thus
>> testing for a local mm only requires testing if that counter
>> is 1 and the current CPU bit is set in the mask.
>
>
> I'm trying to see if we could merge this patch with what I'm trying to
> do to mark a context as requiring global TLBIs.
> In http://patchwork.ozlabs.org/patch/796775/
> I'm introducing a 'flags' per memory context, using one bit to say if
> the context needs global TLBIs.
> The two could co-exist, just checking... Are you thinking of using the
> actual active_cpus count down the road, or is it just a matter of
> knowing whether there is more than one active cpu?

Currently it's just an optimisation to save comparing the full cpumask
every time to detect if we can go local vs broadcast.

So if you increment it then it will mean we do broadcast, which is what
you need.

It's possible that in future we might try to do something more
complicated, like sending targeted IPIs etc. But if we ever do that,
we can adapt CXL then.

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-21 17:35     ` Benjamin Herrenschmidt
@ 2017-08-22 13:18       ` Frederic Barrat
  2017-08-22 16:55         ` Benjamin Herrenschmidt
  2017-08-24 16:40       ` Frederic Barrat
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Barrat @ 2017-08-22 13:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin



Le 21/08/2017 à 19:35, Benjamin Herrenschmidt a écrit :
> On Mon, 2017-08-21 at 19:27 +0200, Frederic Barrat wrote:
>> Hi Ben,
>>
>> Le 24/07/2017 à 06:28, Benjamin Herrenschmidt a écrit :
>>> Instead of comparing the whole CPU mask every time, let's
>>> keep a counter of how many bits are set in the mask. Thus
>>> testing for a local mm only requires testing if that counter
>>> is 1 and the current CPU bit is set in the mask.
>>
>>
>> I'm trying to see if we could merge this patch with what I'm trying to
>> do to mark a context as requiring global TLBIs.
>> In http://patchwork.ozlabs.org/patch/796775/
>> I'm introducing a 'flags' per memory context, using one bit to say if
>> the context needs global TLBIs.
>> The two could co-exist, just checking... Are you thinking of using the
>> actual active_cpus count down the road, or is it just a matter of
>> knowing whether there is more than one active cpu?
> 
> Or you could just increment my counter. Just make sure you increment
> it at most once per CXL context and decrement it when the context is gone.

Ah great, I didn't dare mess with your counter; that makes it easier.
Arguably, what happens on those accelerators is pretty close to an
active cpu.

Once it is merged, I'm going to have to backport your patch (and an
update to mine) to the p9-supporting distros. From a quick look, your
patch, i.e. "[PATCH 5/6] powerpc/mm: Optimize detection of thread local
mm's", is completely independent of the rest of the series, right?

   Fred

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-22 13:18       ` Frederic Barrat
@ 2017-08-22 16:55         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-22 16:55 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Tue, 2017-08-22 at 15:18 +0200, Frederic Barrat wrote:
> > Or you could just increment my counter. Just make sure you increment
> > it at most once per CXL context and decrement it when the context is gone.
> 
> Ah great, I didn't dare mess with your counter; that makes it easier.
> Arguably, what happens on those accelerators is pretty close to an
> active cpu.
> 
> Once it is merged, I'm going to have to backport your patch (and an
> update to mine) to the p9-supporting distros. From a quick look, your
> patch, i.e. "[PATCH 5/6] powerpc/mm: Optimize detection of thread local
> mm's", is completely independent of the rest of the series, right?

We also need the memory barrier fix, but yes.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [3/6] powerpc/mm: Ensure cpumask update is ordered
  2017-07-24  4:28 ` [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered Benjamin Herrenschmidt
  2017-07-24 11:20   ` Nicholas Piggin
@ 2017-08-23 12:01   ` Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2017-08-23 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Mon, 2017-07-24 at 04:28:00 UTC, Benjamin Herrenschmidt wrote:
> There is no guarantee that the various isync's involved with
> the context switch will order the update of the CPU mask with
> the first TLB entry for the new context being loaded by the HW.
> 
> Be safe here and add a memory barrier to order any subsequent
> load/store which may bring entries into the TLB.
> 
> The corresponding barrier on the other side already exists, as
> pte updates use pte_xchg(), which uses __cmpxchg_u64(), which has
> a sync after the atomic operation.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1a92a80ad386a1a6e3b36d576d52a1

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [1/6] powerpc/mm: Move pgdir setting into a helper
  2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
                   ` (4 preceding siblings ...)
  2017-07-24  4:28 ` [PATCH 6/6] powerpc/mm: Make switch_mm_irqs_off() out of line Benjamin Herrenschmidt
@ 2017-08-24 12:37 ` Michael Ellerman
  5 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2017-08-24 12:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Mon, 2017-07-24 at 04:27:58 UTC, Benjamin Herrenschmidt wrote:
> Makes switch_mm_irqs_off() a bit more readable
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Patches 1-2, 4-6 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/43ed84a891b70165a621a5c9219694

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-21 17:35     ` Benjamin Herrenschmidt
  2017-08-22 13:18       ` Frederic Barrat
@ 2017-08-24 16:40       ` Frederic Barrat
  2017-08-24 18:47         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Barrat @ 2017-08-24 16:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin



Le 21/08/2017 à 19:35, Benjamin Herrenschmidt a écrit :
> On Mon, 2017-08-21 at 19:27 +0200, Frederic Barrat wrote:
>> Hi Ben,
>>
>> Le 24/07/2017 à 06:28, Benjamin Herrenschmidt a écrit :
>>> Instead of comparing the whole CPU mask every time, let's
>>> keep a counter of how many bits are set in the mask. Thus
>>> testing for a local mm only requires testing if that counter
>>> is 1 and the current CPU bit is set in the mask.
>>
>>
>> I'm trying to see if we could merge this patch with what I'm trying to
>> do to mark a context as requiring global TLBIs.
>> In http://patchwork.ozlabs.org/patch/796775/
>> I'm introducing a 'flags' per memory context, using one bit to say if
>> the context needs global TLBIs.
>> The two could co-exist, just checking... Are you thinking of using the
>> actual active_cpus count down the road, or is it just a matter of
>> knowing whether there is more than one active cpu?
> 
> Or you could just increment my counter. Just make sure you increment
> it at most once per CXL context and decrement it when the context is gone.

The decrementing part is giving me trouble, and I think it makes sense:
if I decrement the counter when detaching the context from the capi
card, then the next TLBIs for the memory context may be back to local.
So when the process exits, the NPU wouldn't get the associated TLBIs,
which spells trouble the next time the same memory context ID is reused.
I believe this is the cause of the problem I'm seeing. As soon as I keep
the TLBIs global, even after I detach from the capi adapter, everything
is fine.

Does that sound right?

So to keep the checks in mm_is_thread_local() minimal, down to just
checking the active_cpus count, I'm thinking of introducing a "copro
enabled" bit on the context, so that we can increment active_cpus only
once, and never decrement it.

   Fred

^ permalink raw reply	[flat|nested] 33+ messages in thread
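
A rough sketch of the "increment once, never decrement" idea described
above, assuming the per-context 'flags' word from Fred's patch and a
hypothetical MM_CONTEXT_COPRO_ENABLED bit; the helper name and the bit
are illustrative, not from the posted patches:

static inline void mm_context_add_copro(struct mm_struct *mm)
{
	/*
	 * Only the first attach of a coprocessor to this context bumps
	 * active_cpus; the bit then stays set for the lifetime of the
	 * mm, so TLBIs stay global even after the copro detaches.
	 */
	if (!test_and_set_bit(MM_CONTEXT_COPRO_ENABLED,
			      &mm->context.flags))
		inc_mm_active_cpus(mm);
}

The trade-off is a permanently-global mm after the first attach, in
exchange for never having to flush the nMMU on detach.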

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-24 16:40       ` Frederic Barrat
@ 2017-08-24 18:47         ` Benjamin Herrenschmidt
  2017-08-25  4:53           ` Frederic Barrat
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-24 18:47 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Thu, 2017-08-24 at 18:40 +0200, Frederic Barrat wrote:
> 
> The decrementing part is giving me trouble, and I think it makes sense:
> if I decrement the counter when detaching the context from the capi
> card, then the next TLBIs for the memory context may be back to local.

Yes, you need to flush the CAPI TLB first.

> So when the process exits, the NPU wouldn't get the associated TLBIs,
> which spells trouble the next time the same memory context ID is reused.
> I believe this is the cause of the problem I'm seeing. As soon as I keep
> the TLBIs global, even after I detach from the capi adapter, everything
> is fine.
> 
> Does that sound right?
> 
> So to keep the checks in mm_is_thread_local() minimal, down to just
> checking the active_cpus count, I'm thinking of introducing a "copro
> enabled" bit on the context, so that we can increment active_cpus only
> once, and never decrement it.

You can decrement if you flush. Don't you have MMIOs to do directed
flushes ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-24 18:47         ` Benjamin Herrenschmidt
@ 2017-08-25  4:53           ` Frederic Barrat
  2017-08-25  7:44             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Barrat @ 2017-08-25  4:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin



Le 24/08/2017 à 20:47, Benjamin Herrenschmidt a écrit :
> On Thu, 2017-08-24 at 18:40 +0200, Frederic Barrat wrote:
>>
>> The decrementing part is giving me trouble, and I think it makes sense:
>> if I decrement the counter when detaching the context from the capi
>> card, then the next TLBIs for the memory context may be back to local.
> 
> Yes, you need to flush the CAPI TLB first.
> 
>> So when the process exits, the NPU wouldn't get the associated TLBIs,
>> which spells trouble the next time the same memory context ID is reused.
>> I believe this is the cause of the problem I'm seeing. As soon as I keep
>> the TLBIs global, even after I detach from the capi adapter, everything
>> is fine.
>>
>> Does that sound right?
>>
>> So to keep the checks in mm_is_thread_local() minimal, down to just
>> checking the active_cpus count, I'm thinking of introducing a "copro
>> enabled" bit on the context, so that we can increment active_cpus only
>> once, and never decrement it.
> 
> You can decrement if you flush. Don't you have MMIOs to do directed
> flushes ?

That's for the nMMU. Last I heard, we don't have MMIOs to flush anything 
on the nMMU.

Side note: for the PSL, we do have MMIOs to flush, but they were 
perceived as useful only for debug and we don't rely on them, precisely 
because the nMMU would fall out of sync, so we have to rely on broadcast.

   Fred

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-25  4:53           ` Frederic Barrat
@ 2017-08-25  7:44             ` Benjamin Herrenschmidt
  2017-08-25  8:03               ` Frederic Barrat
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-25  7:44 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev; +Cc: aneesh.kumar, npiggin

On Fri, 2017-08-25 at 06:53 +0200, Frederic Barrat wrote:
> 
> Le 24/08/2017 à 20:47, Benjamin Herrenschmidt a écrit :
> > On Thu, 2017-08-24 at 18:40 +0200, Frederic Barrat wrote:
> > > 
> > > The decrementing part is giving me trouble, and I think it makes sense:
> > > if I decrement the counter when detaching the context from the capi
> > > card, then the next TLBIs for the memory context may be back to local.
> > 
> > Yes, you need to flush the CAPI TLB first.
> > 
> > > So when the process exits, the NPU wouldn't get the associated TLBIs,
> > > which spells trouble the next time the same memory context ID is reused.
> > > I believe this is the cause of the problem I'm seeing. As soon as I keep
> > > the TLBIs global, even after I detach from the capi adapter, everything
> > > is fine.
> > > 
> > > Does that sound right?
> > > 
> > > So to keep the checks in mm_is_thread_local() minimal, down to just
> > > checking the active_cpus count, I'm thinking of introducing a "copro
> > > enabled" bit on the context, so that we can increment active_cpus only
> > > once, and never decrement it.
> > 
> > You can decrement if you flush. Don't you have MMIOs to do directed
> > flushes ?
> 
> That's for the nMMU. Last I heard, we don't have MMIOs to flush anything 
> on the nMMU.
> 
> Side note: for the PSL, we do have MMIOs to flush, but they were 
> perceived as useful only for debug and we don't rely on them, precisely 
> because the nMMU would fall out of sync, so we have to rely on broadcast.

Well, you can always do a broadcast tlbi to flush the whole PID if you
decrement... that shouldn't be a very frequent operation.

Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
  2017-08-25  7:44             ` Benjamin Herrenschmidt
@ 2017-08-25  8:03               ` Frederic Barrat
  0 siblings, 0 replies; 33+ messages in thread
From: Frederic Barrat @ 2017-08-25  8:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, npiggin



Le 25/08/2017 à 09:44, Benjamin Herrenschmidt a écrit :
> On Fri, 2017-08-25 at 06:53 +0200, Frederic Barrat wrote:
>>
>> Le 24/08/2017 à 20:47, Benjamin Herrenschmidt a écrit :
>>> On Thu, 2017-08-24 at 18:40 +0200, Frederic Barrat wrote:
>>>>
>>>> The decrementing part is giving me trouble, and I think it makes sense:
>>>> if I decrement the counter when detaching the context from the capi
>>>> card, then the next TLBIs for the memory context may be back to local.
>>>
>>> Yes, you need to flush the CAPI TLB first.
>>>
>>>> So when the process exits, the NPU wouldn't get the associated TLBIs,
>>>> which spells trouble the next time the same memory context ID is reused.
>>>> I believe this is the cause of the problem I'm seeing. As soon as I keep
>>>> the TLBIs global, even after I detach from the capi adapter, everything
>>>> is fine.
>>>>
>>>> Does that sound right?
>>>>
>>>> So to keep the checks in mm_is_thread_local() minimal, down to just
>>>> checking the active_cpus count, I'm thinking of introducing a "copro
>>>> enabled" bit on the context, so that we can increment active_cpus only
>>>> once, and never decrement it.
>>>
>>> You can decrement if you flush. Don't you have MMIOs to do directed
>>> flushes ?
>>
>> That's for the nMMU. Last I heard, we don't have MMIOs to flush anything
>> on the nMMU.
>>
>> Side note: for the PSL, we do have MMIOs to flush, but they were
>> perceived as useful only for debug and we don't rely on them, precisely
>> because the nMMU would fall out of sync, so we have to rely on broadcast.
> 
> Well, you can always do a broadcast tlbi to flush the whole PID if you
> decrement... that shouldn't be a very frequent operation.

Arg, yes!
Thanks!

   Fred

^ permalink raw reply	[flat|nested] 33+ messages in thread
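
Ben's suggestion, as a sketch: broadcast-flush the whole PID before
dropping the count, so the nMMU cannot be left with stale entries once
TLBIs go back to local. Both mm_context_remove_copro() and flush_all_mm()
are assumed names here, for a detach helper and a full-mm broadcast
flush respectively; neither appears in the thread:

static inline void mm_context_remove_copro(struct mm_struct *mm)
{
	/*
	 * Broadcast-flush the whole context first, so the nMMU/PSL see
	 * the invalidations, then drop the count and let the mm fall
	 * back to local TLBIs.
	 */
	flush_all_mm(mm);
	dec_mm_active_cpus(mm);
}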

end of thread

Thread overview: 33+ messages
2017-07-24  4:27 [PATCH 1/6] powerpc/mm: Move pgdir setting into a helper Benjamin Herrenschmidt
2017-07-24  4:27 ` [PATCH 2/6] powerpc/mm: Avoid double irq save/restore in activate_mm Benjamin Herrenschmidt
2017-07-24  4:28 ` [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered Benjamin Herrenschmidt
2017-07-24 11:20   ` Nicholas Piggin
2017-07-24 20:54     ` Benjamin Herrenschmidt
2017-08-11 11:06     ` Nicholas Piggin
2017-08-11 22:40       ` Benjamin Herrenschmidt
2017-08-17 12:58       ` Michael Ellerman
2017-08-23 12:01   ` [3/6] " Michael Ellerman
2017-07-24  4:28 ` [PATCH 4/6] powerpc/mm: Use mm_is_thread_local() instread of open-coding Benjamin Herrenschmidt
2017-07-24  4:28 ` [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's Benjamin Herrenschmidt
2017-07-24 11:25   ` Nicholas Piggin
2017-07-24 13:46     ` Michael Ellerman
2017-07-25  0:34       ` Nicholas Piggin
2017-07-25 12:00         ` Michael Ellerman
2017-07-24 20:58     ` Benjamin Herrenschmidt
2017-07-25  0:44       ` Nicholas Piggin
2017-07-25  1:03         ` Benjamin Herrenschmidt
2017-07-25 10:55           ` Nicholas Piggin
2017-08-04 12:06   ` Frederic Barrat
2017-08-04 12:58     ` Benjamin Herrenschmidt
2017-08-21 17:27   ` Frederic Barrat
2017-08-21 17:35     ` Benjamin Herrenschmidt
2017-08-22 13:18       ` Frederic Barrat
2017-08-22 16:55         ` Benjamin Herrenschmidt
2017-08-24 16:40       ` Frederic Barrat
2017-08-24 18:47         ` Benjamin Herrenschmidt
2017-08-25  4:53           ` Frederic Barrat
2017-08-25  7:44             ` Benjamin Herrenschmidt
2017-08-25  8:03               ` Frederic Barrat
2017-08-22  4:28     ` Michael Ellerman
2017-07-24  4:28 ` [PATCH 6/6] powerpc/mm: Make switch_mm_irqs_off() out of line Benjamin Herrenschmidt
2017-08-24 12:37 ` [1/6] powerpc/mm: Move pgdir setting into a helper Michael Ellerman
