linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/6] Making tlbie optional for radix
@ 2019-09-02 15:29 Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 1/6] powerpc/64s: remove register_process_table callback Nicholas Piggin
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This is a rebase of the series against the powerpc next branch
with ultravisor changes. Main improvements are implementing and
splitting out the precursor patches better.

KVM still requires tlbie to run radix guests. A naive implementation
of tlbiel + IPI for LPID flushes was crashing, so that requires more
investigation.

Thanks,
Nick

Nicholas Piggin (6):
  powerpc/64s: remove register_process_table callback
  powerpc/64s/radix: tidy up TLB flushing code
  powerpc/64s: make mmu_partition_table_set_entry TLB flush optional
  powerpc/64s/pseries: radix flush translations before MMU is enabled at
    boot
  powerpc/64s: remove unnecessary translation cache flushes at boot
  powerpc/64s/radix: introduce options to disable use of the tlbie
    instruction

 .../admin-guide/kernel-parameters.txt         |   4 +
 arch/powerpc/include/asm/book3s/64/mmu.h      |   4 -
 .../include/asm/book3s/64/tlbflush-radix.h    |  12 +-
 arch/powerpc/include/asm/book3s/64/tlbflush.h |   9 +
 arch/powerpc/include/asm/mmu.h                |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |   6 +
 arch/powerpc/kvm/book3s_hv_nested.c           |   4 +-
 arch/powerpc/mm/book3s64/hash_utils.c         |   8 +-
 arch/powerpc/mm/book3s64/pgtable.c            |  72 ++++-
 arch/powerpc/mm/book3s64/radix_pgtable.c      |  45 +--
 arch/powerpc/mm/book3s64/radix_tlb.c          | 303 ++++++++++++------
 arch/powerpc/platforms/pseries/lpar.c         |  12 +-
 drivers/misc/cxl/main.c                       |   4 +
 drivers/misc/ocxl/main.c                      |   4 +
 14 files changed, 308 insertions(+), 181 deletions(-)

-- 
2.22.0



* [PATCH 1/6] powerpc/64s: remove register_process_table callback
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-19 10:25   ` Michael Ellerman
  2019-09-02 15:29 ` [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code Nicholas Piggin
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This callback is only required because the partition table init comes
before process table allocation on powernv (aka bare metal aka native).

Change the order to allocate the process table first, and remove the
callback.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
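For readers following the reordering, here is a minimal userspace
sketch of why the callback becomes unnecessary once the process table
exists before the partition table entry is written. The model_* names
and struct are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; these are not kernel data structures. */
struct model_state {
	bool process_table_allocated;
	bool partition_entry_set;
	unsigned long dw1;	/* second doubleword of the partition entry */
};

/* Stand-in for radix_init_pgtable(): allocates the process table. */
static void model_init_pgtable(struct model_state *s)
{
	assert(s != NULL);
	s->process_table_allocated = true;
}

/*
 * Stand-in for radix_init_partition_table(). With the process table
 * already allocated, dw1 can be filled in directly, so no callback is
 * needed. Returns false if called in the wrong order.
 */
static bool model_init_partition_table(struct model_state *s,
				       unsigned long process_tb_pa)
{
	assert(s != NULL);
	if (!s->process_table_allocated)
		return false;
	s->dw1 = process_tb_pa;
	s->partition_entry_set = true;
	return true;
}

/* Ordering after this patch: page tables first, then partition table. */
static bool model_early_init_mmu(struct model_state *s,
				 unsigned long process_tb_pa)
{
	model_init_pgtable(s);
	return model_init_partition_table(s, process_tb_pa);
}
```

In the model, calling model_init_partition_table() on a fresh state
fails, while model_early_init_mmu() succeeds because it allocates the
process table first.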
 arch/powerpc/include/asm/book3s/64/mmu.h |  4 ---
 arch/powerpc/mm/book3s64/hash_utils.c    |  6 ----
 arch/powerpc/mm/book3s64/pgtable.c       |  3 --
 arch/powerpc/mm/book3s64/radix_pgtable.c | 45 +++++++-----------------
 arch/powerpc/platforms/pseries/lpar.c    | 17 +++++++--
 5 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 23b83d3593e2..bb3deb76c951 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -206,7 +206,6 @@ extern int mmu_io_psize;
 void mmu_early_init_devtree(void);
 void hash__early_init_devtree(void);
 void radix__early_init_devtree(void);
-extern void radix_init_native(void);
 extern void hash__early_init_mmu(void);
 extern void radix__early_init_mmu(void);
 static inline void early_init_mmu(void)
@@ -238,9 +237,6 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 					   first_memblock_size);
 }
 
-extern int (*register_process_table)(unsigned long base, unsigned long page_size,
-				     unsigned long tbl_size);
-
 #ifdef CONFIG_PPC_PSERIES
 extern void radix_init_pseries(void);
 #else
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fe99bba39b69..7aed27ea5361 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -859,12 +859,6 @@ static void __init htab_initialize(void)
 		/* Using a hypervisor which owns the htab */
 		htab_address = NULL;
 		_SDR1 = 0; 
-		/*
-		 * On POWER9, we need to do a H_REGISTER_PROC_TBL hcall
-		 * to inform the hypervisor that we wish to use the HPT.
-		 */
-		if (cpu_has_feature(CPU_FTR_ARCH_300))
-			register_process_table(0, 0, 0);
 #ifdef CONFIG_FA_DUMP
 		/*
 		 * If firmware assisted dump is active firmware preserves
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 206b43ae4000..97f3be778c79 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -23,9 +23,6 @@ EXPORT_SYMBOL(__pmd_frag_nr);
 unsigned long __pmd_frag_size_shift;
 EXPORT_SYMBOL(__pmd_frag_size_shift);
 
-int (*register_process_table)(unsigned long base, unsigned long page_size,
-			      unsigned long tbl_size);
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * This is called when relaxing access to a hugepage. It's also called in the page
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 71b649473045..83fa7864e8f4 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -34,19 +34,6 @@
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
 
-static int native_register_process_table(unsigned long base, unsigned long pg_sz,
-					 unsigned long table_size)
-{
-	unsigned long patb0, patb1;
-
-	patb0 = be64_to_cpu(partition_tb[0].patb0);
-	patb1 = base | table_size | PATB_GR;
-
-	mmu_partition_table_set_entry(0, patb0, patb1);
-
-	return 0;
-}
-
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
 			unsigned long region_start, unsigned long region_end)
 {
@@ -381,18 +368,8 @@ static void __init radix_init_pgtable(void)
 	 */
 	rts_field = radix__get_tree_size();
 	process_tb->prtb0 = cpu_to_be64(rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE);
-	/*
-	 * Fill in the partition table. We are suppose to use effective address
-	 * of process table here. But our linear mapping also enable us to use
-	 * physical address here.
-	 */
-	register_process_table(__pa(process_tb), 0, PRTB_SIZE_SHIFT - 12);
+
 	pr_info("Process table %p and radix root for kernel: %p\n", process_tb, init_mm.pgd);
-	asm volatile("ptesync" : : : "memory");
-	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-		     "r" (TLBIEL_INVAL_SET_LPID), "r" (0));
-	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
-	trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1);
 
 	/*
 	 * The init_mm context is given the first available (non-zero) PID,
@@ -413,22 +390,24 @@ static void __init radix_init_pgtable(void)
 
 static void __init radix_init_partition_table(void)
 {
-	unsigned long rts_field, dw0;
+	unsigned long rts_field, dw0, dw1;
 
 	mmu_partition_table_init();
 	rts_field = radix__get_tree_size();
 	dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
-	mmu_partition_table_set_entry(0, dw0, 0);
+	dw1 = __pa(process_tb) | (PRTB_SIZE_SHIFT - 12) | PATB_GR;
+	mmu_partition_table_set_entry(0, dw0, dw1);
+
+	asm volatile("ptesync" : : : "memory");
+	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
+		     "r" (TLBIEL_INVAL_SET_LPID), "r" (0));
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+	trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1);
 
 	pr_info("Initializing Radix MMU\n");
 	pr_info("Partition table %p\n", partition_tb);
 }
 
-void __init radix_init_native(void)
-{
-	register_process_table = native_register_process_table;
-}
-
 static int __init get_idx_from_shift(unsigned int shift)
 {
 	int idx = -1;
@@ -622,8 +601,9 @@ void __init radix__early_init_mmu(void)
 	__pmd_frag_nr = RADIX_PMD_FRAG_NR;
 	__pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT;
 
+	radix_init_pgtable();
+
 	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
-		radix_init_native();
 		lpcr = mfspr(SPRN_LPCR);
 		mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
 		radix_init_partition_table();
@@ -634,7 +614,6 @@ void __init radix__early_init_mmu(void)
 
 	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
 
-	radix_init_pgtable();
 	/* Switch to the guard PID before turning on MMU */
 	radix__switch_mmu_context(NULL, &init_mm);
 	if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4f76e5f30c97..b3205a6c950c 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1531,16 +1531,29 @@ void __init hpte_init_pseries(void)
 	mmu_hash_ops.flush_hash_range	 = pSeries_lpar_flush_hash_range;
 	mmu_hash_ops.hpte_clear_all      = pseries_hpte_clear_all;
 	mmu_hash_ops.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
-	register_process_table		 = pseries_lpar_register_process_table;
 
 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
+
+	/*
+	 * On POWER9, we need to do a H_REGISTER_PROC_TBL hcall
+	 * to inform the hypervisor that we wish to use the HPT.
+	 */
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		pseries_lpar_register_process_table(0, 0, 0);
 }
 
 void radix_init_pseries(void)
 {
 	pr_info("Using radix MMU under hypervisor\n");
-	register_process_table = pseries_lpar_register_process_table;
+
+	pseries_lpar_register_process_table(__pa(process_tb),
+						0, PRTB_SIZE_SHIFT - 12);
+	asm volatile("ptesync" : : : "memory");
+	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
+		     "r" (TLBIEL_INVAL_SET_LPID), "r" (0));
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+	trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1);
 }
 
 #ifdef CONFIG_PPC_SMLPAR
-- 
2.22.0



* [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 1/6] powerpc/64s: remove register_process_table callback Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 3/6] powerpc/64s: make mmu_partition_table_set_entry TLB flush optional Nicholas Piggin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

There should be no functional changes.

- Use calls to existing radix_tlb.c functions in flush_partition.

- Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar,
  because they flush everything, matching flush_all_mm rather than
  flush_tlb_mm for the lpid.

- Remove some unused radix_tlb.c flush primitives.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
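The new _tlbie_lpid_guest() helper in this patch uses a switch so that
each call site passes a compile-time constant "ric" to the asm "i"
constraint. A portable sketch of that pattern follows. ISSUE_FLUSH and
struct flush_rec are hypothetical stand-ins for the asm statement, and
the _Static_assert (C11) plays the role of the "i" constraint. The
RIC_FLUSH_* values mirror the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the kernel's RIC_FLUSH_* definitions. */
enum { RIC_FLUSH_TLB = 0, RIC_FLUSH_PWC = 1, RIC_FLUSH_ALL = 2 };

/* Hypothetical record of the last "instruction" issued. */
struct flush_rec {
	unsigned long lpid;
	int ric;
	int count;
};

/*
 * Stand-in for an asm statement with an "i" (immediate) constraint: the
 * _Static_assert only compiles when ric_ is an integer constant
 * expression, so a runtime variable is rejected at build time.
 */
#define ISSUE_FLUSH(rec_, lpid_, ric_) do {				\
	_Static_assert((ric_) >= RIC_FLUSH_TLB &&			\
		       (ric_) <= RIC_FLUSH_ALL,				\
		       "ric must be a compile-time constant");		\
	(rec_)->lpid = (lpid_);						\
	(rec_)->ric = (ric_);						\
	(rec_)->count++;						\
} while (0)

/* A runtime ric is mapped onto call sites that each use a constant. */
static void flush_lpid_guest(struct flush_rec *rec, unsigned long lpid,
			     int ric)
{
	assert(rec != NULL);
	switch (ric) {
	case RIC_FLUSH_TLB:
		ISSUE_FLUSH(rec, lpid, RIC_FLUSH_TLB);
		break;
	case RIC_FLUSH_PWC:
		ISSUE_FLUSH(rec, lpid, RIC_FLUSH_PWC);
		break;
	case RIC_FLUSH_ALL:
	default:
		ISSUE_FLUSH(rec, lpid, RIC_FLUSH_ALL);
	}
}
```

As in the patch, an unrecognised ric value falls through to the most
conservative case, RIC_FLUSH_ALL.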
 .../include/asm/book3s/64/tlbflush-radix.h    |  12 +-
 arch/powerpc/kvm/book3s_hv_nested.c           |   2 +-
 arch/powerpc/mm/book3s64/pgtable.c            |  13 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          | 117 ++++--------------
 4 files changed, 34 insertions(+), 110 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 05147cecb8df..4ce795d30377 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -17,8 +17,8 @@ extern void radix__flush_tlb_lpid_page(unsigned int lpid,
 					unsigned long addr,
 					unsigned long page_size);
 extern void radix__flush_pwc_lpid(unsigned int lpid);
-extern void radix__flush_tlb_lpid(unsigned int lpid);
-extern void radix__local_flush_tlb_lpid_guest(unsigned int lpid);
+extern void radix__flush_all_lpid(unsigned int lpid);
+extern void radix__flush_all_lpid_guest(unsigned int lpid);
 #else
 static inline void radix__tlbiel_all(unsigned int action) { WARN_ON(1); };
 static inline void radix__flush_tlb_lpid_page(unsigned int lpid,
@@ -31,11 +31,7 @@ static inline void radix__flush_pwc_lpid(unsigned int lpid)
 {
 	WARN_ON(1);
 }
-static inline void radix__flush_tlb_lpid(unsigned int lpid)
-{
-	WARN_ON(1);
-}
-static inline void radix__local_flush_tlb_lpid_guest(unsigned int lpid)
+static inline void radix__flush_all_lpid(unsigned int lpid)
 {
 	WARN_ON(1);
 }
@@ -73,6 +69,4 @@ extern void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned long addr);
 extern void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr);
 extern void radix__flush_tlb_all(void);
 
-extern void radix__local_flush_tlb_lpid(unsigned int lpid);
-
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 735e0ac6f5b2..b3316da2f13e 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -398,7 +398,7 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 	long rc;
 
 	if (!kvmhv_on_pseries()) {
-		radix__flush_tlb_lpid(lpid);
+		radix__flush_all_lpid(lpid);
 		return;
 	}
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 97f3be778c79..c2b87c5ba50b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -210,20 +210,17 @@ void __init mmu_partition_table_init(void)
 
 static void flush_partition(unsigned int lpid, bool radix)
 {
-	asm volatile("ptesync" : : : "memory");
 	if (radix) {
-		asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
-			     "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-		asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-			     "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-		trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
+		radix__flush_all_lpid(lpid);
+		radix__flush_all_lpid_guest(lpid);
 	} else {
+		asm volatile("ptesync" : : : "memory");
 		asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
 			     "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+		/* do we need fixup here ?*/
+		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 		trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
 	}
-	/* do we need fixup here ?*/
-	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
 
 void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 71f7fede2fa4..082f90d068ee 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -116,22 +116,6 @@ static __always_inline void __tlbie_pid(unsigned long pid, unsigned long ric)
 	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
-static __always_inline void __tlbiel_lpid(unsigned long lpid, int set,
-				unsigned long ric)
-{
-	unsigned long rb,rs,prs,r;
-
-	rb = PPC_BIT(52); /* IS = 2 */
-	rb |= set << PPC_BITLSHIFT(51);
-	rs = 0;  /* LPID comes from LPIDR */
-	prs = 0; /* partition scoped */
-	r = 1;   /* radix format */
-
-	asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
-		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-	trace_tlbie(lpid, 1, rb, rs, ric, prs, r);
-}
-
 static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
@@ -146,23 +130,20 @@ static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
 	trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
 }
 
-static __always_inline void __tlbiel_lpid_guest(unsigned long lpid, int set,
-						unsigned long ric)
+static __always_inline void __tlbie_lpid_guest(unsigned long lpid, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
 
 	rb = PPC_BIT(52); /* IS = 2 */
-	rb |= set << PPC_BITLSHIFT(51);
-	rs = 0;  /* LPID comes from LPIDR */
+	rs = lpid;
 	prs = 1; /* process scoped */
 	r = 1;   /* radix format */
 
-	asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
+	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
 		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-	trace_tlbie(lpid, 1, rb, rs, ric, prs, r);
+	trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
 }
 
-
 static __always_inline void __tlbiel_va(unsigned long va, unsigned long pid,
 					unsigned long ap, unsigned long ric)
 {
@@ -285,34 +266,6 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
-static inline void _tlbiel_lpid(unsigned long lpid, unsigned long ric)
-{
-	int set;
-
-	VM_BUG_ON(mfspr(SPRN_LPID) != lpid);
-
-	asm volatile("ptesync": : :"memory");
-
-	/*
-	 * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
-	 * also flush the entire Page Walk Cache.
-	 */
-	__tlbiel_lpid(lpid, 0, ric);
-
-	/* For PWC, only one flush is needed */
-	if (ric == RIC_FLUSH_PWC) {
-		asm volatile("ptesync": : :"memory");
-		return;
-	}
-
-	/* For the remaining sets, just flush the TLB */
-	for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
-		__tlbiel_lpid(lpid, set, RIC_FLUSH_TLB);
-
-	asm volatile("ptesync": : :"memory");
-	asm volatile(PPC_RADIX_INVALIDATE_ERAT_GUEST "; isync" : : :"memory");
-}
-
 static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
 {
 	asm volatile("ptesync": : :"memory");
@@ -337,35 +290,28 @@ static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
-static __always_inline void _tlbiel_lpid_guest(unsigned long lpid, unsigned long ric)
+static __always_inline void _tlbie_lpid_guest(unsigned long lpid, unsigned long ric)
 {
-	int set;
-
-	VM_BUG_ON(mfspr(SPRN_LPID) != lpid);
-
-	asm volatile("ptesync": : :"memory");
-
 	/*
-	 * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
-	 * also flush the entire Page Walk Cache.
+	 * Workaround the fact that the "ric" argument to __tlbie_pid
+	 * must be a compile-time constant to match the "i" constraint
+	 * in the asm statement.
 	 */
-	__tlbiel_lpid_guest(lpid, 0, ric);
-
-	/* For PWC, only one flush is needed */
-	if (ric == RIC_FLUSH_PWC) {
-		asm volatile("ptesync": : :"memory");
-		return;
+	switch (ric) {
+	case RIC_FLUSH_TLB:
+		__tlbie_lpid_guest(lpid, RIC_FLUSH_TLB);
+		break;
+	case RIC_FLUSH_PWC:
+		__tlbie_lpid_guest(lpid, RIC_FLUSH_PWC);
+		break;
+	case RIC_FLUSH_ALL:
+	default:
+		__tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
 	}
-
-	/* For the remaining sets, just flush the TLB */
-	for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
-		__tlbiel_lpid_guest(lpid, set, RIC_FLUSH_TLB);
-
-	asm volatile("ptesync": : :"memory");
-	asm volatile(PPC_RADIX_INVALIDATE_ERAT_GUEST : : :"memory");
+	fixup_tlbie_lpid(lpid);
+	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
-
 static inline void __tlbiel_va_range(unsigned long start, unsigned long end,
 				    unsigned long pid, unsigned long page_size,
 				    unsigned long psize)
@@ -835,32 +781,19 @@ EXPORT_SYMBOL_GPL(radix__flush_pwc_lpid);
 /*
  * Flush partition scoped translations from LPID (=LPIDR)
  */
-void radix__flush_tlb_lpid(unsigned int lpid)
+void radix__flush_all_lpid(unsigned int lpid)
 {
 	_tlbie_lpid(lpid, RIC_FLUSH_ALL);
 }
-EXPORT_SYMBOL_GPL(radix__flush_tlb_lpid);
+EXPORT_SYMBOL_GPL(radix__flush_all_lpid);
 
 /*
- * Flush partition scoped translations from LPID (=LPIDR)
+ * Flush process scoped translations from LPID (=LPIDR)
  */
-void radix__local_flush_tlb_lpid(unsigned int lpid)
+void radix__flush_all_lpid_guest(unsigned int lpid)
 {
-	_tlbiel_lpid(lpid, RIC_FLUSH_ALL);
+	_tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
 }
-EXPORT_SYMBOL_GPL(radix__local_flush_tlb_lpid);
-
-/*
- * Flush process scoped translations from LPID (=LPIDR).
- * Important difference, the guest normally manages its own translations,
- * but some cases e.g., vCPU CPU migration require KVM to flush.
- */
-void radix__local_flush_tlb_lpid_guest(unsigned int lpid)
-{
-	_tlbiel_lpid_guest(lpid, RIC_FLUSH_ALL);
-}
-EXPORT_SYMBOL_GPL(radix__local_flush_tlb_lpid_guest);
-
 
 static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
 				  unsigned long end, int psize);
-- 
2.22.0



* [PATCH 3/6] powerpc/64s: make mmu_partition_table_set_entry TLB flush optional
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 1/6] powerpc/64s: remove register_process_table callback Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 4/6] powerpc/64s/pseries: radix flush translations before MMU is enabled at boot Nicholas Piggin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

No functional change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/mmu.h           | 2 +-
 arch/powerpc/kvm/book3s_hv_nested.c      | 2 +-
 arch/powerpc/mm/book3s64/hash_utils.c    | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c       | 4 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index ba94ce8c22d7..0699cfeeb8c9 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -257,7 +257,7 @@ extern void radix__mmu_cleanup_all(void);
 /* Functions for creating and updating partition table on POWER9 */
 extern void mmu_partition_table_init(void);
 extern void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-					  unsigned long dw1);
+					  unsigned long dw1, bool flush);
 #endif /* CONFIG_PPC64 */
 
 struct mm_struct;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index b3316da2f13e..fff90f2c3de2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -411,7 +411,7 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 {
 	if (!kvmhv_on_pseries()) {
-		mmu_partition_table_set_entry(lpid, dw0, dw1);
+		mmu_partition_table_set_entry(lpid, dw0, dw1, true);
 		return;
 	}
 
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 7aed27ea5361..b73d08b54d12 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -825,7 +825,7 @@ static void __init hash_init_partition_table(phys_addr_t hash_table,
 	 * For now, UPRT is 0 and we have no segment table.
 	 */
 	htab_size =  __ilog2(htab_size) - 18;
-	mmu_partition_table_set_entry(0, hash_table | htab_size, 0);
+	mmu_partition_table_set_entry(0, hash_table | htab_size, 0, true);
 	pr_info("Partition table %p\n", partition_tb);
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index c2b87c5ba50b..6fab9c0bbbaf 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -224,7 +224,7 @@ static void flush_partition(unsigned int lpid, bool radix)
 }
 
 void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-				  unsigned long dw1)
+				  unsigned long dw1, bool flush)
 {
 	unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
 
@@ -251,7 +251,7 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
 		uv_register_pate(lpid, dw0, dw1);
 		pr_info("PATE registered by ultravisor: dw0 = 0x%lx, dw1 = 0x%lx\n",
 			dw0, dw1);
-	} else {
+	} else if (flush) {
 		flush_partition(lpid, (old & PATB_HR));
 	}
 }
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 83fa7864e8f4..078a7eeec1f5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -396,7 +396,7 @@ static void __init radix_init_partition_table(void)
 	rts_field = radix__get_tree_size();
 	dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
 	dw1 = __pa(process_tb) | (PRTB_SIZE_SHIFT - 12) | PATB_GR;
-	mmu_partition_table_set_entry(0, dw0, dw1);
+	mmu_partition_table_set_entry(0, dw0, dw1, true);
 
 	asm volatile("ptesync" : : : "memory");
 	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-- 
2.22.0



* [PATCH 4/6] powerpc/64s/pseries: radix flush translations before MMU is enabled at boot
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-09-02 15:29 ` [PATCH 3/6] powerpc/64s: make mmu_partition_table_set_entry TLB flush optional Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 5/6] powerpc/64s: remove unnecessary translation cache flushes " Nicholas Piggin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Radix guests are responsible for managing their own translation caches,
so make them match bare metal radix and hash, and make each CPU flush
all its translations right before enabling its MMU.

Radix guests may not flush partition scope translations, so in
tlbiel_all, make these flushes conditional on CPU_FTR_HVMODE. Process
scope translations are the only type visible to the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
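As a rough illustration of the tlbiel_all change described above, the
sketch below only counts how many invalidations of each scope would be
issued. model_tlbiel_all() and struct tlbiel_counts are hypothetical
names, and the hv_mode flag stands in for the CPU_FTR_HVMODE check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of invalidations issued, by translation scope. */
struct tlbiel_counts {
	unsigned int partition_scoped;
	unsigned int process_scoped;
};

/*
 * Model of the flush loop: partition scoped entries are only flushed
 * when running in hypervisor mode, process scoped entries always are.
 */
static struct tlbiel_counts model_tlbiel_all(unsigned int num_sets,
					     bool hv_mode)
{
	struct tlbiel_counts c = { 0, 0 };
	unsigned int set;

	assert(num_sets > 0);

	if (hv_mode) {
		for (set = 0; set < num_sets; set++)
			c.partition_scoped++;
	}

	for (set = 0; set < num_sets; set++)
		c.process_scoped++;

	return c;
}
```

With hv_mode false (a radix guest), the model issues no partition
scoped invalidations and one process scoped invalidation per set.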
 arch/powerpc/mm/book3s64/radix_pgtable.c |  6 ++----
 arch/powerpc/mm/book3s64/radix_tlb.c     | 12 ++++++++----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 078a7eeec1f5..e1e711c4704a 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -616,8 +616,7 @@ void __init radix__early_init_mmu(void)
 
 	/* Switch to the guard PID before turning on MMU */
 	radix__switch_mmu_context(NULL, &init_mm);
-	if (cpu_has_feature(CPU_FTR_HVMODE))
-		tlbiel_all();
+	tlbiel_all();
 }
 
 void radix__early_init_mmu_secondary(void)
@@ -637,8 +636,7 @@ void radix__early_init_mmu_secondary(void)
 	}
 
 	radix__switch_mmu_context(NULL, &init_mm);
-	if (cpu_has_feature(CPU_FTR_HVMODE))
-		tlbiel_all();
+	tlbiel_all();
 }
 
 void radix__mmu_cleanup_all(void)
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 082f90d068ee..f9cf8ae59831 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -51,11 +51,15 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
 	 * and partition table entries. Then flush the remaining sets of the
 	 * TLB.
 	 */
-	tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 0);
-	for (set = 1; set < num_sets; set++)
-		tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 0);
 
-	/* Do the same for process scoped entries. */
+	if (early_cpu_has_feature(CPU_FTR_HVMODE)) {
+		/* MSR[HV] should flush partition scope translations first. */
+		tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 0);
+		for (set = 1; set < num_sets; set++)
+			tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 0);
+	}
+
+	/* Flush process scoped entries. */
 	tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 1);
 	for (set = 1; set < num_sets; set++)
 		tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
-- 
2.22.0



* [PATCH 5/6] powerpc/64s: remove unnecessary translation cache flushes at boot
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
                   ` (3 preceding siblings ...)
  2019-09-02 15:29 ` [PATCH 4/6] powerpc/64s/pseries: radix flush translations before MMU is enabled at boot Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-02 15:29 ` [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction Nicholas Piggin
       [not found] ` <20190902152931.17840-3-npiggin__24629.6128186927$1567438719$gmane$org@gmail.com>
  6 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The various translation structure invalidations performed in early boot
when the MMU is off are not required, because everything is invalidated
immediately before a CPU first enables its MMU (see early_init_mmu
and early_init_mmu_secondary).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
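The reasoning above can be sketched with a small userspace model: a
partition table update at boot skips the flush, and stale translations
are still gone by the time a CPU turns its MMU on, because enabling the
MMU is preceded by a local flush-all. All model_* names are
hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	bool mmu_on;
	int cached_translations;	/* possibly stale entries */
};

struct model_system {
	struct model_cpu cpu[MODEL_NR_CPUS];
	unsigned long patb0, patb1;
	int global_flushes;
};

/* Stand-in for mmu_partition_table_set_entry() with the flush flag. */
static void model_set_entry(struct model_system *sys, unsigned long dw0,
			    unsigned long dw1, bool flush)
{
	int i;

	assert(sys != NULL);
	sys->patb0 = dw0;
	sys->patb1 = dw1;
	if (flush) {
		for (i = 0; i < MODEL_NR_CPUS; i++)
			sys->cpu[i].cached_translations = 0;
		sys->global_flushes++;
	}
}

/* Each CPU flushes everything locally right before enabling its MMU. */
static void model_enable_mmu(struct model_system *sys, int cpu)
{
	assert(sys != NULL);
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	sys->cpu[cpu].cached_translations = 0;	/* models tlbiel_all() */
	sys->cpu[cpu].mmu_on = true;
}
```

In this model, passing flush=false at boot leaves global_flushes at
zero, yet no CPU ever runs with its MMU on and stale entries cached.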
 arch/powerpc/mm/book3s64/hash_utils.c    | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c       | 5 +++++
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +-------
 arch/powerpc/platforms/pseries/lpar.c    | 5 -----
 4 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index b73d08b54d12..7684a596158b 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -825,7 +825,7 @@ static void __init hash_init_partition_table(phys_addr_t hash_table,
 	 * For now, UPRT is 0 and we have no segment table.
 	 */
 	htab_size =  __ilog2(htab_size) - 18;
-	mmu_partition_table_set_entry(0, hash_table | htab_size, 0, true);
+	mmu_partition_table_set_entry(0, hash_table | htab_size, 0, false);
 	pr_info("Partition table %p\n", partition_tb);
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 6fab9c0bbbaf..351eb78eed55 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -252,6 +252,11 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
 		pr_info("PATE registered by ultravisor: dw0 = 0x%lx, dw1 = 0x%lx\n",
 			dw0, dw1);
 	} else if (flush) {
+		/*
+		 * Boot does not need to flush, because MMU is off and each
+		 * CPU does a tlbiel_all() before switching them on, which
+		 * flushes everything.
+		 */
 		flush_partition(lpid, (old & PATB_HR));
 	}
 }
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index e1e711c4704a..0d1107fb34c1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -396,13 +396,7 @@ static void __init radix_init_partition_table(void)
 	rts_field = radix__get_tree_size();
 	dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
 	dw1 = __pa(process_tb) | (PRTB_SIZE_SHIFT - 12) | PATB_GR;
-	mmu_partition_table_set_entry(0, dw0, dw1, true);
-
-	asm volatile("ptesync" : : : "memory");
-	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-		     "r" (TLBIEL_INVAL_SET_LPID), "r" (0));
-	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
-	trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1);
+	mmu_partition_table_set_entry(0, dw0, dw1, false);
 
 	pr_info("Initializing Radix MMU\n");
 	pr_info("Partition table %p\n", partition_tb);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index b3205a6c950c..36b846f6e74e 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1549,11 +1549,6 @@ void radix_init_pseries(void)
 
 	pseries_lpar_register_process_table(__pa(process_tb),
 						0, PRTB_SIZE_SHIFT - 12);
-	asm volatile("ptesync" : : : "memory");
-	asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-		     "r" (TLBIEL_INVAL_SET_LPID), "r" (0));
-	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
-	trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1);
 }
 
 #ifdef CONFIG_PPC_SMLPAR
-- 
2.22.0



* [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction
  2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
                   ` (4 preceding siblings ...)
  2019-09-02 15:29 ` [PATCH 5/6] powerpc/64s: remove unnecessary translation cache flushes " Nicholas Piggin
@ 2019-09-02 15:29 ` Nicholas Piggin
  2019-09-03  0:32   ` Alistair Popple
       [not found] ` <20190902152931.17840-3-npiggin__24629.6128186927$1567438719$gmane$org@gmail.com>
  6 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-02 15:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Introduce two options to control the use of the tlbie instruction. The
first is a boot time option which completely disables the kernel's use
of the instruction. This is currently incompatible with HASH MMU, KVM,
and coherent accelerators.

The second is a debugfs option which can be switched at runtime and
avoids using tlbie for invalidating CPU TLBs for normal process and
kernel address mappings. Coherent accelerators are still managed with
tlbie, as are KVM partition scope translations.
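
As a rough illustration of how the two flags relate, here is a userspace
model (not kernel code). The struct, the model_* helper names and the
-ENOENT return for the missing debugfs file are assumptions made for this
sketch; only the flag names, the radix check and the -ENODEV return from
KVM-HV come from the patch.

```c
#include <stdbool.h>
#include <errno.h>

/* Userspace model of the two flags introduced by the patch. */
struct tlbie_flags {
	bool capable;	/* may the kernel use tlbie at all? */
	bool enabled;	/* should tlbie be used to manage CPU TLBs? */
};

/* Default state: tlbie is available and in use. */
static struct tlbie_flags tlbie_flags_default(void)
{
	struct tlbie_flags f = { .capable = true, .enabled = true };

	return f;
}

/* Models the disable_tlbie boot option: ignored on hash, else clears both. */
static void model_boot_disable_tlbie(struct tlbie_flags *f, bool radix)
{
	if (!radix)
		return;	/* hash MMU cannot run without tlbie */
	f->capable = false;
	f->enabled = false;
}

/*
 * Models a write to the debugfs file. The file is only created when
 * capable is true, so a write without it is modelled as -ENOENT
 * (an assumption of this sketch).
 */
static int model_debugfs_set_enabled(struct tlbie_flags *f, bool val)
{
	if (!f->capable)
		return -ENOENT;
	f->enabled = val;
	return 0;
}

/* Models the init-time gate added to KVM-HV. */
static int model_kvm_hv_init(const struct tlbie_flags *f)
{
	return f->capable ? 0 : -ENODEV;
}
```

In this model, clearing only "enabled" leaves KVM-HV loadable, while the
boot option clears "capable" as well and makes the init gate fail.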

Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a
basic implementation which does not attempt to make any optimisation
beyond the tlbie implementation.
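
The choice between the flush methods can be sketched as a small decision
function. This is a userspace model under stated assumptions, not the
kernel implementation: the enum and the model_flush_actions() name are
invented here, and "mm_is_local" stands in for the existing thread-local
check in radix_tlb.c.

```c
#include <stdbool.h>

/* Which invalidations a flush of a process address space would issue. */
enum flush_action {
	FLUSH_TLBIEL_LOCAL = 1 << 0,	/* tlbiel on the current CPU only */
	FLUSH_TLBIE        = 1 << 1,	/* broadcast tlbie */
	FLUSH_TLBIEL_IPI   = 1 << 2,	/* tlbiel on each CPU in the mm cpumask */
};

static unsigned int model_flush_actions(bool mm_is_local, bool use_tlbie,
					int copros)
{
	unsigned int actions;

	if (mm_is_local)
		return FLUSH_TLBIEL_LOCAL;
	if (use_tlbie)
		return FLUSH_TLBIE;

	/*
	 * IPI path: CPU translations are always invalidated with tlbiel.
	 * If a coprocessor is attached to the mm, a tlbie is issued as well.
	 */
	actions = FLUSH_TLBIEL_IPI;
	if (copros > 0)
		actions |= FLUSH_TLBIE;
	return actions;
}
```

With tlbie disabled at runtime and one coprocessor attached, the model
returns both FLUSH_TLBIEL_IPI and FLUSH_TLBIE, matching the multicast
helpers in the patch.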

This is useful for performance testing among other things. For example
in certain situations on large systems, using IPIs may be faster than
tlbie as they can be directed rather than broadcast. Later we may also
take advantage of the IPIs to do more interesting things such as trim
the mm cpumask more aggressively.
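
To make "directed rather than broadcast" concrete, the following userspace
sketch counts how many CPUs would run the flush callback for a given mm
cpumask. It assumes a machine of at most 64 CPUs represented as a bitmask;
the function name is hypothetical and nothing here is kernel API.

```c
#include <stdint.h>

/* Number of CPUs set in a 64-bit model of the mm cpumask. */
static unsigned int model_flush_targets(uint64_t mm_cpumask)
{
	unsigned int n = 0;

	while (mm_cpumask) {
		mm_cpumask &= mm_cpumask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

A process that has only run on CPUs 0 and 2 gives a count of 2, however
many CPUs the system has, whereas a broadcast tlbie is seen system-wide.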

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |   4 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h |   9 +
 arch/powerpc/kvm/book3s_hv.c                  |   6 +
 arch/powerpc/mm/book3s64/pgtable.c            |  47 +++++
 arch/powerpc/mm/book3s64/radix_tlb.c          | 190 ++++++++++++++++--
 drivers/misc/cxl/main.c                       |   4 +
 drivers/misc/ocxl/main.c                      |   4 +
 7 files changed, 246 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d3cbb3ae62b6..65ae16549aa3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -860,6 +860,10 @@
 	disable_radix	[PPC]
 			Disable RADIX MMU mode on POWER9
 
+	disable_tlbie	[PPC]
+			Disable TLBIE instruction. Currently does not work
+			with KVM, with HASH MMU, or with coherent accelerators.
+
 	disable_cpu_apicid= [X86,APIC,SMP]
 			Format: <int>
 			The number of initial APIC ID for the
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index ebf572ea621e..7aa8195b6cff 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -162,4 +162,13 @@ static inline void flush_tlb_pgtable(struct mmu_gather *tlb, unsigned long addre
 
 	radix__flush_tlb_pwc(tlb, address);
 }
+
+extern bool tlbie_capable;
+extern bool tlbie_enabled;
+
+static inline bool cputlb_use_tlbie(void)
+{
+	return tlbie_enabled;
+}
+
 #endif /*  _ASM_POWERPC_BOOK3S_64_TLBFLUSH_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cde3f5a4b3e4..3cdaa2a09a19 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5462,6 +5462,12 @@ static int kvmppc_radix_possible(void)
 static int kvmppc_book3s_init_hv(void)
 {
 	int r;
+
+	if (!tlbie_capable) {
+		pr_err("KVM-HV: Host does not support TLBIE\n");
+		return -ENODEV;
+	}
+
 	/*
 	 * FIXME!! Do we need to check on all cpus ?
 	 */
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 351eb78eed55..75483b40fcb1 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -8,6 +8,7 @@
 #include <linux/memblock.h>
 #include <misc/cxl-base.h>
 
+#include <asm/debugfs.h>
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
 #include <asm/trace.h>
@@ -469,3 +470,49 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 
 	return true;
 }
+
+/*
+ * Does the CPU support tlbie?
+ */
+bool tlbie_capable __read_mostly = true;
+EXPORT_SYMBOL(tlbie_capable);
+
+/*
+ * Should tlbie be used for management of CPU TLBs, for kernel and process
+ * address spaces? tlbie may still be used for nMMU accelerators, and for KVM
+ * guest address spaces.
+ */
+bool tlbie_enabled __read_mostly = true;
+
+static int __init setup_disable_tlbie(char *str)
+{
+	if (!radix_enabled()) {
+		pr_err("disable_tlbie: Unable to disable TLBIE with Hash MMU.\n");
+		return 1;
+	}
+
+	tlbie_capable = false;
+	tlbie_enabled = false;
+
+	return 1;
+}
+__setup("disable_tlbie", setup_disable_tlbie);
+
+static int __init pgtable_debugfs_setup(void)
+{
+	if (!tlbie_capable)
+		return 0;
+
+	/*
+	 * There is no locking vs tlb flushing when changing this value.
+	 * The tlb flushers will see one value or another, and use either
+	 * tlbie or tlbiel with IPIs. In both cases the TLBs will be
+	 * invalidated as expected.
+	 */
+	debugfs_create_bool("tlbie_enabled", 0600,
+			powerpc_debugfs_root,
+			&tlbie_enabled);
+
+	return 0;
+}
+arch_initcall(pgtable_debugfs_setup);
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index f9cf8ae59831..631be42abd33 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -270,6 +270,39 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
+struct tlbiel_pid {
+	unsigned long pid;
+	unsigned long ric;
+};
+
+static void do_tlbiel_pid(void *info)
+{
+	struct tlbiel_pid *t = info;
+
+	if (t->ric == RIC_FLUSH_TLB)
+		_tlbiel_pid(t->pid, RIC_FLUSH_TLB);
+	else if (t->ric == RIC_FLUSH_PWC)
+		_tlbiel_pid(t->pid, RIC_FLUSH_PWC);
+	else
+		_tlbiel_pid(t->pid, RIC_FLUSH_ALL);
+}
+
+static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
+				unsigned long pid, unsigned long ric)
+{
+	struct cpumask *cpus = mm_cpumask(mm);
+	struct tlbiel_pid t = { .pid = pid, .ric = ric };
+
+	on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
+	/*
+	 * Always want the CPU translations to be invalidated with tlbiel in
+	 * these paths, so while coprocessors must use tlbie, we can not
+	 * optimise away the tlbiel component.
+	 */
+	if (atomic_read(&mm->context.copros) > 0)
+		_tlbie_pid(pid, RIC_FLUSH_ALL);
+}
+
 static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
 {
 	asm volatile("ptesync": : :"memory");
@@ -370,6 +403,53 @@ static __always_inline void _tlbie_va(unsigned long va, unsigned long pid,
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
+struct tlbiel_va {
+	unsigned long pid;
+	unsigned long va;
+	unsigned long psize;
+	unsigned long ric;
+};
+
+static void do_tlbiel_va(void *info)
+{
+	struct tlbiel_va *t = info;
+
+	if (t->ric == RIC_FLUSH_TLB)
+		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_TLB);
+	else if (t->ric == RIC_FLUSH_PWC)
+		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_PWC);
+	else
+		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_ALL);
+}
+
+static inline void _tlbiel_va_multicast(struct mm_struct *mm,
+				unsigned long va, unsigned long pid,
+				unsigned long psize, unsigned long ric)
+{
+	struct cpumask *cpus = mm_cpumask(mm);
+	struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };
+	on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
+	if (atomic_read(&mm->context.copros) > 0)
+		_tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
+}
+
+struct tlbiel_va_range {
+	unsigned long pid;
+	unsigned long start;
+	unsigned long end;
+	unsigned long page_size;
+	unsigned long psize;
+	bool also_pwc;
+};
+
+static void do_tlbiel_va_range(void *info)
+{
+	struct tlbiel_va_range *t = info;
+
+	_tlbiel_va_range(t->start, t->end, t->pid, t->page_size,
+				    t->psize, t->also_pwc);
+}
+
 static __always_inline void _tlbie_lpid_va(unsigned long va, unsigned long lpid,
 			      unsigned long psize, unsigned long ric)
 {
@@ -393,6 +473,21 @@ static inline void _tlbie_va_range(unsigned long start, unsigned long end,
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
+static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
+				unsigned long start, unsigned long end,
+				unsigned long pid, unsigned long page_size,
+				unsigned long psize, bool also_pwc)
+{
+	struct cpumask *cpus = mm_cpumask(mm);
+	struct tlbiel_va_range t = { .start = start, .end = end,
+				.pid = pid, .page_size = page_size,
+				.psize = psize, .also_pwc = also_pwc };
+
+	on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
+	if (atomic_read(&mm->context.copros) > 0)
+		_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
+}
+
 /*
  * Base TLB flushing operations:
  *
@@ -530,10 +625,14 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 			goto local;
 		}
 
-		if (mm_needs_flush_escalation(mm))
-			_tlbie_pid(pid, RIC_FLUSH_ALL);
-		else
-			_tlbie_pid(pid, RIC_FLUSH_TLB);
+		if (cputlb_use_tlbie()) {
+			if (mm_needs_flush_escalation(mm))
+				_tlbie_pid(pid, RIC_FLUSH_ALL);
+			else
+				_tlbie_pid(pid, RIC_FLUSH_TLB);
+		} else {
+			_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
+		}
 	} else {
 local:
 		_tlbiel_pid(pid, RIC_FLUSH_TLB);
@@ -559,7 +658,10 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
 				goto local;
 			}
 		}
-		_tlbie_pid(pid, RIC_FLUSH_ALL);
+		if (cputlb_use_tlbie())
+			_tlbie_pid(pid, RIC_FLUSH_ALL);
+		else
+			_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
 	} else {
 local:
 		_tlbiel_pid(pid, RIC_FLUSH_ALL);
@@ -594,7 +696,10 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 			exit_flush_lazy_tlbs(mm);
 			goto local;
 		}
-		_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
+		if (cputlb_use_tlbie())
+			_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
+		else
+			_tlbiel_va_multicast(mm, vmaddr, pid, psize, RIC_FLUSH_TLB);
 	} else {
 local:
 		_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
@@ -616,6 +721,24 @@ EXPORT_SYMBOL(radix__flush_tlb_page);
 #define radix__flush_all_mm radix__local_flush_all_mm
 #endif /* CONFIG_SMP */
 
+static void do_tlbiel_kernel(void *info)
+{
+	_tlbiel_pid(0, RIC_FLUSH_ALL);
+}
+
+static inline void _tlbiel_kernel_broadcast(void)
+{
+	on_each_cpu(do_tlbiel_kernel, NULL, 1);
+	if (tlbie_capable) {
+		/*
+		 * Coherent accelerators don't refcount kernel memory mappings,
+		 * so have to always issue a tlbie for them. This is quite a
+		 * slow path anyway.
+		 */
+		_tlbie_pid(0, RIC_FLUSH_ALL);
+	}
+}
+
 /*
  * If kernel TLBIs ever become local rather than global, then
  * drivers/misc/ocxl/link.c:ocxl_link_add_pe will need some work, as it
@@ -623,7 +746,10 @@ EXPORT_SYMBOL(radix__flush_tlb_page);
  */
 void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	_tlbie_pid(0, RIC_FLUSH_ALL);
+	if (cputlb_use_tlbie())
+		_tlbie_pid(0, RIC_FLUSH_ALL);
+	else
+		_tlbiel_kernel_broadcast();
 }
 EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
 
@@ -679,10 +805,14 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 		if (local) {
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
 		} else {
-			if (mm_needs_flush_escalation(mm))
-				_tlbie_pid(pid, RIC_FLUSH_ALL);
-			else
-				_tlbie_pid(pid, RIC_FLUSH_TLB);
+			if (cputlb_use_tlbie()) {
+				if (mm_needs_flush_escalation(mm))
+					_tlbie_pid(pid, RIC_FLUSH_ALL);
+				else
+					_tlbie_pid(pid, RIC_FLUSH_TLB);
+			} else {
+				_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
+			}
 		}
 	} else {
 		bool hflush = flush_all_sizes;
@@ -707,8 +837,8 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 				gflush = false;
 		}
 
-		asm volatile("ptesync": : :"memory");
 		if (local) {
+			asm volatile("ptesync": : :"memory");
 			__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbiel_va_range(hstart, hend, pid,
@@ -717,7 +847,8 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 				__tlbiel_va_range(gstart, gend, pid,
 						PUD_SIZE, MMU_PAGE_1G);
 			asm volatile("ptesync": : :"memory");
-		} else {
+		} else if (cputlb_use_tlbie()) {
+			asm volatile("ptesync": : :"memory");
 			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbie_va_range(hstart, hend, pid,
@@ -727,6 +858,15 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 						PUD_SIZE, MMU_PAGE_1G);
 			fixup_tlbie();
 			asm volatile("eieio; tlbsync; ptesync": : :"memory");
+		} else {
+			_tlbiel_va_range_multicast(mm,
+					start, end, pid, page_size, mmu_virtual_psize, false);
+			if (hflush)
+				_tlbiel_va_range_multicast(mm,
+					hstart, hend, pid, PMD_SIZE, MMU_PAGE_2M, false);
+			if (gflush)
+				_tlbiel_va_range_multicast(mm,
+					gstart, gend, pid, PUD_SIZE, MMU_PAGE_1G, false);
 		}
 	}
 	preempt_enable();
@@ -903,16 +1043,26 @@ static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 		if (local) {
 			_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
 		} else {
-			if (mm_needs_flush_escalation(mm))
-				also_pwc = true;
+			if (cputlb_use_tlbie()) {
+				if (mm_needs_flush_escalation(mm))
+					also_pwc = true;
+
+				_tlbie_pid(pid,
+					also_pwc ?  RIC_FLUSH_ALL : RIC_FLUSH_TLB);
+			} else {
+				_tlbiel_pid_multicast(mm, pid,
+					also_pwc ?  RIC_FLUSH_ALL : RIC_FLUSH_TLB);
+			}
 
-			_tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
 		}
 	} else {
 		if (local)
 			_tlbiel_va_range(start, end, pid, page_size, psize, also_pwc);
-		else
+		else if (cputlb_use_tlbie())
 			_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
+		else
+			_tlbiel_va_range_multicast(mm,
+					start, end, pid, page_size, psize, also_pwc);
 	}
 	preempt_enable();
 }
@@ -954,7 +1104,11 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 			exit_flush_lazy_tlbs(mm);
 			goto local;
 		}
-		_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
+		if (cputlb_use_tlbie())
+			_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
+		else
+			_tlbiel_va_range_multicast(mm,
+					addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
 	} else {
 local:
 		_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 482a2c1b340a..43b312d06e3e 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -18,6 +18,7 @@
 #include <linux/sched/task.h>
 
 #include <asm/cputable.h>
+#include <asm/mmu.h>
 #include <misc/cxl-base.h>
 
 #include "cxl.h"
@@ -315,6 +316,9 @@ static int __init init_cxl(void)
 {
 	int rc = 0;
 
+	if (!tlbie_capable)
+		return -EINVAL;
+
 	if ((rc = cxl_file_init()))
 		return rc;
 
diff --git a/drivers/misc/ocxl/main.c b/drivers/misc/ocxl/main.c
index 7210d9e059be..ef73cf35dda2 100644
--- a/drivers/misc/ocxl/main.c
+++ b/drivers/misc/ocxl/main.c
@@ -2,12 +2,16 @@
 // Copyright 2017 IBM Corp.
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <asm/mmu.h>
 #include "ocxl_internal.h"
 
 static int __init init_ocxl(void)
 {
 	int rc = 0;
 
+	if (!tlbie_capable)
+		return -EINVAL;
+
 	rc = ocxl_file_init();
 	if (rc)
 		return rc;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction
  2019-09-02 15:29 ` [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction Nicholas Piggin
@ 2019-09-03  0:32   ` Alistair Popple
  2019-09-03  2:52     ` Nicholas Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Alistair Popple @ 2019-09-03  0:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Nick,

On Tuesday, 3 September 2019 1:29:31 AM AEST Nicholas Piggin wrote:
> Introduce two options to control the use of the tlbie instruction. The
> first is a boot time option which completely disables the kernel's use
> of the instruction. This is currently incompatible with HASH MMU, KVM,
> and coherent accelerators.

Some accelerators (eg. cxl, ocxl, npu) call mm_context_add_copro() to force 
global TLB invalidations:

static inline void mm_context_add_copro(struct mm_struct *mm)
{
        /*
         * If any copro is in use, increment the active CPU count
         * in order to force TLB invalidations to be global as to
         * propagate to the Nest MMU.
         */
        if (atomic_inc_return(&mm->context.copros) == 1)
                inc_mm_active_cpus(mm);
}

Admittedly I haven't dug into all the details of this patch but it sounds like 
it might break the above if TLBIE is disabled. Do you think we should add a 
WARN_ON if mm_context_add_copro() is called with TLBIE disabled? Or perhaps 
even force TLBIE to be re-enabled if it is called with it disabled?
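
A rough userspace sketch of the first idea, purely as an assumption about
what such a check could look like: model_mm, model_add_copro() and the
fprintf() standing in for a kernel WARN_ON are all hypothetical, and the
check is keyed on the patch's tlbie_capable flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for mm->context.copros and mm->context.active_cpus. */
struct model_mm {
	int copros;
	int active_cpus;
};

/* Returns true if the (hypothetical) warning fired. */
static bool model_add_copro(struct model_mm *mm, bool tlbie_capable)
{
	bool warned = false;

	if (!tlbie_capable) {
		/* Stand-in for a WARN_ON() in the kernel. */
		fprintf(stderr, "copro added while tlbie is unavailable\n");
		warned = true;
	}

	/* Same refcounting as mm_context_add_copro() above. */
	if (++mm->copros == 1)
		mm->active_cpus++;
	return warned;
}
```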

- Alistair

> And a debugfs option can be switched at runtime and avoids using tlbie
> for invalidating CPU TLBs for normal process and kernel address
> mappings. Coherent accelerators are still managed with tlbie, as will
> KVM partition scope translations.
> 
> Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a
> basic implementation which does not attempt to make any optimisation
> beyond the tlbie implementation.
> 
> This is useful for performance testing among other things. For example
> in certain situations on large systems, using IPIs may be faster than
> tlbie as they can be directed rather than broadcast. Later we may also
> take advantage of the IPIs to do more interesting things such as trim
> the mm cpumask more aggressively.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  arch/powerpc/include/asm/book3s/64/tlbflush.h |   9 +
>  arch/powerpc/kvm/book3s_hv.c                  |   6 +
>  arch/powerpc/mm/book3s64/pgtable.c            |  47 +++++
>  arch/powerpc/mm/book3s64/radix_tlb.c          | 190 ++++++++++++++++--
>  drivers/misc/cxl/main.c                       |   4 +
>  drivers/misc/ocxl/main.c                      |   4 +
>  7 files changed, 246 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/
admin-guide/kernel-parameters.txt
> index d3cbb3ae62b6..65ae16549aa3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -860,6 +860,10 @@
>  	disable_radix	[PPC]
>  			Disable RADIX MMU mode on POWER9
>  
> +	disable_tlbie	[PPC]
> +			Disable TLBIE instruction. Currently does not work
> +			with KVM, with HASH MMU, or with coherent accelerators.
> +
>  	disable_cpu_apicid= [X86,APIC,SMP]
>  			Format: <int>
>  			The number of initial APIC ID for the
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/
include/asm/book3s/64/tlbflush.h
> index ebf572ea621e..7aa8195b6cff 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -162,4 +162,13 @@ static inline void flush_tlb_pgtable(struct mmu_gather 
*tlb, unsigned long addre
>  
>  	radix__flush_tlb_pwc(tlb, address);
>  }
> +
> +extern bool tlbie_capable;
> +extern bool tlbie_enabled;
> +
> +static inline bool cputlb_use_tlbie(void)
> +{
> +	return tlbie_enabled;
> +}
> +
>  #endif /*  _ASM_POWERPC_BOOK3S_64_TLBFLUSH_H */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index cde3f5a4b3e4..3cdaa2a09a19 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -5462,6 +5462,12 @@ static int kvmppc_radix_possible(void)
>  static int kvmppc_book3s_init_hv(void)
>  {
>  	int r;
> +
> +	if (!tlbie_capable) {
> +		pr_err("KVM-HV: Host does not support TLBIE\n");
> +		return -ENODEV;
> +	}
> +
>  	/*
>  	 * FIXME!! Do we need to check on all cpus ?
>  	 */
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/
pgtable.c
> index 351eb78eed55..75483b40fcb1 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -8,6 +8,7 @@
>  #include <linux/memblock.h>
>  #include <misc/cxl-base.h>
>  
> +#include <asm/debugfs.h>
>  #include <asm/pgalloc.h>
>  #include <asm/tlb.h>
>  #include <asm/trace.h>
> @@ -469,3 +470,49 @@ int pmd_move_must_withdraw(struct spinlock 
*new_pmd_ptl,
>  
>  	return true;
>  }
> +
> +/*
> + * Does the CPU support tlbie?
> + */
> +bool tlbie_capable __read_mostly = true;
> +EXPORT_SYMBOL(tlbie_capable);
> +
> +/*
> + * Should tlbie be used for management of CPU TLBs, for kernel and process
> + * address spaces? tlbie may still be used for nMMU accelerators, and for 
KVM
> + * guest address spaces.
> + */
> +bool tlbie_enabled __read_mostly = true;
> +
> +static int __init setup_disable_tlbie(char *str)
> +{
> +	if (!radix_enabled()) {
> +		pr_err("disable_tlbie: Unable to disable TLBIE with Hash MMU.\n");
> +		return 1;
> +	}
> +
> +	tlbie_capable = false;
> +	tlbie_enabled = false;
> +
> +        return 1;
> +}
> +__setup("disable_tlbie", setup_disable_tlbie);
> +
> +static int __init pgtable_debugfs_setup(void)
> +{
> +	if (!tlbie_capable)
> +		return 0;
> +
> +	/*
> +	 * There is no locking vs tlb flushing when changing this value.
> +	 * The tlb flushers will see one value or another, and use either
> +	 * tlbie or tlbiel with IPIs. In both cases the TLBs will be
> +	 * invalidated as expected.
> +	 */
> +	debugfs_create_bool("tlbie_enabled", 0600,
> +			powerpc_debugfs_root,
> +			&tlbie_enabled);
> +
> +	return 0;
> +}
> +arch_initcall(pgtable_debugfs_setup);
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/
radix_tlb.c
> index f9cf8ae59831..631be42abd33 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -270,6 +270,39 @@ static inline void _tlbie_pid(unsigned long pid, 
unsigned long ric)
>  	asm volatile("eieio; tlbsync; ptesync": : :"memory");
>  }
>  
> +struct tlbiel_pid {
> +	unsigned long pid;
> +	unsigned long ric;
> +};
> +
> +static void do_tlbiel_pid(void *info)
> +{
> +	struct tlbiel_pid *t = info;
> +
> +	if (t->ric == RIC_FLUSH_TLB)
> +		_tlbiel_pid(t->pid, RIC_FLUSH_TLB);
> +	else if (t->ric == RIC_FLUSH_PWC)
> +		_tlbiel_pid(t->pid, RIC_FLUSH_PWC);
> +	else
> +		_tlbiel_pid(t->pid, RIC_FLUSH_ALL);
> +}
> +
> +static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
> +				unsigned long pid, unsigned long ric)
> +{
> +	struct cpumask *cpus = mm_cpumask(mm);
> +	struct tlbiel_pid t = { .pid = pid, .ric = ric };
> +
> +	on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
> +	/*
> +	 * Always want the CPU translations to be invalidated with tlbiel in
> +	 * these paths, so while coprocessors must use tlbie, we can not
> +	 * optimise away the tlbiel component.
> +	 */
> +	if (atomic_read(&mm->context.copros) > 0)
> +		_tlbie_pid(pid, RIC_FLUSH_ALL);
> +}
> +
>  static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
>  {
>  	asm volatile("ptesync": : :"memory");
> @@ -370,6 +403,53 @@ static __always_inline void _tlbie_va(unsigned long va, 
unsigned long pid,
>  	asm volatile("eieio; tlbsync; ptesync": : :"memory");
>  }
>  
> +struct tlbiel_va {
> +	unsigned long pid;
> +	unsigned long va;
> +	unsigned long psize;
> +	unsigned long ric;
> +};
> +
> +static void do_tlbiel_va(void *info)
> +{
> +	struct tlbiel_va *t = info;
> +
> +	if (t->ric == RIC_FLUSH_TLB)
> +		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_TLB);
> +	else if (t->ric == RIC_FLUSH_PWC)
> +		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_PWC);
> +	else
> +		_tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_ALL);
> +}
> +
> +static inline void _tlbiel_va_multicast(struct mm_struct *mm,
> +				unsigned long va, unsigned long pid,
> +				unsigned long psize, unsigned long ric)
> +{
> +	struct cpumask *cpus = mm_cpumask(mm);
> +	struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };
> +	on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
> +	if (atomic_read(&mm->context.copros) > 0)
> +		_tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
> +}
> +
> +struct tlbiel_va_range {
> +	unsigned long pid;
> +	unsigned long start;
> +	unsigned long end;
> +	unsigned long page_size;
> +	unsigned long psize;
> +	bool also_pwc;
> +};
> +
> +static void do_tlbiel_va_range(void *info)
> +{
> +	struct tlbiel_va_range *t = info;
> +
> +	_tlbiel_va_range(t->start, t->end, t->pid, t->page_size,
> +				    t->psize, t->also_pwc);
> +}
> +
>  static __always_inline void _tlbie_lpid_va(unsigned long va, unsigned long 
lpid,
>  			      unsigned long psize, unsigned long ric)
>  {
> @@ -393,6 +473,21 @@ static inline void _tlbie_va_range(unsigned long start, 
unsigned long end,
>  	asm volatile("eieio; tlbsync; ptesync": : :"memory");
>  }
>  
> +static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
> +				unsigned long start, unsigned long end,
> +				unsigned long pid, unsigned long page_size,
> +				unsigned long psize, bool also_pwc)
> +{
> +	struct cpumask *cpus = mm_cpumask(mm);
> +	struct tlbiel_va_range t = { .start = start, .end = end,
> +				.pid = pid, .page_size = page_size,
> +				.psize = psize, .also_pwc = also_pwc };
> +
> +	on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
> +	if (atomic_read(&mm->context.copros) > 0)
> +		_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
> +}
> +
>  /*
>   * Base TLB flushing operations:
>   *
> @@ -530,10 +625,14 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
>  			goto local;
>  		}
>  
> -		if (mm_needs_flush_escalation(mm))
> -			_tlbie_pid(pid, RIC_FLUSH_ALL);
> -		else
> -			_tlbie_pid(pid, RIC_FLUSH_TLB);
> +		if (cputlb_use_tlbie()) {
> +			if (mm_needs_flush_escalation(mm))
> +				_tlbie_pid(pid, RIC_FLUSH_ALL);
> +			else
> +				_tlbie_pid(pid, RIC_FLUSH_TLB);
> +		} else {
> +			_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
> +		}
>  	} else {
>  local:
>  		_tlbiel_pid(pid, RIC_FLUSH_TLB);
> @@ -559,7 +658,10 @@ static void __flush_all_mm(struct mm_struct *mm, bool 
fullmm)
>  				goto local;
>  			}
>  		}
> -		_tlbie_pid(pid, RIC_FLUSH_ALL);
> +		if (cputlb_use_tlbie())
> +			_tlbie_pid(pid, RIC_FLUSH_ALL);
> +		else
> +			_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
>  	} else {
>  local:
>  		_tlbiel_pid(pid, RIC_FLUSH_ALL);
> @@ -594,7 +696,10 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, 
unsigned long vmaddr,
>  			exit_flush_lazy_tlbs(mm);
>  			goto local;
>  		}
> -		_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
> +		if (cputlb_use_tlbie())
> +			_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
> +		else
> +			_tlbiel_va_multicast(mm, vmaddr, pid, psize, RIC_FLUSH_TLB);
>  	} else {
>  local:
>  		_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
> @@ -616,6 +721,24 @@ EXPORT_SYMBOL(radix__flush_tlb_page);
>  #define radix__flush_all_mm radix__local_flush_all_mm
>  #endif /* CONFIG_SMP */
>  
> +static void do_tlbiel_kernel(void *info)
> +{
> +	_tlbiel_pid(0, RIC_FLUSH_ALL);
> +}
> +
> +static inline void _tlbiel_kernel_broadcast(void)
> +{
> +	on_each_cpu(do_tlbiel_kernel, NULL, 1);
> +	if (tlbie_capable) {
> +		/*
> +		 * Coherent accelerators don't refcount kernel memory mappings,
> +		 * so have to always issue a tlbie for them. This is quite a
> +		 * slow path anyway.
> +		 */
> +		_tlbie_pid(0, RIC_FLUSH_ALL);
> +	}
> +}
> +
>  /*
>   * If kernel TLBIs ever become local rather than global, then
>   * drivers/misc/ocxl/link.c:ocxl_link_add_pe will need some work, as it
> @@ -623,7 +746,10 @@ EXPORT_SYMBOL(radix__flush_tlb_page);
>   */
>  void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
>  {
> -	_tlbie_pid(0, RIC_FLUSH_ALL);
> +	if (cputlb_use_tlbie())
> +		_tlbie_pid(0, RIC_FLUSH_ALL);
> +	else
> +		_tlbiel_kernel_broadcast();
>  }
>  EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>  
> @@ -679,10 +805,14 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
>  		if (local) {
>  			_tlbiel_pid(pid, RIC_FLUSH_TLB);
>  		} else {
> -			if (mm_needs_flush_escalation(mm))
> -				_tlbie_pid(pid, RIC_FLUSH_ALL);
> -			else
> -				_tlbie_pid(pid, RIC_FLUSH_TLB);
> +			if (cputlb_use_tlbie()) {
> +				if (mm_needs_flush_escalation(mm))
> +					_tlbie_pid(pid, RIC_FLUSH_ALL);
> +				else
> +					_tlbie_pid(pid, RIC_FLUSH_TLB);
> +			} else {
> +				_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
> +			}
>  		}
>  	} else {
>  		bool hflush = flush_all_sizes;
> @@ -707,8 +837,8 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
>  				gflush = false;
>  		}
>  
> -		asm volatile("ptesync": : :"memory");
>  		if (local) {
> +			asm volatile("ptesync": : :"memory");
>  			__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
>  			if (hflush)
>  				__tlbiel_va_range(hstart, hend, pid,
> @@ -717,7 +847,8 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
>  				__tlbiel_va_range(gstart, gend, pid,
>  						PUD_SIZE, MMU_PAGE_1G);
>  			asm volatile("ptesync": : :"memory");
> -		} else {
> +		} else if (cputlb_use_tlbie()) {
> +			asm volatile("ptesync": : :"memory");
>  			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
>  			if (hflush)
>  				__tlbie_va_range(hstart, hend, pid,
> @@ -727,6 +858,15 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
>  						PUD_SIZE, MMU_PAGE_1G);
>  			fixup_tlbie();
>  			asm volatile("eieio; tlbsync; ptesync": : :"memory");
> +		} else {
> +			_tlbiel_va_range_multicast(mm,
> +					start, end, pid, page_size, mmu_virtual_psize, false);
> +			if (hflush)
> +				_tlbiel_va_range_multicast(mm,
> +					hstart, hend, pid, PMD_SIZE, MMU_PAGE_2M, false);
> +			if (gflush)
> +				_tlbiel_va_range_multicast(mm,
> +					gstart, gend, pid, PUD_SIZE, MMU_PAGE_1G, false);
>  		}
>  	}
>  	preempt_enable();
> @@ -903,16 +1043,26 @@ static __always_inline void 
__radix__flush_tlb_range_psize(struct mm_struct *mm,
>  		if (local) {
>  			_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
>  		} else {
> -			if (mm_needs_flush_escalation(mm))
> -				also_pwc = true;
> +			if (cputlb_use_tlbie()) {
> +				if (mm_needs_flush_escalation(mm))
> +					also_pwc = true;
> +
> +				_tlbie_pid(pid,
> +					also_pwc ?  RIC_FLUSH_ALL : RIC_FLUSH_TLB);
> +			} else {
> +				_tlbiel_pid_multicast(mm, pid,
> +					also_pwc ?  RIC_FLUSH_ALL : RIC_FLUSH_TLB);
> +			}
>  
> -			_tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
>  		}
>  	} else {
>  		if (local)
>  			_tlbiel_va_range(start, end, pid, page_size, psize, also_pwc);
> -		else
> +		else if (cputlb_use_tlbie())
>  			_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
> +		else
> +			_tlbiel_va_range_multicast(mm,
> +					start, end, pid, page_size, psize, also_pwc);
>  	}
>  	preempt_enable();
>  }
> @@ -954,7 +1104,11 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct 
*mm, unsigned long addr)
>  			exit_flush_lazy_tlbs(mm);
>  			goto local;
>  		}
> -		_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
> +		if (cputlb_use_tlbie())
> +			_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, 
true);
> +		else
> +			_tlbiel_va_range_multicast(mm,
> +					addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
>  	} else {
>  local:
>  		_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
> diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
> index 482a2c1b340a..43b312d06e3e 100644
> --- a/drivers/misc/cxl/main.c
> +++ b/drivers/misc/cxl/main.c
> @@ -18,6 +18,7 @@
>  #include <linux/sched/task.h>
>  
>  #include <asm/cputable.h>
> +#include <asm/mmu.h>
>  #include <misc/cxl-base.h>
>  
>  #include "cxl.h"
> @@ -315,6 +316,9 @@ static int __init init_cxl(void)
>  {
>  	int rc = 0;
>  
> +	if (!tlbie_capable)
> +		return -EINVAL;
> +
>  	if ((rc = cxl_file_init()))
>  		return rc;
>  
> diff --git a/drivers/misc/ocxl/main.c b/drivers/misc/ocxl/main.c
> index 7210d9e059be..ef73cf35dda2 100644
> --- a/drivers/misc/ocxl/main.c
> +++ b/drivers/misc/ocxl/main.c
> @@ -2,12 +2,16 @@
>  // Copyright 2017 IBM Corp.
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <asm/mmu.h>
>  #include "ocxl_internal.h"
>  
>  static int __init init_ocxl(void)
>  {
>  	int rc = 0;
>  
> +	if (!tlbie_capable)
> +		return -EINVAL;
> +
>  	rc = ocxl_file_init();
>  	if (rc)
>  		return rc;
> 
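
For readers skimming the quoted hunks, the dispatch they add to the non-local
flush paths can be summarised in a small standalone sketch. This is only an
illustration: the enum and the helper names pick_flush_method() and pick_ric()
are made up here, and the RIC_* constants are stand-ins that reuse the names
from the patch without its values.

```c
#include <stdbool.h>

/* Stand-ins for the three flush strategies seen in the quoted hunks. */
enum flush_method {
	FLUSH_TLBIEL_LOCAL,	/* mm is local to this CPU: tlbiel only */
	FLUSH_TLBIE_BROADCAST,	/* hardware broadcast via tlbie */
	FLUSH_TLBIEL_MULTICAST,	/* tlbiel on each CPU, driven by IPIs */
};

/* Stand-in constants; only the names are taken from the patch. */
enum ric { RIC_FLUSH_TLB, RIC_FLUSH_ALL };

/* Hypothetical helper: local flushes never need tlbie, remote ones fall
 * back to the multicast variant when tlbie is not to be used. */
enum flush_method pick_flush_method(bool local, bool use_tlbie)
{
	if (local)
		return FLUSH_TLBIEL_LOCAL;
	return use_tlbie ? FLUSH_TLBIE_BROADCAST : FLUSH_TLBIEL_MULTICAST;
}

/* Hypothetical helper: a full flush is requested when the page walk
 * cache must go as well, otherwise only the TLB is flushed. */
enum ric pick_ric(bool also_pwc)
{
	return also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB;
}
```

In the patch itself the use_tlbie input corresponds to cputlb_use_tlbie(), and
the multicast case maps to the _tlbiel_*_multicast() helpers.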






* Re: [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction
  2019-09-03  0:32   ` Alistair Popple
@ 2019-09-03  2:52     ` Nicholas Piggin
  0 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-09-03  2:52 UTC (permalink / raw)
  To: Alistair Popple, linuxppc-dev

Alistair Popple's on September 3, 2019 10:32 am:
> Nick,
> 
> On Tuesday, 3 September 2019 1:29:31 AM AEST Nicholas Piggin wrote:
>> Introduce two options to control the use of the tlbie instruction. A
>> boot time option which completely disables the kernel using the
>> instruction, this is currently incompatible with HASH MMU, KVM, and
>> coherent accelerators.
> 
> Some accelerators (eg. cxl, ocxl, npu) call mm_context_add_copro() to force 
> global TLB invalidations:
> 
> static inline void mm_context_add_copro(struct mm_struct *mm)
> {
>         /*
>          * If any copro is in use, increment the active CPU count
>          * in order to force TLB invalidations to be global as to
>          * propagate to the Nest MMU.
>          */
>         if (atomic_inc_return(&mm->context.copros) == 1)
>                 inc_mm_active_cpus(mm);
> }
> 
> Admittedly I haven't dug into all the details of this patch but it sounds like 
> it might break the above if TLBIE is disabled. Do you think we should add a 
> WARN_ON if mm_context_add_copro() is called with TLBIE disabled? Or perhaps 
> even force TLBIE to be re-enabled if it is called with it disabled?

The patch has two flags, "enabled" and "capable". If capable is false,
it prevents cxl, ocxl, and KVM from loading. I think NPU is gone
from the tree now. Hash MMU won't work either, but for now you can't
mark !capable with hash.

If enabled is false but capable is true, then it avoids tlbie for
flushing the CPU translations, but will still use it to flush nMMU
coprocessors (and KVM for partition scope, but hopefully that can
be made to work with tlbiel as well).

So this should be fine. We could put a BUG in mm_context_add_copro() if
!tlbie_capable, because we can't continue in that case. The idea is that
tlbie could be broken or not implemented, or we might want to test
without ever using it.
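
As a rough userspace sketch of how the two flags interact (the struct and
helper names below are made up for illustration, only tlbie_capable appears
in the patch, and this is not the kernel code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the two flags; field names are assumptions. */
struct tlbie_policy {
	bool capable;	/* tlbie may be used at all */
	bool enabled;	/* tlbie is used for CPU translation flushes */
};

/* cxl, ocxl and KVM refuse to load when tlbie can never be used. */
int tlbie_user_init(const struct tlbie_policy *p)
{
	assert(p != NULL);
	return p->capable ? 0 : -EINVAL;
}

/* CPU translations are flushed with tlbie only if it is also enabled. */
bool cpu_flush_uses_tlbie(const struct tlbie_policy *p)
{
	assert(p != NULL);
	return p->capable && p->enabled;
}

/* nMMU (coprocessor) flushes keep using tlbie whenever it is capable. */
bool nmmu_flush_uses_tlbie(const struct tlbie_policy *p)
{
	assert(p != NULL);
	return p->capable;
}
```

With capable=true and enabled=false, the coprocessor drivers still load and
nMMU flushes still go through tlbie, while CPU flushes take the tlbiel path.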

Thanks,
Nick



* Re: [PATCH 1/6] powerpc/64s: remove register_process_table callback
  2019-09-02 15:29 ` [PATCH 1/6] powerpc/64s: remove register_process_table callback Nicholas Piggin
@ 2019-09-19 10:25   ` Michael Ellerman
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2019-09-19 10:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

On Mon, 2019-09-02 at 15:29:26 UTC, Nicholas Piggin wrote:
> This callback is only required because the partition table init comes
> before process table allocation on powernv (aka bare metal aka native).
> 
> Change the order to allocate the process table first, and remove the
> callback.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ed6546bdc61b7c4bd926cebd82ba52d056fcefa1

cheers


* Re: [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code
       [not found] ` <20190902152931.17840-3-npiggin__24629.6128186927$1567438719$gmane$org@gmail.com>
@ 2019-09-30 21:37   ` Andreas Schwab
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 2019-09-30 21:37 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev

arch/powerpc/mm/book3s64/pgtable.c: In function ‘flush_partition’:
arch/powerpc/mm/book3s64/pgtable.c:216:3: error: implicit declaration of function ‘radix__flush_all_lpid_guest’ [-Werror=implicit-function-declaration]
   radix__flush_all_lpid_guest(lpid);
   ^
cc1: all warnings being treated as errors
make[4]: *** [arch/powerpc/mm/book3s64/pgtable.o] Error 1
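
An error like this usually means the caller is built in a configuration where
the declaration is compiled out. One common pattern is to provide a static
inline stub in the #else branch of the header. The sketch below only
illustrates that pattern: CONFIG_DEMO_RADIX and all demo_* names are
hypothetical, and nothing here is taken from whatever fix is eventually
applied.

```c
#include <stdio.h>

/* Counts real flushes; exists only so the behaviour can be observed. */
int demo_flush_calls;

/* Build with -DCONFIG_DEMO_RADIX to select the real implementation. */
#ifdef CONFIG_DEMO_RADIX
void demo_flush_all_lpid_guest(unsigned int lpid)
{
	demo_flush_calls++;
	printf("flushing lpid %u\n", lpid);
}
#else
/* Stub keeps callers compiling when the feature is configured out. */
static inline void demo_flush_all_lpid_guest(unsigned int lpid)
{
	(void)lpid;
}
#endif

/* The caller needs no #ifdef of its own in either configuration. */
void demo_flush_partition(unsigned int lpid)
{
	demo_flush_all_lpid_guest(lpid);
}
```

Without the define, demo_flush_partition() compiles and the call becomes a
no-op.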

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Thread overview: 11+ messages
2019-09-02 15:29 [PATCH 0/6] Making tlbie optional for radix Nicholas Piggin
2019-09-02 15:29 ` [PATCH 1/6] powerpc/64s: remove register_process_table callback Nicholas Piggin
2019-09-19 10:25   ` Michael Ellerman
2019-09-02 15:29 ` [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code Nicholas Piggin
2019-09-02 15:29 ` [PATCH 3/6] powerpc/64s: make mmu_partition_table_set_entry TLB flush optional Nicholas Piggin
2019-09-02 15:29 ` [PATCH 4/6] powerpc/64s/pseries: radix flush translations before MMU is enabled at boot Nicholas Piggin
2019-09-02 15:29 ` [PATCH 5/6] powerpc/64s: remove unnecessary translation cache flushes " Nicholas Piggin
2019-09-02 15:29 ` [PATCH 6/6] powerpc/64s/radix: introduce options to disable use of the tlbie instruction Nicholas Piggin
2019-09-03  0:32   ` Alistair Popple
2019-09-03  2:52     ` Nicholas Piggin
     [not found] ` <20190902152931.17840-3-npiggin__24629.6128186927$1567438719$gmane$org@gmail.com>
2019-09-30 21:37   ` [PATCH 2/6] powerpc/64s/radix: tidy up TLB flushing code Andreas Schwab
