linux-mm.kvack.org archive mirror
* [PATCH v2 0/6] arm64: tlb: add support for TTL feature
@ 2020-04-23 13:56 Zhenyu Ye
  2020-04-23 13:56 ` [PATCH v2 1/6] arm64: Detect the ARMv8.4 " Zhenyu Ye
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

In order to reduce the cost of TLB invalidation, ARMv8.4 provides
the TTL field in the TLBI instructions.  The TTL field indicates the
level of the page table walk holding the leaf entry for the address
being invalidated.  This series provides support for this feature.

When ARMv8.4-TTL is implemented, the operand for TLBIs looks like
this:

* +----------+-------+----------------------+
* |   ASID   |  TTL  |        BADDR         |
* +----------+-------+----------------------+
* |63      48|47   44|43                   0|
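
For illustration only, here is a minimal sketch of how such an operand
could be assembled (make_tlbi_operand and the ASID/BADDR masks are
invented names, not part of this series; the real helpers appear in
patch 2):

	#include <linux/types.h>
	#include <linux/bits.h>
	#include <linux/bitfield.h>

	#define TLBI_ASID_MASK	GENMASK_ULL(63, 48)
	#define TLBI_TTL_MASK	GENMASK_ULL(47, 44)
	#define TLBI_BADDR_MASK	GENMASK_ULL(43, 0)

	/* Pack the ASID, TTL hint and base address into one TLBI operand. */
	static inline u64 make_tlbi_operand(u64 asid, u64 ttl, u64 baddr)
	{
		return FIELD_PREP(TLBI_ASID_MASK, asid) |
		       FIELD_PREP(TLBI_TTL_MASK, ttl) |
		       (baddr & TLBI_BADDR_MASK);
	}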


This version updates some of the code according to Peter's
suggestions and adds some commit messages.

See the patches for details. Thanks.


--
ChangeList:
v2:
rebase series on Linux 5.7-rc1 and simplify the code implementation.

v1:
add support for TTL feature in arm64.

Marc Zyngier (2):
  arm64: Detect the ARMv8.4 TTL feature
  arm64: Add level-hinted TLB invalidation helper

Peter Zijlstra (Intel) (1):
  tlb: mmu_gather: add tlb_flush_*_range APIs

Zhenyu Ye (3):
  arm64: Add tlbi_user_level TLB invalidation helper
  mm: tlb: Provide flush_*_tlb_range wrappers
  arm64: tlb: Set the TTL field in flush_tlb_range

 arch/arm64/include/asm/cpucaps.h  |  3 +-
 arch/arm64/include/asm/sysreg.h   |  1 +
 arch/arm64/include/asm/tlb.h      | 29 +++++++++++++++-
 arch/arm64/include/asm/tlbflush.h | 54 +++++++++++++++++++++++++-----
 arch/arm64/kernel/cpufeature.c    | 11 +++++++
 include/asm-generic/pgtable.h     | 12 +++++--
 include/asm-generic/tlb.h         | 55 ++++++++++++++++++++++---------
 mm/pgtable-generic.c              | 22 +++++++++++++
 8 files changed, 160 insertions(+), 27 deletions(-)

-- 
2.19.1





* [PATCH v2 1/6] arm64: Detect the ARMv8.4 TTL feature
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-22 15:50   ` Catalin Marinas
  2020-04-23 13:56 ` [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper Zhenyu Ye
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

From: Marc Zyngier <maz@kernel.org>

In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
feature allows TLBIs to be issued with a level hint, allowing for
quicker invalidation.

The TTL field indicates the level of the page table walk
holding the leaf entry for the address being invalidated.

Let's detect the feature for now. Further patches will implement
its actual usage.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  1 +
 arch/arm64/kernel/cpufeature.c   | 11 +++++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8eb5a088ae65..cabb0c49a1d1 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -61,7 +61,8 @@
 #define ARM64_HAS_AMU_EXTN			51
 #define ARM64_HAS_ADDRESS_AUTH			52
 #define ARM64_HAS_GENERIC_AUTH			53
+#define ARM64_HAS_ARMv8_4_TTL			54
 
-#define ARM64_NCAPS				54
+#define ARM64_NCAPS				55
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ebc622432831..df02a57fc329 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -723,6 +723,7 @@
 
 /* id_aa64mmfr2 */
 #define ID_AA64MMFR2_E0PD_SHIFT		60
+#define ID_AA64MMFR2_TTL_SHIFT		48
 #define ID_AA64MMFR2_FWB_SHIFT		40
 #define ID_AA64MMFR2_AT_SHIFT		32
 #define ID_AA64MMFR2_LVA_SHIFT		16
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb..d993dc6dc7d5 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -244,6 +244,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_E0PD_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_TTL_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
@@ -1622,6 +1623,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		.cpu_enable = cpu_has_fwb,
 	},
+	{
+		.desc = "ARMv8.4 Translation Table Level",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HAS_ARMv8_4_TTL,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_TTL_SHIFT,
+		.min_field_value = 1,
+		.matches = has_cpuid_feature,
+	},
 #ifdef CONFIG_ARM64_HW_AFDBM
 	{
 		/*
-- 
2.19.1
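As a usage sketch (illustrative only; can_use_ttl_hint() is an invented
name, not introduced by this patch), later code can guard TTL-specific
paths on the new capability:

	#include <asm/cpufeature.h>

	/* True only once all CPUs are known to implement ARMv8.4-TTL. */
	static inline bool can_use_ttl_hint(void)
	{
		return cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL);
	}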





* [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
  2020-04-23 13:56 ` [PATCH v2 1/6] arm64: Detect the ARMv8.4 " Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-22 15:50   ` Catalin Marinas
  2020-04-23 13:56 ` [PATCH v2 3/6] arm64: Add tlbi_user_level " Zhenyu Ye
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

From: Marc Zyngier <maz@kernel.org>

Add a level-hinted TLB invalidation helper that is only used when
ARMv8.4-TTL is detected.

When ARMv8.4-TTL is implemented, the operand for TLBIs looks like
this:

* +----------+-------+----------------------+
* |   ASID   |  TTL  |        BADDR         |
* +----------+-------+----------------------+
* |63      48|47   44|43                   0|

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlbflush.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc3949064725..5f9f189bc6d2 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -10,6 +10,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bitfield.h>
 #include <linux/mm_types.h>
 #include <linux/sched.h>
 #include <asm/cputype.h>
@@ -59,6 +60,35 @@
 		__ta;						\
 	})
 
+#define TLBI_TTL_MASK	GENMASK_ULL(47, 44)
+
+#define __tlbi_level(op, addr, level)					\
+	do {								\
+		u64 arg = addr;						\
+									\
+		if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&	\
+		    level) {						\
+			u64 ttl = level;				\
+									\
+			switch (PAGE_SIZE) {				\
+			case SZ_4K:					\
+				ttl |= 1 << 2;				\
+				break;					\
+			case SZ_16K:					\
+				ttl |= 2 << 2;				\
+				break;					\
+			case SZ_64K:					\
+				ttl |= 3 << 2;				\
+				break;					\
+			}						\
+									\
+			arg &= ~TLBI_TTL_MASK;				\
+			arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);		\
+		}							\
+									\
+		__tlbi(op,  arg);					\
+	} while (0)
+
 /*
  *	TLB Invalidation
  *	================
-- 
2.19.1
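A usage sketch of the new helper (illustrative, not part of the patch;
assumes 'uaddr' and 'vma' are in scope): invalidating a single last-level
entry by virtual address. On CPUs without ARMv8.4-TTL, the macro falls
back to a plain TLBI with a zero TTL field:

	unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));

	dsb(ishst);
	__tlbi_level(vale1is, addr, 3);	/* the leaf entry is at level 3 */
	dsb(ish);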





* [PATCH v2 3/6] arm64: Add tlbi_user_level TLB invalidation helper
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
  2020-04-23 13:56 ` [PATCH v2 1/6] arm64: Detect the ARMv8.4 " Zhenyu Ye
  2020-04-23 13:56 ` [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-22 15:49   ` Catalin Marinas
  2020-04-23 13:56 ` [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs Zhenyu Ye
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Add a level-hinted variant of __tlbi_user, which is only used when
ARMv8.4-TTL is detected.

This patch sets the default level value to 0.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlbflush.h | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 5f9f189bc6d2..892f33235dc7 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -89,6 +89,12 @@
 		__tlbi(op,  arg);					\
 	} while (0)
 
+#define __tlbi_user_level(op, arg, level) do {				\
+	if (arm64_kernel_unmapped_at_el0())				\
+		__tlbi_level(op, (arg | USER_ASID_FLAG), level);	\
+} while (0)
+
+
 /*
  *	TLB Invalidation
  *	================
@@ -190,8 +196,8 @@ static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
 	unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
 
 	dsb(ishst);
-	__tlbi(vale1is, addr);
-	__tlbi_user(vale1is, addr);
+	__tlbi_level(vale1is, addr, 0);
+	__tlbi_user_level(vale1is, addr, 0);
 }
 
 static inline void flush_tlb_page(struct vm_area_struct *vma,
@@ -231,11 +237,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	dsb(ishst);
 	for (addr = start; addr < end; addr += stride) {
 		if (last_level) {
-			__tlbi(vale1is, addr);
-			__tlbi_user(vale1is, addr);
+			__tlbi_level(vale1is, addr, 0);
+			__tlbi_user_level(vale1is, addr, 0);
 		} else {
-			__tlbi(vae1is, addr);
-			__tlbi_user(vae1is, addr);
+			__tlbi_level(vae1is, addr, 0);
+			__tlbi_user_level(vae1is, addr, 0);
 		}
 	}
 	dsb(ish);
-- 
2.19.1
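A sketch of what the level-0 default above means in practice (the
comments are illustrative):

	/* level 0 leaves the TTL field zero: no hint for the CPU */
	__tlbi_level(vale1is, addr, 0);		/* == __tlbi(vale1is, addr) */

	/* only issued when the kernel is unmapped at EL0 (KPTI) */
	__tlbi_user_level(vale1is, addr, 0);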





* [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
                   ` (2 preceding siblings ...)
  2020-04-23 13:56 ` [PATCH v2 3/6] arm64: Add tlbi_user_level " Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-22 15:50   ` Catalin Marinas
  2020-04-23 13:56 ` [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers Zhenyu Ye
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

tlb_flush_{pte|pmd|pud|p4d}_range() adjust tlb->start and
tlb->end, then set the corresponding cleared_* field.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 include/asm-generic/tlb.h | 55 ++++++++++++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3f1649a8cf55..ef75ec86f865 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -512,6 +512,38 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 }
 #endif
 
+/*
+ * tlb_flush_{pte|pmd|pud|p4d}_range() adjust tlb->start and tlb->end,
+ * and set the corresponding cleared_* field.
+ */
+static inline void tlb_flush_pte_range(struct mmu_gather *tlb,
+				     unsigned long address, unsigned long size)
+{
+	__tlb_adjust_range(tlb, address, size);
+	tlb->cleared_ptes = 1;
+}
+
+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+				     unsigned long address, unsigned long size)
+{
+	__tlb_adjust_range(tlb, address, size);
+	tlb->cleared_pmds = 1;
+}
+
+static inline void tlb_flush_pud_range(struct mmu_gather *tlb,
+				     unsigned long address, unsigned long size)
+{
+	__tlb_adjust_range(tlb, address, size);
+	tlb->cleared_puds = 1;
+}
+
+static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
+				     unsigned long address, unsigned long size)
+{
+	__tlb_adjust_range(tlb, address, size);
+	tlb->cleared_p4ds = 1;
+}
+
 #ifndef __tlb_remove_tlb_entry
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 #endif
@@ -525,19 +557,17 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
  */
 #define tlb_remove_tlb_entry(tlb, ptep, address)		\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->cleared_ptes = 1;				\
+		tlb_flush_pte_range(tlb, address, PAGE_SIZE);	\
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	do {							\
 		unsigned long _sz = huge_page_size(h);		\
-		__tlb_adjust_range(tlb, address, _sz);		\
 		if (_sz == PMD_SIZE)				\
-			tlb->cleared_pmds = 1;			\
+			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else if (_sz == PUD_SIZE)			\
-			tlb->cleared_puds = 1;			\
+			tlb_flush_pud_range(tlb, address, _sz);	\
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
@@ -551,8 +581,7 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)			\
 	do {								\
-		__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);	\
-		tlb->cleared_pmds = 1;					\
+		tlb_flush_pmd_range(tlb, address, HPAGE_PMD_SIZE);	\
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);		\
 	} while (0)
 
@@ -566,8 +595,7 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 
 #define tlb_remove_pud_tlb_entry(tlb, pudp, address)			\
 	do {								\
-		__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE);	\
-		tlb->cleared_puds = 1;					\
+		tlb_flush_pud_range(tlb, address, HPAGE_PUD_SIZE);	\
 		__tlb_remove_pud_tlb_entry(tlb, pudp, address);		\
 	} while (0)
 
@@ -592,9 +620,8 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 #ifndef pte_free_tlb
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb_flush_pmd_range(tlb, address, PAGE_SIZE);	\
 		tlb->freed_tables = 1;				\
-		tlb->cleared_pmds = 1;				\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 #endif
@@ -602,9 +629,8 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 #ifndef pmd_free_tlb
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb_flush_pud_range(tlb, address, PAGE_SIZE);	\
 		tlb->freed_tables = 1;				\
-		tlb->cleared_puds = 1;				\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 #endif
@@ -612,9 +638,8 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 #ifndef pud_free_tlb
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb_flush_p4d_range(tlb, address, PAGE_SIZE);	\
 		tlb->freed_tables = 1;				\
-		tlb->cleared_p4ds = 1;				\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
-- 
2.19.1
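A usage sketch (hypothetical caller; assumes 'mm' and 'addr' are in
scope): tearing down a single PMD-level entry records its level in the
gather, so a TTL-aware tlb_flush() can later emit an accurate hint:

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, addr, addr + PMD_SIZE);
	tlb_flush_pmd_range(&tlb, addr, PMD_SIZE);	/* sets tlb->cleared_pmds */
	tlb_finish_mmu(&tlb, addr, addr + PMD_SIZE);	/* performs the flush */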





* [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
                   ` (3 preceding siblings ...)
  2020-04-23 13:56 ` [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-22 15:42   ` Catalin Marinas
  2020-04-23 13:56 ` [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range Zhenyu Ye
  2020-05-11 12:41 ` [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

This patch provides flush_{pte|pmd|pud|p4d}_tlb_range() in generic
code, expressed through the mmu_gather APIs.  These interfaces set
tlb->cleared_* and finally call tlb_flush(), so the TLB invalidation
can be done according to the information in struct mmu_gather.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 include/asm-generic/pgtable.h | 12 ++++++++++--
 mm/pgtable-generic.c          | 22 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 329b8c8ca703..8c92122ded9b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1161,11 +1161,19 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
  * invalidate the entire TLB which is not desitable.
  * e.g. see arch/arc: flush_pmd_tlb_range
  */
-#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
-#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
+extern void flush_pte_tlb_range(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end);
+extern void flush_pmd_tlb_range(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end);
+extern void flush_pud_tlb_range(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end);
+extern void flush_p4d_tlb_range(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end);
 #else
+#define flush_pte_tlb_range(vma, addr, end)	BUILD_BUG()
 #define flush_pmd_tlb_range(vma, addr, end)	BUILD_BUG()
 #define flush_pud_tlb_range(vma, addr, end)	BUILD_BUG()
+#define flush_p4d_tlb_range(vma, addr, end)	BUILD_BUG()
 #endif
 #endif
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 3d7c01e76efc..3eff199d3507 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -101,6 +101,28 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
+#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
+
+#define FLUSH_Pxx_TLB_RANGE(_pxx)					\
+void flush_##_pxx##_tlb_range(struct vm_area_struct *vma,		\
+			      unsigned long addr, unsigned long end)	\
+{									\
+		struct mmu_gather tlb;					\
+									\
+		tlb_gather_mmu(&tlb, vma->vm_mm, addr, end);		\
+		tlb_start_vma(&tlb, vma);				\
+		tlb_flush_##_pxx##_range(&tlb, addr, end - addr);	\
+		tlb_end_vma(&tlb, vma);					\
+		tlb_finish_mmu(&tlb, addr, end);			\
+}
+
+FLUSH_Pxx_TLB_RANGE(pte)
+FLUSH_Pxx_TLB_RANGE(pmd)
+FLUSH_Pxx_TLB_RANGE(pud)
+FLUSH_Pxx_TLB_RANGE(p4d)
+
+#endif /* __HAVE_ARCH_FLUSH_PMD_TLB_RANGE */
+
 #ifndef __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 int pmdp_set_access_flags(struct vm_area_struct *vma,
 			  unsigned long address, pmd_t *pmdp,
-- 
2.19.1
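For reference, FLUSH_Pxx_TLB_RANGE(pmd) above expands to roughly the
following (hand-expanded here for readability, not additional code):

	void flush_pmd_tlb_range(struct vm_area_struct *vma,
				 unsigned long addr, unsigned long end)
	{
		struct mmu_gather tlb;

		tlb_gather_mmu(&tlb, vma->vm_mm, addr, end);
		tlb_start_vma(&tlb, vma);
		/* record the range and set tlb->cleared_pmds */
		tlb_flush_pmd_range(&tlb, addr, end - addr);
		tlb_end_vma(&tlb, vma);
		tlb_finish_mmu(&tlb, addr, end);	/* the actual flush */
	}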





* [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
                   ` (4 preceding siblings ...)
  2020-04-23 13:56 ` [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers Zhenyu Ye
@ 2020-04-23 13:56 ` Zhenyu Ye
  2020-05-26 14:56   ` Catalin Marinas
  2020-05-11 12:41 ` [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
  6 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-04-23 13:56 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

This patch uses the cleared_* fields in struct mmu_gather to set the
TTL field in flush_tlb_range().

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlb.h      | 29 ++++++++++++++++++++++++++++-
 arch/arm64/include/asm/tlbflush.h | 14 ++++++++------
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index b76df828e6b7..61c97d3b58c7 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -21,11 +21,37 @@ static void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
 
+/*
+ * Get the TLBI level for arm64.  The level is 0 if more than one of
+ * the cleared_* fields is set, or if none is set.
+ * arm64 does not support p4ds at the moment.
+ */
+static inline int tlb_get_level(struct mmu_gather *tlb)
+{
+	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
+				   tlb->cleared_puds ||
+				   tlb->cleared_p4ds))
+		return 3;
+
+	if (tlb->cleared_pmds && !(tlb->cleared_ptes ||
+				   tlb->cleared_puds ||
+				   tlb->cleared_p4ds))
+		return 2;
+
+	if (tlb->cleared_puds && !(tlb->cleared_ptes ||
+				   tlb->cleared_pmds ||
+				   tlb->cleared_p4ds))
+		return 1;
+
+	return 0;
+}
+
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 	struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
 	bool last_level = !tlb->freed_tables;
 	unsigned long stride = tlb_get_unmap_size(tlb);
+	int tlb_level = tlb_get_level(tlb);
 
 	/*
 	 * If we're tearing down the address space then we only care about
@@ -38,7 +64,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 		return;
 	}
 
-	__flush_tlb_range(&vma, tlb->start, tlb->end, stride, last_level);
+	__flush_tlb_range(&vma, tlb->start, tlb->end, stride,
+			  last_level, tlb_level);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 892f33235dc7..3cc705755a2d 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -215,7 +215,8 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
-				     unsigned long stride, bool last_level)
+				     unsigned long stride, bool last_level,
+				     int tlb_level)
 {
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
@@ -237,11 +238,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	dsb(ishst);
 	for (addr = start; addr < end; addr += stride) {
 		if (last_level) {
-			__tlbi_level(vale1is, addr, 0);
-			__tlbi_user_level(vale1is, addr, 0);
+			__tlbi_level(vale1is, addr, tlb_level);
+			__tlbi_user_level(vale1is, addr, tlb_level);
 		} else {
-			__tlbi_level(vae1is, addr, 0);
-			__tlbi_user_level(vae1is, addr, 0);
+			__tlbi_level(vae1is, addr, tlb_level);
+			__tlbi_user_level(vae1is, addr, tlb_level);
 		}
 	}
 	dsb(ish);
@@ -253,8 +254,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	/*
 	 * We cannot use leaf-only invalidation here, since we may be invalidating
 	 * table entries as part of collapsing hugepages or moving page tables.
+	 * Set the tlb_level to 0 because we cannot get enough information here.
 	 */
-	__flush_tlb_range(vma, start, end, PAGE_SIZE, false);
+	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
-- 
2.19.1
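A sketch of the hints this produces (the assertions are illustrative,
not part of the patch):

	struct mmu_gather tlb = {};

	tlb.cleared_pmds = 1;			/* e.g. a THP unmap */
	WARN_ON(tlb_get_level(&tlb) != 2);	/* PMD leaf -> level 2 */

	tlb.cleared_ptes = 1;			/* levels now mixed */
	WARN_ON(tlb_get_level(&tlb) != 0);	/* ambiguous -> no hint */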





* Re: [PATCH v2 0/6] arm64: tlb: add support for TTL feature
  2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
                   ` (5 preceding siblings ...)
  2020-04-23 13:56 ` [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range Zhenyu Ye
@ 2020-05-11 12:41 ` Zhenyu Ye
  6 siblings, 0 replies; 21+ messages in thread
From: Zhenyu Ye @ 2020-05-11 12:41 UTC (permalink / raw)
  To: peterz, mark.rutland, will, catalin.marinas, aneesh.kumar, akpm,
	npiggin, arnd, rostedt, maz, suzuki.poulose, tglx, yuzhao,
	Dave.Martin, steven.price, broonie, guohanjun
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Hi all,

How are things going with this patch series? Does anyone have any
suggestions?

Thanks,
Zhenyu

On 2020/4/23 21:56, Zhenyu Ye wrote:
> In order to reduce the cost of TLB invalidation, ARMv8.4 provides
> the TTL field in the TLBI instructions.  The TTL field indicates the
> level of the page table walk holding the leaf entry for the address
> being invalidated.  This series provides support for this feature.
> 
> When ARMv8.4-TTL is implemented, the operand for TLBIs looks like
> this:
> 
> * +----------+-------+----------------------+
> * |   ASID   |  TTL  |        BADDR         |
> * +----------+-------+----------------------+
> * |63      48|47   44|43                   0|
> 
> 
> This version updates some of the code according to Peter's
> suggestions and adds some commit messages.
> 
> See the patches for details. Thanks.
> 
> 
> --
> ChangeList:
> v2:
> rebase series on Linux 5.7-rc1 and simplify the code implementation.
> 
> v1:
> add support for TTL feature in arm64.
> 
> Marc Zyngier (2):
>   arm64: Detect the ARMv8.4 TTL feature
>   arm64: Add level-hinted TLB invalidation helper
> 
> Peter Zijlstra (Intel) (1):
>   tlb: mmu_gather: add tlb_flush_*_range APIs
> 
> Zhenyu Ye (3):
>   arm64: Add tlbi_user_level TLB invalidation helper
>   mm: tlb: Provide flush_*_tlb_range wrappers
>   arm64: tlb: Set the TTL field in flush_tlb_range
> 
>  arch/arm64/include/asm/cpucaps.h  |  3 +-
>  arch/arm64/include/asm/sysreg.h   |  1 +
>  arch/arm64/include/asm/tlb.h      | 29 +++++++++++++++-
>  arch/arm64/include/asm/tlbflush.h | 54 +++++++++++++++++++++++++-----
>  arch/arm64/kernel/cpufeature.c    | 11 +++++++
>  include/asm-generic/pgtable.h     | 12 +++++--
>  include/asm-generic/tlb.h         | 55 ++++++++++++++++++++++---------
>  mm/pgtable-generic.c              | 22 +++++++++++++
>  8 files changed, 160 insertions(+), 27 deletions(-)
> 




* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-04-23 13:56 ` [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers Zhenyu Ye
@ 2020-05-22 15:42   ` Catalin Marinas
  2020-05-25  7:19     ` Zhenyu Ye
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2020-05-22 15:42 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:55PM +0800, Zhenyu Ye wrote:
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 3d7c01e76efc..3eff199d3507 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -101,6 +101,28 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  
> +#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
> +
> +#define FLUSH_Pxx_TLB_RANGE(_pxx)					\
> +void flush_##_pxx##_tlb_range(struct vm_area_struct *vma,		\
> +			      unsigned long addr, unsigned long end)	\
> +{									\
> +		struct mmu_gather tlb;					\
> +									\
> +		tlb_gather_mmu(&tlb, vma->vm_mm, addr, end);		\
> +		tlb_start_vma(&tlb, vma);				\
> +		tlb_flush_##_pxx##_range(&tlb, addr, end - addr);	\
> +		tlb_end_vma(&tlb, vma);					\
> +		tlb_finish_mmu(&tlb, addr, end);			\
> +}

I may have confused myself (flush_p??_tlb_* vs. tlb_flush_p??_*) but do
we actually need this whole tlb_gather thing here? IIUC (by grep'ing),
flush_p?d_tlb_range() is only called on huge pages, so we should know
the level already.

-- 
Catalin



* Re: [PATCH v2 3/6] arm64: Add tlbi_user_level TLB invalidation helper
  2020-04-23 13:56 ` [PATCH v2 3/6] arm64: Add tlbi_user_level " Zhenyu Ye
@ 2020-05-22 15:49   ` Catalin Marinas
  2020-05-25  6:57     ` Zhenyu Ye
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2020-05-22 15:49 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:53PM +0800, Zhenyu Ye wrote:
> @@ -190,8 +196,8 @@ static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
>  	unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
>  
>  	dsb(ishst);
> -	__tlbi(vale1is, addr);
> -	__tlbi_user(vale1is, addr);
> +	__tlbi_level(vale1is, addr, 0);
> +	__tlbi_user_level(vale1is, addr, 0);
>  }

This one remains with a level 0 throughout the series. Is this
intentional? If we can't guarantee the level here, better to use the
non-level __tlbi().

-- 
Catalin



* Re: [PATCH v2 1/6] arm64: Detect the ARMv8.4 TTL feature
  2020-04-23 13:56 ` [PATCH v2 1/6] arm64: Detect the ARMv8.4 " Zhenyu Ye
@ 2020-05-22 15:50   ` Catalin Marinas
  0 siblings, 0 replies; 21+ messages in thread
From: Catalin Marinas @ 2020-05-22 15:50 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:51PM +0800, Zhenyu Ye wrote:
> From: Marc Zyngier <maz@kernel.org>
> 
> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> feature allows TLBIs to be issued with a level hint, allowing for
> quicker invalidation.
> 
> The TTL field indicates the level of the page table walk
> holding the leaf entry for the address being invalidated.
> 
> Let's detect the feature for now. Further patches will implement
> its actual usage.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper
  2020-04-23 13:56 ` [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper Zhenyu Ye
@ 2020-05-22 15:50   ` Catalin Marinas
  2020-05-25  6:54     ` Zhenyu Ye
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2020-05-22 15:50 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:52PM +0800, Zhenyu Ye wrote:
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index bc3949064725..5f9f189bc6d2 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -10,6 +10,7 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include <linux/bitfield.h>
>  #include <linux/mm_types.h>
>  #include <linux/sched.h>
>  #include <asm/cputype.h>
> @@ -59,6 +60,35 @@
>  		__ta;						\
>  	})
>  
> +#define TLBI_TTL_MASK	GENMASK_ULL(47, 44)
> +
> +#define __tlbi_level(op, addr, level)					\
> +	do {								\

Nitpick: move "do {" on the same line as __tlbi_level() to reduce the
indentation levels of the whole block.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs
  2020-04-23 13:56 ` [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs Zhenyu Ye
@ 2020-05-22 15:50   ` Catalin Marinas
  0 siblings, 0 replies; 21+ messages in thread
From: Catalin Marinas @ 2020-05-22 15:50 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:54PM +0800, Zhenyu Ye wrote:
> From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> 
> tlb_flush_{pte|pmd|pud|p4d}_range() adjust tlb->start and
> tlb->end, then set the corresponding cleared_* field.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper
  2020-05-22 15:50   ` Catalin Marinas
@ 2020-05-25  6:54     ` Zhenyu Ye
  0 siblings, 0 replies; 21+ messages in thread
From: Zhenyu Ye @ 2020-05-25  6:54 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On 2020/5/22 23:50, Catalin Marinas wrote:
> On Thu, Apr 23, 2020 at 09:56:52PM +0800, Zhenyu Ye wrote:
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index bc3949064725..5f9f189bc6d2 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -10,6 +10,7 @@
>>  
>>  #ifndef __ASSEMBLY__
>>  
>> +#include <linux/bitfield.h>
>>  #include <linux/mm_types.h>
>>  #include <linux/sched.h>
>>  #include <asm/cputype.h>
>> @@ -59,6 +60,35 @@
>>  		__ta;						\
>>  	})
>>  
>> +#define TLBI_TTL_MASK	GENMASK_ULL(47, 44)
>> +
>> +#define __tlbi_level(op, addr, level)					\
>> +	do {								\
> 
> Nitpick: move "do {" on the same line as __tlbi_level() to reduce the
> indentation levels of the whole block.
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> 

OK.




* Re: [PATCH v2 3/6] arm64: Add tlbi_user_level TLB invalidation helper
  2020-05-22 15:49   ` Catalin Marinas
@ 2020-05-25  6:57     ` Zhenyu Ye
  0 siblings, 0 replies; 21+ messages in thread
From: Zhenyu Ye @ 2020-05-25  6:57 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On 2020/5/22 23:49, Catalin Marinas wrote:
> On Thu, Apr 23, 2020 at 09:56:53PM +0800, Zhenyu Ye wrote:
>> @@ -190,8 +196,8 @@ static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
>>  	unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
>>  
>>  	dsb(ishst);
>> -	__tlbi(vale1is, addr);
>> -	__tlbi_user(vale1is, addr);
>> +	__tlbi_level(vale1is, addr, 0);
>> +	__tlbi_user_level(vale1is, addr, 0);
>>  }
> 
> This one remains with a level 0 throughout the series. Is this
> intentional? If we can't guarantee the level here, better to use the
> non-level __tlbi().
> 

OK, I will change it back to non-level __tlbi().

Thanks,
Zhenyu




* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-05-22 15:42   ` Catalin Marinas
@ 2020-05-25  7:19     ` Zhenyu Ye
  2020-05-26 14:52       ` Catalin Marinas
  0 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-05-25  7:19 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On 2020/5/22 23:42, Catalin Marinas wrote:
> On Thu, Apr 23, 2020 at 09:56:55PM +0800, Zhenyu Ye wrote:
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 3d7c01e76efc..3eff199d3507 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -101,6 +101,28 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
>>  
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  
>> +#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
>> +
>> +#define FLUSH_Pxx_TLB_RANGE(_pxx)					\
>> +void flush_##_pxx##_tlb_range(struct vm_area_struct *vma,		\
>> +			      unsigned long addr, unsigned long end)	\
>> +{									\
>> +		struct mmu_gather tlb;					\
>> +									\
>> +		tlb_gather_mmu(&tlb, vma->vm_mm, addr, end);		\
>> +		tlb_start_vma(&tlb, vma);				\
>> +		tlb_flush_##_pxx##_range(&tlb, addr, end - addr);	\
>> +		tlb_end_vma(&tlb, vma);					\
>> +		tlb_finish_mmu(&tlb, addr, end);			\
>> +}
> 
> I may have confused myself (flush_p??_tlb_* vs. tlb_flush_p??_*) but do
> we actually need this whole tlb_gather thing here? IIUC (by grep'ing),
> flush_p?d_tlb_range() is only called on huge pages, so we should know
> the level already.
> 

tlb_flush_##_pxx##_range() is used to set tlb->cleared_*, while
flush_##_pxx##_tlb_range() actually flushes the TLB entries.

On arm64, flush_p?d_tlb_range() is defined as:

	#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
	#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)

So even if we know the level here, we cannot pass the value to the TLBI
instructions (flush_tlb_range() is a common kernel interface and retrofitting
it needs lots of changes).  Following Peter's suggestion, I finally decided to
pass the TTL value through the tlb_gather_* framework. [1]

[1] https://lore.kernel.org/linux-arm-kernel/20200331142927.1237-1-yezhenyu2@huawei.com/

Thanks,
Zhenyu




* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-05-25  7:19     ` Zhenyu Ye
@ 2020-05-26 14:52       ` Catalin Marinas
  2020-05-30 10:24         ` Zhenyu Ye
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2020-05-26 14:52 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Mon, May 25, 2020 at 03:19:42PM +0800, Zhenyu Ye wrote:
> On 2020/5/22 23:42, Catalin Marinas wrote:
> > On Thu, Apr 23, 2020 at 09:56:55PM +0800, Zhenyu Ye wrote:
> >> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> >> index 3d7c01e76efc..3eff199d3507 100644
> >> --- a/mm/pgtable-generic.c
> >> +++ b/mm/pgtable-generic.c
> >> @@ -101,6 +101,28 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
> >>  
> >>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>  
> >> +#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
> >> +
> >> +#define FLUSH_Pxx_TLB_RANGE(_pxx)					\
> >> +void flush_##_pxx##_tlb_range(struct vm_area_struct *vma,		\
> >> +			      unsigned long addr, unsigned long end)	\
> >> +{									\
> >> +		struct mmu_gather tlb;					\
> >> +									\
> >> +		tlb_gather_mmu(&tlb, vma->vm_mm, addr, end);		\
> >> +		tlb_start_vma(&tlb, vma);				\
> >> +		tlb_flush_##_pxx##_range(&tlb, addr, end - addr);	\
> >> +		tlb_end_vma(&tlb, vma);					\
> >> +		tlb_finish_mmu(&tlb, addr, end);			\
> >> +}
> > 
> > I may have confused myself (flush_p??_tlb_* vs. tlb_flush_p??_*) but do
> > we actually need this whole tlb_gather thing here? IIUC (by grep'ing),
> > flush_p?d_tlb_range() is only called on huge pages, so we should know
> > the level already.
> 
> tlb_flush_##_pxx##_range() is used to set tlb->cleared_*, while
> flush_##_pxx##_tlb_range() actually flushes the TLB entries.
> 
> On arm64, flush_p?d_tlb_range() is defined as:
> 
> 	#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
> 	#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)

Currently, flush_p??_tlb_range() are generic and defined as above. I
think in the generic code they can remain an alias for
flush_tlb_range().

On arm64, we can redefine them as:

#define flush_pte_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 3)
#define flush_pmd_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 2)
#define flush_pud_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 1)
#define flush_p4d_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 0)

(unless the compiler optimises away all the mmu_gather stuff in your
macro above but they don't look trivial to me)

Also, I don't see the new flush_pte_* and flush_p4d_* macros used
anywhere and I don't think they are needed. The pte equivalent is
flush_tlb_page() (we need to make sure it's not used on a pmd in the
hugetlb context).

> So even if we know the level here, we cannot pass the value to the TLBI
> instructions (flush_tlb_range() is a common kernel interface and retrofitting
> it needs lots of changes).  Following Peter's suggestion, I finally decided to
> pass the TTL value through the tlb_gather_* framework. [1]

My comment was about the generic implementation using mmu_gather as you
are proposing. We don't need to change the flush_tlb_range() interface,
nor do we need to rewrite flush_p??_tlb_range().

-- 
Catalin



* Re: [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range
  2020-04-23 13:56 ` [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range Zhenyu Ye
@ 2020-05-26 14:56   ` Catalin Marinas
  0 siblings, 0 replies; 21+ messages in thread
From: Catalin Marinas @ 2020-05-26 14:56 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On Thu, Apr 23, 2020 at 09:56:56PM +0800, Zhenyu Ye wrote:
> This patch uses the cleared_* fields in struct mmu_gather to set the
> TTL field in flush_tlb_range().
> 
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-05-26 14:52       ` Catalin Marinas
@ 2020-05-30 10:24         ` Zhenyu Ye
  2020-06-01 11:56           ` Catalin Marinas
  0 siblings, 1 reply; 21+ messages in thread
From: Zhenyu Ye @ 2020-05-30 10:24 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

Hi Catalin,

Sorry for taking so long to reply to you.

On 2020/5/26 22:52, Catalin Marinas wrote:
> On Mon, May 25, 2020 at 03:19:42PM +0800, Zhenyu Ye wrote:
>>
>> tlb_flush_##_pxx##_range() is used to set tlb->cleared_*, while
>> flush_##_pxx##_tlb_range() actually flushes the TLB entries.
>>
>> On arm64, flush_p?d_tlb_range() is defined as:
>>
>> 	#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
>> 	#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
> 
> Currently, flush_p??_tlb_range() are generic and defined as above. I
> think in the generic code they can remain an alias for
> flush_tlb_range().
> 
> On arm64, we can redefine them as:
> 
> #define flush_pte_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 3)
> #define flush_pmd_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 2)
> #define flush_pud_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 1)
> #define flush_p4d_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 0)
> 
> (unless the compiler optimises away all the mmu_gather stuff in your
> macro above but they don't look trivial to me)
> 

I changed the generic code considering that other architectures, such as
Power9, may also use this feature. And Peter may want to replace all
flush_tlb_range() calls by tlb_flush() in the future, see [1] for details.

If we only enable this feature on aarch64, your code is better.

[1] https://lore.kernel.org/linux-arm-kernel/20200402163849.GM20713@hirez.programming.kicks-ass.net/

> Also, I don't see the new flush_pte_* and flush_p4d_* macros used
> anywhere and I don't think they are needed. The pte equivalent is
> flush_tlb_page() (we need to make sure it's not used on a pmd in the
> hugetlb context).
> 

flush_tlb_page() is used to flush only one page.  If we add the flush_pte_tlb_range(),
then we can use it to flush a range of pages in the future.

But the flush_pte_* and flush_p4d_* macros are really not used anywhere.
I will remove them in the next version of the series, and add them back
if someone needs them.

> >> So even if we know the level here, we cannot pass the value to the TLBI
> >> instructions (flush_tlb_range() is a common kernel interface and retrofitting
> >> it needs lots of changes).  Following Peter's suggestion, I finally decided to
> >> pass the TTL value through the tlb_gather_* framework. [1]
> 
> My comment was about the generic implementation using mmu_gather as you
> are proposing. We don't need to change the flush_tlb_range() interface,
> nor do we need to rewrite flush_p??_tlb_range().
> 

Thanks,
Zhenyu





* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-05-30 10:24         ` Zhenyu Ye
@ 2020-06-01 11:56           ` Catalin Marinas
  2020-06-01 13:36             ` Zhenyu Ye
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2020-06-01 11:56 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

Hi Zhenyu,

On Sat, May 30, 2020 at 06:24:21PM +0800, Zhenyu Ye wrote:
> On 2020/5/26 22:52, Catalin Marinas wrote:
> > On Mon, May 25, 2020 at 03:19:42PM +0800, Zhenyu Ye wrote:
> >> tlb_flush_##_pxx##_range() is used to set tlb->cleared_*, while
> >> flush_##_pxx##_tlb_range() actually flushes the TLB entries.
> >>
> >> On arm64, flush_p?d_tlb_range() is defined as:
> >>
> >> 	#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
> >> 	#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
> > 
> > Currently, flush_p??_tlb_range() are generic and defined as above. I
> > think in the generic code they can remain an alias for
> > flush_tlb_range().
> > 
> > On arm64, we can redefine them as:
> > 
> > #define flush_pte_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 3)
> > #define flush_pmd_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 2)
> > #define flush_pud_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 1)
> > #define flush_p4d_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 0)
> > 
> > (unless the compiler optimises away all the mmu_gather stuff in your
> > macro above but they don't look trivial to me)
> 
> I changed the generic code considering that other architectures, such as
> Power9, may also use this feature. And Peter may want to replace all
> flush_tlb_range() calls by tlb_flush() in the future, see [1] for details.
> 
> If we only enable this feature on aarch64, your code is better.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20200402163849.GM20713@hirez.programming.kicks-ass.net/

But we change the semantics slightly if we implement these as
mmu_gather. For example, tlb_end_vma() -> tlb_flush_mmu_tlbonly() ends
up calling mmu_notifier_invalidate_range() which it didn't before. I
think we end up invoking the notifier unnecessarily in some cases (see
the comment in __split_huge_pmd()) or we end up calling the notifier
twice (e.g. pmdp_huge_clear_flush_notify()).

> > Also, I don't see the new flush_pte_* and flush_p4d_* macros used
> > anywhere and I don't think they are needed. The pte equivalent is
> > flush_tlb_page() (we need to make sure it's not used on a pmd in the
> > hugetlb context).
> 
> flush_tlb_page() is used to flush only one page.  If we add the
> flush_pte_tlb_range(), then we can use it to flush a range of pages in
> the future.

If we know flush_tlb_page() is only called on a small page, could we add
TTL information here as well?

> But the flush_pte_* and flush_p4d_* macros are really not used anywhere. I
> will remove them in the next version of the series, and add them back if
> someone needs them.

I think it makes sense.

-- 
Catalin



* Re: [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers
  2020-06-01 11:56           ` Catalin Marinas
@ 2020-06-01 13:36             ` Zhenyu Ye
  0 siblings, 0 replies; 21+ messages in thread
From: Zhenyu Ye @ 2020-06-01 13:36 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: peterz, mark.rutland, will, aneesh.kumar, akpm, npiggin, arnd,
	rostedt, maz, suzuki.poulose, tglx, yuzhao, Dave.Martin,
	steven.price, broonie, guohanjun, linux-arm-kernel, linux-kernel,
	linux-arch, linux-mm, arm, xiexiangyou, prime.zeng, zhangshaokun,
	kuhn.chenqun

On 2020/6/1 19:56, Catalin Marinas wrote:
> Hi Zhenyu,
> 
> On Sat, May 30, 2020 at 06:24:21PM +0800, Zhenyu Ye wrote:
>> On 2020/5/26 22:52, Catalin Marinas wrote:
>>> On Mon, May 25, 2020 at 03:19:42PM +0800, Zhenyu Ye wrote:
>>>> tlb_flush_##_pxx##_range() is used to set tlb->cleared_*, while
>>>> flush_##_pxx##_tlb_range() actually flushes the TLB entries.
>>>>
>>>> On arm64, flush_p?d_tlb_range() is defined as:
>>>>
>>>> 	#define flush_pmd_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
>>>> 	#define flush_pud_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
>>>
>>> Currently, flush_p??_tlb_range() are generic and defined as above. I
>>> think in the generic code they can remain an alias for
>>> flush_tlb_range().
>>>
>>> On arm64, we can redefine them as:
>>>
>>> #define flush_pte_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 3)
>>> #define flush_pmd_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 2)
>>> #define flush_pud_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 1)
>>> #define flush_p4d_tlb_range(vma, addr, end)	__flush_tlb_range(vma, addr, end, 0)
>>>
>>> (unless the compiler optimises away all the mmu_gather stuff in your
>>> macro above but they don't look trivial to me)
>>
>> I changed the generic code considering that other architectures, such as
>> Power9, may also use this feature. And Peter may want to replace all
>> flush_tlb_range() calls by tlb_flush() in the future, see [1] for details.
>>
>> If we only enable this feature on aarch64, your code is better.
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/20200402163849.GM20713@hirez.programming.kicks-ass.net/
> 
> But we change the semantics slightly if we implement these as
> mmu_gather. For example, tlb_end_vma() -> tlb_flush_mmu_tlbonly() ends
> up calling mmu_notifier_invalidate_range() which it didn't before. I
> think we end up invoking the notifier unnecessarily in some cases (see
> the comment in __split_huge_pmd()) or we end up calling the notifier
> twice (e.g. pmdp_huge_clear_flush_notify()).
> 

Yes, so enabling this feature only on aarch64 may be better.
I will change this in v4 of this series. [the v3 only has some minor
changes and can be ignored :)]

>>> Also, I don't see the new flush_pte_* and flush_p4d_* macros used
>>> anywhere and I don't think they are needed. The pte equivalent is
>>> flush_tlb_page() (we need to make sure it's not used on a pmd in the
>>> hugetlb context).
>>
>> flush_tlb_page() is used to flush only one page.  If we add the
>> flush_pte_tlb_range(), then we can use it to flush a range of pages in
>> the future.
> 
> If we know flush_tlb_page() is only called on a small page, could we add
> TTL information here as well?
> 

Yes, we could. I will add this in flush_tlb_page().

>> But the flush_pte_* and flush_p4d_* macros are really not used anywhere. I
>> will remove them in the next version of the series, and add them back if
>> someone needs them.
> 
> I think it makes sense.
> 

Thanks,
Zhenyu




end of thread

Thread overview: 21+ messages
2020-04-23 13:56 [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
2020-04-23 13:56 ` [PATCH v2 1/6] arm64: Detect the ARMv8.4 " Zhenyu Ye
2020-05-22 15:50   ` Catalin Marinas
2020-04-23 13:56 ` [PATCH v2 2/6] arm64: Add level-hinted TLB invalidation helper Zhenyu Ye
2020-05-22 15:50   ` Catalin Marinas
2020-05-25  6:54     ` Zhenyu Ye
2020-04-23 13:56 ` [PATCH v2 3/6] arm64: Add tlbi_user_level " Zhenyu Ye
2020-05-22 15:49   ` Catalin Marinas
2020-05-25  6:57     ` Zhenyu Ye
2020-04-23 13:56 ` [PATCH v2 4/6] tlb: mmu_gather: add tlb_flush_*_range APIs Zhenyu Ye
2020-05-22 15:50   ` Catalin Marinas
2020-04-23 13:56 ` [PATCH v2 5/6] mm: tlb: Provide flush_*_tlb_range wrappers Zhenyu Ye
2020-05-22 15:42   ` Catalin Marinas
2020-05-25  7:19     ` Zhenyu Ye
2020-05-26 14:52       ` Catalin Marinas
2020-05-30 10:24         ` Zhenyu Ye
2020-06-01 11:56           ` Catalin Marinas
2020-06-01 13:36             ` Zhenyu Ye
2020-04-23 13:56 ` [PATCH v2 6/6] arm64: tlb: Set the TTL field in flush_tlb_range Zhenyu Ye
2020-05-26 14:56   ` Catalin Marinas
2020-05-11 12:41 ` [PATCH v2 0/6] arm64: tlb: add support for TTL feature Zhenyu Ye
