linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v3 0/2] arm64: tlb: add support for TLBI RANGE instructions
@ 2020-06-01 14:47 Zhenyu Ye
  2020-06-01 14:47 ` [RFC PATCH v4 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
  2020-06-01 14:47 ` [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  0 siblings, 2 replies; 9+ messages in thread
From: Zhenyu Ye @ 2020-06-01 14:47 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This series adds support for this feature.

--
ChangeList:
v4:
combine the classic and TLBI RANGE flushes directly into __flush_tlb_range(),
with a single loop for both.

v3:
rebase this series on Linux 5.7-rc1.

v2:
Link: https://lkml.org/lkml/2019/11/11/348

Zhenyu Ye (2):
  arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  arm64: tlb: Use the TLBI RANGE feature in arm64

 arch/arm64/include/asm/cpucaps.h  |   3 +-
 arch/arm64/include/asm/sysreg.h   |   4 ++
 arch/arm64/include/asm/tlbflush.h | 108 +++++++++++++++++++++++++++++-
 arch/arm64/kernel/cpufeature.c    |  11 +++
 4 files changed, 124 insertions(+), 2 deletions(-)

-- 
2.19.1




* [RFC PATCH v4 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  2020-06-01 14:47 [RFC PATCH v3 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
@ 2020-06-01 14:47 ` Zhenyu Ye
  2020-06-01 14:47 ` [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  1 sibling, 0 replies; 9+ messages in thread
From: Zhenyu Ye @ 2020-06-01 14:47 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This patch detects this feature.
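
For illustration, a minimal sketch of how the new capability can be tested
once it is detected, using the capability name added by this patch (the
stride check mirrors the use in patch 2/2):

--8<---
	/* A minimal sketch, not part of this patch: gate a range-based
	 * flush on the new system capability. */
	if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) && stride == PAGE_SIZE) {
		/* range-based invalidation (rvae1is/rvale1is) may be used */
	} else {
		/* fall back to the classic per-page tlbi loop */
	}
--8<---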

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  4 ++++
 arch/arm64/kernel/cpufeature.c   | 11 +++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8eb5a088ae65..950095a72617 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -61,7 +61,8 @@
 #define ARM64_HAS_AMU_EXTN			51
 #define ARM64_HAS_ADDRESS_AUTH			52
 #define ARM64_HAS_GENERIC_AUTH			53
+#define ARM64_HAS_TLBI_RANGE			54
 
-#define ARM64_NCAPS				54
+#define ARM64_NCAPS				55
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index c4ac0ac25a00..4cc3efd633bc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -594,6 +594,7 @@
 
 /* id_aa64isar0 */
 #define ID_AA64ISAR0_RNDR_SHIFT		60
+#define ID_AA64ISAR0_TLBI_RANGE_SHIFT	56
 #define ID_AA64ISAR0_TS_SHIFT		52
 #define ID_AA64ISAR0_FHM_SHIFT		48
 #define ID_AA64ISAR0_DP_SHIFT		44
@@ -607,6 +608,9 @@
 #define ID_AA64ISAR0_SHA1_SHIFT		8
 #define ID_AA64ISAR0_AES_SHIFT		4
 
+#define ID_AA64ISAR0_TLBI_RANGE_NI	0x0
+#define ID_AA64ISAR0_TLBI_RANGE		0x2
+
 /* id_aa64isar1 */
 #define ID_AA64ISAR1_I8MM_SHIFT		52
 #define ID_AA64ISAR1_DGH_SHIFT		48
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb..31bcfd0722b5 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -124,6 +124,7 @@ static bool __system_matches_cap(unsigned int n);
  */
 static const struct arm64_ftr_bits ftr_id_aa64isar0[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_RNDR_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_TLBI_RANGE_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_TS_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_FHM_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_DP_SHIFT, 4, 0),
@@ -1779,6 +1780,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.min_field_value = 1,
 	},
 #endif
+	{
+		.desc = "TLB range maintenance instruction",
+		.capability = ARM64_HAS_TLBI_RANGE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64ISAR0_EL1,
+		.field_pos = ID_AA64ISAR0_TLBI_RANGE_SHIFT,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = ID_AA64ISAR0_TLBI_RANGE,
+	},
 	{},
 };
 
-- 
2.19.1




* [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-06-01 14:47 [RFC PATCH v3 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
  2020-06-01 14:47 ` [RFC PATCH v4 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
@ 2020-06-01 14:47 ` Zhenyu Ye
  2020-06-02 12:06   ` Zhenyu Ye
  2020-07-07 17:36   ` Catalin Marinas
  1 sibling, 2 replies; 9+ messages in thread
From: Zhenyu Ye @ 2020-06-01 14:47 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

In this patch, we only use the TLBI RANGE feature when stride == PAGE_SIZE,
because when stride > PAGE_SIZE, usually only a small number of pages need
to be flushed and the classic tlbi instructions are more efficient.

We could also use 'end - start < threshold' to decide which way to go;
however, different hardware may have different thresholds, so I'm not
sure if this is feasible.
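
As a quick worked example of the range encoding introduced below, one TLBI
RANGE operation covers (NUM + 1) * 2^(5*SCALE + 1) pages; a minimal sketch,
assuming 4KB pages (PAGE_SHIFT = 12):

--8<---
#include <stdio.h>

/* A minimal sketch, not part of this patch: the span covered by one
 * TLBI RANGE operation for a given SCALE/NUM, assuming PAGE_SHIFT = 12. */
int main(void)
{
	unsigned long scale = 1, num = 4;
	unsigned long pages = (num + 1) << (5 * scale + 1);

	/* prints "320 pages = 1310720 bytes" */
	printf("%lu pages = %lu bytes\n", pages, pages << 12);
	return 0;
}
--8<---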

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlbflush.h | 98 +++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc3949064725..818f27c82024 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -50,6 +50,16 @@
 		__tlbi(op, (arg) | USER_ASID_FLAG);				\
 } while (0)
 
+#define __tlbi_last_level(op1, op2, arg, last_level) do {		\
+	if (last_level)	{						\
+		__tlbi(op1, arg);					\
+		__tlbi_user(op1, arg);					\
+	} else {							\
+		__tlbi(op2, arg);					\
+		__tlbi_user(op2, arg);					\
+	}								\
+} while (0)
+
 /* This macro creates a properly formatted VA operand for the TLBI */
 #define __TLBI_VADDR(addr, asid)				\
 	({							\
@@ -59,6 +69,47 @@
 		__ta;						\
 	})
 
+/*
+ * __TG defines translation granule of the system, which is decided by
+ * PAGE_SHIFT.  Used by TTL.
+ *  - 4KB	: 1
+ *  - 16KB	: 2
+ *  - 64KB	: 3
+ */
+#define __TG	((PAGE_SHIFT - 12) / 2 + 1)
+
+/*
+ * This macro creates a properly formatted VA operand for the TLBI RANGE.
+ * The value bit assignments are:
+ *
+ * +----------+------+-------+-------+-------+----------------------+
+ * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
+ * +----------+------+-------+-------+-------+----------------------+
+ * |63      48|47  46|45   44|43   39|38   37|36                   0|
+ *
+ * The address range is determined by below formula:
+ * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
+ *
+ */
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
+	({							\
+		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
+		__ta &= GENMASK_ULL(36, 0);			\
+		__ta |= (unsigned long)(ttl) << 37;		\
+		__ta |= (unsigned long)(num) << 39;		\
+		__ta |= (unsigned long)(scale) << 44;		\
+		__ta |= (unsigned long)(__TG) << 46;		\
+		__ta |= (unsigned long)(asid) << 48;		\
+		__ta;						\
+	})
+
+/* This macro defines the range pages of the TLBI RANGE. */
+#define __TLBI_RANGE_SIZES(num, scale)	((num + 1) << (5 * scale + 1) << PAGE_SHIFT)
+
+#define TLB_RANGE_MASK_SHIFT 5
+#define TLB_RANGE_MASK GENMASK_ULL(TLB_RANGE_MASK_SHIFT - 1, 0)
+
+
 /*
  *	TLB Invalidation
  *	================
@@ -181,32 +232,55 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
 				     unsigned long stride, bool last_level)
 {
+	int num = 0;
+	int scale = 0;
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
+	unsigned long range_pages;
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
+	range_pages = (end - start) >> PAGE_SHIFT;
 
 	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
 
-	/* Convert the stride into units of 4k */
-	stride >>= 12;
+	dsb(ishst);
 
-	start = __TLBI_VADDR(start, asid);
-	end = __TLBI_VADDR(end, asid);
+	/*
+	 * The minimum size of TLB RANGE is 2 pages;
+	 * Use normal TLB instruction to handle odd pages.
+	 * If the stride != PAGE_SIZE, this will never happen.
+	 */
+	if (range_pages % 2 == 1) {
+		addr = __TLBI_VADDR(start, asid);
+		__tlbi_last_level(vale1is, vae1is, addr, last_level);
+		start += 1 << PAGE_SHIFT;
+		range_pages >>= 1;
+	}
 
-	dsb(ishst);
-	for (addr = start; addr < end; addr += stride) {
-		if (last_level) {
-			__tlbi(vale1is, addr);
-			__tlbi_user(vale1is, addr);
-		} else {
-			__tlbi(vae1is, addr);
-			__tlbi_user(vae1is, addr);
+	while (range_pages > 0) {
+		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
+		    stride == PAGE_SIZE) {
+			num = (range_pages & TLB_RANGE_MASK) - 1;
+			if (num >= 0) {
+				addr = __TLBI_VADDR_RANGE(start, asid, scale,
+							  num, 0);
+				__tlbi_last_level(rvale1is, rvae1is, addr,
+						  last_level);
+				start += __TLBI_RANGE_SIZES(num, scale);
+			}
+			scale++;
+			range_pages >>= TLB_RANGE_MASK_SHIFT;
+			continue;
 		}
+
+		addr = __TLBI_VADDR(start, asid);
+		__tlbi_last_level(vale1is, vae1is, addr, last_level);
+		start += stride;
+		range_pages -= stride >> 12;
 	}
 	dsb(ish);
 }
-- 
2.19.1




* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-06-01 14:47 ` [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
@ 2020-06-02 12:06   ` Zhenyu Ye
  2020-07-07 17:36   ` Catalin Marinas
  1 sibling, 0 replies; 9+ messages in thread
From: Zhenyu Ye @ 2020-06-02 12:06 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Hi all,

Some optimizations to the code:

On 2020/6/1 22:47, Zhenyu Ye wrote:
> -	start = __TLBI_VADDR(start, asid);
> -	end = __TLBI_VADDR(end, asid);
> +	/*
> +	 * The minimum size of TLB RANGE is 2 pages;
> +	 * Use normal TLB instruction to handle odd pages.
> +	 * If the stride != PAGE_SIZE, this will never happen.
> +	 */
> +	if (range_pages % 2 == 1) {
> +		addr = __TLBI_VADDR(start, asid);
> +		__tlbi_last_level(vale1is, vae1is, addr, last_level);
> +		start += 1 << PAGE_SHIFT;
> +		range_pages >>= 1;
> +	}
>  

We flush a single page here, and the loop below does the same thing
when the CPU does not support the TLB RANGE feature.  So we may use a
goto statement to simplify the code.

> +	while (range_pages > 0) {
> +		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> +		    stride == PAGE_SIZE) {
> +			num = (range_pages & TLB_RANGE_MASK) - 1;
> +			if (num >= 0) {
> +				addr = __TLBI_VADDR_RANGE(start, asid, scale,
> +							  num, 0);
> +				__tlbi_last_level(rvale1is, rvae1is, addr,
> +						  last_level);
> +				start += __TLBI_RANGE_SIZES(num, scale);
> +			}
> +			scale++;
> +			range_pages >>= TLB_RANGE_MASK_SHIFT;
> +			continue;
>  		}
> +
> +		addr = __TLBI_VADDR(start, asid);
> +		__tlbi_last_level(vale1is, vae1is, addr, last_level);
> +		start += stride;
> +		range_pages -= stride >> 12;
>  	}
>  	dsb(ish);
>  }
> 

Something like this:

--8<---
	if (range_pages % 2 == 1)
		goto flush_single_tlb;

	while (range_pages > 0) {
		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
		    stride == PAGE_SIZE) {
			num = ((range_pages >> 1) & TLB_RANGE_MASK) - 1;
			if (num >= 0) {
				addr = __TLBI_VADDR_RANGE(start, asid, scale,
							  num, 0);
				__tlbi_last_level(rvale1is, rvae1is, addr,
						  last_level);
				start += __TLBI_RANGE_SIZES(num, scale);
			}
			scale++;
			range_pages >>= TLB_RANGE_MASK_SHIFT;
			continue;
		}

flush_single_tlb:
		addr = __TLBI_VADDR(start, asid);
		__tlbi_last_level(vale1is, vae1is, addr, last_level);
		start += stride;
		range_pages -= stride >> PAGE_SHIFT;
	}
--8<---





* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-06-01 14:47 ` [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  2020-06-02 12:06   ` Zhenyu Ye
@ 2020-07-07 17:36   ` Catalin Marinas
  2020-07-07 17:43     ` Marc Zyngier
  2020-07-08  9:01     ` Zhenyu Ye
  1 sibling, 2 replies; 9+ messages in thread
From: Catalin Marinas @ 2020-07-07 17:36 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Mon, Jun 01, 2020 at 10:47:13PM +0800, Zhenyu Ye wrote:
> @@ -59,6 +69,47 @@
>  		__ta;						\
>  	})
>  
> +/*
> + * __TG defines translation granule of the system, which is decided by
> + * PAGE_SHIFT.  Used by TTL.
> + *  - 4KB	: 1
> + *  - 16KB	: 2
> + *  - 64KB	: 3
> + */
> +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)

Nitpick: maybe something like __TLBI_TG to avoid clashes in case someone
else defines a __TG macro.

> @@ -181,32 +232,55 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
>  				     unsigned long start, unsigned long end,
>  				     unsigned long stride, bool last_level)
>  {
> +	int num = 0;
> +	int scale = 0;
>  	unsigned long asid = ASID(vma->vm_mm);
>  	unsigned long addr;
> +	unsigned long range_pages;
>  
>  	start = round_down(start, stride);
>  	end = round_up(end, stride);
> +	range_pages = (end - start) >> PAGE_SHIFT;
>  
>  	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
>  		flush_tlb_mm(vma->vm_mm);
>  		return;
>  	}
>  
> -	/* Convert the stride into units of 4k */
> -	stride >>= 12;
> +	dsb(ishst);
>  
> -	start = __TLBI_VADDR(start, asid);
> -	end = __TLBI_VADDR(end, asid);
> +	/*
> +	 * The minimum size of TLB RANGE is 2 pages;
> +	 * Use normal TLB instruction to handle odd pages.
> +	 * If the stride != PAGE_SIZE, this will never happen.
> +	 */
> +	if (range_pages % 2 == 1) {
> +		addr = __TLBI_VADDR(start, asid);
> +		__tlbi_last_level(vale1is, vae1is, addr, last_level);
> +		start += 1 << PAGE_SHIFT;
> +		range_pages >>= 1;
> +	}

Shouldn't this be range_pages-- or -= stride >> 12? Your goto follow-up
fixes this, though I'm not a big fan of gotos jumping into the middle of
a loop.

> -	dsb(ishst);
> -	for (addr = start; addr < end; addr += stride) {
> -		if (last_level) {
> -			__tlbi(vale1is, addr);
> -			__tlbi_user(vale1is, addr);
> -		} else {
> -			__tlbi(vae1is, addr);
> -			__tlbi_user(vae1is, addr);
> +	while (range_pages > 0) {
> +		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> +		    stride == PAGE_SIZE) {

I think we could have the odd range_pages check here:

		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
		    stride == PAGE_SIZE && range_pages % 2 == 0) {

and avoid the one outside the loop.

> +			num = (range_pages & TLB_RANGE_MASK) - 1;
> +			if (num >= 0) {
> +				addr = __TLBI_VADDR_RANGE(start, asid, scale,
> +							  num, 0);
> +				__tlbi_last_level(rvale1is, rvae1is, addr,
> +						  last_level);
> +				start += __TLBI_RANGE_SIZES(num, scale);
> +			}
> +			scale++;
> +			range_pages >>= TLB_RANGE_MASK_SHIFT;
> +			continue;
>  		}
> +
> +		addr = __TLBI_VADDR(start, asid);
> +		__tlbi_last_level(vale1is, vae1is, addr, last_level);
> +		start += stride;
> +		range_pages -= stride >> 12;
>  	}
>  	dsb(ish);
>  }
> -- 
> 2.19.1

-- 
Catalin


* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-07 17:36   ` Catalin Marinas
@ 2020-07-07 17:43     ` Marc Zyngier
  2020-07-07 17:46       ` Catalin Marinas
  2020-07-08  9:01     ` Zhenyu Ye
  1 sibling, 1 reply; 9+ messages in thread
From: Marc Zyngier @ 2020-07-07 17:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Zhenyu Ye, will, suzuki.poulose, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On 2020-07-07 18:36, Catalin Marinas wrote:
> On Mon, Jun 01, 2020 at 10:47:13PM +0800, Zhenyu Ye wrote:
>> @@ -59,6 +69,47 @@
>>  		__ta;						\
>>  	})
>> 
>> +/*
>> + * __TG defines translation granule of the system, which is decided 
>> by
>> + * PAGE_SHIFT.  Used by TTL.
>> + *  - 4KB	: 1
>> + *  - 16KB	: 2
>> + *  - 64KB	: 3
>> + */
>> +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)
> 
> Nitpick: maybe something like __TLBI_TG to avoid clashes in case 
> someone
> else defines a __TG macro.

I have commented on this in the past, and still maintain that this
would be better served by a switch statement similar to what is used
for TTL already (I don't think this magic formula exists in the
ARM ARM).

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-07 17:43     ` Marc Zyngier
@ 2020-07-07 17:46       ` Catalin Marinas
  2020-07-07 18:12         ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2020-07-07 17:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Zhenyu Ye, will, suzuki.poulose, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Tue, Jul 07, 2020 at 06:43:35PM +0100, Marc Zyngier wrote:
> On 2020-07-07 18:36, Catalin Marinas wrote:
> > On Mon, Jun 01, 2020 at 10:47:13PM +0800, Zhenyu Ye wrote:
> > > @@ -59,6 +69,47 @@
> > >  		__ta;						\
> > >  	})
> > > 
> > > +/*
> > > + * __TG defines translation granule of the system, which is decided
> > > by
> > > + * PAGE_SHIFT.  Used by TTL.
> > > + *  - 4KB	: 1
> > > + *  - 16KB	: 2
> > > + *  - 64KB	: 3
> > > + */
> > > +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)
> > 
> > Nitpick: maybe something like __TLBI_TG to avoid clashes in case someone
> > else defines a __TG macro.
> 
> I have commented on this in the past, and still maintain that this
> would be better served by a switch statement similar to what is used
> for TTL already (I don't think this magic formula exists in the
> ARM ARM).

Good point, it would be cleaner indeed.

-- 
Catalin


* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-07 17:46       ` Catalin Marinas
@ 2020-07-07 18:12         ` Robin Murphy
  0 siblings, 0 replies; 9+ messages in thread
From: Robin Murphy @ 2020-07-07 18:12 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier
  Cc: linux-arch, suzuki.poulose, Zhenyu Ye, linux-kernel, xiexiangyou,
	steven.price, zhangshaokun, linux-mm, arm, prime.zeng, guohanjun,
	olof, kuhn.chenqun, will, linux-arm-kernel

On 2020-07-07 18:46, Catalin Marinas wrote:
> On Tue, Jul 07, 2020 at 06:43:35PM +0100, Marc Zyngier wrote:
>> On 2020-07-07 18:36, Catalin Marinas wrote:
>>> On Mon, Jun 01, 2020 at 10:47:13PM +0800, Zhenyu Ye wrote:
>>>> @@ -59,6 +69,47 @@
>>>>   		__ta;						\
>>>>   	})
>>>>
>>>> +/*
>>>> + * __TG defines translation granule of the system, which is decided
>>>> by
>>>> + * PAGE_SHIFT.  Used by TTL.
>>>> + *  - 4KB	: 1
>>>> + *  - 16KB	: 2
>>>> + *  - 64KB	: 3
>>>> + */
>>>> +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)
>>>
>>> Nitpick: maybe something like __TLBI_TG to avoid clashes in case someone
>>> else defines a __TG macro.
>>
>> I have commented on this in the past, and still maintain that this
>> would be better served by a switch statement similar to what is used
>> for TTL already (I don't think this magic formula exists in the
>> ARM ARM).
> 
> Good point, it would be cleaner indeed.

FWIW, we use the somewhat obtuse "(shift - 10) / 2" in the SMMUv3 
driver, but that's because we support multiple different granules at 
runtime and want to generate efficient code. Anything based on 
PAGE_SHIFT that resolves to a compile-time constant has no excuse for 
not being written in a clear and obvious manner ;)
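
For what it's worth, the two expressions do agree for the three granules; a
throwaway check, assuming granule shifts 12, 14 and 16:

--8<---
/* A throwaway check, not from either driver: both expressions give
 * TG = 1, 2, 3 for granule shifts 12, 14 and 16 (4KB/16KB/64KB). */
#include <assert.h>

int main(void)
{
	for (int shift = 12; shift <= 16; shift += 2)
		assert((shift - 10) / 2 == (shift - 12) / 2 + 1);
	return 0;
}
--8<---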

Robin.


* Re: [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-07 17:36   ` Catalin Marinas
  2020-07-07 17:43     ` Marc Zyngier
@ 2020-07-08  9:01     ` Zhenyu Ye
  1 sibling, 0 replies; 9+ messages in thread
From: Zhenyu Ye @ 2020-07-08  9:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Hi Catalin,

On 2020/7/8 1:36, Catalin Marinas wrote:
> On Mon, Jun 01, 2020 at 10:47:13PM +0800, Zhenyu Ye wrote:
>> @@ -59,6 +69,47 @@
>>  		__ta;						\
>>  	})
>>  
>> +/*
>> + * __TG defines translation granule of the system, which is decided by
>> + * PAGE_SHIFT.  Used by TTL.
>> + *  - 4KB	: 1
>> + *  - 16KB	: 2
>> + *  - 64KB	: 3
>> + */
>> +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)
> 
> Nitpick: maybe something like __TLBI_TG to avoid clashes in case someone
> else defines a __TG macro.
> 

Thanks for your review.  Following Marc and Robin's suggestion, I will remove
this macro.  I'd like to implement this in a function because it is used by
both TTL and TLB RANGE.
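
A minimal sketch of what such a helper might look like (the name here is just
a placeholder, not taken from this thread):

--8<---
/* A minimal sketch, with a placeholder name: a switch-based helper
 * returning the TG encoding, usable by both TTL and TLBI RANGE. */
static inline unsigned long get_trans_granule(void)
{
	switch (PAGE_SIZE) {
	case SZ_4K:
		return 1;
	case SZ_16K:
		return 2;
	case SZ_64K:
		return 3;
	default:
		return 0;
	}
}
--8<---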

>> -	for (addr = start; addr < end; addr += stride) {
>> -		if (last_level) {
>> -			__tlbi(vale1is, addr);
>> -			__tlbi_user(vale1is, addr);
>> -		} else {
>> -			__tlbi(vae1is, addr);
>> -			__tlbi_user(vae1is, addr);
>> +	while (range_pages > 0) {
>> +		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
>> +		    stride == PAGE_SIZE) {
> 
> I think we could have the odd range_pages check here:
> 
> 		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> 		    stride == PAGE_SIZE && range_pages % 2 == 0) {
> 
> and avoid the one outside the loop.
> 

This will need a few other changes as well; see the next version of the series.

Thanks,
Zhenyu




Thread overview: 9 messages
2020-06-01 14:47 [RFC PATCH v3 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
2020-06-01 14:47 ` [RFC PATCH v4 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
2020-06-01 14:47 ` [RFC PATCH v4 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
2020-06-02 12:06   ` Zhenyu Ye
2020-07-07 17:36   ` Catalin Marinas
2020-07-07 17:43     ` Marc Zyngier
2020-07-07 17:46       ` Catalin Marinas
2020-07-07 18:12         ` Robin Murphy
2020-07-08  9:01     ` Zhenyu Ye
