* [RFC PATCH v5 0/2] arm64: tlb: add support for TLBI RANGE instructions
@ 2020-07-08 12:40 Zhenyu Ye
  2020-07-08 12:40 ` [RFC PATCH v5 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
  2020-07-08 12:40 ` [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  0 siblings, 2 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-07-08 12:40 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This series adds support for this feature.

I tested this feature on an FPGA machine whose CPUs support the TLBI
range instructions. As the number of pages grows, the performance
improves significantly; at 256 pages, the range-based flush is more
than 10 times faster (11089 vs. 145514).

Below is the test data when the stride = PTE (lower is better):

	[page num]	[classic]		[tlbi range]
	1		16051			13524
	2		11366			11146
	3		11582			12171
	4		11694			11101
	5		12138			12267
	6		12290			11105
	7		12400			12002
	8		12837			11097
	9		14791			12140
	10		15461			11087
	16		18233			11094
	32		26983			11079
	64		43840			11092
	128		77754			11098
	256		145514			11089
	512		280932			11111
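
(Note the scaling: the classic loop grows roughly linearly with the
page count -- about 530 per page at the high end, e.g.
(280932 - 145514) / 256 = 529 -- while the range-based flush stays flat
around 11k regardless of the count, i.e. 280932 / 11111 ~= 25x at 512
pages.)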

See more details in:

https://lore.kernel.org/linux-arm-kernel/504c7588-97e5-e014-fca0-c5511ae0d256@huawei.com/

--
ChangeList:
v5:
- rebase this series on Linux 5.8-rc4.
- remove the __TG macro.
- move the odd range_pages check into the loop.

v4:
combine __flush_tlb_range() and its TLBI-range counterpart into the
same function, with a single loop for both.

v3:
rebase this series on Linux 5.7-rc1.

v2:
Link: https://lkml.org/lkml/2019/11/11/348

Zhenyu Ye (2):
  arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  arm64: tlb: Use the TLBI RANGE feature in arm64

 arch/arm64/include/asm/cpucaps.h  |   3 +-
 arch/arm64/include/asm/sysreg.h   |   3 +
 arch/arm64/include/asm/tlbflush.h | 101 +++++++++++++++++++++++++-----
 arch/arm64/kernel/cpufeature.c    |  10 +++
 4 files changed, 102 insertions(+), 15 deletions(-)

-- 
2.19.1

* [RFC PATCH v5 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  2020-07-08 12:40 [RFC PATCH v5 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
@ 2020-07-08 12:40 ` Zhenyu Ye
  2020-07-08 12:40 ` [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  1 sibling, 0 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-07-08 12:40 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This patch detects this feature.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  3 +++
 arch/arm64/kernel/cpufeature.c   | 10 ++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index d7b3bb0cb180..96fe898bfb5f 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -62,7 +62,8 @@
 #define ARM64_HAS_GENERIC_AUTH			52
 #define ARM64_HAS_32BIT_EL1			53
 #define ARM64_BTI				54
+#define ARM64_HAS_TLBI_RANGE			55
 
-#define ARM64_NCAPS				55
+#define ARM64_NCAPS				56
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 463175f80341..b4eb2e5601f2 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -617,6 +617,9 @@
 #define ID_AA64ISAR0_SHA1_SHIFT		8
 #define ID_AA64ISAR0_AES_SHIFT		4
 
+#define ID_AA64ISAR0_TLBI_RANGE_NI	0x0
+#define ID_AA64ISAR0_TLBI_RANGE		0x2
+
 /* id_aa64isar1 */
 #define ID_AA64ISAR1_I8MM_SHIFT		52
 #define ID_AA64ISAR1_DGH_SHIFT		48
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fae0efc80c1..5491bf47e62c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2058,6 +2058,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 	},
 #endif
+	{
+		.desc = "TLB range maintenance instruction",
+		.capability = ARM64_HAS_TLBI_RANGE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64ISAR0_EL1,
+		.field_pos = ID_AA64ISAR0_TLB_SHIFT,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = ID_AA64ISAR0_TLBI_RANGE,
+	},
 	{},
 };
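
For reference, a minimal sketch (not part of this patch) of how other
kernel code consumes the capability registered above; patch 2/2
contains the real user:

	if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE)) {
		/* CPU implements the ARMv8.4 TLBI range operations */
	} else {
		/* fall back to per-page TLBI VA*IS loops */
	}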
 
-- 
2.19.1

* [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-08 12:40 [RFC PATCH v5 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
  2020-07-08 12:40 ` [RFC PATCH v5 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
@ 2020-07-08 12:40 ` Zhenyu Ye
  2020-07-08 18:24   ` Catalin Marinas
  1 sibling, 1 reply; 5+ messages in thread
From: Zhenyu Ye @ 2020-07-08 12:40 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

In this patch, we only use the TLBI RANGE feature if the stride == PAGE_SIZE,
because when stride > PAGE_SIZE, usually only a small number of pages need
to be flushed and classic TLBI instructions are more effective.

We could also use 'end - start < threshold' to decide which way to go;
however, different hardware may have different thresholds, so I'm not
sure this is feasible.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlbflush.h | 104 ++++++++++++++++++++++++++----
 1 file changed, 90 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc3949064725..30975ddb8f06 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -50,6 +50,16 @@
 		__tlbi(op, (arg) | USER_ASID_FLAG);				\
 } while (0)
 
+#define __tlbi_last_level(op1, op2, arg, last_level) do {		\
+	if (last_level)	{						\
+		__tlbi(op1, arg);					\
+		__tlbi_user(op1, arg);					\
+	} else {							\
+		__tlbi(op2, arg);					\
+		__tlbi_user(op2, arg);					\
+	}								\
+} while (0)
+
 /* This macro creates a properly formatted VA operand for the TLBI */
 #define __TLBI_VADDR(addr, asid)				\
 	({							\
@@ -59,6 +69,60 @@
 		__ta;						\
 	})
 
+/*
+ * Get translation granule of the system, which is decided by
+ * PAGE_SIZE.  Used by TTL.
+ *  - 4KB	: 1
+ *  - 16KB	: 2
+ *  - 64KB	: 3
+ */
+static inline unsigned long get_trans_granule(void)
+{
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		return 1;
+	case SZ_16K:
+		return 2;
+	case SZ_64K:
+		return 3;
+	default:
+		return 0;
+	}
+}
+
+/*
+ * This macro creates a properly formatted VA operand for the TLBI RANGE.
+ * The value bit assignments are:
+ *
+ * +----------+------+-------+-------+-------+----------------------+
+ * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
+ * +----------+------+-------+-------+-------+----------------------+
+ * |63      48|47  46|45   44|43   39|38   37|36                   0|
+ *
+ * The address range is determined by the formula below:
+ * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
+ *
+ */
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
+	({							\
+		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
+		__ta &= GENMASK_ULL(36, 0);			\
+		__ta |= (unsigned long)(ttl) << 37;		\
+		__ta |= (unsigned long)(num) << 39;		\
+		__ta |= (unsigned long)(scale) << 44;		\
+		__ta |= get_trans_granule() << 46;		\
+		__ta |= (unsigned long)(asid) << 48;		\
+		__ta;						\
+	})
+
+/* These macros are used by the TLBI RANGE feature. */
+#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
+#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
+
+#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
+#define __TLBI_RANGE_NUM(range, scale)	\
+	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
+
 /*
  *	TLB Invalidation
  *	================
@@ -181,32 +245,44 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
 				     unsigned long stride, bool last_level)
 {
+	int num = 0;
+	int scale = 0;
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
+	unsigned long range_pages;
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
+	range_pages = (end - start) >> PAGE_SHIFT;
 
-	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
+	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
+	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
+	    range_pages >= MAX_TLBI_RANGE_PAGES) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
 
-	/* Convert the stride into units of 4k */
-	stride >>= 12;
-
-	start = __TLBI_VADDR(start, asid);
-	end = __TLBI_VADDR(end, asid);
-
 	dsb(ishst);
-	for (addr = start; addr < end; addr += stride) {
-		if (last_level) {
-			__tlbi(vale1is, addr);
-			__tlbi_user(vale1is, addr);
-		} else {
-			__tlbi(vae1is, addr);
-			__tlbi_user(vae1is, addr);
+	while (range_pages > 0) {
+		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
+		    stride == PAGE_SIZE && range_pages % 2 == 0) {
+			num = __TLBI_RANGE_NUM(range_pages, scale) - 1;
+			if (num >= 0) {
+				addr = __TLBI_VADDR_RANGE(start, asid, scale,
+							  num, 0);
+				__tlbi_last_level(rvale1is, rvae1is, addr,
+						  last_level);
+				start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
+				range_pages -= __TLBI_RANGE_PAGES(num, scale);
+			}
+			scale++;
+			continue;
 		}
+
+		addr = __TLBI_VADDR(start, asid);
+		__tlbi_last_level(vale1is, vae1is, addr, last_level);
+		start += stride;
+		range_pages -= stride >> PAGE_SHIFT;
 	}
 	dsb(ish);
 }
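
Not part of the patch: a self-contained userspace sketch of the
(scale, num) decomposition the loop above performs for the
stride == PAGE_SIZE case, using the same macros. The scale <= 3 bound
and the final printf stand in for the kernel's MAX_TLBI_RANGE_PAGES
bail-out and the per-page fallback:

	#include <stdio.h>
	#include <stdlib.h>

	#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
	/* 0x1f is TLBI_RANGE_MASK, i.e. GENMASK_ULL(4, 0) */
	#define __TLBI_RANGE_NUM(range, scale)	\
		(((range) >> (5 * (scale) + 1)) & 0x1f)

	int main(int argc, char **argv)
	{
		unsigned long pages = argc > 1 ? strtoul(argv[1], NULL, 0) : 256;
		int scale = 0;

		while (pages > 0 && scale <= 3) {
			/* a field value of 0 maps to num == -1: no op at this scale */
			int num = (int)__TLBI_RANGE_NUM(pages, scale) - 1;

			if (num >= 0) {
				printf("scale=%d num=%2d -> %d pages\n", scale,
				       num, __TLBI_RANGE_PAGES(num, scale));
				pages -= __TLBI_RANGE_PAGES(num, scale);
			}
			scale++;
		}
		if (pages)
			printf("%lu page(s) left for per-page TLBI\n", pages);
		return 0;
	}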
-- 
2.19.1

* Re: [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-08 12:40 ` [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
@ 2020-07-08 18:24   ` Catalin Marinas
  2020-07-09  6:51     ` Zhenyu Ye
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2020-07-08 18:24 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Wed, Jul 08, 2020 at 08:40:31PM +0800, Zhenyu Ye wrote:
> Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().
> 
> In this patch, we only use the TLBI RANGE feature if the stride == PAGE_SIZE,
> because when stride > PAGE_SIZE, usually only a small number of pages need
> to be flushed and classic TLBI instructions are more effective.

Why are they more effective? I guess a range op would work on this as
well, say unmapping a large THP range. If we ignore this stride ==
PAGE_SIZE, it could make the code easier to read.

> We could also use 'end - start < threshold' to decide which way to go;
> however, different hardware may have different thresholds, so I'm not
> sure this is feasible.
> 
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
> ---
>  arch/arm64/include/asm/tlbflush.h | 104 ++++++++++++++++++++++++++----
>  1 file changed, 90 insertions(+), 14 deletions(-)

Could you please rebase these patches on top of the arm64 for-next/tlbi
branch:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi

> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index bc3949064725..30975ddb8f06 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -50,6 +50,16 @@
>  		__tlbi(op, (arg) | USER_ASID_FLAG);				\
>  } while (0)
>  
> +#define __tlbi_last_level(op1, op2, arg, last_level) do {		\
> +	if (last_level)	{						\
> +		__tlbi(op1, arg);					\
> +		__tlbi_user(op1, arg);					\
> +	} else {							\
> +		__tlbi(op2, arg);					\
> +		__tlbi_user(op2, arg);					\
> +	}								\
> +} while (0)
> +
>  /* This macro creates a properly formatted VA operand for the TLBI */
>  #define __TLBI_VADDR(addr, asid)				\
>  	({							\
> @@ -59,6 +69,60 @@
>  		__ta;						\
>  	})
>  
> +/*
> + * Get translation granule of the system, which is decided by
> + * PAGE_SIZE.  Used by TTL.
> + *  - 4KB	: 1
> + *  - 16KB	: 2
> + *  - 64KB	: 3
> + */
> +static inline unsigned long get_trans_granule(void)
> +{
> +	switch (PAGE_SIZE) {
> +	case SZ_4K:
> +		return 1;
> +	case SZ_16K:
> +		return 2;
> +	case SZ_64K:
> +		return 3;
> +	default:
> +		return 0;
> +	}
> +}

Maybe you can factor out this switch statement in the for-next/tlbi
branch to be shared with TTL.

> +/*
> + * This macro creates a properly formatted VA operand for the TLBI RANGE.
> + * The value bit assignments are:
> + *
> + * +----------+------+-------+-------+-------+----------------------+
> + * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
> + * +----------+------+-------+-------+-------+----------------------+
> + * |63      48|47  46|45   44|43   39|38   37|36                   0|
> + *
> + * The address range is determined by the formula below:
> + * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
> + *
> + */
> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\

I don't see a non-zero ttl passed to this macro but I suspect this would
change if based on top of the TTL patches.

> +	({							\
> +		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> +		__ta &= GENMASK_ULL(36, 0);			\
> +		__ta |= (unsigned long)(ttl) << 37;		\
> +		__ta |= (unsigned long)(num) << 39;		\
> +		__ta |= (unsigned long)(scale) << 44;		\
> +		__ta |= get_trans_granule() << 46;		\
> +		__ta |= (unsigned long)(asid) << 48;		\
> +		__ta;						\
> +	})
> +
> +/* These macros are used by the TLBI RANGE feature. */
> +#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
> +#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
> +
> +#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
> +#define __TLBI_RANGE_NUM(range, scale)	\
> +	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
> +
>  /*
>   *	TLB Invalidation
>   *	================
> @@ -181,32 +245,44 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
>  				     unsigned long start, unsigned long end,
>  				     unsigned long stride, bool last_level)
>  {
> +	int num = 0;
> +	int scale = 0;
>  	unsigned long asid = ASID(vma->vm_mm);
>  	unsigned long addr;
> +	unsigned long range_pages;
>  
>  	start = round_down(start, stride);
>  	end = round_up(end, stride);
> +	range_pages = (end - start) >> PAGE_SHIFT;
>  
> -	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
> +	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> +	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
> +	    range_pages >= MAX_TLBI_RANGE_PAGES) {
>  		flush_tlb_mm(vma->vm_mm);
>  		return;
>  	}

Is there any value in this range_pages check here? What's the value of
MAX_TLBI_RANGE_PAGES? If we have TLBI range ops, we make a decision here
but without including the stride. Further down we use the stride to skip
the TLBI range ops.

>  
> -	/* Convert the stride into units of 4k */
> -	stride >>= 12;
> -
> -	start = __TLBI_VADDR(start, asid);
> -	end = __TLBI_VADDR(end, asid);
> -
>  	dsb(ishst);
> -	for (addr = start; addr < end; addr += stride) {
> -		if (last_level) {
> -			__tlbi(vale1is, addr);
> -			__tlbi_user(vale1is, addr);
> -		} else {
> -			__tlbi(vae1is, addr);
> -			__tlbi_user(vae1is, addr);
> +	while (range_pages > 0) {

BTW, I think we can even drop the "range_" from range_pages; it's just
the number of pages.

> +		if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> +		    stride == PAGE_SIZE && range_pages % 2 == 0) {
> +			num = __TLBI_RANGE_NUM(range_pages, scale) - 1;
> +			if (num >= 0) {
> +				addr = __TLBI_VADDR_RANGE(start, asid, scale,
> +							  num, 0);
> +				__tlbi_last_level(rvale1is, rvae1is, addr,
> +						  last_level);
> +				start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
> +				range_pages -= __TLBI_RANGE_PAGES(num, scale);
> +			}
> +			scale++;
> +			continue;
>  		}
> +
> +		addr = __TLBI_VADDR(start, asid);
> +		__tlbi_last_level(vale1is, vae1is, addr, last_level);
> +		start += stride;
> +		range_pages -= stride >> PAGE_SHIFT;
>  	}
>  	dsb(ish);
>  }

I think the algorithm is correct, though I need to work it out on a
piece of paper.

The code could benefit from some comments (above the loop) on how the
range is built and the right scale found.
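
(A worked example, not from the patch: for range_pages = 256 and
stride == PAGE_SIZE the loop resolves as

	scale 0: num = ((256 >> 1) & 0x1f) - 1 = -1  -> no op, scale++
	scale 1: num = ((256 >> 6) & 0x1f) - 1 = 3   -> one range op covering
	         (3 + 1) << 6 = 256 pages, range_pages -> 0

so the whole 256-page flush collapses into a single TLBI range
operation, consistent with the flat ~11k numbers in the cover letter.)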

-- 
Catalin

* Re: [RFC PATCH v5 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-08 18:24   ` Catalin Marinas
@ 2020-07-09  6:51     ` Zhenyu Ye
  0 siblings, 0 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-07-09  6:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On 2020/7/9 2:24, Catalin Marinas wrote:
> On Wed, Jul 08, 2020 at 08:40:31PM +0800, Zhenyu Ye wrote:
>> Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().
>>
>> In this patch, we only use the TLBI RANGE feature if the stride == PAGE_SIZE,
>> because when stride > PAGE_SIZE, usually only a small number of pages need
>> to be flushed and classic TLBI instructions are more effective.
> 
> Why are they more effective? I guess a range op would work on this as
> well, say unmapping a large THP range. If we ignore this stride ==
> PAGE_SIZE, it could make the code easier to read.
> 

OK, I will remove the stride == PAGE_SIZE check here.

>> We could also use 'end - start < threshold' to decide which way to go;
>> however, different hardware may have different thresholds, so I'm not
>> sure this is feasible.
>>
>> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
>> ---
>>  arch/arm64/include/asm/tlbflush.h | 104 ++++++++++++++++++++++++++----
>>  1 file changed, 90 insertions(+), 14 deletions(-)
> 
> Could you please rebase these patches on top of the arm64 for-next/tlbi
> branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi
> 

OK, I will send a formal version of this series soon.

>>  
>> -	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
>> +	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
>> +	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
>> +	    range_pages >= MAX_TLBI_RANGE_PAGES) {
>>  		flush_tlb_mm(vma->vm_mm);
>>  		return;
>>  	}
> 
> Is there any value in this range_pages check here? What's the value of
> MAX_TLBI_RANGE_PAGES? If we have TLBI range ops, we make a decision here
> but without including the stride. Further down we use the stride to skip
> the TLBI range ops.
> 

MAX_TLBI_RANGE_PAGES is defined as __TLBI_RANGE_PAGES(31, 3), which is
dictated by the ARMv8.4 spec. The address range is determined by the
formula below:

	[BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)

which has nothing to do with the stride.  After removing the stride ==
PAGE_SIZE check below, this will be clearer.
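
Worked out for a 4KB granule:

	MAX_TLBI_RANGE_PAGES = __TLBI_RANGE_PAGES(31, 3)
	                     = (31 + 1) << (5 * 3 + 1)
	                     = 32 << 16 = 2097152 pages (8GB of VA),

which is why ranges at or above that size fall back to flush_tlb_mm().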


>>  }
> 
> I think the algorithm is correct, though I need to work it out on a
> piece of paper.
> 
> The code could benefit from some comments (above the loop) on how the
> range is built and the right scale found.
> 

OK.

Thanks,
Zhenyu
