* [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
@ 2020-07-10  9:44 Zhenyu Ye
  2020-07-10  9:44 ` [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-10  9:44 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

NOTICE: this series is based on the arm64 for-next/tlbi branch:
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi

--
ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This series adds support for this feature.

--
ChangeList:
v2:
- remove the __tlbi_last_level() macro.
- add check for parameters in __TLBI_VADDR_RANGE macro.

RFC patches:
- Link: https://lore.kernel.org/linux-arm-kernel/20200708124031.1414-1-yezhenyu2@huawei.com/

Zhenyu Ye (2):
  arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  arm64: tlb: Use the TLBI RANGE feature in arm64

 arch/arm64/include/asm/cpucaps.h  |   3 +-
 arch/arm64/include/asm/sysreg.h   |   3 +
 arch/arm64/include/asm/tlbflush.h | 138 +++++++++++++++++++++++-------
 arch/arm64/kernel/cpufeature.c    |  10 +++
 4 files changed, 124 insertions(+), 30 deletions(-)

-- 
2.19.1




* [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  2020-07-10  9:44 [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
@ 2020-07-10  9:44 ` Zhenyu Ye
  2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  2020-07-10 19:11 ` [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Catalin Marinas
  2 siblings, 0 replies; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-10  9:44 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This patch detects this feature.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  3 +++
 arch/arm64/kernel/cpufeature.c   | 10 ++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index d44ba903d11d..8fe4aa1d372b 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -63,7 +63,8 @@
 #define ARM64_HAS_32BIT_EL1			53
 #define ARM64_BTI				54
 #define ARM64_HAS_ARMv8_4_TTL			55
+#define ARM64_HAS_TLBI_RANGE			56
 
-#define ARM64_NCAPS				56
+#define ARM64_NCAPS				57
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8c209aa17273..a5f24a26d86a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -617,6 +617,9 @@
 #define ID_AA64ISAR0_SHA1_SHIFT		8
 #define ID_AA64ISAR0_AES_SHIFT		4
 
+#define ID_AA64ISAR0_TLBI_RANGE_NI	0x0
+#define ID_AA64ISAR0_TLBI_RANGE		0x2
+
 /* id_aa64isar1 */
 #define ID_AA64ISAR1_I8MM_SHIFT		52
 #define ID_AA64ISAR1_DGH_SHIFT		48
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e877f56ff1ab..ba0f0ce06fee 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2067,6 +2067,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 	},
 #endif
+	{
+		.desc = "TLB range maintenance instruction",
+		.capability = ARM64_HAS_TLBI_RANGE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64ISAR0_EL1,
+		.field_pos = ID_AA64ISAR0_TLB_SHIFT,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = ID_AA64ISAR0_TLBI_RANGE,
+	},
 	{},
 };
 
-- 
2.19.1




* [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-10  9:44 [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
  2020-07-10  9:44 ` [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
@ 2020-07-10  9:44 ` Zhenyu Ye
  2020-07-10 18:31   ` Catalin Marinas
                     ` (2 more replies)
  2020-07-10 19:11 ` [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Catalin Marinas
  2 siblings, 3 replies; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-10  9:44 UTC (permalink / raw)
  To: catalin.marinas, will, suzuki.poulose, maz, steven.price,
	guohanjun, olof
  Cc: yezhenyu2, linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	arm, xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

When the CPU supports the TLBI RANGE feature, the minimum range
granularity is decided by 'scale', so we cannot flush all pages with
a single instruction in some cases.

For example, when pages = 0xe81a, let's start 'scale' from the
maximum and find the right 'num' for each 'scale':

1. scale = 3: we can flush no pages because the minimum range is
   2^(5*3 + 1) = 0x10000.
2. scale = 2: the minimum range is 2^(5*2 + 1) = 0x800, so we can
   flush 0xe800 pages this time; num = 0xe800/0x800 - 1 = 0x1c.
   The remaining pages are 0x1a.
3. scale = 1: the minimum range is 2^(5*1 + 1) = 0x40, so no page
   can be flushed.
4. scale = 0: we flush the remaining 0x1a pages; num =
   0x1a/0x2 - 1 = 0xc.

However, in most scenarios, pages = 1 when flush_tlb_range() is
called. Starting from scale = 3, or from some other 'proper' value
(such as scale = ilog2(pages)), would incur extra overhead. So
'scale' is increased from 0 to the maximum instead, and the flush
order is exactly the opposite of the example above.
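
For illustration only (a standalone user-space sketch, not kernel
code): the ascending-scale decomposition above, assuming stride ==
PAGE_SIZE and using simplified user-space copies of the
__TLBI_RANGE_* macros added below:

#include <stdio.h>

#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1UL) << (5 * (scale) + 1))
#define TLBI_RANGE_MASK			0x1fUL
#define __TLBI_RANGE_NUM(range, scale)	\
	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)

int main(void)
{
	unsigned long pages = 0xe81a;
	int scale = 0;

	while (pages > 0) {
		if (pages % 2 == 1) {
			/* odd tail: flush a single page without a range op */
			printf("flush 1 page (non-range)\n");
			pages -= 1;
			continue;
		}

		int num = (int)__TLBI_RANGE_NUM(pages, scale) - 1;
		if (num >= 0) {
			printf("scale=%d num=%d flushes %#lx pages\n",
			       scale, num, __TLBI_RANGE_PAGES(num, scale));
			pages -= __TLBI_RANGE_PAGES(num, scale);
		}
		scale++;
	}
	return 0;
}

For pages = 0xe81a this prints 0x1a pages at scale 0 and 0xe800 pages
at scale 2, i.e. the opposite order of the worked example above.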

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/tlbflush.h | 138 +++++++++++++++++++++++-------
 1 file changed, 109 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 39aed2efd21b..edfec8139ef8 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -60,6 +60,31 @@
 		__ta;						\
 	})
 
+/*
+ * Get translation granule of the system, which is decided by
+ * PAGE_SIZE.  Used by TTL.
+ *  - 4KB	: 1
+ *  - 16KB	: 2
+ *  - 64KB	: 3
+ */
+#define TLBI_TTL_TG_4K		1
+#define TLBI_TTL_TG_16K		2
+#define TLBI_TTL_TG_64K		3
+
+static inline unsigned long get_trans_granule(void)
+{
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		return TLBI_TTL_TG_4K;
+	case SZ_16K:
+		return TLBI_TTL_TG_16K;
+	case SZ_64K:
+		return TLBI_TTL_TG_64K;
+	default:
+		return 0;
+	}
+}
+
 /*
  * Level-based TLBI operations.
  *
@@ -73,9 +98,6 @@
  * in asm/stage2_pgtable.h.
  */
 #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
-#define TLBI_TTL_TG_4K		1
-#define TLBI_TTL_TG_16K		2
-#define TLBI_TTL_TG_64K		3
 
 #define __tlbi_level(op, addr, level) do {				\
 	u64 arg = addr;							\
@@ -83,19 +105,7 @@
 	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
 	    level) {							\
 		u64 ttl = level & 3;					\
-									\
-		switch (PAGE_SIZE) {					\
-		case SZ_4K:						\
-			ttl |= TLBI_TTL_TG_4K << 2;			\
-			break;						\
-		case SZ_16K:						\
-			ttl |= TLBI_TTL_TG_16K << 2;			\
-			break;						\
-		case SZ_64K:						\
-			ttl |= TLBI_TTL_TG_64K << 2;			\
-			break;						\
-		}							\
-									\
+		ttl |= get_trans_granule() << 2;			\
 		arg &= ~TLBI_TTL_MASK;					\
 		arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);			\
 	}								\
@@ -108,6 +118,39 @@
 		__tlbi_level(op, (arg | USER_ASID_FLAG), level);	\
 } while (0)
 
+/*
+ * This macro creates a properly formatted VA operand for the TLBI RANGE.
+ * The value bit assignments are:
+ *
+ * +----------+------+-------+-------+-------+----------------------+
+ * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
+ * +-----------------+-------+-------+-------+----------------------+
+ * |63      48|47  46|45   44|43   39|38   37|36                   0|
+ *
+ * The address range is determined by below formula:
+ * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
+ *
+ */
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
+	({							\
+		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
+		__ta &= GENMASK_ULL(36, 0);			\
+		__ta |= (unsigned long)(ttl & 3) << 37;		\
+		__ta |= (unsigned long)(num & 31) << 39;	\
+		__ta |= (unsigned long)(scale & 3) << 44;	\
+		__ta |= (get_trans_granule() & 3) << 46;	\
+		__ta |= (unsigned long)(asid) << 48;		\
+		__ta;						\
+	})
+
+/* These macros are used by the TLBI RANGE feature. */
+#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
+#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
+
+#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
+#define __TLBI_RANGE_NUM(range, scale)	\
+	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
+
 /*
  *	TLB Invalidation
  *	================
@@ -232,32 +275,69 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
 {
+	int num = 0;
+	int scale = 0;
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
+	unsigned long pages;
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
+	pages = (end - start) >> PAGE_SHIFT;
 
-	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
+	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
+	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
+	    pages >= MAX_TLBI_RANGE_PAGES) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
 
-	/* Convert the stride into units of 4k */
-	stride >>= 12;
+	dsb(ishst);
 
-	start = __TLBI_VADDR(start, asid);
-	end = __TLBI_VADDR(end, asid);
+	/*
+	 * When cpu does not support TLBI RANGE feature, we flush the tlb
+	 * entries one by one at the granularity of 'stride'.
+	 * When cpu supports the TLBI RANGE feature, then:
+	 * 1. If pages is odd, flush the first page through non-RANGE
+	 *    instruction;
+	 * 2. For remaining pages: The minimum range granularity is decided
+	 *    by 'scale', so we can not flush all pages by one instruction
+	 *    in some cases.
+	 *    Here, we start from scale = 0, flush corresponding pages
+	 *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
+	 *    it until no pages left.
+	 */
+	while (pages > 0) {
+		if (!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) ||
+		    pages % 2 == 1) {
+			addr = __TLBI_VADDR(start, asid);
+			if (last_level) {
+				__tlbi_level(vale1is, addr, tlb_level);
+				__tlbi_user_level(vale1is, addr, tlb_level);
+			} else {
+				__tlbi_level(vae1is, addr, tlb_level);
+				__tlbi_user_level(vae1is, addr, tlb_level);
+			}
+			start += stride;
+			pages -= stride >> PAGE_SHIFT;
+			continue;
+		}
 
-	dsb(ishst);
-	for (addr = start; addr < end; addr += stride) {
-		if (last_level) {
-			__tlbi_level(vale1is, addr, tlb_level);
-			__tlbi_user_level(vale1is, addr, tlb_level);
-		} else {
-			__tlbi_level(vae1is, addr, tlb_level);
-			__tlbi_user_level(vae1is, addr, tlb_level);
+		num = __TLBI_RANGE_NUM(pages, scale) - 1;
+		if (num >= 0) {
+			addr = __TLBI_VADDR_RANGE(start, asid, scale,
+						  num, tlb_level);
+			if (last_level) {
+				__tlbi(rvale1is, addr);
+				__tlbi_user(rvale1is, addr);
+			} else {
+				__tlbi(rvae1is, addr);
+				__tlbi_user(rvae1is, addr);
+			}
+			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
+			pages -= __TLBI_RANGE_PAGES(num, scale);
 		}
+		scale++;
 	}
 	dsb(ish);
 }
-- 
2.19.1




* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
@ 2020-07-10 18:31   ` Catalin Marinas
  2020-07-11  6:50     ` Zhenyu Ye
  2020-07-13 14:27   ` Jon Hunter
  2020-07-14 10:36   ` Catalin Marinas
  2 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2020-07-10 18:31 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
> Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().
> 
> When cpu supports TLBI feature, the minimum range granularity is
> decided by 'scale', so we can not flush all pages by one instruction
> in some cases.
> 
> For example, when the pages = 0xe81a, let's start 'scale' from
> maximum, and find right 'num' for each 'scale':
> 
> 1. scale = 3, we can flush no pages because the minimum range is
>    2^(5*3 + 1) = 0x10000.
> 2. scale = 2, the minimum range is 2^(5*2 + 1) = 0x800, we can
>    flush 0xe800 pages this time, the num = 0xe800/0x800 - 1 = 0x1c.
>    Remaining pages is 0x1a;
> 3. scale = 1, the minimum range is 2^(5*1 + 1) = 0x40, no page
>    can be flushed.
> 4. scale = 0, we flush the remaining 0x1a pages, the num =
>    0x1a/0x2 - 1 = 0xd.
> 
> However, in most scenarios, the pages = 1 when flush_tlb_range() is
> called. Start from scale = 3 or other proper value (such as scale =
> ilog2(pages)), will incur extra overhead.
> So increase 'scale' from 0 to maximum, the flush order is exactly
> opposite to the example.
> 
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
> ---
>  arch/arm64/include/asm/tlbflush.h | 138 +++++++++++++++++++++++-------
>  1 file changed, 109 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 39aed2efd21b..edfec8139ef8 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -60,6 +60,31 @@
>  		__ta;						\
>  	})
>  
> +/*
> + * Get translation granule of the system, which is decided by
> + * PAGE_SIZE.  Used by TTL.
> + *  - 4KB	: 1
> + *  - 16KB	: 2
> + *  - 64KB	: 3
> + */
> +#define TLBI_TTL_TG_4K		1
> +#define TLBI_TTL_TG_16K		2
> +#define TLBI_TTL_TG_64K		3
> +
> +static inline unsigned long get_trans_granule(void)
> +{
> +	switch (PAGE_SIZE) {
> +	case SZ_4K:
> +		return TLBI_TTL_TG_4K;
> +	case SZ_16K:
> +		return TLBI_TTL_TG_16K;
> +	case SZ_64K:
> +		return TLBI_TTL_TG_64K;
> +	default:
> +		return 0;
> +	}
> +}
> +
>  /*
>   * Level-based TLBI operations.
>   *
> @@ -73,9 +98,6 @@
>   * in asm/stage2_pgtable.h.
>   */
>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
> -#define TLBI_TTL_TG_4K		1
> -#define TLBI_TTL_TG_16K		2
> -#define TLBI_TTL_TG_64K		3
>  
>  #define __tlbi_level(op, addr, level) do {				\
>  	u64 arg = addr;							\
> @@ -83,19 +105,7 @@
>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
>  	    level) {							\
>  		u64 ttl = level & 3;					\
> -									\
> -		switch (PAGE_SIZE) {					\
> -		case SZ_4K:						\
> -			ttl |= TLBI_TTL_TG_4K << 2;			\
> -			break;						\
> -		case SZ_16K:						\
> -			ttl |= TLBI_TTL_TG_16K << 2;			\
> -			break;						\
> -		case SZ_64K:						\
> -			ttl |= TLBI_TTL_TG_64K << 2;			\
> -			break;						\
> -		}							\
> -									\
> +		ttl |= get_trans_granule() << 2;			\
>  		arg &= ~TLBI_TTL_MASK;					\
>  		arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);			\
>  	}								\
> @@ -108,6 +118,39 @@
>  		__tlbi_level(op, (arg | USER_ASID_FLAG), level);	\
>  } while (0)
>  
> +/*
> + * This macro creates a properly formatted VA operand for the TLBI RANGE.
> + * The value bit assignments are:
> + *
> + * +----------+------+-------+-------+-------+----------------------+
> + * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
> + * +-----------------+-------+-------+-------+----------------------+
> + * |63      48|47  46|45   44|43   39|38   37|36                   0|
> + *
> + * The address range is determined by below formula:
> + * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
> + *
> + */
> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
> +	({							\
> +		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> +		__ta &= GENMASK_ULL(36, 0);			\
> +		__ta |= (unsigned long)(ttl & 3) << 37;		\
> +		__ta |= (unsigned long)(num & 31) << 39;	\
> +		__ta |= (unsigned long)(scale & 3) << 44;	\
> +		__ta |= (get_trans_granule() & 3) << 46;	\
> +		__ta |= (unsigned long)(asid) << 48;		\
> +		__ta;						\
> +	})

Nitpick: we don't need the additional masking here (e.g. ttl & 3) since
the values are capped anyway.

> +
> +/* These macros are used by the TLBI RANGE feature. */
> +#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
> +#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
> +
> +#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
> +#define __TLBI_RANGE_NUM(range, scale)	\
> +	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
> +
>  /*
>   *	TLB Invalidation
>   *	================
> @@ -232,32 +275,69 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
>  				     unsigned long stride, bool last_level,
>  				     int tlb_level)
>  {
> +	int num = 0;
> +	int scale = 0;
>  	unsigned long asid = ASID(vma->vm_mm);
>  	unsigned long addr;
> +	unsigned long pages;
>  
>  	start = round_down(start, stride);
>  	end = round_up(end, stride);
> +	pages = (end - start) >> PAGE_SHIFT;
>  
> -	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
> +	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> +	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
> +	    pages >= MAX_TLBI_RANGE_PAGES) {
>  		flush_tlb_mm(vma->vm_mm);
>  		return;
>  	}

I think we can use strictly greater here rather than greater or equal.
MAX_TLBI_RANGE_PAGES can be encoded as num 31, scale 3.

>  
> -	/* Convert the stride into units of 4k */
> -	stride >>= 12;
> +	dsb(ishst);
>  
> -	start = __TLBI_VADDR(start, asid);
> -	end = __TLBI_VADDR(end, asid);
> +	/*
> +	 * When cpu does not support TLBI RANGE feature, we flush the tlb
> +	 * entries one by one at the granularity of 'stride'.
> +	 * When cpu supports the TLBI RANGE feature, then:
> +	 * 1. If pages is odd, flush the first page through non-RANGE
> +	 *    instruction;
> +	 * 2. For remaining pages: The minimum range granularity is decided
> +	 *    by 'scale', so we can not flush all pages by one instruction
> +	 *    in some cases.
> +	 *    Here, we start from scale = 0, flush corresponding pages
> +	 *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
> +	 *    it until no pages left.
> +	 */
> +	while (pages > 0) {

I did some simple checks on ((end - start) % stride) and it never
triggered. I had a slight worry that pages could become negative (and
we'd loop forever, since it's unsigned long) for some mismatched stride
and flush size. It doesn't seem like it can happen.

> +		if (!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) ||
> +		    pages % 2 == 1) {
> +			addr = __TLBI_VADDR(start, asid);
> +			if (last_level) {
> +				__tlbi_level(vale1is, addr, tlb_level);
> +				__tlbi_user_level(vale1is, addr, tlb_level);
> +			} else {
> +				__tlbi_level(vae1is, addr, tlb_level);
> +				__tlbi_user_level(vae1is, addr, tlb_level);
> +			}
> +			start += stride;
> +			pages -= stride >> PAGE_SHIFT;
> +			continue;
> +		}
>  
> -	dsb(ishst);
> -	for (addr = start; addr < end; addr += stride) {
> -		if (last_level) {
> -			__tlbi_level(vale1is, addr, tlb_level);
> -			__tlbi_user_level(vale1is, addr, tlb_level);
> -		} else {
> -			__tlbi_level(vae1is, addr, tlb_level);
> -			__tlbi_user_level(vae1is, addr, tlb_level);
> +		num = __TLBI_RANGE_NUM(pages, scale) - 1;
> +		if (num >= 0) {
> +			addr = __TLBI_VADDR_RANGE(start, asid, scale,
> +						  num, tlb_level);
> +			if (last_level) {
> +				__tlbi(rvale1is, addr);
> +				__tlbi_user(rvale1is, addr);
> +			} else {
> +				__tlbi(rvae1is, addr);
> +				__tlbi_user(rvae1is, addr);
> +			}
> +			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
> +			pages -= __TLBI_RANGE_PAGES(num, scale);
>  		}
> +		scale++;
>  	}
>  	dsb(ish);

The logic looks fine to me now. I can fix the above nitpicks myself and
maybe adjust the comment a bit. I plan to push them into next to see if
anything explodes.

Thanks.

-- 
Catalin


* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-10  9:44 [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
  2020-07-10  9:44 ` [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
  2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
@ 2020-07-10 19:11 ` Catalin Marinas
  2020-07-13 12:21   ` Catalin Marinas
  2 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2020-07-10 19:11 UTC (permalink / raw)
  To: maz, steven.price, guohanjun, Zhenyu Ye, will, olof, suzuki.poulose
  Cc: linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

On Fri, 10 Jul 2020 17:44:18 +0800, Zhenyu Ye wrote:
> NOTICE: this series are based on the arm64 for-next/tlbi branch:
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi
> 
> --
> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> range of input addresses. This series add support for this feature.
> 
> [...]

Applied to arm64 (for-next/tlbi), thanks!

[1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
      https://git.kernel.org/arm64/c/a2fd755f77ff
[2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
      https://git.kernel.org/arm64/c/db34a081d273

-- 
Catalin



* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-10 18:31   ` Catalin Marinas
@ 2020-07-11  6:50     ` Zhenyu Ye
  2020-07-12 12:03       ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-11  6:50 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

Hi Catalin,

On 2020/7/11 2:31, Catalin Marinas wrote:
> On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
>> -	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
>> +	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
>> +	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
>> +	    pages >= MAX_TLBI_RANGE_PAGES) {
>>  		flush_tlb_mm(vma->vm_mm);
>>  		return;
>>  	}
> 
> I think we can use strictly greater here rather than greater or equal.
> MAX_TLBI_RANGE_PAGES can be encoded as num 31, scale 3.

Sorry, we can't.
For a boundary value (such as 2^6), we have two ways to express it
in TLBI RANGE operations:
1. scale = 0, num = 31.
2. scale = 1, num = 0.

I used the second way in the following implementation.  However, for
MAX_TLBI_RANGE_PAGES, we can only use scale = 3, num = 31.
So if we use strictly greater here, an ERROR will happen when the
range of pages equals MAX_TLBI_RANGE_PAGES.
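
For illustration only, a quick user-space check of the two encodings
and of the maximum (using a user-space variant of the macro from the
patch):

#include <stdio.h>

#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1UL) << (5 * (scale) + 1))

int main(void)
{
	/* 2^6 pages has two encodings: */
	printf("scale=0 num=31 -> %lu pages\n", __TLBI_RANGE_PAGES(31, 0));
	printf("scale=1 num=0  -> %lu pages\n", __TLBI_RANGE_PAGES(0, 1));
	/* ...but the maximum, 2^21 pages, has only one: */
	printf("scale=3 num=31 -> %lu pages\n", __TLBI_RANGE_PAGES(31, 3));
	return 0;
}

The first two both print 64; the last prints 2097152, which is
MAX_TLBI_RANGE_PAGES.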

There are two ways to avoid this bug:
1. Just keep 'greater or equal' here.  The ARM64 specification does
not specify how we should flush TLB entries in this case; flush_tlb_mm()
is also a good choice for such a wide range of pages.
2. Add a check in the loop, like this (it may make the code a bit ugly):

	num = __TLBI_RANGE_NUM(pages, scale) - 1;

	/* scale = 4, num = 0 is equal to scale = 3, num = 31. */
	if (scale == 4 && num == 0) {
		scale = 3;
		num = 31;
	}

	if (num >= 0) {
	...

Which one do you prefer, and how do you want to fix this? Should I just
send a fix patch again?

> 
>>  
>> -	/* Convert the stride into units of 4k */
>> -	stride >>= 12;
>> +	dsb(ishst);
>>  
>> -	start = __TLBI_VADDR(start, asid);
>> -	end = __TLBI_VADDR(end, asid);
>> +	/*
>> +	 * When cpu does not support TLBI RANGE feature, we flush the tlb
>> +	 * entries one by one at the granularity of 'stride'.
>> +	 * When cpu supports the TLBI RANGE feature, then:
>> +	 * 1. If pages is odd, flush the first page through non-RANGE
>> +	 *    instruction;
>> +	 * 2. For remaining pages: The minimum range granularity is decided
>> +	 *    by 'scale', so we can not flush all pages by one instruction
>> +	 *    in some cases.
>> +	 *    Here, we start from scale = 0, flush corresponding pages
>> +	 *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
>> +	 *    it until no pages left.
>> +	 */
>> +	while (pages > 0) {
> 
> I did some simple checks on ((end - start) % stride) and never
> triggered. I had a slight worry that pages could become negative (and
> we'd loop forever since it's unsigned long) for some mismatched stride
> and flush size. It doesn't seem like.
> 

The start and end are rounded down/up to the stride in the function:

	start = round_down(start, stride);
	end = round_up(end, stride);

So the flush size and stride can never mismatch.
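
For illustration only, a quick standalone sanity check of that rounding,
using user-space copies of the kernel's round_up()/round_down():

#include <assert.h>

#define __round_mask(x, y)	((__typeof__(x))((y) - 1))
#define round_up(x, y)		((((x) - 1) | __round_mask(x, y)) + 1)
#define round_down(x, y)	((x) & ~__round_mask(x, y))

int main(void)
{
	unsigned long stride = 0x200000;	/* e.g. a 2MB PMD stride */
	unsigned long start = round_down(0x123456789UL, stride);
	unsigned long end = round_up(0x12345f000UL, stride);

	/* after rounding, (end - start) is a whole number of strides */
	assert((end - start) % stride == 0);
	return 0;
}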

Thanks,
Zhenyu



* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-11  6:50     ` Zhenyu Ye
@ 2020-07-12 12:03       ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2020-07-12 12:03 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Sat, Jul 11, 2020 at 02:50:46PM +0800, Zhenyu Ye wrote:
> On 2020/7/11 2:31, Catalin Marinas wrote:
> > On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
> >> -	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
> >> +	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> >> +	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
> >> +	    pages >= MAX_TLBI_RANGE_PAGES) {
> >>  		flush_tlb_mm(vma->vm_mm);
> >>  		return;
> >>  	}
> > 
> > I think we can use strictly greater here rather than greater or equal.
> > MAX_TLBI_RANGE_PAGES can be encoded as num 31, scale 3.
> 
> Sorry, we can't.
> For a boundary value (such as 2^6), we have two way to express it
> in TLBI RANGE operations:
> 1. scale = 0, num = 31.
> 2. scale = 1, num = 0.
> 
> I used the second way in following implementation.  However, for the
> MAX_TLBI_RANGE_PAGES, we can only use scale = 3, num = 31.
> So if use strictly greater here, ERROR will happen when range pages
> equal to MAX_TLBI_RANGE_PAGES.

You are right, I got confused by the __TLBI_RANGE_NUM() macro which
doesn't return the actual 'num' for the TLBI argument as it would go
from 0 to 31. After subtracting 1, num ends up going from -1 to 30, so
we never get the maximum range. I think for scale 3 and num 31, this
would be 8GB with 4K pages, so the maximum we'd cover is 8GB - 64K * 4K.
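
(For reference, a quick user-space check of that arithmetic,
illustration only and assuming 4K pages:)

#include <stdio.h>

#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1UL) << (5 * (scale) + 1))

int main(void)
{
	unsigned long max = __TLBI_RANGE_PAGES(31, 3);	/* num 31, scale 3 */
	unsigned long capped = __TLBI_RANGE_PAGES(30, 3);	/* num capped at 30 */

	printf("max:    %lu pages = %lu MB\n", max, max * 4096 / (1024 * 1024));
	printf("capped: %lu pages = %lu MB\n", capped, capped * 4096 / (1024 * 1024));
	return 0;
}

This prints 8192 MB and 7936 MB, i.e. 8GB and 8GB - 256MB (64K * 4K).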

> There are two ways to avoid this bug:
> 1. Just keep 'greater or equal' here.  The ARM64 specification does
> not specify how we flush tlb entries in this case, flush_tlb_mm()
> is also a good choice for such a wide range of pages.

I'll go for this option; I don't think it would make much difference in
practice if we stop at an 8GB - 256M range.

> 2. Add check in the loop, just like: (this may cause the codes a bit ugly)
> 
> 	num = __TLBI_RANGE_NUM(pages, scale) - 1;
> 
> 	/* scale = 4, num = 0 is equal to scale = 3, num = 31. */
> 	if (scale == 4 && num == 0) {
> 		scale = 3;
> 		num = 31;
> 	}
> 
> 	if (num >= 0) {
> 	...
> 
> Which one do you prefer and how do you want to fix this error? Just
> a fix patch again?

I'll fold the diff below and refresh the patch:

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 1eb0588718fb..0300e433ffe6 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -147,9 +147,13 @@ static inline unsigned long get_trans_granule(void)
 #define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
 #define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
 
+/*
+ * Generate 'num' values from -1 to 30 with -1 rejected by the
+ * __flush_tlb_range() loop below.
+ */
 #define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
 #define __TLBI_RANGE_NUM(range, scale)	\
-	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
+	((((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
 
 /*
  *	TLB Invalidation
@@ -285,8 +289,8 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	pages = (end - start) >> PAGE_SHIFT;
 
 	if ((!cpus_have_const_cap(ARM64_HAS_TLB_RANGE) &&
-	     (end - start) > (MAX_TLBI_OPS * stride)) ||
-	    pages > MAX_TLBI_RANGE_PAGES) {
+	     (end - start) >= (MAX_TLBI_OPS * stride)) ||
+	    pages >= MAX_TLBI_RANGE_PAGES) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
@@ -306,6 +310,10 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	 *    Start from scale = 0, flush the corresponding number of pages
 	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
 	 *    until no pages left.
+	 *
+	 * Note that certain ranges can be represented by either num = 31 and
+	 * scale or num = 0 and scale + 1. The loop below favours the latter
+	 * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
 	 */
 	while (pages > 0) {
 		if (!cpus_have_const_cap(ARM64_HAS_TLB_RANGE) ||
@@ -323,7 +331,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 			continue;
 		}
 
-		num = __TLBI_RANGE_NUM(pages, scale) - 1;
+		num = __TLBI_RANGE_NUM(pages, scale);
 		if (num >= 0) {
 			addr = __TLBI_VADDR_RANGE(start, asid, scale,
 						  num, tlb_level);

> >> -	/* Convert the stride into units of 4k */
> >> -	stride >>= 12;
> >> +	dsb(ishst);
> >>  
> >> -	start = __TLBI_VADDR(start, asid);
> >> -	end = __TLBI_VADDR(end, asid);
> >> +	/*
> >> +	 * When cpu does not support TLBI RANGE feature, we flush the tlb
> >> +	 * entries one by one at the granularity of 'stride'.
> >> +	 * When cpu supports the TLBI RANGE feature, then:
> >> +	 * 1. If pages is odd, flush the first page through non-RANGE
> >> +	 *    instruction;
> >> +	 * 2. For remaining pages: The minimum range granularity is decided
> >> +	 *    by 'scale', so we can not flush all pages by one instruction
> >> +	 *    in some cases.
> >> +	 *    Here, we start from scale = 0, flush corresponding pages
> >> +	 *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
> >> +	 *    it until no pages left.
> >> +	 */
> >> +	while (pages > 0) {
> > 
> > I did some simple checks on ((end - start) % stride) and never
> > triggered. I had a slight worry that pages could become negative (and
> > we'd loop forever since it's unsigned long) for some mismatched stride
> > and flush size. It doesn't seem like.
> 
> The start and end are round_down/up in the function:
> 
> 	start = round_down(start, stride);
>  	end = round_up(end, stride);
> 
> So the flush size and stride will never mismatch.

Right.

To make sure we don't miss any corner cases, I'll try to throw the
algorithm above at CBMC (a model checker; hopefully next week if I find
some time).

-- 
Catalin


* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-10 19:11 ` [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Catalin Marinas
@ 2020-07-13 12:21   ` Catalin Marinas
  2020-07-13 12:41     ` Zhenyu Ye
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2020-07-13 12:21 UTC (permalink / raw)
  To: maz, steven.price, guohanjun, Zhenyu Ye, will, olof, suzuki.poulose
  Cc: linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

On Fri, Jul 10, 2020 at 08:11:19PM +0100, Catalin Marinas wrote:
> On Fri, 10 Jul 2020 17:44:18 +0800, Zhenyu Ye wrote:
> > NOTICE: this series are based on the arm64 for-next/tlbi branch:
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi
> > 
> > --
> > ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> > range of input addresses. This series add support for this feature.
> > 
> > [...]
> 
> Applied to arm64 (for-next/tlbi), thanks!
> 
> [1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
>       https://git.kernel.org/arm64/c/a2fd755f77ff
> [2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
>       https://git.kernel.org/arm64/c/db34a081d273

I'm dropping these two patches from for-next/tlbi and for-next/core.
They need a check on whether binutils supports the new "tlbi rva*"
instructions, otherwise the build may fail.

I kept the latest incarnation of these patches on devel/tlbi-range for
reference.

-- 
Catalin


* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-13 12:21   ` Catalin Marinas
@ 2020-07-13 12:41     ` Zhenyu Ye
  2020-07-13 16:59       ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-13 12:41 UTC (permalink / raw)
  To: Catalin Marinas, maz, steven.price, guohanjun, will, olof,
	suzuki.poulose
  Cc: linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

Hi Catalin,

On 2020/7/13 20:21, Catalin Marinas wrote:
> On Fri, Jul 10, 2020 at 08:11:19PM +0100, Catalin Marinas wrote:
>> On Fri, 10 Jul 2020 17:44:18 +0800, Zhenyu Ye wrote:
>>> NOTICE: this series are based on the arm64 for-next/tlbi branch:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi
>>>
>>> --
>>> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>>> range of input addresses. This series add support for this feature.
>>>
>>> [...]
>>
>> Applied to arm64 (for-next/tlbi), thanks!
>>
>> [1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
>>       https://git.kernel.org/arm64/c/a2fd755f77ff
>> [2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
>>       https://git.kernel.org/arm64/c/db34a081d273
> 
> I'm dropping these two patches from for-next/tlbi and for-next/core.
> They need a check on whether binutils supports the new "tlbi rva*"
> instructions, otherwise the build mail fail.
> 
> I kept the latest incarnation of these patches on devel/tlbi-range for
> reference.
> 

Should we add a check for the binutils version? Something like:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fad573883e89..d5fb6567e0d2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1300,6 +1300,20 @@ config ARM64_AMU_EXTN
 	  correctly reflect reality. Most commonly, the value read will be 0,
 	  indicating that the counter is not enabled.

+config ARM64_TLBI_RANGE
+	bool "Enable support for tlbi range feature"
+	default y
+	depends on AS_HAS_TLBI_RANGE
+	help
+	  ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
+	  range of input addresses.
+
+	  The feature introduces new assembly instructions, which are only
+	  supported by binutils >= 2.30.
+
+config AS_HAS_TLBI_RANGE
+	def_bool $(as-option, -Wa$(comma)-march=armv8.4-a)
+
 endmenu

 menu "ARMv8.5 architectural features"

Then use the check in the loop:

	while (pages > 0) {
		if (!IS_ENABLED(CONFIG_ARM64_TLBI_RANGE) ||
		   !cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) ||


If this is ok, I could send a new series soon.

Thanks,
Zhenyu




* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  2020-07-10 18:31   ` Catalin Marinas
@ 2020-07-13 14:27   ` Jon Hunter
  2020-07-13 14:39     ` Zhenyu Ye
  2020-07-14 10:36   ` Catalin Marinas
  2 siblings, 1 reply; 18+ messages in thread
From: Jon Hunter @ 2020-07-13 14:27 UTC (permalink / raw)
  To: Zhenyu Ye, catalin.marinas, will, suzuki.poulose, maz,
	steven.price, guohanjun, olof
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun, linux-tegra


On 10/07/2020 10:44, Zhenyu Ye wrote:
> Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().
> 
> When cpu supports TLBI feature, the minimum range granularity is
> decided by 'scale', so we can not flush all pages by one instruction
> in some cases.
> 
> For example, when the pages = 0xe81a, let's start 'scale' from
> maximum, and find right 'num' for each 'scale':
> 
> 1. scale = 3, we can flush no pages because the minimum range is
>    2^(5*3 + 1) = 0x10000.
> 2. scale = 2, the minimum range is 2^(5*2 + 1) = 0x800, we can
>    flush 0xe800 pages this time, the num = 0xe800/0x800 - 1 = 0x1c.
>    Remaining pages is 0x1a;
> 3. scale = 1, the minimum range is 2^(5*1 + 1) = 0x40, no page
>    can be flushed.
> 4. scale = 0, we flush the remaining 0x1a pages, the num =
>    0x1a/0x2 - 1 = 0xd.
> 
> However, in most scenarios, the pages = 1 when flush_tlb_range() is
> called. Start from scale = 3 or other proper value (such as scale =
> ilog2(pages)), will incur extra overhead.
> So increase 'scale' from 0 to maximum, the flush order is exactly
> opposite to the example.
> 
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>


After this change I am seeing the following build errors ...

/tmp/cckzq3FT.s: Assembler messages:
/tmp/cckzq3FT.s:854: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:870: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:1095: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:1111: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:1964: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:1980: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:2286: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:2302: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
/tmp/cckzq3FT.s:4833: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
/tmp/cckzq3FT.s:4849: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
/tmp/cckzq3FT.s:5090: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
/tmp/cckzq3FT.s:5106: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
/tmp/cckzq3FT.s:874: Error: attempt to move .org backwards
/tmp/cckzq3FT.s:1115: Error: attempt to move .org backwards
/tmp/cckzq3FT.s:1984: Error: attempt to move .org backwards
/tmp/cckzq3FT.s:2306: Error: attempt to move .org backwards
/tmp/cckzq3FT.s:4853: Error: attempt to move .org backwards
/tmp/cckzq3FT.s:5110: Error: attempt to move .org backwards
scripts/Makefile.build:280: recipe for target 'arch/arm64/mm/hugetlbpage.o' failed
make[3]: *** [arch/arm64/mm/hugetlbpage.o] Error 1
scripts/Makefile.build:497: recipe for target 'arch/arm64/mm' failed
make[2]: *** [arch/arm64/mm] Error 2

Cheers
Jon

-- 
nvpublic


* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-13 14:27   ` Jon Hunter
@ 2020-07-13 14:39     ` Zhenyu Ye
  2020-07-13 14:44       ` Jon Hunter
  0 siblings, 1 reply; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-13 14:39 UTC (permalink / raw)
  To: Jon Hunter, catalin.marinas, will, suzuki.poulose, maz,
	steven.price, guohanjun, olof
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun, linux-tegra

Hi Jon,

On 2020/7/13 22:27, Jon Hunter wrote:
> After this change I am seeing the following build errors ...
> 
> /tmp/cckzq3FT.s: Assembler messages:
> /tmp/cckzq3FT.s:854: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:870: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:1095: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:1111: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:1964: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:1980: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:2286: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:2302: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> /tmp/cckzq3FT.s:4833: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> /tmp/cckzq3FT.s:4849: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> /tmp/cckzq3FT.s:5090: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> /tmp/cckzq3FT.s:5106: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> /tmp/cckzq3FT.s:874: Error: attempt to move .org backwards
> /tmp/cckzq3FT.s:1115: Error: attempt to move .org backwards
> /tmp/cckzq3FT.s:1984: Error: attempt to move .org backwards
> /tmp/cckzq3FT.s:2306: Error: attempt to move .org backwards
> /tmp/cckzq3FT.s:4853: Error: attempt to move .org backwards
> /tmp/cckzq3FT.s:5110: Error: attempt to move .org backwards
> scripts/Makefile.build:280: recipe for target 'arch/arm64/mm/hugetlbpage.o' failed
> make[3]: *** [arch/arm64/mm/hugetlbpage.o] Error 1
> scripts/Makefile.build:497: recipe for target 'arch/arm64/mm' failed
> make[2]: *** [arch/arm64/mm] Error 2
> 
> Cheers
> Jon
> 

The code must be built with binutils >= 2.30.
Maybe I should add a check on whether binutils supports ARMv8.4-a instructions...

Thanks,
Zhenyu



* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-13 14:39     ` Zhenyu Ye
@ 2020-07-13 14:44       ` Jon Hunter
  2020-07-13 17:21         ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Jon Hunter @ 2020-07-13 14:44 UTC (permalink / raw)
  To: Zhenyu Ye, catalin.marinas, will, suzuki.poulose, maz,
	steven.price, guohanjun, olof
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun, linux-tegra


On 13/07/2020 15:39, Zhenyu Ye wrote:
> Hi Jon,
> 
> On 2020/7/13 22:27, Jon Hunter wrote:
>> After this change I am seeing the following build errors ...
>>
>> /tmp/cckzq3FT.s: Assembler messages:
>> /tmp/cckzq3FT.s:854: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:870: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:1095: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:1111: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:1964: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:1980: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:2286: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:2302: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
>> /tmp/cckzq3FT.s:4833: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
>> /tmp/cckzq3FT.s:4849: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
>> /tmp/cckzq3FT.s:5090: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
>> /tmp/cckzq3FT.s:5106: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
>> /tmp/cckzq3FT.s:874: Error: attempt to move .org backwards
>> /tmp/cckzq3FT.s:1115: Error: attempt to move .org backwards
>> /tmp/cckzq3FT.s:1984: Error: attempt to move .org backwards
>> /tmp/cckzq3FT.s:2306: Error: attempt to move .org backwards
>> /tmp/cckzq3FT.s:4853: Error: attempt to move .org backwards
>> /tmp/cckzq3FT.s:5110: Error: attempt to move .org backwards
>> scripts/Makefile.build:280: recipe for target 'arch/arm64/mm/hugetlbpage.o' failed
>> make[3]: *** [arch/arm64/mm/hugetlbpage.o] Error 1
>> scripts/Makefile.build:497: recipe for target 'arch/arm64/mm' failed
>> make[2]: *** [arch/arm64/mm] Error 2
>>
>> Cheers
>> Jon
>>
> 
> The code must be built with binutils >= 2.30.
> Maybe I should add  a check on whether binutils supports ARMv8.4-a instructions...

Yes I believe so.

Cheers
Jon

-- 
nvpublic


* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-13 12:41     ` Zhenyu Ye
@ 2020-07-13 16:59       ` Catalin Marinas
  2020-07-14 15:17         ` Zhenyu Ye
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2020-07-13 16:59 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: maz, steven.price, guohanjun, will, olof, suzuki.poulose,
	linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

On Mon, Jul 13, 2020 at 08:41:31PM +0800, Zhenyu Ye wrote:
> On 2020/7/13 20:21, Catalin Marinas wrote:
> > On Fri, Jul 10, 2020 at 08:11:19PM +0100, Catalin Marinas wrote:
> >> On Fri, 10 Jul 2020 17:44:18 +0800, Zhenyu Ye wrote:
> >>> NOTICE: this series are based on the arm64 for-next/tlbi branch:
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/tlbi
> >>>
> >>> --
> >>> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> >>> range of input addresses. This series add support for this feature.
> >>>
> >>> [...]
> >>
> >> Applied to arm64 (for-next/tlbi), thanks!
> >>
> >> [1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
> >>       https://git.kernel.org/arm64/c/a2fd755f77ff
> >> [2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
> >>       https://git.kernel.org/arm64/c/db34a081d273
> > 
> > I'm dropping these two patches from for-next/tlbi and for-next/core.
> > They need a check on whether binutils supports the new "tlbi rva*"
> > instructions, otherwise the build mail fail.
> > 
> > I kept the latest incarnation of these patches on devel/tlbi-range for
> > reference.
> 
> Should we add a check for the binutils version? Just like:
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fad573883e89..d5fb6567e0d2 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1300,6 +1300,20 @@ config ARM64_AMU_EXTN
>  	  correctly reflect reality. Most commonly, the value read will be 0,
>  	  indicating that the counter is not enabled.
> 
> +config ARM64_TLBI_RANGE
> +	bool "Enable support for tlbi range feature"
> +	default y
> +	depends on AS_HAS_TLBI_RANGE
> +	help
> +	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> +	  range of input addresses.
> +
> +	  The feature introduces new assembly instructions, and they were
> +	  support when binutils >= 2.30.

It looks like 2.30. I tracked it down to this commit:

https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=793a194839bc8add71fdc7429c58b10f0667a6f6;hp=1a7ed57c840dcb0401f1a67c6763a89f7d2686d2

> +config AS_HAS_TLBI_RANGE
> +	def_bool $(as-option, -Wa$(comma)-march=armv8.4-a)
> +
>  endmenu

The problem is that we don't pass -Wa,-march=armv8.4-a to gas. AFAICT,
we only set an 8.3 -march for PAC, but I'm not sure how passing two such
options goes.

I'm slightly surprised that my toolchains (and yours) did not complain
about these instructions. Looking at the binutils code, I think it
should have complained if -march=armv8.4-a wasn't passed, yet it works
fine. I thought gas didn't enable the maximum arch features by default.

An alternative would be to check for a specific instruction (untested):

	def_bool $(as-instr,tlbi rvae1is, x0)

but we need to figure out whether gas not requiring -march=armv8.4-a is
a bug (which may be fixed) or whether gas accepts all TLBI instructions.

A safer bet may be to simply encode the instructions by hand:

#define SYS_TLBI_RVAE1IS(Rt) \
	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 1) | ((Rt) & 0x1f))
#define SYS_TLBI_RVALE1IS(Rt) \
	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 5) | ((Rt) & 0x1f))

(please check that they are correct)

-- 
Catalin


* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-13 14:44       ` Jon Hunter
@ 2020-07-13 17:21         ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2020-07-13 17:21 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Zhenyu Ye, will, suzuki.poulose, maz, steven.price, guohanjun,
	olof, linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun, linux-tegra

On Mon, Jul 13, 2020 at 03:44:16PM +0100, Jon Hunter wrote:
> On 13/07/2020 15:39, Zhenyu Ye wrote:
> > On 2020/7/13 22:27, Jon Hunter wrote:
> >> After this change I am seeing the following build errors ...
> >>
> >> /tmp/cckzq3FT.s: Assembler messages:
> >> /tmp/cckzq3FT.s:854: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:870: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:1095: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:1111: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:1964: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:1980: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:2286: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:2302: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x7'
> >> /tmp/cckzq3FT.s:4833: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> >> /tmp/cckzq3FT.s:4849: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> >> /tmp/cckzq3FT.s:5090: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> >> /tmp/cckzq3FT.s:5106: Error: unknown or missing operation name at operand 1 -- `tlbi rvae1is,x6'
> >> /tmp/cckzq3FT.s:874: Error: attempt to move .org backwards
> >> /tmp/cckzq3FT.s:1115: Error: attempt to move .org backwards
> >> /tmp/cckzq3FT.s:1984: Error: attempt to move .org backwards
> >> /tmp/cckzq3FT.s:2306: Error: attempt to move .org backwards
> >> /tmp/cckzq3FT.s:4853: Error: attempt to move .org backwards
> >> /tmp/cckzq3FT.s:5110: Error: attempt to move .org backwards
> >> scripts/Makefile.build:280: recipe for target 'arch/arm64/mm/hugetlbpage.o' failed
> >> make[3]: *** [arch/arm64/mm/hugetlbpage.o] Error 1
> >> scripts/Makefile.build:497: recipe for target 'arch/arm64/mm' failed
> >> make[2]: *** [arch/arm64/mm] Error 2
> > 
> > The code must be built with binutils >= 2.30.
> > Maybe I should add  a check on whether binutils supports ARMv8.4-a instructions...
> 
> Yes I believe so.

The binutils guys in Arm confirmed that assembling "tlbi rvae1is"
without -march=armv8.4-a is a bug. When it gets fixed, checking for the
binutils version is not sufficient without passing -march. I think we
are better off with a manual encoding of the instruction.

-- 
Catalin


* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
  2020-07-10 18:31   ` Catalin Marinas
  2020-07-13 14:27   ` Jon Hunter
@ 2020-07-14 10:36   ` Catalin Marinas
  2020-07-14 13:51     ` Zhenyu Ye
  2 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2020-07-14 10:36 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
> +#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
> +#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
> +
> +#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
> +#define __TLBI_RANGE_NUM(range, scale)	\
> +	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
[...]
> +	int num = 0;
> +	int scale = 0;
[...]
> +			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
[...]

Since num is an int, __TLBI_RANGE_PAGES is still an int. Shifting it by
PAGE_SHIFT can overflow as the maximum would be 8GB for 4K pages (or
128GB for 64K pages). I think we probably get away with this because of
some implicit type conversion but I'd rather make __TLBI_RANGE_PAGES an
explicit unsigned long:

#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))

Without this change, the CBMC check fails (see below for the test). In
the kernel, we don't have this problem as we encode the address via
__TLBI_VADDR_RANGE and it doesn't overflow.

The good part is that CBMC reckons the algorithm is correct ;).

---------------8<------tlbinval.c---------------------------
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Check with:
 *   cbmc --unwind 6 tlbinval.c
 */

#define PAGE_SHIFT	(12)
#define PAGE_SIZE	(1 << PAGE_SHIFT)
#define VA_RANGE	(1UL << 48)

#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))

#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))
#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)

#define TLBI_RANGE_MASK			0x1fUL
#define __TLBI_RANGE_NUM(pages, scale)	\
	((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)

static unsigned long inval_start;
static unsigned long inval_end;

static void tlbi(unsigned long start, unsigned long size)
{
	unsigned long end = start + size;

	if (inval_end == 0) {
		inval_start = start;
		inval_end = end;
		return;
	}

	/* contiguous ranges in ascending order only */
	__CPROVER_assert(start == inval_end, "Contiguous TLBI ranges");

	inval_end = end;
}

static void __flush_tlb_range(unsigned long start, unsigned long end,
			      unsigned long stride)
{
	int num = 0;
	int scale = 0;
	unsigned long pages;

	start = round_down(start, stride);
	end = round_up(end, stride);
	pages = (end - start) >> PAGE_SHIFT;

	if (pages >= MAX_TLBI_RANGE_PAGES) {
		tlbi(0, VA_RANGE);
		return;
	}

	while (pages > 0) {
		__CPROVER_assert(scale <= 3, "Scale in range");
		if (pages % 2 == 1) {
			tlbi(start, stride);
			start += stride;
			pages -= stride >> PAGE_SHIFT;
			continue;
		}

		num = __TLBI_RANGE_NUM(pages, scale);
		__CPROVER_assert(num <= 30, "Num in range");
		if (num >= 0) {
			tlbi(start, __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT);
			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
			pages -= __TLBI_RANGE_PAGES(num, scale);
		}
		scale++;
	}
}

static unsigned long nondet_ulong(void);

int main(void)
{
	unsigned long stride = nondet_ulong();
	unsigned long start = round_down(nondet_ulong(), stride);
	unsigned long end = round_up(nondet_ulong(), stride);

	__CPROVER_assume(stride == PAGE_SIZE ||
			 stride == PAGE_SIZE << (PAGE_SHIFT - 3) ||
			 stride == PAGE_SIZE << (2 * (PAGE_SHIFT - 3)));
	__CPROVER_assume(start < end);
	__CPROVER_assume(end <= VA_RANGE);

	__flush_tlb_range(start, end, stride);

	__CPROVER_assert((inval_start == 0 && inval_end == VA_RANGE) ||
			 (inval_start == start && inval_end == end),
			 "Correct invalidation");

	return 0;
}


* Re: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
  2020-07-14 10:36   ` Catalin Marinas
@ 2020-07-14 13:51     ` Zhenyu Ye
  0 siblings, 0 replies; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-14 13:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will, suzuki.poulose, maz, steven.price, guohanjun, olof,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, arm,
	xiexiangyou, prime.zeng, zhangshaokun, kuhn.chenqun

On 2020/7/14 18:36, Catalin Marinas wrote:
> On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
>> +#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
>> +#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
>> +
>> +#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
>> +#define __TLBI_RANGE_NUM(range, scale)	\
>> +	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
> [...]
>> +	int num = 0;
>> +	int scale = 0;
> [...]
>> +			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
> [...]
> 
> Since num is an int, __TLBI_RANGE_PAGES is still an int. Shifting it by
> PAGE_SHIFT can overflow as the maximum would be 8GB for 4K pages (or
> 128GB for 64K pages). I think we probably get away with this because of
> some implicit type conversion but I'd rather make __TLBI_RANGE_PAGES an
> explicit unsigned long:
> 
> #define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))
> 

This is valuable and I will update it in the next series, together
with the binutils check (or the hand-encoded instructions), as soon
as possible.
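
For reference, a minimal standalone sketch of the overflow (not the
kernel code; the _INT/_UL names are only for this illustration):

#include <stdio.h>

#define PAGE_SHIFT	12

/* Old form: num is an int, so the whole expression stays an int. */
#define __TLBI_RANGE_PAGES_INT(num, scale)	(((num) + 1) << (5 * (scale) + 1))
/* Suggested form: force the arithmetic to unsigned long. */
#define __TLBI_RANGE_PAGES_UL(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))

int main(void)
{
	int num = 31, scale = 3;

	/* 32 << 16 is 2^21 pages; shifting that int by another
	 * PAGE_SHIFT would be 2^33, which overflows a 32-bit int. */
	unsigned long bad  = __TLBI_RANGE_PAGES_INT(num, scale) << PAGE_SHIFT;
	unsigned long good = __TLBI_RANGE_PAGES_UL(num, scale) << PAGE_SHIFT;

	printf("bad = %#lx, good = %#lx\n", bad, good);
	return 0;
}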

> Without this change, the CBMC check fails (see below for the test). In
> the kernel, we don't have this problem as we encode the address via
> __TLBI_VADDR_RANGE and it doesn't overflow.
> The good part is that CBMC reckons the algorithm is correct ;).

Thanks for your test!

Zhenyu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-13 16:59       ` Catalin Marinas
@ 2020-07-14 15:17         ` Zhenyu Ye
  2020-07-14 15:58           ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Zhenyu Ye @ 2020-07-14 15:17 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: maz, steven.price, guohanjun, will, olof, suzuki.poulose,
	linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

On 2020/7/14 0:59, Catalin Marinas wrote:
>> +config ARM64_TLBI_RANGE
>> +	bool "Enable support for tlbi range feature"
>> +	default y
>> +	depends on AS_HAS_TLBI_RANGE
>> +	help
>> +	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>> +	  range of input addresses.
>> +
>> +	  The feature introduces new assembly instructions, and they were
>> +	  support when binutils >= 2.30.
> 
> It looks like 2.30. I tracked it down to this commit:
> 
> https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=793a194839bc8add71fdc7429c58b10f0667a6f6;hp=1a7ed57c840dcb0401f1a67c6763a89f7d2686d2
> 
>> +config AS_HAS_TLBI_RANGE
>> +	def_bool $(as-option, -Wa$(comma)-march=armv8.4-a)
>> +
>>  endmenu
> 
> The problem is that we don't pass -Wa,-march=armv8.4-a to gas. AFAICT,
> we only set an 8.3 for PAC but I'm not sure how passing two such options
> goes.
> 

Passing -march twice may not have a bad impact.  I tested this with my
toolchains and the newer option is chosen.  Anyway, we can add a check
to avoid passing both at the same time.

> I'm slightly surprised that my toolchains (and yours) did not complain
> about these instructions. Looking at the binutils code, I think it
> should have complained if -march=armv8.4-a wasn't passed but works fine.
> I thought gas doesn't enable the maximum arch feature by default.
> 
> An alternative would be to check for a specific instruction (untested):
> 
> 	def_bool $(as-instr,tlbi rvae1is, x0)
> 
> but we need to figure out whether gas not requiring -march=armv8.4-a is
> a bug (which may be fixed) or that gas accepts all TLBI instructions.
> 

As you said in another email, this is a bug.  So we should pass
-march=armv8.4-a to gas if we use the toolchain to generate the TLBI
range instructions.

But this bug only affects compilation (whether a warning or error is
raised when -march=armv8.4-a is not passed), not the Kconfig detection.

> A safer bet may be to simply encode the instructions by hand:
> 
> #define SYS_TLBI_RVAE1IS(Rt) \
> 	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 1) | ((Rt) & 0x1f))
> #define SYS_TLBI_RVALE1IS(Rt) \
> 	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 5) | ((Rt) & 0x1f))
> 
> (please check that they are correct)
> 

Currently in the kernel, all TLBI instructions go through __tlbi() and
__tlbi_user().  If we encode the range instructions by hand, we would
have to add a new mechanism for this (a sketch follows below):

1. choose a register and save it;
2. put the operand for the TLBI range into that register;
3. issue the range invalidation with asm(SYS_TLBI_RVAE1IS(x0));
4. restore the value of the register.

It's complicated and would only be used for the TLBI range
instructions.  (Or am I misunderstanding something?)
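
Here is a rough, untested sketch of what I mean (assuming your
SYS_TLBI_RVAE1IS definition above; the operand is pinned to x0 with a
register variable rather than saved/restored explicitly):

#define __tlbi_range_rvae1is(arg)					\
	do {								\
		register unsigned long __tlbi_arg asm("x0") = (arg);	\
		asm volatile(SYS_TLBI_RVAE1IS(0)			\
			     : : "r" (__tlbi_arg) : "memory");		\
	} while (0)

The register variable avoids an explicit save/restore, but it would
still be a second code path living next to __tlbi()/__tlbi_user().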

So I would prefer to pass -march=armv8.4-a to the toolchain to support
the TLBI range instructions, just like what PAC does.

Thanks,
Zhenyu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions
  2020-07-14 15:17         ` Zhenyu Ye
@ 2020-07-14 15:58           ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2020-07-14 15:58 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: maz, steven.price, guohanjun, will, olof, suzuki.poulose,
	linux-kernel, linux-arm-kernel, zhangshaokun, prime.zeng,
	linux-arch, kuhn.chenqun, xiexiangyou, linux-mm, arm

On Tue, Jul 14, 2020 at 11:17:01PM +0800, Zhenyu Ye wrote:
> On 2020/7/14 0:59, Catalin Marinas wrote:
> >> +config ARM64_TLBI_RANGE
> >> +	bool "Enable support for tlbi range feature"
> >> +	default y
> >> +	depends on AS_HAS_TLBI_RANGE
> >> +	help
> >> +	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> >> +	  range of input addresses.
> >> +
> >> +	  The feature introduces new assembly instructions, and they were
> >> +	  support when binutils >= 2.30.
> > 
> > It looks like 2.30. I tracked it down to this commit:
> > 
> > https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=793a194839bc8add71fdc7429c58b10f0667a6f6;hp=1a7ed57c840dcb0401f1a67c6763a89f7d2686d2
> > 
> >> +config AS_HAS_TLBI_RANGE
> >> +	def_bool $(as-option, -Wa$(comma)-march=armv8.4-a)

You could make this more generic like AS_HAS_ARMV8_4.

> > The problem is that we don't pass -Wa,-march=armv8.4-a to gas. AFAICT,
> > we only set an 8.3 for PAC but I'm not sure how passing two such options
> > goes.
> 
> Pass the -march twice may not have bad impact.  Test in my toolchains
> and the newer one will be chosen.  Anyway, we can add judgment to avoid
> them be passed at the same time.

I think the last one always overrides the previous (same with the .arch
statements in asm files). For example:

echo "paciasp" | aarch64-none-linux-gnu-as -march=armv8.2-a -march=armv8.3-a

succeeds but the one below fails:

echo "paciasp" | aarch64-none-linux-gnu-as -march=armv8.3-a -march=armv8.2-a

> > A safer bet may be to simply encode the instructions by hand:
> > 
> > #define SYS_TLBI_RVAE1IS(Rt) \
> > 	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 1) | ((Rt) & 0x1f))
> > #define SYS_TLBI_RVALE1IS(Rt) \
> > 	__emit_inst(0xd5000000 | sys_insn(1, 0, 8, 2, 5) | ((Rt) & 0x1f))
> > 
> > (please check that they are correct)
> 
> Currently in kernel, all tlbi instructions are passed through __tlbi()
> and __tlbi_user(). If we encode the range instructions by hand, we may
> should have to add a new mechanism for this:
> 
> 1. choose a register and save it;
> 2. put the operations for tlbi range to the register;
> 3. do tlbi range by asm(SYS_TLBI_RVAE1IS(x0));
> 4. restore the value of the register.
> 
> It's complicated and will only be used with tlbi range instructions.
> (Am I understand something wrong? )
> 
> So I am prefer to pass -march=armv8.4-a to toolschains to support tlbi
> range instruction, just like what PAC does.

It will indeed get more complicated than necessary. So please go with
the -Wa,-march=armv8.4-a check in Kconfig and update the
arch/arm64/Makefile to pass this option (after the 8.3 one).

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-07-14 15:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-10  9:44 [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
2020-07-10  9:44 ` [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
2020-07-10  9:44 ` [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
2020-07-10 18:31   ` Catalin Marinas
2020-07-11  6:50     ` Zhenyu Ye
2020-07-12 12:03       ` Catalin Marinas
2020-07-13 14:27   ` Jon Hunter
2020-07-13 14:39     ` Zhenyu Ye
2020-07-13 14:44       ` Jon Hunter
2020-07-13 17:21         ` Catalin Marinas
2020-07-14 10:36   ` Catalin Marinas
2020-07-14 13:51     ` Zhenyu Ye
2020-07-10 19:11 ` [PATCH v2 0/2] arm64: tlb: add support for TLBI RANGE instructions Catalin Marinas
2020-07-13 12:21   ` Catalin Marinas
2020-07-13 12:41     ` Zhenyu Ye
2020-07-13 16:59       ` Catalin Marinas
2020-07-14 15:17         ` Zhenyu Ye
2020-07-14 15:58           ` Catalin Marinas
