kvm.vger.kernel.org archive mirror
* [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups
@ 2021-02-27 10:41 Alexandru Elisei
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel Alexandru Elisei
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:41 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

This series contains mostly fixes and cleanups for things I found while
playing with EFI support. Most of them should be fairly self-explanatory.

The patch clearly aimed at running on bare metal is patch #2 ("arm/arm64:
Remove dcache_line_size global variable"), which is needed because the
startup environment is different for EFI apps and we're going to need to do
cache maintenance before setup() is run.

Patch #4 ("lib: arm64: Consolidate register definitions to sysreg.h") is
there to make importing register definitions and other header files from
Linux (like el2_setup.h) easier by switching to the same layout. And arm
is already using sysreg.h for SCTLR fields.

Alexandru Elisei (6):
  arm64: Remove unnecessary ISB when writing to SPSel
  arm/arm64: Remove dcache_line_size global variable
  arm/arm64: Remove unnecessary ISB when doing dcache maintenance
  lib: arm64: Consolidate register definitions to sysreg.h
  arm64: Configure SCTLR_EL1 at boot
  arm64: Disable TTBR1_EL1 translation table walks

 lib/arm/asm/assembler.h       | 44 +++++++++++++++++++++++++++++
 lib/arm/asm/processor.h       |  7 -----
 lib/arm64/asm/arch_gicv3.h    |  6 ----
 lib/arm64/asm/assembler.h     | 53 +++++++++++++++++++++++++++++++++++
 lib/arm64/asm/pgtable-hwdef.h |  1 +
 lib/arm64/asm/processor.h     | 17 -----------
 lib/arm64/asm/sysreg.h        | 24 ++++++++++++++++
 lib/arm/setup.c               |  7 -----
 arm/cstart.S                  | 19 ++-----------
 arm/cstart64.S                | 28 +++++++-----------
 10 files changed, 135 insertions(+), 71 deletions(-)
 create mode 100644 lib/arm/asm/assembler.h
 create mode 100644 lib/arm64/asm/assembler.h

-- 
2.30.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
@ 2021-02-27 10:41 ` Alexandru Elisei
  2021-03-03 17:35   ` Andre Przywara
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable Alexandru Elisei
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:41 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

Software can use the SPSel operand to write directly to PSTATE.SP.
According to ARM DDI 0487F.b, page D1-2332, writes to PSTATE are
self-synchronizing and no ISB is needed:

"Writes to the PSTATE fields have side-effects on various aspects of the PE
operation. All of these side-effects are guaranteed:
- Not to be visible to earlier instructions in the execution stream.
- To be visible to later instructions in the execution stream."

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arm/cstart64.S b/arm/cstart64.S
index 0428014aa58a..fc1930bcdb53 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -54,7 +54,6 @@ start:
 	/* set up stack */
 	mov	x4, #1
 	msr	spsel, x4
-	isb
 	adrp    x4, stackptr
 	add     sp, x4, :lo12:stackptr
 
-- 
2.30.1



* [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel Alexandru Elisei
@ 2021-02-27 10:41 ` Alexandru Elisei
  2021-03-04 15:00   ` Andre Przywara
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance Alexandru Elisei
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:41 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

Compute the dcache line size when doing dcache maintenance instead of using
a global variable computed in setup(), which allows us to do dcache
maintenance at any point in the boot process. This will be useful for
running as an EFI app and it also aligns our implementation to that of the
Linux kernel.

For consistency, the arm code has been similarly modified.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/asm/assembler.h   | 44 ++++++++++++++++++++++++++++++++
 lib/arm/asm/processor.h   |  7 ------
 lib/arm64/asm/assembler.h | 53 +++++++++++++++++++++++++++++++++++++++
 lib/arm64/asm/processor.h |  7 ------
 lib/arm/setup.c           |  7 ------
 arm/cstart.S              | 18 +++----------
 arm/cstart64.S            | 16 ++----------
 7 files changed, 102 insertions(+), 50 deletions(-)
 create mode 100644 lib/arm/asm/assembler.h
 create mode 100644 lib/arm64/asm/assembler.h

diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
new file mode 100644
index 000000000000..6b932df86204
--- /dev/null
+++ b/lib/arm/asm/assembler.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Based on several files from Linux version v5.10: arch/arm/mm/proc-macros.S,
+ * arch/arm/mm/proc-v7.S.
+ */
+
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register
+ * on ARMv7.
+ */
+	.macro	dcache_line_size, reg, tmp
+	mrc	p15, 0, \tmp, c0, c0, 1		// read ctr
+	lsr	\tmp, \tmp, #16
+	and	\tmp, \tmp, #0xf		// cache line size encoding
+	mov	\reg, #4			// bytes per word
+	mov	\reg, \reg, lsl \tmp		// actual cache line size
+	.endm
+
+/*
+ * Macro to perform a data cache maintenance for the interval
+ * [addr, addr + size).
+ *
+ * 	op:		operation to execute
+ * 	domain:		domain used in the dsb instruction
+ * 	addr:		starting virtual address of the region
+ * 	size:		size of the region
+ * 	Corrupts:	addr, size, tmp1, tmp2
+ */
+	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
+	dcache_line_size \tmp1, \tmp2
+	add	\size, \addr, \size
+	sub	\tmp2, \tmp1, #1
+	bic	\addr, \addr, \tmp2
+9998:
+	.ifc	\op, dccimvac
+	mcr	p15, 0, \addr, c7, c14, 1
+	.else
+	.err
+	.endif
+	add	\addr, \addr, \tmp1
+	cmp	\addr, \size
+	blo	9998b
+	dsb	\domain
+	.endm
diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
index 273366d1fe1c..3c36eac903f0 100644
--- a/lib/arm/asm/processor.h
+++ b/lib/arm/asm/processor.h
@@ -9,11 +9,6 @@
 #include <asm/sysreg.h>
 #include <asm/barrier.h>
 
-#define CTR_DMINLINE_SHIFT	16
-#define CTR_DMINLINE_MASK	(0xf << 16)
-#define CTR_DMINLINE(x)	\
-	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
-
 enum vector {
 	EXCPTN_RST,
 	EXCPTN_UND,
@@ -89,6 +84,4 @@ static inline u32 get_ctr(void)
 	return read_sysreg(CTR);
 }
 
-extern unsigned long dcache_line_size;
-
 #endif /* _ASMARM_PROCESSOR_H_ */
diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
new file mode 100644
index 000000000000..f801c0c43d02
--- /dev/null
+++ b/lib/arm64/asm/assembler.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Based on the file arch/arm64/include/asm/assembler.h from Linux v5.10, which
+ * in turn is based on arch/arm/include/asm/assembler.h and
+ * arch/arm/mm/proc-macros.S
+ *
+ * Copyright (C) 1996-2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ */
+#ifndef __ASSEMBLY__
+#error "Only include this from assembly code"
+#endif
+
+#ifndef __ASM_ASSEMBLER_H
+#define __ASM_ASSEMBLER_H
+
+/*
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
+ */
+	.macro	raw_dcache_line_size, reg, tmp
+	mrs	\tmp, ctr_el0			// read CTR
+	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
+	mov	\reg, #4			// bytes per word
+	lsl	\reg, \reg, \tmp		// actual cache line size
+	.endm
+
+/*
+ * Macro to perform a data cache maintenance for the interval
+ * [addr, addr + size). Use the raw value for the dcache line size because
+ * kvm-unit-tests has no concept of scheduling.
+ *
+ * 	op:		operation passed to dc instruction
+ * 	domain:		domain used in the dsb instruction
+ * 	addr:		starting virtual address of the region
+ * 	size:		size of the region
+ * 	Corrupts:	addr, size, tmp1, tmp2
+ */
+
+	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
+	raw_dcache_line_size \tmp1, \tmp2
+	add	\size, \addr, \size
+	sub	\tmp2, \tmp1, #1
+	bic	\addr, \addr, \tmp2
+9998:
+	dc	\op, \addr
+	add	\addr, \addr, \tmp1
+	cmp	\addr, \size
+	b.lo	9998b
+	dsb	\domain
+	.endm
+
+#endif	/* __ASM_ASSEMBLER_H */
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 771b2d1e0c94..cdc2463e1981 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -16,11 +16,6 @@
 #define SCTLR_EL1_A	(1 << 1)
 #define SCTLR_EL1_M	(1 << 0)
 
-#define CTR_DMINLINE_SHIFT	16
-#define CTR_DMINLINE_MASK	(0xf << 16)
-#define CTR_DMINLINE(x)	\
-	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
-
 #ifndef __ASSEMBLY__
 #include <asm/ptrace.h>
 #include <asm/esr.h>
@@ -115,8 +110,6 @@ static inline u64 get_ctr(void)
 	return read_sysreg(ctr_el0);
 }
 
-extern unsigned long dcache_line_size;
-
 static inline unsigned long get_id_aa64mmfr0_el1(void)
 {
 	return read_sysreg(id_aa64mmfr0_el1);
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 066524f8bf61..751ba980000a 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -42,8 +42,6 @@ static struct mem_region __initial_mem_regions[NR_INITIAL_MEM_REGIONS + 1];
 struct mem_region *mem_regions = __initial_mem_regions;
 phys_addr_t __phys_offset, __phys_end;
 
-unsigned long dcache_line_size;
-
 int mpidr_to_cpu(uint64_t mpidr)
 {
 	int i;
@@ -72,11 +70,6 @@ static void cpu_init(void)
 	ret = dt_for_each_cpu_node(cpu_set, NULL);
 	assert(ret == 0);
 	set_cpu_online(0, true);
-	/*
-	 * DminLine is log2 of the number of words in the smallest cache line; a
-	 * word is 4 bytes.
-	 */
-	dcache_line_size = 1 << (CTR_DMINLINE(get_ctr()) + 2);
 }
 
 unsigned int mem_region_get_flags(phys_addr_t paddr)
diff --git a/arm/cstart.S b/arm/cstart.S
index ef936ae2f874..954748b00f64 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -7,6 +7,7 @@
  */
 #define __ASSEMBLY__
 #include <auxinfo.h>
+#include <asm/assembler.h>
 #include <asm/thread_info.h>
 #include <asm/asm-offsets.h>
 #include <asm/pgtable-hwdef.h>
@@ -197,20 +198,6 @@ asm_mmu_enable:
 
 	mov     pc, lr
 
-.macro dcache_clean_inval domain, start, end, tmp1, tmp2
-	ldr	\tmp1, =dcache_line_size
-	ldr	\tmp1, [\tmp1]
-	sub	\tmp2, \tmp1, #1
-	bic	\start, \start, \tmp2
-9998:
-	/* DCCIMVAC */
-	mcr	p15, 0, \start, c7, c14, 1
-	add	\start, \start, \tmp1
-	cmp	\start, \end
-	blo	9998b
-	dsb	\domain
-.endm
-
 .globl asm_mmu_disable
 asm_mmu_disable:
 	/* SCTLR */
@@ -223,7 +210,8 @@ asm_mmu_disable:
 	ldr	r0, [r0]
 	ldr	r1, =__phys_end
 	ldr	r1, [r1]
-	dcache_clean_inval sy, r0, r1, r2, r3
+	sub	r1, r1, r0
+	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
 	isb
 
 	mov     pc, lr
diff --git a/arm/cstart64.S b/arm/cstart64.S
index fc1930bcdb53..046bd3914098 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -8,6 +8,7 @@
 #define __ASSEMBLY__
 #include <auxinfo.h>
 #include <asm/asm-offsets.h>
+#include <asm/assembler.h>
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 #include <asm/page.h>
@@ -204,20 +205,6 @@ asm_mmu_enable:
 
 	ret
 
-/* Taken with small changes from arch/arm64/incluse/asm/assembler.h */
-.macro dcache_by_line_op op, domain, start, end, tmp1, tmp2
-	adrp	\tmp1, dcache_line_size
-	ldr	\tmp1, [\tmp1, :lo12:dcache_line_size]
-	sub	\tmp2, \tmp1, #1
-	bic	\start, \start, \tmp2
-9998:
-	dc	\op , \start
-	add	\start, \start, \tmp1
-	cmp	\start, \end
-	b.lo	9998b
-	dsb	\domain
-.endm
-
 .globl asm_mmu_disable
 asm_mmu_disable:
 	mrs	x0, sctlr_el1
@@ -230,6 +217,7 @@ asm_mmu_disable:
 	ldr	x0, [x0, :lo12:__phys_offset]
 	adrp	x1, __phys_end
 	ldr	x1, [x1, :lo12:__phys_end]
+	sub	x1, x1, x0
 	dcache_by_line_op civac, sy, x0, x1, x2, x3
 	isb
 
-- 
2.30.1



* [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel Alexandru Elisei
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable Alexandru Elisei
@ 2021-02-27 10:41 ` Alexandru Elisei
  2021-03-12 14:59   ` Andrew Jones
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h Alexandru Elisei
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:41 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

The dcache_by_line_op macro executes a DSB to complete the cache
maintenance operations. According to ARM DDI 0487G.a, page B2-150:

"In addition, no instruction that appears in program order after the DSB
instruction can alter any state of the system or perform any part of its
functionality until the DSB completes other than:

- Being fetched from memory and decoded.
- Reading the general-purpose, SIMD and floating-point, Special-purpose, or
  System registers that are directly or indirectly read without causing
  side-effects."

A similar definition exists for ARM in ARM DDI 0406C.d, page A3-150:

"In addition, no instruction that appears in program order after the DSB
instruction can execute until the DSB completes."

This means that we don't need the ISB to prevent reordering of the cache
maintenance instructions.

We are also not doing icache maintenance, where an ISB would be required
for the PE to discard instructions speculated before the invalidation.

In conclusion, the ISB is unnecessary, so remove it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S   | 1 -
 arm/cstart64.S | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 954748b00f64..2d62c1e6d40d 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -212,7 +212,6 @@ asm_mmu_disable:
 	ldr	r1, [r1]
 	sub	r1, r1, r0
 	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
-	isb
 
 	mov     pc, lr
 
diff --git a/arm/cstart64.S b/arm/cstart64.S
index 046bd3914098..c1deff842f03 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -219,7 +219,6 @@ asm_mmu_disable:
 	ldr	x1, [x1, :lo12:__phys_end]
 	sub	x1, x1, x0
 	dcache_by_line_op civac, sy, x0, x1, x2, x3
-	isb
 
 	ret
 
-- 
2.30.1



* [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
                   ` (2 preceding siblings ...)
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance Alexandru Elisei
@ 2021-02-27 10:41 ` Alexandru Elisei
  2021-03-03 17:32   ` Andre Przywara
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot Alexandru Elisei
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks Alexandru Elisei
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:41 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

Move system register field definitions to sysreg.h, where the opcodes for
system register access are defined, to align ourselves with the Linux
kernel. EL2 support, needed for EFI and nested virtualization testing, will
require additional register and field definitions, and having them in the
same place as Linux will make maintenance easier.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm64/asm/arch_gicv3.h |  6 ------
 lib/arm64/asm/processor.h  | 10 ----------
 lib/arm64/asm/sysreg.h     | 17 +++++++++++++++++
 arm/cstart64.S             |  2 +-
 4 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/lib/arm64/asm/arch_gicv3.h b/lib/arm64/asm/arch_gicv3.h
index a7994ec2fbbe..fdee4de1f1f6 100644
--- a/lib/arm64/asm/arch_gicv3.h
+++ b/lib/arm64/asm/arch_gicv3.h
@@ -10,12 +10,6 @@
 
 #include <asm/sysreg.h>
 
-#define ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
-#define ICC_SGI1R_EL1			sys_reg(3, 0, 12, 11, 5)
-#define ICC_IAR1_EL1			sys_reg(3, 0, 12, 12, 0)
-#define ICC_EOIR1_EL1			sys_reg(3, 0, 12, 12, 1)
-#define ICC_GRPEN1_EL1			sys_reg(3, 0, 12, 12, 7)
-
 #ifndef __ASSEMBLY__
 
 #include <libcflat.h>
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index cdc2463e1981..4a3d826ab560 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -6,16 +6,6 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 
-/* System Control Register (SCTLR_EL1) bits */
-#define SCTLR_EL1_EE	(1 << 25)
-#define SCTLR_EL1_WXN	(1 << 19)
-#define SCTLR_EL1_I	(1 << 12)
-#define SCTLR_EL1_SA0	(1 << 4)
-#define SCTLR_EL1_SA	(1 << 3)
-#define SCTLR_EL1_C	(1 << 2)
-#define SCTLR_EL1_A	(1 << 1)
-#define SCTLR_EL1_M	(1 << 0)
-
 #ifndef __ASSEMBLY__
 #include <asm/ptrace.h>
 #include <asm/esr.h>
diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
index 378bf7ebb3b5..9d6b4fc66936 100644
--- a/lib/arm64/asm/sysreg.h
+++ b/lib/arm64/asm/sysreg.h
@@ -70,4 +70,21 @@ asm(
 "	.endm\n"
 );
 #endif /* __ASSEMBLY__ */
+
+#define ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
+#define ICC_SGI1R_EL1			sys_reg(3, 0, 12, 11, 5)
+#define ICC_IAR1_EL1			sys_reg(3, 0, 12, 12, 0)
+#define ICC_EOIR1_EL1			sys_reg(3, 0, 12, 12, 1)
+#define ICC_GRPEN1_EL1			sys_reg(3, 0, 12, 12, 7)
+
+/* System Control Register (SCTLR_EL1) bits */
+#define SCTLR_EL1_EE	(1 << 25)
+#define SCTLR_EL1_WXN	(1 << 19)
+#define SCTLR_EL1_I	(1 << 12)
+#define SCTLR_EL1_SA0	(1 << 4)
+#define SCTLR_EL1_SA	(1 << 3)
+#define SCTLR_EL1_C	(1 << 2)
+#define SCTLR_EL1_A	(1 << 1)
+#define SCTLR_EL1_M	(1 << 0)
+
 #endif /* _ASMARM64_SYSREG_H_ */
diff --git a/arm/cstart64.S b/arm/cstart64.S
index c1deff842f03..f6c5d2ebccf3 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -10,9 +10,9 @@
 #include <asm/asm-offsets.h>
 #include <asm/assembler.h>
 #include <asm/ptrace.h>
-#include <asm/processor.h>
 #include <asm/page.h>
 #include <asm/pgtable-hwdef.h>
+#include <asm/sysreg.h>
 
 .section .init
 
-- 
2.30.1



* [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
                   ` (3 preceding siblings ...)
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h Alexandru Elisei
@ 2021-02-27 10:42 ` Alexandru Elisei
  2021-03-03 17:32   ` Andre Przywara
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks Alexandru Elisei
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:42 UTC (permalink / raw)
  To: drjones, kvm, kvmarm

Some fields in SCTLR_EL1 are UNKNOWN at reset and the arm64 boot
requirements, as stated by Linux in Documentation/arm64/booting.rst, do not
specify a particular value for all the fields. Do not rely on the good will
of the hypervisor and userspace to set SCTLR_EL1 to a sane value (by their
definition of sane) and set SCTLR_EL1 explicitly before running setup().
This will ensure that all tests are performed with the hardware set up
identically, regardless of the KVM or VMM versions.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm64/asm/sysreg.h | 7 +++++++
 arm/cstart64.S         | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
index 9d6b4fc66936..18c4ed39557a 100644
--- a/lib/arm64/asm/sysreg.h
+++ b/lib/arm64/asm/sysreg.h
@@ -8,6 +8,8 @@
 #ifndef _ASMARM64_SYSREG_H_
 #define _ASMARM64_SYSREG_H_
 
+#include <linux/const.h>
+
 #define sys_reg(op0, op1, crn, crm, op2) \
 	((((op0)&3)<<19)|((op1)<<16)|((crn)<<12)|((crm)<<8)|((op2)<<5))
 
@@ -87,4 +89,9 @@ asm(
 #define SCTLR_EL1_A	(1 << 1)
 #define SCTLR_EL1_M	(1 << 0)
 
+#define SCTLR_EL1_RES1	(_BITUL(7) | _BITUL(8) | _BITUL(11) | _BITUL(20) | \
+			 _BITUL(22) | _BITUL(23) | _BITUL(28) | _BITUL(29))
+#define INIT_SCTLR_EL1_MMU_OFF	\
+			SCTLR_EL1_RES1
+
 #endif /* _ASMARM64_SYSREG_H_ */
diff --git a/arm/cstart64.S b/arm/cstart64.S
index f6c5d2ebccf3..42a838ff4c38 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -52,6 +52,11 @@ start:
 	b	1b
 
 1:
+	/* set SCTLR_EL1 to a known value */
+	ldr	x4, =INIT_SCTLR_EL1_MMU_OFF
+	msr	sctlr_el1, x4
+	isb
+
 	/* set up stack */
 	mov	x4, #1
 	msr	spsel, x4
-- 
2.30.1



* [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks
  2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
                   ` (4 preceding siblings ...)
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot Alexandru Elisei
@ 2021-02-27 10:42 ` Alexandru Elisei
  2021-03-03 17:32   ` Andre Przywara
  5 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-02-27 10:42 UTC (permalink / raw)
  To: drjones, kvm, kvmarm; +Cc: Mark Rutland

From an architectural point of view, the PE can speculate instruction
fetches and data reads at any time when the MMU is enabled using the
translation tables from TTBR0_EL1 and TTBR1_EL1. kvm-unit-tests uses an
identity map, and as such it only programs TTBR0_EL1 with a valid table and
leaves TTBR1_EL1 unchanged. The reset value for TTBR1_EL1 is UNKNOWN, which
means it is possible for the PE to perform reads from memory locations
where accesses can cause side effects (like memory-mapped devices) as part
of the speculated translation table walk.

So far, this hasn't been a problem, because KVM resets TTBR{0,1}_EL1 to
zero, and that address is used for emulation for both qemu and kvmtool and
it doesn't point to a real device. However, kvm-unit-tests shouldn't rely
on a particular combination of hypervisor and userspace for correctness.
Another situation where we can't rely on these assumptions being true is
when kvm-unit-tests is run as an EFI app.

To prevent reads from arbitrary addresses, set the TCR_EL1.EPD1 bit to
disable speculative translation table walks using TTBR1_EL1.

This is similar to EDK2 commit fafb7e9c110e ("ArmPkg: correct TTBR1_EL1
settings in TCR_EL1"). Also mentioned in that commit is the Cortex-A57
erratum 822227 which impacts the hypervisor, but kvm-unit-tests is
protected against it because asm_mmu_enable sets both the TCR_EL1.TG0 and
TCR_EL1.TG1 bits when programming the register.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm64/asm/pgtable-hwdef.h | 1 +
 arm/cstart64.S                | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/arm64/asm/pgtable-hwdef.h b/lib/arm64/asm/pgtable-hwdef.h
index 48a1d1ab1ac2..8c41fe123fb3 100644
--- a/lib/arm64/asm/pgtable-hwdef.h
+++ b/lib/arm64/asm/pgtable-hwdef.h
@@ -136,6 +136,7 @@
 #define TCR_ORGN_WBnWA		((UL(3) << 10) | (UL(3) << 26))
 #define TCR_ORGN_MASK		((UL(3) << 10) | (UL(3) << 26))
 #define TCR_SHARED		((UL(3) << 12) | (UL(3) << 28))
+#define TCR_EPD1		(UL(1) << 23)
 #define TCR_TG0_4K		(UL(0) << 14)
 #define TCR_TG0_64K		(UL(1) << 14)
 #define TCR_TG0_16K		(UL(2) << 14)
diff --git a/arm/cstart64.S b/arm/cstart64.S
index 42a838ff4c38..3d359c8387c9 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -181,7 +181,8 @@ asm_mmu_enable:
 	ldr	x1, =TCR_TxSZ(VA_BITS) |		\
 		     TCR_TG_FLAGS  |			\
 		     TCR_IRGN_WBWA | TCR_ORGN_WBWA |	\
-		     TCR_SHARED
+		     TCR_SHARED |			\
+		     TCR_EPD1
 	mrs	x2, id_aa64mmfr0_el1
 	bfi	x1, x2, #32, #3
 	msr	tcr_el1, x1
-- 
2.30.1



* Re: [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h Alexandru Elisei
@ 2021-03-03 17:32   ` Andre Przywara
  0 siblings, 0 replies; 17+ messages in thread
From: Andre Przywara @ 2021-03-03 17:32 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Sat, 27 Feb 2021 10:41:59 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Move system register field definitions to sysreg.h, where the opcodes for
> system register access are defined, to align ourselves with the Linux
> kernel. EL2 support, needed for EFI and nested virtualization testing, will
> require additional register and field definitions, and having them in the
> same place as Linux will make maintenance easier.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Checked to be just moves, and it compiles, so:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  lib/arm64/asm/arch_gicv3.h |  6 ------
>  lib/arm64/asm/processor.h  | 10 ----------
>  lib/arm64/asm/sysreg.h     | 17 +++++++++++++++++
>  arm/cstart64.S             |  2 +-
>  4 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/arm64/asm/arch_gicv3.h b/lib/arm64/asm/arch_gicv3.h
> index a7994ec2fbbe..fdee4de1f1f6 100644
> --- a/lib/arm64/asm/arch_gicv3.h
> +++ b/lib/arm64/asm/arch_gicv3.h
> @@ -10,12 +10,6 @@
>  
>  #include <asm/sysreg.h>
>  
> -#define ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
> -#define ICC_SGI1R_EL1			sys_reg(3, 0, 12, 11, 5)
> -#define ICC_IAR1_EL1			sys_reg(3, 0, 12, 12, 0)
> -#define ICC_EOIR1_EL1			sys_reg(3, 0, 12, 12, 1)
> -#define ICC_GRPEN1_EL1			sys_reg(3, 0, 12, 12, 7)
> -
>  #ifndef __ASSEMBLY__
>  
>  #include <libcflat.h>
> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> index cdc2463e1981..4a3d826ab560 100644
> --- a/lib/arm64/asm/processor.h
> +++ b/lib/arm64/asm/processor.h
> @@ -6,16 +6,6 @@
>   * This work is licensed under the terms of the GNU LGPL, version 2.
>   */
>  
> -/* System Control Register (SCTLR_EL1) bits */
> -#define SCTLR_EL1_EE	(1 << 25)
> -#define SCTLR_EL1_WXN	(1 << 19)
> -#define SCTLR_EL1_I	(1 << 12)
> -#define SCTLR_EL1_SA0	(1 << 4)
> -#define SCTLR_EL1_SA	(1 << 3)
> -#define SCTLR_EL1_C	(1 << 2)
> -#define SCTLR_EL1_A	(1 << 1)
> -#define SCTLR_EL1_M	(1 << 0)
> -
>  #ifndef __ASSEMBLY__
>  #include <asm/ptrace.h>
>  #include <asm/esr.h>
> diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
> index 378bf7ebb3b5..9d6b4fc66936 100644
> --- a/lib/arm64/asm/sysreg.h
> +++ b/lib/arm64/asm/sysreg.h
> @@ -70,4 +70,21 @@ asm(
>  "	.endm\n"
>  );
>  #endif /* __ASSEMBLY__ */
> +
> +#define ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
> +#define ICC_SGI1R_EL1			sys_reg(3, 0, 12, 11, 5)
> +#define ICC_IAR1_EL1			sys_reg(3, 0, 12, 12, 0)
> +#define ICC_EOIR1_EL1			sys_reg(3, 0, 12, 12, 1)
> +#define ICC_GRPEN1_EL1			sys_reg(3, 0, 12, 12, 7)
> +
> +/* System Control Register (SCTLR_EL1) bits */
> +#define SCTLR_EL1_EE	(1 << 25)
> +#define SCTLR_EL1_WXN	(1 << 19)
> +#define SCTLR_EL1_I	(1 << 12)
> +#define SCTLR_EL1_SA0	(1 << 4)
> +#define SCTLR_EL1_SA	(1 << 3)
> +#define SCTLR_EL1_C	(1 << 2)
> +#define SCTLR_EL1_A	(1 << 1)
> +#define SCTLR_EL1_M	(1 << 0)
> +
>  #endif /* _ASMARM64_SYSREG_H_ */
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index c1deff842f03..f6c5d2ebccf3 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -10,9 +10,9 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/assembler.h>
>  #include <asm/ptrace.h>
> -#include <asm/processor.h>
>  #include <asm/page.h>
>  #include <asm/pgtable-hwdef.h>
> +#include <asm/sysreg.h>
>  
>  .section .init
>  



* Re: [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot Alexandru Elisei
@ 2021-03-03 17:32   ` Andre Przywara
  0 siblings, 0 replies; 17+ messages in thread
From: Andre Przywara @ 2021-03-03 17:32 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Sat, 27 Feb 2021 10:42:00 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> Some fields in SCTLR_EL1 are UNKNOWN at reset and the arm64 boot
> requirements, as stated by Linux in Documentation/arm64/booting.rst, do not
> specify a particular value for all the fields. Do not rely on the good will
> of the hypervisor and userspace to set SCTLR_EL1 to a sane value (by their
> definition of sane) and set SCTLR_EL1 explicitly before running setup().
> This will ensure that all tests are performed with the hardware set up
> identically, regardless of the KVM or VMM versions.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Can confirm that the RES1 bits match the ARM ARM, and that it's indeed
a good idea to start from a known good state:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  lib/arm64/asm/sysreg.h | 7 +++++++
>  arm/cstart64.S         | 5 +++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
> index 9d6b4fc66936..18c4ed39557a 100644
> --- a/lib/arm64/asm/sysreg.h
> +++ b/lib/arm64/asm/sysreg.h
> @@ -8,6 +8,8 @@
>  #ifndef _ASMARM64_SYSREG_H_
>  #define _ASMARM64_SYSREG_H_
>  
> +#include <linux/const.h>
> +
>  #define sys_reg(op0, op1, crn, crm, op2) \
>  	((((op0)&3)<<19)|((op1)<<16)|((crn)<<12)|((crm)<<8)|((op2)<<5))
>  
> @@ -87,4 +89,9 @@ asm(
>  #define SCTLR_EL1_A	(1 << 1)
>  #define SCTLR_EL1_M	(1 << 0)
>  
> +#define SCTLR_EL1_RES1	(_BITUL(7) | _BITUL(8) | _BITUL(11) | _BITUL(20) | \
> +			 _BITUL(22) | _BITUL(23) | _BITUL(28) | _BITUL(29))
> +#define INIT_SCTLR_EL1_MMU_OFF	\
> +			SCTLR_EL1_RES1
> +
>  #endif /* _ASMARM64_SYSREG_H_ */
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index f6c5d2ebccf3..42a838ff4c38 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -52,6 +52,11 @@ start:
>  	b	1b
>  
>  1:
> +	/* set SCTLR_EL1 to a known value */
> +	ldr	x4, =INIT_SCTLR_EL1_MMU_OFF
> +	msr	sctlr_el1, x4
> +	isb
> +
>  	/* set up stack */
>  	mov	x4, #1
>  	msr	spsel, x4



* Re: [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks
  2021-02-27 10:42 ` [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks Alexandru Elisei
@ 2021-03-03 17:32   ` Andre Przywara
  0 siblings, 0 replies; 17+ messages in thread
From: Andre Przywara @ 2021-03-03 17:32 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Sat, 27 Feb 2021 10:42:01 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> From an architectural point of view, the PE can speculate instruction
> fetches and data reads at any time when the MMU is enabled using the
> translation tables from TTBR0_EL1 and TTBR1_EL1. kvm-unit-tests uses an
> identity map, and as such it only programs TTBR0_EL1 with a valid table and
> leaves TTBR1_EL1 unchanged. The reset value for TTBR1_EL1 is UNKNOWN, which
> means it is possible for the PE to perform reads from memory locations
> where accesses can cause side effects (like memory-mapped devices) as part
> of the speculated translation table walk.
> 
> So far, this hasn't been a problem, because KVM resets TTBR{0,1}_EL1 to
> zero, and that address is used for emulation for both qemu and kvmtool and
> it doesn't point to a real device. However, kvm-unit-tests shouldn't rely
> on a particular combination of hypervisor and userspace for correctness.
> Another situation where we can't rely on these assumptions being true is
> when kvm-unit-tests is run as an EFI app.
> 
> To prevent reads from arbitrary addresses, set the TCR_EL1.EPD1 bit to
> disable speculative translation table walks using TTBR1_EL1.
> 
> This is similar to EDK2 commit fafb7e9c110e ("ArmPkg: correct TTBR1_EL1
> settings in TCR_EL1"). Also mentioned in that commit is the Cortex-A57
> erratum 822227 which impacts the hypervisor, but kvm-unit-tests is
> protected against it because asm_mmu_enable sets both the TCR_EL1.TG0 and
> TCR_EL1.TG1 bits when programming the register.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

That sounds like a good idea. Verified the bit against the ARM ARM.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  lib/arm64/asm/pgtable-hwdef.h | 1 +
>  arm/cstart64.S                | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/arm64/asm/pgtable-hwdef.h b/lib/arm64/asm/pgtable-hwdef.h
> index 48a1d1ab1ac2..8c41fe123fb3 100644
> --- a/lib/arm64/asm/pgtable-hwdef.h
> +++ b/lib/arm64/asm/pgtable-hwdef.h
> @@ -136,6 +136,7 @@
>  #define TCR_ORGN_WBnWA		((UL(3) << 10) | (UL(3) << 26))
>  #define TCR_ORGN_MASK		((UL(3) << 10) | (UL(3) << 26))
>  #define TCR_SHARED		((UL(3) << 12) | (UL(3) << 28))
> +#define TCR_EPD1		(UL(1) << 23)
>  #define TCR_TG0_4K		(UL(0) << 14)
>  #define TCR_TG0_64K		(UL(1) << 14)
>  #define TCR_TG0_16K		(UL(2) << 14)
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index 42a838ff4c38..3d359c8387c9 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -181,7 +181,8 @@ asm_mmu_enable:
>  	ldr	x1, =TCR_TxSZ(VA_BITS) |		\
>  		     TCR_TG_FLAGS  |			\
>  		     TCR_IRGN_WBWA | TCR_ORGN_WBWA |	\
> -		     TCR_SHARED
> +		     TCR_SHARED |			\
> +		     TCR_EPD1
>  	mrs	x2, id_aa64mmfr0_el1
>  	bfi	x1, x2, #32, #3
>  	msr	tcr_el1, x1




* Re: [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel Alexandru Elisei
@ 2021-03-03 17:35   ` Andre Przywara
  0 siblings, 0 replies; 17+ messages in thread
From: Andre Przywara @ 2021-03-03 17:35 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Sat, 27 Feb 2021 10:41:56 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Software can use the SPSel operand to write directly to PSTATE.SP.
> According to ARM DDI 0487F.b, page D1-2332, writes to PSTATE are
> self-synchronizing and no ISB is needed:
> 
> "Writes to the PSTATE fields have side-effects on various aspects of the PE
> operation. All of these side-effects are guaranteed:
> - Not to be visible to earlier instructions in the execution stream.
> - To be visible to later instructions in the execution stream."
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

I am always a bit wary about *removing* barriers, but I can confirm
that the ARM ARM indeed makes this guarantee above, and SP access
sounds like an easy enough case, so:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  arm/cstart64.S | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index 0428014aa58a..fc1930bcdb53 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -54,7 +54,6 @@ start:
>  	/* set up stack */
>  	mov	x4, #1
>  	msr	spsel, x4
> -	isb
>  	adrp    x4, stackptr
>  	add     sp, x4, :lo12:stackptr
>  



* Re: [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable Alexandru Elisei
@ 2021-03-04 15:00   ` Andre Przywara
  2021-03-15 15:46     ` Alexandru Elisei
  0 siblings, 1 reply; 17+ messages in thread
From: Andre Przywara @ 2021-03-04 15:00 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Sat, 27 Feb 2021 10:41:57 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Compute the dcache line size when doing dcache maintenance instead of using
> a global variable computed in setup(), which allows us to do dcache
> maintenance at any point in the boot process. This will be useful for
> running as an EFI app and it also aligns our implementation to that of the
> Linux kernel.

Can you add that this changes the semantic of dcache_by_line_op to use
the size instead of the end address?

> 
> For consistency, the arm code has been similarly modified.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/arm/asm/assembler.h   | 44 ++++++++++++++++++++++++++++++++
>  lib/arm/asm/processor.h   |  7 ------
>  lib/arm64/asm/assembler.h | 53 +++++++++++++++++++++++++++++++++++++++
>  lib/arm64/asm/processor.h |  7 ------
>  lib/arm/setup.c           |  7 ------
>  arm/cstart.S              | 18 +++----------
>  arm/cstart64.S            | 16 ++----------
>  7 files changed, 102 insertions(+), 50 deletions(-)
>  create mode 100644 lib/arm/asm/assembler.h
>  create mode 100644 lib/arm64/asm/assembler.h
> 
> diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
> new file mode 100644
> index 000000000000..6b932df86204
> --- /dev/null
> +++ b/lib/arm/asm/assembler.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Based on several files from Linux version v5.10: arch/arm/mm/proc-macros.S,
> + * arch/arm/mm/proc-v7.S.
> + */
> +
> +/*
> + * dcache_line_size - get the minimum D-cache line size from the CTR register
`> + * on ARMv7.
> + */
> +	.macro	dcache_line_size, reg, tmp
> +	mrc	p15, 0, \tmp, c0, c0, 1		// read ctr
> +	lsr	\tmp, \tmp, #16
> +	and	\tmp, \tmp, #0xf		// cache line size encoding
> +	mov	\reg, #4			// bytes per word
> +	mov	\reg, \reg, lsl \tmp		// actual cache line size
> +	.endm
> +
> +/*
> + * Macro to perform a data cache maintenance for the interval
> + * [addr, addr + size).
> + *
> + * 	op:		operation to execute
> + * 	domain		domain used in the dsb instruction
> + * 	addr:		starting virtual address of the region
> + * 	size:		size of the region
> + * 	Corrupts:	addr, size, tmp1, tmp2
> + */
> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> +	dcache_line_size \tmp1, \tmp2
> +	add	\size, \addr, \size
> +	sub	\tmp2, \tmp1, #1
> +	bic	\addr, \addr, \tmp2

Just a nit, but since my brain was in assembly land: We could skip tmp2,
by adding back #1 to tmp1 after the bic.
Same for the arm64 code.

> +9998:
> +	.ifc	\op, dccimvac
> +	mcr	p15, 0, \addr, c7, c14, 1
> +	.else
> +	.err
> +	.endif
> +	add	\addr, \addr, \tmp1
> +	cmp	\addr, \size
> +	blo	9998b
> +	dsb	\domain
> +	.endm
> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> index 273366d1fe1c..3c36eac903f0 100644
> --- a/lib/arm/asm/processor.h
> +++ b/lib/arm/asm/processor.h
> @@ -9,11 +9,6 @@
>  #include <asm/sysreg.h>
>  #include <asm/barrier.h>

Do we want the same protection against inclusion from C here as in the
arm64 version?

> -#define CTR_DMINLINE_SHIFT	16
> -#define CTR_DMINLINE_MASK	(0xf << 16)
> -#define CTR_DMINLINE(x)	\
> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
> -
>  enum vector {
>  	EXCPTN_RST,
>  	EXCPTN_UND,
> @@ -89,6 +84,4 @@ static inline u32 get_ctr(void)
>  	return read_sysreg(CTR);
>  }
>  
> -extern unsigned long dcache_line_size;
> -
>  #endif /* _ASMARM_PROCESSOR_H_ */
> diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
> new file mode 100644
> index 000000000000..f801c0c43d02
> --- /dev/null
> +++ b/lib/arm64/asm/assembler.h
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Based on the file arch/arm64/include/asm/assembled.h from Linux v5.10, which
> + * in turn is based on arch/arm/include/asm/assembler.h and
> + * arch/arm/mm/proc-macros.S
> + *
> + * Copyright (C) 1996-2000 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +#ifndef __ASSEMBLY__
> +#error "Only include this from assembly code"
> +#endif
> +
> +#ifndef __ASM_ASSEMBLER_H
> +#define __ASM_ASSEMBLER_H
> +
> +/*
> + * raw_dcache_line_size - get the minimum D-cache line size on this CPU
> + * from the CTR register.
> + */
> +	.macro	raw_dcache_line_size, reg, tmp
> +	mrs	\tmp, ctr_el0			// read CTR
> +	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding

this encoding of ubfm is supposed to be written as:
	ubfx \tmp, \tmp, #16, #4
This is also what objdump makes of the above.

The rest looks good, I convinced myself that the assembly algorithms are
correct.

Cheers,
Andre


> +	mov	\reg, #4			// bytes per word
> +	lsl	\reg, \reg, \tmp		// actual cache line size
> +	.endm
> +
> +/*
> + * Macro to perform a data cache maintenance for the interval
> + * [addr, addr + size). Use the raw value for the dcache line size because
> + * kvm-unit-tests has no concept of scheduling.
> + *
> + * 	op:		operation passed to dc instruction
> + * 	domain:		domain used in dsb instruciton
> + * 	addr:		starting virtual address of the region
> + * 	size:		size of the region
> + * 	Corrupts:	addr, size, tmp1, tmp2
> + */
> +
> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> +	raw_dcache_line_size \tmp1, \tmp2
> +	add	\size, \addr, \size
> +	sub	\tmp2, \tmp1, #1
> +	bic	\addr, \addr, \tmp2
> +9998:
> +	dc	\op, \addr
> +	add	\addr, \addr, \tmp1
> +	cmp	\addr, \size
> +	b.lo	9998b
> +	dsb	\domain
> +	.endm
> +
> +#endif	/* __ASM_ASSEMBLER_H */
> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> index 771b2d1e0c94..cdc2463e1981 100644
> --- a/lib/arm64/asm/processor.h
> +++ b/lib/arm64/asm/processor.h
> @@ -16,11 +16,6 @@
>  #define SCTLR_EL1_A	(1 << 1)
>  #define SCTLR_EL1_M	(1 << 0)
>  
> -#define CTR_DMINLINE_SHIFT	16
> -#define CTR_DMINLINE_MASK	(0xf << 16)
> -#define CTR_DMINLINE(x)	\
> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
> -
>  #ifndef __ASSEMBLY__
>  #include <asm/ptrace.h>
>  #include <asm/esr.h>
> @@ -115,8 +110,6 @@ static inline u64 get_ctr(void)
>  	return read_sysreg(ctr_el0);
>  }
>  
> -extern unsigned long dcache_line_size;
> -
>  static inline unsigned long get_id_aa64mmfr0_el1(void)
>  {
>  	return read_sysreg(id_aa64mmfr0_el1);
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 066524f8bf61..751ba980000a 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -42,8 +42,6 @@ static struct mem_region __initial_mem_regions[NR_INITIAL_MEM_REGIONS + 1];
>  struct mem_region *mem_regions = __initial_mem_regions;
>  phys_addr_t __phys_offset, __phys_end;
>  
> -unsigned long dcache_line_size;
> -
>  int mpidr_to_cpu(uint64_t mpidr)
>  {
>  	int i;
> @@ -72,11 +70,6 @@ static void cpu_init(void)
>  	ret = dt_for_each_cpu_node(cpu_set, NULL);
>  	assert(ret == 0);
>  	set_cpu_online(0, true);
> -	/*
> -	 * DminLine is log2 of the number of words in the smallest cache line; a
> -	 * word is 4 bytes.
> -	 */
> -	dcache_line_size = 1 << (CTR_DMINLINE(get_ctr()) + 2);
>  }
>  
>  unsigned int mem_region_get_flags(phys_addr_t paddr)
> diff --git a/arm/cstart.S b/arm/cstart.S
> index ef936ae2f874..954748b00f64 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -7,6 +7,7 @@
>   */
>  #define __ASSEMBLY__
>  #include <auxinfo.h>
> +#include <asm/assembler.h>
>  #include <asm/thread_info.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/pgtable-hwdef.h>
> @@ -197,20 +198,6 @@ asm_mmu_enable:
>  
>  	mov     pc, lr
>  
> -.macro dcache_clean_inval domain, start, end, tmp1, tmp2
> -	ldr	\tmp1, =dcache_line_size
> -	ldr	\tmp1, [\tmp1]
> -	sub	\tmp2, \tmp1, #1
> -	bic	\start, \start, \tmp2
> -9998:
> -	/* DCCIMVAC */
> -	mcr	p15, 0, \start, c7, c14, 1
> -	add	\start, \start, \tmp1
> -	cmp	\start, \end
> -	blo	9998b
> -	dsb	\domain
> -.endm
> -
>  .globl asm_mmu_disable
>  asm_mmu_disable:
>  	/* SCTLR */
> @@ -223,7 +210,8 @@ asm_mmu_disable:
>  	ldr	r0, [r0]
>  	ldr	r1, =__phys_end
>  	ldr	r1, [r1]
> -	dcache_clean_inval sy, r0, r1, r2, r3
> +	sub	r1, r1, r0
> +	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>  	isb
>  
>  	mov     pc, lr
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index fc1930bcdb53..046bd3914098 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -8,6 +8,7 @@
>  #define __ASSEMBLY__
>  #include <auxinfo.h>
>  #include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
>  #include <asm/ptrace.h>
>  #include <asm/processor.h>
>  #include <asm/page.h>
> @@ -204,20 +205,6 @@ asm_mmu_enable:
>  
>  	ret
>  
> -/* Taken with small changes from arch/arm64/incluse/asm/assembler.h */
> -.macro dcache_by_line_op op, domain, start, end, tmp1, tmp2
> -	adrp	\tmp1, dcache_line_size
> -	ldr	\tmp1, [\tmp1, :lo12:dcache_line_size]
> -	sub	\tmp2, \tmp1, #1
> -	bic	\start, \start, \tmp2
> -9998:
> -	dc	\op , \start
> -	add	\start, \start, \tmp1
> -	cmp	\start, \end
> -	b.lo	9998b
> -	dsb	\domain
> -.endm
> -
>  .globl asm_mmu_disable
>  asm_mmu_disable:
>  	mrs	x0, sctlr_el1
> @@ -230,6 +217,7 @@ asm_mmu_disable:
>  	ldr	x0, [x0, :lo12:__phys_offset]
>  	adrp	x1, __phys_end
>  	ldr	x1, [x1, :lo12:__phys_end]
> +	sub	x1, x1, x0
>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
>  	isb
>  



* Re: [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance
  2021-02-27 10:41 ` [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance Alexandru Elisei
@ 2021-03-12 14:59   ` Andrew Jones
  2021-03-15 16:22     ` Alexandru Elisei
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Jones @ 2021-03-12 14:59 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, kvmarm

On Sat, Feb 27, 2021 at 10:41:58AM +0000, Alexandru Elisei wrote:
> The dcache_by_line_op macro executes a DSB to complete the cache
> maintenance operations. According to ARM DDI 0487G.a, page B2-150:
> 
> "In addition, no instruction that appears in program order after the DSB
> instruction can alter any state of the system or perform any part of its
> functionality until the DSB completes other than:
> 
> - Being fetched from memory and decoded.
> - Reading the general-purpose, SIMD and floating-point, Special-purpose, or
>   System registers that are directly or indirectly read without causing
>   side-effects."
> 
> Similar definition for ARM in ARM DDI 0406C.d, page A3-150:
> 
> "In addition, no instruction that appears in program order after the DSB
> instruction can execute until the DSB completes."
> 
> This means that we don't need the ISB to prevent reordering of the cache
> maintenance instructions.
> 
> We are also not doing icache maintenance, where an ISB would be required
> for the PE to discard instructions speculated before the invalidation.
> 
> In conclusion, the ISB is unnecessary, so remove it.

Hi Alexandru,

We can go ahead and take this patch, since you've written quite a
convincing commit message, but in general I'd prefer we be overly cautious
in our common code. We'd like to ensure we don't introduce difficult to
debug issues there, and we don't care about optimizations, let alone
micro-optimizations. Testing barrier requirements against the letter of the
spec is a good idea, but it's probably better to do that in the test cases.

Thanks,
drew

> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/cstart.S   | 1 -
>  arm/cstart64.S | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/arm/cstart.S b/arm/cstart.S
> index 954748b00f64..2d62c1e6d40d 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -212,7 +212,6 @@ asm_mmu_disable:
>  	ldr	r1, [r1]
>  	sub	r1, r1, r0
>  	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> -	isb
>  
>  	mov     pc, lr
>  
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index 046bd3914098..c1deff842f03 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -219,7 +219,6 @@ asm_mmu_disable:
>  	ldr	x1, [x1, :lo12:__phys_end]
>  	sub	x1, x1, x0
>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
> -	isb
>  
>  	ret
>  
> -- 
> 2.30.1
> 



* Re: [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable
  2021-03-04 15:00   ` Andre Przywara
@ 2021-03-15 15:46     ` Alexandru Elisei
  2021-03-16 15:40       ` Andre Przywara
  0 siblings, 1 reply; 17+ messages in thread
From: Alexandru Elisei @ 2021-03-15 15:46 UTC (permalink / raw)
  To: Andre Przywara; +Cc: drjones, kvm, kvmarm

Hi Andre,

On 3/4/21 3:00 PM, Andre Przywara wrote:
> On Sat, 27 Feb 2021 10:41:57 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> Compute the dcache line size when doing dcache maintenance instead of using
>> a global variable computed in setup(), which allows us to do dcache
>> maintenance at any point in the boot process. This will be useful for
>> running as an EFI app and it also aligns our implementation to that of the
>> Linux kernel.
> Can you add that this changes the semantic of dcache_by_line_op to use
> the size instead of the end address?

Sure, I can do that. The dcache_by_line_op was never visible to code outside
assembly, and it was only used by asm_mmu_disable, so no other callers are
affected by this change.

>
>> For consistency, the arm code has been similarly modified.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  lib/arm/asm/assembler.h   | 44 ++++++++++++++++++++++++++++++++
>>  lib/arm/asm/processor.h   |  7 ------
>>  lib/arm64/asm/assembler.h | 53 +++++++++++++++++++++++++++++++++++++++
>>  lib/arm64/asm/processor.h |  7 ------
>>  lib/arm/setup.c           |  7 ------
>>  arm/cstart.S              | 18 +++----------
>>  arm/cstart64.S            | 16 ++----------
>>  7 files changed, 102 insertions(+), 50 deletions(-)
>>  create mode 100644 lib/arm/asm/assembler.h
>>  create mode 100644 lib/arm64/asm/assembler.h
>>
>> diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
>> new file mode 100644
>> index 000000000000..6b932df86204
>> --- /dev/null
>> +++ b/lib/arm/asm/assembler.h
>> @@ -0,0 +1,44 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Based on several files from Linux version v5.10: arch/arm/mm/proc-macros.S,
>> + * arch/arm/mm/proc-v7.S.
>> + */
>> +
>> +/*
>> + * dcache_line_size - get the minimum D-cache line size from the CTR register
> `> + * on ARMv7.

Well, it's in the arm directory and there's a file with the same name under
lib/arm64/asm/, so I don't think there's any room for confusion here.

>> + */
>> +	.macro	dcache_line_size, reg, tmp
>> +	mrc	p15, 0, \tmp, c0, c0, 1		// read ctr
>> +	lsr	\tmp, \tmp, #16
>> +	and	\tmp, \tmp, #0xf		// cache line size encoding
>> +	mov	\reg, #4			// bytes per word
>> +	mov	\reg, \reg, lsl \tmp		// actual cache line size
>> +	.endm
>> +
>> +/*
>> + * Macro to perform a data cache maintenance for the interval
>> + * [addr, addr + size).
>> + *
>> + * 	op:		operation to execute
>> + * 	domain		domain used in the dsb instruction
>> + * 	addr:		starting virtual address of the region
>> + * 	size:		size of the region
>> + * 	Corrupts:	addr, size, tmp1, tmp2
>> + */
>> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
>> +	dcache_line_size \tmp1, \tmp2
>> +	add	\size, \addr, \size
>> +	sub	\tmp2, \tmp1, #1
>> +	bic	\addr, \addr, \tmp2
> Just a nit, but since my brain was in assembly land: We could skip tmp2,
> by adding back #1 to tmp1 after the bic.
> Same for the arm64 code.

Using one less temporary register wouldn't help with register pressure:

- On arm, registers r0-r3 are used, which ARM IHI 0042F says can be used as
scratch registers, with the caller saving their contents before calling the
function (or not using them at all).

- On arm64, register x0-x3 are used, which have a similar usage according to ARM
IHI 0055B.

Using one less temporary register means one more instruction, but that's not
relevant, since the macro will perform writes anyway: even invalidation is
transformed to a clean + invalidate under virtualization.

The reason I chose to keep the macro unchanged for arm64 is that it matches the
Linux definition, and I think it's better to try not to deviate too much from it,
as in the long run it will make maintenance easier for everyone.

For arm, I wrote it this way to match the arm64 definition.

>
>> +9998:
>> +	.ifc	\op, dccimvac
>> +	mcr	p15, 0, \addr, c7, c14, 1
>> +	.else
>> +	.err
>> +	.endif
>> +	add	\addr, \addr, \tmp1
>> +	cmp	\addr, \size
>> +	blo	9998b
>> +	dsb	\domain
>> +	.endm
>> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
>> index 273366d1fe1c..3c36eac903f0 100644
>> --- a/lib/arm/asm/processor.h
>> +++ b/lib/arm/asm/processor.h
>> @@ -9,11 +9,6 @@
>>  #include <asm/sysreg.h>
>>  #include <asm/barrier.h>
> Do we want the same protection against inclusion from C here as in the
> arm64 version?

We do, I will add it in the next iteration.

>
>> -#define CTR_DMINLINE_SHIFT	16
>> -#define CTR_DMINLINE_MASK	(0xf << 16)
>> -#define CTR_DMINLINE(x)	\
>> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
>> -
>>  enum vector {
>>  	EXCPTN_RST,
>>  	EXCPTN_UND,
>> @@ -89,6 +84,4 @@ static inline u32 get_ctr(void)
>>  	return read_sysreg(CTR);
>>  }
>>  
>> -extern unsigned long dcache_line_size;
>> -
>>  #endif /* _ASMARM_PROCESSOR_H_ */
>> diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
>> new file mode 100644
>> index 000000000000..f801c0c43d02
>> --- /dev/null
>> +++ b/lib/arm64/asm/assembler.h
>> @@ -0,0 +1,53 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Based on the file arch/arm64/include/asm/assembled.h from Linux v5.10, which
>> + * in turn is based on arch/arm/include/asm/assembler.h and
>> + * arch/arm/mm/proc-macros.S
>> + *
>> + * Copyright (C) 1996-2000 Russell King
>> + * Copyright (C) 2012 ARM Ltd.
>> + */
>> +#ifndef __ASSEMBLY__
>> +#error "Only include this from assembly code"
>> +#endif
>> +
>> +#ifndef __ASM_ASSEMBLER_H
>> +#define __ASM_ASSEMBLER_H
>> +
>> +/*
>> + * raw_dcache_line_size - get the minimum D-cache line size on this CPU
>> + * from the CTR register.
>> + */
>> +	.macro	raw_dcache_line_size, reg, tmp
>> +	mrs	\tmp, ctr_el0			// read CTR
>> +	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
> this encoding of ubfm is supposed to be written as:
> 	ubfx \tmp, \tmp, #16, #4
> This is also what objdump makes of the above.

I would rather keep it the same as Linux.

>
> The rest looks good, I convinced myself that the assembly algorithms are
> correct.

Thanks, much appreciated!

Thanks,

Alex

>
> Cheers,
> Andre
>
>
>> +	mov	\reg, #4			// bytes per word
>> +	lsl	\reg, \reg, \tmp		// actual cache line size
>> +	.endm
>> +
>> +/*
>> + * Macro to perform a data cache maintenance for the interval
>> + * [addr, addr + size). Use the raw value for the dcache line size because
>> + * kvm-unit-tests has no concept of scheduling.
>> + *
>> + * 	op:		operation passed to dc instruction
>> + * 	domain:		domain used in dsb instruciton
>> + * 	addr:		starting virtual address of the region
>> + * 	size:		size of the region
>> + * 	Corrupts:	addr, size, tmp1, tmp2
>> + */
>> +
>> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
>> +	raw_dcache_line_size \tmp1, \tmp2
>> +	add	\size, \addr, \size
>> +	sub	\tmp2, \tmp1, #1
>> +	bic	\addr, \addr, \tmp2
>> +9998:
>> +	dc	\op, \addr
>> +	add	\addr, \addr, \tmp1
>> +	cmp	\addr, \size
>> +	b.lo	9998b
>> +	dsb	\domain
>> +	.endm
>> +
>> +#endif	/* __ASM_ASSEMBLER_H */
>> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
>> index 771b2d1e0c94..cdc2463e1981 100644
>> --- a/lib/arm64/asm/processor.h
>> +++ b/lib/arm64/asm/processor.h
>> @@ -16,11 +16,6 @@
>>  #define SCTLR_EL1_A	(1 << 1)
>>  #define SCTLR_EL1_M	(1 << 0)
>>  
>> -#define CTR_DMINLINE_SHIFT	16
>> -#define CTR_DMINLINE_MASK	(0xf << 16)
>> -#define CTR_DMINLINE(x)	\
>> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
>> -
>>  #ifndef __ASSEMBLY__
>>  #include <asm/ptrace.h>
>>  #include <asm/esr.h>
>> @@ -115,8 +110,6 @@ static inline u64 get_ctr(void)
>>  	return read_sysreg(ctr_el0);
>>  }
>>  
>> -extern unsigned long dcache_line_size;
>> -
>>  static inline unsigned long get_id_aa64mmfr0_el1(void)
>>  {
>>  	return read_sysreg(id_aa64mmfr0_el1);
>> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
>> index 066524f8bf61..751ba980000a 100644
>> --- a/lib/arm/setup.c
>> +++ b/lib/arm/setup.c
>> @@ -42,8 +42,6 @@ static struct mem_region __initial_mem_regions[NR_INITIAL_MEM_REGIONS + 1];
>>  struct mem_region *mem_regions = __initial_mem_regions;
>>  phys_addr_t __phys_offset, __phys_end;
>>  
>> -unsigned long dcache_line_size;
>> -
>>  int mpidr_to_cpu(uint64_t mpidr)
>>  {
>>  	int i;
>> @@ -72,11 +70,6 @@ static void cpu_init(void)
>>  	ret = dt_for_each_cpu_node(cpu_set, NULL);
>>  	assert(ret == 0);
>>  	set_cpu_online(0, true);
>> -	/*
>> -	 * DminLine is log2 of the number of words in the smallest cache line; a
>> -	 * word is 4 bytes.
>> -	 */
>> -	dcache_line_size = 1 << (CTR_DMINLINE(get_ctr()) + 2);
>>  }
>>  
>>  unsigned int mem_region_get_flags(phys_addr_t paddr)
>> diff --git a/arm/cstart.S b/arm/cstart.S
>> index ef936ae2f874..954748b00f64 100644
>> --- a/arm/cstart.S
>> +++ b/arm/cstart.S
>> @@ -7,6 +7,7 @@
>>   */
>>  #define __ASSEMBLY__
>>  #include <auxinfo.h>
>> +#include <asm/assembler.h>
>>  #include <asm/thread_info.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/pgtable-hwdef.h>
>> @@ -197,20 +198,6 @@ asm_mmu_enable:
>>  
>>  	mov     pc, lr
>>  
>> -.macro dcache_clean_inval domain, start, end, tmp1, tmp2
>> -	ldr	\tmp1, =dcache_line_size
>> -	ldr	\tmp1, [\tmp1]
>> -	sub	\tmp2, \tmp1, #1
>> -	bic	\start, \start, \tmp2
>> -9998:
>> -	/* DCCIMVAC */
>> -	mcr	p15, 0, \start, c7, c14, 1
>> -	add	\start, \start, \tmp1
>> -	cmp	\start, \end
>> -	blo	9998b
>> -	dsb	\domain
>> -.endm
>> -
>>  .globl asm_mmu_disable
>>  asm_mmu_disable:
>>  	/* SCTLR */
>> @@ -223,7 +210,8 @@ asm_mmu_disable:
>>  	ldr	r0, [r0]
>>  	ldr	r1, =__phys_end
>>  	ldr	r1, [r1]
>> -	dcache_clean_inval sy, r0, r1, r2, r3
>> +	sub	r1, r1, r0
>> +	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>>  	isb
>>  
>>  	mov     pc, lr
>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>> index fc1930bcdb53..046bd3914098 100644
>> --- a/arm/cstart64.S
>> +++ b/arm/cstart64.S
>> @@ -8,6 +8,7 @@
>>  #define __ASSEMBLY__
>>  #include <auxinfo.h>
>>  #include <asm/asm-offsets.h>
>> +#include <asm/assembler.h>
>>  #include <asm/ptrace.h>
>>  #include <asm/processor.h>
>>  #include <asm/page.h>
>> @@ -204,20 +205,6 @@ asm_mmu_enable:
>>  
>>  	ret
>>  
>> -/* Taken with small changes from arch/arm64/incluse/asm/assembler.h */
>> -.macro dcache_by_line_op op, domain, start, end, tmp1, tmp2
>> -	adrp	\tmp1, dcache_line_size
>> -	ldr	\tmp1, [\tmp1, :lo12:dcache_line_size]
>> -	sub	\tmp2, \tmp1, #1
>> -	bic	\start, \start, \tmp2
>> -9998:
>> -	dc	\op , \start
>> -	add	\start, \start, \tmp1
>> -	cmp	\start, \end
>> -	b.lo	9998b
>> -	dsb	\domain
>> -.endm
>> -
>>  .globl asm_mmu_disable
>>  asm_mmu_disable:
>>  	mrs	x0, sctlr_el1
>> @@ -230,6 +217,7 @@ asm_mmu_disable:
>>  	ldr	x0, [x0, :lo12:__phys_offset]
>>  	adrp	x1, __phys_end
>>  	ldr	x1, [x1, :lo12:__phys_end]
>> +	sub	x1, x1, x0
>>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
>>  	isb
>>  


* Re: [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance
  2021-03-12 14:59   ` Andrew Jones
@ 2021-03-15 16:22     ` Alexandru Elisei
  0 siblings, 0 replies; 17+ messages in thread
From: Alexandru Elisei @ 2021-03-15 16:22 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, kvmarm

Hi Drew,

On 3/12/21 2:59 PM, Andrew Jones wrote:
> On Sat, Feb 27, 2021 at 10:41:58AM +0000, Alexandru Elisei wrote:
>> The dcache_by_line_op macro executes a DSB to complete the cache
>> maintenance operations. According to ARM DDI 0487G.a, page B2-150:
>>
>> "In addition, no instruction that appears in program order after the DSB
>> instruction can alter any state of the system or perform any part of its
>> functionality until the DSB completes other than:
>>
>> - Being fetched from memory and decoded.
>> - Reading the general-purpose, SIMD and floating-point, Special-purpose, or
>>   System registers that are directly or indirectly read without causing
>>   side-effects."
>>
>> Similar definition for ARM in ARM DDI 0406C.d, page A3-150:
>>
>> "In addition, no instruction that appears in program order after the DSB
>> instruction can execute until the DSB completes."
>>
>> This means that we don't need the ISB to prevent reordering of the cache
>> maintenance instructions.
>>
>> We are also not doing icache maintenance, where an ISB would be required
>> for the PE to discard instructions speculated before the invalidation.
>>
>> In conclusion, the ISB is unnecessary, so remove it.
> Hi Alexandru,
>
> We can go ahead and take this patch, since you've written quite a
> convincing commit message, but in general I'd prefer we be overly cautious
> in our common code. We'd like to ensure we don't introduce difficult to
> debug issues there, and we don't care about optimizations, let alone
> micro-optimizations. Testing barrier requirements against the letter of the
> spec is a good idea, but it's probably better to do that in the test cases.

You are correct, the intention of this patch was to do the minimum necessary to
ensure correctness.

Thank you for the explanation, I will keep this in mind for future patches.

Thanks,

Alex

>
> Thanks,
> drew
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/cstart.S   | 1 -
>>  arm/cstart64.S | 1 -
>>  2 files changed, 2 deletions(-)
>>
>> diff --git a/arm/cstart.S b/arm/cstart.S
>> index 954748b00f64..2d62c1e6d40d 100644
>> --- a/arm/cstart.S
>> +++ b/arm/cstart.S
>> @@ -212,7 +212,6 @@ asm_mmu_disable:
>>  	ldr	r1, [r1]
>>  	sub	r1, r1, r0
>>  	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>> -	isb
>>  
>>  	mov     pc, lr
>>  
>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>> index 046bd3914098..c1deff842f03 100644
>> --- a/arm/cstart64.S
>> +++ b/arm/cstart64.S
>> @@ -219,7 +219,6 @@ asm_mmu_disable:
>>  	ldr	x1, [x1, :lo12:__phys_end]
>>  	sub	x1, x1, x0
>>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
>> -	isb
>>  
>>  	ret
>>  
>> -- 
>> 2.30.1
>>


* Re: [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable
  2021-03-15 15:46     ` Alexandru Elisei
@ 2021-03-16 15:40       ` Andre Przywara
  2021-03-22 12:01         ` Alexandru Elisei
  0 siblings, 1 reply; 17+ messages in thread
From: Andre Przywara @ 2021-03-16 15:40 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: drjones, kvm, kvmarm

On Mon, 15 Mar 2021 15:46:09 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi Alex,

> On 3/4/21 3:00 PM, Andre Przywara wrote:
> > On Sat, 27 Feb 2021 10:41:57 +0000
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >  
> >> Compute the dcache line size when doing dcache maintenance instead of using
> >> a global variable computed in setup(), which allows us to do dcache
> >> maintenance at any point in the boot process. This will be useful for
> >> running as an EFI app and it also aligns our implementation to that of the
> >> Linux kernel.  
> > Can you add that this changes the semantic of dcache_by_line_op to use
> > the size instead of the end address?  
> 
> Sure, I can do that. The dcache_by_line_op was never visible to code outside
> assembly, and it was only used by asm_mmu_disable, so no other callers are
> affected by this change.

Thanks, just a short sentence suffices. I was just mentioning this
since many cache-op wrappers I have seen use (start,stop) pairs, while
I actually think (start,length) is more practical. So it just deserves
a short mention in case anyone familiar with the previous arguments
wonders what's going on.

> >  
> >> For consistency, the arm code has been similarly modified.
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >> ---
> >>  lib/arm/asm/assembler.h   | 44 ++++++++++++++++++++++++++++++++
> >>  lib/arm/asm/processor.h   |  7 ------
> >>  lib/arm64/asm/assembler.h | 53 +++++++++++++++++++++++++++++++++++++++
> >>  lib/arm64/asm/processor.h |  7 ------
> >>  lib/arm/setup.c           |  7 ------
> >>  arm/cstart.S              | 18 +++----------
> >>  arm/cstart64.S            | 16 ++----------
> >>  7 files changed, 102 insertions(+), 50 deletions(-)
> >>  create mode 100644 lib/arm/asm/assembler.h
> >>  create mode 100644 lib/arm64/asm/assembler.h
> >>
> >> diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
> >> new file mode 100644
> >> index 000000000000..6b932df86204
> >> --- /dev/null
> >> +++ b/lib/arm/asm/assembler.h
> >> @@ -0,0 +1,44 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +/*
> >> + * Based on several files from Linux version v5.10: arch/arm/mm/proc-macros.S,
> >> + * arch/arm/mm/proc-v7.S.
> >> + */
> >> +
> >> +/*
> >> + * dcache_line_size - get the minimum D-cache line size from the CTR register
> > `> + * on ARMv7.  
> 
> Well, it's in the arm directory and there's a file with the same name under
> lib/arm64/asm/, so I don't think there's any room for confusion here.

Mmh, this v7 line was already in there, and I didn't complain, it was
apparently just a stray character sneaking in the reply which made this
look like a comment?

> 
> >> + */
> >> +	.macro	dcache_line_size, reg, tmp
> >> +	mrc	p15, 0, \tmp, c0, c0, 1		// read ctr
> >> +	lsr	\tmp, \tmp, #16
> >> +	and	\tmp, \tmp, #0xf		// cache line size encoding
> >> +	mov	\reg, #4			// bytes per word
> >> +	mov	\reg, \reg, lsl \tmp		// actual cache line size
> >> +	.endm
> >> +
> >> +/*
> >> + * Macro to perform a data cache maintenance for the interval
> >> + * [addr, addr + size).
> >> + *
> >> + * 	op:		operation to execute
> >> + * 	domain:		domain used in the dsb instruction
> >> + * 	addr:		starting virtual address of the region
> >> + * 	size:		size of the region
> >> + * 	Corrupts:	addr, size, tmp1, tmp2
> >> + */
> >> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> >> +	dcache_line_size \tmp1, \tmp2
> >> +	add	\size, \addr, \size
> >> +	sub	\tmp2, \tmp1, #1
> >> +	bic	\addr, \addr, \tmp2  
> > Just a nit, but since my brain was in assembly land: We could skip tmp2,
> > by adding back #1 to tmp1 after the bic.
> > Same for the arm64 code.  
> 
> Using one less temporary register wouldn't help with register pressure:
> 
> - On arm, registers r0-r3 are used, which ARM IHI 0042F says can be used
> as scratch registers and the caller will save their contents before calling
> the function (or not use them at all).
> 
> - On arm64, registers x0-x3 are used, which have a similar usage according to ARM
> IHI 0055B.
> 
> Using one less temporary register means one more instruction, but that's not relevant
> since the macro will perform writes, as even invalidation is transformed to a
> clean + invalidate under virtualization.

I am not talking about micro-optimisation here, but this is a *macro*,
not a function, so could potentially be used in multiple different
places, for instance inside leaf functions. And maybe that already uses
two registers on its own, so can't spare three extra ones?

Was just a hint, anyway ...

> The reason I chose to keep the macro unchanged for arm64 is that it matches the
> Linux definition, and I think it's better to try not to deviate too much from it,
> as in the long run it will make maintenance easier for everyone.

> For arm, I wrote it this way to match the arm64 definition.
> 
> >  
> >> +9998:
> >> +	.ifc	\op, dccimvac
> >> +	mcr	p15, 0, \addr, c7, c14, 1
> >> +	.else
> >> +	.err
> >> +	.endif
> >> +	add	\addr, \addr, \tmp1
> >> +	cmp	\addr, \size
> >> +	blo	9998b
> >> +	dsb	\domain
> >> +	.endm
> >> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> >> index 273366d1fe1c..3c36eac903f0 100644
> >> --- a/lib/arm/asm/processor.h
> >> +++ b/lib/arm/asm/processor.h
> >> @@ -9,11 +9,6 @@
> >>  #include <asm/sysreg.h>
> >>  #include <asm/barrier.h>  
> > Do we want the same protection against inclusion from C here as in the
> > arm64 version?  
> 
> We do, I will add it in the next iteration.
> 
> >  
> >> -#define CTR_DMINLINE_SHIFT	16
> >> -#define CTR_DMINLINE_MASK	(0xf << 16)
> >> -#define CTR_DMINLINE(x)	\
> >> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
> >> -
> >>  enum vector {
> >>  	EXCPTN_RST,
> >>  	EXCPTN_UND,
> >> @@ -89,6 +84,4 @@ static inline u32 get_ctr(void)
> >>  	return read_sysreg(CTR);
> >>  }
> >>  
> >> -extern unsigned long dcache_line_size;
> >> -
> >>  #endif /* _ASMARM_PROCESSOR_H_ */
> >> diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
> >> new file mode 100644
> >> index 000000000000..f801c0c43d02
> >> --- /dev/null
> >> +++ b/lib/arm64/asm/assembler.h
> >> @@ -0,0 +1,53 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only */
> >> +/*
> >> + * Based on the file arch/arm64/include/asm/assembler.h from Linux v5.10, which
> >> + * in turn is based on arch/arm/include/asm/assembler.h and
> >> + * arch/arm/mm/proc-macros.S
> >> + *
> >> + * Copyright (C) 1996-2000 Russell King
> >> + * Copyright (C) 2012 ARM Ltd.
> >> + */
> >> +#ifndef __ASSEMBLY__
> >> +#error "Only include this from assembly code"
> >> +#endif
> >> +
> >> +#ifndef __ASM_ASSEMBLER_H
> >> +#define __ASM_ASSEMBLER_H
> >> +
> >> +/*
> >> + * raw_dcache_line_size - get the minimum D-cache line size on this CPU
> >> + * from the CTR register.
> >> + */
> >> +	.macro	raw_dcache_line_size, reg, tmp
> >> +	mrs	\tmp, ctr_el0			// read CTR
> >> +	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding  
> > this encoding of ubfm is supposed to be written as:
> > 	ubfx \tmp, \tmp, #16, #4
> > This is also what objdump makes of the above.  
> 
> I would rather keep it the same as Linux.

Which means I need to send a patch there? ;-)
I am all for copying from reliable sources, but that doesn't mean that
we can't improve on them.

The ARM ARM says that ubfx is the preferred disassembly for this opcode.
Plus there is a ubfx in AArch32 (with the exact same semantic and
arguments), but no ubfm.

It's just that my brain expects ubfx when extracting bits from a
register, and I needed to wade through the exact definition of ubfm to
understand what it does.

So your choice, just wanted to point that out.

Cheers,
Andre

> >
> > The rest looks good, I convinced myself that the assembly algorithms are
> > correct.  
> 
> Thanks, much appreciated!
> 
> Thanks,
> 
> Alex
> 
> >
> > Cheers,
> > Andre
> >
> >  
> >> +	mov	\reg, #4			// bytes per word
> >> +	lsl	\reg, \reg, \tmp		// actual cache line size
> >> +	.endm
> >> +
> >> +/*
> >> + * Macro to perform a data cache maintenance for the interval
> >> + * [addr, addr + size). Use the raw value for the dcache line size because
> >> + * kvm-unit-tests has no concept of scheduling.
> >> + *
> >> + * 	op:		operation passed to dc instruction
> >> + * 	domain:		domain used in dsb instruction
> >> + * 	addr:		starting virtual address of the region
> >> + * 	size:		size of the region
> >> + * 	Corrupts:	addr, size, tmp1, tmp2
> >> + */
> >> +
> >> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> >> +	raw_dcache_line_size \tmp1, \tmp2
> >> +	add	\size, \addr, \size
> >> +	sub	\tmp2, \tmp1, #1
> >> +	bic	\addr, \addr, \tmp2
> >> +9998:
> >> +	dc	\op, \addr
> >> +	add	\addr, \addr, \tmp1
> >> +	cmp	\addr, \size
> >> +	b.lo	9998b
> >> +	dsb	\domain
> >> +	.endm
> >> +
> >> +#endif	/* __ASM_ASSEMBLER_H */
> >> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> >> index 771b2d1e0c94..cdc2463e1981 100644
> >> --- a/lib/arm64/asm/processor.h
> >> +++ b/lib/arm64/asm/processor.h
> >> @@ -16,11 +16,6 @@
> >>  #define SCTLR_EL1_A	(1 << 1)
> >>  #define SCTLR_EL1_M	(1 << 0)
> >>  
> >> -#define CTR_DMINLINE_SHIFT	16
> >> -#define CTR_DMINLINE_MASK	(0xf << 16)
> >> -#define CTR_DMINLINE(x)	\
> >> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
> >> -
> >>  #ifndef __ASSEMBLY__
> >>  #include <asm/ptrace.h>
> >>  #include <asm/esr.h>
> >> @@ -115,8 +110,6 @@ static inline u64 get_ctr(void)
> >>  	return read_sysreg(ctr_el0);
> >>  }
> >>  
> >> -extern unsigned long dcache_line_size;
> >> -
> >>  static inline unsigned long get_id_aa64mmfr0_el1(void)
> >>  {
> >>  	return read_sysreg(id_aa64mmfr0_el1);
> >> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> >> index 066524f8bf61..751ba980000a 100644
> >> --- a/lib/arm/setup.c
> >> +++ b/lib/arm/setup.c
> >> @@ -42,8 +42,6 @@ static struct mem_region __initial_mem_regions[NR_INITIAL_MEM_REGIONS + 1];
> >>  struct mem_region *mem_regions = __initial_mem_regions;
> >>  phys_addr_t __phys_offset, __phys_end;
> >>  
> >> -unsigned long dcache_line_size;
> >> -
> >>  int mpidr_to_cpu(uint64_t mpidr)
> >>  {
> >>  	int i;
> >> @@ -72,11 +70,6 @@ static void cpu_init(void)
> >>  	ret = dt_for_each_cpu_node(cpu_set, NULL);
> >>  	assert(ret == 0);
> >>  	set_cpu_online(0, true);
> >> -	/*
> >> -	 * DminLine is log2 of the number of words in the smallest cache line; a
> >> -	 * word is 4 bytes.
> >> -	 */
> >> -	dcache_line_size = 1 << (CTR_DMINLINE(get_ctr()) + 2);
> >>  }
> >>  
> >>  unsigned int mem_region_get_flags(phys_addr_t paddr)
> >> diff --git a/arm/cstart.S b/arm/cstart.S
> >> index ef936ae2f874..954748b00f64 100644
> >> --- a/arm/cstart.S
> >> +++ b/arm/cstart.S
> >> @@ -7,6 +7,7 @@
> >>   */
> >>  #define __ASSEMBLY__
> >>  #include <auxinfo.h>
> >> +#include <asm/assembler.h>
> >>  #include <asm/thread_info.h>
> >>  #include <asm/asm-offsets.h>
> >>  #include <asm/pgtable-hwdef.h>
> >> @@ -197,20 +198,6 @@ asm_mmu_enable:
> >>  
> >>  	mov     pc, lr
> >>  
> >> -.macro dcache_clean_inval domain, start, end, tmp1, tmp2
> >> -	ldr	\tmp1, =dcache_line_size
> >> -	ldr	\tmp1, [\tmp1]
> >> -	sub	\tmp2, \tmp1, #1
> >> -	bic	\start, \start, \tmp2
> >> -9998:
> >> -	/* DCCIMVAC */
> >> -	mcr	p15, 0, \start, c7, c14, 1
> >> -	add	\start, \start, \tmp1
> >> -	cmp	\start, \end
> >> -	blo	9998b
> >> -	dsb	\domain
> >> -.endm
> >> -
> >>  .globl asm_mmu_disable
> >>  asm_mmu_disable:
> >>  	/* SCTLR */
> >> @@ -223,7 +210,8 @@ asm_mmu_disable:
> >>  	ldr	r0, [r0]
> >>  	ldr	r1, =__phys_end
> >>  	ldr	r1, [r1]
> >> -	dcache_clean_inval sy, r0, r1, r2, r3
> >> +	sub	r1, r1, r0
> >> +	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> >>  	isb
> >>  
> >>  	mov     pc, lr
> >> diff --git a/arm/cstart64.S b/arm/cstart64.S
> >> index fc1930bcdb53..046bd3914098 100644
> >> --- a/arm/cstart64.S
> >> +++ b/arm/cstart64.S
> >> @@ -8,6 +8,7 @@
> >>  #define __ASSEMBLY__
> >>  #include <auxinfo.h>
> >>  #include <asm/asm-offsets.h>
> >> +#include <asm/assembler.h>
> >>  #include <asm/ptrace.h>
> >>  #include <asm/processor.h>
> >>  #include <asm/page.h>
> >> @@ -204,20 +205,6 @@ asm_mmu_enable:
> >>  
> >>  	ret
> >>  
> >> -/* Taken with small changes from arch/arm64/incluse/asm/assembler.h */
> >> -.macro dcache_by_line_op op, domain, start, end, tmp1, tmp2
> >> -	adrp	\tmp1, dcache_line_size
> >> -	ldr	\tmp1, [\tmp1, :lo12:dcache_line_size]
> >> -	sub	\tmp2, \tmp1, #1
> >> -	bic	\start, \start, \tmp2
> >> -9998:
> >> -	dc	\op , \start
> >> -	add	\start, \start, \tmp1
> >> -	cmp	\start, \end
> >> -	b.lo	9998b
> >> -	dsb	\domain
> >> -.endm
> >> -
> >>  .globl asm_mmu_disable
> >>  asm_mmu_disable:
> >>  	mrs	x0, sctlr_el1
> >> @@ -230,6 +217,7 @@ asm_mmu_disable:
> >>  	ldr	x0, [x0, :lo12:__phys_offset]
> >>  	adrp	x1, __phys_end
> >>  	ldr	x1, [x1, :lo12:__phys_end]
> >> +	sub	x1, x1, x0
> >>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
> >>  	isb
> >>    



* Re: [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable
  2021-03-16 15:40       ` Andre Przywara
@ 2021-03-22 12:01         ` Alexandru Elisei
  0 siblings, 0 replies; 17+ messages in thread
From: Alexandru Elisei @ 2021-03-22 12:01 UTC (permalink / raw)
  To: Andre Przywara; +Cc: drjones, kvm, kvmarm

Hi Andre,

On 3/16/21 3:40 PM, Andre Przywara wrote:
> On Mon, 15 Mar 2021 15:46:09 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi Alex,
>
>> On 3/4/21 3:00 PM, Andre Przywara wrote:
>>> On Sat, 27 Feb 2021 10:41:57 +0000
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>  
>>>> Compute the dcache line size when doing dcache maintenance instead of using
>>>> a global variable computed in setup(), which allows us to do dcache
>>>> maintenance at any point in the boot process. This will be useful for
>>>> running as an EFI app and it also aligns our implementation to that of the
>>>> Linux kernel.  
>>> Can you add that this changes the semantic of dcache_by_line_op to use
>>> the size instead of the end address?  
>> Sure, I can do that. The dcache_by_line_op was never visible to code outside
>> assembly, and it was only used by asm_mmu_disable, so no other callers are
>> affected by this change.
> Thanks, just a short sentence suffices. I was just mentioning this
> since many cache-op wrappers I have seen use (start,stop) pairs, while
> I actually think (start,length) is more practical. So it just deserves
> a short mention in case anyone was familiar with the previous
> arguments and wonders what's going on.
>
>>>  
>>>> For consistency, the arm code has been similarly modified.
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>> ---
>>>>  lib/arm/asm/assembler.h   | 44 ++++++++++++++++++++++++++++++++
>>>>  lib/arm/asm/processor.h   |  7 ------
>>>>  lib/arm64/asm/assembler.h | 53 +++++++++++++++++++++++++++++++++++++++
>>>>  lib/arm64/asm/processor.h |  7 ------
>>>>  lib/arm/setup.c           |  7 ------
>>>>  arm/cstart.S              | 18 +++----------
>>>>  arm/cstart64.S            | 16 ++----------
>>>>  7 files changed, 102 insertions(+), 50 deletions(-)
>>>>  create mode 100644 lib/arm/asm/assembler.h
>>>>  create mode 100644 lib/arm64/asm/assembler.h
>>>>
>>>> diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
>>>> new file mode 100644
>>>> index 000000000000..6b932df86204
>>>> --- /dev/null
>>>> +++ b/lib/arm/asm/assembler.h
>>>> @@ -0,0 +1,44 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +/*
>>>> + * Based on several files from Linux version v5.10: arch/arm/mm/proc-macros.S,
>>>> + * arch/arm/mm/proc-v7.S.
>>>> + */
>>>> +
>>>> +/*
>>>> + * dcache_line_size - get the minimum D-cache line size from the CTR register
>>> `> + * on ARMv7.  
>> Well, it's in the arm directory and there's a file with the same name under
>> lib/arm64/asm/, so I don't think there's any room for confusion here.
> Mmh, this v7 line was already in there, and I didn't complain, it was
> apparently just a stray character sneaking in the reply which made this
> look like a comment?

I guess so, my bad.

>
>>>> + */
>>>> +	.macro	dcache_line_size, reg, tmp
>>>> +	mrc	p15, 0, \tmp, c0, c0, 1		// read ctr
>>>> +	lsr	\tmp, \tmp, #16
>>>> +	and	\tmp, \tmp, #0xf		// cache line size encoding
>>>> +	mov	\reg, #4			// bytes per word
>>>> +	mov	\reg, \reg, lsl \tmp		// actual cache line size
>>>> +	.endm
>>>> +
>>>> +/*
>>>> + * Macro to perform a data cache maintenance for the interval
>>>> + * [addr, addr + size).
>>>> + *
>>>> + * 	op:		operation to execute
>>>> + * 	domain:		domain used in the dsb instruction
>>>> + * 	addr:		starting virtual address of the region
>>>> + * 	size:		size of the region
>>>> + * 	Corrupts:	addr, size, tmp1, tmp2
>>>> + */
>>>> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
>>>> +	dcache_line_size \tmp1, \tmp2
>>>> +	add	\size, \addr, \size
>>>> +	sub	\tmp2, \tmp1, #1
>>>> +	bic	\addr, \addr, \tmp2  
>>> Just a nit, but since my brain was in assembly land: We could skip tmp2,
>>> by adding back #1 to tmp1 after the bic.
>>> Same for the arm64 code.  
>> Using one less temporary register wouldn't help with register pressure:
>>
>> - On arm, registers r0-r3 are used, which ARM IHI 0042F says can be used
>> as scratch registers and the caller will save their contents before calling
>> the function (or not use them at all).
>>
>> - On arm64, registers x0-x3 are used, which have a similar usage according to ARM
>> IHI 0055B.
>>
>> Using one less temporary register means one more instruction, but that's not relevant
>> since the macro will perform writes, as even invalidation is transformed to a
>> clean + invalidate under virtualization.
> I am not talking about micro-optimisation here, but this is a *macro*,
> not a function, so could potentially be used in multiple different
> places, for instance inside leaf functions. And maybe that already uses
> two registers on its own, so can't spare three extra ones?
>
> Was just a hint, anyway ...
>
>> The reason I chose to keep the macro unchanged for arm64 is that it matches the
>> Linux definition, and I think it's better to try not to deviate too much from it,
>> as in the long run it will make maintenance easier for everyone.
>> For arm, I wrote it this way to match the arm64 definition.
>>
>>>  
>>>> +9998:
>>>> +	.ifc	\op, dccimvac
>>>> +	mcr	p15, 0, \addr, c7, c14, 1
>>>> +	.else
>>>> +	.err
>>>> +	.endif
>>>> +	add	\addr, \addr, \tmp1
>>>> +	cmp	\addr, \size
>>>> +	blo	9998b
>>>> +	dsb	\domain
>>>> +	.endm
>>>> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
>>>> index 273366d1fe1c..3c36eac903f0 100644
>>>> --- a/lib/arm/asm/processor.h
>>>> +++ b/lib/arm/asm/processor.h
>>>> @@ -9,11 +9,6 @@
>>>>  #include <asm/sysreg.h>
>>>>  #include <asm/barrier.h>  
>>> Do we want the same protection against inclusion from C here as in the
>>> arm64 version?  
>> We do, I will add it in the next iteration.
>>
>>>  
>>>> -#define CTR_DMINLINE_SHIFT	16
>>>> -#define CTR_DMINLINE_MASK	(0xf << 16)
>>>> -#define CTR_DMINLINE(x)	\
>>>> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
>>>> -
>>>>  enum vector {
>>>>  	EXCPTN_RST,
>>>>  	EXCPTN_UND,
>>>> @@ -89,6 +84,4 @@ static inline u32 get_ctr(void)
>>>>  	return read_sysreg(CTR);
>>>>  }
>>>>  
>>>> -extern unsigned long dcache_line_size;
>>>> -
>>>>  #endif /* _ASMARM_PROCESSOR_H_ */
>>>> diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
>>>> new file mode 100644
>>>> index 000000000000..f801c0c43d02
>>>> --- /dev/null
>>>> +++ b/lib/arm64/asm/assembler.h
>>>> @@ -0,0 +1,53 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +/*
>>>> + * Based on the file arch/arm64/include/asm/assembler.h from Linux v5.10, which
>>>> + * in turn is based on arch/arm/include/asm/assembler.h and
>>>> + * arch/arm/mm/proc-macros.S
>>>> + *
>>>> + * Copyright (C) 1996-2000 Russell King
>>>> + * Copyright (C) 2012 ARM Ltd.
>>>> + */
>>>> +#ifndef __ASSEMBLY__
>>>> +#error "Only include this from assembly code"
>>>> +#endif
>>>> +
>>>> +#ifndef __ASM_ASSEMBLER_H
>>>> +#define __ASM_ASSEMBLER_H
>>>> +
>>>> +/*
>>>> + * raw_dcache_line_size - get the minimum D-cache line size on this CPU
>>>> + * from the CTR register.
>>>> + */
>>>> +	.macro	raw_dcache_line_size, reg, tmp
>>>> +	mrs	\tmp, ctr_el0			// read CTR
>>>> +	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding  
>>> this encoding of ubfm is supposed to be written as:
>>> 	ubfx \tmp, \tmp, #16, #4
>>> This is also what objdump makes of the above.  
>> I would rather keep it the same as Linux.
> Which means I need to send a patch there? ;-)

Only if you want to try to convince people that ubfx is better than ubfm :)

> I am all for copying from reliable sources, but that doesn't mean that
> we can't improve on them.
>
> The ARM ARM says that ubfx is the preferred disassembly for this opcode.
> Plus there is a ubfx in AArch32 (with the exact same semantic and
> arguments), but no ubfm.
>
> It's just that my brain expects ubfx when extracting bits from a
> register, and I needed to wade through the exact definition of ubfm to
> understand what it does.
>
> So your choice, just wanted to point that out.

I think you've convinced me. I'm also more familiar with ubfx, the Arm ARM says
that ubfm is usually accessed via one of its aliases, and Linux seems to be moving
away from using ubfm (the only instances that I've found were
{raw_,}dcache_line_size which are very old, I think from the initial arm64
architecture enablement). I'll use ubfx instead.

Thanks,

Alex

>
> Cheers,
> Andre
>
>>> The rest looks good, I convinced myself that the assembly algorithms are
>>> correct.  
>> Thanks, much appreciated!
>>
>> Thanks,
>>
>> Alex
>>
>>> Cheers,
>>> Andre
>>>
>>>  
>>>> +	mov	\reg, #4			// bytes per word
>>>> +	lsl	\reg, \reg, \tmp		// actual cache line size
>>>> +	.endm
>>>> +
>>>> +/*
>>>> + * Macro to perform a data cache maintenance for the interval
>>>> + * [addr, addr + size). Use the raw value for the dcache line size because
>>>> + * kvm-unit-tests has no concept of scheduling.
>>>> + *
>>>> + * 	op:		operation passed to dc instruction
>>>> + * 	domain:		domain used in dsb instruction
>>>> + * 	addr:		starting virtual address of the region
>>>> + * 	size:		size of the region
>>>> + * 	Corrupts:	addr, size, tmp1, tmp2
>>>> + */
>>>> +
>>>> +	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
>>>> +	raw_dcache_line_size \tmp1, \tmp2
>>>> +	add	\size, \addr, \size
>>>> +	sub	\tmp2, \tmp1, #1
>>>> +	bic	\addr, \addr, \tmp2
>>>> +9998:
>>>> +	dc	\op, \addr
>>>> +	add	\addr, \addr, \tmp1
>>>> +	cmp	\addr, \size
>>>> +	b.lo	9998b
>>>> +	dsb	\domain
>>>> +	.endm
>>>> +
>>>> +#endif	/* __ASM_ASSEMBLER_H */
>>>> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
>>>> index 771b2d1e0c94..cdc2463e1981 100644
>>>> --- a/lib/arm64/asm/processor.h
>>>> +++ b/lib/arm64/asm/processor.h
>>>> @@ -16,11 +16,6 @@
>>>>  #define SCTLR_EL1_A	(1 << 1)
>>>>  #define SCTLR_EL1_M	(1 << 0)
>>>>  
>>>> -#define CTR_DMINLINE_SHIFT	16
>>>> -#define CTR_DMINLINE_MASK	(0xf << 16)
>>>> -#define CTR_DMINLINE(x)	\
>>>> -	(((x) & CTR_DMINLINE_MASK) >> CTR_DMINLINE_SHIFT)
>>>> -
>>>>  #ifndef __ASSEMBLY__
>>>>  #include <asm/ptrace.h>
>>>>  #include <asm/esr.h>
>>>> @@ -115,8 +110,6 @@ static inline u64 get_ctr(void)
>>>>  	return read_sysreg(ctr_el0);
>>>>  }
>>>>  
>>>> -extern unsigned long dcache_line_size;
>>>> -
>>>>  static inline unsigned long get_id_aa64mmfr0_el1(void)
>>>>  {
>>>>  	return read_sysreg(id_aa64mmfr0_el1);
>>>> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
>>>> index 066524f8bf61..751ba980000a 100644
>>>> --- a/lib/arm/setup.c
>>>> +++ b/lib/arm/setup.c
>>>> @@ -42,8 +42,6 @@ static struct mem_region __initial_mem_regions[NR_INITIAL_MEM_REGIONS + 1];
>>>>  struct mem_region *mem_regions = __initial_mem_regions;
>>>>  phys_addr_t __phys_offset, __phys_end;
>>>>  
>>>> -unsigned long dcache_line_size;
>>>> -
>>>>  int mpidr_to_cpu(uint64_t mpidr)
>>>>  {
>>>>  	int i;
>>>> @@ -72,11 +70,6 @@ static void cpu_init(void)
>>>>  	ret = dt_for_each_cpu_node(cpu_set, NULL);
>>>>  	assert(ret == 0);
>>>>  	set_cpu_online(0, true);
>>>> -	/*
>>>> -	 * DminLine is log2 of the number of words in the smallest cache line; a
>>>> -	 * word is 4 bytes.
>>>> -	 */
>>>> -	dcache_line_size = 1 << (CTR_DMINLINE(get_ctr()) + 2);
>>>>  }
>>>>  
>>>>  unsigned int mem_region_get_flags(phys_addr_t paddr)
>>>> diff --git a/arm/cstart.S b/arm/cstart.S
>>>> index ef936ae2f874..954748b00f64 100644
>>>> --- a/arm/cstart.S
>>>> +++ b/arm/cstart.S
>>>> @@ -7,6 +7,7 @@
>>>>   */
>>>>  #define __ASSEMBLY__
>>>>  #include <auxinfo.h>
>>>> +#include <asm/assembler.h>
>>>>  #include <asm/thread_info.h>
>>>>  #include <asm/asm-offsets.h>
>>>>  #include <asm/pgtable-hwdef.h>
>>>> @@ -197,20 +198,6 @@ asm_mmu_enable:
>>>>  
>>>>  	mov     pc, lr
>>>>  
>>>> -.macro dcache_clean_inval domain, start, end, tmp1, tmp2
>>>> -	ldr	\tmp1, =dcache_line_size
>>>> -	ldr	\tmp1, [\tmp1]
>>>> -	sub	\tmp2, \tmp1, #1
>>>> -	bic	\start, \start, \tmp2
>>>> -9998:
>>>> -	/* DCCIMVAC */
>>>> -	mcr	p15, 0, \start, c7, c14, 1
>>>> -	add	\start, \start, \tmp1
>>>> -	cmp	\start, \end
>>>> -	blo	9998b
>>>> -	dsb	\domain
>>>> -.endm
>>>> -
>>>>  .globl asm_mmu_disable
>>>>  asm_mmu_disable:
>>>>  	/* SCTLR */
>>>> @@ -223,7 +210,8 @@ asm_mmu_disable:
>>>>  	ldr	r0, [r0]
>>>>  	ldr	r1, =__phys_end
>>>>  	ldr	r1, [r1]
>>>> -	dcache_clean_inval sy, r0, r1, r2, r3
>>>> +	sub	r1, r1, r0
>>>> +	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>>>>  	isb
>>>>  
>>>>  	mov     pc, lr
>>>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>>>> index fc1930bcdb53..046bd3914098 100644
>>>> --- a/arm/cstart64.S
>>>> +++ b/arm/cstart64.S
>>>> @@ -8,6 +8,7 @@
>>>>  #define __ASSEMBLY__
>>>>  #include <auxinfo.h>
>>>>  #include <asm/asm-offsets.h>
>>>> +#include <asm/assembler.h>
>>>>  #include <asm/ptrace.h>
>>>>  #include <asm/processor.h>
>>>>  #include <asm/page.h>
>>>> @@ -204,20 +205,6 @@ asm_mmu_enable:
>>>>  
>>>>  	ret
>>>>  
>>>> -/* Taken with small changes from arch/arm64/incluse/asm/assembler.h */
>>>> -.macro dcache_by_line_op op, domain, start, end, tmp1, tmp2
>>>> -	adrp	\tmp1, dcache_line_size
>>>> -	ldr	\tmp1, [\tmp1, :lo12:dcache_line_size]
>>>> -	sub	\tmp2, \tmp1, #1
>>>> -	bic	\start, \start, \tmp2
>>>> -9998:
>>>> -	dc	\op , \start
>>>> -	add	\start, \start, \tmp1
>>>> -	cmp	\start, \end
>>>> -	b.lo	9998b
>>>> -	dsb	\domain
>>>> -.endm
>>>> -
>>>>  .globl asm_mmu_disable
>>>>  asm_mmu_disable:
>>>>  	mrs	x0, sctlr_el1
>>>> @@ -230,6 +217,7 @@ asm_mmu_disable:
>>>>  	ldr	x0, [x0, :lo12:__phys_offset]
>>>>  	adrp	x1, __phys_end
>>>>  	ldr	x1, [x1, :lo12:__phys_end]
>>>> +	sub	x1, x1, x0
>>>>  	dcache_by_line_op civac, sy, x0, x1, x2, x3
>>>>  	isb
>>>>    


Thread overview: 17+ messages
2021-02-27 10:41 [kvm-unit-tests PATCH 0/6] Misc assembly fixes and cleanups Alexandru Elisei
2021-02-27 10:41 ` [kvm-unit-tests PATCH 1/6] arm64: Remove unnecessary ISB when writing to SPSel Alexandru Elisei
2021-03-03 17:35   ` Andre Przywara
2021-02-27 10:41 ` [kvm-unit-tests PATCH 2/6] arm/arm64: Remove dcache_line_size global variable Alexandru Elisei
2021-03-04 15:00   ` Andre Przywara
2021-03-15 15:46     ` Alexandru Elisei
2021-03-16 15:40       ` Andre Przywara
2021-03-22 12:01         ` Alexandru Elisei
2021-02-27 10:41 ` [kvm-unit-tests PATCH 3/6] arm/arm64: Remove unnecessary ISB when doing dcache maintenance Alexandru Elisei
2021-03-12 14:59   ` Andrew Jones
2021-03-15 16:22     ` Alexandru Elisei
2021-02-27 10:41 ` [kvm-unit-tests PATCH 4/6] lib: arm64: Consolidate register definitions to sysreg.h Alexandru Elisei
2021-03-03 17:32   ` Andre Przywara
2021-02-27 10:42 ` [kvm-unit-tests PATCH 5/6] arm64: Configure SCTLR_EL1 at boot Alexandru Elisei
2021-03-03 17:32   ` Andre Przywara
2021-02-27 10:42 ` [kvm-unit-tests PATCH 6/6] arm64: Disable TTBR1_EL1 translation table walks Alexandru Elisei
2021-03-03 17:32   ` Andre Przywara
