linux-efi.vger.kernel.org archive mirror
* [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores
@ 2020-02-24 12:17 Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 1/5] efi/arm: Work around missing cache maintenance in decompressor handover Ard Biesheuvel
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

While making changes to the EFI stub startup code, I noticed that we are
still doing set/way maintenance on the caches when booting on v7 cores.
This works today on VMs because KVM traps set/way ops and cleans the
whole address space by VA on behalf of the guest. On most v7 hardware,
the set/way ops are in fact sufficient when only one core is running, as
there usually is no system cache. But on systems like SynQuacer, for
which 32-bit firmware is available, the current cache maintenance only
pushes the data out to the L3 system cache, where it is not visible to
the CPU once it turns the MMU and caches off.

So instead, switch to the by-VA cache maintenance that the architecture
requires for v7 and later (and ARM1176, as a side effect).

Changes since v2:
- add a patch to factor out the code sequence that obtains the inflated image
  size by doing an unaligned LE32 load from the end of the compressed data
- use new macro to load the inflated image size instead of doing a potentially
  unaligned load
- avoid using the stack when obtaining the base and size of the self-relocated
  zImage

Changes since v1:
- include the EFI patch that was sent out separately before (#1)
- split the preparatory work to pass the region to clean in r0/r1 into an
  EFI-specific patch and one for the decompressor; this way, the first two
  patches can go on a stable branch that is shared between the ARM tree and
  the EFI tree
- document the meaning of the values in r0/r1 upon entry to cache_clean_flush
- take care to treat the region end address as exclusive
- switch to clean+invalidate to align with the other implementations
- drop some code that manages the stack pointer value before calling
  cache_clean_flush(), which is no longer necessary
- take care to clean the entire region that is covered by the relocated zImage
  if it needs to relocate itself before decompressing

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm32-efi-cache-ops

[ Several people asked me offline why on earth I am running SynQuacer on 32 bit:
  the answer is that this is simply to prove that it is currently broken, and
  this implies that for 32-bit VMs running under KVM, we are relying on the
  special, non-architectural cache management done by the hypervisor on behalf
  of the guest to be able to run this code. ]

Cc: Russell King <linux@armlinux.org.uk>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Linus Walleij <linus.walleij@linaro.org>

Ard Biesheuvel (5):
  efi/arm: Work around missing cache maintenance in decompressor
    handover
  efi/arm: Pass start and end addresses to cache_clean_flush()
  ARM: decompressor: factor out routine to obtain the inflated image
    size
  ARM: decompressor: prepare cache_clean_flush for doing by-VA
    maintenance
  ARM: decompressor: switch to by-VA cache maintenance for v7 cores

 arch/arm/boot/compressed/head.S | 166 +++++++++++---------
 1 file changed, 91 insertions(+), 75 deletions(-)

-- 
2.17.1



* [PATCH v3 1/5] efi/arm: Work around missing cache maintenance in decompressor handover
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
@ 2020-02-24 12:17 ` Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 2/5] efi/arm: Pass start and end addresses to cache_clean_flush() Ard Biesheuvel
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

The EFI stub executes within the context of the zImage as it was
loaded by the firmware. The firmware treats the zImage as an ordinary
PE/COFF executable: it is loaded into memory and cleaned to the PoU
(point of unification) to ensure that it can be executed safely while
the MMU and caches are on.

When the EFI stub hands over to the decompressor, we clean the caches
by set/way and disable the MMU and D-cache, to comply with the Linux
boot protocol for ARM. However, cache maintenance by set/way is not
sufficient to ensure that subsequent instruction fetches and data
accesses done with the MMU off see the correct data. This means that
proceeding as we do currently is not safe, especially since we also
perform data accesses with the MMU off, from a literal pool as well as
the stack.

So let's kick this can down the road a bit, and jump into the relocated
zImage before disabling the caches. This removes the requirement to
perform any by-VA cache maintenance on the original PE/COFF executable,
but it does require that the relocated zImage is cleaned to the PoC,
which is currently not the case. This will be addressed in a subsequent
patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/head.S | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
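
Note on the jump below: the address arithmetic behind .Ljmp can be
sketched in C as follows. This is only an illustration, assuming that
[sp] holds the address at which `start` ends up in the relocated
zImage (as the comment in the patch suggests). The function name and
the example addresses are hypothetical and do not appear in head.S.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute where the local label "0:" lives in the
 * relocated copy of the zImage.
 *
 *   reloc_start - runtime address of `start` in the relocated image ([sp])
 *   link_start  - link-time address of `start`
 *   link_label  - link-time address of the label "0:"
 */
static inline uint32_t relocated_label_addr(uint32_t reloc_start,
                                            uint32_t link_start,
                                            uint32_t link_label)
{
    /* .Ljmp: .long start - 0b (wraps modulo 2^32 when the label follows start) */
    uint32_t jmp = link_start - link_label;

    /* sub r1, r0, r1: reloc_start - (start - 0b) = reloc_start + (0b - start) */
    return reloc_start - jmp;
}
```

Because only the difference between two link-time symbols is stored,
the result does not depend on where the image was linked.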

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 088b0a060876..39f7071d47c7 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -1461,6 +1461,17 @@ ENTRY(efi_stub_entry)
 		@ Preserve return value of efi_entry() in r4
 		mov	r4, r0
 		bl	cache_clean_flush
+
+		@ The PE/COFF loader might not have cleaned the code we are
+		@ running beyond the PoU, and so calling cache_off below from
+		@ inside the PE/COFF loader allocated region is unsafe. Let's
+		@ assume our own zImage relocation code did a better job, and
+		@ jump into its version of this routine before proceeding.
+		ldr	r0, [sp]			@ relocated zImage
+		ldr	r1, .Ljmp
+		sub	r1, r0, r1
+		mov	pc, r1				@ no mode switch
+0:
 		bl	cache_off
 
 		@ Set parameters for booting zImage according to boot protocol
@@ -1469,18 +1480,15 @@ ENTRY(efi_stub_entry)
 		mov	r0, #0
 		mov	r1, #0xFFFFFFFF
 		mov	r2, r4
-
-		@ Branch to (possibly) relocated zImage that is in [sp]
-		ldr	lr, [sp]
-		ldr	ip, =start_offset
-		add	lr, lr, ip
-		mov	pc, lr				@ no mode switch
+		b	__efi_start
 
 efi_load_fail:
 		@ Return EFI_LOAD_ERROR to EFI firmware on error.
 		ldr	r0, =0x80000001
 		ldmfd	sp!, {ip, pc}
 ENDPROC(efi_stub_entry)
+		.align	2
+.Ljmp:		.long	start - 0b
 #endif
 
 		.align
-- 
2.17.1



* [PATCH v3 2/5] efi/arm: Pass start and end addresses to cache_clean_flush()
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 1/5] efi/arm: Work around missing cache maintenance in decompressor handover Ard Biesheuvel
@ 2020-02-24 12:17 ` Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 3/5] ARM: decompressor: factor out routine to obtain the inflated image size Ard Biesheuvel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

In preparation for turning the decompressor's cache clean/flush
operations into proper by-VA maintenance for v7 cores, pass the
start and end addresses of the regions that need cache maintenance
into cache_clean_flush in registers r0 and r1.

Currently, all implementations of cache_clean_flush ignore these
values, so no functional change is expected as a result of this
patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/head.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 39f7071d47c7..8487221bedb0 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -1460,6 +1460,12 @@ ENTRY(efi_stub_entry)
 
 		@ Preserve return value of efi_entry() in r4
 		mov	r4, r0
+		add	r1, r4, #SZ_2M			@ DT end
+		bl	cache_clean_flush
+
+		ldr	r0, [sp]			@ relocated zImage
+		ldr	r1, =_edata			@ size of zImage
+		add	r1, r1, r0			@ end of zImage
 		bl	cache_clean_flush
 
 		@ The PE/COFF loader might not have cleaned the code we are
-- 
2.17.1



* [PATCH v3 3/5] ARM: decompressor: factor out routine to obtain the inflated image size
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 1/5] efi/arm: Work around missing cache maintenance in decompressor handover Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 2/5] efi/arm: Pass start and end addresses to cache_clean_flush() Ard Biesheuvel
@ 2020-02-24 12:17 ` Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 4/5] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance Ard Biesheuvel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

Before adding another reference to the inflated image size, factor
out the slightly complicated way of loading the unaligned little-endian
constant from the end of the compressed data.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/head.S | 43 ++++++++++++--------
 1 file changed, 26 insertions(+), 17 deletions(-)
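
For readers less familiar with the ldrb/orr sequence that the macro
wraps, a rough C equivalent of the unaligned little-endian load is
sketched below. The function name is hypothetical; the decompressor
itself does this in assembly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 32-bit little-endian value one byte at
 * a time, so the pointer may have any alignment and the result does not
 * depend on the endianness of the CPU doing the load.
 */
static inline uint32_t load_le32_unaligned(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The byte-wise form matters here because the size field sits directly
after compressed data of arbitrary length, so its address need not be
word aligned.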

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 8487221bedb0..674e55400cfd 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -151,6 +151,25 @@
 .L_\@:
 		.endm
 
+		/*
+		 * The kernel build system appends the size of the
+		 * decompressed kernel at the end of the compressed data
+		 * in little-endian form.
+		 */
+		.macro	get_inflated_image_size, res:req, tmp1:req, tmp2:req
+		adr	\res, .Linflated_image_size_offset
+		ldr	\tmp1, [\res]
+		add	\tmp1, \tmp1, \res	@ offset of inflated image size
+
+		ldrb	\res, [\tmp1]		@ get_unaligned_le32
+		ldrb	\tmp2, [\tmp1, #1]
+		orr	\res, \res, \tmp2, lsl #8
+		ldrb	\tmp2, [\tmp1, #2]
+		ldrb	\tmp1, [\tmp1, #3]
+		orr	\res, \res, \tmp2, lsl #16
+		orr	\res, \res, \tmp1, lsl #24
+		.endm
+
 		.section ".start", "ax"
 /*
  * sort out different calling conventions
@@ -268,15 +287,15 @@ not_angel:
 		 */
 		mov	r0, pc
 		cmp	r0, r4
-		ldrcc	r0, LC0+32
+		ldrcc	r0, LC0+28
 		addcc	r0, r0, pc
 		cmpcc	r4, r0
 		orrcc	r4, r4, #1		@ remember we skipped cache_on
 		blcs	cache_on
 
 restart:	adr	r0, LC0
-		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
-		ldr	sp, [r0, #28]
+		ldmia	r0, {r1, r2, r3, r6, r11, r12}
+		ldr	sp, [r0, #24]
 
 		/*
 		 * We might be running at a different address.  We need
@@ -284,20 +303,8 @@ restart:	adr	r0, LC0
 		 */
 		sub	r0, r0, r1		@ calculate the delta offset
 		add	r6, r6, r0		@ _edata
-		add	r10, r10, r0		@ inflated kernel size location
 
-		/*
-		 * The kernel build system appends the size of the
-		 * decompressed kernel at the end of the compressed data
-		 * in little-endian form.
-		 */
-		ldrb	r9, [r10, #0]
-		ldrb	lr, [r10, #1]
-		orr	r9, r9, lr, lsl #8
-		ldrb	lr, [r10, #2]
-		ldrb	r10, [r10, #3]
-		orr	r9, r9, lr, lsl #16
-		orr	r9, r9, r10, lsl #24
+		get_inflated_image_size	r9, r10, lr
 
 #ifndef CONFIG_ZBOOT_ROM
 		/* malloc space is above the relocated stack (64k max) */
@@ -652,13 +659,15 @@ LC0:		.word	LC0			@ r1
 		.word	__bss_start		@ r2
 		.word	_end			@ r3
 		.word	_edata			@ r6
-		.word	input_data_end - 4	@ r10 (inflated size location)
 		.word	_got_start		@ r11
 		.word	_got_end		@ ip
 		.word	.L_user_stack_end	@ sp
 		.word	_end - restart + 16384 + 1024*1024
 		.size	LC0, . - LC0
 
+.Linflated_image_size_offset:
+		.long	(input_data_end - 4) - .
+
 #ifdef CONFIG_ARCH_RPC
 		.globl	params
 params:		ldr	r0, =0x10000100		@ params_phys for RPC
-- 
2.17.1



* [PATCH v3 4/5] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2020-02-24 12:17 ` [PATCH v3 3/5] ARM: decompressor: factor out routine to obtain the inflated image size Ard Biesheuvel
@ 2020-02-24 12:17 ` Ard Biesheuvel
  2020-02-24 12:17 ` [PATCH v3 5/5] ARM: decompressor: switch to by-VA cache maintenance for v7 cores Ard Biesheuvel
  2020-02-25 15:48 ` [PATCH v3 0/5] ARM: decompressor: use " Linus Walleij
  5 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

In preparation for turning the decompressor's cache clean/flush
operations into proper by-VA maintenance for v7 cores, pass the
start and end addresses of the regions that need cache maintenance
into cache_clean_flush in registers r0 and r1 from the decompressor's
own call sites, and document these registers at the routine's
definition.

Currently, all implementations of cache_clean_flush ignore these
values, so no functional change is expected as a result of this
patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/head.S | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 674e55400cfd..12d631503bfa 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -533,12 +533,19 @@ dtb_check_done:
 		add	sp, sp, r6
 #endif
 
+		adr	r0, restart
+		ldr	r1, .Lclean_size
+		add	r0, r0, r6
+		add	r1, r1, r0
 		bl	cache_clean_flush
 
 		badr	r0, restart
 		add	r0, r0, r6
 		mov	pc, r0
 
+		.align	2
+.Lclean_size:	.long	_edata - restart
+
 wont_overwrite:
 /*
  * If delta is zero, we are running at the address we were linked at.
@@ -629,6 +636,11 @@ not_relocated:	mov	r0, #0
 		add	r2, sp, #0x10000	@ 64k max
 		mov	r3, r7
 		bl	decompress_kernel
+
+		get_inflated_image_size r1, r2, r3
+
+		mov	r0, r4			@ start of inflated image
+		add	r1, r1, r0		@ end of inflated image
 		bl	cache_clean_flush
 		bl	cache_off
 
@@ -1182,6 +1194,9 @@ __armv7_mmu_cache_off:
 /*
  * Clean and flush the cache to maintain consistency.
  *
+ * On entry,
+ *  r0 = start address
+ *  r1 = end address (exclusive)
  * On exit,
  *  r1, r2, r3, r9, r10, r11, r12 corrupted
  * This routine must preserve:
-- 
2.17.1



* [PATCH v3 5/5] ARM: decompressor: switch to by-VA cache maintenance for v7 cores
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2020-02-24 12:17 ` [PATCH v3 4/5] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance Ard Biesheuvel
@ 2020-02-24 12:17 ` Ard Biesheuvel
  2020-02-25 15:48 ` [PATCH v3 0/5] ARM: decompressor: use " Linus Walleij
  5 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-24 12:17 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
	Nicolas Pitre, Catalin Marinas, Tony Lindgren, Linus Walleij

Update the v7 cache_clean_flush routine to take into account the
memory range passed in r0/r1, and perform cache maintenance by
virtual address on this range instead of set/way maintenance, which
is inappropriate for the purpose of maintaining the cached state of
memory contents.

Since this removes any use of the stack in the implementation of
cache_clean_flush(), we can also drop some code that manages the
value of the stack pointer before calling it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/head.S | 82 +++++++-------------
 1 file changed, 30 insertions(+), 52 deletions(-)
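
The range handling in the new loop can be sketched in C as below. This
is an illustration under stated assumptions, not the implementation:
the function names and the callback (which stands in for the
clean+invalidate-by-VA mcr) are hypothetical, the range is assumed to
be non-empty (end > start), and the sketch stops on equality with the
last line rather than using the signed comparison (bgt) in the
assembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CTR[19:16] (DminLine) is log2 of the smallest D-cache line size,
 * counted in 4-byte words, so the size in bytes is 4 << DminLine.
 */
static inline uint32_t dcache_min_line_size(uint32_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/*
 * Visit every cache line that overlaps [start, end) and return how many
 * lines were visited. `line` must be a power of two and end > start.
 * `op` is a hypothetical stand-in for the per-line maintenance
 * operation and may be NULL.
 */
static inline size_t walk_cache_lines(uint32_t start, uint32_t end,
                                      uint32_t line, void (*op)(uint32_t va))
{
    uint32_t mask = line - 1;
    uint32_t va = start & ~mask;        /* round start down to a line   */
    uint32_t last = (end - 1) & ~mask;  /* end is exclusive: last line  */
    size_t count = 0;

    for (;;) {
        if (op)
            op(va);
        count++;
        if (va == last)                 /* also safe near the 4 GiB top */
            break;
        va += line;
    }
    return count;
}
```

Subtracting one from the end before rounding down is what keeps a
line-aligned end address from pulling in one extra line.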

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 12d631503bfa..aedc9bdb1719 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -528,10 +528,6 @@ dtb_check_done:
 		/* Preserve offset to relocated code. */
 		sub	r6, r9, r6
 
-#ifndef CONFIG_ZBOOT_ROM
-		/* cache_clean_flush may use the stack, so relocate it */
-		add	sp, sp, r6
-#endif
 
 		adr	r0, restart
 		ldr	r1, .Lclean_size
@@ -688,6 +684,24 @@ params:		ldr	r0, =0x10000100		@ params_phys for RPC
 		.align
 #endif
 
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register
+ * on ARMv7.
+ */
+		.macro	dcache_line_size, reg, tmp
+#ifdef CONFIG_CPU_V7M
+		movw	\tmp, #:lower16:BASEADDR_V7M_SCB + V7M_SCB_CTR
+		movt	\tmp, #:upper16:BASEADDR_V7M_SCB + V7M_SCB_CTR
+		ldr	\tmp, [\tmp]
+#else
+		mrc	p15, 0, \tmp, c0, c0, 1		@ read ctr
+#endif
+		lsr	\tmp, \tmp, #16
+		and	\tmp, \tmp, #0xf		@ cache line size encoding
+		mov	\reg, #4			@ bytes per word
+		mov	\reg, \reg, lsl \tmp		@ actual cache line size
+		.endm
+
 /*
  * Turn on the cache.  We need to setup some page tables so that we
  * can have both the I and D caches on.
@@ -1180,8 +1194,6 @@ __armv7_mmu_cache_off:
 		bic	r0, r0, #0x000c
 #endif
 		mcr	p15, 0, r0, c1, c0	@ turn MMU and cache off
-		mov	r12, lr
-		bl	__armv7_mmu_cache_flush
 		mov	r0, #0
 #ifdef CONFIG_MMU
 		mcr	p15, 0, r0, c8, c7, 0	@ invalidate whole TLB
@@ -1189,7 +1201,7 @@ __armv7_mmu_cache_off:
 		mcr	p15, 0, r0, c7, c5, 6	@ invalidate BTC
 		mcr	p15, 0, r0, c7, c10, 4	@ DSB
 		mcr	p15, 0, r0, c7, c5, 4	@ ISB
-		mov	pc, r12
+		mov	pc, lr
 
 /*
  * Clean and flush the cache to maintain consistency.
@@ -1205,6 +1217,7 @@ __armv7_mmu_cache_off:
 		.align	5
 cache_clean_flush:
 		mov	r3, #16
+		mov	r11, r1
 		b	call_cache_fn
 
 __armv4_mpu_cache_flush:
@@ -1255,51 +1268,16 @@ __armv7_mmu_cache_flush:
 		mcr	p15, 0, r10, c7, c14, 0	@ clean+invalidate D
 		b	iflush
 hierarchical:
-		mcr	p15, 0, r10, c7, c10, 5	@ DMB
-		stmfd	sp!, {r0-r7, r9-r11}
-		mrc	p15, 1, r0, c0, c0, 1	@ read clidr
-		ands	r3, r0, #0x7000000	@ extract loc from clidr
-		mov	r3, r3, lsr #23		@ left align loc bit field
-		beq	finished		@ if loc is 0, then no need to clean
-		mov	r10, #0			@ start clean at cache level 0
-loop1:
-		add	r2, r10, r10, lsr #1	@ work out 3x current cache level
-		mov	r1, r0, lsr r2		@ extract cache type bits from clidr
-		and	r1, r1, #7		@ mask of the bits for current cache only
-		cmp	r1, #2			@ see what cache we have at this level
-		blt	skip			@ skip if no cache, or just i-cache
-		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
-		mcr	p15, 0, r10, c7, c5, 4	@ isb to sych the new cssr&csidr
-		mrc	p15, 1, r1, c0, c0, 0	@ read the new csidr
-		and	r2, r1, #7		@ extract the length of the cache lines
-		add	r2, r2, #4		@ add 4 (line length offset)
-		ldr	r4, =0x3ff
-		ands	r4, r4, r1, lsr #3	@ find maximum number on the way size
-		clz	r5, r4			@ find bit position of way size increment
-		ldr	r7, =0x7fff
-		ands	r7, r7, r1, lsr #13	@ extract max number of the index size
-loop2:
-		mov	r9, r4			@ create working copy of max way size
-loop3:
- ARM(		orr	r11, r10, r9, lsl r5	) @ factor way and cache number into r11
- ARM(		orr	r11, r11, r7, lsl r2	) @ factor index number into r11
- THUMB(		lsl	r6, r9, r5		)
- THUMB(		orr	r11, r10, r6		) @ factor way and cache number into r11
- THUMB(		lsl	r6, r7, r2		)
- THUMB(		orr	r11, r11, r6		) @ factor index number into r11
-		mcr	p15, 0, r11, c7, c14, 2	@ clean & invalidate by set/way
-		subs	r9, r9, #1		@ decrement the way
-		bge	loop3
-		subs	r7, r7, #1		@ decrement the index
-		bge	loop2
-skip:
-		add	r10, r10, #2		@ increment cache number
-		cmp	r3, r10
-		bgt	loop1
-finished:
-		ldmfd	sp!, {r0-r7, r9-r11}
-		mov	r10, #0			@ switch back to cache level 0
-		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
+		dcache_line_size r1, r2		@ r1 := dcache min line size
+		sub	r2, r1, #1		@ r2 := line size mask
+		bic	r0, r0, r2		@ round down start to line size
+		sub	r11, r11, #1		@ end address is exclusive
+		bic	r11, r11, r2		@ round down end to line size
+0:		cmp	r0, r11			@ finished?
+		bgt	iflush
+		mcr	p15, 0, r0, c7, c14, 1	@ Dcache clean/invalidate by VA
+		add	r0, r0, r1
+		b	0b
 iflush:
 		mcr	p15, 0, r10, c7, c10, 4	@ DSB
 		mcr	p15, 0, r10, c7, c5, 0	@ invalidate I+BTB
-- 
2.17.1



* Re: [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores
  2020-02-24 12:17 [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2020-02-24 12:17 ` [PATCH v3 5/5] ARM: decompressor: switch to by-VA cache maintenance for v7 cores Ard Biesheuvel
@ 2020-02-25 15:48 ` Linus Walleij
  2020-02-25 17:18   ` Ard Biesheuvel
  5 siblings, 1 reply; 10+ messages in thread
From: Linus Walleij @ 2020-02-25 15:48 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Linux ARM, Russell King, Marc Zyngier, Nicolas Pitre,
	Catalin Marinas, Tony Lindgren

On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> While making changes to the EFI stub startup code, I noticed that we are
> still doing set/way maintenance on the caches when booting on v7 cores.
> This works today on VMs by virtue of the fact that KVM traps set/way ops
> and cleans the whole address space by VA on behalf of the guest, and on
> most v7 hardware, the set/way ops are in fact sufficient when only one
> core is running, as there usually is no system cache. But on systems
> like SynQuacer, for which 32-bit firmware is available, the current cache
> maintenance only pushes the data out to the L3 system cache, where it
> is not visible to the CPU once it turns the MMU and caches off.
>
> So instead, switch to the by-VA cache maintenance that the architecture
> requires for v7 and later (and ARM1176, as a side effect).

I took this v3 patch set for a ride on some ARMv7 and ARMv6
(hardware) boards using zImages, so the compressed path
should be exercised:

- Ux500 (ARMv7 Cortex A9 x 2) works like a charm
- RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm

Tested-by: Linus Walleij <linus.walleij@linaro.org>

I can do more thorough tests with more boards if needed.

Yours,
Linus Walleij


* Re: [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores
  2020-02-25 15:48 ` [PATCH v3 0/5] ARM: decompressor: use " Linus Walleij
@ 2020-02-25 17:18   ` Ard Biesheuvel
  2020-02-25 17:30     ` Ard Biesheuvel
  2020-02-25 21:25     ` Linus Walleij
  0 siblings, 2 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-25 17:18 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-efi, Linux ARM, Russell King, Marc Zyngier, Nicolas Pitre,
	Catalin Marinas, Tony Lindgren

On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > While making changes to the EFI stub startup code, I noticed that we are
> > still doing set/way maintenance on the caches when booting on v7 cores.
> > This works today on VMs by virtue of the fact that KVM traps set/way ops
> > and cleans the whole address space by VA on behalf of the guest, and on
> > most v7 hardware, the set/way ops are in fact sufficient when only one
> > core is running, as there usually is no system cache. But on systems
> > like SynQuacer, for which 32-bit firmware is available, the current cache
> > maintenance only pushes the data out to the L3 system cache, where it
> > is not visible to the CPU once it turns the MMU and caches off.
> >
> > So instead, switch to the by-VA cache maintenance that the architecture
> > requires for v7 and later (and ARM1176, as a side effect).
>
> I took this v3 patch set for a ride on some ARMv7 and ARMv6
> (hardware) boards using zImage:s so the compressed path
> should be exercised:
>
> - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
>
> Tested-by: Linus Walleij <linus.walleij@linaro.org>
>
> I can do more thorough tests with more boards if needed.
>

Thanks Linus. Do you happen to have any boards that boot with appended DTB?


* Re: [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores
  2020-02-25 17:18   ` Ard Biesheuvel
@ 2020-02-25 17:30     ` Ard Biesheuvel
  2020-02-25 21:25     ` Linus Walleij
  1 sibling, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2020-02-25 17:30 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-efi, Linux ARM, Russell King, Marc Zyngier, Nicolas Pitre,
	Catalin Marinas, Tony Lindgren

On Tue, 25 Feb 2020 at 18:18, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Mon, Feb 24, 2020 at 1:17 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > While making changes to the EFI stub startup code, I noticed that we are
> > > still doing set/way maintenance on the caches when booting on v7 cores.
> > > This works today on VMs by virtue of the fact that KVM traps set/way ops
> > > and cleans the whole address space by VA on behalf of the guest, and on
> > > most v7 hardware, the set/way ops are in fact sufficient when only one
> > > core is running, as there usually is no system cache. But on systems
> > > like SynQuacer, for which 32-bit firmware is available, the current cache
> > > maintenance only pushes the data out to the L3 system cache, where it
> > > is not visible to the CPU once it turns the MMU and caches off.
> > >
> > > So instead, switch to the by-VA cache maintenance that the architecture
> > > requires for v7 and later (and ARM1176, as a side effect).
> >
> > I took this v3 patch set for a ride on some ARMv7 and ARMv6
> > (hardware) boards using zImage:s so the compressed path
> > should be exercised:
> >
> > - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> > - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
> >
> > Tested-by: Linus Walleij <linus.walleij@linaro.org>
> >
> > I can do more thorough tests with more boards if needed.
> >
>
> Thanks Linus. Do you happen to have any boards that boot with appended DTB?

Actually, I can easily test that myself as well in QEMU.


* Re: [PATCH v3 0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores
  2020-02-25 17:18   ` Ard Biesheuvel
  2020-02-25 17:30     ` Ard Biesheuvel
@ 2020-02-25 21:25     ` Linus Walleij
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Walleij @ 2020-02-25 21:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Linux ARM, Russell King, Marc Zyngier, Nicolas Pitre,
	Catalin Marinas, Tony Lindgren

On Tue, Feb 25, 2020 at 6:18 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 25 Feb 2020 at 16:48, Linus Walleij <linus.walleij@linaro.org> wrote:

> > I took this v3 patch set for a ride on some ARMv7 and ARMv6
> > (hardware) boards using zImage:s so the compressed path
> > should be exercised:
> >
> > - Ux500 (ARMv7 Cortex A9 x 2) works like a charm
> > - RealView PB11MPCore (ARM1176 x 4 MPCore) works like a charm
> >
> > Tested-by: Linus Walleij <linus.walleij@linaro.org>
> >
> > I can do more thorough tests with more boards if needed.
>
> Thanks Linus. Do you happen to have any boards that boot with appended DTB?

Oh, both of these use appended DTB so it's definitely working.

Yours,
Linus Walleij
