* [PATCH 0/2] ARM: decompressor: use by-VA cache maintenance for v7 cores
@ 2020-02-18 16:44 Ard Biesheuvel
2020-02-18 16:44 ` [PATCH 1/2] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance Ard Biesheuvel
2020-02-18 16:44 ` [PATCH 2/2] ARM: decompressor: switch to by-VA cache maintenance for v7 cores Ard Biesheuvel
0 siblings, 2 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2020-02-18 16:44 UTC (permalink / raw)
To: linux-efi
Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
Nicolas Pitre, Catalin Marinas
While making changes to the EFI stub startup code, I noticed that we are
still doing set/way maintenance on the caches when booting on v7 cores.
This works today on VMs because KVM traps set/way ops and cleans the
whole address space by VA on behalf of the guest. On most v7 hardware,
the set/way ops are in fact sufficient when only one core is running,
as there usually is no system cache.
But let's make this code a bit more future-proof by switching to by-VA
ops for the v7 code paths (and for ARM1176, as a side effect).
Note that these patches are based on an EFI stub fix that I have omitted
here, and which can be found at
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm32-efi-cache-ops&id=01d742dcf0a3dce6f6db9e4661750129bc3d3216
Cc: Russell King <linux@armlinux.org.uk>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel (2):
ARM: decompressor: prepare cache_clean_flush for doing by-VA
maintenance
ARM: decompressor: switch to by-VA cache maintenance for v7 cores
arch/arm/boot/compressed/head.S | 105 ++++++++++----------
1 file changed, 54 insertions(+), 51 deletions(-)
--
2.17.1
* [PATCH 1/2] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
From: Ard Biesheuvel @ 2020-02-18 16:44 UTC (permalink / raw)
To: linux-efi
Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
Nicolas Pitre, Catalin Marinas
In preparation for turning the decompressor's cache clean/flush
operations into proper by-VA maintenance for v7 cores, pass the
start and end addresses of the regions that need cache maintenance
into cache_clean_flush in registers r0 and r1.
Currently, all implementations of cache_clean_flush ignore these
values, so no functional change is expected as a result of this
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm/boot/compressed/head.S | 28 +++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 7b86a2e1acce..935799b92198 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -525,12 +525,15 @@ dtb_check_done:
/* cache_clean_flush may use the stack, so relocate it */
add sp, sp, r6
#endif
-
+ mov r0, r9
+ ldr r1, 0f
+ add r1, r1, r0
bl cache_clean_flush
badr r0, restart
add r0, r0, r6
mov pc, r0
+0: .long _edata - restart
wont_overwrite:
/*
@@ -622,6 +625,21 @@ not_relocated: mov r0, #0
add r2, sp, #0x10000 @ 64k max
mov r3, r7
bl decompress_kernel
+
+ mov r0, r4 @ base of inflated image
+ adr r1, LC0 @ actual LC0
+ ldr r2, [r1] @ linktime LC0
+ sub r2, r1, r2 @ LC0 delta
+ ldr r1, [r1, #16] @ link time inflated size offset
+ ldr r1, [r1, r2] @ actual inflated size (LE)
+#ifdef __ARMEB__
+ /* convert to big endian */
+ eor r2, r1, r1, ror #16
+ bic r2, r2, #0x00ff0000
+ mov r1, r1, ror #8
+ eor r1, r1, r2, lsr #8
+#endif
+ add r1, r1, r0 @ end of inflated image
bl cache_clean_flush
bl cache_off
@@ -1439,6 +1457,7 @@ reloc_code_end:
#ifdef CONFIG_EFI_STUB
.align 2
_start: .long start - .
+__edata: .long _edata - .
ENTRY(efi_stub_entry)
@ allocate space on stack for passing current zImage address
@@ -1470,8 +1489,11 @@ ENTRY(efi_stub_entry)
.align 2
0: .long start - (. + 4)
- @ Preserve return value of efi_entry() in r4
- mov r4, r0
+ mov r4, r0 @ preserve DTB pointer
+ mov r0, r1 @ start of image
+ adr r2, __edata
+ ldr r1, [r2]
+ add r1, r1, r2 @ end of image
bl cache_clean_flush
bl cache_off
--
2.17.1
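
For readers less used to ARM assembly, the `__ARMEB__` block in the second hunk is a 32-bit byte reversal: the inflated size is read as a little-endian word (per the patch's own comment) and has to be reversed on a big-endian build. The following is a minimal C model of those four instructions, offered as a sketch rather than as code from the patch; `ror32`, `swab32_arm_idiom` and `swab32_ref` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, 0 < n < 32 (models the ARM "ror" operand shift). */
uint32_t ror32(uint32_t x, unsigned int n)
{
	assert(n > 0 && n < 32);
	return (x >> n) | (x << (32 - n));
}

/* Step-by-step model of the eor/bic/mov/eor sequence under __ARMEB__. */
uint32_t swab32_arm_idiom(uint32_t r1)
{
	uint32_t r2;

	r2 = r1 ^ ror32(r1, 16);	/* eor r2, r1, r1, ror #16 */
	r2 &= ~UINT32_C(0x00ff0000);	/* bic r2, r2, #0x00ff0000 */
	r1 = ror32(r1, 8);		/* mov r1, r1, ror #8      */
	r1 ^= r2 >> 8;			/* eor r1, r1, r2, lsr #8  */
	return r1;
}

/* Plain shift-and-mask byte reversal, used only as a cross-check. */
uint32_t swab32_ref(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00) |
	       ((x << 8) & 0x00ff0000) | (x << 24);
}
```

In this model, `swab32_arm_idiom(0x11223344)` yields `0x44332211`, which matches the shift-and-mask reference.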
* [PATCH 2/2] ARM: decompressor: switch to by-VA cache maintenance for v7 cores
From: Ard Biesheuvel @ 2020-02-18 16:44 UTC (permalink / raw)
To: linux-efi
Cc: linux-arm-kernel, Ard Biesheuvel, Russell King, Marc Zyngier,
Nicolas Pitre, Catalin Marinas
Update the v7 cache_clean_flush routine to take into account the
memory range passed in r0/r1, and perform cache maintenance by
virtual address on this range instead of set/way maintenance, which
is inappropriate for the purpose of maintaining the cached state of
memory contents.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm/boot/compressed/head.S | 77 ++++++++------------
1 file changed, 29 insertions(+), 48 deletions(-)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 935799b92198..df93c9f0a19a 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -685,6 +685,24 @@ params: ldr r0, =0x10000100 @ params_phys for RPC
.align
#endif
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register
+ * on ARMv7.
+ */
+ .macro dcache_line_size, reg, tmp
+#ifdef CONFIG_CPU_V7M
+ movw \tmp, #:lower16:BASEADDR_V7M_SCB + V7M_SCB_CTR
+ movt \tmp, #:upper16:BASEADDR_V7M_SCB + V7M_SCB_CTR
+ ldr \tmp, [\tmp]
+#else
+ mrc p15, 0, \tmp, c0, c0, 1 @ read ctr
+#endif
+ lsr \tmp, \tmp, #16
+ and \tmp, \tmp, #0xf @ cache line size encoding
+ mov \reg, #4 @ bytes per word
+ mov \reg, \reg, lsl \tmp @ actual cache line size
+ .endm
+
/*
* Turn on the cache. We need to setup some page tables so that we
* can have both the I and D caches on.
@@ -1177,8 +1195,6 @@ __armv7_mmu_cache_off:
bic r0, r0, #0x000c
#endif
mcr p15, 0, r0, c1, c0 @ turn MMU and cache off
- mov r12, lr
- bl __armv7_mmu_cache_flush
mov r0, #0
#ifdef CONFIG_MMU
mcr p15, 0, r0, c8, c7, 0 @ invalidate whole TLB
@@ -1186,7 +1202,7 @@ __armv7_mmu_cache_off:
mcr p15, 0, r0, c7, c5, 6 @ invalidate BTC
mcr p15, 0, r0, c7, c10, 4 @ DSB
mcr p15, 0, r0, c7, c5, 4 @ ISB
- mov pc, r12
+ mov pc, lr
/*
* Clean and flush the cache to maintain consistency.
@@ -1199,6 +1215,7 @@ __armv7_mmu_cache_off:
.align 5
cache_clean_flush:
mov r3, #16
+ mov r11, r1
b call_cache_fn
__armv4_mpu_cache_flush:
@@ -1249,51 +1266,15 @@ __armv7_mmu_cache_flush:
mcr p15, 0, r10, c7, c14, 0 @ clean+invalidate D
b iflush
hierarchical:
- mcr p15, 0, r10, c7, c10, 5 @ DMB
- stmfd sp!, {r0-r7, r9-r11}
- mrc p15, 1, r0, c0, c0, 1 @ read clidr
- ands r3, r0, #0x7000000 @ extract loc from clidr
- mov r3, r3, lsr #23 @ left align loc bit field
- beq finished @ if loc is 0, then no need to clean
- mov r10, #0 @ start clean at cache level 0
-loop1:
- add r2, r10, r10, lsr #1 @ work out 3x current cache level
- mov r1, r0, lsr r2 @ extract cache type bits from clidr
- and r1, r1, #7 @ mask of the bits for current cache only
- cmp r1, #2 @ see what cache we have at this level
- blt skip @ skip if no cache, or just i-cache
- mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr
- mcr p15, 0, r10, c7, c5, 4 @ isb to sych the new cssr&csidr
- mrc p15, 1, r1, c0, c0, 0 @ read the new csidr
- and r2, r1, #7 @ extract the length of the cache lines
- add r2, r2, #4 @ add 4 (line length offset)
- ldr r4, =0x3ff
- ands r4, r4, r1, lsr #3 @ find maximum number on the way size
- clz r5, r4 @ find bit position of way size increment
- ldr r7, =0x7fff
- ands r7, r7, r1, lsr #13 @ extract max number of the index size
-loop2:
- mov r9, r4 @ create working copy of max way size
-loop3:
- ARM( orr r11, r10, r9, lsl r5 ) @ factor way and cache number into r11
- ARM( orr r11, r11, r7, lsl r2 ) @ factor index number into r11
- THUMB( lsl r6, r9, r5 )
- THUMB( orr r11, r10, r6 ) @ factor way and cache number into r11
- THUMB( lsl r6, r7, r2 )
- THUMB( orr r11, r11, r6 ) @ factor index number into r11
- mcr p15, 0, r11, c7, c14, 2 @ clean & invalidate by set/way
- subs r9, r9, #1 @ decrement the way
- bge loop3
- subs r7, r7, #1 @ decrement the index
- bge loop2
-skip:
- add r10, r10, #2 @ increment cache number
- cmp r3, r10
- bgt loop1
-finished:
- ldmfd sp!, {r0-r7, r9-r11}
- mov r10, #0 @ switch back to cache level 0
- mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr
+ dcache_line_size r1, r2 @ r1 := dcache min line size
+ sub r2, r1, #1 @ r2 := line size mask
+ bic r0, r0, r2 @ round down start to line size
+ bic r11, r11, r2 @ round down end to line size
+0: cmp r0, r11 @ finished?
+ bgt iflush
+ mcr p15, 0, r0, c7, c10, 1 @ clean line at r0 from Dcache
+ add r0, r0, r1
+ b 0b
iflush:
mcr p15, 0, r10, c7, c10, 4 @ DSB
mcr p15, 0, r10, c7, c5, 0 @ invalidate I+BTB
--
2.17.1
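
As a rough illustration of what the new `dcache_line_size` macro and the loop at `hierarchical:` compute, here is a small C model. It is a sketch under stated assumptions, not the kernel code: `dcache_min_line_size` and `model_clean_range` are hypothetical names, and the model records the line addresses it would clean instead of issuing the `mcr p15, 0, r0, c7, c10, 1` instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR bits [19:16] (DminLine) hold log2 of the smallest D-cache line, in words. */
uint32_t dcache_min_line_size(uint32_t ctr)
{
	uint32_t dminline = (ctr >> 16) & 0xf;

	return UINT32_C(4) << dminline;	/* 4 bytes per word */
}

/*
 * Model of the by-VA loop: round both addresses down to a line
 * boundary, then visit every line from start up to and including
 * the rounded end. Each visited address is stored in out[] (up to
 * cap entries); the return value is the number of lines visited.
 */
size_t model_clean_range(uint32_t start, uint32_t end, uint32_t line,
			 uint32_t *out, size_t cap)
{
	uint32_t mask;
	size_t n = 0;

	assert(line != 0 && (line & (line - 1)) == 0);	/* power of two */
	mask = line - 1;
	start &= ~mask;		/* bic r0, r0, r2   */
	end &= ~mask;		/* bic r11, r11, r2 */

	/*
	 * A 64-bit cursor keeps the model from wrapping at 4 GiB.
	 * The assembly compares with bgt, i.e. a signed comparison.
	 */
	for (uint64_t va = start; va <= end; va += line) {
		if (out && n < cap)
			out[n] = (uint32_t)va;
		n++;
	}
	return n;
}
```

With a 32-byte line, a call covering 0x1004 to 0x1050 visits the lines at 0x1000, 0x1020 and 0x1040 in this model.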
* Re: [PATCH 1/2] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
From: Russell King - ARM Linux admin @ 2020-02-18 16:51 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi, linux-arm-kernel, Marc Zyngier, Nicolas Pitre,
Catalin Marinas
On Tue, Feb 18, 2020 at 05:44:29PM +0100, Ard Biesheuvel wrote:
> In preparation for turning the decompressor's cache clean/flush
> operations into proper by-VA maintenance for v7 cores, pass the
> start and end addresses of the regions that need cache maintenance
> into cache_clean_flush in registers r0 and r1.
Where's the documentation of the new calling convention? This is
assembly code, it needs such things documented as there's no
function prototypes to give that information.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
* Re: [PATCH 1/2] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
From: Ard Biesheuvel @ 2020-02-18 16:56 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: linux-efi, linux-arm-kernel, Marc Zyngier, Nicolas Pitre,
Catalin Marinas
On Tue, 18 Feb 2020 at 17:52, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Feb 18, 2020 at 05:44:29PM +0100, Ard Biesheuvel wrote:
> > In preparation for turning the decompressor's cache clean/flush
> > operations into proper by-VA maintenance for v7 cores, pass the
> > start and end addresses of the regions that need cache maintenance
> > into cache_clean_flush in registers r0 and r1.
>
> Where's the documentation of the new calling convention? This is
> assembly code, it needs such things documented as there's no
> function prototypes to give that information.
>
Would something like
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index df93c9f0a19a..e4c779a89db1 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -1207,6 +1207,9 @@ __armv7_mmu_cache_off:
/*
* Clean and flush the cache to maintain consistency.
*
+ * On entry,
+ * r0 = start address
+ * r1 = end address (exclusive)
* On exit,
* r1, r2, r3, r9, r10, r11, r12 corrupted
* This routine must preserve:
work for you?
* Re: [PATCH 1/2] ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
From: Russell King - ARM Linux admin @ 2020-02-18 17:08 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi, linux-arm-kernel, Marc Zyngier, Nicolas Pitre,
Catalin Marinas
On Tue, Feb 18, 2020 at 05:56:52PM +0100, Ard Biesheuvel wrote:
> On Tue, 18 Feb 2020 at 17:52, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Feb 18, 2020 at 05:44:29PM +0100, Ard Biesheuvel wrote:
> > > In preparation for turning the decompressor's cache clean/flush
> > > operations into proper by-VA maintenance for v7 cores, pass the
> > > start and end addresses of the regions that need cache maintenance
> > > into cache_clean_flush in registers r0 and r1.
> >
> > Where's the documentation of the new calling convention? This is
> > assembly code, it needs such things documented as there's no
> > function prototypes to give that information.
> >
>
> Would something like
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index df93c9f0a19a..e4c779a89db1 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -1207,6 +1207,9 @@ __armv7_mmu_cache_off:
> /*
> * Clean and flush the cache to maintain consistency.
> *
> + * On entry,
> + * r0 = start address
> + * r1 = end address (exclusive)
> * On exit,
> * r1, r2, r3, r9, r10, r11, r12 corrupted
> * This routine must preserve:
>
> work for you?
Definitely what is required, thanks.
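
The convention agreed on here treats r1 as an exclusive end address. As a hedged illustration of what that means once addresses are rounded to cache lines, the C sketch below compares two ways of finding the last line of a range. It is not taken from the series, and `last_line_round_down` and `last_line_exclusive` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Last line visited if the end address is simply rounded down. */
uint32_t last_line_round_down(uint32_t end, uint32_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);	/* power of two */
	return end & ~(line - 1);
}

/* Last line holding a byte of [start, end) when end is exclusive. */
uint32_t last_line_exclusive(uint32_t end, uint32_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);	/* power of two */
	assert(end != 0);	/* a range ending at 0 has no last byte */
	return (end - 1) & ~(line - 1);
}
```

The two results agree unless the end address is itself line-aligned; in that case plain rounding down names one line past the range. Cleaning one extra line should be harmless, so this is an observation about the convention rather than a correctness concern.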