* [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection
2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 Ard Biesheuvel
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
kernel-team, Ard Biesheuvel
A write to CSSELR needs to complete before its results can be observed
via CCSIDR. So add an ISB to ensure that this is the case.
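For intuition, the hazard can be mimicked with a toy Python model (purely illustrative; the explicit buffering below is an invented stand-in for the architectural requirement, not a description of any real core): without a context-synchronizing ISB between the CSSELR write and the CCSIDR read, the read may still observe the geometry of the previously selected cache level.

```python
# Toy model: a system register write is buffered until a context
# synchronization event (ISB) makes its side effects visible.
class ToyCacheRegs:
    def __init__(self, geometry_by_level):
        self.geometry = geometry_by_level   # CSSELR value -> CCSIDR value
        self.csselr = 0                     # currently visible selection
        self.pending = None                 # buffered CSSELR write

    def write_csselr(self, level):
        self.pending = level                # not yet guaranteed visible

    def isb(self):
        if self.pending is not None:        # synchronize the pending write
            self.csselr = self.pending
            self.pending = None

    def read_ccsidr(self):
        return self.geometry[self.csselr]   # geometry of visible selection

regs = ToyCacheRegs({0: 0xAAAA, 2: 0xBBBB})  # 0: L1 data, 2: L2 (made-up)
regs.csselr = 2                              # L2 was selected previously

regs.write_csselr(0)
stale = regs.read_ccsidr()                   # no ISB: may still see L2

regs.write_csselr(0)
regs.isb()
fresh = regs.read_ccsidr()                   # with ISB: sees L1 geometry
```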
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm/mm/cache-v7.S | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index dc8f152f3556..307f381eee71 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -38,9 +38,10 @@ icache_size:
* procedures.
*/
ENTRY(v7_invalidate_l1)
- mov r0, #0
- mcr p15, 2, r0, c0, c0, 0
- mrc p15, 1, r0, c0, c0, 0
+ mov r0, #0
+ mcr p15, 2, r0, c0, c0, 0 @ select L1 data cache in CSSELR
+ isb
+ mrc p15, 1, r0, c0, c0, 0 @ read cache geometry from CCSIDR
movw r1, #0x7fff
and r2, r1, r0, lsr #13
--
2.20.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6
2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
3 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
kernel-team, Ard Biesheuvel
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and to keep the set/way identifier
component in a single register that is assigned in the outer loop. This
way, we need two fewer registers.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the smaller
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
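As a sketch, the new iteration order can be modelled in Python (illustrative only; the bit-field positions follow the comments in the patch): the outer loop steps over ways, re-reading the geometry word once per way, while the inner loop walks the sets.

```python
# Model of the restructured v7_invalidate_l1 loop: outer loop over ways,
# inner loop over sets, CCSIDR re-read once per outer pass. Field layout
# per the patch: Associativity in [12:3], NumSets in [27:13],
# log2(words per line) - 2 in [2:0].

def set_way_ops(ccsidr):
    """Return the DCISW operands in the order the rewritten loop emits them."""
    assoc = (ccsidr >> 3) & 0x3FF        # NumWays - 1
    way_shift = 32 - assoc.bit_length()  # clz(assoc)
    set_shift = (ccsidr & 0x7) + 4       # log2(line size in bytes)
    way = assoc << way_shift             # NumWays-1 in the top bits
    way_step = 1 << way_shift
    ops = []
    while way >= 0:                      # exit when the way field underflows
        num_sets = ((ccsidr >> 13) & 0x7FFF) + 1   # re-read per outer pass
        for s in range(num_sets - 1, -1, -1):
            ops.append(way | (s << set_shift))
        way -= way_step
    return ops

# Hypothetical 16 KiB, 4-way cache with 32-byte lines: 128 sets.
ccsidr = (127 << 13) | (3 << 3) | 1
ops = set_way_ops(ccsidr)
```

For this example geometry the model emits 512 distinct operands (4 ways x 128 sets) while reading the geometry word only once per way.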
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm/mm/cache-v7.S | 51 ++++++++++----------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 307f381eee71..4544af4855f6 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,9 +33,8 @@ icache_size:
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
@@ -43,32 +42,32 @@ ENTRY(v7_invalidate_l1)
isb
mrc p15, 1, r0, c0, c0, 0 @ read cache geometry from CCSIDR
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f @ exit when the way decrement borrows
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.20.1
* [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
2021-02-08 23:07 ` Nicolas Pitre
2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
3 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
kernel-team, Ard Biesheuvel
Now that we have reduced the number of registers that we need to
preserve when calling v7_invalidate_l1 from the boot code, we can use
scratch registers to preserve the remaining ones, and get rid of the
mini stack entirely. This works around any issues regarding cache
behavior in relation to the uncached accesses to this memory, which is
hard to get right in the general case (i.e., both on bare metal and
under virtualization).
While at it, switch v7_invalidate_l1 to using ip as a scratch register
instead of r4. This makes the function AAPCS compliant, and removes the
need to stash r4 in ip across the call.
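The stash/restore dance of the new do_invalidate_l1 macro can be sketched with a toy register-file model (Python, illustrative only; the callee below stands in for v7_invalidate_l1):

```python
# Model of the do_invalidate_l1 macro: park the live values of r1 (machine
# type), r2 (boot data pointer) and lr in the spare registers r6/r7/r10
# around a call that may corrupt r0-r3, ip and lr.

def do_invalidate_l1(regs, callee):
    regs["r6"], regs["r7"], regs["r10"] = regs["r1"], regs["r2"], regs["lr"]
    callee(regs)                          # corrupts {r0-r3, ip, lr}
    regs["r1"], regs["r2"], regs["lr"] = regs["r6"], regs["r7"], regs["r10"]

def fake_invalidate(regs):
    # An AAPCS-conforming routine is free to clobber exactly these:
    for r in ("r0", "r1", "r2", "r3", "ip", "lr"):
        regs[r] = 0xDEADBEEF

regs = dict.fromkeys(("r0", "r3", "ip", "r6", "r7", "r10"), 0)
regs.update({"r1": 0xC42, "r2": 0x8000, "lr": 0x80008000})
do_invalidate_l1(regs, fake_invalidate)
```

After the call the model's r1, r2 and lr hold their original values, matching the "r1, r2 ... will be preserved" contract quoted later in the thread.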
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm/include/asm/memory.h | 15 --------
arch/arm/mm/cache-v7.S | 10 ++---
arch/arm/mm/proc-v7.S | 40 +++++++++-----------
3 files changed, 23 insertions(+), 42 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 2f841cb65c30..a711322d9f40 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -150,21 +150,6 @@ extern unsigned long vectors_base;
*/
#define PLAT_PHYS_OFFSET UL(CONFIG_PHYS_OFFSET)
-#ifdef CONFIG_XIP_KERNEL
-/*
- * When referencing data in RAM from the XIP region in a relative manner
- * with the MMU off, we need the relative offset between the two physical
- * addresses. The macro below achieves this, which is:
- * __pa(v_data) - __xip_pa(v_text)
- */
-#define PHYS_RELATIVE(v_data, v_text) \
- (((v_data) - PAGE_OFFSET + PLAT_PHYS_OFFSET) - \
- ((v_text) - XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR) + \
- CONFIG_XIP_PHYS_ADDR))
-#else
-#define PHYS_RELATIVE(v_data, v_text) ((v_data) - (v_text))
-#endif
-
#ifndef __ASSEMBLY__
/*
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 4544af4855f6..243e82124663 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -53,12 +53,12 @@ ENTRY(v7_invalidate_l1)
and r2, r0, #0x7
add r2, r2, #4 @ SetShift
-1: movw r4, #0x7fff
- and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
+1: movw ip, #0x7fff
+ and r0, ip, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
-2: mov r4, r0, lsl r2 @ NumSet << SetShift
- orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r4, c7, c6, 2
+2: mov ip, r0, lsl r2 @ NumSet << SetShift
+ orr ip, ip, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, ip, c7, c6, 2
subs r0, r0, #1 @ Set--
bpl 2b
subs r3, r3, r1 @ Way--
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 28c9d32fa99a..cab978c25b68 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -256,6 +256,20 @@ ENDPROC(cpu_pj4b_do_resume)
#endif
+ @
+ @ Invoke the v7_invalidate_l1() function, which adheres to the AAPCS
+ @ rules, and so it may corrupt registers that we need to preserve.
+ @
+ .macro do_invalidate_l1
+ mov r6, r1
+ mov r7, r2
+ mov r10, lr
+ bl v7_invalidate_l1 @ corrupts {r0-r3, ip, lr}
+ mov r1, r6
+ mov r2, r7
+ mov lr, r10
+ .endm
+
/*
* __v7_setup
*
@@ -277,6 +291,7 @@ __v7_ca5mp_setup:
__v7_ca9mp_setup:
__v7_cr7mp_setup:
__v7_cr8mp_setup:
+ do_invalidate_l1
mov r10, #(1 << 0) @ Cache/TLB ops broadcasting
b 1f
__v7_ca7mp_setup:
@@ -284,13 +299,9 @@ __v7_ca12mp_setup:
__v7_ca15mp_setup:
__v7_b15mp_setup:
__v7_ca17mp_setup:
+ do_invalidate_l1
mov r10, #0
-1: adr r0, __v7_setup_stack_ptr
- ldr r12, [r0]
- add r12, r12, r0 @ the local stack
- stmia r12, {r1-r6, lr} @ v7_invalidate_l1 touches r0-r6
- bl v7_invalidate_l1
- ldmia r12, {r1-r6, lr}
+1:
#ifdef CONFIG_SMP
orr r10, r10, #(1 << 6) @ Enable SMP/nAMP mode
ALT_SMP(mrc p15, 0, r0, c1, c0, 1)
@@ -471,12 +482,7 @@ __v7_pj4b_setup:
#endif /* CONFIG_CPU_PJ4B */
__v7_setup:
- adr r0, __v7_setup_stack_ptr
- ldr r12, [r0]
- add r12, r12, r0 @ the local stack
- stmia r12, {r1-r6, lr} @ v7_invalidate_l1 touches r0-r6
- bl v7_invalidate_l1
- ldmia r12, {r1-r6, lr}
+ do_invalidate_l1
__v7_setup_cont:
and r0, r9, #0xff000000 @ ARM?
@@ -549,16 +555,6 @@ __errata_finish:
THUMB( orr r0, r0, #1 << 30 ) @ Thumb exceptions
ret lr @ return to head.S:__ret
- .align 2
-__v7_setup_stack_ptr:
- .word PHYS_RELATIVE(__v7_setup_stack, .)
-ENDPROC(__v7_setup)
-
- .bss
- .align 2
-__v7_setup_stack:
- .space 4 * 7 @ 7 registers
-
__INITDATA
.weak cpu_v7_bugs_init
--
2.20.1
* Re: [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
@ 2021-02-08 23:07 ` Nicolas Pitre
2021-02-09 22:37 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-08 23:07 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Linus Walleij, kernel-team, Russell King, linux-arm-kernel
On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> @@ -549,16 +555,6 @@ __errata_finish:
> THUMB( orr r0, r0, #1 << 30 ) @ Thumb exceptions
> ret lr @ return to head.S:__ret
>
> - .align 2
> -__v7_setup_stack_ptr:
> - .word PHYS_RELATIVE(__v7_setup_stack, .)
> -ENDPROC(__v7_setup)
> -
> - .bss
> - .align 2
> -__v7_setup_stack:
> - .space 4 * 7 @ 7 registers
> -
> __INITDATA
You should have preserved the ENDPROC here.
Nicolas
* Re: [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
2021-02-08 23:07 ` Nicolas Pitre
@ 2021-02-09 22:37 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 22:37 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Tue, 9 Feb 2021 at 00:08, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
>
> > @@ -549,16 +555,6 @@ __errata_finish:
> > THUMB( orr r0, r0, #1 << 30 ) @ Thumb exceptions
> > ret lr @ return to head.S:__ret
> >
> > - .align 2
> > -__v7_setup_stack_ptr:
> > - .word PHYS_RELATIVE(__v7_setup_stack, .)
> > -ENDPROC(__v7_setup)
> > -
> > - .bss
> > - .align 2
> > -__v7_setup_stack:
> > - .space 4 * 7 @ 7 registers
> > -
> > __INITDATA
>
> You should have preserved the ENDPROC here.
>
Indeed. Will fix that in v2.
Thanks,
Ard.
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
` (2 preceding siblings ...)
2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
@ 2021-02-08 23:11 ` Nicolas Pitre
2021-02-09 22:37 ` Ard Biesheuvel
3 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-08 23:11 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Linus Walleij, kernel-team, Russell King, linux-arm-kernel
On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> The v7 boot code uses a small chunk of BSS to preserve some register
> contents across a call to v7_invalidate_l1 that occurs with the MMU and
> caches disabled. Memory accesses in such cases are tricky on v7+, given
> that the architecture permits some unintuitive behaviors (it is
> implementation defined whether accesses done with the MMU and caches off
> may hit in the caches, and on SoCs that incorporate off-core system
> caches, this behavior appears to be different even between cache
> levels). Also, cache invalidation is not safe under virtualization if
> the intent is to retain stores issued directly to DRAM, given that the
> hypervisor may upgrade invalidate operations to clean+invalidate,
> resulting in DRAM contents being overwritten by the dirty cachelines that
> we were trying to evict in the first place.
>
> So let's address this issue, by removing the need for this stack to
> exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> registers, which means fewer registers need to be preserved, and we have
> enough spare registers available.
That is excellent.
I wonder why r1-r3 were preserved though.
Nicolas
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
@ 2021-02-09 22:37 ` Ard Biesheuvel
2021-02-09 22:59 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 22:37 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
>
> > [...]
>
> That is excellent.
>
> I wonder why r1-r3 were preserved though.
>
r1 and r2 are documented in head.S as
* The processor init function will be called with:
* r1 - machine type
* r2 - boot data (atags/dt) pointer
but preserving the value of r3 does not seem necessary. Perhaps this
is a leftover from old code?
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-09 22:37 ` Ard Biesheuvel
@ 2021-02-09 22:59 ` Nicolas Pitre
2021-02-09 23:02 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-09 22:59 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
> On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> >
> > > [...]
> >
> > That is excellent.
> >
> > I wonder why r1-r3 were preserved though.
> >
>
> r1 and r2 are documented in head.S as
>
> * The processor init function will be called with:
> * r1 - machine type
> * r2 - boot data (atags/dt) pointer
>
> but preserving the value of r3 does not seem necessary. Perhaps this
> is a leftover from old code?
Still, further down in the same comment it is said:
* On return, the CPU will be ready for the MMU to be turned on,
* r0 will hold the CPU control register value, r1, r2, r4, and
* r9 will be preserved. r5 will also be preserved if LPAE.
But you're now clobbering r1 and r2. And when this returns, code
execution goes to __enable_mmu whose comment also mentions the expected
r1 and r2 content. That would need adjusting too.
Nicolas
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-09 22:59 ` Nicolas Pitre
@ 2021-02-09 23:02 ` Ard Biesheuvel
2021-02-09 23:22 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 23:02 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
>
> > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > >
> > > > [...]
> > >
> > > That is excellent.
> > >
> > > I wonder why r1-r3 were preserved though.
> > >
> >
> > r1 and r2 are documented in head.S as
> >
> > * The processor init function will be called with:
> > * r1 - machine type
> > * r2 - boot data (atags/dt) pointer
> >
> > but preserving the value of r3 does not seem necessary. Perhaps this
> > is a leftover from old code?
>
> Still, further down in the same comment it is said:
>
> * On return, the CPU will be ready for the MMU to be turned on,
> * r0 will hold the CPU control register value, r1, r2, r4, and
> * r9 will be preserved. r5 will also be preserved if LPAE.
>
> But you're now clobbering r1 and r2. And when this returns, code
> execution goes to __enable_mmu whose comment also mentions the expected
> r1 and r2 content. That would need adjusting too.
>
No, r1 and r2 are preserved - they are stashed in r6/r7 across the
call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
those do not contain anything that needs to be preserved.
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-09 23:02 ` Ard Biesheuvel
@ 2021-02-09 23:22 ` Nicolas Pitre
2021-02-10 17:39 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-09 23:22 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Wed, 10 Feb 2021, Ard Biesheuvel wrote:
> On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
> >
> > > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > > >
> > > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > > >
> > > > > [...]
> > > >
> > > > That is excellent.
> > > >
> > > > I wonder why r1-r3 were preserved though.
> > > >
> > >
> > > r1 and r2 are documented in head.S as
> > >
> > > * The processor init function will be called with:
> > > * r1 - machine type
> > > * r2 - boot data (atags/dt) pointer
> > >
> > > but preserving the value of r3 does not seem necessary. Perhaps this
> > > is a leftover from old code?
> >
> > Still, further down in the same comment it is said:
> >
> > * On return, the CPU will be ready for the MMU to be turned on,
> > * r0 will hold the CPU control register value, r1, r2, r4, and
> > * r9 will be preserved. r5 will also be preserved if LPAE.
> >
> > But you're now clobbering r1 and r2. And when this returns, code
> > execution goes to __enable_mmu whose comment also mentions the expected
> > r1 and r2 content. That would need adjusting too.
> >
>
> No, r1 and r2 are preserved - they are stashed in r6/r7 across the
> call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
> those do not contain anything that needs to be preserved.
OK, you're right. I didn't look at the whole code.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Nicolas
* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
2021-02-09 23:22 ` Nicolas Pitre
@ 2021-02-10 17:39 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-10 17:39 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
Linux ARM
On Wed, 10 Feb 2021 at 00:22, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Wed, 10 Feb 2021, Ard Biesheuvel wrote:
>
> > On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
> > >
> > > > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > > > >
> > > > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > > > >
> > > > > > [...]
> > > > >
> > > > > That is excellent.
> > > > >
> > > > > I wonder why r1-r3 were preserved though.
> > > > >
> > > >
> > > > r1 and r2 are documented in head.S as
> > > >
> > > > * The processor init function will be called with:
> > > > * r1 - machine type
> > > > * r2 - boot data (atags/dt) pointer
> > > >
> > > > but preserving the value of r3 does not seem necessary. Perhaps this
> > > > is a leftover from old code?
> > >
> > > Still, further down in the same comment it is said:
> > >
> > > * On return, the CPU will be ready for the MMU to be turned on,
> > > * r0 will hold the CPU control register value, r1, r2, r4, and
> > > * r9 will be preserved. r5 will also be preserved if LPAE.
> > >
> > > But you're now clobbering r1 and r2. And when this returns, code
> > > execution goes to __enable_mmu whose comment also mentions the expected
> > > r1 and r2 content. That would need adjusting too.
> > >
> >
> > No, r1 and r2 are preserved - they are stashed in r6/r7 across the
> > call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
> > those do not contain anything that needs to be preserved.
>
> OK, you're right. I didn't look at the whole code.
>
> Acked-by: Nicolas Pitre <nico@fluxnic.net>
>
Thanks Nico.