* [PATCH 0/3] ARM: v7: get rid of boot time mini stack
@ 2021-02-08 22:49 Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
	kernel-team, Ard Biesheuvel

The v7 boot code uses a small chunk of BSS to preserve some register
contents across a call to v7_invalidate_l1 that occurs with the MMU and
caches disabled. Memory accesses in such cases are tricky on v7+, given
that the architecture permits some unintuitive behaviors (it is
implementation defined whether accesses done with the MMU and caches off
may hit in the caches, and on SoCs that incorporate off-core system
caches, this behavior appears to be different even between cache
levels). Also, cache invalidation is not safe under virtualization if
the intent is to retain stores issued directly to DRAM, given that the
hypervisor may upgrade invalidate operations to clean+invalidate,
resulting in the DRAM contents being overwritten by the dirty
cachelines that we were trying to evict in the first place.

So let's address this issue by removing the need for this stack to
exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
registers, which means fewer registers need to be preserved, and we have
enough spare registers available.
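
For reference, the work that v7_invalidate_l1 performs corresponds
roughly to the C sketch below; read_ccsidr() and dcisw() are
hypothetical helpers standing in for the MRC/MCR instructions, shown
only to illustrate how few values the loop actually needs:

	static void v7_invalidate_l1_sketch(void)
	{
		unsigned int ccsidr = read_ccsidr();	/* L1 D-cache geometry */
		unsigned int sets = ((ccsidr >> 13) & 0x7fff) + 1;
		unsigned int ways = ((ccsidr >> 3) & 0x3ff) + 1;
		unsigned int set_shift = (ccsidr & 0x7) + 4;	/* log2(bytes per line) */
		unsigned int way_shift = ways > 1 ? __builtin_clz(ways - 1) : 0;
		unsigned int set, way;

		for (way = 0; way < ways; way++)
			for (set = 0; set < sets; set++)
				dcisw((way << way_shift) | (set << set_shift));
	}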

Patch #1 adds a missing ISB. This patch is included separately so it can
be backported if desired.

Patch #2 rewrites v7_invalidate_l1 so it uses only 5 registers (not
counting lr, which it must preserve as well).

Patch #3 updates the callers to use spare registers instead of the mini
stack to stash the values that need to be preserved across the calls to
v7_invalidate_l1.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Nicolas Pitre <nico@fluxnic.net>

Ard Biesheuvel (3):
  ARM: cache-v7: add missing ISB after cache level selection
  ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6
  ARM: cache-v7: get rid of mini-stack

 arch/arm/include/asm/memory.h | 15 -----
 arch/arm/mm/cache-v7.S        | 58 ++++++++++----------
 arch/arm/mm/proc-v7.S         | 40 ++++++--------
 3 files changed, 47 insertions(+), 66 deletions(-)

-- 
2.20.1



* [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection
  2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 Ard Biesheuvel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
	kernel-team, Ard Biesheuvel

A write to CSSELR needs to complete before its results can be observed
via CCSIDR. So add an ISB to ensure that this is the case.
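
As a sketch, the required sequence - select the cache with CSSELR,
synchronize, then read the geometry from CCSIDR - would look like this
in C with inline assembly (illustration only, not the actual kernel
code):

	static inline unsigned int read_l1_ccsidr(void)
	{
		unsigned int val = 0;

		/* CSSELR := 0 selects the L1 data/unified cache */
		asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r" (val));
		/* without the ISB, the MRC below may observe stale data */
		asm volatile("isb");
		asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r" (val));
		return val;
	}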

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/mm/cache-v7.S | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index dc8f152f3556..307f381eee71 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -38,9 +38,10 @@ icache_size:
  * procedures.
  */
 ENTRY(v7_invalidate_l1)
-       mov     r0, #0
-       mcr     p15, 2, r0, c0, c0, 0
-       mrc     p15, 1, r0, c0, c0, 0
+	mov	r0, #0
+	mcr	p15, 2, r0, c0, c0, 0	@ select L1 data cache in CSSELR
+	isb
+	mrc	p15, 1, r0, c0, c0, 0	@ read cache geometry from CCSIDR
 
        movw    r1, #0x7fff
        and     r2, r1, r0, lsr #13
-- 
2.20.1



* [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6
  2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
  2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
  3 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
	kernel-team, Ard Biesheuvel

The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the cache geometry from CCSIDR in the outer loop, and to keep
the way identifier component in a single register that is assigned in
the outer loop. This way, we need two fewer registers.

Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the
smaller number of iterations, ensuring that the re-read of CCSIDR
occurs only a handful of times in practice.
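
In C terms, the change swaps the loop nesting (with dcisw() standing in
for the set/way MCR, and the geometry values decoded from CCSIDR as
before; sketch only):

	/* before: sets outermost, ways innermost */
	for (set = num_sets - 1; set >= 0; set--)
		for (way = num_ways - 1; way >= 0; way--)
			dcisw((way << way_shift) | (set << set_shift));

	/* after: ways outermost, so CCSIDR is re-read only num_ways times */
	for (way = num_ways - 1; way >= 0; way--)
		for (set = num_sets - 1; set >= 0; set--)
			dcisw((way << way_shift) | (set << set_shift));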

Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/mm/cache-v7.S | 51 ++++++++++----------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 307f381eee71..4544af4855f6 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,9 +33,8 @@ icache_size:
  * processor.  We fix this by performing an invalidate, rather than a
  * clean + invalidate, before jumping into the kernel.
  *
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
  */
 ENTRY(v7_invalidate_l1)
 	mov	r0, #0
@@ -43,32 +42,32 @@ ENTRY(v7_invalidate_l1)
 	isb
 	mrc	p15, 1, r0, c0, c0, 0	@ read cache geometry from CCSIDR
 
-       movw    r1, #0x7fff
-       and     r2, r1, r0, lsr #13
+	movw	r3, #0x3ff
+	and	r3, r3, r0, lsr #3	@ 'Associativity' in CCSIDR[12:3]
+	clz	r1, r3			@ WayShift
+	mov	r2, #1
+	mov	r3, r3, lsl r1		@ NumWays-1 shifted into bits [31:...]
+	movs	r1, r2, lsl r1		@ #1 shifted left by same amount
+	moveq	r1, #1			@ r1 needs value > 0 even if only 1 way
 
-       movw    r1, #0x3ff
+	and	r2, r0, #0x7
+	add	r2, r2, #4		@ SetShift
 
-       and     r3, r1, r0, lsr #3      @ NumWays - 1
-       add     r2, r2, #1              @ NumSets
+1:	movw	r4, #0x7fff
+	and	r0, r4, r0, lsr #13	@ 'NumSets' in CCSIDR[27:13]
 
-       and     r0, r0, #0x7
-       add     r0, r0, #4      @ SetShift
-
-       clz     r1, r3          @ WayShift
-       add     r4, r3, #1      @ NumWays
-1:     sub     r2, r2, #1      @ NumSets--
-       mov     r3, r4          @ Temp = NumWays
-2:     subs    r3, r3, #1      @ Temp--
-       mov     r5, r3, lsl r1
-       mov     r6, r2, lsl r0
-       orr     r5, r5, r6      @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
-       mcr     p15, 0, r5, c7, c6, 2
-       bgt     2b
-       cmp     r2, #0
-       bgt     1b
-       dsb     st
-       isb
-       ret     lr
+2:	mov	r4, r0, lsl r2		@ NumSet << SetShift
+	orr	r4, r4, r3		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+	mcr	p15, 0, r4, c7, c6, 2
+	subs	r0, r0, #1		@ Set--
+	bpl	2b
+	subs	r3, r3, r1		@ Way--
+	bmi	3f
+	mrc	p15, 1, r0, c0, c0, 0	@ re-read cache geometry from CCSIDR
+	b	1b
+3:	dsb	st
+	isb
+	ret	lr
 ENDPROC(v7_invalidate_l1)
 
 /*
-- 
2.20.1



* [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
  2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
  2021-02-08 22:49 ` [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 Ard Biesheuvel
@ 2021-02-08 22:49 ` Ard Biesheuvel
  2021-02-08 23:07   ` Nicolas Pitre
  2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
  3 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-08 22:49 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, Marc Zyngier, Linus Walleij, Russell King,
	kernel-team, Ard Biesheuvel

Now that we have reduced the number of registers that we need to
preserve when calling v7_invalidate_l1 from the boot code, we can use
scratch registers to preserve the remaining ones, and get rid of the
mini stack entirely. This works around any issues regarding cache
behavior in relation to the uncached accesses to this memory, which
are hard to get right in the general case (i.e., both on bare metal
and under virtualization).

While at it, switch v7_invalidate_l1 to using ip as a scratch register
instead of r4. Under AAPCS, a function may clobber r0-r3 and ip but
must preserve r4-r11, so this change makes the function AAPCS
compliant, and removes the need to stash r4 in ip across the call.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/memory.h | 15 --------
 arch/arm/mm/cache-v7.S        | 10 ++---
 arch/arm/mm/proc-v7.S         | 40 +++++++++-----------
 3 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 2f841cb65c30..a711322d9f40 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -150,21 +150,6 @@ extern unsigned long vectors_base;
  */
 #define PLAT_PHYS_OFFSET	UL(CONFIG_PHYS_OFFSET)
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * When referencing data in RAM from the XIP region in a relative manner
- * with the MMU off, we need the relative offset between the two physical
- * addresses.  The macro below achieves this, which is:
- *    __pa(v_data) - __xip_pa(v_text)
- */
-#define PHYS_RELATIVE(v_data, v_text) \
-	(((v_data) - PAGE_OFFSET + PLAT_PHYS_OFFSET) - \
-	 ((v_text) - XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR) + \
-          CONFIG_XIP_PHYS_ADDR))
-#else
-#define PHYS_RELATIVE(v_data, v_text) ((v_data) - (v_text))
-#endif
-
 #ifndef __ASSEMBLY__
 
 /*
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 4544af4855f6..243e82124663 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -53,12 +53,12 @@ ENTRY(v7_invalidate_l1)
 	and	r2, r0, #0x7
 	add	r2, r2, #4		@ SetShift
 
-1:	movw	r4, #0x7fff
-	and	r0, r4, r0, lsr #13	@ 'NumSets' in CCSIDR[27:13]
+1:	movw	ip, #0x7fff
+	and	r0, ip, r0, lsr #13	@ 'NumSets' in CCSIDR[27:13]
 
-2:	mov	r4, r0, lsl r2		@ NumSet << SetShift
-	orr	r4, r4, r3		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
-	mcr	p15, 0, r4, c7, c6, 2
+2:	mov	ip, r0, lsl r2		@ NumSet << SetShift
+	orr	ip, ip, r3		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+	mcr	p15, 0, ip, c7, c6, 2
 	subs	r0, r0, #1		@ Set--
 	bpl	2b
 	subs	r3, r3, r1		@ Way--
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 28c9d32fa99a..cab978c25b68 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -256,6 +256,20 @@ ENDPROC(cpu_pj4b_do_resume)
 
 #endif
 
+	@
+	@ Invoke the v7_invalidate_l1() function, which adheres to the AAPCS
+	@ rules, and so it may corrupt registers that we need to preserve.
+	@
+	.macro	do_invalidate_l1
+	mov	r6, r1
+	mov	r7, r2
+	mov	r10, lr
+	bl	v7_invalidate_l1		@ corrupts {r0-r3, ip, lr}
+	mov	r1, r6
+	mov	r2, r7
+	mov	lr, r10
+	.endm
+
 /*
  *	__v7_setup
  *
@@ -277,6 +291,7 @@ __v7_ca5mp_setup:
 __v7_ca9mp_setup:
 __v7_cr7mp_setup:
 __v7_cr8mp_setup:
+	do_invalidate_l1
 	mov	r10, #(1 << 0)			@ Cache/TLB ops broadcasting
 	b	1f
 __v7_ca7mp_setup:
@@ -284,13 +299,9 @@ __v7_ca12mp_setup:
 __v7_ca15mp_setup:
 __v7_b15mp_setup:
 __v7_ca17mp_setup:
+	do_invalidate_l1
 	mov	r10, #0
-1:	adr	r0, __v7_setup_stack_ptr
-	ldr	r12, [r0]
-	add	r12, r12, r0			@ the local stack
-	stmia	r12, {r1-r6, lr}		@ v7_invalidate_l1 touches r0-r6
-	bl      v7_invalidate_l1
-	ldmia	r12, {r1-r6, lr}
+1:
 #ifdef CONFIG_SMP
 	orr	r10, r10, #(1 << 6)		@ Enable SMP/nAMP mode
 	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
@@ -471,12 +482,7 @@ __v7_pj4b_setup:
 #endif /* CONFIG_CPU_PJ4B */
 
 __v7_setup:
-	adr	r0, __v7_setup_stack_ptr
-	ldr	r12, [r0]
-	add	r12, r12, r0			@ the local stack
-	stmia	r12, {r1-r6, lr}		@ v7_invalidate_l1 touches r0-r6
-	bl      v7_invalidate_l1
-	ldmia	r12, {r1-r6, lr}
+	do_invalidate_l1
 
 __v7_setup_cont:
 	and	r0, r9, #0xff000000		@ ARM?
@@ -549,16 +555,6 @@ __errata_finish:
  THUMB(	orr	r0, r0, #1 << 30	)	@ Thumb exceptions
 	ret	lr				@ return to head.S:__ret
 
-	.align	2
-__v7_setup_stack_ptr:
-	.word	PHYS_RELATIVE(__v7_setup_stack, .)
-ENDPROC(__v7_setup)
-
-	.bss
-	.align	2
-__v7_setup_stack:
-	.space	4 * 7				@ 7 registers
-
 	__INITDATA
 
 	.weak cpu_v7_bugs_init
-- 
2.20.1



* Re: [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
  2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
@ 2021-02-08 23:07   ` Nicolas Pitre
  2021-02-09 22:37     ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-08 23:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Marc Zyngier, Linus Walleij, kernel-team, Russell King, linux-arm-kernel

On Mon, 8 Feb 2021, Ard Biesheuvel wrote:

> @@ -549,16 +555,6 @@ __errata_finish:
>   THUMB(	orr	r0, r0, #1 << 30	)	@ Thumb exceptions
>  	ret	lr				@ return to head.S:__ret
>  
> -	.align	2
> -__v7_setup_stack_ptr:
> -	.word	PHYS_RELATIVE(__v7_setup_stack, .)
> -ENDPROC(__v7_setup)
> -
> -	.bss
> -	.align	2
> -__v7_setup_stack:
> -	.space	4 * 7				@ 7 registers
> -
>  	__INITDATA

You should have preserved the ENDPROC here.


Nicolas


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
@ 2021-02-08 23:11 ` Nicolas Pitre
  2021-02-09 22:37   ` Ard Biesheuvel
  3 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-08 23:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Marc Zyngier, Linus Walleij, kernel-team, Russell King, linux-arm-kernel

On Mon, 8 Feb 2021, Ard Biesheuvel wrote:

> The v7 boot code uses a small chunk of BSS to preserve some register
> contents across a call to v7_invalidate_l1 that occurs with the MMU and
> caches disabled. Memory accesses in such cases are tricky on v7+, given
> that the architecture permits some unintuitive behaviors (it is
> implementation defined whether accesses done with the MMU and caches off
> may hit in the caches, and on SoCs that incorporate off-core system
> caches, this behavior appears to be different even between cache
> levels). Also, cache invalidation is not safe under virtualization if
> the intent is to retain stores issued directly to DRAM, given that the
> hypervisor may upgrade invalidate operations to clean+invalidate,
> resulting in the DRAM contents being overwritten by the dirty
> cachelines that we were trying to evict in the first place.
> 
> So let's address this issue by removing the need for this stack to
> exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> registers, which means fewer registers need to be preserved, and we have
> enough spare registers available.

That is excellent.

I wonder why r1-r3 were preserved though.


Nicolas


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
@ 2021-02-09 22:37   ` Ard Biesheuvel
  2021-02-09 22:59     ` Nicolas Pitre
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 22:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
>
> > The v7 boot code uses a small chunk of BSS to preserve some register
> > contents across a call to v7_invalidate_l1 that occurs with the MMU and
> > caches disabled. Memory accesses in such cases are tricky on v7+, given
> > that the architecture permits some unintuitive behaviors (it is
> > implementation defined whether accesses done with the MMU and caches off
> > may hit in the caches, and on SoCs that incorporate off-core system
> > caches, this behavior appears to be different even between cache
> > levels). Also, cache invalidation is not safe under virtualization if
> > the intent is to retain stores issued directly to DRAM, given that the
> > hypervisor may upgrade invalidate operations to clean+invalidate,
> > resulting in the DRAM contents being overwritten by the dirty
> > cachelines that we were trying to evict in the first place.
> >
> > So let's address this issue by removing the need for this stack to
> > exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> > registers, which means fewer registers need to be preserved, and we have
> > enough spare registers available.
>
> That is excellent.
>
> I wonder why r1-r3 were preserved though.
>

r1 and r2 are documented in head.S as

         * The processor init function will be called with:
         *  r1 - machine type
         *  r2 - boot data (atags/dt) pointer

but preserving the value of r3 does not seem necessary. Perhaps this
is a leftover from old code?


* Re: [PATCH 3/3] ARM: cache-v7: get rid of mini-stack
  2021-02-08 23:07   ` Nicolas Pitre
@ 2021-02-09 22:37     ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 22:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Tue, 9 Feb 2021 at 00:08, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
>
> > @@ -549,16 +555,6 @@ __errata_finish:
> >   THUMB(      orr     r0, r0, #1 << 30        )       @ Thumb exceptions
> >       ret     lr                              @ return to head.S:__ret
> >
> > -     .align  2
> > -__v7_setup_stack_ptr:
> > -     .word   PHYS_RELATIVE(__v7_setup_stack, .)
> > -ENDPROC(__v7_setup)
> > -
> > -     .bss
> > -     .align  2
> > -__v7_setup_stack:
> > -     .space  4 * 7                           @ 7 registers
> > -
> >       __INITDATA
>
> You should have preserved the ENDPROC here.
>

Indeed. Will fix that in v2.

Thanks,
Ard.


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-09 22:37   ` Ard Biesheuvel
@ 2021-02-09 22:59     ` Nicolas Pitre
  2021-02-09 23:02       ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-09 22:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Tue, 9 Feb 2021, Ard Biesheuvel wrote:

> On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> >
> > > The v7 boot code uses a small chunk of BSS to preserve some register
> > > contents across a call to v7_invalidate_l1 that occurs with the MMU and
> > > caches disabled. Memory accesses in such cases are tricky on v7+, given
> > > that the architecture permits some unintuitive behaviors (it is
> > > implementation defined whether accesses done with the MMU and caches off
> > > may hit in the caches, and on SoCs that incorporate off-core system
> > > caches, this behavior appears to be different even between cache
> > > levels). Also, cache invalidation is not safe under virtualization if
> > > the intent is to retain stores issued directly to DRAM, given that the
> > > hypervisor may upgrade invalidate operations to clean+invalidate,
> > > resulting in the DRAM contents being overwritten by the dirty
> > > cachelines that we were trying to evict in the first place.
> > >
> > > So let's address this issue by removing the need for this stack to
> > > exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> > > registers, which means fewer registers need to be preserved, and we have
> > > enough spare registers available.
> >
> > That is excellent.
> >
> > I wonder why r1-r3 were preserved though.
> >
> 
> r1 and r2 are documented in head.S as
> 
>          * The processor init function will be called with:
>          *  r1 - machine type
>          *  r2 - boot data (atags/dt) pointer
> 
> but preserving the value of r3 does not seem necessary. Perhaps this
> is a leftover from old code?

Still, further down in the same comment it is said:

         * On return, the CPU will be ready for the MMU to be turned on,
         * r0 will hold the CPU control register value, r1, r2, r4, and
         * r9 will be preserved.  r5 will also be preserved if LPAE.

But you're now clobbering r1 and r2. And when this returns, code 
execution goes to __enable_mmu whose comment also mentions the expected 
r1 and r2 content. That would need adjusting too.


Nicolas


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-09 22:59     ` Nicolas Pitre
@ 2021-02-09 23:02       ` Ard Biesheuvel
  2021-02-09 23:22         ` Nicolas Pitre
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-09 23:02 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
>
> > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > >
> > > > The v7 boot code uses a small chunk of BSS to preserve some register
> > > > contents across a call to v7_invalidate_l1 that occurs with the MMU and
> > > > caches disabled. Memory accesses in such cases are tricky on v7+, given
> > > > that the architecture permits some unintuitive behaviors (it is
> > > > implementation defined whether accesses done with the MMU and caches off
> > > > may hit in the caches, and on SoCs that incorporate off-core system
> > > > caches, this behavior appears to be different even between cache
> > > > levels). Also, cache invalidation is not safe under virtualization if
> > > > the intent is to retain stores issued directly to DRAM, given that the
> > > > hypervisor may upgrade invalidate operations to clean+invalidate,
> > > > resulting in the DRAM contents being overwritten by the dirty
> > > > cachelines that we were trying to evict in the first place.
> > > >
> > > > So let's address this issue by removing the need for this stack to
> > > > exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> > > > registers, which means fewer registers need to be preserved, and we have
> > > > enough spare registers available.
> > >
> > > That is excellent.
> > >
> > > I wonder why r1-r3 were preserved though.
> > >
> >
> > r1 and r2 are documented in head.S as
> >
> >          * The processor init function will be called with:
> >          *  r1 - machine type
> >          *  r2 - boot data (atags/dt) pointer
> >
> > but preserving the value of r3 does not seem necessary. Perhaps this
> > is a leftover from old code?
>
> Still, further down in the same comment it is said:
>
>          * On return, the CPU will be ready for the MMU to be turned on,
>          * r0 will hold the CPU control register value, r1, r2, r4, and
>          * r9 will be preserved.  r5 will also be preserved if LPAE.
>
> But you're now clobbering r1 and r2. And when this returns, code
> execution goes to __enable_mmu whose comment also mentions the expected
> r1 and r2 content. That would need adjusting too.
>

No, r1 and r2 are preserved - they are stashed in r6/r7 across the
call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
those do not contain anything that needs to be preserved.


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-09 23:02       ` Ard Biesheuvel
@ 2021-02-09 23:22         ` Nicolas Pitre
  2021-02-10 17:39           ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2021-02-09 23:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Wed, 10 Feb 2021, Ard Biesheuvel wrote:

> On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
> >
> > > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > > >
> > > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > > >
> > > > > The v7 boot code uses a small chunk of BSS to preserve some register
> > > > > contents across a call to v7_invalidate_l1 that occurs with the MMU and
> > > > > caches disabled. Memory accesses in such cases are tricky on v7+, given
> > > > > that the architecture permits some unintuitive behaviors (it is
> > > > > implementation defined whether accesses done with the MMU and caches off
> > > > > may hit in the caches, and on SoCs that incorporate off-core system
> > > > > caches, this behavior appears to be different even between cache
> > > > > levels). Also, cache invalidation is not safe under virtualization if
> > > > > the intent is to retain stores issued directly to DRAM, given that the
> > > > > hypervisor may upgrade invalidate operations to clean+invalidate,
> > > > > resulting in the DRAM contents being overwritten by the dirty
> > > > > cachelines that we were trying to evict in the first place.
> > > > >
> > > > > So let's address this issue by removing the need for this stack to
> > > > > exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> > > > > registers, which means fewer registers need to be preserved, and we have
> > > > > enough spare registers available.
> > > >
> > > > That is excellent.
> > > >
> > > > I wonder why r1-r3 were preserved though.
> > > >
> > >
> > > r1 and r2 are documented in head.S as
> > >
> > >          * The processor init function will be called with:
> > >          *  r1 - machine type
> > >          *  r2 - boot data (atags/dt) pointer
> > >
> > > but preserving the value of r3 does not seem necessary. Perhaps this
> > > is a leftover from old code?
> >
> > Still, further down in the same comment it is said:
> >
> >          * On return, the CPU will be ready for the MMU to be turned on,
> >          * r0 will hold the CPU control register value, r1, r2, r4, and
> >          * r9 will be preserved.  r5 will also be preserved if LPAE.
> >
> > But you're now clobbering r1 and r2. And when this returns, code
> > execution goes to __enable_mmu whose comment also mentions the expected
> > r1 and r2 content. That would need adjusting too.
> >
> 
> No, r1 and r2 are preserved - they are stashed in r6/r7 across the
> call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
> those do not contain anything that needs to be preserved.

OK, you're right. I didn't look at the whole code.

Acked-by: Nicolas Pitre <nico@fluxnic.net>


Nicolas


* Re: [PATCH 0/3] ARM: v7: get rid of boot time mini stack
  2021-02-09 23:22         ` Nicolas Pitre
@ 2021-02-10 17:39           ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2021-02-10 17:39 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Marc Zyngier, Linus Walleij, Android Kernel Team, Russell King,
	Linux ARM

On Wed, 10 Feb 2021 at 00:22, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Wed, 10 Feb 2021, Ard Biesheuvel wrote:
>
> > On Tue, 9 Feb 2021 at 23:59, Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > On Tue, 9 Feb 2021, Ard Biesheuvel wrote:
> > >
> > > > On Tue, 9 Feb 2021 at 00:12, Nicolas Pitre <nico@fluxnic.net> wrote:
> > > > >
> > > > > On Mon, 8 Feb 2021, Ard Biesheuvel wrote:
> > > > >
> > > > > > The v7 boot code uses a small chunk of BSS to preserve some register
> > > > > > contents across a call to v7_invalidate_l1 that occurs with the MMU and
> > > > > > caches disabled. Memory accesses in such cases are tricky on v7+, given
> > > > > > that the architecture permits some unintuitive behaviors (it is
> > > > > > implementation defined whether accesses done with the MMU and caches off
> > > > > > may hit in the caches, and on SoCs that incorporate off-core system
> > > > > > caches, this behavior appears to be different even between cache
> > > > > > levels). Also, cache invalidation is not safe under virtualization if
> > > > > > the intent is to retain stores issued directly to DRAM, given that the
> > > > > > hypervisor may upgrade invalidate operations to clean+invalidate,
> > > > > > resulting in the DRAM contents being overwritten by the dirty
> > > > > > cachelines that we were trying to evict in the first place.
> > > > > >
> > > > > > So let's address this issue by removing the need for this stack to
> > > > > > exist in the first place: v7_invalidate_l1 can be rewritten to use fewer
> > > > > > registers, which means fewer registers need to be preserved, and we have
> > > > > > enough spare registers available.
> > > > >
> > > > > That is excellent.
> > > > >
> > > > > I wonder why r1-r3 were preserved though.
> > > > >
> > > >
> > > > r1 and r2 are documented in head.S as
> > > >
> > > >          * The processor init function will be called with:
> > > >          *  r1 - machine type
> > > >          *  r2 - boot data (atags/dt) pointer
> > > >
> > > > but preserving the value of r3 does not seem necessary. Perhaps this
> > > > is a leftover from old code?
> > >
> > > Still, further down in the same comment it is said:
> > >
> > >          * On return, the CPU will be ready for the MMU to be turned on,
> > >          * r0 will hold the CPU control register value, r1, r2, r4, and
> > >          * r9 will be preserved.  r5 will also be preserved if LPAE.
> > >
> > > But you're now clobbering r1 and r2. And when this returns, code
> > > execution goes to __enable_mmu whose comment also mentions the expected
> > > r1 and r2 content. That would need adjusting too.
> > >
> >
> > No, r1 and r2 are preserved - they are stashed in r6/r7 across the
> > call to v7_invalidate_l1. r0, r3 and r12 are clobbered as well, but
> > those do not contain anything that needs to be preserved.
>
> OK, you're right. I didn't look at the whole code.
>
> Acked-by: Nicolas Pitre <nico@fluxnic.net>
>

Thanks Nico.


end of thread

Thread overview: 12+ messages
2021-02-08 22:49 [PATCH 0/3] ARM: v7: get rid of boot time mini stack Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 1/3] ARM: cache-v7: add missing ISB after cache level selection Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 2/3] ARM: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 Ard Biesheuvel
2021-02-08 22:49 ` [PATCH 3/3] ARM: cache-v7: get rid of mini-stack Ard Biesheuvel
2021-02-08 23:07   ` Nicolas Pitre
2021-02-09 22:37     ` Ard Biesheuvel
2021-02-08 23:11 ` [PATCH 0/3] ARM: v7: get rid of boot time mini stack Nicolas Pitre
2021-02-09 22:37   ` Ard Biesheuvel
2021-02-09 22:59     ` Nicolas Pitre
2021-02-09 23:02       ` Ard Biesheuvel
2021-02-09 23:22         ` Nicolas Pitre
2021-02-10 17:39           ` Ard Biesheuvel
