* [PATCH v3 0/7] ARM: add vmap'ed stack support
@ 2021-11-15 11:18 Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 1/7] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

This series enables support on ARM for vmap'ed task and IRQ stacks in
the kernel. This is an important hardening feature that terminates tasks
on inadvertent or deliberate accesses past the stack pointer, which
might otherwise go completely unnoticed.

Since having an accurate backtrace is especially important in such
cases, this series includes some enhancements to the unwinder and to
some hand rolled unwind info to increase the likelihood that a backtrace
can be generated when relying on the ARM unwinder. The frame pointer
unwinder turns out to be rather bulletproof in this context, and does
not need any such enhancements.

According to a quick survey I did, compiler-generated code puts a single
stack push as the first instruction in about 2/3 of the cases, which the
unwinder can deal with after applying patch #4, even if this push
faulted because of a stack overflow. In the remaining cases, the
compiler tends to fall back to R11 or R7 as the frame pointer (on ARM
or Thumb-2, respectively), or emit partial unwind frames for the part of
the function that runs before the stack frame is set up, and the part
that runs inside the stack frame. In either case, the unwinder can deal
with such occurrences as they don't rely on the stack pointer directly.
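
For illustration, here is a hand-rolled sketch (not taken from this
series) of that common case: a function whose very first instruction is
the stack push, together with the EHABI annotations a compiler would
emit for it. If the push faults due to a stack overflow, the PC points
at the first instruction covered by the unwind entry, which is exactly
the situation patch #4 teaches the unwinder to recognize:

func:
UNWIND(	.fnstart		)
UNWIND(	.save	{r4, lr}	)
	push	{r4, lr}	@ first insn: may fault on overflow
	...			@ body; LR still holds the return address
	pop	{r4, pc}
UNWIND(	.fnend			)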

Changes since v2:
- rebase onto v5.16-rc1
- incorporate Nico's review feedback

Changes since v1:
- handle a missed corner case in svc_entry code, and while at it,
  streamline it a bit, especially for Thumb-2, which no longer
  needs to move SP into R0 twice to do the overflow check and the
  alignment check,
- improve the memcpy patch so that we no longer need to push the frame
  pointer separately,
- add Keith's tested-by

Patches #1, #2 and #3 update the ARM asm string routines to align more
closely with the compiler's approach in terms of unwind tables,
increasing the likelihood that we can unwind them in case of a stack
overflow.

Patch #4 updates the unwinder to disregard the unwind info when the PC
points at the first instruction it covers, so that unwinding can still
proceed when the overflow is triggered by the stack push at function
entry.

Patches #5 and #6 do some preparatory refactoring for the entry and
switch_to code, to reduce clutter in patch #7.

Patch #7 wires up the generic support, and adds the entry code to detect
and deal with stack overflows.

This series applies on top of my IRQ stacks series sent out earlier:
https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/

Cc: Russell King <linux@armlinux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Keith Packard <keithpac@amazon.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>

Ard Biesheuvel (7):
  ARM: memcpy: use frame pointer as unwind anchor
  ARM: memmove: use frame pointer as unwind anchor
  ARM: memset: clean up unwind annotations
  ARM: unwind: disregard unwind info before stack frame is set up
  ARM: switch_to: clean up Thumb2 code path
  ARM: entry: rework stack realignment code in svc_entry
  ARM: implement support for vmap'ed stacks

 arch/arm/Kconfig                   |   1 +
 arch/arm/include/asm/page.h        |   4 +
 arch/arm/include/asm/thread_info.h |   8 ++
 arch/arm/kernel/entry-armv.S       | 121 +++++++++++++++++---
 arch/arm/kernel/entry-header.S     |  57 +++++++++
 arch/arm/kernel/irq.c              |   9 +-
 arch/arm/kernel/traps.c            |  65 ++++++++++-
 arch/arm/kernel/unwind.c           |  19 ++-
 arch/arm/kernel/vmlinux.lds.S      |   4 +-
 arch/arm/lib/copy_from_user.S      |  13 +--
 arch/arm/lib/copy_template.S       |  67 ++++-------
 arch/arm/lib/copy_to_user.S        |  13 +--
 arch/arm/lib/memcpy.S              |  13 +--
 arch/arm/lib/memmove.S             |  60 ++++------
 arch/arm/lib/memset.S              |   7 +-
 15 files changed, 324 insertions(+), 137 deletions(-)

-- 
2.30.2



* [PATCH v3 1/7] ARM: memcpy: use frame pointer as unwind anchor
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 2/7] ARM: memmove: " Ard Biesheuvel
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

The memcpy template is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction-accurate, and does not allow
tracking the SP value as it changes.

Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory
copy functions") addressed this by carving up the function into separate
chunks as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding to the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.

Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.

Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the template, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.
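
For reference, a minimal hand-written sketch (not part of this patch) of
the compiler-style approach the template now follows: the frame pointer
is recorded once via .setfp, after which further SP updates need no
additional unwind annotations:

UNWIND(	.fnstart		)
UNWIND(	.save	{r4, fpreg, lr}	)
	push	{r4, fpreg, lr}
UNWIND(	.setfp	fpreg, sp	)
	mov	fpreg, sp	@ unwind state is anchored to FP from here
	...
	push	{r5, r6}	@ SP changes, unwind info remains valid
	...
	mov	sp, fpreg	@ unwind via FP, not via SP
	pop	{r4, fpreg, pc}
UNWIND(	.fnend			)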

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/lib/copy_from_user.S | 13 ++--
 arch/arm/lib/copy_template.S  | 67 +++++++-------------
 arch/arm/lib/copy_to_user.S   | 13 ++--
 arch/arm/lib/memcpy.S         | 13 ++--
 4 files changed, 38 insertions(+), 68 deletions(-)

diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index 480a20766137..270de7debd0f 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -91,18 +91,15 @@
 	strb\cond \reg, [\ptr], #1
 	.endm
 
-	.macro enter reg1 reg2
+	.macro enter regs:vararg
 	mov	r3, #0
-	stmdb	sp!, {r0, r2, r3, \reg1, \reg2}
+UNWIND( .save	{r0, r2, r3, \regs}		)
+	stmdb	sp!, {r0, r2, r3, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save {r0, r2, r3, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
+	.macro exit regs:vararg
 	add	sp, sp, #8
-	ldmfd	sp!, {r0, \reg1, \reg2}
+	ldmfd	sp!, {r0, \regs}
 	.endm
 
 	.text
diff --git a/arch/arm/lib/copy_template.S b/arch/arm/lib/copy_template.S
index 810a805d36dc..8fbafb074fe9 100644
--- a/arch/arm/lib/copy_template.S
+++ b/arch/arm/lib/copy_template.S
@@ -69,13 +69,10 @@
  *	than one 32bit instruction in Thumb-2)
  */
 
-
-	UNWIND(	.fnstart			)
-		enter	r4, lr
-	UNWIND(	.fnend				)
-
 	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ in first stmdb block
+		enter	r4, UNWIND(fpreg,) lr
+	UNWIND(	.setfp	fpreg, sp		)
+	UNWIND(	mov	fpreg, sp		)
 
 		subs	r2, r2, #4
 		blt	8f
@@ -86,12 +83,7 @@
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-		usave	r4, lr
-	UNWIND(	.save	{r5 - r8}		) @ in second stmfd block
+		stmfd	sp!, {r5, r6, r8, r9}
 		blt	5f
 
 	CALGN(	ands	ip, r0, #31		)
@@ -110,9 +102,9 @@
 	PLD(	pld	[r1, #92]		)
 
 3:	PLD(	pld	[r1, #124]		)
-4:		ldr8w	r1, r3, r4, r5, r6, r7, r8, ip, lr, abort=20f
+4:		ldr8w	r1, r3, r4, r5, r6, r8, r9, ip, lr, abort=20f
 		subs	r2, r2, #32
-		str8w	r0, r3, r4, r5, r6, r7, r8, ip, lr, abort=20f
+		str8w	r0, r3, r4, r5, r6, r8, r9, ip, lr, abort=20f
 		bge	3b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	4b			)
@@ -132,8 +124,8 @@
 		ldr1w	r1, r4, abort=20f
 		ldr1w	r1, r5, abort=20f
 		ldr1w	r1, r6, abort=20f
-		ldr1w	r1, r7, abort=20f
 		ldr1w	r1, r8, abort=20f
+		ldr1w	r1, r9, abort=20f
 		ldr1w	r1, lr, abort=20f
 
 #if LDR1W_SHIFT < STR1W_SHIFT
@@ -150,17 +142,14 @@
 		str1w	r0, r4, abort=20f
 		str1w	r0, r5, abort=20f
 		str1w	r0, r6, abort=20f
-		str1w	r0, r7, abort=20f
 		str1w	r0, r8, abort=20f
+		str1w	r0, r9, abort=20f
 		str1w	r0, lr, abort=20f
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				) @ end of second stmfd block
+7:		ldmfd	sp!, {r5, r6, r8, r9}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 8:		movs	r2, r2, lsl #31
 		ldr1b	r1, r3, ne, abort=21f
 		ldr1b	r1, r4, cs, abort=21f
@@ -169,7 +158,7 @@
 		str1b	r0, r4, cs, abort=21f
 		str1b	r0, ip, cs, abort=21f
 
-		exit	r4, pc
+		exit	r4, UNWIND(fpreg,) pc
 
 9:		rsb	ip, ip, #4
 		cmp	ip, #2
@@ -189,13 +178,10 @@
 		ldr1w	r1, lr, abort=21f
 		beq	17f
 		bgt	18f
-	UNWIND(	.fnend				)
 
 
 		.macro	forward_copy_shift pull push
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 		subs	r2, r2, #28
 		blt	14f
 
@@ -205,12 +191,8 @@
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				)
+11:		stmfd	sp!, {r5, r6, r8 - r10}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr
-	UNWIND(	.save	{r5 - r9}		) @ in new second stmfd block
 	PLD(	pld	[r1, #0]		)
 	PLD(	subs	r2, r2, #96		)
 	PLD(	pld	[r1, #28]		)
@@ -219,35 +201,32 @@
 	PLD(	pld	[r1, #92]		)
 
 12:	PLD(	pld	[r1, #124]		)
-13:		ldr4w	r1, r4, r5, r6, r7, abort=19f
+13:		ldr4w	r1, r4, r5, r6, r8, abort=19f
 		mov	r3, lr, lspull #\pull
 		subs	r2, r2, #32
-		ldr4w	r1, r8, r9, ip, lr, abort=19f
+		ldr4w	r1, r9, r10, ip, lr, abort=19f
 		orr	r3, r3, r4, lspush #\push
 		mov	r4, r4, lspull #\pull
 		orr	r4, r4, r5, lspush #\push
 		mov	r5, r5, lspull #\pull
 		orr	r5, r5, r6, lspush #\push
 		mov	r6, r6, lspull #\pull
-		orr	r6, r6, r7, lspush #\push
-		mov	r7, r7, lspull #\pull
-		orr	r7, r7, r8, lspush #\push
+		orr	r6, r6, r8, lspush #\push
 		mov	r8, r8, lspull #\pull
 		orr	r8, r8, r9, lspush #\push
 		mov	r9, r9, lspull #\pull
-		orr	r9, r9, ip, lspush #\push
+		orr	r9, r9, r10, lspush #\push
+		mov	r10, r10, lspull #\pull
+		orr	r10, r10, ip, lspush #\push
 		mov	ip, ip, lspull #\pull
 		orr	ip, ip, lr, lspush #\push
-		str8w	r0, r3, r4, r5, r6, r7, r8, r9, ip, abort=19f
+		str8w	r0, r3, r4, r5, r6, r8, r9, r10, ip, abort=19f
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				) @ end of the second stmfd block
+		ldmfd	sp!, {r5, r6, r8 - r10}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 14:		ands	ip, r2, #28
 		beq	16f
 
@@ -262,7 +241,6 @@
 
 16:		sub	r1, r1, #(\push / 8)
 		b	8b
-	UNWIND(	.fnend				)
 
 		.endm
 
@@ -273,6 +251,7 @@
 
 18:		forward_copy_shift	pull=24	push=8
 
+	UNWIND(	.fnend				)
 
 /*
  * Abort preamble and completion macros.
@@ -282,13 +261,13 @@
  */
 
 	.macro	copy_abort_preamble
-19:	ldmfd	sp!, {r5 - r9}
+19:	ldmfd	sp!, {r5, r6, r8 - r10}
 	b	21f
-20:	ldmfd	sp!, {r5 - r8}
+20:	ldmfd	sp!, {r5, r6, r8, r9}
 21:
 	.endm
 
 	.macro	copy_abort_end
-	ldmfd	sp!, {r4, pc}
+	ldmfd	sp!, {r4, UNWIND(fpreg,) pc}
 	.endm
 
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index 842ea5ede485..fac49e57cc0b 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -90,18 +90,15 @@
 	strusr	\reg, \ptr, 1, \cond, abort=\abort
 	.endm
 
-	.macro enter reg1 reg2
+	.macro enter regs:vararg
 	mov	r3, #0
-	stmdb	sp!, {r0, r2, r3, \reg1, \reg2}
+UNWIND( .save	{r0, r2, r3, \regs}		)
+	stmdb	sp!, {r0, r2, r3, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save {r0, r2, r3, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
+	.macro exit regs:vararg
 	add	sp, sp, #8
-	ldmfd	sp!, {r0, \reg1, \reg2}
+	ldmfd	sp!, {r0, \regs}
 	.endm
 
 	.text
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index e4caf48c089f..90f2b645aa0d 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -42,16 +42,13 @@
 	strb\cond \reg, [\ptr], #1
 	.endm
 
-	.macro enter reg1 reg2
-	stmdb sp!, {r0, \reg1, \reg2}
+	.macro enter regs:vararg
+UNWIND( .save	{r0, \regs}		)
+	stmdb sp!, {r0, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save	{r0, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
-	ldmfd sp!, {r0, \reg1, \reg2}
+	.macro exit regs:vararg
+	ldmfd sp!, {r0, \regs}
 	.endm
 
 	.text
-- 
2.30.2



* [PATCH v3 2/7] ARM: memmove: use frame pointer as unwind anchor
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 1/7] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 3/7] ARM: memset: clean up unwind annotations Ard Biesheuvel
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

The memmove routine is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction-accurate, and does not allow
tracking the SP value as it changes.

Commit 207a6cb06990c ("ARM: 8224/1: Add unwinding support for memmove
function") addressed this by carving up the function into separate
chunks as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding to the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.

Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.

Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the function, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/lib/memmove.S | 60 +++++++-------------
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 6fecc12a1f51..6410554039fd 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -31,12 +31,13 @@ WEAK(memmove)
 		subs	ip, r0, r1
 		cmphi	r2, ip
 		bls	__memcpy
-
-		stmfd	sp!, {r0, r4, lr}
 	UNWIND(	.fnend				)
 
 	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ in first stmfd block
+	UNWIND(	.save	{r0, r4, fpreg, lr}	)
+		stmfd	sp!, {r0, r4, UNWIND(fpreg,) lr}
+	UNWIND(	.setfp	fpreg, sp		)
+	UNWIND(	mov	fpreg, sp		)
 		add	r1, r1, r2
 		add	r0, r0, r2
 		subs	r2, r2, #4
@@ -48,12 +49,7 @@ WEAK(memmove)
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		)
-	UNWIND(	.save	{r5 - r8}		) @ in second stmfd block
+		stmfd	sp!, {r5, r6, r8, r9}
 		blt	5f
 
 	CALGN(	ands	ip, r0, #31		)
@@ -72,9 +68,9 @@ WEAK(memmove)
 	PLD(	pld	[r1, #-96]		)
 
 3:	PLD(	pld	[r1, #-128]		)
-4:		ldmdb	r1!, {r3, r4, r5, r6, r7, r8, ip, lr}
+4:		ldmdb	r1!, {r3, r4, r5, r6, r8, r9, ip, lr}
 		subs	r2, r2, #32
-		stmdb	r0!, {r3, r4, r5, r6, r7, r8, ip, lr}
+		stmdb	r0!, {r3, r4, r5, r6, r8, r9, ip, lr}
 		bge	3b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	4b			)
@@ -88,8 +84,8 @@ WEAK(memmove)
 		W(ldr)	r4, [r1, #-4]!
 		W(ldr)	r5, [r1, #-4]!
 		W(ldr)	r6, [r1, #-4]!
-		W(ldr)	r7, [r1, #-4]!
 		W(ldr)	r8, [r1, #-4]!
+		W(ldr)	r9, [r1, #-4]!
 		W(ldr)	lr, [r1, #-4]!
 
 		add	pc, pc, ip
@@ -99,17 +95,13 @@ WEAK(memmove)
 		W(str)	r4, [r0, #-4]!
 		W(str)	r5, [r0, #-4]!
 		W(str)	r6, [r0, #-4]!
-		W(str)	r7, [r0, #-4]!
 		W(str)	r8, [r0, #-4]!
+		W(str)	r9, [r0, #-4]!
 		W(str)	lr, [r0, #-4]!
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				) @ end of second stmfd block
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ still in first stmfd block
+7:		ldmfd	sp!, {r5, r6, r8, r9}
 
 8:		movs	r2, r2, lsl #31
 		ldrbne	r3, [r1, #-1]!
@@ -118,7 +110,7 @@ WEAK(memmove)
 		strbne	r3, [r0, #-1]!
 		strbcs	r4, [r0, #-1]!
 		strbcs	ip, [r0, #-1]
-		ldmfd	sp!, {r0, r4, pc}
+		ldmfd	sp!, {r0, r4, UNWIND(fpreg,) pc}
 
 9:		cmp	ip, #2
 		ldrbgt	r3, [r1, #-1]!
@@ -137,13 +129,10 @@ WEAK(memmove)
 		ldr	r3, [r1, #0]
 		beq	17f
 		blt	18f
-	UNWIND(	.fnend				)
 
 
 		.macro	backward_copy_shift push pull
 
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ still in first stmfd block
 		subs	r2, r2, #28
 		blt	14f
 
@@ -152,12 +141,7 @@ WEAK(memmove)
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		)
-	UNWIND(	.save	{r5 - r9}		) @ in new second stmfd block
+11:		stmfd	sp!, {r5, r6, r8 - r10}
 
 	PLD(	pld	[r1, #-4]		)
 	PLD(	subs	r2, r2, #96		)
@@ -167,35 +151,31 @@ WEAK(memmove)
 	PLD(	pld	[r1, #-96]		)
 
 12:	PLD(	pld	[r1, #-128]		)
-13:		ldmdb   r1!, {r7, r8, r9, ip}
+13:		ldmdb   r1!, {r8, r9, r10, ip}
 		mov     lr, r3, lspush #\push
 		subs    r2, r2, #32
 		ldmdb   r1!, {r3, r4, r5, r6}
 		orr     lr, lr, ip, lspull #\pull
 		mov     ip, ip, lspush #\push
-		orr     ip, ip, r9, lspull #\pull
+		orr     ip, ip, r10, lspull #\pull
+		mov     r10, r10, lspush #\push
+		orr     r10, r10, r9, lspull #\pull
 		mov     r9, r9, lspush #\push
 		orr     r9, r9, r8, lspull #\pull
 		mov     r8, r8, lspush #\push
-		orr     r8, r8, r7, lspull #\pull
-		mov     r7, r7, lspush #\push
-		orr     r7, r7, r6, lspull #\pull
+		orr     r8, r8, r6, lspull #\pull
 		mov     r6, r6, lspush #\push
 		orr     r6, r6, r5, lspull #\pull
 		mov     r5, r5, lspush #\push
 		orr     r5, r5, r4, lspull #\pull
 		mov     r4, r4, lspush #\push
 		orr     r4, r4, r3, lspull #\pull
-		stmdb   r0!, {r4 - r9, ip, lr}
+		stmdb   r0!, {r4 - r6, r8 - r10, ip, lr}
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				) @ end of the second stmfd block
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save {r0, r4, lr}		) @ still in first stmfd block
+		ldmfd	sp!, {r5, r6, r8 - r10}
 
 14:		ands	ip, r2, #28
 		beq	16f
@@ -211,7 +191,6 @@ WEAK(memmove)
 
 16:		add	r1, r1, #(\pull / 8)
 		b	8b
-	UNWIND(	.fnend				)
 
 		.endm
 
@@ -222,5 +201,6 @@ WEAK(memmove)
 
 18:		backward_copy_shift	push=24	pull=8
 
+	UNWIND(	.fnend				)
 ENDPROC(memmove)
 ENDPROC(__memmove)
-- 
2.30.2



* [PATCH v3 3/7] ARM: memset: clean up unwind annotations
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 1/7] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 2/7] ARM: memmove: " Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 4/7] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

The memset implementation carves up the code into separate sections,
each covered by its own unwind info. In this case, it is done in a way
similar to how the compiler might do it, to disambiguate between parts
where the return address is in LR and the SP is unmodified, and parts
where a stack frame is live, and the unwinder needs to know the size of
the stack frame and the location of the return address within it.

The only oddity is the placement of the unwind directives: the stack
pushes are placed in the wrong sections, which may confuse the unwinder
when attempting to unwind with PC pointing at the stack push in
question.

So let's fix this up by reordering the directives and instructions as
appropriate.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/lib/memset.S | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index 9817cb258c1a..d71ab61430b2 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -28,16 +28,16 @@ UNWIND( .fnstart         )
 	mov	r3, r1
 7:	cmp	r2, #16
 	blt	4f
+UNWIND( .fnend              )
 
 #if ! CALGN(1)+0
 
 /*
  * We need 2 extra registers for this loop - use r8 and the LR
  */
-	stmfd	sp!, {r8, lr}
-UNWIND( .fnend              )
 UNWIND( .fnstart            )
 UNWIND( .save {r8, lr}      )
+	stmfd	sp!, {r8, lr}
 	mov	r8, r1
 	mov	lr, r3
 
@@ -66,10 +66,9 @@ UNWIND( .fnend              )
  * whole cache lines at once.
  */
 
-	stmfd	sp!, {r4-r8, lr}
-UNWIND( .fnend                 )
 UNWIND( .fnstart               )
 UNWIND( .save {r4-r8, lr}      )
+	stmfd	sp!, {r4-r8, lr}
 	mov	r4, r1
 	mov	r5, r3
 	mov	r6, r1
-- 
2.30.2



* [PATCH v3 4/7] ARM: unwind: disregard unwind info before stack frame is set up
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2021-11-15 11:18 ` [PATCH v3 3/7] ARM: memset: clean up unwind annotations Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 5/7] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

When unwinding the stack from a stack overflow, we are likely to start
from a stack push instruction, given that this is the most common way to
grow the stack in compiler-emitted code. This push instruction rarely
appears anywhere other than at offset 0x0 of the function, and when it
appears elsewhere, the compiler tends to split up the unwind
annotations, given that the stack frame layout is apparently not the
same throughout the function.

This means that, in the general case, if the frame's PC points at the
first instruction covered by a certain unwind entry, there is no way the
stack frame that the unwind entry describes could have been created yet,
and so we are still on the stack frame of the caller in that case. So
treat this as a special case, and return with the new PC taken from the
frame's LR, without applying the unwind transformations to the virtual
register set.

This permits us to unwind the call stack on stack overflow when the
overflow was caused by a stack push on function entry.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/kernel/unwind.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index b7a6141c342f..e8d729975f12 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -411,7 +411,21 @@ int unwind_frame(struct stackframe *frame)
 	if (idx->insn == 1)
 		/* can't unwind */
 		return -URC_FAILURE;
-	else if ((idx->insn & 0x80000000) == 0)
+	else if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
+		/*
+		 * Unwinding is tricky when we're halfway through the prologue,
+		 * since the stack frame that the unwinder expects may not be
+		 * fully set up yet. However, one thing we do know for sure is
+		 * that if we are unwinding from the very first instruction of
+		 * a function, we are still effectively in the stack frame of
+		 * the caller, and the unwind info has no relevance yet.
+		 */
+		if (frame->pc == frame->lr)
+			return -URC_FAILURE;
+		frame->sp_low = frame->sp;
+		frame->pc = frame->lr;
+		return URC_OK;
+	} else if ((idx->insn & 0x80000000) == 0)
 		/* prel31 to the unwind table */
 		ctrl.insn = (unsigned long *)prel31_to_addr(&idx->insn);
 	else if ((idx->insn & 0xff000000) == 0x80000000)
-- 
2.30.2



* [PATCH v3 5/7] ARM: switch_to: clean up Thumb2 code path
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2021-11-15 11:18 ` [PATCH v3 4/7] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 6/7] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 7/7] ARM: implement support for vmap'ed stacks Ard Biesheuvel
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

The load-multiple instruction that essentially performs the switch_to
operation in ARM mode, by loading all callee-saved registers as well as
the stack pointer and the program counter, is split into 3 separate
loads for Thumb-2, with the IP register used as a temporary to capture
the value of R4 before it gets overwritten.

We can clean this up a bit, by sticking with a single LDMIA instruction,
but one that pops SP and PC into IP and LR, respectively, and by using
ordinary move register and branch instructions to get those values into
SP and PC. This also allows us to move the set_current call closer to
the assignment of SP, reducing the window where those are mutually out
of sync. This is especially relevant for CONFIG_VMAP_STACK, which is
being introduced in a subsequent patch, where we need to issue a load
that might fault from the new stack while running from the old one, to
ensure that stale PMD entries in the VMALLOC space are synced up.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/kernel/entry-armv.S | 23 +++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 1c7590eef712..ce8ca29461de 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -823,13 +823,26 @@ ENTRY(__switch_to)
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP)
 	str	r7, [r8]
 #endif
- THUMB(	mov	ip, r4			   )
 	mov	r0, r5
+#if !defined(CONFIG_THUMB2_KERNEL)
 	set_current r7
- ARM(	ldmia	r4, {r4 - sl, fp, sp, pc}  )	@ Load all regs saved previously
- THUMB(	ldmia	ip!, {r4 - sl, fp}	   )	@ Load all regs saved previously
- THUMB(	ldr	sp, [ip], #4		   )
- THUMB(	ldr	pc, [ip]		   )
+	ldmia	r4, {r4 - sl, fp, sp, pc}	@ Load all regs saved previously
+#else
+	mov	r1, r7
+	ldmia	r4, {r4 - sl, fp, ip, lr}	@ Load all regs saved previously
+
+	@ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
+	@ effectuates the task switch, as that is what causes the observable
+	@ values of current and current_thread_info to change. When
+	@ CONFIG_THREAD_INFO_IN_TASK=y, setting current (and therefore
+	@ current_thread_info) is done explicitly, and the update of SP just
+	@ switches us to another stack, with few other side effects. In order
+	@ to prevent this distinction from causing any inconsistencies, let's
+	@ keep the 'set_current' call as close as we can to the update of SP.
+	set_current r1
+	mov	sp, ip
+	ret	lr
+#endif
  UNWIND(.fnend		)
 ENDPROC(__switch_to)
 
-- 
2.30.2



* [PATCH v3 6/7] ARM: entry: rework stack realignment code in svc_entry
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2021-11-15 11:18 ` [PATCH v3 5/7] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-15 11:18 ` [PATCH v3 7/7] ARM: implement support for vmap'ed stacks Ard Biesheuvel
  6 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

The original Thumb-2 enablement patches updated the stack realignment
code in svc_entry to work around the lack of a STMIB instruction in
Thumb-2, by subtracting 4 from the frame size, inverting the sense of
the misalignment check, and changing to a STMIA instruction and a final
stack push of a 4-byte quantity that results in the stack becoming
aligned at the end of the sequence. It also pushes and pops R0 to and
from the stack in order to have a temp register that Thumb-2 allows in
general purpose ALU instructions, as TST using SP is not permitted.

Both are a bit problematic for vmap'ed stacks, as using the stack is
only permitted after we decide that we did not overflow the stack, or
have already switched to the overflow stack.

As for the alignment check: the current approach creates a corner case
where, if the initial SUB of SP ends up right at the start of the stack,
we will end up subtracting another 8 bytes and overflowing it.  This
means we would need to add the overflow check *after* the SUB that
deliberately misaligns the stack. However, this would require us to keep
local state (i.e., whether we performed the subtract or not) across the
overflow check, but without any GPRs or stack available.

So let's switch to an approach where we don't use the stack, and where
the alignment check of the stack pointer occurs in the usual way, as
this is guaranteed not to result in overflow. This means we will be able
to do the overflow check first.
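
To make that concrete, this is the arithmetic performed by the Thumb-2
sequence in this patch, writing the original SP value as s and the
original R0 value as r:

	add	sp, r0		@ SP := s + r
	sub	r0, sp, r0	@ R0 := (s + r) - r = s, SP in a GPR
	tst	r0, #4		@ test alignment of s
	sub	r0, sp, r0	@ R0 := (s + r) - s = r, R0 restored
	sub	sp, r0		@ SP := (s + r) - r = s, SP restored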

Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/kernel/entry-armv.S | 25 +++++++++++---------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index ce8ca29461de..b18f3aa98f42 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -191,24 +191,27 @@ ENDPROC(__und_invalid)
 	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1
  UNWIND(.fnstart		)
  UNWIND(.save {r0 - pc}		)
-	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole - 4)
+	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole)
 #ifdef CONFIG_THUMB2_KERNEL
- SPFIX(	str	r0, [sp]	)	@ temporarily saved
- SPFIX(	mov	r0, sp		)
- SPFIX(	tst	r0, #4		)	@ test original stack alignment
- SPFIX(	ldr	r0, [sp]	)	@ restored
+	add	sp, r0			@ get SP in a GPR without
+	sub	r0, sp, r0		@ using a temp register
+	tst	r0, #4			@ test stack pointer alignment
+	sub	r0, sp, r0		@ restore original R0
+	sub	sp, r0			@ restore original SP
 #else
  SPFIX(	tst	sp, #4		)
 #endif
- SPFIX(	subeq	sp, sp, #4	)
-	stmia	sp, {r1 - r12}
+ SPFIX(	subne	sp, sp, #4	)
+
+ ARM(	stmib	sp, {r1 - r12}	)
+ THUMB(	stmia	sp, {r0 - r12}	)	@ No STMIB in Thumb-2
 
 	ldmia	r0, {r3 - r5}
-	add	r7, sp, #S_SP - 4	@ here for interlock avoidance
+	add	r7, sp, #S_SP		@ here for interlock avoidance
 	mov	r6, #-1			@  ""  ""      ""       ""
-	add	r2, sp, #(SVC_REGS_SIZE + \stack_hole - 4)
- SPFIX(	addeq	r2, r2, #4	)
-	str	r3, [sp, #-4]!		@ save the "real" r0 copied
+	add	r2, sp, #(SVC_REGS_SIZE + \stack_hole)
+ SPFIX(	addne	r2, r2, #4	)
+	str	r3, [sp]		@ save the "real" r0 copied
 					@ from the exception stack
 
 	mov	r3, lr
-- 
2.30.2



* [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2021-11-15 11:18 ` [PATCH v3 6/7] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
@ 2021-11-15 11:18 ` Ard Biesheuvel
  2021-11-16  9:22     ` Guillaume Tucker
  6 siblings, 1 reply; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Russell King, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers

Wire up the generic support for managing task stack allocations via vmalloc,
and implement the entry code that detects whether we faulted because of a
stack overrun (or future stack overrun caused by pushing the pt_regs array).

While this adds a fair amount of tricky entry asm code, it should be
noted that it only adds a TST + branch to the svc_entry path. The code
implementing the non-trivial handling of the overflow stack is emitted
out-of-line into the .text section.
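
To see why a single TST suffices: assuming the usual ARM values of
THREAD_SIZE_ORDER = 1 and PAGE_SHIFT = 12 (i.e., 8 KiB stacks), the
THREAD_ALIGN = 2 * THREAD_SIZE alignment guarantees that bit 13 of SP is
zero for any address inside the stack, and becomes one as soon as SP
drops below the stack's base. In essence (a sketch of the check added
below, with the constants written out):

	tst	sp, #1 << 13		@ 1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)
	bne	.Lstack_overflow	@ set => SP left [base, base + 8 KiB)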

Since on ARM, we rely on do_translation_fault() to keep PMD level page
table entries that cover the vmalloc region up to date, we need to
ensure that we don't hit such a stale PMD entry when accessing the
stack. So we do a dummy read from the new stack while still running from
the old one on the context switch path, and bump the vmalloc_seq counter
when PMD level entries in the vmalloc range are modified, so that the MM
switch fetches the latest version of the entries.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
---
 arch/arm/Kconfig                   |  1 +
 arch/arm/include/asm/page.h        |  4 +
 arch/arm/include/asm/thread_info.h |  8 ++
 arch/arm/kernel/entry-armv.S       | 79 ++++++++++++++++++--
 arch/arm/kernel/entry-header.S     | 57 ++++++++++++++
 arch/arm/kernel/irq.c              |  9 ++-
 arch/arm/kernel/traps.c            | 65 +++++++++++++++-
 arch/arm/kernel/unwind.c           |  3 +-
 arch/arm/kernel/vmlinux.lds.S      |  4 +-
 9 files changed, 219 insertions(+), 11 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b1eba1b4168c..a072600527ca 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -127,6 +127,7 @@ config ARM
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
 	select THREAD_INFO_IN_TASK if CURRENT_POINTER_IN_TPIDRURO
+	select HAVE_ARCH_VMAP_STACK if THREAD_INFO_IN_TASK
 	select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 11b058a72a5b..7b871ed99ccf 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -149,6 +149,10 @@ extern void copy_page(void *to, const void *from);
 #include <asm/pgtable-2level-types.h>
 #endif
 
+#ifdef CONFIG_VMAP_STACK
+#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PMD_MODIFIED
+#endif
+
 #endif /* CONFIG_MMU */
 
 typedef struct page *pgtable_t;
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 164e15f26485..004b89d86224 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -25,6 +25,14 @@
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 #define THREAD_START_SP		(THREAD_SIZE - 8)
 
+#ifdef CONFIG_VMAP_STACK
+#define THREAD_ALIGN		(2 * THREAD_SIZE)
+#else
+#define THREAD_ALIGN		THREAD_SIZE
+#endif
+
+#define OVERFLOW_STACK_SIZE	SZ_4K
+
 #ifndef __ASSEMBLY__
 
 struct task_struct;
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index b18f3aa98f42..ad8d8304539e 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -57,6 +57,14 @@ UNWIND(	.setfp	fpreg, sp		)
 	@
 	subs	r2, sp, r0		@ SP above bottom of IRQ stack?
 	rsbscs	r2, r2, #THREAD_SIZE	@ ... and below the top?
+#ifdef CONFIG_VMAP_STACK
+	bcs	.L\@
+	mov_l	r2, overflow_stack	@ Take base address
+	add	r2, r2, r3		@ Top of this CPU's overflow stack
+	subs	r2, r0, r2		@ Compare with incoming SP
+	rsbscs	r2, r2, #OVERFLOW_STACK_SIZE
+.L\@:
+#endif
 	movcs	sp, r0			@ If so, revert to incoming SP
 
 #ifndef CONFIG_UNWINDER_ARM
@@ -188,13 +196,18 @@ ENDPROC(__und_invalid)
 #define SPFIX(code...)
 #endif
 
-	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1
+	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
  UNWIND(.fnstart		)
- UNWIND(.save {r0 - pc}		)
 	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole)
+ THUMB(	add	sp, r0		)	@ get SP in a GPR without
+ THUMB(	sub	r0, sp, r0	)	@ using a temp register
+
+	.if	\overflow_check
+ UNWIND(.save	{r0 - pc}	)
+	do_overflow_check (SVC_REGS_SIZE + \stack_hole)
+	.endif
+
 #ifdef CONFIG_THUMB2_KERNEL
-	add	sp, r0			@ get SP in a GPR without
-	sub	r0, sp, r0		@ using a temp register
 	tst	r0, #4			@ test stack pointer alignment
 	sub	r0, sp, r0		@ restore original R0
 	sub	sp, r0			@ restore original SP
@@ -827,12 +840,20 @@ ENTRY(__switch_to)
 	str	r7, [r8]
 #endif
 	mov	r0, r5
-#if !defined(CONFIG_THUMB2_KERNEL)
+#if !defined(CONFIG_THUMB2_KERNEL) && !defined(CONFIG_VMAP_STACK)
 	set_current r7
 	ldmia	r4, {r4 - sl, fp, sp, pc}	@ Load all regs saved previously
 #else
 	mov	r1, r7
 	ldmia	r4, {r4 - sl, fp, ip, lr}	@ Load all regs saved previously
+#ifdef CONFIG_VMAP_STACK
+	@
+	@ Do a dummy read from the new stack while running from the old one so
+	@ that we can rely on do_translation_fault() to fix up any stale PMD
+	@ entries covering the vmalloc region.
+	@
+	ldr	r2, [ip]
+#endif
 
 	@ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
 	@ effectuates the task switch, as that is what causes the observable
@@ -849,6 +870,54 @@ ENTRY(__switch_to)
  UNWIND(.fnend		)
 ENDPROC(__switch_to)
 
+#ifdef CONFIG_VMAP_STACK
+	.text
+__bad_stack:
+	@
+	@ We detected an overflow in svc_entry, which switched to the
+	@ overflow stack. Stash the exception regs, and head to our overflow
+	@ handler. Entered with the original value of SP in IP, and the original
+	@ value of IP in TPIDRURW
+	@
+
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	mov	ip, ip				@ mov expected by unwinder
+	push	{fp, ip, lr, pc}		@ GCC flavor frame record
+#else
+	str	ip, [sp, #-8]!			@ store original SP
+	push	{fpreg, lr}			@ Clang flavor frame record
+#endif
+UNWIND( ldr	ip, [r0, #4]	)		@ load exception LR
+UNWIND( str	ip, [sp, #12]	)		@ store in the frame record
+	mrc	p15, 0, ip, c13, c0, 2		@ reload IP
+
+	@ Store the original GPRs to the new stack.
+	svc_entry uaccess=0, overflow_check=0
+
+UNWIND( .save   {sp, pc}	)
+UNWIND( .save   {fpreg, lr}	)
+UNWIND( .setfp  fpreg, sp	)
+
+	ldr	fpreg, [sp, #S_SP]		@ Add our frame record
+						@ to the linked list
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	ldr	r1, [fp, #4]			@ reload SP at entry
+	add	fp, fp, #12
+#else
+	ldr	r1, [fpreg, #8]
+#endif
+	str	r1, [sp, #S_SP]			@ store in pt_regs
+
+	@ Stash the regs for handle_bad_stack
+	mov	r0, sp
+
+	@ Time to die
+	bl	handle_bad_stack
+	nop
+UNWIND( .fnend			)
+ENDPROC(__bad_stack)
+#endif
+
 	__INIT
 
 /*
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index ae24dd54e9ef..823dd1aa6e3e 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -423,3 +423,60 @@ scno	.req	r7		@ syscall number
 tbl	.req	r8		@ syscall table pointer
 why	.req	r8		@ Linux syscall (!= 0)
 tsk	.req	r9		@ current thread_info
+
+	.macro	do_overflow_check, frame_size:req
+#ifdef CONFIG_VMAP_STACK
+	@
+	@ Test whether the SP has overflowed. Task and IRQ stacks are aligned
+	@ so that SP & BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) should always be
+	@ zero.
+	@
+ARM(	tst	sp, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
+THUMB(	tst	r0, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
+THUMB(	it	ne						)
+	bne	.Lstack_overflow_check\@
+
+	.pushsection	.text
+.Lstack_overflow_check\@:
+	@
+	@ Either we've just detected an overflow, or we've taken an exception
+	@ while on the overflow stack. We cannot use the stack until we have
+	@ decided which is the case. However, as we won't return to userspace,
+	@ we can clobber some USR/SYS mode registers to free up GPRs.
+	@
+
+	mcr	p15, 0, ip, c13, c0, 2		@ Stash IP in TPIDRURW
+	mrs	ip, cpsr
+	eor	ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
+	msr	cpsr_c, ip			@ Switch to SYS mode
+	eor	ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
+	mov	sp, ip				@ Stash mode in SP_usr
+
+	@ Load the overflow stack into IP using LR_usr as a scratch register
+	mov_l	lr, overflow_stack + OVERFLOW_STACK_SIZE
+	mrc	p15, 0, ip, c13, c0, 4		@ Get CPU offset
+	add	ip, ip, lr			@ IP := this CPU's overflow stack
+	mov	lr, sp				@ Unstash mode into LR_usr
+	msr	cpsr_c, lr			@ Switch back to SVC mode
+
+	@
+	@ Check whether we are already on the overflow stack. This may happen,
+	@ e.g., when performing accesses that may fault when dumping the stack.
+	@ The overflow stack is not in the vmalloc space so we only need to
+	@ check whether the incoming SP is below the top of the overflow stack.
+	@
+ARM(	subs	ip, sp, ip		)	@ Delta with top of overflow stack
+THUMB(	subs	ip, r0, ip		)
+	mrclo	p15, 0, ip, c13, c0, 2		@ Restore IP
+	blo	.Lout\@				@ Carry on
+
+THUMB(	sub	r0, sp, r0		)	@ Restore original R0
+THUMB(	sub	sp, r0			)	@ Restore original SP
+	sub	sp, sp, ip			@ Switch to overflow stack
+	add	ip, sp, ip			@ Keep incoming SP value in IP
+	add	ip, ip, #\frame_size		@ Undo svc_entry's SP change
+	b	__bad_stack
+	.popsection
+.Lout\@:
+#endif
+	.endm
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index e05219bca218..5deb40f39999 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -56,7 +56,14 @@ static void __init init_irq_stacks(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+		if (!IS_ENABLED(CONFIG_VMAP_STACK))
+			stack = (u8 *)__get_free_pages(GFP_KERNEL,
+						       THREAD_SIZE_ORDER);
+		else
+			stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
+					       THREADINFO_GFP, NUMA_NO_NODE,
+					       __builtin_return_address(0));
+
 		if (WARN_ON(!stack))
 			break;
 		per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index b42c446cec9a..eb8c73be7c81 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -121,7 +121,8 @@ void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
 static int verify_stack(unsigned long sp)
 {
 	if (sp < PAGE_OFFSET ||
-	    (sp > (unsigned long)high_memory && high_memory != NULL))
+	    (!IS_ENABLED(CONFIG_VMAP_STACK) &&
+	     sp > (unsigned long)high_memory && high_memory != NULL))
 		return -EFAULT;
 
 	return 0;
@@ -291,7 +292,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 
 	if (!user_mode(regs) || in_interrupt()) {
 		dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
-			 ALIGN(regs->ARM_sp, THREAD_SIZE));
+			 ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
+			 + THREAD_SIZE);
 		dump_backtrace(regs, tsk, KERN_EMERG);
 		dump_instr(KERN_EMERG, regs);
 	}
@@ -838,3 +840,62 @@ void __init early_trap_init(void *vectors_base)
 	 */
 #endif
 }
+
+#ifdef CONFIG_VMAP_STACK
+
+DECLARE_PER_CPU(u8 *, irq_stack_ptr);
+
+asmlinkage DEFINE_PER_CPU_ALIGNED(u8[OVERFLOW_STACK_SIZE], overflow_stack);
+
+asmlinkage void handle_bad_stack(struct pt_regs *regs)
+{
+	unsigned long tsk_stk = (unsigned long)current->stack;
+	unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
+	unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
+
+	console_verbose();
+	pr_emerg("Insufficient stack space to handle exception!");
+
+	pr_emerg("Task stack:     [0x%08lx..0x%08lx]\n",
+		 tsk_stk, tsk_stk + THREAD_SIZE);
+	pr_emerg("IRQ stack:      [0x%08lx..0x%08lx]\n",
+		 irq_stk, irq_stk + THREAD_SIZE);
+	pr_emerg("Overflow stack: [0x%08lx..0x%08lx]\n",
+		 ovf_stk, ovf_stk + OVERFLOW_STACK_SIZE);
+
+	die("kernel stack overflow", regs, 0);
+}
+
+/*
+ * Normally, we rely on the logic in do_translation_fault() to update stale PMD
+ * entries covering the vmalloc space in a task's page tables when it first
+ * accesses the region in question. Unfortunately, this is not sufficient when
+ * the task stack resides in the vmalloc region, as do_translation_fault() is a
+ * C function that needs a stack to run.
+ *
+ * So we need to ensure that these PMD entries are up to date *before* the MM
+ * switch. As we already have some logic in the MM switch path that takes care
+ * of this, let's trigger it by bumping the counter every time the core vmalloc
+ * code modifies a PMD entry in the vmalloc region.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+	if (start > VMALLOC_END || end < VMALLOC_START)
+		return;
+
+	/*
+	 * This hooks into the core vmalloc code to receive notifications of
+	 * any PMD level changes that have been made to the kernel page tables.
+	 * This means it should only be triggered once for every MiB worth of
+	 * vmalloc space, given that we don't support huge vmalloc/vmap on ARM,
+	 * and that kernel PMD level table entries are rarely (if ever)
+	 * updated.
+	 *
+	 * This means that the counter is going to max out at ~250 for the
+	 * typical case. If it overflows, something entirely unexpected has
+	 * occurred so let's throw a warning if that happens.
+	 */
+	WARN_ON(++init_mm.context.vmalloc_seq == UINT_MAX);
+}
+
+#endif
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index e8d729975f12..c5ea328c428d 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -389,7 +389,8 @@ int unwind_frame(struct stackframe *frame)
 
 	/* store the highest address on the stack to avoid crossing it*/
 	ctrl.sp_low = frame->sp;
-	ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
+	ctrl.sp_high = ALIGN(ctrl.sp_low - THREAD_SIZE, THREAD_ALIGN)
+		       + THREAD_SIZE;
 
 	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
 		 frame->pc, frame->lr, frame->sp);
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index f02d617e3359..aa12b65a7fd6 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -138,12 +138,12 @@ SECTIONS
 #ifdef CONFIG_STRICT_KERNEL_RWX
 	. = ALIGN(1<<SECTION_SHIFT);
 #else
-	. = ALIGN(THREAD_SIZE);
+	. = ALIGN(THREAD_ALIGN);
 #endif
 	__init_end = .;
 
 	_sdata = .;
-	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
+	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
 	_edata = .;
 
 	BSS_SECTION(0, 0, 0)
-- 
2.30.2



* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-15 11:18 ` [PATCH v3 7/7] ARM: implement support for vmap'ed stacks Ard Biesheuvel
@ 2021-11-16  9:22     ` Guillaume Tucker
  0 siblings, 0 replies; 36+ messages in thread
From: Guillaume Tucker @ 2021-11-16  9:22 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Russell King, Nicolas Pitre, Arnd Bergmann, Kees Cook,
	Keith Packard, Linus Walleij, Nick Desaulniers, kernelci

Hi Ard,

Please see the bisection report below about a boot failure on
omap4-panda which is pointing to this patch.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org, but this one
looks valid.

Some more details can be found here:

  https://linux.kernelci.org/test/case/id/6191b1b97c175a5ade335948/

It seems like the kernel just froze after about 3 seconds without
any obvious errors in the log.

Please let us know if you need any help debugging this issue or
if you have a fix to try.

Best wishes,
Guillaume


GitHub: https://github.com/kernelci/kernelci-project/issues/75

-------------------------------------------------------------------------------

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has      *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.      *
*                                                               *
* If you do send a fix, please include this trailer:            *
*   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
*                                                               *
* Hope this helps!                                              *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

ardb/for-kernelci bisection: baseline.login on panda

Summary:
  Start:      d1eccc4f44f11 ARM: implement support for vmap'ed stacks
  Plain log:  https://storage.kernelci.org/ardb/for-kernelci/v5.16-rc1-16-gd1eccc4f44f1/arm/multi_v7_defconfig+crypto/gcc-10/lab-collabora/baseline-panda.txt
  HTML log:   https://storage.kernelci.org/ardb/for-kernelci/v5.16-rc1-16-gd1eccc4f44f1/arm/multi_v7_defconfig+crypto/gcc-10/lab-collabora/baseline-panda.html
  Result:     d1eccc4f44f11 ARM: implement support for vmap'ed stacks

Checks:
  revert:     PASS
  verify:     PASS

Parameters:
  Tree:       ardb
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git
  Branch:     for-kernelci
  Target:     panda
  CPU arch:   arm
  Lab:        lab-collabora
  Compiler:   gcc-10
  Config:     multi_v7_defconfig+crypto
  Test case:  baseline.login

Breaking commit found:

-------------------------------------------------------------------------------
commit d1eccc4f44f11a8f3f5d376f08e3779d2196f93a
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Thu Sep 23 09:15:53 2021 +0200

    ARM: implement support for vmap'ed stacks

On 15/11/2021 11:18, Ard Biesheuvel wrote:
> Wire up the generic support for managing task stack allocations via vmalloc,
> and implement the entry code that detects whether we faulted because of a
> stack overrun (or future stack overrun caused by pushing the pt_regs array).
> 
> While this adds a fair amount of tricky entry asm code, it should be
> noted that it only adds a TST + branch to the svc_entry path. The code
> implementing the non-trivial handling of the overflow stack is emitted
> out-of-line into the .text section.
> 
> Since on ARM, we rely on do_translation_fault() to keep PMD level page
> table entries that cover the vmalloc region up to date, we need to
> ensure that we don't hit such a stale PMD entry when accessing the
> stack. So we do a dummy read from the new stack while still running from
> the old one on the context switch path, and bump the vmalloc_seq counter
> when PMD level entries in the vmalloc range are modified, so that the MM
> switch fetches the latest version of the entries.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Tested-by: Keith Packard <keithpac@amazon.com>
> ---
>  arch/arm/Kconfig                   |  1 +
>  arch/arm/include/asm/page.h        |  4 +
>  arch/arm/include/asm/thread_info.h |  8 ++
>  arch/arm/kernel/entry-armv.S       | 79 ++++++++++++++++++--
>  arch/arm/kernel/entry-header.S     | 57 ++++++++++++++
>  arch/arm/kernel/irq.c              |  9 ++-
>  arch/arm/kernel/traps.c            | 65 +++++++++++++++-
>  arch/arm/kernel/unwind.c           |  3 +-
>  arch/arm/kernel/vmlinux.lds.S      |  4 +-
>  9 files changed, 219 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index b1eba1b4168c..a072600527ca 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -127,6 +127,7 @@ config ARM
>  	select RTC_LIB
>  	select SYS_SUPPORTS_APM_EMULATION
>  	select THREAD_INFO_IN_TASK if CURRENT_POINTER_IN_TPIDRURO
> +	select HAVE_ARCH_VMAP_STACK if THREAD_INFO_IN_TASK
>  	select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
>  	# Above selects are sorted alphabetically; please add new ones
>  	# according to that.  Thanks.
> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> index 11b058a72a5b..7b871ed99ccf 100644
> --- a/arch/arm/include/asm/page.h
> +++ b/arch/arm/include/asm/page.h
> @@ -149,6 +149,10 @@ extern void copy_page(void *to, const void *from);
>  #include <asm/pgtable-2level-types.h>
>  #endif
>  
> +#ifdef CONFIG_VMAP_STACK
> +#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PMD_MODIFIED
> +#endif
> +
>  #endif /* CONFIG_MMU */
>  
>  typedef struct page *pgtable_t;
> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
> index 164e15f26485..004b89d86224 100644
> --- a/arch/arm/include/asm/thread_info.h
> +++ b/arch/arm/include/asm/thread_info.h
> @@ -25,6 +25,14 @@
>  #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
>  #define THREAD_START_SP		(THREAD_SIZE - 8)
>  
> +#ifdef CONFIG_VMAP_STACK
> +#define THREAD_ALIGN		(2 * THREAD_SIZE)
> +#else
> +#define THREAD_ALIGN		THREAD_SIZE
> +#endif
> +
> +#define OVERFLOW_STACK_SIZE	SZ_4K
> +
>  #ifndef __ASSEMBLY__
>  
>  struct task_struct;
> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index b18f3aa98f42..ad8d8304539e 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -57,6 +57,14 @@ UNWIND(	.setfp	fpreg, sp		)
>  	@
>  	subs	r2, sp, r0		@ SP above bottom of IRQ stack?
>  	rsbscs	r2, r2, #THREAD_SIZE	@ ... and below the top?
> +#ifdef CONFIG_VMAP_STACK
> +	bcs	.L\@
> +	mov_l	r2, overflow_stack	@ Take base address
> +	add	r2, r2, r3		@ Top of this CPU's overflow stack
> +	subs	r2, r0, r2		@ Compare with incoming SP
> +	rsbscs	r2, r2, #OVERFLOW_STACK_SIZE
> +.L\@:
> +#endif
>  	movcs	sp, r0			@ If so, revert to incoming SP
>  
>  #ifndef CONFIG_UNWINDER_ARM
> @@ -188,13 +196,18 @@ ENDPROC(__und_invalid)
>  #define SPFIX(code...)
>  #endif
>  
> -	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1
> +	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
>   UNWIND(.fnstart		)
> - UNWIND(.save {r0 - pc}		)
>  	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole)
> + THUMB(	add	sp, r0		)	@ get SP in a GPR without
> + THUMB(	sub	r0, sp, r0	)	@ using a temp register
> +
> +	.if	\overflow_check
> + UNWIND(.save	{r0 - pc}	)
> +	do_overflow_check (SVC_REGS_SIZE + \stack_hole)
> +	.endif
> +
>  #ifdef CONFIG_THUMB2_KERNEL
> -	add	sp, r0			@ get SP in a GPR without
> -	sub	r0, sp, r0		@ using a temp register
>  	tst	r0, #4			@ test stack pointer alignment
>  	sub	r0, sp, r0		@ restore original R0
>  	sub	sp, r0			@ restore original SP
> @@ -827,12 +840,20 @@ ENTRY(__switch_to)
>  	str	r7, [r8]
>  #endif
>  	mov	r0, r5
> -#if !defined(CONFIG_THUMB2_KERNEL)
> +#if !defined(CONFIG_THUMB2_KERNEL) && !defined(CONFIG_VMAP_STACK)
>  	set_current r7
>  	ldmia	r4, {r4 - sl, fp, sp, pc}	@ Load all regs saved previously
>  #else
>  	mov	r1, r7
>  	ldmia	r4, {r4 - sl, fp, ip, lr}	@ Load all regs saved previously
> +#ifdef CONFIG_VMAP_STACK
> +	@
> +	@ Do a dummy read from the new stack while running from the old one so
> +	@ that we can rely on do_translation_fault() to fix up any stale PMD
> +	@ entries covering the vmalloc region.
> +	@
> +	ldr	r2, [ip]
> +#endif
>  
>  	@ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
>  	@ effectuates the task switch, as that is what causes the observable
> @@ -849,6 +870,54 @@ ENTRY(__switch_to)
>   UNWIND(.fnend		)
>  ENDPROC(__switch_to)
>  
> +#ifdef CONFIG_VMAP_STACK
> +	.text
> +__bad_stack:
> +	@
> +	@ We detected an overflow in svc_entry, which switched to the
> +	@ overflow stack. Stash the exception regs, and head to our overflow
> +	@ handler. Entered with the original value of SP in IP, and the original
> +	@ value of IP in TPIDRURW
> +	@
> +
> +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> +	mov	ip, ip				@ mov expected by unwinder
> +	push	{fp, ip, lr, pc}		@ GCC flavor frame record
> +#else
> +	str	ip, [sp, #-8]!			@ store original SP
> +	push	{fpreg, lr}			@ Clang flavor frame record
> +#endif
> +UNWIND( ldr	ip, [r0, #4]	)		@ load exception LR
> +UNWIND( str	ip, [sp, #12]	)		@ store in the frame record
> +	mrc	p15, 0, ip, c13, c0, 2		@ reload IP
> +
> +	@ Store the original GPRs to the new stack.
> +	svc_entry uaccess=0, overflow_check=0
> +
> +UNWIND( .save   {sp, pc}	)
> +UNWIND( .save   {fpreg, lr}	)
> +UNWIND( .setfp  fpreg, sp	)
> +
> +	ldr	fpreg, [sp, #S_SP]		@ Add our frame record
> +						@ to the linked list
> +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> +	ldr	r1, [fp, #4]			@ reload SP at entry
> +	add	fp, fp, #12
> +#else
> +	ldr	r1, [fpreg, #8]
> +#endif
> +	str	r1, [sp, #S_SP]			@ store in pt_regs
> +
> +	@ Stash the regs for handle_bad_stack
> +	mov	r0, sp
> +
> +	@ Time to die
> +	bl	handle_bad_stack
> +	nop
> +UNWIND( .fnend			)
> +ENDPROC(__bad_stack)
> +#endif
> +
>  	__INIT
>  
>  /*
> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
> index ae24dd54e9ef..823dd1aa6e3e 100644
> --- a/arch/arm/kernel/entry-header.S
> +++ b/arch/arm/kernel/entry-header.S
> @@ -423,3 +423,60 @@ scno	.req	r7		@ syscall number
>  tbl	.req	r8		@ syscall table pointer
>  why	.req	r8		@ Linux syscall (!= 0)
>  tsk	.req	r9		@ current thread_info
> +
> +	.macro	do_overflow_check, frame_size:req
> +#ifdef CONFIG_VMAP_STACK
> +	@
> +	@ Test whether the SP has overflowed. Task and IRQ stacks are aligned
> +	@ so that SP & BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) should always be
> +	@ zero.
> +	@
> +ARM(	tst	sp, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
> +THUMB(	tst	r0, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
> +THUMB(	it	ne						)
> +	bne	.Lstack_overflow_check\@
> +
> +	.pushsection	.text
> +.Lstack_overflow_check\@:
> +	@
> +	@ Either we've just detected an overflow, or we've taken an exception
> +	@ while on the overflow stack. We cannot use the stack until we have
> +	@ decided which is the case. However, as we won't return to userspace,
> +	@ we can clobber some USR/SYS mode registers to free up GPRs.
> +	@
> +
> +	mcr	p15, 0, ip, c13, c0, 2		@ Stash IP in TPIDRURW
> +	mrs	ip, cpsr
> +	eor	ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> +	msr	cpsr_c, ip			@ Switch to SYS mode
> +	eor	ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> +	mov	sp, ip				@ Stash mode in SP_usr
> +
> +	@ Load the overflow stack into IP using LR_usr as a scratch register
> +	mov_l	lr, overflow_stack + OVERFLOW_STACK_SIZE
> +	mrc	p15, 0, ip, c13, c0, 4		@ Get CPU offset
> +	add	ip, ip, lr			@ IP := this CPU's overflow stack
> +	mov	lr, sp				@ Unstash mode into LR_usr
> +	msr	cpsr_c, lr			@ Switch back to SVC mode
> +
> +	@
> +	@ Check whether we are already on the overflow stack. This may happen,
> +	@ e.g., when performing accesses that may fault when dumping the stack.
> +	@ The overflow stack is not in the vmalloc space so we only need to
> +	@ check whether the incoming SP is below the top of the overflow stack.
> +	@
> +ARM(	subs	ip, sp, ip		)	@ Delta with top of overflow stack
> +THUMB(	subs	ip, r0, ip		)
> +	mrclo	p15, 0, ip, c13, c0, 2		@ Restore IP
> +	blo	.Lout\@				@ Carry on
> +
> +THUMB(	sub	r0, sp, r0		)	@ Restore original R0
> +THUMB(	sub	sp, r0			)	@ Restore original SP
> +	sub	sp, sp, ip			@ Switch to overflow stack
> +	add	ip, sp, ip			@ Keep incoming SP value in IP
> +	add	ip, ip, #\frame_size		@ Undo svc_entry's SP change
> +	b	__bad_stack
> +	.popsection
> +.Lout\@:
> +#endif
> +	.endm
> diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> index e05219bca218..5deb40f39999 100644
> --- a/arch/arm/kernel/irq.c
> +++ b/arch/arm/kernel/irq.c
> @@ -56,7 +56,14 @@ static void __init init_irq_stacks(void)
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu) {
> -		stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
> +		if (!IS_ENABLED(CONFIG_VMAP_STACK))
> +			stack = (u8 *)__get_free_pages(GFP_KERNEL,
> +						       THREAD_SIZE_ORDER);
> +		else
> +			stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
> +					       THREADINFO_GFP, NUMA_NO_NODE,
> +					       __builtin_return_address(0));
> +
>  		if (WARN_ON(!stack))
>  			break;
>  		per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index b42c446cec9a..eb8c73be7c81 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -121,7 +121,8 @@ void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
>  static int verify_stack(unsigned long sp)
>  {
>  	if (sp < PAGE_OFFSET ||
> -	    (sp > (unsigned long)high_memory && high_memory != NULL))
> +	    (!IS_ENABLED(CONFIG_VMAP_STACK) &&
> +	     sp > (unsigned long)high_memory && high_memory != NULL))
>  		return -EFAULT;
>  
>  	return 0;
> @@ -291,7 +292,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
>  
>  	if (!user_mode(regs) || in_interrupt()) {
>  		dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
> -			 ALIGN(regs->ARM_sp, THREAD_SIZE));
> +			 ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
> +			 + THREAD_SIZE);
>  		dump_backtrace(regs, tsk, KERN_EMERG);
>  		dump_instr(KERN_EMERG, regs);
>  	}
> @@ -838,3 +840,62 @@ void __init early_trap_init(void *vectors_base)
>  	 */
>  #endif
>  }
> +
> +#ifdef CONFIG_VMAP_STACK
> +
> +DECLARE_PER_CPU(u8 *, irq_stack_ptr);
> +
> +asmlinkage DEFINE_PER_CPU_ALIGNED(u8[OVERFLOW_STACK_SIZE], overflow_stack);
> +
> +asmlinkage void handle_bad_stack(struct pt_regs *regs)
> +{
> +	unsigned long tsk_stk = (unsigned long)current->stack;
> +	unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
> +	unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> +
> +	console_verbose();
> +	pr_emerg("Insufficient stack space to handle exception!");
> +
> +	pr_emerg("Task stack:     [0x%08lx..0x%08lx]\n",
> +		 tsk_stk, tsk_stk + THREAD_SIZE);
> +	pr_emerg("IRQ stack:      [0x%08lx..0x%08lx]\n",
> +		 irq_stk, irq_stk + THREAD_SIZE);
> +	pr_emerg("Overflow stack: [0x%08lx..0x%08lx]\n",
> +		 ovf_stk, ovf_stk + OVERFLOW_STACK_SIZE);
> +
> +	die("kernel stack overflow", regs, 0);
> +}
> +
> +/*
> + * Normally, we rely on the logic in do_translation_fault() to update stale PMD
> + * entries covering the vmalloc space in a task's page tables when it first
> + * accesses the region in question. Unfortunately, this is not sufficient when
> + * the task stack resides in the vmalloc region, as do_translation_fault() is a
> + * C function that needs a stack to run.
> + *
> + * So we need to ensure that these PMD entries are up to date *before* the MM
> + * switch. As we already have some logic in the MM switch path that takes care
> + * of this, let's trigger it by bumping the counter every time the core vmalloc
> + * code modifies a PMD entry in the vmalloc region.
> + */
> +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> +{
> +	if (start > VMALLOC_END || end < VMALLOC_START)
> +		return;
> +
> +	/*
> +	 * This hooks into the core vmalloc code to receive notifications of
> +	 * any PMD level changes that have been made to the kernel page tables.
> +	 * This means it should only be triggered once for every MiB worth of
> +	 * vmalloc space, given that we don't support huge vmalloc/vmap on ARM,
> +	 * and that kernel PMD level table entries are rarely (if ever)
> +	 * updated.
> +	 *
> +	 * This means that the counter is going to max out at ~250 for the
> +	 * typical case. If it overflows, something entirely unexpected has
> +	 * occurred so let's throw a warning if that happens.
> +	 */
> +	WARN_ON(++init_mm.context.vmalloc_seq == UINT_MAX);
> +}
> +
> +#endif
> diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> index e8d729975f12..c5ea328c428d 100644
> --- a/arch/arm/kernel/unwind.c
> +++ b/arch/arm/kernel/unwind.c
> @@ -389,7 +389,8 @@ int unwind_frame(struct stackframe *frame)
>  
>  	/* store the highest address on the stack to avoid crossing it */
>  	ctrl.sp_low = frame->sp;
> -	ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
> +	ctrl.sp_high = ALIGN(ctrl.sp_low - THREAD_SIZE, THREAD_ALIGN)
> +		       + THREAD_SIZE;
>  
>  	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
>  		 frame->pc, frame->lr, frame->sp);
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index f02d617e3359..aa12b65a7fd6 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -138,12 +138,12 @@ SECTIONS
>  #ifdef CONFIG_STRICT_KERNEL_RWX
>  	. = ALIGN(1<<SECTION_SHIFT);
>  #else
> -	. = ALIGN(THREAD_SIZE);
> +	. = ALIGN(THREAD_ALIGN);
>  #endif
>  	__init_end = .;
>  
>  	_sdata = .;
> -	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> +	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
>  	_edata = .;
>  
>  	BSS_SECTION(0, 0, 0)
> 

Git bisection log:

-------------------------------------------------------------------------------
git bisect start
# good: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
git bisect good fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
# bad: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
git bisect bad d1eccc4f44f11a8f3f5d376f08e3779d2196f93a
# good: [6bdd12e7427858bd0f1a68f553617fb1d97dbabb] ARM: call_with_stack: add unwind support
git bisect good 6bdd12e7427858bd0f1a68f553617fb1d97dbabb
# good: [fcc7029189cce80c2eac396ce4cd4544634d46e3] ARM: memset: clean up unwind annotations
git bisect good fcc7029189cce80c2eac396ce4cd4544634d46e3
# good: [e206f1841c51d2b3d8339efac3ac806f32d64821] ARM: switch_to: clean up Thumb2 code path
git bisect good e206f1841c51d2b3d8339efac3ac806f32d64821
# good: [a7c9e1b40e858eba7ff99a548153fd5c92b68e24] ARM: entry: rework stack realignment code in svc_entry
git bisect good a7c9e1b40e858eba7ff99a548153fd5c92b68e24
# first bad commit: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
-------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-16  9:22     ` Guillaume Tucker
@ 2021-11-16 19:28       ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-16 19:28 UTC (permalink / raw)
  To: Guillaume Tucker, Tony Lindgren, linux-omap
  Cc: Linux ARM, Russell King, Nicolas Pitre, Arnd Bergmann, Kees Cook,
	Keith Packard, Linus Walleij, Nick Desaulniers, kernelci

(+ Tony and linux-omap@)

On Tue, 16 Nov 2021 at 10:23, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> Hi Ard,
>
> Please see the bisection report below about a boot failure on
> omap4-panda which is pointing to this patch.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
>
> Some more details can be found here:
>
>   https://linux.kernelci.org/test/case/id/6191b1b97c175a5ade335948/
>
> It seems like the kernel just froze after about 3 seconds without
> any obvious errors in the log.
>
> Please let us know if you need any help debugging this issue or
> if you have a fix to try.
>

Thanks for the report.

I wonder if this might be related to low-level platform code running
off a different stack (maybe in SRAM?) when an interrupt is taken, or
using a different set of page tables that are out of sync in terms of
VMALLOC space mappings?
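
For reference, the series keeps the vmalloc PMD entries in a task's
page tables fresh via a generation counter: the core vmalloc code bumps
init_mm.context.vmalloc_seq in arch_sync_kernel_mappings(), and the MM
switch path copies the kernel PMDs when it sees a stale count. A rough
C sketch of the idea, with made-up names rather than the actual kernel
code:

	struct mm_ctx {
		unsigned int vmalloc_seq;	/* last generation copied */
	};

	static unsigned int kernel_vmalloc_seq;	/* bumped on vmalloc PMD change */

	static void sync_vmalloc_pmds(struct mm_ctx *mm)
	{
		if (mm->vmalloc_seq != kernel_vmalloc_seq) {
			mm->vmalloc_seq = kernel_vmalloc_seq;
			/* copy the kernel PMDs covering VMALLOC_START..
			   VMALLOC_END into this mm's page tables */
		}
	}

A private set of translation tables that never goes through such a sync
would be left with stale vmalloc mappings indefinitely.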

Could anyone who speaks OMAP please take a look at the linked boot
log, and hopefully make sense of it?

For background, this series enables vmap'ed stacks support for ARMv7,
which means that the entry code checks whether the stack pointer may
be pointing into the guard region before the vmalloc'ed stack, and
kills the task if it looks like the kernel stack overflowed.
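
In rough C terms (the real check is the single TST plus branch in
svc_entry; the function name and config values below are illustrative
only), the overflow test amounts to:

	#include <stdbool.h>
	#include <stdint.h>

	#define PAGE_SHIFT		12	/* assuming 4 KiB pages */
	#define THREAD_SIZE_ORDER	1	/* assuming 8 KiB kernel stacks */
	#define THREAD_SIZE		(1UL << (THREAD_SIZE_ORDER + PAGE_SHIFT))

	/*
	 * Stacks are THREAD_SIZE bytes and aligned to 2 * THREAD_SIZE, so
	 * any SP inside a valid stack has the THREAD_SIZE bit clear; once
	 * SP drops into the guard area just below the stack, that bit
	 * reads as set.
	 */
	static bool sp_overflowed(uintptr_t sp)
	{
		return (sp & THREAD_SIZE) != 0;
	}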

Here's another instance:
https://linux.kernelci.org/build/id/6193fa5c6c4e1d02bd3358ff/

Everything builds and boots happily, but odd things happen on
OMAP-based devices: Panda just gives up right after discovering the
USB controller, and Beagle-XM just starts showing all kinds of weird
crashes at roughly the same point in the boot.

>
> GitHub: https://github.com/kernelci/kernelci-project/issues/75
>
> -------------------------------------------------------------------------------
>
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> ardb/for-kernelci bisection: baseline.login on panda
>
> Summary:
>   Start:      d1eccc4f44f11 ARM: implement support for vmap'ed stacks
>   Plain log:  https://storage.kernelci.org/ardb/for-kernelci/v5.16-rc1-16-gd1eccc4f44f1/arm/multi_v7_defconfig+crypto/gcc-10/lab-collabora/baseline-panda.txt
>   HTML log:   https://storage.kernelci.org/ardb/for-kernelci/v5.16-rc1-16-gd1eccc4f44f1/arm/multi_v7_defconfig+crypto/gcc-10/lab-collabora/baseline-panda.html
>   Result:     d1eccc4f44f11 ARM: implement support for vmap'ed stacks
>
> Checks:
>   revert:     PASS
>   verify:     PASS
>
> Parameters:
>   Tree:       ardb
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git
>   Branch:     for-kernelci
>   Target:     panda
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-10
>   Config:     multi_v7_defconfig+crypto
>   Test case:  baseline.login
>
> Breaking commit found:
>
> -------------------------------------------------------------------------------
> commit d1eccc4f44f11a8f3f5d376f08e3779d2196f93a
> Author: Ard Biesheuvel <ardb@kernel.org>
> Date:   Thu Sep 23 09:15:53 2021 +0200
>
>     ARM: implement support for vmap'ed stacks
>
> On 15/11/2021 11:18, Ard Biesheuvel wrote:
> > Wire up the generic support for managing task stack allocations via vmalloc,
> > and implement the entry code that detects whether we faulted because of a
> > stack overrun (or future stack overrun caused by pushing the pt_regs array)
> >
> > While this adds a fair amount of tricky entry asm code, it should be
> > noted that it only adds a TST + branch to the svc_entry path. The code
> > implementing the non-trivial handling of the overflow stack is emitted
> > out-of-line into the .text section.
> >
> > Since on ARM, we rely on do_translation_fault() to keep PMD level page
> > table entries that cover the vmalloc region up to date, we need to
> > ensure that we don't hit such a stale PMD entry when accessing the
> > stack. So we do a dummy read from the new stack while still running from
> > the old one on the context switch path, and bump the vmalloc_seq counter
> > when PMD level entries in the vmalloc range are modified, so that the MM
> > switch fetches the latest version of the entries.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Tested-by: Keith Packard <keithpac@amazon.com>
> > ---
> >  arch/arm/Kconfig                   |  1 +
> >  arch/arm/include/asm/page.h        |  4 +
> >  arch/arm/include/asm/thread_info.h |  8 ++
> >  arch/arm/kernel/entry-armv.S       | 79 ++++++++++++++++++--
> >  arch/arm/kernel/entry-header.S     | 57 ++++++++++++++
> >  arch/arm/kernel/irq.c              |  9 ++-
> >  arch/arm/kernel/traps.c            | 65 +++++++++++++++-
> >  arch/arm/kernel/unwind.c           |  3 +-
> >  arch/arm/kernel/vmlinux.lds.S      |  4 +-
> >  9 files changed, 219 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index b1eba1b4168c..a072600527ca 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -127,6 +127,7 @@ config ARM
> >       select RTC_LIB
> >       select SYS_SUPPORTS_APM_EMULATION
> >       select THREAD_INFO_IN_TASK if CURRENT_POINTER_IN_TPIDRURO
> > +     select HAVE_ARCH_VMAP_STACK if THREAD_INFO_IN_TASK
> >       select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
> >       # Above selects are sorted alphabetically; please add new ones
> >       # according to that.  Thanks.
> > diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> > index 11b058a72a5b..7b871ed99ccf 100644
> > --- a/arch/arm/include/asm/page.h
> > +++ b/arch/arm/include/asm/page.h
> > @@ -149,6 +149,10 @@ extern void copy_page(void *to, const void *from);
> >  #include <asm/pgtable-2level-types.h>
> >  #endif
> >
> > +#ifdef CONFIG_VMAP_STACK
> > +#define ARCH_PAGE_TABLE_SYNC_MASK    PGTBL_PMD_MODIFIED
> > +#endif
> > +
> >  #endif /* CONFIG_MMU */
> >
> >  typedef struct page *pgtable_t;
> > diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
> > index 164e15f26485..004b89d86224 100644
> > --- a/arch/arm/include/asm/thread_info.h
> > +++ b/arch/arm/include/asm/thread_info.h
> > @@ -25,6 +25,14 @@
> >  #define THREAD_SIZE          (PAGE_SIZE << THREAD_SIZE_ORDER)
> >  #define THREAD_START_SP              (THREAD_SIZE - 8)
> >
> > +#ifdef CONFIG_VMAP_STACK
> > +#define THREAD_ALIGN         (2 * THREAD_SIZE)
> > +#else
> > +#define THREAD_ALIGN         THREAD_SIZE
> > +#endif
> > +
> > +#define OVERFLOW_STACK_SIZE  SZ_4K
> > +
> >  #ifndef __ASSEMBLY__
> >
> >  struct task_struct;
> > diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> > index b18f3aa98f42..ad8d8304539e 100644
> > --- a/arch/arm/kernel/entry-armv.S
> > +++ b/arch/arm/kernel/entry-armv.S
> > @@ -57,6 +57,14 @@ UNWIND(    .setfp  fpreg, sp               )
> >       @
> >       subs    r2, sp, r0              @ SP above bottom of IRQ stack?
> >       rsbscs  r2, r2, #THREAD_SIZE    @ ... and below the top?
> > +#ifdef CONFIG_VMAP_STACK
> > +     bcs     .L\@
> > +     mov_l   r2, overflow_stack      @ Take base address
> > +     add     r2, r2, r3              @ Top of this CPU's overflow stack
> > +     subs    r2, r0, r2              @ Compare with incoming SP
> > +     rsbscs  r2, r2, #OVERFLOW_STACK_SIZE
> > +.L\@:
> > +#endif
> >       movcs   sp, r0                  @ If so, revert to incoming SP
> >
> >  #ifndef CONFIG_UNWINDER_ARM
> > @@ -188,13 +196,18 @@ ENDPROC(__und_invalid)
> >  #define SPFIX(code...)
> >  #endif
> >
> > -     .macro  svc_entry, stack_hole=0, trace=1, uaccess=1
> > +     .macro  svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
> >   UNWIND(.fnstart             )
> > - UNWIND(.save {r0 - pc}              )
> >       sub     sp, sp, #(SVC_REGS_SIZE + \stack_hole)
> > + THUMB(      add     sp, r0          )       @ get SP in a GPR without
> > + THUMB(      sub     r0, sp, r0      )       @ using a temp register
> > +
> > +     .if     \overflow_check
> > + UNWIND(.save        {r0 - pc}       )
> > +     do_overflow_check (SVC_REGS_SIZE + \stack_hole)
> > +     .endif
> > +
> >  #ifdef CONFIG_THUMB2_KERNEL
> > -     add     sp, r0                  @ get SP in a GPR without
> > -     sub     r0, sp, r0              @ using a temp register
> >       tst     r0, #4                  @ test stack pointer alignment
> >       sub     r0, sp, r0              @ restore original R0
> >       sub     sp, r0                  @ restore original SP
> > @@ -827,12 +840,20 @@ ENTRY(__switch_to)
> >       str     r7, [r8]
> >  #endif
> >       mov     r0, r5
> > -#if !defined(CONFIG_THUMB2_KERNEL)
> > +#if !defined(CONFIG_THUMB2_KERNEL) && !defined(CONFIG_VMAP_STACK)
> >       set_current r7
> >       ldmia   r4, {r4 - sl, fp, sp, pc}       @ Load all regs saved previously
> >  #else
> >       mov     r1, r7
> >       ldmia   r4, {r4 - sl, fp, ip, lr}       @ Load all regs saved previously
> > +#ifdef CONFIG_VMAP_STACK
> > +     @
> > +     @ Do a dummy read from the new stack while running from the old one so
> > +     @ that we can rely on do_translation_fault() to fix up any stale PMD
> > +     @ entries covering the vmalloc region.
> > +     @
> > +     ldr     r2, [ip]
> > +#endif
> >
> >       @ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
> >       @ effectuates the task switch, as that is what causes the observable
> > @@ -849,6 +870,54 @@ ENTRY(__switch_to)
> >   UNWIND(.fnend               )
> >  ENDPROC(__switch_to)
> >
> > +#ifdef CONFIG_VMAP_STACK
> > +     .text
> > +__bad_stack:
> > +     @
> > +     @ We detected an overflow in svc_entry, which switched to the
> > +     @ overflow stack. Stash the exception regs, and head to our overflow
> > +     @ handler. Entered with the original value of SP in IP, and the original
> > +     @ value of IP in TPIDRURW
> > +     @
> > +
> > +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> > +     mov     ip, ip                          @ mov expected by unwinder
> > +     push    {fp, ip, lr, pc}                @ GCC flavor frame record
> > +#else
> > +     str     ip, [sp, #-8]!                  @ store original SP
> > +     push    {fpreg, lr}                     @ Clang flavor frame record
> > +#endif
> > +UNWIND( ldr  ip, [r0, #4]    )               @ load exception LR
> > +UNWIND( str  ip, [sp, #12]   )               @ store in the frame record
> > +     mrc     p15, 0, ip, c13, c0, 2          @ reload IP
> > +
> > +     @ Store the original GPRs to the new stack.
> > +     svc_entry uaccess=0, overflow_check=0
> > +
> > +UNWIND( .save   {sp, pc}     )
> > +UNWIND( .save   {fpreg, lr}  )
> > +UNWIND( .setfp  fpreg, sp    )
> > +
> > +     ldr     fpreg, [sp, #S_SP]              @ Add our frame record
> > +                                             @ to the linked list
> > +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> > +     ldr     r1, [fp, #4]                    @ reload SP at entry
> > +     add     fp, fp, #12
> > +#else
> > +     ldr     r1, [fpreg, #8]
> > +#endif
> > +     str     r1, [sp, #S_SP]                 @ store in pt_regs
> > +
> > +     @ Stash the regs for handle_bad_stack
> > +     mov     r0, sp
> > +
> > +     @ Time to die
> > +     bl      handle_bad_stack
> > +     nop
> > +UNWIND( .fnend                       )
> > +ENDPROC(__bad_stack)
> > +#endif
> > +
> >       __INIT
> >
> >  /*
> > diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
> > index ae24dd54e9ef..823dd1aa6e3e 100644
> > --- a/arch/arm/kernel/entry-header.S
> > +++ b/arch/arm/kernel/entry-header.S
> > @@ -423,3 +423,60 @@ scno     .req    r7              @ syscall number
> >  tbl  .req    r8              @ syscall table pointer
> >  why  .req    r8              @ Linux syscall (!= 0)
> >  tsk  .req    r9              @ current thread_info
> > +
> > +     .macro  do_overflow_check, frame_size:req
> > +#ifdef CONFIG_VMAP_STACK
> > +     @
> > +     @ Test whether the SP has overflowed. Task and IRQ stacks are aligned
> > +     @ so that SP & BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) should always be
> > +     @ zero.
> > +     @
> > +ARM( tst     sp, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)      )
> > +THUMB(       tst     r0, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)      )
> > +THUMB(       it      ne                                              )
> > +     bne     .Lstack_overflow_check\@
> > +
> > +     .pushsection    .text
> > +.Lstack_overflow_check\@:
> > +     @
> > +     @ Either we've just detected an overflow, or we've taken an exception
> > +     @ while on the overflow stack. We cannot use the stack until we have
> > +     @ decided which is the case. However, as we won't return to userspace,
> > +     @ we can clobber some USR/SYS mode registers to free up GPRs.
> > +     @
> > +
> > +     mcr     p15, 0, ip, c13, c0, 2          @ Stash IP in TPIDRURW
> > +     mrs     ip, cpsr
> > +     eor     ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> > +     msr     cpsr_c, ip                      @ Switch to SYS mode
> > +     eor     ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> > +     mov     sp, ip                          @ Stash mode in SP_usr
> > +
> > +     @ Load the overflow stack into IP using LR_usr as a scratch register
> > +     mov_l   lr, overflow_stack + OVERFLOW_STACK_SIZE
> > +     mrc     p15, 0, ip, c13, c0, 4          @ Get CPU offset
> > +     add     ip, ip, lr                      @ IP := this CPU's overflow stack
> > +     mov     lr, sp                          @ Unstash mode into LR_usr
> > +     msr     cpsr_c, lr                      @ Switch back to SVC mode
> > +
> > +     @
> > +     @ Check whether we are already on the overflow stack. This may happen,
> > +     @ e.g., when performing accesses that may fault when dumping the stack.
> > +     @ The overflow stack is not in the vmalloc space so we only need to
> > +     @ check whether the incoming SP is below the top of the overflow stack.
> > +     @
> > +ARM( subs    ip, sp, ip              )       @ Delta with top of overflow stack
> > +THUMB(       subs    ip, r0, ip              )
> > +     mrclo   p15, 0, ip, c13, c0, 2          @ Restore IP
> > +     blo     .Lout\@                         @ Carry on
> > +
> > +THUMB(       sub     r0, sp, r0              )       @ Restore original R0
> > +THUMB(       sub     sp, r0                  )       @ Restore original SP
> > +     sub     sp, sp, ip                      @ Switch to overflow stack
> > +     add     ip, sp, ip                      @ Keep incoming SP value in IP
> > +     add     ip, ip, #\frame_size            @ Undo svc_entry's SP change
> > +     b       __bad_stack
> > +     .popsection
> > +.Lout\@:
> > +#endif
> > +     .endm
> > diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> > index e05219bca218..5deb40f39999 100644
> > --- a/arch/arm/kernel/irq.c
> > +++ b/arch/arm/kernel/irq.c
> > @@ -56,7 +56,14 @@ static void __init init_irq_stacks(void)
> >       int cpu;
> >
> >       for_each_possible_cpu(cpu) {
> > -             stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
> > +             if (!IS_ENABLED(CONFIG_VMAP_STACK))
> > +                     stack = (u8 *)__get_free_pages(GFP_KERNEL,
> > +                                                    THREAD_SIZE_ORDER);
> > +             else
> > +                     stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
> > +                                            THREADINFO_GFP, NUMA_NO_NODE,
> > +                                            __builtin_return_address(0));
> > +
> >               if (WARN_ON(!stack))
> >                       break;
> >               per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
> > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > index b42c446cec9a..eb8c73be7c81 100644
> > --- a/arch/arm/kernel/traps.c
> > +++ b/arch/arm/kernel/traps.c
> > @@ -121,7 +121,8 @@ void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
> >  static int verify_stack(unsigned long sp)
> >  {
> >       if (sp < PAGE_OFFSET ||
> > -         (sp > (unsigned long)high_memory && high_memory != NULL))
> > +         (!IS_ENABLED(CONFIG_VMAP_STACK) &&
> > +          sp > (unsigned long)high_memory && high_memory != NULL))
> >               return -EFAULT;
> >
> >       return 0;
> > @@ -291,7 +292,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
> >
> >       if (!user_mode(regs) || in_interrupt()) {
> >               dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
> > -                      ALIGN(regs->ARM_sp, THREAD_SIZE));
> > +                      ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
> > +                      + THREAD_SIZE);
> >               dump_backtrace(regs, tsk, KERN_EMERG);
> >               dump_instr(KERN_EMERG, regs);
> >       }
> > @@ -838,3 +840,62 @@ void __init early_trap_init(void *vectors_base)
> >        */
> >  #endif
> >  }
> > +
> > +#ifdef CONFIG_VMAP_STACK
> > +
> > +DECLARE_PER_CPU(u8 *, irq_stack_ptr);
> > +
> > +asmlinkage DEFINE_PER_CPU_ALIGNED(u8[OVERFLOW_STACK_SIZE], overflow_stack);
> > +
> > +asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > +{
> > +     unsigned long tsk_stk = (unsigned long)current->stack;
> > +     unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
> > +     unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > +
> > +     console_verbose();
> > +     pr_emerg("Insufficient stack space to handle exception!");
> > +
> > +     pr_emerg("Task stack:     [0x%08lx..0x%08lx]\n",
> > +              tsk_stk, tsk_stk + THREAD_SIZE);
> > +     pr_emerg("IRQ stack:      [0x%08lx..0x%08lx]\n",
> > +              irq_stk, irq_stk + THREAD_SIZE);
> > +     pr_emerg("Overflow stack: [0x%08lx..0x%08lx]\n",
> > +              ovf_stk, ovf_stk + OVERFLOW_STACK_SIZE);
> > +
> > +     die("kernel stack overflow", regs, 0);
> > +}
> > +
> > +/*
> > + * Normally, we rely on the logic in do_translation_fault() to update stale PMD
> > + * entries covering the vmalloc space in a task's page tables when it first
> > + * accesses the region in question. Unfortunately, this is not sufficient when
> > + * the task stack resides in the vmalloc region, as do_translation_fault() is a
> > + * C function that needs a stack to run.
> > + *
> > + * So we need to ensure that these PMD entries are up to date *before* the MM
> > + * switch. As we already have some logic in the MM switch path that takes care
> > + * of this, let's trigger it by bumping the counter every time the core vmalloc
> > + * code modifies a PMD entry in the vmalloc region.
> > + */
> > +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> > +{
> > +     if (start > VMALLOC_END || end < VMALLOC_START)
> > +             return;
> > +
> > +     /*
> > +      * This hooks into the core vmalloc code to receive notifications of
> > +      * any PMD level changes that have been made to the kernel page tables.
> > +      * This means it should only be triggered once for every MiB worth of
> > +      * vmalloc space, given that we don't support huge vmalloc/vmap on ARM,
> > +      * and that kernel PMD level table entries are rarely (if ever)
> > +      * updated.
> > +      *
> > +      * This means that the counter is going to max out at ~250 for the
> > +      * typical case. If it overflows, something entirely unexpected has
> > +      * occurred so let's throw a warning if that happens.
> > +      */
> > +     WARN_ON(++init_mm.context.vmalloc_seq == UINT_MAX);
> > +}
> > +
> > +#endif
> > diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> > index e8d729975f12..c5ea328c428d 100644
> > --- a/arch/arm/kernel/unwind.c
> > +++ b/arch/arm/kernel/unwind.c
> > @@ -389,7 +389,8 @@ int unwind_frame(struct stackframe *frame)
> >
> >       /* store the highest address on the stack to avoid crossing it */
> >       ctrl.sp_low = frame->sp;
> > -     ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
> > +     ctrl.sp_high = ALIGN(ctrl.sp_low - THREAD_SIZE, THREAD_ALIGN)
> > +                    + THREAD_SIZE;
> >
> >       pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
> >                frame->pc, frame->lr, frame->sp);
> > diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> > index f02d617e3359..aa12b65a7fd6 100644
> > --- a/arch/arm/kernel/vmlinux.lds.S
> > +++ b/arch/arm/kernel/vmlinux.lds.S
> > @@ -138,12 +138,12 @@ SECTIONS
> >  #ifdef CONFIG_STRICT_KERNEL_RWX
> >       . = ALIGN(1<<SECTION_SHIFT);
> >  #else
> > -     . = ALIGN(THREAD_SIZE);
> > +     . = ALIGN(THREAD_ALIGN);
> >  #endif
> >       __init_end = .;
> >
> >       _sdata = .;
> > -     RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> > +     RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
> >       _edata = .;
> >
> >       BSS_SECTION(0, 0, 0)
> >
>
> Git bisection log:
>
> -------------------------------------------------------------------------------
> git bisect start
> # good: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
> git bisect good fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
> # bad: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
> git bisect bad d1eccc4f44f11a8f3f5d376f08e3779d2196f93a
> # good: [6bdd12e7427858bd0f1a68f553617fb1d97dbabb] ARM: call_with_stack: add unwind support
> git bisect good 6bdd12e7427858bd0f1a68f553617fb1d97dbabb
> # good: [fcc7029189cce80c2eac396ce4cd4544634d46e3] ARM: memset: clean up unwind annotations
> git bisect good fcc7029189cce80c2eac396ce4cd4544634d46e3
> # good: [e206f1841c51d2b3d8339efac3ac806f32d64821] ARM: switch_to: clean up Thumb2 code path
> git bisect good e206f1841c51d2b3d8339efac3ac806f32d64821
> # good: [a7c9e1b40e858eba7ff99a548153fd5c92b68e24] ARM: entry: rework stack realignment code in svc_entry
> git bisect good a7c9e1b40e858eba7ff99a548153fd5c92b68e24
> # first bad commit: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
> -------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 36+ messages in thread

> >
> > +#ifdef CONFIG_VMAP_STACK
> > +#define THREAD_ALIGN         (2 * THREAD_SIZE)
> > +#else
> > +#define THREAD_ALIGN         THREAD_SIZE
> > +#endif
> > +
> > +#define OVERFLOW_STACK_SIZE  SZ_4K
> > +
> >  #ifndef __ASSEMBLY__
> >
> >  struct task_struct;
> > diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> > index b18f3aa98f42..ad8d8304539e 100644
> > --- a/arch/arm/kernel/entry-armv.S
> > +++ b/arch/arm/kernel/entry-armv.S
> > @@ -57,6 +57,14 @@ UNWIND(    .setfp  fpreg, sp               )
> >       @
> >       subs    r2, sp, r0              @ SP above bottom of IRQ stack?
> >       rsbscs  r2, r2, #THREAD_SIZE    @ ... and below the top?
> > +#ifdef CONFIG_VMAP_STACK
> > +     bcs     .L\@
> > +     mov_l   r2, overflow_stack      @ Take base address
> > +     add     r2, r2, r3              @ Top of this CPU's overflow stack
> > +     subs    r2, r0, r2              @ Compare with incoming SP
> > +     rsbscs  r2, r2, #OVERFLOW_STACK_SIZE
> > +.L\@:
> > +#endif
> >       movcs   sp, r0                  @ If so, revert to incoming SP
> >
> >  #ifndef CONFIG_UNWINDER_ARM
> > @@ -188,13 +196,18 @@ ENDPROC(__und_invalid)
> >  #define SPFIX(code...)
> >  #endif
> >
> > -     .macro  svc_entry, stack_hole=0, trace=1, uaccess=1
> > +     .macro  svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
> >   UNWIND(.fnstart             )
> > - UNWIND(.save {r0 - pc}              )
> >       sub     sp, sp, #(SVC_REGS_SIZE + \stack_hole)
> > + THUMB(      add     sp, r0          )       @ get SP in a GPR without
> > + THUMB(      sub     r0, sp, r0      )       @ using a temp register
> > +
> > +     .if     \overflow_check
> > + UNWIND(.save        {r0 - pc}       )
> > +     do_overflow_check (SVC_REGS_SIZE + \stack_hole)
> > +     .endif
> > +
> >  #ifdef CONFIG_THUMB2_KERNEL
> > -     add     sp, r0                  @ get SP in a GPR without
> > -     sub     r0, sp, r0              @ using a temp register
> >       tst     r0, #4                  @ test stack pointer alignment
> >       sub     r0, sp, r0              @ restore original R0
> >       sub     sp, r0                  @ restore original SP
> > @@ -827,12 +840,20 @@ ENTRY(__switch_to)
> >       str     r7, [r8]
> >  #endif
> >       mov     r0, r5
> > -#if !defined(CONFIG_THUMB2_KERNEL)
> > +#if !defined(CONFIG_THUMB2_KERNEL) && !defined(CONFIG_VMAP_STACK)
> >       set_current r7
> >       ldmia   r4, {r4 - sl, fp, sp, pc}       @ Load all regs saved previously
> >  #else
> >       mov     r1, r7
> >       ldmia   r4, {r4 - sl, fp, ip, lr}       @ Load all regs saved previously
> > +#ifdef CONFIG_VMAP_STACK
> > +     @
> > +     @ Do a dummy read from the new stack while running from the old one so
> > +     @ that we can rely on do_translation_fault() to fix up any stale PMD
> > +     @ entries covering the vmalloc region.
> > +     @
> > +     ldr     r2, [ip]
> > +#endif
> >
> >       @ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
> >       @ effectuates the task switch, as that is what causes the observable
> > @@ -849,6 +870,54 @@ ENTRY(__switch_to)
> >   UNWIND(.fnend               )
> >  ENDPROC(__switch_to)
> >
> > +#ifdef CONFIG_VMAP_STACK
> > +     .text
> > +__bad_stack:
> > +     @
> > +     @ We detected an overflow in svc_entry, which switched to the
> > +     @ overflow stack. Stash the exception regs, and head to our overflow
> > +     @ handler. Entered with the orginal value of SP in IP, and the original
> > +     @ value of IP in TPIDRURW
> > +     @
> > +
> > +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> > +     mov     ip, ip                          @ mov expected by unwinder
> > +     push    {fp, ip, lr, pc}                @ GCC flavor frame record
> > +#else
> > +     str     ip, [sp, #-8]!                  @ store original SP
> > +     push    {fpreg, lr}                     @ Clang flavor frame record
> > +#endif
> > +UNWIND( ldr  ip, [r0, #4]    )               @ load exception LR
> > +UNWIND( str  ip, [sp, #12]   )               @ store in the frame record
> > +     mrc     p15, 0, ip, c13, c0, 2          @ reload IP
> > +
> > +     @ Store the original GPRs to the new stack.
> > +     svc_entry uaccess=0, overflow_check=0
> > +
> > +UNWIND( .save   {sp, pc}     )
> > +UNWIND( .save   {fpreg, lr}  )
> > +UNWIND( .setfp  fpreg, sp    )
> > +
> > +     ldr     fpreg, [sp, #S_SP]              @ Add our frame record
> > +                                             @ to the linked list
> > +#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
> > +     ldr     r1, [fp, #4]                    @ reload SP at entry
> > +     add     fp, fp, #12
> > +#else
> > +     ldr     r1, [fpreg, #8]
> > +#endif
> > +     str     r1, [sp, #S_SP]                 @ store in pt_regs
> > +
> > +     @ Stash the regs for handle_bad_stack
> > +     mov     r0, sp
> > +
> > +     @ Time to die
> > +     bl      handle_bad_stack
> > +     nop
> > +UNWIND( .fnend                       )
> > +ENDPROC(__bad_stack)
> > +#endif
> > +
> >       __INIT
> >
> >  /*
> > diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
> > index ae24dd54e9ef..823dd1aa6e3e 100644
> > --- a/arch/arm/kernel/entry-header.S
> > +++ b/arch/arm/kernel/entry-header.S
> > @@ -423,3 +423,60 @@ scno     .req    r7              @ syscall number
> >  tbl  .req    r8              @ syscall table pointer
> >  why  .req    r8              @ Linux syscall (!= 0)
> >  tsk  .req    r9              @ current thread_info
> > +
> > +     .macro  do_overflow_check, frame_size:req
> > +#ifdef CONFIG_VMAP_STACK
> > +     @
> > +     @ Test whether the SP has overflowed. Task and IRQ stacks are aligned
> > +     @ so that SP & BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) should always be
> > +     @ zero.
> > +     @
> > +ARM( tst     sp, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)      )
> > +THUMB(       tst     r0, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)      )
> > +THUMB(       it      ne                                              )
> > +     bne     .Lstack_overflow_check\@
> > +
> > +     .pushsection    .text
> > +.Lstack_overflow_check\@:
> > +     @
> > +     @ Either we've just detected an overflow, or we've taken an exception
> > +     @ while on the overflow stack. We cannot use the stack until we have
> > +     @ decided which is the case. However, as we won't return to userspace,
> > +     @ we can clobber some USR/SYS mode registers to free up GPRs.
> > +     @
> > +
> > +     mcr     p15, 0, ip, c13, c0, 2          @ Stash IP in TPIDRURW
> > +     mrs     ip, cpsr
> > +     eor     ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> > +     msr     cpsr_c, ip                      @ Switch to SYS mode
> > +     eor     ip, ip, #(SVC_MODE ^ SYSTEM_MODE)
> > +     mov     sp, ip                          @ Stash mode in SP_usr
> > +
> > +     @ Load the overflow stack into IP using LR_usr as a scratch register
> > +     mov_l   lr, overflow_stack + OVERFLOW_STACK_SIZE
> > +     mrc     p15, 0, ip, c13, c0, 4          @ Get CPU offset
> > +     add     ip, ip, lr                      @ IP := this CPU's overflow stack
> > +     mov     lr, sp                          @ Unstash mode into LR_usr
> > +     msr     cpsr_c, lr                      @ Switch back to SVC mode
> > +
> > +     @
> > +     @ Check whether we are already on the overflow stack. This may happen,
> > +     @ e.g., when performing accesses that may fault when dumping the stack.
> > +     @ The overflow stack is not in the vmalloc space so we only need to
> > +     @ check whether the incoming SP is below the top of the overflow stack.
> > +     @
> > +ARM( subs    ip, sp, ip              )       @ Delta with top of overflow stack
> > +THUMB(       subs    ip, r0, ip              )
> > +     mrclo   p15, 0, ip, c13, c0, 2          @ Restore IP
> > +     blo     .Lout\@                         @ Carry on
> > +
> > +THUMB(       sub     r0, sp, r0              )       @ Restore original R0
> > +THUMB(       sub     sp, r0                  )       @ Restore original SP
> > +     sub     sp, sp, ip                      @ Switch to overflow stack
> > +     add     ip, sp, ip                      @ Keep incoming SP value in IP
> > +     add     ip, ip, #\frame_size            @ Undo svc_entry's SP change
> > +     b       __bad_stack
> > +     .popsection
> > +.Lout\@:
> > +#endif
> > +     .endm
> > diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> > index e05219bca218..5deb40f39999 100644
> > --- a/arch/arm/kernel/irq.c
> > +++ b/arch/arm/kernel/irq.c
> > @@ -56,7 +56,14 @@ static void __init init_irq_stacks(void)
> >       int cpu;
> >
> >       for_each_possible_cpu(cpu) {
> > -             stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
> > +             if (!IS_ENABLED(CONFIG_VMAP_STACK))
> > +                     stack = (u8 *)__get_free_pages(GFP_KERNEL,
> > +                                                    THREAD_SIZE_ORDER);
> > +             else
> > +                     stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
> > +                                            THREADINFO_GFP, NUMA_NO_NODE,
> > +                                            __builtin_return_address(0));
> > +
> >               if (WARN_ON(!stack))
> >                       break;
> >               per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
> > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > index b42c446cec9a..eb8c73be7c81 100644
> > --- a/arch/arm/kernel/traps.c
> > +++ b/arch/arm/kernel/traps.c
> > @@ -121,7 +121,8 @@ void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
> >  static int verify_stack(unsigned long sp)
> >  {
> >       if (sp < PAGE_OFFSET ||
> > -         (sp > (unsigned long)high_memory && high_memory != NULL))
> > +         (!IS_ENABLED(CONFIG_VMAP_STACK) &&
> > +          sp > (unsigned long)high_memory && high_memory != NULL))
> >               return -EFAULT;
> >
> >       return 0;
> > @@ -291,7 +292,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
> >
> >       if (!user_mode(regs) || in_interrupt()) {
> >               dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
> > -                      ALIGN(regs->ARM_sp, THREAD_SIZE));
> > +                      ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
> > +                      + THREAD_SIZE);
> >               dump_backtrace(regs, tsk, KERN_EMERG);
> >               dump_instr(KERN_EMERG, regs);
> >       }
> > @@ -838,3 +840,62 @@ void __init early_trap_init(void *vectors_base)
> >        */
> >  #endif
> >  }
> > +
> > +#ifdef CONFIG_VMAP_STACK
> > +
> > +DECLARE_PER_CPU(u8 *, irq_stack_ptr);
> > +
> > +asmlinkage DEFINE_PER_CPU_ALIGNED(u8[OVERFLOW_STACK_SIZE], overflow_stack);
> > +
> > +asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > +{
> > +     unsigned long tsk_stk = (unsigned long)current->stack;
> > +     unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
> > +     unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > +
> > +     console_verbose();
> > +     pr_emerg("Insufficient stack space to handle exception!");
> > +
> > +     pr_emerg("Task stack:     [0x%08lx..0x%08lx]\n",
> > +              tsk_stk, tsk_stk + THREAD_SIZE);
> > +     pr_emerg("IRQ stack:      [0x%08lx..0x%08lx]\n",
> > +              irq_stk, irq_stk + THREAD_SIZE);
> > +     pr_emerg("Overflow stack: [0x%08lx..0x%08lx]\n",
> > +              ovf_stk, ovf_stk + OVERFLOW_STACK_SIZE);
> > +
> > +     die("kernel stack overflow", regs, 0);
> > +}
> > +
> > +/*
> > + * Normally, we rely on the logic in do_translation_fault() to update stale PMD
> > + * entries covering the vmalloc space in a task's page tables when it first
> > + * accesses the region in question. Unfortunately, this is not sufficient when
> > + * the task stack resides in the vmalloc region, as do_translation_fault() is a
> > + * C function that needs a stack to run.
> > + *
> > + * So we need to ensure that these PMD entries are up to date *before* the MM
> > + * switch. As we already have some logic in the MM switch path that takes care
> > + * of this, let's trigger it by bumping the counter every time the core vmalloc
> > + * code modifies a PMD entry in the vmalloc region.
> > + */
> > +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> > +{
> > +     if (start > VMALLOC_END || end < VMALLOC_START)
> > +             return;
> > +
> > +     /*
> > +      * This hooks into the core vmalloc code to receive notifications of
> > +      * any PMD level changes that have been made to the kernel page tables.
> > +      * This means it should only be triggered once for every MiB worth of
> > +      * vmalloc space, given that we don't support huge vmalloc/vmap on ARM,
> > +      * and that kernel PMD level table entries are rarely (if ever)
> > +      * updated.
> > +      *
> > +      * This means that the counter is going to max out at ~250 for the
> > +      * typical case. If it overflows, something entirely unexpected has
> > +      * occurred so let's throw a warning if that happens.
> > +      */
> > +     WARN_ON(++init_mm.context.vmalloc_seq == UINT_MAX);
> > +}
> > +
> > +#endif
> > diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> > index e8d729975f12..c5ea328c428d 100644
> > --- a/arch/arm/kernel/unwind.c
> > +++ b/arch/arm/kernel/unwind.c
> > @@ -389,7 +389,8 @@ int unwind_frame(struct stackframe *frame)
> >
> >       /* store the highest address on the stack to avoid crossing it*/
> >       ctrl.sp_low = frame->sp;
> > -     ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
> > +     ctrl.sp_high = ALIGN(ctrl.sp_low - THREAD_SIZE, THREAD_ALIGN)
> > +                    + THREAD_SIZE;
> >
> >       pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
> >                frame->pc, frame->lr, frame->sp);
> > diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> > index f02d617e3359..aa12b65a7fd6 100644
> > --- a/arch/arm/kernel/vmlinux.lds.S
> > +++ b/arch/arm/kernel/vmlinux.lds.S
> > @@ -138,12 +138,12 @@ SECTIONS
> >  #ifdef CONFIG_STRICT_KERNEL_RWX
> >       . = ALIGN(1<<SECTION_SHIFT);
> >  #else
> > -     . = ALIGN(THREAD_SIZE);
> > +     . = ALIGN(THREAD_ALIGN);
> >  #endif
> >       __init_end = .;
> >
> >       _sdata = .;
> > -     RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> > +     RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
> >       _edata = .;
> >
> >       BSS_SECTION(0, 0, 0)
> >
>
> Git bisection log:
>
> -------------------------------------------------------------------------------
> git bisect start
> # good: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
> git bisect good fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
> # bad: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
> git bisect bad d1eccc4f44f11a8f3f5d376f08e3779d2196f93a
> # good: [6bdd12e7427858bd0f1a68f553617fb1d97dbabb] ARM: call_with_stack: add unwind support
> git bisect good 6bdd12e7427858bd0f1a68f553617fb1d97dbabb
> # good: [fcc7029189cce80c2eac396ce4cd4544634d46e3] ARM: memset: clean up unwind annotations
> git bisect good fcc7029189cce80c2eac396ce4cd4544634d46e3
> # good: [e206f1841c51d2b3d8339efac3ac806f32d64821] ARM: switch_to: clean up Thumb2 code path
> git bisect good e206f1841c51d2b3d8339efac3ac806f32d64821
> # good: [a7c9e1b40e858eba7ff99a548153fd5c92b68e24] ARM: entry: rework stack realignment code in svc_entry
> git bisect good a7c9e1b40e858eba7ff99a548153fd5c92b68e24
> # first bad commit: [d1eccc4f44f11a8f3f5d376f08e3779d2196f93a] ARM: implement support for vmap'ed stacks
> -------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-16 19:28       ` Ard Biesheuvel
@ 2021-11-16 20:06         ` Russell King (Oracle)
  -1 siblings, 0 replies; 36+ messages in thread
From: Russell King (Oracle) @ 2021-11-16 20:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Guillaume Tucker, Tony Lindgren, linux-omap, Linux ARM,
	Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
	Linus Walleij, Nick Desaulniers, kernelci

On Tue, Nov 16, 2021 at 08:28:02PM +0100, Ard Biesheuvel wrote:
> (+ Tony and linux-omap@)
> 
> On Tue, 16 Nov 2021 at 10:23, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> >
> > Hi Ard,
> >
> > Please see the bisection report below about a boot failure on
> > omap4-panda which is pointing to this patch.
> >
> > Reports aren't automatically sent to the public while we're
> > trialing new bisection features on kernelci.org but this one
> > looks valid.
> >
> > Some more details can be found here:
> >
> >   https://linux.kernelci.org/test/case/id/6191b1b97c175a5ade335948/
> >
> > It seems like the kernel just froze after about 3 seconds without
> > any obvious errors in the log.
> >
> > Please let us know if you need any help debugging this issue or
> > if you have a fix to try.
> >
> 
> Thanks for the report.
> 
> I wonder if this might be related to low level platform code running
> off a different stack (maybe in SRAM?) when an interrupt is taken? Or
> using a different set of page tables that are out of sync in terms of
> VMALLOC space mappings?
> 
> Could anyone who speaks OMAP please take a look at the linked boot
> log, and hopefully make sense of it?
> 
> For background, this series enables vmap'ed stacks support for ARMv7,
> which means that the entry code checks whether the stack pointer may
> be pointing into the guard region before the vmalloc'ed stack, and
> kills the task if it looks like the kernel stack overflowed.
> 
> Here's another instance:
> https://linux.kernelci.org/build/id/6193fa5c6c4e1d02bd3358ff/
> 
> Everything builds and boots happily, but odd things happen on OMAP
> based devices: Panda just gives up right after discovering the USB
> controller, and Beagle-XM just starts showing all kinds of weird
> crashes at roughly the same point in the boot.

I haven't looked at the logs yet... but there may be a more
fundamental reason why it may be stalling.

vmalloc space is lazily mapped into the page tables of any process
other than the one the allocation happened in - specifically the L1
entries.

When a new thread is created, you're vmalloc()ing a kernel stack.
This is done in the parent task on behalf of the child task. If the
child task's page tables don't contain the L1 entry for its vmalloc'd
stack, then the first stack access by the child will fault.

The fault processing will be done in the child's context, so we
immediately try to save the state to the child's kernel stack,
which is not yet mapped. The result is another fault, which
triggers yet another fault, etc.
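
For reference, the lazy fixup that normally papers over this is done
in do_translation_fault(). Roughly (condensed sketch of the kernel
part of arch/arm/mm/fault.c, with the folded table levels and error
handling elided; not the literal code):

	unsigned int index = pgd_index(addr);
	pgd_t *pgd   = cpu_get_pgd() + index;	/* active task's tables */
	pgd_t *pgd_k = init_mm.pgd + index;	/* master kernel tables */
	pmd_t *pmd, *pmd_k;			/* found by walking both
						   trees (walk elided)  */

	if (pmd_none(*pmd_k))
		goto bad_area;		/* a genuinely bad access      */

	copy_pmd(pmd, pmd_k);		/* sync the L1 entry and retry */

None of that can run if the faulting access is the CPU trying to push
the exception state onto the not-yet-mapped stack itself.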

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-16 20:06         ` Russell King (Oracle)
@ 2021-11-16 22:02           ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-16 22:02 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Guillaume Tucker, Tony Lindgren, linux-omap, Linux ARM,
	Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
	Linus Walleij, Nick Desaulniers, kernelci

On Tue, 16 Nov 2021 at 21:06, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Tue, Nov 16, 2021 at 08:28:02PM +0100, Ard Biesheuvel wrote:
> > (+ Tony and linux-omap@)
> >
> > On Tue, 16 Nov 2021 at 10:23, Guillaume Tucker
> > <guillaume.tucker@collabora.com> wrote:
> > >
> > > Hi Ard,
> > >
> > > Please see the bisection report below about a boot failure on
> > > omap4-panda which is pointing to this patch.
> > >
> > > Reports aren't automatically sent to the public while we're
> > > trialing new bisection features on kernelci.org but this one
> > > looks valid.
> > >
> > > Some more details can be found here:
> > >
> > >   https://linux.kernelci.org/test/case/id/6191b1b97c175a5ade335948/
> > >
> > > It seems like the kernel just froze after about 3 seconds without
> > > any obvious errors in the log.
> > >
> > > Please let us know if you need any help debugging this issue or
> > > if you have a fix to try.
> > >
> >
> > Thanks for the report.
> >
> > I wonder if this might be related to low level platform code running
> > off a different stack (maybe in SRAM?) when an interrupt is taken? Or
> > using a different set of page tables that are out of sync in terms of
> > VMALLOC space mappings?
> >
> > Could anyone who speaks OMAP please take a look at the linked boot
> > log, and hopefully make sense of it?
> >
> > For background, this series enables vmap'ed stacks support for ARMv7,
> > which means that the entry code checks whether the stack pointer may
> > be pointing into the guard region before the vmalloc'ed stack, and
> > kills the task if it looks like the kernel stack overflowed.
> >
> > Here's another instance:
> > https://linux.kernelci.org/build/id/6193fa5c6c4e1d02bd3358ff/
> >
> > Everything builds and boots happily, but odd things happen on OMAP
> > based devices: Panda just gives up right after discovering the USB
> > controller, and Beagle-XM just starts showing all kinds of weird
> > crashes at roughly the same point in the boot.
>
> I haven't looked at the logs yet... but there may be a more
> fundamental reason that it may be stalling.
>
> vmalloc space is lazily mapped to process page tables that the
> allocation did not happen inside - specifically the L1 entries.
>
> When a new thread is created, you're vmalloc()ing a kernel stack.
> This is done in the parent task for the child task. If the child
> task doesn't contain the L1 entry for its vmalloc'd stack, then
> the first stack access by the child will fault.
>
> The fault processing will be done in the child's context, so we
> immediately try to save the state to the child's kernel stack,
> which is not yet mapped. The result is another fault, which
> triggers yet another fault, etc.
>

I deal with this condition specifically in two different places:
- at context switch time, there is a dummy read from the new stack
while running from the old one, to ensure that the fault takes place
while SP points to a valid mapping;
- at switch_mm() time, the vmalloc_seq counter is used to ensure that
the new MM is synced to init_mm in terms of vmalloc PMD entries.
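
The latter amounts to roughly the following (condensed from
__check_vmalloc_seq() in arch/arm/mm/ioremap.c and the check on the
check_and_switch_context() path; simplified, not the literal code):

	/* on the switch_mm() path, before the new mm is activated */
	if (unlikely(mm->context.vmalloc_seq != init_mm.context.vmalloc_seq))
		__check_vmalloc_seq(mm);

	/* ... which copies the vmalloc-area L1 entries from init_mm: */
	unsigned int seq;

	do {
		seq = init_mm.context.vmalloc_seq;
		memcpy(pgd_offset(mm, VMALLOC_START),
		       pgd_offset_k(VMALLOC_START),
		       sizeof(pgd_t) * (pgd_index(VMALLOC_END) -
					pgd_index(VMALLOC_START)));
		mm->context.vmalloc_seq = seq;
	} while (seq != init_mm.context.vmalloc_seq);

So by the time the child runs on the new stack, its PMD entries for
the vmalloc area should match init_mm's.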

Of course, I may have missed something, but I wouldn't expect a
fundamental flaw in this logic to affect only OMAP3/4 based platforms
in such a weird way. Perhaps there is something I missed in terms of
TLB maintenance, although I would expect the existing fault handler to
take care of that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-16 22:02           ` Ard Biesheuvel
@ 2021-11-17  7:59             ` Tony Lindgren
  -1 siblings, 0 replies; 36+ messages in thread
From: Tony Lindgren @ 2021-11-17  7:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, kernelci

* Ard Biesheuvel <ardb@kernel.org> [211116 22:03]:
> Of course, I may have missed something, but I wouldn't expect a
> fundamental flaw in this logic to affect only OMAP3/4 based platforms
> in such a weird way. Perhaps there is something I missed in terms of
> TLB maintenance, although I would expect the existing fault handler to
> take care of that.

Disabling the deeper cpuidle states, where the CPUs get shut down and
restored, seems to work around the issue, at least for omap4. The
assembly code is in arch/arm/mach-omap2/sleep44xx.S, and in
sleep34xx.S for omap3. No idea so far what might be causing this...

Regards,

Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  7:59             ` Tony Lindgren
@ 2021-11-17  8:28               ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-17  8:28 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, kernelci

On Wed, 17 Nov 2021 at 08:59, Tony Lindgren <tony@atomide.com> wrote:
>
> * Ard Biesheuvel <ardb@kernel.org> [211116 22:03]:
> > Of course, I may have missed something, but I wouldn't expect a
> > fundamental flaw in this logic to affect only OMAP3/4 based platforms
> > in such a weird way. Perhaps there is something I missed in terms of
> > TLB maintenance, although I would expect the existing fault handler to
> > take care of that.
>
> Looks like disabling the deeper idle states for cpuidle where the CPUSs
> get shut down and restored seems to work around the issue at least for
> omap4. The assembly code is in arch/arm/mach-omap2/sleep44xx.S, and in
> sleep34xx.S for omap3. No idea so far what might be causing this..
>

Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
so I'll try and reproduce it locally as well.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  8:28               ` Ard Biesheuvel
@ 2021-11-17  8:36                 ` Tony Lindgren
  -1 siblings, 0 replies; 36+ messages in thread
From: Tony Lindgren @ 2021-11-17  8:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, kernelci

* Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> On Wed, 17 Nov 2021 at 08:59, Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Ard Biesheuvel <ardb@kernel.org> [211116 22:03]:
> > > Of course, I may have missed something, but I wouldn't expect a
> > > fundamental flaw in this logic to affect only OMAP3/4 based platforms
> > > in such a weird way. Perhaps there is something I missed in terms of
> > > TLB maintenance, although I would expect the existing fault handler to
> > > take care of that.
> >
> > Looks like disabling the deeper idle states for cpuidle where the CPUSs
> > get shut down and restored seems to work around the issue at least for
> > omap4. The assembly code is in arch/arm/mach-omap2/sleep44xx.S, and in
> > sleep34xx.S for omap3. No idea so far what might be causing this..
> >
> 
> Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> so I'll try and reproduce it locally as well.

I think with Beaglebone you may only hit this with suspend/resume, if
at all. On am335x, cpuidle does not shut down the CPU, and only some
models will suspend to deeper idle states, as that depends on the PMIC.

If you have some test patch to try, just let me know.

Regards,

Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  8:36                 ` Tony Lindgren
@ 2021-11-17  9:03                   ` Arnd Bergmann
  -1 siblings, 0 replies; 36+ messages in thread
From: Arnd Bergmann @ 2021-11-17  9:03 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Ard Biesheuvel, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, kernelci

On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> >
> > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > so I'll try and reproduce it locally as well.
>
> I think with Beaglebone you may hit this only with suspend/resume if at
> all. On am335x cpuidle is not shutting down the CPU. And only some models
> will suspend to deeper idle states as it depends on the PMIC.
>
> If you have some test patch to try, just let me know.

I looked at how the sleep code is called and found that cpu_suspend()/
__cpu_suspend() does some interesting manipulation of the stack pointer
to call the platform specific function with a simple 1:1 page table.
I would expect the problem to be somewhere in there. I haven't
pinpointed the exact line yet, but if any of that code tries to locate
the physical address of the stack using virt_to_phys() or its asm
equivalent, that fails for a vmap'ed stack.

        Arnd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  9:03                   ` Arnd Bergmann
@ 2021-11-17  9:07                     ` Arnd Bergmann
  -1 siblings, 0 replies; 36+ messages in thread
From: Arnd Bergmann @ 2021-11-17  9:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Tony Lindgren, Ard Biesheuvel, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	kernelci

On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> > * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> > >
> > > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > > so I'll try and reproduce it locally as well.
> >
> > I think with Beaglebone you may hit this only with suspend/resume if at
> > all. On am335x cpuidle is not shutting down the CPU. And only some models
> > will suspend to deeper idle states as it depends on the PMIC.
> >
> > If you have some test patch to try, just let me know.
>
> I looked at how the sleep code is called and found that cpu_suspend()/
> __cpu_suspend() has interesting manipulation of the stack pointer to
> call the platform specific function with a simple 1:1 page table,
> I would expect the problem somewhere in there, haven't pinpointed
> the exact line yet, but if any of that code tries to local the physical
> address of the stack using virt_to_phys or its asm equivalent, this
> fails for a vmap stack.

and just after sending this I see

void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
{
        *save_ptr = virt_to_phys(ptr);

'ptr' is a pointer to the stack here. It might not be the only place
that needs fixing, but this clearly has to do a page table walk, like
vmalloc_to_page() does, to get to the correct physical address.
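
Something along these lines should give the correct physical address
for a stack in the vmalloc area (untested sketch, only to illustrate
the direction of the fix; the actual change may need to cover more
than this one spot):

	/* vmalloc addresses are not covered by the linear map, so
	 * walk the page tables instead of using the virt_to_phys()
	 * arithmetic */
	*save_ptr = page_to_phys(vmalloc_to_page(ptr)) +
		    offset_in_page(ptr);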

        Arnd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  9:07                     ` Arnd Bergmann
@ 2021-11-17  9:08                       ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-17  9:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Tony Lindgren, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	kernelci

On Wed, 17 Nov 2021 at 10:07, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> > > * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> > > >
> > > > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > > > so I'll try and reproduce it locally as well.
> > >
> > > I think with Beaglebone you may hit this only with suspend/resume if at
> > > all. On am335x cpuidle is not shutting down the CPU. And only some models
> > > will suspend to deeper idle states as it depends on the PMIC.
> > >
> > > If you have some test patch to try, just let me know.
> >
> > I looked at how the sleep code is called and found that cpu_suspend()/
> > __cpu_suspend() has interesting manipulation of the stack pointer to
> > call the platform specific function with a simple 1:1 page table,
> > I would expect the problem somewhere in there, haven't pinpointed
> > the exact line yet, but if any of that code tries to local the physical
> > address of the stack using virt_to_phys or its asm equivalent, this
> > fails for a vmap stack.
>
> and just after sending this I see
>
> void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
> {
>         *save_ptr = virt_to_phys(ptr);
>
> 'ptr' is a pointer to the stack here. It might not be the only place that
> needs fixing, but this clearly has to do a page table walk like
> vmalloc_to_page() does to get to the correct physical address.
>

I had just arrived at the same conclusion. I'll fix this up and drop
it in kernelci.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17  9:08                       ` Ard Biesheuvel
@ 2021-11-17 10:48                         ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-17 10:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Tony Lindgren, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	kernelci

On Wed, 17 Nov 2021 at 10:08, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 17 Nov 2021 at 10:07, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> > > > >
> > > > > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > > > > so I'll try and reproduce it locally as well.
> > > >
> > > > I think with Beaglebone you may hit this only with suspend/resume if at
> > > > all. On am335x cpuidle is not shutting down the CPU. And only some models
> > > > will suspend to deeper idle states as it depends on the PMIC.
> > > >
> > > > If you have some test patch to try, just let me know.
> > >
> > > I looked at how the sleep code is called and found that cpu_suspend()/
> > > __cpu_suspend() has interesting manipulation of the stack pointer to
> > > call the platform specific function with a simple 1:1 page table,
> > > I would expect the problem somewhere in there, haven't pinpointed
> > > the exact line yet, but if any of that code tries to local the physical
> > > address of the stack using virt_to_phys or its asm equivalent, this
> > > fails for a vmap stack.
> >
> > and just after sending this I see
> >
> > void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
> > {
> >         *save_ptr = virt_to_phys(ptr);
> >
> > 'ptr' is a pointer to the stack here. It might not be the only place that
> > needs fixing, but this clearly has to do a page table walk like
> > vmalloc_to_page() does to get to the correct physical address.
> >
>
> I had just arrived at the same conclusion. I'll fix this up and drop
> it in kernelci.

Updated branch here:
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-vmap-stacks-v4
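
For context, the interface being fixed up is the arch/arm suspend entry
point. A platform caller looks roughly like the following sketch;
am33xx_do_suspend() and am33xx_pm_enter() are made-up names for
illustration:

    #include <asm/suspend.h>

    /*
     * Called by cpu_suspend() after __cpu_suspend has saved the CPU
     * state on the (possibly vmap'ed) stack and recorded where resume
     * should find it. Returning non-zero means the power-down did not
     * happen and the suspend attempt is aborted.
     */
    static int am33xx_do_suspend(unsigned long arg)
    {
            /* platform specific power-down sequence goes here */
            return 0;
    }

    static int am33xx_pm_enter(void)
    {
            /* cpu_suspend() returns 0 on a successful suspend/resume */
            return cpu_suspend(0, am33xx_do_suspend);
    }

If the CPU really loses power, execution comes back via cpu_resume with
the MMU off, which is why the saved state must be reachable through a
physical address rather than a vmalloc address.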

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17 10:48                         ` Ard Biesheuvel
@ 2021-11-17 11:12                           ` Tony Lindgren
  0 siblings, 0 replies; 36+ messages in thread
From: Tony Lindgren @ 2021-11-17 11:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arnd Bergmann, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	kernelci

* Ard Biesheuvel <ardb@kernel.org> [211117 10:49]:
> On Wed, 17 Nov 2021 at 10:08, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Wed, 17 Nov 2021 at 10:07, Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > >
> > > > On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> > > > > >
> > > > > > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > > > > > so I'll try and reproduce it locally as well.
> > > > >
> > > > > I think with Beaglebone you may hit this only with suspend/resume if at
> > > > > all. On am335x cpuidle is not shutting down the CPU. And only some models
> > > > > will suspend to deeper idle states as it depends on the PMIC.
> > > > >
> > > > > If you have some test patch to try, just let me know.
> > > >
> > > > I looked at how the sleep code is called and found that cpu_suspend()/
> > > > __cpu_suspend() does some interesting manipulation of the stack pointer
> > > > to call the platform specific function with a simple 1:1 page table.
> > > > I would expect the problem to be somewhere in there; I haven't
> > > > pinpointed the exact line yet, but if any of that code tries to locate
> > > > the physical address of the stack using virt_to_phys or its asm
> > > > equivalent, it fails for a vmap stack.
> > >
> > > and just after sending this I see
> > >
> > > void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
> > > {
> > >         *save_ptr = virt_to_phys(ptr);
> > >
> > > 'ptr' is a pointer to the stack here. It might not be the only place that
> > > needs fixing, but this clearly has to do a page table walk like
> > > vmalloc_to_page() does to get to the correct physical address.
> > >
> >
> > I had just arrived at the same conclusion. I'll fix this up and drop
> > it in kernelci.
> 
> Updated branch here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-vmap-stacks-v4

Great, that branch boots for me!

Regards,

Tony

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17 11:12                           ` Tony Lindgren
@ 2021-11-17 11:13                             ` Ard Biesheuvel
  0 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2021-11-17 11:13 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Arnd Bergmann, Russell King (Oracle),
	Guillaume Tucker, linux-omap, Linux ARM, Nicolas Pitre,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	kernelci

On Wed, 17 Nov 2021 at 12:12, Tony Lindgren <tony@atomide.com> wrote:
>
> * Ard Biesheuvel <ardb@kernel.org> [211117 10:49]:
> > On Wed, 17 Nov 2021 at 10:08, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Wed, 17 Nov 2021 at 10:07, Arnd Bergmann <arnd@arndb.de> wrote:
> > > >
> > > > On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > > >
> > > > > On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
> > > > > > >
> > > > > > > Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
> > > > > > > so I'll try and reproduce it locally as well.
> > > > > >
> > > > > > I think with Beaglebone you may hit this only with suspend/resume if at
> > > > > > all. On am335x cpuidle is not shutting down the CPU. And only some models
> > > > > > will suspend to deeper idle states as it depends on the PMIC.
> > > > > >
> > > > > > If you have some test patch to try, just let me know.
> > > > >
> > > > > I looked at how the sleep code is called and found that cpu_suspend()/
> > > > > __cpu_suspend() does some interesting manipulation of the stack pointer
> > > > > to call the platform specific function with a simple 1:1 page table.
> > > > > I would expect the problem to be somewhere in there; I haven't
> > > > > pinpointed the exact line yet, but if any of that code tries to locate
> > > > > the physical address of the stack using virt_to_phys or its asm
> > > > > equivalent, it fails for a vmap stack.
> > > >
> > > > and just after sending this I see
> > > >
> > > > void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
> > > > {
> > > >         *save_ptr = virt_to_phys(ptr);
> > > >
> > > > 'ptr' is a pointer to the stack here. It might not be the only place that
> > > > needs fixing, but this clearly has to do a page table walk like
> > > > vmalloc_to_page() does to get to the correct physical address.
> > > >
> > >
> > > I had just arrived at the same conclusion. I'll fix this up and drop
> > > it in kernelci.
> >
> > Updated branch here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-vmap-stacks-v4
>
> Great, that branch boots for me!
>

Thanks for testing!

* Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks
  2021-11-17 11:13                             ` Ard Biesheuvel
@ 2021-11-17 14:03                               ` Guillaume Tucker
  0 siblings, 0 replies; 36+ messages in thread
From: Guillaume Tucker @ 2021-11-17 14:03 UTC (permalink / raw)
  To: Ard Biesheuvel, Tony Lindgren
  Cc: Arnd Bergmann, Russell King (Oracle),
	linux-omap, Linux ARM, Nicolas Pitre, Kees Cook, Keith Packard,
	Linus Walleij, Nick Desaulniers, kernelci

On 17/11/2021 11:13, Ard Biesheuvel wrote:
> On Wed, 17 Nov 2021 at 12:12, Tony Lindgren <tony@atomide.com> wrote:
>>
>> * Ard Biesheuvel <ardb@kernel.org> [211117 10:49]:
>>> On Wed, 17 Nov 2021 at 10:08, Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>
>>>> On Wed, 17 Nov 2021 at 10:07, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>
>>>>> On Wed, Nov 17, 2021 at 10:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>>
>>>>>> On Wed, Nov 17, 2021 at 9:36 AM Tony Lindgren <tony@atomide.com> wrote:
>>>>>>> * Ard Biesheuvel <ardb@kernel.org> [211117 08:29]:
>>>>>>>>
>>>>>>>> Thanks Tony, that is very helpful. I have a Beaglebone white somewhere
>>>>>>>> so I'll try and reproduce it locally as well.
>>>>>>>
>>>>>>> I think with Beaglebone you may hit this only with suspend/resume if at
>>>>>>> all. On am335x cpuidle is not shutting down the CPU. And only some models
>>>>>>> will suspend to deeper idle states as it depends on the PMIC.
>>>>>>>
>>>>>>> If you have some test patch to try, just let me know.
>>>>>>
>>>>>> I looked at how the sleep code is called and found that cpu_suspend()/
>>>>>> __cpu_suspend() does some interesting manipulation of the stack pointer
>>>>>> to call the platform specific function with a simple 1:1 page table.
>>>>>> I would expect the problem to be somewhere in there; I haven't
>>>>>> pinpointed the exact line yet, but if any of that code tries to locate
>>>>>> the physical address of the stack using virt_to_phys or its asm
>>>>>> equivalent, it fails for a vmap stack.
>>>>>
>>>>> and just after sending this I see
>>>>>
>>>>> void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
>>>>> {
>>>>>         *save_ptr = virt_to_phys(ptr);
>>>>>
>>>>> 'ptr' is a pointer to the stack here. It might not be the only place that
>>>>> needs fixing, but this clearly has to do a page table walk like
>>>>> vmalloc_to_page() does to get to the correct physical address.
>>>>>
>>>>
>>>> I had just arrived at the same conclusion. I'll fix this up and drop
>>>> it in kernelci.
>>>
>>> Updated branch here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-vmap-stacks-v4
>>
>> Great, that branch boots for me!
>>
> 
> Thanks for testing!

Tested-by: "kernelci.org bot" <bot@kernelci.org>

https://staging.kernelci.org/test/plan/id/6194fd2f85155923f71760ce/

Guillaume

end of thread, other threads:[~2021-11-17 14:05 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-15 11:18 [PATCH v3 0/7] ARM: add vmap'ed stack support Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 1/7] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 2/7] ARM: memmove: " Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 3/7] ARM: memset: clean up unwind annotations Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 4/7] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 5/7] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 6/7] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
2021-11-15 11:18 ` [PATCH v3 7/7] ARM: implement support for vmap'ed stacks Ard Biesheuvel
2021-11-16  9:22   ` Guillaume Tucker
2021-11-16 19:28     ` Ard Biesheuvel
2021-11-16 20:06       ` Russell King (Oracle)
2021-11-16 22:02         ` Ard Biesheuvel
2021-11-17  7:59           ` Tony Lindgren
2021-11-17  8:28             ` Ard Biesheuvel
2021-11-17  8:36               ` Tony Lindgren
2021-11-17  9:03                 ` Arnd Bergmann
2021-11-17  9:07                   ` Arnd Bergmann
2021-11-17  9:08                     ` Ard Biesheuvel
2021-11-17 10:48                       ` Ard Biesheuvel
2021-11-17 11:12                         ` Tony Lindgren
2021-11-17 11:13                           ` Ard Biesheuvel
2021-11-17 14:03                             ` Guillaume Tucker
