linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix
@ 2022-09-28 16:20 Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 1/4] riscv: remove extra level wrappers of trace_hardirqs_{on,off} Jisheng Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:20 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

I planed to do similar generic entry transaction as Guo Ren did[1], and
I had some commits in local. Since Guo has sent out the series, I
dropped my version and just provide those in my local repo but missing
in Guo's series. However, this doesn't mean this series depends on
Guo's series, in fact except the first one, the remaining three patches
are independent on generic entry.

[1]https://lore.kernel.org/linux-riscv/20220918155246.1203293-1-guoren@kernel.org/T/#t

Since v1:
  - consolidate gp saving/restoring in mcount-dyn.S as well
  - avoid non-trival memory waste if NR_CPUs is large
  - collect Reviewed-by tag.

Hi Guo,

I only added your reviewed-by tag to patch1, but dropped tag for patch3
and patch4 because the patch changed a bit.

Thanks

Jisheng Zhang (4):
  riscv: remove extra level wrappers of trace_hardirqs_{on,off}
  riscv: consolidate ret_from_kernel_thread into ret_from_fork
  riscv: fix race when vmap stack overflow and remove shadow_stack
  riscv: entry: consolidate general regs saving into save_gp

 arch/riscv/include/asm/asm-prototypes.h |   1 -
 arch/riscv/include/asm/asm.h            |  65 ++++++++++
 arch/riscv/include/asm/thread_info.h    |   4 +-
 arch/riscv/kernel/Makefile              |   2 -
 arch/riscv/kernel/asm-offsets.c         |   1 +
 arch/riscv/kernel/entry.S               | 154 +++---------------------
 arch/riscv/kernel/mcount-dyn.S          |  58 +--------
 arch/riscv/kernel/process.c             |   5 +-
 arch/riscv/kernel/trace_irq.c           |  27 -----
 arch/riscv/kernel/trace_irq.h           |  11 --
 arch/riscv/kernel/traps.c               |  31 +++--
 11 files changed, 104 insertions(+), 255 deletions(-)
 delete mode 100644 arch/riscv/kernel/trace_irq.c
 delete mode 100644 arch/riscv/kernel/trace_irq.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] riscv: remove extra level wrappers of trace_hardirqs_{on,off}
  2022-09-28 16:20 [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix Jisheng Zhang
@ 2022-09-28 16:20 ` Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork Jisheng Zhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:20 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

Since riscv is converted to generic entry, there's no need for the
extra wrappers of trace_hardirqs_{on,off}.

Tested with llvm + irqsoff.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/kernel/Makefile    |  2 --
 arch/riscv/kernel/trace_irq.c | 27 ---------------------------
 arch/riscv/kernel/trace_irq.h | 11 -----------
 3 files changed, 40 deletions(-)
 delete mode 100644 arch/riscv/kernel/trace_irq.c
 delete mode 100644 arch/riscv/kernel/trace_irq.h

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 01da14e21019..11ee206cc235 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -69,8 +69,6 @@ obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
 obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
 
-obj-$(CONFIG_TRACE_IRQFLAGS)	+= trace_irq.o
-
 obj-$(CONFIG_PERF_EVENTS)	+= perf_callchain.o
 obj-$(CONFIG_HAVE_PERF_REGS)	+= perf_regs.o
 obj-$(CONFIG_RISCV_SBI)		+= sbi.o
diff --git a/arch/riscv/kernel/trace_irq.c b/arch/riscv/kernel/trace_irq.c
deleted file mode 100644
index 095ac976d7da..000000000000
--- a/arch/riscv/kernel/trace_irq.c
+++ /dev/null
@@ -1,27 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (C) 2022 Changbin Du <changbin.du@gmail.com>
- */
-
-#include <linux/irqflags.h>
-#include <linux/kprobes.h>
-#include "trace_irq.h"
-
-/*
- * trace_hardirqs_on/off require the caller to setup frame pointer properly.
- * Otherwise, CALLER_ADDR1 might trigger an pagging exception in kernel.
- * Here we add one extra level so they can be safely called by low
- * level entry code which $fp is used for other purpose.
- */
-
-void __trace_hardirqs_on(void)
-{
-	trace_hardirqs_on();
-}
-NOKPROBE_SYMBOL(__trace_hardirqs_on);
-
-void __trace_hardirqs_off(void)
-{
-	trace_hardirqs_off();
-}
-NOKPROBE_SYMBOL(__trace_hardirqs_off);
diff --git a/arch/riscv/kernel/trace_irq.h b/arch/riscv/kernel/trace_irq.h
deleted file mode 100644
index 99fe67377e5e..000000000000
--- a/arch/riscv/kernel/trace_irq.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) 2022 Changbin Du <changbin.du@gmail.com>
- */
-#ifndef __TRACE_IRQ_H
-#define __TRACE_IRQ_H
-
-void __trace_hardirqs_on(void);
-void __trace_hardirqs_off(void);
-
-#endif /* __TRACE_IRQ_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork
  2022-09-28 16:20 [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 1/4] riscv: remove extra level wrappers of trace_hardirqs_{on,off} Jisheng Zhang
@ 2022-09-28 16:20 ` Jisheng Zhang
  2022-10-19  3:37   ` Guo Ren
  2022-09-28 16:20 ` [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp Jisheng Zhang
  3 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:20 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
the only difference is whether call the fn(arg) or not, this can be
acchieved by testing fn is NULL or not, I.E s0 is 0 or not.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/kernel/entry.S   | 11 +++--------
 arch/riscv/kernel/process.c |  5 ++---
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 2207cf44a3bc..a3e1ed2fa2ac 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -323,20 +323,15 @@ END(handle_kernel_stack_overflow)
 
 ENTRY(ret_from_fork)
 	call schedule_tail
-	move a0, sp /* pt_regs */
-	la ra, ret_from_exception
-	tail syscall_exit_to_user_mode
-ENDPROC(ret_from_fork)
-
-ENTRY(ret_from_kernel_thread)
-	call schedule_tail
+	beqz s0, 1f	/* not from kernel thread */
 	/* Call fn(arg) */
 	move a0, s1
 	jalr s0
+1:
 	move a0, sp /* pt_regs */
 	la ra, ret_from_exception
 	tail syscall_exit_to_user_mode
-ENDPROC(ret_from_kernel_thread)
+ENDPROC(ret_from_fork)
 
 #ifdef CONFIG_IRQ_STACKS
 ENTRY(call_on_stack)
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ceb9ebab6558..67e7cd123ceb 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -34,7 +34,6 @@ EXPORT_SYMBOL(__stack_chk_guard);
 #endif
 
 extern asmlinkage void ret_from_fork(void);
-extern asmlinkage void ret_from_kernel_thread(void);
 
 void arch_cpu_idle(void)
 {
@@ -172,7 +171,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		/* Supervisor/Machine, irqs on: */
 		childregs->status = SR_PP | SR_PIE;
 
-		p->thread.ra = (unsigned long)ret_from_kernel_thread;
 		p->thread.s[0] = (unsigned long)args->fn;
 		p->thread.s[1] = (unsigned long)args->fn_arg;
 	} else {
@@ -182,8 +180,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		if (clone_flags & CLONE_SETTLS)
 			childregs->tp = tls;
 		childregs->a0 = 0; /* Return value of fork() */
-		p->thread.ra = (unsigned long)ret_from_fork;
+		p->thread.s[0] = 0;
 	}
+	p->thread.ra = (unsigned long)ret_from_fork;
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-28 16:20 [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 1/4] riscv: remove extra level wrappers of trace_hardirqs_{on,off} Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork Jisheng Zhang
@ 2022-09-28 16:20 ` Jisheng Zhang
  2022-09-28 16:54   ` Jisheng Zhang
  2022-09-28 16:20 ` [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp Jisheng Zhang
  3 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:20 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

Currently, when detecting vmap stack overflow, riscv firstly switches
to the so called shadow stack, then use this shadow stack to call the
get_overflow_stack() to get the overflow stack. However, there's
a race here if two or more harts use the same shadow stack at the same
time.

To solve this race, we rely on two facts:
1. the content of kernel thread pointer I.E "tp" register can still
be gotten from the the CSR_SCRATCH register, thus we can clobber tp
under the condtion that we restore tp from CSR_SCRATCH later.

2. Once vmap stack overflow happen, panic is comming soon, no
performance concern at all, so we don't need to define the overflow
stack as percpu var, we can simplify it into a pointer array which
points to allocated pages.

Thus we can use tp as a tmp register to get the cpu id to calculate
the offset of overflow stack pointer array for each cpu w/o shadow
stack any more. Thus the race condition is removed as a side effect.

NOTE: we can use similar mechanism to let each cpu use different shadow
stack to fix the race codition, but if we can remove shadow stack usage
totally, why not.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
---
 arch/riscv/include/asm/asm-prototypes.h |  1 -
 arch/riscv/include/asm/thread_info.h    |  4 +-
 arch/riscv/kernel/asm-offsets.c         |  1 +
 arch/riscv/kernel/entry.S               | 56 ++++---------------------
 arch/riscv/kernel/traps.c               | 31 ++++++++------
 5 files changed, 29 insertions(+), 64 deletions(-)

diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index ef386fcf3939..4a06fa0f6493 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
 DECLARE_DO_ERROR_INFO(do_trap_break);
 
-asmlinkage unsigned long get_overflow_stack(void);
 asmlinkage void handle_bad_stack(struct pt_regs *regs);
 
 #endif /* _ASM_RISCV_PROTOTYPES_H */
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index c970d41dc4c6..c604a5212a73 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -28,14 +28,12 @@
 
 #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
 #define OVERFLOW_STACK_SIZE     SZ_4K
-#define SHADOW_OVERFLOW_STACK_SIZE (1024)
+#define OVERFLOW_STACK_SHIFT	12
 
 #define IRQ_STACK_SIZE		THREAD_SIZE
 
 #ifndef __ASSEMBLY__
 
-extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
-
 #include <asm/processor.h>
 #include <asm/csr.h>
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index df9444397908..62bf3bacc322 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -37,6 +37,7 @@ void asm_offsets(void)
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
 	OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
+	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 
 	OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
 	OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index a3e1ed2fa2ac..5a6171a90d81 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -223,54 +223,16 @@ END(ret_from_exception)
 
 #ifdef CONFIG_VMAP_STACK
 ENTRY(handle_kernel_stack_overflow)
-	la sp, shadow_stack
-	addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
-
-	//save caller register to shadow stack
-	addi sp, sp, -(PT_SIZE_ON_STACK)
-	REG_S x1,  PT_RA(sp)
-	REG_S x5,  PT_T0(sp)
-	REG_S x6,  PT_T1(sp)
-	REG_S x7,  PT_T2(sp)
-	REG_S x10, PT_A0(sp)
-	REG_S x11, PT_A1(sp)
-	REG_S x12, PT_A2(sp)
-	REG_S x13, PT_A3(sp)
-	REG_S x14, PT_A4(sp)
-	REG_S x15, PT_A5(sp)
-	REG_S x16, PT_A6(sp)
-	REG_S x17, PT_A7(sp)
-	REG_S x28, PT_T3(sp)
-	REG_S x29, PT_T4(sp)
-	REG_S x30, PT_T5(sp)
-	REG_S x31, PT_T6(sp)
-
-	la ra, restore_caller_reg
-	tail get_overflow_stack
-
-restore_caller_reg:
-	//save per-cpu overflow stack
-	REG_S a0, -8(sp)
-	//restore caller register from shadow_stack
-	REG_L x1,  PT_RA(sp)
-	REG_L x5,  PT_T0(sp)
-	REG_L x6,  PT_T1(sp)
-	REG_L x7,  PT_T2(sp)
-	REG_L x10, PT_A0(sp)
-	REG_L x11, PT_A1(sp)
-	REG_L x12, PT_A2(sp)
-	REG_L x13, PT_A3(sp)
-	REG_L x14, PT_A4(sp)
-	REG_L x15, PT_A5(sp)
-	REG_L x16, PT_A6(sp)
-	REG_L x17, PT_A7(sp)
-	REG_L x28, PT_T3(sp)
-	REG_L x29, PT_T4(sp)
-	REG_L x30, PT_T5(sp)
-	REG_L x31, PT_T6(sp)
+	la sp, overflow_stack
+	/* use tp as tmp register since we can restore it from CSR_SCRATCH */
+	REG_L tp, TASK_TI_CPU(tp)
+	slli tp, tp, RISCV_LGPTR
+	add tp, sp, tp
+	REG_L sp, 0(tp)
+
+	/* restore tp */
+	csrr tp, CSR_SCRATCH
 
-	//load per-cpu overflow stack
-	REG_L sp, -8(sp)
 	addi sp, sp, -(PT_SIZE_ON_STACK)
 
 	//save context to overflow stack
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 73f06cd149d9..b6c64f0fb70f 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
 #endif /* CONFIG_GENERIC_BUG */
 
 #ifdef CONFIG_VMAP_STACK
-static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
-		overflow_stack)__aligned(16);
-/*
- * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
- * to get per-cpu overflow stack(get_overflow_stack).
- */
-long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
-asmlinkage unsigned long get_overflow_stack(void)
-{
-	return (unsigned long)this_cpu_ptr(overflow_stack) +
-		OVERFLOW_STACK_SIZE;
-}
+void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
 
 asmlinkage void handle_bad_stack(struct pt_regs *regs)
 {
 	unsigned long tsk_stk = (unsigned long)current->stack;
-	unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
+	unsigned long ovf_stk = (unsigned long)overflow_stack[raw_smp_processor_id()];
 
 	console_verbose();
 
@@ -248,4 +237,20 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
 	for (;;)
 		wait_for_interrupt();
 }
+
+static int __init alloc_overflow_stacks(void)
+{
+	u8 *s;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		s = (u8 *)__get_free_pages(GFP_KERNEL, get_order(OVERFLOW_STACK_SIZE));
+		if (WARN_ON(!s))
+			return -ENOMEM;
+		overflow_stack[cpu] = &s[OVERFLOW_STACK_SIZE];
+		printk("%px\n", overflow_stack[cpu]);
+	}
+	return 0;
+}
+early_initcall(alloc_overflow_stacks);
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp
  2022-09-28 16:20 [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix Jisheng Zhang
                   ` (2 preceding siblings ...)
  2022-09-28 16:20 ` [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack Jisheng Zhang
@ 2022-09-28 16:20 ` Jisheng Zhang
  2022-09-29  3:59   ` Guo Ren
  3 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:20 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

Consolidate the saving/restoring GPs(except ra, sp and tp) into
save_gp/restore_gp macro.

No functional change intended.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/include/asm/asm.h   | 65 +++++++++++++++++++++++++
 arch/riscv/kernel/entry.S      | 87 ++--------------------------------
 arch/riscv/kernel/mcount-dyn.S | 58 +----------------------
 3 files changed, 70 insertions(+), 140 deletions(-)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 1b471ff73178..2f3b49536e9d 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -68,6 +68,7 @@
 #endif
 
 #ifdef __ASSEMBLY__
+#include <asm/asm-offsets.h>
 
 /* Common assembly source macros */
 
@@ -80,6 +81,70 @@
 	.endr
 .endm
 
+	/* save all GPs except ra, sp and tp */
+	.macro save_gp
+	REG_S x3,  PT_GP(sp)
+	REG_S x5,  PT_T0(sp)
+	REG_S x6,  PT_T1(sp)
+	REG_S x7,  PT_T2(sp)
+	REG_S x8,  PT_S0(sp)
+	REG_S x9,  PT_S1(sp)
+	REG_S x10, PT_A0(sp)
+	REG_S x11, PT_A1(sp)
+	REG_S x12, PT_A2(sp)
+	REG_S x13, PT_A3(sp)
+	REG_S x14, PT_A4(sp)
+	REG_S x15, PT_A5(sp)
+	REG_S x16, PT_A6(sp)
+	REG_S x17, PT_A7(sp)
+	REG_S x18, PT_S2(sp)
+	REG_S x19, PT_S3(sp)
+	REG_S x20, PT_S4(sp)
+	REG_S x21, PT_S5(sp)
+	REG_S x22, PT_S6(sp)
+	REG_S x23, PT_S7(sp)
+	REG_S x24, PT_S8(sp)
+	REG_S x25, PT_S9(sp)
+	REG_S x26, PT_S10(sp)
+	REG_S x27, PT_S11(sp)
+	REG_S x28, PT_T3(sp)
+	REG_S x29, PT_T4(sp)
+	REG_S x30, PT_T5(sp)
+	REG_S x31, PT_T6(sp)
+	.endm
+
+	/* restore all GPs except ra, sp and tp */
+	.macro restore_gp
+	REG_L x3,  PT_GP(sp)
+	REG_L x5,  PT_T0(sp)
+	REG_L x6,  PT_T1(sp)
+	REG_L x7,  PT_T2(sp)
+	REG_L x8,  PT_S0(sp)
+	REG_L x9,  PT_S1(sp)
+	REG_L x10, PT_A0(sp)
+	REG_L x11, PT_A1(sp)
+	REG_L x12, PT_A2(sp)
+	REG_L x13, PT_A3(sp)
+	REG_L x14, PT_A4(sp)
+	REG_L x15, PT_A5(sp)
+	REG_L x16, PT_A6(sp)
+	REG_L x17, PT_A7(sp)
+	REG_L x18, PT_S2(sp)
+	REG_L x19, PT_S3(sp)
+	REG_L x20, PT_S4(sp)
+	REG_L x21, PT_S5(sp)
+	REG_L x22, PT_S6(sp)
+	REG_L x23, PT_S7(sp)
+	REG_L x24, PT_S8(sp)
+	REG_L x25, PT_S9(sp)
+	REG_L x26, PT_S10(sp)
+	REG_L x27, PT_S11(sp)
+	REG_L x28, PT_T3(sp)
+	REG_L x29, PT_T4(sp)
+	REG_L x30, PT_T5(sp)
+	REG_L x31, PT_T6(sp)
+	.endm
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_ASM_H */
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 5a6171a90d81..a90f17ed2822 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -40,34 +40,7 @@ _save_context:
 	REG_L sp, TASK_TI_KERNEL_SP(tp)
 	addi sp, sp, -(PT_SIZE_ON_STACK)
 	REG_S x1,  PT_RA(sp)
-	REG_S x3,  PT_GP(sp)
-	REG_S x5,  PT_T0(sp)
-	REG_S x6,  PT_T1(sp)
-	REG_S x7,  PT_T2(sp)
-	REG_S x8,  PT_S0(sp)
-	REG_S x9,  PT_S1(sp)
-	REG_S x10, PT_A0(sp)
-	REG_S x11, PT_A1(sp)
-	REG_S x12, PT_A2(sp)
-	REG_S x13, PT_A3(sp)
-	REG_S x14, PT_A4(sp)
-	REG_S x15, PT_A5(sp)
-	REG_S x16, PT_A6(sp)
-	REG_S x17, PT_A7(sp)
-	REG_S x18, PT_S2(sp)
-	REG_S x19, PT_S3(sp)
-	REG_S x20, PT_S4(sp)
-	REG_S x21, PT_S5(sp)
-	REG_S x22, PT_S6(sp)
-	REG_S x23, PT_S7(sp)
-	REG_S x24, PT_S8(sp)
-	REG_S x25, PT_S9(sp)
-	REG_S x26, PT_S10(sp)
-	REG_S x27, PT_S11(sp)
-	REG_S x28, PT_T3(sp)
-	REG_S x29, PT_T4(sp)
-	REG_S x30, PT_T5(sp)
-	REG_S x31, PT_T6(sp)
+	save_gp
 
 	/*
 	 * Disable user-mode memory access as it should only be set in the
@@ -182,35 +155,8 @@ ENTRY(ret_from_exception)
 	csrw CSR_STATUS, a0
 
 	REG_L x1,  PT_RA(sp)
-	REG_L x3,  PT_GP(sp)
 	REG_L x4,  PT_TP(sp)
-	REG_L x5,  PT_T0(sp)
-	REG_L x6,  PT_T1(sp)
-	REG_L x7,  PT_T2(sp)
-	REG_L x8,  PT_S0(sp)
-	REG_L x9,  PT_S1(sp)
-	REG_L x10, PT_A0(sp)
-	REG_L x11, PT_A1(sp)
-	REG_L x12, PT_A2(sp)
-	REG_L x13, PT_A3(sp)
-	REG_L x14, PT_A4(sp)
-	REG_L x15, PT_A5(sp)
-	REG_L x16, PT_A6(sp)
-	REG_L x17, PT_A7(sp)
-	REG_L x18, PT_S2(sp)
-	REG_L x19, PT_S3(sp)
-	REG_L x20, PT_S4(sp)
-	REG_L x21, PT_S5(sp)
-	REG_L x22, PT_S6(sp)
-	REG_L x23, PT_S7(sp)
-	REG_L x24, PT_S8(sp)
-	REG_L x25, PT_S9(sp)
-	REG_L x26, PT_S10(sp)
-	REG_L x27, PT_S11(sp)
-	REG_L x28, PT_T3(sp)
-	REG_L x29, PT_T4(sp)
-	REG_L x30, PT_T5(sp)
-	REG_L x31, PT_T6(sp)
+	restore_gp
 
 	REG_L x2,  PT_SP(sp)
 
@@ -237,34 +183,7 @@ ENTRY(handle_kernel_stack_overflow)
 
 	//save context to overflow stack
 	REG_S x1,  PT_RA(sp)
-	REG_S x3,  PT_GP(sp)
-	REG_S x5,  PT_T0(sp)
-	REG_S x6,  PT_T1(sp)
-	REG_S x7,  PT_T2(sp)
-	REG_S x8,  PT_S0(sp)
-	REG_S x9,  PT_S1(sp)
-	REG_S x10, PT_A0(sp)
-	REG_S x11, PT_A1(sp)
-	REG_S x12, PT_A2(sp)
-	REG_S x13, PT_A3(sp)
-	REG_S x14, PT_A4(sp)
-	REG_S x15, PT_A5(sp)
-	REG_S x16, PT_A6(sp)
-	REG_S x17, PT_A7(sp)
-	REG_S x18, PT_S2(sp)
-	REG_S x19, PT_S3(sp)
-	REG_S x20, PT_S4(sp)
-	REG_S x21, PT_S5(sp)
-	REG_S x22, PT_S6(sp)
-	REG_S x23, PT_S7(sp)
-	REG_S x24, PT_S8(sp)
-	REG_S x25, PT_S9(sp)
-	REG_S x26, PT_S10(sp)
-	REG_S x27, PT_S11(sp)
-	REG_S x28, PT_T3(sp)
-	REG_S x29, PT_T4(sp)
-	REG_S x30, PT_T5(sp)
-	REG_S x31, PT_T6(sp)
+	save_gp
 
 	REG_L s0, TASK_TI_KERNEL_SP(tp)
 	csrr s1, CSR_STATUS
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index d171eca623b6..1b4b0aecf4f5 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -68,35 +68,8 @@
 	REG_L x1,  PT_EPC(sp)
 
 	REG_S x2,  PT_SP(sp)
-	REG_S x3,  PT_GP(sp)
 	REG_S x4,  PT_TP(sp)
-	REG_S x5,  PT_T0(sp)
-	REG_S x6,  PT_T1(sp)
-	REG_S x7,  PT_T2(sp)
-	REG_S x8,  PT_S0(sp)
-	REG_S x9,  PT_S1(sp)
-	REG_S x10, PT_A0(sp)
-	REG_S x11, PT_A1(sp)
-	REG_S x12, PT_A2(sp)
-	REG_S x13, PT_A3(sp)
-	REG_S x14, PT_A4(sp)
-	REG_S x15, PT_A5(sp)
-	REG_S x16, PT_A6(sp)
-	REG_S x17, PT_A7(sp)
-	REG_S x18, PT_S2(sp)
-	REG_S x19, PT_S3(sp)
-	REG_S x20, PT_S4(sp)
-	REG_S x21, PT_S5(sp)
-	REG_S x22, PT_S6(sp)
-	REG_S x23, PT_S7(sp)
-	REG_S x24, PT_S8(sp)
-	REG_S x25, PT_S9(sp)
-	REG_S x26, PT_S10(sp)
-	REG_S x27, PT_S11(sp)
-	REG_S x28, PT_T3(sp)
-	REG_S x29, PT_T4(sp)
-	REG_S x30, PT_T5(sp)
-	REG_S x31, PT_T6(sp)
+	save_gp
 	.endm
 
 	.macro RESTORE_ALL
@@ -106,35 +79,8 @@
 	addi	sp, sp, -PT_SIZE_ON_STACK
 	REG_L x1,  PT_EPC(sp)
 	REG_L x2,  PT_SP(sp)
-	REG_L x3,  PT_GP(sp)
 	REG_L x4,  PT_TP(sp)
-	REG_L x5,  PT_T0(sp)
-	REG_L x6,  PT_T1(sp)
-	REG_L x7,  PT_T2(sp)
-	REG_L x8,  PT_S0(sp)
-	REG_L x9,  PT_S1(sp)
-	REG_L x10, PT_A0(sp)
-	REG_L x11, PT_A1(sp)
-	REG_L x12, PT_A2(sp)
-	REG_L x13, PT_A3(sp)
-	REG_L x14, PT_A4(sp)
-	REG_L x15, PT_A5(sp)
-	REG_L x16, PT_A6(sp)
-	REG_L x17, PT_A7(sp)
-	REG_L x18, PT_S2(sp)
-	REG_L x19, PT_S3(sp)
-	REG_L x20, PT_S4(sp)
-	REG_L x21, PT_S5(sp)
-	REG_L x22, PT_S6(sp)
-	REG_L x23, PT_S7(sp)
-	REG_L x24, PT_S8(sp)
-	REG_L x25, PT_S9(sp)
-	REG_L x26, PT_S10(sp)
-	REG_L x27, PT_S11(sp)
-	REG_L x28, PT_T3(sp)
-	REG_L x29, PT_T4(sp)
-	REG_L x30, PT_T5(sp)
-	REG_L x31, PT_T6(sp)
+	restore_gp
 
 	addi	sp, sp, PT_SIZE_ON_STACK
 	addi	sp, sp, SZREG
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-28 16:20 ` [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack Jisheng Zhang
@ 2022-09-28 16:54   ` Jisheng Zhang
  2022-09-29  5:54     ` Guo Ren
  0 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-28 16:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, Guo Ren
  Cc: linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> Currently, when detecting vmap stack overflow, riscv firstly switches
> to the so called shadow stack, then use this shadow stack to call the
> get_overflow_stack() to get the overflow stack. However, there's
> a race here if two or more harts use the same shadow stack at the same
> time.
> 
> To solve this race, we rely on two facts:
> 1. the content of kernel thread pointer I.E "tp" register can still
> be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> under the condtion that we restore tp from CSR_SCRATCH later.
> 
> 2. Once vmap stack overflow happen, panic is comming soon, no
> performance concern at all, so we don't need to define the overflow
> stack as percpu var, we can simplify it into a pointer array which
> points to allocated pages.
> 
> Thus we can use tp as a tmp register to get the cpu id to calculate
> the offset of overflow stack pointer array for each cpu w/o shadow
> stack any more. Thus the race condition is removed as a side effect.
> 
> NOTE: we can use similar mechanism to let each cpu use different shadow
> stack to fix the race codition, but if we can remove shadow stack usage
> totally, why not.
> 
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> ---
>  arch/riscv/include/asm/asm-prototypes.h |  1 -
>  arch/riscv/include/asm/thread_info.h    |  4 +-
>  arch/riscv/kernel/asm-offsets.c         |  1 +
>  arch/riscv/kernel/entry.S               | 56 ++++---------------------
>  arch/riscv/kernel/traps.c               | 31 ++++++++------
>  5 files changed, 29 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index ef386fcf3939..4a06fa0f6493 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
>  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
>  DECLARE_DO_ERROR_INFO(do_trap_break);
>  
> -asmlinkage unsigned long get_overflow_stack(void);
>  asmlinkage void handle_bad_stack(struct pt_regs *regs);
>  
>  #endif /* _ASM_RISCV_PROTOTYPES_H */
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index c970d41dc4c6..c604a5212a73 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -28,14 +28,12 @@
>  
>  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
>  #define OVERFLOW_STACK_SIZE     SZ_4K
> -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> +#define OVERFLOW_STACK_SHIFT	12

oops, this should be removed, will update it in a newer version after
collecting review comments.

>  
>  #define IRQ_STACK_SIZE		THREAD_SIZE
>  
>  #ifndef __ASSEMBLY__
>  
> -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> -
>  #include <asm/processor.h>
>  #include <asm/csr.h>
>  
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..62bf3bacc322 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -37,6 +37,7 @@ void asm_offsets(void)
>  	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>  	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
>  	OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> +	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>  
>  	OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
>  	OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index a3e1ed2fa2ac..5a6171a90d81 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -223,54 +223,16 @@ END(ret_from_exception)
>  
>  #ifdef CONFIG_VMAP_STACK
>  ENTRY(handle_kernel_stack_overflow)
> -	la sp, shadow_stack
> -	addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> -
> -	//save caller register to shadow stack
> -	addi sp, sp, -(PT_SIZE_ON_STACK)
> -	REG_S x1,  PT_RA(sp)
> -	REG_S x5,  PT_T0(sp)
> -	REG_S x6,  PT_T1(sp)
> -	REG_S x7,  PT_T2(sp)
> -	REG_S x10, PT_A0(sp)
> -	REG_S x11, PT_A1(sp)
> -	REG_S x12, PT_A2(sp)
> -	REG_S x13, PT_A3(sp)
> -	REG_S x14, PT_A4(sp)
> -	REG_S x15, PT_A5(sp)
> -	REG_S x16, PT_A6(sp)
> -	REG_S x17, PT_A7(sp)
> -	REG_S x28, PT_T3(sp)
> -	REG_S x29, PT_T4(sp)
> -	REG_S x30, PT_T5(sp)
> -	REG_S x31, PT_T6(sp)
> -
> -	la ra, restore_caller_reg
> -	tail get_overflow_stack
> -
> -restore_caller_reg:
> -	//save per-cpu overflow stack
> -	REG_S a0, -8(sp)
> -	//restore caller register from shadow_stack
> -	REG_L x1,  PT_RA(sp)
> -	REG_L x5,  PT_T0(sp)
> -	REG_L x6,  PT_T1(sp)
> -	REG_L x7,  PT_T2(sp)
> -	REG_L x10, PT_A0(sp)
> -	REG_L x11, PT_A1(sp)
> -	REG_L x12, PT_A2(sp)
> -	REG_L x13, PT_A3(sp)
> -	REG_L x14, PT_A4(sp)
> -	REG_L x15, PT_A5(sp)
> -	REG_L x16, PT_A6(sp)
> -	REG_L x17, PT_A7(sp)
> -	REG_L x28, PT_T3(sp)
> -	REG_L x29, PT_T4(sp)
> -	REG_L x30, PT_T5(sp)
> -	REG_L x31, PT_T6(sp)
> +	la sp, overflow_stack
> +	/* use tp as tmp register since we can restore it from CSR_SCRATCH */
> +	REG_L tp, TASK_TI_CPU(tp)
> +	slli tp, tp, RISCV_LGPTR
> +	add tp, sp, tp
> +	REG_L sp, 0(tp)
> +
> +	/* restore tp */
> +	csrr tp, CSR_SCRATCH
>  
> -	//load per-cpu overflow stack
> -	REG_L sp, -8(sp)
>  	addi sp, sp, -(PT_SIZE_ON_STACK)
>  
>  	//save context to overflow stack
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 73f06cd149d9..b6c64f0fb70f 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
>  #endif /* CONFIG_GENERIC_BUG */
>  
>  #ifdef CONFIG_VMAP_STACK
> -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> -		overflow_stack)__aligned(16);
> -/*
> - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> - * to get per-cpu overflow stack(get_overflow_stack).
> - */
> -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> -asmlinkage unsigned long get_overflow_stack(void)
> -{
> -	return (unsigned long)this_cpu_ptr(overflow_stack) +
> -		OVERFLOW_STACK_SIZE;
> -}
> +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
>  
>  asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  {
>  	unsigned long tsk_stk = (unsigned long)current->stack;
> -	unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> +	unsigned long ovf_stk = (unsigned long)overflow_stack[raw_smp_processor_id()];
>  
>  	console_verbose();
>  
> @@ -248,4 +237,20 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  	for (;;)
>  		wait_for_interrupt();
>  }
> +
> +static int __init alloc_overflow_stacks(void)
> +{
> +	u8 *s;
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		s = (u8 *)__get_free_pages(GFP_KERNEL, get_order(OVERFLOW_STACK_SIZE));
> +		if (WARN_ON(!s))
> +			return -ENOMEM;
> +		overflow_stack[cpu] = &s[OVERFLOW_STACK_SIZE];

Since overflow_stack[cpu] points to the top of the slack, we need to
update the ovf_stack dumping in handle_bad_stack(). will take care
this in newer version.

> +		printk("%px\n", overflow_stack[cpu]);

forget to remove this printk :(

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp
  2022-09-28 16:20 ` [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp Jisheng Zhang
@ 2022-09-29  3:59   ` Guo Ren
  2022-09-29 16:01     ` Jisheng Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Guo Ren @ 2022-09-29  3:59 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 12:29 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> Consolidate the saving/restoring GPs(except ra, sp and tp) into
> save_gp/restore_gp macro.
>
> No functional change intended.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
>  arch/riscv/include/asm/asm.h   | 65 +++++++++++++++++++++++++
>  arch/riscv/kernel/entry.S      | 87 ++--------------------------------
>  arch/riscv/kernel/mcount-dyn.S | 58 +----------------------
>  3 files changed, 70 insertions(+), 140 deletions(-)
>
> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index 1b471ff73178..2f3b49536e9d 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -68,6 +68,7 @@
>  #endif
>
>  #ifdef __ASSEMBLY__
> +#include <asm/asm-offsets.h>
>
>  /* Common assembly source macros */
>
> @@ -80,6 +81,70 @@
>         .endr
>  .endm
>
> +       /* save all GPs except ra, sp and tp */
> +       .macro save_gp
How about leave x3(gp) out of the macro, and define:
.marco save_from_x5_to_x31
.marco restore_from_x5_to_x31

> +       REG_S x3,  PT_GP(sp)
> +       REG_S x5,  PT_T0(sp)
> +       REG_S x6,  PT_T1(sp)
> +       REG_S x7,  PT_T2(sp)
> +       REG_S x8,  PT_S0(sp)
> +       REG_S x9,  PT_S1(sp)
> +       REG_S x10, PT_A0(sp)
> +       REG_S x11, PT_A1(sp)
> +       REG_S x12, PT_A2(sp)
> +       REG_S x13, PT_A3(sp)
> +       REG_S x14, PT_A4(sp)
> +       REG_S x15, PT_A5(sp)
> +       REG_S x16, PT_A6(sp)
> +       REG_S x17, PT_A7(sp)
> +       REG_S x18, PT_S2(sp)
> +       REG_S x19, PT_S3(sp)
> +       REG_S x20, PT_S4(sp)
> +       REG_S x21, PT_S5(sp)
> +       REG_S x22, PT_S6(sp)
> +       REG_S x23, PT_S7(sp)
> +       REG_S x24, PT_S8(sp)
> +       REG_S x25, PT_S9(sp)
> +       REG_S x26, PT_S10(sp)
> +       REG_S x27, PT_S11(sp)
> +       REG_S x28, PT_T3(sp)
> +       REG_S x29, PT_T4(sp)
> +       REG_S x30, PT_T5(sp)
> +       REG_S x31, PT_T6(sp)
> +       .endm
> +
> +       /* restore all GPs except ra, sp and tp */
> +       .macro restore_gp
> +       REG_L x3,  PT_GP(sp)
> +       REG_L x5,  PT_T0(sp)
> +       REG_L x6,  PT_T1(sp)
> +       REG_L x7,  PT_T2(sp)
> +       REG_L x8,  PT_S0(sp)
> +       REG_L x9,  PT_S1(sp)
> +       REG_L x10, PT_A0(sp)
> +       REG_L x11, PT_A1(sp)
> +       REG_L x12, PT_A2(sp)
> +       REG_L x13, PT_A3(sp)
> +       REG_L x14, PT_A4(sp)
> +       REG_L x15, PT_A5(sp)
> +       REG_L x16, PT_A6(sp)
> +       REG_L x17, PT_A7(sp)
> +       REG_L x18, PT_S2(sp)
> +       REG_L x19, PT_S3(sp)
> +       REG_L x20, PT_S4(sp)
> +       REG_L x21, PT_S5(sp)
> +       REG_L x22, PT_S6(sp)
> +       REG_L x23, PT_S7(sp)
> +       REG_L x24, PT_S8(sp)
> +       REG_L x25, PT_S9(sp)
> +       REG_L x26, PT_S10(sp)
> +       REG_L x27, PT_S11(sp)
> +       REG_L x28, PT_T3(sp)
> +       REG_L x29, PT_T4(sp)
> +       REG_L x30, PT_T5(sp)
> +       REG_L x31, PT_T6(sp)
> +       .endm
> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif /* _ASM_RISCV_ASM_H */
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 5a6171a90d81..a90f17ed2822 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -40,34 +40,7 @@ _save_context:
>         REG_L sp, TASK_TI_KERNEL_SP(tp)
>         addi sp, sp, -(PT_SIZE_ON_STACK)
>         REG_S x1,  PT_RA(sp)
> -       REG_S x3,  PT_GP(sp)
> -       REG_S x5,  PT_T0(sp)
> -       REG_S x6,  PT_T1(sp)
> -       REG_S x7,  PT_T2(sp)
> -       REG_S x8,  PT_S0(sp)
> -       REG_S x9,  PT_S1(sp)
> -       REG_S x10, PT_A0(sp)
> -       REG_S x11, PT_A1(sp)
> -       REG_S x12, PT_A2(sp)
> -       REG_S x13, PT_A3(sp)
> -       REG_S x14, PT_A4(sp)
> -       REG_S x15, PT_A5(sp)
> -       REG_S x16, PT_A6(sp)
> -       REG_S x17, PT_A7(sp)
> -       REG_S x18, PT_S2(sp)
> -       REG_S x19, PT_S3(sp)
> -       REG_S x20, PT_S4(sp)
> -       REG_S x21, PT_S5(sp)
> -       REG_S x22, PT_S6(sp)
> -       REG_S x23, PT_S7(sp)
> -       REG_S x24, PT_S8(sp)
> -       REG_S x25, PT_S9(sp)
> -       REG_S x26, PT_S10(sp)
> -       REG_S x27, PT_S11(sp)
> -       REG_S x28, PT_T3(sp)
> -       REG_S x29, PT_T4(sp)
> -       REG_S x30, PT_T5(sp)
> -       REG_S x31, PT_T6(sp)
> +       save_gp
>
>         /*
>          * Disable user-mode memory access as it should only be set in the
> @@ -182,35 +155,8 @@ ENTRY(ret_from_exception)
>         csrw CSR_STATUS, a0
>
>         REG_L x1,  PT_RA(sp)
> -       REG_L x3,  PT_GP(sp)
>         REG_L x4,  PT_TP(sp)
> -       REG_L x5,  PT_T0(sp)
> -       REG_L x6,  PT_T1(sp)
> -       REG_L x7,  PT_T2(sp)
> -       REG_L x8,  PT_S0(sp)
> -       REG_L x9,  PT_S1(sp)
> -       REG_L x10, PT_A0(sp)
> -       REG_L x11, PT_A1(sp)
> -       REG_L x12, PT_A2(sp)
> -       REG_L x13, PT_A3(sp)
> -       REG_L x14, PT_A4(sp)
> -       REG_L x15, PT_A5(sp)
> -       REG_L x16, PT_A6(sp)
> -       REG_L x17, PT_A7(sp)
> -       REG_L x18, PT_S2(sp)
> -       REG_L x19, PT_S3(sp)
> -       REG_L x20, PT_S4(sp)
> -       REG_L x21, PT_S5(sp)
> -       REG_L x22, PT_S6(sp)
> -       REG_L x23, PT_S7(sp)
> -       REG_L x24, PT_S8(sp)
> -       REG_L x25, PT_S9(sp)
> -       REG_L x26, PT_S10(sp)
> -       REG_L x27, PT_S11(sp)
> -       REG_L x28, PT_T3(sp)
> -       REG_L x29, PT_T4(sp)
> -       REG_L x30, PT_T5(sp)
> -       REG_L x31, PT_T6(sp)
> +       restore_gp
>
>         REG_L x2,  PT_SP(sp)
>
> @@ -237,34 +183,7 @@ ENTRY(handle_kernel_stack_overflow)
>
>         //save context to overflow stack
>         REG_S x1,  PT_RA(sp)
> -       REG_S x3,  PT_GP(sp)
> -       REG_S x5,  PT_T0(sp)
> -       REG_S x6,  PT_T1(sp)
> -       REG_S x7,  PT_T2(sp)
> -       REG_S x8,  PT_S0(sp)
> -       REG_S x9,  PT_S1(sp)
> -       REG_S x10, PT_A0(sp)
> -       REG_S x11, PT_A1(sp)
> -       REG_S x12, PT_A2(sp)
> -       REG_S x13, PT_A3(sp)
> -       REG_S x14, PT_A4(sp)
> -       REG_S x15, PT_A5(sp)
> -       REG_S x16, PT_A6(sp)
> -       REG_S x17, PT_A7(sp)
> -       REG_S x18, PT_S2(sp)
> -       REG_S x19, PT_S3(sp)
> -       REG_S x20, PT_S4(sp)
> -       REG_S x21, PT_S5(sp)
> -       REG_S x22, PT_S6(sp)
> -       REG_S x23, PT_S7(sp)
> -       REG_S x24, PT_S8(sp)
> -       REG_S x25, PT_S9(sp)
> -       REG_S x26, PT_S10(sp)
> -       REG_S x27, PT_S11(sp)
> -       REG_S x28, PT_T3(sp)
> -       REG_S x29, PT_T4(sp)
> -       REG_S x30, PT_T5(sp)
> -       REG_S x31, PT_T6(sp)
> +       save_gp
>
>         REG_L s0, TASK_TI_KERNEL_SP(tp)
>         csrr s1, CSR_STATUS
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index d171eca623b6..1b4b0aecf4f5 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -68,35 +68,8 @@
>         REG_L x1,  PT_EPC(sp)
>
>         REG_S x2,  PT_SP(sp)
> -       REG_S x3,  PT_GP(sp)
>         REG_S x4,  PT_TP(sp)
> -       REG_S x5,  PT_T0(sp)
> -       REG_S x6,  PT_T1(sp)
> -       REG_S x7,  PT_T2(sp)
> -       REG_S x8,  PT_S0(sp)
> -       REG_S x9,  PT_S1(sp)
> -       REG_S x10, PT_A0(sp)
> -       REG_S x11, PT_A1(sp)
> -       REG_S x12, PT_A2(sp)
> -       REG_S x13, PT_A3(sp)
> -       REG_S x14, PT_A4(sp)
> -       REG_S x15, PT_A5(sp)
> -       REG_S x16, PT_A6(sp)
> -       REG_S x17, PT_A7(sp)
> -       REG_S x18, PT_S2(sp)
> -       REG_S x19, PT_S3(sp)
> -       REG_S x20, PT_S4(sp)
> -       REG_S x21, PT_S5(sp)
> -       REG_S x22, PT_S6(sp)
> -       REG_S x23, PT_S7(sp)
> -       REG_S x24, PT_S8(sp)
> -       REG_S x25, PT_S9(sp)
> -       REG_S x26, PT_S10(sp)
> -       REG_S x27, PT_S11(sp)
> -       REG_S x28, PT_T3(sp)
> -       REG_S x29, PT_T4(sp)
> -       REG_S x30, PT_T5(sp)
> -       REG_S x31, PT_T6(sp)
> +       save_gp
>         .endm
>
>         .macro RESTORE_ALL
> @@ -106,35 +79,8 @@
>         addi    sp, sp, -PT_SIZE_ON_STACK
>         REG_L x1,  PT_EPC(sp)
>         REG_L x2,  PT_SP(sp)
> -       REG_L x3,  PT_GP(sp)
>         REG_L x4,  PT_TP(sp)
> -       REG_L x5,  PT_T0(sp)
> -       REG_L x6,  PT_T1(sp)
> -       REG_L x7,  PT_T2(sp)
> -       REG_L x8,  PT_S0(sp)
> -       REG_L x9,  PT_S1(sp)
> -       REG_L x10, PT_A0(sp)
> -       REG_L x11, PT_A1(sp)
> -       REG_L x12, PT_A2(sp)
> -       REG_L x13, PT_A3(sp)
> -       REG_L x14, PT_A4(sp)
> -       REG_L x15, PT_A5(sp)
> -       REG_L x16, PT_A6(sp)
> -       REG_L x17, PT_A7(sp)
> -       REG_L x18, PT_S2(sp)
> -       REG_L x19, PT_S3(sp)
> -       REG_L x20, PT_S4(sp)
> -       REG_L x21, PT_S5(sp)
> -       REG_L x22, PT_S6(sp)
> -       REG_L x23, PT_S7(sp)
> -       REG_L x24, PT_S8(sp)
> -       REG_L x25, PT_S9(sp)
> -       REG_L x26, PT_S10(sp)
> -       REG_L x27, PT_S11(sp)
> -       REG_L x28, PT_T3(sp)
> -       REG_L x29, PT_T4(sp)
> -       REG_L x30, PT_T5(sp)
> -       REG_L x31, PT_T6(sp)
> +       restore_gp
>
>         addi    sp, sp, PT_SIZE_ON_STACK
>         addi    sp, sp, SZREG
> --
> 2.34.1
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-28 16:54   ` Jisheng Zhang
@ 2022-09-29  5:54     ` Guo Ren
  2022-09-29  6:58       ` Guo Ren
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Guo Ren @ 2022-09-29  5:54 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > Currently, when detecting vmap stack overflow, riscv firstly switches
> > to the so called shadow stack, then use this shadow stack to call the
> > get_overflow_stack() to get the overflow stack. However, there's
> > a race here if two or more harts use the same shadow stack at the same
> > time.
> >
> > To solve this race, we rely on two facts:
> > 1. the content of kernel thread pointer I.E "tp" register can still
> > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > under the condtion that we restore tp from CSR_SCRATCH later.
> >
> > 2. Once vmap stack overflow happen, panic is comming soon, no
> > performance concern at all, so we don't need to define the overflow
> > stack as percpu var, we can simplify it into a pointer array which
> > points to allocated pages.
> >
> > Thus we can use tp as a tmp register to get the cpu id to calculate
> > the offset of overflow stack pointer array for each cpu w/o shadow
> > stack any more. Thus the race condition is removed as a side effect.
> >
> > NOTE: we can use similar mechanism to let each cpu use different shadow
> > stack to fix the race codition, but if we can remove shadow stack usage
> > totally, why not.
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > ---
> >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> >  arch/riscv/include/asm/thread_info.h    |  4 +-
> >  arch/riscv/kernel/asm-offsets.c         |  1 +
> >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> >  5 files changed, 29 insertions(+), 64 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index ef386fcf3939..4a06fa0f6493 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> >  DECLARE_DO_ERROR_INFO(do_trap_break);
> >
> > -asmlinkage unsigned long get_overflow_stack(void);
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> >
> >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > index c970d41dc4c6..c604a5212a73 100644
> > --- a/arch/riscv/include/asm/thread_info.h
> > +++ b/arch/riscv/include/asm/thread_info.h
> > @@ -28,14 +28,12 @@
> >
> >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > +#define OVERFLOW_STACK_SHIFT 12
>
> oops, this should be removed, will update it in a newer version after
> collecting review comments.
>
> >
> >  #define IRQ_STACK_SIZE               THREAD_SIZE
> >
> >  #ifndef __ASSEMBLY__
> >
> > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > -
> >  #include <asm/processor.h>
> >  #include <asm/csr.h>
> >
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..62bf3bacc322 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -37,6 +37,7 @@ void asm_offsets(void)
> >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> >
> >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index a3e1ed2fa2ac..5a6171a90d81 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -223,54 +223,16 @@ END(ret_from_exception)
> >
> >  #ifdef CONFIG_VMAP_STACK
> >  ENTRY(handle_kernel_stack_overflow)
> > -     la sp, shadow_stack
> > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > -
> > -     //save caller register to shadow stack
> > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > -     REG_S x1,  PT_RA(sp)
> > -     REG_S x5,  PT_T0(sp)
> > -     REG_S x6,  PT_T1(sp)
> > -     REG_S x7,  PT_T2(sp)
> > -     REG_S x10, PT_A0(sp)
> > -     REG_S x11, PT_A1(sp)
> > -     REG_S x12, PT_A2(sp)
> > -     REG_S x13, PT_A3(sp)
> > -     REG_S x14, PT_A4(sp)
> > -     REG_S x15, PT_A5(sp)
> > -     REG_S x16, PT_A6(sp)
> > -     REG_S x17, PT_A7(sp)
> > -     REG_S x28, PT_T3(sp)
> > -     REG_S x29, PT_T4(sp)
> > -     REG_S x30, PT_T5(sp)
> > -     REG_S x31, PT_T6(sp)
> > -
> > -     la ra, restore_caller_reg
> > -     tail get_overflow_stack
> > -
> > -restore_caller_reg:
> > -     //save per-cpu overflow stack
> > -     REG_S a0, -8(sp)
> > -     //restore caller register from shadow_stack
> > -     REG_L x1,  PT_RA(sp)
> > -     REG_L x5,  PT_T0(sp)
> > -     REG_L x6,  PT_T1(sp)
> > -     REG_L x7,  PT_T2(sp)
> > -     REG_L x10, PT_A0(sp)
> > -     REG_L x11, PT_A1(sp)
> > -     REG_L x12, PT_A2(sp)
> > -     REG_L x13, PT_A3(sp)
> > -     REG_L x14, PT_A4(sp)
> > -     REG_L x15, PT_A5(sp)
> > -     REG_L x16, PT_A6(sp)
> > -     REG_L x17, PT_A7(sp)
> > -     REG_L x28, PT_T3(sp)
> > -     REG_L x29, PT_T4(sp)
> > -     REG_L x30, PT_T5(sp)
> > -     REG_L x31, PT_T6(sp)
> > +     la sp, overflow_stack
> > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > +     REG_L tp, TASK_TI_CPU(tp)
> > +     slli tp, tp, RISCV_LGPTR
> > +     add tp, sp, tp
> > +     REG_L sp, 0(tp)
> > +
> > +     /* restore tp */
> > +     csrr tp, CSR_SCRATCH
> >
> > -     //load per-cpu overflow stack
> > -     REG_L sp, -8(sp)
> >       addi sp, sp, -(PT_SIZE_ON_STACK)
> >
> >       //save context to overflow stack
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 73f06cd149d9..b6c64f0fb70f 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> >  #endif /* CONFIG_GENERIC_BUG */
> >
> >  #ifdef CONFIG_VMAP_STACK
> > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > -             overflow_stack)__aligned(16);
> > -/*
> > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > - * to get per-cpu overflow stack(get_overflow_stack).
> > - */
> > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > -asmlinkage unsigned long get_overflow_stack(void)
> > -{
> > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > -             OVERFLOW_STACK_SIZE;
> > -}
> > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.

I prefer the previous solution with a atomic flag.

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 5cbd6684ef52..42a3b14a20ab 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -223,6 +223,9 @@ END(ret_from_exception)

 #ifdef CONFIG_VMAP_STACK
 ENTRY(handle_kernel_stack_overflow)
+1:     la sp, spin_ shadow_stack
+       amoswap.w sp, sp, (sp)
+       bnez sp, 1b
        la sp, shadow_stack
        addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 73f06cd149d9..9058a05cac53 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
                OVERFLOW_STACK_SIZE;
 }

+atomic_t spin_ shadow_stack __section(".sdata");
+
 asmlinkage void handle_bad_stack(struct pt_regs *regs)
 {
        unsigned long tsk_stk = (unsigned long)current->stack;
        unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);

+       atomic_set_release(spin_ shadow_stack, 0);
+
        console_verbose();

        pr_emerg("Insufficient stack space to handle exception!\n");





> >
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> >  {
> >       unsigned long tsk_stk = (unsigned long)current->stack;
> > -     unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > +     unsigned long ovf_stk = (unsigned long)overflow_stack[raw_smp_processor_id()];
> >
> >       console_verbose();
> >
> > @@ -248,4 +237,20 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
> >       for (;;)
> >               wait_for_interrupt();
> >  }
> > +
> > +static int __init alloc_overflow_stacks(void)
> > +{
> > +     u8 *s;
> > +     int cpu;
> > +
> > +     for_each_possible_cpu(cpu) {
> > +             s = (u8 *)__get_free_pages(GFP_KERNEL, get_order(OVERFLOW_STACK_SIZE));
> > +             if (WARN_ON(!s))
> > +                     return -ENOMEM;
> > +             overflow_stack[cpu] = &s[OVERFLOW_STACK_SIZE];
>
> Since overflow_stack[cpu] points to the top of the slack, we need to
> update the ovf_stack dumping in handle_bad_stack(). will take care
> this in newer version.
>
> > +             printk("%px\n", overflow_stack[cpu]);
>
> forget to remove this printk :(



--
Best Regards
 Guo Ren

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-29  5:54     ` Guo Ren
@ 2022-09-29  6:58       ` Guo Ren
  2022-09-29 16:18       ` Jisheng Zhang
  2022-10-19  2:52       ` Guo Ren
  2 siblings, 0 replies; 18+ messages in thread
From: Guo Ren @ 2022-09-29  6:58 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 1:54 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > to the so called shadow stack, then use this shadow stack to call the
> > > get_overflow_stack() to get the overflow stack. However, there's
> > > a race here if two or more harts use the same shadow stack at the same
> > > time.
> > >
> > > To solve this race, we rely on two facts:
> > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > under the condtion that we restore tp from CSR_SCRATCH later.
> > >
> > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > performance concern at all, so we don't need to define the overflow
> > > stack as percpu var, we can simplify it into a pointer array which
> > > points to allocated pages.
> > >
> > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > stack any more. Thus the race condition is removed as a side effect.
> > >
> > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > stack to fix the race codition, but if we can remove shadow stack usage
> > > totally, why not.
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > ---
> > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index ef386fcf3939..4a06fa0f6493 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > >
> > > -asmlinkage unsigned long get_overflow_stack(void);
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > >
> > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > index c970d41dc4c6..c604a5212a73 100644
> > > --- a/arch/riscv/include/asm/thread_info.h
> > > +++ b/arch/riscv/include/asm/thread_info.h
> > > @@ -28,14 +28,12 @@
> > >
> > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > +#define OVERFLOW_STACK_SHIFT 12
> >
> > oops, this should be removed, will update it in a newer version after
> > collecting review comments.
> >
> > >
> > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > >
> > >  #ifndef __ASSEMBLY__
> > >
> > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > -
> > >  #include <asm/processor.h>
> > >  #include <asm/csr.h>
> > >
> > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > index df9444397908..62bf3bacc322 100644
> > > --- a/arch/riscv/kernel/asm-offsets.c
> > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > >
> > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > >  ENTRY(handle_kernel_stack_overflow)
> > > -     la sp, shadow_stack
> > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > -
> > > -     //save caller register to shadow stack
> > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > -     REG_S x1,  PT_RA(sp)
> > > -     REG_S x5,  PT_T0(sp)
> > > -     REG_S x6,  PT_T1(sp)
> > > -     REG_S x7,  PT_T2(sp)
> > > -     REG_S x10, PT_A0(sp)
> > > -     REG_S x11, PT_A1(sp)
> > > -     REG_S x12, PT_A2(sp)
> > > -     REG_S x13, PT_A3(sp)
> > > -     REG_S x14, PT_A4(sp)
> > > -     REG_S x15, PT_A5(sp)
> > > -     REG_S x16, PT_A6(sp)
> > > -     REG_S x17, PT_A7(sp)
> > > -     REG_S x28, PT_T3(sp)
> > > -     REG_S x29, PT_T4(sp)
> > > -     REG_S x30, PT_T5(sp)
> > > -     REG_S x31, PT_T6(sp)
> > > -
> > > -     la ra, restore_caller_reg
> > > -     tail get_overflow_stack
> > > -
> > > -restore_caller_reg:
> > > -     //save per-cpu overflow stack
> > > -     REG_S a0, -8(sp)
> > > -     //restore caller register from shadow_stack
> > > -     REG_L x1,  PT_RA(sp)
> > > -     REG_L x5,  PT_T0(sp)
> > > -     REG_L x6,  PT_T1(sp)
> > > -     REG_L x7,  PT_T2(sp)
> > > -     REG_L x10, PT_A0(sp)
> > > -     REG_L x11, PT_A1(sp)
> > > -     REG_L x12, PT_A2(sp)
> > > -     REG_L x13, PT_A3(sp)
> > > -     REG_L x14, PT_A4(sp)
> > > -     REG_L x15, PT_A5(sp)
> > > -     REG_L x16, PT_A6(sp)
> > > -     REG_L x17, PT_A7(sp)
> > > -     REG_L x28, PT_T3(sp)
> > > -     REG_L x29, PT_T4(sp)
> > > -     REG_L x30, PT_T5(sp)
> > > -     REG_L x31, PT_T6(sp)
> > > +     la sp, overflow_stack
> > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > +     REG_L tp, TASK_TI_CPU(tp)
> > > +     slli tp, tp, RISCV_LGPTR
> > > +     add tp, sp, tp
> > > +     REG_L sp, 0(tp)
> > > +
> > > +     /* restore tp */
> > > +     csrr tp, CSR_SCRATCH
> > >
> > > -     //load per-cpu overflow stack
> > > -     REG_L sp, -8(sp)
> > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > >
> > >       //save context to overflow stack
> > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > --- a/arch/riscv/kernel/traps.c
> > > +++ b/arch/riscv/kernel/traps.c
> > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > >  #endif /* CONFIG_GENERIC_BUG */
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > -             overflow_stack)__aligned(16);
> > > -/*
> > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > - */
> > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > -asmlinkage unsigned long get_overflow_stack(void)
> > > -{
> > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > -             OVERFLOW_STACK_SIZE;
> > > -}
> > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
>
> I prefer the previous solution with a atomic flag.
>
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 5cbd6684ef52..42a3b14a20ab 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -223,6 +223,9 @@ END(ret_from_exception)
>
>  #ifdef CONFIG_VMAP_STACK
>  ENTRY(handle_kernel_stack_overflow)
> +1:     la sp, spin_ shadow_stack
> +       amoswap.w sp, sp, (sp)
> +       bnez sp, 1b
>         la sp, shadow_stack
>         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 73f06cd149d9..9058a05cac53 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
>                 OVERFLOW_STACK_SIZE;
>  }
>
> +atomic_t spin_ shadow_stack __section(".sdata");
> +
>  asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  {
>         unsigned long tsk_stk = (unsigned long)current->stack;
>         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
>
> +       atomic_set_release(spin_ shadow_stack, 0);
Sorry, it should be:
   +       atomic_set_release(&spin_ shadow_stack, 0);

> +
>         console_verbose();
>
>         pr_emerg("Insufficient stack space to handle exception!\n");
>
>
>
>
>
> > >
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > >  {
> > >       unsigned long tsk_stk = (unsigned long)current->stack;
> > > -     unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > > +     unsigned long ovf_stk = (unsigned long)overflow_stack[raw_smp_processor_id()];
> > >
> > >       console_verbose();
> > >
> > > @@ -248,4 +237,20 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > >       for (;;)
> > >               wait_for_interrupt();
> > >  }
> > > +
> > > +static int __init alloc_overflow_stacks(void)
> > > +{
> > > +     u8 *s;
> > > +     int cpu;
> > > +
> > > +     for_each_possible_cpu(cpu) {
> > > +             s = (u8 *)__get_free_pages(GFP_KERNEL, get_order(OVERFLOW_STACK_SIZE));
> > > +             if (WARN_ON(!s))
> > > +                     return -ENOMEM;
> > > +             overflow_stack[cpu] = &s[OVERFLOW_STACK_SIZE];
> >
> > Since overflow_stack[cpu] points to the top of the slack, we need to
> > update the ovf_stack dumping in handle_bad_stack(). will take care
> > this in newer version.
> >
> > > +             printk("%px\n", overflow_stack[cpu]);
> >
> > forget to remove this printk :(
>
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp
  2022-09-29  3:59   ` Guo Ren
@ 2022-09-29 16:01     ` Jisheng Zhang
  2022-10-19  2:55       ` Guo Ren
  0 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-29 16:01 UTC (permalink / raw)
  To: Guo Ren
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 11:59:00AM +0800, Guo Ren wrote:
> On Thu, Sep 29, 2022 at 12:29 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > Consolidate the saving/restoring GPs(except ra, sp and tp) into
> > save_gp/restore_gp macro.
> >
> > No functional change intended.
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > ---
> >  arch/riscv/include/asm/asm.h   | 65 +++++++++++++++++++++++++
> >  arch/riscv/kernel/entry.S      | 87 ++--------------------------------
> >  arch/riscv/kernel/mcount-dyn.S | 58 +----------------------
> >  3 files changed, 70 insertions(+), 140 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> > index 1b471ff73178..2f3b49536e9d 100644
> > --- a/arch/riscv/include/asm/asm.h
> > +++ b/arch/riscv/include/asm/asm.h
> > @@ -68,6 +68,7 @@
> >  #endif
> >
> >  #ifdef __ASSEMBLY__
> > +#include <asm/asm-offsets.h>
> >
> >  /* Common assembly source macros */
> >
> > @@ -80,6 +81,70 @@
> >         .endr
> >  .endm
> >
> > +       /* save all GPs except ra, sp and tp */
> > +       .macro save_gp
> How about leave x3(gp) out of the macro, and define:
> .marco save_from_x5_to_x31
> .marco restore_from_x5_to_x31

Good idea, will do in next version.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-29  5:54     ` Guo Ren
  2022-09-29  6:58       ` Guo Ren
@ 2022-09-29 16:18       ` Jisheng Zhang
  2022-09-30  3:35         ` Guo Ren
  2022-10-19  2:52       ` Guo Ren
  2 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-09-29 16:18 UTC (permalink / raw)
  To: Guo Ren
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 01:54:25PM +0800, Guo Ren wrote:
> On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > to the so called shadow stack, then use this shadow stack to call the
> > > get_overflow_stack() to get the overflow stack. However, there's
> > > a race here if two or more harts use the same shadow stack at the same
> > > time.
> > >
> > > To solve this race, we rely on two facts:
> > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > under the condtion that we restore tp from CSR_SCRATCH later.
> > >
> > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > performance concern at all, so we don't need to define the overflow
> > > stack as percpu var, we can simplify it into a pointer array which
> > > points to allocated pages.
> > >
> > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > stack any more. Thus the race condition is removed as a side effect.
> > >
> > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > stack to fix the race codition, but if we can remove shadow stack usage
> > > totally, why not.
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > ---
> > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index ef386fcf3939..4a06fa0f6493 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > >
> > > -asmlinkage unsigned long get_overflow_stack(void);
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > >
> > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > index c970d41dc4c6..c604a5212a73 100644
> > > --- a/arch/riscv/include/asm/thread_info.h
> > > +++ b/arch/riscv/include/asm/thread_info.h
> > > @@ -28,14 +28,12 @@
> > >
> > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > +#define OVERFLOW_STACK_SHIFT 12
> >
> > oops, this should be removed, will update it in a newer version after
> > collecting review comments.
> >
> > >
> > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > >
> > >  #ifndef __ASSEMBLY__
> > >
> > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > -
> > >  #include <asm/processor.h>
> > >  #include <asm/csr.h>
> > >
> > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > index df9444397908..62bf3bacc322 100644
> > > --- a/arch/riscv/kernel/asm-offsets.c
> > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > >
> > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > >  ENTRY(handle_kernel_stack_overflow)
> > > -     la sp, shadow_stack
> > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > -
> > > -     //save caller register to shadow stack
> > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > -     REG_S x1,  PT_RA(sp)
> > > -     REG_S x5,  PT_T0(sp)
> > > -     REG_S x6,  PT_T1(sp)
> > > -     REG_S x7,  PT_T2(sp)
> > > -     REG_S x10, PT_A0(sp)
> > > -     REG_S x11, PT_A1(sp)
> > > -     REG_S x12, PT_A2(sp)
> > > -     REG_S x13, PT_A3(sp)
> > > -     REG_S x14, PT_A4(sp)
> > > -     REG_S x15, PT_A5(sp)
> > > -     REG_S x16, PT_A6(sp)
> > > -     REG_S x17, PT_A7(sp)
> > > -     REG_S x28, PT_T3(sp)
> > > -     REG_S x29, PT_T4(sp)
> > > -     REG_S x30, PT_T5(sp)
> > > -     REG_S x31, PT_T6(sp)
> > > -
> > > -     la ra, restore_caller_reg
> > > -     tail get_overflow_stack
> > > -
> > > -restore_caller_reg:
> > > -     //save per-cpu overflow stack
> > > -     REG_S a0, -8(sp)
> > > -     //restore caller register from shadow_stack
> > > -     REG_L x1,  PT_RA(sp)
> > > -     REG_L x5,  PT_T0(sp)
> > > -     REG_L x6,  PT_T1(sp)
> > > -     REG_L x7,  PT_T2(sp)
> > > -     REG_L x10, PT_A0(sp)
> > > -     REG_L x11, PT_A1(sp)
> > > -     REG_L x12, PT_A2(sp)
> > > -     REG_L x13, PT_A3(sp)
> > > -     REG_L x14, PT_A4(sp)
> > > -     REG_L x15, PT_A5(sp)
> > > -     REG_L x16, PT_A6(sp)
> > > -     REG_L x17, PT_A7(sp)
> > > -     REG_L x28, PT_T3(sp)
> > > -     REG_L x29, PT_T4(sp)
> > > -     REG_L x30, PT_T5(sp)
> > > -     REG_L x31, PT_T6(sp)
> > > +     la sp, overflow_stack
> > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > +     REG_L tp, TASK_TI_CPU(tp)
> > > +     slli tp, tp, RISCV_LGPTR
> > > +     add tp, sp, tp
> > > +     REG_L sp, 0(tp)
> > > +
> > > +     /* restore tp */
> > > +     csrr tp, CSR_SCRATCH
> > >
> > > -     //load per-cpu overflow stack
> > > -     REG_L sp, -8(sp)
> > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > >
> > >       //save context to overflow stack
> > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > --- a/arch/riscv/kernel/traps.c
> > > +++ b/arch/riscv/kernel/traps.c
> > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > >  #endif /* CONFIG_GENERIC_BUG */
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > -             overflow_stack)__aligned(16);
> > > -/*
> > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > - */
> > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > -asmlinkage unsigned long get_overflow_stack(void)
> > > -{
> > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > -             OVERFLOW_STACK_SIZE;
> > > -}
> > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
> 
> I prefer the previous solution with a atomic flag.

Hi guo,

I see your opinions. Here are my comments:

Now the range of riscv's NR_CPUS is 2~32, given I have removed the 1KB
shadow stack, so this patch saves 768Bytes or more rather than wastes
memory. No mention that I also removed the shadow stack save and
restore code.

From another side, we didn't see such riscv system with huge(8192) CPU
numbers. Even if we do have such system in the future, I belive such
system should be powered by lots of memory, 64KB is trivial comapred
with GBs or TBs memory size.

Given the simplicity of my solution, I still prefer my solution. What
do you think?

Thanks
> 
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 5cbd6684ef52..42a3b14a20ab 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -223,6 +223,9 @@ END(ret_from_exception)
> 
>  #ifdef CONFIG_VMAP_STACK
>  ENTRY(handle_kernel_stack_overflow)
> +1:     la sp, spin_ shadow_stack
> +       amoswap.w sp, sp, (sp)
> +       bnez sp, 1b
>         la sp, shadow_stack
>         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> 
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 73f06cd149d9..9058a05cac53 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
>                 OVERFLOW_STACK_SIZE;
>  }
> 
> +atomic_t spin_ shadow_stack __section(".sdata");
> +
>  asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  {
>         unsigned long tsk_stk = (unsigned long)current->stack;
>         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> 
> +       atomic_set_release(spin_ shadow_stack, 0);
> +
>         console_verbose();
> 
>         pr_emerg("Insufficient stack space to handle exception!\n");

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-29 16:18       ` Jisheng Zhang
@ 2022-09-30  3:35         ` Guo Ren
  0 siblings, 0 replies; 18+ messages in thread
From: Guo Ren @ 2022-09-30  3:35 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Fri, Sep 30, 2022 at 12:27 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Thu, Sep 29, 2022 at 01:54:25PM +0800, Guo Ren wrote:
> > On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> > >
> > > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > > to the so called shadow stack, then use this shadow stack to call the
> > > > get_overflow_stack() to get the overflow stack. However, there's
> > > > a race here if two or more harts use the same shadow stack at the same
> > > > time.
> > > >
> > > > To solve this race, we rely on two facts:
> > > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > > under the condtion that we restore tp from CSR_SCRATCH later.
> > > >
> > > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > > performance concern at all, so we don't need to define the overflow
> > > > stack as percpu var, we can simplify it into a pointer array which
> > > > points to allocated pages.
> > > >
> > > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > > stack any more. Thus the race condition is removed as a side effect.
> > > >
> > > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > > stack to fix the race codition, but if we can remove shadow stack usage
> > > > totally, why not.
> > > >
> > > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > > ---
> > > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > > index ef386fcf3939..4a06fa0f6493 100644
> > > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > > >
> > > > -asmlinkage unsigned long get_overflow_stack(void);
> > > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > >
> > > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > > index c970d41dc4c6..c604a5212a73 100644
> > > > --- a/arch/riscv/include/asm/thread_info.h
> > > > +++ b/arch/riscv/include/asm/thread_info.h
> > > > @@ -28,14 +28,12 @@
> > > >
> > > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > > +#define OVERFLOW_STACK_SHIFT 12
> > >
> > > oops, this should be removed, will update it in a newer version after
> > > collecting review comments.
> > >
> > > >
> > > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > > >
> > > >  #ifndef __ASSEMBLY__
> > > >
> > > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > > -
> > > >  #include <asm/processor.h>
> > > >  #include <asm/csr.h>
> > > >
> > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > index df9444397908..62bf3bacc322 100644
> > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > > >
> > > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > > --- a/arch/riscv/kernel/entry.S
> > > > +++ b/arch/riscv/kernel/entry.S
> > > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > > >
> > > >  #ifdef CONFIG_VMAP_STACK
> > > >  ENTRY(handle_kernel_stack_overflow)
> > > > -     la sp, shadow_stack
> > > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > > -
> > > > -     //save caller register to shadow stack
> > > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > > -     REG_S x1,  PT_RA(sp)
> > > > -     REG_S x5,  PT_T0(sp)
> > > > -     REG_S x6,  PT_T1(sp)
> > > > -     REG_S x7,  PT_T2(sp)
> > > > -     REG_S x10, PT_A0(sp)
> > > > -     REG_S x11, PT_A1(sp)
> > > > -     REG_S x12, PT_A2(sp)
> > > > -     REG_S x13, PT_A3(sp)
> > > > -     REG_S x14, PT_A4(sp)
> > > > -     REG_S x15, PT_A5(sp)
> > > > -     REG_S x16, PT_A6(sp)
> > > > -     REG_S x17, PT_A7(sp)
> > > > -     REG_S x28, PT_T3(sp)
> > > > -     REG_S x29, PT_T4(sp)
> > > > -     REG_S x30, PT_T5(sp)
> > > > -     REG_S x31, PT_T6(sp)
> > > > -
> > > > -     la ra, restore_caller_reg
> > > > -     tail get_overflow_stack
> > > > -
> > > > -restore_caller_reg:
> > > > -     //save per-cpu overflow stack
> > > > -     REG_S a0, -8(sp)
> > > > -     //restore caller register from shadow_stack
> > > > -     REG_L x1,  PT_RA(sp)
> > > > -     REG_L x5,  PT_T0(sp)
> > > > -     REG_L x6,  PT_T1(sp)
> > > > -     REG_L x7,  PT_T2(sp)
> > > > -     REG_L x10, PT_A0(sp)
> > > > -     REG_L x11, PT_A1(sp)
> > > > -     REG_L x12, PT_A2(sp)
> > > > -     REG_L x13, PT_A3(sp)
> > > > -     REG_L x14, PT_A4(sp)
> > > > -     REG_L x15, PT_A5(sp)
> > > > -     REG_L x16, PT_A6(sp)
> > > > -     REG_L x17, PT_A7(sp)
> > > > -     REG_L x28, PT_T3(sp)
> > > > -     REG_L x29, PT_T4(sp)
> > > > -     REG_L x30, PT_T5(sp)
> > > > -     REG_L x31, PT_T6(sp)
> > > > +     la sp, overflow_stack
> > > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > > +     REG_L tp, TASK_TI_CPU(tp)
> > > > +     slli tp, tp, RISCV_LGPTR
> > > > +     add tp, sp, tp
> > > > +     REG_L sp, 0(tp)
> > > > +
> > > > +     /* restore tp */
> > > > +     csrr tp, CSR_SCRATCH
> > > >
> > > > -     //load per-cpu overflow stack
> > > > -     REG_L sp, -8(sp)
> > > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > > >
> > > >       //save context to overflow stack
> > > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > > --- a/arch/riscv/kernel/traps.c
> > > > +++ b/arch/riscv/kernel/traps.c
> > > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > > >  #endif /* CONFIG_GENERIC_BUG */
> > > >
> > > >  #ifdef CONFIG_VMAP_STACK
> > > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > > -             overflow_stack)__aligned(16);
> > > > -/*
> > > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > > - */
> > > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > > -asmlinkage unsigned long get_overflow_stack(void)
> > > > -{
> > > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > > -             OVERFLOW_STACK_SIZE;
> > > > -}
> > > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> > Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
> >
> > I prefer the previous solution with a atomic flag.
>
> Hi guo,
>
> I see your opinions. Here are my comments:
>
> Now the range of riscv's NR_CPUS is 2~32, given I have removed the 1KB
> shadow stack, so this patch saves 768Bytes or more rather than wastes
> memory. No mention that I also removed the shadow stack save and
> restore code.
>
> From another side, we didn't see such riscv system with huge(8192) CPU
> numbers. Even if we do have such system in the future, I belive such
> system should be powered by lots of memory, 64KB is trivial comapred
> with GBs or TBs memory size.
>
> Given the simplicity of my solution, I still prefer my solution. What
> do you think?
Yours is a new proposal, but mine is just a fixup. So I think we
should fix up the previous bug. And then move to the next.

>
> Thanks
> >
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 5cbd6684ef52..42a3b14a20ab 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -223,6 +223,9 @@ END(ret_from_exception)
> >
> >  #ifdef CONFIG_VMAP_STACK
> >  ENTRY(handle_kernel_stack_overflow)
> > +1:     la sp, spin_ shadow_stack
> > +       amoswap.w sp, sp, (sp)
> > +       bnez sp, 1b
> >         la sp, shadow_stack
> >         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> >
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 73f06cd149d9..9058a05cac53 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
> >                 OVERFLOW_STACK_SIZE;
> >  }
> >
> > +atomic_t spin_ shadow_stack __section(".sdata");
> > +
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> >  {
> >         unsigned long tsk_stk = (unsigned long)current->stack;
> >         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> >
> > +       atomic_set_release(spin_ shadow_stack, 0);
> > +
> >         console_verbose();
> >
> >         pr_emerg("Insufficient stack space to handle exception!\n");



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-09-29  5:54     ` Guo Ren
  2022-09-29  6:58       ` Guo Ren
  2022-09-29 16:18       ` Jisheng Zhang
@ 2022-10-19  2:52       ` Guo Ren
  2022-10-19 13:05         ` Jisheng Zhang
  2 siblings, 1 reply; 18+ messages in thread
From: Guo Ren @ 2022-10-19  2:52 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Thu, Sep 29, 2022 at 1:54 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > to the so called shadow stack, then use this shadow stack to call the
> > > get_overflow_stack() to get the overflow stack. However, there's
> > > a race here if two or more harts use the same shadow stack at the same
> > > time.
> > >
> > > To solve this race, we rely on two facts:
> > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > under the condtion that we restore tp from CSR_SCRATCH later.
> > >
> > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > performance concern at all, so we don't need to define the overflow
> > > stack as percpu var, we can simplify it into a pointer array which
> > > points to allocated pages.
> > >
> > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > stack any more. Thus the race condition is removed as a side effect.
> > >
> > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > stack to fix the race codition, but if we can remove shadow stack usage
> > > totally, why not.
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > ---
> > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index ef386fcf3939..4a06fa0f6493 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > >
> > > -asmlinkage unsigned long get_overflow_stack(void);
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > >
> > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > index c970d41dc4c6..c604a5212a73 100644
> > > --- a/arch/riscv/include/asm/thread_info.h
> > > +++ b/arch/riscv/include/asm/thread_info.h
> > > @@ -28,14 +28,12 @@
> > >
> > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > +#define OVERFLOW_STACK_SHIFT 12
> >
> > oops, this should be removed, will update it in a newer version after
> > collecting review comments.
> >
> > >
> > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > >
> > >  #ifndef __ASSEMBLY__
> > >
> > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > -
> > >  #include <asm/processor.h>
> > >  #include <asm/csr.h>
> > >
> > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > index df9444397908..62bf3bacc322 100644
> > > --- a/arch/riscv/kernel/asm-offsets.c
> > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > >
> > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > >  ENTRY(handle_kernel_stack_overflow)
> > > -     la sp, shadow_stack
> > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > -
> > > -     //save caller register to shadow stack
> > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > -     REG_S x1,  PT_RA(sp)
> > > -     REG_S x5,  PT_T0(sp)
> > > -     REG_S x6,  PT_T1(sp)
> > > -     REG_S x7,  PT_T2(sp)
> > > -     REG_S x10, PT_A0(sp)
> > > -     REG_S x11, PT_A1(sp)
> > > -     REG_S x12, PT_A2(sp)
> > > -     REG_S x13, PT_A3(sp)
> > > -     REG_S x14, PT_A4(sp)
> > > -     REG_S x15, PT_A5(sp)
> > > -     REG_S x16, PT_A6(sp)
> > > -     REG_S x17, PT_A7(sp)
> > > -     REG_S x28, PT_T3(sp)
> > > -     REG_S x29, PT_T4(sp)
> > > -     REG_S x30, PT_T5(sp)
> > > -     REG_S x31, PT_T6(sp)
> > > -
> > > -     la ra, restore_caller_reg
> > > -     tail get_overflow_stack
> > > -
> > > -restore_caller_reg:
> > > -     //save per-cpu overflow stack
> > > -     REG_S a0, -8(sp)
> > > -     //restore caller register from shadow_stack
> > > -     REG_L x1,  PT_RA(sp)
> > > -     REG_L x5,  PT_T0(sp)
> > > -     REG_L x6,  PT_T1(sp)
> > > -     REG_L x7,  PT_T2(sp)
> > > -     REG_L x10, PT_A0(sp)
> > > -     REG_L x11, PT_A1(sp)
> > > -     REG_L x12, PT_A2(sp)
> > > -     REG_L x13, PT_A3(sp)
> > > -     REG_L x14, PT_A4(sp)
> > > -     REG_L x15, PT_A5(sp)
> > > -     REG_L x16, PT_A6(sp)
> > > -     REG_L x17, PT_A7(sp)
> > > -     REG_L x28, PT_T3(sp)
> > > -     REG_L x29, PT_T4(sp)
> > > -     REG_L x30, PT_T5(sp)
> > > -     REG_L x31, PT_T6(sp)
> > > +     la sp, overflow_stack
> > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > +     REG_L tp, TASK_TI_CPU(tp)
> > > +     slli tp, tp, RISCV_LGPTR
> > > +     add tp, sp, tp
> > > +     REG_L sp, 0(tp)
> > > +
> > > +     /* restore tp */
> > > +     csrr tp, CSR_SCRATCH
> > >
> > > -     //load per-cpu overflow stack
> > > -     REG_L sp, -8(sp)
> > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > >
> > >       //save context to overflow stack
> > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > --- a/arch/riscv/kernel/traps.c
> > > +++ b/arch/riscv/kernel/traps.c
> > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > >  #endif /* CONFIG_GENERIC_BUG */
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > -             overflow_stack)__aligned(16);
> > > -/*
> > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > - */
> > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > -asmlinkage unsigned long get_overflow_stack(void)
> > > -{
> > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > -             OVERFLOW_STACK_SIZE;
> > > -}
> > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
>
> I prefer the previous solution with a atomic flag.
>
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 5cbd6684ef52..42a3b14a20ab 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -223,6 +223,9 @@ END(ret_from_exception)
>
>  #ifdef CONFIG_VMAP_STACK
>  ENTRY(handle_kernel_stack_overflow)
> +1:     la sp, spin_ shadow_stack
> +       amoswap.w sp, sp, (sp)
> +       bnez sp, 1b
>         la sp, shadow_stack
>         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 73f06cd149d9..9058a05cac53 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
>                 OVERFLOW_STACK_SIZE;
>  }
>
> +atomic_t spin_ shadow_stack __section(".sdata");
> +
>  asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  {
>         unsigned long tsk_stk = (unsigned long)current->stack;
>         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
>
> +       atomic_set_release(spin_ shadow_stack, 0);
> +
>         console_verbose();
>
>         pr_emerg("Insufficient stack space to handle exception!\n");
Hi Jisheng,

Could you give out a new patch based on the above style to fix the
problem first? It's a minor modification and consolidate the entry.S
procedure.


>
>
>
>
>
> > >
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > >  {
> > >       unsigned long tsk_stk = (unsigned long)current->stack;
> > > -     unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > > +     unsigned long ovf_stk = (unsigned long)overflow_stack[raw_smp_processor_id()];
> > >
> > >       console_verbose();
> > >
> > > @@ -248,4 +237,20 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > >       for (;;)
> > >               wait_for_interrupt();
> > >  }
> > > +
> > > +static int __init alloc_overflow_stacks(void)
> > > +{
> > > +     u8 *s;
> > > +     int cpu;
> > > +
> > > +     for_each_possible_cpu(cpu) {
> > > +             s = (u8 *)__get_free_pages(GFP_KERNEL, get_order(OVERFLOW_STACK_SIZE));
> > > +             if (WARN_ON(!s))
> > > +                     return -ENOMEM;
> > > +             overflow_stack[cpu] = &s[OVERFLOW_STACK_SIZE];
> >
> > Since overflow_stack[cpu] points to the top of the slack, we need to
> > update the ovf_stack dumping in handle_bad_stack(). will take care
> > this in newer version.
> >
> > > +             printk("%px\n", overflow_stack[cpu]);
> >
> > forget to remove this printk :(
>
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp
  2022-09-29 16:01     ` Jisheng Zhang
@ 2022-10-19  2:55       ` Guo Ren
  2022-10-19 13:01         ` Jisheng Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Guo Ren @ 2022-10-19  2:55 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Fri, Sep 30, 2022 at 12:11 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Thu, Sep 29, 2022 at 11:59:00AM +0800, Guo Ren wrote:
> > On Thu, Sep 29, 2022 at 12:29 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> > >
> > > Consolidate the saving/restoring GPs(except ra, sp and tp) into
> > > save_gp/restore_gp macro.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > ---
> > >  arch/riscv/include/asm/asm.h   | 65 +++++++++++++++++++++++++
> > >  arch/riscv/kernel/entry.S      | 87 ++--------------------------------
> > >  arch/riscv/kernel/mcount-dyn.S | 58 +----------------------
> > >  3 files changed, 70 insertions(+), 140 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> > > index 1b471ff73178..2f3b49536e9d 100644
> > > --- a/arch/riscv/include/asm/asm.h
> > > +++ b/arch/riscv/include/asm/asm.h
> > > @@ -68,6 +68,7 @@
> > >  #endif
> > >
> > >  #ifdef __ASSEMBLY__
> > > +#include <asm/asm-offsets.h>
> > >
> > >  /* Common assembly source macros */
> > >
> > > @@ -80,6 +81,70 @@
> > >         .endr
> > >  .endm
> > >
> > > +       /* save all GPs except ra, sp and tp */
> > > +       .macro save_gp
> > How about leave x3(gp) out of the macro, and define:
> > .marco save_from_x5_to_x31
> > .marco restore_from_x5_to_x31
>
> Good idea, will do in next version.

Where is the next version?

I want to involve the patch in my generic entry series, it would help
in the coding convention of the entry code.

>
> Thanks



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork
  2022-09-28 16:20 ` [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork Jisheng Zhang
@ 2022-10-19  3:37   ` Guo Ren
  0 siblings, 0 replies; 18+ messages in thread
From: Guo Ren @ 2022-10-19  3:37 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

I would involve this patch in the generic entry series. It would help
remove the ret_from_kernel_thread part in the entry.S.

On Thu, Sep 29, 2022 at 12:29 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
> the only difference is whether call the fn(arg) or not, this can be
> acchieved by testing fn is NULL or not, I.E s0 is 0 or not.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
>  arch/riscv/kernel/entry.S   | 11 +++--------
>  arch/riscv/kernel/process.c |  5 ++---
>  2 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 2207cf44a3bc..a3e1ed2fa2ac 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -323,20 +323,15 @@ END(handle_kernel_stack_overflow)
>
>  ENTRY(ret_from_fork)
>         call schedule_tail
> -       move a0, sp /* pt_regs */
> -       la ra, ret_from_exception
> -       tail syscall_exit_to_user_mode
> -ENDPROC(ret_from_fork)
> -
> -ENTRY(ret_from_kernel_thread)
> -       call schedule_tail
> +       beqz s0, 1f     /* not from kernel thread */
>         /* Call fn(arg) */
>         move a0, s1
>         jalr s0
> +1:
>         move a0, sp /* pt_regs */
>         la ra, ret_from_exception
>         tail syscall_exit_to_user_mode
> -ENDPROC(ret_from_kernel_thread)
> +ENDPROC(ret_from_fork)
>
>  #ifdef CONFIG_IRQ_STACKS
>  ENTRY(call_on_stack)
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index ceb9ebab6558..67e7cd123ceb 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -34,7 +34,6 @@ EXPORT_SYMBOL(__stack_chk_guard);
>  #endif
>
>  extern asmlinkage void ret_from_fork(void);
> -extern asmlinkage void ret_from_kernel_thread(void);
>
>  void arch_cpu_idle(void)
>  {
> @@ -172,7 +171,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>                 /* Supervisor/Machine, irqs on: */
>                 childregs->status = SR_PP | SR_PIE;
>
> -               p->thread.ra = (unsigned long)ret_from_kernel_thread;
>                 p->thread.s[0] = (unsigned long)args->fn;
>                 p->thread.s[1] = (unsigned long)args->fn_arg;
>         } else {
> @@ -182,8 +180,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>                 if (clone_flags & CLONE_SETTLS)
>                         childregs->tp = tls;
>                 childregs->a0 = 0; /* Return value of fork() */
> -               p->thread.ra = (unsigned long)ret_from_fork;
> +               p->thread.s[0] = 0;
>         }
> +       p->thread.ra = (unsigned long)ret_from_fork;
>         p->thread.sp = (unsigned long)childregs; /* kernel sp */
>         return 0;
>  }
> --
> 2.34.1
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp
  2022-10-19  2:55       ` Guo Ren
@ 2022-10-19 13:01         ` Jisheng Zhang
  0 siblings, 0 replies; 18+ messages in thread
From: Jisheng Zhang @ 2022-10-19 13:01 UTC (permalink / raw)
  To: Guo Ren
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Wed, Oct 19, 2022 at 10:55:09AM +0800, Guo Ren wrote:
> On Fri, Sep 30, 2022 at 12:11 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Thu, Sep 29, 2022 at 11:59:00AM +0800, Guo Ren wrote:
> > > On Thu, Sep 29, 2022 at 12:29 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> > > >
> > > > Consolidate the saving/restoring GPs(except ra, sp and tp) into
> > > > save_gp/restore_gp macro.
> > > >
> > > > No functional change intended.
> > > >
> > > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > > ---
> > > >  arch/riscv/include/asm/asm.h   | 65 +++++++++++++++++++++++++
> > > >  arch/riscv/kernel/entry.S      | 87 ++--------------------------------
> > > >  arch/riscv/kernel/mcount-dyn.S | 58 +----------------------
> > > >  3 files changed, 70 insertions(+), 140 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> > > > index 1b471ff73178..2f3b49536e9d 100644
> > > > --- a/arch/riscv/include/asm/asm.h
> > > > +++ b/arch/riscv/include/asm/asm.h
> > > > @@ -68,6 +68,7 @@
> > > >  #endif
> > > >
> > > >  #ifdef __ASSEMBLY__
> > > > +#include <asm/asm-offsets.h>
> > > >
> > > >  /* Common assembly source macros */
> > > >
> > > > @@ -80,6 +81,70 @@
> > > >         .endr
> > > >  .endm
> > > >
> > > > +       /* save all GPs except ra, sp and tp */
> > > > +       .macro save_gp
> > > How about leave x3(gp) out of the macro, and define:
> > > .marco save_from_x5_to_x31
> > > .marco restore_from_x5_to_x31
> >
> > Good idea, will do in next version.
> 
> Where is the next version?
> 
> I want to involve the patch in my generic entry series, it would help
> in the coding convention of the entry code.

Hi,

Here is the v3:
https://lore.kernel.org/linux-riscv/20221003102921.3973-1-jszhang@kernel.org/T/#t

thanks very much

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-10-19  2:52       ` Guo Ren
@ 2022-10-19 13:05         ` Jisheng Zhang
  2022-10-20  1:00           ` Guo Ren
  0 siblings, 1 reply; 18+ messages in thread
From: Jisheng Zhang @ 2022-10-19 13:05 UTC (permalink / raw)
  To: Guo Ren
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Wed, Oct 19, 2022 at 10:52:16AM +0800, Guo Ren wrote:
> On Thu, Sep 29, 2022 at 1:54 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> > >
> > > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > > to the so called shadow stack, then use this shadow stack to call the
> > > > get_overflow_stack() to get the overflow stack. However, there's
> > > > a race here if two or more harts use the same shadow stack at the same
> > > > time.
> > > >
> > > > To solve this race, we rely on two facts:
> > > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > > under the condtion that we restore tp from CSR_SCRATCH later.
> > > >
> > > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > > performance concern at all, so we don't need to define the overflow
> > > > stack as percpu var, we can simplify it into a pointer array which
> > > > points to allocated pages.
> > > >
> > > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > > stack any more. Thus the race condition is removed as a side effect.
> > > >
> > > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > > stack to fix the race codition, but if we can remove shadow stack usage
> > > > totally, why not.
> > > >
> > > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > > ---
> > > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > > index ef386fcf3939..4a06fa0f6493 100644
> > > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > > >
> > > > -asmlinkage unsigned long get_overflow_stack(void);
> > > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > >
> > > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > > index c970d41dc4c6..c604a5212a73 100644
> > > > --- a/arch/riscv/include/asm/thread_info.h
> > > > +++ b/arch/riscv/include/asm/thread_info.h
> > > > @@ -28,14 +28,12 @@
> > > >
> > > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > > +#define OVERFLOW_STACK_SHIFT 12
> > >
> > > oops, this should be removed, will update it in a newer version after
> > > collecting review comments.
> > >
> > > >
> > > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > > >
> > > >  #ifndef __ASSEMBLY__
> > > >
> > > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > > -
> > > >  #include <asm/processor.h>
> > > >  #include <asm/csr.h>
> > > >
> > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > index df9444397908..62bf3bacc322 100644
> > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > > >
> > > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > > --- a/arch/riscv/kernel/entry.S
> > > > +++ b/arch/riscv/kernel/entry.S
> > > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > > >
> > > >  #ifdef CONFIG_VMAP_STACK
> > > >  ENTRY(handle_kernel_stack_overflow)
> > > > -     la sp, shadow_stack
> > > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > > -
> > > > -     //save caller register to shadow stack
> > > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > > -     REG_S x1,  PT_RA(sp)
> > > > -     REG_S x5,  PT_T0(sp)
> > > > -     REG_S x6,  PT_T1(sp)
> > > > -     REG_S x7,  PT_T2(sp)
> > > > -     REG_S x10, PT_A0(sp)
> > > > -     REG_S x11, PT_A1(sp)
> > > > -     REG_S x12, PT_A2(sp)
> > > > -     REG_S x13, PT_A3(sp)
> > > > -     REG_S x14, PT_A4(sp)
> > > > -     REG_S x15, PT_A5(sp)
> > > > -     REG_S x16, PT_A6(sp)
> > > > -     REG_S x17, PT_A7(sp)
> > > > -     REG_S x28, PT_T3(sp)
> > > > -     REG_S x29, PT_T4(sp)
> > > > -     REG_S x30, PT_T5(sp)
> > > > -     REG_S x31, PT_T6(sp)
> > > > -
> > > > -     la ra, restore_caller_reg
> > > > -     tail get_overflow_stack
> > > > -
> > > > -restore_caller_reg:
> > > > -     //save per-cpu overflow stack
> > > > -     REG_S a0, -8(sp)
> > > > -     //restore caller register from shadow_stack
> > > > -     REG_L x1,  PT_RA(sp)
> > > > -     REG_L x5,  PT_T0(sp)
> > > > -     REG_L x6,  PT_T1(sp)
> > > > -     REG_L x7,  PT_T2(sp)
> > > > -     REG_L x10, PT_A0(sp)
> > > > -     REG_L x11, PT_A1(sp)
> > > > -     REG_L x12, PT_A2(sp)
> > > > -     REG_L x13, PT_A3(sp)
> > > > -     REG_L x14, PT_A4(sp)
> > > > -     REG_L x15, PT_A5(sp)
> > > > -     REG_L x16, PT_A6(sp)
> > > > -     REG_L x17, PT_A7(sp)
> > > > -     REG_L x28, PT_T3(sp)
> > > > -     REG_L x29, PT_T4(sp)
> > > > -     REG_L x30, PT_T5(sp)
> > > > -     REG_L x31, PT_T6(sp)
> > > > +     la sp, overflow_stack
> > > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > > +     REG_L tp, TASK_TI_CPU(tp)
> > > > +     slli tp, tp, RISCV_LGPTR
> > > > +     add tp, sp, tp
> > > > +     REG_L sp, 0(tp)
> > > > +
> > > > +     /* restore tp */
> > > > +     csrr tp, CSR_SCRATCH
> > > >
> > > > -     //load per-cpu overflow stack
> > > > -     REG_L sp, -8(sp)
> > > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > > >
> > > >       //save context to overflow stack
> > > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > > --- a/arch/riscv/kernel/traps.c
> > > > +++ b/arch/riscv/kernel/traps.c
> > > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > > >  #endif /* CONFIG_GENERIC_BUG */
> > > >
> > > >  #ifdef CONFIG_VMAP_STACK
> > > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > > -             overflow_stack)__aligned(16);
> > > > -/*
> > > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > > - */
> > > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > > -asmlinkage unsigned long get_overflow_stack(void)
> > > > -{
> > > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > > -             OVERFLOW_STACK_SIZE;
> > > > -}
> > > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> > Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
> >
> > I prefer the previous solution with a atomic flag.
> >
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 5cbd6684ef52..42a3b14a20ab 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -223,6 +223,9 @@ END(ret_from_exception)
> >
> >  #ifdef CONFIG_VMAP_STACK
> >  ENTRY(handle_kernel_stack_overflow)
> > +1:     la sp, spin_ shadow_stack
> > +       amoswap.w sp, sp, (sp)
> > +       bnez sp, 1b
> >         la sp, shadow_stack
> >         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> >
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 73f06cd149d9..9058a05cac53 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
> >                 OVERFLOW_STACK_SIZE;
> >  }
> >
> > +atomic_t spin_ shadow_stack __section(".sdata");
> > +
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> >  {
> >         unsigned long tsk_stk = (unsigned long)current->stack;
> >         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> >
> > +       atomic_set_release(spin_ shadow_stack, 0);
> > +
> >         console_verbose();
> >
> >         pr_emerg("Insufficient stack space to handle exception!\n");
> Hi Jisheng,

Hi Guo,

> 
> Could you give out a new patch based on the above style to fix the
> problem first? It's a minor modification and consolidate the entry.S
> procedure.
> 

I read the above msg as: A new patch following the above style to fix
the race condition. Then another patch to remove the shadow stack
usage. Will cook the fix patch soon. 

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack
  2022-10-19 13:05         ` Jisheng Zhang
@ 2022-10-20  1:00           ` Guo Ren
  0 siblings, 0 replies; 18+ messages in thread
From: Guo Ren @ 2022-10-20  1:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
	Nick Desaulniers, linux-riscv, linux-kernel, llvm

On Wed, Oct 19, 2022 at 9:15 PM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Wed, Oct 19, 2022 at 10:52:16AM +0800, Guo Ren wrote:
> > On Thu, Sep 29, 2022 at 1:54 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Thu, Sep 29, 2022 at 1:03 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> > > >
> > > > On Thu, Sep 29, 2022 at 12:20:06AM +0800, Jisheng Zhang wrote:
> > > > > Currently, when detecting vmap stack overflow, riscv firstly switches
> > > > > to the so called shadow stack, then use this shadow stack to call the
> > > > > get_overflow_stack() to get the overflow stack. However, there's
> > > > > a race here if two or more harts use the same shadow stack at the same
> > > > > time.
> > > > >
> > > > > To solve this race, we rely on two facts:
> > > > > 1. the content of kernel thread pointer I.E "tp" register can still
> > > > > be gotten from the the CSR_SCRATCH register, thus we can clobber tp
> > > > > under the condtion that we restore tp from CSR_SCRATCH later.
> > > > >
> > > > > 2. Once vmap stack overflow happen, panic is comming soon, no
> > > > > performance concern at all, so we don't need to define the overflow
> > > > > stack as percpu var, we can simplify it into a pointer array which
> > > > > points to allocated pages.
> > > > >
> > > > > Thus we can use tp as a tmp register to get the cpu id to calculate
> > > > > the offset of overflow stack pointer array for each cpu w/o shadow
> > > > > stack any more. Thus the race condition is removed as a side effect.
> > > > >
> > > > > NOTE: we can use similar mechanism to let each cpu use different shadow
> > > > > stack to fix the race codition, but if we can remove shadow stack usage
> > > > > totally, why not.
> > > > >
> > > > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > > > Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> > > > > ---
> > > > >  arch/riscv/include/asm/asm-prototypes.h |  1 -
> > > > >  arch/riscv/include/asm/thread_info.h    |  4 +-
> > > > >  arch/riscv/kernel/asm-offsets.c         |  1 +
> > > > >  arch/riscv/kernel/entry.S               | 56 ++++---------------------
> > > > >  arch/riscv/kernel/traps.c               | 31 ++++++++------
> > > > >  5 files changed, 29 insertions(+), 64 deletions(-)
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > > > index ef386fcf3939..4a06fa0f6493 100644
> > > > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > > > @@ -25,7 +25,6 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > > >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > > >  DECLARE_DO_ERROR_INFO(do_trap_break);
> > > > >
> > > > > -asmlinkage unsigned long get_overflow_stack(void);
> > > > >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > > >
> > > > >  #endif /* _ASM_RISCV_PROTOTYPES_H */
> > > > > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > > > > index c970d41dc4c6..c604a5212a73 100644
> > > > > --- a/arch/riscv/include/asm/thread_info.h
> > > > > +++ b/arch/riscv/include/asm/thread_info.h
> > > > > @@ -28,14 +28,12 @@
> > > > >
> > > > >  #define THREAD_SHIFT            (PAGE_SHIFT + THREAD_SIZE_ORDER)
> > > > >  #define OVERFLOW_STACK_SIZE     SZ_4K
> > > > > -#define SHADOW_OVERFLOW_STACK_SIZE (1024)
> > > > > +#define OVERFLOW_STACK_SHIFT 12
> > > >
> > > > oops, this should be removed, will update it in a newer version after
> > > > collecting review comments.
> > > >
> > > > >
> > > > >  #define IRQ_STACK_SIZE               THREAD_SIZE
> > > > >
> > > > >  #ifndef __ASSEMBLY__
> > > > >
> > > > > -extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
> > > > > -
> > > > >  #include <asm/processor.h>
> > > > >  #include <asm/csr.h>
> > > > >
> > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > index df9444397908..62bf3bacc322 100644
> > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > @@ -37,6 +37,7 @@ void asm_offsets(void)
> > > > >       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> > > > >       OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
> > > > >       OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
> > > > > +     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> > > > >
> > > > >       OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
> > > > >       OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
> > > > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > > > index a3e1ed2fa2ac..5a6171a90d81 100644
> > > > > --- a/arch/riscv/kernel/entry.S
> > > > > +++ b/arch/riscv/kernel/entry.S
> > > > > @@ -223,54 +223,16 @@ END(ret_from_exception)
> > > > >
> > > > >  #ifdef CONFIG_VMAP_STACK
> > > > >  ENTRY(handle_kernel_stack_overflow)
> > > > > -     la sp, shadow_stack
> > > > > -     addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > > > > -
> > > > > -     //save caller register to shadow stack
> > > > > -     addi sp, sp, -(PT_SIZE_ON_STACK)
> > > > > -     REG_S x1,  PT_RA(sp)
> > > > > -     REG_S x5,  PT_T0(sp)
> > > > > -     REG_S x6,  PT_T1(sp)
> > > > > -     REG_S x7,  PT_T2(sp)
> > > > > -     REG_S x10, PT_A0(sp)
> > > > > -     REG_S x11, PT_A1(sp)
> > > > > -     REG_S x12, PT_A2(sp)
> > > > > -     REG_S x13, PT_A3(sp)
> > > > > -     REG_S x14, PT_A4(sp)
> > > > > -     REG_S x15, PT_A5(sp)
> > > > > -     REG_S x16, PT_A6(sp)
> > > > > -     REG_S x17, PT_A7(sp)
> > > > > -     REG_S x28, PT_T3(sp)
> > > > > -     REG_S x29, PT_T4(sp)
> > > > > -     REG_S x30, PT_T5(sp)
> > > > > -     REG_S x31, PT_T6(sp)
> > > > > -
> > > > > -     la ra, restore_caller_reg
> > > > > -     tail get_overflow_stack
> > > > > -
> > > > > -restore_caller_reg:
> > > > > -     //save per-cpu overflow stack
> > > > > -     REG_S a0, -8(sp)
> > > > > -     //restore caller register from shadow_stack
> > > > > -     REG_L x1,  PT_RA(sp)
> > > > > -     REG_L x5,  PT_T0(sp)
> > > > > -     REG_L x6,  PT_T1(sp)
> > > > > -     REG_L x7,  PT_T2(sp)
> > > > > -     REG_L x10, PT_A0(sp)
> > > > > -     REG_L x11, PT_A1(sp)
> > > > > -     REG_L x12, PT_A2(sp)
> > > > > -     REG_L x13, PT_A3(sp)
> > > > > -     REG_L x14, PT_A4(sp)
> > > > > -     REG_L x15, PT_A5(sp)
> > > > > -     REG_L x16, PT_A6(sp)
> > > > > -     REG_L x17, PT_A7(sp)
> > > > > -     REG_L x28, PT_T3(sp)
> > > > > -     REG_L x29, PT_T4(sp)
> > > > > -     REG_L x30, PT_T5(sp)
> > > > > -     REG_L x31, PT_T6(sp)
> > > > > +     la sp, overflow_stack
> > > > > +     /* use tp as tmp register since we can restore it from CSR_SCRATCH */
> > > > > +     REG_L tp, TASK_TI_CPU(tp)
> > > > > +     slli tp, tp, RISCV_LGPTR
> > > > > +     add tp, sp, tp
> > > > > +     REG_L sp, 0(tp)
> > > > > +
> > > > > +     /* restore tp */
> > > > > +     csrr tp, CSR_SCRATCH
> > > > >
> > > > > -     //load per-cpu overflow stack
> > > > > -     REG_L sp, -8(sp)
> > > > >       addi sp, sp, -(PT_SIZE_ON_STACK)
> > > > >
> > > > >       //save context to overflow stack
> > > > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > > > index 73f06cd149d9..b6c64f0fb70f 100644
> > > > > --- a/arch/riscv/kernel/traps.c
> > > > > +++ b/arch/riscv/kernel/traps.c
> > > > > @@ -216,23 +216,12 @@ int is_valid_bugaddr(unsigned long pc)
> > > > >  #endif /* CONFIG_GENERIC_BUG */
> > > > >
> > > > >  #ifdef CONFIG_VMAP_STACK
> > > > > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
> > > > > -             overflow_stack)__aligned(16);
> > > > > -/*
> > > > > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used
> > > > > - * to get per-cpu overflow stack(get_overflow_stack).
> > > > > - */
> > > > > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)];
> > > > > -asmlinkage unsigned long get_overflow_stack(void)
> > > > > -{
> > > > > -     return (unsigned long)this_cpu_ptr(overflow_stack) +
> > > > > -             OVERFLOW_STACK_SIZE;
> > > > > -}
> > > > > +void *overflow_stack[NR_CPUS] __ro_after_init __aligned(16);
> > > Er... We've talked NR_CPUS = 8192, even a pointer would cause 64KB wasted.
> > >
> > > I prefer the previous solution with a atomic flag.
> > >
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index 5cbd6684ef52..42a3b14a20ab 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -223,6 +223,9 @@ END(ret_from_exception)
> > >
> > >  #ifdef CONFIG_VMAP_STACK
> > >  ENTRY(handle_kernel_stack_overflow)
> > > +1:     la sp, spin_ shadow_stack
> > > +       amoswap.w sp, sp, (sp)
> > > +       bnez sp, 1b
> > >         la sp, shadow_stack
> > >         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> > >
> > > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > index 73f06cd149d9..9058a05cac53 100644
> > > --- a/arch/riscv/kernel/traps.c
> > > +++ b/arch/riscv/kernel/traps.c
> > > @@ -229,11 +229,15 @@ asmlinkage unsigned long get_overflow_stack(void)
> > >                 OVERFLOW_STACK_SIZE;
> > >  }
> > >
> > > +atomic_t spin_ shadow_stack __section(".sdata");
> > > +
> > >  asmlinkage void handle_bad_stack(struct pt_regs *regs)
> > >  {
> > >         unsigned long tsk_stk = (unsigned long)current->stack;
> > >         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> > >
> > > +       atomic_set_release(spin_ shadow_stack, 0);
> > > +
> > >         console_verbose();
> > >
> > >         pr_emerg("Insufficient stack space to handle exception!\n");
> > Hi Jisheng,
>
> Hi Guo,
>
> >
> > Could you give out a new patch based on the above style to fix the
> > problem first? It's a minor modification and consolidate the entry.S
> > procedure.
> >
>
> I read the above msg as: A new patch following the above style to fix
> the race condition. Then another patch to remove the shadow stack
> usage. Will cook the fix patch soon.
Yes, give a first small fixup patch. Then continue your idea.

I would also involve it in my generic entry series, which reconstructs
entry.S so much. The patch could fix up entry.S potential bug.

>
> Thanks



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-10-20  1:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-28 16:20 [PATCH v2 0/4] riscv: entry: further clean up and VMAP_STACK fix Jisheng Zhang
2022-09-28 16:20 ` [PATCH v2 1/4] riscv: remove extra level wrappers of trace_hardirqs_{on,off} Jisheng Zhang
2022-09-28 16:20 ` [PATCH v2 2/4] riscv: consolidate ret_from_kernel_thread into ret_from_fork Jisheng Zhang
2022-10-19  3:37   ` Guo Ren
2022-09-28 16:20 ` [PATCH v2 3/4] riscv: fix race when vmap stack overflow and remove shadow_stack Jisheng Zhang
2022-09-28 16:54   ` Jisheng Zhang
2022-09-29  5:54     ` Guo Ren
2022-09-29  6:58       ` Guo Ren
2022-09-29 16:18       ` Jisheng Zhang
2022-09-30  3:35         ` Guo Ren
2022-10-19  2:52       ` Guo Ren
2022-10-19 13:05         ` Jisheng Zhang
2022-10-20  1:00           ` Guo Ren
2022-09-28 16:20 ` [PATCH v2 4/4] riscv: entry: consolidate general regs saving into save_gp Jisheng Zhang
2022-09-29  3:59   ` Guo Ren
2022-09-29 16:01     ` Jisheng Zhang
2022-10-19  2:55       ` Guo Ren
2022-10-19 13:01         ` Jisheng Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).