* [PATCH v2] Reset task stack state in bringup_cpu()
@ 2021-11-18 10:29 Mark Rutland
  2021-11-23 11:06 ` Mark Rutland
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Rutland @ 2021-11-18 10:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: catalin.marinas, dvyukov, mark.rutland, peterz, quic_qiancai,
	valentin.schneider, will, woodylin

To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use the task's saved SCS SP is left pointing
at an arbitrary point within the task's shadow call stack.

When a CPU is offlined then onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stack frames and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack from being used. Across a number of hotplug cycles the idle
task's entire shadow call stack can become unusable.

We previously fixed the KASAN issue in commit:

  e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")

... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.

Subsequently in commit:

  f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")

... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code, rather than in code invoked each time a
CPU is brought online, breaking both as above.

We fixed SCS (but not KASAN) in commit:

  63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")

... but as this runs in the context of the idle task being offlined it's
potentially fragile.

To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to do so when exiting an idle
task.

Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for the idle thread
within init_idle(), as this was only necessary to handle hotplug cycles.

I've tested this with both GCC and clang, with relevant options enabled,
offlining and onlining CPUs with:

| while true; do
|   for C in /sys/devices/system/cpu/cpu*/online; do
|     echo 0 > $C;
|     echo 1 > $C;
|   done
| done

Link: https://lore.kernel.org/lkml/20211012083521.973587-1-woodylin@google.com/
Link: https://lore.kernel.org/linux-arm-kernel/YY9ECKyPtDbD9q8q@qian-HP-Z2-SFF-G5-Workstation/
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Fixes: 1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Woody Lin <woodylin@google.com>
---
 kernel/cpu.c        | 7 +++++++
 kernel/sched/core.c | 4 ----
 2 files changed, 7 insertions(+), 4 deletions(-)

Since v1 [1]:
* Clarify commit message
* Fix typos
* Accumulate tags

[1] https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 192e43a87407..407a2568f35e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,6 +31,7 @@
 #include <linux/smpboot.h>
 #include <linux/relay.h>
 #include <linux/slab.h>
+#include <linux/scs.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/cpuset.h>
 
@@ -588,6 +589,12 @@ static int bringup_cpu(unsigned int cpu)
 	int ret;
 
 	/*
+	 * Reset stale stack state from the last time this CPU was online.
+	 */
+	scs_task_reset(idle);
+	kasan_unpoison_task_stack(idle);
+
+	/*
 	 * Some architectures have to walk the irq descriptors to
 	 * setup the vector space for the cpu which comes online.
 	 * Prevent irq alloc/free across the bringup.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3c9b0fda64ac..76f9deeaa942 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8619,9 +8619,6 @@ void __init init_idle(struct task_struct *idle, int cpu)
 	idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
 	kthread_set_per_cpu(idle, cpu);
 
-	scs_task_reset(idle);
-	kasan_unpoison_task_stack(idle);
-
 #ifdef CONFIG_SMP
 	/*
 	 * It's possible that init_idle() gets called multiple times on a task,
@@ -8777,7 +8774,6 @@ void idle_task_exit(void)
 		finish_arch_post_lock_switch();
 	}
 
-	scs_task_reset(current);
 	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
 }
 
-- 
2.11.0



* Re: [PATCH v2] Reset task stack state in bringup_cpu()
  2021-11-18 10:29 [PATCH v2] Reset task stack state in bringup_cpu() Mark Rutland
@ 2021-11-23 11:06 ` Mark Rutland
  2021-11-23 11:09   ` Mark Rutland
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Rutland @ 2021-11-23 11:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: catalin.marinas, dvyukov, peterz, quic_qiancai,
	valentin.schneider, will, woodylin

On Thu, Nov 18, 2021 at 10:29:27AM +0000, Mark Rutland wrote:
> [...]
> Fixes: 1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")

Ugh, that should be f1a0a376ca0c4ef1 (with a leading 'f'), as in the
body of the commit message.

I can fix that up for v3, unless someone's happy to fix that up when
picking this up.

Thanks,
Mark.



* Re: [PATCH v2] Reset task stack state in bringup_cpu()
  2021-11-23 11:06 ` Mark Rutland
@ 2021-11-23 11:09   ` Mark Rutland
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Rutland @ 2021-11-23 11:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: catalin.marinas, dvyukov, peterz, quic_qiancai,
	valentin.schneider, will, woodylin

On Tue, Nov 23, 2021 at 11:06:48AM +0000, Mark Rutland wrote:
> On Thu, Nov 18, 2021 at 10:29:27AM +0000, Mark Rutland wrote:
> > [...]
> > Fixes: 1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
> 
> Ugh, that should be f1a0a376ca0c4ef1 (with a leading 'f'), as in the
> body of the commit message.
> 
> I can fix that up for v3, unless someone's happy to fix that up when
> picking this up.

I just realised I forgot to Cc Ingo, so I'll spin that as v3 and fix the
Cc list.

Mark.


