[7/8] sched: Fix CPU hotplug / tighten is_per_cpu_kthread()

Message ID 20210116113920.103635633@infradead.org
State New, archived
Series
  • sched: Fix hot-unplug regressions

Commit Message

Peter Zijlstra Jan. 16, 2021, 11:30 a.m. UTC
Prior to commit 1cf12e08bc4d ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU,
breaking affinity and forcing them off only at the very end.

This scheme had to change in order to enable migrate_disable(): one
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least the idle,
hotplug and stop threads at any point before stop_machine(), we cannot
break their affinity or push them away.

Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no longer breaks affinity or migrates any of
them (which then includes the critical ones above).

However, there's an important difference between genuinely per-cpu
kthreads and kthreads that merely happen to have a single-CPU
affinity, and that distinction got lost. The latter class very much
relies on the forced affinity-breaking and migration semantics
previously provided.

Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.

Fixes: 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)
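
For illustration, here is a minimal sketch (not part of the patch, all
names made up) of the two classes of kernel threads the commit message
distinguishes, assuming the kthread_set_per_cpu()/kthread_is_per_cpu()
helpers introduced earlier in this series:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

static int example_worker(void *arg)
{
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ);
	return 0;
}

static void example_spawn(unsigned int cpu)
{
	struct task_struct *affine, *percpu;

	/*
	 * 'affine' merely has its affinity narrowed to @cpu; on
	 * hot-unplug the scheduler must break that affinity and
	 * migrate it away, as it did before 1cf12e08bc4d.
	 */
	affine = kthread_create(example_worker, NULL, "affine_worker/%u", cpu);
	if (!IS_ERR(affine)) {
		kthread_bind(affine, cpu);
		wake_up_process(affine);
	}

	/*
	 * 'percpu' is additionally flagged KTHREAD_IS_PER_CPU; CPU
	 * hotplug is expected to park/unpark it, so the scheduler
	 * leaves it alone.
	 */
	percpu = kthread_create_on_cpu(example_worker, NULL, cpu, "percpu_worker/%u");
	if (!IS_ERR(percpu)) {
		kthread_set_per_cpu(percpu, cpu);
		wake_up_process(percpu);
	}
}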

Comments

Valentin Schneider Jan. 17, 2021, 4:57 p.m. UTC | #1
On 16/01/21 12:30, Peter Zijlstra wrote:
> @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
>   */
>  static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  {
> +	/* When not in the task's cpumask, no point in looking further. */
>       if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>               return false;
>
> -	if (is_per_cpu_kthread(p) || is_migration_disabled(p))
> +	/* migrate_disabled() must be allowed to finish. */
> +	if (is_migration_disabled(p))
>               return cpu_online(cpu);
>
> -	return cpu_active(cpu);
> +	/* Non kernel threads are not allowed during either online or offline. */
> +	if (!(p->flags & PF_KTHREAD))
> +		return cpu_active(cpu);
> +
> +	/* KTHREAD_IS_PER_CPU is always allowed. */
> +	if (kthread_is_per_cpu(p))
> +		return cpu_online(cpu);
> +
> +	/* Regular kernel threads don't get to stay during offline. */
> +	if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> +		return cpu_active(cpu);

is_cpu_allowed(, cpu) isn't guaranteed to have cpu_rq(cpu)'s rq_lock
held, so this can race with balance_push_set(, true). This shouldn't
matter under normal circumstances as we'll have sched_cpu_wait_empty()
further down the line.

This might get ugly with the rollback faff - this is jumping the gun a
bit, but that's something we'll have to address, and I think what I'm
concerned about is close to what you mentioned in

  http://lore.kernel.org/r/YAM1t2Qzr7Rib3bN@hirez.programming.kicks-ass.net

Here's what I'm thinking of:

_cpu_up()                            ttwu()
                                       select_task_rq()
                                         is_cpu_allowed()
                                           rq->balance_callback != balance_push_callback
  smpboot_unpark_threads() // FAIL
  (now going down, set push here)
  sched_cpu_wait_empty()
  ...                                  ttwu_queue()
  sched_cpu_dying()
  *ARGH*

I've written some horrors on top of this series here:

  https://gitlab.arm.com/linux-arm/linux-vs/-/commits/mainline/migrate_disable/stragglers/

Also, my TX2 is again in need of CPR, so in the meantime I'm running
tests on a (much) smaller machine...

> +
> +	/* But are allowed during online. */
> +	return cpu_online(cpu);
>  }
Peter Zijlstra Jan. 18, 2021, 9:30 a.m. UTC | #2
On Sun, Jan 17, 2021 at 04:57:27PM +0000, Valentin Schneider wrote:
> On 16/01/21 12:30, Peter Zijlstra wrote:
> > @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
> >   */
> >  static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> >  {
> > +	/* When not in the task's cpumask, no point in looking further. */
> >       if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >               return false;
> >
> > +	/* migrate_disabled() must be allowed to finish. */
> > +	if (is_migration_disabled(p))
> >               return cpu_online(cpu);
> >
> > +	/* Non kernel threads are not allowed during either online or offline. */
> > +	if (!(p->flags & PF_KTHREAD))
> > +		return cpu_active(cpu);
> > +
> > +	/* KTHREAD_IS_PER_CPU is always allowed. */
> > +	if (kthread_is_per_cpu(p))
> > +		return cpu_online(cpu);
> > +
> > +	/* Regular kernel threads don't get to stay during offline. */
> > +	if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> > +		return cpu_active(cpu);
> 
> is_cpu_allowed(, cpu) isn't guaranteed to have cpu_rq(cpu)'s rq_lock
> held, so this can race with balance_push_set(, true). This shouldn't
> matter under normal circumstances as we'll have sched_cpu_wait_empty()
> further down the line.
> 
> This might get ugly with the rollback faff - this is jumping the gun a
> bit, but that's something we'll have to address, and I think what I'm
> concerned about is close to what you mentioned in
> 
>   http://lore.kernel.org/r/YAM1t2Qzr7Rib3bN@hirez.programming.kicks-ass.net
> 
> Here's what I'm thinking of:
> 
> _cpu_up()                            ttwu()
>                                        select_task_rq()
>                                          is_cpu_allowed()
>                                            rq->balance_callback != balance_push_callback
>   smpboot_unpark_threads() // FAIL
>   (now going down, set push here)
>   sched_cpu_wait_empty()
>   ...                                  ttwu_queue()
>   sched_cpu_dying()
>   *ARGH*
> 

Let me try this then...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5057054b1cff..9b045296d646 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7495,6 +7495,8 @@ int sched_cpu_activate(unsigned int cpu)
 	return 0;
 }
 
+unsigned long sched_cpu_rcu_state;
+
 int sched_cpu_deactivate(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7519,6 +7521,11 @@ int sched_cpu_deactivate(unsigned int cpu)
 	 */
 	balance_push_set(cpu, true);
 
+	/*
+	 * See sched_cpu_wait_empty().
+	 */
+	sched_cpu_rcu_state = get_state_synchronize_rcu();
+
 	rq_lock_irqsave(rq, &rf);
 	if (rq->rd) {
 		update_rq_clock(rq);
@@ -7578,6 +7585,12 @@ int sched_cpu_starting(unsigned int cpu)
  */
 int sched_cpu_wait_empty(unsigned int cpu)
 {
+	/*
+	 * Guarantee that TTWU will observe balance_push_set(true),
+	 * such that all wakeups will refuse this CPU.
+	 */
+	cond_synchronize_rcu(sched_cpu_rcu_state);
+
 	balance_hotplug_wait();
 	return 0;
 }
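
For reference, the get_state_synchronize_rcu()/cond_synchronize_rcu()
pair used above implements a conditional grace-period wait: the cookie
records the RCU state at the point where balance_push_set(cpu, true)
is made, and the later cond_synchronize_rcu() only blocks if a full
grace period has not already elapsed on its own by then. A minimal
sketch of that pattern in isolation (illustrative names, not from the
patch), assuming the readers to be excluded run inside an RCU
read-side or preempt-disabled section:

#include <linux/rcupdate.h>

static unsigned long example_cookie;

static void example_publish_state(void)
{
	/* ... make the store that all new readers must observe ... */
	example_cookie = get_state_synchronize_rcu();
}

static void example_wait_for_old_readers(void)
{
	/*
	 * No-op if a grace period already elapsed since
	 * example_publish_state(); otherwise behaves like
	 * synchronize_rcu().
	 */
	cond_synchronize_rcu(example_cookie);
}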

Patch

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1796,13 +1796,28 @@  static inline bool rq_has_pinned_tasks(s
  */
 static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 {
+	/* When not in the task's cpumask, no point in looking further. */
 	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 		return false;
 
-	if (is_per_cpu_kthread(p) || is_migration_disabled(p))
+	/* migrate_disabled() must be allowed to finish. */
+	if (is_migration_disabled(p))
 		return cpu_online(cpu);
 
-	return cpu_active(cpu);
+	/* Non kernel threads are not allowed during either online or offline. */
+	if (!(p->flags & PF_KTHREAD))
+		return cpu_active(cpu);
+
+	/* KTHREAD_IS_PER_CPU is always allowed. */
+	if (kthread_is_per_cpu(p))
+		return cpu_online(cpu);
+
+	/* Regular kernel threads don't get to stay during offline. */
+	if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
+		return cpu_active(cpu);
+
+	/* But are allowed during online. */
+	return cpu_online(cpu);
 }
 
 /*
@@ -7276,8 +7291,14 @@  static void balance_push(struct rq *rq)
 	/*
 	 * Both the cpu-hotplug and stop task are in this case and are
 	 * required to complete the hotplug process.
+	 *
+	 * XXX: the idle task does not match kthread_is_per_cpu() due to
+	 * histerical raisins.
 	 */
-	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
+	if (rq->idle == push_task ||
+	    ((push_task->flags & PF_KTHREAD) && kthread_is_per_cpu(push_task)) ||
+	    is_migration_disabled(push_task)) {
+
 		/*
 		 * If this is the idle task on the outgoing CPU try to wake
 		 * up the hotplug control thread which might wait for the
@@ -7309,7 +7330,7 @@  static void balance_push(struct rq *rq)
 	/*
 	 * At this point need_resched() is true and we'll take the loop in
 	 * schedule(). The next pick is obviously going to be the stop task
-	 * which is_per_cpu_kthread() and will push this task away.
+	 * which kthread_is_per_cpu() and will push this task away.
 	 */
 	raw_spin_lock(&rq->lock);
 }
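
As a readability aside, the new exemption test in balance_push() could
equally be factored into a small helper. An illustrative refactoring,
not part of the patch, as it would sit in kernel/sched/core.c next to
balance_push() (behaviour identical to the open-coded check above):

static bool balance_push_exempt(struct rq *rq, struct task_struct *p)
{
	/* The idle task is not flagged KTHREAD_IS_PER_CPU (histerical raisins). */
	if (rq->idle == p)
		return true;

	/* Genuine per-CPU kthreads are handled by CPU hotplug itself. */
	if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
		return true;

	/* migrate_disable() regions must be allowed to finish on this CPU. */
	return is_migration_disabled(p);
}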