[3/8] sched: Dont run cpu-online with balance_push() enabled
diff mbox series

Message ID 20210116113919.760767084@infradead.org
State New, archived
Headers show
Series
  • sched: Fix hot-unplug regressions
Related show

Commit Message

Peter Zijlstra Jan. 16, 2021, 11:30 a.m. UTC
We don't need to push away tasks when we come online, mark the push
complete right before the CPU dies.

XXX hotplug state machine has trouble with rollback here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Peter Zijlstra Jan. 16, 2021, 3:27 p.m. UTC | #1
On Sat, Jan 16, 2021 at 12:30:36PM +0100, Peter Zijlstra wrote:
> @@ -7608,6 +7614,12 @@ int sched_cpu_dying(unsigned int cpu)
>  	}
>  	rq_unlock_irqrestore(rq, &rf);
>  
> +	/*
> +	 * Should really be after we clear cpu_online(), but we're in
> +	 * stop_machine(), so it all works.
> +	 */
> +	balance_push_set(cpu, false);

Looking at the RCU thing just now made me realize we run all the DYING
notifiers with cpu_online() already false, so the above comment is wrong
and ordering in fact perfect.

Patch
diff mbox series

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7320,10 +7320,12 @@  static void balance_push_set(int cpu, bo
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
-	if (on)
+	if (on) {
+		WARN_ON_ONCE(rq->balance_callback);
 		rq->balance_callback = &balance_push_callback;
-	else
+	} else if (rq->balance_callback == &balance_push_callback) {
 		rq->balance_callback = NULL;
+	}
 	rq_unlock_irqrestore(rq, &rf);
 }
 
@@ -7441,6 +7443,10 @@  int sched_cpu_activate(unsigned int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	struct rq_flags rf;
 
+	/*
+	 * Make sure that when the hotplug state machine does a roll-back
+	 * we clear balance_push. Ideally that would happen earlier...
+	 */
 	balance_push_set(cpu, false);
 
 #ifdef CONFIG_SCHED_SMT
@@ -7608,6 +7614,12 @@  int sched_cpu_dying(unsigned int cpu)
 	}
 	rq_unlock_irqrestore(rq, &rf);
 
+	/*
+	 * Should really be after we clear cpu_online(), but we're in
+	 * stop_machine(), so it all works.
+	 */
+	balance_push_set(cpu, false);
+
 	calc_load_migrate(rq);
 	update_max_interval();
 	nohz_balance_exit_idle(rq);