linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole
@ 2021-03-10 14:52 Peter Zijlstra
  2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-03-10 14:52 UTC (permalink / raw)
  To: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, qais.yousef
  Cc: linux-kernel, peterz

Hi,

While the patches here:

  https://lkml.kernel.org/r/20210121101702.402798862@infradead.org

fixed the immediate hotplug regression, there was still a pending issue with
hotplug rollback (which basically never happens).

I had written these patches at the time, but never got around to actually
posting them. So here goes.



* [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline
  2021-03-10 14:52 [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole Peter Zijlstra
@ 2021-03-10 14:52 ` Peter Zijlstra
  2021-04-16 15:53   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
  2021-03-10 14:53 ` [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback Peter Zijlstra
  2 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-03-10 14:52 UTC (permalink / raw)
  To: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, qais.yousef
  Cc: linux-kernel, peterz

Prepare for addition of another mask. Primarily a code movement to
avoid having to create more #ifdef, but while there, convert
everything with an argument to an inline function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpumask.h |   97 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 66 insertions(+), 31 deletions(-)

--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -98,37 +98,6 @@ extern struct cpumask __cpu_active_mask;
 
 extern atomic_t __num_online_cpus;
 
-#if NR_CPUS > 1
-/**
- * num_online_cpus() - Read the number of online CPUs
- *
- * Despite the fact that __num_online_cpus is of type atomic_t, this
- * interface gives only a momentary snapshot and is not protected against
- * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
- * region.
- */
-static inline unsigned int num_online_cpus(void)
-{
-	return atomic_read(&__num_online_cpus);
-}
-#define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
-#define num_present_cpus()	cpumask_weight(cpu_present_mask)
-#define num_active_cpus()	cpumask_weight(cpu_active_mask)
-#define cpu_online(cpu)		cpumask_test_cpu((cpu), cpu_online_mask)
-#define cpu_possible(cpu)	cpumask_test_cpu((cpu), cpu_possible_mask)
-#define cpu_present(cpu)	cpumask_test_cpu((cpu), cpu_present_mask)
-#define cpu_active(cpu)		cpumask_test_cpu((cpu), cpu_active_mask)
-#else
-#define num_online_cpus()	1U
-#define num_possible_cpus()	1U
-#define num_present_cpus()	1U
-#define num_active_cpus()	1U
-#define cpu_online(cpu)		((cpu) == 0)
-#define cpu_possible(cpu)	((cpu) == 0)
-#define cpu_present(cpu)	((cpu) == 0)
-#define cpu_active(cpu)		((cpu) == 0)
-#endif
-
 extern cpumask_t cpus_booted_once_mask;
 
 static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
@@ -894,6 +863,72 @@ static inline const struct cpumask *get_
 	return to_cpumask(p);
 }
 
+#if NR_CPUS > 1
+/**
+ * num_online_cpus() - Read the number of online CPUs
+ *
+ * Despite the fact that __num_online_cpus is of type atomic_t, this
+ * interface gives only a momentary snapshot and is not protected against
+ * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
+ * region.
+ */
+static inline unsigned int num_online_cpus(void)
+{
+	return atomic_read(&__num_online_cpus);
+}
+#define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
+#define num_present_cpus()	cpumask_weight(cpu_present_mask)
+#define num_active_cpus()	cpumask_weight(cpu_active_mask)
+
+static inline bool cpu_online(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_online_mask);
+}
+
+static inline bool cpu_possible(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_possible_mask);
+}
+
+static inline bool cpu_present(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_present_mask);
+}
+
+static inline bool cpu_active(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_active_mask);
+}
+
+#else
+
+#define num_online_cpus()	1U
+#define num_possible_cpus()	1U
+#define num_present_cpus()	1U
+#define num_active_cpus()	1U
+
+static inline bool cpu_online(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_possible(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_present(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_active(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+#endif /* NR_CPUS > 1 */
+
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
 #if NR_CPUS <= BITS_PER_LONG




* [PATCH 2/3] cpumask: Introduce DYING mask
  2021-03-10 14:52 [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole Peter Zijlstra
  2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
@ 2021-03-10 14:53 ` Peter Zijlstra
  2021-03-21 19:30   ` Qais Yousef
  2021-04-16 15:53   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-03-10 14:53 ` [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback Peter Zijlstra
  2 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-03-10 14:53 UTC (permalink / raw)
  To: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, qais.yousef
  Cc: linux-kernel, peterz

Introduce a cpumask that indicates (for each CPU) what direction the
CPU hotplug is currently going. Notably, it tracks rollbacks. E.g. when
an up fails and we do a roll-back down, it will accurately reflect the
direction.
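
As an illustration, here's a standalone userspace model of the intended
semantics (not kernel code; in the kernel the update lives in
cpuhp_invoke_callback(), see the hunk below):

  #include <stdbool.h>
  #include <stdio.h>

  static bool dying;    /* models cpu_dying(cpu) for a single CPU */

  /* mirrors: if (bringup != !cpu_dying(cpu)) set_cpu_dying(cpu, !bringup); */
  static void invoke_callback(bool bringup)
  {
          if (bringup != !dying)
                  dying = !bringup;
  }

  int main(void)
  {
          invoke_callback(true);  /* cpu_up() starts: moving up */
          printf("up:       dying=%d\n", dying);  /* 0 */

          invoke_callback(false); /* a bringup step fails: roll back down */
          printf("rollback: dying=%d\n", dying);  /* 1: direction flips */
          return 0;
  }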

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpumask.h |   20 ++++++++++++++++++++
 kernel/cpu.c            |    6 ++++++
 2 files changed, 26 insertions(+)

--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -91,10 +91,12 @@ extern struct cpumask __cpu_possible_mas
 extern struct cpumask __cpu_online_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
+extern struct cpumask __cpu_dying_mask;
 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
+#define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
 
 extern atomic_t __num_online_cpus;
 
@@ -826,6 +828,14 @@ set_cpu_active(unsigned int cpu, bool ac
 		cpumask_clear_cpu(cpu, &__cpu_active_mask);
 }
 
+static inline void
+set_cpu_dying(unsigned int cpu, bool dying)
+{
+	if (dying)
+		cpumask_set_cpu(cpu, &__cpu_dying_mask);
+	else
+		cpumask_clear_cpu(cpu, &__cpu_dying_mask);
+}
 
 /**
  * to_cpumask - convert an NR_CPUS bitmap to a struct cpumask *
@@ -900,6 +910,11 @@ static inline bool cpu_active(unsigned i
 	return cpumask_test_cpu(cpu, cpu_active_mask);
 }
 
+static inline bool cpu_dying(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_dying_mask);
+}
+
 #else
 
 #define num_online_cpus()	1U
@@ -927,6 +942,11 @@ static inline bool cpu_active(unsigned i
 	return cpu == 0;
 }
 
+static inline bool cpu_dying(unsigned int cpu)
+{
+	return false;
+}
+
 #endif /* NR_CPUS > 1 */
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigne
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
+	if (bringup != !cpu_dying(cpu))
+		set_cpu_dying(cpu, !bringup);
+
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
 		return -EAGAIN;
@@ -2512,6 +2515,9 @@ EXPORT_SYMBOL(__cpu_present_mask);
 struct cpumask __cpu_active_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_active_mask);
 
+struct cpumask __cpu_dying_mask __read_mostly;
+EXPORT_SYMBOL(__cpu_dying_mask);
+
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
 




* [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-03-10 14:52 [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole Peter Zijlstra
  2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
  2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
@ 2021-03-10 14:53 ` Peter Zijlstra
  2021-03-11 15:13   ` Valentin Schneider
  2 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-03-10 14:53 UTC (permalink / raw)
  To: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, qais.yousef
  Cc: linux-kernel, peterz

Use the new cpu_dying() state to simplify and fix the balance_push()
vs CPU hotplug rollback state.

Specifically, we currently rely on notifiers sched_cpu_dying() /
sched_cpu_activate() to terminate balance_push; however, if the
cpu_down() fails when we're past sched_cpu_deactivate(), it should
terminate balance_push at that point and not wait until we hit
sched_cpu_activate().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |   33 ++++++++++++++++++++-------------
 kernel/sched/sched.h |    1 -
 2 files changed, 20 insertions(+), 14 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1811,7 +1811,7 @@ static inline bool is_cpu_allowed(struct
 		return cpu_online(cpu);
 
 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_rq(cpu)->balance_push)
+	if (cpu_dying(cpu))
 		return false;
 
 	/* But are allowed during online. */
@@ -7647,12 +7647,19 @@ static void balance_push(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 	SCHED_WARN_ON(rq->cpu != smp_processor_id());
+
 	/*
 	 * Ensure the thing is persistent until balance_push_set(.on = false);
 	 */
 	rq->balance_callback = &balance_push_callback;
 
 	/*
+	 * Only active while going offline.
+	 */
+	if (!cpu_dying(rq->cpu))
+		return;
+
+	/*
 	 * Both the cpu-hotplug and stop task are in this case and are
 	 * required to complete the hotplug process.
 	 *
@@ -7705,7 +7712,6 @@ static void balance_push_set(int cpu, bo
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
-	rq->balance_push = on;
 	if (on) {
 		WARN_ON_ONCE(rq->balance_callback);
 		rq->balance_callback = &balance_push_callback;
@@ -7830,8 +7836,8 @@ int sched_cpu_activate(unsigned int cpu)
 	struct rq_flags rf;
 
 	/*
-	 * Make sure that when the hotplug state machine does a roll-back
-	 * we clear balance_push. Ideally that would happen earlier...
+	 * Clear the balance_push callback and prepare to schedule
+	 * regular tasks.
 	 */
 	balance_push_set(cpu, false);
 
@@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
 	set_cpu_active(cpu, false);
 
 	/*
-	 * From this point forward, this CPU will refuse to run any task that
-	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
-	 * push those tasks away until this gets cleared, see
-	 * sched_cpu_dying().
-	 */
-	balance_push_set(cpu, true);
-
-	/*
 	 * We've cleared cpu_active_mask / set balance_push, wait for all
 	 * preempt-disabled and RCU users of this state to go away such that
 	 * all new such users will observe it.
@@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
 	}
 	rq_unlock_irqrestore(rq, &rf);
 
+	/*
+	 * From this point forward, this CPU will refuse to run any task that
+	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
+	 * push those tasks away until this gets cleared, see
+	 * sched_cpu_dying().
+	 */
+	balance_push_set(cpu, true);
+
 #ifdef CONFIG_SCHED_SMT
 	/*
 	 * When going down, decrement the number of cores with SMT present.
@@ -8206,7 +8212,7 @@ void __init sched_init(void)
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
-		rq->balance_callback = NULL;
+		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
 		rq->push_cpu = 0;
@@ -8253,6 +8259,7 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SMP
 	idle_thread_set_boot_cpu();
+	balance_push_set(smp_processor_id(), false);
 #endif
 	init_sched_fair_class();
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -982,7 +982,6 @@ struct rq {
 	unsigned long		cpu_capacity_orig;
 
 	struct callback_head	*balance_callback;
-	unsigned char		balance_push;
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;




* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-03-10 14:53 ` [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback Peter Zijlstra
@ 2021-03-11 15:13   ` Valentin Schneider
  2021-03-11 16:42     ` Peter Zijlstra
  2021-04-12 12:03     ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Valentin Schneider @ 2021-03-11 15:13 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, qais.yousef
  Cc: linux-kernel, peterz

Peter Zijlstra <peterz@infradead.org> writes:
> @@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
>  	set_cpu_active(cpu, false);
>  
>  	/*
> -	 * From this point forward, this CPU will refuse to run any task that
> -	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> -	 * push those tasks away until this gets cleared, see
> -	 * sched_cpu_dying().
> -	 */
> -	balance_push_set(cpu, true);
> -
> -	/*
>  	 * We've cleared cpu_active_mask / set balance_push, wait for all
>  	 * preempt-disabled and RCU users of this state to go away such that
>  	 * all new such users will observe it.
> @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
>  	}
>  	rq_unlock_irqrestore(rq, &rf);
>  
> +	/*
> +	 * From this point forward, this CPU will refuse to run any task that
> +	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> +	 * push those tasks away until this gets cleared, see
> +	 * sched_cpu_dying().
> +	 */
> +	balance_push_set(cpu, true);
> +

AIUI with cpu_dying_mask being flipped before even entering
sched_cpu_deactivate(), we don't need this to be before the
synchronize_rcu() anymore; is there more than that to why you're punting it
back this side of it?

>  #ifdef CONFIG_SCHED_SMT
>  	/*
>  	 * When going down, decrement the number of cores with SMT present.

> @@ -8206,7 +8212,7 @@ void __init sched_init(void)
>  		rq->sd = NULL;
>  		rq->rd = NULL;
>  		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> -		rq->balance_callback = NULL;
> +		rq->balance_callback = &balance_push_callback;
>  		rq->active_balance = 0;
>  		rq->next_balance = jiffies;
>  		rq->push_cpu = 0;
> @@ -8253,6 +8259,7 @@ void __init sched_init(void)
>  
>  #ifdef CONFIG_SMP
>  	idle_thread_set_boot_cpu();
> +	balance_push_set(smp_processor_id(), false);
>  #endif
>  	init_sched_fair_class();
>

I don't get what these two changes do - the end result is the same as
before, no?


Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
since balance_push gets numbed down by !cpu_dying. What about the other way
around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
now-dying CPU, and we'd need to re-install the balance_push callback. 

I'm starting to think we'd need to have

  rq->balance_callback = &balance_push_callback

for any CPU with hotplug state < CPUHP_AP_ACTIVE. Thus we would
need:

  balance_push_set(cpu, true) in sched_init() and sched_cpu_deactivate()
  balance_push_set(cpu, false) in sched_cpu_activate()

and the rest would be driven by the cpu_dying_mask.


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-03-11 15:13   ` Valentin Schneider
@ 2021-03-11 16:42     ` Peter Zijlstra
  2021-04-12 12:03     ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-03-11 16:42 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:

> >  #ifdef CONFIG_SCHED_SMT
> >  	/*
> >  	 * When going down, decrement the number of cores with SMT present.
> 
> > @@ -8206,7 +8212,7 @@ void __init sched_init(void)
> >  		rq->sd = NULL;
> >  		rq->rd = NULL;
> >  		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> > -		rq->balance_callback = NULL;
> > +		rq->balance_callback = &balance_push_callback;
> >  		rq->active_balance = 0;
> >  		rq->next_balance = jiffies;
> >  		rq->push_cpu = 0;
> > @@ -8253,6 +8259,7 @@ void __init sched_init(void)
> >  
> >  #ifdef CONFIG_SMP
> >  	idle_thread_set_boot_cpu();
> > +	balance_push_set(smp_processor_id(), false);
> >  #endif
> >  	init_sched_fair_class();
> >
> 
> I don't get what these two changes do - the end result is the same as
> before, no?

IIRC the idea was to initialize the offline CPUs to the same state as if
they'd been offlined.
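
That is (from the two hunks above):

  rq->balance_callback = &balance_push_callback;  /* every rq at init */
  ...
  balance_push_set(smp_processor_id(), false);    /* boot CPU only */

so a not-yet-onlined CPU starts out with the push callback installed, just
like a CPU that has gone through sched_cpu_deactivate().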


* Re: [PATCH 2/3] cpumask: Introduce DYING mask
  2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
@ 2021-03-21 19:30   ` Qais Yousef
  2021-03-22 15:07     ` Steven Rostedt
  2021-04-12 10:55     ` Peter Zijlstra
  2021-04-16 15:53   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 2 replies; 29+ messages in thread
From: Qais Yousef @ 2021-03-21 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, linux-kernel

On 03/10/21 15:53, Peter Zijlstra wrote:
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigne
>  	int (*cb)(unsigned int cpu);
>  	int ret, cnt;
>  
> +	if (bringup != !cpu_dying(cpu))

nit: this condition is hard to read

> +		set_cpu_dying(cpu, !bringup);

since cpu_dying() will do cpumask_test_cpu(), are we saving much over
unconditionally calling set_cpu_dying(cpu, !bringup), which performs
cpumask_{set, clear}_cpu()?

> +
>  	if (st->fail == state) {
>  		st->fail = CPUHP_INVALID;
>  		return -EAGAIN;

Thanks

--
Qais Yousef


* Re: [PATCH 2/3] cpumask: Introduce DYING mask
  2021-03-21 19:30   ` Qais Yousef
@ 2021-03-22 15:07     ` Steven Rostedt
  2021-04-12 10:55     ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Rostedt @ 2021-03-22 15:07 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Peter Zijlstra, valentin.schneider, tglx, mingo, bigeasy, swood,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vincent.donnefort, linux-kernel

On Sun, 21 Mar 2021 19:30:37 +0000
Qais Yousef <qais.yousef@arm.com> wrote:

> On 03/10/21 15:53, Peter Zijlstra wrote:
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigne
> >  	int (*cb)(unsigned int cpu);
> >  	int ret, cnt;
> >  
> > +	if (bringup != !cpu_dying(cpu))  
> 
> nit: this condition is hard to read

Would
	if (bringup == !!cpu_dying(cpu))

read better?

-- Steve

> 
> > +		set_cpu_dying(cpu, !bringup);  



* Re: [PATCH 2/3] cpumask: Introduce DYING mask
  2021-03-21 19:30   ` Qais Yousef
  2021-03-22 15:07     ` Steven Rostedt
@ 2021-04-12 10:55     ` Peter Zijlstra
  2021-04-12 11:16       ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-12 10:55 UTC (permalink / raw)
  To: Qais Yousef
  Cc: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, linux-kernel

On Sun, Mar 21, 2021 at 07:30:37PM +0000, Qais Yousef wrote:
> On 03/10/21 15:53, Peter Zijlstra wrote:
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigne
> >  	int (*cb)(unsigned int cpu);
> >  	int ret, cnt;
> >  
> > +	if (bringup != !cpu_dying(cpu))
> 
> nit: this condition is hard to read
> 
> > +		set_cpu_dying(cpu, !bringup);

How's:

	if (cpu_dying(cpu) != !bringup)
		set_cpu_dying(cpu, !bringup);
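
For the record, the resulting truth table (both treated as booleans):

	bringup  cpu_dying()  action
	1        0            nothing (already moving up)
	1        1            set_cpu_dying(cpu, false)
	0        0            set_cpu_dying(cpu, true)
	0        1            nothing (already moving down)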

> since cpu_dying() will do cpumask_test_cpu(), are we saving much over
> unconditionally calling set_cpu_dying(cpu, !bringup), which performs
> cpumask_{set, clear}_cpu()?

This is hotplug, it's all slow, endlessly rewriting that bit shouldn't
be a problem I suppose.


* Re: [PATCH 2/3] cpumask: Introduce DYING mask
  2021-04-12 10:55     ` Peter Zijlstra
@ 2021-04-12 11:16       ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2021-04-12 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: valentin.schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vincent.donnefort, linux-kernel

On 04/12/21 12:55, Peter Zijlstra wrote:
> On Sun, Mar 21, 2021 at 07:30:37PM +0000, Qais Yousef wrote:
> > On 03/10/21 15:53, Peter Zijlstra wrote:
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigne
> > >  	int (*cb)(unsigned int cpu);
> > >  	int ret, cnt;
> > >  
> > > +	if (bringup != !cpu_dying(cpu))
> > 
> > nit: this condition is hard to read
> > 
> > > +		set_cpu_dying(cpu, !bringup);
> 
> How's:
> 
> 	if (cpu_dying(cpu) != !bringup)
> 		set_cpu_dying(cpu, !bringup);

Slightly better I suppose :)

> 
> > since cpu_dying() will do cpumask_test_cpu(), are we saving much over
> > unconditionally calling set_cpu_dying(cpu, !bringup), which performs
> > cpumask_{set, clear}_cpu()?
> 
> This is hotplug, it's all slow, endlessly rewriting that bit shouldn't
> be a problem I suppose.

True. Besides, I doubt there's a performance hit really; cpu_dying() will read
the bit in the cpumask anyway, and unconditionally writing will be as fast since
both will fetch the cacheline anyway?

Regardless, not really a big deal. It's not really the hardest thing to stare
at here ;-)

Thanks

--
Qais Yousef


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-03-11 15:13   ` Valentin Schneider
  2021-03-11 16:42     ` Peter Zijlstra
@ 2021-04-12 12:03     ` Peter Zijlstra
  2021-04-12 17:22       ` Valentin Schneider
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-12 12:03 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > @@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
> >  	set_cpu_active(cpu, false);
> >  
> >  	/*
> > -	 * From this point forward, this CPU will refuse to run any task that
> > -	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > -	 * push those tasks away until this gets cleared, see
> > -	 * sched_cpu_dying().
> > -	 */
> > -	balance_push_set(cpu, true);
> > -
> > -	/*
> >  	 * We've cleared cpu_active_mask / set balance_push, wait for all
> >  	 * preempt-disabled and RCU users of this state to go away such that
> >  	 * all new such users will observe it.
> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> >  	}
> >  	rq_unlock_irqrestore(rq, &rf);
> >  
> > +	/*
> > +	 * From this point forward, this CPU will refuse to run any task that
> > +	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > +	 * push those tasks away until this gets cleared, see
> > +	 * sched_cpu_dying().
> > +	 */
> > +	balance_push_set(cpu, true);
> > +
> 
> AIUI with cpu_dying_mask being flipped before even entering
> sched_cpu_deactivate(), we don't need this to be before the
> synchronize_rcu() anymore; is there more than that to why you're punting it
> back this side of it?

I think it does need to be like this; we need to clearly separate
the active=true and balance_push_set(). If we were to somehow observe
both balance_push_set() and active==false, we'd be in trouble.

> >  #ifdef CONFIG_SCHED_SMT
> >  	/*
> >  	 * When going down, decrement the number of cores with SMT present.
> 
> > @@ -8206,7 +8212,7 @@ void __init sched_init(void)
> >  		rq->sd = NULL;
> >  		rq->rd = NULL;
> >  		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> > -		rq->balance_callback = NULL;
> > +		rq->balance_callback = &balance_push_callback;
> >  		rq->active_balance = 0;
> >  		rq->next_balance = jiffies;
> >  		rq->push_cpu = 0;
> > @@ -8253,6 +8259,7 @@ void __init sched_init(void)
> >  
> >  #ifdef CONFIG_SMP
> >  	idle_thread_set_boot_cpu();
> > +	balance_push_set(smp_processor_id(), false);
> >  #endif
> >  	init_sched_fair_class();
> >
> 
> I don't get what these two changes do - the end result is the same as
> before, no?

Not quite; we have to make sure the state of the offline CPUs matches
that of a CPU that's been offlined. For consistency if nothing else, but
it's actually important for a point below.

> Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
> since balance_push gets numbed down by !cpu_dying. What about the other way
> around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
> now-dying CPU, and we'd need to re-install the balance_push callback. 

This is in fact handled. Note how the previous point initialized the
offline CPU to have the push_callback installed.

Also note how balance_push() re-instates itself unconditionally.

So the thing is, we install the push callback on deactivate() and leave
it in place until activate, it is always there, regardless what way
we're moving.

However, it is only effective while going down, see the early exit.
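
To make that concrete, a toy userspace model of the lifecycle (illustration
only; the names mirror the kernel ones, but nothing here is kernel code):

  #include <stdbool.h>
  #include <stdio.h>

  struct rq_model {
          bool push_installed; /* rq->balance_callback == &balance_push_callback */
          bool dying;          /* cpu_dying(cpu) */
  };

  /* balance_push(): always re-installs itself, but only acts going down */
  static bool balance_push_acts(struct rq_model *rq)
  {
          rq->push_installed = true;
          return rq->dying;
  }

  int main(void)
  {
          /* sched_init() leaves secondary CPUs with the callback installed */
          struct rq_model rq = { .push_installed = true };

          rq.dying = true;   /* cpu_down() in progress */
          printf("going down: pushes=%d\n", balance_push_acts(&rq)); /* 1 */

          rq.dying = false;  /* the down failed, rolling back up */
          printf("rolling up: pushes=%d\n", balance_push_acts(&rq)); /* 0 */

          rq.push_installed = false; /* only sched_cpu_activate() removes it */
          return 0;
  }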


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-12 12:03     ` Peter Zijlstra
@ 2021-04-12 17:22       ` Valentin Schneider
  2021-04-13  6:51         ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Valentin Schneider @ 2021-04-12 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On 12/04/21 14:03, Peter Zijlstra wrote:
> On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
>> >    }
>> >    rq_unlock_irqrestore(rq, &rf);
>> >
>> > +	/*
>> > +	 * From this point forward, this CPU will refuse to run any task that
>> > +	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
>> > +	 * push those tasks away until this gets cleared, see
>> > +	 * sched_cpu_dying().
>> > +	 */
>> > +	balance_push_set(cpu, true);
>> > +
>>
>> AIUI with cpu_dying_mask being flipped before even entering
>> sched_cpu_deactivate(), we don't need this to be before the
>> synchronize_rcu() anymore; is there more than that to why you're punting it
>> back this side of it?
>
> I think it does need to be like this; we need to clearly separate
> the active=true and balance_push_set(). If we were to somehow observe
> both balance_push_set() and active==false, we'd be in trouble.
>

I'm afraid I don't follow; we're replacing a read of rq->balance_push with
cpu_dying(), and those are still written on the same side of the
synchronize_rcu(). What am I missing?

>> >  #ifdef CONFIG_SCHED_SMT
>> >    /*
>> >     * When going down, decrement the number of cores with SMT present.
>>
>> > @@ -8206,7 +8212,7 @@ void __init sched_init(void)
>> >            rq->sd = NULL;
>> >            rq->rd = NULL;
>> >            rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
>> > -		rq->balance_callback = NULL;
>> > +		rq->balance_callback = &balance_push_callback;
>> >            rq->active_balance = 0;
>> >            rq->next_balance = jiffies;
>> >            rq->push_cpu = 0;
>> > @@ -8253,6 +8259,7 @@ void __init sched_init(void)
>> >
>> >  #ifdef CONFIG_SMP
>> >    idle_thread_set_boot_cpu();
>> > +	balance_push_set(smp_processor_id(), false);
>> >  #endif
>> >    init_sched_fair_class();
>> >
>>
>> I don't get what these two changes do - the end result is the same as
>> before, no?
>
> Not quite; we have to make sure the state of the offline CPUs matches
> that of a CPU that's been offlined. For consistency if nothing else, but
> it's actually important for a point below.
>
>> Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
>> since balance_push gets numbed down by !cpu_dying. What about the other way
>> around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
>> now-dying CPU, and we'd need to re-install the balance_push callback.
>
> This is in fact handled. Note how the previous point initialized the
> offline CPU to have the push_callback installed.
>
> Also note how balance_push() re-instates itself unconditionally.
>
> So the thing is, we install the push callback on deactivate() and leave
> it in place until activate, it is always there, regardless what way
> we're moving.
>
> However, it is only effective while going down, see the early exit.


Oooh, I can't read: only the boot CPU gets its callback uninstalled in
sched_init()! So secondaries keep push_callback installed up until
sched_cpu_activate(), but as you said it's not effective unless a rollback
happens.

Now, doesn't that mean we should *not* uninstall the callback in
sched_cpu_dying()? AFAIK it's possible for the initial secondary CPU
boot to go fine, but the next offline+online cycle fails while going up -
that would need to rollback with push_callback installed.
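
Spelled out (hypothetical sequence):

  boot:          sched_cpu_activate()   -> callback removed
  offline:       sched_cpu_deactivate() -> callback installed
                 sched_cpu_dying()      -> callback removed again
  online fails:  rollback down runs with cpu_dying() set, but there is
                 no callback left to push the non-percpu tasks away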


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-12 17:22       ` Valentin Schneider
@ 2021-04-13  6:51         ` Peter Zijlstra
  2021-04-15  8:59           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-13  6:51 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On Mon, Apr 12, 2021 at 06:22:42PM +0100, Valentin Schneider wrote:
> On 12/04/21 14:03, Peter Zijlstra wrote:
> > On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
> >> Peter Zijlstra <peterz@infradead.org> writes:
> >> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> >> >    }
> >> >    rq_unlock_irqrestore(rq, &rf);
> >> >
> >> > +	/*
> >> > +	 * From this point forward, this CPU will refuse to run any task that
> >> > +	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> >> > +	 * push those tasks away until this gets cleared, see
> >> > +	 * sched_cpu_dying().
> >> > +	 */
> >> > +	balance_push_set(cpu, true);
> >> > +
> >>
> >> AIUI with cpu_dying_mask being flipped before even entering
> >> sched_cpu_deactivate(), we don't need this to be before the
> >> synchronize_rcu() anymore; is there more than that to why you're punting it
> >> back this side of it?
> >
> > I think it does need to be like this; we need to clearly separate
> > the active=true and balance_push_set(). If we were to somehow observe
> > both balance_push_set() and active==false, we'd be in trouble.
> >
> 
> I'm afraid I don't follow; we're replacing a read of rq->balance_push with
> cpu_dying(), and those are still written on the same side of the
> synchronize_rcu(). What am I missing?

Yeah, I'm not sure anymore either; I tried to work out why I'd done
that but upon closer examination everything fell flat.

Let me try again today :-)

> Oooh, I can't read: only the boot CPU gets its callback uninstalled in
> sched_init()! So secondaries keep push_callback installed up until
> sched_cpu_activate(), but as you said it's not effective unless a rollback
> happens.
> 
> Now, doesn't that mean we should *not* uninstall the callback in
> sched_cpu_dying()? AFAIK it's possible for the initial secondary CPU
> boot to go fine, but the next offline+online cycle fails while going up -
> that would need to rollback with push_callback installed.

Quite; I removed that shortly after sending this, when I tried to write
a comment and found it.


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-13  6:51         ` Peter Zijlstra
@ 2021-04-15  8:59           ` Peter Zijlstra
  2021-04-15 14:32             ` Valentin Schneider
  2021-04-16 15:53             ` [tip: sched/core] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-15  8:59 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On Tue, Apr 13, 2021 at 08:51:23AM +0200, Peter Zijlstra wrote:

> > I'm afraid I don't follow; we're replacing a read of rq->balance_push with
> > cpu_dying(), and those are still written on the same side of the
> > synchronize_rcu(). What am I missing?
> 
> Yeah, I'm not sure anymore either; I tried to work out why I'd done
> that but upon closer examination everything fell flat.
> 
> Let me try again today :-)

Can't make sense of what I did.. I've removed that hunk. Patch now looks
like this.

---
Subject: sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Jan 21 16:09:32 CET 2021

Use the new cpu_dying() state to simplify and fix the balance_push()
vs CPU hotplug rollback state.

Specifically, we currently rely on notifiers sched_cpu_dying() /
sched_cpu_activate() to terminate balance_push; however, if the
cpu_down() fails when we're past sched_cpu_deactivate(), it should
terminate balance_push at that point and not wait until we hit
sched_cpu_activate().

Similarly, when cpu_up() fails and we're going back down, balance_push
should be active, where it currently is not.

So instead, make sure balance_push is enabled between
sched_cpu_deactivate() and sched_cpu_activate() (e.g. when
!cpu_active()), and gate its utility with cpu_dying().
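
E.g. (hypothetical sequence; "[push]" means the callback is installed):

  down:       sched_cpu_deactivate() installs [push]; dying -> tasks pushed
  down fails: rollback up clears dying; [push] stays, but is inert
  up fails:   rollback down sets dying again; [push] (only removed in
              sched_cpu_activate()) pushes tasks once more
  up done:    sched_cpu_activate() removes [push]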

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |   26 +++++++++++++++-----------
 kernel/sched/sched.h |    1 -
 2 files changed, 15 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1811,7 +1811,7 @@ static inline bool is_cpu_allowed(struct
 		return cpu_online(cpu);
 
 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_rq(cpu)->balance_push)
+	if (cpu_dying(cpu))
 		return false;
 
 	/* But are allowed during online. */
@@ -7639,6 +7639,9 @@ static DEFINE_PER_CPU(struct cpu_stop_wo
 
 /*
  * Ensure we only run per-cpu kthreads once the CPU goes !active.
+ *
+ * This is active/set between sched_cpu_deactivate() / sched_cpu_activate().
+ * But only effective when the hotplug motion is down.
  */
 static void balance_push(struct rq *rq)
 {
@@ -7646,12 +7649,19 @@ static void balance_push(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 	SCHED_WARN_ON(rq->cpu != smp_processor_id());
+
 	/*
 	 * Ensure the thing is persistent until balance_push_set(.on = false);
 	 */
 	rq->balance_callback = &balance_push_callback;
 
 	/*
+	 * Only active while going offline.
+	 */
+	if (!cpu_dying(rq->cpu))
+		return;
+
+	/*
 	 * Both the cpu-hotplug and stop task are in this case and are
 	 * required to complete the hotplug process.
 	 *
@@ -7704,7 +7714,6 @@ static void balance_push_set(int cpu, bo
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
-	rq->balance_push = on;
 	if (on) {
 		WARN_ON_ONCE(rq->balance_callback);
 		rq->balance_callback = &balance_push_callback;
@@ -7829,8 +7838,8 @@ int sched_cpu_activate(unsigned int cpu)
 	struct rq_flags rf;
 
 	/*
-	 * Make sure that when the hotplug state machine does a roll-back
-	 * we clear balance_push. Ideally that would happen earlier...
+	 * Clear the balance_push callback and prepare to schedule
+	 * regular tasks.
 	 */
 	balance_push_set(cpu, false);
 
@@ -8015,12 +8024,6 @@ int sched_cpu_dying(unsigned int cpu)
 	}
 	rq_unlock_irqrestore(rq, &rf);
 
-	/*
-	 * Now that the CPU is offline, make sure we're welcome
-	 * to new tasks once we come back up.
-	 */
-	balance_push_set(cpu, false);
-
 	calc_load_migrate(rq);
 	update_max_interval();
 	hrtick_clear(rq);
@@ -8205,7 +8208,7 @@ void __init sched_init(void)
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
-		rq->balance_callback = NULL;
+		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
 		rq->push_cpu = 0;
@@ -8252,6 +8255,7 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SMP
 	idle_thread_set_boot_cpu();
+	balance_push_set(smp_processor_id(), false);
 #endif
 	init_sched_fair_class();
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -983,7 +983,6 @@ struct rq {
 	unsigned long		cpu_capacity_orig;
 
 	struct callback_head	*balance_callback;
-	unsigned char		balance_push;
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-15  8:59           ` Peter Zijlstra
@ 2021-04-15 14:32             ` Valentin Schneider
  2021-04-15 15:29               ` Peter Zijlstra
  2021-04-19 10:56               ` Vincent Donnefort
  2021-04-16 15:53             ` [tip: sched/core] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback tip-bot2 for Peter Zijlstra
  1 sibling, 2 replies; 29+ messages in thread
From: Valentin Schneider @ 2021-04-15 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On 15/04/21 10:59, Peter Zijlstra wrote:
> Can't make sense of what I did.. I've removed that hunk. Patch now looks
> like this.
>

Small nit below, but regardless feel free to apply to the whole lot:
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

@VincentD, ISTR you had tested the initial version of this with your fancy
shmancy hotplug rollback stresser. Feel like doing this again?

> So instead, make sure balance_push is enabled between
> sched_cpu_deactivate() and sched_cpu_activate() (e.g. when
> !cpu_active()), and gate its utility with cpu_dying().

I'd word that "is enabled below sched_cpu_activate()", since
sched_cpu_deactivate() is now out of the picture.

[...]
> @@ -7639,6 +7639,9 @@ static DEFINE_PER_CPU(struct cpu_stop_wo
>
>  /*
>   * Ensure we only run per-cpu kthreads once the CPU goes !active.
> + *
> + * This is active/set between sched_cpu_deactivate() / sched_cpu_activate().

Ditto

> + * But only effective when the hotplug motion is down.
>   */
>  static void balance_push(struct rq *rq)
>  {


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-15 14:32             ` Valentin Schneider
@ 2021-04-15 15:29               ` Peter Zijlstra
  2021-04-15 15:34                 ` Valentin Schneider
  2021-04-19 10:56               ` Vincent Donnefort
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-15 15:29 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On Thu, Apr 15, 2021 at 03:32:11PM +0100, Valentin Schneider wrote:
> > So instead, make sure balance_push is enabled between
> > sched_cpu_deactivate() and sched_cpu_activate() (e.g. when
> > !cpu_active()), and gate its utility with cpu_dying().
> 
> I'd word that "is enabled below sched_cpu_activate()", since
> sched_cpu_deactivate() is now out of the picture.

I are confused (again!). Did you mean to say something like: 'is enabled
below SCHED_AP_ACTIVE' ?


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-15 15:29               ` Peter Zijlstra
@ 2021-04-15 15:34                 ` Valentin Schneider
  0 siblings, 0 replies; 29+ messages in thread
From: Valentin Schneider @ 2021-04-15 15:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	vincent.donnefort, qais.yousef, linux-kernel

On 15/04/21 17:29, Peter Zijlstra wrote:
> On Thu, Apr 15, 2021 at 03:32:11PM +0100, Valentin Schneider wrote:
>> I'd word that "is enabled below sched_cpu_activate()", since
>> sched_cpu_deactivate() is now out of the picture.
>
> I are confused (again!). Did you mean to say something like: 'is enabled
> below SCHED_AP_ACTIVE' ?

Something like this yes; wording is hard.


* [tip: sched/core] cpumask: Introduce DYING mask
  2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
  2021-03-21 19:30   ` Qais Yousef
@ 2021-04-16 15:53   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-04-16 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5
Gitweb:        https://git.kernel.org/tip/e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 19 Jan 2021 18:43:45 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 16 Apr 2021 17:06:32 +02:00

cpumask: Introduce DYING mask

Introduce a cpumask that indicates (for each CPU) what direction the
CPU hotplug is currently going. Notably, it tracks rollbacks. E.g. when
an up fails and we do a roll-back down, it will accurately reflect the
direction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
---
 include/linux/cpumask.h | 20 ++++++++++++++++++++
 kernel/cpu.c            |  6 ++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index a584336..e6b948a 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -91,10 +91,12 @@ extern struct cpumask __cpu_possible_mask;
 extern struct cpumask __cpu_online_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
+extern struct cpumask __cpu_dying_mask;
 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
+#define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
 
 extern atomic_t __num_online_cpus;
 
@@ -826,6 +828,14 @@ set_cpu_active(unsigned int cpu, bool active)
 		cpumask_clear_cpu(cpu, &__cpu_active_mask);
 }
 
+static inline void
+set_cpu_dying(unsigned int cpu, bool dying)
+{
+	if (dying)
+		cpumask_set_cpu(cpu, &__cpu_dying_mask);
+	else
+		cpumask_clear_cpu(cpu, &__cpu_dying_mask);
+}
 
 /**
  * to_cpumask - convert an NR_CPUS bitmap to a struct cpumask *
@@ -900,6 +910,11 @@ static inline bool cpu_active(unsigned int cpu)
 	return cpumask_test_cpu(cpu, cpu_active_mask);
 }
 
+static inline bool cpu_dying(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_dying_mask);
+}
+
 #else
 
 #define num_online_cpus()	1U
@@ -927,6 +942,11 @@ static inline bool cpu_active(unsigned int cpu)
 	return cpu == 0;
 }
 
+static inline bool cpu_dying(unsigned int cpu)
+{
+	return false;
+}
+
 #endif /* NR_CPUS > 1 */
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 23505d6..838dcf2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -160,6 +160,9 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
+	if (cpu_dying(cpu) != !bringup)
+		set_cpu_dying(cpu, !bringup);
+
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
 		return -EAGAIN;
@@ -2512,6 +2515,9 @@ EXPORT_SYMBOL(__cpu_present_mask);
 struct cpumask __cpu_active_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_active_mask);
 
+struct cpumask __cpu_dying_mask __read_mostly;
+EXPORT_SYMBOL(__cpu_dying_mask);
+
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
 


* [tip: sched/core] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-15  8:59           ` Peter Zijlstra
  2021-04-15 14:32             ` Valentin Schneider
@ 2021-04-16 15:53             ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-04-16 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b5c4477366fb5e6a2f0f38742c33acd666c07698
Gitweb:        https://git.kernel.org/tip/b5c4477366fb5e6a2f0f38742c33acd666c07698
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 21 Jan 2021 16:09:32 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 16 Apr 2021 17:06:32 +02:00

sched: Use cpu_dying() to fix balance_push vs hotplug-rollback

Use the new cpu_dying() state to simplify and fix the balance_push()
vs CPU hotplug rollback state.

Specifically, we currently rely on notifiers sched_cpu_dying() /
sched_cpu_activate() to terminate balance_push; however, if the
cpu_down() fails when we're past sched_cpu_deactivate(), it should
terminate balance_push at that point and not wait until we hit
sched_cpu_activate().

Similarly, when cpu_up() fails and we're going back down, balance_push
should be active, where it currently is not.

So instead, make sure balance_push is enabled below SCHED_AP_ACTIVE
(when !cpu_active()), and gate its utility with cpu_dying().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/YHgAYef83VQhKdC2@hirez.programming.kicks-ass.net
---
 kernel/sched/core.c  | 26 +++++++++++++++-----------
 kernel/sched/sched.h |  1 -
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 95bd6ab..7d031da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1811,7 +1811,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 		return cpu_online(cpu);
 
 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_rq(cpu)->balance_push)
+	if (cpu_dying(cpu))
 		return false;
 
 	/* But are allowed during online. */
@@ -7638,6 +7638,9 @@ static DEFINE_PER_CPU(struct cpu_stop_work, push_work);
 
 /*
  * Ensure we only run per-cpu kthreads once the CPU goes !active.
+ *
+ * This is enabled below SCHED_AP_ACTIVE; when !cpu_active(), but only
+ * effective when the hotplug motion is down.
  */
 static void balance_push(struct rq *rq)
 {
@@ -7645,12 +7648,19 @@ static void balance_push(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 	SCHED_WARN_ON(rq->cpu != smp_processor_id());
+
 	/*
 	 * Ensure the thing is persistent until balance_push_set(.on = false);
 	 */
 	rq->balance_callback = &balance_push_callback;
 
 	/*
+	 * Only active while going offline.
+	 */
+	if (!cpu_dying(rq->cpu))
+		return;
+
+	/*
 	 * Both the cpu-hotplug and stop task are in this case and are
 	 * required to complete the hotplug process.
 	 *
@@ -7703,7 +7713,6 @@ static void balance_push_set(int cpu, bool on)
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
-	rq->balance_push = on;
 	if (on) {
 		WARN_ON_ONCE(rq->balance_callback);
 		rq->balance_callback = &balance_push_callback;
@@ -7828,8 +7837,8 @@ int sched_cpu_activate(unsigned int cpu)
 	struct rq_flags rf;
 
 	/*
-	 * Make sure that when the hotplug state machine does a roll-back
-	 * we clear balance_push. Ideally that would happen earlier...
+	 * Clear the balance_push callback and prepare to schedule
+	 * regular tasks.
 	 */
 	balance_push_set(cpu, false);
 
@@ -8014,12 +8023,6 @@ int sched_cpu_dying(unsigned int cpu)
 	}
 	rq_unlock_irqrestore(rq, &rf);
 
-	/*
-	 * Now that the CPU is offline, make sure we're welcome
-	 * to new tasks once we come back up.
-	 */
-	balance_push_set(cpu, false);
-
 	calc_load_migrate(rq);
 	update_max_interval();
 	hrtick_clear(rq);
@@ -8204,7 +8207,7 @@ void __init sched_init(void)
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
-		rq->balance_callback = NULL;
+		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
 		rq->push_cpu = 0;
@@ -8251,6 +8254,7 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SMP
 	idle_thread_set_boot_cpu();
+	balance_push_set(smp_processor_id(), false);
 #endif
 	init_sched_fair_class();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cbb0b01..7e7e936 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -983,7 +983,6 @@ struct rq {
 	unsigned long		cpu_capacity_orig;
 
 	struct callback_head	*balance_callback;
-	unsigned char		balance_push;
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;


* [tip: sched/core] cpumask: Make cpu_{online,possible,present,active}() inline
  2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
@ 2021-04-16 15:53   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-04-16 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b02a4fd8148f655095d9e3d6eddd8f0042bcc27c
Gitweb:        https://git.kernel.org/tip/b02a4fd8148f655095d9e3d6eddd8f0042bcc27c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Mon, 25 Jan 2021 16:46:49 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 16 Apr 2021 17:06:32 +02:00

cpumask: Make cpu_{online,possible,present,active}() inline

Prepare for addition of another mask. Primarily a code movement to
avoid having to create more #ifdef, but while there, convert
everything with an argument to an inline function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210310150109.045447765@infradead.org
---
 include/linux/cpumask.h | 97 +++++++++++++++++++++++++++-------------
 1 file changed, 66 insertions(+), 31 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 383684e..a584336 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -98,37 +98,6 @@ extern struct cpumask __cpu_active_mask;
 
 extern atomic_t __num_online_cpus;
 
-#if NR_CPUS > 1
-/**
- * num_online_cpus() - Read the number of online CPUs
- *
- * Despite the fact that __num_online_cpus is of type atomic_t, this
- * interface gives only a momentary snapshot and is not protected against
- * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
- * region.
- */
-static inline unsigned int num_online_cpus(void)
-{
-	return atomic_read(&__num_online_cpus);
-}
-#define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
-#define num_present_cpus()	cpumask_weight(cpu_present_mask)
-#define num_active_cpus()	cpumask_weight(cpu_active_mask)
-#define cpu_online(cpu)		cpumask_test_cpu((cpu), cpu_online_mask)
-#define cpu_possible(cpu)	cpumask_test_cpu((cpu), cpu_possible_mask)
-#define cpu_present(cpu)	cpumask_test_cpu((cpu), cpu_present_mask)
-#define cpu_active(cpu)		cpumask_test_cpu((cpu), cpu_active_mask)
-#else
-#define num_online_cpus()	1U
-#define num_possible_cpus()	1U
-#define num_present_cpus()	1U
-#define num_active_cpus()	1U
-#define cpu_online(cpu)		((cpu) == 0)
-#define cpu_possible(cpu)	((cpu) == 0)
-#define cpu_present(cpu)	((cpu) == 0)
-#define cpu_active(cpu)		((cpu) == 0)
-#endif
-
 extern cpumask_t cpus_booted_once_mask;
 
 static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
@@ -894,6 +863,72 @@ static inline const struct cpumask *get_cpu_mask(unsigned int cpu)
 	return to_cpumask(p);
 }
 
+#if NR_CPUS > 1
+/**
+ * num_online_cpus() - Read the number of online CPUs
+ *
+ * Despite the fact that __num_online_cpus is of type atomic_t, this
+ * interface gives only a momentary snapshot and is not protected against
+ * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
+ * region.
+ */
+static inline unsigned int num_online_cpus(void)
+{
+	return atomic_read(&__num_online_cpus);
+}
+#define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
+#define num_present_cpus()	cpumask_weight(cpu_present_mask)
+#define num_active_cpus()	cpumask_weight(cpu_active_mask)
+
+static inline bool cpu_online(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_online_mask);
+}
+
+static inline bool cpu_possible(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_possible_mask);
+}
+
+static inline bool cpu_present(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_present_mask);
+}
+
+static inline bool cpu_active(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_active_mask);
+}
+
+#else
+
+#define num_online_cpus()	1U
+#define num_possible_cpus()	1U
+#define num_present_cpus()	1U
+#define num_active_cpus()	1U
+
+static inline bool cpu_online(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_possible(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_present(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+static inline bool cpu_active(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
+#endif /* NR_CPUS > 1 */
+
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
 #if NR_CPUS <= BITS_PER_LONG


* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-15 14:32             ` Valentin Schneider
  2021-04-15 15:29               ` Peter Zijlstra
@ 2021-04-19 10:56               ` Vincent Donnefort
  2021-04-20  9:46                 ` Vincent Donnefort
  1 sibling, 1 reply; 29+ messages in thread
From: Vincent Donnefort @ 2021-04-19 10:56 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Thu, Apr 15, 2021 at 03:32:11PM +0100, Valentin Schneider wrote:
> On 15/04/21 10:59, Peter Zijlstra wrote:
> > Can't make sense of what I did.. I've removed that hunk. Patch now looks
> > like this.
> >
> 
> Small nit below, but regardless feel free to apply to the whole lot:
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> 
> @VincentD, ISTR you had tested the initial version of this with your fancy
> shmancy hotplug rollback stresser. Feel like doing this again?

I indeed wrote a test to verify all the rollback cases, up and down.

It seems I encounter an intermittent issue while running several iterations of
that test ... but I need more time to debug and figure out where it is blocking.

> 
> > So instead, make sure balance_push is enabled between
> > sched_cpu_deactivate() and sched_cpu_activate() (eg. when
> > !cpu_active()), and gate its utility with cpu_dying().
> 
> I'd word that "is enabled below sched_cpu_activate()", since
> sched_cpu_deactivate() is now out of the picture.
> 
> [...]
> > @@ -7639,6 +7639,9 @@ static DEFINE_PER_CPU(struct cpu_stop_wo
> >
> >  /*
> >   * Ensure we only run per-cpu kthreads once the CPU goes !active.
> > + *
> > + * This is active/set between sched_cpu_deactivate() / sched_cpu_activate().
> 
> Ditto
> 
> > + * But only effective when the hotplug motion is down.
> >   */
> >  static void balance_push(struct rq *rq)
> >  {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-19 10:56               ` Vincent Donnefort
@ 2021-04-20  9:46                 ` Vincent Donnefort
  2021-04-20 14:20                   ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Vincent Donnefort @ 2021-04-20  9:46 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Mon, Apr 19, 2021 at 11:56:30AM +0100, Vincent Donnefort wrote:
> On Thu, Apr 15, 2021 at 03:32:11PM +0100, Valentin Schneider wrote:
> > On 15/04/21 10:59, Peter Zijlstra wrote:
> > > Can't make sense of what I did.. I've removed that hunk. Patch now looks
> > > like this.
> > >
> > 
> > Small nit below, but regardless feel free to apply to the whole lot:
> > Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> > 
> > @VincentD, ISTR you had tested the initial version of this with your fancy
> > shmancy hotplug rollback stresser. Feel like doing this again?
> 
> I indeed wrote a test to verify all the rollback cases, up and down.
> 
> It seems I encounter an intermittent issue while running several iterations of
> that test ... but I need more time to debug and figure out where it is blocking.

Found the issue:

$ cat hotplug/states:
219: sched:active
220: online

CPU0: 

$ echo 219 > hotplug/fail
$ echo 0 > online

=> cpu_active = 1 cpu_dying = 1

which means that later on, for another CPU hotunplug, in
__balance_push_cpu_stop(), the fallback rq for a kthread can select that
CPU0, but __migrate_task() would fail and we end up in an infinite loop,
trying to migrate that task to CPU0.

The problem is that for a failure in sched:active, as "online" has no callback,
there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
not be reset.

Maybe cpuhp_reset_state() and cpuhp_set_state() would then be a better place to
switch the dying bit?
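
To make the failure mode concrete, below is a self-contained userspace model of
the flow described above (hypothetical code, not the kernel's; it only mirrors
the pre-fix placement of the dying-bit update inside the callback invocation):

#include <stdbool.h>
#include <stdio.h>

static bool cpu_dying;

/*
 * Model of the pre-fix cpuhp_invoke_callback(): the dying bit is only
 * touched here, i.e. only for states that actually have a callback.
 */
static int invoke_callback(bool bringup, bool has_callback, bool fail)
{
	if (!has_callback)
		return 0;		/* e.g. "online" (220): nothing to run */

	cpu_dying = !bringup;		/* pre-fix update site */

	return fail ? -1 : 0;
}

int main(void)
{
	/* echo 0 > online: teardown reaches sched:active (219) and fails. */
	if (invoke_callback(false, true, true)) {
		/*
		 * Rollback towards 220: "online" has no callback, so
		 * the dying bit is never cleared on the way back up.
		 */
		invoke_callback(true, false, false);
	}

	/* Prints cpu_dying=1: the broken cpu_active=1/cpu_dying=1 combo. */
	printf("cpu_active=1 cpu_dying=%d\n", cpu_dying);
	return 0;
}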

> 
> > 
> > > So instead, make sure balance_push is enabled between
> > > sched_cpu_deactivate() and sched_cpu_activate() (eg. when
> > > !cpu_active()), and gate its utility with cpu_dying().
> > 
> > I'd word that "is enabled below sched_cpu_activate()", since
> > sched_cpu_deactivate() is now out of the picture.
> > 
> > [...]
> > > @@ -7639,6 +7639,9 @@ static DEFINE_PER_CPU(struct cpu_stop_wo
> > >
> > >  /*
> > >   * Ensure we only run per-cpu kthreads once the CPU goes !active.
> > > + *
> > > + * This is active/set between sched_cpu_deactivate() / sched_cpu_activate().
> > 
> > Ditto
> > 
> > > + * But only effective when the hotplug motion is down.
> > >   */
> > >  static void balance_push(struct rq *rq)
> > >  {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20  9:46                 ` Vincent Donnefort
@ 2021-04-20 14:20                   ` Peter Zijlstra
  2021-04-20 14:39                     ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-20 14:20 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Valentin Schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:

> Found the issue:
> 
> $ cat hotplug/states:
> 219: sched:active
> 220: online
> 
> CPU0: 
> 
> $ echo 219 > hotplug/fail
> $ echo 0 > online
> 
> => cpu_active = 1 cpu_dying = 1
> 
> which means that later on, for another CPU hotunplug, in
> __balance_push_cpu_stop(), the fallback rq for a kthread can select that
> CPU0, but __migrate_task() would fail and we end up in an infinite loop,
> trying to migrate that task to CPU0.
> 
> The problem is that for a failure in sched:active, as "online" has no callback,
> there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
> not be reset.

Urgh! Good find.

> Maybe cpuhp_reset_state() and cpuhp_set_state() would then be a better place to
> switch the dying bit?

Yes, except now cpuhp_invoke_ap_callback() makes my head hurt; that one runs
the callbacks out of order. I _think_ we can ignore it, but ....

Something like the below, let me see if I can reproduce and test.

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 838dcf238f92..05272bae953d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -160,9 +160,6 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
-
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
 		return -EAGAIN;
@@ -467,13 +464,16 @@ static inline enum cpuhp_state
 cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 {
 	enum cpuhp_state prev_state = st->state;
+	bool bringup = st->state < target;
 
 	st->rollback = false;
 	st->last = NULL;
 
 	st->target = target;
 	st->single = false;
-	st->bringup = st->state < target;
+	st->bringup = bringup;
+	if (cpu_dying(cpu) != !bringup)
+		set_cpu_dying(cpu, !bringup);
 
 	return prev_state;
 }
@@ -481,6 +481,8 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 static inline void
 cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 {
+	bool bringup = !st->bringup;
+
 	st->target = prev_state;
 
 	/*
@@ -503,7 +505,9 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 			st->state++;
 	}
 
-	st->bringup = !st->bringup;
+	st->bringup = bringup;
+	if (cpu_dying(cpu) != !bringup)
+		set_cpu_dying(cpu, !bringup);
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */

^ permalink raw reply related	[flat|nested] 29+ messages in thread
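
For reference, the reason the hunk above cannot build as posted (as the
follow-up below notes): cpuhp_set_state() and cpuhp_reset_state() only receive
st, so the bare cpu in the new calls is not in scope. Side by side, from the
two versions of the diff in this thread:

	/* As posted above -- 'cpu' is undeclared in these functions: */
	if (cpu_dying(cpu) != !bringup)
		set_cpu_dying(cpu, !bringup);

	/* As reposted below -- the CPU number is stashed in the per-CPU
	 * state once, in cpuhp_create(), and used via st->cpu: */
	if (cpu_dying(st->cpu) != !bringup)
		set_cpu_dying(st->cpu, !bringup);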

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20 14:20                   ` Peter Zijlstra
@ 2021-04-20 14:39                     ` Peter Zijlstra
  2021-04-20 14:58                       ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-20 14:39 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Valentin Schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Tue, Apr 20, 2021 at 04:20:56PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:
> 
> > Found the issue:
> > 
> > $ cat hotplug/states:
> > 219: sched:active
> > 220: online
> > 
> > CPU0: 
> > 
> > $ echo 219 > hotplug/fail
> > $ echo 0 > online
> > 
> > => cpu_active = 1 cpu_dying = 1
> > 
> > which means that later on, for another CPU hotunplug, in
> > __balance_push_cpu_stop(), the fallback rq for a kthread can select that
> > CPU0, but __migrate_task() would fail and we end up in an infinite loop,
> > trying to migrate that task to CPU0.
> > 
> > The problem is that for a failure in sched:active, as "online" has no callback,
> > there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
> > not be reset.
> 
> Urgh! Good find.
> 
> > Maybe cpuhp_reset_state() and cpuhp_set_state() would then be a better place to
> > switch the dying bit?
> 
> Yes, except now cpuhp_invoke_ap_callback() makes my head hurt; that one runs
> the callbacks out of order. I _think_ we can ignore it, but ....
> 
> Something like the below, let me see if I can reproduce and test.

I seem to have triggered the BUG() in select_fallback_rq() with your recipe.
Have cpu0 fail on sched:active, then offline all other CPUs.

Now lemme add that patch.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20 14:39                     ` Peter Zijlstra
@ 2021-04-20 14:58                       ` Peter Zijlstra
  2021-04-20 16:53                         ` Vincent Donnefort
                                           ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-20 14:58 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Valentin Schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Tue, Apr 20, 2021 at 04:39:04PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 20, 2021 at 04:20:56PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:
> > 
> > > Found the issue:
> > > 
> > > $ cat hotplug/states:
> > > 219: sched:active
> > > 220: online
> > > 
> > > CPU0: 
> > > 
> > > $ echo 219 > hotplug/fail
> > > $ echo 0 > online
> > > 
> > > => cpu_active = 1 cpu_dying = 1
> > > 
> > > which means that later on, for another CPU hotunplug, in
> > > __balance_push_cpu_stop(), the fallback rq for a kthread can select that
> > > CPU0, but __migrate_task() would fail and we end up in an infinite loop,
> > > trying to migrate that task to CPU0.
> > > 
> > > The problem is that for a failure in sched:active, as "online" has no callback,
> > > there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
> > > not be reset.
> > 
> > Urgh! Good find.

> I seem to have triggered the BUG() in select_fallback_rq() with your recipe.
> Have cpu0 fail on sched:active, then offline all other CPUs.
> 
> Now lemme add that patch.

That patch (which obviously didn't actually build as posted) seems to fix it.

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 838dcf238f92..e538518556f4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -63,6 +63,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	int			cpu;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -160,9 +161,6 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
-
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
 		return -EAGAIN;
@@ -467,13 +465,16 @@ static inline enum cpuhp_state
 cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 {
 	enum cpuhp_state prev_state = st->state;
+	bool bringup = st->state < target;
 
 	st->rollback = false;
 	st->last = NULL;
 
 	st->target = target;
 	st->single = false;
-	st->bringup = st->state < target;
+	st->bringup = bringup;
+	if (cpu_dying(st->cpu) != !bringup)
+		set_cpu_dying(st->cpu, !bringup);
 
 	return prev_state;
 }
@@ -481,6 +482,8 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 static inline void
 cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 {
+	bool bringup = !st->bringup;
+
 	st->target = prev_state;
 
 	/*
@@ -503,7 +506,9 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 			st->state++;
 	}
 
-	st->bringup = !st->bringup;
+	st->bringup = bringup;
+	if (cpu_dying(st->cpu) != !bringup)
+		set_cpu_dying(st->cpu, !bringup);
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */
@@ -693,6 +698,7 @@ static void cpuhp_create(unsigned int cpu)
 
 	init_completion(&st->done_up);
 	init_completion(&st->done_down);
+	st->cpu = cpu;
 }
 
 static int cpuhp_should_run(unsigned int cpu)

^ permalink raw reply related	[flat|nested] 29+ messages in thread
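
Net effect of this version: cpu_dying() is updated in exactly the two places
the hotplug machinery changes direction, so it always mirrors !st->bringup. A
one-line self-check that could hypothetically be dropped in after either call
site (not part of the patch):

	/* Invariant after cpuhp_set_state()/cpuhp_reset_state(): */
	WARN_ON_ONCE(cpu_dying(st->cpu) != !st->bringup);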

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20 14:58                       ` Peter Zijlstra
@ 2021-04-20 16:53                         ` Vincent Donnefort
  2021-04-20 18:07                           ` Peter Zijlstra
  2021-04-21  9:32                         ` Valentin Schneider
  2021-04-22  7:36                         ` [tip: sched/core] cpumask/hotplug: Fix cpu_dying() state tracking tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 29+ messages in thread
From: Vincent Donnefort @ 2021-04-20 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Tue, Apr 20, 2021 at 04:58:00PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 20, 2021 at 04:39:04PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 20, 2021 at 04:20:56PM +0200, Peter Zijlstra wrote:
> > > On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:
> > > 
> > > > Found the issue:
> > > > 
> > > > $ cat hotplug/states:
> > > > 219: sched:active
> > > > 220: online
> > > > 
> > > > CPU0: 
> > > > 
> > > > $ echo 219 > hotplug/fail
> > > > $ echo 0 > online
> > > > 
> > > > => cpu_active = 1 cpu_dying = 1
> > > > 
> > > > which means that later on, for another CPU hotunplug, in
> > > > __balance_push_cpu_stop(), the fallback rq for a kthread can select that
> > > > CPU0, but __migrate_task() would fail and we end up in an infinite loop,
> > > > trying to migrate that task to CPU0.
> > > > 
> > > > The problem is that for a failure in sched:active, as "online" has no callback,
> > > > there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
> > > > not be reset.
> > > 
> > > Urgh! Good find.
> 
> > I seem to have triggered the BUG() in select_fallback_rq() with your recipe.
> > Have cpu0 fail on sched:active, then offline all other CPUs.
> > 
> > Now lemme add that patch.
> 
> That patch (which obviously didn't actually build as posted) seems to fix it.
> 
> ---
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 838dcf238f92..e538518556f4 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -63,6 +63,7 @@ struct cpuhp_cpu_state {
>  	bool			rollback;
>  	bool			single;
>  	bool			bringup;
> +	int			cpu;
>  	struct hlist_node	*node;
>  	struct hlist_node	*last;
>  	enum cpuhp_state	cb_state;
> @@ -160,9 +161,6 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
>  	int (*cb)(unsigned int cpu);
>  	int ret, cnt;
>  
> -	if (cpu_dying(cpu) != !bringup)
> -		set_cpu_dying(cpu, !bringup);
> -
>  	if (st->fail == state) {
>  		st->fail = CPUHP_INVALID;
>  		return -EAGAIN;
> @@ -467,13 +465,16 @@ static inline enum cpuhp_state
>  cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
>  {
>  	enum cpuhp_state prev_state = st->state;
> +	bool bringup = st->state < target;
>  
>  	st->rollback = false;
>  	st->last = NULL;
>  
>  	st->target = target;
>  	st->single = false;
> -	st->bringup = st->state < target;
> +	st->bringup = bringup;
> +	if (cpu_dying(st->cpu) != !bringup)
> +		set_cpu_dying(st->cpu, !bringup);
>  
>  	return prev_state;
>  }
> @@ -481,6 +482,8 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
>  static inline void
>  cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
>  {
> +	bool bringup = !st->bringup;
> +
>  	st->target = prev_state;
>  
>  	/*
> @@ -503,7 +506,9 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
>  			st->state++;
>  	}
>  
> -	st->bringup = !st->bringup;
> +	st->bringup = bringup;
> +	if (cpu_dying(st->cpu) != !bringup)
> +		set_cpu_dying(st->cpu, !bringup);
>  }
>  
>  /* Regular hotplug invocation of the AP hotplug thread */
> @@ -693,6 +698,7 @@ static void cpuhp_create(unsigned int cpu)
>  
>  	init_completion(&st->done_up);
>  	init_completion(&st->done_down);
> +	st->cpu = cpu;
>  }
>  
>  static int cpuhp_should_run(unsigned int cpu)

All good with that snippet on my end.

I wonder if balance_push() shouldn't use the cpu_of() accessor
instead of rq->cpu.

Otherwise,

+ Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20 16:53                         ` Vincent Donnefort
@ 2021-04-20 18:07                           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-04-20 18:07 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Valentin Schneider, tglx, mingo, bigeasy, swood, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, qais.yousef, linux-kernel

On Tue, Apr 20, 2021 at 05:53:40PM +0100, Vincent Donnefort wrote:
> All good with that snippet on my end.
> 
> I wonder if balance_push() shouldn't use the cpu_of() accessor
> instead of rq->cpu.

That might be a personal quirk of mine, but for code that is under
CONFIG_SMP (as all balancing code must be) I tend to prefer the more
explicit rq->cpu usage. cpu_of() obviously also works.
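
For context, cpu_of() in kernel/sched/sched.h is (modulo details) just:

static inline int cpu_of(struct rq *rq)
{
#ifdef CONFIG_SMP
	return rq->cpu;
#else
	return 0;
#endif
}

so under CONFIG_SMP the two spellings compile to the same thing.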

> Otherwise,
> 
> + Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>

Thanks! Now I get to write a Changelog :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
  2021-04-20 14:58                       ` Peter Zijlstra
  2021-04-20 16:53                         ` Vincent Donnefort
@ 2021-04-21  9:32                         ` Valentin Schneider
  2021-04-22  7:36                         ` [tip: sched/core] cpumask/hotplug: Fix cpu_dying() state tracking tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 29+ messages in thread
From: Valentin Schneider @ 2021-04-21  9:32 UTC (permalink / raw)
  To: Peter Zijlstra, Vincent Donnefort
  Cc: tglx, mingo, bigeasy, swood, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	qais.yousef, linux-kernel

On 20/04/21 16:58, Peter Zijlstra wrote:
> On Tue, Apr 20, 2021 at 04:39:04PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 20, 2021 at 04:20:56PM +0200, Peter Zijlstra wrote:
>> > On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:
>> >
>> > > Found the issue:
>> > >
>> > > $ cat hotplug/states:
>> > > 219: sched:active
>> > > 220: online
>> > >
>> > > CPU0:
>> > >
>> > > $ echo 219 > hotplug/fail
>> > > $ echo 0 > online
>> > >
>> > > => cpu_active = 1 cpu_dying = 1
>> > >
>> > > which means that later on, for another CPU hotunplug, in
>> > > __balance_push_cpu_stop(), the fallback rq for a kthread can select that
>> > > CPU0, but __migrate_task() would fail and we end up in an infinite loop,
>> > > trying to migrate that task to CPU0.
>> > >
>> > > The problem is that for a failure in sched:active, as "online" has no callback,
>> > > there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
>> > > not be reset.
>> >
>> > Urgh! Good find.
>
>> I seem to have triggered the BUG() in select_fallback_rq() with your recipe.
>> Have cpu0 fail on sched:active, then offline all other CPUs.
>>
>> Now lemme add that patch.
>
> That patch (which obviously didn't actually build as posted) seems to fix it.
>

Moving the cpu_dying_mask update from cpuhp_invoke_callback() to
cpuhp_{set, reset}_state() means we lose an update in cpuhp_issue_call(),
but AFAICT that wasn't required (this doesn't actually change a CPU's
hotplug state; rather, it executes some newly installed/removed callbacks
whose state maps below the CPU's current hp state).

Actually, staring at it some more, it might have caused bugs: if a
cpuhp_setup_state() fails, we can end up in cpuhp_rollback_install(), which
will end up calling cpuhp_invoke_callback(bringup=false) and messing with the
dying mask.

FWIW:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip: sched/core] cpumask/hotplug: Fix cpu_dying() state tracking
  2021-04-20 14:58                       ` Peter Zijlstra
  2021-04-20 16:53                         ` Vincent Donnefort
  2021-04-21  9:32                         ` Valentin Schneider
@ 2021-04-22  7:36                         ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 29+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-04-22  7:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Donnefort, Peter Zijlstra (Intel),
	Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2ea46c6fc9452ac100ad907b051d797225847e33
Gitweb:        https://git.kernel.org/tip/2ea46c6fc9452ac100ad907b051d797225847e33
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 20 Apr 2021 20:04:19 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 21 Apr 2021 13:55:43 +02:00

cpumask/hotplug: Fix cpu_dying() state tracking

Vincent reported that for states with a NULL startup/teardown function
we do not call cpuhp_invoke_callback() (because there is none) and as
such we'll not update the cpu_dying() state.

The stale cpu_dying() can eventually lead to triggering BUG().

Rectify this by updating cpu_dying() in the exact same places the
hotplug machinery tracks its directional state, namely
cpuhp_set_state() and cpuhp_reset_state().

Reported-by: Vincent Donnefort <vincent.donnefort@arm.com>
Suggested-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
---
 kernel/cpu.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 838dcf2..e538518 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -63,6 +63,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	int			cpu;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -160,9 +161,6 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
-
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
 		return -EAGAIN;
@@ -467,13 +465,16 @@ static inline enum cpuhp_state
 cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 {
 	enum cpuhp_state prev_state = st->state;
+	bool bringup = st->state < target;
 
 	st->rollback = false;
 	st->last = NULL;
 
 	st->target = target;
 	st->single = false;
-	st->bringup = st->state < target;
+	st->bringup = bringup;
+	if (cpu_dying(st->cpu) != !bringup)
+		set_cpu_dying(st->cpu, !bringup);
 
 	return prev_state;
 }
@@ -481,6 +482,8 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 static inline void
 cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 {
+	bool bringup = !st->bringup;
+
 	st->target = prev_state;
 
 	/*
@@ -503,7 +506,9 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 			st->state++;
 	}
 
-	st->bringup = !st->bringup;
+	st->bringup = bringup;
+	if (cpu_dying(st->cpu) != !bringup)
+		set_cpu_dying(st->cpu, !bringup);
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */
@@ -693,6 +698,7 @@ static void cpuhp_create(unsigned int cpu)
 
 	init_completion(&st->done_up);
 	init_completion(&st->done_down);
+	st->cpu = cpu;
 }
 
 static int cpuhp_should_run(unsigned int cpu)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-04-22  7:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10 14:52 [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole Peter Zijlstra
2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
2021-04-16 15:53   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
2021-03-21 19:30   ` Qais Yousef
2021-03-22 15:07     ` Steven Rostedt
2021-04-12 10:55     ` Peter Zijlstra
2021-04-12 11:16       ` Qais Yousef
2021-04-16 15:53   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-03-10 14:53 ` [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback Peter Zijlstra
2021-03-11 15:13   ` Valentin Schneider
2021-03-11 16:42     ` Peter Zijlstra
2021-04-12 12:03     ` Peter Zijlstra
2021-04-12 17:22       ` Valentin Schneider
2021-04-13  6:51         ` Peter Zijlstra
2021-04-15  8:59           ` Peter Zijlstra
2021-04-15 14:32             ` Valentin Schneider
2021-04-15 15:29               ` Peter Zijlstra
2021-04-15 15:34                 ` Valentin Schneider
2021-04-19 10:56               ` Vincent Donnefort
2021-04-20  9:46                 ` Vincent Donnefort
2021-04-20 14:20                   ` Peter Zijlstra
2021-04-20 14:39                     ` Peter Zijlstra
2021-04-20 14:58                       ` Peter Zijlstra
2021-04-20 16:53                         ` Vincent Donnefort
2021-04-20 18:07                           ` Peter Zijlstra
2021-04-21  9:32                         ` Valentin Schneider
2021-04-22  7:36                         ` [tip: sched/core] cpumask/hotplug: Fix cpu_dying() state tracking tip-bot2 for Peter Zijlstra
2021-04-16 15:53             ` [tip: sched/core] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback tip-bot2 for Peter Zijlstra
