All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes
@ 2021-01-11 17:10 vincent.donnefort
  2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
  To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

This patch-set intends mainly to fix HP rollback, which is currently broken,
due to an inconsistent "state" usage and an issue with CPUHP_AP_ONLINE_IDLE.

It also improves the "fail" interface, which can now be reset and will reject
CPUHP_BRINGUP_CPU, a step that can't always fail.

Vincent Donnefort (4):
  cpu/hotplug: Allowing to reset fail injection
  cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
  cpu/hotplug: Add cpuhp_invoke_callback_range()
  cpu/hotplug: Fix CPU down rollback

 kernel/cpu.c | 173 +++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 110 insertions(+), 63 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection
  2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
  2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
  To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

Currently, the only way of resetting this file is to actually try to run
a hotplug, hotunplug or both. This is quite annoying for testing and, as
the default value for this file is -1, it seems quite natural to let a
user write it.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
 kernel/cpu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1b6302e..9121edf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2207,6 +2207,11 @@ static ssize_t write_cpuhp_fail(struct device *dev,
 	if (ret)
 		return ret;
 
+	if (fail == CPUHP_INVALID) {
+		st->fail = fail;
+		return count;
+	}
+
 	if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE)
 		return -EINVAL;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
  2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
  2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
  2021-01-20 12:58   ` Peter Zijlstra
  2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
  2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
  3 siblings, 1 reply; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
  To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
of the atomic can. Hence, rollback is not possible after a hotunplug
CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
(finish_cpu()) cannot fail anyway.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
 kernel/cpu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9121edf..bcd7b2a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
 		return -EINVAL;
 
 	/*
-	 * Cannot fail STARTING/DYING callbacks.
+	 * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
+	 * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
+	 * can't fail during hotunplug, as it would mean we have no way of
+	 * rolling back the atomic states that have been previously teared
+	 * down.
 	 */
-	if (cpuhp_is_atomic_state(fail))
+	if (cpuhp_is_atomic_state(fail) ||
+	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
 		return -EINVAL;
 
 	/*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
  2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
  2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
  2021-01-20 13:11   ` Peter Zijlstra
                     ` (2 more replies)
  2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
  3 siblings, 3 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
  To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

Factorizing and unifying cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used. The
first one, decrements before the callback is called:

 cpuhp_thread_fun()
     state = st->state;
     st->state--;
     cpuhp_invoke_callback(state);

The second one, after:

 take_down_cpu()|cpuhp_down_callbacks()
     cpuhp_invoke_callback(st->state);
     st->state--;

This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent, between steps run in the cpuhp thread and
the others.

Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
 kernel/cpu.c | 150 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 90 insertions(+), 60 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index bcd7b2a..aebec3a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -135,6 +135,11 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
 	return cpuhp_hp_states + state;
 }
 
+static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step)
+{
+	return bringup ? !step->startup.single : !step->teardown.single;
+}
+
 /**
  * cpuhp_invoke_callback _ Invoke the callbacks for a given state
  * @cpu:	The cpu for which the callback should be invoked
@@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 
 	if (st->fail == state) {
 		st->fail = CPUHP_INVALID;
-
-		if (!(bringup ? step->startup.single : step->teardown.single))
-			return 0;
-
 		return -EAGAIN;
 	}
 
+	if (cpuhp_step_empty(bringup, step)) {
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
 	if (!step->multi_instance) {
 		WARN_ON_ONCE(lastp && *lastp);
 		cb = bringup ? step->startup.single : step->teardown.single;
-		if (!cb)
-			return 0;
+
 		trace_cpuhp_enter(cpu, st->target, state, cb);
 		ret = cb(cpu);
 		trace_cpuhp_exit(cpu, st->state, state, ret);
 		return ret;
 	}
 	cbm = bringup ? step->startup.multi : step->teardown.multi;
-	if (!cbm)
-		return 0;
 
 	/* Single invocation for instance add/remove */
 	if (node) {
@@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 static inline void
 cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 {
+	st->target = prev_state;
+
+	if (st->rollback)
+		return;
+
 	st->rollback = true;
 
 	/*
@@ -488,7 +496,6 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 			st->state++;
 	}
 
-	st->target = prev_state;
 	st->bringup = !st->bringup;
 }
 
@@ -591,10 +598,53 @@ static int finish_cpu(unsigned int cpu)
  * Hotplug state machine related functions
  */
 
-static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st)
+/*
+ * Get the next state to run. Empty ones will be skipped. Returns true if a
+ * state must be run.
+ *
+ * st->state will be modified ahead of time, to match state_to_run, as if it
+ * has already ran.
+ */
+static bool cpuhp_next_state(bool bringup,
+			     enum cpuhp_state *state_to_run,
+			     struct cpuhp_cpu_state *st,
+			     enum cpuhp_state target)
 {
-	for (st->state--; st->state > st->target; st->state--)
-		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+	do {
+		if (bringup) {
+			if (st->state >= target)
+				return false;
+
+			*state_to_run = ++st->state;
+		} else {
+			if (st->state <= target)
+				return false;
+
+			*state_to_run = st->state--;
+		}
+
+		if (!cpuhp_step_empty(bringup, cpuhp_get_step(*state_to_run)))
+			break;
+	} while (true);
+
+	return true;
+}
+
+static int cpuhp_invoke_callback_range(bool bringup,
+				       unsigned int cpu,
+				       struct cpuhp_cpu_state *st,
+				       enum cpuhp_state target)
+{
+	enum cpuhp_state state;
+	int err = 0;
+
+	while (cpuhp_next_state(bringup, &state, st, target)) {
+		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
+		if (err)
+			break;
+	}
+
+	return err;
 }
 
 static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
@@ -617,16 +667,12 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 	enum cpuhp_state prev_state = st->state;
 	int ret = 0;
 
-	while (st->state < target) {
-		st->state++;
-		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
-		if (ret) {
-			if (can_rollback_cpu(st)) {
-				st->target = prev_state;
-				undo_cpu_up(cpu, st);
-			}
-			break;
-		}
+	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
+	if (ret) {
+		cpuhp_reset_state(st, prev_state);
+		if (can_rollback_cpu(st))
+			WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
+							    prev_state));
 	}
 	return ret;
 }
@@ -690,17 +736,9 @@ static void cpuhp_thread_fun(unsigned int cpu)
 		state = st->cb_state;
 		st->should_run = false;
 	} else {
-		if (bringup) {
-			st->state++;
-			state = st->state;
-			st->should_run = (st->state < st->target);
-			WARN_ON_ONCE(st->state > st->target);
-		} else {
-			state = st->state;
-			st->state--;
-			st->should_run = (st->state > st->target);
-			WARN_ON_ONCE(st->state < st->target);
-		}
+		st->should_run = cpuhp_next_state(bringup, &state, st, st->target);
+		if (!st->should_run)
+			goto end;
 	}
 
 	WARN_ON_ONCE(!cpuhp_is_ap_state(state));
@@ -728,6 +766,7 @@ static void cpuhp_thread_fun(unsigned int cpu)
 		st->should_run = false;
 	}
 
+end:
 	cpuhp_lock_release(bringup);
 	lockdep_release_cpus_lock();
 
@@ -881,19 +920,18 @@ static int take_cpu_down(void *_param)
 		return err;
 
 	/*
-	 * We get here while we are in CPUHP_TEARDOWN_CPU state and we must not
-	 * do this step again.
+	 * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going
+	 * down, that the current state is CPUHP_TEARDOWN_CPU - 1.
 	 */
-	WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
-	st->state--;
+	WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
+
 	/* Invoke the former CPU_DYING callbacks */
-	for (; st->state > target; st->state--) {
-		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
+	if (ret)
 		/*
 		 * DYING must not fail!
 		 */
 		WARN_ON_ONCE(ret);
-	}
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -975,27 +1013,22 @@ void cpuhp_report_idle_dead(void)
 				 cpuhp_complete_idle_dead, st, 0);
 }
 
-static void undo_cpu_down(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
-	for (st->state++; st->state < st->target; st->state++)
-		cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
-}
-
 static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 				enum cpuhp_state target)
 {
 	enum cpuhp_state prev_state = st->state;
 	int ret = 0;
 
-	for (; st->state > target; st->state--) {
-		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
-		if (ret) {
-			st->target = prev_state;
-			if (st->state < prev_state)
-				undo_cpu_down(cpu, st);
-			break;
-		}
+	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
+	if (ret) {
+
+		cpuhp_reset_state(st, prev_state);
+
+		if (st->state < prev_state)
+			WARN_ON(cpuhp_invoke_callback_range(true, cpu, st,
+							    prev_state));
 	}
+
 	return ret;
 }
 
@@ -1164,14 +1197,12 @@ void notify_cpu_starting(unsigned int cpu)
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
-	while (st->state < target) {
-		st->state++;
-		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
+	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
+	if (ret)
 		/*
 		 * STARTING must not fail!
 		 */
 		WARN_ON_ONCE(ret);
-	}
 }
 
 /*
@@ -1777,8 +1808,7 @@ static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup,
 	 * If there's nothing to do, we done.
 	 * Relies on the union for multi_instance.
 	 */
-	if ((bringup && !sp->startup.single) ||
-	    (!bringup && !sp->teardown.single))
+	if (cpuhp_step_empty(bringup, sp))
 		return 0;
 	/*
 	 * The non AP bound callbacks can fail on bringup. On teardown
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
  2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
                   ` (2 preceding siblings ...)
  2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
  2021-01-21 14:57   ` Peter Zijlstra
  3 siblings, 1 reply; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
  To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
the job. The steps left are as followed:

   +--------------------+
   | CPUHP_TEARDOWN_CPU |  -> If fails state is CPUHP_TEARDOWN_CPU
   +--------------------+
   |   ATOMIC STATES    |  -> Cannot Fail
   +--------------------+
   |  CPUHP_BRINGUP_CPU |  -> Cannot fail
   +--------------------+
   |        ...         |
   |        ...         |  -> Can fail and rollback
   |   CPUHP_OFFLINE    |
   +--------------------+

There are, in case of failure two possibilities:

  1. Failure in CPUHP_TEARDOWN_CPU, then st->state will still be
     CPUHP_TEARDOWN_CPU

  2. Failure between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.

For 2. The rollback will always run in the CPUHP_BRINGUP_CPU bringup
callback (bringup_cpu()) which kicks the AP back on. The latter will then
end-up setting st->state to CPUHP_AP_ONLINE_IDLE.

Checking for CPUHP_AP_ONLINE_IDLE allows then to rollback in all cases.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
 kernel/cpu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aebec3a..8b3f84e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1078,7 +1078,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	 * to do the further cleanups.
 	 */
 	ret = cpuhp_down_callbacks(cpu, st, target);
-	if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+	if (ret && (st->state == CPUHP_AP_ONLINE_IDLE ||
+		    st->state == CPUHP_TEARDOWN_CPU) &&
+	    st->state < prev_state) {
+		/*
+		 * After an error the state can be either:
+		 *  - CPUHP_TEARDOWN_CPU if this step failed.
+		 *  - CPUHP_AP_ONLINE_IDLE if any other failed.
+		 */
 		cpuhp_reset_state(st, prev_state);
 		__cpuhp_kick_ap(st);
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
  2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
@ 2021-01-20 12:58   ` Peter Zijlstra
  2021-01-20 15:17     ` Vincent Donnefort
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 12:58 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> of the atomic can. Hence, rollback is not possible after a hotunplug
> CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> (finish_cpu()) cannot fail anyway.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> ---
>  kernel/cpu.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 9121edf..bcd7b2a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
>  		return -EINVAL;
>  
>  	/*
> -	 * Cannot fail STARTING/DYING callbacks.
> +	 * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> +	 * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> +	 * can't fail during hotunplug, as it would mean we have no way of
> +	 * rolling back the atomic states that have been previously teared
> +	 * down.
>  	 */
> -	if (cpuhp_is_atomic_state(fail))
> +	if (cpuhp_is_atomic_state(fail) ||
> +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
>  		return -EINVAL;

Should we instead disallow failing any state that has .cant_stop ?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
@ 2021-01-20 13:11   ` Peter Zijlstra
  2021-01-20 17:54     ` Peter Zijlstra
  2021-01-20 17:45   ` Peter Zijlstra
  2021-01-20 17:49   ` Peter Zijlstra
  2 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 13:11 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
>  
>  	if (st->fail == state) {
>  		st->fail = CPUHP_INVALID;
> -
> -		if (!(bringup ? step->startup.single : step->teardown.single))
> -			return 0;
> -
>  		return -EAGAIN;
>  	}
>  
> +	if (cpuhp_step_empty(bringup, step)) {
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}

This changes the behaviour of fail.. might be best to refactor without
changing behaviour.

Lemme continue reading.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
  2021-01-20 12:58   ` Peter Zijlstra
@ 2021-01-20 15:17     ` Vincent Donnefort
  2021-01-20 17:30       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-20 15:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider

On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> > 
> > The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> > triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> > of the atomic can. Hence, rollback is not possible after a hotunplug
> > CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> > it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> > (finish_cpu()) cannot fail anyway.
> > 
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > ---
> >  kernel/cpu.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 9121edf..bcd7b2a 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
> >  		return -EINVAL;
> >  
> >  	/*
> > -	 * Cannot fail STARTING/DYING callbacks.
> > +	 * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> > +	 * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> > +	 * can't fail during hotunplug, as it would mean we have no way of
> > +	 * rolling back the atomic states that have been previously teared
> > +	 * down.
> >  	 */
> > -	if (cpuhp_is_atomic_state(fail))
> > +	if (cpuhp_is_atomic_state(fail) ||
> > +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> >  		return -EINVAL;
> 
> Should we instead disallow failing any state that has .cant_stop ?

We would reduce the scope of what can be tested: bringup_cpu() and
takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
allowed to fail.

Checking for cant_stop, made me also see that write_cpuhp_target() is probably
missing a check for cpuhp_is_atomic_state(). For the same reason as this patch,
when doing cpu_down(), we can't stop in one of these states.

-- 
Vincent

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
  2021-01-20 15:17     ` Vincent Donnefort
@ 2021-01-20 17:30       ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:30 UTC (permalink / raw)
  To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Wed, Jan 20, 2021 at 03:17:24PM +0000, Vincent Donnefort wrote:
> On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:

> > > +	if (cpuhp_is_atomic_state(fail) ||
> > > +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> > >  		return -EINVAL;
> > 
> > Should we instead disallow failing any state that has .cant_stop ?
> 
> We would reduce the scope of what can be tested: bringup_cpu() and
> takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
> allowed to fail.

Fair enough. I suppose we can add an additional cant_fail field, but I'm
not sure that's worth the effort over this.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
  2021-01-20 13:11   ` Peter Zijlstra
@ 2021-01-20 17:45   ` Peter Zijlstra
  2021-01-20 17:53     ` Peter Zijlstra
  2021-01-20 17:49   ` Peter Zijlstra
  2 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:45 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
>  static inline void
>  cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
>  {
> +	st->target = prev_state;
> +
> +	if (st->rollback)
> +		return;

I'm thinking that if we call rollback while already rollback we're hosed
something fierce, no?

That like going up, failing, going back down again, also failing, giving
up in a fiery death.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
  2021-01-20 13:11   ` Peter Zijlstra
  2021-01-20 17:45   ` Peter Zijlstra
@ 2021-01-20 17:49   ` Peter Zijlstra
  2 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:49 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> +		if (can_rollback_cpu(st))
> +			WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
> +							    prev_state));

> +	if (ret)
>  		/*
>  		 * DYING must not fail!
>  		 */
>  		WARN_ON_ONCE(ret);
> -	}

> +	if (ret) {
> +
> +		cpuhp_reset_state(st, prev_state);
> +
> +		if (st->state < prev_state)
> +			WARN_ON(cpuhp_invoke_callback_range(true, cpu, st,
> +							    prev_state));
>  	}

> +	if (ret)
>  		/*
>  		 * STARTING must not fail!
>  		 */
>  		WARN_ON_ONCE(ret);
> -	}

Stuff like that is CodingStyle fail.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-20 17:45   ` Peter Zijlstra
@ 2021-01-20 17:53     ` Peter Zijlstra
  2021-01-21 10:57       ` Vincent Donnefort
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:53 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> >  static inline void
> >  cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> >  {
> > +	st->target = prev_state;
> > +
> > +	if (st->rollback)
> > +		return;
> 
> I'm thinking that if we call rollback while already rollback we're hosed
> something fierce, no?
> 
> That like going up, failing, going back down again, also failing, giving
> up in a fiery death.

Ooh, is this a hack for _cpu_down():

	ret = cpuhp_down_callbacks(cpu, st, target);
	if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
		cpuhp_reset_state(st, prev_state);
		__cpuhp_kick_ap(st);
	}

Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-20 13:11   ` Peter Zijlstra
@ 2021-01-20 17:54     ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:54 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Wed, Jan 20, 2021 at 02:11:14PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
> >  
> >  	if (st->fail == state) {
> >  		st->fail = CPUHP_INVALID;
> > -
> > -		if (!(bringup ? step->startup.single : step->teardown.single))
> > -			return 0;
> > -
> >  		return -EAGAIN;
> >  	}
> >  
> > +	if (cpuhp_step_empty(bringup, step)) {
> > +		WARN_ON_ONCE(1);
> > +		return 0;
> > +	}
> 
> This changes the behaviour of fail.. might be best to refactor without
> changing behaviour.
> 

Aah, the trick is in cpuhp_next_state() skipping empty states, so we'll
never get there.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-20 17:53     ` Peter Zijlstra
@ 2021-01-21 10:57       ` Vincent Donnefort
  2021-01-21 11:18         ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-21 10:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider

On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > >  static inline void
> > >  cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > >  {
> > > +	st->target = prev_state;
> > > +
> > > +	if (st->rollback)
> > > +		return;
> > 
> > I'm thinking that if we call rollback while already rollback we're hosed
> > something fierce, no?
> > 
> > That like going up, failing, going back down again, also failing, giving
> > up in a fiery death.
> 
> Ooh, is this a hack for _cpu_down():
> 
> 	ret = cpuhp_down_callbacks(cpu, st, target);
> 	if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> 		cpuhp_reset_state(st, prev_state);
> 		__cpuhp_kick_ap(st);
> 	}
> 
> Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?

Yes, it is now possible that this function will be called twice during the
rollback. Shall I avoid this and treat the case above differently ? i.e. "if we
are here, state has already been reset, and we should only set st->target".

-- 
Vincent

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
  2021-01-21 10:57       ` Vincent Donnefort
@ 2021-01-21 11:18         ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-21 11:18 UTC (permalink / raw)
  To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Thu, Jan 21, 2021 at 10:57:57AM +0000, Vincent Donnefort wrote:
> On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> > > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > > >  static inline void
> > > >  cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > > >  {
> > > > +	st->target = prev_state;
> > > > +
> > > > +	if (st->rollback)
> > > > +		return;
> > > 
> > > I'm thinking that if we call rollback while already rollback we're hosed
> > > something fierce, no?
> > > 
> > > That like going up, failing, going back down again, also failing, giving
> > > up in a fiery death.
> > 
> > Ooh, is this a hack for _cpu_down():
> > 
> > 	ret = cpuhp_down_callbacks(cpu, st, target);
> > 	if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> > 		cpuhp_reset_state(st, prev_state);
> > 		__cpuhp_kick_ap(st);
> > 	}
> > 
> > Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?
> 
> Yes, it is now possible that this function will be called twice during the
> rollback. Shall I avoid this and treat the case above differently ? i.e. "if we
> are here, state has already been reset, and we should only set st->target".

Not sure, but a comment would be useful :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
  2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
@ 2021-01-21 14:57   ` Peter Zijlstra
  2021-01-21 15:43     ` Vincent Donnefort
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-21 14:57 UTC (permalink / raw)
  To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider

On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
> the job. The steps left are as followed:
> 
>    +--------------------+
>    | CPUHP_TEARDOWN_CPU |  -> If fails state is CPUHP_TEARDOWN_CPU
>    +--------------------+
>    |   ATOMIC STATES    |  -> Cannot Fail
>    +--------------------+
>    |  CPUHP_BRINGUP_CPU |  -> Cannot fail
>    +--------------------+
>    |        ...         |
>    |        ...         |  -> Can fail and rollback

These are the PREPARE/DEAD states, right? It would be _really_ daft for
a DEAD notifier to fail. But yeah, I suppose that if it does, it will
indeed end up in CPUHP_AP_ONLINE_IDLE.

Do we want to WARN when a DEAD notifier fails?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
  2021-01-21 14:57   ` Peter Zijlstra
@ 2021-01-21 15:43     ` Vincent Donnefort
  0 siblings, 0 replies; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-21 15:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider

On Thu, Jan 21, 2021 at 03:57:03PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote:
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> > 
> > After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
> > the job. The steps left are as followed:
> > 
> >    +--------------------+
> >    | CPUHP_TEARDOWN_CPU |  -> If fails state is CPUHP_TEARDOWN_CPU
> >    +--------------------+
> >    |   ATOMIC STATES    |  -> Cannot Fail
> >    +--------------------+
> >    |  CPUHP_BRINGUP_CPU |  -> Cannot fail
> >    +--------------------+
> >    |        ...         |
> >    |        ...         |  -> Can fail and rollback
> 
> These are the PREPARE/DEAD states, right? It would be _really_ daft for
> a DEAD notifier to fail. But yeah, I suppose that if it does, it will
> indeed end up in CPUHP_AP_ONLINE_IDLE.
> 
> Do we want to WARN when a DEAD notifier fails?
> 
> 

Indeed, I couldn't find a dead callback which can return an error. So I
suppose we could go for another strategy here, that would be to not allow
failure for the dead states (i.e states < BRINGUP_CPU when hotunplug). In
that case, the fail interface should probably disallow selecting those
states and a WARN here would be good.

-- 
Vincent

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-01-21 16:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
2021-01-20 12:58   ` Peter Zijlstra
2021-01-20 15:17     ` Vincent Donnefort
2021-01-20 17:30       ` Peter Zijlstra
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
2021-01-20 13:11   ` Peter Zijlstra
2021-01-20 17:54     ` Peter Zijlstra
2021-01-20 17:45   ` Peter Zijlstra
2021-01-20 17:53     ` Peter Zijlstra
2021-01-21 10:57       ` Vincent Donnefort
2021-01-21 11:18         ` Peter Zijlstra
2021-01-20 17:49   ` Peter Zijlstra
2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
2021-01-21 14:57   ` Peter Zijlstra
2021-01-21 15:43     ` Vincent Donnefort

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.