* [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes
@ 2021-01-11 17:10 vincent.donnefort
2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort
From: Vincent Donnefort <vincent.donnefort@arm.com>
This patch set mainly intends to fix hotplug rollback, which is currently
broken due to inconsistent "state" usage and an issue with
CPUHP_AP_ONLINE_IDLE. It also improves the "fail" interface, which can now
be reset and will reject CPUHP_BRINGUP_CPU, a step that can't always fail.
Vincent Donnefort (4):
cpu/hotplug: Allowing to reset fail injection
cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
cpu/hotplug: Add cpuhp_invoke_callback_range()
cpu/hotplug: Fix CPU down rollback
kernel/cpu.c | 173 +++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 110 insertions(+), 63 deletions(-)
--
2.7.4
^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection
2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort
From: Vincent Donnefort <vincent.donnefort@arm.com>
Currently, the only way to reset the fail injection file is to actually
run a hotplug, a hotunplug, or both. This is quite annoying for testing
and, as the default value of this file is -1, it seems quite natural to
let a user write that value back directly.
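The reset path added below can be sketched as a small userspace model (a
hedged sketch: the enum values and the -22 stand-in for -EINVAL are
illustrative, not the kernel's definitions):

```c
#include <assert.h>

/* Illustrative values only: CPUHP_INVALID == -1 is the file's default,
 * CPUHP_OFFLINE/CPUHP_ONLINE bound the valid state range. */
enum { CPUHP_INVALID = -1, CPUHP_OFFLINE = 0, CPUHP_ONLINE = 200 };

/* Model of write_cpuhp_fail() validation: returns 0 if the write is
 * accepted, -22 (standing in for -EINVAL) if it is rejected. */
static int fail_write(int *st_fail, int fail)
{
	if (fail == CPUHP_INVALID) {	/* new: writing -1 resets the file */
		*st_fail = fail;
		return 0;
	}

	if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE)
		return -22;

	*st_fail = fail;
	return 0;
}

/* Helper: arm a failure, then reset it by writing -1 back. */
static int reset_then_read(void)
{
	int f = 50;

	fail_write(&f, CPUHP_INVALID);
	return f;
}
```

With this change, writing -1 restores the default without having to run a
full hotplug/hotunplug cycle first.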
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
kernel/cpu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1b6302e..9121edf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2207,6 +2207,11 @@ static ssize_t write_cpuhp_fail(struct device *dev,
if (ret)
return ret;
+ if (fail == CPUHP_INVALID) {
+ st->fail = fail;
+ return count;
+ }
+
if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE)
return -EINVAL;
--
2.7.4
* [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
2021-01-20 12:58 ` Peter Zijlstra
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
3 siblings, 1 reply; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort
From: Vincent Donnefort <vincent.donnefort@arm.com>
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none of
the atomic states can. Hence, rollback is not possible after a
CPUHP_BRINGUP_CPU step failure during hotunplug, and the "fail" interface
shouldn't allow it. Moreover, the current CPUHP_BRINGUP_CPU teardown
callback (finish_cpu()) cannot fail anyway.
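The check this patch adds can be modeled in userspace C (a sketch with
made-up state values; only their relative order matters, matching the real
enum where the atomic states sit above CPUHP_BRINGUP_CPU):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of the cpuhp state space; values are stand-ins. */
enum {
	CPUHP_OFFLINE = 0,
	CPUHP_BRINGUP_CPU = 50,
	CPUHP_AP_IDLE_DEAD = 51,	/* first atomic state */
	CPUHP_AP_ONLINE = 80,		/* last atomic state */
	CPUHP_ONLINE = 100,
};

/* Model of cpuhp_is_atomic_state(): the STARTING/DYING range. */
static bool is_atomic(int s)
{
	return s > CPUHP_BRINGUP_CPU && s <= CPUHP_AP_ONLINE;
}

/* Model of the new write_cpuhp_fail() check: atomic states can never be
 * failed, and BRINGUP_CPU cannot be failed while the CPU is up
 * (st->state > CPUHP_BRINGUP_CPU), since its teardown has no rollback. */
static bool fail_allowed(int fail, int st_state)
{
	if (is_atomic(fail))
		return false;
	if (fail == CPUHP_BRINGUP_CPU && st_state > CPUHP_BRINGUP_CPU)
		return false;
	return true;
}
```

Note that failing BRINGUP_CPU is still allowed in the bringup direction
(CPU currently offline); only the hotunplug case is rejected.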
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
kernel/cpu.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9121edf..bcd7b2a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
return -EINVAL;
/*
- * Cannot fail STARTING/DYING callbacks.
+ * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
+ * triggered by the BRINGUP_CPU bringup callback. Therefore, the latter
+ * can't fail during hotunplug, as that would mean we have no way of
+ * rolling back the atomic states that have previously been torn
+ * down.
*/
- if (cpuhp_is_atomic_state(fail))
+ if (cpuhp_is_atomic_state(fail) ||
+ (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
return -EINVAL;
/*
--
2.7.4
* [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort
2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
2021-01-20 13:11 ` Peter Zijlstra
` (2 more replies)
2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
3 siblings, 3 replies; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort
From: Vincent Donnefort <vincent.donnefort@arm.com>
Factorize and unify the cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used.
The first one decrements before the callback is called:

cpuhp_thread_fun()
    state = st->state;
    st->state--;
    cpuhp_invoke_callback(state);

The second one decrements after:

take_cpu_down()|cpuhp_down_callbacks()
    cpuhp_invoke_callback(st->state);
    st->state--;

This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent between steps run in the cpuhp thread and the
others.

Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.
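The off-by-one the two orderings produce can be demonstrated with a toy
model (assumed scenario: going down from state 5 toward 0, with state 3
failing):

```c
#include <assert.h>

/* Decrement-before-invoke, as in the old cpuhp_thread_fun(): when the
 * callback for `failed_at` fails, st->state is already failed_at - 1. */
static int thread_fun_style(int state)
{
	const int failed_at = 3;

	while (state > 0) {
		int run = state;

		state--;		/* st->state-- happens first ... */
		if (run == failed_at)
			break;		/* ... then the callback fails */
	}
	return state;			/* left one below the failed state */
}

/* Invoke-before-decrement, as in the old cpuhp_down_callbacks() and
 * take_cpu_down() loops: on failure, st->state is still failed_at. */
static int down_callbacks_style(int state)
{
	const int failed_at = 3;

	while (state > 0) {
		if (state == failed_at)
			break;		/* callback fails first ... */
		state--;		/* ... so the decrement never runs */
	}
	return state;			/* left at the failed state */
}
```

Starting a rollback from these two different resting points is exactly the
inconsistency that cpuhp_invoke_callback_range() removes.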
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
kernel/cpu.c | 150 +++++++++++++++++++++++++++++++++++------------------------
1 file changed, 90 insertions(+), 60 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index bcd7b2a..aebec3a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -135,6 +135,11 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
return cpuhp_hp_states + state;
}
+static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step)
+{
+ return bringup ? !step->startup.single : !step->teardown.single;
+}
+
/**
* cpuhp_invoke_callback _ Invoke the callbacks for a given state
* @cpu: The cpu for which the callback should be invoked
@@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
if (st->fail == state) {
st->fail = CPUHP_INVALID;
-
- if (!(bringup ? step->startup.single : step->teardown.single))
- return 0;
-
return -EAGAIN;
}
+ if (cpuhp_step_empty(bringup, step)) {
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+
if (!step->multi_instance) {
WARN_ON_ONCE(lastp && *lastp);
cb = bringup ? step->startup.single : step->teardown.single;
- if (!cb)
- return 0;
+
trace_cpuhp_enter(cpu, st->target, state, cb);
ret = cb(cpu);
trace_cpuhp_exit(cpu, st->state, state, ret);
return ret;
}
cbm = bringup ? step->startup.multi : step->teardown.multi;
- if (!cbm)
- return 0;
/* Single invocation for instance add/remove */
if (node) {
@@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
static inline void
cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
{
+ st->target = prev_state;
+
+ if (st->rollback)
+ return;
+
st->rollback = true;
/*
@@ -488,7 +496,6 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
st->state++;
}
- st->target = prev_state;
st->bringup = !st->bringup;
}
@@ -591,10 +598,53 @@ static int finish_cpu(unsigned int cpu)
* Hotplug state machine related functions
*/
-static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st)
+/*
+ * Get the next state to run. Empty ones will be skipped. Returns true if a
+ * state must be run.
+ *
+ * st->state will be modified ahead of time, to match state_to_run, as if it
+ * had already run.
+ */
+static bool cpuhp_next_state(bool bringup,
+ enum cpuhp_state *state_to_run,
+ struct cpuhp_cpu_state *st,
+ enum cpuhp_state target)
{
- for (st->state--; st->state > st->target; st->state--)
- cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+ do {
+ if (bringup) {
+ if (st->state >= target)
+ return false;
+
+ *state_to_run = ++st->state;
+ } else {
+ if (st->state <= target)
+ return false;
+
+ *state_to_run = st->state--;
+ }
+
+ if (!cpuhp_step_empty(bringup, cpuhp_get_step(*state_to_run)))
+ break;
+ } while (true);
+
+ return true;
+}
+
+static int cpuhp_invoke_callback_range(bool bringup,
+ unsigned int cpu,
+ struct cpuhp_cpu_state *st,
+ enum cpuhp_state target)
+{
+ enum cpuhp_state state;
+ int err = 0;
+
+ while (cpuhp_next_state(bringup, &state, st, target)) {
+ err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
+ if (err)
+ break;
+ }
+
+ return err;
}
static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
@@ -617,16 +667,12 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
enum cpuhp_state prev_state = st->state;
int ret = 0;
- while (st->state < target) {
- st->state++;
- ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
- if (ret) {
- if (can_rollback_cpu(st)) {
- st->target = prev_state;
- undo_cpu_up(cpu, st);
- }
- break;
- }
+ ret = cpuhp_invoke_callback_range(true, cpu, st, target);
+ if (ret) {
+ cpuhp_reset_state(st, prev_state);
+ if (can_rollback_cpu(st))
+ WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
+ prev_state));
}
return ret;
}
@@ -690,17 +736,9 @@ static void cpuhp_thread_fun(unsigned int cpu)
state = st->cb_state;
st->should_run = false;
} else {
- if (bringup) {
- st->state++;
- state = st->state;
- st->should_run = (st->state < st->target);
- WARN_ON_ONCE(st->state > st->target);
- } else {
- state = st->state;
- st->state--;
- st->should_run = (st->state > st->target);
- WARN_ON_ONCE(st->state < st->target);
- }
+ st->should_run = cpuhp_next_state(bringup, &state, st, st->target);
+ if (!st->should_run)
+ goto end;
}
WARN_ON_ONCE(!cpuhp_is_ap_state(state));
@@ -728,6 +766,7 @@ static void cpuhp_thread_fun(unsigned int cpu)
st->should_run = false;
}
+end:
cpuhp_lock_release(bringup);
lockdep_release_cpus_lock();
@@ -881,19 +920,18 @@ static int take_cpu_down(void *_param)
return err;
/*
- * We get here while we are in CPUHP_TEARDOWN_CPU state and we must not
- * do this step again.
+ * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going
+ * down, that the current state is CPUHP_TEARDOWN_CPU - 1.
*/
- WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
- st->state--;
+ WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
+
/* Invoke the former CPU_DYING callbacks */
- for (; st->state > target; st->state--) {
- ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+ ret = cpuhp_invoke_callback_range(false, cpu, st, target);
+ if (ret)
/*
* DYING must not fail!
*/
WARN_ON_ONCE(ret);
- }
/* Give up timekeeping duties */
tick_handover_do_timer();
@@ -975,27 +1013,22 @@ void cpuhp_report_idle_dead(void)
cpuhp_complete_idle_dead, st, 0);
}
-static void undo_cpu_down(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
- for (st->state++; st->state < st->target; st->state++)
- cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
-}
-
static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
enum cpuhp_state target)
{
enum cpuhp_state prev_state = st->state;
int ret = 0;
- for (; st->state > target; st->state--) {
- ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
- if (ret) {
- st->target = prev_state;
- if (st->state < prev_state)
- undo_cpu_down(cpu, st);
- break;
- }
+ ret = cpuhp_invoke_callback_range(false, cpu, st, target);
+ if (ret) {
+
+ cpuhp_reset_state(st, prev_state);
+
+ if (st->state < prev_state)
+ WARN_ON(cpuhp_invoke_callback_range(true, cpu, st,
+ prev_state));
}
+
return ret;
}
@@ -1164,14 +1197,12 @@ void notify_cpu_starting(unsigned int cpu)
rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */
cpumask_set_cpu(cpu, &cpus_booted_once_mask);
- while (st->state < target) {
- st->state++;
- ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
+ ret = cpuhp_invoke_callback_range(true, cpu, st, target);
+ if (ret)
/*
* STARTING must not fail!
*/
WARN_ON_ONCE(ret);
- }
}
/*
@@ -1777,8 +1808,7 @@ static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup,
* If there's nothing to do, we done.
* Relies on the union for multi_instance.
*/
- if ((bringup && !sp->startup.single) ||
- (!bringup && !sp->teardown.single))
+ if (cpuhp_step_empty(bringup, sp))
return 0;
/*
* The non AP bound callbacks can fail on bringup. On teardown
--
2.7.4
* [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort
` (2 preceding siblings ...)
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
@ 2021-01-11 17:10 ` vincent.donnefort
2021-01-21 14:57 ` Peter Zijlstra
3 siblings, 1 reply; 17+ messages in thread
From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw)
To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort
From: Vincent Donnefort <vincent.donnefort@arm.com>
After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
the job. The steps left are as follows:

+--------------------+
| CPUHP_TEARDOWN_CPU | -> If it fails, the state is CPUHP_TEARDOWN_CPU
+--------------------+
|   ATOMIC STATES    | -> Cannot fail
+--------------------+
| CPUHP_BRINGUP_CPU  | -> Cannot fail
+--------------------+
|        ...         |
|        ...         | -> Can fail and roll back
|   CPUHP_OFFLINE    |
+--------------------+

In case of failure, there are two possibilities:

1. Failure in CPUHP_TEARDOWN_CPU, then st->state will still be
   CPUHP_TEARDOWN_CPU.
2. Failure between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.

For 2., the rollback will always run the CPUHP_BRINGUP_CPU bringup callback
(bringup_cpu()), which kicks the AP back on. The latter will then end up
setting st->state to CPUHP_AP_ONLINE_IDLE.

Checking for CPUHP_AP_ONLINE_IDLE then allows rollback in all cases.
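The fixed condition in _cpu_down() can be modeled as follows (a sketch
with illustrative state values; only their relative order matters):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only. */
enum {
	CPUHP_OFFLINE = 0,
	CPUHP_BRINGUP_CPU = 50,
	CPUHP_AP_ONLINE_IDLE = 60,
	CPUHP_TEARDOWN_CPU = 70,
	CPUHP_ONLINE = 100,
};

/* Model of the condition under which _cpu_down() kicks the AP to finish
 * a rollback after a failed hotunplug: only the two possible post-failure
 * resting states qualify. */
static bool should_kick_ap(int ret, int state, int prev_state)
{
	return ret &&
	       (state == CPUHP_AP_ONLINE_IDLE ||
		state == CPUHP_TEARDOWN_CPU) &&
	       state < prev_state;
}
```

Before this fix, only the CPUHP_TEARDOWN_CPU case was handled, so a
failure below BRINGUP_CPU (case 2 above) never triggered the kick.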
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
---
kernel/cpu.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index aebec3a..8b3f84e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1078,7 +1078,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
* to do the further cleanups.
*/
ret = cpuhp_down_callbacks(cpu, st, target);
- if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+ if (ret && (st->state == CPUHP_AP_ONLINE_IDLE ||
+ st->state == CPUHP_TEARDOWN_CPU) &&
+ st->state < prev_state) {
+ /*
+ * After an error the state can be either:
+ * - CPUHP_TEARDOWN_CPU if this step failed.
+ * - CPUHP_AP_ONLINE_IDLE if any other failed.
+ */
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}
--
2.7.4
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort
@ 2021-01-20 12:58 ` Peter Zijlstra
2021-01-20 15:17 ` Vincent Donnefort
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 12:58 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
>
> The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> of the atomic can. Hence, rollback is not possible after a hotunplug
> CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> (finish_cpu()) cannot fail anyway.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> ---
> kernel/cpu.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 9121edf..bcd7b2a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
> return -EINVAL;
>
> /*
> - * Cannot fail STARTING/DYING callbacks.
> + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> + * can't fail during hotunplug, as it would mean we have no way of
> + * rolling back the atomic states that have been previously teared
> + * down.
> */
> - if (cpuhp_is_atomic_state(fail))
> + if (cpuhp_is_atomic_state(fail) ||
> + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> return -EINVAL;
Should we instead disallow failing any state that has .cant_stop ?
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
@ 2021-01-20 13:11 ` Peter Zijlstra
2021-01-20 17:54 ` Peter Zijlstra
2021-01-20 17:45 ` Peter Zijlstra
2021-01-20 17:49 ` Peter Zijlstra
2 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 13:11 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
>
> if (st->fail == state) {
> st->fail = CPUHP_INVALID;
> -
> - if (!(bringup ? step->startup.single : step->teardown.single))
> - return 0;
> -
> return -EAGAIN;
> }
>
> + if (cpuhp_step_empty(bringup, step)) {
> + WARN_ON_ONCE(1);
> + return 0;
> + }
This changes the behaviour of fail.. might be best to refactor without
changing behaviour.
Lemme continue reading.
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
2021-01-20 12:58 ` Peter Zijlstra
@ 2021-01-20 15:17 ` Vincent Donnefort
2021-01-20 17:30 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-20 15:17 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider
On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> >
> > The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> > triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> > of the atomic can. Hence, rollback is not possible after a hotunplug
> > CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> > it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> > (finish_cpu()) cannot fail anyway.
> >
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > ---
> > kernel/cpu.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 9121edf..bcd7b2a 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
> > return -EINVAL;
> >
> > /*
> > - * Cannot fail STARTING/DYING callbacks.
> > + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> > + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> > + * can't fail during hotunplug, as it would mean we have no way of
> > + * rolling back the atomic states that have been previously teared
> > + * down.
> > */
> > - if (cpuhp_is_atomic_state(fail))
> > + if (cpuhp_is_atomic_state(fail) ||
> > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> > return -EINVAL;
>
> Should we instead disallow failing any state that has .cant_stop ?
We would reduce the scope of what can be tested: bringup_cpu() and
takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
allowed to fail.
Checking for cant_stop also made me realize that write_cpuhp_target() is
probably missing a check for cpuhp_is_atomic_state(). For the same reason
as this patch, when doing cpu_down(), we can't stop in one of these states.
--
Vincent
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection
2021-01-20 15:17 ` Vincent Donnefort
@ 2021-01-20 17:30 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:30 UTC (permalink / raw)
To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Wed, Jan 20, 2021 at 03:17:24PM +0000, Vincent Donnefort wrote:
> On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote:
> > > + if (cpuhp_is_atomic_state(fail) ||
> > > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> > > return -EINVAL;
> >
> > Should we instead disallow failing any state that has .cant_stop ?
>
> We would reduce the scope of what can be tested: bringup_cpu() and
> takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
> allowed to fail.
Fair enough. I suppose we can add an additional cant_fail field, but I'm
not sure that's worth the effort over this.
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
2021-01-20 13:11 ` Peter Zijlstra
@ 2021-01-20 17:45 ` Peter Zijlstra
2021-01-20 17:53 ` Peter Zijlstra
2021-01-20 17:49 ` Peter Zijlstra
2 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:45 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> static inline void
> cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> {
> + st->target = prev_state;
> +
> + if (st->rollback)
> + return;
I'm thinking that if we call rollback while already rollback we're hosed
something fierce, no?
That's like going up, failing, going back down again, also failing, giving
up in a fiery death.
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort
2021-01-20 13:11 ` Peter Zijlstra
2021-01-20 17:45 ` Peter Zijlstra
@ 2021-01-20 17:49 ` Peter Zijlstra
2 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:49 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> + if (can_rollback_cpu(st))
> + WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
> + prev_state));
> + if (ret)
> /*
> * DYING must not fail!
> */
> WARN_ON_ONCE(ret);
> - }
> + if (ret) {
> +
> + cpuhp_reset_state(st, prev_state);
> +
> + if (st->state < prev_state)
> + WARN_ON(cpuhp_invoke_callback_range(true, cpu, st,
> + prev_state));
> }
> + if (ret)
> /*
> * STARTING must not fail!
> */
> WARN_ON_ONCE(ret);
> - }
Stuff like that is CodingStyle fail.
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-20 17:45 ` Peter Zijlstra
@ 2021-01-20 17:53 ` Peter Zijlstra
2021-01-21 10:57 ` Vincent Donnefort
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:53 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > static inline void
> > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > {
> > + st->target = prev_state;
> > +
> > + if (st->rollback)
> > + return;
>
> I'm thinking that if we call rollback while already rollback we're hosed
> something fierce, no?
>
> That like going up, failing, going back down again, also failing, giving
> up in a fiery death.
Ooh, is this a hack for _cpu_down():
ret = cpuhp_down_callbacks(cpu, st, target);
if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}
Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-20 13:11 ` Peter Zijlstra
@ 2021-01-20 17:54 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-20 17:54 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Wed, Jan 20, 2021 at 02:11:14PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
> >
> > if (st->fail == state) {
> > st->fail = CPUHP_INVALID;
> > -
> > - if (!(bringup ? step->startup.single : step->teardown.single))
> > - return 0;
> > -
> > return -EAGAIN;
> > }
> >
> > + if (cpuhp_step_empty(bringup, step)) {
> > + WARN_ON_ONCE(1);
> > + return 0;
> > + }
>
> This changes the behaviour of fail.. might be best to refactor without
> changing behaviour.
>
Aah, the trick is in cpuhp_next_state() skipping empty states, so we'll
never get there.
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-20 17:53 ` Peter Zijlstra
@ 2021-01-21 10:57 ` Vincent Donnefort
2021-01-21 11:18 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-21 10:57 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider
On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > > static inline void
> > > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > > {
> > > + st->target = prev_state;
> > > +
> > > + if (st->rollback)
> > > + return;
> >
> > I'm thinking that if we call rollback while already rollback we're hosed
> > something fierce, no?
> >
> > That like going up, failing, going back down again, also failing, giving
> > up in a fiery death.
>
> Ooh, is this a hack for _cpu_down():
>
> ret = cpuhp_down_callbacks(cpu, st, target);
> if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> cpuhp_reset_state(st, prev_state);
> __cpuhp_kick_ap(st);
> }
>
> Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?
Yes, it is now possible that this function will be called twice during the
rollback. Shall I avoid this and treat the case above differently? I.e.,
"if we are here, the state has already been reset, and we should only set
st->target".
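For reference, the guard being discussed can be modeled in userspace C (a
hedged sketch of cpuhp_reset_state() with a minimal stand-in struct):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the fields of cpuhp_cpu_state used here. */
struct st_model {
	int state;
	int target;
	bool rollback;
	bool bringup;
};

/* Model of the reworked cpuhp_reset_state(): a second call during the
 * same rollback only updates the target; it must not flip st->bringup
 * back or re-enter the rollback setup. */
static void reset_state(struct st_model *st, int prev_state)
{
	st->target = prev_state;

	if (st->rollback)
		return;		/* already rolling back: only retarget */

	st->rollback = true;
	st->bringup = !st->bringup;
}

/* First call: flips direction and arms the rollback. */
static bool first_reset_flips(void)
{
	struct st_model st = { 3, 0, false, false };

	reset_state(&st, 5);
	return st.rollback && st.bringup && st.target == 5;
}

/* Second call: retargets without flipping the direction back. */
static bool second_reset_only_retargets(void)
{
	struct st_model st = { 3, 0, false, false };

	reset_state(&st, 5);
	reset_state(&st, 7);
	return st.rollback && st.bringup && st.target == 7;
}
```

This is what makes the double invocation from _cpu_down() harmless.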
--
Vincent
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range()
2021-01-21 10:57 ` Vincent Donnefort
@ 2021-01-21 11:18 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-21 11:18 UTC (permalink / raw)
To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Thu, Jan 21, 2021 at 10:57:57AM +0000, Vincent Donnefort wrote:
> On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote:
> > > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote:
> > > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > > > static inline void
> > > > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > > > {
> > > > + st->target = prev_state;
> > > > +
> > > > + if (st->rollback)
> > > > + return;
> > >
> > > I'm thinking that if we call rollback while already rollback we're hosed
> > > something fierce, no?
> > >
> > > That like going up, failing, going back down again, also failing, giving
> > > up in a fiery death.
> >
> > Ooh, is this a hack for _cpu_down():
> >
> > ret = cpuhp_down_callbacks(cpu, st, target);
> > if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> > cpuhp_reset_state(st, prev_state);
> > __cpuhp_kick_ap(st);
> > }
> >
> > Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ?
>
> Yes, it is now possible that this function will be called twice during the
> rollback. Shall I avoid this and treat the case above differently ? i.e. "if we
> are here, state has already been reset, and we should only set st->target".
Not sure, but a comment would be useful :-)
* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort
@ 2021-01-21 14:57 ` Peter Zijlstra
2021-01-21 15:43 ` Vincent Donnefort
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-21 14:57 UTC (permalink / raw)
To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider
On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
>
> After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
> the job. The steps left are as followed:
>
> +--------------------+
> | CPUHP_TEARDOWN_CPU | -> If fails state is CPUHP_TEARDOWN_CPU
> +--------------------+
> | ATOMIC STATES | -> Cannot Fail
> +--------------------+
> | CPUHP_BRINGUP_CPU | -> Cannot fail
> +--------------------+
> | ... |
> | ... | -> Can fail and rollback
These are the PREPARE/DEAD states, right? It would be _really_ daft for
a DEAD notifier to fail. But yeah, I suppose that if it does, it will
indeed end up in CPUHP_AP_ONLINE_IDLE.
Do we want to WARN when a DEAD notifier fails?
* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback
2021-01-21 14:57 ` Peter Zijlstra
@ 2021-01-21 15:43 ` Vincent Donnefort
0 siblings, 0 replies; 17+ messages in thread
From: Vincent Donnefort @ 2021-01-21 15:43 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider
On Thu, Jan 21, 2021 at 03:57:03PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote:
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> >
> > After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish
> > the job. The steps left are as followed:
> >
> > +--------------------+
> > | CPUHP_TEARDOWN_CPU | -> If fails state is CPUHP_TEARDOWN_CPU
> > +--------------------+
> > | ATOMIC STATES | -> Cannot Fail
> > +--------------------+
> > | CPUHP_BRINGUP_CPU | -> Cannot fail
> > +--------------------+
> > | ... |
> > | ... | -> Can fail and rollback
>
> These are the PREPARE/DEAD states, right? It would be _really_ daft for
> a DEAD notifier to fail. But yeah, I suppose that if it does, it will
> indeed end up in CPUHP_AP_ONLINE_IDLE.
>
> Do we want to WARN when a DEAD notifier fails?
>
>
Indeed, I couldn't find a dead callback that can return an error. So I
suppose we could go for another strategy here: not allowing failure for the
dead states (i.e. states < BRINGUP_CPU during hotunplug). In that case, the
fail interface should probably disallow selecting those states, and a WARN
here would be good.
--
Vincent