* [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes @ 2021-01-11 17:10 vincent.donnefort 2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort ` (3 more replies) 0 siblings, 4 replies; 17+ messages in thread From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw) To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort From: Vincent Donnefort <vincent.donnefort@arm.com> This patch-set intends mainly to fix HP rollback, which is currently broken, due to an inconsistent "state" usage and an issue with CPUHP_AP_ONLINE_IDLE. It also improves the "fail" interface, which can now be reset and will reject CPUHP_BRINGUP_CPU, a step that can't always fail. Vincent Donnefort (4): cpu/hotplug: Allowing to reset fail injection cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection cpu/hotplug: Add cpuhp_invoke_callback_range() cpu/hotplug: Fix CPU down rollback kernel/cpu.c | 173 +++++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 110 insertions(+), 63 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection 2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort @ 2021-01-11 17:10 ` vincent.donnefort 2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort ` (2 subsequent siblings) 3 siblings, 0 replies; 17+ messages in thread From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw) To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort From: Vincent Donnefort <vincent.donnefort@arm.com> Currently, the only way of resetting this file is to actually try to run a hotplug, hotunplug or both. This is quite annoying for testing and, as the default value for this file is -1, it seems quite natural to let a user write it. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> --- kernel/cpu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index 1b6302e..9121edf 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2207,6 +2207,11 @@ static ssize_t write_cpuhp_fail(struct device *dev, if (ret) return ret; + if (fail == CPUHP_INVALID) { + st->fail = fail; + return count; + } + if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE) return -EINVAL; -- 2.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection 2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort 2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort @ 2021-01-11 17:10 ` vincent.donnefort 2021-01-20 12:58 ` Peter Zijlstra 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort 2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort 3 siblings, 1 reply; 17+ messages in thread From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw) To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort From: Vincent Donnefort <vincent.donnefort@arm.com> The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none of the atomic can. Hence, rollback is not possible after a hotunplug CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback (finish_cpu()) cannot fail anyway. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> --- kernel/cpu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 9121edf..bcd7b2a 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev, return -EINVAL; /* - * Cannot fail STARTING/DYING callbacks. + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter + * can't fail during hotunplug, as it would mean we have no way of + * rolling back the atomic states that have been previously teared + * down. */ - if (cpuhp_is_atomic_state(fail)) + if (cpuhp_is_atomic_state(fail) || + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)) return -EINVAL; /* -- 2.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection 2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort @ 2021-01-20 12:58 ` Peter Zijlstra 2021-01-20 15:17 ` Vincent Donnefort 0 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 12:58 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote: > From: Vincent Donnefort <vincent.donnefort@arm.com> > > The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are > triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none > of the atomic can. Hence, rollback is not possible after a hotunplug > CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow > it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback > (finish_cpu()) cannot fail anyway. > > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> > --- > kernel/cpu.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 9121edf..bcd7b2a 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev, > return -EINVAL; > > /* > - * Cannot fail STARTING/DYING callbacks. > + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are > + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter > + * can't fail during hotunplug, as it would mean we have no way of > + * rolling back the atomic states that have been previously teared > + * down. > */ > - if (cpuhp_is_atomic_state(fail)) > + if (cpuhp_is_atomic_state(fail) || > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)) > return -EINVAL; Should we instead disallow failing any state that has .cant_stop ? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection 2021-01-20 12:58 ` Peter Zijlstra @ 2021-01-20 15:17 ` Vincent Donnefort 2021-01-20 17:30 ` Peter Zijlstra 0 siblings, 1 reply; 17+ messages in thread From: Vincent Donnefort @ 2021-01-20 15:17 UTC (permalink / raw) To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote: > On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote: > > From: Vincent Donnefort <vincent.donnefort@arm.com> > > > > The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are > > triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none > > of the atomic can. Hence, rollback is not possible after a hotunplug > > CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow > > it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback > > (finish_cpu()) cannot fail anyway. > > > > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> > > --- > > kernel/cpu.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/cpu.c b/kernel/cpu.c > > index 9121edf..bcd7b2a 100644 > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev, > > return -EINVAL; > > > > /* > > - * Cannot fail STARTING/DYING callbacks. > > + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are > > + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter > > + * can't fail during hotunplug, as it would mean we have no way of > > + * rolling back the atomic states that have been previously teared > > + * down. > > */ > > - if (cpuhp_is_atomic_state(fail)) > > + if (cpuhp_is_atomic_state(fail) || > > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)) > > return -EINVAL; > > Should we instead disallow failing any state that has .cant_stop ? We would reduce the scope of what can be tested: bringup_cpu() and takedown_cpu() are both marked as "cant_stop". Still, those callbacks are allowed to fail. Checking for cant_stop, made me also see that write_cpuhp_target() is probably missing a check for cpuhp_is_atomic_state(). For the same reason as this patch, when doing cpu_down(), we can't stop in one of these states. -- Vincent ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection 2021-01-20 15:17 ` Vincent Donnefort @ 2021-01-20 17:30 ` Peter Zijlstra 0 siblings, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 17:30 UTC (permalink / raw) To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Wed, Jan 20, 2021 at 03:17:24PM +0000, Vincent Donnefort wrote: > On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote: > > On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@arm.com wrote: > > > + if (cpuhp_is_atomic_state(fail) || > > > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)) > > > return -EINVAL; > > > > Should we instead disallow failing any state that has .cant_stop ? > > We would reduce the scope of what can be tested: bringup_cpu() and > takedown_cpu() are both marked as "cant_stop". Still, those callbacks are > allowed to fail. Fair enough. I suppose we can add an additional cant_fail field, but I'm not sure that's worth the effort over this. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort 2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort 2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort @ 2021-01-11 17:10 ` vincent.donnefort 2021-01-20 13:11 ` Peter Zijlstra ` (2 more replies) 2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort 3 siblings, 3 replies; 17+ messages in thread From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw) To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort From: Vincent Donnefort <vincent.donnefort@arm.com> Factorizing and unifying cpuhp callback range invocations, especially for the hotunplug path, where two different ways of decrementing were used. The first one, decrements before the callback is called: cpuhp_thread_fun() state = st->state; st->state--; cpuhp_invoke_callback(state); The second one, after: take_down_cpu()|cpuhp_down_callbacks() cpuhp_invoke_callback(st->state); st->state--; This is problematic for rolling back the steps in case of error, as depending on the decrement, the rollback will start from N or N-1. It also makes tracing inconsistent, between steps run in the cpuhp thread and the others. Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty steps. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> --- kernel/cpu.c | 150 +++++++++++++++++++++++++++++++++++------------------------ 1 file changed, 90 insertions(+), 60 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index bcd7b2a..aebec3a 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -135,6 +135,11 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state) return cpuhp_hp_states + state; } +static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step) +{ + return bringup ? !step->startup.single : !step->teardown.single; +} + /** * cpuhp_invoke_callback _ Invoke the callbacks for a given state * @cpu: The cpu for which the callback should be invoked @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state, if (st->fail == state) { st->fail = CPUHP_INVALID; - - if (!(bringup ? step->startup.single : step->teardown.single)) - return 0; - return -EAGAIN; } + if (cpuhp_step_empty(bringup, step)) { + WARN_ON_ONCE(1); + return 0; + } + if (!step->multi_instance) { WARN_ON_ONCE(lastp && *lastp); cb = bringup ? step->startup.single : step->teardown.single; - if (!cb) - return 0; + trace_cpuhp_enter(cpu, st->target, state, cb); ret = cb(cpu); trace_cpuhp_exit(cpu, st->state, state, ret); return ret; } cbm = bringup ? step->startup.multi : step->teardown.multi; - if (!cbm) - return 0; /* Single invocation for instance add/remove */ if (node) { @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target) static inline void cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) { + st->target = prev_state; + + if (st->rollback) + return; + st->rollback = true; /* @@ -488,7 +496,6 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) st->state++; } - st->target = prev_state; st->bringup = !st->bringup; } @@ -591,10 +598,53 @@ static int finish_cpu(unsigned int cpu) * Hotplug state machine related functions */ -static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st) +/* + * Get the next state to run. Empty ones will be skipped. Returns true if a + * state must be run. + * + * st->state will be modified ahead of time, to match state_to_run, as if it + * has already ran. + */ +static bool cpuhp_next_state(bool bringup, + enum cpuhp_state *state_to_run, + struct cpuhp_cpu_state *st, + enum cpuhp_state target) { - for (st->state--; st->state > st->target; st->state--) - cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL); + do { + if (bringup) { + if (st->state >= target) + return false; + + *state_to_run = ++st->state; + } else { + if (st->state <= target) + return false; + + *state_to_run = st->state--; + } + + if (!cpuhp_step_empty(bringup, cpuhp_get_step(*state_to_run))) + break; + } while (true); + + return true; +} + +static int cpuhp_invoke_callback_range(bool bringup, + unsigned int cpu, + struct cpuhp_cpu_state *st, + enum cpuhp_state target) +{ + enum cpuhp_state state; + int err = 0; + + while (cpuhp_next_state(bringup, &state, st, target)) { + err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL); + if (err) + break; + } + + return err; } static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st) @@ -617,16 +667,12 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st, enum cpuhp_state prev_state = st->state; int ret = 0; - while (st->state < target) { - st->state++; - ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL); - if (ret) { - if (can_rollback_cpu(st)) { - st->target = prev_state; - undo_cpu_up(cpu, st); - } - break; - } + ret = cpuhp_invoke_callback_range(true, cpu, st, target); + if (ret) { + cpuhp_reset_state(st, prev_state); + if (can_rollback_cpu(st)) + WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, + prev_state)); } return ret; } @@ -690,17 +736,9 @@ static void cpuhp_thread_fun(unsigned int cpu) state = st->cb_state; st->should_run = false; } else { - if (bringup) { - st->state++; - state = st->state; - st->should_run = (st->state < st->target); - WARN_ON_ONCE(st->state > st->target); - } else { - state = st->state; - st->state--; - st->should_run = (st->state > st->target); - WARN_ON_ONCE(st->state < st->target); - } + st->should_run = cpuhp_next_state(bringup, &state, st, st->target); + if (!st->should_run) + goto end; } WARN_ON_ONCE(!cpuhp_is_ap_state(state)); @@ -728,6 +766,7 @@ static void cpuhp_thread_fun(unsigned int cpu) st->should_run = false; } +end: cpuhp_lock_release(bringup); lockdep_release_cpus_lock(); @@ -881,19 +920,18 @@ static int take_cpu_down(void *_param) return err; /* - * We get here while we are in CPUHP_TEARDOWN_CPU state and we must not - * do this step again. + * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going + * down, that the current state is CPUHP_TEARDOWN_CPU - 1. */ - WARN_ON(st->state != CPUHP_TEARDOWN_CPU); - st->state--; + WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1)); + /* Invoke the former CPU_DYING callbacks */ - for (; st->state > target; st->state--) { - ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL); + ret = cpuhp_invoke_callback_range(false, cpu, st, target); + if (ret) /* * DYING must not fail! */ WARN_ON_ONCE(ret); - } /* Give up timekeeping duties */ tick_handover_do_timer(); @@ -975,27 +1013,22 @@ void cpuhp_report_idle_dead(void) cpuhp_complete_idle_dead, st, 0); } -static void undo_cpu_down(unsigned int cpu, struct cpuhp_cpu_state *st) -{ - for (st->state++; st->state < st->target; st->state++) - cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL); -} - static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st, enum cpuhp_state target) { enum cpuhp_state prev_state = st->state; int ret = 0; - for (; st->state > target; st->state--) { - ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL); - if (ret) { - st->target = prev_state; - if (st->state < prev_state) - undo_cpu_down(cpu, st); - break; - } + ret = cpuhp_invoke_callback_range(false, cpu, st, target); + if (ret) { + + cpuhp_reset_state(st, prev_state); + + if (st->state < prev_state) + WARN_ON(cpuhp_invoke_callback_range(true, cpu, st, + prev_state)); } + return ret; } @@ -1164,14 +1197,12 @@ void notify_cpu_starting(unsigned int cpu) rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ cpumask_set_cpu(cpu, &cpus_booted_once_mask); - while (st->state < target) { - st->state++; - ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL); + ret = cpuhp_invoke_callback_range(true, cpu, st, target); + if (ret) /* * STARTING must not fail! */ WARN_ON_ONCE(ret); - } } /* @@ -1777,8 +1808,7 @@ static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup, * If there's nothing to do, we done. * Relies on the union for multi_instance. */ - if ((bringup && !sp->startup.single) || - (!bringup && !sp->teardown.single)) + if (cpuhp_step_empty(bringup, sp)) return 0; /* * The non AP bound callbacks can fail on bringup. On teardown -- 2.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort @ 2021-01-20 13:11 ` Peter Zijlstra 2021-01-20 17:54 ` Peter Zijlstra 2021-01-20 17:45 ` Peter Zijlstra 2021-01-20 17:49 ` Peter Zijlstra 2 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 13:11 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state, > > if (st->fail == state) { > st->fail = CPUHP_INVALID; > - > - if (!(bringup ? step->startup.single : step->teardown.single)) > - return 0; > - > return -EAGAIN; > } > > + if (cpuhp_step_empty(bringup, step)) { > + WARN_ON_ONCE(1); > + return 0; > + } This changes the behaviour of fail.. might be best to refactor without changing behaviour. Lemme continue reading. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-20 13:11 ` Peter Zijlstra @ 2021-01-20 17:54 ` Peter Zijlstra 0 siblings, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 17:54 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Wed, Jan 20, 2021 at 02:11:14PM +0100, Peter Zijlstra wrote: > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > > @@ -157,26 +162,24 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state, > > > > if (st->fail == state) { > > st->fail = CPUHP_INVALID; > > - > > - if (!(bringup ? step->startup.single : step->teardown.single)) > > - return 0; > > - > > return -EAGAIN; > > } > > > > + if (cpuhp_step_empty(bringup, step)) { > > + WARN_ON_ONCE(1); > > + return 0; > > + } > > This changes the behaviour of fail.. might be best to refactor without > changing behaviour. > Aah, the trick is in cpuhp_next_state() skipping empty states, so we'll never get there. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort 2021-01-20 13:11 ` Peter Zijlstra @ 2021-01-20 17:45 ` Peter Zijlstra 2021-01-20 17:53 ` Peter Zijlstra 2021-01-20 17:49 ` Peter Zijlstra 2 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 17:45 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target) > static inline void > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) > { > + st->target = prev_state; > + > + if (st->rollback) > + return; I'm thinking that if we call rollback while already rollback we're hosed something fierce, no? That like going up, failing, going back down again, also failing, giving up in a fiery death. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-20 17:45 ` Peter Zijlstra @ 2021-01-20 17:53 ` Peter Zijlstra 2021-01-21 10:57 ` Vincent Donnefort 0 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 17:53 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote: > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target) > > static inline void > > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) > > { > > + st->target = prev_state; > > + > > + if (st->rollback) > > + return; > > I'm thinking that if we call rollback while already rollback we're hosed > something fierce, no? > > That like going up, failing, going back down again, also failing, giving > up in a fiery death. Ooh, is this a hack for _cpu_down(): ret = cpuhp_down_callbacks(cpu, st, target); if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) { cpuhp_reset_state(st, prev_state); __cpuhp_kick_ap(st); } Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-20 17:53 ` Peter Zijlstra @ 2021-01-21 10:57 ` Vincent Donnefort 2021-01-21 11:18 ` Peter Zijlstra 0 siblings, 1 reply; 17+ messages in thread From: Vincent Donnefort @ 2021-01-21 10:57 UTC (permalink / raw) To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote: > On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote: > > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target) > > > static inline void > > > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) > > > { > > > + st->target = prev_state; > > > + > > > + if (st->rollback) > > > + return; > > > > I'm thinking that if we call rollback while already rollback we're hosed > > something fierce, no? > > > > That like going up, failing, going back down again, also failing, giving > > up in a fiery death. > > Ooh, is this a hack for _cpu_down(): > > ret = cpuhp_down_callbacks(cpu, st, target); > if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) { > cpuhp_reset_state(st, prev_state); > __cpuhp_kick_ap(st); > } > > Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ? Yes, it is now possible that this function will be called twice during the rollback. Shall I avoid this and treat the case above differently ? i.e. "if we are here, state has already been reset, and we should only set st->target". -- Vincent ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-21 10:57 ` Vincent Donnefort @ 2021-01-21 11:18 ` Peter Zijlstra 0 siblings, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2021-01-21 11:18 UTC (permalink / raw) To: Vincent Donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Thu, Jan 21, 2021 at 10:57:57AM +0000, Vincent Donnefort wrote: > On Wed, Jan 20, 2021 at 06:53:33PM +0100, Peter Zijlstra wrote: > > On Wed, Jan 20, 2021 at 06:45:16PM +0100, Peter Zijlstra wrote: > > > On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > > > > @@ -475,6 +478,11 @@ cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target) > > > > static inline void > > > > cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state) > > > > { > > > > + st->target = prev_state; > > > > + > > > > + if (st->rollback) > > > > + return; > > > > > > I'm thinking that if we call rollback while already rollback we're hosed > > > something fierce, no? > > > > > > That like going up, failing, going back down again, also failing, giving > > > up in a fiery death. > > > > Ooh, is this a hack for _cpu_down(): > > > > ret = cpuhp_down_callbacks(cpu, st, target); > > if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) { > > cpuhp_reset_state(st, prev_state); > > __cpuhp_kick_ap(st); > > } > > > > Where cpuhp_down_callbacks() can already have called cpuhp_reset_state() ? > > Yes, it is now possible that this function will be called twice during the > rollback. Shall I avoid this and treat the case above differently ? i.e. "if we > are here, state has already been reset, and we should only set st->target". Not sure, but a comment would be useful :-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort 2021-01-20 13:11 ` Peter Zijlstra 2021-01-20 17:45 ` Peter Zijlstra @ 2021-01-20 17:49 ` Peter Zijlstra 2 siblings, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2021-01-20 17:49 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Mon, Jan 11, 2021 at 05:10:46PM +0000, vincent.donnefort@arm.com wrote: > + if (can_rollback_cpu(st)) > + WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, > + prev_state)); > + if (ret) > /* > * DYING must not fail! > */ > WARN_ON_ONCE(ret); > - } > + if (ret) { > + > + cpuhp_reset_state(st, prev_state); > + > + if (st->state < prev_state) > + WARN_ON(cpuhp_invoke_callback_range(true, cpu, st, > + prev_state)); > } > + if (ret) > /* > * STARTING must not fail! > */ > WARN_ON_ONCE(ret); > - } Stuff like that is CodingStyle fail. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 4/4] cpu/hotplug: Fix CPU down rollback 2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort ` (2 preceding siblings ...) 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort @ 2021-01-11 17:10 ` vincent.donnefort 2021-01-21 14:57 ` Peter Zijlstra 3 siblings, 1 reply; 17+ messages in thread From: vincent.donnefort @ 2021-01-11 17:10 UTC (permalink / raw) To: peterz, tglx, linux-kernel; +Cc: valentin.schneider, Vincent Donnefort From: Vincent Donnefort <vincent.donnefort@arm.com> After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish the job. The steps left are as followed: +--------------------+ | CPUHP_TEARDOWN_CPU | -> If fails state is CPUHP_TEARDOWN_CPU +--------------------+ | ATOMIC STATES | -> Cannot Fail +--------------------+ | CPUHP_BRINGUP_CPU | -> Cannot fail +--------------------+ | ... | | ... | -> Can fail and rollback | CPUHP_OFFLINE | +--------------------+ There are, in case of failure two possibilities: 1. Failure in CPUHP_TEARDOWN_CPU, then st->state will still be CPUHP_TEARDOWN_CPU 2. Failure between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU. For 2. The rollback will always run in the CPUHP_BRINGUP_CPU bringup callback (bringup_cpu()) which kicks the AP back on. The latter will then end-up setting st->state to CPUHP_AP_ONLINE_IDLE. Checking for CPUHP_AP_ONLINE_IDLE allows then to rollback in all cases. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> --- kernel/cpu.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index aebec3a..8b3f84e 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1078,7 +1078,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, * to do the further cleanups. */ ret = cpuhp_down_callbacks(cpu, st, target); - if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) { + if (ret && (st->state == CPUHP_AP_ONLINE_IDLE || + st->state == CPUHP_TEARDOWN_CPU) && + st->state < prev_state) { + /* + * After an error the state can be either: + * - CPUHP_TEARDOWN_CPU if this step failed. + * - CPUHP_AP_ONLINE_IDLE if any other failed. + */ cpuhp_reset_state(st, prev_state); __cpuhp_kick_ap(st); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback 2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort @ 2021-01-21 14:57 ` Peter Zijlstra 2021-01-21 15:43 ` Vincent Donnefort 0 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2021-01-21 14:57 UTC (permalink / raw) To: vincent.donnefort; +Cc: tglx, linux-kernel, valentin.schneider On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote: > From: Vincent Donnefort <vincent.donnefort@arm.com> > > After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish > the job. The steps left are as followed: > > +--------------------+ > | CPUHP_TEARDOWN_CPU | -> If fails state is CPUHP_TEARDOWN_CPU > +--------------------+ > | ATOMIC STATES | -> Cannot Fail > +--------------------+ > | CPUHP_BRINGUP_CPU | -> Cannot fail > +--------------------+ > | ... | > | ... | -> Can fail and rollback These are the PREPARE/DEAD states, right? It would be _really_ daft for a DEAD notifier to fail. But yeah, I suppose that if it does, it will indeed end up in CPUHP_AP_ONLINE_IDLE. Do we want to WARN when a DEAD notifier fails? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] cpu/hotplug: Fix CPU down rollback 2021-01-21 14:57 ` Peter Zijlstra @ 2021-01-21 15:43 ` Vincent Donnefort 0 siblings, 0 replies; 17+ messages in thread From: Vincent Donnefort @ 2021-01-21 15:43 UTC (permalink / raw) To: Peter Zijlstra; +Cc: tglx, linux-kernel, valentin.schneider On Thu, Jan 21, 2021 at 03:57:03PM +0100, Peter Zijlstra wrote: > On Mon, Jan 11, 2021 at 05:10:47PM +0000, vincent.donnefort@arm.com wrote: > > From: Vincent Donnefort <vincent.donnefort@arm.com> > > > > After the AP brought itself down to CPUHP_TEARDOWN_CPU, the BP will finish > > the job. The steps left are as followed: > > > > +--------------------+ > > | CPUHP_TEARDOWN_CPU | -> If fails state is CPUHP_TEARDOWN_CPU > > +--------------------+ > > | ATOMIC STATES | -> Cannot Fail > > +--------------------+ > > | CPUHP_BRINGUP_CPU | -> Cannot fail > > +--------------------+ > > | ... | > > | ... | -> Can fail and rollback > > These are the PREPARE/DEAD states, right? It would be _really_ daft for > a DEAD notifier to fail. But yeah, I suppose that if it does, it will > indeed end up in CPUHP_AP_ONLINE_IDLE. > > Do we want to WARN when a DEAD notifier fails? > > Indeed, I couldn't find a dead callback which can return an error. So I suppose we could go for another strategy here, that would be to not allow failure for the dead states (i.e states < BRINGUP_CPU when hotunplug). In that case, the fail interface should probably disallow selecting those states and a WARN here would be good. -- Vincent ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2021-01-21 16:57 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-11 17:10 [PATCH 0/4] cpu/hotplug: rollback and "fail" interface fixes vincent.donnefort 2021-01-11 17:10 ` [PATCH 1/4] cpu/hotplug: Allowing to reset fail injection vincent.donnefort 2021-01-11 17:10 ` [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in " vincent.donnefort 2021-01-20 12:58 ` Peter Zijlstra 2021-01-20 15:17 ` Vincent Donnefort 2021-01-20 17:30 ` Peter Zijlstra 2021-01-11 17:10 ` [PATCH 3/4] cpu/hotplug: Add cpuhp_invoke_callback_range() vincent.donnefort 2021-01-20 13:11 ` Peter Zijlstra 2021-01-20 17:54 ` Peter Zijlstra 2021-01-20 17:45 ` Peter Zijlstra 2021-01-20 17:53 ` Peter Zijlstra 2021-01-21 10:57 ` Vincent Donnefort 2021-01-21 11:18 ` Peter Zijlstra 2021-01-20 17:49 ` Peter Zijlstra 2021-01-11 17:10 ` [PATCH 4/4] cpu/hotplug: Fix CPU down rollback vincent.donnefort 2021-01-21 14:57 ` Peter Zijlstra 2021-01-21 15:43 ` Vincent Donnefort
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).