linux-kernel.vger.kernel.org archive mirror
* [PATCH v4] cpu/hotplug: Do not bail-out in DYING/STARTING sections
@ 2022-07-04 13:13 Vincent Donnefort
  2022-07-19 15:12 ` Valentin Schneider
  0 siblings, 1 reply; 4+ messages in thread
From: Vincent Donnefort @ 2022-07-04 13:13 UTC (permalink / raw)
  To: peterz, tglx
  Cc: linux-kernel, vschneid, regressions, kernel-team,
	Vincent Donnefort, Derek Dolney

The DYING/STARTING callbacks are not expected to fail. However, as reported
by Derek, drivers such as tboot are still free to return errors within
those sections, which halts the hot(un)plug and leaves the CPU in an
unrecoverable state.

No rollback being possible there, let's only log the failures and proceed
with the following steps. This restores the hotplug behaviour prior to
commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()").

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867
Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
Reported-by: Derek Dolney <z23@posteo.net>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Derek Dolney <z23@posteo.net>

---
v3 -> v4:
   - Sorry ... wrong commit description style ...
v2 -> v3:
   - Tested-by tag.
   - Refine commit description.
   - Bugzilla link.
v1 -> v2:
   - Commit message rewording.
   - More details in the warnings.
   - Some variable renaming.

diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..c3617683459e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -663,21 +663,51 @@ static bool cpuhp_next_state(bool bringup,
 	return true;
 }
 
-static int cpuhp_invoke_callback_range(bool bringup,
-				       unsigned int cpu,
-				       struct cpuhp_cpu_state *st,
-				       enum cpuhp_state target)
+static int _cpuhp_invoke_callback_range(bool bringup,
+					unsigned int cpu,
+					struct cpuhp_cpu_state *st,
+					enum cpuhp_state target,
+					bool nofail)
 {
 	enum cpuhp_state state;
-	int err = 0;
+	int ret = 0;
 
 	while (cpuhp_next_state(bringup, &state, st, target)) {
+		int err;
+
 		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
-		if (err)
+		if (!err)
+			continue;
+
+		if (nofail) {
+			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
+				cpu, bringup ? "UP" : "DOWN",
+				cpuhp_get_step(st->state)->name,
+				st->state, err);
+			ret = -1;
+		} else {
+			ret = err;
 			break;
+		}
 	}
 
-	return err;
+	return ret;
+}
+
+static inline int cpuhp_invoke_callback_range(bool bringup,
+					      unsigned int cpu,
+					      struct cpuhp_cpu_state *st,
+					      enum cpuhp_state target)
+{
+	return _cpuhp_invoke_callback_range(bringup, cpu, st, target, false);
+}
+
+static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
+						      unsigned int cpu,
+						      struct cpuhp_cpu_state *st,
+						      enum cpuhp_state target)
+{
+	WARN_ON_ONCE(_cpuhp_invoke_callback_range(bringup, cpu, st, target, true));
 }
 
 static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
@@ -999,7 +1029,6 @@ static int take_cpu_down(void *_param)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
 	int err, cpu = smp_processor_id();
-	int ret;
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
@@ -1012,13 +1041,11 @@ static int take_cpu_down(void *_param)
 	 */
 	WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
 
-	/* Invoke the former CPU_DYING callbacks */
-	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
-
 	/*
+	 * Invoke the former CPU_DYING callbacks
 	 * DYING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(false, cpu, st, target);
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -1296,16 +1323,14 @@ void notify_cpu_starting(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
-	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
-	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
 
 	/*
 	 * STARTING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(true, cpu, st, target);
 }
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog



* Re: [PATCH v4] cpu/hotplug: Do not bail-out in DYING/STARTING sections
  2022-07-04 13:13 [PATCH v4] cpu/hotplug: Do not bail-out in DYING/STARTING sections Vincent Donnefort
@ 2022-07-19 15:12 ` Valentin Schneider
  2022-07-19 15:48   ` Vincent Donnefort
  0 siblings, 1 reply; 4+ messages in thread
From: Valentin Schneider @ 2022-07-19 15:12 UTC (permalink / raw)
  To: Vincent Donnefort, peterz, tglx
  Cc: linux-kernel, regressions, kernel-team, Vincent Donnefort, Derek Dolney

On 04/07/22 14:13, Vincent Donnefort wrote:
> +static int _cpuhp_invoke_callback_range(bool bringup,
> +					unsigned int cpu,
> +					struct cpuhp_cpu_state *st,
> +					enum cpuhp_state target,
> +					bool nofail)
[...]
> +		if (nofail) {
> +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
> +				cpu, bringup ? "UP" : "DOWN",
> +				cpuhp_get_step(st->state)->name,
> +				st->state, err);
> +			ret = -1;

On a single failure we'll get two warns (WARN_ON_ONCE() + pr_warn(), and
then subsequently just the pr_warn()), is that intended?

Also, why not have ret = err here?

> +		} else {
> +			ret = err;
>                       break;
> +		}
>       }
>
> -	return err;
> +	return ret;

> +static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
> +						      unsigned int cpu,
> +						      struct cpuhp_cpu_state *st,
> +						      enum cpuhp_state target)
> +{
> +	WARN_ON_ONCE(_cpuhp_invoke_callback_range(bringup, cpu, st, target, true));
>  }
>



* Re: [PATCH v4] cpu/hotplug: Do not bail-out in DYING/STARTING sections
  2022-07-19 15:12 ` Valentin Schneider
@ 2022-07-19 15:48   ` Vincent Donnefort
  2022-07-22 18:35     ` Valentin Schneider
  0 siblings, 1 reply; 4+ messages in thread
From: Vincent Donnefort @ 2022-07-19 15:48 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: peterz, tglx, linux-kernel, regressions, kernel-team, Derek Dolney

On Tue, Jul 19, 2022 at 04:12:03PM +0100, Valentin Schneider wrote:
> On 04/07/22 14:13, Vincent Donnefort wrote:
> > +static int _cpuhp_invoke_callback_range(bool bringup,
> > +					unsigned int cpu,
> > +					struct cpuhp_cpu_state *st,
> > +					enum cpuhp_state target,
> > +					bool nofail)
> [...]
> > +		if (nofail) {
> > +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
> > +				cpu, bringup ? "UP" : "DOWN",
> > +				cpuhp_get_step(st->state)->name,
> > +				st->state, err);
> > +			ret = -1;
> 
> On a single failure we'll get two warns (WARN_ON_ONCE() + pr_warn(), and
> then subsequently just the pr_warn()), is that intended?

It is; this is to keep the backtrace that used to be here... but now, giving it
a second thought, we can probably get rid of it and just keep the pr_warn()?

> 
> Also, why not have ret = err here?

If two states fail, the ret wouldn't mean much, hence a default "-1" just for
the WARN_ON_ONCE(). But if we drop the latter, that would simplify the problem
of knowing which error code to return.

> 
> > +		} else {
> > +			ret = err;
> >                       break;
> > +		}
> >       }
> >
> > -	return err;
> > +	return ret;
> 
> > +static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
> > +						      unsigned int cpu,
> > +						      struct cpuhp_cpu_state *st,
> > +						      enum cpuhp_state target)
> > +{
> > +	WARN_ON_ONCE(_cpuhp_invoke_callback_range(bringup, cpu, st, target, true));
> >  }
> >
> 


* Re: [PATCH v4] cpu/hotplug: Do not bail-out in DYING/STARTING sections
  2022-07-19 15:48   ` Vincent Donnefort
@ 2022-07-22 18:35     ` Valentin Schneider
  0 siblings, 0 replies; 4+ messages in thread
From: Valentin Schneider @ 2022-07-22 18:35 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, tglx, linux-kernel, regressions, kernel-team, Derek Dolney

On 19/07/22 16:48, Vincent Donnefort wrote:
> On Tue, Jul 19, 2022 at 04:12:03PM +0100, Valentin Schneider wrote:
>> On 04/07/22 14:13, Vincent Donnefort wrote:
>> > +static int _cpuhp_invoke_callback_range(bool bringup,
>> > +					unsigned int cpu,
>> > +					struct cpuhp_cpu_state *st,
>> > +					enum cpuhp_state target,
>> > +					bool nofail)
>> [...]
>> > +		if (nofail) {
>> > +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
>> > +				cpu, bringup ? "UP" : "DOWN",
>> > +				cpuhp_get_step(st->state)->name,
>> > +				st->state, err);
>> > +			ret = -1;
>>
>> On a single failure we'll get two warns (WARN_ON_ONCE() + pr_warn(), and
>> then subsequently just the pr_warn()), is that intended?
>
> It is; this is to keep the backtrace that used to be here... but now, giving it
> a second thought, we can probably get rid of it and just keep the pr_warn()?
>
>>
>> Also, why not have ret = err here?
>
> If two states fail, the ret wouldn't mean much, hence a default "-1" just for
> the WARN_ON_ONCE().

Right

> But if we drop the latter, that would simplify the problem of
> knowing which error code to return.
>

We need to drop one of the two, the pr_warn() will probably be more useful
if/when we need to debug this, so go for it.
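
For illustration only, here is a minimal userspace sketch of the two
behaviours discussed in this thread. It is not the kernel code: struct step,
invoke_range(), demo_steps and the error value -5 are all hypothetical names
and values made up for the example, and fprintf() stands in for pr_warn(). It
assumes the simplification agreed above, i.e. in nofail mode each failure is
only logged and no aggregate error code is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one hotplug state and its callback. */
struct step {
	const char *name;
	int (*fn)(unsigned int cpu);
};

static int step_ok(unsigned int cpu)   { (void)cpu; return 0; }
static int step_fail(unsigned int cpu) { (void)cpu; return -5; }

/* Demo table (assumption for the example): the middle step fails. */
static const struct step demo_steps[] = {
	{ "first",  step_ok   },
	{ "second", step_fail },
	{ "third",  step_ok   },
};

/*
 * Invoke steps[0..n) in order and count how many were invoked.
 *
 * nofail == false: stop at the first failing step and return its error.
 * nofail == true:  log each failure, keep going, and return 0, since no
 *                  rollback is possible and a single code would be
 *                  meaningless if several steps fail.
 */
static int invoke_range(unsigned int cpu, const struct step *steps,
			size_t n, bool nofail, size_t *invoked)
{
	int ret = 0;

	assert(steps && invoked);
	*invoked = 0;

	for (size_t i = 0; i < n; i++) {
		int err = steps[i].fn(cpu);

		(*invoked)++;
		if (!err)
			continue;

		if (nofail) {
			/* Stand-in for pr_warn() in the patch. */
			fprintf(stderr, "CPU %u step %s failed (%d)\n",
				cpu, steps[i].name, err);
		} else {
			ret = err;
			break;
		}
	}

	return ret;
}
```

With demo_steps, a call with nofail set to false stops after the second step
and returns its error, while a call with nofail set to true logs the failure
and still runs the third step.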


