linux-kernel.vger.kernel.org archive mirror
* [PATCH v6] cpu/hotplug: Do not bail-out in DYING/STARTING sections
@ 2022-09-27 10:12 Vincent Donnefort
  2022-11-15 11:06 ` Thorsten Leemhuis
  2022-12-02 11:50 ` [tip: smp/core] " tip-bot2 for Vincent Donnefort
  0 siblings, 2 replies; 3+ messages in thread
From: Vincent Donnefort @ 2022-09-27 10:12 UTC
  To: peterz, tglx
  Cc: linux-kernel, vschneid, regressions, kernel-team,
	Vincent Donnefort, Derek Dolney

The DYING/STARTING callbacks are not expected to fail. However, as reported
by Derek, drivers such as tboot are still free to return errors within
those sections, which halts the hot(un)plug and leaves the CPU in an
unrecoverable state.

Since no rollback is possible there, only log the failures and proceed
with the remaining steps. This restores the hotplug behaviour prior to
commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()").

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867
Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
Reported-by: Derek Dolney <z23@posteo.net>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Derek Dolney <z23@posteo.net>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>

---

v5 -> v6:
   - Collect Reviewed-by
v4 -> v5:
   - Remove WARN, only log broken states with pr_warn.
v3 -> v4:
   - Sorry ... wrong commit description style ...
v2 -> v3:
   - Tested-by tag.
   - Refine commit description.
   - Bugzilla link.
v1 -> v2:
   - Commit message rewording.
   - More details in the warnings.
   - Some variable renaming
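
For readers who want the control flow in isolation, here is a minimal
userspace sketch of the loop the diff below reworks. It is not kernel
code and only approximates the idea: step_fn, step_ok, step_broken and
invoke_range are hypothetical names made up for illustration, a plain
array stands in for the cpuhp state walk, and fprintf() stands in for
pr_warn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a hotplug step callback; not the kernel type. */
typedef int (*step_fn)(unsigned int cpu);

/* Toy callbacks: one succeeds, one fails with an arbitrary error value. */
static int step_ok(unsigned int cpu)     { (void)cpu; return 0; }
static int step_broken(unsigned int cpu) { (void)cpu; return -5; }

/*
 * Invoke steps[0..nr_steps) in order and store the number of invoked
 * steps in *done.
 *
 * nofail == false: stop at the first failing step and return its error,
 *                  so the caller can roll back.
 * nofail == true:  log each failure, keep going to the end, and return
 *                  -1 if any step failed (no rollback is possible).
 */
static int invoke_range(const step_fn *steps, int nr_steps,
			unsigned int cpu, bool nofail, int *done)
{
	int ret = 0;

	assert(steps && done);
	*done = 0;

	for (int i = 0; i < nr_steps; i++) {
		int err = steps[i](cpu);

		(*done)++;
		if (!err)
			continue;

		if (nofail) {
			fprintf(stderr, "CPU %u step %d failed (%d)\n",
				cpu, i, err);
			ret = -1;
		} else {
			ret = err;
			break;
		}
	}

	return ret;
}
```

With steps = { step_ok, step_broken, step_ok }, the nofail == false
variant stops after the second step and hands back the callback's
error, while nofail == true still runs the third step and only reports
that something went wrong. The second behaviour is what the
DYING/STARTING sections need, since stopping halfway leaves the CPU
stuck between states.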

diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..621e5af42d57 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -663,21 +663,51 @@ static bool cpuhp_next_state(bool bringup,
 	return true;
 }
 
-static int cpuhp_invoke_callback_range(bool bringup,
-				       unsigned int cpu,
-				       struct cpuhp_cpu_state *st,
-				       enum cpuhp_state target)
+static int __cpuhp_invoke_callback_range(bool bringup,
+					 unsigned int cpu,
+					 struct cpuhp_cpu_state *st,
+					 enum cpuhp_state target,
+					 bool nofail)
 {
 	enum cpuhp_state state;
-	int err = 0;
+	int ret = 0;
 
 	while (cpuhp_next_state(bringup, &state, st, target)) {
+		int err;
+
 		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
-		if (err)
+		if (!err)
+			continue;
+
+		if (nofail) {
+			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
+				cpu, bringup ? "UP" : "DOWN",
+				cpuhp_get_step(st->state)->name,
+				st->state, err);
+			ret = -1;
+		} else {
+			ret = err;
 			break;
+		}
 	}
 
-	return err;
+	return ret;
+}
+
+static inline int cpuhp_invoke_callback_range(bool bringup,
+					      unsigned int cpu,
+					      struct cpuhp_cpu_state *st,
+					      enum cpuhp_state target)
+{
+	return __cpuhp_invoke_callback_range(bringup, cpu, st, target, false);
+}
+
+static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
+						      unsigned int cpu,
+						      struct cpuhp_cpu_state *st,
+						      enum cpuhp_state target)
+{
+	__cpuhp_invoke_callback_range(bringup, cpu, st, target, true);
 }
 
 static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
@@ -999,7 +1029,6 @@ static int take_cpu_down(void *_param)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
 	int err, cpu = smp_processor_id();
-	int ret;
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
@@ -1012,13 +1041,11 @@ static int take_cpu_down(void *_param)
 	 */
 	WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
 
-	/* Invoke the former CPU_DYING callbacks */
-	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
-
 	/*
+	 * Invoke the former CPU_DYING callbacks
 	 * DYING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(false, cpu, st, target);
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -1296,16 +1323,14 @@ void notify_cpu_starting(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
-	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
-	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
 
 	/*
 	 * STARTING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(true, cpu, st, target);
 }
 
 /*
-- 
2.37.3.998.g577e59143f-goog



* Re: [PATCH v6] cpu/hotplug: Do not bail-out in DYING/STARTING sections
  2022-09-27 10:12 [PATCH v6] cpu/hotplug: Do not bail-out in DYING/STARTING sections Vincent Donnefort
@ 2022-11-15 11:06 ` Thorsten Leemhuis
  2022-12-02 11:50 ` [tip: smp/core] " tip-bot2 for Vincent Donnefort
  1 sibling, 0 replies; 3+ messages in thread
From: Thorsten Leemhuis @ 2022-11-15 11:06 UTC
  To: peterz, tglx
  Cc: linux-kernel, vschneid, kernel-team, Derek Dolney, Vincent Donnefort

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Peter, Thomas, what's the holdup with this patch?

I'm asking because I have the linked issue on the list of tracked
regressions and there hasn't really been any progress for weeks now, afaics
(if I missed anything, please let me know). I'm getting quite close to
the point where my only remaining option is "get Linus to look into
this", but I'd like to avoid that.

Ciao, Thorsten



* [tip: smp/core] cpu/hotplug: Do not bail-out in DYING/STARTING sections
  2022-09-27 10:12 [PATCH v6] cpu/hotplug: Do not bail-out in DYING/STARTING sections Vincent Donnefort
  2022-11-15 11:06 ` Thorsten Leemhuis
@ 2022-12-02 11:50 ` tip-bot2 for Vincent Donnefort
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Vincent Donnefort @ 2022-12-02 11:50 UTC
  To: linux-tip-commits
  Cc: Derek Dolney, Vincent Donnefort, Thomas Gleixner,
	Valentin Schneider, x86, linux-kernel

The following commit has been merged into the smp/core branch of tip:

Commit-ID:     6f855b39e4602b6b42a8e5cbcfefb8a1b8b5f0be
Gitweb:        https://git.kernel.org/tip/6f855b39e4602b6b42a8e5cbcfefb8a1b8b5f0be
Author:        Vincent Donnefort <vdonnefort@google.com>
AuthorDate:    Tue, 27 Sep 2022 11:12:59 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 02 Dec 2022 12:43:02 +01:00

cpu/hotplug: Do not bail-out in DYING/STARTING sections

The DYING/STARTING callbacks are not expected to fail. However, as reported
by Derek, buggy drivers such as tboot are still free to return errors
within those sections, which halts the hot(un)plug and leaves the CPU in an
unrecoverable state.

As there is no rollback possible, only log the failures and proceed with
the following steps.

This restores the hotplug behaviour prior to commit 453e41085183
("cpu/hotplug: Add cpuhp_invoke_callback_range()").

Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
Reported-by: Derek Dolney <z23@posteo.net>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Derek Dolney <z23@posteo.net>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867
Link: https://lore.kernel.org/r/20220927101259.1149636-1-vdonnefort@google.com

---
 kernel/cpu.c | 56 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3f704a8..6c0a92c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -663,21 +663,51 @@ static bool cpuhp_next_state(bool bringup,
 	return true;
 }
 
-static int cpuhp_invoke_callback_range(bool bringup,
-				       unsigned int cpu,
-				       struct cpuhp_cpu_state *st,
-				       enum cpuhp_state target)
+static int __cpuhp_invoke_callback_range(bool bringup,
+					 unsigned int cpu,
+					 struct cpuhp_cpu_state *st,
+					 enum cpuhp_state target,
+					 bool nofail)
 {
 	enum cpuhp_state state;
-	int err = 0;
+	int ret = 0;
 
 	while (cpuhp_next_state(bringup, &state, st, target)) {
+		int err;
+
 		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
-		if (err)
+		if (!err)
+			continue;
+
+		if (nofail) {
+			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
+				cpu, bringup ? "UP" : "DOWN",
+				cpuhp_get_step(st->state)->name,
+				st->state, err);
+			ret = -1;
+		} else {
+			ret = err;
 			break;
+		}
 	}
 
-	return err;
+	return ret;
+}
+
+static inline int cpuhp_invoke_callback_range(bool bringup,
+					      unsigned int cpu,
+					      struct cpuhp_cpu_state *st,
+					      enum cpuhp_state target)
+{
+	return __cpuhp_invoke_callback_range(bringup, cpu, st, target, false);
+}
+
+static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
+						      unsigned int cpu,
+						      struct cpuhp_cpu_state *st,
+						      enum cpuhp_state target)
+{
+	__cpuhp_invoke_callback_range(bringup, cpu, st, target, true);
 }
 
 static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
@@ -999,7 +1029,6 @@ static int take_cpu_down(void *_param)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
 	int err, cpu = smp_processor_id();
-	int ret;
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
@@ -1012,13 +1041,10 @@ static int take_cpu_down(void *_param)
 	 */
 	WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
 
-	/* Invoke the former CPU_DYING callbacks */
-	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
-
 	/*
-	 * DYING must not fail!
+	 * Invoke the former CPU_DYING callbacks. DYING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(false, cpu, st, target);
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -1296,16 +1322,14 @@ void notify_cpu_starting(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
-	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
-	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
 
 	/*
 	 * STARTING must not fail!
 	 */
-	WARN_ON_ONCE(ret);
+	cpuhp_invoke_callback_range_nofail(true, cpu, st, target);
 }
 
 /*
