linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed
@ 2021-04-09  5:53 Dongli Zhang
  2021-04-13 11:58 ` Qais Yousef
  2021-08-10 16:33 ` [tip: smp/core] cpu/hotplug: Add debug printks for hotplug callback failures tip-bot2 for Dongli Zhang
  0 siblings, 2 replies; 3+ messages in thread
From: Dongli Zhang @ 2021-04-09  5:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, peterz, qais.yousef, mpe, paulmck, npiggin, frederic, ethp,
	joe.jin, dongli.zhang

During bootup or cpu hotplug, the cpuhp_up_callbacks() or
cpuhp_down_callbacks() call many CPUHP callbacks (e.g., perf, mm,
workqueue, RCU, kvmclock and more) for each cpu to online/offline. It may
roll back to its previous state if any of callbacks is failed. As a result,
the user will not be able to know which callback is failed and usually the
only symptom is cpu online/offline failure.

This patch is to print more debug log to help user narrow down where is the
root cause.

Below is the example that how the patch helps narrow down the root cause
for the issue fixed by commit d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64
can't be online/hotpluged").

We will have below dynamic debug log once we add
dyndbg="file kernel/cpu.c +p" to kernel command line and when issue is
reproduced.

"CPUHP up callback failure (-12) for cpu 64 at kvmclock:setup_percpu (66)"

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1 RFC:
  - use pr_debug() but not pr_err_once() (suggested by Qais Yousef)
  - print log for cpuhp_down_callbacks() as well (suggested by Qais Yousef)

 kernel/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1b6302ecbabe..bcd4dd7de9c3 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -621,6 +621,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
+			pr_debug("CPUHP up callback failure (%d) for cpu %u at %s (%d)\n",
+				 ret, cpu, cpuhp_get_step(st->state)->name,
+				 st->state);
+
 			if (can_rollback_cpu(st)) {
 				st->target = prev_state;
 				undo_cpu_up(cpu, st);
@@ -990,6 +994,10 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 	for (; st->state > target; st->state--) {
 		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 		if (ret) {
+			pr_debug("CPUHP down callback failure (%d) for cpu %u at %s (%d)\n",
+				 ret, cpu, cpuhp_get_step(st->state)->name,
+				 st->state);
+
 			st->target = prev_state;
 			if (st->state < prev_state)
 				undo_cpu_down(cpu, st);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed
  2021-04-09  5:53 [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed Dongli Zhang
@ 2021-04-13 11:58 ` Qais Yousef
  2021-08-10 16:33 ` [tip: smp/core] cpu/hotplug: Add debug printks for hotplug callback failures tip-bot2 for Dongli Zhang
  1 sibling, 0 replies; 3+ messages in thread
From: Qais Yousef @ 2021-04-13 11:58 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: linux-kernel, tglx, peterz, mpe, paulmck, npiggin, frederic,
	ethp, joe.jin

On 04/08/21 22:53, Dongli Zhang wrote:
> During bootup or cpu hotplug, the cpuhp_up_callbacks() or
> cpuhp_down_callbacks() call many CPUHP callbacks (e.g., perf, mm,
> workqueue, RCU, kvmclock and more) for each cpu to online/offline. It may
> roll back to its previous state if any of callbacks is failed. As a result,
> the user will not be able to know which callback is failed and usually the
> only symptom is cpu online/offline failure.
> 
> This patch is to print more debug log to help user narrow down where is the
> root cause.
> 
> Below is the example that how the patch helps narrow down the root cause
> for the issue fixed by commit d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64
> can't be online/hotpluged").
> 
> We will have below dynamic debug log once we add
> dyndbg="file kernel/cpu.c +p" to kernel command line and when issue is
> reproduced.

You can also enable it at runtime

echo "file kernel/cpu.c +p" > /sys/kernel/debug/dynamic_debug/control

> 
> "CPUHP up callback failure (-12) for cpu 64 at kvmclock:setup_percpu (66)"
> 
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---

I don't see the harm in adding the debug if some find it useful.

FWIW

Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Cheers

--
Qais Yousef

> Changed since v1 RFC:
>   - use pr_debug() but not pr_err_once() (suggested by Qais Yousef)
>   - print log for cpuhp_down_callbacks() as well (suggested by Qais Yousef)
> 
>  kernel/cpu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 1b6302ecbabe..bcd4dd7de9c3 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -621,6 +621,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
>  		st->state++;
>  		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
>  		if (ret) {
> +			pr_debug("CPUHP up callback failure (%d) for cpu %u at %s (%d)\n",
> +				 ret, cpu, cpuhp_get_step(st->state)->name,
> +				 st->state);
> +
>  			if (can_rollback_cpu(st)) {
>  				st->target = prev_state;
>  				undo_cpu_up(cpu, st);
> @@ -990,6 +994,10 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
>  	for (; st->state > target; st->state--) {
>  		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
>  		if (ret) {
> +			pr_debug("CPUHP down callback failure (%d) for cpu %u at %s (%d)\n",
> +				 ret, cpu, cpuhp_get_step(st->state)->name,
> +				 st->state);
> +
>  			st->target = prev_state;
>  			if (st->state < prev_state)
>  				undo_cpu_down(cpu, st);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip: smp/core] cpu/hotplug: Add debug printks for hotplug callback failures
  2021-04-09  5:53 [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed Dongli Zhang
  2021-04-13 11:58 ` Qais Yousef
@ 2021-08-10 16:33 ` tip-bot2 for Dongli Zhang
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Dongli Zhang @ 2021-08-10 16:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dongli Zhang, Thomas Gleixner, Qais Yousef, x86, linux-kernel

The following commit has been merged into the smp/core branch of tip:

Commit-ID:     ebca71a8c96f0af2ba482489ecc64d88979cd825
Gitweb:        https://git.kernel.org/tip/ebca71a8c96f0af2ba482489ecc64d88979cd825
Author:        Dongli Zhang <dongli.zhang@oracle.com>
AuthorDate:    Thu, 08 Apr 2021 22:53:16 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 10 Aug 2021 18:31:32 +02:00

cpu/hotplug: Add debug printks for hotplug callback failures

CPU hotplug callbacks can fail and cause a rollback to the previous
state. These failures are silent and therefore hard to debug.

Add pr_debug() to the up and down paths which provide information about the
error code, the CPU and the failed state. The debug printks can be enabled
via kernel command line or sysfs.

[ tglx: Adopt to current mainline, massage printk and changelog ]

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210409055316.1709-1-dongli.zhang@oracle.com

---
 kernel/cpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 7ef28e1..192e43a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -692,6 +692,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 
 	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
 	if (ret) {
+		pr_debug("CPU UP failed (%d) CPU %u state %s (%d)\n",
+			 ret, cpu, cpuhp_get_step(st->state)->name,
+			 st->state);
+
 		cpuhp_reset_state(st, prev_state);
 		if (can_rollback_cpu(st))
 			WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
@@ -1091,6 +1095,9 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 
 	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
 	if (ret) {
+		pr_debug("CPU DOWN failed (%d) CPU %u state %s (%d)\n",
+			 ret, cpu, cpuhp_get_step(st->state)->name,
+			 st->state);
 
 		cpuhp_reset_state(st, prev_state);
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-08-10 16:33 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-09  5:53 [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed Dongli Zhang
2021-04-13 11:58 ` Qais Yousef
2021-08-10 16:33 ` [tip: smp/core] cpu/hotplug: Add debug printks for hotplug callback failures tip-bot2 for Dongli Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).