* [PATCH v3 0/3] rcu: Display registers of self-detected stall as far as possible
@ 2022-07-30 10:23 Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first Zhen Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Zhen Lei @ 2022-07-30 10:23 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu
  Cc: Zhen Lei

v2 --> v3:
1. Patch 1: add trigger_single_cpu_backtrace(cpu) in synchronize_rcu_expedited_wait().
   With that change, every caller of dump_cpu_task() tries
   trigger_single_cpu_backtrace() first, so Patch 2 does the cleanup.
2. Patch 3: as Paul E. McKenney suggested, push the code into dump_cpu_task().

For newcomers:
Currently, dump_cpu_task() is mainly used by RCU to dump the stack trace
of the current task of the specified CPU when an RCU stall is detected.

For architectures that do not support NMI, registers are not printed
when an RCU stall is self-detected. This patch series improves that.
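
As a rough illustration of where the series ends up, the order in which
dump_cpu_task() tries its options after all three patches can be modeled
in user space as below. This is only a sketch: struct model_state, enum
dump_path and model_dump_cpu_task() are hypothetical names invented for
the model, and each field merely stands in for the kernel call named in
its comment.

```c
#include <stdbool.h>

/* Which path the modeled dump_cpu_task() ends up taking. */
enum dump_path {
	DUMP_SHOW_REGS,       /* show_regs(get_irq_regs()) on the local CPU */
	DUMP_NMI_BACKTRACE,   /* trigger_single_cpu_backtrace() succeeded */
	DUMP_SCHED_SHOW_TASK  /* fallback: sched_show_task(cpu_curr(cpu)) */
};

/* Hypothetical stand-in for the kernel state the real function consults. */
struct model_state {
	int this_cpu;         /* models smp_processor_id() */
	bool in_hardirq;      /* models in_hardirq() */
	bool have_irq_regs;   /* models get_irq_regs() != NULL */
	bool arch_has_nmi;    /* models trigger_single_cpu_backtrace() returning true */
};

/* Same decision order as dump_cpu_task() after patch 3 of this series. */
enum dump_path model_dump_cpu_task(const struct model_state *s, int cpu)
{
	/* Self-detected case: registers saved at interrupt entry are available. */
	if (cpu == s->this_cpu && s->in_hardirq && s->have_irq_regs)
		return DUMP_SHOW_REGS;

	/* Otherwise prefer an NMI backtrace when the architecture offers one. */
	if (s->arch_has_nmi)
		return DUMP_NMI_BACKTRACE;

	/* Last resort: unwind from the state saved at the last switch-out. */
	return DUMP_SCHED_SHOW_TASK;
}
```

In this model, a self-detected stall on a CPU without NMI support yields
DUMP_SHOW_REGS instead of DUMP_SCHED_SHOW_TASK, which is the improvement
the series is after.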


v2:
https://lkml.org/lkml/2022/7/27/1800

Zhen Lei (3):
  rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
  sched/debug: Show the registers of 'current' in dump_cpu_task()

 kernel/rcu/tree_stall.h |  8 +++-----
 kernel/sched/core.c     | 14 ++++++++++++++
 kernel/smp.c            |  3 +--
 3 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.25.1



* [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  2022-07-30 10:23 [PATCH v3 0/3] rcu: Display registers of self-detected stall as far as possible Zhen Lei
@ 2022-07-30 10:23 ` Zhen Lei
  2022-08-01 23:14   ` Paul E. McKenney
  2022-07-30 10:23 ` [PATCH v3 2/3] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 3/3] sched/debug: Show the registers of 'current' " Zhen Lei
  2 siblings, 1 reply; 8+ messages in thread
From: Zhen Lei @ 2022-07-30 10:23 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu
  Cc: Zhen Lei

The backtrace of cpu_curr(other_cpu) is unwound based on the 'fp' saved
during its last switch-out, so for the most part it is out of date. Try
to use NMI to get the backtrace first, just as functions in
"tree_stall.h" such as rcu_dump_cpu_stacks() already do.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_exp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0f70f62039a9090..21381697de23f0b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
 				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(READ_ONCE(rnp->expmask) & mask))
 					continue;
-				dump_cpu_task(cpu);
+				if (!trigger_single_cpu_backtrace(cpu))
+					dump_cpu_task(cpu);
 			}
 		}
 		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
-- 
2.25.1



* [PATCH v3 2/3] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
  2022-07-30 10:23 [PATCH v3 0/3] rcu: Display registers of self-detected stall as far as possible Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first Zhen Lei
@ 2022-07-30 10:23 ` Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 3/3] sched/debug: Show the registers of 'current' " Zhen Lei
  2 siblings, 0 replies; 8+ messages in thread
From: Zhen Lei @ 2022-07-30 10:23 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu
  Cc: Zhen Lei

Function trigger_single_cpu_backtrace() uses NMI to dump the stack trace
of another CPU, so it should really be one of the ways dump_cpu_task() is
implemented. Try it first in dump_cpu_task(). At the same time, remove
the now-duplicated calls from the upper-layer functions.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
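A small user-space model may help show why the cleanup in the callers is
behavior-preserving. This is a sketch under stated assumptions: the
stub_*() functions, the nmi_supported knob and the two counters are
hypothetical and only imitate trigger_single_cpu_backtrace() and
sched_show_task(); none of this is kernel code.

```c
#include <stdbool.h>

bool nmi_supported;   /* hypothetical knob: does the "arch" have NMI? */
int nmi_dumps;        /* how many NMI backtraces were "sent" */
int task_dumps;       /* how many sched_show_task() fallbacks ran */

void reset_counts(void)
{
	nmi_dumps = 0;
	task_dumps = 0;
}

/* Stub for trigger_single_cpu_backtrace(): true if an NMI was sent. */
static bool stub_trigger_single_cpu_backtrace(int cpu)
{
	(void)cpu;
	if (!nmi_supported)
		return false;
	nmi_dumps++;
	return true;
}

/* Stub for sched_show_task(cpu_curr(cpu)). */
static void stub_sched_show_task(int cpu)
{
	(void)cpu;
	task_dumps++;
}

/* Before: dump_cpu_task() only falls back; callers open-code the NMI try. */
static void old_dump_cpu_task(int cpu)
{
	stub_sched_show_task(cpu);
}

void old_caller(int cpu)
{
	if (!stub_trigger_single_cpu_backtrace(cpu))
		old_dump_cpu_task(cpu);
}

/* After: the NMI attempt lives inside dump_cpu_task() itself. */
static void new_dump_cpu_task(int cpu)
{
	if (stub_trigger_single_cpu_backtrace(cpu))
		return;
	stub_sched_show_task(cpu);
}

void new_caller(int cpu)
{
	new_dump_cpu_task(cpu);
}
```

With either setting of nmi_supported, old_caller() and new_caller()
leave the counters in the same state, which is the property the
call-site changes below rely on.
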
 kernel/rcu/tree_exp.h   | 3 +--
 kernel/rcu/tree_stall.h | 8 +++-----
 kernel/sched/core.c     | 3 +++
 kernel/smp.c            | 3 +--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 21381697de23f0b..0f70f62039a9090 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -665,8 +665,7 @@ static void synchronize_rcu_expedited_wait(void)
 				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(READ_ONCE(rnp->expmask) & mask))
 					continue;
-				if (!trigger_single_cpu_backtrace(cpu))
-					dump_cpu_task(cpu);
+				dump_cpu_task(cpu);
 			}
 		}
 		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a001e1e7a99269c..80749d257ac2f78 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -368,7 +368,7 @@ static void rcu_dump_cpu_stacks(void)
 			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
 				if (cpu_is_offline(cpu))
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
-				else if (!trigger_single_cpu_backtrace(cpu))
+				else
 					dump_cpu_task(cpu);
 			}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -486,8 +486,7 @@ static void rcuc_kthread_dump(struct rcu_data *rdp)
 
 	pr_err("%s kthread starved for %ld jiffies\n", rcuc->comm, j);
 	sched_show_task(rcuc);
-	if (!trigger_single_cpu_backtrace(cpu))
-		dump_cpu_task(cpu);
+	dump_cpu_task(cpu);
 }
 
 /* Complain about starvation of grace-period kthread.  */
@@ -515,8 +514,7 @@ static void rcu_check_gp_kthread_starvation(void)
 					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
 				} else  {
 					pr_err("Stack dump where RCU GP kthread last ran:\n");
-					if (!trigger_single_cpu_backtrace(cpu))
-						dump_cpu_task(cpu);
+					dump_cpu_task(cpu);
 				}
 			}
 			wake_up_process(gpk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 54e5fb2eeee898c..5942af8728e30e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11112,6 +11112,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 
 void dump_cpu_task(int cpu)
 {
+	if (trigger_single_cpu_backtrace(cpu))
+		return;
+
 	pr_info("Task dump for CPU %d:\n", cpu);
 	sched_show_task(cpu_curr(cpu));
 }
diff --git a/kernel/smp.c b/kernel/smp.c
index dd215f439426449..56ca958364aebeb 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -370,8 +370,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 	if (cpu >= 0) {
 		if (static_branch_unlikely(&csdlock_debug_extended))
 			csd_lock_print_extended(csd, cpu);
-		if (!trigger_single_cpu_backtrace(cpu))
-			dump_cpu_task(cpu);
+		dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
 			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
 			arch_send_call_function_single_ipi(cpu);
-- 
2.25.1



* [PATCH v3 3/3] sched/debug: Show the registers of 'current' in dump_cpu_task()
  2022-07-30 10:23 [PATCH v3 0/3] rcu: Display registers of self-detected stall as far as possible Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first Zhen Lei
  2022-07-30 10:23 ` [PATCH v3 2/3] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() Zhen Lei
@ 2022-07-30 10:23 ` Zhen Lei
  2 siblings, 0 replies; 8+ messages in thread
From: Zhen Lei @ 2022-07-30 10:23 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu
  Cc: Zhen Lei

For architectures that do not support NMI, registers are not printed.
However, this information is useful for analyzing the root cause of the
fault. Fortunately, when the stack trace of 'current' is dumped from an
interrupt handler, we can obtain the registers through get_irq_regs() and
display them through show_regs(). Further, show_regs() unwinds the call
trace based on 'regs', so the worthless call trace associated with
interrupt handling is omitted, which helps us focus on the problem. By
the way, for architectures that support NMI, this also avoids generating
an unnecessary NMI in this case.

This is an example of an RCU self-detected stall on arm64:
[   27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.502238] rcu:     0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
[   27.502632]  (t=1251 jiffies g=2989 q=29 ncpus=4)
[   27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
[   27.504732] Hardware name: linux,dummy-virt (DT)
[   27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.504998] pc : arch_counter_read+0x18/0x24
[   27.505301] lr : arch_counter_read+0x18/0x24
[   27.505328] sp : ffff80000b29bdf0
[   27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
[   27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[   27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
[   27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
[   27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
[   27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
[   27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
[   27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
[   27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
[   27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
[   27.505937] Call trace:
[   27.506002]  arch_counter_read+0x18/0x24
[   27.506171]  ktime_get+0x48/0xa0
[   27.506207]  test_task+0x70/0xf0
[   27.506227]  kthread+0x10c/0x110
[   27.506243]  ret_from_fork+0x10/0x20

The old output is as follows:
[   27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.944980] rcu:     0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
[   27.945407]  (t=1251 jiffies g=2681 q=28 ncpus=4)
[   27.945731] Task dump for CPU 0:
[   27.945844] task:test0           state:R  running task     stack:    0 pid:  306 ppid:     2 flags:0x0000000a
[   27.946073] Call trace:
[   27.946151]  dump_backtrace.part.0+0xc8/0xd4
[   27.946378]  show_stack+0x18/0x70
[   27.946405]  sched_show_task+0x150/0x180
[   27.946427]  dump_cpu_task+0x44/0x54
[   27.947193]  rcu_dump_cpu_stacks+0xec/0x130
[   27.947212]  rcu_sched_clock_irq+0xb18/0xef0
[   27.947231]  update_process_times+0x68/0xac
[   27.947248]  tick_sched_handle+0x34/0x60
[   27.947266]  tick_sched_timer+0x4c/0xa4
[   27.947281]  __hrtimer_run_queues+0x178/0x360
[   27.947295]  hrtimer_interrupt+0xe8/0x244
[   27.947309]  arch_timer_handler_virt+0x38/0x4c
[   27.947326]  handle_percpu_devid_irq+0x88/0x230
[   27.947342]  generic_handle_domain_irq+0x2c/0x44
[   27.947357]  gic_handle_irq+0x44/0xc4
[   27.947376]  call_on_irq_stack+0x2c/0x54
[   27.947415]  do_interrupt_handler+0x80/0x94
[   27.947431]  el1_interrupt+0x34/0x70
[   27.947447]  el1h_64_irq_handler+0x18/0x24
[   27.947462]  el1h_64_irq+0x64/0x68                       <--- the above backtrace is worthless
[   27.947474]  arch_counter_read+0x18/0x24
[   27.947487]  ktime_get+0x48/0xa0
[   27.947501]  test_task+0x70/0xf0
[   27.947520]  kthread+0x10c/0x110
[   27.947538]  ret_from_fork+0x10/0x20

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
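To make the difference between the two outputs above concrete, here is a
toy user-space model of the effect. It is a deliberate simplification:
the real show_regs() unwinds from the pc/fp saved in pt_regs rather than
searching by name, and old_trace[] is an abbreviated copy of the old
output. The name frames_from_irq_regs() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Innermost frame first; abbreviated from the old output in this message. */
const char *const old_trace[] = {
	"dump_backtrace.part.0", "show_stack", "sched_show_task",
	"dump_cpu_task", "rcu_dump_cpu_stacks", "rcu_sched_clock_irq",
	"update_process_times", "el1h_64_irq",
	"arch_counter_read", "ktime_get", "test_task", "kthread",
	"ret_from_fork",
};
const size_t old_trace_len = sizeof(old_trace) / sizeof(old_trace[0]);

/*
 * Model of "unwind from the interrupted context": skip every frame up to
 * and including the IRQ entry frame and return what is left. The number
 * of remaining frames is stored in *out_len. If the entry frame is not
 * found, the whole trace is returned unchanged.
 */
const char *const *frames_from_irq_regs(const char *const *trace, size_t len,
					const char *irq_entry, size_t *out_len)
{
	for (size_t i = 0; i < len; i++) {
		if (strcmp(trace[i], irq_entry) == 0) {
			*out_len = len - i - 1;
			return trace + i + 1;
		}
	}
	*out_len = len;
	return trace;
}
```

Calling frames_from_irq_regs(old_trace, old_trace_len, "el1h_64_irq", &n)
leaves the five frames from arch_counter_read to ret_from_fork, matching
the call trace in the new output.
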
 kernel/sched/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5942af8728e30e5..656231df72fb08a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -73,6 +73,7 @@
 
 #include <uapi/linux/sched/types.h>
 
+#include <asm/irq_regs.h>
 #include <asm/switch_to.h>
 #include <asm/tlb.h>
 
@@ -11112,6 +11113,16 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 
 void dump_cpu_task(int cpu)
 {
+	if (cpu == smp_processor_id() && in_hardirq()) {
+		struct pt_regs *regs;
+
+		regs = get_irq_regs();
+		if (regs) {
+			show_regs(regs);
+			return;
+		}
+	}
+
 	if (trigger_single_cpu_backtrace(cpu))
 		return;
 
-- 
2.25.1



* Re: [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  2022-07-30 10:23 ` [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first Zhen Lei
@ 2022-08-01 23:14   ` Paul E. McKenney
  2022-08-02  2:06     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2022-08-01 23:14 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu

On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
> The backtrace of cpu_curr(other_cpu) is unwound based on the 'fp' saved
> during its last switch-out, so for the most part it is out of date. Try
> to use NMI to get the backtrace first, just as functions in
> "tree_stall.h" such as rcu_dump_cpu_stacks() already do.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Much better, thank you!

> ---
>  kernel/rcu/tree_exp.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 0f70f62039a9090..21381697de23f0b 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>  				mask = leaf_node_cpu_bit(rnp, cpu);
>  				if (!(READ_ONCE(rnp->expmask) & mask))
>  					continue;
> -				dump_cpu_task(cpu);
> +				if (!trigger_single_cpu_backtrace(cpu))
> +					dump_cpu_task(cpu);

But why not just leave this unchanged, rather than adding the call to
trigger_single_cpu_backtrace() in this patch and then removing it in
the next patch?

							Thanx, Paul

>  			}
>  		}
>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
> -- 
> 2.25.1
> 


* Re: [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  2022-08-01 23:14   ` Paul E. McKenney
@ 2022-08-02  2:06     ` Leizhen (ThunderTown)
  2022-08-04  0:07       ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Leizhen (ThunderTown) @ 2022-08-02  2:06 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu



On 2022/8/2 7:14, Paul E. McKenney wrote:
> On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
>> The backtrace of cpu_curr(other_cpu) is unwound based on the 'fp' saved
>> during its last switch-out, so for the most part it is out of date. Try
>> to use NMI to get the backtrace first, just as functions in
>> "tree_stall.h" such as rcu_dump_cpu_stacks() already do.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> 
> Much better, thank you!
> 
>> ---
>>  kernel/rcu/tree_exp.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>> index 0f70f62039a9090..21381697de23f0b 100644
>> --- a/kernel/rcu/tree_exp.h
>> +++ b/kernel/rcu/tree_exp.h
>> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>>  				mask = leaf_node_cpu_bit(rnp, cpu);
>>  				if (!(READ_ONCE(rnp->expmask) & mask))
>>  					continue;
>> -				dump_cpu_task(cpu);
>> +				if (!trigger_single_cpu_backtrace(cpu))
>> +					dump_cpu_task(cpu);
> 
> But why not just leave this unchanged, rather than adding the call to
> trigger_single_cpu_backtrace() in this patch and then removing it in
> the next patch?

To keep each patch clear and easy to describe. Otherwise, I would need
to add an extra explanation in the next patch, because I searched all
callers of dump_cpu_task(). That would make the next patch less simple.

I have seen some patch sets done step by step like this, but I can't
find one now.

On the other hand, this patch is a small fix. Older kernel versions may
backport only this patch, not the following cleanup patch.

> 
> 							Thanx, Paul
> 
>>  			}
>>  		}
>>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
>> -- 
>> 2.25.1
>>
> .
> 

-- 
Regards,
  Zhen Lei


* Re: [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  2022-08-02  2:06     ` Leizhen (ThunderTown)
@ 2022-08-04  0:07       ` Paul E. McKenney
  2022-08-04  1:34         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2022-08-04  0:07 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu

On Tue, Aug 02, 2022 at 10:06:00AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/8/2 7:14, Paul E. McKenney wrote:
> > On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
> >> The backtrace of cpu_curr(other_cpu) is unwound based on the 'fp' saved
> >> during its last switch-out, so for the most part it is out of date. Try
> >> to use NMI to get the backtrace first, just as functions in
> >> "tree_stall.h" such as rcu_dump_cpu_stacks() already do.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> > 
> > Much better, thank you!
> > 
> >> ---
> >>  kernel/rcu/tree_exp.h | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> >> index 0f70f62039a9090..21381697de23f0b 100644
> >> --- a/kernel/rcu/tree_exp.h
> >> +++ b/kernel/rcu/tree_exp.h
> >> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
> >>  				mask = leaf_node_cpu_bit(rnp, cpu);
> >>  				if (!(READ_ONCE(rnp->expmask) & mask))
> >>  					continue;
> >> -				dump_cpu_task(cpu);
> >> +				if (!trigger_single_cpu_backtrace(cpu))
> >> +					dump_cpu_task(cpu);
> > 
> > But why not just leave this unchanged, rather than adding the call to
> > trigger_single_cpu_backtrace() in this patch and then removing it in
> > the next patch?
> 
> To keep each patch clear and easy to describe. Otherwise, I would need
> to add an extra explanation in the next patch, because I searched all
> callers of dump_cpu_task(). That would make the next patch less simple.
> 
> I have seen some patch sets done step by step like this, but I can't
> find one now.
> 
> On the other hand, this patch is a small fix. Older kernel versions may
> backport only this patch, not the following cleanup patch.

You do have the option of doing a Cc to stable to control the backporting,
if that is a potential issue for you.

On the commit log, just say that the one use case already avoided doing
the trigger_single_cpu_backtrace(), and thus did not need to be updated.

So please resend the series, but without the undo/redo.  There would
thus be two patches rather than three, but there are plenty of other
things that need fixing anyway.

							Thanx, Paul

> >>  			}
> >>  		}
> >>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
> >> -- 
> >> 2.25.1
> >>
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei


* Re: [PATCH v3 1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first
  2022-08-04  0:07       ` Paul E. McKenney
@ 2022-08-04  1:34         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 8+ messages in thread
From: Leizhen (ThunderTown) @ 2022-08-04  1:34 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu



On 2022/8/4 8:07, Paul E. McKenney wrote:
> On Tue, Aug 02, 2022 at 10:06:00AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/8/2 7:14, Paul E. McKenney wrote:
>>> On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
>>>> The backtrace of cpu_curr(other_cpu) is unwound based on the 'fp' saved
>>>> during its last switch-out, so for the most part it is out of date. Try
>>>> to use NMI to get the backtrace first, just as functions in
>>>> "tree_stall.h" such as rcu_dump_cpu_stacks() already do.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>
>>> Much better, thank you!
>>>
>>>> ---
>>>>  kernel/rcu/tree_exp.h | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>> index 0f70f62039a9090..21381697de23f0b 100644
>>>> --- a/kernel/rcu/tree_exp.h
>>>> +++ b/kernel/rcu/tree_exp.h
>>>> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>>>>  				mask = leaf_node_cpu_bit(rnp, cpu);
>>>>  				if (!(READ_ONCE(rnp->expmask) & mask))
>>>>  					continue;
>>>> -				dump_cpu_task(cpu);
>>>> +				if (!trigger_single_cpu_backtrace(cpu))
>>>> +					dump_cpu_task(cpu);
>>>
>>> But why not just leave this unchanged, rather than adding the call to
>>> trigger_single_cpu_backtrace() in this patch and then removing it in
>>> the next patch?
>>
>> To keep each patch clear and easy to describe. Otherwise, I would need
>> to add an extra explanation in the next patch, because I searched all
>> callers of dump_cpu_task(). That would make the next patch less simple.
>>
>> I have seen some patch sets done step by step like this, but I can't
>> find one now.
>>
>> On the other hand, this patch is a small fix. Older kernel versions may
>> backport only this patch, not the following cleanup patch.
> 
> You do have the option of doing a Cc to stable to control the backporting,
> if that is a potential issue for you.
> 
> On the commit log, just say that the one use case already avoided doing
> the trigger_single_cpu_backtrace(), and thus did not need to be updated.
> 
> So please resend the series, but without the undo/redo.  There would
> thus be two patches rather than three, but there are plenty of other
> things that need fixing anyway.

OK, thanks.

> 
> 							Thanx, Paul
> 
>>>>  			}
>>>>  		}
>>>>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
>>>> -- 
>>>> 2.25.1
>>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
> 

-- 
Regards,
  Zhen Lei
