All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
@ 2023-07-05  7:30 Zhen Lei
  2023-07-05  7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei
  2023-07-05  7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
  0 siblings, 2 replies; 12+ messages in thread
From: Zhen Lei @ 2023-07-05  7:30 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel
  Cc: Zhen Lei

The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
kthread last ran is stalled, its stack does not need to be dumped again.

For example: Please search "Sending NMI from CPU 1 to CPUs 0"
rcu: INFO: rcu_sched self-detected stall on CPU
rcu:    1-...!: (999 ticks this GP) idle=a1e4/1/0x40000002 softirq=116/116 fqs=0
rcu:    (t=1000 jiffies g=-875 q=18 ncpus=4)
rcu: rcu_sched kthread timer wakeup didn't happen for 999 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu:    Possible timer handling issue on cpu=0 timer-softirq=449
rcu: rcu_sched kthread starved for 1000 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu:    Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_sched       state:I stack:0     pid:12    ppid:2      flags:0x00000000
 __schedule from schedule+0x50/0xa4
 schedule from schedule_timeout+0x1f8/0x328
 schedule_timeout from rcu_gp_fqs_loop+0x330/0x464
 rcu_gp_fqs_loop from rcu_gp_kthread+0xb0/0x200
 rcu_gp_kthread from kthread+0xe8/0x104
 kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc0855fb0 to 0xc0855ff8)
5fa0:                                     00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1+ #2
Hardware name: ARM-Versatile Express
PC is at ktime_get+0x4c/0xe8
LR is at ktime_get+0x4c/0xe8
pc : [<801a61a4>]    lr : [<801a61a4>]    psr: 60000113
sp : 80d01e48  ip : 00000002  fp : 0000001a
r10: 5befcd40  r9 : 431bde82  r8 : d7b634db
r7 : 00001bb0  r6 : 9ad70c88  r5 : 00000002  r4 : 80db0f40
r3 : ffffffff  r2 : f8a7b162  r1 : 00000000  r0 : 07584e9d
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6000406a  DAC: 00000051
 ktime_get from tst_softirq+0x30/0xfc
 tst_softirq from __do_softirq+0x128/0x334
 __do_softirq from irq_exit+0x108/0x12c
 irq_exit from __irq_svc+0x88/0xb0
Exception stack(0x80d01f18 to 0x80d01f60)
1f00:                                                       00490d54 00000001
1f20: 80d07fc0 00000000 80d9d260 80d04cd0 00000001 80d04d18 80c5ec18 00000000
1f40: 80d9bc35 80d07fc0 80d9cc80 80d01f68 808d6a6c 808d79f8 60000013 ffffffff
 __irq_svc from default_idle_call+0x4c/0xb4
 default_idle_call from do_idle+0x1a8/0x288
 do_idle from cpu_startup_entry+0x18/0x1c
 cpu_startup_entry from rest_init+0xb4/0xb8
 rest_init from arch_post_acpi_subsys_init+0x0/0x8
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1+ #2
Hardware name: ARM-Versatile Express
PC is at ktime_get+0x4c/0xe8
LR is at ktime_get+0x4c/0xe8
pc : [<801a61a4>]    lr : [<801a61a4>]    psr: 60000113
sp : 80d01e48  ip : 00000002  fp : 0000001a
r10: 5befcd40  r9 : 431bde82  r8 : d7b634db
r7 : 00001bb2  r6 : 9ad70c88  r5 : 00000002  r4 : 80db0f40
r3 : ffffffff  r2 : f8a78d88  r1 : 00000000  r0 : 07587277
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6000406a  DAC: 00000051
 ktime_get from tst_softirq+0x30/0xfc
 tst_softirq from __do_softirq+0x128/0x334
 __do_softirq from irq_exit+0x108/0x12c
 irq_exit from __irq_svc+0x88/0xb0
Exception stack(0x80d01f18 to 0x80d01f60)
1f00:                                                       00490d54 00000001
1f20: 80d07fc0 00000000 80d9d260 80d04cd0 00000001 80d04d18 80c5ec18 00000000
1f40: 80d9bc35 80d07fc0 80d9cc80 80d01f68 808d6a6c 808d79f8 60000013 ffffffff
 __irq_svc from default_idle_call+0x4c/0xb4
 default_idle_call from do_idle+0x1a8/0x288
 do_idle from cpu_startup_entry+0x18/0x1c
 cpu_startup_entry from rest_init+0xb4/0xb8
 rest_init from arch_post_acpi_subsys_init+0x0/0x8
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-rc1+ #2
Hardware name: ARM-Versatile Express
PC is at default_idle_call+0x4c/0xb4
LR is at ct_kernel_enter.constprop.5+0x44/0x11c
pc : [<808d79f8>]    lr : [<808d6a6c>]    psr: 60000013
sp : c085df98  ip : 80d9cc80  fp : 81126e80
r10: 80d9bc35  r9 : 00000000  r8 : 80c5ec18
r7 : 80d04d18  r6 : 00000002  r5 : 80d04cd0  r4 : 80d9d260
r3 : 00000000  r2 : 81126e80  r1 : 00000001  r0 : 0050a1e4
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6802806a  DAC: 00000051
 default_idle_call from do_idle+0x1a8/0x288
 do_idle from cpu_startup_entry+0x18/0x1c
 cpu_startup_entry from secondary_start_kernel+0x14c/0x150
 secondary_start_kernel from 0x60101660
Sending NMI from CPU 1 to CPUs 3:
NMI backtrace for cpu 3 skipped: idling at default_idle_call+0x4c/0xb4


Zhen Lei (2):
  rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice

 kernel/rcu/tree_stall.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-07-05  7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
@ 2023-07-05  7:30 ` Zhen Lei
  2023-07-10 19:03   ` Paul E. McKenney
  2023-07-05  7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
  1 sibling, 1 reply; 12+ messages in thread
From: Zhen Lei @ 2023-07-05  7:30 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel
  Cc: Zhen Lei

The above condition "if (gpk)" already ensures that gp_kthread is created,
so the local variable 'cpu' cannot be negative here.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_stall.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
-			if (cpu >= 0) {
-				if (cpu_is_offline(cpu)) {
-					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-				} else  {
-					pr_err("Stack dump where RCU GP kthread last ran:\n");
-					dump_cpu_task(cpu);
-				}
+			if (cpu_is_offline(cpu)) {
+				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
+			} else  {
+				pr_err("Stack dump where RCU GP kthread last ran:\n");
+				dump_cpu_task(cpu);
 			}
 			wake_up_process(gpk);
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-05  7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
  2023-07-05  7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei
@ 2023-07-05  7:30 ` Zhen Lei
  2023-07-05  8:17   ` Leizhen (ThunderTown)
  2023-07-10 19:05   ` Paul E. McKenney
  1 sibling, 2 replies; 12+ messages in thread
From: Zhen Lei @ 2023-07-05  7:30 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel
  Cc: Zhen Lei

The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
kthread last ran is stalled, its stack does not need to be dumped again.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
 		       data_race(READ_ONCE(rcu_state.gp_state)),
 		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
 		if (gpk) {
+			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
 			if (cpu_is_offline(cpu)) {
 				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-			} else  {
+			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
 				pr_err("Stack dump where RCU GP kthread last ran:\n");
 				dump_cpu_task(cpu);
 			}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-05  7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
@ 2023-07-05  8:17   ` Leizhen (ThunderTown)
  2023-07-10 19:05   ` Paul E. McKenney
  1 sibling, 0 replies; 12+ messages in thread
From: Leizhen (ThunderTown) @ 2023-07-05  8:17 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel



On 2023/7/5 15:30, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.

Should I add Fixes?

Fixes: 243027a3c805 ("rcu: For RCU grace-period kthread starvation, dump last CPU it ran on")

> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else  {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-07-05  7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei
@ 2023-07-10 19:03   ` Paul E. McKenney
  2023-07-11  3:20     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2023-07-10 19:03 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, linux-kernel

On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote:
> The above condition "if (gpk)" already ensures that gp_kthread is created,
> so the local variable 'cpu' cannot be negative here.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/rcu/tree_stall.h | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
> -			if (cpu >= 0) {

I am not quite this trusting of the relation between the relationship
between the existence of the grace-period khread and its CPU number
being in range.  Let's please start with something like this:

			if (!WARN_ON_ONCE(cpu < 0)) {

Please note that this is not just me.  See for example the use of the
cpumask_check() function, albeit the opposite concern.

> -				if (cpu_is_offline(cpu)) {
> -					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -				} else  {
> -					pr_err("Stack dump where RCU GP kthread last ran:\n");
> -					dump_cpu_task(cpu);
> -				}
> +			if (cpu_is_offline(cpu)) {
> +				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> +			} else  {
> +				pr_err("Stack dump where RCU GP kthread last ran:\n");
> +				dump_cpu_task(cpu);
>  			}
>  			wake_up_process(gpk);
>  		}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-05  7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
  2023-07-05  8:17   ` Leizhen (ThunderTown)
@ 2023-07-10 19:05   ` Paul E. McKenney
  2023-07-10 19:55     ` Joel Fernandes
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2023-07-10 19:05 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, linux-kernel

On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

This one looks good.  Please feel free to rebase it before 1/2 and repost.

							Thanx, Paul

> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else  {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-10 19:05   ` Paul E. McKenney
@ 2023-07-10 19:55     ` Joel Fernandes
  2023-07-10 20:32       ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Fernandes @ 2023-07-10 19:55 UTC (permalink / raw)
  To: paulmck
  Cc: Zhen Lei, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, rcu, linux-kernel

On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > kthread last ran is stalled, its stack does not need to be dumped again.
> >
> > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>
> This one looks good.  Please feel free to rebase it before 1/2 and repost.

Just a small comment:
I wondered if this would make it harder to identify which stack among
the various CPU stacks corresponds to the one the GP kthread is
running on. However, this line does print the CPU number of the
thread, so it is perhaps not an issue:

                pr_err("%s kthread starved for %ld jiffies! g%ld f%#x
%s(%d) ->state=%#x ->cpu=%d\n",

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-10 19:55     ` Joel Fernandes
@ 2023-07-10 20:32       ` Paul E. McKenney
  2023-07-11  3:26         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2023-07-10 20:32 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Zhen Lei, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, rcu, linux-kernel

On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > > kthread last ran is stalled, its stack does not need to be dumped again.
> > >
> > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >
> > This one looks good.  Please feel free to rebase it before 1/2 and repost.
> 
> Just a small comment:
> I wondered if this would make it harder to identify which stack among
> the various CPU stacks corresponds to the one the GP kthread is
> running on. However, this line does print the CPU number of the
> thread, so it is perhaps not an issue:
> 
>                 pr_err("%s kthread starved for %ld jiffies! g%ld f%#x
> %s(%d) ->state=%#x ->cpu=%d\n",
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you!  Zhen Lei, please feel free to add Joel's Reviewed-by on your
next posting.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-07-10 19:03   ` Paul E. McKenney
@ 2023-07-11  3:20     ` Leizhen (ThunderTown)
  2023-07-11 16:48       ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Leizhen (ThunderTown) @ 2023-07-11  3:20 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, linux-kernel



On 2023/7/11 3:03, Paul E. McKenney wrote:
> On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote:
>> The above condition "if (gpk)" already ensures that gp_kthread is created,
>> so the local variable 'cpu' cannot be negative here.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  kernel/rcu/tree_stall.h | 12 +++++-------
>>  1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
>>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>>  			pr_err("RCU grace-period kthread stack dump:\n");
>>  			sched_show_task(gpk);
>> -			if (cpu >= 0) {
> 
> I am not quite this trusting of the relation between the relationship
> between the existence of the grace-period khread and its CPU number
> being in range.  Let's please start with something like this:
> 
> 			if (!WARN_ON_ONCE(cpu < 0)) {
> 
> Please note that this is not just me.  See for example the use of the
> cpumask_check() function, albeit the opposite concern.

git grep -wn "\->cpu" kernel/ include/
kernel/kthread.c:583:   to_kthread(p)->cpu = cpu;				//kthread_create_on_cpu()
kernel/sched/sched.h:2024:      WRITE_ONCE(task_thread_info(p)->cpu, cpu);	//__set_task_cpu()
include/linux/sched.h:2250:     return READ_ONCE(task_thread_info(p)->cpu);	//task_cpu()

git grep -wn "\.cpu" kernel/ include/						//There is no task related, the search result is omitted.

Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically
change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu().
set_task_cpu
	WARN_ON_ONCE(!cpu_online(new_cpu));
	__set_task_cpu(p, new_cpu);

In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling
queue, setting the member 'cpu' to an invalid value will be thankless.

Sorry, these two patches was posted too quickly, and I'm still regretting that I should have
attached this to the commit description these days.


> 
>> -				if (cpu_is_offline(cpu)) {
>> -					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
>> -				} else  {
>> -					pr_err("Stack dump where RCU GP kthread last ran:\n");
>> -					dump_cpu_task(cpu);
>> -				}
>> +			if (cpu_is_offline(cpu)) {
>> +				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
>> +			} else  {
>> +				pr_err("Stack dump where RCU GP kthread last ran:\n");
>> +				dump_cpu_task(cpu);
>>  			}
>>  			wake_up_process(gpk);
>>  		}
>> -- 
>> 2.25.1
>>
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice
  2023-07-10 20:32       ` Paul E. McKenney
@ 2023-07-11  3:26         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 12+ messages in thread
From: Leizhen (ThunderTown) @ 2023-07-11  3:26 UTC (permalink / raw)
  To: paulmck, Joel Fernandes
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
	linux-kernel



On 2023/7/11 4:32, Paul E. McKenney wrote:
> On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
>> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
>>>> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
>>>> kthread last ran is stalled, its stack does not need to be dumped again.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>
>>> This one looks good.  Please feel free to rebase it before 1/2 and repost.
>>
>> Just a small comment:
>> I wondered if this would make it harder to identify which stack among
>> the various CPU stacks corresponds to the one the GP kthread is
>> running on. However, this line does print the CPU number of the
>> thread, so it is perhaps not an issue:
>>
>>                 pr_err("%s kthread starved for %ld jiffies! g%ld f%#x
>> %s(%d) ->state=%#x ->cpu=%d\n",
>>
>> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Thank you!  Zhen Lei, please feel free to add Joel's Reviewed-by on your
> next posting.

OK

> 
> 							Thanx, Paul
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-07-11  3:20     ` Leizhen (ThunderTown)
@ 2023-07-11 16:48       ` Paul E. McKenney
  2023-07-13  2:03         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2023-07-11 16:48 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, linux-kernel

On Tue, Jul 11, 2023 at 11:20:07AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2023/7/11 3:03, Paul E. McKenney wrote:
> > On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote:
> >> The above condition "if (gpk)" already ensures that gp_kthread is created,
> >> so the local variable 'cpu' cannot be negative here.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  kernel/rcu/tree_stall.h | 12 +++++-------
> >>  1 file changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> >> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644
> >> --- a/kernel/rcu/tree_stall.h
> >> +++ b/kernel/rcu/tree_stall.h
> >> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
> >>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
> >>  			pr_err("RCU grace-period kthread stack dump:\n");
> >>  			sched_show_task(gpk);
> >> -			if (cpu >= 0) {
> > 
> > I am not quite this trusting of the relation between the relationship
> > between the existence of the grace-period khread and its CPU number
> > being in range.  Let's please start with something like this:
> > 
> > 			if (!WARN_ON_ONCE(cpu < 0)) {
> > 
> > Please note that this is not just me.  See for example the use of the
> > cpumask_check() function, albeit the opposite concern.
> 
> git grep -wn "\->cpu" kernel/ include/
> kernel/kthread.c:583:   to_kthread(p)->cpu = cpu;				//kthread_create_on_cpu()
> kernel/sched/sched.h:2024:      WRITE_ONCE(task_thread_info(p)->cpu, cpu);	//__set_task_cpu()
> include/linux/sched.h:2250:     return READ_ONCE(task_thread_info(p)->cpu);	//task_cpu()
> 
> git grep -wn "\.cpu" kernel/ include/						//There is no task related, the search result is omitted.
> 
> Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically
> change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu().
> set_task_cpu
> 	WARN_ON_ONCE(!cpu_online(new_cpu));
> 	__set_task_cpu(p, new_cpu);
> 
> In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling
> queue, setting the member 'cpu' to an invalid value will be thankless.

Thank you for digging into this!  Given that, as you say, we can dispense
with the check.

> Sorry, these two patches was posted too quickly, and I'm still regretting that I should have
> attached this to the commit description these days.

Please do resend the patches with this explanation in the commit log.
And please don't worry about making the English pretty, as I can always
wordsmith.

							Thanx, Paul

> >> -				if (cpu_is_offline(cpu)) {
> >> -					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> >> -				} else  {
> >> -					pr_err("Stack dump where RCU GP kthread last ran:\n");
> >> -					dump_cpu_task(cpu);
> >> -				}
> >> +			if (cpu_is_offline(cpu)) {
> >> +				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> >> +			} else  {
> >> +				pr_err("Stack dump where RCU GP kthread last ran:\n");
> >> +				dump_cpu_task(cpu);
> >>  			}
> >>  			wake_up_process(gpk);
> >>  		}
> >> -- 
> >> 2.25.1
> >>
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-07-11 16:48       ` Paul E. McKenney
@ 2023-07-13  2:03         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 12+ messages in thread
From: Leizhen (ThunderTown) @ 2023-07-13  2:03 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, linux-kernel



On 2023/7/12 0:48, Paul E. McKenney wrote:
> On Tue, Jul 11, 2023 at 11:20:07AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2023/7/11 3:03, Paul E. McKenney wrote:
>>> On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote:
>>>> The above condition "if (gpk)" already ensures that gp_kthread is created,
>>>> so the local variable 'cpu' cannot be negative here.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  kernel/rcu/tree_stall.h | 12 +++++-------
>>>>  1 file changed, 5 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>>>> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644
>>>> --- a/kernel/rcu/tree_stall.h
>>>> +++ b/kernel/rcu/tree_stall.h
>>>> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
>>>>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>>>>  			pr_err("RCU grace-period kthread stack dump:\n");
>>>>  			sched_show_task(gpk);
>>>> -			if (cpu >= 0) {
>>>
>>> I am not quite this trusting of the relation between the relationship
>>> between the existence of the grace-period khread and its CPU number
>>> being in range.  Let's please start with something like this:
>>>
>>> 			if (!WARN_ON_ONCE(cpu < 0)) {
>>>
>>> Please note that this is not just me.  See for example the use of the
>>> cpumask_check() function, albeit the opposite concern.
>>
>> git grep -wn "\->cpu" kernel/ include/
>> kernel/kthread.c:583:   to_kthread(p)->cpu = cpu;				//kthread_create_on_cpu()
>> kernel/sched/sched.h:2024:      WRITE_ONCE(task_thread_info(p)->cpu, cpu);	//__set_task_cpu()
>> include/linux/sched.h:2250:     return READ_ONCE(task_thread_info(p)->cpu);	//task_cpu()
>>
>> git grep -wn "\.cpu" kernel/ include/						//There is no task related, the search result is omitted.
>>
>> Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically
>> change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu().
>> set_task_cpu
>> 	WARN_ON_ONCE(!cpu_online(new_cpu));
>> 	__set_task_cpu(p, new_cpu);
>>
>> In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling
>> queue, setting the member 'cpu' to an invalid value will be thankless.
> 
> Thank you for digging into this!  Given that, as you say, we can dispense
> with the check.
> 
>> Sorry, these two patches was posted too quickly, and I'm still regretting that I should have
>> attached this to the commit description these days.
> 
> Please do resend the patches with this explanation in the commit log.
> And please don't worry about making the English pretty, as I can always
> wordsmith.

OK, thank you very much.

> 
> 							Thanx, Paul
> 
>>>> -				if (cpu_is_offline(cpu)) {
>>>> -					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
>>>> -				} else  {
>>>> -					pr_err("Stack dump where RCU GP kthread last ran:\n");
>>>> -					dump_cpu_task(cpu);
>>>> -				}
>>>> +			if (cpu_is_offline(cpu)) {
>>>> +				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
>>>> +			} else  {
>>>> +				pr_err("Stack dump where RCU GP kthread last ran:\n");
>>>> +				dump_cpu_task(cpu);
>>>>  			}
>>>>  			wake_up_process(gpk);
>>>>  		}
>>>> -- 
>>>> 2.25.1
>>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-13  2:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-05  7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
2023-07-05  7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei
2023-07-10 19:03   ` Paul E. McKenney
2023-07-11  3:20     ` Leizhen (ThunderTown)
2023-07-11 16:48       ` Paul E. McKenney
2023-07-13  2:03         ` Leizhen (ThunderTown)
2023-07-05  7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei
2023-07-05  8:17   ` Leizhen (ThunderTown)
2023-07-10 19:05   ` Paul E. McKenney
2023-07-10 19:55     ` Joel Fernandes
2023-07-10 20:32       ` Paul E. McKenney
2023-07-11  3:26         ` Leizhen (ThunderTown)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.