* [PATCH linux-next][RFC] powerpc: protect cpu offlining by RCU offline lock
From: Zhouyi Zhou @ 2022-09-14  2:15 UTC
  To: mpe, npiggin, christophe.leroy, atrajeev, linuxppc-dev,
	linux-kernel, lance, paulmck, rcu
  Cc: Zhouyi Zhou

During CPU offlining, the functions called by xive_teardown_cpu() call
__lock_acquire() when CONFIG_LOCKDEP=y. __lock_acquire() traverses an
RCU-protected list, so "WARNING: suspicious RCU usage" is triggered
because the CPU is already offline as far as RCU is concerned.

Try to protect CPU offlining with the RCU offline lock.

Tested on a PPC VM at the Open Source Lab of Oregon State University
(each round of tests takes about 19 hours to finish).
The test results show that although the "WARNING: suspicious RCU usage"
is gone, there are more "BUG: soft lockup" reports than with the
original kernel (10 vs 6), so I added [RFC] to the subject line.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
[It seems there was a delivery problem with my previous email,
 so I am sending it again via Gmail. Sorry for the trouble.]

Dear PPC and RCU developers,

I found this bug while running rcutorture tests in a PPC VM at the
Open Source Lab of Oregon State University.

console.log reports the following bug:
[   37.635545][    T0] WARNING: suspicious RCU usage
[   37.636409][    T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted
[   37.637575][    T0] -----------------------------
[   37.638306][    T0] kernel/locking/lockdep.c:3723 RCU-list traversed in non-reader section!!
[   37.639651][    T0]
[   37.639651][    T0] other info that might help us debug this:
[   37.639651][    T0]
[   37.641381][    T0]
[   37.641381][    T0] RCU used illegally from offline CPU!
[   37.641381][    T0] rcu_scheduler_active = 2, debug_locks = 1
[   37.667170][    T0] no locks held by swapper/6/0.
[   37.668328][    T0]
[   37.668328][    T0] stack backtrace:
[   37.669995][    T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.0.0-rc4-next-20220907-dirty #8
[   37.672777][    T0] Call Trace:
[   37.673729][    T0] [c000000004653920] [c00000000097f9b4] dump_stack_lvl+0x98/0xe0 (unreliable)
[   37.678579][    T0] [c000000004653960] [c0000000001f2eb8] lockdep_rcu_suspicious+0x148/0x16c
[   37.680425][    T0] [c0000000046539f0] [c0000000001ed9b4] __lock_acquire+0x10f4/0x26e0
[   37.682450][    T0] [c000000004653b30] [c0000000001efc2c] lock_acquire+0x12c/0x420
[   37.684113][    T0] [c000000004653c20] [c0000000010d704c] _raw_spin_lock_irqsave+0x6c/0xc0
[   37.686154][    T0] [c000000004653c60] [c0000000000c7b4c] xive_spapr_put_ipi+0xcc/0x150
[   37.687879][    T0] [c000000004653ca0] [c0000000010c72a8] xive_cleanup_cpu_ipi+0xc8/0xf0
[   37.689856][    T0] [c000000004653cf0] [c0000000010c7370] xive_teardown_cpu+0xa0/0xf0
[   37.691877][    T0] [c000000004653d30] [c0000000000fba5c] pseries_cpu_offline_self+0x5c/0x100
[   37.693882][    T0] [c000000004653da0] [c00000000005d2c4] arch_cpu_idle_dead+0x44/0x60
[   37.695739][    T0] [c000000004653dc0] [c0000000001c740c] do_idle+0x16c/0x3d0
[   37.697536][    T0] [c000000004653e70] [c0000000001c7a1c] cpu_startup_entry+0x3c/0x40
[   37.699694][    T0] [c000000004653ea0] [c00000000005ca20] start_secondary+0x6c0/0xb50
[   37.701742][    T0] [c000000004653f90] [c00000000000d054] start_secondary_prolog+0x10/0x14


I am a beginner, and I hope I can be of some benefit to the community ;-)

Thanks
Zhouyi
--
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  5 ++++-
 include/linux/rcupdate.h                     |  3 ++-
 kernel/rcu/tree.c                            | 10 ++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 0f8cd8b06432..ddf66a253c70 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -64,11 +64,14 @@ static void pseries_cpu_offline_self(void)
 
 	local_irq_disable();
 	idle_task_exit();
+
+	/* Because the cpu is now offline, let rcu know that */
+	rcu_state_ofl_lock();
 	if (xive_enabled())
 		xive_teardown_cpu();
 	else
 		xics_teardown_cpu();
-
+	rcu_state_ofl_unlock();
 	unregister_slb_shadow(hwcpu);
 	rtas_stop_self();
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 63d2e6a60ad7..d857955a02ba 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1034,5 +1034,6 @@ rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
 /* kernel/ksysfs.c definitions */
 extern int rcu_expedited;
 extern int rcu_normal;
-
+void rcu_state_ofl_lock(void);
+void rcu_state_ofl_unlock(void);
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bb8e72bc815..3282725f1054 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4796,6 +4796,16 @@ void __init rcu_init(void)
 		(void)start_poll_synchronize_rcu_expedited();
 }
 
+void rcu_state_ofl_lock(void)
+{
+	arch_spin_lock(&rcu_state.ofl_lock);
+}
+
+void rcu_state_ofl_unlock(void)
+{
+	arch_spin_unlock(&rcu_state.ofl_lock);
+}
+
 #include "tree_stall.h"
 #include "tree_exp.h"
 #include "tree_nocb.h"
-- 
2.34.1



* Re: [PATCH linux-next][RFC] powerpc: protect cpu offlining by RCU offline lock
From: Paul E. McKenney @ 2022-09-14 12:17 UTC
  To: Zhouyi Zhou
  Cc: mpe, npiggin, christophe.leroy, atrajeev, linuxppc-dev,
	linux-kernel, lance, rcu

On Wed, Sep 14, 2022 at 10:15:28AM +0800, Zhouyi Zhou wrote:
> During the cpu offlining, the sub functions of xive_teardown_cpu will
> call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will
> travel RCU protected list, so "WARNING: suspicious RCU usage" will be
> triggered.
> 
> Try to protect cpu offlining by RCU offline lock.

Rather than acquiring the RCU lock, why not change the functions called
by xive_teardown_cpu() to avoid calling __lock_acquire()?  For example,
a call to spin_lock() could be changed to arch_spin_lock().

							Thanx, Paul
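
[Editorial illustration, not code from this thread.] The suggestion
above relies on the fact that spin_lock() goes through lockdep when
CONFIG_LOCKDEP=y, while arch_spin_lock() is the uninstrumented
architecture-level primitive. The userspace sketch below only models
that difference. Every model_* name is hypothetical, the "debug check"
is a simple counter standing in for lockdep's RCU check, and the single
global flag stands in for per-CPU online state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a raw, uninstrumented spinlock. */
struct model_raw_lock {
	atomic_flag held;
};

/* Hypothetical flag: does RCU still consider this CPU online? */
static bool model_cpu_online = true;

/* Number of times the modelled debug check would have warned. */
static int model_warnings;

static void model_cpu_go_offline(void)
{
	assert(model_cpu_online);	/* goes offline only once in this model */
	model_cpu_online = false;
}

/* Raw path: just spin on the flag, no debugging hooks at all. */
static void model_raw_lock_acquire(struct model_raw_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void model_raw_lock_release(struct model_raw_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Instrumented path: run the debug check first, then take the raw lock. */
static void model_checked_lock_acquire(struct model_raw_lock *l)
{
	if (!model_cpu_online)
		model_warnings++;	/* models "RCU used illegally from offline CPU!" */
	model_raw_lock_acquire(l);
}

/* Models a teardown step; returns how many warnings it raised. */
static int model_teardown(bool use_raw_lock)
{
	static struct model_raw_lock ipi_lock = { ATOMIC_FLAG_INIT };
	int before = model_warnings;

	if (use_raw_lock)
		model_raw_lock_acquire(&ipi_lock);
	else
		model_checked_lock_acquire(&ipi_lock);
	/* ... per-CPU interrupt resources would be released here ... */
	model_raw_lock_release(&ipi_lock);

	return model_warnings - before;
}
```

In this model, calling model_teardown(false) after
model_cpu_go_offline() raises one warning, while model_teardown(true)
raises none. In the kernel itself, arch_spin_lock() operates on an
arch_spinlock_t and leaves interrupt and preemption handling to the
caller, so a real conversion needs more care than this sketch shows.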



* Re: [PATCH linux-next][RFC] powerpc: protect cpu offlining by RCU offline lock
From: Zhouyi Zhou @ 2022-09-14 14:09 UTC
  To: paulmck
  Cc: mpe, npiggin, christophe.leroy, atrajeev, linuxppc-dev,
	linux-kernel, lance, rcu

On Wed, Sep 14, 2022 at 8:17 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Sep 14, 2022 at 10:15:28AM +0800, Zhouyi Zhou wrote:
> > During the cpu offlining, the sub functions of xive_teardown_cpu will
> > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will
> > travel RCU protected list, so "WARNING: suspicious RCU usage" will be
> > triggered.
> >
> > Try to protect cpu offlining by RCU offline lock.
>
Thank you, Paul, for your guidance!
> Rather than acquiring the RCU lock, why not change the functions called
> by xive_teardown_cpu() to avoid calling __lock_acquire()?  For example,
> a call to spin_lock() could be changed to arch_spin_lock().
Great idea!
I will give it a try and run new rounds of rcutorture tests. I will
submit a new version next week.
Thanks also to the PPC developers for their patience with me ;-)

Cheers
Zhouyi


