linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* rcu warnings cause stack overflow
@ 2012-02-01 10:06 Heiko Carstens
  2012-02-01 15:14 ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2012-02-01 10:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E. McKenney, Ingo Molnar, Peter Zijlstra

Hi Frederic,

your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
Actually this found a bug on s390 (thanks!) but it probably didn't work
as expected.
On architectures which implement WARN_ON_ONCE with an exception this
additional warning will lead to a stack overflow (if it triggers):

[   55.746956] Kernel stack overflow.
[   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
scan]
[   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
[   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
[   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
[   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
[   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
[   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
[   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
[   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
[   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
[   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
[   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
[   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
[   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
[   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
[   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
[   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
[   55.747276] Call Trace:
[   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
[   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
[   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
[   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
[   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
[   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
[   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158

[...lots more of the same...] 

[   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
[   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
[   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
[   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
[   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
[   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
[   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
[   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
[   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
[   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
[   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
[   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
[   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
[   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
[   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
[   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
[   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
[   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
[   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
[   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
[   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
[   55.748588]  [<0000000000100020>] _stext+0x20/0x80
[   55.748603] 2 locks held by swapper/0/0:
[   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
[   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
[   55.748663] Last Breaking-Event-Address:
[   55.748667]  [<0000000000000000>] 0x0

This simply happens because WARN_ON_ONCE causes an exception, the excpetion
handler wants to call a notifier call chain (notify_die), which again uses
rcu_read_lock(), which again causes an exception and so on...
Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
the flag that the warning already happened. Seems to be quite some effort to
change this behaviour.

Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
will find illegal uses. But it won't work for lockdep off configs...
So we probably want something better than the patch below.

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 81c04f4..6da8ca4 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -239,13 +239,11 @@ static inline int rcu_is_cpu_idle(void)
 
 static inline void rcu_lock_acquire(struct lockdep_map *map)
 {
-	WARN_ON_ONCE(rcu_is_cpu_idle());
 	lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
 }
 
 static inline void rcu_lock_release(struct lockdep_map *map)
 {
-	WARN_ON_ONCE(rcu_is_cpu_idle());
 	lock_release(map, 1, _THIS_IP_);
 }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 10:06 rcu warnings cause stack overflow Heiko Carstens
@ 2012-02-01 15:14 ` Frederic Weisbecker
  2012-02-01 17:18   ` Paul E. McKenney
  2012-02-02 12:27   ` Heiko Carstens
  0 siblings, 2 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2012-02-01 15:14 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, Paul E. McKenney, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 11:06:52AM +0100, Heiko Carstens wrote:
> Hi Frederic,
> 
> your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
> quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
> Actually this found a bug on s390 (thanks!) but it probably didn't work
> as expected.
> On architectures which implement WARN_ON_ONCE with an exception this
> additional warning will lead to a stack overflow (if it triggers):
> 
> [   55.746956] Kernel stack overflow.
> [   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
> scan]
> [   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
> [   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
> [   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
> [   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> [   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
> [   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
> [   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
> [   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
> [   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
> [   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
> [   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
> [   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
> [   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
> [   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
> [   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
> [   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
> [   55.747276] Call Trace:
> [   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
> [   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> [   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> [   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> [   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
> [   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
> [   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> 
> [...lots more of the same...] 
> 
> [   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> [   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> [   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> [   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
> [   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
> [   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> [   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> [   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> [   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> [   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
> [   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
> [   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> [   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
> [   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
> [   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
> [   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
> [   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
> [   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
> [   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
> [   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
> [   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
> [   55.748588]  [<0000000000100020>] _stext+0x20/0x80
> [   55.748603] 2 locks held by swapper/0/0:
> [   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
> [   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
> [   55.748663] Last Breaking-Event-Address:
> [   55.748667]  [<0000000000000000>] 0x0
> 
> This simply happens because WARN_ON_ONCE causes an exception, the excpetion
> handler wants to call a notifier call chain (notify_die), which again uses
> rcu_read_lock(), which again causes an exception and so on...
> Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
> the flag that the warning already happened. Seems to be quite some effort to
> change this behaviour.
> 
> Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> will find illegal uses. But it won't work for lockdep off configs...
> So we probably want something better than the patch below.

Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
in s390? Is it to have a whole new stack for the warning path in order
to avoid stack overflow from the place that called the WARN_ON() ?

Anyway perhaps we need a recursion protection on WARN_ON_ONCE(), such
as:

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 84458b0..f76635f 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -137,9 +137,13 @@ extern void warn_slowpath_null(const char *file, const int line);
 	static bool __warned;					\
 	int __ret_warn_once = !!(condition);			\
 								\
-	if (unlikely(__ret_warn_once))				\
-		if (WARN_ON(!__warned)) 			\
+	if (unlikely(__ret_warn_once)) {			\
+		if (!__warned) {				\
 			__warned = true;			\
+			barrier();				\
+			WARN_ON(1);				\
+		}						\
+	}							\
 	unlikely(__ret_warn_once);				\
 })
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 15:14 ` Frederic Weisbecker
@ 2012-02-01 17:18   ` Paul E. McKenney
  2012-02-01 18:08     ` Frederic Weisbecker
  2012-02-02 12:27   ` Heiko Carstens
  1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2012-02-01 17:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 01, 2012 at 11:06:52AM +0100, Heiko Carstens wrote:
> > Hi Frederic,
> > 
> > your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
> > quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
> > Actually this found a bug on s390 (thanks!) but it probably didn't work
> > as expected.
> > On architectures which implement WARN_ON_ONCE with an exception this
> > additional warning will lead to a stack overflow (if it triggers):
> > 
> > [   55.746956] Kernel stack overflow.
> > [   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
> > scan]
> > [   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
> > [   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
> > [   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
> > [   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> > [   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
> > [   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
> > [   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
> > [   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
> > [   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
> > [   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
> > [   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
> > [   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
> > [   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
> > [   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
> > [   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
> > [   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
> > [   55.747276] Call Trace:
> > [   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
> > [   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > [   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > [   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > [   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
> > [   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
> > [   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > 
> > [...lots more of the same...] 
> > 
> > [   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > [   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > [   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > [   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
> > [   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
> > [   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > [   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > [   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > [   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > [   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
> > [   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
> > [   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > [   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
> > [   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
> > [   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
> > [   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
> > [   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
> > [   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
> > [   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
> > [   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
> > [   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
> > [   55.748588]  [<0000000000100020>] _stext+0x20/0x80
> > [   55.748603] 2 locks held by swapper/0/0:
> > [   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
> > [   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
> > [   55.748663] Last Breaking-Event-Address:
> > [   55.748667]  [<0000000000000000>] 0x0
> > 
> > This simply happens because WARN_ON_ONCE causes an exception, the excpetion
> > handler wants to call a notifier call chain (notify_die), which again uses
> > rcu_read_lock(), which again causes an exception and so on...
> > Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
> > the flag that the warning already happened. Seems to be quite some effort to
> > change this behaviour.
> > 
> > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > will find illegal uses. But it won't work for lockdep off configs...
> > So we probably want something better than the patch below.
> 
> Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> in s390? Is it to have a whole new stack for the warning path in order
> to avoid stack overflow from the place that called the WARN_ON() ?
> 
> Anyway perhaps we need a recursion protection on WARN_ON_ONCE(), such
> as:

This makes sense to me, but I am also taking Heiko's patch
removing the WARN_ON()s in favor of the lockdep checks now in
rcu_read_lock_held().  After all, the rcu_lock_acquire() WARN_ON_ONCE()
only appears if PROVE_LOCKING=y, so the added requirement is not a
big deal.  Also, CONFIG_PROVE_RCU is in the testing requirements in
Documentation/SubmitChecklist.

							Thanx, Paul

> diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
> index 84458b0..f76635f 100644
> --- a/include/asm-generic/bug.h
> +++ b/include/asm-generic/bug.h
> @@ -137,9 +137,13 @@ extern void warn_slowpath_null(const char *file, const int line);
>  	static bool __warned;					\
>  	int __ret_warn_once = !!(condition);			\
>  								\
> -	if (unlikely(__ret_warn_once))				\
> -		if (WARN_ON(!__warned)) 			\
> +	if (unlikely(__ret_warn_once)) {			\
> +		if (!__warned) {				\
>  			__warned = true;			\
> +			barrier();				\
> +			WARN_ON(1);				\
> +		}						\
> +	}							\
>  	unlikely(__ret_warn_once);				\
>  })
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 17:18   ` Paul E. McKenney
@ 2012-02-01 18:08     ` Frederic Weisbecker
  2012-02-01 18:22       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-02-01 18:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 09:18:56AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > On Wed, Feb 01, 2012 at 11:06:52AM +0100, Heiko Carstens wrote:
> > > Hi Frederic,
> > > 
> > > your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
> > > quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
> > > Actually this found a bug on s390 (thanks!) but it probably didn't work
> > > as expected.
> > > On architectures which implement WARN_ON_ONCE with an exception this
> > > additional warning will lead to a stack overflow (if it triggers):
> > > 
> > > [   55.746956] Kernel stack overflow.
> > > [   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
> > > scan]
> > > [   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
> > > [   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
> > > [   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
> > > [   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> > > [   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
> > > [   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
> > > [   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
> > > [   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
> > > [   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
> > > [   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
> > > [   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
> > > [   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
> > > [   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
> > > [   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
> > > [   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
> > > [   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
> > > [   55.747276] Call Trace:
> > > [   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
> > > [   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > [   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > [   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > [   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
> > > [   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > [   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > 
> > > [...lots more of the same...] 
> > > 
> > > [   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > [   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > [   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > [   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
> > > [   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > [   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > [   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > [   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > [   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > [   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
> > > [   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > [   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > [   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
> > > [   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
> > > [   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
> > > [   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
> > > [   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
> > > [   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
> > > [   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
> > > [   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
> > > [   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
> > > [   55.748588]  [<0000000000100020>] _stext+0x20/0x80
> > > [   55.748603] 2 locks held by swapper/0/0:
> > > [   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
> > > [   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
> > > [   55.748663] Last Breaking-Event-Address:
> > > [   55.748667]  [<0000000000000000>] 0x0
> > > 
> > > This simply happens because WARN_ON_ONCE causes an exception, the excpetion
> > > handler wants to call a notifier call chain (notify_die), which again uses
> > > rcu_read_lock(), which again causes an exception and so on...
> > > Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
> > > the flag that the warning already happened. Seems to be quite some effort to
> > > change this behaviour.
> > > 
> > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > will find illegal uses. But it won't work for lockdep off configs...
> > > So we probably want something better than the patch below.
> > 
> > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > in s390? Is it to have a whole new stack for the warning path in order
> > to avoid stack overflow from the place that called the WARN_ON() ?
> > 
> > Anyway perhaps we need a recursion protection on WARN_ON_ONCE(), such
> > as:
> 
> This makes sense to me, but I am also taking Heiko's patch
> removing the WARN_ON()s in favor of the lockdep checks now in
> rcu_read_lock_held().  After all, the rcu_lock_acquire() WARN_ON_ONCE()
> only appears if PROVE_LOCKING=y, so the added requirement is not a
> big deal.  Also, CONFIG_PROVE_RCU is in the testing requirements in
> Documentation/SubmitChecklist.
> 
> 							Thanx, Paul

I think if we do this, we lose the debugging coverage on places that
make use of rcu_read_lock{_sched,_bh}() without using rcu_dereference{_sched,_bh}().

It seems walking the task list with list_for_each_entry_rcu() under rcu_read_lock() is
one such example because rcu lists are using rcu_dereference_raw().

Also I realize a problem with my below patch. The exception will call rcu_read_lock()
from another callsite so we are still going to recurse. Only of one level but we will
recurse. This could mess up the warning output, although it prevents from the
stack overflow. If we still go that direction, we probably need a more global, or per
cpu, recursion protection of these RCU warnings.

> 
> > diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
> > index 84458b0..f76635f 100644
> > --- a/include/asm-generic/bug.h
> > +++ b/include/asm-generic/bug.h
> > @@ -137,9 +137,13 @@ extern void warn_slowpath_null(const char *file, const int line);
> >  	static bool __warned;					\
> >  	int __ret_warn_once = !!(condition);			\
> >  								\
> > -	if (unlikely(__ret_warn_once))				\
> > -		if (WARN_ON(!__warned)) 			\
> > +	if (unlikely(__ret_warn_once)) {			\
> > +		if (!__warned) {				\
> >  			__warned = true;			\
> > +			barrier();				\
> > +			WARN_ON(1);				\
> > +		}						\
> > +	}							\
> >  	unlikely(__ret_warn_once);				\
> >  })
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 18:08     ` Frederic Weisbecker
@ 2012-02-01 18:22       ` Paul E. McKenney
  2012-02-01 18:31         ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2012-02-01 18:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 07:08:15PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 01, 2012 at 09:18:56AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Feb 01, 2012 at 11:06:52AM +0100, Heiko Carstens wrote:
> > > > Hi Frederic,
> > > > 
> > > > your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
> > > > quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
> > > > Actually this found a bug on s390 (thanks!) but it probably didn't work
> > > > as expected.
> > > > On architectures which implement WARN_ON_ONCE with an exception this
> > > > additional warning will lead to a stack overflow (if it triggers):
> > > > 
> > > > [   55.746956] Kernel stack overflow.
> > > > [   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
> > > > scan]
> > > > [   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
> > > > [   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
> > > > [   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
> > > > [   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> > > > [   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
> > > > [   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
> > > > [   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
> > > > [   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
> > > > [   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
> > > > [   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
> > > > [   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
> > > > [   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
> > > > [   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
> > > > [   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
> > > > [   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
> > > > [   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
> > > > [   55.747276] Call Trace:
> > > > [   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
> > > > [   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > [   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > [   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > [   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > [   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > [   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > 
> > > > [...lots more of the same...] 
> > > > 
> > > > [   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > [   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > [   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > [   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > [   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > [   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > [   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > [   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > [   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > [   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > [   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > [   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > [   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
> > > > [   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
> > > > [   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
> > > > [   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
> > > > [   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
> > > > [   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
> > > > [   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
> > > > [   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
> > > > [   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
> > > > [   55.748588]  [<0000000000100020>] _stext+0x20/0x80
> > > > [   55.748603] 2 locks held by swapper/0/0:
> > > > [   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
> > > > [   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
> > > > [   55.748663] Last Breaking-Event-Address:
> > > > [   55.748667]  [<0000000000000000>] 0x0
> > > > 
> > > > This simply happens because WARN_ON_ONCE causes an exception, the excpetion
> > > > handler wants to call a notifier call chain (notify_die), which again uses
> > > > rcu_read_lock(), which again causes an exception and so on...
> > > > Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
> > > > the flag that the warning already happened. Seems to be quite some effort to
> > > > change this behaviour.
> > > > 
> > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > So we probably want something better than the patch below.
> > > 
> > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > in s390? Is it to have a whole new stack for the warning path in order
> > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > 
> > > Anyway perhaps we need a recursion protection on WARN_ON_ONCE(), such
> > > as:
> > 
> > This makes sense to me, but I am also taking Heiko's patch
> > removing the WARN_ON()s in favor of the lockdep checks now in
> > rcu_read_lock_held().  After all, the rcu_lock_acquire() WARN_ON_ONCE()
> > only appears if PROVE_LOCKING=y, so the added requirement is not a
> > big deal.  Also, CONFIG_PROVE_RCU is in the testing requirements in
> > Documentation/SubmitChecklist.
> > 
> > 							Thanx, Paul
> 
> I think if we do this, we lose the debugging coverage on places that
> make use of rcu_read_lock{_sched,_bh}() without using rcu_dereference{_sched,_bh}().
> 
> It seems walking the task list with list_for_each_entry_rcu() under rcu_read_lock() is
> one such example because rcu lists are using rcu_dereference_raw().

Good point.  I will convert these to rcu_lockdep_assert()s.

> Also I realize a problem with my below patch. The exception will call rcu_read_lock()
> from another callsite so we are still going to recurse. Only of one level but we will
> recurse. This could mess up the warning output, although it prevents from the
> stack overflow. If we still go that direction, we probably need a more global, or per
> cpu, recursion protection of these RCU warnings.

Hmmm...  Should this be global or specific to S390?

							Thanx, Paul

> > > diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
> > > index 84458b0..f76635f 100644
> > > --- a/include/asm-generic/bug.h
> > > +++ b/include/asm-generic/bug.h
> > > @@ -137,9 +137,13 @@ extern void warn_slowpath_null(const char *file, const int line);
> > >  	static bool __warned;					\
> > >  	int __ret_warn_once = !!(condition);			\
> > >  								\
> > > -	if (unlikely(__ret_warn_once))				\
> > > -		if (WARN_ON(!__warned)) 			\
> > > +	if (unlikely(__ret_warn_once)) {			\
> > > +		if (!__warned) {				\
> > >  			__warned = true;			\
> > > +			barrier();				\
> > > +			WARN_ON(1);				\
> > > +		}						\
> > > +	}							\
> > >  	unlikely(__ret_warn_once);				\
> > >  })
> > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 18:22       ` Paul E. McKenney
@ 2012-02-01 18:31         ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2012-02-01 18:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 10:22:09AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 01, 2012 at 07:08:15PM +0100, Frederic Weisbecker wrote:
> > On Wed, Feb 01, 2012 at 09:18:56AM -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Feb 01, 2012 at 11:06:52AM +0100, Heiko Carstens wrote:
> > > > > Hi Frederic,
> > > > > 
> > > > > your patch 00f49e5729 "rcu: Warn when rcu_read_lock() is used in extended
> > > > > quiescent state" adds a WARN_ON_ONCE to rcu_lock_acquire().
> > > > > Actually this found a bug on s390 (thanks!) but it probably didn't work
> > > > > as expected.
> > > > > On architectures which implement WARN_ON_ONCE with an exception this
> > > > > additional warning will lead to a stack overflow (if it triggers):
> > > > > 
> > > > > [   55.746956] Kernel stack overflow.
> > > > > [   55.746966] Modules linked in: qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod qeth vmur ccwgroup [last unloaded: scsi_wait_
> > > > > scan]
> > > > > [   55.746999] CPU: 0 Not tainted 3.3.0-rc1-00167-gf8275f9 #90
> > > > > [   55.747005] Process swapper/0 (pid: 0, task: 0000000000911100, ksp: 0000000000907d50)
> > > > > [   55.747013] Krnl PSW : 0404000180000000 00000000005d5728 (illegal_op+0x1c/0x134)
> > > > > [   55.747034]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> > > > > [   55.747043] Krnl GPRS: 0000000000000001 00000000005d570c 00000000009040e8 0000000000000002
> > > > > [   55.747054]            00000000005d83dc ffffffffffffffff 0000000000000000 0400000000907cc8
> > > > > [   55.747064]            0404100180000000 00000000005d8478 0000000000000008 00000000009040e8
> > > > > [   55.747074]            0000000000904000 00000000005dc550 0000000000904048 0000000000904048
> > > > > [   55.747096] Krnl Code: 00000000005d571c: b90400ef            lgr     %r14,%r15
> > > > > [   55.747118]            00000000005d5720: b90400b2            lgr     %r11,%r2
> > > > > [   55.747194]           #00000000005d5724: a7840001            brc     8,5d5726
> > > > > [   55.747205]           >00000000005d5728: a7fbff18            aghi    %r15,-232
> > > > > [   55.747216]            00000000005d572c: e3e0f0980024        stg     %r14,152(%r15)
> > > > > [   55.747228]            00000000005d5732: e31020100004        lg      %r1,16(%r2)
> > > > > [   55.747242]            00000000005d5738: 58c020a0            l       %r12,160(%r2)
> > > > > [   55.747257]            00000000005d573c: 91012009            tm      9(%r2),1
> > > > > [   55.747276] Call Trace:
> > > > > [   55.747282] ([<00000000005d60b4>] pgm_check_handler+0x154/0x158)
> > > > > [   55.747296]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > > [   55.747309] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > > [   55.747322]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > > [   55.747335]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > > [   55.747349]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > > [   55.747364]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > > 
> > > > > [...lots more of the same...] 
> > > > > 
> > > > > [   55.747379]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > > [   55.747425] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > > [   55.747432]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > > [   55.747440]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > > [   55.747448]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > > [   55.747457]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > > [   55.747797]  [<00000000005d8478>] __atomic_notifier_call_chain+0xd8/0xfc
> > > > > [   55.747806] ([<00000000005d83dc>] __atomic_notifier_call_chain+0x3c/0xfc)
> > > > > [   55.747816]  [<00000000005d84c6>] atomic_notifier_call_chain+0x2a/0x3c
> > > > > [   55.747826]  [<00000000005d852a>] notify_die+0x52/0x60
> > > > > [   55.748456]  [<00000000005d57da>] illegal_op+0xce/0x134
> > > > > [   55.748463]  [<00000000005d60b4>] pgm_check_handler+0x154/0x158
> > > > > [   55.748472]  [<000000000017afa0>] select_task_rq_fair+0x1478/0x14b4
> > > > > [   55.748483] ([<0000000000179bb8>] select_task_rq_fair+0x90/0x14b4)
> > > > > [   55.748493]  [<0000000000170702>] try_to_wake_up+0x136/0x47c
> > > > > [   55.748506]  [<000000000015b446>] autoremove_wake_function+0x26/0x58
> > > > > [   55.748518]  [<000000000016693a>] __wake_up_common+0x76/0xb4
> > > > > [   55.748530]  [<000000000016aed0>] __wake_up+0x4c/0x60
> > > > > [   55.748541]  [<0000000000109ee0>] s390_handle_mcck+0x194/0x1f8
> > > > > [   55.748557]  [<000000000010486a>] cpu_idle+0x192/0x1c0
> > > > > [   55.748570]  [<0000000000977916>] start_kernel+0x402/0x410
> > > > > [   55.748588]  [<0000000000100020>] _stext+0x20/0x80
> > > > > [   55.748603] 2 locks held by swapper/0/0:
> > > > > [   55.748612]  #0:  (crw_handler_wait_q.lock){......}, at: [<000000000016aeb6>] __wake_up+0x32/0x60
> > > > > [   55.748648]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000017060c>] try_to_wake_up+0x40/0x47c
> > > > > [   55.748663] Last Breaking-Event-Address:
> > > > > [   55.748667]  [<0000000000000000>] 0x0
> > > > > 
> > > > > This simply happens because WARN_ON_ONCE causes an exception, the excpetion
> > > > > handler wants to call a notifier call chain (notify_die), which again uses
> > > > > rcu_read_lock(), which again causes an exception and so on...
> > > > > Unfortunately WARN_ON_ONCE first causes an exception and only afterwards sets
> > > > > the flag that the warning already happened. Seems to be quite some effort to
> > > > > change this behaviour.
> > > > > 
> > > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > > So we probably want something better than the patch below.
> > > > 
> > > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > > in s390? Is it to have a whole new stack for the warning path in order
> > > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > > 
> > > > Anyway perhaps we need a recursion protection on WARN_ON_ONCE(), such
> > > > as:
> > > 
> > > This makes sense to me, but I am also taking Heiko's patch
> > > removing the WARN_ON()s in favor of the lockdep checks now in
> > > rcu_read_lock_held().  After all, the rcu_lock_acquire() WARN_ON_ONCE()
> > > only appears if PROVE_LOCKING=y, so the added requirement is not a
> > > big deal.  Also, CONFIG_PROVE_RCU is in the testing requirements in
> > > Documentation/SubmitChecklist.
> > > 
> > > 							Thanx, Paul
> > 
> > I think if we do this, we lose the debugging coverage on places that
> > make use of rcu_read_lock{_sched,_bh}() without using rcu_dereference{_sched,_bh}().
> > 
> > It seems walking the task list with list_for_each_entry_rcu() under rcu_read_lock() is
> > one such example because rcu lists are using rcu_dereference_raw().
> 
> Good point.  I will convert these to rcu_lockdep_assert()s.

But rcu_dereference_raw() can be used outside rcu_read_lock() as well.

> 
> > Also I realize a problem with my below patch. The exception will call rcu_read_lock()
> > from another callsite so we are still going to recurse. Only of one level but we will
> > recurse. This could mess up the warning output, although it prevents from the
> > stack overflow. If we still go that direction, we probably need a more global, or per
> > cpu, recursion protection of these RCU warnings.
> 
> Hmmm...  Should this be global or specific to S390?

I guess specific to S390. But it may be a good idea to protect against
WARN() recursions in general. One would expect such function to be quite
robust.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-01 15:14 ` Frederic Weisbecker
  2012-02-01 17:18   ` Paul E. McKenney
@ 2012-02-02 12:27   ` Heiko Carstens
  2012-02-02 14:52     ` Frederic Weisbecker
  1 sibling, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2012-02-02 12:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E. McKenney, Ingo Molnar, Peter Zijlstra

On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > will find illegal uses. But it won't work for lockdep off configs...
> > So we probably want something better than the patch below.
> 
> Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> in s390? Is it to have a whole new stack for the warning path in order
> to avoid stack overflow from the place that called the WARN_ON() ?

The reason was to reduce the code footprint of the WARN_ON() and also
be able to print the register contents at the time the warning happened.

All architectures which define __WARN_TAINT implement warnings with
exceptions. Currently that are parisc, powerpc, s390 and sh.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-02 12:27   ` Heiko Carstens
@ 2012-02-02 14:52     ` Frederic Weisbecker
  2012-02-02 19:11       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-02-02 14:52 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, Paul E. McKenney, Ingo Molnar, Peter Zijlstra

On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > will find illegal uses. But it won't work for lockdep off configs...
> > > So we probably want something better than the patch below.
> > 
> > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > in s390? Is it to have a whole new stack for the warning path in order
> > to avoid stack overflow from the place that called the WARN_ON() ?
> 
> The reason was to reduce the code footprint of the WARN_ON() and also
> be able to print the register contents at the time the warning happened.

Ah ok, makes sense.

> 
> All architectures which define __WARN_TAINT implement warnings with
> exceptions. Currently that are parisc, powerpc, s390 and sh.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-02 14:52     ` Frederic Weisbecker
@ 2012-02-02 19:11       ` Paul E. McKenney
  2012-02-03  9:32         ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2012-02-02 19:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Thu, Feb 02, 2012 at 03:52:20PM +0100, Frederic Weisbecker wrote:
> On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > So we probably want something better than the patch below.
> > > 
> > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > in s390? Is it to have a whole new stack for the warning path in order
> > > to avoid stack overflow from the place that called the WARN_ON() ?
> > 
> > The reason was to reduce the code footprint of the WARN_ON() and also
> > be able to print the register contents at the time the warning happened.
> 
> Ah ok, makes sense.

So Frederic should push his anti-recursion patch, then?

							Thanx, Paul

> > All architectures which define __WARN_TAINT implement warnings with
> > exceptions. Currently that are parisc, powerpc, s390 and sh.
> > 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-02 19:11       ` Paul E. McKenney
@ 2012-02-03  9:32         ` Heiko Carstens
  2012-02-03 18:33           ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2012-02-03  9:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra

On Thu, Feb 02, 2012 at 11:11:16AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 02, 2012 at 03:52:20PM +0100, Frederic Weisbecker wrote:
> > On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> > > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > > So we probably want something better than the patch below.
> > > > 
> > > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > > in s390? Is it to have a whole new stack for the warning path in order
> > > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > 
> > > The reason was to reduce the code footprint of the WARN_ON() and also
> > > be able to print the register contents at the time the warning happened.
> > 
> > Ah ok, makes sense.
> 
> So Frederic should push his anti-recursion patch, then?

Yes, please.

Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>

It still generates recursive warnings because the WARNON_ONCE is inlined and
every different usage will generate an exception, but it didn't produce a
stack overflow anymore.
To avoid the recursive warning the patch below would help. Not sure if it's
worth it...

Subject: [PATCH] rcu: move rcu_is_cpu_idle() check warning into C file

From: Heiko Carstens <heiko.carstens@de.ibm.com>

rcu_read_lock() and rcu_read_unlock() generate a warning if a cpu is in
extended quiescant state. Since these functions are inlined this can cause
a lot of warnings if in the processing of the WARN_ON_ONCE() there is
another usage of e.g. rcu_read_lock(). To make sure we only get one
warning (and avoid possible stack overflows) uninline the check.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/rcupdate.h |    9 +++++++--
 kernel/rcupdate.c        |    6 ++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 81c04f4..9fe7be5 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -230,22 +230,27 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
 
 #ifdef CONFIG_PROVE_RCU
 extern int rcu_is_cpu_idle(void);
+extern void rcu_warn_if_is_cpu_idle(void);
 #else /* !CONFIG_PROVE_RCU */
 static inline int rcu_is_cpu_idle(void)
 {
 	return 0;
 }
+
+static inline void rcu_warn_if_is_cpu_idle(void)
+{
+}
 #endif /* else !CONFIG_PROVE_RCU */
 
 static inline void rcu_lock_acquire(struct lockdep_map *map)
 {
-	WARN_ON_ONCE(rcu_is_cpu_idle());
+	rcu_warn_if_is_cpu_idle();
 	lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
 }
 
 static inline void rcu_lock_release(struct lockdep_map *map)
 {
-	WARN_ON_ONCE(rcu_is_cpu_idle());
+	rcu_warn_if_is_cpu_idle();
 	lock_release(map, 1, _THIS_IP_);
 }
 
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 2bc4e13..5deca18 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -141,6 +141,12 @@ int rcu_my_thread_group_empty(void)
 	return thread_group_empty(current);
 }
 EXPORT_SYMBOL_GPL(rcu_my_thread_group_empty);
+
+void rcu_warn_if_is_cpu_idle(void)
+{
+	WARN_ON_ONCE(rcu_is_cpu_idle());
+}
+EXPORT_SYMBOL_GPL(rcu_warn_if_is_cpu_idle);
 #endif /* #ifdef CONFIG_PROVE_RCU */
 
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-03  9:32         ` Heiko Carstens
@ 2012-02-03 18:33           ` Paul E. McKenney
  2012-02-04 13:13             ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2012-02-03 18:33 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra

On Fri, Feb 03, 2012 at 10:32:14AM +0100, Heiko Carstens wrote:
> On Thu, Feb 02, 2012 at 11:11:16AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 02, 2012 at 03:52:20PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> > > > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > > > So we probably want something better than the patch below.
> > > > > 
> > > > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > > > in s390? Is it to have a whole new stack for the warning path in order
> > > > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > > 
> > > > The reason was to reduce the code footprint of the WARN_ON() and also
> > > > be able to print the register contents at the time the warning happened.
> > > 
> > > Ah ok, makes sense.
> > 
> > So Frederic should push his anti-recursion patch, then?
> 
> Yes, please.
> 
> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> It still generates recursive warnings because the WARNON_ONCE is inlined and
> every different usage will generate an exception, but it didn't produce a
> stack overflow anymore.
> To avoid the recursive warning the patch below would help. Not sure if it's
> worth it...
> 
> Subject: [PATCH] rcu: move rcu_is_cpu_idle() check warning into C file
> 
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> rcu_read_lock() and rcu_read_unlock() generate a warning if a cpu is in
> extended quiescant state. Since these functions are inlined this can cause
> a lot of warnings if in the processing of the WARN_ON_ONCE() there is
> another usage of e.g. rcu_read_lock(). To make sure we only get one
> warning (and avoid possible stack overflows) uninline the check.
> 
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
>  include/linux/rcupdate.h |    9 +++++++--
>  kernel/rcupdate.c        |    6 ++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 81c04f4..9fe7be5 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -230,22 +230,27 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
> 
>  #ifdef CONFIG_PROVE_RCU
>  extern int rcu_is_cpu_idle(void);
> +extern void rcu_warn_if_is_cpu_idle(void);
>  #else /* !CONFIG_PROVE_RCU */
>  static inline int rcu_is_cpu_idle(void)
>  {
>  	return 0;
>  }
> +
> +static inline void rcu_warn_if_is_cpu_idle(void)
> +{
> +}
>  #endif /* else !CONFIG_PROVE_RCU */
> 
>  static inline void rcu_lock_acquire(struct lockdep_map *map)
>  {
> -	WARN_ON_ONCE(rcu_is_cpu_idle());
> +	rcu_warn_if_is_cpu_idle();

Thank you for the patch, but this WARN_ON_ONCE() has now been removed
in favor of lockdep-RCU checks elsewhere.  This has the advantage of
leveraging lockdep's splat-once and anti-recursion facilities.

So I believe that current -rcu covers this.  (And yes, I do need to
push my most recent changes out.)

							Thanx, Paul

>  	lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
>  }
> 
>  static inline void rcu_lock_release(struct lockdep_map *map)
>  {
> -	WARN_ON_ONCE(rcu_is_cpu_idle());
> +	rcu_warn_if_is_cpu_idle();
>  	lock_release(map, 1, _THIS_IP_);
>  }
> 
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index 2bc4e13..5deca18 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -141,6 +141,12 @@ int rcu_my_thread_group_empty(void)
>  	return thread_group_empty(current);
>  }
>  EXPORT_SYMBOL_GPL(rcu_my_thread_group_empty);
> +
> +void rcu_warn_if_is_cpu_idle(void)
> +{
> +	WARN_ON_ONCE(rcu_is_cpu_idle());
> +}
> +EXPORT_SYMBOL_GPL(rcu_warn_if_is_cpu_idle);
>  #endif /* #ifdef CONFIG_PROVE_RCU */
> 
>  #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-03 18:33           ` Paul E. McKenney
@ 2012-02-04 13:13             ` Frederic Weisbecker
  2012-02-04 16:52               ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-02-04 13:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Fri, Feb 03, 2012 at 10:33:35AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 03, 2012 at 10:32:14AM +0100, Heiko Carstens wrote:
> > On Thu, Feb 02, 2012 at 11:11:16AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 02, 2012 at 03:52:20PM +0100, Frederic Weisbecker wrote:
> > > > On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> > > > > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > > > > So we probably want something better than the patch below.
> > > > > > 
> > > > > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > > > > in s390? Is it to have a whole new stack for the warning path in order
> > > > > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > > > 
> > > > > The reason was to reduce the code footprint of the WARN_ON() and also
> > > > > be able to print the register contents at the time the warning happened.
> > > > 
> > > > Ah ok, makes sense.
> > > 
> > > So Frederic should push his anti-recursion patch, then?
> > 
> > Yes, please.
> > 
> > Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > 
> > It still generates recursive warnings because the WARNON_ONCE is inlined and
> > every different usage will generate an exception, but it didn't produce a
> > stack overflow anymore.
> > To avoid the recursive warning the patch below would help. Not sure if it's
> > worth it...
> > 
> > Subject: [PATCH] rcu: move rcu_is_cpu_idle() check warning into C file
> > 
> > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > 
> > rcu_read_lock() and rcu_read_unlock() generate a warning if a cpu is in
> > extended quiescant state. Since these functions are inlined this can cause
> > a lot of warnings if in the processing of the WARN_ON_ONCE() there is
> > another usage of e.g. rcu_read_lock(). To make sure we only get one
> > warning (and avoid possible stack overflows) uninline the check.
> > 
> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > ---
> >  include/linux/rcupdate.h |    9 +++++++--
> >  kernel/rcupdate.c        |    6 ++++++
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 81c04f4..9fe7be5 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -230,22 +230,27 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
> > 
> >  #ifdef CONFIG_PROVE_RCU
> >  extern int rcu_is_cpu_idle(void);
> > +extern void rcu_warn_if_is_cpu_idle(void);
> >  #else /* !CONFIG_PROVE_RCU */
> >  static inline int rcu_is_cpu_idle(void)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void rcu_warn_if_is_cpu_idle(void)
> > +{
> > +}
> >  #endif /* else !CONFIG_PROVE_RCU */
> > 
> >  static inline void rcu_lock_acquire(struct lockdep_map *map)
> >  {
> > -	WARN_ON_ONCE(rcu_is_cpu_idle());
> > +	rcu_warn_if_is_cpu_idle();
> 
> Thank you for the patch, but this WARN_ON_ONCE() has now been removed
> in favor of lockdep-RCU checks elsewhere.  This has the advantage of
> leveraging lockdep's splat-once and anti-recursion facilities.
> 
> So I believe that current -rcu covers this.  (And yes, I do need to
> push my most recent changes out.)

This still uncovers cases where we call rcu_read_lock() without matching
rcu_dereference(). Amongst this we have rcu_dereference_raw(), conditional
rcu_dereference() and may be cases where we simply have no rcu_dereference*
but we use rcu_read_lock() alone for some reason...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rcu warnings cause stack overflow
  2012-02-04 13:13             ` Frederic Weisbecker
@ 2012-02-04 16:52               ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2012-02-04 16:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Heiko Carstens, linux-kernel, Ingo Molnar, Peter Zijlstra

On Sat, Feb 04, 2012 at 02:13:35PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 03, 2012 at 10:33:35AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 03, 2012 at 10:32:14AM +0100, Heiko Carstens wrote:
> > > On Thu, Feb 02, 2012 at 11:11:16AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 02, 2012 at 03:52:20PM +0100, Frederic Weisbecker wrote:
> > > > > On Thu, Feb 02, 2012 at 01:27:42PM +0100, Heiko Carstens wrote:
> > > > > > On Wed, Feb 01, 2012 at 04:14:48PM +0100, Frederic Weisbecker wrote:
> > > > > > > > Removing the WARN_ON_ONCE will fix this and, if lockdep is turned on, still
> > > > > > > > will find illegal uses. But it won't work for lockdep off configs...
> > > > > > > > So we probably want something better than the patch below.
> > > > > > > 
> > > > > > > Ah ok. Hmm, but why are you using an exception to implement WARN_ON()
> > > > > > > in s390? Is it to have a whole new stack for the warning path in order
> > > > > > > to avoid stack overflow from the place that called the WARN_ON() ?
> > > > > > 
> > > > > > The reason was to reduce the code footprint of the WARN_ON() and also
> > > > > > be able to print the register contents at the time the warning happened.
> > > > > 
> > > > > Ah ok, makes sense.
> > > > 
> > > > So Frederic should push his anti-recursion patch, then?
> > > 
> > > Yes, please.
> > > 
> > > Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > > 
> > > It still generates recursive warnings because the WARNON_ONCE is inlined and
> > > every different usage will generate an exception, but it didn't produce a
> > > stack overflow anymore.
> > > To avoid the recursive warning the patch below would help. Not sure if it's
> > > worth it...
> > > 
> > > Subject: [PATCH] rcu: move rcu_is_cpu_idle() check warning into C file
> > > 
> > > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > > 
> > > rcu_read_lock() and rcu_read_unlock() generate a warning if a cpu is in
> > > extended quiescant state. Since these functions are inlined this can cause
> > > a lot of warnings if in the processing of the WARN_ON_ONCE() there is
> > > another usage of e.g. rcu_read_lock(). To make sure we only get one
> > > warning (and avoid possible stack overflows) uninline the check.
> > > 
> > > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > > ---
> > >  include/linux/rcupdate.h |    9 +++++++--
> > >  kernel/rcupdate.c        |    6 ++++++
> > >  2 files changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 81c04f4..9fe7be5 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -230,22 +230,27 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
> > > 
> > >  #ifdef CONFIG_PROVE_RCU
> > >  extern int rcu_is_cpu_idle(void);
> > > +extern void rcu_warn_if_is_cpu_idle(void);
> > >  #else /* !CONFIG_PROVE_RCU */
> > >  static inline int rcu_is_cpu_idle(void)
> > >  {
> > >  	return 0;
> > >  }
> > > +
> > > +static inline void rcu_warn_if_is_cpu_idle(void)
> > > +{
> > > +}
> > >  #endif /* else !CONFIG_PROVE_RCU */
> > > 
> > >  static inline void rcu_lock_acquire(struct lockdep_map *map)
> > >  {
> > > -	WARN_ON_ONCE(rcu_is_cpu_idle());
> > > +	rcu_warn_if_is_cpu_idle();
> > 
> > Thank you for the patch, but this WARN_ON_ONCE() has now been removed
> > in favor of lockdep-RCU checks elsewhere.  This has the advantage of
> > leveraging lockdep's splat-once and anti-recursion facilities.
> > 
> > So I believe that current -rcu covers this.  (And yes, I do need to
> > push my most recent changes out.)
> 
> This still uncovers cases where we call rcu_read_lock() without matching
> rcu_dereference(). Amongst this we have rcu_dereference_raw(), conditional
> rcu_dereference() and may be cases where we simply have no rcu_dereference*
> but we use rcu_read_lock() alone for some reason...

Ah!  I moved the !rcu_is_cpu_idle() check to the read-side primitives,
for example:

	static inline void rcu_read_lock(void)
	{
		__rcu_read_lock();
		__acquire(RCU);
		rcu_lock_acquire(&rcu_lock_map);
		rcu_lockdep_assert(!rcu_is_cpu_idle(),
				   "rcu_read_lock() used illegally while idle");
	}

The reason I did this is that there is a good chance that it will be
necessary to allow SRCU readers on CPUs that the other flavors of RCU
believe to be idle.  One reason for this is that the KVM guys want your
tickless idle to apply to KVM guest processes, but they need the guest
to be in an SRCU read-side critical section during that same time.

Now, this would break given current synchronize_srcu(), but Peter
Zijlstra's recent SRCU patches might get us to a point where it would work.

Then of course there will be the challenge of making it scale, but one
thing at a time.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-02-04 16:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-01 10:06 rcu warnings cause stack overflow Heiko Carstens
2012-02-01 15:14 ` Frederic Weisbecker
2012-02-01 17:18   ` Paul E. McKenney
2012-02-01 18:08     ` Frederic Weisbecker
2012-02-01 18:22       ` Paul E. McKenney
2012-02-01 18:31         ` Frederic Weisbecker
2012-02-02 12:27   ` Heiko Carstens
2012-02-02 14:52     ` Frederic Weisbecker
2012-02-02 19:11       ` Paul E. McKenney
2012-02-03  9:32         ` Heiko Carstens
2012-02-03 18:33           ` Paul E. McKenney
2012-02-04 13:13             ` Frederic Weisbecker
2012-02-04 16:52               ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).