linux-kernel.vger.kernel.org archive mirror
* rcu: Add might_sleep() check to synchronize_rcu()
@ 2018-03-23 21:12 Thomas Gleixner
  2018-03-23 21:28 ` Steven Rostedt
  2018-03-25 18:50 ` Paul E. McKenney
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2018-03-23 21:12 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Peter Zijlstra, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

Subject: rcu: Add might_sleep() check to synchronize_rcu()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 23 Mar 2018 22:02:18 +0100

Joel reported a debugobjects warning which is triggered by an RCU callback
invoking synchronize_rcu(). RCU callbacks run in softirq context, so
calling synchronize_rcu() is a bad idea as it might sleep.

debugobjects triggers because __wait_rcu_gp() uses on-stack objects and
invokes debug_object_init_on_stack(). That function checks the object
address against current's task stack, which fails because the code runs on
the softirq stack.

synchronize_rcu() lacks a might_sleep() check which would have caught that
issue way earlier because it would trigger with the minimal debug options
enabled.

Add a might_sleep() check to catch such cases.

Reported-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
---
 kernel/rcu/tree_plugin.h |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -753,6 +753,7 @@ void synchronize_rcu(void)
 			 "Illegal synchronize_rcu() in RCU read-side critical section");
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return;
+	might_sleep();
 	if (rcu_gp_is_expedited())
 		synchronize_rcu_expedited();
 	else
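
As a rough illustration of what the added annotation buys, here is a
minimal user-space sketch of a might_sleep()-style check. It is not the
kernel implementation: every model_* name is hypothetical, and the single
counter only stands in for the kernel's notion of atomic context. The point
it shows is that the check reports an invalid context at the call site,
whether or not the function would actually have blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's per-CPU preemption counter. */
static int model_preempt_count;
/* Number of invalid-context reports the model has produced so far. */
static int model_splat_count;

/* Enter a context that must not sleep (softirq, spinlock held, ...). */
static void model_atomic_enter(void)
{
	model_preempt_count++;
}

/* Leave that context again; leaving more often than entering is a bug. */
static void model_atomic_exit(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

static bool model_in_atomic(void)
{
	return model_preempt_count != 0;
}

/* Report, but never block: the check is independent of any later sleep. */
static void model_might_sleep(const char *where)
{
	if (!model_in_atomic())
		return;
	model_splat_count++;
	fprintf(stderr,
		"model: sleeping function called from invalid context at %s\n",
		where);
}

/* Shape of the patched function: annotate before picking a wait path. */
static void model_synchronize(void)
{
	model_might_sleep(__func__);
	/* A real implementation would now wait for a grace period. */
}
```

Calling model_synchronize() between model_atomic_enter() and
model_atomic_exit() bumps model_splat_count by one; calling it outside
that window reports nothing.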


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:12 rcu: Add might_sleep() check to synchronize_rcu() Thomas Gleixner
@ 2018-03-23 21:28 ` Steven Rostedt
  2018-03-23 21:33   ` Thomas Gleixner
  2018-03-25 18:50 ` Paul E. McKenney
  1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2018-03-23 21:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Paul E. McKenney, Peter Zijlstra, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

On Fri, 23 Mar 2018 22:12:24 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> synchronize_rcu() lacks a might_sleep() check which would have caught that
> issue way earlier because it would trigger with the minimal debug options
> enabled.
> 
> Add a might_sleep() check to catch such cases.

I'm not against the patch, but really? I would think that
synchronize_rcu() would pretty much always schedule, and scheduling
from atomic would have triggered with minimal debug options enabled.

-- Steve


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:28 ` Steven Rostedt
@ 2018-03-23 21:33   ` Thomas Gleixner
  2018-03-23 21:40     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2018-03-23 21:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Paul E. McKenney, Peter Zijlstra, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

On Fri, 23 Mar 2018, Steven Rostedt wrote:

> On Fri, 23 Mar 2018 22:12:24 +0100 (CET)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > synchronize_rcu() lacks a might_sleep() check which would have caught that
> > issue way earlier because it would trigger with the minimal debug options
> > enabled.
> > 
> > Add a might_sleep() check to catch such cases.
> 
> I'm not against the patch, but really? I would think that
> synchronize_rcu() would pretty much always schedule, and scheduling
> from atomic would have triggered with minimal debug options enabled.

Dunno. The reported splat is here:

       https://pastebin.com/raw/puvh0cXE
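
For readers wondering how the "object is not on stack, but annotated"
message in that splat comes about, the following is a simplified
user-space sketch of the address-range test described in the changelog.
The struct and the model_* helpers are assumptions made for illustration
and are not the debugobjects code itself. The object's address is compared
against the current task's stack, and an object living on some other
stack (here, a second buffer standing in for the softirq stack) fails the
test even though the caller annotated it as on-stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STACK_SIZE 4096

/* Hypothetical task with an embedded stack area. */
struct model_task {
	unsigned char stack[MODEL_STACK_SIZE];
};

/* True if obj lies within [stack, stack + MODEL_STACK_SIZE) of task. */
static bool model_object_is_on_stack(const struct model_task *task,
				     const void *obj)
{
	uintptr_t base;
	uintptr_t addr;

	assert(task != NULL);
	base = (uintptr_t)task->stack;
	addr = (uintptr_t)obj;
	return addr >= base && addr < base + MODEL_STACK_SIZE;
}

/*
 * Compare the caller's annotation with the address check.
 * Returns NULL when they agree, otherwise the warning text.
 */
static const char *model_check_annotation(const struct model_task *task,
					  const void *obj,
					  bool annotated_on_stack)
{
	bool on_stack = model_object_is_on_stack(task, obj);

	if (on_stack == annotated_on_stack)
		return NULL;
	return on_stack ? "object is on stack, but not annotated"
			: "object is not on stack, but annotated";
}
```

With two model_task instances, one for the task and one standing in for
the softirq stack, an address inside the second buffer checked against the
first task yields the "not on stack, but annotated" string.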


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:33   ` Thomas Gleixner
@ 2018-03-23 21:40     ` Steven Rostedt
  2018-03-23 21:46       ` Thomas Gleixner
  2018-03-23 22:57       ` Joel Fernandes
  0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2018-03-23 21:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Paul E. McKenney, Peter Zijlstra, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

On Fri, 23 Mar 2018 22:33:29 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 23 Mar 2018, Steven Rostedt wrote:
> 
> > On Fri, 23 Mar 2018 22:12:24 +0100 (CET)
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >   
> > > synchronize_rcu() lacks a might_sleep() check which would have caught that
> > > issue way earlier because it would trigger with the minimal debug options
> > > enabled.
> > > 
> > > Add a might_sleep() check to catch such cases.  
> > 
> > I'm not against the patch, but really? I would think that
> > synchronize_rcu() would pretty much always schedule, and scheduling
> > from atomic would have triggered with minimal debug options enabled.  
> 
> Dunno. The reported splat is here:
> 
>        https://pastebin.com/raw/puvh0cXE

[  150.560848] ODEBUG: object is not on stack, but annotated
[  150.566398] ------------[ cut here ]------------
[  150.571133] WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:300 __debug_object_init+0x526/0xc40
[  150.579682] Kernel panic - not syncing: panic_on_warn set ...
[  150.579682] 
[  150.587012] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.89-g960923f #61
[  150.593906] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  150.603233]  ffff8801db307a08 ffffffff81d96069 ffffffff83a482c0 ffff8801db307ae0
[  150.611190]  ffffffff83c19700 ffffffff81dfefb6 0000000000000009 ffff8801db307ad0
[  150.619157]  ffffffff8142fbd1 0000000041b58ab3 ffffffff8418bd08 ffffffff8142fa15
[  150.627118] Call Trace:
[  150.629667]  <IRQ> [  150.631700]  [<ffffffff81d96069>] dump_stack+0xc1/0x128
[  150.637051]  [<ffffffff81dfefb6>] ? __debug_object_init+0x526/0xc40
[  150.643431]  [<ffffffff8142fbd1>] panic+0x1bc/0x3a8
[  150.648416]  [<ffffffff8142fa15>] ? percpu_up_read_preempt_enable.constprop.53+0xd7/0xd7
[  150.656611]  [<ffffffff81430835>] ? load_image_and_restore+0xf9/0xf9
[  150.663070]  [<ffffffff81269efd>] ? vprintk_default+0x1d/0x30
[  150.668925]  [<ffffffff81131879>] ? __warn+0x1a9/0x1e0
[  150.674170]  [<ffffffff81dfefb6>] ? __debug_object_init+0x526/0xc40
[  150.680543]  [<ffffffff81131894>] __warn+0x1c4/0x1e0
[  150.685614]  [<ffffffff81131afc>] warn_slowpath_null+0x2c/0x40
[  150.691972]  [<ffffffff81dfefb6>] __debug_object_init+0x526/0xc40
[  150.698174]  [<ffffffff81dfea90>] ? debug_object_fixup+0x30/0x30
[  150.704283]  [<ffffffff81dff709>] debug_object_init_on_stack+0x19/0x20
[  150.710917]  [<ffffffff81287a93>] __wait_rcu_gp+0x93/0x1b0
[  150.716508]  [<ffffffff81290251>] synchronize_rcu.part.65+0x101/0x110
[  150.723054]  [<ffffffff81290150>] ? rcu_pm_notify+0xc0/0xc0
[  150.728735]  [<ffffffff81292bc0>] ? __call_rcu.constprop.72+0x910/0x910
[  150.735459]  [<ffffffff81235221>] ? __lock_is_held+0xa1/0xf0
[  150.741223]  [<ffffffff81290287>] synchronize_rcu+0x27/0x90
[  150.746908]  [<ffffffff83588b35>] __l2tp_session_unhash+0x3d5/0x550

Looks like __l2tp_session_unhash() is the real culprit here.

[  150.753281]  [<ffffffff8358891f>] ? __l2tp_session_unhash+0x1bf/0x550
[  150.759828]  [<ffffffff8114596a>] ? __local_bh_enable_ip+0x6a/0xd0
[  150.766123]  [<ffffffff8358ddb0>] ? l2tp_udp_encap_recv+0xd90/0xd90
[  150.772497]  [<ffffffff83588e97>] l2tp_tunnel_closeall+0x1e7/0x3a0
[  150.778782]  [<ffffffff835897be>] l2tp_tunnel_destruct+0x30e/0x5a0
[  150.785067]  [<ffffffff8358965a>] ? l2tp_tunnel_destruct+0x1aa/0x5a0
[  150.791537]  [<ffffffff835894b0>] ? l2tp_tunnel_del_work+0x460/0x460
[  150.797997]  [<ffffffff82ee8053>] __sk_destruct+0x53/0x570
[  150.803588]  [<ffffffff81293918>] rcu_process_callbacks+0x898/0x1300
[  150.810048]  [<ffffffff812939f7>] ? rcu_process_callbacks+0x977/0x1300
[  150.816684]  [<ffffffff82ee8000>] ? __sk_dst_check+0x240/0x240
[  150.822625]  [<ffffffff838be5d6>] __do_softirq+0x206/0x951
[  150.828223]  [<ffffffff81147315>] irq_exit+0x165/0x190
[  150.833557]  [<ffffffff838bd1eb>] smp_apic_timer_interrupt+0x7b/0xa0
[  150.840018]  [<ffffffff838b9470>] apic_timer_interrupt+0xa0/0xb0
[  150.846132]  <EOI> [  150.848166]  [<ffffffff838b6756>] ? native_safe_halt+0x6/0x10
[  150.854036]  [<ffffffff8123bf2d>] ? trace_hardirqs_on+0xd/0x10
[  150.859973]  [<ffffffff838b5d85>] default_idle+0x55/0x360
[  150.865478]  [<ffffffff8106be0a>] arch_cpu_idle+0xa/0x10

I think you want this instead, as __l2tp_session_unhash() looks like it
is hiding the call to synchronize_rcu(). That call isn't made in all
instances, and I don't think your patch would have triggered the issue
beforehand. You want this:

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 194a7483bb93..857b494bee29 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1677,6 +1677,8 @@ void __l2tp_session_unhash(struct l2tp_session *session)
 {
 	struct l2tp_tunnel *tunnel = session->tunnel;
 
+	might_sleep();
+
 	/* Remove the session from core hashes */
 	if (tunnel) {
 		/* Remove from the per-tunnel hash */

-- Steve
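
The argument for annotating the caller can be sketched in user space as
follows. This is only a model under stated assumptions: the model_* names
are hypothetical, and the needs_wait flag stands in for whatever condition
decides whether the blocking call is reached. It shows that a check buried
in a conditionally called function stays silent on the runs that skip it,
while a check at function entry fires on every call from atomic context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nesting depth of "must not sleep" sections. */
static int model_atomic_depth;
/* Number of invalid-context warnings raised so far. */
static int model_warnings;

static void model_atomic_enter(void)
{
	model_atomic_depth++;
}

static void model_atomic_exit(void)
{
	assert(model_atomic_depth > 0);
	model_atomic_depth--;
}

/* Warn when called from a context that must not sleep. */
static void model_might_sleep(void)
{
	if (model_atomic_depth > 0)
		model_warnings++;
}

/* Stand-in for a blocking primitive that carries its own check. */
static void model_blocking_wait(void)
{
	model_might_sleep();
	/* A real primitive would block here. */
}

/* No entry annotation: warns only if the blocking branch is taken. */
static void model_unhash_unannotated(bool needs_wait)
{
	if (needs_wait)
		model_blocking_wait();
}

/* Entry annotation: warns on every call from atomic context. */
static void model_unhash_annotated(bool needs_wait)
{
	model_might_sleep();
	if (needs_wait)
		model_blocking_wait();
}
```

Inside an atomic section, model_unhash_unannotated(false) leaves
model_warnings untouched, whereas model_unhash_annotated(false) increments
it, which is the extra coverage the entry check provides.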


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:40     ` Steven Rostedt
@ 2018-03-23 21:46       ` Thomas Gleixner
  2018-03-23 22:57       ` Joel Fernandes
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2018-03-23 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Paul E. McKenney, Peter Zijlstra, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, netdev,
	James Chapman

On Fri, 23 Mar 2018, Steven Rostedt wrote:
> On Fri, 23 Mar 2018 22:33:29 +0100 (CET)
> [  150.741223]  [<ffffffff81290287>] synchronize_rcu+0x27/0x90
> [  150.746908]  [<ffffffff83588b35>] __l2tp_session_unhash+0x3d5/0x550
> 
> Looks like __l2tp_session_unhash() is the real culprit here.

Yes. I reported that to netdev already.

> [  150.753281]  [<ffffffff8358891f>] ? __l2tp_session_unhash+0x1bf/0x550
> [  150.759828]  [<ffffffff8114596a>] ? __local_bh_enable_ip+0x6a/0xd0
> [  150.766123]  [<ffffffff8358ddb0>] ? l2tp_udp_encap_recv+0xd90/0xd90
> [  150.772497]  [<ffffffff83588e97>] l2tp_tunnel_closeall+0x1e7/0x3a0
> [  150.778782]  [<ffffffff835897be>] l2tp_tunnel_destruct+0x30e/0x5a0
> [  150.785067]  [<ffffffff8358965a>] ? l2tp_tunnel_destruct+0x1aa/0x5a0
> [  150.791537]  [<ffffffff835894b0>] ? l2tp_tunnel_del_work+0x460/0x460
> [  150.797997]  [<ffffffff82ee8053>] __sk_destruct+0x53/0x570
> [  150.803588]  [<ffffffff81293918>] rcu_process_callbacks+0x898/0x1300
> [  150.810048]  [<ffffffff812939f7>] ? rcu_process_callbacks+0x977/0x1300
> [  150.816684]  [<ffffffff82ee8000>] ? __sk_dst_check+0x240/0x240
> [  150.822625]  [<ffffffff838be5d6>] __do_softirq+0x206/0x951
> [  150.828223]  [<ffffffff81147315>] irq_exit+0x165/0x190
> [  150.833557]  [<ffffffff838bd1eb>] smp_apic_timer_interrupt+0x7b/0xa0
> [  150.840018]  [<ffffffff838b9470>] apic_timer_interrupt+0xa0/0xb0
> [  150.846132]  <EOI> [  150.848166]  [<ffffffff838b6756>] ? native_safe_halt+0x6/0x10
> [  150.854036]  [<ffffffff8123bf2d>] ? trace_hardirqs_on+0xd/0x10
> [  150.859973]  [<ffffffff838b5d85>] default_idle+0x55/0x360
> [  150.865478]  [<ffffffff8106be0a>] arch_cpu_idle+0xa/0x10
> 
> I think you want this instead, as __l2tp_session_unhash() looks like it
> is hiding the call to synchronize_rcu(). That call isn't made in all
> instances, and I don't think your patch would have triggered the issue
> beforehand. You want this:
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 194a7483bb93..857b494bee29 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1677,6 +1677,8 @@ void __l2tp_session_unhash(struct l2tp_session *session)
>  {
>  	struct l2tp_tunnel *tunnel = session->tunnel;
>  
> +	might_sleep();
> +
>  	/* Remove the session from core hashes */
>  	if (tunnel) {
>  		/* Remove from the per-tunnel hash */

That too :)


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:40     ` Steven Rostedt
  2018-03-23 21:46       ` Thomas Gleixner
@ 2018-03-23 22:57       ` Joel Fernandes
  2018-03-24  1:21         ` Steven Rostedt
  1 sibling, 1 reply; 9+ messages in thread
From: Joel Fernandes @ 2018-03-23 22:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Peter Zijlstra,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Fri, Mar 23, 2018 at 2:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 23 Mar 2018 22:33:29 +0100 (CET)
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Fri, 23 Mar 2018, Steven Rostedt wrote:
>>
>> > On Fri, 23 Mar 2018 22:12:24 +0100 (CET)
>> > Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > > synchronize_rcu() lacks a might_sleep() check which would have caught that
>> > > issue way earlier because it would trigger with the minimal debug options
>> > > enabled.
>> > >
>> > > Add a might_sleep() check to catch such cases.
>> >
>> > I'm not against the patch, but really? I would think that
>> > synchronize_rcu() would pretty much always schedule, and scheduling
>> > from atomic would have triggered with minimal debug options enabled.
>>
>> Dunno. The reported splat is here:
>>
>>        https://pastebin.com/raw/puvh0cXE
>
> [  150.560848] ODEBUG: object is not on stack, but annotated
> [  150.566398] ------------[ cut here ]------------
> [  150.571133] WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:300 __debug_object_init+0x526/0xc40
> [  150.579682] Kernel panic - not syncing: panic_on_warn set ...
> [  150.579682]
> [  150.587012] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.89-g960923f #61
> [  150.593906] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [  150.603233]  ffff8801db307a08 ffffffff81d96069 ffffffff83a482c0 ffff8801db307ae0
> [  150.611190]  ffffffff83c19700 ffffffff81dfefb6 0000000000000009 ffff8801db307ad0
> [  150.619157]  ffffffff8142fbd1 0000000041b58ab3 ffffffff8418bd08 ffffffff8142fa15
> [  150.627118] Call Trace:
> [  150.629667]  <IRQ> [  150.631700]  [<ffffffff81d96069>] dump_stack+0xc1/0x128
> [  150.637051]  [<ffffffff81dfefb6>] ? __debug_object_init+0x526/0xc40
> [  150.643431]  [<ffffffff8142fbd1>] panic+0x1bc/0x3a8
> [  150.648416]  [<ffffffff8142fa15>] ? percpu_up_read_preempt_enable.constprop.53+0xd7/0xd7
> [  150.656611]  [<ffffffff81430835>] ? load_image_and_restore+0xf9/0xf9
> [  150.663070]  [<ffffffff81269efd>] ? vprintk_default+0x1d/0x30
> [  150.668925]  [<ffffffff81131879>] ? __warn+0x1a9/0x1e0
> [  150.674170]  [<ffffffff81dfefb6>] ? __debug_object_init+0x526/0xc40
> [  150.680543]  [<ffffffff81131894>] __warn+0x1c4/0x1e0
> [  150.685614]  [<ffffffff81131afc>] warn_slowpath_null+0x2c/0x40
> [  150.691972]  [<ffffffff81dfefb6>] __debug_object_init+0x526/0xc40
> [  150.698174]  [<ffffffff81dfea90>] ? debug_object_fixup+0x30/0x30
> [  150.704283]  [<ffffffff81dff709>] debug_object_init_on_stack+0x19/0x20
> [  150.710917]  [<ffffffff81287a93>] __wait_rcu_gp+0x93/0x1b0
> [  150.716508]  [<ffffffff81290251>] synchronize_rcu.part.65+0x101/0x110
> [  150.723054]  [<ffffffff81290150>] ? rcu_pm_notify+0xc0/0xc0
> [  150.728735]  [<ffffffff81292bc0>] ? __call_rcu.constprop.72+0x910/0x910
> [  150.735459]  [<ffffffff81235221>] ? __lock_is_held+0xa1/0xf0
> [  150.741223]  [<ffffffff81290287>] synchronize_rcu+0x27/0x90
> [  150.746908]  [<ffffffff83588b35>] __l2tp_session_unhash+0x3d5/0x550
>
> Looks like __l2tp_session_unhash() is the real culprit here.
>
> [  150.753281]  [<ffffffff8358891f>] ? __l2tp_session_unhash+0x1bf/0x550
> [  150.759828]  [<ffffffff8114596a>] ? __local_bh_enable_ip+0x6a/0xd0
> [  150.766123]  [<ffffffff8358ddb0>] ? l2tp_udp_encap_recv+0xd90/0xd90
> [  150.772497]  [<ffffffff83588e97>] l2tp_tunnel_closeall+0x1e7/0x3a0
> [  150.778782]  [<ffffffff835897be>] l2tp_tunnel_destruct+0x30e/0x5a0
> [  150.785067]  [<ffffffff8358965a>] ? l2tp_tunnel_destruct+0x1aa/0x5a0
> [  150.791537]  [<ffffffff835894b0>] ? l2tp_tunnel_del_work+0x460/0x460
> [  150.797997]  [<ffffffff82ee8053>] __sk_destruct+0x53/0x570
> [  150.803588]  [<ffffffff81293918>] rcu_process_callbacks+0x898/0x1300
> [  150.810048]  [<ffffffff812939f7>] ? rcu_process_callbacks+0x977/0x1300
> [  150.816684]  [<ffffffff82ee8000>] ? __sk_dst_check+0x240/0x240
> [  150.822625]  [<ffffffff838be5d6>] __do_softirq+0x206/0x951
> [  150.828223]  [<ffffffff81147315>] irq_exit+0x165/0x190
> [  150.833557]  [<ffffffff838bd1eb>] smp_apic_timer_interrupt+0x7b/0xa0
> [  150.840018]  [<ffffffff838b9470>] apic_timer_interrupt+0xa0/0xb0
> [  150.846132]  <EOI> [  150.848166]  [<ffffffff838b6756>] ? native_safe_halt+0x6/0x10
> [  150.854036]  [<ffffffff8123bf2d>] ? trace_hardirqs_on+0xd/0x10
> [  150.859973]  [<ffffffff838b5d85>] default_idle+0x55/0x360
> [  150.865478]  [<ffffffff8106be0a>] arch_cpu_idle+0xa/0x10
>
> I think you want this instead, as __l2tp_session_unhash() looks like it
> is hiding the call to synchronize_rcu(). That call isn't made in all
> instances, and I don't think your patch would have triggered the issue
> beforehand. You want this:
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 194a7483bb93..857b494bee29 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1677,6 +1677,8 @@ void __l2tp_session_unhash(struct l2tp_session *session)
>  {
>         struct l2tp_tunnel *tunnel = session->tunnel;
>
> +       might_sleep();
> +
>         /* Remove the session from core hashes */
>         if (tunnel) {
>                 /* Remove from the per-tunnel hash */

Thanks Thomas and Steven, also shouldn't this code be calling
synchronize_rcu_bh instead of synchronize_rcu, to complement the
rcu_read_lock_bh? In which situations would you call one versus the
other?

Also, it seems synchronize_rcu_bh() already does a might_sleep() in
rcu_blocking_is_gp().

thanks,

- Joel


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 22:57       ` Joel Fernandes
@ 2018-03-24  1:21         ` Steven Rostedt
  2018-03-25 18:43           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2018-03-24  1:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Peter Zijlstra,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Fri, 23 Mar 2018 15:57:04 -0700
Joel Fernandes <joelaf@google.com> wrote:

> > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> > index 194a7483bb93..857b494bee29 100644
> > --- a/net/l2tp/l2tp_core.c
> > +++ b/net/l2tp/l2tp_core.c
> > @@ -1677,6 +1677,8 @@ void __l2tp_session_unhash(struct l2tp_session *session)
> >  {
> >         struct l2tp_tunnel *tunnel = session->tunnel;
> >
> > +       might_sleep();
> > +
> >         /* Remove the session from core hashes */
> >         if (tunnel) {
> >                 /* Remove from the per-tunnel hash */  
> 
> Thanks Thomas and Steven, also shouldn't this code be calling
> synchronize_rcu_bh instead of synchronize_rcu, to complement the
> rcu_read_lock_bh? In which situations would you call one versus the
> other?

Probably, as the comment above rcu_read_lock_bh is:

 * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
 *
 * This is equivalent of rcu_read_lock(), but to be used when updates
 * are being done using call_rcu_bh() or synchronize_rcu_bh(). Since
 * both call_rcu_bh() and synchronize_rcu_bh() consider completion of a
 * softirq handler to be a quiescent state, a process in RCU read-side
 * critical section must be protected by disabling softirqs.

It appears that the reason to use rcu_read_lock_bh() is if you are
calling synchronize_rcu_bh(). Otherwise, one could just be using
straight rcu_read_lock().

-- Steve


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-24  1:21         ` Steven Rostedt
@ 2018-03-25 18:43           ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2018-03-25 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, Thomas Gleixner, LKML, Peter Zijlstra,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Fri, Mar 23, 2018 at 09:21:05PM -0400, Steven Rostedt wrote:
> On Fri, 23 Mar 2018 15:57:04 -0700
> Joel Fernandes <joelaf@google.com> wrote:
> 
> > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> > > index 194a7483bb93..857b494bee29 100644
> > > --- a/net/l2tp/l2tp_core.c
> > > +++ b/net/l2tp/l2tp_core.c
> > > @@ -1677,6 +1677,8 @@ void __l2tp_session_unhash(struct l2tp_session *session)
> > >  {
> > >         struct l2tp_tunnel *tunnel = session->tunnel;
> > >
> > > +       might_sleep();
> > > +
> > >         /* Remove the session from core hashes */
> > >         if (tunnel) {
> > >                 /* Remove from the per-tunnel hash */  
> > 
> > Thanks Thomas and Steven, also shouldn't this code be calling
> > synchronize_rcu_bh instead of synchronize_rcu, to complement the
> > rcu_read_lock_bh? In which situations would you call one versus the
> > other?
> 
> Probably, as the comment above rcu_read_lock_bh is:
> 
>  * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
>  *
>  * This is equivalent of rcu_read_lock(), but to be used when updates
>  * are being done using call_rcu_bh() or synchronize_rcu_bh(). Since
>  * both call_rcu_bh() and synchronize_rcu_bh() consider completion of a
>  * softirq handler to be a quiescent state, a process in RCU read-side
>  * critical section must be protected by disabling softirqs.
> 
> It appears that the reason to use rcu_read_lock_bh() is if you are
> calling synchronize_rcu_bh(). Otherwise, one could just be using
> straight rcu_read_lock().

Agreed, these do have to match.  (I am still working on collapsing
RCU-preempt, RCU-bh, and RCU-sched into one thing per Linus's request,
but still at the pen-and-paper stage.  Not all that difficult, just a
lot of cases to cover.)

							Thanx, Paul


* Re: rcu: Add might_sleep() check to synchronize_rcu()
  2018-03-23 21:12 rcu: Add might_sleep() check to synchronize_rcu() Thomas Gleixner
  2018-03-23 21:28 ` Steven Rostedt
@ 2018-03-25 18:50 ` Paul E. McKenney
  1 sibling, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2018-03-25 18:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes

On Fri, Mar 23, 2018 at 10:12:24PM +0100, Thomas Gleixner wrote:
> Subject: rcu: Add might_sleep() check to synchronize_rcu()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 23 Mar 2018 22:02:18 +0100
> 
> Joel reported a debugobjects warning which is triggered by an RCU callback
> invoking synchronize_rcu(). RCU callbacks run in softirq context, so
> calling synchronize_rcu() is a bad idea as it might sleep.
> 
> debugobjects triggers because __wait_rcu_gp() uses on-stack objects and
> invokes debug_object_init_on_stack(). That function checks the object
> address against current's task stack, which fails because the code runs on
> the softirq stack.
> 
> synchronize_rcu() lacks a might_sleep() check which would have caught that
> issue way earlier because it would trigger with the minimal debug options
> enabled.
> 
> Add a might_sleep() check to catch such cases.
> 
> Reported-by: Joel Fernandes <joelaf@google.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> ---
>  kernel/rcu/tree_plugin.h |    1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -753,6 +753,7 @@ void synchronize_rcu(void)
>  			 "Illegal synchronize_rcu() in RCU read-side critical section");
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return;
> +	might_sleep();
>  	if (rcu_gp_is_expedited())
>  		synchronize_rcu_expedited();
>  	else

I could add this, but synchronize_rcu_expedited() will do
either a mutex_lock() or a wait_event(), both of which already
have a might_sleep(), and wait_rcu_gp() unconditionally calls
wait_for_completion(), which already has a might_sleep().

Unless there is only one CPU in the system or we are at early boot.  Is
this possibility common enough to warrant a might_sleep() further up?

							Thanx, Paul

