netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
@ 2015-04-23 17:07 Josh Boyer
  2015-04-23 17:35 ` Sudeep Holla
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Josh Boyer @ 2015-04-23 17:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bridge, netdev, Linux-Kernel@Vger. Kernel. Org,
	dac.override

Hi All,

We've had a user report the following backtrace from the bridge module
with a recent Linus' tree.  Has anything like this been reported yet?
If you have any questions on setup, the user is CC'd.

josh

[   29.382235] br0: port 1(tap0) entered forwarding state

[   29.382286] ===============================
[   29.382315] [ INFO: suspicious RCU usage. ]
[   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[   29.382380] -------------------------------
[   29.382409] net/bridge/br_private.h:626 suspicious
rcu_dereference_check() usage!
[   29.382455]
               other info that might help us debug this:

[   29.382507]
               rcu_scheduler_active = 1, debug_locks = 0
[   29.382549] 2 locks held by swapper/0/0:
[   29.382576]  #0:  (((&p->forward_delay_timer))){+.-...}, at:
[<ffffffff81139f75>] call_timer_fn+0x5/0x4f0
[   29.382660]  #1:  (&(&br->lock)->rlock){+.-...}, at:
[<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140
[bridge]
[   29.382754]
               stack backtrace:
[   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[   29.382882]  0000000000000000 3ebfc20364115825 ffff880666603c48
ffffffff81892d4b
[   29.382943]  0000000000000000 ffffffff81e124e0 ffff880666603c78
ffffffff8110bcd7
[   29.383004]  ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
ffff880c5fc88ac0
[   29.383065] Call Trace:
[   29.383084]  <IRQ>  [<ffffffff81892d4b>] dump_stack+0x4c/0x65
[   29.383130]  [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120
[   29.383178]  [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[   29.383225]  [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[   29.383271]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383320]  [<ffffffffa0450de8>]
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[   29.383371]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383416]  [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0
[   29.383454]  [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0
[   29.383493]  [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200
[   29.383541]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383587]  [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490
[   29.383629]  [<ffffffff810b68cc>] __do_softirq+0xec/0x670
[   29.383666]  [<ffffffff810b70d5>] irq_exit+0x145/0x150
[   29.383703]  [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60
[   29.383744]  [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80
[   29.383782]  <EOI>  [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0
[   29.383832]  [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0
[   29.383873]  [<ffffffff816f15e7>] cpuidle_enter+0x17/0x20
[   29.383908]  [<ffffffff81103c6f>] cpu_startup_entry+0x36f/0x5f0
[   29.383949]  [<ffffffff81887cbd>] rest_init+0x13d/0x150
[   29.383986]  [<ffffffff821c905b>] start_kernel+0x4d2/0x4f3
[   29.384023]  [<ffffffff821c8120>] ? early_idt_handlers+0x120/0x120
[   29.384064]  [<ffffffff821c8339>] x86_64_start_reservations+0x2a/0x2c
[   29.384105]  [<ffffffff821c8485>] x86_64_start_kernel+0x14a/0x16d
[   36.100347] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80:0000:0000:0000:5054:00ff:fe12:0100
DST=ff02:0000:0000:0000:0000:0000:0000:0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0
[   40.113744] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80:0000:0000:0000:5054:00ff:fe12:0100
DST=ff02:0000:0000:0000:0000:0000:0000:0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0
[   44.119727] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80:0000:0000:0000:5054:00ff:fe12:0100
DST=ff02:0000:0000:0000:0000:0000:0000:0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-04-23 17:07 Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4 Josh Boyer
@ 2015-04-23 17:35 ` Sudeep Holla
  2015-04-23 22:53 ` Cong Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Sudeep Holla @ 2015-04-23 17:35 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Stephen Hemminger, David S. Miller, bridge, netdev,
	Linux-Kernel@Vger. Kernel. Org, dac.override

On Thu, Apr 23, 2015 at 6:07 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> Hi All,
>
> We've had a user report the following backtrace from the bridge module
> with a recent Linus' tree.  Has anything like this been reported yet?
> If you have any questions on setup, the user is CC'd.
>

I too observed similar backtrace once(not able to reproduce it again)
while I was trying to check inconsistent lock state[1] with lockdep enabled.

Regards,
Sudeep

[1] https://lkml.org/lkml/2015/4/23/329


--->8

===============================
[ INFO: suspicious RCU usage. ]
4.0.0 #269 Not tainted
-------------------------------
include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0 #269
Hardware name: ARM-Versatile Express
[<c00151f1>] (unwind_backtrace) from [<c0011971>] (show_stack+0x11/0x14)
[<c0011971>] (show_stack) from [<c05627c7>] (dump_stack+0x73/0x8c)
[<c05627c7>] (dump_stack) from [<c0013e63>] (handle_IPI+0x257/0x410)
[<c0013e63>] (handle_IPI) from [<c000932b>] (gic_handle_irq+0x4f/0x50)
[<c000932b>] (gic_handle_irq) from [<c00121ff>] (__irq_svc+0x3f/0x64)
Exception stack(0xc083bf30 to 0xc083bf78)
bf20:                                     00000001 00000001 00000000 c0841938
bf40: c083a000 c083d5fc c08ba5c8 c083d598 00000000 00000000 c08b938f c056afc8
bf60: 00000008 c083bf78 c005d157 c000f288 40000133 ffffffff
[<c00121ff>] (__irq_svc) from [<c000f288>] (arch_cpu_idle+0x30/0x34)
[<c000f288>] (arch_cpu_idle) from [<c00555d5>] (cpu_startup_entry+0x339/0x404)
[<c00555d5>] (cpu_startup_entry) from [<c07d1a4f>] (start_kernel+0x32f/0x338)
[<c07d1a4f>] (start_kernel) from [<8000807f>] (0x8000807f)

===============================
[ INFO: suspicious RCU usage. ]
4.0.0 #269 Not tainted
-------------------------------
include/trace/events/ipi.h:84 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0 #269
Hardware name: ARM-Versatile Express
[<c00151f1>] (unwind_backtrace) from [<c0011971>] (show_stack+0x11/0x14)
[<c0011971>] (show_stack) from [<c05627c7>] (dump_stack+0x73/0x8c)
[<c05627c7>] (dump_stack) from [<c0013f85>] (handle_IPI+0x379/0x410)
[<c0013f85>] (handle_IPI) from [<c000932b>] (gic_handle_irq+0x4f/0x50)
[<c000932b>] (gic_handle_irq) from [<c00121ff>] (__irq_svc+0x3f/0x64)
Exception stack(0xc083bf30 to 0xc083bf78)
bf20:                                     00000001 00000001 00000000 c0841938
bf40: c083a000 c083d5fc c08ba5c8 c083d598 00000000 00000000 c08b938f c056afc8
bf60: 00000008 c083bf78 c005d157 c000f288 40000133 ffffffff
[<c00121ff>] (__irq_svc) from [<c000f288>] (arch_cpu_idle+0x30/0x34)
[<c000f288>] (arch_cpu_idle) from [<c00555d5>] (cpu_startup_entry+0x339/0x404)
[<c00555d5>] (cpu_startup_entry) from [<c07d1a4f>] (start_kernel+0x32f/0x338)
[<c07d1a4f>] (start_kernel) from [<8000807f>] (0x8000807f)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-04-23 17:07 Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4 Josh Boyer
  2015-04-23 17:35 ` Sudeep Holla
@ 2015-04-23 22:53 ` Cong Wang
  2015-05-04 13:39 ` Dominick Grift
  2015-05-21 20:28 ` [PATCH net] bridge: fix lockdep splat Eric Dumazet
  3 siblings, 0 replies; 27+ messages in thread
From: Cong Wang @ 2015-04-23 22:53 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Stephen Hemminger, David S. Miller, bridge, netdev,
	Linux-Kernel@Vger. Kernel. Org, dac.override

On Thu, Apr 23, 2015 at 10:07 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> [   29.382235] br0: port 1(tap0) entered forwarding state
>
> [   29.382286] ===============================
> [   29.382315] [ INFO: suspicious RCU usage. ]
> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
> [   29.382380] -------------------------------
> [   29.382409] net/bridge/br_private.h:626 suspicious
> rcu_dereference_check() usage!
> [   29.382455]
>                other info that might help us debug this:
>
> [   29.382507]
>                rcu_scheduler_active = 1, debug_locks = 0
> [   29.382549] 2 locks held by swapper/0/0:
> [   29.382576]  #0:  (((&p->forward_delay_timer))){+.-...}, at:
> [<ffffffff81139f75>] call_timer_fn+0x5/0x4f0
> [   29.382660]  #1:  (&(&br->lock)->rlock){+.-...}, at:
> [<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140
> [bridge]
> [   29.382754]
>                stack backtrace:
> [   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 4.1.0-0.rc0.git11.1.fc23.x86_64 #1
> [   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
> [   29.382882]  0000000000000000 3ebfc20364115825 ffff880666603c48
> ffffffff81892d4b
> [   29.382943]  0000000000000000 ffffffff81e124e0 ffff880666603c78
> ffffffff8110bcd7
> [   29.383004]  ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
> ffff880c5fc88ac0
> [   29.383065] Call Trace:
> [   29.383084]  <IRQ>  [<ffffffff81892d4b>] dump_stack+0x4c/0x65
> [   29.383130]  [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120
> [   29.383178]  [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
> [   29.383225]  [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
> [   29.383271]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
> [   29.383320]  [<ffffffffa0450de8>]
> br_forward_delay_timer_expired+0x58/0x140 [bridge]
> [   29.383371]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
> [   29.383416]  [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0
> [   29.383454]  [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0
> [   29.383493]  [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200
> [   29.383541]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
> [   29.383587]  [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490
> [   29.383629]  [<ffffffff810b68cc>] __do_softirq+0xec/0x670
> [   29.383666]  [<ffffffff810b70d5>] irq_exit+0x145/0x150
> [   29.383703]  [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60
> [   29.383744]  [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80
> [   29.383782]  <EOI>  [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0
> [   29.383832]  [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0
> [   29.383873]  [<ffffffff816f15e7>] cpuidle_enter+0x17/0x20
> [   29.383908]  [<ffffffff81103c6f>] cpu_startup_entry+0x36f/0x5f0
> [   29.383949]  [<ffffffff81887cbd>] rest_init+0x13d/0x150
> [   29.383986]  [<ffffffff821c905b>] start_kernel+0x4d2/0x4f3
> [   29.384023]  [<ffffffff821c8120>] ? early_idt_handlers+0x120/0x120
> [   29.384064]  [<ffffffff821c8339>] x86_64_start_reservations+0x2a/0x2c
> [   29.384105]  [<ffffffff821c8485>] x86_64_start_kernel+0x14a/0x16d

We are dereferencing a RCU pointer with rtnl assert, but in the timer context
we only have br->lock. It looks like we need to take RCU read lock on that path
rather than asserting rtnl lock.

Thanks for the report.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-04-23 17:07 Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4 Josh Boyer
  2015-04-23 17:35 ` Sudeep Holla
  2015-04-23 22:53 ` Cong Wang
@ 2015-05-04 13:39 ` Dominick Grift
  2015-05-04 18:45   ` Cong Wang
  2015-05-21 20:28 ` [PATCH net] bridge: fix lockdep splat Eric Dumazet
  3 siblings, 1 reply; 27+ messages in thread
From: Dominick Grift @ 2015-05-04 13:39 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 5550 bytes --]

On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
> Hi All,
> 
> We've had a user report the following backtrace from the bridge module
> with a recent Linus' tree.  Has anything like this been reported yet?
> If you have any questions on setup, the user is CC'd.
> 
> josh
> 
> [   29.382235] br0: port 1(tap0) entered forwarding state
> 
> [   29.382286] ===============================
> [   29.382315] [ INFO: suspicious RCU usage. ]
> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
> [   29.382380] -------------------------------
> [   29.382409] net/bridge/br_private.h:626 suspicious
> rcu_dereference_check() usage!

<snip>

With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:

May 04 14:51:34 d30 kernel: ===============================
May 04 14:51:34 d30 kernel: [ INFO: suspicious RCU usage. ]
May 04 14:51:34 d30 kernel: 4.1.0-0.rc1.git1.1.fc23.x86_64 #1 Not tainted
May 04 14:51:34 d30 kernel: -------------------------------
May 04 14:51:34 d30 kernel: net/bridge/br_private.h:626 suspicious rcu_dereference_check() usage!
May 04 14:51:34 d30 kernel:
May 04 14:51:34 d30 kernel:
May 04 14:51:34 d30 kernel: 3 locks held by qemu-system-x86/1133:
May 04 14:51:34 d30 kernel:  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffa06a7e16>] vcpu_load+0x26/0x70 [kvm]
May 04 14:51:34 d30 kernel:  #1:  (((&p->forward_delay_timer))){+.-...}, at: [<ffffffff81139db5>] call_timer_fn+0x5/0x4f0
May 04 14:51:34 d30 kernel:  #2:  (&(&br->lock)->rlock){+.-...}, at: [<ffffffffa04aedc1>] br_forward_delay_timer_expired+0x31/0x140 [bridge]
May 04 14:51:34 d30 kernel:
May 04 14:51:34 d30 kernel: CPU: 6 PID: 1133 Comm: qemu-system-x86 Not tainted 4.1.0-0.rc1.git1.1.fc23.x86_64 #1
May 04 14:51:34 d30 kernel: Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
May 04 14:51:34 d30 kernel:  0000000000000000 000000000dce176d ffff880c65e03c48 ffffffff81893b45
May 04 14:51:34 d30 kernel:  0000000000000000 ffff880c5e869a60 ffff880c65e03c78 ffffffff8110bb17
May 04 14:51:34 d30 kernel:  ffff880c56f6f700 ffff880c61a35c58 ffff880c5c14d000 ffff88065b470ac0
May 04 14:51:34 d30 kernel: Call Trace:
May 04 14:51:34 d30 kernel:  <IRQ>  [<ffffffff81893b45>] dump_stack+0x4c/0x65
May 04 14:51:34 d30 kernel:  [<ffffffff8110bb17>] lockdep_rcu_suspicious+0xe7/0x120
May 04 14:51:34 d30 kernel:  [<ffffffffa04b00f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffffa04b066b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffffa04aed90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffffa04aede8>] br_forward_delay_timer_expired+0x58/0x140 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffffa04aed90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffff81139e73>] call_timer_fn+0xc3/0x4f0
May 04 14:51:34 d30 kernel:  [<ffffffff81139db5>] ? call_timer_fn+0x5/0x4f0
May 04 14:51:34 d30 kernel:  [<ffffffff8189cb50>] ? _raw_spin_unlock_irq+0x30/0x50
May 04 14:51:34 d30 kernel:  [<ffffffffa04aed90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
May 04 14:51:34 d30 kernel:  [<ffffffff8113a4e4>] run_timer_softirq+0x244/0x490
May 04 14:51:34 d30 kernel:  [<ffffffff810b6847>] ? __do_softirq+0xb7/0x670
May 04 14:51:34 d30 kernel:  [<ffffffff810b687c>] __do_softirq+0xec/0x670
May 04 14:51:34 d30 kernel:  [<ffffffffa12b1380>] ? vmx_invpcid_supported+0x30/0x30 [kvm_intel]
May 04 14:51:34 d30 kernel:  [<ffffffff810b7085>] irq_exit+0x145/0x150
May 04 14:51:34 d30 kernel:  [<ffffffff818a0546>] smp_apic_timer_interrupt+0x46/0x60
May 04 14:51:34 d30 kernel:  [<ffffffff8189e4f0>] ? uv_bau_message_intr1+0x80/0x80
May 04 14:51:34 d30 kernel:  [<ffffffff8189e563>] apic_timer_interrupt+0x73/0x80
May 04 14:51:34 d30 kernel:  <EOI>  [<ffffffffa12bf3a1>] ? vmx_vcpu_run+0x871/0x8a0 [kvm_intel]
May 04 14:51:34 d30 kernel:  [<ffffffff8189e4f0>] ? uv_bau_message_intr1+0x80/0x80
May 04 14:51:34 d30 kernel:  [<ffffffffa12b236f>] ? vmx_handle_external_intr+0x6f/0x70 [kvm_intel]
May 04 14:51:34 d30 kernel:  [<ffffffffa06c1fff>] ? kvm_arch_vcpu_ioctl_run+0x71f/0x1ba0 [kvm]
May 04 14:51:34 d30 kernel:  [<ffffffffa06c204f>] ? kvm_arch_vcpu_ioctl_run+0x76f/0x1ba0 [kvm]
May 04 14:51:34 d30 kernel:  [<ffffffff8110e1ad>] ? trace_hardirqs_on+0xd/0x10
May 04 14:51:34 d30 kernel:  [<ffffffffa06a7e16>] ? vcpu_load+0x26/0x70 [kvm]
May 04 14:51:34 d30 kernel:  [<ffffffffa06babe8>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
May 04 14:51:34 d30 kernel:  [<ffffffffa06a8223>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
May 04 14:51:34 d30 kernel:  [<ffffffff81027e7d>] ? native_sched_clock+0x2d/0xa0
May 04 14:51:34 d30 kernel:  [<ffffffff810db056>] ? creds_are_invalid.part.1+0x16/0x50
May 04 14:51:34 d30 kernel:  [<ffffffff810db0b1>] ? creds_are_invalid+0x21/0x30
May 04 14:51:34 d30 kernel:  [<ffffffff813b56fa>] ? inode_has_perm.isra.48+0x2a/0xa0
May 04 14:51:34 d30 kernel:  [<ffffffff81298558>] do_vfs_ioctl+0x2e8/0x530
May 04 14:51:34 d30 kernel:  [<ffffffff81298821>] SyS_ioctl+0x81/0xa0
May 04 14:51:34 d30 kernel:  [<ffffffff8189d5ee>] system_call_fastpath+0x12/0x76
May 04 14:51:34 d30 kernel: br0: port 1(tap0) entered forwarding state
May 04 14:51:34 d30 kernel: br0: port 3(tap2) entered forwarding state
May 04 14:51:34 d30 kernel: br0: port 4(tap3) entered forwarding state 

-- 
02DFF788
4D30 903A 1CF3 B756 FB48  1514 3148 83A2 02DF F788
http://keys.gnupg.net/pks/lookup?op=vindex&search=0x314883A202DFF788
Dominick Grift

[-- Attachment #2: Type: application/pgp-signature, Size: 648 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 13:39 ` Dominick Grift
@ 2015-05-04 18:45   ` Cong Wang
  2015-05-04 20:27     ` Stephen Hemminger
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Cong Wang @ 2015-05-04 18:45 UTC (permalink / raw)
  To: Dominick Grift; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
> On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
>> Hi All,
>>
>> We've had a user report the following backtrace from the bridge module
>> with a recent Linus' tree.  Has anything like this been reported yet?
>> If you have any questions on setup, the user is CC'd.
>>
>> josh
>>
>> [   29.382235] br0: port 1(tap0) entered forwarding state
>>
>> [   29.382286] ===============================
>> [   29.382315] [ INFO: suspicious RCU usage. ]
>> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
>> [   29.382380] -------------------------------
>> [   29.382409] net/bridge/br_private.h:626 suspicious
>> rcu_dereference_check() usage!
>
> <snip>
>
> With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
>

Should be the same issue. Please give the attached patch a try,
it is compile-tested only.

Thanks!

[-- Attachment #2: br.diff --]
[-- Type: text/plain, Size: 1708 bytes --]

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 4b5c236..8ba989a 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -325,22 +325,28 @@ static int br_fill_ifinfo(struct sk_buff *skb,
 		struct nlattr *af;
 		int err;
 
+		rcu_read_lock();
 		if (port)
 			pv = nbp_get_vlan_info(port);
 		else
 			pv = br_get_vlan_info(br);
 
-		if (!pv || bitmap_empty(pv->vlan_bitmap, VLAN_N_VID))
+		if (!pv || bitmap_empty(pv->vlan_bitmap, VLAN_N_VID)) {
+			rcu_read_unlock();
 			goto done;
+		}
 
 		af = nla_nest_start(skb, IFLA_AF_SPEC);
-		if (!af)
+		if (!af) {
+			rcu_read_unlock();
 			goto nla_put_failure;
+		}
 
 		if (filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)
 			err = br_fill_ifvlaninfo_compressed(skb, pv);
 		else
 			err = br_fill_ifvlaninfo(skb, pv);
+		rcu_read_unlock();
 		if (err)
 			goto nla_put_failure;
 		nla_nest_end(skb, af);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3362c29..860832e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -617,13 +617,17 @@ int nbp_vlan_init(struct net_bridge_port *port);
 static inline struct net_port_vlans *br_get_vlan_info(
 						const struct net_bridge *br)
 {
-	return rcu_dereference_rtnl(br->vlan_info);
+	return rcu_dereference_protected(br->vlan_info,
+					 rcu_read_lock_held() ||
+					 lockdep_rtnl_is_held());
 }
 
 static inline struct net_port_vlans *nbp_get_vlan_info(
 						const struct net_bridge_port *p)
 {
-	return rcu_dereference_rtnl(p->vlan_info);
+	return rcu_dereference_protected(p->vlan_info,
+					 rcu_read_lock_held() ||
+					 lockdep_rtnl_is_held());
 }
 
 /* Since bridge now depends on 8021Q module, but the time bridge sees the

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 18:45   ` Cong Wang
@ 2015-05-04 20:27     ` Stephen Hemminger
  2015-05-04 21:35       ` Cong Wang
  2015-05-05  2:32       ` Vlad Yasevich
  2015-05-04 20:49     ` Eric Dumazet
  2015-05-05  8:33     ` Dominick Grift
  2 siblings, 2 replies; 27+ messages in thread
From: Stephen Hemminger @ 2015-05-04 20:27 UTC (permalink / raw)
  To: Cong Wang; +Cc: Dominick Grift, netdev

On Mon, 4 May 2015 11:45:41 -0700
Cong Wang <cwang@twopensource.com> wrote:

> On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
> > On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
> >> Hi All,
> >>
> >> We've had a user report the following backtrace from the bridge module
> >> with a recent Linus' tree.  Has anything like this been reported yet?
> >> If you have any questions on setup, the user is CC'd.
> >>
> >> josh
> >>
> >> [   29.382235] br0: port 1(tap0) entered forwarding state
> >>
> >> [   29.382286] ===============================
> >> [   29.382315] [ INFO: suspicious RCU usage. ]
> >> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
> >> [   29.382380] -------------------------------
> >> [   29.382409] net/bridge/br_private.h:626 suspicious
> >> rcu_dereference_check() usage!
> >
> > <snip>
> >
> > With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
> >
> 
> Should be the same issue. Please give the attached patch a try,
> it is compile-tested only.
> 
> Thanks!

Good analysis in identifying the issue. But the proposed patch
doesn't seem right.

The br->lock protects against changes to the bridge port state.
vlan_info should be treated as part of the bridge state.

The correct fix is to get vlan_info out of depending on RTNL
and use br->lock to control modifications.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 18:45   ` Cong Wang
  2015-05-04 20:27     ` Stephen Hemminger
@ 2015-05-04 20:49     ` Eric Dumazet
  2015-05-04 21:38       ` Cong Wang
  2015-05-05  8:33     ` Dominick Grift
  2 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2015-05-04 20:49 UTC (permalink / raw)
  To: Cong Wang; +Cc: Dominick Grift, netdev

On Mon, 2015-05-04 at 11:45 -0700, Cong Wang wrote:
> On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
> > On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
> >> Hi All,
> >>
> >> We've had a user report the following backtrace from the bridge module
> >> with a recent Linus' tree.  Has anything like this been reported yet?
> >> If you have any questions on setup, the user is CC'd.
> >>
> >> josh
> >>
> >> [   29.382235] br0: port 1(tap0) entered forwarding state
> >>
> >> [   29.382286] ===============================
> >> [   29.382315] [ INFO: suspicious RCU usage. ]
> >> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
> >> [   29.382380] -------------------------------
> >> [   29.382409] net/bridge/br_private.h:626 suspicious
> >> rcu_dereference_check() usage!
> >
> > <snip>
> >
> > With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
> >
> 
> Should be the same issue. Please give the attached patch a try,
> it is compile-tested only.
> 
> Thanks!

Please send inline patches, otherwise its hard for us to review them and
give feedback.

At first glance, it is way too complicated.

br_get_vlan_info() change is not needed :

rcu_dereference_rtnl() can already be used from RCU or RTNL protected
code.

(Quite different from rtnl_dereference())

What about :

diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
 		netif_carrier_on(br->dev);
 	}
 	br_log_state(p);
+	rcu_read_lock();
 	br_ifinfo_notify(RTM_NEWLINK, p);
+	rcu_read_unlock();
 	spin_unlock(&br->lock);
 }
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 20:27     ` Stephen Hemminger
@ 2015-05-04 21:35       ` Cong Wang
  2015-05-11 13:15         ` Dominick Grift
  2015-05-05  2:32       ` Vlad Yasevich
  1 sibling, 1 reply; 27+ messages in thread
From: Cong Wang @ 2015-05-04 21:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Dominick Grift, netdev, Vlad Yasevich

On Mon, May 4, 2015 at 1:27 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Mon, 4 May 2015 11:45:41 -0700
> Cong Wang <cwang@twopensource.com> wrote:
>
>> On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
>> > On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
>> >> Hi All,
>> >>
>> >> We've had a user report the following backtrace from the bridge module
>> >> with a recent Linus' tree.  Has anything like this been reported yet?
>> >> If you have any questions on setup, the user is CC'd.
>> >>
>> >> josh
>> >>
>> >> [   29.382235] br0: port 1(tap0) entered forwarding state
>> >>
>> >> [   29.382286] ===============================
>> >> [   29.382315] [ INFO: suspicious RCU usage. ]
>> >> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
>> >> [   29.382380] -------------------------------
>> >> [   29.382409] net/bridge/br_private.h:626 suspicious
>> >> rcu_dereference_check() usage!
>> >
>> > <snip>
>> >
>> > With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
>> >
>>
>> Should be the same issue. Please give the attached patch a try,
>> it is compile-tested only.
>>
>> Thanks!
>
> Good analysis in identifying the issue. But the proposed patch
> doesn't seem right.
>
> The br->lock protects against changes to the bridge port state.
> vlan_info should be treated as part of the bridge state.
>
> The correct fix is to get vlan_info out of depending on RTNL
> and use br->lock to control modifications.

It _looks like_ we only retrieve vlan info to fill netlink
messages in timer context, so it doesn't seem we need to
hold br->lock here.

But I never look into br vlan code of course.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 20:49     ` Eric Dumazet
@ 2015-05-04 21:38       ` Cong Wang
  2015-05-04 22:06         ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2015-05-04 21:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dominick Grift, netdev

On Mon, May 4, 2015 at 1:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-05-04 at 11:45 -0700, Cong Wang wrote:
>> On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
>> > On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
>> >> Hi All,
>> >>
>> >> We've had a user report the following backtrace from the bridge module
>> >> with a recent Linus' tree.  Has anything like this been reported yet?
>> >> If you have any questions on setup, the user is CC'd.
>> >>
>> >> josh
>> >>
>> >> [   29.382235] br0: port 1(tap0) entered forwarding state
>> >>
>> >> [   29.382286] ===============================
>> >> [   29.382315] [ INFO: suspicious RCU usage. ]
>> >> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
>> >> [   29.382380] -------------------------------
>> >> [   29.382409] net/bridge/br_private.h:626 suspicious
>> >> rcu_dereference_check() usage!
>> >
>> > <snip>
>> >
>> > With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
>> >
>>
>> Should be the same issue. Please give the attached patch a try,
>> it is compile-tested only.
>>
>> Thanks!
>
> Please send inline patches, otherwise its hard for us to review them and
> give feedback.


Compile-test only patch is never ready for review, I thought it is too obvious
to mention.

>
> At first glance, it is way too complicated.
>
> br_get_vlan_info() change is not needed :
>
> rcu_dereference_rtnl() can already be used from RCU or RTNL protected
> code.
>
> (Quite different from rtnl_dereference())
>
> What about :

Feel free to submit a formal patch, I apparently don't have time to read more
into this, nor have time to argument on details.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 21:38       ` Cong Wang
@ 2015-05-04 22:06         ` Eric Dumazet
  2015-05-04 22:17           ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2015-05-04 22:06 UTC (permalink / raw)
  To: Cong Wang; +Cc: Dominick Grift, netdev

On Mon, 2015-05-04 at 14:38 -0700, Cong Wang wrote:

> Compile-test only patch is never ready for review, I thought it is too obvious
> to mention.

Thats not relevant.

Obvious or not, you are not making things easy for reviewers.

Our time is precious too, even if you believe its a minor detail.

If you don't care for our feedback, send private patches.

When I hit 'Reply' to your mail, I don't have the patch available at
hand, and need to copy/paste it.

Surely I am not alone.

Surely you can change the way you provide patches, even if they are in
RFC state.

If you don't have time, just ignore these bug reports, and do whatever
you need to, nobody will complain.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 22:06         ` Eric Dumazet
@ 2015-05-04 22:17           ` Cong Wang
  2015-05-04 22:29             ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2015-05-04 22:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dominick Grift, netdev

On Mon, May 4, 2015 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-05-04 at 14:38 -0700, Cong Wang wrote:
>
>> Compile-test only patch is never ready for review, I thought it is too obvious
>> to mention.
>
> Thats not relevant.
>
> Obvious or not, you are not making things easy for reviewers.
>

Why should I make a NOT-ready-for-review patch easy for reviewers?
What's the point here? I don't have the right to choose my patch NOT
to be reviewed? Huh...


> Our time is precious too, even if you believe its a minor detail.
>
> If you don't care for our feedback, send private patches.


Not until it is ready, apparently.


>
> When I hit 'Reply' to your mail, I don't have the patch available at
> hand, and need to copy/paste it.

That is EXACTLY what I want, because it is NOT READY FOR REVIEW.


>
> Surely I am not alone.
>
> Surely you can change the way you provide patches, even if they are in
> RFC state.

If every patch in this mailing list has to be ready, yes.

Otherwise no, sometimes we do need a patch (or a diff if it is a better name)
ONLY for discussion, aka, NOT for review.


>
> If you don't have time, just ignore these bug reports, and do whatever
> you need to, nobody will complain.
>

Apparently you did nothing before I just sent a reply without a patch,
only patch draws your attention for some reason, I _guess_ you only
rush for patches.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 22:17           ` Cong Wang
@ 2015-05-04 22:29             ` Eric Dumazet
  2015-05-04 22:44               ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2015-05-04 22:29 UTC (permalink / raw)
  To: Cong Wang; +Cc: Dominick Grift, netdev

On Mon, 2015-05-04 at 15:17 -0700, Cong Wang wrote:
> On Mon, May 4, 2015 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2015-05-04 at 14:38 -0700, Cong Wang wrote:
> >
> >> Compile-test only patch is never ready for review, I thought it is too obvious
> >> to mention.
> >
> > Thats not relevant.
> >
> > Obvious or not, you are not making things easy for reviewers.
> >
> 
> Why should I make a NOT-ready-for-review patch easy for reviewers?
> What's the point here? I don't have the right to choose my patch NOT
> to be reviewed? Huh...
> 

Look, I only said :

"Please send inline patches, otherwise its hard for us to review them
and give feedback."

I gave a quite reasonable request, with a "Please"


I was not expecting this becoming a war.

Next time, I will let David deal with your patches.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 22:29             ` Eric Dumazet
@ 2015-05-04 22:44               ` Cong Wang
  2015-05-04 23:00                 ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2015-05-04 22:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dominick Grift, netdev

On Mon, May 4, 2015 at 3:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-05-04 at 15:17 -0700, Cong Wang wrote:
>> On Mon, May 4, 2015 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2015-05-04 at 14:38 -0700, Cong Wang wrote:
>> >
>> >> Compile-test only patch is never ready for review, I thought it is too obvious
>> >> to mention.
>> >
>> > Thats not relevant.
>> >
>> > Obvious or not, you are not making things easy for reviewers.
>> >
>>
>> Why should I make a NOT-ready-for-review patch easy for reviewers?
>> What's the point here? I don't have the right to choose my patch NOT
>> to be reviewed? Huh...
>>
>
> Look, I only said :
>
> "Please send inline patches, otherwise its hard for us to review them
> and give feedback."
>
> I gave a quite reasonable request, with a "Please"

With hundreds of patches merged, I don't need to teach how to send
a reviewable patch when it is ready, this one is just not ready for review.

So your request is only valid for patches ready for review. Focus on
the bug.

It is not my first time to get surprised netdev people for some reason
only talk when there is a patch, yet another reason I should send
a patch even just for drawing attentions.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 22:44               ` Cong Wang
@ 2015-05-04 23:00                 ` Eric Dumazet
  2015-05-04 23:22                   ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2015-05-04 23:00 UTC (permalink / raw)
  To: Cong Wang; +Cc: Dominick Grift, netdev

On Mon, 2015-05-04 at 15:44 -0700, Cong Wang wrote:

> So your request is only valid for patches ready for review. Focus on
> the bug.

Yep, I was trying to, but couldn't reply on your patch, and this is not
the first time it happens with you.

Thus my reasonable request.

Turns out you did not understand it and turned it into personal attack.

Too bad.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 23:00                 ` Eric Dumazet
@ 2015-05-04 23:22                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2015-05-04 23:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: cwang, dac.override, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 04 May 2015 16:00:04 -0700

> On Mon, 2015-05-04 at 15:44 -0700, Cong Wang wrote:
> 
>> So your request is only valid for patches ready for review. Focus on
>> the bug.
> 
> Yep, I was trying to, but couldn't reply on your patch, and this is not
> the first time it happens with you.
> 
> Thus my reasonable request.

+1  I totally agree with Eric on all of his counts, and I think you are
being completely rediculous Cong.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 20:27     ` Stephen Hemminger
  2015-05-04 21:35       ` Cong Wang
@ 2015-05-05  2:32       ` Vlad Yasevich
  1 sibling, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2015-05-05  2:32 UTC (permalink / raw)
  To: Stephen Hemminger, Cong Wang; +Cc: Dominick Grift, netdev

On 05/04/2015 04:27 PM, Stephen Hemminger wrote:
> On Mon, 4 May 2015 11:45:41 -0700
> Cong Wang <cwang@twopensource.com> wrote:
> 
>> On Mon, May 4, 2015 at 6:39 AM, Dominick Grift <dac.override@gmail.com> wrote:
>>> On Thu, Apr 23, 2015 at 01:07:45PM -0400, Josh Boyer wrote:
>>>> Hi All,
>>>>
>>>> We've had a user report the following backtrace from the bridge module
>>>> with a recent Linus' tree.  Has anything like this been reported yet?
>>>> If you have any questions on setup, the user is CC'd.
>>>>
>>>> josh
>>>>
>>>> [   29.382235] br0: port 1(tap0) entered forwarding state
>>>>
>>>> [   29.382286] ===============================
>>>> [   29.382315] [ INFO: suspicious RCU usage. ]
>>>> [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
>>>> [   29.382380] -------------------------------
>>>> [   29.382409] net/bridge/br_private.h:626 suspicious
>>>> rcu_dereference_check() usage!
>>>
>>> <snip>
>>>
>>> With 4.1.0-0.rc1.git1.1.fc23.x86_64 the situation seems to have slightly changed:
>>>
>>
>> Should be the same issue. Please give the attached patch a try,
>> it is compile-tested only.
>>
>> Thanks!
> 
> Good analysis in identifying the issue. But the proposed patch
> doesn't seem right.
> 
> The br->lock protects against changes to the bridge port state.
> vlan_info should be treated as part of the bridge state.
> 
> The correct fix is to get vlan_info out of depending on RTNL
> and use br->lock to control modifications.
> 

Changing the write side protection to be dependent on br->lock would
then require rcu or lock to be held in br_getlink().  It all
boils down to the same thing.  br_fill_info() needs to be either
in rcu or locked context.  It's already in rtnl context, so Eric's
proposal is the simplest one.

-vlad

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 18:45   ` Cong Wang
  2015-05-04 20:27     ` Stephen Hemminger
  2015-05-04 20:49     ` Eric Dumazet
@ 2015-05-05  8:33     ` Dominick Grift
  2 siblings, 0 replies; 27+ messages in thread
From: Dominick Grift @ 2015-05-05  8:33 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 358 bytes --]

On Mon, May 04, 2015 at 11:45:41AM -0700, Cong Wang wrote:

<snip>

> Should be the same issue. Please give the attached patch a try,
> it is compile-tested only.
> 
> Thanks!

<snip>

Thank you.

I have just booted with the above path and the "oops" is gone

So although the patch may not be optimal, it does seem to take care of the issue.

[-- Attachment #2: Type: application/pgp-signature, Size: 648 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-04 21:35       ` Cong Wang
@ 2015-05-11 13:15         ` Dominick Grift
  2015-05-11 14:53           ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Dominick Grift @ 2015-05-11 13:15 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

On Mon, May 04, 2015 at 02:35:27PM -0700, Cong Wang wrote:

<snip>

> It _looks like_ we only retrieve vlan info to fill netlink
> messages in timer context, so it doesn't seem we need to
> hold br->lock here.
> 
> But I never look into br vlan code of course.

This is just a friendly reminder that this issue still exits in: 4.1.0-0.rc2.git3.1

Ignore me if you are aware of the above

-- 
02DFF788
4D30 903A 1CF3 B756 FB48  1514 3148 83A2 02DF F788
http://keys.gnupg.net/pks/lookup?op=vindex&search=0x314883A202DFF788
Dominick Grift

[-- Attachment #2: Type: application/pgp-signature, Size: 648 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-11 13:15         ` Dominick Grift
@ 2015-05-11 14:53           ` Eric Dumazet
  2015-05-11 17:42             ` Dominick Grift
                               ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eric Dumazet @ 2015-05-11 14:53 UTC (permalink / raw)
  To: Dominick Grift; +Cc: netdev

On Mon, 2015-05-11 at 15:15 +0200, Dominick Grift wrote:
> On Mon, May 04, 2015 at 02:35:27PM -0700, Cong Wang wrote:
> 
> <snip>
> 
> > It _looks like_ we only retrieve vlan info to fill netlink
> > messages in timer context, so it doesn't seem we need to
> > hold br->lock here.
> > 
> > But I never look into br vlan code of course.
> 
> This is just a friendly reminder that this issue still exits in: 4.1.0-0.rc2.git3.1
> 
> Ignore me if you are aware of the above
> 

Hi Dominick

Have you tried this patch I sent last monday ?

I will submit formally when I get a test result.

Thanks

diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
 		netif_carrier_on(br->dev);
 	}
 	br_log_state(p);
+	rcu_read_lock();
 	br_ifinfo_notify(RTM_NEWLINK, p);
+	rcu_read_unlock();
 	spin_unlock(&br->lock);
 }
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-11 14:53           ` Eric Dumazet
@ 2015-05-11 17:42             ` Dominick Grift
  2015-05-12  0:13             ` poma
  2015-05-12 18:27             ` Dominick Grift
  2 siblings, 0 replies; 27+ messages in thread
From: Dominick Grift @ 2015-05-11 17:42 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 511 bytes --]

On Mon, May 11, 2015 at 07:53:06AM -0700, Eric Dumazet wrote:
> On Mon, 2015-05-11 at 15:15 +0200, Dominick Grift wrote:

<snip>

> > 
> > This is just a friendly reminder that this issue still exits in: 4.1.0-0.rc2.git3.1
> > 
> > Ignore me if you are aware of the above
> > 
> 
> Hi Dominick
> 
> Have you tried this patch I sent last monday ?
> 
> I will submit formally when I get a test result.
> 
> Thanks
> 

Whoops I overlooked that. The ball is in our court.

Will get back ASAP

[-- Attachment #2: Type: application/pgp-signature, Size: 648 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-11 14:53           ` Eric Dumazet
  2015-05-11 17:42             ` Dominick Grift
@ 2015-05-12  0:13             ` poma
  2015-05-12  0:37               ` Eric Dumazet
  2015-05-12 18:27             ` Dominick Grift
  2 siblings, 1 reply; 27+ messages in thread
From: poma @ 2015-05-12  0:13 UTC (permalink / raw)
  To: Eric Dumazet, Dominick Grift; +Cc: netdev

On 11.05.2015 16:53, Eric Dumazet wrote:
> On Mon, 2015-05-11 at 15:15 +0200, Dominick Grift wrote:
>> On Mon, May 04, 2015 at 02:35:27PM -0700, Cong Wang wrote:
>>
>> <snip>
>>
>>> It _looks like_ we only retrieve vlan info to fill netlink
>>> messages in timer context, so it doesn't seem we need to
>>> hold br->lock here.
>>>
>>> But I never look into br vlan code of course.
>>
>> This is just a friendly reminder that this issue still exits in: 4.1.0-0.rc2.git3.1
>>
>> Ignore me if you are aware of the above
>>
> 
> Hi Dominick
> 
> Have you tried this patch I sent last monday ?
> 

https://bugzilla.kernel.org/show_bug.cgi?id=98081#c2


> I will submit formally when I get a test result.
> 
> Thanks
> 
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
>  		netif_carrier_on(br->dev);
>  	}
>  	br_log_state(p);
> +	rcu_read_lock();
>  	br_ifinfo_notify(RTM_NEWLINK, p);
> +	rcu_read_unlock();
>  	spin_unlock(&br->lock);
>  }
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-12  0:13             ` poma
@ 2015-05-12  0:37               ` Eric Dumazet
  2015-05-12  1:33                 ` poma
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2015-05-12  0:37 UTC (permalink / raw)
  To: poma; +Cc: Dominick Grift, netdev

On Tue, 2015-05-12 at 02:13 +0200, poma wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=98081#c2

Reported panic has nothing to do with this RCU issue.

I would say your test is non conclusive.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-12  0:37               ` Eric Dumazet
@ 2015-05-12  1:33                 ` poma
  0 siblings, 0 replies; 27+ messages in thread
From: poma @ 2015-05-12  1:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dominick Grift, netdev

On 12.05.2015 02:37, Eric Dumazet wrote:
> On Tue, 2015-05-12 at 02:13 +0200, poma wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=98081#c2
> 
> Reported panic has nothing to do with this RCU issue.
> 
> I would say your test is non conclusive.
> 
> 

Awesome, I'm closing it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-11 14:53           ` Eric Dumazet
  2015-05-11 17:42             ` Dominick Grift
  2015-05-12  0:13             ` poma
@ 2015-05-12 18:27             ` Dominick Grift
  2015-05-21 19:22               ` Josh Boyer
  2 siblings, 1 reply; 27+ messages in thread
From: Dominick Grift @ 2015-05-12 18:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Mon, May 11, 2015 at 07:53:06AM -0700, Eric Dumazet wrote:
> 
> Hi Dominick
> 
> Have you tried this patch I sent last monday ?
> 
> I will submit formally when I get a test result.
> 
> Thanks

Eric,

I have tested the below patch and everything is in order!

Thank you!

> 
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
>  		netif_carrier_on(br->dev);
>  	}
>  	br_log_state(p);
> +	rcu_read_lock();
>  	br_ifinfo_notify(RTM_NEWLINK, p);
> +	rcu_read_unlock();
>  	spin_unlock(&br->lock);
>  }
>  
> 
> 

-- 
02DFF788
4D30 903A 1CF3 B756 FB48  1514 3148 83A2 02DF F788
http://keys.gnupg.net/pks/lookup?op=vindex&search=0x314883A202DFF788
Dominick Grift

[-- Attachment #2: Type: application/pgp-signature, Size: 648 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-12 18:27             ` Dominick Grift
@ 2015-05-21 19:22               ` Josh Boyer
  2015-05-21 20:06                 ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Boyer @ 2015-05-21 19:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Dominick Grift

On Tue, May 12, 2015 at 2:27 PM, Dominick Grift <dac.override@gmail.com> wrote:
> On Mon, May 11, 2015 at 07:53:06AM -0700, Eric Dumazet wrote:
>>
>> Hi Dominick
>>
>> Have you tried this patch I sent last monday ?
>>
>> I will submit formally when I get a test result.
>>
>> Thanks
>
> Eric,
>
> I have tested the below patch and everything is in order!
>
> Thank you!
>
>>
>> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
>> index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
>> --- a/net/bridge/br_stp_timer.c
>> +++ b/net/bridge/br_stp_timer.c
>> @@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
>>               netif_carrier_on(br->dev);
>>       }
>>       br_log_state(p);
>> +     rcu_read_lock();
>>       br_ifinfo_notify(RTM_NEWLINK, p);
>> +     rcu_read_unlock();
>>       spin_unlock(&br->lock);
>>  }
>>
>>
>>

I'm still seeing this with Linus' tree as of yesterday.  Is this
queued anywhere for 4.1?

josh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4
  2015-05-21 19:22               ` Josh Boyer
@ 2015-05-21 20:06                 ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2015-05-21 20:06 UTC (permalink / raw)
  To: Josh Boyer; +Cc: netdev, Dominick Grift

On Thu, 2015-05-21 at 15:22 -0400, Josh Boyer wrote:

> I'm still seeing this with Linus' tree as of yesterday.  Is this
> queued anywhere for 4.1?
I'll send the patch, thanks for the heads up.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net] bridge: fix lockdep splat
  2015-04-23 17:07 Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4 Josh Boyer
                   ` (2 preceding siblings ...)
  2015-05-04 13:39 ` Dominick Grift
@ 2015-05-21 20:28 ` Eric Dumazet
  3 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2015-05-21 20:28 UTC (permalink / raw)
  To: Josh Boyer; +Cc: David S. Miller, Dominick Grift, netdev, Vlad Yasevich

From: Eric Dumazet <edumazet@google.com>

Following lockdep splat was reported :

[   29.382286] ===============================
[   29.382315] [ INFO: suspicious RCU usage. ]
[   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[   29.382380] -------------------------------
[   29.382409] net/bridge/br_private.h:626 suspicious
rcu_dereference_check() usage!
[   29.382455]
               other info that might help us debug this:

[   29.382507]
               rcu_scheduler_active = 1, debug_locks = 0
[   29.382549] 2 locks held by swapper/0/0:
[   29.382576]  #0:  (((&p->forward_delay_timer))){+.-...}, at:
[<ffffffff81139f75>] call_timer_fn+0x5/0x4f0
[   29.382660]  #1:  (&(&br->lock)->rlock){+.-...}, at:
[<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140
[bridge]
[   29.382754]
               stack backtrace:
[   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[   29.382882]  0000000000000000 3ebfc20364115825 ffff880666603c48
ffffffff81892d4b
[   29.382943]  0000000000000000 ffffffff81e124e0 ffff880666603c78
ffffffff8110bcd7
[   29.383004]  ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
ffff880c5fc88ac0
[   29.383065] Call Trace:
[   29.383084]  <IRQ>  [<ffffffff81892d4b>] dump_stack+0x4c/0x65
[   29.383130]  [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120
[   29.383178]  [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[   29.383225]  [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[   29.383271]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383320]  [<ffffffffa0450de8>]
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[   29.383371]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383416]  [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0
[   29.383454]  [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0
[   29.383493]  [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200
[   29.383541]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383587]  [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490
[   29.383629]  [<ffffffff810b68cc>] __do_softirq+0xec/0x670
[   29.383666]  [<ffffffff810b70d5>] irq_exit+0x145/0x150
[   29.383703]  [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60
[   29.383744]  [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80
[   29.383782]  <EOI>  [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0
[   29.383832]  [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0

Problem here is that br_forward_delay_timer_expired() is a timer
handler, calling br_ifinfo_notify() which assumes either rcu_read_lock()
or RTNL are held.

Simplest fix seems to add rcu read lock section.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: Dominick Grift <dac.override@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
---
 net/bridge/br_stp_timer.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 4fcaa67750fda845ad0a180332c4cd96a9524086..7caf7fae2d5b8aa369b924e1c87a47c343fb8954 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -97,7 +97,9 @@ static void br_forward_delay_timer_expired(unsigned long arg)
 		netif_carrier_on(br->dev);
 	}
 	br_log_state(p);
+	rcu_read_lock();
 	br_ifinfo_notify(RTM_NEWLINK, p);
+	rcu_read_unlock();
 	spin_unlock(&br->lock);
 }
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-05-21 20:28 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-23 17:07 Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4 Josh Boyer
2015-04-23 17:35 ` Sudeep Holla
2015-04-23 22:53 ` Cong Wang
2015-05-04 13:39 ` Dominick Grift
2015-05-04 18:45   ` Cong Wang
2015-05-04 20:27     ` Stephen Hemminger
2015-05-04 21:35       ` Cong Wang
2015-05-11 13:15         ` Dominick Grift
2015-05-11 14:53           ` Eric Dumazet
2015-05-11 17:42             ` Dominick Grift
2015-05-12  0:13             ` poma
2015-05-12  0:37               ` Eric Dumazet
2015-05-12  1:33                 ` poma
2015-05-12 18:27             ` Dominick Grift
2015-05-21 19:22               ` Josh Boyer
2015-05-21 20:06                 ` Eric Dumazet
2015-05-05  2:32       ` Vlad Yasevich
2015-05-04 20:49     ` Eric Dumazet
2015-05-04 21:38       ` Cong Wang
2015-05-04 22:06         ` Eric Dumazet
2015-05-04 22:17           ` Cong Wang
2015-05-04 22:29             ` Eric Dumazet
2015-05-04 22:44               ` Cong Wang
2015-05-04 23:00                 ` Eric Dumazet
2015-05-04 23:22                   ` David Miller
2015-05-05  8:33     ` Dominick Grift
2015-05-21 20:28 ` [PATCH net] bridge: fix lockdep splat Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).