netdev.vger.kernel.org archive mirror
* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
       [not found] ` <51E8F759.4060202@linux.vnet.ibm.com>
@ 2013-07-19  9:08   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 9+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-19  9:08 UTC
  To: linux-kernel
  Cc: Srivatsa S. Bhat, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Paul E. McKenney, netdev

On 07/19/2013 01:52 PM, Srivatsa S. Bhat wrote:
> On 07/16/2013 07:55 PM, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> I happened to hit these warnings on an idle system shortly after
>> boot. And the machine locked up and sent NMIs to all CPUs, some of
>> which have been captured below.
>>
>> Kernel version: v3.11-rc1 + 3 unrelated local cpufreq patches.
>> Options set in .config (which are of interest here):
>>
>> CONFIG_TICK_ONESHOT=y
>> CONFIG_NO_HZ_COMMON=y
>> # CONFIG_HZ_PERIODIC is not set
>> # CONFIG_NO_HZ_IDLE is not set
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_NO_HZ_FULL_ALL=y
>> CONFIG_NO_HZ=y
>> CONFIG_HIGH_RES_TIMERS=y
>>
>> CONFIG_TREE_RCU=y
>> # CONFIG_PREEMPT_RCU is not set
>> CONFIG_RCU_STALL_COMMON=y
>> CONFIG_CONTEXT_TRACKING=y
>> CONFIG_RCU_USER_QS=y
>> CONFIG_CONTEXT_TRACKING_FORCE=y
>> CONFIG_RCU_FANOUT=64
>> CONFIG_RCU_FANOUT_LEAF=16
>> # CONFIG_RCU_FANOUT_EXACT is not set
>> CONFIG_RCU_FAST_NO_HZ=y
>> # CONFIG_TREE_RCU_TRACE is not set
>> CONFIG_RCU_NOCB_CPU=y
>> CONFIG_RCU_NOCB_CPU_ALL=y
>>
>>
> 
> Today I hit the same warnings, even without NO_HZ_FULL enabled.
> I've attached my .config with this mail.
>

Adding netdev mailing list to CC.

I had posted the warnings here:
http://marc.info/?l=linux-kernel&m=137398492701587&w=2

Regards,
Srivatsa S. Bhat


* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
       [not found] ` <alpine.DEB.2.02.1307191323150.4089@ionos.tec.linutronix.de>
@ 2013-07-19 11:47   ` Srivatsa S. Bhat
  2013-07-19 13:40     ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-19 11:47 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Frederic Weisbecker, Ingo Molnar, Paul E. McKenney, netdev

On 07/19/2013 04:55 PM, Thomas Gleixner wrote:
> On Tue, 16 Jul 2013, Srivatsa S. Bhat wrote:
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
>> list_add corruption. prev->next should be next (ffff8810396b5568), but was           (null). (prev=ffff88102c1344c0).
> 
> Can you please enable debugobjects?
> 

Sure Thomas, please find the new traces below, with
debug objects enabled.
Regards,
Srivatsa S. Bhat


------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp mlx4_core ioatdma dca be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-dbg-a #16
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
 0000000000000104 ffff88103fc43908 ffffffff8161b3c9 ffffffff81a14151
 ffff88103fc43958 ffff88103fc43948 ffffffff8104e6bc 0000000000000000
 ffffffff81a2cfae ffff88101b21cfd8 ffffffff81c37ba0 ffffffff828f2ca8
Call Trace:
 <IRQ>  [<ffffffff8161b3c9>] dump_stack+0x59/0x80
 [<ffffffff8104e6bc>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8104e7a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
 [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
 [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
 [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
 [<ffffffff812b69df>] debug_object_init+0x1f/0x30
 [<ffffffff81060ea9>] init_timer_key+0x39/0x100
 [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]
 [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
 [<ffffffff8162140b>] ? _raw_spin_unlock+0x2b/0x50
 [<ffffffffa0419a9b>] ? br_fdb_update+0x1db/0x2b0 [bridge]
 [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
 [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
 [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
 [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
 [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
 [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
 [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
 [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
 [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
 [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
 [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
 [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
 [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
 [<ffffffff81058e19>] __do_softirq+0x149/0x400
 [<ffffffff8105922d>] irq_exit+0xed/0x100
 [<ffffffff8162d206>] do_IRQ+0x66/0xe0
 [<ffffffff8162182f>] common_interrupt+0x6f/0x6f
 <EOI>  [<ffffffff814f148b>] ? cpuidle_enter_state+0x5b/0xe0
 [<ffffffff814f1487>] ? cpuidle_enter_state+0x57/0xe0
 [<ffffffff81625cd0>] ? notifier_call_chain+0x120/0x120
 [<ffffffff814f197f>] cpuidle_idle_call+0xcf/0x160
 [<ffffffff8100cf0e>] arch_cpu_idle+0xe/0x30
 [<ffffffff810b1291>] cpu_idle_loop+0x81/0x3a0
 [<ffffffff810b1620>] cpu_startup_entry+0x70/0x80
 [<ffffffff810334fc>] start_secondary+0xdc/0xe0
---[ end trace 68b46b9f74e14585 ]---
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp mlx4_core ioatdma dca be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W    3.11.0-rc1-dbg-a #16
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
 0000000000000104 ffff88103fc43908 ffffffff8161b3c9 ffffffff81a14151
 ffff88103fc43958 ffff88103fc43948 ffffffff8104e6bc 0000000000000000
 ffffffff81a2cfae ffff88101b21cfd8 ffffffff81c37ba0 ffffffff828f2ca8
Call Trace:
 <IRQ>  [<ffffffff8161b3c9>] dump_stack+0x59/0x80
 [<ffffffff8104e6bc>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8104e7a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
 [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
 [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
 [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
 [<ffffffff812b69df>] debug_object_init+0x1f/0x30
 [<ffffffff81060ea9>] init_timer_key+0x39/0x100
 [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]
 [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
 [<ffffffff8156d93e>] ? nf_hook_slow+0x16e/0x2a0
 [<ffffffffa0419a07>] ? br_fdb_update+0x147/0x2b0 [bridge]
 [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
 [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
 [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
 [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
 [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
 [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
 [<ffffffff810c2d1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
 [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
 [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
 [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
 [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
 [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
 [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
 [<ffffffff81058e19>] __do_softirq+0x149/0x400
 [<ffffffff8105922d>] irq_exit+0xed/0x100
 [<ffffffff8162d206>] do_IRQ+0x66/0xe0
 [<ffffffff8162182f>] common_interrupt+0x6f/0x6f
 <EOI>  [<ffffffff814f148b>] ? cpuidle_enter_state+0x5b/0xe0
 [<ffffffff814f1487>] ? cpuidle_enter_state+0x57/0xe0
 [<ffffffff81625cd0>] ? notifier_call_chain+0x120/0x120
 [<ffffffff814f197f>] cpuidle_idle_call+0xcf/0x160
 [<ffffffff8100cf0e>] arch_cpu_idle+0xe/0x30
 [<ffffffff810b1291>] cpu_idle_loop+0x81/0x3a0
 [<ffffffff810b1620>] cpu_startup_entry+0x70/0x80
 [<ffffffff810334fc>] start_secondary+0xdc/0xe0
---[ end trace 68b46b9f74e14586 ]---
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp mlx4_core ioatdma dca be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W    3.11.0-rc1-dbg-a #16
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
 0000000000000104 ffff88103fc43908 ffffffff8161b3c9 ffffffff81a14151
 ffff88103fc43958 ffff88103fc43948 ffffffff8104e6bc 0000000000000000
 ffffffff81a2cfae ffff88101b21cfd8 ffffffff81c37ba0 ffffffff828f2ca8
Call Trace:
 <IRQ>  [<ffffffff8161b3c9>] dump_stack+0x59/0x80
 [<ffffffff8104e6bc>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8104e7a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
 [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
 [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
 [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
 [<ffffffff810c2d1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff812b69df>] debug_object_init+0x1f/0x30
 [<ffffffff81060ea9>] init_timer_key+0x39/0x100
 [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]
 [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
 [<ffffffff8156d93e>] ? nf_hook_slow+0x16e/0x2a0
 [<ffffffffa04199a4>] ? br_fdb_update+0xe4/0x2b0 [bridge]
 [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
 [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
 [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
 [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
 [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
 [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
 [<ffffffff810c2d1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
 [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
 [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
 [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
 [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
 [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
 [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
 [<ffffffff81058e19>] __do_softirq+0x149/0x400
 [<ffffffff8105922d>] irq_exit+0xed/0x100
 [<ffffffff8162d206>] do_IRQ+0x66/0xe0
 [<ffffffff8162182f>] common_interrupt+0x6f/0x6f
 <EOI>  [<ffffffff814f148b>] ? cpuidle_enter_state+0x5b/0xe0
 [<ffffffff814f1487>] ? cpuidle_enter_state+0x57/0xe0
 [<ffffffff81625cd0>] ? notifier_call_chain+0x120/0x120
 [<ffffffff814f197f>] cpuidle_idle_call+0xcf/0x160
 [<ffffffff8100cf0e>] arch_cpu_idle+0xe/0x30
 [<ffffffff810b1291>] cpu_idle_loop+0x81/0x3a0
 [<ffffffff810b1620>] cpu_startup_entry+0x70/0x80
 [<ffffffff810334fc>] start_secondary+0xdc/0xe0
---[ end trace 68b46b9f74e14587 ]---
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp mlx4_core ioatdma dca be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W    3.11.0-rc1-dbg-a #16
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
 0000000000000104 ffff88103fc43908 ffffffff8161b3c9 ffffffff81a14151
 ffff88103fc43958 ffff88103fc43948 ffffffff8104e6bc 0000000000000000
 ffffffff81a2cfae ffff88101b21cfd8 ffffffff81c37ba0 ffffffff828f2ca8
Call Trace:
 <IRQ>  [<ffffffff8161b3c9>] dump_stack+0x59/0x80
 [<ffffffff8104e6bc>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8104e7a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
 [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
 [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
 [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
 [<ffffffff810c2d1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff812b69df>] debug_object_init+0x1f/0x30
 [<ffffffff81060ea9>] init_timer_key+0x39/0x100
 [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]
 [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
 [<ffffffff8156d93e>] ? nf_hook_slow+0x16e/0x2a0
 [<ffffffffa04199a4>] ? br_fdb_update+0xe4/0x2b0 [bridge]
 [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
 [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
 [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
 [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
 [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
 [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
 [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
 [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
 [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
 [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
 [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
 [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
 [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
 [<ffffffff81058e19>] __do_softirq+0x149/0x400
 [<ffffffff8105922d>] irq_exit+0xed/0x100
 [<ffffffff8162d206>] do_IRQ+0x66/0xe0
 [<ffffffff8162182f>] common_interrupt+0x6f/0x6f
 <EOI>  [<ffffffff814f148b>] ? cpuidle_enter_state+0x5b/0xe0
 [<ffffffff814f1487>] ? cpuidle_enter_state+0x57/0xe0
 [<ffffffff81625cd0>] ? notifier_call_chain+0x120/0x120
 [<ffffffff814f197f>] cpuidle_idle_call+0xcf/0x160
 [<ffffffff8100cf0e>] arch_cpu_idle+0xe/0x30
 [<ffffffff810b1291>] cpu_idle_loop+0x81/0x3a0
 [<ffffffff810b1620>] cpu_startup_entry+0x70/0x80
 [<ffffffff810334fc>] start_secondary+0xdc/0xe0
---[ end trace 68b46b9f74e14588 ]---


* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
  2013-07-19 11:47   ` Srivatsa S. Bhat
@ 2013-07-19 13:40     ` Thomas Gleixner
  2013-07-19 13:51       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2013-07-19 13:40 UTC
  To: Srivatsa S. Bhat
  Cc: linux-kernel, Frederic Weisbecker, Ingo Molnar, Paul E. McKenney, netdev

On Fri, 19 Jul 2013, Srivatsa S. Bhat wrote:
> On 07/19/2013 04:55 PM, Thomas Gleixner wrote:
> > On Tue, 16 Jul 2013, Srivatsa S. Bhat wrote:
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
> >> list_add corruption. prev->next should be next (ffff8810396b5568), but was           (null). (prev=ffff88102c1344c0).
> > 
> > Can you please enable debugobjects?
> > 
> 
> Sure Thomas, please find the new traces below, with
> debug objects enabled.
> Regards,
> Srivatsa S. Bhat
> 
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
> ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]

So an active enqueued timer gets reinitialized. Not so pretty :)

>  [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
>  [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
>  [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
>  [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
>  [<ffffffff812b69df>] debug_object_init+0x1f/0x30
>  [<ffffffff81060ea9>] init_timer_key+0x39/0x100
>  [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]

Here is the offending call site. I leave that to the network wizards.

> [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
> [<ffffffff8162140b>] ? _raw_spin_unlock+0x2b/0x50
> [<ffffffffa0419a9b>] ? br_fdb_update+0x1db/0x2b0 [bridge]
> [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
> [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
> [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
> [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
> [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
> [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
> [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
> [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
> [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
> [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
> [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
> [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
> [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
> [<ffffffff81058e19>] __do_softirq+0x149/0x400
> [<ffffffff8105922d>] irq_exit+0xed/0x100
> [<ffffffff8162d206>] do_IRQ+0x66/0xe0

Thanks,

	tglx

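The diagnosis above (an enqueued timer being reinitialized) can be illustrated with a small user-space model. This is only a sketch, not kernel code: `toy_timer`, `toy_setup_timer()`, `toy_mod_timer()` and the `list_*` helpers are hypothetical names, and the model assumes, as a simplification of the timer code of that era, that initialization clears the timer's list links and that a timer whose `next` pointer is NULL is treated as not pending.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Link `entry` between `prev` and `next`. Returns 1 on success and 0 if
 * the neighbours disagree about each other, which is the kind of
 * condition the list debugging code warns about.
 */
static int list_add_checked(struct list_head *entry,
			    struct list_head *prev, struct list_head *next)
{
	if (next->prev != prev || prev->next != next)
		return 0;
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return 1;
}

/* Hypothetical stand-in for a kernel timer. */
struct toy_timer {
	struct list_head entry;
	unsigned long expires;
};

/* Assumption: initialization wipes the list links. */
static void toy_setup_timer(struct toy_timer *t)
{
	t->entry.next = NULL;
	t->entry.prev = NULL;
}

/* Assumption: a NULL next pointer means "not queued". */
static int toy_timer_pending(const struct toy_timer *t)
{
	return t->entry.next != NULL;
}

static void toy_detach(struct toy_timer *t)
{
	t->entry.prev->next = t->entry.next;
	t->entry.next->prev = t->entry.prev;
	t->entry.next = NULL;
	t->entry.prev = NULL;
}

/* Re-arm: detach if queued, then append to the tail of `queue`. */
static int toy_mod_timer(struct list_head *queue, struct toy_timer *t,
			 unsigned long expires)
{
	if (toy_timer_pending(t))
		toy_detach(t);
	t->expires = expires;
	return list_add_checked(&t->entry, queue->prev, queue);
}

/* Returns 1 if re-initializing a queued timer makes the next add fail. */
int reinit_while_enqueued_breaks_list(void)
{
	struct list_head queue;
	struct toy_timer t;

	list_init(&queue);
	toy_setup_timer(&t);
	if (!toy_mod_timer(&queue, &t, 100))
		return 0;		/* first arm is expected to succeed */
	toy_setup_timer(&t);		/* bug: links wiped while still queued */
	return toy_mod_timer(&queue, &t, 200) == 0;
}

/* Returns 1 if "set up once, re-arm twice" keeps the list consistent. */
int setup_once_then_rearm_is_fine(void)
{
	struct list_head queue;
	struct toy_timer t;

	list_init(&queue);
	toy_setup_timer(&t);
	return toy_mod_timer(&queue, &t, 100) &&
	       toy_mod_timer(&queue, &t, 200);
}
```

In this model `reinit_while_enqueued_breaks_list()` returns 1: after the second setup the timer looks idle, so the re-arm skips the detach step and the add finds `prev->next` set to NULL while the queue still points at the timer. That is the same kind of inconsistency as the "prev->next should be next ... but was (null)" message in the original report, although the model does not claim to reproduce the exact kernel state. `setup_once_then_rearm_is_fine()` also returns 1.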

* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
  2013-07-19 13:40     ` Thomas Gleixner
@ 2013-07-19 13:51       ` Eric Dumazet
  2013-07-19 16:38         ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-07-19 13:51 UTC
  To: Thomas Gleixner, Cong Wang
  Cc: Srivatsa S. Bhat, linux-kernel, Frederic Weisbecker, Ingo Molnar,
	Paul E. McKenney, netdev

On Fri, 2013-07-19 at 15:40 +0200, Thomas Gleixner wrote:
> On Fri, 19 Jul 2013, Srivatsa S. Bhat wrote:
> > On 07/19/2013 04:55 PM, Thomas Gleixner wrote:
> > > On Tue, 16 Jul 2013, Srivatsa S. Bhat wrote:
> > >> ------------[ cut here ]------------
> > >> WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
> > >> list_add corruption. prev->next should be next (ffff8810396b5568), but was           (null). (prev=ffff88102c1344c0).
> > > 
> > > Can you please enable debugobjects?
> > > 
> > 
> > Sure Thomas, please find the new traces below, with
> > debug objects enabled.
> > Regards,
> > Srivatsa S. Bhat
> > 
> > 
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x8e/0xb0()
> > ODEBUG: init active (active state 0) object type: timer_list hint: br_multicast_group_expired+0x0/0x110 [bridge]
> 
> So an active enqueued timer gets reinitialized. Not so pretty :)
> 
> >  [<ffffffff812b5aee>] debug_print_object+0x8e/0xb0
> >  [<ffffffffa04247f0>] ? br_multicast_free_pg+0x20/0x20 [bridge]
> >  [<ffffffff812b65e2>] ? __debug_object_init+0x42/0x3f0
> >  [<ffffffff812b67bf>] __debug_object_init+0x21f/0x3f0
> >  [<ffffffff812b69df>] debug_object_init+0x1f/0x30
> >  [<ffffffff81060ea9>] init_timer_key+0x39/0x100
> >  [<ffffffffa0425ec5>] br_ip4_multicast_query+0x155/0x380 [bridge]
> 
> Here is the offending call site. I leave that to the network wizards.
> 
> > [<ffffffffa0427eef>] br_multicast_ipv4_rcv+0x2cf/0x3d0 [bridge]
> > [<ffffffff8162140b>] ? _raw_spin_unlock+0x2b/0x50
> > [<ffffffffa0419a9b>] ? br_fdb_update+0x1db/0x2b0 [bridge]
> > [<ffffffffa04284b5>] br_multicast_rcv+0x45/0x60 [bridge]
> > [<ffffffffa041bdfe>] br_handle_frame_finish+0x16e/0x3c0 [bridge]
> > [<ffffffffa041bac8>] br_handle_frame+0x238/0x400 [bridge]
> > [<ffffffffa041b890>] ? br_del_bridge+0x80/0x80 [bridge]
> > [<ffffffff81539ca7>] __netif_receive_skb_core+0x237/0x960
> > [<ffffffff81539ade>] ? __netif_receive_skb_core+0x6e/0x960
> > [<ffffffff8153a3f7>] __netif_receive_skb+0x27/0x70
> > [<ffffffff8153c6fd>] netif_receive_skb+0x2d/0x210
> > [<ffffffff81527e65>] ? __netdev_alloc_skb+0xa5/0x110
> > [<ffffffffa0129a0f>] be_rx_compl_process+0xef/0x140 [be2net]
> > [<ffffffffa0129dc2>] be_process_rx+0xe2/0x1a0 [be2net]
> > [<ffffffffa0129fbd>] be_poll+0x13d/0x1d0 [be2net]
> > [<ffffffff8153dab8>] net_rx_action+0xd8/0x2a0
> > [<ffffffff81058e19>] __do_softirq+0x149/0x400
> > [<ffffffff8105922d>] irq_exit+0xed/0x100
> > [<ffffffff8162d206>] do_IRQ+0x66/0xe0
> 
> Thanks,

Bug added by:

commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
Author: Cong Wang <amwang@redhat.com>
Date:   Tue May 21 21:52:55 2013 +0000

    bridge: only expire the mdb entry when query is received
    
    Currently we arm the expire timer when the mdb entry is added,
    however, this causes problem when there is no querier sent
    out after that.
    
    So we should only arm the timer when a corresponding query is
    received, as suggested by Herbert.
    
    And he also mentioned "if there is no querier then group
    subscriptions shouldn't expire. There has to be at least one querier
    in the network for this thing to work.  Otherwise it just degenerates
    into a non-snooping switch, which is OK."
    
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Adam Baker <linux@baker-net.org.uk>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>


I guess the following should help


* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
  2013-07-19 13:51       ` Eric Dumazet
@ 2013-07-19 16:38         ` Thomas Gleixner
  2013-07-19 17:26           ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2013-07-19 16:38 UTC
  To: Eric Dumazet
  Cc: Cong Wang, Srivatsa S. Bhat, linux-kernel, Frederic Weisbecker,
	Ingo Molnar, Paul E. McKenney, netdev

On Fri, 19 Jul 2013, Eric Dumazet wrote:
> 
> I guess the following should help

Applying the empty patch does not work very well. Could you try again
after the caffeine reached your brain, please ? :)

Thanks,

	tglx


* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
  2013-07-19 16:38         ` Thomas Gleixner
@ 2013-07-19 17:26           ` Eric Dumazet
  2013-07-19 18:46             ` Srivatsa S. Bhat
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-07-19 17:26 UTC
  To: Thomas Gleixner
  Cc: Cong Wang, Srivatsa S. Bhat, linux-kernel, Frederic Weisbecker,
	Ingo Molnar, Paul E. McKenney, netdev

On Fri, 2013-07-19 at 18:38 +0200, Thomas Gleixner wrote:
> On Fri, 19 Jul 2013, Eric Dumazet wrote:
> > 
> > I guess the following should help
> 
> Applying the empty patch does not work very well. Could you try again
> after the caffeine reached your brain, please ? :)

hmm, right ;)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 69af490..4b99c9a 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -619,6 +619,9 @@ rehash:
 	mp->br = br;
 	mp->addr = *group;
 
+	setup_timer(&mp->timer, br_multicast_group_expired,
+		    (unsigned long)mp);
+
 	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
 	mdb->size++;
 
@@ -1126,7 +1129,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	if (!mp)
 		goto out;
 
-	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
 	mod_timer(&mp->timer, now + br->multicast_membership_interval);
 	mp->timer_armed = true;
 
@@ -1204,7 +1206,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	if (!mp)
 		goto out;
 
-	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
 	mod_timer(&mp->timer, now + br->multicast_membership_interval);
 	mp->timer_armed = true;
 


* Re: mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
  2013-07-19 17:26           ` Eric Dumazet
@ 2013-07-19 18:46             ` Srivatsa S. Bhat
  0 siblings, 0 replies; 9+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-19 18:46 UTC
  To: Eric Dumazet
  Cc: Thomas Gleixner, Cong Wang, linux-kernel, Frederic Weisbecker,
	Ingo Molnar, Paul E. McKenney, netdev

On 07/19/2013 10:56 PM, Eric Dumazet wrote:
> On Fri, 2013-07-19 at 18:38 +0200, Thomas Gleixner wrote:
>> On Fri, 19 Jul 2013, Eric Dumazet wrote:
>>>
>>> I guess the following should help
>>
>> Applying the empty patch does not work very well. Could you try again
>> after the caffeine reached your brain, please ? :)
> 
> hmm, right ;)
> 

This patch fixes the issue for me: the system has been idle for more
than an hour now without any problems (earlier, I used to get the traces
within 5 minutes of idle time).

Thanks a lot for the fix!

Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 69af490..4b99c9a 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -619,6 +619,9 @@ rehash:
>  	mp->br = br;
>  	mp->addr = *group;
> 
> +	setup_timer(&mp->timer, br_multicast_group_expired,
> +		    (unsigned long)mp);
> +
>  	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
>  	mdb->size++;
> 
> @@ -1126,7 +1129,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
>  	if (!mp)
>  		goto out;
> 
> -	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
>  	mod_timer(&mp->timer, now + br->multicast_membership_interval);
>  	mp->timer_armed = true;
> 
> @@ -1204,7 +1206,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
>  	if (!mp)
>  		goto out;
> 
> -	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
>  	mod_timer(&mp->timer, now + br->multicast_membership_interval);
>  	mp->timer_armed = true;
> 


* [PATCH] bridge: do not call setup_timer() multiple times
       [not found] <51E557C0.2060602@linux.vnet.ibm.com>
       [not found] ` <51E8F759.4060202@linux.vnet.ibm.com>
       [not found] ` <alpine.DEB.2.02.1307191323150.4089@ionos.tec.linutronix.de>
@ 2013-07-20  3:07 ` Eric Dumazet
  2013-07-20  5:13   ` David Miller
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-07-20  3:07 UTC
  To: Srivatsa S. Bhat, David Miller; +Cc: linux-kernel, Thomas Gleixner, netdev

From: Eric Dumazet <edumazet@google.com>

commit 9f00b2e7cf24 ("bridge: only expire the mdb entry when query is
received") added a nasty bug as an active timer can be reinitialized.

setup_timer() must be done once, no matter how many times mod_timer()
is called. br_multicast_new_group() is the right place to do this.

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Diagnosed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Cong Wang <amwang@redhat.com>
---
 net/bridge/br_multicast.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 69af490..4b99c9a 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -619,6 +619,9 @@ rehash:
 	mp->br = br;
 	mp->addr = *group;
 
+	setup_timer(&mp->timer, br_multicast_group_expired,
+		    (unsigned long)mp);
+
 	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
 	mdb->size++;
 
@@ -1126,7 +1129,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	if (!mp)
 		goto out;
 
-	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
 	mod_timer(&mp->timer, now + br->multicast_membership_interval);
 	mp->timer_armed = true;
 
@@ -1204,7 +1206,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	if (!mp)
 		goto out;
 
-	setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
 	mod_timer(&mp->timer, now + br->multicast_membership_interval);
 	mp->timer_armed = true;
 

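The rule stated in the changelog above (set the timer up once when the group entry is created, then only re-arm it for each query) can be sketched in user space as follows. Everything here is a hypothetical stand-in rather than the bridge code: `toy_timer`, `toy_group`, `toy_setup_timer()`, `toy_mod_timer()` and `count_init_active()` are illustrative names, and the `pending` flag plus warning counter only imitate the idea behind the "init active" check seen in the traces.

```c
#include <stdbool.h>
#include <stdlib.h>

/* User-space stand-in for a kernel timer; not the real struct timer_list. */
struct toy_timer {
	bool pending;			/* true while the timer is queued */
	unsigned long expires;
	void (*fn)(unsigned long);
	unsigned long data;
};

/* Stand-in for a multicast group entry that owns one timer. */
struct toy_group {
	struct toy_timer timer;
};

/* Counts how often an already queued timer was initialized again. */
static int init_active_warnings;

static void toy_group_expired(unsigned long data)
{
	(void)data;			/* expiry handling is out of scope here */
}

static void toy_setup_timer(struct toy_timer *t,
			    void (*fn)(unsigned long), unsigned long data)
{
	if (t->pending)
		init_active_warnings++;	/* "init active" style violation */
	t->pending = false;
	t->fn = fn;
	t->data = data;
}

static void toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->pending = true;
}

/*
 * Simulate `queries` queries hitting a single group and return the number
 * of violations. With fixed == false the timer is set up on every query
 * (the pattern the patch removes); with fixed == true it is set up once
 * at creation time. Returns -1 if allocation fails.
 */
int count_init_active(bool fixed, int queries)
{
	struct toy_group *mp = calloc(1, sizeof(*mp));	/* zeroed entry */
	int i;

	if (!mp)
		return -1;
	init_active_warnings = 0;

	if (fixed)
		toy_setup_timer(&mp->timer, toy_group_expired,
				(unsigned long)mp);

	for (i = 0; i < queries; i++) {
		if (!fixed)
			toy_setup_timer(&mp->timer, toy_group_expired,
					(unsigned long)mp);
		toy_mod_timer(&mp->timer, 1000UL + (unsigned long)i);
	}

	free(mp);
	return init_active_warnings;
}
```

With three simulated queries, `count_init_active(false, 3)` returns 2, because every query after the first reinitializes a timer that is still queued, while `count_init_active(true, 3)` returns 0.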

* Re: [PATCH] bridge: do not call setup_timer() multiple times
  2013-07-20  3:07 ` [PATCH] bridge: do not call setup_timer() multiple times Eric Dumazet
@ 2013-07-20  5:13   ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2013-07-20  5:13 UTC
  To: eric.dumazet; +Cc: srivatsa.bhat, linux-kernel, tglx, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 19 Jul 2013 20:07:16 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 9f00b2e7cf24 ("bridge: only expire the mdb entry when query is
> received") added a nasty bug as an active timer can be reinitialized.
> 
> setup_timer() must be done once, no matter how many times mod_timer()
> is called. br_multicast_new_group() is the right place to do this.
> 
> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Diagnosed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Cong Wang <amwang@redhat.com>

Applied, thanks a lot Eric.


Thread overview: 9+ messages
     [not found] <51E557C0.2060602@linux.vnet.ibm.com>
     [not found] ` <51E8F759.4060202@linux.vnet.ibm.com>
2013-07-19  9:08   ` mod_timer: list_add corruption: WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0xbe/0xd0() Srivatsa S. Bhat
     [not found] ` <alpine.DEB.2.02.1307191323150.4089@ionos.tec.linutronix.de>
2013-07-19 11:47   ` Srivatsa S. Bhat
2013-07-19 13:40     ` Thomas Gleixner
2013-07-19 13:51       ` Eric Dumazet
2013-07-19 16:38         ` Thomas Gleixner
2013-07-19 17:26           ` Eric Dumazet
2013-07-19 18:46             ` Srivatsa S. Bhat
2013-07-20  3:07 ` [PATCH] bridge: do not call setup_timer() multiple times Eric Dumazet
2013-07-20  5:13   ` David Miller
