* 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
@ 2009-04-13 21:15 Paul Smith
From: Paul Smith @ 2009-04-13 21:15 UTC (permalink / raw)
  To: netdev

Hi all; I'm hoping someone can point me in the right direction.  I have
a Broadcom NetXtreme II BCM5708S network card (bnx2) and a Broadcom
NetXtreme 5714S network card (tg3).  If I use either one by itself, it
works fine.
However, I want to bond them as active-active, and I can't use mode=4
because there are other devices on the network which don't support it.
So, I create the bond interface with:

        # modprobe bonding mode=6 miimon=200 xmit_hash_policy=layer2
        
        Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)
        bonding: xor_mode param is irrelevant in mode adaptive load balancing
        bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch

This seems to work fine.  Then I bring up the interface with ifconfig
and I get:

        bond0     Link encap:Ethernet  HWaddr 00:00:00:00:00:00  
                  inet addr:10.0.9.46  Bcast:10.0.15.255  Mask:255.255.240.0
                  UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
                  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:0 
                  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Then I enslave one of my ethernet cards (it doesn't appear to matter
which one I enslave first), and that works fine as well:

        # ifenslave bond0 eth2
        bnx2: eth2: using MSI
        bonding: bond0: enslaving eth2 as an active interface with a down link.
        bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
        bonding: bond0: link status definitely up for interface eth2.
        bonding: bond0: making interface eth2 the new active one.
        bonding: bond0: first active interface up!
        
        # ifconfig eth2
        eth2      Link encap:Ethernet  HWaddr 00:06:72:00:01:01  
                  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
                  RX packets:9 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:1000 
                  RX bytes:696 (696.0 B)  TX bytes:2669 (2.6 KiB)
                  Interrupt:17 Memory:da000000-da012800 

I check bond0 and it's correctly inherited the MAC from this new
interface.  If I stop here I can just use this interface and everything
is great.  Similarly if I create a bond and only enslave the tg3
interface.  But of course, a bond with just one interface isn't doing
much for me :-)
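
One way to do the "check bond0" step above from a script is to read the
bonding driver's status file.  This is only a sketch: it assumes the
bond is named bond0 and that the status file carries the usual
"Slave Interface:", "MII Status:" and "Permanent HW addr:" lines.  The
function name bond_slaves is made up for this example.

```shell
# List each slave of a bond with its MII status and permanent MAC address,
# read from the bonding driver's status file.
# Usage: bond_slaves [status-file]
# The default path assumes the bond is called bond0 (an assumption).
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/   { slave = $2 }
        # The bond-level "MII Status:" line comes before any slave; skip it.
        /^MII Status:/        { if (slave != "") status = $2 }
        /^Permanent HW addr:/ { if (slave != "") print slave, status, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Running "bond_slaves" after each ifenslave shows which interfaces the
driver believes are enslaved; passing a file name makes it usable on a
saved copy of the status file.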

As soon as I try to ifenslave the second interface, Badness Ensues:

        # ifenslave bond0 eth0
        ------------[ cut here ]------------
        WARNING: at linux/kernel/sched.c:4303 local_bh_enable_ip+0x2c/0xc0()
        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
        Pid: 1552, comm: ifenslave Not tainted 2.6.27.18-WR3.0bg_small #1
        
        Call Trace:
         [<ffffffff8023be34>] warn_on_slowpath+0x64/0xb0
         [<ffffffff8028654a>] get_page_from_freelist+0x30a/0x640
         [<ffffffff8041497a>] __dev_get_by_name+0x9a/0xc0
         [<ffffffff80419a66>] dev_ethtool+0xd46/0x11c0
         [<ffffffff8027fc7a>] find_get_page+0x9a/0xe0
         [<ffffffff802800c3>] find_lock_page+0x23/0x80
         [<ffffffff8024233c>] local_bh_enable_ip+0x2c/0xc0
         [<ffffffffa00ad780>] bond_alb_set_mac_address+0x2a0/0x2f0 [bonding]
         [<ffffffff80416d26>] dev_set_mac_address+0x56/0x80
         [<ffffffff80418013>] dev_ioctl+0x343/0x5e0
         [<ffffffff8045c43b>] devinet_ioctl+0x29b/0x7b0
         [<ffffffff80406df1>] sock_ioctl+0x71/0x260
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be0e3>] do_vfs_ioctl+0x263/0x2e0
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be217>] sys_ioctl+0xb7/0x100
         [<ffffffff802eb2a3>] dev_ifsioc+0x73/0x2c0
         [<ffffffff802eaf9a>] ethtool_ioctl+0x9a/0xa0
         [<ffffffff802ebfa3>] compat_sys_ioctl+0x113/0x3c0
         [<ffffffff8022ad52>] ia32_syscall_done+0x0/0xa
        BUG: scheduling while atomic: ifenslave/1552/0x10000000
        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
        Pid: 1552, comm: ifenslave Not tainted 2.6.27.18-WR3.0bg_small #1
        
        Call Trace:
         [<ffffffff8049b53a>] schedule+0xea/0x336
         [<ffffffff8020e619>] show_trace_log_lvl+0x39/0x80
         [<ffffffff8049b04b>] printk+0xc0/0xd5
         [<ffffffff8049b432>] preempt_schedule+0x32/0x50
         [<ffffffff8020e5b3>] dump_trace_extended+0x4f3/0x500
         [<ffffffff8020e5d0>] dump_trace+0x10/0x20
         [<ffffffff8020e634>] show_trace_log_lvl+0x54/0x80
         [<ffffffff8049ae36>] dump_stack+0x69/0x6f
         [<ffffffff8023be34>] warn_on_slowpath+0x64/0xb0
         [<ffffffff8028654a>] get_page_from_freelist+0x30a/0x640
         [<ffffffff8041497a>] __dev_get_by_name+0x9a/0xc0
         [<ffffffff80419a66>] dev_ethtool+0xd46/0x11c0
         [<ffffffff8027fc7a>] find_get_page+0x9a/0xe0
         [<ffffffff802800c3>] find_lock_page+0x23/0x80
         [<ffffffff8024233c>] local_bh_enable_ip+0x2c/0xc0
         [<ffffffffa00ad780>] bond_alb_set_mac_address+0x2a0/0x2f0 [bonding]
         [<ffffffff80416d26>] dev_set_mac_address+0x56/0x80
         [<ffffffff80418013>] dev_ioctl+0x343/0x5e0
         [<ffffffff8045c43b>] devinet_ioctl+0x29b/0x7b0
         [<ffffffff80406df1>] sock_ioctl+0x71/0x260
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be0e3>] do_vfs_ioctl+0x263/0x2e0
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be217>] sys_ioctl+0xb7/0x100
         [<ffffffff802eb2a3>] dev_ifsioc+0x73/0x2c0
         [<ffffffff802eaf9a>] ethtool_ioctl+0x9a/0xa0
         [<ffffffff802ebfa3>] compat_sys_ioctl+0x113/0x3c0
         [<ffffffff8022ad52>] ia32_syscall_done+0x0/0xa

        ---[ end trace ff7f0219c6745dff ]---
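
When a log like the one above has been saved to a file, the frames that
belong to a loadable module (the ones tagged "[bonding]" here) can be
pulled out with a short filter.  This is a minimal sketch; the function
name module_frames and the idea of saving the console output to a file
are assumptions for the example, not something from the original report.

```shell
# Print the distinct call-trace frames that belong to a loadable module,
# i.e. frames followed by "[module]", from a saved kernel log.
# Usage: module_frames <logfile>
module_frames() {
    # A matching frame looks like:
    #   [<ffffffffa00ad780>] symbol+0xOFF/0xLEN [module]
    # and is printed as "module: symbol+0xOFF/0xLEN".
    sed -n 's/.*\[<[0-9a-f]*>] \([A-Za-z0-9_.]*+0x[0-9a-f]*\/0x[0-9a-f]*\) \[\([A-Za-z0-9_]*\)].*/\2: \1/p' "$1" |
        LC_ALL=C sort -u
}
```

With a bonding.ko built with debug information, an offset found this way
can usually be mapped to a source line in gdb with
"list *(bond_alb_set_mac_address+0x2a0)".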

I can't access the console anymore (typing does nothing) but if I let it
sit there, it will periodically complain further:

        BUG: soft lockup - CPU#2 stuck for 61s! [ifenslave:1552]
        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
        CPU 2:
        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
        Pid: 1552, comm: ifenslave Tainted: G        W 2.6.27.18-WR3.0bg_small #1
        RIP: 0010:[<ffffffff8036773f>]  [<ffffffff8036773f>] __write_lock_failed+0xf/0x20
        RSP: 0000:ffff88046fb71c80  EFLAGS: 00000206
        RAX: ffff88046fb71fd8 RBX: ffff88046e115200 RCX: 0000000000000001
        RDX: 0000000000000101 RSI: ffff88046e0be400 RDI: ffff88046e1156b0
        RBP: 0000000000000000 R08: ffff88046fb88c70 R09: 0000000000000000
        R10: 00000000e1281e79 R11: 0000000000000001 R12: ffff88046e115680
        R13: ffff88046fb71c18 R14: ffff88046c79df00 R15: ffff88046e0be400
        FS:  0000000000000000(0000) GS:ffff88046f805880(0063) knlGS:00000000f7f126c0
        CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
        CR2: 000000004cd11000 CR3: 000000046c734000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        
        Call Trace:
         [<ffffffff8049d5d4>] _write_lock_bh+0x24/0x30
         [<ffffffffa00ad759>] bond_alb_set_mac_address+0x279/0x2f0 [bonding]
         [<ffffffff80416d26>] dev_set_mac_address+0x56/0x80
         [<ffffffff80418013>] dev_ioctl+0x343/0x5e0
         [<ffffffff8045c43b>] devinet_ioctl+0x29b/0x7b0
         [<ffffffff80406df1>] sock_ioctl+0x71/0x260
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be0e3>] do_vfs_ioctl+0x263/0x2e0
         [<ffffffff802bddff>] vfs_ioctl+0x2f/0xb0
         [<ffffffff802be217>] sys_ioctl+0xb7/0x100
         [<ffffffff802eb2a3>] dev_ifsioc+0x73/0x2c0
         [<ffffffff802eaf9a>] ethtool_ioctl+0x9a/0xa0
         [<ffffffff802ebfa3>] compat_sys_ioctl+0x113/0x3c0
         [<ffffffff8022ad52>] ia32_syscall_done+0x0/0xa

<a little bit later>

        ------------[ cut here ]------------
        WARNING: at /linux/net/sched/sch_generic.c:219 dev_watchdog+0x22e/0x240()
        NETDEV WATCHDOG: eth2 (bnx2): transmit timed out
        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
        Pid: 0, comm: swapper Tainted: G        W 2.6.27.18-WR3.0bg_small #1
        
        Call Trace:
         <IRQ>  [<ffffffff8023bd7d>] warn_slowpath+0xcd/0x120
         [<ffffffff802575ba>] hrtimer_interrupt+0x16a/0x1d0
         [<ffffffff8022f20e>] resched_task+0x4e/0x80
         [<ffffffff802a7ca2>] __slab_free+0xb2/0x380
         [<ffffffff802a7ca2>] __slab_free+0xb2/0x380
         [<ffffffff8035eca9>] __next_cpu+0x19/0x30
         [<ffffffff8023185c>] find_busiest_group+0x1dc/0x960
         [<ffffffff8022e870>] load_balance_fair+0xa0/0x130
         [<ffffffff80364e21>] strlcpy+0x41/0x50
         [<ffffffff80426fee>] dev_watchdog+0x22e/0x240
         [<ffffffff80426dc0>] dev_watchdog+0x0/0x240
         [<ffffffff80247207>] run_timer_softirq+0x157/0x230
         [<ffffffff8025a407>] getnstimeofday+0x57/0xe0
         [<ffffffff80242603>] __do_softirq+0xe3/0x210
         [<ffffffff8020d91c>] call_softirq+0x1c/0x30
         [<ffffffff8020ff75>] do_softirq+0x35/0x70
         [<ffffffff802416b5>] irq_exit+0x45/0x60
         [<ffffffff8021dc09>] smp_apic_timer_interrupt+0x149/0x1b0
         [<ffffffff8020d366>] apic_timer_interrupt+0x66/0x70
         <EOI>  [<ffffffff80214f5c>] mwait_idle+0x3c/0x50
         [<ffffffff8020b4b9>] cpu_idle+0x79/0x100
        
        ---[ end trace 7a134222da5adb1b ]---

I've tried all kinds of things, as I alluded to above: switching the
order, adding sleeps (before invoking ifenslave etc.), bringing up the
slave interfaces before I enslave or not, power-cycling, etc. but
nothing seems to make a difference; as soon as I bond the second
interface the whole thing goes south.

In my googling I haven't found too much, but I did find this:

        https://bugzilla.redhat.com/show_bug.cgi?id=251902#c25

which is a comment added to a different bug.  Although the trace doesn't
match the original bug, it does resemble my trace (but I'm not using
Xen).  However, the Red Hat engineer (rightly) requested that a new bug
be filed for this and I haven't been able to find that new bug (if it
was ever filed).

I've also pulled the latest GIT tree and looked at the differences
in drivers/net/bonding/bond_alb.c but didn't see anything that
looked like it related to this (but, I'm not versed in the kernel code
so it's quite possible I missed it).  I checked differences between
bond_main.c etc. as well but, again, nothing jumped out at me.  Since I'm
working on an embedded system it will be somewhat painful to try to
build the latest kernel to test in this environment, but I could do it
if someone believes that it might be fixed there.

Anyone have any thoughts about what might be going on, or what my next
steps should be?  I'm stumped :-(

-- 
Paul Smith <paul@mad-scientist.net>



From: Paul Smith @ 2009-04-14 16:01 UTC (permalink / raw)
  To: netdev

Sorry for the top-post, but I just wanted to add: the system has two
NetXtreme II interfaces and two NetXtreme interfaces.  I've now tried bonding
all combinations of these interfaces, and regardless of the order they
all fail when the second interface is bonded.

As another data point, if I change the bonding to mode=4 instead, then I
don't get any kernel failures (but of course the bonding doesn't work
properly as the switch is not configured for this).

Is anyone else able to use mode=6 with the bonding driver, or is that
mode just non-functional?  Is it something particular to these Broadcom
drivers?

I'm still pretty stumped here and I'd really love some pointers...
thanks!


-- 
Paul Smith <psmith@mad-scientist.net>



From: Brian Haley @ 2009-04-14 21:29 UTC (permalink / raw)
  To: paul; +Cc: netdev

Paul Smith wrote:
> Is anyone else able to use mode=6 with the bonding driver, or is that
> mode just non-functional?  Is it something particular to these Broadcom
> drivers?

The only help I can give you right now is that on my test system it doesn't crash.

	Bonding Driver: v3.5.0 (November 4, 2008)
	2x Broadcom NetXtreme II BCM5708

The kernel itself is stock 2.6.29 and I'm running on an x86_64 box, not exactly
embedded :)

-Brian

From: Jay Vosburgh @ 2009-04-15  1:12 UTC (permalink / raw)
  To: paul; +Cc: netdev

Paul Smith <paul@mad-scientist.net> wrote:
[...]
>As soon as I try to ifenslave the second interface, Badness Ensues:
>
>        # ifenslave bond0 eth0
>        ------------[ cut here ]------------
>        WARNING: at linux/kernel/sched.c:4303 local_bh_enable_ip+0x2c/0xc0()
>        Modules linked in: rng_core dock scsi_mod libata ata_piix zlib_inflate bnx2 ipmi_msghandler ipmi_si ipmi_devintf bonding
>        Pid: 1552, comm: ifenslave Not tainted 2.6.27.18-WR3.0bg_small #1
>        
>        Call Trace:
>         [<ffffffff8023be34>] warn_on_slowpath+0x64/0xb0
>         [<ffffffff8028654a>] get_page_from_freelist+0x30a/0x640
>         [<ffffffff8041497a>] __dev_get_by_name+0x9a/0xc0
>         [<ffffffff80419a66>] dev_ethtool+0xd46/0x11c0
>         [<ffffffff8027fc7a>] find_get_page+0x9a/0xe0
>         [<ffffffff802800c3>] find_lock_page+0x23/0x80
>         [<ffffffff8024233c>] local_bh_enable_ip+0x2c/0xc0
>         [<ffffffffa00ad780>] bond_alb_set_mac_address+0x2a0/0x2f0 [bonding]
[...]

	I think I know what's going on.  I believe this patch will
resolve things, but I won't be able to test it until tomorrow.  If you
want to test this, great; if you want to wait, that's fine too.

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8dc6fbb..b22467a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1708,10 +1708,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
  * Called with RTNL
  */
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
-	__releases(&bond->curr_slave_lock)
-	__releases(&bond->lock)
 	__acquires(&bond->lock)
-	__acquires(&bond->curr_slave_lock)
+	__releases(&bond->lock)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct sockaddr *sa = addr;
@@ -1747,9 +1745,6 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		}
 	}
 
-	write_unlock_bh(&bond->curr_slave_lock);
-	read_unlock(&bond->lock);
-
 	if (swap_slave) {
 		alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
 		alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave);
@@ -1757,16 +1752,17 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr,
 				       bond->alb_info.rlb_enabled);
 
-		alb_send_learning_packets(bond->curr_active_slave, bond_dev->dev_addr);
+		read_lock(&bond->lock);
+		alb_send_learning_packets(bond->curr_active_slave,
+					  bond_dev->dev_addr);
 		if (bond->alb_info.rlb_enabled) {
 			/* inform clients mac address has changed */
-			rlb_req_update_slave_clients(bond, bond->curr_active_slave);
+			rlb_req_update_slave_clients(bond,
+						     bond->curr_active_slave);
 		}
+		read_unlock(&bond->lock);
 	}
 
-	read_lock(&bond->lock);
-	write_lock_bh(&bond->curr_slave_lock);
-
 	return 0;
 }
 



	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
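
As a reading aid for the patch above (an interpretation of the diff, not
something stated in the thread): the removed lines drop
bond->curr_slave_lock and bond->lock first and retake them at the end,
which only balances if the caller already holds both, while the
function's comment says it is called with RTNL.  The added lines take
bond->lock and release it again inside the function.  The sketch below
prints the order of lock calls on each side of a unified diff so that
this can be seen at a glance.  The function name lock_order is
hypothetical, and the check is a crude text match, not a real lock
analysis.

```shell
# Print, in order of appearance, the rwlock calls found on the removed ("-")
# and added ("+") lines of a unified diff.
# Usage: lock_order <patchfile>
lock_order() {
    awk '
        /^(---|\+\+\+) / { next }    # skip the file header lines
        /^[-+]/ {
            side = substr($0, 1, 1)
            # Record the first read/write lock or unlock call on the line,
            # without the trailing "(".
            if (match($0, /(read|write)_(un)?lock(_bh)?\(/))
                seq[side] = seq[side] " " substr($0, RSTART, RLENGTH - 1)
        }
        END {
            print "removed:" seq["-"]
            print "added:" seq["+"]
        }
    ' "$1"
}
```

A "removed" sequence that starts with an unlock suggests the old code
expected to be entered with the locks held; an "added" sequence of lock
followed by unlock is self-contained.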

From: David Miller @ 2009-04-15  3:23 UTC (permalink / raw)
  To: fubar; +Cc: paul, netdev

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 14 Apr 2009 18:12:47 -0700

> 	I think I know what's going on.  I believe this patch will
> resolve things, but I won't be able to test it until tomorrow.  If you
> want to test this, great; if you want to wait, that's fine too.

Jay, thanks for working on this.

Let me know when you have a final version of this fix for me
to include.

From: Paul Smith @ 2009-04-15  5:29 UTC (permalink / raw)
  To: Linux netdev

On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
> 	I think I know what's going on.  I believe this patch will
> resolve things, but I won't be able to test it until tomorrow.  If you
> want to test this, great; if you want to wait, that's fine too.

I tested this; it works great.  All my systems came up fine with this
change applied.  Thanks!


-- 
Paul Smith <paul@mad-scientist.net>


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-04-15  1:12 ` Jay Vosburgh
  2009-04-15  3:23   ` David Miller
  2009-04-15  5:29   ` Paul Smith
@ 2009-04-15 16:56   ` Paul Smith
  2009-04-15 18:11     ` Jay Vosburgh
  2 siblings, 1 reply; 19+ messages in thread
From: Paul Smith @ 2009-04-15 16:56 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
> 	I think I know what's going on.  I believe this patch will
> resolve things, but I won't be able to test it until tomorrow.  If you
> want to test this, great; if you want to wait, that's fine too.

Hi Jay; as I mentioned last night this patch is working fine for me so
far.

However, looking at the rest of this function it seems to me that there
are other locking issues, at least based on the documentation in the
header file:

 * Here are the locking policies for the two bonding locks:
 *
 * 1) Get bond->lock when reading/writing slave list.
 * 2) Get bond->curr_slave_lock when reading/writing bond->curr_active_slave.
 *    (It is unnecessary when the write-lock is put with bond->lock.)
 * 3) When we lock with bond->curr_slave_lock, we must lock with bond->lock
 *    beforehand.

For example, don't you need to hold bond->curr_slave_lock at least
around the "if (!bond->curr_active_slave)"?  What about around the
"bond_for_each_slave" loop?

Many of the other functions, later, also seem to work with
bond->curr_active_slave and they don't take this lock.

Unless I'm missing something, I think there are still more problems in
the locking in bond_alb_set_mac_address().

Thoughts?
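
For readers following along, the three header-file rules quoted above can be sketched in userspace C. This is only an analogy under stated assumptions: pthread rwlocks stand in for the kernel rwlocks, and bond_init(), bond_set_active() and bond_active_id() are hypothetical helpers, not functions from the bonding driver.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins; field names mirror the thread, nothing else does. */
struct slave {
	int id;
};

struct bonding {
	pthread_rwlock_t lock;            /* rule 1: guards the slave list */
	pthread_rwlock_t curr_slave_lock; /* rule 2: guards curr_active_slave */
	struct slave *curr_active_slave;
};

/* Hypothetical helper: set up both locks and an empty active slave. */
static void bond_init(struct bonding *bond)
{
	int rc;

	rc = pthread_rwlock_init(&bond->lock, NULL);
	assert(rc == 0);
	rc = pthread_rwlock_init(&bond->curr_slave_lock, NULL);
	assert(rc == 0);
	(void)rc;
	bond->curr_active_slave = NULL;
}

/* Rule 3: take bond->lock first, then curr_slave_lock; release in reverse. */
static void bond_set_active(struct bonding *bond, struct slave *new_active)
{
	pthread_rwlock_rdlock(&bond->lock);
	pthread_rwlock_wrlock(&bond->curr_slave_lock);
	bond->curr_active_slave = new_active;
	pthread_rwlock_unlock(&bond->curr_slave_lock);
	pthread_rwlock_unlock(&bond->lock);
}

/* Readers follow the same order, taking both locks for read. */
static int bond_active_id(struct bonding *bond)
{
	int id;

	pthread_rwlock_rdlock(&bond->lock);
	pthread_rwlock_rdlock(&bond->curr_slave_lock);
	id = bond->curr_active_slave ? bond->curr_active_slave->id : -1;
	pthread_rwlock_unlock(&bond->curr_slave_lock);
	pthread_rwlock_unlock(&bond->lock);
	return id;
}
```

The point of the fixed order is that two paths can never each hold one lock while waiting for the other.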


-- 
Paul Smith <paul@mad-scientist.net>




* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-04-15 16:56   ` Paul Smith
@ 2009-04-15 18:11     ` Jay Vosburgh
  2009-04-15 18:39       ` Paul Smith
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-15 18:11 UTC (permalink / raw)
  To: paul; +Cc: netdev

Paul Smith <paul@mad-scientist.net> wrote:

>On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
>> 	I think I know what's going on.  I believe this patch will
>> resolve things, but I won't be able to test it until tomorrow.  If you
>> want to test this, great; if you want to wait, that's fine too.
>
>Hi Jay; as I mentioned last night this patch is working fine for me so
>far.

	Thanks for the test report.

>However, looking at the rest of this function it seems to me that there
>are other locking issues, at least based on the documentation in the
>header file:
>
> * Here are the locking policies for the two bonding locks:
> *
> * 1) Get bond->lock when reading/writing slave list.
> * 2) Get bond->curr_slave_lock when reading/writing bond->curr_active_slave.
> *    (It is unnecessary when the write-lock is put with bond->lock.)
> * 3) When we lock with bond->curr_slave_lock, we must lock with bond->lock
> *    beforehand.
>
>For example, don't you need to hold bond->curr_slave_lock at least
>around the "if (!bond->curr_active_slave)"?  What about around the
>"bond_for_each_slave" loop?
>
>Many of the other functions, later, also seem to work with
>bond->curr_active_slave and they don't take this lock.
>
>Unless I'm missing something, I think there are still more problems in
>the locking in bond_alb_set_mac_address().

	The various MAC manipulating functions are either called under
RTNL (as bond_alb_set_mac_address is) or take pains to acquire RTNL
before doing anything with the MAC.  Also, the slave list and
curr_active_slave are mutexed by RTNL, so those inspections should be
safe.

	I'm reasonably sure that the curr_slave_lock is superfluous
(which wasn't the case when it was originally introduced), but I haven't
had a chance to validate this.  The locking has changed from what's
documented in the header file; RTNL wasn't used for this when that was
written.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
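
The "mutexed by RTNL" argument above can be illustrated with a small userspace sketch: if every path that touches a field runs under one outer mutex, a second inner lock adds no further exclusion between those paths. The names big_lock, big_lock_is_locked() and set_active_slave() are hypothetical; the closest real kernel idiom is ASSERT_RTNL(), which warns when RTNL is not held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One outer mutex playing the role of RTNL in this analogy. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* State that is only ever touched with big_lock held. */
static int active_slave = -1;

/*
 * Report whether big_lock is currently locked by anyone.  On a default
 * (non-recursive) mutex, trylock fails when the mutex is already locked,
 * including when the caller itself holds it.
 */
static bool big_lock_is_locked(void)
{
	if (pthread_mutex_trylock(&big_lock) == 0) {
		pthread_mutex_unlock(&big_lock);
		return false;
	}
	return true;
}

/*
 * Documented as "called with big_lock held", like a kernel function
 * commented "Called with RTNL".  Returns -1 instead of touching the
 * state when the contract is violated.
 */
static int set_active_slave(int id)
{
	if (!big_lock_is_locked())
		return -1;
	active_slave = id;
	return 0;
}
```

A caller would wrap the update in pthread_mutex_lock(&big_lock) and pthread_mutex_unlock(&big_lock); calling set_active_slave() without the lock is rejected rather than racing.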


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-04-15 18:11     ` Jay Vosburgh
@ 2009-04-15 18:39       ` Paul Smith
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Smith @ 2009-04-15 18:39 UTC (permalink / raw)
  To: Netdev

On Wed, 2009-04-15 at 11:11 -0700, Jay Vosburgh wrote:
>         The various MAC manipulating functions are either called under
> RTNL (as bond_alb_set_mac_address is) or take pains to acquire RTNL
> before doing anything with the MAC.  Also, the slave list and
> curr_active_slave are mutexed by RTNL, so those inspections should be
> safe.
> 
>         I'm reasonably sure that the curr_slave_lock is superfluous
> (which wasn't the case when it was originally introduced), but I
> haven't had a chance to validate this.  The locking has changed from
> what's documented in the header file; RTNL wasn't used for this when
> that was written.

OK, sounds good.  I'll let you know if I observe any other odd behavior
with the bonding driver.

Thanks for the great support!  Cheers!

-- 
Paul Smith <paul@mad-scientist.net>




* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
       [not found] ` <1241397581.6499.658.camel@homebase.localnet>
@ 2009-05-04 19:03   ` Jay Vosburgh
  2009-05-04 19:06     ` David Miller
  2009-05-05  4:32     ` David Miller
  0 siblings, 2 replies; 19+ messages in thread
From: Jay Vosburgh @ 2009-05-04 19:03 UTC (permalink / raw)
  To: paul; +Cc: Linux netdev, David S. Miller

Paul Smith <paul@mad-scientist.net> wrote:

>Hi Jay/David/etc.;
>
>This patch is critical for me to properly use mode 6 (balance-alb)
>bonding; I assume it will be needed for others as well.  I haven't
>checked to see if it's still necessary in 2.6.29/2.6.30, but I didn't
>notice it going into the latest 2.6.27.22, released today.
>
>Is this still unofficial?  Is there an official patch on the horizon?

	David, please apply and queue for -stable:

Subject: [PATCH] bonding: fix alb mode locking regression 

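	Fix a locking bug in alb MAC address management:
bond_alb_set_mac_address() is called under RTNL, yet it released and
re-acquired bond->lock and bond->curr_slave_lock as though the caller
held them.  Remove that unlock/relock pair and instead take bond->lock
for read only around sending learning packets and updating the rlb
clients.  This bug was introduced in commit: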

commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0
Author: Jay Vosburgh <fubar@us.ibm.com>
Date:   Wed Oct 17 17:37:49 2007 -0700

    bonding: Convert locks to _bh, rework alb locking for new locking

	Bug reported by Paul Smith <paul@mad-scientist.net>, who also
tested the fix.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 553a899..46d312b 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1706,10 +1706,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
  * Called with RTNL
  */
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
-	__releases(&bond->curr_slave_lock)
-	__releases(&bond->lock)
 	__acquires(&bond->lock)
-	__acquires(&bond->curr_slave_lock)
+	__releases(&bond->lock)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct sockaddr *sa = addr;
@@ -1745,9 +1743,6 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		}
 	}
 
-	write_unlock_bh(&bond->curr_slave_lock);
-	read_unlock(&bond->lock);
-
 	if (swap_slave) {
 		alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
 		alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave);
@@ -1755,16 +1750,15 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr,
 				       bond->alb_info.rlb_enabled);
 
+		read_lock(&bond->lock);
 		alb_send_learning_packets(bond->curr_active_slave, bond_dev->dev_addr);
 		if (bond->alb_info.rlb_enabled) {
 			/* inform clients mac address has changed */
 			rlb_req_update_slave_clients(bond, bond->curr_active_slave);
 		}
+		read_unlock(&bond->lock);
 	}
 
-	read_lock(&bond->lock);
-	write_lock_bh(&bond->curr_slave_lock);
-
 	return 0;
 }
 



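As a rough userspace illustration of the pattern the patch above removes, the sketch below uses an error-checking pthread mutex, which reports an unlock by a thread that does not hold the lock. It is an analogy only: kernel rwlocks do not return such an error, and init_checked(), is_locked(), set_mac_old_shape() and set_mac_new_shape() are hypothetical names. The "old shape" drops a lock the caller never took and re-takes it on exit; the "new shape" takes the lock only around the work that needs it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise a mutex that detects unlock-without-lock (returns EPERM). */
static int init_checked(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	if (rc)
		return rc;
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (rc == 0)
		rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return rc;
}

/* Returns 1 if the mutex is currently locked, 0 otherwise. */
static int is_locked(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) == 0) {
		pthread_mutex_unlock(m);
		return 0;
	}
	return 1;
}

/*
 * Old shape: assume the caller holds the lock, drop it for the work,
 * and re-take it before returning.  If the caller never held it, the
 * unlock fails and the function returns with the lock held.
 */
static int set_mac_old_shape(pthread_mutex_t *m)
{
	int rc = pthread_mutex_unlock(m); /* EPERM when not held */

	/* ... work that must run unlocked would go here ... */
	pthread_mutex_lock(m);
	return rc;
}

/* New shape: lock and unlock are balanced inside the function. */
static int set_mac_new_shape(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);

	if (rc)
		return rc;
	/* ... work that needs the lock would go here ... */
	return pthread_mutex_unlock(m);
}
```

Called on a freshly initialised mutex, the new shape leaves it unlocked, while the old shape reports EPERM and leaves it locked for whoever runs next.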

* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 19:03   ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Jay Vosburgh
@ 2009-05-04 19:06     ` David Miller
  2009-05-04 22:38       ` Paul Smith
                         ` (2 more replies)
  2009-05-05  4:32     ` David Miller
  1 sibling, 3 replies; 19+ messages in thread
From: David Miller @ 2009-05-04 19:06 UTC (permalink / raw)
  To: fubar; +Cc: paul, netdev

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon, 04 May 2009 12:03:37 -0700

> Paul Smith <paul@mad-scientist.net> wrote:
> 
>>Hi Jay/David/etc.;
>>
>>This patch is critical for me to properly use mode 6 (balance-alb)
>>bonding; I assume it will be needed for others as well.  I haven't
>>checked to see if it's still necessary in 2.6.29/2.6.30, but I didn't
>>notice it going into the latest 2.6.27.22, released today.
>>
>>Is this still unofficial?  Is there an official patch on the horizon?
> 
> 	David, please apply and queue for -stable:

Greg just posted that there will be no further 2.6.27.x -stable
releases after the one he just made.


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 19:06     ` David Miller
@ 2009-05-04 22:38       ` Paul Smith
  2009-05-04 22:44         ` David Miller
  2009-05-05  0:59         ` Paul Smith
  2009-05-04 23:00       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Jay Vosburgh
  2009-05-04 23:05       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Ben Hutchings
  2 siblings, 2 replies; 19+ messages in thread
From: Paul Smith @ 2009-05-04 22:38 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, netdev

On Mon, 2009-05-04 at 12:06 -0700, David Miller wrote:
> Greg just posted that there will be no further 2.6.27.x -stable
> releases after the one he just made.

Really?  That seems odd.  Everything I heard for the last 7 months or
so, even as late as last month, seemed to imply that 2.6.27 was going to
be the next long-term supported kernel, taking over from 2.6.16, and
that it would be supported for "years".

Maybe Adrian is taking over 2.6.27.x maintenance from Greg?

Was this posted on lkml?




* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 22:38       ` Paul Smith
@ 2009-05-04 22:44         ` David Miller
  2009-05-05  0:59         ` Paul Smith
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2009-05-04 22:44 UTC (permalink / raw)
  To: paul; +Cc: fubar, netdev

From: Paul Smith <paul@mad-scientist.net>
Date: Mon, 04 May 2009 18:38:14 -0400

> On Mon, 2009-05-04 at 12:06 -0700, David Miller wrote:
>> Greg just posted that there will be no further 2.6.27.x -stable
>> releases after the one he just made.
> 
> Really?  That seems odd.  Everything I heard for the last 7 months or
> so, even as late as last month, seemed to imply that 2.6.27 was going to
> be the next long-term supported kernel, taking over from 2.6.16, and
> that it would be supported for "years".

The reality of the situation is that someone has to do the work.

> Maybe Adrian is taking over 2.6.27.x maintenance from Greg?

I have no idea.

> Was this posted on lkml?

Yes.


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 19:06     ` David Miller
  2009-05-04 22:38       ` Paul Smith
@ 2009-05-04 23:00       ` Jay Vosburgh
  2009-05-04 23:04         ` David Miller
  2009-05-04 23:05       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Ben Hutchings
  2 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-05-04 23:00 UTC (permalink / raw)
  To: David Miller; +Cc: paul, netdev

David Miller <davem@davemloft.net> wrote:

>From: Jay Vosburgh <fubar@us.ibm.com>
>Date: Mon, 04 May 2009 12:03:37 -0700
>
>> Paul Smith <paul@mad-scientist.net> wrote:
>> 
>>>Hi Jay/David/etc.;
>>>
>>>This patch is critical for me to properly use mode 6 (balance-alb)
>>>bonding; I assume it will be needed for others as well.  I haven't
>>>checked to see if it's still necessary in 2.6.29/2.6.30, but I didn't
>>>notice it going into the latest 2.6.27.22, released today.
>>>
>>>Is this still unofficial?  Is there an official patch on the horizon?
>> 
>> 	David, please apply and queue for -stable:
>
>Greg just posted that there will be no further 2.6.27.x -stable
>releases after the one he just made.

	Regardless of the -stable situation, the patch is still needed
for the current mainline (sorry if that wasn't clear).  I checked it
against the current net-2.6 and net-next-2.6 trees, and it should apply
to either one.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 23:00       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Jay Vosburgh
@ 2009-05-04 23:04         ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2009-05-04 23:04 UTC (permalink / raw)
  To: fubar; +Cc: paul, netdev

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon, 04 May 2009 16:00:51 -0700

> 	Regardless of the -stable situation, the patch is still needed
> for the current mainline (sorry if that wasn't clear).  I checked it
> against the current net-2.6 and net-next-2.6 trees, and it should apply
> to either one.

I didn't realize that.  OK, I'll unmark it in patchwork and
get to it soon.

Thanks.


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 19:06     ` David Miller
  2009-05-04 22:38       ` Paul Smith
  2009-05-04 23:00       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Jay Vosburgh
@ 2009-05-04 23:05       ` Ben Hutchings
  2009-05-04 23:12         ` David Miller
  2 siblings, 1 reply; 19+ messages in thread
From: Ben Hutchings @ 2009-05-04 23:05 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, paul, netdev

On Mon, 2009-05-04 at 12:06 -0700, David Miller wrote:
> From: Jay Vosburgh <fubar@us.ibm.com>
> Date: Mon, 04 May 2009 12:03:37 -0700
> 
> > Paul Smith <paul@mad-scientist.net> wrote:
> > 
> >>Hi Jay/David/etc.;
> >>
> >>This patch is critical for me to properly use mode 6 (balance-alb)
> >>bonding; I assume it will be needed for others as well.  I haven't
> >>checked to see if it's still necessary in 2.6.29/2.6.30, but I didn't
> >>notice it going into the latest 2.6.27.22, released today.
> >>
> >>Is this still unofficial?  Is there an official patch on the horizon?
> > 
> > 	David, please apply and queue for -stable:
> 
> Greg just posted that there will be no further 2.6.27.x -stable
> releases after the one he just made.

It's 2.6.28.x that he's dropping now
<http://article.gmane.org/gmane.linux.kernel/831202>, not .27.x - he'll
have to support the latter for years in SLES 11 anyway.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 23:05       ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Ben Hutchings
@ 2009-05-04 23:12         ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2009-05-04 23:12 UTC (permalink / raw)
  To: bhutchings; +Cc: fubar, paul, netdev

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 05 May 2009 00:05:12 +0100

> On Mon, 2009-05-04 at 12:06 -0700, David Miller wrote:
>> From: Jay Vosburgh <fubar@us.ibm.com>
>> Date: Mon, 04 May 2009 12:03:37 -0700
>> 
>> > Paul Smith <paul@mad-scientist.net> wrote:
>> > 
>> >>Hi Jay/David/etc.;
>> >>
>> >>This patch is critical for me to properly use mode 6 (balance-alb)
>> >>bonding; I assume it will be needed for others as well.  I haven't
>> >>checked to see if it's still necessary in 2.6.29/2.6.30, but I didn't
>> >>notice it going into the latest 2.6.27.22, released today.
>> >>
>> >>Is this still unofficial?  Is there an official patch on the horizon?
>> > 
>> > 	David, please apply and queue for -stable:
>> 
>> Greg just posted that there will be no further 2.6.27.x -stable
>> releases after the one he just made.
> 
> It's 2.6.28.x that he's dropping now
> <http://article.gmane.org/gmane.linux.kernel/831202>, not .27.x - he'll
> have to support the latter for years in SLES 11 anyway.

Yes, he just corrected me about this in private email.


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 22:38       ` Paul Smith
  2009-05-04 22:44         ` David Miller
@ 2009-05-05  0:59         ` Paul Smith
  1 sibling, 0 replies; 19+ messages in thread
From: Paul Smith @ 2009-05-05  0:59 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, netdev

Whew!  I was a bit worried there :-)

Can we re-queue this for 2.6.27-stable then?  I've got the patch already
building in my tree but it'd be nice to get it there by default going
forward.

Cheers!


* Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond
  2009-05-04 19:03   ` 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave " Jay Vosburgh
  2009-05-04 19:06     ` David Miller
@ 2009-05-05  4:32     ` David Miller
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2009-05-05  4:32 UTC (permalink / raw)
  To: fubar; +Cc: paul, netdev

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon, 04 May 2009 12:03:37 -0700

> 	David, please apply and queue for -stable:

Done.


