All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] bonding: prevent deadlock on slave store with alb mode
@ 2011-05-24 19:36 Neil Horman
  2011-05-24 20:00 ` Andy Gospodarek
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Neil Horman @ 2011-05-24 19:36 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, David S. Miller

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

We could fix it by moving the spin lock initalization to the device creation
path, but it seems that since we're warning people about not doing this, we
should probably just disallow them from doing it, so fix it by adding an EINVAL
return if we're not up yet.  Tested by the reporter and confirmed to fix the
problem.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <daevm@davemloft.net>
---
 drivers/net/bonding/bond_sysfs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..206c543 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct device *d,
 	if (!(bond->dev->flags & IFF_UP)) {
 		pr_warning("%s: doing slave updates when interface is down.\n",
 			   bond->dev->name);
+		return -EINVAL;
 	}
 
 	if (!rtnl_trylock())
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 19:36 [PATCH] bonding: prevent deadlock on slave store with alb mode Neil Horman
@ 2011-05-24 20:00 ` Andy Gospodarek
  2011-05-24 20:13   ` Nicolas de Pesloüan
  2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
  2011-05-25 18:13 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v3) Neil Horman
  2 siblings, 1 reply; 18+ messages in thread
From: Andy Gospodarek @ 2011-05-24 20:00 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller

On Tue, May 24, 2011 at 03:36:05PM -0400, Neil Horman wrote:
> This soft lockup was recently reported:
> 
> [root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
> [root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
> bonding: bond5: doing slave updates when interface is down.
> bonding bond5: master_dev is not up in bond_enslave
> [root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
> bonding: bond5: doing slave updates when interface is down.
> 
> BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
> CPU 12:
> Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
> be2d
> Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
> RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
> .text.lock.spinlock+0x26/00
> RSP: 0018:ffff810113167da8  EFLAGS: 00000286
> RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
> RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
> RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
> R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
> R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
> FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
> 
> Call Trace:
>  [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
>  [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
>  [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
>  [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
>  [<ffffffff8006457b>] __down_write_nested+0x12/0x92
>  [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
>  [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
>  [<ffffffff80016b87>] vfs_write+0xce/0x174
>  [<ffffffff80017450>] sys_write+0x45/0x6e
>  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> 
> It occurs because we are able to change the slave configuarion of a bond while
> the bond interface is down.  The bonding driver initializes some data structures
> only after its ndo_open routine is called.  Among them is the initalization of
> the alb tx and rx hash locks.  So if we add or remove a slave without first
> opening the bond master device, we run the risk of trying to lock/unlock a
> spinlock that has garbage for data in it, which results in our above softlock.
> 
> We could fix it by moving the spin lock initalization to the device creation
> path, but it seems that since we're warning people about not doing this, we
> should probably just disallow them from doing it, so fix it by adding an EINVAL
> return if we're not up yet.  Tested by the reporter and confirmed to fix the
> problem.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

> Reported-by: jtluka@redhat.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: "David S. Miller" <daevm@davemloft.net>
> ---
>  drivers/net/bonding/bond_sysfs.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
> index 4059bfc..206c543 100644
> --- a/drivers/net/bonding/bond_sysfs.c
> +++ b/drivers/net/bonding/bond_sysfs.c
> @@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct device *d,
>  	if (!(bond->dev->flags & IFF_UP)) {
>  		pr_warning("%s: doing slave updates when interface is down.\n",
>  			   bond->dev->name);
> +		return -EINVAL;
>  	}
>  
>  	if (!rtnl_trylock())
> -- 
> 1.7.5.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 20:00 ` Andy Gospodarek
@ 2011-05-24 20:13   ` Nicolas de Pesloüan
  2011-05-24 20:37     ` Neil Horman
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas de Pesloüan @ 2011-05-24 20:13 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Neil Horman, netdev, Jay Vosburgh, David S. Miller

Le 24/05/2011 22:00, Andy Gospodarek a écrit :
> On Tue, May 24, 2011 at 03:36:05PM -0400, Neil Horman wrote:
>> This soft lockup was recently reported:
>>
>> [root@dell-per715-01 ~]# echo +bond5>  /sys/class/net/bonding_masters
>> [root@dell-per715-01 ~]# echo +eth1>  /sys/class/net/bond5/bonding/slaves
>> bonding: bond5: doing slave updates when interface is down.
>> bonding bond5: master_dev is not up in bond_enslave
>> [root@dell-per715-01 ~]# echo -eth1>  /sys/class/net/bond5/bonding/slaves
>> bonding: bond5: doing slave updates when interface is down.
>>
>> BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>> CPU 12:
>> Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>> be2d
>> Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>> RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>> .text.lock.spinlock+0x26/00
>> RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>> RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>> RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>> RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>> R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>> R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>> FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>>
>> Call Trace:
>>   [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
>>   [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
>>   [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
>>   [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
>>   [<ffffffff8006457b>] __down_write_nested+0x12/0x92
>>   [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
>>   [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
>>   [<ffffffff80016b87>] vfs_write+0xce/0x174
>>   [<ffffffff80017450>] sys_write+0x45/0x6e
>>   [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>>
>> It occurs because we are able to change the slave configuarion of a bond while
>> the bond interface is down.  The bonding driver initializes some data structures
>> only after its ndo_open routine is called.  Among them is the initalization of
>> the alb tx and rx hash locks.  So if we add or remove a slave without first
>> opening the bond master device, we run the risk of trying to lock/unlock a
>> spinlock that has garbage for data in it, which results in our above softlock.
>>
>> We could fix it by moving the spin lock initalization to the device creation
>> path, but it seems that since we're warning people about not doing this, we
>> should probably just disallow them from doing it, so fix it by adding an EINVAL
>> return if we're not up yet.  Tested by the reporter and confirmed to fix the
>> problem.
>>
>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>
> Signed-off-by: Andy Gospodarek<andy@greyhouse.net>
>
>> Reported-by: jtluka@redhat.com
>> CC: Jay Vosburgh<fubar@us.ibm.com>
>> CC: Andy Gospodarek<andy@greyhouse.net>
>> CC: "David S. Miller"<daevm@davemloft.net>
>> ---
>>   drivers/net/bonding/bond_sysfs.c |    1 +
>>   1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>> index 4059bfc..206c543 100644
>> --- a/drivers/net/bonding/bond_sysfs.c
>> +++ b/drivers/net/bonding/bond_sysfs.c
>> @@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct device *d,
>>   	if (!(bond->dev->flags&  IFF_UP)) {
>>   		pr_warning("%s: doing slave updates when interface is down.\n",
>>   			   bond->dev->name);
>> +		return -EINVAL;

This will turn a warning into an error.

This warning existed for long, but never caused the bonding setup to fail. This patch cause some 
regression for user space. For example, current ifenslave-2.6 package in Debian doesn't ensure bond 
is UP before enslaving, because this was never required.

NAK.

>>   	}
>>
>>   	if (!rtnl_trylock())
>> --
>> 1.7.5.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 20:13   ` Nicolas de Pesloüan
@ 2011-05-24 20:37     ` Neil Horman
  2011-05-24 20:51       ` Nicolas de Pesloüan
  0 siblings, 1 reply; 18+ messages in thread
From: Neil Horman @ 2011-05-24 20:37 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Andy Gospodarek, netdev, Jay Vosburgh, David S. Miller

On Tue, May 24, 2011 at 10:13:35PM +0200, Nicolas de Pesloüan wrote:
> Le 24/05/2011 22:00, Andy Gospodarek a écrit :
> >On Tue, May 24, 2011 at 03:36:05PM -0400, Neil Horman wrote:
> >>This soft lockup was recently reported:
> >>
> >>[root@dell-per715-01 ~]# echo +bond5>  /sys/class/net/bonding_masters
> >>[root@dell-per715-01 ~]# echo +eth1>  /sys/class/net/bond5/bonding/slaves
> >>bonding: bond5: doing slave updates when interface is down.
> >>bonding bond5: master_dev is not up in bond_enslave
> >>[root@dell-per715-01 ~]# echo -eth1>  /sys/class/net/bond5/bonding/slaves
> >>bonding: bond5: doing slave updates when interface is down.
> >>
> >>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
> >>CPU 12:
> >>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
> >>be2d
> >>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
> >>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
> >>.text.lock.spinlock+0x26/00
> >>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
> >>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
> >>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
> >>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
> >>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
> >>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
> >>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
> >>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
> >>
> >>Call Trace:
> >>  [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> >>  [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> >>  [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> >>  [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> >>  [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> >>  [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> >>  [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> >>  [<ffffffff80016b87>] vfs_write+0xce/0x174
> >>  [<ffffffff80017450>] sys_write+0x45/0x6e
> >>  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> >>
> >>It occurs because we are able to change the slave configuarion of a bond while
> >>the bond interface is down.  The bonding driver initializes some data structures
> >>only after its ndo_open routine is called.  Among them is the initalization of
> >>the alb tx and rx hash locks.  So if we add or remove a slave without first
> >>opening the bond master device, we run the risk of trying to lock/unlock a
> >>spinlock that has garbage for data in it, which results in our above softlock.
> >>
> >>We could fix it by moving the spin lock initalization to the device creation
> >>path, but it seems that since we're warning people about not doing this, we
> >>should probably just disallow them from doing it, so fix it by adding an EINVAL
> >>return if we're not up yet.  Tested by the reporter and confirmed to fix the
> >>problem.
> >>
> >>Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> >
> >Signed-off-by: Andy Gospodarek<andy@greyhouse.net>
> >
> >>Reported-by: jtluka@redhat.com
> >>CC: Jay Vosburgh<fubar@us.ibm.com>
> >>CC: Andy Gospodarek<andy@greyhouse.net>
> >>CC: "David S. Miller"<daevm@davemloft.net>
> >>---
> >>  drivers/net/bonding/bond_sysfs.c |    1 +
> >>  1 files changed, 1 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
> >>index 4059bfc..206c543 100644
> >>--- a/drivers/net/bonding/bond_sysfs.c
> >>+++ b/drivers/net/bonding/bond_sysfs.c
> >>@@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct device *d,
> >>  	if (!(bond->dev->flags&  IFF_UP)) {
> >>  		pr_warning("%s: doing slave updates when interface is down.\n",
> >>  			   bond->dev->name);
> >>+		return -EINVAL;
> 
> This will turn a warning into an error.
> 
Yes, because it should have been an error all along.

> This warning existed for long, but never caused the bonding setup to
> fail. This patch cause some regression for user space. For example,
> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
> before enslaving, because this was never required.
> 
Thats not a regression, thats the kernel returning an error where it should have
done so all along.  Just because a utility got away with it for awhile and it
didn't always cause a lockup, doesn't grandfather that application in to a
situation where the kernel has to support its broken behavior in perpituity.  

Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
patch doesn't touch, so ifenslave is currently unaffected (although I should
look in the ioctl path to see if we have already added such a check, lest you be
able to deadlock your system as previously indicated using that tool).

Neil


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 20:37     ` Neil Horman
@ 2011-05-24 20:51       ` Nicolas de Pesloüan
  2011-05-24 21:02         ` Nicolas de Pesloüan
  2011-05-24 21:12         ` Jay Vosburgh
  0 siblings, 2 replies; 18+ messages in thread
From: Nicolas de Pesloüan @ 2011-05-24 20:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: Andy Gospodarek, netdev, Jay Vosburgh, David S. Miller

Le 24/05/2011 22:37, Neil Horman a écrit :

>>>> +		return -EINVAL;
>>
>> This will turn a warning into an error.
>>
> Yes, because it should have been an error all along.
>
>> This warning existed for long, but never caused the bonding setup to
>> fail. This patch cause some regression for user space. For example,
>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>> before enslaving, because this was never required.
>>
> Thats not a regression, thats the kernel returning an error where it should have
> done so all along.  Just because a utility got away with it for awhile and it
> didn't always cause a lockup, doesn't grandfather that application in to a
> situation where the kernel has to support its broken behavior in perpituity.
>
> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
> patch doesn't touch, so ifenslave is currently unaffected (although I should
> look in the ioctl path to see if we have already added such a check, lest you be
> able to deadlock your system as previously indicated using that tool).

Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use ioctl (ifenslave binary) 
anymore, but only sysfs.

Documentation/bonding.txt should be updated to reflect this change.
pr_warning should be changed to pr_ err.
Bonding version should be bumped.

Anyway, I will fix this package, but I suspect there exist many user scripts that don't ensure bond 
is up before enslaving.

	Nicolas.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 20:51       ` Nicolas de Pesloüan
@ 2011-05-24 21:02         ` Nicolas de Pesloüan
  2011-05-24 21:12         ` Jay Vosburgh
  1 sibling, 0 replies; 18+ messages in thread
From: Nicolas de Pesloüan @ 2011-05-24 21:02 UTC (permalink / raw)
  To: Neil Horman; +Cc: Andy Gospodarek, netdev, Jay Vosburgh, David S. Miller

Le 24/05/2011 22:51, Nicolas de Pesloüan a écrit :
> Le 24/05/2011 22:37, Neil Horman a écrit :
>
>>>>> + return -EINVAL;
>>>
>>> This will turn a warning into an error.
>>>
>> Yes, because it should have been an error all along.
>>
>>> This warning existed for long, but never caused the bonding setup to
>>> fail. This patch cause some regression for user space. For example,
>>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>>> before enslaving, because this was never required.
>>>
>> Thats not a regression, thats the kernel returning an error where it should have
>> done so all along. Just because a utility got away with it for awhile and it
>> didn't always cause a lockup, doesn't grandfather that application in to a
>> situation where the kernel has to support its broken behavior in perpituity.
>>
>> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>> patch doesn't touch, so ifenslave is currently unaffected (although I should
>> look in the ioctl path to see if we have already added such a check, lest you be
>> able to deadlock your system as previously indicated using that tool).
>
> Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use ioctl (ifenslave binary)
> anymore, but only sysfs.
>
> Documentation/bonding.txt should be updated to reflect this change.
> pr_warning should be changed to pr_ err.
> Bonding version should be bumped.
>
> Anyway, I will fix this package, but I suspect there exist many user scripts that don't ensure bond
> is up before enslaving.

Well, still thinking about it...

On Ubuntu, due to the usage of upstart, the slaves are ifup before the master. On Debian, this is 
also true for hotplug slaves. I wonder whether this patch may cause troubles, because the master 
will have to be up before calling ifup for it.

	Nicolas.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 20:51       ` Nicolas de Pesloüan
  2011-05-24 21:02         ` Nicolas de Pesloüan
@ 2011-05-24 21:12         ` Jay Vosburgh
  2011-05-24 21:14           ` David Miller
                             ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Jay Vosburgh @ 2011-05-24 21:12 UTC (permalink / raw)
  To: =?ISO-8859-1?Q?Nicolas_de_Peslo=FCan?=
  Cc: Neil Horman, Andy Gospodarek, netdev, David S. Miller

Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

>Le 24/05/2011 22:37, Neil Horman a écrit :
>
>>>>> +		return -EINVAL;
>>>
>>> This will turn a warning into an error.
>>>
>> Yes, because it should have been an error all along.
>>
>>> This warning existed for long, but never caused the bonding setup to
>>> fail. This patch cause some regression for user space. For example,
>>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>>> before enslaving, because this was never required.
>>>
>> Thats not a regression, thats the kernel returning an error where it should have
>> done so all along.  Just because a utility got away with it for awhile and it
>> didn't always cause a lockup, doesn't grandfather that application in to a
>> situation where the kernel has to support its broken behavior in perpituity.
>>
>> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>> patch doesn't touch, so ifenslave is currently unaffected (although I should
>> look in the ioctl path to see if we have already added such a check, lest you be
>> able to deadlock your system as previously indicated using that tool).
>
>Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
>ioctl (ifenslave binary) anymore, but only sysfs.
>
>Documentation/bonding.txt should be updated to reflect this change.
>pr_warning should be changed to pr_ err.
>Bonding version should be bumped.
>
>Anyway, I will fix this package, but I suspect there exist many user
>scripts that don't ensure bond is up before enslaving.

	I looked at sysconfig (as supplied with opensuse) and it uses
sysfs, and does set the master device up first.  The other potential
user that comes to mind is that OFED at one point had a script to set up
bonding for Infiniband devices.  I don't know if this is still the case,
nor do I know if it set the bond device up before enslaving.

	Generally speaking, though, in the long run I think it should be
permissible to change any bonding option when the bond is down (even to
values that make no sense in context, e.g., setting the primary to a
device not currently enslaved).  My rationale here is that some options
are very difficult to modify when the bond is up (e.g., changing the
mode), and now some other set is precluded when the bond is down.  The
init scripts already have repeat logic in them; this just makes things
more complicated.

	There should be a state wherein any option can be changed (well,
maybe not max_bonds), and that should be the down state.  A subset can
also be changed while up.  I'd be happy to be able to change all options
while the bond is up, too, but that seems pretty hard to do.

	How much harder is it to fix the locking and permit the action
in question here?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 21:12         ` Jay Vosburgh
@ 2011-05-24 21:14           ` David Miller
  2011-05-24 21:32           ` Nicolas de Pesloüan
  2011-05-24 23:18           ` Neil Horman
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2011-05-24 21:14 UTC (permalink / raw)
  To: fubar; +Cc: nicolas.2p.debian, nhorman, andy, netdev

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 24 May 2011 14:12:54 -0700

> 	How much harder is it to fix the locking and permit the action
> in question here?

I've reverted and am holding off on this patch until this discussion
is resolved.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 21:12         ` Jay Vosburgh
  2011-05-24 21:14           ` David Miller
@ 2011-05-24 21:32           ` Nicolas de Pesloüan
  2011-05-24 23:18           ` Neil Horman
  2 siblings, 0 replies; 18+ messages in thread
From: Nicolas de Pesloüan @ 2011-05-24 21:32 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Neil Horman, Andy Gospodarek, netdev, David S. Miller

Le 24/05/2011 23:12, Jay Vosburgh a écrit :
> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:
>
>> Le 24/05/2011 22:37, Neil Horman a écrit :
>>
>>>>>> +		return -EINVAL;
>>>>
>>>> This will turn a warning into an error.
>>>>
>>> Yes, because it should have been an error all along.
>>>
>>>> This warning existed for long, but never caused the bonding setup to
>>>> fail. This patch cause some regression for user space. For example,
>>>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>>>> before enslaving, because this was never required.
>>>>
>>> Thats not a regression, thats the kernel returning an error where it should have
>>> done so all along.  Just because a utility got away with it for awhile and it
>>> didn't always cause a lockup, doesn't grandfather that application in to a
>>> situation where the kernel has to support its broken behavior in perpituity.
>>>
>>> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>>> patch doesn't touch, so ifenslave is currently unaffected (although I should
>>> look in the ioctl path to see if we have already added such a check, lest you be
>>> able to deadlock your system as previously indicated using that tool).
>>
>> Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
>> ioctl (ifenslave binary) anymore, but only sysfs.
>>
>> Documentation/bonding.txt should be updated to reflect this change.
>> pr_warning should be changed to pr_ err.
>> Bonding version should be bumped.
>>
>> Anyway, I will fix this package, but I suspect there exist many user
>> scripts that don't ensure bond is up before enslaving.
>
> 	I looked at sysconfig (as supplied with opensuse) and it uses
> sysfs, and does set the master device up first.  The other potential
> user that comes to mind is that OFED at one point had a script to set up
> bonding for Infiniband devices.  I don't know if this is still the case,
> nor do I know if it set the bond device up before enslaving.
>
> 	Generally speaking, though, in the long run I think it should be
> permissible to change any bonding option when the bond is down (even to
> values that make no sense in context, e.g., setting the primary to a
> device not currently enslaved).  My rationale here is that some options
> are very difficult to modify when the bond is up (e.g., changing the
> mode), and now some other set is precluded when the bond is down.  The
> init scripts already have repeat logic in them; this just makes things
> more complicated.
>
> 	There should be a state wherein any option can be changed (well,
> maybe not max_bonds), and that should be the down state.  A subset can
> also be changed while up.  I'd be happy to be able to change all options
> while the bond is up, too, but that seems pretty hard to do.
>
> 	How much harder is it to fix the locking and permit the action
> in question here?

To add to the comment by Jay, here is an extract from the more up to date version of the script that 
setup bonding on Debian (/etc/network/if-pre-up.d/ifenslave, from ifenslave-2.6 package):

# early_setup_master is the place where we do master setup that need to be done before enslavement.
early_setup_master()
{
         # Warning: the order in wich we write into the sysfs files is important.
         # Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
         # before changing anything here.

         # fail_over_mac must be set before enslavement of any slaves.
         sysfs fail_over_mac "$IF_BOND_FAIL_OVER_MAC"
}

setup_master()
{
         # Warning: the order in wich we write into the sysfs files is important.
         # Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
         # before changing anything here.

         # use_carrier can be set anytime.
         sysfs use_carrier "$IF_BOND_USE_CARRIER"
         # num_grat_arp can be set anytime.
         sysfs num_grat_arp "$IF_BOND_NUM_GRAT_ARP"
         # num_unsol_na can be set anytime.
         sysfs num_unsol_na "$IF_BOND_NUM_UNSOL_NA"

         # xmit_hash_policy can be set anytime.
         # Changing xmit_hash_policy requires $BOND_MASTER to be down.
         sysfs_change_down xmit_hash_policy "$IF_BOND_XMIT_HASH_POLICY"

         # arp_ip_target must be set before arp_interval.
         sysfs_add arp_ip_target "$IF_BOND_ARP_IP_TARGET"
         sysfs arp_interval "$IF_BOND_ARP_INTERVAL"

         # miimon must be set before updelay and downdelay.
         sysfs miimon "$IF_BOND_MIIMON"
         sysfs downdelay "$IF_BOND_DOWNDELAY"
         sysfs updelay "$IF_BOND_UPDELAY"

         # Changing ad_select requires $BOND_MASTER to be down.
         sysfs_change_down ad_select "$IF_BOND_AD_SELECT"

         # Changing mode requires $BOND_MASTER to be down.
         # Mode should be set after miimon or arp_interval, to avoid a warning in syslog.
         sysfs_change_down mode "$IF_BOND_MODE"

         # arp_validate must be after mode (because mode must be active-backup).
         sysfs arp_validate "$IF_BOND_ARP_VALIDATE"

         # lacp_rate must be set after mode (because mode must be 802.3ad).
         # Changing lacp_rate requires $BOND_MASTER to be down.
         sysfs_change_down lacp_rate "$IF_BOND_LACP_RATE"

         # primary must be set after mode (because only supported in some modes) and after enslavement.
         # The first slave in bond-primary found in current slaves becomes the primary.
         # If no slave in bond-primary is found, then primary does not change.
         for slave in $IF_BOND_PRIMARY ; do
                 if grep -sq "\\<$slave\\>" "/sys/class/net/$BOND_MASTER/bonding/slaves" ; then
                         sysfs primary "$slave"
                         break
                 fi
         done

         # primary_reselect should be set after mode (because only supported in some modes), after 
enslavement
         # and after primary. This is currently (2.6.35-rc1) not enforced by the bonding driver, but 
it is
         # probably safer to do it in that order.
         sysfs primary_reselect "$IF_BOND_PRIMARY_RESELECT"

         # queue_id must be set after enslavement.
         for iface_queue_id in $IF_BOND_QUEUE_ID
         do
                 sysfs iface_queue_id $iface_queue_id
         done

         # active_slave must be set after mode and after enslavement.
         # The slave must be up and the underlying link must be up too.
         # FIXME: We should have a way to write an empty string to active_slave, to set the 
active_slave to none.
         if [ "$IF_BOND_ACTIVE_SLAVE" ] ; then
                 # Need to force interface up before. Bonding will refuse to activate a down interface.
                 ip link set "$IF_BOND_ACTIVE_SLAVE" up
                 sysfs active_slave "$IF_BOND_ACTIVE_SLAVE"
         fi

         # Force $BOND_MASTER to be up, if we are called from a slave stanza.
         [ "$IFACE" != "$BOND_MASTER" ] && ip link set dev "$BOND_MASTER" up
}

In order to fix some of the reported problems with some previous versions of this script, I had to 
conduct a detailed code review of bond_sysfs.c and eventually end up with this current version, 
where I documented all the constraints I discovered.

bonding sysfs currently enforce many constraints on the exact order of the setup. As noted by Jay, 
some setup need to be done while the master if down, some while the master is up, some need to be 
done before the first enslavement, some must be done after the referenced slave is enslaved, some 
are only allowed if mode is properly set before, and so on...

I though for long about adding this into Documentation/networking/bonding.txt, but fail to find the 
time to do so.

Arguably, it would be better to spend time to remove those constraints than to document them.

	Nicolas.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 21:12         ` Jay Vosburgh
  2011-05-24 21:14           ` David Miller
  2011-05-24 21:32           ` Nicolas de Pesloüan
@ 2011-05-24 23:18           ` Neil Horman
  2011-05-25  0:34             ` Jay Vosburgh
  2 siblings, 1 reply; 18+ messages in thread
From: Neil Horman @ 2011-05-24 23:18 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: =?ISO-8859-1?Q?Nicolas_de_Peslo=FCan?=,
	Andy Gospodarek, netdev, David S. Miller

On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
> 
> >Le 24/05/2011 22:37, Neil Horman a écrit :
> >
> >>>>> +		return -EINVAL;
> >>>
> >>> This will turn a warning into an error.
> >>>
> >> Yes, because it should have been an error all along.
> >>
> >>> This warning existed for long, but never caused the bonding setup to
> >>> fail. This patch cause some regression for user space. For example,
> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
> >>> before enslaving, because this was never required.
> >>>
> >> Thats not a regression, thats the kernel returning an error where it should have
> >> done so all along.  Just because a utility got away with it for awhile and it
> >> didn't always cause a lockup, doesn't grandfather that application in to a
> >> situation where the kernel has to support its broken behavior in perpituity.
> >>
> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
> >> look in the ioctl path to see if we have already added such a check, lest you be
> >> able to deadlock your system as previously indicated using that tool).
> >
> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
> >ioctl (ifenslave binary) anymore, but only sysfs.
> >
> >Documentation/bonding.txt should be updated to reflect this change.
> >pr_warning should be changed to pr_ err.
> >Bonding version should be bumped.
> >
> >Anyway, I will fix this package, but I suspect there exist many user
> >scripts that don't ensure bond is up before enslaving.
> 
> 	I looked at sysconfig (as supplied with opensuse) and it uses
> sysfs, and does set the master device up first.  The other potential
> user that comes to mind is that OFED at one point had a script to set up
> bonding for Infiniband devices.  I don't know if this is still the case,
> nor do I know if it set the bond device up before enslaving.
> 
> 	Generally speaking, though, in the long run I think it should be
> permissible to change any bonding option when the bond is down (even to
> values that make no sense in context, e.g., setting the primary to a
> device not currently enslaved).  My rationale here is that some options
> are very difficult to modify when the bond is up (e.g., changing the
> mode), and now some other set is precluded when the bond is down.  The
> init scripts already have repeat logic in them; this just makes things
> more complicated.
> 
> 	There should be a state wherein any option can be changed (well,
> maybe not max_bonds), and that should be the down state.  A subset can
> also be changed while up.  I'd be happy to be able to change all options
> while the bond is up, too, but that seems pretty hard to do.
> 
> 	How much harder is it to fix the locking and permit the action
> in question here?
> 
In this case, to just hack something in place is pretty easy, I can just
initalize the spinlocks for all cases in the bond_create path.  But to do in any
sort of sensical way is much harder, since the code is written such that you
initialize various relevant data structures based on the mode of the bond,
which, as you indicated above, you want the right to change up until the point
where you ifup the bonded interface. 

The whole thing is predicated on the notion that
transitioning from the down to up state is the gating factor to initializing the
current configuration.  What might work is an in-between state in which you commit
and initialize a bond based on the current configuation.  Doing so would allow
you to (re-)initialize a bond configuration in a safe state.  Only after
commiting a configuration could you enslave devices or ifup the bond.  Once up,
further commits would be non-permissable until the bond was brought down again.
Of course, this would also require changing the semantics of the user space
tools.

This also begs the question, is it or is it not safe to enslave devices while
the bond is down?  Clearly from the bug report its unsafe, and I don't know what
other (if any) conditions exist that cause problems when doing this (be that a
deadlock, panic or simply undefined or unexpected behavior).  If its really
unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
cause things like this, and as such, we should return an error.  If it is safe
(generally) and this is an isolated bug, then we should probably remove the
warning.  But to just issue a vague 'This might do bad things' warning seems
wrong in either case.

I'll respin the patch to just initialize the spinlocks in the morning, if thats
what will fix the deadlock, but it really seems like the wrong way to go to me.
If enslaving devices to a bond while its down has been known to cause problems,
then we shouldn't allow it and we should update the user space tools to
understand and handle that.
Neil

> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-24 23:18           ` Neil Horman
@ 2011-05-25  0:34             ` Jay Vosburgh
  2011-05-25  2:20               ` Neil Horman
  2011-05-25  7:33               ` Nicolas de Pesloüan
  0 siblings, 2 replies; 18+ messages in thread
From: Jay Vosburgh @ 2011-05-25  0:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: =?us-ascii?B?PT9JU08tODg1OS0xP1E/Tmljb2xhc19kZV9QZXNsbz1GQ2FuPz0=?=,
	Andy Gospodarek, netdev, David S. Miller

Neil Horman <nhorman@tuxdriver.com> wrote:

>On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
>> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
>> 
>> >Le 24/05/2011 22:37, Neil Horman a écrit :
>> >
>> >>>>> +		return -EINVAL;
>> >>>
>> >>> This will turn a warning into an error.
>> >>>
>> >> Yes, because it should have been an error all along.
>> >>
>> >>> This warning existed for long, but never caused the bonding setup to
>> >>> fail. This patch cause some regression for user space. For example,
>> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>> >>> before enslaving, because this was never required.
>> >>>
>> >> Thats not a regression, thats the kernel returning an error where it should have
>> >> done so all along.  Just because a utility got away with it for awhile and it
>> >> didn't always cause a lockup, doesn't grandfather that application in to a
>> >> situation where the kernel has to support its broken behavior in perpituity.
>> >>
>> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
>> >> look in the ioctl path to see if we have already added such a check, lest you be
>> >> able to deadlock your system as previously indicated using that tool).
>> >
>> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
>> >ioctl (ifenslave binary) anymore, but only sysfs.
>> >
>> >Documentation/bonding.txt should be updated to reflect this change.
>> >pr_warning should be changed to pr_ err.
>> >Bonding version should be bumped.
>> >
>> >Anyway, I will fix this package, but I suspect there exist many user
>> >scripts that don't ensure bond is up before enslaving.
>> 
>> 	I looked at sysconfig (as supplied with opensuse) and it uses
>> sysfs, and does set the master device up first.  The other potential
>> user that comes to mind is that OFED at one point had a script to set up
>> bonding for Infiniband devices.  I don't know if this is still the case,
>> nor do I know if it set the bond device up before enslaving.
>> 
>> 	Generally speaking, though, in the long run I think it should be
>> permissible to change any bonding option when the bond is down (even to
>> values that make no sense in context, e.g., setting the primary to a
>> device not currently enslaved).  My rationale here is that some options
>> are very difficult to modify when the bond is up (e.g., changing the
>> mode), and now some other set is precluded when the bond is down.  The
>> init scripts already have repeat logic in them; this just makes things
>> more complicated.
>> 
>> 	There should be a state wherein any option can be changed (well,
>> maybe not max_bonds), and that should be the down state.  A subset can
>> also be changed while up.  I'd be happy to be able to change all options
>> while the bond is up, too, but that seems pretty hard to do.
>> 
>> 	How much harder is it to fix the locking and permit the action
>> in question here?
>> 
>In this case, to just hack something in place is pretty easy, I can just
>initalize the spinlocks for all cases in the bond_create path.  But to do in any
>sort of sensical way is much harder, since the code is written such that you
>initialize various relevant data structures based on the mode of the bond,
>which, as you indicated above, you want the right to change up until the point
>where you ifup the bonded interface. 
>
>The whole thing is predicated on the notion that
>transitioning from the down to up state is the gating factor to initializing the
>current configuration.  What might work is an in-between state in which you commit
>and initialize a bond based on the current configuation.  Doing so would allow
>you to (re-)initialize a bond configuration in a safe state.  Only after
>commiting a configuration could you enslave devices or ifup the bond.  Once up,
>further commits would be non-permissable until the bond was brought down again.
>Of course, this would also require changing the semantics of the user space
>tools.

	One alternative is to simply permit all option changes while the
bond is down, then commit then as a set when the bond is brought up (or
fail the open or adjust options if the options are invalid).  More or
less what you suggest, except that the "commit" is the ifup.

	Even that is a substantial rework of how things are done now,
since as you point out, options today take effect more or less in real
time.

>This also begs the question, is it or is it not safe to enslave devices while
>the bond is down?  Clearly from the bug report its unsafe, and I don't know what
>other (if any) conditions exist that cause problems when doing this (be that a
>deadlock, panic or simply undefined or unexpected behavior).  If its really
>unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
>cause things like this, and as such, we should return an error.  If it is safe
>(generally) and this is an isolated bug, then we should probably remove the
>warning.  But to just issue a vague 'This might do bad things' warning seems
>wrong in either case.

	Agreed.  I think it (conceptually) should be safe to add or
remove slaves when the bond is down, and bonding shouldn't complain
about it.

	Ok, I looked through the log, and originally enslaving while
down via sysfs was disallowed when that code was added.

	Later, enslaving while down was enabled in a patch for
Infiniband.  Reading the changelog explains why; for IB, enslaving while
up messed up the multicast group addresses:

commit 6b1bf096508c870889c2be63c7757a04d72116fe
Author: Moni Shoua <monis@voltaire.com>
Date:   Tue Oct 9 19:43:40 2007 -0700

    net/bonding: Enable IP multicast for bonding IPoIB devices
    
    Allow to enslave devices when the bonding device is not up. Over the discussion
    held at the previous post this seemed to be the most clean way to go, where it
    is not expected to cause instabilities.
    
    Normally, the bonding driver is UP before any enslavement takes place.
    Once a netdevice is UP, the network stack acts to have it join some multicast groups
    (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
    type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
    computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
    where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
    is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)
    
    Signed-off-by: Moni Shoua <monis at voltaire.com>
    Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
    Acked-by: Jay Vosburgh <fubar@us.ibm.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

	Assuming this situation hasn't changed (and I'm not sure how it
could, because the first IB slave causes a type change), I don't believe
we can disallow enslaving while the bond is down because IB depends on
it.

>I'll respin the patch to just initialize the spinlocks in the morning, if thats
>what will fix the deadlock, but it really seems like the wrong way to go to me.
>If enslaving devices to a bond while its down has been known to cause problems,
>then we shouldn't allow it and we should update the user space tools to
>understand and handle that.

	This is the first I've heard of it causing a panic; looking
back, I can also see some issues with the warning itself (because it
happens prior to the rtnl_trylock, the message may repeat a lot if rtnl
is contended), so I'm happy to see the warning go away either way.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-25  0:34             ` Jay Vosburgh
@ 2011-05-25  2:20               ` Neil Horman
  2011-05-25  7:33               ` Nicolas de Pesloüan
  1 sibling, 0 replies; 18+ messages in thread
From: Neil Horman @ 2011-05-25  2:20 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: =?us-ascii?B?PT9JU08tODg1OS0xP1E/Tmljb2xhc19kZV9QZXNsbz1GQ2FuPz0=?=,
	Andy Gospodarek, netdev, David S. Miller

On Tue, May 24, 2011 at 05:34:28PM -0700, Jay Vosburgh wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
> >> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
> >> 
> >> >Le 24/05/2011 22:37, Neil Horman a écrit :
> >> >
> >> >>>>> +		return -EINVAL;
> >> >>>
> >> >>> This will turn a warning into an error.
> >> >>>
> >> >> Yes, because it should have been an error all along.
> >> >>
> >> >>> This warning existed for long, but never caused the bonding setup to
> >> >>> fail. This patch cause some regression for user space. For example,
> >> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
> >> >>> before enslaving, because this was never required.
> >> >>>
> >> >> Thats not a regression, thats the kernel returning an error where it should have
> >> >> done so all along.  Just because a utility got away with it for awhile and it
> >> >> didn't always cause a lockup, doesn't grandfather that application in to a
> >> >> situation where the kernel has to support its broken behavior in perpituity.
> >> >>
> >> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
> >> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
> >> >> look in the ioctl path to see if we have already added such a check, lest you be
> >> >> able to deadlock your system as previously indicated using that tool).
> >> >
> >> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
> >> >ioctl (ifenslave binary) anymore, but only sysfs.
> >> >
> >> >Documentation/bonding.txt should be updated to reflect this change.
> >> >pr_warning should be changed to pr_ err.
> >> >Bonding version should be bumped.
> >> >
> >> >Anyway, I will fix this package, but I suspect there exist many user
> >> >scripts that don't ensure bond is up before enslaving.
> >> 
> >> 	I looked at sysconfig (as supplied with opensuse) and it uses
> >> sysfs, and does set the master device up first.  The other potential
> >> user that comes to mind is that OFED at one point had a script to set up
> >> bonding for Infiniband devices.  I don't know if this is still the case,
> >> nor do I know if it set the bond device up before enslaving.
> >> 
> >> 	Generally speaking, though, in the long run I think it should be
> >> permissible to change any bonding option when the bond is down (even to
> >> values that make no sense in context, e.g., setting the primary to a
> >> device not currently enslaved).  My rationale here is that some options
> >> are very difficult to modify when the bond is up (e.g., changing the
> >> mode), and now some other set is precluded when the bond is down.  The
> >> init scripts already have repeat logic in them; this just makes things
> >> more complicated.
> >> 
> >> 	There should be a state wherein any option can be changed (well,
> >> maybe not max_bonds), and that should be the down state.  A subset can
> >> also be changed while up.  I'd be happy to be able to change all options
> >> while the bond is up, too, but that seems pretty hard to do.
> >> 
> >> 	How much harder is it to fix the locking and permit the action
> >> in question here?
> >> 
> >In this case, to just hack something in place is pretty easy, I can just
> >initalize the spinlocks for all cases in the bond_create path.  But to do in any
> >sort of sensical way is much harder, since the code is written such that you
> >initialize various relevant data structures based on the mode of the bond,
> >which, as you indicated above, you want the right to change up until the point
> >where you ifup the bonded interface. 
> >
> >The whole thing is predicated on the notion that
> >transitioning from the down to up state is the gating factor to initializing the
> >current configuration.  What might work is an in-between state in which you commit
> >and initialize a bond based on the current configuation.  Doing so would allow
> >you to (re-)initialize a bond configuration in a safe state.  Only after
> >commiting a configuration could you enslave devices or ifup the bond.  Once up,
> >further commits would be non-permissable until the bond was brought down again.
> >Of course, this would also require changing the semantics of the user space
> >tools.
> 
> 	One alternative is to simply permit all option changes while the
> bond is down, then commit then as a set when the bond is brought up (or
> fail the open or adjust options if the options are invalid).  More or
> less what you suggest, except that the "commit" is the ifup.
> 
Well, thats rather the problem, by making the ifup be the commit step, you deny
the 'in between phase' where the configuration is applied, but prior to the
interface being up, so its safe to enslave devices.

> 	Even that is a substantial rework of how things are done now,
> since as you point out, options today take effect more or less in real
> time.
> 
Agreed its both a large undertaking, and it still requires user space
co-ordination.

> >This also begs the question, is it or is it not safe to enslave devices while
> >the bond is down?  Clearly from the bug report its unsafe, and I don't know what
> >other (if any) conditions exist that cause problems when doing this (be that a
> >deadlock, panic or simply undefined or unexpected behavior).  If its really
> >unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
> >cause things like this, and as such, we should return an error.  If it is safe
> >(generally) and this is an isolated bug, then we should probably remove the
> >warning.  But to just issue a vague 'This might do bad things' warning seems
> >wrong in either case.
> 
> 	Agreed.  I think it (conceptually) should be safe to add or
> remove slaves when the bond is down, and bonding shouldn't complain
> about it.
> 
Ok, thats good to know then.

> 	Ok, I looked through the log, and originally enslaving while
> down via sysfs was disallowed when that code was added.
> 
> 	Later, enslaving while down was enabled in a patch for
> Infiniband.  Reading the changelog explains why; for IB, enslaving while
> up messed up the multicast group addresses:
> 
> commit 6b1bf096508c870889c2be63c7757a04d72116fe
> Author: Moni Shoua <monis@voltaire.com>
<snip commit log>
> 
> 	Assuming this situation hasn't changed (and I'm not sure how it
> could, because the first IB slave causes a type change), I don't believe
> we can disallow enslaving while the bond is down because IB depends on
> it.
> 
Ok, thats also good to know.  Thank you for doing the research on that.

> >I'll respin the patch to just initialize the spinlocks in the morning, if thats
> >what will fix the deadlock, but it really seems like the wrong way to go to me.
> >If enslaving devices to a bond while its down has been known to cause problems,
> >then we shouldn't allow it and we should update the user space tools to
> >understand and handle that.
> 
> 	This is the first I've heard of it causing a panic; looking
> back, I can also see some issues with the warning itself (because it
> happens prior to the rtnl_trylock, the message may repeat a lot if rtnl
> is contended), so I'm happy to see the warning go away either way.
> 
Copy that, it sounds like the warning isn't actually needed (i.e. enslaving with
the bond down should be safe, and this panic is an outlier condition).  I'll
respin the patch with the spinlock initialized in the create path and remove the
warning removed in the AM

Thanks!
Neil

> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
  2011-05-25  0:34             ` Jay Vosburgh
  2011-05-25  2:20               ` Neil Horman
@ 2011-05-25  7:33               ` Nicolas de Pesloüan
  1 sibling, 0 replies; 18+ messages in thread
From: Nicolas de Pesloüan @ 2011-05-25  7:33 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Neil Horman, Andy Gospodarek, netdev, David S. Miller

Le 25/05/2011 02:34, Jay Vosburgh a écrit :

> 	One alternative is to simply permit all option changes while the
> bond is down, then commit then as a set when the bond is brought up (or
> fail the open or adjust options if the options are invalid).  More or
> less what you suggest, except that the "commit" is the ifup.
>
> 	Even that is a substantial rework of how things are done now,
> since as you point out, options today take effect more or less in real
> time.

Agreed.

Setting an option while the master is down should only syntax-check the option value and store the 
value for later use (at the time we bring the bond up). Setting an option while the master is up 
should take effect in realtime if possible/appropriate or store it and issue a warning saying one 
need to down+up the bond to commit the change.

>> This also begs the question, is it or is it not safe to enslave devices while
>> the bond is down?  Clearly from the bug report its unsafe, and I don't know what
>> other (if any) conditions exist that cause problems when doing this (be that a
>> deadlock, panic or simply undefined or unexpected behavior).  If its really
>> unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
>> cause things like this, and as such, we should return an error.  If it is safe
>> (generally) and this is an isolated bug, then we should probably remove the
>> warning.  But to just issue a vague 'This might do bad things' warning seems
>> wrong in either case.
>
> 	Agreed.  I think it (conceptually) should be safe to add or
> remove slaves when the bond is down, and bonding shouldn't complain
> about it.

Are there any reasons not to consider enslavement as a normal option with the same processing as 
other options (only do real enslavement at the time we bring the master up)?

	Nicolas.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
  2011-05-24 19:36 [PATCH] bonding: prevent deadlock on slave store with alb mode Neil Horman
  2011-05-24 20:00 ` Andy Gospodarek
@ 2011-05-25 17:16 ` Neil Horman
  2011-05-25 17:32   ` Jay Vosburgh
  2011-05-25 21:58   ` David Miller
  2011-05-25 18:13 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v3) Neil Horman
  2 siblings, 2 replies; 18+ messages in thread
From: Neil Horman @ 2011-05-25 17:16 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, nicolas.2p.debian,
	David S. Miller

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/bonding/bond_alb.c   |    4 ----
 drivers/net/bonding/bond_main.c  |   16 ++++++++++------
 drivers/net/bonding/bond_sysfs.c |    6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8f2d2e7..2df9276 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
 	struct tlb_client_info *new_hashtbl;
 	int i;
 
-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
-
 	new_hashtbl = kzalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
 	int i;
 
-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
-
 	new_hashtbl = kmalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..59bf5c5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			   bond_dev->name, slave_dev->name);
 	}
 
-	/* bond must be initialized by bond_open() before enslaving */
-	if (!(bond_dev->flags & IFF_UP)) {
-		pr_warning("%s: master_dev is not up in bond_enslave\n",
-			   bond_dev->name);
-	}
-
 	/* already enslaved */
 	if (slave_dev->flags & IFF_SLAVE) {
 		pr_debug("Error, Device was already enslaved\n");
@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
 
+	/*
+	 * Initialize locks that may be requiered during
+	 * en/deslave operations.  All of the bond_open work
+	 * (of which this is part) should really be moved to
+	 * a phase prior to dev_open
+	 */
+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
+
 	bond->wq = create_singlethread_workqueue(bond_dev->name);
 	if (!bond->wq)
 		return -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..bb1319f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	struct net_device *dev;
 	struct bonding *bond = to_bond(d);
 
-	/* Quick sanity check -- is the bond interface up? */
-	if (!(bond->dev->flags & IFF_UP)) {
-		pr_warning("%s: doing slave updates when interface is down.\n",
-			   bond->dev->name);
-	}
-
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-- 
1.7.5.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
  2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
@ 2011-05-25 17:32   ` Jay Vosburgh
  2011-05-25 21:58   ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2011-05-25 17:32 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, nicolas.2p.debian, David S. Miller

Neil Horman <nhorman@tuxdriver.com> wrote:

>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down.  The bonding driver initializes some data structures
>only after its ndo_open routine is called.  Among them is the initalization of
>the alb tx and rx hash locks.  So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do.  As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem.  We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe

	One spelling nit, below; fix that and I'm good with this.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J

>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c   |    4 ----
> drivers/net/bonding/bond_main.c  |   16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c |    6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> 	struct tlb_client_info *new_hashtbl;
> 	int i;
>
>-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> 	new_hashtbl = kzalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> 	int i;
>
>-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> 	new_hashtbl = kmalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..59bf5c5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 			   bond_dev->name, slave_dev->name);
> 	}
>
>-	/* bond must be initialized by bond_open() before enslaving */
>-	if (!(bond_dev->flags & IFF_UP)) {
>-		pr_warning("%s: master_dev is not up in bond_enslave\n",
>-			   bond_dev->name);
>-	}
>-
> 	/* already enslaved */
> 	if (slave_dev->flags & IFF_SLAVE) {
> 		pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+	/*
>+	 * Initialize locks that may be requiered during

	"Required."

>+	 * en/deslave operations.  All of the bond_open work
>+	 * (of which this is part) should really be moved to
>+	 * a phase prior to dev_open
>+	 */
>+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> 	bond->wq = create_singlethread_workqueue(bond_dev->name);
> 	if (!bond->wq)
> 		return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> 	struct net_device *dev;
> 	struct bonding *bond = to_bond(d);
>
>-	/* Quick sanity check -- is the bond interface up? */
>-	if (!(bond->dev->flags & IFF_UP)) {
>-		pr_warning("%s: doing slave updates when interface is down.\n",
>-			   bond->dev->name);
>-	}
>-
> 	if (!rtnl_trylock())
> 		return restart_syscall();
>
>-- 
>1.7.5.2
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
  2011-05-24 19:36 [PATCH] bonding: prevent deadlock on slave store with alb mode Neil Horman
  2011-05-24 20:00 ` Andy Gospodarek
  2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
@ 2011-05-25 18:13 ` Neil Horman
  2011-05-25 18:59   ` Jay Vosburgh
  2 siblings, 1 reply; 18+ messages in thread
From: Neil Horman @ 2011-05-25 18:13 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, nicolas.2p.debian,
	David S. Miller

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

v3) Fix spelling error

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/bonding/bond_alb.c   |    4 ----
 drivers/net/bonding/bond_main.c  |   16 ++++++++++------
 drivers/net/bonding/bond_sysfs.c |    6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8f2d2e7..2df9276 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
 	struct tlb_client_info *new_hashtbl;
 	int i;
 
-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
-
 	new_hashtbl = kzalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
 	int i;
 
-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
-
 	new_hashtbl = kmalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..c0045d7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			   bond_dev->name, slave_dev->name);
 	}
 
-	/* bond must be initialized by bond_open() before enslaving */
-	if (!(bond_dev->flags & IFF_UP)) {
-		pr_warning("%s: master_dev is not up in bond_enslave\n",
-			   bond_dev->name);
-	}
-
 	/* already enslaved */
 	if (slave_dev->flags & IFF_SLAVE) {
 		pr_debug("Error, Device was already enslaved\n");
@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
 
+	/*
+	 * Initialize locks that may be required during
+	 * en/deslave operations.  All of the bond_open work
+	 * (of which this is part) should really be moved to
+	 * a phase prior to dev_open
+	 */
+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
+
 	bond->wq = create_singlethread_workqueue(bond_dev->name);
 	if (!bond->wq)
 		return -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..bb1319f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	struct net_device *dev;
 	struct bonding *bond = to_bond(d);
 
-	/* Quick sanity check -- is the bond interface up? */
-	if (!(bond->dev->flags & IFF_UP)) {
-		pr_warning("%s: doing slave updates when interface is down.\n",
-			   bond->dev->name);
-	}
-
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-- 
1.7.5.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
  2011-05-25 18:13 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v3) Neil Horman
@ 2011-05-25 18:59   ` Jay Vosburgh
  0 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2011-05-25 18:59 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, nicolas.2p.debian, David S. Miller

Neil Horman <nhorman@tuxdriver.com> wrote:

>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down.  The bonding driver initializes some data structures
>only after its ndo_open routine is called.  Among them is the initalization of
>the alb tx and rx hash locks.  So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do.  As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem.  We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe
>
>v3) Fix spelling error
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c   |    4 ----
> drivers/net/bonding/bond_main.c  |   16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c |    6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> 	struct tlb_client_info *new_hashtbl;
> 	int i;
>
>-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> 	new_hashtbl = kzalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> 	int i;
>
>-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> 	new_hashtbl = kmalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..c0045d7 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 			   bond_dev->name, slave_dev->name);
> 	}
>
>-	/* bond must be initialized by bond_open() before enslaving */
>-	if (!(bond_dev->flags & IFF_UP)) {
>-		pr_warning("%s: master_dev is not up in bond_enslave\n",
>-			   bond_dev->name);
>-	}
>-
> 	/* already enslaved */
> 	if (slave_dev->flags & IFF_SLAVE) {
> 		pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+	/*
>+	 * Initialize locks that may be required during
>+	 * en/deslave operations.  All of the bond_open work
>+	 * (of which this is part) should really be moved to
>+	 * a phase prior to dev_open
>+	 */
>+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> 	bond->wq = create_singlethread_workqueue(bond_dev->name);
> 	if (!bond->wq)
> 		return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> 	struct net_device *dev;
> 	struct bonding *bond = to_bond(d);
>
>-	/* Quick sanity check -- is the bond interface up? */
>-	if (!(bond->dev->flags & IFF_UP)) {
>-		pr_warning("%s: doing slave updates when interface is down.\n",
>-			   bond->dev->name);
>-	}
>-
> 	if (!rtnl_trylock())
> 		return restart_syscall();
>
>-- 
>1.7.5.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
  2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
  2011-05-25 17:32   ` Jay Vosburgh
@ 2011-05-25 21:58   ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, fubar, andy, nicolas.2p.debian

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 25 May 2011 13:16:45 -0400

> This soft lockup was recently reported:
 ...
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: jtluka@redhat.com

Applied.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-05-25 21:59 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-24 19:36 [PATCH] bonding: prevent deadlock on slave store with alb mode Neil Horman
2011-05-24 20:00 ` Andy Gospodarek
2011-05-24 20:13   ` Nicolas de Pesloüan
2011-05-24 20:37     ` Neil Horman
2011-05-24 20:51       ` Nicolas de Pesloüan
2011-05-24 21:02         ` Nicolas de Pesloüan
2011-05-24 21:12         ` Jay Vosburgh
2011-05-24 21:14           ` David Miller
2011-05-24 21:32           ` Nicolas de Pesloüan
2011-05-24 23:18           ` Neil Horman
2011-05-25  0:34             ` Jay Vosburgh
2011-05-25  2:20               ` Neil Horman
2011-05-25  7:33               ` Nicolas de Pesloüan
2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
2011-05-25 17:32   ` Jay Vosburgh
2011-05-25 21:58   ` David Miller
2011-05-25 18:13 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v3) Neil Horman
2011-05-25 18:59   ` Jay Vosburgh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.