All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bonding on bond
@ 2011-01-19 20:33 Nicolas de Pesloüan
  2011-01-20 15:31 ` Jiri Bohac
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas de Pesloüan @ 2011-01-19 20:33 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: Jay Vosburgh, bonding-devel, netdev

Le 19/01/2011 16:49, Jiri Bohac a écrit :
 > On Tue, Jan 18, 2011 at 09:07:20AM +0100, Nicolas de Pesloüan wrote:
 >> Staking bond is not supported. Currently, no setup is know to
 >> require stacking bond.

 > I agree. This question and weird bugreports from people trying
 > this come up over and over. How about this patch?

Why not. Adding this to the documentation should also help.

 > bonding: prohibit enslaving of bonding masters
 >
 > Nested bonding is not supported and will result in strange problems, e.g.:
 > - netif_receive_skb() will not properly change skb->dev to point to the
 >   uppoer-most bonding master
 > - arp monitor will not work (dev->last_rx is only updated by hardware drivers)
 > - accidentally enslaving a bonding master to itself will cause an infinite
 >   recursion in the TX path
 >
 > This patch prevents this by prohibiting a bonding master from being further enslaved.
 >
 > Signed-off-by: Jiri Bohac <jbohac@suse.cz>
 >
 > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
 > index b1025b8..d4d5f42 100644
 > --- a/drivers/net/bonding/bond_main.c
 > +++ b/drivers/net/bonding/bond_main.c
 > @@ -1448,8 +1448,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 >  	}
 >
 >  	/* already enslaved */
 > -	if (slave_dev->flags & IFF_SLAVE) {
 > -		pr_debug("Error, Device was already enslaved\n");
 > +	if (slave_dev->priv_flags & IFF_BONDING) {
 > +		pr_debug("Error, Device already enslaved or a bonding master\n");

Even if it is possible to test for slave and for master with a single condition (IFF_BONDING), I 
suggest to split the tests and the error messages, to give end user the best possible diagnostic.

If the device is already a master, let's say it.
If the device is already enslaved, let's continue to say it. It might even be better to give the 
name of the other master that already own this slave.

 >  		return -EBUSY;
 >  	}
 >
 >
 >
 > --
 > Jiri Bohac <jbohac@suse.cz>
 > SUSE Labs, SUSE CZ

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-19 20:33 Bonding on bond Nicolas de Pesloüan
@ 2011-01-20 15:31 ` Jiri Bohac
  2011-01-20 16:12   ` Nicolas de Pesloüan
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Bohac @ 2011-01-20 15:31 UTC (permalink / raw)
  To: Nicolas de Pesloüan; +Cc: Jiri Bohac, Jay Vosburgh, bonding-devel, netdev

On Wed, Jan 19, 2011 at 09:33:19PM +0100, Nicolas de Pesloüan wrote:
> Even if it is possible to test for slave and for master with a
> single condition (IFF_BONDING), I suggest to split the tests and the
> error messages, to give end user the best possible diagnostic.

OK, why not. The below patch still uses IFF_BONDING to detect a
master is being enslaved, because IFF_MASTER is also used by the
eql driver. No idea if it works / someone ever uses it with
bonding, but it might collide.

bonding: prohibit enslaving of bonding masters

Nested bonding is not supported and will result in strange problems, e.g.:
- netif_receive_skb() will not properly change skb->dev to point to the
  uppoer-most bonding master
- arp monitor will not work (dev->last_rx is only updated by hardware drivers)
- accidentally enslaving a bonding master to itself will cause an infinite
  recursion in the TX path

This patch prevents this by prohibiting a bonding master from being further enslaved.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b1025b8..b117dd8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1453,6 +1453,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		return -EBUSY;
 	}
 
+	/* cannot enslave a master */
+	if (slave_dev->priv_flags & IFF_BONDING) {
+		pr_debug("Error, cannot enslave a bonding master\n");
+		return -EBUSY;
+	}
+
 	/* vlan challenged mutual exclusion */
 	/* no need to lock since we're protected by rtnl_lock */
 	if (slave_dev->features & NETIF_F_VLAN_CHALLENGED) {

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-20 15:31 ` Jiri Bohac
@ 2011-01-20 16:12   ` Nicolas de Pesloüan
  2011-01-20 19:53     ` Jay Vosburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas de Pesloüan @ 2011-01-20 16:12 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: Jay Vosburgh, bonding-devel, netdev

Le 20/01/2011 16:31, Jiri Bohac a écrit :
> On Wed, Jan 19, 2011 at 09:33:19PM +0100, Nicolas de Pesloüan wrote:
>> Even if it is possible to test for slave and for master with a
>> single condition (IFF_BONDING), I suggest to split the tests and the
>> error messages, to give end user the best possible diagnostic.
>
> OK, why not. The below patch still uses IFF_BONDING to detect a
> master is being enslaved, because IFF_MASTER is also used by the
> eql driver. No idea if it works / someone ever uses it with
> bonding, but it might collide.

Thanks Jiri.

> bonding: prohibit enslaving of bonding masters
>
> Nested bonding is not supported and will result in strange problems, e.g.:
> - netif_receive_skb() will not properly change skb->dev to point to the
>    uppoer-most bonding master
> - arp monitor will not work (dev->last_rx is only updated by hardware drivers)
> - accidentally enslaving a bonding master to itself will cause an infinite
>    recursion in the TX path
>
> This patch prevents this by prohibiting a bonding master from being further enslaved.
>
> Signed-off-by: Jiri Bohac<jbohac@suse.cz>

Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index b1025b8..b117dd8 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1453,6 +1453,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>   		return -EBUSY;
>   	}
>
> +	/* cannot enslave a master */
> +	if (slave_dev->priv_flags&  IFF_BONDING) {
> +		pr_debug("Error, cannot enslave a bonding master\n");
> +		return -EBUSY;
> +	}
> +
>   	/* vlan challenged mutual exclusion */
>   	/* no need to lock since we're protected by rtnl_lock */
>   	if (slave_dev->features&  NETIF_F_VLAN_CHALLENGED) {
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-20 16:12   ` Nicolas de Pesloüan
@ 2011-01-20 19:53     ` Jay Vosburgh
  2011-01-22 22:57       ` Nicolas de Pesloüan
  0 siblings, 1 reply; 7+ messages in thread
From: Jay Vosburgh @ 2011-01-20 19:53 UTC (permalink / raw)
  To: =?ISO-8859-1?Q?Nicolas_de_Peslo=FCan?=; +Cc: Jiri Bohac, bonding-devel, netdev

Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>Le 20/01/2011 16:31, Jiri Bohac a écrit :
>> On Wed, Jan 19, 2011 at 09:33:19PM +0100, Nicolas de Pesloüan wrote:
>>> Even if it is possible to test for slave and for master with a
>>> single condition (IFF_BONDING), I suggest to split the tests and the
>>> error messages, to give end user the best possible diagnostic.
>>
>> OK, why not. The below patch still uses IFF_BONDING to detect a
>> master is being enslaved, because IFF_MASTER is also used by the
>> eql driver. No idea if it works / someone ever uses it with
>> bonding, but it might collide.
>
>Thanks Jiri.
>
>> bonding: prohibit enslaving of bonding masters
>>
>> Nested bonding is not supported and will result in strange problems, e.g.:
>> - netif_receive_skb() will not properly change skb->dev to point to the
>>    uppoer-most bonding master
>> - arp monitor will not work (dev->last_rx is only updated by hardware drivers)
>> - accidentally enslaving a bonding master to itself will cause an infinite
>>    recursion in the TX path

	Did you test these?  I'm curious about the ARP monitor
assertion, because last_rx is updated by bonding itself now (in
skb_bond_should_drop), not in the device drivers.

	I'm in agreement that, by and large, nesting of bonds is
pointless.  However, I suspect that there are users out in the world who
are happily doing so, and this patch may shut them down.

	I've not tested with nesting in a while; I know it used to work
(at least for limited cases, typically an active-backup bond with a pair
of balance-xor or balance-rr or sometimes 802.3ad enslaved to it), but
has never really been a deliberate feature.  Is nesting now utterly
broken, as suggested by the list of problems above?

>> This patch prevents this by prohibiting a bonding master from being further enslaved.
>>
>> Signed-off-by: Jiri Bohac<jbohac@suse.cz>
>
>Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

	If nesting really doesn't work and is going to be disabled, then
at a minimum it should also have an update to the documentation
explaining this.

	-J

>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index b1025b8..b117dd8 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1453,6 +1453,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>   		return -EBUSY;
>>   	}
>>
>> +	/* cannot enslave a master */
>> +	if (slave_dev->priv_flags&  IFF_BONDING) {
>> +		pr_debug("Error, cannot enslave a bonding master\n");
>> +		return -EBUSY;
>> +	}
>> +
>>   	/* vlan challenged mutual exclusion */
>>   	/* no need to lock since we're protected by rtnl_lock */
>>   	if (slave_dev->features&  NETIF_F_VLAN_CHALLENGED) {
>>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-20 19:53     ` Jay Vosburgh
@ 2011-01-22 22:57       ` Nicolas de Pesloüan
  2011-01-29  0:38         ` Jay Vosburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas de Pesloüan @ 2011-01-22 22:57 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Jiri Bohac, bonding-devel, netdev

Le 20/01/2011 20:53, Jay Vosburgh a écrit :
> 	I'm in agreement that, by and large, nesting of bonds is
> pointless.  However, I suspect that there are users out in the world who
> are happily doing so, and this patch may shut them down.

Hi Jay,

I tested the following nested bonding configuration:

bond1 : eth1 + eth3, in balance-rr mode.
bond2 : eth0 + eth2, in balance-rr mode.
bond0 : bond1 + bond2, in active-backup mode.

The egress path apparently works not so bad, even if I didn't take time yet to check proper load 
balancing nor fail over.

However, the ingress path doesn't work at all. bond0 is unable to receive any packets (ARP or IP).

It doesn't sound surprising to me, having a look at the current code in __netif_receive_skb() :

>         /*
>          * bonding note: skbs received on inactive slaves should only
>          * be delivered to pkt handlers that are exact matches.  Also
>          * the deliver_no_wcard flag will be set.  If packet handlers
>          * are sensitive to duplicate packets these skbs will need to
>          * be dropped at the handler.
>          */
>         null_or_orig = NULL;
>         orig_dev = skb->dev;
>         master = ACCESS_ONCE(orig_dev->master);
>         if (skb->deliver_no_wcard)
>                 null_or_orig = orig_dev;
>         else if (master) {
>                 if (skb_bond_should_drop(skb, master)) {
>                         skb->deliver_no_wcard = 1;
>                         null_or_orig = orig_dev; /* deliver only exact match */
>                 } else
>                         skb->dev = master;
>         }

The skb_bond_should_drop() and skb->dev = master logic is only applied at a single level.

After this code, skb->dev is the master dev of the receiving dev, but skb->dev->master can be != 
NULL, if another level of bonding exists. Nothing obvious would cause the packet to be delivered to 
this possible higher level bonding interface (skb->dev->master).

Is something else expected to call __netif_receive_skb() again, with the current skb, to cause 
another level of bonding to be reachable? For as far as I understand, nothing will, but I might have 
missed something.

> 	I've not tested with nesting in a while; I know it used to work
> (at least for limited cases, typically an active-backup bond with a pair
> of balance-xor or balance-rr or sometimes 802.3ad enslaved to it), but
> has never really been a deliberate feature.  Is nesting now utterly
> broken, as suggested by the list of problems above?

I don't know whether someone really use nested bonding, but I can hardly imagine how one can have it 
works with current kernel, except for a pure egress application, without any feedback from the 
network. And such very specific application wouldn't even be able to receive an ARP reply...

> 	If nesting really doesn't work and is going to be disabled, then
> at a minimum it should also have an update to the documentation
> explaining this.

At least, we should explain that nesting bonding interfaces is known to be mostly broken and 
unsupported.

That being said, we still miss a way to achieve a simple configuration with several links doing load 
balancing to a switch and one or several links doing fail over to another switch, both switches 
*not* being 802.3ad capable.

Should we arrange for bonding to be allowed to nest, for this purpose, or should we find a way to 
setup this configuration with a single level of bonding ? I would prefer the second, but...

	Nicolas.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-22 22:57       ` Nicolas de Pesloüan
@ 2011-01-29  0:38         ` Jay Vosburgh
  2011-02-02 10:19           ` Nicolas de Pesloüan
  0 siblings, 1 reply; 7+ messages in thread
From: Jay Vosburgh @ 2011-01-29  0:38 UTC (permalink / raw)
  To: =?UTF-8?B?Tmljb2xhcyBkZSBQZXNsb8O8YW4=?=
  Cc: Jiri Bohac, bonding-devel, netdev

Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

>Le 20/01/2011 20:53, Jay Vosburgh a écrit :
>> 	I'm in agreement that, by and large, nesting of bonds is
>> pointless.  However, I suspect that there are users out in the world who
>> are happily doing so, and this patch may shut them down.
>
>Hi Jay,
>
>I tested the following nested bonding configuration:
>
>bond1 : eth1 + eth3, in balance-rr mode.
>bond2 : eth0 + eth2, in balance-rr mode.
>bond0 : bond1 + bond2, in active-backup mode.
>
>The egress path apparently works not so bad, even if I didn't take time
>yet to check proper load balancing nor fail over.
>
>However, the ingress path doesn't work at all. bond0 is unable to receive any packets (ARP or IP).

	In light of this, I don't see a problem with disallowing nesting
of bonds.  It should be documented in bonding.txt.

>It doesn't sound surprising to me, having a look at the current code in __netif_receive_skb() :
>
>>         /*
>>          * bonding note: skbs received on inactive slaves should only
>>          * be delivered to pkt handlers that are exact matches.  Also
>>          * the deliver_no_wcard flag will be set.  If packet handlers
>>          * are sensitive to duplicate packets these skbs will need to
>>          * be dropped at the handler.
>>          */
>>         null_or_orig = NULL;
>>         orig_dev = skb->dev;
>>         master = ACCESS_ONCE(orig_dev->master);
>>         if (skb->deliver_no_wcard)
>>                 null_or_orig = orig_dev;
>>         else if (master) {
>>                 if (skb_bond_should_drop(skb, master)) {
>>                         skb->deliver_no_wcard = 1;
>>                         null_or_orig = orig_dev; /* deliver only exact match */
>>                 } else
>>                         skb->dev = master;
>>         }
>
>The skb_bond_should_drop() and skb->dev = master logic is only applied at a single level.
>
>After this code, skb->dev is the master dev of the receiving dev, but
>skb->dev->master can be != NULL, if another level of bonding
>exists. Nothing obvious would cause the packet to be delivered to this
>possible higher level bonding interface (skb->dev->master).
>
>Is something else expected to call __netif_receive_skb() again, with the
>current skb, to cause another level of bonding to be reachable? For as far
>as I understand, nothing will, but I might have missed something.
>
>> 	I've not tested with nesting in a while; I know it used to work
>> (at least for limited cases, typically an active-backup bond with a pair
>> of balance-xor or balance-rr or sometimes 802.3ad enslaved to it), but
>> has never really been a deliberate feature.  Is nesting now utterly
>> broken, as suggested by the list of problems above?
>
>I don't know whether someone really use nested bonding, but I can hardly
>imagine how one can have it works with current kernel, except for a pure
>egress application, without any feedback from the network. And such very
>specific application wouldn't even be able to receive an ARP reply...
>
>> 	If nesting really doesn't work and is going to be disabled, then
>> at a minimum it should also have an update to the documentation
>> explaining this.
>
>At least, we should explain that nesting bonding interfaces is known to be
>mostly broken and unsupported.
>
>That being said, we still miss a way to achieve a simple configuration
>with several links doing load balancing to a switch and one or several
>links doing fail over to another switch, both switches *not* being 802.3ad
>capable.

	This is a harder problem, but it's something that doesn't work
today (and I suspect hasn't for a long time, so if somebody was using
this, I think there would have been some discussion).

>Should we arrange for bonding to be allowed to nest, for this purpose, or
>should we find a way to setup this configuration with a single level of
>bonding ? I would prefer the second, but...

	I'm not sure that either is necessary; 802.3ad will do this
today, and few current production switches lack 802.3ad support.

	Adding support for etherchannel (i.e., not 802.3ad) gang
failover is nontrivial, because the multiple etherchannel port groups
will have to be managed separately, and most likely assigned manually.
Sure, it'd be nice to have, but I'm not sure if it's a benefit worth the
effort.

	Either way, for now, since I recall you mentioned in another
email that you'd crashed the system from nesting bonds, I don't see a
problem with disallowing nesting and updating the documentation with a
bit of this discussion (e.g., "nesting doesn't work, you're probably
trying to do gang failover, which 802.3ad already does for you").

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding on bond
  2011-01-29  0:38         ` Jay Vosburgh
@ 2011-02-02 10:19           ` Nicolas de Pesloüan
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas de Pesloüan @ 2011-02-02 10:19 UTC (permalink / raw)
  To: Jay Vosburgh, Jiri Bohac; +Cc: bonding-devel, netdev

Le 29/01/2011 01:38, Jay Vosburgh a écrit :
> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:

[snip]

>> However, the ingress path doesn't work at all. bond0 is unable to receive any packets (ARP or IP).
>
> 	In light of this, I don't see a problem with disallowing nesting
> of bonds.  It should be documented in bonding.txt.

Ok, I will do that.

Jiri, any trouble with me stealing your patch (code) and adding the documentation update part? Or do 
you prefer to do it yourself?

[snip]

>> That being said, we still miss a way to achieve a simple configuration
>> with several links doing load balancing to a switch and one or several
>> links doing fail over to another switch, both switches *not* being 802.3ad
>> capable.
>
> 	This is a harder problem, but it's something that doesn't work
> today (and I suspect hasn't for a long time, so if somebody was using
> this, I think there would have been some discussion).

In the mean time, I will state in the documentation that:

- nesting is not allowed.
- only the above particular setup would possibly require nesting.
- this can be achieve using 802.3ad mode, connected to 802.3ad capable switches.

>> Should we arrange for bonding to be allowed to nest, for this purpose, or
>> should we find a way to setup this configuration with a single level of
>> bonding ? I would prefer the second, but...
>
> 	I'm not sure that either is necessary; 802.3ad will do this
> today, and few current production switches lack 802.3ad support.
>
> 	Adding support for etherchannel (i.e., not 802.3ad) gang
> failover is nontrivial, because the multiple etherchannel port groups
> will have to be managed separately, and most likely assigned manually.
> Sure, it'd be nice to have, but I'm not sure if it's a benefit worth the
> effort.

I'm far from a 802.3ad (802.1AX) specialist, but... wouldn't it be possible to force the aggregator 
by hand, for every slaves, to achieve the same effect as receiving LACPDU, when connected to non 
802.3ad capable switches?

echo 802.3ad > /sys/class/net/bond0/bonding/mode
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves
echo +eth2 > /sys/class/net/bond0/bonding/slaves
echo 1 > /sys/class/net/bond0/bonding/ad_aggregator_eth0 # those sysfs entries to be created...
echo 1 > /sys/class/net/bond0/bonding/ad_aggregator_eth1
echo 2 > /sys/class/net/bond0/bonding/ad_aggregator_eth2

> 	Either way, for now, since I recall you mentioned in another
> email that you'd crashed the system from nesting bonds, I don't see a
> problem with disallowing nesting and updating the documentation with a
> bit of this discussion (e.g., "nesting doesn't work, you're probably
> trying to do gang failover, which 802.3ad already does for you").

Thanks.

	Nicolas.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-02-02 10:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-19 20:33 Bonding on bond Nicolas de Pesloüan
2011-01-20 15:31 ` Jiri Bohac
2011-01-20 16:12   ` Nicolas de Pesloüan
2011-01-20 19:53     ` Jay Vosburgh
2011-01-22 22:57       ` Nicolas de Pesloüan
2011-01-29  0:38         ` Jay Vosburgh
2011-02-02 10:19           ` Nicolas de Pesloüan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.