netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bonding problem
@ 2011-08-15  9:44 Eduard Sinelnikov
  2011-08-15 10:22 ` WeipingPan
  2011-08-15 10:25 ` [PATCH net-2.6] bonding:restore backup and inactive flag of slave Weiping Pan
  0 siblings, 2 replies; 7+ messages in thread
From: Eduard Sinelnikov @ 2011-08-15  9:44 UTC (permalink / raw)
  To: netdev, Andy Gospodarek, Jay Vosburgh

[-- Attachment #1: Type: text/plain, Size: 3853 bytes --]

Hi all,

Following the thread:
http://marc.info/?l=linux-netdev&m=131282467512508&w=2

I have created the this patch for kernel version:3.0.1, which may fix
the bonding problem

Patch explanation:
The patch seting all slaves active prior to switching to round robin mode.
This is done to ensure that every posibly active slave will be used in
communication.

Also, I noticed that just changing the bond_xmit_round_robin will only
partially fix the problem.
Since slaves with inactive bit will not CATCH any trafic.

I wonder if I should remove the check "bond_is_active_slave(slave))"
in bond_xmit_round_robin

Please advice.
            Eduard


On Mon, Aug 08, 2011 at 10:06:05AM -0700, Jay Vosburgh wrote:
>
> Andy Gospodarek <andy@greyhouse.net> wrote:
>
> >On Sun, Aug 07, 2011 at 03:00:30PM +0300, Eduard Sinelnikov wrote:
> >> Hi,
> >>
> >> In the kernel 2.6.39.3 ( /drivers/net/bond/bond_main.c).
> >> In the function  ‘bond_xmit_roundrobin’
> >> The code check if the bond is active via
> >> ‘bond_is_active_slave(slave)’ Function call.
> >> Which actually checks if the slave is backup or active
> >> What is the meaning of slave being  backup in round robin mode?
> >> Correct me if I wrong but in round robin every slave should send a
> >> packet, regardless of being active or backup.
> >>
> >> Thank you,
> >> Â  Â  Â  Â  Â  Â Eduard
> >
> >There probably is not a compelling reason to continue to have it.  There
> >may be a reason historically, but I'm not aware what that might be at
> >this point.  For modes other than active-backup, the value of
> >slave->link and slave->backup should always contain a value that
> >indicates the slave is up and available for transmit.
>
> If you read Eduard's other posts regarding this, the actual
> issue is that when changing from another mode into round-robin,
> occasionally slaves will still be marked as "backup" and won't be used:
>

I did notice that one after I sent this first response.

> >Date: Mon, 8 Aug 2011 11:16:39 +0300
> >Subject: On line Bonding configuration change fails
> >From: Eduard Sinelnikov <eduard.sinelnikov@gmail.com>
> >To: netdev@vger.kernel.org
> >Sender: netdev-owner@vger.kernel.org
> >
> >Hi,
> >
> >My configuration is a follows:
> >
> >Â  Â  Â  Â  Â  Â  Â | eth0 -------------->
> >Ububntu | eth1 --------------> Â  Â Swith ------------> Other computer
> >
> >Scenario:
> >• change the bond mode to active/backup
> >• unplug some of the cable
> >• plug-in the unplugged cable
> >• change bond mode to round robin
> >
> >I can see that only one eth1 is sending data. When I unplug it the ping stops.
> >
> >Is it a bug or some mis-configuration?
> >
> >In the kernel ( /drivers/net/bond/bond_main.c).
> >In the function  ‘bond_xmit_roundrobin
> >’
> >The code check if the bond is active via
> >‘bond_is_active_slave(slave)’ Function call.
> >Which actually checks if the slave is backup or active
> >What is the meaning of backup in round robin?
> >Correct me if I wrong but in round robin every slave should send a
> >packet, regardless of being active or backup.
>
> So from looking at the code, it seems that the actual problem is
> that when transitioning to round-robin mode, one or more slaves can
> remain marked as "backup," and in round-robin mode, that won't ever
> change.  We could probably work around that by removing the "is_active"
> test (essentially declaring that "is_active" is only valid in
> active-backup mode).  That might produce a few odd messages here and
> there (when removing a slave or during a link failure, for example).
>
> From inspection, the bond_xmit_xor function likely has this same
> problem.
>

Agreed.

> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

[-- Attachment #2: bond_patch.patch --]
[-- Type: application/octet-stream, Size: 1471 bytes --]

diff -uprN linux-3.0.1/drivers/net/bonding/bond_sysfs.c linux-3.0.1.bond/drivers/net/bonding/bond_sysfs.c
--- linux-3.0.1/drivers/net/bonding/bond_sysfs.c	2011-08-05 07:59:21.000000000 +0300
+++ linux-3.0.1.bond/drivers/net/bonding/bond_sysfs.c	2011-08-15 11:59:13.346377263 +0300
@@ -290,6 +290,37 @@ static ssize_t bonding_show_mode(struct
 			bond->params.mode);
 }
 
+
+
+// activate all interfaces.
+static void inline bonding_activate_interfaces(struct bonding * bond )
+{
+	struct slave *slave ;
+	int i ;
+
+	
+
+	read_lock(&bond->lock);	
+	
+	bond_for_each_slave(bond, slave, i) {
+	
+		read_lock(&bond->curr_slave_lock);
+	
+		// change the backup to active since there is no meaninng of backup in round robin.
+		// Also, change the device state so it can catch traffic.
+		if ((  bond_slave_state(slave) ) || slave->inactive ) {
+			if ((slave->link == BOND_LINK_UP) && IS_UP(slave->dev)) {
+				bond_set_slave_active_flags(slave);
+			}
+		}
+		
+		read_unlock(&bond->curr_slave_lock);
+	}
+	
+	read_unlock(&bond->lock);
+
+} 
+
 static ssize_t bonding_store_mode(struct device *d,
 				  struct device_attribute *attr,
 				  const char *buf, size_t count)
@@ -320,6 +351,10 @@ static ssize_t bonding_store_mode(struct
 		goto out;
 	}
 
+	if (bond->params.mode == BOND_MODE_ROUNDROBIN) {
+		bonding_activate_interfaces(bond) ;
+	}
+
 	bond->params.mode = new_value;
 	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bonding problem
  2011-08-15  9:44 Bonding problem Eduard Sinelnikov
@ 2011-08-15 10:22 ` WeipingPan
  2011-08-15 10:25 ` [PATCH net-2.6] bonding:restore backup and inactive flag of slave Weiping Pan
  1 sibling, 0 replies; 7+ messages in thread
From: WeipingPan @ 2011-08-15 10:22 UTC (permalink / raw)
  To: Eduard Sinelnikov; +Cc: netdev, Andy Gospodarek, Jay Vosburgh

On 08/15/2011 05:44 PM, Eduard Sinelnikov wrote:
> Hi all,
>
> Following the thread:
> http://marc.info/?l=linux-netdev&m=131282467512508&w=2
>
> I have created the this patch for kernel version:3.0.1, which may fix
> the bonding problem
>
> Patch explanation:
> The patch seting all slaves active prior to switching to round robin mode.
> This is done to ensure that every posibly active slave will be used in
> communication.
>
> Also, I noticed that just changing the bond_xmit_round_robin will only
> partially fix the problem.
> Since slaves with inactive bit will not CATCH any trafic.
>
> I wonder if I should remove the check "bond_is_active_slave(slave))"
> in bond_xmit_round_robin
>
> Please advice.
>              Eduard
>
>
My patch is to restore the backup and inactive flag of slave, too,
and I think it is more generic. :-)

Will send it soon.

thanks
Weiping Pan

> On Mon, Aug 08, 2011 at 10:06:05AM -0700, Jay Vosburgh wrote:
>> Andy Gospodarek<andy@greyhouse.net>  wrote:
>>
>>> On Sun, Aug 07, 2011 at 03:00:30PM +0300, Eduard Sinelnikov wrote:
>>>> Hi,
>>>>
>>>> In the kernel 2.6.39.3 ( /drivers/net/bond/bond_main.c).
>>>> In the function  ‘bond_xmit_roundrobin’
>>>> The code check if the bond is active via
>>>> ‘bond_is_active_slave(slave)’ Function call.
>>>> Which actually checks if the slave is backup or active
>>>> What is the meaning of slave being  backup in round robin mode?
>>>> Correct me if I wrong but in round robin every slave should send a
>>>> packet, regardless of being active or backup.
>>>>
>>>> Thank you,
>>>> Â  Â  Â  Â  Â  Â Eduard
>>> There probably is not a compelling reason to continue to have it.  There
>>> may be a reason historically, but I'm not aware what that might be at
>>> this point.  For modes other than active-backup, the value of
>>> slave->link and slave->backup should always contain a value that
>>> indicates the slave is up and available for transmit.
>> If you read Eduard's other posts regarding this, the actual
>> issue is that when changing from another mode into round-robin,
>> occasionally slaves will still be marked as "backup" and won't be used:
>>
> I did notice that one after I sent this first response.
>
>>> Date: Mon, 8 Aug 2011 11:16:39 +0300
>>> Subject: On line Bonding configuration change fails
>>> From: Eduard Sinelnikov<eduard.sinelnikov@gmail.com>
>>> To: netdev@vger.kernel.org
>>> Sender: netdev-owner@vger.kernel.org
>>>
>>> Hi,
>>>
>>> My configuration is a follows:
>>>
>>> Â  Â  Â  Â  Â  Â  Â | eth0 -------------->
>>> Ububntu | eth1 -------------->  Â  Â Swith ------------>  Other computer
>>>
>>> Scenario:
>>> • change the bond mode to active/backup
>>> • unplug some of the cable
>>> • plug-in the unplugged cable
>>> • change bond mode to round robin
>>>
>>> I can see that only one eth1 is sending data. When I unplug it the ping stops.
>>>
>>> Is it a bug or some mis-configuration?
>>>
>>> In the kernel ( /drivers/net/bond/bond_main.c).
>>> In the function  ‘bond_xmit_roundrobin
>>> ’
>>> The code check if the bond is active via
>>> ‘bond_is_active_slave(slave)’ Function call.
>>> Which actually checks if the slave is backup or active
>>> What is the meaning of backup in round robin?
>>> Correct me if I wrong but in round robin every slave should send a
>>> packet, regardless of being active or backup.
>> So from looking at the code, it seems that the actual problem is
>> that when transitioning to round-robin mode, one or more slaves can
>> remain marked as "backup," and in round-robin mode, that won't ever
>> change.  We could probably work around that by removing the "is_active"
>> test (essentially declaring that "is_active" is only valid in
>> active-backup mode).  That might produce a few odd messages here and
>> there (when removing a slave or during a link failure, for example).
>>
>>  From inspection, the bond_xmit_xor function likely has this same
>> problem.
>>
> Agreed.
>
>> -J
>>
>> ---
>> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-2.6] bonding:restore backup and inactive flag of slave
  2011-08-15  9:44 Bonding problem Eduard Sinelnikov
  2011-08-15 10:22 ` WeipingPan
@ 2011-08-15 10:25 ` Weiping Pan
  2011-08-15 16:18   ` WANG Cong
  1 sibling, 1 reply; 7+ messages in thread
From: Weiping Pan @ 2011-08-15 10:25 UTC (permalink / raw)
  To: netdev; +Cc: Weiping Pan

Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
bonding mode from active backup to round robin, some slaves are still keeping
"backup", and won't transmit packets.

As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful for active backup mode.

But if we just simply ignore the bond_is_active_slave() check,
the transmission will work fine, but we can't maintain the correct value of
"backup" flag for each slaves, though it is meaningless for other mode than
active backup.

I'd like to restore "backup" and "inactive" flag in bond_open,
thus we can keep the correct value of them.

As for bond_is_active_slave(), I'd like to prepare another patch to handle it.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_main.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38a83ac..3ed9827 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3419,9 +3419,27 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
 static int bond_open(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+	int i;
 
 	bond->kill_timers = 0;
 
+	// restore slave->backup and slave->inactive
+	read_lock(&bond->lock);
+	read_lock(&bond->curr_slave_lock);
+	if (bond->slave_cnt > 0) {
+		bond_for_each_slave(bond, slave, i) {
+			if ((bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+				&& (slave != bond->curr_active_slave)) {
+				bond_set_slave_inactive_flags(slave);
+			} else {
+				bond_set_slave_active_flags(slave);
+			}
+		}
+	}
+	read_unlock(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
+
 	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
 
 	if (bond_is_lb(bond)) {
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-2.6] bonding:restore backup and inactive flag of slave
  2011-08-15 10:25 ` [PATCH net-2.6] bonding:restore backup and inactive flag of slave Weiping Pan
@ 2011-08-15 16:18   ` WANG Cong
  2011-08-16  1:57     ` [PATCH net-2.6 V2] bonding:reset " Weiping Pan
  0 siblings, 1 reply; 7+ messages in thread
From: WANG Cong @ 2011-08-15 16:18 UTC (permalink / raw)
  To: netdev

On Mon, 15 Aug 2011 18:25:16 +0800, Weiping Pan wrote:

> 
> diff --git a/drivers/net/bonding/bond_main.c
> b/drivers/net/bonding/bond_main.c index 38a83ac..3ed9827 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3419,9 +3419,27 @@ static int bond_xmit_hash_policy_l2(struct 
sk_buff *skb, int count)
>  static int bond_open(struct net_device *bond_dev) {
>  	struct bonding *bond = netdev_priv(bond_dev);
> +	struct slave *slave;
> +	int i;
>  
>  	bond->kill_timers = 0;
>  
> +	// restore slave->backup and slave->inactive


Please use C-style comment.

> +	read_lock(&bond->lock);
> +	read_lock(&bond->curr_slave_lock);
> +	if (bond->slave_cnt > 0) {


You can move the second read_lock() under this if().



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-2.6 V2] bonding:reset backup and inactive flag of slave
  2011-08-15 16:18   ` WANG Cong
@ 2011-08-16  1:57     ` Weiping Pan
  2011-08-16 12:30       ` WANG Cong
  0 siblings, 1 reply; 7+ messages in thread
From: Weiping Pan @ 2011-08-16  1:57 UTC (permalink / raw)
  To: netdev; +Cc: Weiping Pan

Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
bonding mode from active backup to round robin, some slaves are still keeping
"backup", and won't transmit packets.

As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful for active backup mode.

But if we just simply ignore the bond_is_active_slave() check,
the transmission will work fine, but we can't maintain the correct value of
"backup" flag for each slaves, though it is meaningless for other mode than
active backup.

I'd like to reset "backup" and "inactive" flag in bond_open,
thus we can keep the correct value of them.

As for bond_is_active_slave(), I'd like to prepare another patch to handle it.

V2:
Use C style comment.
Move read_lock(&bond->curr_slave_lock).
Replace restore with reset, for active backup mode, it means "restore",
but for other modes, it means "reset".

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_main.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38a83ac..43f2ea5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3419,9 +3419,27 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
 static int bond_open(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+	int i;
 
 	bond->kill_timers = 0;
 
+	/* reset slave->backup and slave->inactive */
+	read_lock(&bond->lock);
+	if (bond->slave_cnt > 0) {
+		read_lock(&bond->curr_slave_lock);
+		bond_for_each_slave(bond, slave, i) {
+			if ((bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+				&& (slave != bond->curr_active_slave)) {
+				bond_set_slave_inactive_flags(slave);
+			} else {
+				bond_set_slave_active_flags(slave);
+			}
+		}
+		read_unlock(&bond->curr_slave_lock);
+	}
+	read_unlock(&bond->lock);
+
 	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
 
 	if (bond_is_lb(bond)) {
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-2.6 V2] bonding:reset backup and inactive flag of slave
  2011-08-16  1:57     ` [PATCH net-2.6 V2] bonding:reset " Weiping Pan
@ 2011-08-16 12:30       ` WANG Cong
  2011-08-18  3:12         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: WANG Cong @ 2011-08-16 12:30 UTC (permalink / raw)
  To: netdev

On Tue, 16 Aug 2011 09:57:35 +0800, Weiping Pan wrote:

> Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
> bonding mode from active backup to round robin, some slaves are still
> keeping "backup", and won't transmit packets.
> 
> As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around
> that by removing the bond_is_active_slave() check, because the "backup"
> flag is only meaningful for active backup mode.
> 
> But if we just simply ignore the bond_is_active_slave() check, the
> transmission will work fine, but we can't maintain the correct value of
> "backup" flag for each slaves, though it is meaningless for other mode
> than active backup.
> 
> I'd like to reset "backup" and "inactive" flag in bond_open, thus we can
> keep the correct value of them.
> 
> As for bond_is_active_slave(), I'd like to prepare another patch to
> handle it.
> 
> V2:
> Use C style comment.
> Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for
> active backup mode, it means "restore", but for other modes, it means
> "reset".
> 
> Signed-off-by: Weiping Pan <panweiping3@gmail.com>

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Thanks.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-2.6 V2] bonding:reset backup and inactive flag of slave
  2011-08-16 12:30       ` WANG Cong
@ 2011-08-18  3:12         ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2011-08-18  3:12 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev

From: WANG Cong <xiyou.wangcong@gmail.com>
Date: Tue, 16 Aug 2011 12:30:39 +0000 (UTC)

> On Tue, 16 Aug 2011 09:57:35 +0800, Weiping Pan wrote:
> 
>> Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
>> bonding mode from active backup to round robin, some slaves are still
>> keeping "backup", and won't transmit packets.
>> 
>> As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around
>> that by removing the bond_is_active_slave() check, because the "backup"
>> flag is only meaningful for active backup mode.
>> 
>> But if we just simply ignore the bond_is_active_slave() check, the
>> transmission will work fine, but we can't maintain the correct value of
>> "backup" flag for each slaves, though it is meaningless for other mode
>> than active backup.
>> 
>> I'd like to reset "backup" and "inactive" flag in bond_open, thus we can
>> keep the correct value of them.
>> 
>> As for bond_is_active_slave(), I'd like to prepare another patch to
>> handle it.
>> 
>> V2:
>> Use C style comment.
>> Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for
>> active backup mode, it means "restore", but for other modes, it means
>> "reset".
>> 
>> Signed-off-by: Weiping Pan <panweiping3@gmail.com>
> 
> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-08-18  3:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-15  9:44 Bonding problem Eduard Sinelnikov
2011-08-15 10:22 ` WeipingPan
2011-08-15 10:25 ` [PATCH net-2.6] bonding:restore backup and inactive flag of slave Weiping Pan
2011-08-15 16:18   ` WANG Cong
2011-08-16  1:57     ` [PATCH net-2.6 V2] bonding:reset " Weiping Pan
2011-08-16 12:30       ` WANG Cong
2011-08-18  3:12         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).