* [PATCH net] bonding: fix 802.3ad aggregator reselection
@ 2016-06-23 21:20 Jay Vosburgh
  2016-06-28  8:20 ` David Miller
  2016-06-28 11:57 ` Veli-Matti Lintu
  0 siblings, 2 replies; 14+ messages in thread
From: Jay Vosburgh @ 2016-06-23 21:20 UTC (permalink / raw)
  To: netdev
  Cc: Veli-Matti Lintu, Veaceslav Falico, Andy Gospodarek, zhuyj,
	David S. Miller


	Since commit 7bb11dc9f59d ("bonding: unify all places where
actor-oper key needs to be updated."), the logic in bonding to handle
selection between multiple aggregators has not functioned.

	This affects only configurations wherein the bonding slaves
connect to two discrete aggregators (e.g., two independent switches, each
with LACP enabled), thus creating two separate aggregation groups within a
single bond.

	The cause is a change in 7bb11dc9f59d to no longer set
AD_PORT_BEGIN on a port after a link state change, which would cause the
port to be reselected for attachment to an aggregator as if it were newly
added to the bond.  We cannot restore the prior behavior, as it
contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
now remains selected, the aggregator selection logic is not invoked.

	A side effect of this change is that aggregators in bonding will
now contain ports that are link down.  The aggregator selection logic
does not currently handle this situation correctly, causing incorrect
aggregator selection.

	This patch makes two changes to repair the aggregator selection
logic in bonding to function as documented and within the confines of the
standard:

	First, the aggregator selection and related logic now utilizes the
number of active ports per aggregator, not the number of selected ports
(as some selected ports may be down).  The ad_select "bandwidth" and
"count" options only consider ports that are link up.

	Second, on any carrier state change of any slave, the aggregator
selection logic is explicitly called to ensure the correct aggregator is
active.

Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>
Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

---
 drivers/net/bonding/bond_3ad.c | 64 +++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b9304a295f86..ca81f46ea1aa 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -657,6 +657,20 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 	}
 }
 
+static int __agg_active_ports(struct aggregator *agg)
+{
+	struct port *port;
+	int active = 0;
+
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (port->is_enabled)
+			active++;
+	}
+
+	return active;
+}
+
 /**
  * __get_agg_bandwidth - get the total bandwidth of an aggregator
  * @aggregator: the aggregator we're looking at
@@ -664,39 +678,40 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
  */
 static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 {
+	int nports = __agg_active_ports(aggregator);
 	u32 bandwidth = 0;
 
-	if (aggregator->num_of_ports) {
+	if (nports) {
 		switch (__get_link_speed(aggregator->lag_ports)) {
 		case AD_LINK_SPEED_1MBPS:
-			bandwidth = aggregator->num_of_ports;
+			bandwidth = nports;
 			break;
 		case AD_LINK_SPEED_10MBPS:
-			bandwidth = aggregator->num_of_ports * 10;
+			bandwidth = nports * 10;
 			break;
 		case AD_LINK_SPEED_100MBPS:
-			bandwidth = aggregator->num_of_ports * 100;
+			bandwidth = nports * 100;
 			break;
 		case AD_LINK_SPEED_1000MBPS:
-			bandwidth = aggregator->num_of_ports * 1000;
+			bandwidth = nports * 1000;
 			break;
 		case AD_LINK_SPEED_2500MBPS:
-			bandwidth = aggregator->num_of_ports * 2500;
+			bandwidth = nports * 2500;
 			break;
 		case AD_LINK_SPEED_10000MBPS:
-			bandwidth = aggregator->num_of_ports * 10000;
+			bandwidth = nports * 10000;
 			break;
 		case AD_LINK_SPEED_20000MBPS:
-			bandwidth = aggregator->num_of_ports * 20000;
+			bandwidth = nports * 20000;
 			break;
 		case AD_LINK_SPEED_40000MBPS:
-			bandwidth = aggregator->num_of_ports * 40000;
+			bandwidth = nports * 40000;
 			break;
 		case AD_LINK_SPEED_56000MBPS:
-			bandwidth = aggregator->num_of_ports * 56000;
+			bandwidth = nports * 56000;
 			break;
 		case AD_LINK_SPEED_100000MBPS:
-			bandwidth = aggregator->num_of_ports * 100000;
+			bandwidth = nports * 100000;
 			break;
 		default:
 			bandwidth = 0; /* to silence the compiler */
@@ -1530,10 +1545,10 @@ static struct aggregator *ad_agg_selection_test(struct aggregator *best,
 
 	switch (__get_agg_selection_mode(curr->lag_ports)) {
 	case BOND_AD_COUNT:
-		if (curr->num_of_ports > best->num_of_ports)
+		if (__agg_active_ports(curr) > __agg_active_ports(best))
 			return curr;
 
-		if (curr->num_of_ports < best->num_of_ports)
+		if (__agg_active_ports(curr) < __agg_active_ports(best))
 			return best;
 
 		/*FALLTHROUGH*/
@@ -1561,8 +1576,14 @@ static int agg_device_up(const struct aggregator *agg)
 	if (!port)
 		return 0;
 
-	return netif_running(port->slave->dev) &&
-	       netif_carrier_ok(port->slave->dev);
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (netif_running(port->slave->dev) &&
+		    netif_carrier_ok(port->slave->dev))
+			return 1;
+	}
+
+	return 0;
 }
 
 /**
@@ -1610,7 +1631,7 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 
 		agg->is_active = 0;
 
-		if (agg->num_of_ports && agg_device_up(agg))
+		if (__agg_active_ports(agg) && agg_device_up(agg))
 			best = ad_agg_selection_test(best, agg);
 	}
 
@@ -1622,7 +1643,7 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 		 * answering partner.
 		 */
 		if (active && active->lag_ports &&
-		    active->lag_ports->is_enabled &&
+		    __agg_active_ports(active) &&
 		    (__agg_has_partner(active) ||
 		     (!__agg_has_partner(active) &&
 		     !__agg_has_partner(best)))) {
@@ -2133,7 +2154,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 				else
 					temp_aggregator->lag_ports = temp_port->next_port_in_aggregator;
 				temp_aggregator->num_of_ports--;
-				if (temp_aggregator->num_of_ports == 0) {
+				if (__agg_active_ports(temp_aggregator) == 0) {
 					select_new_active_agg = temp_aggregator->is_active;
 					ad_clear_agg(temp_aggregator);
 					if (select_new_active_agg) {
@@ -2432,7 +2453,9 @@ void bond_3ad_adapter_speed_duplex_changed(struct slave *slave)
  */
 void bond_3ad_handle_link_change(struct slave *slave, char link)
 {
+	struct aggregator *agg;
 	struct port *port;
+	bool dummy;
 
 	port = &(SLAVE_AD_INFO(slave)->port);
 
@@ -2459,6 +2482,9 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
 	}
+	agg = __get_first_agg(port);
+	ad_agg_selection_logic(agg, &dummy);
+
 	netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
 		   port->actor_port_number,
 		   link == BOND_LINK_UP ? "UP" : "DOWN");
@@ -2499,7 +2525,7 @@ int bond_3ad_set_carrier(struct bonding *bond)
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
 	if (active) {
 		/* are enough slaves available to consider link up? */
-		if (active->num_of_ports < bond->params.min_links) {
+		if (__agg_active_ports(active) < bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
 				goto out;
-- 
1.9.1

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-06-23 21:20 [PATCH net] bonding: fix 802.3ad aggregator reselection Jay Vosburgh
@ 2016-06-28  8:20 ` David Miller
  2016-06-28 11:57 ` Veli-Matti Lintu
  1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2016-06-28  8:20 UTC (permalink / raw)
  To: jay.vosburgh; +Cc: netdev, veli-matti.lintu, vfalico, gospo, zyjzyj2000

From: Jay Vosburgh <jay.vosburgh@canonical.com>
Date: Thu, 23 Jun 2016 14:20:51 -0700

> 
> 	Since commit 7bb11dc9f59d ("bonding: unify all places where
> actor-oper key needs to be updated."), the logic in bonding to handle
> selection between multiple aggregators has not functioned.
> 
> 	This affects only configurations wherein the bonding slaves
> connect to two discrete aggregators (e.g., two independent switches, each
> with LACP enabled), thus creating two separate aggregation groups within a
> single bond.
> 
> 	The cause is a change in 7bb11dc9f59d to no longer set
> AD_PORT_BEGIN on a port after a link state change, which would cause the
> port to be reselected for attachment to an aggregator as if it were newly
> added to the bond.  We cannot restore the prior behavior, as it
> contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
> inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
> 5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
> now remains selected, the aggregator selection logic is not invoked.
> 
> 	A side effect of this change is that aggregators in bonding will
> now contain ports that are link down.  The aggregator selection logic
> does not currently handle this situation correctly, causing incorrect
> aggregator selection.
> 
> 	This patch makes two changes to repair the aggregator selection
> logic in bonding to function as documented and within the confines of the
> standard:
> 
> 	First, the aggregator selection and related logic now utilizes the
> number of active ports per aggregator, not the number of selected ports
> (as some selected ports may be down).  The ad_select "bandwidth" and
> "count" options only consider ports that are link up.
> 
> 	Second, on any carrier state change of any slave, the aggregator
> selection logic is explicitly called to ensure the correct aggregator is
> active.
> 
> Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>
> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

Applied and queued up for -stable, thanks Jay.

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-06-23 21:20 [PATCH net] bonding: fix 802.3ad aggregator reselection Jay Vosburgh
  2016-06-28  8:20 ` David Miller
@ 2016-06-28 11:57 ` Veli-Matti Lintu
  2016-06-29 15:59   ` Jay Vosburgh
       [not found]   ` <CAD=hENfHOVBzwrC2Yy371FvxRG9AyJpvAm+s4R6hzU_+o=tJYA@mail.gmail.com>
  1 sibling, 2 replies; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-06-28 11:57 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-06-24 0:20 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>
>         Since commit 7bb11dc9f59d ("bonding: unify all places where
> actor-oper key needs to be updated."), the logic in bonding to handle
> selection between multiple aggregators has not functioned.
>
>         This affects only configurations wherein the bonding slaves
> connect to two discrete aggregators (e.g., two independent switches, each
> with LACP enabled), thus creating two separate aggregation groups within a
> single bond.
>
>         The cause is a change in 7bb11dc9f59d to no longer set
> AD_PORT_BEGIN on a port after a link state change, which would cause the
> port to be reselected for attachment to an aggregator as if it were newly
> added to the bond.  We cannot restore the prior behavior, as it
> contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
> inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
> 5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
> now remains selected, the aggregator selection logic is not invoked.
>
>         A side effect of this change is that aggregators in bonding will
> now contain ports that are link down.  The aggregator selection logic
> does not currently handle this situation correctly, causing incorrect
> aggregator selection.
>
>         This patch makes two changes to repair the aggregator selection
> logic in bonding to function as documented and within the confines of the
> standard:
>
>         First, the aggregator selection and related logic now utilizes the
> number of active ports per aggregator, not the number of selected ports
> (as some selected ports may be down).  The ad_select "bandwidth" and
> "count" options only consider ports that are link up.
>
>         Second, on any carrier state change of any slave, the aggregator
> selection logic is explicitly called to ensure the correct aggregator is
> active.
>
> Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>
> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>


Hi,

Thanks for the patch. I have been now testing it and the reselection
seems to be working now in most cases, but I hit one case that seems
to consistently fail in my test environment.

I've been doing most of testing with ad_select=count and this happens
with it. I haven't yet done extensive testing with
ad_select=stable/bandwidth.

The sequence to trigger the failure seems to be:

  Switch A (Agg ID 2)       Switch B (Agg ID 1)
enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
    X       X      -           X      -       -     Connection works (Agg ID 2 active)
    X       -      -           X      -       -     Connection works (Agg ID 1 active)
    X       -      -           -      -       -     No connection (Agg ID 2 active)

I'm also wondering why a link down event causes a change of aggregator
when the active aggregator has the same number of active links as
the new aggregator.

The situation here clears once a port comes up. Once I also hit
problems without disabling all ports on the active switch:

  Switch A (Agg ID 2)       Switch B (Agg ID 1)
enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
    X       X      -           X      -       -     Connection works (Agg ID 2 active)
    X       -      -           X      -       -     No connection (Agg ID 1 active)

The active switch does not seem to matter either when disabling ports:

  Switch A (Agg ID 2)       Switch B (Agg ID 1)
enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
    X       -      -           X      X       X     Connection works (Agg ID 1 active)
    X       -      -           X      -       -     Connection works (Agg ID 2 active)
    -       -      -           X      -       -     No connection (Agg ID 1 active)

All testing was done with upstream version 4.6.2 with the patch
applied. When there is no connection, /proc/net/bonding/bond0 still
shows that there is an active aggregator that has a link up, but for
some reason no traffic passes through. I added some debugging
information in bond_procfs.c and the number of active ports seems to
match the enabled ports on switches.

I'll continue doing tests with different scenarios and I can also test
specific cases if needed.


Veli-Matti

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-06-28 11:57 ` Veli-Matti Lintu
@ 2016-06-29 15:59   ` Jay Vosburgh
  2016-06-30 11:15     ` Veli-Matti Lintu
       [not found]   ` <CAD=hENfHOVBzwrC2Yy371FvxRG9AyJpvAm+s4R6hzU_+o=tJYA@mail.gmail.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Jay Vosburgh @ 2016-06-29 15:59 UTC (permalink / raw)
  To: Veli-Matti Lintu
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
[...]
>Thanks for the patch. I have been now testing it and the reselection
>seems to be working now in most cases, but I hit one case that seems
>to consistently fail in my test environment.
>
>I've been doing most of testing with ad_select=count and this happens
>with it. I haven't yet done extensive testing with
>ad_select=stable/bandwidth.
>
>The sequence to trigger the failure seems to be:
>
>  Switch A (Agg ID 2)       Switch B (Agg ID 1)
>enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
>    X       X      -           X      -       -     Connection works (Agg ID 2 active)
>    X       -      -           X      -       -     Connection works (Agg ID 1 active)
>    X       -      -           -      -       -     No connection (Agg ID 2 active)

	I tried this locally, but don't see any failure (at the end, the
"Switch A" agg is still active with the single port).  I am starting
with just two ports in each aggregator (instead of three), so that may
be relevant.

	Can you enable dynamic debug for bonding and run your test
again, and then send me the debug output (this will appear in the kernel
log, e.g., from dmesg)?  You can enable this via

# echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control

	before running the test.  The contents of
/proc/net/bonding/bond0 (read as root, otherwise the LACP internal state
isn't included) from each step would also be helpful.  The output will
likely be large, so I'd suggest sending it to me directly off-list if
it's too big.
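
	To double-check what is enabled, and to turn the debug output
back off again afterwards, the same control file can be used; a
sketch, assuming the standard dynamic debug syntax:

# grep bonding /sys/kernel/debug/dynamic_debug/control
# echo 'module bonding -p' > /sys/kernel/debug/dynamic_debug/control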

>I'm also wondering why a link down event causes a change of aggregator
>when the active aggregator has the same number of active links as
>the new aggregator.

	This shouldn't happen.  If the active aggregator is just as good
as some other aggregator choice, it should stay with the current active.

	I suspect that both of these are edge cases arising from the
aggregators now including link down ports, which previously never
happened.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-06-29 15:59   ` Jay Vosburgh
@ 2016-06-30 11:15     ` Veli-Matti Lintu
  2016-07-05 14:01       ` Veli-Matti Lintu
  0 siblings, 1 reply; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-06-30 11:15 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
> [...]
>>Thanks for the patch. I have been now testing it and the reselection
>>seems to be working now in most cases, but I hit one case that seems
>>to consistently fail in my test environment.
>>
>>I've been doing most of testing with ad_select=count and this happens
>>with it. I haven't yet done extensive testing with
>>ad_select=stable/bandwidth.
>>
>>The sequence to trigger the failure seems to be:
>>
>>  Switch A (Agg ID 2)       Switch B (Agg ID 1)
>>enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
>>    X       X      -           X      -       -     Connection works (Agg ID 2 active)
>>    X       -      -           X      -       -     Connection works (Agg ID 1 active)
>>    X       -      -           -      -       -     No connection (Agg ID 2 active)
>
>         I tried this locally, but don't see any failure (at the end, the
> "Switch A" agg is still active with the single port).  I am starting
> with just two ports in each aggregator (instead of three), so that may
> be relevant.

When the connection problem occurs, /proc/net/bonding/bond0 always
shows the aggregator that has a link up active. Dumpcap sees at least
broadcast traffic on the port, but I haven't done extensive analysis
on that yet. All TCP connections are cut until the bond is up again
when more ports are enabled on the switch. ping doesn't work either
way.

If I start with only one link enabled on both switches or disable all
active links at once, this problem does not occur.


>         Can you enable dynamic debug for bonding and run your test
> again, and then send me the debug output (this will appear in the kernel
> log, e.g., from dmesg)?  You can enable this via
>
> # echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control
>
>         before running the test.  The contents of
> /proc/net/bonding/bond0 (read as root, otherwise the LACP internal state
> isn't included) from each step would also be helpful.  The output will
> likely be large, so I'd suggest sending it to me directly off-list if
> it's too big.

lacp_rate=0 (slow) is used on these tests, but using lacp_rate=1
(fast) does not seem to change the behaviour. This time testing was
done with ad_select=bandwidth, so the same problem affects at least
count/bandwidth.
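
For reference, a hypothetical sketch of how such a configuration could
be set through the bonding sysfs interface (option names per
bonding.txt; these are not our exact setup scripts):

# echo 802.3ad   > /sys/class/net/bond0/bonding/mode       # bond must be down, no slaves yet
# echo 0         > /sys/class/net/bond0/bonding/lacp_rate  # 0 = slow, 1 = fast
# echo bandwidth > /sys/class/net/bond0/bonding/ad_select  # or: stable, count
# echo 100       > /sys/class/net/bond0/bonding/miimon     # MII monitoring interval in ms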

Here's an overview of the steps done:

  Switch A (Agg ID 2)       Switch B (Agg ID 1)
enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
    X       X      X           X      X       X     Ok (Agg ID 2 active) - step1.txt
    X       -      -           X      X       X     Ok (Agg ID 1 active) - step2.txt
    X       -      -           X      X       -     Ok (Agg ID 1 active) - step3.txt
    X       -      -           X      -       -     Ok (Agg ID 1 active) - step4.txt
    X       -      -           -      -       -     Connection not working (Agg ID 2 active) - step5.txt
    X       X      -           -      -       -     Ok (Agg ID 2 active) - step6.txt
    X       X      -           X      X       -     Ok (Agg ID 2 active) - step7.txt

The files are quite big, so uploaded them here:

step1: https://gist.github.com/vmlintu/195e372428ec654b4aa49f51cbe27b14
step2: https://gist.github.com/vmlintu/3f98fe68e52945c189dfeb59cf302fac
step3: https://gist.github.com/vmlintu/bb9757f3e1368593da9cded035275b1d
step4: https://gist.github.com/vmlintu/f3ed76294fb4360695e155da33f1ac04
step5: https://gist.github.com/vmlintu/a2b5a192065927c7cfd09ce547e1f240
step6: https://gist.github.com/vmlintu/650604d5a6992dc057e6d832b5c8cb4f
step7: https://gist.github.com/vmlintu/885f3e8b70873a6d02d2b732829564cc
debug.log: https://gist.github.com/vmlintu/030f2bcf4618344ec8724fdf41aef6d0

The step?.txt files have the contents of /proc/net/bonding/bond0 when
the switch ports match the configuration.


>>I'm also wondering why a link down event causes a change of aggregator
>>when the active aggregator has the same number of active links as
>>the new aggregator.
>
>         This shouldn't happen.  If the active aggregator is just as good
> as some other aggregator choice, it should stay with the current active.
>
>         I suspect that both of these are edge cases arising from the
> aggregators now including link down ports, which previously never
> happened.

This is something I didn't see in this test. This time I did a reboot
before testing, which might cause change in behaviour. I try to find
the steps required for this to happen.

Veli-Matti

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
       [not found]   ` <CAD=hENfHOVBzwrC2Yy371FvxRG9AyJpvAm+s4R6hzU_+o=tJYA@mail.gmail.com>
@ 2016-07-05 13:59     ` Veli-Matti Lintu
  2016-07-05 20:02       ` Jay Vosburgh
  0 siblings, 1 reply; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-07-05 13:59 UTC (permalink / raw)
  To: zhuyj
  Cc: Jay Vosburgh, netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller

2016-07-01 12:03 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
> "
> ...
> All testing was done with upstream version 4.6.2 with the patch
> applied. When there is no connection, /proc/net/bonding/bond0 still
> shows that there is an active aggregator that has a link up, but for
> some reason no traffic passes through. I added some debugging
> information in bond_procfs.c and the number of active ports seems to
> match the enabled ports on switches.
> ...
> "
> Would you like to let me know what method you used, the miimon or
> arp_interval? If miimon, would you like to use arp_interval to make tests?

I have understood that only miimon is available with 802.3ad:

bonding.txt:

802.3ad:
...
        Finally, the 802.3ad mode mandates the use of the MII monitor,
        therefore, the ARP monitor is not available in this mode.

Is there a way to enable ARP monitor somehow?

Veli-Matti


>
> Let me know the result.
> Zhu Yanjun
>
> On Tue, Jun 28, 2016 at 7:57 PM, Veli-Matti Lintu
> <veli-matti.lintu@opinsys.fi> wrote:
>>
>> 2016-06-24 0:20 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>> >
>> >         Since commit 7bb11dc9f59d ("bonding: unify all places where
>> > actor-oper key needs to be updated."), the logic in bonding to handle
>> > selection between multiple aggregators has not functioned.
>> >
>> >         This affects only configurations wherein the bonding slaves
>> > connect to two discrete aggregators (e.g., two independent switches, each
>> > with LACP enabled), thus creating two separate aggregation groups within a
>> > single bond.
>> >
>> >         The cause is a change in 7bb11dc9f59d to no longer set
>> > AD_PORT_BEGIN on a port after a link state change, which would cause the
>> > port to be reselected for attachment to an aggregator as if it were newly
>> > added to the bond.  We cannot restore the prior behavior, as it
>> > contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
>> > inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
>> > 5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
>> > now remains selected, the aggregator selection logic is not invoked.
>> >
>> >         A side effect of this change is that aggregators in bonding will
>> > now contain ports that are link down.  The aggregator selection logic
>> > does not currently handle this situation correctly, causing incorrect
>> > aggregator selection.
>> >
>> >         This patch makes two changes to repair the aggregator selection
>> > logic in bonding to function as documented and within the confines of the
>> > standard:
>> >
>> >         First, the aggregator selection and related logic now utilizes the
>> > number of active ports per aggregator, not the number of selected ports
>> > (as some selected ports may be down).  The ad_select "bandwidth" and
>> > "count" options only consider ports that are link up.
>> >
>> >         Second, on any carrier state change of any slave, the aggregator
>> > selection logic is explicitly called to ensure the correct aggregator is
>> > active.
>> >
>> > Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>
>> > Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
>> > Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>>
>>
>> Hi,
>>
>> Thanks for the patch. I have been now testing it and the reselection
>> seems to be working now in most cases, but I hit one case that seems
>> to consistently fail in my test environment.
>>
>> I've been doing most of testing with ad_select=count and this happens
>> with it. I haven't yet done extensive testing with
>> ad_select=stable/bandwidth.
>>
>> The sequence to trigger the failure seems to be:
>>
>>   Switch A (Agg ID 2)       Switch B (Agg ID 1)
>> enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
>>     X       X      -           X      -       -     Connection works (Agg ID 2 active)
>>     X       -      -           X      -       -     Connection works (Agg ID 1 active)
>>     X       -      -           -      -       -     No connection (Agg ID 2 active)
>>
>> I'm also wondering why a link down event causes a change of aggregator
>> when the active aggregator has the same number of active links as
>> the new aggregator.
>>
>> The situation here clears once a port comes up. Once I also hit
>> problems without disabling all ports on the active switch:
>>
>>   Switch A (Agg ID 2)       Switch B (Agg ID 1)
>> enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
>>     X       X      -           X      -       -     Connection works (Agg ID 2 active)
>>     X       -      -           X      -       -     No connection (Agg ID 1 active)
>>
>> The active switch does not seem to matter either when disabling ports:
>>
>>   Switch A (Agg ID 2)       Switch B (Agg ID 1)
>> enp5s0f0 ens5f0 ens6f0    enp5s0f1 ens5f1 ens6f1
>>     X       -      -           X      X       X     Connection works (Agg ID 1 active)
>>     X       -      -           X      -       -     Connection works (Agg ID 2 active)
>>     -       -      -           X      -       -     No connection (Agg ID 1 active)
>>
>> All testing was done with upstream version 4.6.2 with the patch
>> applied. When there is no connection, /proc/net/bonding/bond0 still
>> shows that there is an active aggregator that has a link up, but for
>> some reason no traffic passes through. I added some debugging
>> information in bond_procfs.c and the number of active ports seems to
>> match the enabled ports on switches.
>>
>> I'll continue doing tests with different scenarios and I can also test
>> specific cases if needed.
>>
>>
>> Veli-Matti
>
>

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-06-30 11:15     ` Veli-Matti Lintu
@ 2016-07-05 14:01       ` Veli-Matti Lintu
  2016-07-05 20:52         ` Veli-Matti Lintu
  2016-07-05 21:20         ` Jay Vosburgh
  0 siblings, 2 replies; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-07-05 14:01 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:

>>         I tried this locally, but don't see any failure (at the end, the
>> "Switch A" agg is still active with the single port).  I am starting
>> with just two ports in each aggregator (instead of three), so that may
>> be relevant.
>
> When the connection problem occurs, /proc/net/bonding/bond0 always
> shows the aggregator that has a link up active. Dumpcap sees at least
> broadcast traffic on the port, but I haven't done extensive analysis
> on that yet. All TCP connections are cut until the bond is up again
> when more ports are enabled on the switch. ping doesn't work either
> way.

I did some further testing on this and it looks like I can get this
working by enabling the ports in the new aggregator the same way as
the ports in the old aggregator are disabled in ad_agg_selection_logic().

Normally the ports seem to get enabled from ad_mux_machine() in "case
AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
as the port does get enabled, but no traffic passes through. So far I
haven't been able to figure out what happens. When the connection is
lost, dumpcap sees traffic on the only active port in the bond, but it
seems like nothing catches it. If I disable and re-enable the same
port, traffic starts flowing again normally.

Here's the patch I used for testing on top of 4.7.0-rc6. I haven't
tested this with other modes or h/w setups yet.


diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ca81f46..45c06c4 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1706,6 +1706,25 @@ static void ad_agg_selection_logic(struct aggregator *agg,
                                __disable_port(port);
                        }
                }
+
+               /* Enable ports in the new aggregator */
+               if (best) {
+                       netdev_dbg(bond->dev, "Enable ports\n");
+
+                       for (port = best->lag_ports; port;
+                            port = port->next_port_in_aggregator) {
+                               netdev_dbg(bond->dev, "Agg: %d, P=%d: Port: %s; Enabled=%d\n",
+                                          best->aggregator_identifier,
+                                          best->num_of_ports,
+                                          port->slave->dev->name,
+                                          __port_is_enabled(port));
+
+                               if (!__port_is_enabled(port))
+                                       __enable_port(port);
+                       }
+               }
+
+
                /* Slave array needs update. */
                *update_slave_arr = true;
        }

Veli-Matti

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-05 13:59     ` Veli-Matti Lintu
@ 2016-07-05 20:02       ` Jay Vosburgh
  0 siblings, 0 replies; 14+ messages in thread
From: Jay Vosburgh @ 2016-07-05 20:02 UTC (permalink / raw)
  To: Veli-Matti Lintu
  Cc: zhuyj, netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
[...]
>I have understood that only miimon is available with 802.3ad:
>
>bonding.txt:
>
>802.3ad:
>...
>        Finally, the 802.3ad mode mandates the use of the MII monitor,
>        therefore, the ARP monitor is not available in this mode.
>
>Is there a way to enable ARP monitor somehow?

	No, there isn't.  ARP monitor is not available in 802.3ad mode.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-05 14:01       ` Veli-Matti Lintu
@ 2016-07-05 20:52         ` Veli-Matti Lintu
  2016-07-05 21:20         ` Jay Vosburgh
  1 sibling, 0 replies; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-07-05 20:52 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-07-05 17:01 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
> 2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>>         I tried this locally, but don't see any failure (at the end, the
>>> "Switch A" agg is still active with the single port).  I am starting
>>> with just two ports in each aggregator (instead of three), so that may
>>> be relevant.
>>
>> When the connection problem occurs, /proc/net/bonding/bond0 always
>> shows the aggregator that has a link up active. Dumpcap sees at least
>> broadcast traffic on the port, but I haven't done extensive analysis
>> on that yet. All TCP connections are cut until the bond is up again
>> when more ports are enabled on the switch. ping doesn't work either
>> way.
>
> I did some further testing on this and it looks like I can get this
> working by enabling the ports in the new aggregator the same way as
>> the ports in the old aggregator are disabled in ad_agg_selection_logic().
>
> Normally the ports seem to get enabled from ad_mux_machine() in "case
> AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
> as the port does get enabled, but no traffic passes through. So far I
> haven't been able to figure out what happens. When the connection is
> lost, dumpcap sees traffic on the only active port in the bond, but it
> seems like nothing catches it. If I disable and re-enable the same
>> port, traffic starts flowing again normally.

One more thing to add here - I have tested the following
bond/bridge/vlan configurations:

1. bond0 has IP address, no bridges/vlans
2. bond0 belongs to a bridge that has the IP address, no vlans
3. bond0 belongs to a bridge that has the IP address + there are
bond0.X VLANs that belong to separate bridges

All configurations behave the same way.

It is also possible to reproduce this with two aggregators with two
links each. The steps are:

   Agg 1   Agg 2
   P1 P2   P3 P4
   X   X   X   X   OK (Agg 2 active)
   X   X   X   -   OK (Agg 1 active)
   X   -   X   -   OK (Agg 1 active)
   -   -   X   -   Fail (Agg 2 active)

The first disabled port needs to be in the active aggregator so that the
aggregator is reselected and changed.

Veli-Matti


> Here's the patch I used for testing on top of 4.7.0-rc6. I haven't
> tested this with other modes or h/w setups yet.
>
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index ca81f46..45c06c4 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -1706,6 +1706,25 @@ static void ad_agg_selection_logic(struct aggregator *agg,
>                                 __disable_port(port);
>                         }
>                 }
> +
> +               /* Enable ports in the new aggregator */
> +                if (best) {
> +                       netdev_dbg(bond->dev, "Enable ports\n");
> +
> +                        for (port = best->lag_ports; port;
> +                             port = port->next_port_in_aggregator) {
> +                                netdev_dbg(bond->dev, "Agg: %d, P=%d: Port: %s; Enabled=%d\n",
> +                                            best->aggregator_identifier,
> +                                            best->num_of_ports,
> +                                            port->slave->dev->name,
> +                                            __port_is_enabled(port));
> +
> +                                if (!__port_is_enabled(port))
> +                                        __enable_port(port);
> +                        }
> +                }
> +
> +
>                 /* Slave array needs update. */
>                 *update_slave_arr = true;
>         }
>
> Veli-Matti

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-05 14:01       ` Veli-Matti Lintu
  2016-07-05 20:52         ` Veli-Matti Lintu
@ 2016-07-05 21:20         ` Jay Vosburgh
  2016-07-07 20:04           ` Veli-Matti Lintu
  1 sibling, 1 reply; 14+ messages in thread
From: Jay Vosburgh @ 2016-07-05 21:20 UTC (permalink / raw)
  To: Veli-Matti Lintu
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:

>2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>>         I tried this locally, but don't see any failure (at the end, the
>>> "Switch A" agg is still active with the single port).  I am starting
>>> with just two ports in each aggregator (instead of three), so that may
>>> be relevant.
>>
>> When the connection problem occurs, /proc/net/bonding/bond0 always
>> shows the aggregator that has a link up active. Dumpcap sees at least
>> broadcast traffic on the port, but I haven't done extensive analysis
>> on that yet. All TCP connections are cut until the bond is up again
>> when more ports are enabled on the switch. ping doesn't work either
>> way.
>
>I did some further testing on this and it looks like I can get this
>working by enabling the ports in the new aggregator the same way as
>the ports in the old aggregator are disabled in ad_agg_selection_logic().

	I tested with this some as well, using 6 ports total across two
switches, and was still not able to reproduce the issue.  How are you
configuring the bond in the first place?  It may be that there is some
dependency on the ordering of the slaves within the bond and how they
are disabled.

	Also, I am taking the ports down by physically unplugging the
cable from the switch.  If you're doing it differently, that might be
relevant.

>Normally the ports seem to get enabled from ad_mux_machine() in "case
>AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
>as the port does get enabled, but no traffic passes through. So far I
>haven't been able to figure out what happens. When the connection is
>lost, dumpcap sees traffic on the only active port in the bond, but it
>seems like nothing catches it. If I disable and re-enable the same
>port, traffic starts flowing again normally.

	Looking at the debug log you provided, the step that fails
appears to correspond to this portion:

[  367.811419] bond0: link status definitely down for interface enp5s0f1, disabling it
[  367.811425] bond0: best Agg=2; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811429] bond0: best ports ffff8830113f6638 slave ffff8830113cfe00 enp5s0f0
[  367.811432] bond0: Agg=1; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811434] bond0: Agg=2; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811437] bond0: Agg=3; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811439] bond0: Agg=4; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811441] bond0: Agg=5; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811444] bond0: Agg=6; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811446] bond0: LAG 2 chosen as the active LAG
[  367.811448] bond0: Agg=2; P=3; a k=9; p k=57; Ind=0; Act=1
[  367.811451] bond0: Port 1 changed link status to DOWN
[  367.811455] bond0: first active interface up!
[  367.811461] Rx Machine: Port=1 (enp5s0f1), Last State=6, Curr State=2
[  367.811495] ixgbe 0000:05:00.0 enp5s0f0: event: 1b
[  367.811497] ixgbe 0000:05:00.0 enp5s0f0: IFF_SLAVE
[  367.811519] ixgbe 0000:05:00.1 enp5s0f1: event: 19
[  367.811522] ixgbe 0000:05:00.1 enp5s0f1: IFF_SLAVE
[  367.811525] ixgbe 0000:05:00.1 enp5s0f1: event: 19
[  367.811528] ixgbe 0000:05:00.1 enp5s0f1: IFF_SLAVE
[  386.809542] Periodic Machine: Port=2, Last State=3, Curr State=4
[  386.909543] Periodic Machine: Port=2, Last State=4, Curr State=3
[  387.009530] update lacpdu: enp5s0f0, actor port state 3d
[  387.009541] Sent LACPDU on port 2
[  387.571372] bond0: Received LACPDU on port 2 slave enp5s0f0
[  387.571379] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[  387.571381] enp5s0f0 partner sync=1
[  416.810767] Periodic Machine: Port=2, Last State=3, Curr State=4
[  416.910786] Periodic Machine: Port=2, Last State=4, Curr State=3
[  417.010749] update lacpdu: enp5s0f0, actor port state 3d
[  417.010761] Sent LACPDU on port 2
[  417.569137] bond0: Received LACPDU on port 2 slave enp5s0f0
[  417.569156] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[... repeats this cycle for a while ....]
[  537.614050] enp5s0f0 partner sync=1
[  566.816851] Periodic Machine: Port=2, Last State=3, Curr State=4
[  566.916843] Periodic Machine: Port=2, Last State=4, Curr State=3
[  567.016829] update lacpdu: enp5s0f0, actor port state 3d
[  567.016839] Sent LACPDU on port 2
[  567.558379] bond0: Received LACPDU on port 2 slave enp5s0f0
[  567.558399] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[  567.558403] enp5s0f0 partner sync=1
[  572.925434] igb 0000:81:00.0 ens5f0: igb: ens5f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  572.925862] igb 0000:81:00.0 ens5f0: event: 4
[  572.925865] igb 0000:81:00.0 ens5f0: IFF_SLAVE
[  572.925890] bond0: Port 6 Received link speed 0 update from adapter

	The "Periodic Machine" 3->4 then 4->3 then "Sent LACPDU" looks
normal (3->4 is the periodic timer expiring, state 4 sets port->ntt,
then the next iteration moves back to state 3), and includes both send
and receive of LACPDUs on port 2 (which is enp5s0f0).

	I don't see a transition to COLL/DIST state for port 2 in the
log, so presumably it takes place prior to the beginning.  This is the
place that would call __enable_port.

	What you're describing sounds consistent with the slave not
being set to active, which would cause bond_handle_frame ->
bond_should_deliver_exact_match to return RX_HANDLER_EXACT.
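
	For reference, that check in bond_main.c is roughly the
following (paraphrased from memory, so treat it as a sketch rather
than the exact source):

static bool bond_should_deliver_exact_match(struct sk_buff *skb,
					    struct slave *slave,
					    struct bonding *bond)
{
	/* Frames that arrive on an inactive slave are not delivered
	 * as bond traffic; ALB makes an exception for unicast.
	 */
	if (bond_is_slave_inactive(slave)) {
		if (BOND_MODE(bond) == BOND_MODE_ALB &&
		    skb->pkt_type != PACKET_BROADCAST &&
		    skb->pkt_type != PACKET_MULTICAST)
			return false;
		return true;
	}
	return false;
}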

	This leads me to wonder if the port in question is in this
incorrect state from the beginning, but it only manifests once it
becomes the only active port.

	Can you instrument __enable_port to see when it is called for
the bad port, and if it actually calls bond_set_slave_active_flags for
the port in question (without your additional patch)?
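
	Something along these lines would do; a sketch in which the
netdev_dbg() is the addition and the rest is the existing helper in
bond_3ad.c as I recall it:

static inline void __enable_port(struct port *port)
{
	struct slave *slave = port->slave;

	/* added instrumentation: log each call and the slave's link state */
	netdev_dbg(slave->bond->dev, "__enable_port: port %d (%s) link=%d\n",
		   port->actor_port_number, slave->dev->name, slave->link);

	if ((slave->link == BOND_LINK_UP) && bond_slave_is_up(slave))
		bond_set_slave_active_flags(slave, BOND_SLAVE_NOTIFY_LATER);
}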

	Actually, your patch looks to have some debug output; what does
that provide?

>Here's the patch I used for testing on top of 4.7.0-rc6. I haven't
>tested this with other modes or h/w setups yet.

	I'm a bit leery of putting this in until the root cause is
understood, as it otherwise may just be masking the real issue.

	-J

>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index ca81f46..45c06c4 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1706,6 +1706,25 @@ static void ad_agg_selection_logic(struct aggregator *agg,
>                                __disable_port(port);
>                        }
>                }
>+
>+               /* Enable ports in the new aggregator */
>+                if (best) {
>+                       netdev_dbg(bond->dev, "Enable ports\n");
>+
>+                        for (port = best->lag_ports; port;
>+                             port = port->next_port_in_aggregator) {
>+                                netdev_dbg(bond->dev, "Agg: %d, P=%d: Port: %s; Enabled=%d\n",
>+                                            best->aggregator_identifier,
>+                                            best->num_of_ports,
>+                                            port->slave->dev->name,
>+                                            __port_is_enabled(port));
>+
>+                                if (!__port_is_enabled(port))
>+                                        __enable_port(port);
>+                        }
>+                }
>+
>+
>                /* Slave array needs update. */
>                *update_slave_arr = true;
>        }
>
>Veli-Matti

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-05 21:20         ` Jay Vosburgh
@ 2016-07-07 20:04           ` Veli-Matti Lintu
  2016-07-08  1:48             ` Jay Vosburgh
  0 siblings, 1 reply; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-07-07 20:04 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-07-06 0:20 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
>>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>>>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>>
>>>>         I tried this locally, but don't see any failure (at the end, the
>>>> "Switch A" agg is still active with the single port).  I am starting
>>>> with just two ports in each aggregator (instead of three), so that may
>>>> be relevant.
>>>
>>> When the connection problem occurs, /proc/net/bonding/bond0 always
>>> shows the aggregator that has a link up active. Dumpcap sees at least
>>> broadcast traffic on the port, but I haven't done extensive analysis
>>> on that yet. All TCP connections are cut until the bond is up again
>>> when more ports are enabled on the switch. ping doesn't work either
>>> way.
>>
>>I did some further testing on this and it looks like I can get this
>>working by enabling the ports in the new aggregator the same way as
>>>the ports in the old aggregator are disabled in ad_agg_selection_logic().
>
>         I tested with this some as well, using 6 ports total across two
> switches, and was still not able to reproduce the issue.  How are you
> configuring the bond in the first place?  It may be that there is some
> dependency on the ordering of the slaves within the bond and how they
> are disabled.

The switch configuration should be pretty basic - a port group with
LACP as the type. L4-based hashing is configured on the switches. The bond
is running as untagged on the ports. "show run" shows them as such:

trunk 20-22 trk4 lacp
trunk-load-balance L4-based

Both switches have similar configuration. I'm not an expert in switch
configurations, so I do not know if there's anything else interesting
in there.

The servers are running Ubuntu 16.04 and systemd-networkd is used to
configure the interfaces. Using /etc/network/interfaces quite often
failed to set up all interfaces at boot time, so we switched over to
systemd-networkd.
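
The networkd configuration is roughly like the following (a sketch
with hypothetical file names, not our exact files):

/etc/systemd/network/10-bond0.netdev:

[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=802.3ad
LACPTransmitRate=slow
AdSelect=count
MIIMonitorSec=0.1

/etc/systemd/network/10-bond0-slaves.network:

[Match]
Name=enp5s0f* ens5f* ens6f*

[Network]
Bond=bond0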

>         Also, I am taking the ports down by physically unplugging the
> cable from the switch.  If you're doing it differently, that might be
> relevant.

The servers are in remote location, so I'm disabling the ports in
switch configuration. Unfortunately I do not have a similar setup that
I could access physically at the moment.

>>Normally the ports seem to get enabled from ad_mux_machine() in "case
>>AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
>>as the port does get enabled, but no traffic passes through. So far I
>>haven't been able to figure out what happens. When the connection is
>>lost, dumpcap sees traffic on the only active port in the bond, but it
>>seems like nothing catches it. If I disable and re-enable the same
>>port, traffic start flowing again normally.
>
>         Looking at the debug log you provided, the step that fails
> appears to correspond to this portion:

...

>         The "Periodic Machine" 3->4 then 4->3 then "Sent LACPDU" looks
> normal (3->4 is the periodic timer expiring, state 4 sets port->ntt,
> then the next iteration moves back to state 3), and includes both send
> and receive of LACPDUs on port 2 (which is enp5s0f0).
>
>         I don't see a transition to COLL/DIST state for port 2 in the
> log, so presumably it takes place prior to the beginning.  This is the
> place that would call __enable_port.
>
>         What you're describing sounds consistent with the slave not
> being set to active, which would cause bond_handle_frame ->
> bond_should_deliver_exact_match to return RX_HANDLER_EXACT.
>
>         This leads me to wonder if the port in question is in this
> incorrect state from the beginning, but it only manifests once it
> becomes the only active port.
>
>         Can you instrument __enable_port to see when it is called for
> the bad port, and if it actually calls bond_set_slave_active_flags for
> the port in question (without your additional patch)?

Thanks for the pointers. Adding some extra debug in __enable_port
helped me get going in the right direction, along with quite a bit of
extra debug information after that.

It looks like I'm hitting a case where the port is not initialized
when bond_update_slave_arr is run and this causes a failure in
bond_xmit_slave_id as it cannot find any ports that would be usable
for sending.

When the aggregator changes for the first time, __disable_port() is
called for all the ports in the old aggregator in
ad_agg_selection_logic():

                if (active) {
                        for (port = active->lag_ports; port;
                             port = port->next_port_in_aggregator) {
                                __disable_port(port);
                        }
                }
                /* Slave array needs update. */
                *update_slave_arr = true;

This sets slave->inactive for the port and also sets slave->backup.
These are still set when agg_selection_logic() is called to reselect
the aggregator with your patch. It sets the is_enabled flag, but does
not touch inactive and backup flags.

ad_agg_selection_logic() sets update_slave_arr to true, but the port
is not enabled yet when bond_update_slave_arr() gets called. When
__enable_port() is called for the ports in ad_mux_machine() case
AD_MUX_COLLECTING_DISTRIBUTING, bond_update_slave_arr() is not called
again. This leaves the xmit hash without any slaves. I added some
debugging in bond_3ad_xor_xmit() to see that it drops packets while
connection is not working. The problem is cleared right away when a
port comes up and bond_update_slave_arr() gets called.

I can try to collect debug logs with some extra debugging enabled if that helps.

I was able to get it working by setting the update_slave_arr in
ad_mux_machine when a port is enabled. I'm not sure if this is the best
place to get bond_update_slave_arr() invoked, but it seems to do the
trick.

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ca81f46..9b8653c 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -970,6 +980,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
                                    !__port_is_enabled(port)) {

                                        __enable_port(port);
+
+                                       if (__port_is_enabled(port))
+                                               *update_slave_arr = true;
                                }
                        }
                        break;

The earlier patch that called __enable_port() in
ad_agg_selection_logic probably did the same by getting the inactive
and backup flags cleared so that bond_update_slave_arr() included the
ports in the xmit hash, but I haven't looked through the steps there.
I really cannot claim to understand all the logic in the code, so this
might be totally wrong.


Veli-Matti

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-07 20:04           ` Veli-Matti Lintu
@ 2016-07-08  1:48             ` Jay Vosburgh
  2016-07-08 12:22               ` Veli-Matti Lintu
  0 siblings, 1 reply; 14+ messages in thread
From: Jay Vosburgh @ 2016-07-08  1:48 UTC (permalink / raw)
  To: Veli-Matti Lintu
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:

>2016-07-06 0:20 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>>
>>>2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
>>>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>>>>> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>>>
>>>>>         I tried this locally, but don't see any failure (at the end, the
>>>>> "Switch A" agg is still active with the single port).  I am starting
>>>>> with just two ports in each aggregator (instead of three), so that may
>>>>> be relevant.
>>>>
>>>> When the connection problem occurs, /proc/net/bonding/bond0 always
>>>> shows the aggregator that has a link up active. Dumpcap sees at least
>>>> broadcast traffic on the port, but I haven't done extensive analysis
>>>> on that yet. All TCP connections are cut until the bond is up again
>>>> when more ports are enabled on the switch. ping doesn't work either
>>>> way.
>>>
>>>I did some further testing on this and it looks like I can get this
>>>working by enabling the ports in the new aggregator the same way as
>>>the ports in the old aggregator are disabled in ad_agg_selection_logic().
>>
>>         I tested with this some as well, using 6 ports total across two
>> switches, and was still not able to reproduce the issue.  How are you
>> configuring the bond in the first place?  It may be that there is some
>> dependency on the ordering of the slaves within the bond and how they
>> are disabled.
>
>The switch configuration should be pretty basic - a port group with
>LACP as the type. L4-based hashing is configured on the switches. The bond
>is running as untagged on the ports. "show run" shows them as such:
>
>trunk 20-22 trk4 lacp
>trunk-load-balance L4-based
>
>Both switches have similar configuration. I'm not an expert in switch
>configurations, so I do not know if there's anything else interesting
>in there.
>
>The servers are running Ubuntu 16.04 and systemd-networkd is used to
>configure the interfaces. Using /etc/network/interfaces quite often
>failed to set up all interfaces at boot time, so we switched over to
>systemd-networkd.
>
>>         Also, I am taking the ports down by physically unplugging the
>> cable from the switch.  If you're doing it differently, that might be
>> relevant.
>
>The servers are in remote location, so I'm disabling the ports in
>switch configuration. Unfortunately I do not have a similar setup that
>I could access physically at the moment.
>
>>>Normally the ports seem to get enabled from ad_mux_machine() in "case
>>>AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
>>>as the port does get enabled, but no traffic passes through. So far I
>>>haven't been able to figure out what happens. When the connection is
>>>lost, dumpcap sees traffic on the only active port in the bond, but it
>>>seems like nothing catches it. If I disable and re-enable the same
>>>port, traffic starts flowing again normally.
>>
>>         Looking at the debug log you provided, the step that fails
>> appears to correspond to this portion:
>
>...
>
>>         The "Periodic Machine" 3->4 then 4->3 then "Sent LACPDU" looks
>> normal (3->4 is the periodic timer expiring, state 4 sets port->ntt,
>> then the next iteration moves back to state 3), and includes both send
>> and receive of LACPDUs on port 2 (which is enp5s0f0).
>>
>>         I don't see a transition to COLL/DIST state for port 2 in the
>> log, so presumably it takes place prior to the beginning.  This is the
>> place that would call __enable_port.
>>
>>         What you're describing sounds consistent with the slave not
>> being set to active, which would cause bond_handle_frame ->
>> bond_should_deliver_exact_match to return RX_HANDLER_EXACT.
>>
>>         This leads me to wonder if the port in question is in this
>> incorrect state from the beginning, but it only manifests once it
>> becomes the only active port.
>>
>>         Can you instrument __enable_port to see when it is called for
>> the bad port, and if it actually calls bond_set_slave_active_flags for
>> the port in question (without your additional patch)?
>
>Thanks for the pointers. Adding some extra debug in __enable_port
>helped me get headed in the right direction, as did adding quite a bit
>of extra debug information after that.
>
>It looks like I'm hitting a case where the port is not initialized
>when bond_update_slave_arr is run and this causes a failure in
>bond_xmit_slave_id as it cannot find any ports that would be usable
>for sending.
>
>When the aggregator changes for the first time, __disable_port() is
>called for all the ports in the old aggregator in
>ad_agg_selection_logic():
>
>                if (active) {
>                        for (port = active->lag_ports; port;
>                             port = port->next_port_in_aggregator) {
>                                __disable_port(port);
>                        }
>                }
>                /* Slave array needs update. */
>                *update_slave_arr = true;
>
>This sets slave->inactive for the port, and slave->backup is also set.
>These are still set when ad_agg_selection_logic() is called to reselect
>the aggregator with your patch. It sets the is_enabled flag, but does
>not touch the inactive and backup flags.
>
>ad_agg_selection_logic() sets update_slave_arr to true, but the port
>is not enabled yet when bond_update_slave_arr() gets called. When
>__enable_port() is called for the ports in ad_mux_machine() case
>AD_MUX_COLLECTING_DISTRIBUTING, bond_update_slave_arr() is not called
>again. This leaves the xmit hash without any slaves. I added some
>debugging in bond_3ad_xor_xmit() to see that it drops packets while the
>connection is not working. The problem is cleared right away when a
>port comes up and bond_update_slave_arr() gets called.

	I think this is the key right here.  The slave array update and
the move to COLL_DIST state are not synchronized, and the port is not
eligible to be added to the array until it reaches COLL_DIST state.

	This is managed by __enable/__disable_port, which changes the
backup flag that bond_update_slave_arr uses to test for eligibility in
802.3ad mode (via bond_slave_can_tx).
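
	For reference, the relevant helpers look roughly like this
(paraphrased from memory of the include/net/bonding.h of this era, so
check the tree for the exact form):

        /* Eligibility test used by bond_update_slave_arr() in 802.3ad
         * mode; the "state" consulted here is the backup flag that
         * __enable_port/__disable_port toggle.
         */
        static inline bool bond_slave_is_up(struct slave *slave)
        {
                return netif_running(slave->dev) &&
                       netif_carrier_ok(slave->dev);
        }

        static inline bool bond_is_active_slave(struct slave *slave)
        {
                return !bond_slave_state(slave);  /* !slave->backup */
        }

        static inline bool bond_slave_can_tx(struct slave *slave)
        {
                return bond_slave_is_up(slave) &&
                       slave->link == BOND_LINK_UP &&
                       bond_is_active_slave(slave);
        }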

[ from an earlier message ]
>It is also possible to reproduce this with two aggregators with two
>links each. The steps are:
>
>   Agg 1   Agg 2
>   P1 P2   P3 P4
>   X   X   X   X   OK (Agg 2 active)
>   X   X   X   -   OK (Agg 1 active)
>   X   -   X   -   OK (Agg 1 active)
>   -   -   X   -   Fail (Agg 2 active)

	I think I see the sequence of events now; stepping
through the above for P3:

	1) Should be up and ok.  Presumably COLL_DIST state.

	2) P3 is run through __disable_port by agg_selection at "LAG %d
chosen as the active LAG" when "Agg 1" is made the active aggregator.
Still likely COLL_DIST state, as all this does is set slave->inactive
(which affects packet RX logic, but not the LACP state).  I don't see a
message for "Disabling port" in the debug log that would be printed if
it went through ad_disable_coll/dist.

	3) no change to P3

	4) link change calls ad_agg_selection_logic, and then updates
slave array, but P3 is still ->inactive, so isn't added.  Later, the
state machine calls ad_mux_machine, which hits the case that calls
__enable_port, but this case doesn't also update the slave array,
leading to the issue at hand.

	Prior to the slave arrays, __enable_port would be enough to
permit the port to send and receive traffic, but an array update needs
to be done when __enable_port actually enables the port.
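
	As an aside, the RX-side effect of slave->inactive mentioned in
(2) is this check in bond_main.c (again quoting from memory, so the
details may differ slightly): an inactive slave's frames are returned
as RX_HANDLER_EXACT, so packet sockets (dumpcap) still see them but
the normal stack does not.

        static bool bond_should_deliver_exact_match(struct sk_buff *skb,
                                                    struct slave *slave,
                                                    struct bonding *bond)
        {
                /* Inactive (backup) slaves deliver only to exact-match
                 * protocol handlers; ALB excepts unicast traffic.
                 */
                if (bond_is_slave_inactive(slave)) {
                        if (BOND_MODE(bond) == BOND_MODE_ALB &&
                            skb->pkt_type != PACKET_BROADCAST &&
                            skb->pkt_type != PACKET_MULTICAST)
                                return false;
                        return true;
                }
                return false;
        }

	That would square with your observation that dumpcap sees
traffic on the failed port while TCP and ping do not.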

>I can try to collect debug logs with some extra debugging enabled if that helps.
>
>I was able to get it working by setting the update_slave_arr in
>ad_mux_machine when a port is enabled. I'm not sure if this is the best
>place to get bond_update_slave_arr() invoked, but it seems to do the
>trick.
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index ca81f46..9b8653c 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -970,6 +980,9 @@ static void ad_mux_machine(struct port *port, bool
>*update_slave_arr)
>                                    !__port_is_enabled(port)) {
>
>                                        __enable_port(port);
>+
>+                                       if (__port_is_enabled(port))
>+                                               *update_slave_arr = true;
>                                }
>                        }
>                        break;

	The part I didn't follow at first is that later on in
ad_mux_machine it does:

		switch (port->sm_mux_state) {
[...]
                case AD_MUX_COLLECTING_DISTRIBUTING:
                        port->actor_oper_port_state |= AD_STATE_COLLECTING;
                        port->actor_oper_port_state |= AD_STATE_DISTRIBUTING;
                        port->actor_oper_port_state |= AD_STATE_SYNCHRONIZATION;
                        ad_enable_collecting_distributing(port,
                                                          update_slave_arr);
                        port->ntt = true;
                        break;

	and ad_enable_collecting_distributing should set
update_slave_arr, so in principle it shouldn't be necessary to add your
logic to ad_mux_machine.

	But, this switch case section will run first (because it occurs
when the port initially goes into COLL_DIST state), and the earlier
section that you changed happens the next time through the state
machine, when the port is already in COLL_DIST state.  So, the section I
copied above setting update_slave_arr won't do anything, because the
port is not enabled until the next pass through the state machine.

	I wonder if this is the real problem here: that the port does
not become enabled until the second pass through the state machine in
COLL_DIST state, and then only as a failsafe:

                                /* if port state hasn't changed make
                                 * sure that a collecting distributing
                                 * port in an active aggregator is enabled
                                 */

	Perhaps this happens because in your case the port enters (or
stays in) COLL_DIST state when it is not part of the active aggregator,
and ad_enable_coll/dist doesn't call __enable_port for the port.
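
	For reference, ad_enable_collecting_distributing only enables
the port when its aggregator is already active (from memory; the exact
form in the tree may differ):

        static void ad_enable_collecting_distributing(struct port *port,
                                                      bool *update_slave_arr)
        {
                /* The port is enabled (and the slave array marked for
                 * update) only if its aggregator is the active one.
                 */
                if (port->aggregator->is_active) {
                        pr_debug("Enabling port %d(LAG %d)\n",
                                 port->actor_port_number,
                                 port->aggregator->aggregator_identifier);
                        __enable_port(port);
                        *update_slave_arr = true;
                }
        }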

	The current call to __enable_port in ad_agg_selection_logic will
only enable the port if it has no partner and is the active aggregator,
which should be the case only when the port's link partner is not
enabled for LACP, so in theory that should not apply here.

>The earlier patch that called __enable_port() in
>ad_agg_selection_logic probably did the same by getting the inactive
>and backup flags cleared so that bond_update_slave_arr() included the
>ports in the xmit hash, but I haven't looked through the steps there.
>I really cannot claim to understand all the logic in the code, so this
>might be totally wrong..

	Either fix (agg_selection_logic enabling all ports in the new
active agg, or ad_mux_machine setting *update_slave_arr = true) seems to
resolve the issue.  I'm not 100% sure about edge cases yet, though.

	We're in somewhat uncharted territory here with regard to
the behavior required by the standard, since it doesn't really address
choosing between multiple possible aggregators.

	Thinking about it, I am leaning towards the logic of your prior
patch, the one that enables all ports when an aggregator is made active
in ad_agg_selection_logic.  That feels to me to be a clearer fix, and I
think it may get the slave array properly updated 100ms faster (as it
won't wait for a subsequent invocation of the state machine).

	I think the following will do the right thing (disclaimer: I
have not even tried to compile this):

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index edc70ffad660..2da5be91226e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1713,18 +1713,12 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 		*update_slave_arr = true;
 	}
 
-	/* if the selected aggregator is of join individuals
-	 * (partner_system is NULL), enable their ports
-	 */
 	active = __get_active_agg(origin);
 
 	if (active) {
-		if (!__agg_has_partner(active)) {
-			for (port = active->lag_ports; port;
-			     port = port->next_port_in_aggregator) {
-				__enable_port(port);
-			}
-		}
+		for (port = active->lag_ports; port;
+		     port = port->next_port_in_aggregator)
+			__enable_port(port);
 	}
 
 	rcu_read_unlock();


	Rather than adding a new loop as your original patch did, this
one repurposes the existing loop for individual links and does the
enable for all cases.

	I think this may still need a check for the port state in there,
as ports in the aggregator may be link up but still not suitable for
TX/RX if the LACP negotiation doesn't put the port into COLL_DIST state.

	Enabling a port that is already enabled is harmless, and if the
port is link down, the enable will do nothing, so I think this should
resolve things.
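
	That harmlessness falls out of __enable_port itself, which is
a no-op unless the slave is link up (again from memory):

        static inline void __enable_port(struct port *port)
        {
                struct slave *slave = port->slave;

                /* Clear the backup/inactive flags only when the link
                 * is actually up; otherwise do nothing.
                 */
                if ((slave->link == BOND_LINK_UP) && bond_slave_is_up(slave))
                        bond_set_slave_active_flags(slave,
                                                    BOND_SLAVE_NOTIFY_LATER);
        }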

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-08  1:48             ` Jay Vosburgh
@ 2016-07-08 12:22               ` Veli-Matti Lintu
  2016-08-18  9:28                 ` Veli-Matti Lintu
  0 siblings, 1 reply; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-07-08 12:22 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-07-08 4:48 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>2016-07-06 0:20 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:

>>Thanks for the pointers. Adding some extra debug in __enable_port
>>helped me get headed in the right direction, as did adding quite a bit
>>of extra debug information after that.
>>
>>It looks like I'm hitting a case where the port is not initialized
>>when bond_update_slave_arr is run and this causes a failure in
>>bond_xmit_slave_id as it cannot find any ports that would be usable
>>for sending.
>>
>>When the aggregator changes for the first time, __disable_port() is
>>called for all the ports in the old aggregator in
>>ad_agg_selection_logic():
>>
>>                if (active) {
>>                        for (port = active->lag_ports; port;
>>                             port = port->next_port_in_aggregator) {
>>                                __disable_port(port);
>>                        }
>>                }
>>                /* Slave array needs update. */
>>                *update_slave_arr = true;
>>
>>This sets slave->inactive for the port, and slave->backup is also set.
>>These are still set when ad_agg_selection_logic() is called to reselect
>>the aggregator with your patch. It sets the is_enabled flag, but does
>>not touch the inactive and backup flags.
>>
>>ad_agg_selection_logic() sets update_slave_arr to true, but the port
>>is not enabled yet when bond_update_slave_arr() gets called. When
>>__enable_port() is called for the ports in ad_mux_machine() case
>>AD_MUX_COLLECTING_DISTRIBUTING, bond_update_slave_arr() is not called
>>again. This leaves the xmit hash without any slaves. I added some
>>debugging in bond_3ad_xor_xmit() to see that it drops packets while the
>>connection is not working. The problem is cleared right away when a
>>port comes up and bond_update_slave_arr() gets called.
>
>         I think this is the key right here.  The slave array update and
> the move to COLL_DIST state are not synchronized, and the port is not
> eligible to be added to the array until it reaches COLL_DIST state.
>
>         This is managed by __enable/__disable_port, which changes the
> backup flag that bond_update_slave_arr uses to test for eligibility in
> 802.3ad mode (via bond_slave_can_tx).
>
> [ from an earlier message ]
>>It is also possible to reproduce this with two aggregators with two
>>links each. The steps are:
>>
>>   Agg 1   Agg 2
>>   P1 P2   P3 P4
>>   X   X   X   X   OK (Agg 2 active)
>>   X   X   X   -   OK (Agg 1 active)
>>   X   -   X   -   OK (Agg 1 active)
>>   -   -   X   -   Fail (Agg 2 active)
>
>         I think I see the sequence of events now; stepping
> through the above for P3:
>
>         1) Should be up and ok.  Presumably COLL_DIST state.
>
>         2) P3 is run through __disable_port by agg_selection at "LAG %d
> chosen as the active LAG" when "Agg 1" is made the active aggregator.
> Still likely COLL_DIST state, as all this does is set slave->inactive
> (which affects packet RX logic, but not the LACP state).  I don't see a
> message for "Disabling port" in the debug log that would be printed if
> it went through ad_disable_coll/dist.

>         3) no change to P3
>
>         4) link change calls ad_agg_selection_logic, and then updates
> slave array, but P3 is still ->inactive, so isn't added.  Later, the
> state machine calls ad_mux_machine, which hits the case that calls
> __enable_port, but this case doesn't also update the slave array,
> leading to the issue at hand.

Yes, this is what I'm seeing after adding debug information to the
relevant calls.


>         Prior to the slave arrays, __enable_port would be enough to
> permit the port to send and receive traffic, but an array update needs
> to be done when __enable_port actually enables the port.


>>I was able to get it working by setting the update_slave_arr in
>>ad_mux_machine when a port is enabled. I'm not sure if this is the best
>>place to get bond_update_slave_arr() invoked, but it seems to do the
>>trick.
>>
>>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>index ca81f46..9b8653c 100644
>>--- a/drivers/net/bonding/bond_3ad.c
>>+++ b/drivers/net/bonding/bond_3ad.c
>>@@ -970,6 +980,9 @@ static void ad_mux_machine(struct port *port, bool
>>*update_slave_arr)
>>                                    !__port_is_enabled(port)) {
>>
>>                                        __enable_port(port);
>>+
>>+                                       if (__port_is_enabled(port))
>>+                                               *update_slave_arr = true;
>>                                }
>>                        }
>>                        break;
>
>         The part I didn't follow at first is that later on in
> ad_mux_machine it does:
>
>                 switch (port->sm_mux_state) {
> [...]
>                 case AD_MUX_COLLECTING_DISTRIBUTING:
>                         port->actor_oper_port_state |= AD_STATE_COLLECTING;
>                         port->actor_oper_port_state |= AD_STATE_DISTRIBUTING;
>                         port->actor_oper_port_state |= AD_STATE_SYNCHRONIZATION;
>                         ad_enable_collecting_distributing(port,
>                                                           update_slave_arr);
>                         port->ntt = true;
>                         break;
>
>         and ad_enable_collecting_distributing should set
> update_slave_arr, so in principle it shouldn't be necessary to add your
> logic to ad_mux_machine.
>
>         But, this switch case section will run first (because it occurs
> when the port initially goes into COLL_DIST state), and the earlier
> section that you changed happens the next time through the state
> machine, when the port is already in COLL_DIST state.  So, the section I
> copied above setting update_slave_arr won't do anything, because the
> port is not enabled until the next pass through the state machine.
>
>         I wonder if this is the real problem here: that the port does
> not become enabled until the second pass through the state machine in
> COLL_DIST state, and then only as a failsafe:
>
>                                 /* if port state hasn't changed make
>                                  * sure that a collecting distributing
>                                  * port in an active aggregator is enabled
>                                  */
>
>         Perhaps this happens because in your case the port enters (or
> stays in) COLL_DIST state when it is not part of the active aggregator,
> and ad_enable_coll/dist doesn't call __enable_port for the port.

I added some debugging there and this seems to be the case.

>         The current call to __enable_port in ad_agg_selection_logic will
> only enable the port if it has no partner and is the active aggregator,
> which should be the case only when the port's link partner is not
> enabled for LACP, so in theory that should not apply here.

After adding some more debugging, it looks like ad_agg_selection_logic()
calls __enable_port() here when the bond is being set up, and the first
port ends up being treated as an individual port. But there's no call to
__enable_port() here after the bond has been set up properly and a link
partner has been found:

        if (active) {
                if (!__agg_has_partner(active)) {
                        for (port = active->lag_ports; port;
                             port = port->next_port_in_aggregator) {
                                __enable_port(port);
                        }
                }
        }
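
If I read __agg_has_partner() right, it just checks whether a partner
system MAC has been learned yet (roughly; I may be misremembering the
exact form):

        static inline int __agg_has_partner(struct aggregator *agg)
        {
                /* An all-zero partner MAC means no LACP partner has
                 * been learned yet, i.e. the aggregator looks
                 * "individual".
                 */
                return !is_zero_ether_addr(agg->partner_system.mac_addr);
        }

which would explain why the first port looks individual during setup,
before any LACPDU exchange has happened.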


>>The earlier patch that called __enable_port() in
>>ad_agg_selection_logic probably did the same by getting the inactive
>>and backup flags cleared so that bond_update_slave_arr() included the
>>ports in the xmit hash, but I haven't looked through the steps there.
>>I really cannot claim to understand all the logic in the code, so this
>>might be totally wrong..
>
>         Either fix (agg_selection_logic enabling all ports in the new
> active agg, or ad_mux_machine setting *update_slave_arr = true) seems to
> resolve the issue.  I'm not 100% sure about edge cases yet, though.
>
>         We're in somewhat uncharted territory here with regard to
> the behavior required by the standard, since it doesn't really address
> choosing between multiple possible aggregators.
>
>         Thinking about it, I am leaning towards the logic of your prior
> patch, the one that enables all ports when an aggregator is made active
> in ad_agg_selection_logic.  That feels to me to be a clearer fix, and I
> think it may get the slave array properly updated 100ms faster (as it
> won't wait for a subsequent invocation of the state machine).
>
>         I think the following will do the right thing (disclaimer: I
> have not even tried to compile this):

I did some testing with a similar change yesterday, before posting the
other patch, and this patch does fix the problem, at least in this
case.

Calling __enable_port() in ad_agg_selection_logic does seem to get the
state corrected faster, and I'd say there are fewer dropped packets
than with having ad_mux_machine set *update_slave_arr = true. But I
really don't have an environment where the traffic would be the same
between tests, so this is just my impression.

> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index edc70ffad660..2da5be91226e 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -1713,18 +1713,12 @@ static void ad_agg_selection_logic(struct aggregator *agg,
>                 *update_slave_arr = true;
>         }
>
> -       /* if the selected aggregator is of join individuals
> -        * (partner_system is NULL), enable their ports
> -        */
>         active = __get_active_agg(origin);
>
>         if (active) {
> -               if (!__agg_has_partner(active)) {
> -                       for (port = active->lag_ports; port;
> -                            port = port->next_port_in_aggregator) {
> -                               __enable_port(port);
> -                       }
> -               }
> +               for (port = active->lag_ports; port;
> +                    port = port->next_port_in_aggregator)
> +                       __enable_port(port);
>         }
>
>         rcu_read_unlock();
>
>
>         Rather than adding a new loop as your original patch did, this
> one repurposes the existing loop for individual links and does the
> enable for all cases.
>
>         I think this may still need a check for the port state in there,
> as ports in the aggregator may be link up but still not suitable for
> TX/RX if the LACP negotiation doesn't put the port into COLL_DIST state.
>
>         Enabling a port that is already enabled is harmless, and if the
> port is link down, the enable will do nothing, so I think this should
> resolve things.

I can test any patches with additional checks, but this change seems
to fix it for us.

Thanks for the help! It has been enlightening to dig through the code
to actually understand what happens.

Veli-Matti

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
  2016-07-08 12:22               ` Veli-Matti Lintu
@ 2016-08-18  9:28                 ` Veli-Matti Lintu
  0 siblings, 0 replies; 14+ messages in thread
From: Veli-Matti Lintu @ 2016-08-18  9:28 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, David S. Miller

2016-07-08 15:22 GMT+03:00 Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>:
> 2016-07-08 4:48 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:

>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index edc70ffad660..2da5be91226e 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -1713,18 +1713,12 @@ static void ad_agg_selection_logic(struct aggregator *agg,
>>                 *update_slave_arr = true;
>>         }
>>
>> -       /* if the selected aggregator is of join individuals
>> -        * (partner_system is NULL), enable their ports
>> -        */
>>         active = __get_active_agg(origin);
>>
>>         if (active) {
>> -               if (!__agg_has_partner(active)) {
>> -                       for (port = active->lag_ports; port;
>> -                            port = port->next_port_in_aggregator) {
>> -                               __enable_port(port);
>> -                       }
>> -               }
>> +               for (port = active->lag_ports; port;
>> +                    port = port->next_port_in_aggregator)
>> +                       __enable_port(port);
>>         }
>>
>>         rcu_read_unlock();
>>
>>
>>         Rather than adding a new loop as your original patch did, this
>> one repurposes the existing loop for individual links and does the
>> enable for all cases.
>>
>>         I think this may still need a check for the port state in there,
>> as ports in the aggregator may be link up but still not suitable for
>> TX/RX if the LACP negotiation doesn't put the port into COLL_DIST state.
>>
>>         Enabling a port that is already enabled is harmless, and if the
>> port is link down, the enable will do nothing, so I think this should
>> resolve things.
>
> I can test any patches with additional checks, but this change seems
> to fix it for us.

Do you think something else is needed in addition to this? This has
been working on my servers for some time now without problems. I also
tested this against mainline 4.8-rc1, which needed it for bonding to
work properly in the previous test case.

Veli-Matti

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-08-18  9:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-23 21:20 [PATCH net] bonding: fix 802.3ad aggregator reselection Jay Vosburgh
2016-06-28  8:20 ` David Miller
2016-06-28 11:57 ` Veli-Matti Lintu
2016-06-29 15:59   ` Jay Vosburgh
2016-06-30 11:15     ` Veli-Matti Lintu
2016-07-05 14:01       ` Veli-Matti Lintu
2016-07-05 20:52         ` Veli-Matti Lintu
2016-07-05 21:20         ` Jay Vosburgh
2016-07-07 20:04           ` Veli-Matti Lintu
2016-07-08  1:48             ` Jay Vosburgh
2016-07-08 12:22               ` Veli-Matti Lintu
2016-08-18  9:28                 ` Veli-Matti Lintu
     [not found]   ` <CAD=hENfHOVBzwrC2Yy371FvxRG9AyJpvAm+s4R6hzU_+o=tJYA@mail.gmail.com>
2016-07-05 13:59     ` Veli-Matti Lintu
2016-07-05 20:02       ` Jay Vosburgh
