netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad
@ 2019-09-18 13:06 zhangsha.zhang
  2019-09-18 13:35 ` zhangsha (A)
  2019-09-24 13:56 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: zhangsha.zhang @ 2019-09-18 13:06 UTC (permalink / raw)
  To: jay.vosburgh, vfalico, andy, davem, netdev, linux-kernel,
	yuehaibing, hunongda, alex.chen, zhangsha.zhang

From: Sha Zhang <zhangsha.zhang@huawei.com>

After the commit 334031219a84 ("bonding/802.3ad: fix slave link
initialization transition states") merged,
the slave's link status will be changed to BOND_LINK_FAIL
from BOND_LINK_DOWN in the following scenario:
- Driver reports loss of carrier and
  bonding driver receives NETDEV_DOWN notifier
- slave's duplex and speed is zerod and
  its port->is_enabled is cleard to 'false';
- Driver reports link recovery and
  bonding driver receives NETDEV_UP notifier;
- If speed/duplex getting failed here, the link status
  will be changed to BOND_LINK_FAIL;
- The MII monotor later recover the slave's speed/duplex
  and set link status to BOND_LINK_UP, but remains
  the 'port->is_enabled' to 'false'.

In this scenario, the lacp port will not be enabled even its speed
and duplex are valid. The bond will not send LACPDU's, and its
state is 'AD_STATE_DEFAULTED' forever. The simplest fix I think
is to call bond_3ad_handle_link_change() in bond_miimon_commit,
this function can enable lacp after port slave speed check.
As enabled, the lacp port can run its state machine normally
after link recovery.

Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
---
 drivers/net/bonding/bond_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 931d9d9..76324a5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2206,7 +2206,8 @@ static void bond_miimon_commit(struct bonding *bond)
 			 */
 			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
 			    slave->link == BOND_LINK_UP)
-				bond_3ad_adapter_speed_duplex_changed(slave);
+				bond_3ad_handle_link_change(slave,
+							    BOND_LINK_UP);
 			continue;
 
 		case BOND_LINK_UP:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad
  2019-09-18 13:06 [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad zhangsha.zhang
@ 2019-09-18 13:35 ` zhangsha (A)
  2019-09-19  8:11   ` Jay Vosburgh
  2019-09-24 13:56 ` David Miller
  1 sibling, 1 reply; 5+ messages in thread
From: zhangsha (A) @ 2019-09-18 13:35 UTC (permalink / raw)
  To: jay.vosburgh, vfalico, andy, davem, netdev, linux-kernel,
	yuehaibing, hunongda, Chenzhendong (alex)



> -----Original Message-----
> From: zhangsha (A)
> Sent: 2019年9月18日 21:06
> To: jay.vosburgh@canonical.com; vfalico@gmail.com; andy@greyhouse.net;
> davem@davemloft.net; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> yuehaibing <yuehaibing@huawei.com>; hunongda <hunongda@huawei.com>;
> Chenzhendong (alex) <alex.chen@huawei.com>; zhangsha (A)
> <zhangsha.zhang@huawei.com>
> Subject: [PATCH v3] bonding: force enable lacp port after link state recovery for
> 802.3ad
> 
> From: Sha Zhang <zhangsha.zhang@huawei.com>
> 
> After the commit 334031219a84 ("bonding/802.3ad: fix slave link initialization
> transition states") merged, the slave's link status will be changed to
> BOND_LINK_FAIL from BOND_LINK_DOWN in the following scenario:
> - Driver reports loss of carrier and
>   bonding driver receives NETDEV_DOWN notifier
> - slave's duplex and speed is zerod and
>   its port->is_enabled is cleard to 'false';
> - Driver reports link recovery and
>   bonding driver receives NETDEV_UP notifier;
> - If speed/duplex getting failed here, the link status
>   will be changed to BOND_LINK_FAIL;
> - The MII monotor later recover the slave's speed/duplex
>   and set link status to BOND_LINK_UP, but remains
>   the 'port->is_enabled' to 'false'.
> 
> In this scenario, the lacp port will not be enabled even its speed and duplex are
> valid. The bond will not send LACPDU's, and its state is 'AD_STATE_DEFAULTED'
> forever. The simplest fix I think is to call bond_3ad_handle_link_change() in
> bond_miimon_commit, this function can enable lacp after port slave speed
> check.
> As enabled, the lacp port can run its state machine normally after link recovery.
> 
> Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
> ---
>  drivers/net/bonding/bond_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c
> b/drivers/net/bonding/bond_main.c index 931d9d9..76324a5 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2206,7 +2206,8 @@ static void bond_miimon_commit(struct bonding
> *bond)
>  			 */
>  			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
>  			    slave->link == BOND_LINK_UP)
> -
> 	bond_3ad_adapter_speed_duplex_changed(slave);
> +				bond_3ad_handle_link_change(slave,
> +							    BOND_LINK_UP);
>  			continue;
> 
>  		case BOND_LINK_UP:

Hi, David,
I have replied your email for a while,  I guess you may miss my email, so I resend it.
The following link address is the last email, please review the new one again, thank you.
https://patchwork.ozlabs.org/patch/1151915/

Last time, you doubted this is a driver specific problem,
I prefer to believe it's not because I find the commit 4d2c0cda,
its log says " Some NIC drivers don't have correct speed/duplex 
settings at the time they send NETDEV_UP notification ...".

Anyway, I think the lacp status should be fixed correctly,
since link-monitoring (miimon) set SPEED/DUPLEX right here.

> --
> 1.8.3.1



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad
  2019-09-18 13:35 ` zhangsha (A)
@ 2019-09-19  8:11   ` Jay Vosburgh
  2019-09-19 12:46     ` zhangsha (A)
  0 siblings, 1 reply; 5+ messages in thread
From: Jay Vosburgh @ 2019-09-19  8:11 UTC (permalink / raw)
  To: zhangsha (A)
  Cc: vfalico, andy, davem, netdev, linux-kernel, yuehaibing, hunongda,
	Chenzhendong (alex)

zhangsha (A) <zhangsha.zhang@huawei.com> wrote:

>> -----Original Message-----
>> From: zhangsha (A)
>> Sent: 2019年9月18日 21:06
>> To: jay.vosburgh@canonical.com; vfalico@gmail.com; andy@greyhouse.net;
>> davem@davemloft.net; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> yuehaibing <yuehaibing@huawei.com>; hunongda <hunongda@huawei.com>;
>> Chenzhendong (alex) <alex.chen@huawei.com>; zhangsha (A)
>> <zhangsha.zhang@huawei.com>
>> Subject: [PATCH v3] bonding: force enable lacp port after link state recovery for
>> 802.3ad
>> 
>> From: Sha Zhang <zhangsha.zhang@huawei.com>
>> 
>> After the commit 334031219a84 ("bonding/802.3ad: fix slave link initialization
>> transition states") merged, the slave's link status will be changed to
>> BOND_LINK_FAIL from BOND_LINK_DOWN in the following scenario:
>> - Driver reports loss of carrier and
>>   bonding driver receives NETDEV_DOWN notifier
>> - slave's duplex and speed is zerod and
>>   its port->is_enabled is cleard to 'false';
>> - Driver reports link recovery and
>>   bonding driver receives NETDEV_UP notifier;
>> - If speed/duplex getting failed here, the link status
>>   will be changed to BOND_LINK_FAIL;
>> - The MII monotor later recover the slave's speed/duplex
>>   and set link status to BOND_LINK_UP, but remains
>>   the 'port->is_enabled' to 'false'.
>> 
>> In this scenario, the lacp port will not be enabled even its speed and duplex are
>> valid. The bond will not send LACPDU's, and its state is 'AD_STATE_DEFAULTED'
>> forever. The simplest fix I think is to call bond_3ad_handle_link_change() in
>> bond_miimon_commit, this function can enable lacp after port slave speed
>> check.
>> As enabled, the lacp port can run its state machine normally after link recovery.
>> 
>> Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
>> ---
>>  drivers/net/bonding/bond_main.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/bonding/bond_main.c
>> b/drivers/net/bonding/bond_main.c index 931d9d9..76324a5 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2206,7 +2206,8 @@ static void bond_miimon_commit(struct bonding
>> *bond)
>>  			 */
>>  			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
>>  			    slave->link == BOND_LINK_UP)
>> -
>> 	bond_3ad_adapter_speed_duplex_changed(slave);
>> +				bond_3ad_handle_link_change(slave,
>> +							    BOND_LINK_UP);
>>  			continue;
>> 
>>  		case BOND_LINK_UP:
>
>Hi, David,
>I have replied your email for a while,  I guess you may miss my email, so I resend it.
>The following link address is the last email, please review the new one again, thank you.
>https://patchwork.ozlabs.org/patch/1151915/
>
>Last time, you doubted this is a driver specific problem,
>I prefer to believe it's not because I find the commit 4d2c0cda,
>its log says " Some NIC drivers don't have correct speed/duplex 
>settings at the time they send NETDEV_UP notification ...".
>
>Anyway, I think the lacp status should be fixed correctly,
>since link-monitoring (miimon) set SPEED/DUPLEX right here.

	I suspect this is going to be related to the concurrent
discussion with Aleksei, and would like to see the instrumentation
results from his tests before adding another change to attempt to
resolve this.

	Also, what device are you using for your testing, and are you
able to run the instrumentation patch that I provided to Aleksei and
provide its results?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad
  2019-09-19  8:11   ` Jay Vosburgh
@ 2019-09-19 12:46     ` zhangsha (A)
  0 siblings, 0 replies; 5+ messages in thread
From: zhangsha (A) @ 2019-09-19 12:46 UTC (permalink / raw)
  To: Jay Vosburgh, zaharov
  Cc: vfalico, andy, davem, netdev, linux-kernel, yuehaibing, hunongda,
	Chenzhendong (alex)



> -----Original Message-----
> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
> Sent: 2019年9月19日 16:12
> To: zhangsha (A) <zhangsha.zhang@huawei.com>
> Cc: vfalico@gmail.com; andy@greyhouse.net; davem@davemloft.net;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; yuehaibing
> <yuehaibing@huawei.com>; hunongda <hunongda@huawei.com>;
> Chenzhendong (alex) <alex.chen@huawei.com>
> Subject: Re: [PATCH v3] bonding: force enable lacp port after link state
> recovery for 802.3ad
> 
> zhangsha (A) <zhangsha.zhang@huawei.com> wrote:
> 
> >> -----Original Message-----
> >> From: zhangsha (A)
> >> Sent: 2019年9月18日 21:06
> >> To: jay.vosburgh@canonical.com; vfalico@gmail.com;
> >> andy@greyhouse.net; davem@davemloft.net; netdev@vger.kernel.org;
> >> linux-kernel@vger.kernel.org; yuehaibing <yuehaibing@huawei.com>;
> >> hunongda <hunongda@huawei.com>; Chenzhendong (alex)
> >> <alex.chen@huawei.com>; zhangsha (A) <zhangsha.zhang@huawei.com>
> >> Subject: [PATCH v3] bonding: force enable lacp port after link state
> >> recovery for 802.3ad
> >>
> >> From: Sha Zhang <zhangsha.zhang@huawei.com>
> >>
> >> After the commit 334031219a84 ("bonding/802.3ad: fix slave link
> >> initialization transition states") merged, the slave's link status
> >> will be changed to BOND_LINK_FAIL from BOND_LINK_DOWN in the
> following scenario:
> >> - Driver reports loss of carrier and
> >>   bonding driver receives NETDEV_DOWN notifier
> >> - slave's duplex and speed is zerod and
> >>   its port->is_enabled is cleard to 'false';
> >> - Driver reports link recovery and
> >>   bonding driver receives NETDEV_UP notifier;
> >> - If speed/duplex getting failed here, the link status
> >>   will be changed to BOND_LINK_FAIL;
> >> - The MII monotor later recover the slave's speed/duplex
> >>   and set link status to BOND_LINK_UP, but remains
> >>   the 'port->is_enabled' to 'false'.
> >>
> >> In this scenario, the lacp port will not be enabled even its speed
> >> and duplex are valid. The bond will not send LACPDU's, and its state is
> 'AD_STATE_DEFAULTED'
> >> forever. The simplest fix I think is to call
> >> bond_3ad_handle_link_change() in bond_miimon_commit, this function
> >> can enable lacp after port slave speed check.
> >> As enabled, the lacp port can run its state machine normally after link
> recovery.
> >>
> >> Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
> >> ---
> >>  drivers/net/bonding/bond_main.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/bonding/bond_main.c
> >> b/drivers/net/bonding/bond_main.c index 931d9d9..76324a5 100644
> >> --- a/drivers/net/bonding/bond_main.c
> >> +++ b/drivers/net/bonding/bond_main.c
> >> @@ -2206,7 +2206,8 @@ static void bond_miimon_commit(struct bonding
> >> *bond)
> >>  			 */
> >>  			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> >>  			    slave->link == BOND_LINK_UP)
> >> -
> >> 	bond_3ad_adapter_speed_duplex_changed(slave);
> >> +				bond_3ad_handle_link_change(slave,
> >> +							    BOND_LINK_UP);
> >>  			continue;
> >>
> >>  		case BOND_LINK_UP:
> >
> >Hi, David,
> >I have replied your email for a while,  I guess you may miss my email, so I
> resend it.
> >The following link address is the last email, please review the new one again,
> thank you.
> >https://patchwork.ozlabs.org/patch/1151915/
> >
> >Last time, you doubted this is a driver specific problem, I prefer to
> >believe it's not because I find the commit 4d2c0cda, its log says "
> >Some NIC drivers don't have correct speed/duplex settings at the time
> >they send NETDEV_UP notification ...".
> >
> >Anyway, I think the lacp status should be fixed correctly, since
> >link-monitoring (miimon) set SPEED/DUPLEX right here.
> 
> 	I suspect this is going to be related to the concurrent discussion with
> Aleksei, and would like to see the instrumentation results from his tests before
> adding another change to attempt to resolve this.
> 
> 	Also, what device are you using for your testing, and are you able to
> run the instrumentation patch that I provided to Aleksei and provide its results?
> 	
> 	-J
> 

Yes, I think it's the same problem.
I am using a Huawei hinic card with kernel 4.19 and got the same message and the
weird system mac "00:00:00:00:00:00":
"link status definitely down for interface eth6, disabling it
 link status up again after 0 ms for interface eth6"

I will run your instrumentation patch, but maybe I need more times.
In fact, I have tried to reproduce the problem for thousands of times, but never succeeded.

> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad
  2019-09-18 13:06 [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad zhangsha.zhang
  2019-09-18 13:35 ` zhangsha (A)
@ 2019-09-24 13:56 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2019-09-24 13:56 UTC (permalink / raw)
  To: zhangsha.zhang
  Cc: jay.vosburgh, vfalico, andy, netdev, linux-kernel, yuehaibing,
	hunongda, alex.chen

From: <zhangsha.zhang@huawei.com>
Date: Wed, 18 Sep 2019 21:06:20 +0800

> From: Sha Zhang <zhangsha.zhang@huawei.com>
> 
> After the commit 334031219a84 ("bonding/802.3ad: fix slave link
> initialization transition states") merged,
> the slave's link status will be changed to BOND_LINK_FAIL
> from BOND_LINK_DOWN in the following scenario:
> - Driver reports loss of carrier and
>   bonding driver receives NETDEV_DOWN notifier
> - slave's duplex and speed is zerod and
>   its port->is_enabled is cleard to 'false';
> - Driver reports link recovery and
>   bonding driver receives NETDEV_UP notifier;
> - If speed/duplex getting failed here, the link status
>   will be changed to BOND_LINK_FAIL;
> - The MII monotor later recover the slave's speed/duplex
>   and set link status to BOND_LINK_UP, but remains
>   the 'port->is_enabled' to 'false'.
> 
> In this scenario, the lacp port will not be enabled even its speed
> and duplex are valid. The bond will not send LACPDU's, and its
> state is 'AD_STATE_DEFAULTED' forever. The simplest fix I think
> is to call bond_3ad_handle_link_change() in bond_miimon_commit,
> this function can enable lacp after port slave speed check.
> As enabled, the lacp port can run its state machine normally
> after link recovery.
> 
> Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>

Please work out with Jay and Aleksei how to properly fix this.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-09-24 13:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-18 13:06 [PATCH v3] bonding: force enable lacp port after link state recovery for 802.3ad zhangsha.zhang
2019-09-18 13:35 ` zhangsha (A)
2019-09-19  8:11   ` Jay Vosburgh
2019-09-19 12:46     ` zhangsha (A)
2019-09-24 13:56 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).