netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Алексей Захаров" <zaharov@selectel.ru>
To: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: netdev@vger.kernel.org
Subject: Re: Fwd: [PATCH] bonding/802.3ad: fix slave initialization states race
Date: Thu, 19 Sep 2019 12:53:14 +0300	[thread overview]
Message-ID: <CAJYOGF87z-o9=a20dC2mZRtfMU58uL0yxZkQJ-bxe5skVvi2rA@mail.gmail.com> (raw)
In-Reply-To: <9357.1568880036@nyx>

чт, 19 сент. 2019 г. в 11:00, Jay Vosburgh <jay.vosburgh@canonical.com>:
>
> Алексей Захаров wrote:
>
> >> >Once a while, one of 802.3ad slaves fails to initialize and hangs in
> >> >BOND_LINK_FAIL state. Commit 334031219a84 ("bonding/802.3ad: fix slave
> >> >link initialization transition states") checks slave->last_link_up. But
> >> >link can still hang in weird state.
> >> >After physical link comes up it sends first two LACPDU messages and
> >> >doesn't work properly after that. It doesn't send or receive LACPDU.
> >> >Once it happens, the only message in dmesg is:
> >> >bond1: link status up again after 0 ms for interface eth2
> >>
> >>         I believe this message indicates that the slave entered
> >> BOND_LINK_FAIL state, but downdelay was not set.  The _FAIL state is
> >> really for managing the downdelay expiration, and a slave should not be
> >> in that state (outside of a brief transition entirely within
> >> bond_miimon_inspect) if downdelay is 0.
> >That's true, downdelay was set to 0, we only use updelay 500.
> >Does it mean, that the bonding driver shouldn't set slave to FAIL
> >state in this case?
>
>         It really shouldn't change the slave->link outside of the
> monitoring functions at all, because there are side effects that are not
> happening (user space notifications, updelay / downdelay, etc).
>
> >> >This behavior can be reproduced (not every time):
> >> >1. Set slave link down
> >> >2. Wait for 1-3 seconds
> >> >3. Set slave link up
> >> >
> >> >The fix is to check slave->link before setting it to BOND_LINK_FAIL or
> >> >BOND_LINK_DOWN state. If got invalid Speed/Dupex values and link is in
> >> >BOND_LINK_UP state, mark it as BOND_LINK_FAIL; otherwise mark it as
> >> >BOND_LINK_DOWN.
> >> >
> >> >Fixes: 334031219a84 ("bonding/802.3ad: fix slave link initialization
> >> >transition states")
> >> >Signed-off-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
> >> >---
> >> > drivers/net/bonding/bond_main.c | 2 +-
> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> >index 931d9d935686..a28776d8f33f 100644
> >> >--- a/drivers/net/bonding/bond_main.c
> >> >+++ b/drivers/net/bonding/bond_main.c
> >> >@@ -3135,7 +3135,7 @@ static int bond_slave_netdev_event(unsigned long event,
> >> >                */
> >> >               if (bond_update_speed_duplex(slave) &&
> >> >                   BOND_MODE(bond) == BOND_MODE_8023AD) {
> >> >-                      if (slave->last_link_up)
> >> >+                      if (slave->link == BOND_LINK_UP)
> >> >                               slave->link = BOND_LINK_FAIL;
> >> >                       else
> >> >                               slave->link = BOND_LINK_DOWN;
> >>
> >>         Is the core problem here that slaves are reporting link up, but
> >> returning invalid values for speed and/or duplex?  If so, what network
> >> device are you testing with that is exhibiting this behavior?
> >That's true, because link becomes FAIL right in this block of code.
> >We use Mellanox ConnectX-3 Pro nic.
> >
> >>
> >>         If I'm not mistaken, there have been several iterations of
> >> hackery on this block of code to work around this same problem, and each
> >> time there's some corner case that still doesn't work.
> >As i can see, commit 4d2c0cda0744 ("bonding: speed/duplex update at
> >NETDEV_UP event")
> >introduced BOND_LINK_DOWN state if update speed/duplex failed.
> >
> >Commit ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
> >changed DOWN state to FAIL.
> >
> >Commit 334031219a84 ("bonding/802.3ad: fix slave link initialization
> >transition states")
> >implemented different new state for different current states, but it
> >was based on slave->last_link_up.
> >In our case slave->last_link_up !=0 when this code runs. But, slave is
> >not in UP state at the moment. It becomes
> >FAIL and hangs in this state.
> >So, it looks like checking if slave is in UP mode is more appropriate
> >here. At least it works in our case.
> >
> >There was one more commit 12185dfe4436 ("bonding: Force slave speed
> >check after link state recovery for 802.3ad")
> >but it doesn't help in our case.
> >
> >>
> >>         As Davem asked last time around, is the real problem that device
> >> drivers report carrier up but supply invalid speed and duplex state?
> >Probably, but I'm not quite sure right now. We didn't face this issue
> >before 4d2c0cda0744 and ea53abfab960
> >commits.
>
>         My concern here is that we keep adding special cases to this
> code apparently without really understanding the root cause of the
> failures.  4d2c0cda0744 asserts that there is a problem that drivers are
> not supplying speed and duplex information at NETDEV_UP time, but is not
> specific as to the details (hardware information).  Before we add
> another change, I would like to understand what the actual underlying
> cause of the failure is, and if yours is somehow different from what
> 4d2c0cda0744 or ea53abfab960 were fixing (or trying to fix).
>
>         Would it be possible for you to instrument the code here to dump
> out the duplex/speed failure information and carrier state of the slave
> device at this point when it fails in your testing?  Something like the
> following (which I have not compile tested):
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 931d9d935686..758af8c2b9e1 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -378,15 +378,22 @@ static int bond_update_speed_duplex(struct slave *slave)
>         slave->duplex = DUPLEX_UNKNOWN;
>
>         res = __ethtool_get_link_ksettings(slave_dev, &ecmd);
> -       if (res < 0)
> +       if (res < 0) {
> +               pr_err("DBG ksettings res %d slave %s\n", res, slave_dev->name);
>                 return 1;
> -       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
> +       }
> +       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
> +               pr_err("DBG speed %u slave %s\n", ecmd.base.speed,
> +                      slave_dev->name);
>                 return 1;
> +       }
>         switch (ecmd.base.duplex) {
>         case DUPLEX_FULL:
>         case DUPLEX_HALF:
>                 break;
>         default:
> +               pr_err("DBG duplex %u slave %s\n", ecmd.base.duplex,
> +                      slave_dev->name);
>                 return 1;
>         }
>
> @@ -3135,6 +3142,9 @@ static int bond_slave_netdev_event(unsigned long event,
>                  */
>                 if (bond_update_speed_duplex(slave) &&
>                     BOND_MODE(bond) == BOND_MODE_8023AD) {
> +                       pr_err("DBG slave %s event %d carrier %d\n",
> +                              slave->dev->name, event,
> +                              netif_carrier_ok(slave->dev));
>                         if (slave->last_link_up)
>                                 slave->link = BOND_LINK_FAIL;
>                         else

Thanks, did that, without my patch. Here is the output when link doesn't work.
Host has actor port state 71 and partner port state 1:
[Thu Sep 19 12:14:04 2019] mlx4_en: eth2: Steering Mode 1
[Thu Sep 19 12:14:04 2019] DBG speed 4294967295 slave eth2
[Thu Sep 19 12:14:04 2019] DBG slave eth2 event 1 carrier 0
[Thu Sep 19 12:14:04 2019] 8021q: adding VLAN 0 to HW filter on device eth2
[Thu Sep 19 12:14:04 2019] mlx4_en: eth2: Link Up
[Thu Sep 19 12:14:04 2019] bond-san: link status up again after 0 ms
for interface eth2

Here is the output when everything works fine:
[Thu Sep 19 12:15:40 2019] mlx4_en: eth2: Steering Mode 1
[Thu Sep 19 12:15:40 2019] DBG speed 4294967295 slave eth2
[Thu Sep 19 12:15:40 2019] DBG slave eth2 event 1 carrier 0
[Thu Sep 19 12:15:40 2019] 8021q: adding VLAN 0 to HW filter on device eth2
[Thu Sep 19 12:15:40 2019] bond-san: link status definitely down for
interface eth2, disabling it
[Thu Sep 19 12:15:40 2019] mlx4_en: eth2: Link Up
[Thu Sep 19 12:15:40 2019] bond-san: link status up for interface
eth2, enabling it in 500 ms
[Thu Sep 19 12:15:41 2019] bond-san: link status definitely up for
interface eth2, 10000 Mbps full duplex

If I'm not mistaken, there's up event before carrier is up.


-- 
Best Regards,
Aleksei Zakharov
System administrator

  reply	other threads:[~2019-09-19  9:53 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18 13:05 [PATCH] bonding/802.3ad: fix slave initialization states race Aleksei Zakharov
2019-09-18 14:34 ` Jay Vosburgh
     [not found]   ` <CAJYOGF9KZdouvmTxQcTOQgsi-uBxbvW50K3ufW1=8neeW98QVA@mail.gmail.com>
2019-09-18 18:27     ` Fwd: " Алексей Захаров
2019-09-19  8:00       ` Jay Vosburgh
2019-09-19  9:53         ` Алексей Захаров [this message]
2019-09-19 15:27           ` Jay Vosburgh
2019-09-20 13:52             ` Jay Vosburgh
2019-09-20 16:00               ` Алексей Захаров
2019-09-21  7:06                 ` Jay Vosburgh
2019-09-21 11:17                   ` Алексей Захаров
2019-09-25  0:31                     ` Jay Vosburgh
2019-09-25 11:01                       ` Aleksei Zakharov
2019-09-26  4:38                         ` Jay Vosburgh
2019-09-26 14:25                           ` Aleksei Zakharov
2019-09-26 20:01                             ` Jay Vosburgh
2019-09-27 11:14                               ` Aleksei Zakharov
2019-09-27  9:43                         ` zhangsha (A)
2019-10-22 12:05                       ` Aleksei Zakharov
2019-09-24 13:52 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJYOGF87z-o9=a20dC2mZRtfMU58uL0yxZkQJ-bxe5skVvi2rA@mail.gmail.com' \
    --to=zaharov@selectel.ru \
    --cc=jay.vosburgh@canonical.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).