From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A770C49ED7 for ; Thu, 19 Sep 2019 08:00:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4E8A321907 for ; Thu, 19 Sep 2019 08:00:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388316AbfISIAj convert rfc822-to-8bit (ORCPT ); Thu, 19 Sep 2019 04:00:39 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:51001 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387523AbfISIAj (ORCPT ); Thu, 19 Sep 2019 04:00:39 -0400 Received: from static-dcd-cqq-121001.business.bouyguestelecom.com ([212.194.121.1] helo=nyx.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iArMi-0001Hh-8X; Thu, 19 Sep 2019 08:00:36 +0000 Received: by nyx.localdomain (Postfix, from userid 1000) id 2D03324010F; Thu, 19 Sep 2019 10:00:36 +0200 (CEST) Received: from nyx (localhost [127.0.0.1]) by nyx.localdomain (Postfix) with ESMTP id 26DF7289C51; Thu, 19 Sep 2019 10:00:36 +0200 (CEST) From: Jay Vosburgh To: =?us-ascii?Q?=3D=3FUTF-8=3FB=3F0JDQu9C10LrRgdC10Lkg0JfQsNGF0LDRgNC+0LI?= =?us-ascii?Q?=3D=3F=3D?= cc: netdev@vger.kernel.org Subject: Re: Fwd: [PATCH] bonding/802.3ad: fix slave initialization states race In-reply-to: References: <20190918130545.GA11133@yandex.ru> <31893.1568817274@nyx> Comments: In-reply-to =?us-ascii?Q?=3D=3FUTF-8=3FB=3F0JDQu9C10LrRgdC10Lkg0?= =?us-ascii?Q?JfQsNGF0LDRgNC+0LI=3D=3F=3D?= message dated "Wed, 18 Sep 2019 21:27:05 +0300." X-Mailer: MH-E 8.5+bzr; nmh 1.7.1-RC3; GNU Emacs 27.0.50 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Date: Thu, 19 Sep 2019 10:00:36 +0200 Message-ID: <9357.1568880036@nyx> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Алексей Захаров wrote: >> >Once a while, one of 802.3ad slaves fails to initialize and hangs in >> >BOND_LINK_FAIL state. Commit 334031219a84 ("bonding/802.3ad: fix slave >> >link initialization transition states") checks slave->last_link_up. But >> >link can still hang in weird state. >> >After physical link comes up it sends first two LACPDU messages and >> >doesn't work properly after that. It doesn't send or receive LACPDU. >> >Once it happens, the only message in dmesg is: >> >bond1: link status up again after 0 ms for interface eth2 >> >> I believe this message indicates that the slave entered >> BOND_LINK_FAIL state, but downdelay was not set. The _FAIL state is >> really for managing the downdelay expiration, and a slave should not be >> in that state (outside of a brief transition entirely within >> bond_miimon_inspect) if downdelay is 0. >That's true, downdelay was set to 0, we only use updelay 500. >Does it mean, that the bonding driver shouldn't set slave to FAIL >state in this case? It really shouldn't change the slave->link outside of the monitoring functions at all, because there are side effects that are not happening (user space notifications, updelay / downdelay, etc). >> >This behavior can be reproduced (not every time): >> >1. Set slave link down >> >2. Wait for 1-3 seconds >> >3. Set slave link up >> > >> >The fix is to check slave->link before setting it to BOND_LINK_FAIL or >> >BOND_LINK_DOWN state. If got invalid Speed/Dupex values and link is in >> >BOND_LINK_UP state, mark it as BOND_LINK_FAIL; otherwise mark it as >> >BOND_LINK_DOWN. >> > >> >Fixes: 334031219a84 ("bonding/802.3ad: fix slave link initialization >> >transition states") >> >Signed-off-by: Aleksei Zakharov >> >--- >> > drivers/net/bonding/bond_main.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> >index 931d9d935686..a28776d8f33f 100644 >> >--- a/drivers/net/bonding/bond_main.c >> >+++ b/drivers/net/bonding/bond_main.c >> >@@ -3135,7 +3135,7 @@ static int bond_slave_netdev_event(unsigned long event, >> > */ >> > if (bond_update_speed_duplex(slave) && >> > BOND_MODE(bond) == BOND_MODE_8023AD) { >> >- if (slave->last_link_up) >> >+ if (slave->link == BOND_LINK_UP) >> > slave->link = BOND_LINK_FAIL; >> > else >> > slave->link = BOND_LINK_DOWN; >> >> Is the core problem here that slaves are reporting link up, but >> returning invalid values for speed and/or duplex? If so, what network >> device are you testing with that is exhibiting this behavior? >That's true, because link becomes FAIL right in this block of code. >We use Mellanox ConnectX-3 Pro nic. > >> >> If I'm not mistaken, there have been several iterations of >> hackery on this block of code to work around this same problem, and each >> time there's some corner case that still doesn't work. >As i can see, commit 4d2c0cda0744 ("bonding: speed/duplex update at >NETDEV_UP event") >introduced BOND_LINK_DOWN state if update speed/duplex failed. > >Commit ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") >changed DOWN state to FAIL. > >Commit 334031219a84 ("bonding/802.3ad: fix slave link initialization >transition states") >implemented different new state for different current states, but it >was based on slave->last_link_up. >In our case slave->last_link_up !=0 when this code runs. But, slave is >not in UP state at the moment. It becomes >FAIL and hangs in this state. >So, it looks like checking if slave is in UP mode is more appropriate >here. At least it works in our case. > >There was one more commit 12185dfe4436 ("bonding: Force slave speed >check after link state recovery for 802.3ad") >but it doesn't help in our case. > >> >> As Davem asked last time around, is the real problem that device >> drivers report carrier up but supply invalid speed and duplex state? >Probably, but I'm not quite sure right now. We didn't face this issue >before 4d2c0cda0744 and ea53abfab960 >commits. My concern here is that we keep adding special cases to this code apparently without really understanding the root cause of the failures. 4d2c0cda0744 asserts that there is a problem that drivers are not supplying speed and duplex information at NETDEV_UP time, but is not specific as to the details (hardware information). Before we add another change, I would like to understand what the actual underlying cause of the failure is, and if yours is somehow different from what 4d2c0cda0744 or ea53abfab960 were fixing (or trying to fix). Would it be possible for you to instrument the code here to dump out the duplex/speed failure information and carrier state of the slave device at this point when it fails in your testing? Something like the following (which I have not compile tested): diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 931d9d935686..758af8c2b9e1 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -378,15 +378,22 @@ static int bond_update_speed_duplex(struct slave *slave) slave->duplex = DUPLEX_UNKNOWN; res = __ethtool_get_link_ksettings(slave_dev, &ecmd); - if (res < 0) + if (res < 0) { + pr_err("DBG ksettings res %d slave %s\n", res, slave_dev->name); return 1; - if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) + } + if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) { + pr_err("DBG speed %u slave %s\n", ecmd.base.speed, + slave_dev->name); return 1; + } switch (ecmd.base.duplex) { case DUPLEX_FULL: case DUPLEX_HALF: break; default: + pr_err("DBG duplex %u slave %s\n", ecmd.base.duplex, + slave_dev->name); return 1; } @@ -3135,6 +3142,9 @@ static int bond_slave_netdev_event(unsigned long event, */ if (bond_update_speed_duplex(slave) && BOND_MODE(bond) == BOND_MODE_8023AD) { + pr_err("DBG slave %s event %d carrier %d\n", + slave->dev->name, event, + netif_carrier_ok(slave->dev)); if (slave->last_link_up) slave->link = BOND_LINK_FAIL; else Thanks, -J --- -Jay Vosburgh, jay.vosburgh@canonical.com