linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ivan Vecera <ivecera@redhat.com>
To: Jarod Wilson <jarod@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Thomas Davis <tadavis@lbl.gov>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] bonding: fix feature flag setting at init time
Date: Mon, 23 Nov 2020 09:19:21 +0100	[thread overview]
Message-ID: <20201123091921.0277d562@ceranb> (raw)
In-Reply-To: <20201123031716.6179-1-jarod@redhat.com>

On Sun, 22 Nov 2020 22:17:16 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> Have run into a case where bond_option_mode_set() gets called before
> hw_features has been filled in, and very bad things happen when
> netdev_change_features() then gets called, because the empty hw_features
> wipes out almost all features. Further reading of netdev feature flag
> documentation suggests drivers aren't supposed to touch wanted_features,
> so this changes bond_option_mode_set() to use netdev_increment_features()
> and &= ~BOND_XFRM_FEATURES on mode changes and then only calling
> netdev_features_change() if there was actually a change of features. This
> specifically fixes bonding on top of mlxsw interfaces, and has been
> regression-tested with ixgbe interfaces. This change also simplifies the
> xfrm-specific code in bond_setup() a little bit as well.

Hi Jarod,

the reason is not correct... The problem is not with empty ->hw_features but
with empty ->wanted_features.
During bond device creation bond_newlink() is called. It calls bond_changelink()
first and afterwards register_netdevice(). The problem is that ->wanted_features
are initialized in register_netdevice() so during bond_changlink() call
->wanted_features is 0. So...

bond_newlink()
-> bond_changelink()
   -> __bond_opt_set()
      -> bond_option_mode_set()
         -> netdev_change_features()
            -> __netdev_update_features()
               features = netdev_get_wanted_features()
                          { dev->features & ~dev->hw_features) | dev->wanted_features }

dev->wanted_features is here zero so the rest of the expression clears a bunch of
bits from dev->features...

In case of mlxsw it is important that NETIF_F_HW_VLAN_CTAG_FILTER bit is cleared in
bonding device because in this case vlan_add_rx_filter_info() does not call
bond_vlan_rx_add_vid() so mlxsw_sp_port_add_vid() is not called as well.

Later this causes a WARN in mlxsw_sp_inetaddr_port_vlan_event() because
instance of mlxsw_sp_port_vlan does not exist as mlxsw_sp_port_add_vid() was not
called.

Btw. it should be enough to call existing snippet in bond_option_mode_set() only
when device is already registered?

E.g.:
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..ca4913fee5a9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,13 @@ static int bond_option_mode_set(struct bonding *bond,
                bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-       if (newval->value == BOND_MODE_ACTIVEBACKUP)
-               bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-       else
-               bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-       netdev_change_features(bond->dev);
+       if (dev->reg_state == NETREG_REGISTERED) {
+               if (newval->value == BOND_MODE_ACTIVEBACKUP)
+                       bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+               else
+                       bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+               netdev_change_features(bond->dev);
+       }
 #endif /* CONFIG_XFRM_OFFLOAD */


Thanks,
Ivan


  reply	other threads:[~2020-11-23  8:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-23  3:17 [PATCH net] bonding: fix feature flag setting at init time Jarod Wilson
2020-11-23  8:19 ` Ivan Vecera [this message]
2020-11-23 15:32 ` kernel test robot
2020-12-02 17:30 ` [PATCH net v2] " Jarod Wilson
2020-12-02 17:41   ` Ivan Vecera
2020-12-02 17:53   ` Jakub Kicinski
2020-12-02 19:03     ` Jarod Wilson
2020-12-02 19:22       ` Jakub Kicinski
2020-12-02 19:39         ` Jarod Wilson
2020-12-02 17:55   ` Jay Vosburgh
2020-12-02 19:23     ` Jarod Wilson
2020-12-02 20:17       ` Jay Vosburgh
2020-12-02 20:54         ` Jarod Wilson
2020-12-03  0:43   ` [PATCH net v3] " Jarod Wilson
2020-12-03 16:45     ` Jakub Kicinski
2020-12-05 16:13       ` Jarod Wilson
2020-12-03 16:50     ` Jakub Kicinski
2020-12-04  3:14       ` Jarod Wilson
2020-12-04 15:45         ` Jakub Kicinski
2020-12-05 17:22     ` [PATCH net v4] " Jarod Wilson
2020-12-08 19:27       ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201123091921.0277d562@ceranb \
    --to=ivecera@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=j.vosburgh@gmail.com \
    --cc=jarod@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tadavis@lbl.gov \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).