All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ivan Vecera <ivecera@redhat.com>
To: Jarod Wilson <jarod@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Thomas Davis <tadavis@lbl.gov>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] bonding: fix feature flag setting at init time
Date: Mon, 23 Nov 2020 09:19:21 +0100	[thread overview]
Message-ID: <20201123091921.0277d562@ceranb> (raw)
In-Reply-To: <20201123031716.6179-1-jarod@redhat.com>

On Sun, 22 Nov 2020 22:17:16 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> Have run into a case where bond_option_mode_set() gets called before
> hw_features has been filled in, and very bad things happen when
> netdev_change_features() then gets called, because the empty hw_features
> wipes out almost all features. Further reading of netdev feature flag
> documentation suggests drivers aren't supposed to touch wanted_features,
> so this changes bond_option_mode_set() to use netdev_increment_features()
> and &= ~BOND_XFRM_FEATURES on mode changes and then only calling
> netdev_features_change() if there was actually a change of features. This
> specifically fixes bonding on top of mlxsw interfaces, and has been
> regression-tested with ixgbe interfaces. This change also simplifies the
> xfrm-specific code in bond_setup() a little bit as well.

Hi Jarod,

the reason is not correct... The problem is not with empty ->hw_features but
with empty ->wanted_features.
During bond device creation bond_newlink() is called. It calls bond_changelink()
first and afterwards register_netdevice(). The problem is that ->wanted_features
are initialized in register_netdevice() so during bond_changlink() call
->wanted_features is 0. So...

bond_newlink()
-> bond_changelink()
   -> __bond_opt_set()
      -> bond_option_mode_set()
         -> netdev_change_features()
            -> __netdev_update_features()
               features = netdev_get_wanted_features()
                          { dev->features & ~dev->hw_features) | dev->wanted_features }

dev->wanted_features is here zero so the rest of the expression clears a bunch of
bits from dev->features...

In case of mlxsw it is important that NETIF_F_HW_VLAN_CTAG_FILTER bit is cleared in
bonding device because in this case vlan_add_rx_filter_info() does not call
bond_vlan_rx_add_vid() so mlxsw_sp_port_add_vid() is not called as well.

Later this causes a WARN in mlxsw_sp_inetaddr_port_vlan_event() because
instance of mlxsw_sp_port_vlan does not exist as mlxsw_sp_port_add_vid() was not
called.

Btw. it should be enough to call existing snippet in bond_option_mode_set() only
when device is already registered?

E.g.:
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..ca4913fee5a9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,13 @@ static int bond_option_mode_set(struct bonding *bond,
                bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-       if (newval->value == BOND_MODE_ACTIVEBACKUP)
-               bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-       else
-               bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-       netdev_change_features(bond->dev);
+       if (dev->reg_state == NETREG_REGISTERED) {
+               if (newval->value == BOND_MODE_ACTIVEBACKUP)
+                       bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+               else
+                       bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+               netdev_change_features(bond->dev);
+       }
 #endif /* CONFIG_XFRM_OFFLOAD */


Thanks,
Ivan


  reply	other threads:[~2020-11-23  8:19 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-23  3:17 [PATCH net] bonding: fix feature flag setting at init time Jarod Wilson
2020-11-23  8:19 ` Ivan Vecera [this message]
2020-11-23 15:32 ` kernel test robot
2020-11-23 15:32   ` kernel test robot
2020-12-02 17:30 ` [PATCH net v2] " Jarod Wilson
2020-12-02 17:41   ` Ivan Vecera
2020-12-02 17:53   ` Jakub Kicinski
2020-12-02 19:03     ` Jarod Wilson
2020-12-02 19:22       ` Jakub Kicinski
2020-12-02 19:39         ` Jarod Wilson
2020-12-02 17:55   ` Jay Vosburgh
2020-12-02 19:23     ` Jarod Wilson
2020-12-02 20:17       ` Jay Vosburgh
2020-12-02 20:54         ` Jarod Wilson
2020-12-03  0:43   ` [PATCH net v3] " Jarod Wilson
2020-12-03 16:45     ` Jakub Kicinski
2020-12-05 16:13       ` Jarod Wilson
2020-12-03 16:50     ` Jakub Kicinski
2020-12-04  3:14       ` Jarod Wilson
2020-12-04 15:45         ` Jakub Kicinski
2020-12-05 17:22     ` [PATCH net v4] " Jarod Wilson
2020-12-08 19:27       ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201123091921.0277d562@ceranb \
    --to=ivecera@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=j.vosburgh@gmail.com \
    --cc=jarod@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tadavis@lbl.gov \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.