All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Fainelli <f.fainelli@gmail.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Tobias Waldekranz <tobias@waldekranz.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	UNGLinuxDriver@microchip.com, Vadym Kochan <vkochan@marvell.com>,
	Taras Chornyi <tchornyi@marvell.com>,
	Grygorii Strashko <grygorii.strashko@ti.com>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Ioana Ciornei <ioana.ciornei@nxp.com>,
	Ivan Vecera <ivecera@redhat.com>,
	linux-omap@vger.kernel.org,
	Vladimir Oltean <vladimir.oltean@nxp.com>
Subject: Re: [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries when joining the bridge
Date: Mon, 22 Mar 2021 08:56:37 -0700	[thread overview]
Message-ID: <602ff919-61ce-2e8e-55f1-5a1bf21e90a1@gmail.com> (raw)
In-Reply-To: <20210320095311.noojqngoyobuntej@skbuf>



On 3/20/2021 2:53 AM, Vladimir Oltean wrote:
> On Fri, Mar 19, 2021 at 03:20:38PM -0700, Florian Fainelli wrote:
>>
>>
>> On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
>>> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>>>
>>> I have udhcpcd in my system and this is configured to bring interfaces
>>> up as soon as they are created.
>>>
>>> I create a bridge as follows:
>>>
>>> ip link add br0 type bridge
>>>
>>> As soon as I create the bridge and udhcpcd brings it up, I have some
>>> other crap (avahi)
>>
>> How dare you ;)
> 
> Well, it comes preinstalled on my system, I don't need it, and it has
> caused me nothing but trouble. So I think it has earned its title :D
> 
>>> that starts sending some random IPv6 packets to
>>> advertise some local services, and from there, the br0 bridge joins the
>>> following IPv6 groups:
>>>
>>> 33:33:ff:6d:c1:9c vid 0
>>> 33:33:00:00:00:6a vid 0
>>> 33:33:00:00:00:fb vid 0
>>>
>>> br_dev_xmit
>>> -> br_multicast_rcv
>>>    -> br_ip6_multicast_add_group
>>>       -> __br_multicast_add_group
>>>          -> br_multicast_host_join
>>>             -> br_mdb_notify
>>>
>>> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
>>> hooked up, and switchdev will attempt to offload the host joined groups
>>> to an empty list of ports. Of course nobody offloads them.
>>>
>>> Then when we add a port to br0:
>>>
>>> ip link set swp0 master br0
>>>
>>> the bridge doesn't replay the host-joined MDB entries from br_add_if,
>>> and eventually the host joined addresses expire, and a switchdev
>>> notification for deleting it is emitted, but surprise, the original
>>> addition was already completely missed.
>>>
>>> The strategy to address this problem is to replay the MDB entries (both
>>> the port ones and the host joined ones) when the new port joins the
>>> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
>>> be populated and only then attached to a bridge that you offload).
>>> However there are 2 possibilities: the addresses can be 'pushed' by the
>>> bridge into the port, or the port can 'pull' them from the bridge.
>>>
>>> Considering that in the general case, the new port can be really late to
>>> the party, and there may have been many other switchdev ports that
>>> already received the initial notification, we would like to avoid
>>> delivering duplicate events to them, since they might misbehave. And
>>> currently, the bridge calls the entire switchdev notifier chain, whereas
>>> for replaying it should just call the notifier block of the new guy.
>>> But the bridge doesn't know what is the new guy's notifier block, it
>>> just knows where the switchdev notifier chain is. So for simplification,
>>> we make this a driver-initiated pull for now, and the notifier block is
>>> passed as an argument.
>>>
>>> To emulate the calling context for mdb objects (deferred and put on the
>>> blocking notifier chain), we must iterate under RCU protection through
>>> the bridge's mdb entries, queue them, and only call them once we're out
>>> of the RCU read-side critical section.
>>>
>>> Suggested-by: Ido Schimmel <idosch@idosch.org>
>>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>>> ---
>>>  include/linux/if_bridge.h |  9 +++++
>>>  net/bridge/br_mdb.c       | 84 +++++++++++++++++++++++++++++++++++++++
>>>  net/dsa/dsa_priv.h        |  2 +
>>>  net/dsa/port.c            |  6 +++
>>>  net/dsa/slave.c           |  2 +-
>>>  5 files changed, 102 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>>> index ebd16495459c..4c25dafb013d 100644
>>> --- a/include/linux/if_bridge.h
>>> +++ b/include/linux/if_bridge.h
>>> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>>>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>>>  bool br_multicast_enabled(const struct net_device *dev);
>>>  bool br_multicast_router(const struct net_device *dev);
>>> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
>>> +		  struct notifier_block *nb, struct netlink_ext_ack *extack);
>>>  #else
>>>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>>>  					     struct list_head *br_ip_list)
>>> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
>>>  {
>>>  	return false;
>>>  }
>>> +static inline int br_mdb_replay(struct net_device *br_dev,
>>> +				struct net_device *dev,
>>> +				struct notifier_block *nb,
>>> +				struct netlink_ext_ack *extack)
>>> +{
>>> +	return -EINVAL;
>>
>> Should we return -EOPNOTUSPP such that this is not made fatal for DSA if
>> someone compiles its kernel with CONFIG_BRIDGE_IGMP_SNOOPING disabled?
> 
> Sure, I'll change the return values of the shims everywhere.
> 
>>> +}
>>>  #endif
>>>
>>>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
>>> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
>>> index 8846c5bcd075..23973186094c 100644
>>> --- a/net/bridge/br_mdb.c
>>> +++ b/net/bridge/br_mdb.c
>>> @@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
>>>  	kfree(priv);
>>>  }
>>>
>>> +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
>>> +			     struct net_bridge_mdb_entry *mp, int obj_id,
>>> +			     struct net_device *orig_dev,
>>> +			     struct netlink_ext_ack *extack)
>>> +{
>>> +	struct switchdev_notifier_port_obj_info obj_info = {
>>> +		.info = {
>>> +			.dev = dev,
>>> +			.extack = extack,
>>> +		},
>>> +	};
>>> +	struct switchdev_obj_port_mdb mdb = {
>>> +		.obj = {
>>> +			.orig_dev = orig_dev,
>>> +			.id = obj_id,
>>> +		},
>>> +		.vid = mp->addr.vid,
>>> +	};
>>> +	int err;
>>> +
>>> +	if (mp->addr.proto == htons(ETH_P_IP))
>>> +		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
>>> +#if IS_ENABLED(CONFIG_IPV6)
>>> +	else if (mp->addr.proto == htons(ETH_P_IPV6))
>>> +		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
>>> +#endif
>>> +	else
>>> +		ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
>>
>> How you would feel about re-using br_mdb_switchdev_host_port() here and
>> pass a 'type' value that is neither RTM_NEWDB nor RTM_DELDB just so you
>> don't have to duplicate that code here and we ensure it is in sync?
> 
> The trouble is that br_mdb_switchdev_host calls switchdev_port_obj_add,
> and I think the agreement was that replayed events should be a silent,
> one-to-one conversation via a direct call to the notifier block of the
> interested driver, as opposed to a call to the entire notifier chain
> which would make everybody else in the system see duplicates. This is
> the reason why I duplicated mostly everything.

It's not a whole lot of notification but if you passed a type argument
that is neither of the two supported value (say -1),
br_mdb_switchdev_host_port() would end its execution there, and that
would avoid the duplication altogether. I am not stuck on that idea and
can hardly think for now of why this function would change, or why the
switchdev_obj_port_mdb would change, too.
-- 
Florian

  reply	other threads:[~2021-03-22 15:57 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
2021-03-19 22:04   ` Florian Fainelli
2021-03-22 10:24   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
2021-03-19 22:05   ` Florian Fainelli
2021-03-22 10:25   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
2021-03-19  1:53   ` kernel test robot
2021-03-19 22:08   ` Florian Fainelli
2021-03-20 10:05     ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining Vladimir Oltean
2021-03-19 22:11   ` Florian Fainelli
2021-03-22 10:29   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge Vladimir Oltean
2021-03-19 22:11   ` Florian Fainelli
2021-03-22 10:30   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router " Vladimir Oltean
2021-03-19 22:12   ` Florian Fainelli
2021-03-22 11:17   ` Tobias Waldekranz
2021-03-22 11:43     ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time " Vladimir Oltean
2021-03-19 22:13   ` Florian Fainelli
2021-03-20 10:09     ` Vladimir Oltean
2021-03-22 11:20   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries " Vladimir Oltean
2021-03-19 22:20   ` Florian Fainelli
2021-03-20  9:53     ` Vladimir Oltean
2021-03-22 15:56       ` Florian Fainelli [this message]
2021-03-18 23:18 ` [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb " Vladimir Oltean
2021-03-22 15:44   ` Tobias Waldekranz
2021-03-22 16:19     ` Vladimir Oltean
2021-03-22 17:07       ` Tobias Waldekranz
2021-03-22 17:13         ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port " Vladimir Oltean
2021-03-19 22:24   ` Florian Fainelli
2021-03-18 23:18 ` [RFC PATCH v2 net-next 11/16] net: ocelot: support multiple bridges Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 12/16] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 13/16] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
2021-03-19  8:52   ` DENG Qingfang
2021-03-19  9:06     ` Vladimir Oltean
2021-03-19  9:29       ` DENG Qingfang
2021-03-19 10:49         ` Vladimir Oltean
2021-03-22  8:04           ` DENG Qingfang
2021-03-22 22:23             ` Vladimir Oltean
2021-03-22 16:06   ` Florian Fainelli
2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
2021-03-22 15:51   ` Florian Fainelli
2021-03-22 15:58   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
2021-03-19  2:31   ` kernel test robot
2021-03-19  2:46   ` kernel test robot
2021-03-22 16:30   ` Tobias Waldekranz
2021-03-22 17:19     ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=602ff919-61ce-2e8e-55f1-5a1bf21e90a1@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=grygorii.strashko@ti.com \
    --cc=idosch@idosch.org \
    --cc=ioana.ciornei@nxp.com \
    --cc=ivecera@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-omap@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=olteanv@gmail.com \
    --cc=roopa@nvidia.com \
    --cc=tchornyi@marvell.com \
    --cc=tobias@waldekranz.com \
    --cc=vigneshr@ti.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vkochan@marvell.com \
    --cc=vladimir.oltean@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.