Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
@ 2021-05-04 18:22 Joseph Huang
  2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

This series of patches contains the following fixes:

1. In a distributed system with multiple hardware-offloading bridges,
   if a multicast source is attached to a Non-Querier bridge, the bridge
   will not forward any multicast packets from that source to the Querier.

                    +--------------------+
                    |                    |
                    |      Snooping      |    +------------+
                    |      Bridge 1      |----| Listener 1 |
                    |     (Querier)      |    +------------+
                    |                    |
                    +--------------------+
                              |
                              |
                    +----+---------+-----+
                    |    | mrouter |     |
   +-----------+    |    +---------+     |    +------------+
   | MC Source |----|      Snooping      |----| Listener 2 |
   +-----------|    |      Bridge 2      |    +------------+
                    |    (Non-Querier)   |
                    +--------------------+

   In this scenario, Listener 1 will never receive multicast traffic
   from MC Source since Snooping Bridge 2 does not forward multicast
   packets to the mrouter port. Patches 0001, 0002, and 0003 address
   this issue.

2. If mcast_flood is disabled on a bridge port, some of the snooping
   functions stop working properly.

   a. Consider the following scenario:

                       +--------------------+
                       |                    |
                       |      Snooping      |    +------------+
                       |      Bridge 1      |----| Listener 1 |
                       |     (Querier)      |    +------------+
                       |                    |
                       +--------------------+
                                 |
                                 |
                       +--------------------+
                       |    | mrouter |     |
      +-----------+    |    +---------+     |
      | MC Source |----|      Snooping      |
      +-----------|    |      Bridge 2      |
                       |    (Non-Querier)   |
                       +--------------------+

      In this scenario, Listener 1 will never receive multicast traffic
      from MC Source if mcast_flood is disabled on the mrouter port on
      Snooping Bridge 2. Patch 0004 addresses this issue.

   b. For a Non-Querier bridge, if mcast_flood is disabled on a bridge
      port, Queries received from other Querier will not be forwarded
      out of that bridge port. Patch 0005 addresses this issue.

3. After a system boots up, the first couple Reports are not handled
   properly:

   1) the Report from the Host is being flooded (via br_flood) to all
      bridge ports, and
   2) if the mrouter port's mcast_flood is disabled, the Reports received
      from other hosts will not be forwarded to the Querier.

   Patch 0006 addresses this issue.

These patches were developed and verified initially against 5.4 kernel
(due to hardware platform limitation) and forward-patched to 5.12.
Snooping code introduced between 5.4 and 5.12 are not extensively tested
(only IGMPv2/MLDv1 were tested). The hardware platform used were two
bridges utilizing a single Marvell 88E6352 Ethernet switch chip (i.e.,
no cross-chip bridging involved).

Joseph Huang (6):
  bridge: Refactor br_mdb_notify
  bridge: Offload mrouter port forwarding to switchdev
  bridge: Avoid traffic disruption when Querier state changes
  bridge: Force mcast_flooding for mrouter ports
  bridge: Flood Queries even when mcast_flood is disabled
  bridge: Always multicast_flood Reports

 net/bridge/br_device.c    |   5 +-
 net/bridge/br_forward.c   |   3 +-
 net/bridge/br_input.c     |   5 +-
 net/bridge/br_mdb.c       |  70 +++++++++++++---------
 net/bridge/br_multicast.c | 121 ++++++++++++++++++++++++++++++++++----
 net/bridge/br_private.h   |  11 +++-
 6 files changed, 169 insertions(+), 46 deletions(-)


base-commit: 5e321ded302da4d8c5d5dd953423d9b748ab3775
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] bridge: Refactor br_mdb_notify
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 18:22 ` [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev Joseph Huang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

Separate out switchdev notification to its own function in preparation
for the patch "bridge: Offload mrouter port forwarding to switchdev".

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_mdb.c     | 57 ++++++++++++++++++++++++-----------------
 net/bridge/br_private.h |  2 ++
 2 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 95fa4af0e8dd..e8684d798ec3 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -669,10 +669,9 @@ static void br_mdb_switchdev_host(struct net_device *dev,
 		br_mdb_switchdev_host_port(dev, lower_dev, mp, type);
 }
 
-void br_mdb_notify(struct net_device *dev,
-		   struct net_bridge_mdb_entry *mp,
-		   struct net_bridge_port_group *pg,
-		   int type)
+void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
+			   struct net_bridge_port *p,
+			   int type)
 {
 	struct br_mdb_complete_info *complete_info;
 	struct switchdev_obj_port_mdb mdb = {
@@ -681,30 +680,42 @@ void br_mdb_notify(struct net_device *dev,
 			.flags = SWITCHDEV_F_DEFER,
 		},
 	};
+
+	if (!p)
+		return;
+
+	br_switchdev_mdb_populate(&mdb, mp);
+
+	mdb.obj.orig_dev = p->dev;
+	switch (type) {
+	case RTM_NEWMDB:
+		complete_info = kmalloc(sizeof(*complete_info), GFP_ATOMIC);
+		if (!complete_info)
+			break;
+		complete_info->port = p;
+		complete_info->ip = mp->addr;
+		mdb.obj.complete_priv = complete_info;
+		mdb.obj.complete = br_mdb_complete;
+		if (switchdev_port_obj_add(p->dev, &mdb.obj, NULL))
+			kfree(complete_info);
+		break;
+	case RTM_DELMDB:
+		switchdev_port_obj_del(p->dev, &mdb.obj);
+		break;
+	}
+}
+
+void br_mdb_notify(struct net_device *dev,
+		   struct net_bridge_mdb_entry *mp,
+		   struct net_bridge_port_group *pg,
+		   int type)
+{
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
 	if (pg) {
-		br_switchdev_mdb_populate(&mdb, mp);
-
-		mdb.obj.orig_dev = pg->key.port->dev;
-		switch (type) {
-		case RTM_NEWMDB:
-			complete_info = kmalloc(sizeof(*complete_info), GFP_ATOMIC);
-			if (!complete_info)
-				break;
-			complete_info->port = pg->key.port;
-			complete_info->ip = mp->addr;
-			mdb.obj.complete_priv = complete_info;
-			mdb.obj.complete = br_mdb_complete;
-			if (switchdev_port_obj_add(pg->key.port->dev, &mdb.obj, NULL))
-				kfree(complete_info);
-			break;
-		case RTM_DELMDB:
-			switchdev_port_obj_del(pg->key.port->dev, &mdb.obj);
-			break;
-		}
+		br_mdb_switchdev_port(mp, pg->key.port, type);
 	} else {
 		br_mdb_switchdev_host(dev, mp, type);
 	}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7ce8a77cc6b6..5cba9d228b9c 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -829,6 +829,8 @@ br_multicast_new_port_group(struct net_bridge_port *port, struct br_ip *group,
 			    u8 filter_mode, u8 rt_protocol);
 int br_mdb_hash_init(struct net_bridge *br);
 void br_mdb_hash_fini(struct net_bridge *br);
+void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
+			   struct net_bridge_port *p, int type);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
 		   struct net_bridge_port_group *pg, int type);
 void br_rtr_notify(struct net_device *dev, struct net_bridge_port *port,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
  2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 18:22 ` [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes Joseph Huang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

Offload the mrouter port forwarding to switchdev also.

Currently multicast snooping fails to forward traffic in some cases
where there're multiple hardware-offloading bridges involved.

Consider the following scenario:

                 +--------------------+
                 |                    |
                 |      Snooping   +--|    +------------+
                 |      Bridge 1   |P1|----| Listener 1 |
                 |     (Querier)   +--|    +------------+
                 |                    |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |    | mrouter |     |
+-----------+    |    +---------+  +--|    +------------+
| MC Source |----|      Snooping   |P2|----| Listener 2 |
+-----------|    |      Bridge 2   +--|    +------------+
                 |    (Non-Querier)   |
                 +--------------------+

In this scenario, Listener 2 is able to receive multicast traffic
from MC Source while Listener 1 is not. The reason is that, on
Snooping Bridge 2, when the (soft) bridge attempts to forward
a packet to the mrouter port via br_multicast_flood, the effort
is blocked by nbp_switchdev_allowed_egress, since offload_fwd_mark
indicates that the packet should have been handled by the hardware
already. Listener 2 would receive the packets without any problem
since P2 is programmed into the hardware as a member of the group;
however the mrouter port would not since the mrouter port would
normally not be a member of any group, and thus will not be added
to the address database on the hardware switch chip.

This patch takes a simplistic approach: when an mrouter port is added/
deleted, it's added/deleted to all mdb groups; and similarly, when
an mdb group is added/deleted, all mrouter ports are added/deleted
to/from it.

Before this patch, switchdev programming matches exactly with mdb:
 +-----+
 | mdb |
 +-----+
    |
 +----------------------------------------------+
 |  |        +--------------------------------+ |
 |  |        | both in mdb and switchdev      | |
 |  |        | +------+   +------+   +------+ | |
 |  +--------|-| port |---| port |---| port | | |
 |           | +------+   +------+   +------+ | |
 | switchdev +--------------------------------+ |
 +----------------------------------------------+

After this patch, some entries will only exist in switchdev and not
in mdb:
 +-----+
 | mdb |
 +-----+
    |
 +---------------------------------------------------------------------+
 |  |        +--------------------------------++---------------------+ |
 |  |        |  both in mdb and switchdev     ||  only in switchdev  | |
 |  |        | +------+   +------+   +------+ || +------+   +------+ | |
 |  +--------|-| port |---| port |---| port | || |  mr  |---|  mr  | | |
 |           | +------+   +------+   +------+ || +------+   +------+ | |
 | switchdev +--------------------------------++---------------------+ |
 +---------------------------------------------------------------------+

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_multicast.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 226bb05c3b42..5ed0d5efef09 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -522,10 +522,26 @@ static void br_multicast_destroy_mdb_entry(struct net_bridge_mcast_gc *gc)
 	kfree_rcu(mp, rcu);
 }
 
+/* Add/delete all mrouter ports to/from a group
+ * called while br->multicast_lock is held
+ */
+static void br_multicast_group_change(struct net_bridge_mdb_entry *mp,
+				      bool is_group_added)
+{
+	struct net_bridge_port *p;
+	struct hlist_node *n;
+
+	hlist_for_each_entry_safe(p, n, &mp->br->router_list, rlist)
+		br_mdb_switchdev_port(mp, p, is_group_added ?
+				      RTM_NEWMDB : RTM_DELMDB);
+}
+
 static void br_multicast_del_mdb_entry(struct net_bridge_mdb_entry *mp)
 {
 	struct net_bridge *br = mp->br;
 
+	br_multicast_group_change(mp, false);
+
 	rhashtable_remove_fast(&br->mdb_hash_tbl, &mp->rhnode,
 			       br_mdb_rht_params);
 	hlist_del_init_rcu(&mp->mdb_node);
@@ -1068,6 +1084,8 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		hlist_add_head_rcu(&mp->mdb_node, &br->mdb_list);
 	}
 
+	br_multicast_group_change(mp, true);
+
 	return mp;
 }
 
@@ -2651,8 +2669,18 @@ static void br_port_mc_router_state_change(struct net_bridge_port *p,
 		.flags = SWITCHDEV_F_DEFER,
 		.u.mrouter = is_mc_router,
 	};
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *n;
 
 	switchdev_port_attr_set(p->dev, &attr, NULL);
+
+	/* Add/delete the router port to/from all multicast group
+	 * called whle br->multicast_lock is held
+	 */
+	hlist_for_each_entry_safe(mp, n, &p->br->mdb_list, mdb_node) {
+		br_mdb_switchdev_port(mp, p, is_mc_router ?
+				      RTM_NEWMDB : RTM_DELMDB);
+	}
 }
 
 /*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
  2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang
  2021-05-04 18:22 ` [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 18:22 ` [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports Joseph Huang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

Modify br_mdb_notify so that switchdev mdb purge events are never sent
for mrouter ports, and when a non-mrouter port turns into an mrouter
port, all port groups associated with that port are deleted immediately.

Consider the following scenario:

                 +--------------------+
                 |                    |
+-----------+    |      Snooping   +--|    +------------+
| MC Source |----|      Bridge 1   |P1|----| Listener 1 |
+-----------|    |        +--+     +--|    +------------+
                 |        |P2|        |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |        |P3|        |
                 |        +--+     +--|    +---------------+
                 |      Snooping   |P4|----| Listener 2    |
                 |      Bridge 2   +--|    | Joins Group A |
                 |                    |    +---------------+
                 +--------------------+

Assuming initially Snooping Bridge 1 is the Querier, and Snooping Bridge
2 is a Non-Querier. After some Query/Report exchange, Snooping Bridge 1
would create an mdb group A and add P2 to the group, and starts a timer
on the port group A/P2.

Let's say Snooping Bridge 2 becomes the Querier for some reason (e.g.,
Snooping Bridge 2 rebooted) before the port group A/P2 expires. With
the patch 'bridge: Offload mrouter port forwarding to switchdev',
Snooping Bridge 1 detects that P2 has now become an mrouter port, and
will add it to the address database on the hardware switch chip (even
though it's already there when the port group A/P2 was added). This is
all fine until the timer on port group A/P2 expires, and then Snooping
Bridge 1 will purge P2 from the address database on the switch chip.
Now Listener 2 will not be able to receive multicast traffic from MC
Source anymore.

With this patch, immediately after a bridge port turns into an mrouter
port, the port's membership information is removed from the bridge' mdb,
but remains programmed in the address database on the hardware chip,
just to be consistent with the database/programming state as before
the Querier role change.  The hardware programming will be cleaned up
when the group expires (via br_multicast_group_change).

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_mdb.c       | 17 +++++----
 net/bridge/br_multicast.c | 78 ++++++++++++++++++++++++++++++++-------
 net/bridge/br_private.h   |  3 +-
 3 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index e8684d798ec3..c121b780450b 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -708,16 +708,17 @@ void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
 void br_mdb_notify(struct net_device *dev,
 		   struct net_bridge_mdb_entry *mp,
 		   struct net_bridge_port_group *pg,
-		   int type)
+		   int type, bool swdev_notify)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
-	if (pg) {
-		br_mdb_switchdev_port(mp, pg->key.port, type);
-	} else {
-		br_mdb_switchdev_host(dev, mp, type);
+	if (swdev_notify) {
+		if (pg)
+			br_mdb_switchdev_port(mp, pg->key.port, type);
+		else
+			br_mdb_switchdev_host(dev, mp, type);
 	}
 
 	skb = nlmsg_new(rtnl_mdb_nlmsg_size(pg), GFP_ATOMIC);
@@ -1011,7 +1012,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 		}
 
 		br_multicast_host_join(mp, false);
-		br_mdb_notify(br->dev, mp, NULL, RTM_NEWMDB);
+		br_mdb_notify(br->dev, mp, NULL, RTM_NEWMDB, true);
 
 		return 0;
 	}
@@ -1042,7 +1043,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 	rcu_assign_pointer(*pp, p);
 	if (entry->state == MDB_TEMPORARY)
 		mod_timer(&p->timer, now + br->multicast_membership_interval);
-	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB);
+	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, true);
 	/* if we are adding a new EXCLUDE port group (*,G) it needs to be also
 	 * added to all S,G entries for proper replication, if we are adding
 	 * a new INCLUDE port (S,G) then all of *,G EXCLUDE ports need to be
@@ -1176,7 +1177,7 @@ static int __br_mdb_del(struct net_bridge *br, struct br_mdb_entry *entry,
 	if (entry->ifindex == mp->br->dev->ifindex && mp->host_joined) {
 		br_multicast_host_leave(mp, false);
 		err = 0;
-		br_mdb_notify(br->dev, mp, NULL, RTM_DELMDB);
+		br_mdb_notify(br->dev, mp, NULL, RTM_DELMDB, true);
 		if (!mp->ports && netif_running(br->dev))
 			mod_timer(&mp->timer, jiffies);
 		goto unlock;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 5ed0d5efef09..d7fbe1f3af18 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -506,7 +506,7 @@ static void br_multicast_fwd_src_handle(struct net_bridge_group_src *src)
 		sg_mp = br_mdb_ip_get(src->br, &sg_key.addr);
 		if (!sg_mp)
 			return;
-		br_mdb_notify(src->br->dev, sg_mp, sg, RTM_NEWMDB);
+		br_mdb_notify(src->br->dev, sg_mp, sg, RTM_NEWMDB, true);
 	}
 }
 
@@ -617,7 +617,12 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	br_multicast_eht_clean_sets(pg);
 	hlist_for_each_entry_safe(ent, tmp, &pg->src_list, node)
 		br_multicast_del_group_src(ent, false);
-	br_mdb_notify(br->dev, mp, pg, RTM_DELMDB);
+	/* don't notify switchdev if mrouter port
+	 * switchdev will be notified when group expires via
+	 * br_multicast_group_change
+	 */
+	br_mdb_notify(br->dev, mp, pg, RTM_DELMDB,
+		      hlist_unhashed(&pg->key.port->rlist));
 	if (!br_multicast_is_star_g(&mp->addr)) {
 		rhashtable_remove_fast(&br->sg_port_tbl, &pg->rhnode,
 				       br_sg_port_rht_params);
@@ -688,7 +693,7 @@ static void br_multicast_port_group_expired(struct timer_list *t)
 
 		if (WARN_ON(!mp))
 			goto out;
-		br_mdb_notify(br->dev, mp, pg, RTM_NEWMDB);
+		br_mdb_notify(br->dev, mp, pg, RTM_NEWMDB, true);
 	}
 out:
 	spin_unlock(&br->multicast_lock);
@@ -1228,7 +1233,7 @@ void br_multicast_host_join(struct net_bridge_mdb_entry *mp, bool notify)
 		if (br_multicast_is_star_g(&mp->addr))
 			br_multicast_star_g_host_state(mp);
 		if (notify)
-			br_mdb_notify(mp->br->dev, mp, NULL, RTM_NEWMDB);
+			br_mdb_notify(mp->br->dev, mp, NULL, RTM_NEWMDB, true);
 	}
 
 	if (br_group_is_l2(&mp->addr))
@@ -1246,7 +1251,7 @@ void br_multicast_host_leave(struct net_bridge_mdb_entry *mp, bool notify)
 	if (br_multicast_is_star_g(&mp->addr))
 		br_multicast_star_g_host_state(mp);
 	if (notify)
-		br_mdb_notify(mp->br->dev, mp, NULL, RTM_DELMDB);
+		br_mdb_notify(mp->br->dev, mp, NULL, RTM_DELMDB, true);
 }
 
 static struct net_bridge_port_group *
@@ -1294,7 +1299,7 @@ __br_multicast_add_group(struct net_bridge *br,
 	rcu_assign_pointer(*pp, p);
 	if (blocked)
 		p->flags |= MDB_PG_FLAGS_BLOCKED;
-	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB);
+	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, true);
 
 found:
 	if (igmpv2_mldv1)
@@ -2436,7 +2441,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
 			break;
 		}
 		if (changed)
-			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB);
+			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB, true);
 unlock_continue:
 		spin_unlock_bh(&br->multicast_lock);
 	}
@@ -2575,7 +2580,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 			break;
 		}
 		if (changed)
-			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB);
+			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB, true);
 unlock_continue:
 		spin_unlock_bh(&br->multicast_lock);
 	}
@@ -2660,26 +2665,71 @@ br_multicast_update_query_timer(struct net_bridge *br,
 	mod_timer(&query->timer, jiffies + br->multicast_querier_interval);
 }
 
-static void br_port_mc_router_state_change(struct net_bridge_port *p,
+static void br_port_mc_router_state_change(struct net_bridge_port *port,
 					   bool is_mc_router)
 {
 	struct switchdev_attr attr = {
-		.orig_dev = p->dev,
+		.orig_dev = port->dev,
 		.id = SWITCHDEV_ATTR_ID_PORT_MROUTER,
 		.flags = SWITCHDEV_F_DEFER,
 		.u.mrouter = is_mc_router,
 	};
 	struct net_bridge_mdb_entry *mp;
+	struct net_bridge *br = port->br;
 	struct hlist_node *n;
 
-	switchdev_port_attr_set(p->dev, &attr, NULL);
+	switchdev_port_attr_set(port->dev, &attr, NULL);
 
 	/* Add/delete the router port to/from all multicast group
 	 * called whle br->multicast_lock is held
 	 */
-	hlist_for_each_entry_safe(mp, n, &p->br->mdb_list, mdb_node) {
-		br_mdb_switchdev_port(mp, p, is_mc_router ?
-				      RTM_NEWMDB : RTM_DELMDB);
+	hlist_for_each_entry_safe(mp, n, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+		int port_group_exists = 0;
+
+		if (is_mc_router) {
+			for (pp = &mp->ports;
+				(p = mlock_dereference(*pp, br)) != NULL;
+				pp = &p->next) {
+				if (p->key.port == port) {
+					port_group_exists = 1;
+					if (!(p->flags & MDB_PG_FLAGS_PERMANENT))
+						br_multicast_del_pg(mp, p, pp);
+				}
+
+				if ((unsigned long)p->key.port < (unsigned long)port)
+					break;
+			}
+
+			if (port_group_exists)
+				continue;
+
+			br_mdb_switchdev_port(mp, port, RTM_NEWMDB);
+		} else {
+			for (pp = &mp->ports;
+				(p = mlock_dereference(*pp, br)) != NULL;
+				pp = &p->next) {
+				if (p->key.port == port) {
+					port_group_exists = 1;
+					break;
+				}
+
+				if ((unsigned long)p->key.port < (unsigned long)port)
+					break;
+			}
+
+			if (port_group_exists)
+				continue;
+
+			p = br_multicast_new_port_group(port, &mp->addr, *pp, 0,
+							NULL, MCAST_EXCLUDE, RTPROT_KERNEL);
+			if (unlikely(!p))
+				continue;
+			rcu_assign_pointer(*pp, p);
+			br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, false);
+			mod_timer(&p->timer, jiffies + br->multicast_membership_interval);
+		}
 	}
 }
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 5cba9d228b9c..9aa51508ba83 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -832,7 +832,8 @@ void br_mdb_hash_fini(struct net_bridge *br);
 void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
 			   struct net_bridge_port *p, int type);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
-		   struct net_bridge_port_group *pg, int type);
+		   struct net_bridge_port_group *pg, int type,
+		   bool swdev_notify);
 void br_rtr_notify(struct net_device *dev, struct net_bridge_port *port,
 		   int type);
 void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
                   ` (2 preceding siblings ...)
  2021-05-04 18:22 ` [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 18:22 ` [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled Joseph Huang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

When a port turns into an mrouter port, enable multicast flooding
on that port even if mcast_flood is disabled by user config. This
is necessary so that in a distributed system, the multicast packets
can be fowarded to the Querier when the multicast source is attached
to a Non-Querier bridge.

Consider the following scenario:

                 +--------------------+
                 |                    |
                 |      Snooping      |    +------------+
                 |      Bridge 1      |----| Listener 1 |
                 |     (Querier)      |    +------------+
                 |                    |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |    | mrouter |     |
+-----------+    |    +---------+     |
| MC Source |----|      Snooping      |
+-----------|    |      Bridge 2      |
                 |    (Non-Querier)   |
                 +--------------------+

In this scenario, Listener 1 will never receive multicast traffic
from MC Source if mcast_flood is disabled on the mrouter port on
Snooping Bridge 2.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_multicast.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index d7fbe1f3af18..719ded3204a0 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -2680,6 +2680,21 @@ static void br_port_mc_router_state_change(struct net_bridge_port *port,
 
 	switchdev_port_attr_set(port->dev, &attr, NULL);
 
+	/* Force mcast_flood if mrouter port
+	 * this does not prevent netlink from changing it again
+	 */
+	if (is_mc_router && !(port->flags & BR_MCAST_FLOOD)) {
+		attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
+		attr.u.brport_flags.val = BR_MCAST_FLOOD;
+		attr.u.brport_flags.mask = BR_MCAST_FLOOD;
+		switchdev_port_attr_set(port->dev, &attr, NULL);
+	} else if (!is_mc_router) {
+		attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
+		attr.u.brport_flags.val = port->flags & BR_MCAST_FLOOD;
+		attr.u.brport_flags.mask = BR_MCAST_FLOOD;
+		switchdev_port_attr_set(port->dev, &attr, NULL);
+	}
+
 	/* Add/delete the router port to/from all multicast group
 	 * called whle br->multicast_lock is held
 	 */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
                   ` (3 preceding siblings ...)
  2021-05-04 18:22 ` [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 18:22 ` [PATCH 6/6] bridge: Always multicast_flood Reports Joseph Huang
  2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

Modify the forwarding path so that received Queries are always flooded
even when mcast_flood is disabled on a bridge port.

In current implementation, when mcast_flood is disabled on a bridge
port, Queries received from other Querier will not be forwarded out of
that bridge port. This unfortunately broke multicast snooping.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_forward.c   | 3 ++-
 net/bridge/br_multicast.c | 3 +++
 net/bridge/br_private.h   | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6e9b049ae521..2fb9b4a78881 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -203,7 +203,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 				continue;
 			break;
 		case BR_PKT_MULTICAST:
-			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
+			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev &&
+			    !BR_INPUT_SKB_CB_FORCE_FLOOD(skb))
 				continue;
 			break;
 		case BR_PKT_BROADCAST:
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 719ded3204a0..b7d9c491abe0 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3238,6 +3238,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
 		err = br_ip4_multicast_igmp3_report(br, port, skb, vid);
 		break;
 	case IGMP_HOST_MEMBERSHIP_QUERY:
+		BR_INPUT_SKB_CB(skb)->force_flood = 1;
 		br_ip4_multicast_query(br, port, skb, vid);
 		break;
 	case IGMP_HOST_LEAVE_MESSAGE:
@@ -3300,6 +3301,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 		err = br_ip6_multicast_mld2_report(br, port, skb, vid);
 		break;
 	case ICMPV6_MGM_QUERY:
+		BR_INPUT_SKB_CB(skb)->force_flood = 1;
 		err = br_ip6_multicast_query(br, port, skb, vid);
 		break;
 	case ICMPV6_MGM_REDUCTION:
@@ -3322,6 +3324,7 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 
 	BR_INPUT_SKB_CB(skb)->igmp = 0;
 	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
+	BR_INPUT_SKB_CB(skb)->force_flood = 0;
 
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 9aa51508ba83..59af599d48eb 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -491,6 +491,7 @@ struct br_input_skb_cb {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	u8 igmp;
 	u8 mrouters_only:1;
+	u8 force_flood:1;
 #endif
 	u8 proxyarp_replied:1;
 	u8 src_port_isolated:1;
@@ -510,8 +511,10 @@ struct br_input_skb_cb {
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
+# define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)		(BR_INPUT_SKB_CB(__skb)->force_flood)
 #else
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
+# define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)		(0)
 #endif
 
 #define br_printk(level, br, format, args...)	\
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 6/6] bridge: Always multicast_flood Reports
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
                   ` (4 preceding siblings ...)
  2021-05-04 18:22 ` [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled Joseph Huang
@ 2021-05-04 18:22 ` Joseph Huang
  2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov
  6 siblings, 0 replies; 13+ messages in thread
From: Joseph Huang @ 2021-05-04 18:22 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel
  Cc: Joseph Huang

Modify the forwarding path so that IGMPv1/2/MLDv1 Reports are always
flooded by br_multicast_flood, regardless of the check done
by br_multicast_querier_exists.

This patch fixes the problems where after a system boots up, the first
couple of Reports are not handled properly in that:

1) the Report from the Host is being flooded (via br_flood) to all
   bridge ports, and
2) if the mrouter port's mcast_flood is disabled, the Reports received
   from other hosts will not be forwarded to the Querier.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_device.c    | 5 +++--
 net/bridge/br_input.c     | 5 +++--
 net/bridge/br_multicast.c | 3 +++
 net/bridge/br_private.h   | 3 +++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index e8b626cc6bfd..ff75ba242f38 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -88,8 +88,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb), mdst))
+		if (((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) ||
+		    BR_INPUT_SKB_CB_FORCE_MC_FLOOD(skb))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 8875e953ac53..572d7f20477f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -129,8 +129,9 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	switch (pkt_type) {
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
+		if (((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) ||
+		    BR_INPUT_SKB_CB_FORCE_MC_FLOOD(skb)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b7d9c491abe0..dfdbe19f3e93 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3231,6 +3231,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
 	case IGMP_HOST_MEMBERSHIP_REPORT:
 	case IGMPV2_HOST_MEMBERSHIP_REPORT:
 		BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
+		BR_INPUT_SKB_CB(skb)->force_mc_flood = 1;
 		err = br_ip4_multicast_add_group(br, port, ih->group, vid, src,
 						 true);
 		break;
@@ -3294,6 +3295,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 	case ICMPV6_MGM_REPORT:
 		src = eth_hdr(skb)->h_source;
 		BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
+		BR_INPUT_SKB_CB(skb)->force_mc_flood = 1;
 		err = br_ip6_multicast_add_group(br, port, &mld->mld_mca, vid,
 						 src, true);
 		break;
@@ -3325,6 +3327,7 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 	BR_INPUT_SKB_CB(skb)->igmp = 0;
 	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
 	BR_INPUT_SKB_CB(skb)->force_flood = 0;
+	BR_INPUT_SKB_CB(skb)->force_mc_flood = 0;
 
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 59af599d48eb..6d4f20d7f482 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -492,6 +492,7 @@ struct br_input_skb_cb {
 	u8 igmp;
 	u8 mrouters_only:1;
 	u8 force_flood:1;
+	u8 force_mc_flood:1;
 #endif
 	u8 proxyarp_replied:1;
 	u8 src_port_isolated:1;
@@ -512,9 +513,11 @@ struct br_input_skb_cb {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
 # define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)		(BR_INPUT_SKB_CB(__skb)->force_flood)
+# define BR_INPUT_SKB_CB_FORCE_MC_FLOOD(__skb)	(BR_INPUT_SKB_CB(__skb)->force_mc_flood)
 #else
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
 # define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)		(0)
+# define BR_INPUT_SKB_CB_FORCE_MC_FLOOD(__skb)	(0)
 #endif
 
 #define br_printk(level, br, format, args...)	\
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
                   ` (5 preceding siblings ...)
  2021-05-04 18:22 ` [PATCH 6/6] bridge: Always multicast_flood Reports Joseph Huang
@ 2021-05-04 20:05 ` Nikolay Aleksandrov
  2021-05-04 20:37   ` Huang, Joseph
  6 siblings, 1 reply; 13+ messages in thread
From: Nikolay Aleksandrov @ 2021-05-04 20:05 UTC (permalink / raw)
  To: Joseph Huang, Roopa Prabhu, David S. Miller, Jakub Kicinski,
	bridge, netdev, linux-kernel, Ido Schimmel

On 04/05/2021 21:22, Joseph Huang wrote:
> This series of patches contains the following fixes:
> 
> 1. In a distributed system with multiple hardware-offloading bridges,
>    if a multicast source is attached to a Non-Querier bridge, the bridge
>    will not forward any multicast packets from that source to the Querier.
> 
>                     +--------------------+
>                     |                    |
>                     |      Snooping      |    +------------+
>                     |      Bridge 1      |----| Listener 1 |
>                     |     (Querier)      |    +------------+
>                     |                    |
>                     +--------------------+
>                               |
>                               |
>                     +----+---------+-----+
>                     |    | mrouter |     |
>    +-----------+    |    +---------+     |    +------------+
>    | MC Source |----|      Snooping      |----| Listener 2 |
>    +-----------|    |      Bridge 2      |    +------------+
>                     |    (Non-Querier)   |
>                     +--------------------+
> 
>    In this scenario, Listener 1 will never receive multicast traffic
>    from MC Source since Snooping Bridge 2 does not forward multicast
>    packets to the mrouter port. Patches 0001, 0002, and 0003 address
>    this issue.
> 
> 2. If mcast_flood is disabled on a bridge port, some of the snooping
>    functions stop working properly.
> 
>    a. Consider the following scenario:
> 
>                        +--------------------+
>                        |                    |
>                        |      Snooping      |    +------------+
>                        |      Bridge 1      |----| Listener 1 |
>                        |     (Querier)      |    +------------+
>                        |                    |
>                        +--------------------+
>                                  |
>                                  |
>                        +--------------------+
>                        |    | mrouter |     |
>       +-----------+    |    +---------+     |
>       | MC Source |----|      Snooping      |
>       +-----------|    |      Bridge 2      |
>                        |    (Non-Querier)   |
>                        +--------------------+
> 
>       In this scenario, Listener 1 will never receive multicast traffic
>       from MC Source if mcast_flood is disabled on the mrouter port on
>       Snooping Bridge 2. Patch 0004 addresses this issue.
> 
>    b. For a Non-Querier bridge, if mcast_flood is disabled on a bridge
>       port, Queries received from other Querier will not be forwarded
>       out of that bridge port. Patch 0005 addresses this issue.
> 
> 3. After a system boots up, the first couple Reports are not handled
>    properly:
> 
>    1) the Report from the Host is being flooded (via br_flood) to all
>       bridge ports, and
>    2) if the mrouter port's mcast_flood is disabled, the Reports received
>       from other hosts will not be forwarded to the Querier.
> 
>    Patch 0006 addresses this issue.
> 
> These patches were developed and verified initially against 5.4 kernel
> (due to hardware platform limitation) and forward-patched to 5.12.
> Snooping code introduced between 5.4 and 5.12 are not extensively tested
> (only IGMPv2/MLDv1 were tested). The hardware platform used were two
> bridges utilizing a single Marvell 88E6352 Ethernet switch chip (i.e.,
> no cross-chip bridging involved).
> 
> Joseph Huang (6):
>   bridge: Refactor br_mdb_notify
>   bridge: Offload mrouter port forwarding to switchdev
>   bridge: Avoid traffic disruption when Querier state changes
>   bridge: Force mcast_flooding for mrouter ports
>   bridge: Flood Queries even when mcast_flood is disabled
>   bridge: Always multicast_flood Reports
> 
>  net/bridge/br_device.c    |   5 +-
>  net/bridge/br_forward.c   |   3 +-
>  net/bridge/br_input.c     |   5 +-
>  net/bridge/br_mdb.c       |  70 +++++++++++++---------
>  net/bridge/br_multicast.c | 121 ++++++++++++++++++++++++++++++++++----
>  net/bridge/br_private.h   |  11 +++-
>  6 files changed, 169 insertions(+), 46 deletions(-)
> 
> 
> base-commit: 5e321ded302da4d8c5d5dd953423d9b748ab3775
> 

Hi,
This patch-set is inappropriate for -net, if at all. It's quite late over here
and I'll review the rest later, but I can say from a quick peek that patch 02
is unacceptable for it increases the complexity with 1 order of magnitude of all
add/del call paths and some of them can be invoked on user packets. A lot of this
functionality should be "hidden" in the driver or done by a user-space daemon/helper.
Most of the flooding behaviour changes must be hidden behind some new option
otherwise they'll break user setups that rely on the current. I'll review the patches
in detail over the following few days, net-next is closed anyway.

Cheers,
 Nik


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov
@ 2021-05-04 20:37   ` Huang, Joseph
  2021-05-04 22:29     ` Tobias Waldekranz
  0 siblings, 1 reply; 13+ messages in thread
From: Huang, Joseph @ 2021-05-04 20:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Roopa Prabhu, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel

> Hi,
> This patch-set is inappropriate for -net, if at all. It's quite late over here and I'll
> review the rest later, but I can say from a quick peek that patch 02 is
> unacceptable for it increases the complexity with 1 order of magnitude of all
> add/del call paths and some of them can be invoked on user packets. A lot of
> this functionality should be "hidden" in the driver or done by a user-space
> daemon/helper.
> Most of the flooding behaviour changes must be hidden behind some new
> option otherwise they'll break user setups that rely on the current. I'll review
> the patches in detail over the following few days, net-next is closed anyway.
> 
> Cheers,
>  Nik

Hi Nik,

Thanks for your quick response!
Once you have a chance to review the set, please let me know how I can improve them to make them acceptable. These are real problems and we do need to fix them.

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 20:37   ` Huang, Joseph
@ 2021-05-04 22:29     ` Tobias Waldekranz
  2021-05-04 23:26       ` Huang, Joseph
  0 siblings, 1 reply; 13+ messages in thread
From: Tobias Waldekranz @ 2021-05-04 22:29 UTC (permalink / raw)
  To: Huang, Joseph, Nikolay Aleksandrov, Roopa Prabhu,
	David S. Miller, Jakub Kicinski, bridge, netdev, linux-kernel,
	Ido Schimmel

On Tue, May 04, 2021 at 20:37, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote:
>> Hi,
>> This patch-set is inappropriate for -net, if at all. It's quite late over here and I'll
>> review the rest later, but I can say from a quick peek that patch 02 is
>> unacceptable for it increases the complexity with 1 order of magnitude of all
>> add/del call paths and some of them can be invoked on user packets. A lot of
>> this functionality should be "hidden" in the driver or done by a user-space
>> daemon/helper.
>> Most of the flooding behaviour changes must be hidden behind some new
>> option otherwise they'll break user setups that rely on the current. I'll review
>> the patches in detail over the following few days, net-next is closed anyway.
>> 
>> Cheers,
>>  Nik
>
> Hi Nik,
>
> Thanks for your quick response!
> Once you have a chance to review the set, please let me know how I can improve them to make them acceptable. These are real problems and we do need to fix them.

If I may make a suggestion: I also work with mv88e6xxx systems, and we
have the same issues with known multicast not being flooded to router
ports. Knowing that chipset, I see what you are trying to do.

But other chips may work differently. Imagine for example a switch where
there is a separate vector of router ports that the hardware can OR in
after looking up the group in the ATU. This implementation would render
the performance gains possible on that device useless. As another
example, you could imagine a device where an ATU operation exists that
sets a bit in the vector of every group in a particular database;
instead of having to update each entry individually.

I think we (mv88e6xxx) will have to accept that we need to add the
proper scaffolding to manage this on the driver side. That way the
bridge can stay generic. The bridge could just provide some MDB iterator
to save us from having to cache all the configured groups.

So basically:

- In mv88e6xxx, maintain a per-switch vector of router ports.

- When a ports router state is toggled:
  1. Update the vector.
  2. Ask the bridge to iterate through all applicable groups and update
     the corresponding ATU entries.

- When a new MDB entry is updated, make sure to also OR in the current
  vector of router ports in the DPV of the ATU entry.


I would be happy to help out with testing of this!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 22:29     ` Tobias Waldekranz
@ 2021-05-04 23:26       ` Huang, Joseph
  2021-05-05  6:52         ` Tobias Waldekranz
  2021-05-05  6:59         ` Nikolay Aleksandrov
  0 siblings, 2 replies; 13+ messages in thread
From: Huang, Joseph @ 2021-05-04 23:26 UTC (permalink / raw)
  To: Tobias Waldekranz, Nikolay Aleksandrov, Roopa Prabhu,
	David S. Miller, Jakub Kicinski, bridge, netdev, linux-kernel,
	Ido Schimmel

> If I may make a suggestion: I also work with mv88e6xxx systems, and we
> have the same issues with known multicast not being flooded to router
> ports. Knowing that chipset, I see what you are trying to do.
> 
> But other chips may work differently. Imagine for example a switch where
> there is a separate vector of router ports that the hardware can OR in after
> looking up the group in the ATU. This implementation would render the
> performance gains possible on that device useless. As another example, you
> could imagine a device where an ATU operation exists that sets a bit in the
> vector of every group in a particular database; instead of having to update
> each entry individually.
> 
> I think we (mv88e6xxx) will have to accept that we need to add the proper
> scaffolding to manage this on the driver side. That way the bridge can stay
> generic. The bridge could just provide some MDB iterator to save us from
> having to cache all the configured groups.
> 
> So basically:
> 
> - In mv88e6xxx, maintain a per-switch vector of router ports.
> 
> - When a ports router state is toggled:
>   1. Update the vector.
>   2. Ask the bridge to iterate through all applicable groups and update
>      the corresponding ATU entries.
> 
> - When a new MDB entry is updated, make sure to also OR in the current
>   vector of router ports in the DPV of the ATU entry.
> 
> 
> I would be happy to help out with testing of this!

Thanks for the suggestion/offer!

What patch 0002 does is that:

- When an mrouter port is added/deleted, it iterates over the list of mdb's
  to add/delete that port to/from the group in the hardware (I think this is
  what your bullet #2 does as well, except that one is done in the bridge,
  and the other is done in the driver)

- When a group is added/deleted, it iterates over the list of mrouter ports
  to add/delete the switchdev programming

I think what Nik is objecting to is that with this approach, there's now
a for-loop in the call paths (thus it "increases the complexity with 1 order
of magnitude), however I can't think of a way to avoid the looping (whether
done inside the bridge or in the driver) but still achieve the same result
(for Marvell at least).

I suspect that other SOHO switches might have this problem as well (Broadcom
comes to mind).

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 23:26       ` Huang, Joseph
@ 2021-05-05  6:52         ` Tobias Waldekranz
  2021-05-05  6:59         ` Nikolay Aleksandrov
  1 sibling, 0 replies; 13+ messages in thread
From: Tobias Waldekranz @ 2021-05-05  6:52 UTC (permalink / raw)
  To: Huang, Joseph, Nikolay Aleksandrov, Roopa Prabhu,
	David S. Miller, Jakub Kicinski, bridge, netdev, linux-kernel,
	Ido Schimmel

On Tue, May 04, 2021 at 23:26, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote:
>> If I may make a suggestion: I also work with mv88e6xxx systems, and we
>> have the same issues with known multicast not being flooded to router
>> ports. Knowing that chipset, I see what you are trying to do.
>> 
>> But other chips may work differently. Imagine for example a switch where
>> there is a separate vector of router ports that the hardware can OR in after
>> looking up the group in the ATU. This implementation would render the
>> performance gains possible on that device useless. As another example, you
>> could imagine a device where an ATU operation exists that sets a bit in the
>> vector of every group in a particular database; instead of having to update
>> each entry individually.
>> 
>> I think we (mv88e6xxx) will have to accept that we need to add the proper
>> scaffolding to manage this on the driver side. That way the bridge can stay
>> generic. The bridge could just provide some MDB iterator to save us from
>> having to cache all the configured groups.
>> 
>> So basically:
>> 
>> - In mv88e6xxx, maintain a per-switch vector of router ports.
>> 
>> - When a ports router state is toggled:
>>   1. Update the vector.
>>   2. Ask the bridge to iterate through all applicable groups and update
>>      the corresponding ATU entries.
>> 
>> - When a new MDB entry is updated, make sure to also OR in the current
>>   vector of router ports in the DPV of the ATU entry.
>> 
>> 
>> I would be happy to help out with testing of this!
>
> Thanks for the suggestion/offer!
>
> What patch 0002 does is that:
>
> - When an mrouter port is added/deleted, it iterates over the list of mdb's
>   to add/delete that port to/from the group in the hardware (I think this is
>   what your bullet #2 does as well, except that one is done in the bridge,
>   and the other is done in the driver)
>
> - When a group is added/deleted, it iterates over the list of mrouter ports
>   to add/delete the switchdev programming
>
> I think what Nik is objecting to is that with this approach, there's now
> a for-loop in the call paths (thus it "increases the complexity with 1 order
> of magnitude), however I can't think of a way to avoid the looping (whether
> done inside the bridge or in the driver) but still achieve the same result
> (for Marvell at least).

(I will stop trying to read Nikolay's mind and go forward with my own
opinions now :))

The problem with solving this at the bridge layer is that you miss out
on optimizations that are available at lower layers. As an example:

      br0
    /  |  \
swp0 swp1 swp2
     (R)  (R)

With two router ports, any new group added/removed to/from swp0 would
incur 3 individual ATU operations: First adding swp0, then each router
port individually in your loop. If you have the vector prepared in the
driver, you can batch them together in one operation.

This also atomically transitions the group from unknown to known without
disrupting any streams towards a router. In the bridge-layer solution,
flows will still be blocked in the (admittedly small) window between
adding swp0 and swp{1,2}.

> I suspect that other SOHO switches might have this problem as well (Broadcom
> comes to mind).

I suspect you are right. That is why I suggested implementing the
iterator in the bridge that can be reused by all drivers. Something
along the lines of br_fdb_replay. The rest should mostly be hardware
specific anyway.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
  2021-05-04 23:26       ` Huang, Joseph
  2021-05-05  6:52         ` Tobias Waldekranz
@ 2021-05-05  6:59         ` Nikolay Aleksandrov
  1 sibling, 0 replies; 13+ messages in thread
From: Nikolay Aleksandrov @ 2021-05-05  6:59 UTC (permalink / raw)
  To: Huang, Joseph, Tobias Waldekranz, Roopa Prabhu, David S. Miller,
	Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel

On 05/05/2021 02:26, Huang, Joseph wrote:
>> If I may make a suggestion: I also work with mv88e6xxx systems, and we
>> have the same issues with known multicast not being flooded to router
>> ports. Knowing that chipset, I see what you are trying to do.
>>
>> But other chips may work differently. Imagine for example a switch where
>> there is a separate vector of router ports that the hardware can OR in after
>> looking up the group in the ATU. This implementation would render the
>> performance gains possible on that device useless. As another example, you
>> could imagine a device where an ATU operation exists that sets a bit in the
>> vector of every group in a particular database; instead of having to update
>> each entry individually.
>>
>> I think we (mv88e6xxx) will have to accept that we need to add the proper
>> scaffolding to manage this on the driver side. That way the bridge can stay
>> generic. The bridge could just provide some MDB iterator to save us from
>> having to cache all the configured groups.
>>
>> So basically:
>>
>> - In mv88e6xxx, maintain a per-switch vector of router ports.
>>
>> - When a ports router state is toggled:
>>   1. Update the vector.
>>   2. Ask the bridge to iterate through all applicable groups and update
>>      the corresponding ATU entries.
>>
>> - When a new MDB entry is updated, make sure to also OR in the current
>>   vector of router ports in the DPV of the ATU entry.
>>
>>
>> I would be happy to help out with testing of this!
> 
> Thanks for the suggestion/offer!
> 
> What patch 0002 does is that:
> 
> - When an mrouter port is added/deleted, it iterates over the list of mdb's
>   to add/delete that port to/from the group in the hardware (I think this is
>   what your bullet #2 does as well, except that one is done in the bridge,
>   and the other is done in the driver)
> 
> - When a group is added/deleted, it iterates over the list of mrouter ports
>   to add/delete the switchdev programming
> 
> I think what Nik is objecting to is that with this approach, there's now
> a for-loop in the call paths (thus it "increases the complexity with 1 order
> of magnitude), however I can't think of a way to avoid the looping (whether
> done inside the bridge or in the driver) but still achieve the same result
> (for Marvell at least).
> 

Note that I did not say to avoid it in the switchdev driver. :)
I said it should be in the driver or in some user-space helper, but it mustn't
affect non-switchdev software use cases so much.

You can check how mlxsw[1] deals with mdbs and router ports.

[1] drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

> I suspect that other SOHO switches might have this problem as well (Broadcom
> comes to mind).
> 
> Thanks,
> Joseph
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang
2021-05-04 18:22 ` [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev Joseph Huang
2021-05-04 18:22 ` [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes Joseph Huang
2021-05-04 18:22 ` [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports Joseph Huang
2021-05-04 18:22 ` [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled Joseph Huang
2021-05-04 18:22 ` [PATCH 6/6] bridge: Always multicast_flood Reports Joseph Huang
2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov
2021-05-04 20:37   ` Huang, Joseph
2021-05-04 22:29     ` Tobias Waldekranz
2021-05-04 23:26       ` Huang, Joseph
2021-05-05  6:52         ` Tobias Waldekranz
2021-05-05  6:59         ` Nikolay Aleksandrov

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git