* [PATCH net-next 00/16] bridge: Limit number of MDB entries per port, port-vlan
From: Petr Machata @ 2023-01-26 17:01 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In the SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but it is ultimately finite. A similar
limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN number of MDB entries that the port is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port number of entries keeps track of the total number of MDB
entries configured on a given port. The per-port-VLAN value then keeps
track of the subset of MDB entries configured specifically for the given
VLAN on that port. The number is adjusted as port_groups are created and
deleted, and is therefore updated under the multicast lock.

A maximum value, if non-zero, then places a limit on the number of entries
that can be configured in a given context. Attempts to add entries above
the maximum are rejected.

The reason for rejecting a netlink-based request to add an MDB entry is
communicated through extack. This channel is unavailable for rejections
triggered from the control path. To address this lack of visibility, the
patchset adds a tracepoint, bridge:br_mdb_full:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 10

This tracepoint is triggered for mcast_hash_max exhaustions as well.

The following is an example of how the feature is used. A more extensive
example is available in patch #8:

	# bridge vlan set dev v1 vid 1 mcast_max_groups 1
	# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
	# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
	Error: bridge: Port-VLAN is already a member in mcast_max_groups (1) groups.

The patchset progresses as follows:

- In patch #1, set strict_start_type at two bridge-related policies. The
  reason is we are adding a new attribute to one of these, and want the new
  attribute to be parsed strictly. The other was adjusted for completeness'
  sake.

- In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
  following additions smoother.

- In patch #6, add the tracepoint.

- In patch #7, the code to maintain the number of MDB entries is added as
  struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
  as struct net_bridge_mcast_port::mdb_max_entries. At this point there is
  no way to set the value yet, and since 0 is treated as "no limit", the
  functionality does not change. Note, however, that the tracepoint
  already triggers for mcast_hash_max violations at this point.

- In patch #8, netlink plumbing is added: reading of number of entries, and
  reading and writing of maximum.

  The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
  in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside IFLA_PROTINFO nest.

  The per-port-vlan values are passed through RTM_GETVLAN / RTM_NEWVLAN
  messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, _MAX_GROUPS, inside
  BRIDGE_VLANDB_ENTRY.

The following patches deal with the selftest:

- Patches #9 and #10 clean up and move around some selftest code.

- Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
  support to allow generating packets with configurable group addresses and
  varying source lists for (S,G) memberships.

- Patch #15 adds code to generate IGMP leave and MLD done packets.

- Patch #16 finally adds the selftest itself.

Petr Machata (16):
  net: bridge: Set strict_start_type at two policies
  net: bridge: Add extack to br_multicast_new_port_group()
  net: bridge: Move extack-setting to br_multicast_new_port_group()
  net: bridge: Add br_multicast_del_port_group()
  net: bridge: Change a cleanup in br_multicast_new_port_group() to goto
  net: bridge: Add a tracepoint for MDB overflows
  net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  net: bridge: Add netlink knobs for number / maximum MDB entries
  selftests: forwarding: Move IGMP- and MLD-related functions to lib
  selftests: forwarding: bridge_mdb: Fix a typo
  selftests: forwarding: lib: Add helpers for IP address handling
  selftests: forwarding: lib: Add helpers for checksum handling
  selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
  selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2
  selftests: forwarding: lib: Add helpers to build IGMP/MLD leave
    packets
  selftests: forwarding: bridge_mdb_max: Add a new selftest

 include/trace/events/bridge.h                 |  67 ++
 include/uapi/linux/if_bridge.h                |   2 +
 include/uapi/linux/if_link.h                  |   2 +
 net/bridge/br_mdb.c                           |  17 +-
 net/bridge/br_multicast.c                     | 255 ++++-
 net/bridge/br_netlink.c                       |  21 +-
 net/bridge/br_netlink_tunnel.c                |   3 +
 net/bridge/br_private.h                       |  22 +-
 net/bridge/br_vlan.c                          |  11 +-
 net/bridge/br_vlan_options.c                  |  33 +-
 net/core/net-traces.c                         |   1 +
 net/core/rtnetlink.c                          |   2 +-
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../selftests/net/forwarding/bridge_mdb.sh    |  60 +-
 .../net/forwarding/bridge_mdb_max.sh          | 970 ++++++++++++++++++
 tools/testing/selftests/net/forwarding/lib.sh | 216 ++++
 16 files changed, 1604 insertions(+), 79 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_mdb_max.sh

-- 
2.39.0


* [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies
From: Petr Machata @ 2023-01-26 17:01 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Have any attributes newly added to br_port_policy or vlan_tunnel_policy
parsed strictly, to prevent userspace from passing garbage. Note that this
patchset only touches the former policy. The latter was adjusted for
completeness' sake. There do not appear to be other _deprecated calls
with non-NULL policies.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_netlink.c        | 2 ++
 net/bridge/br_netlink_tunnel.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 4316cc82ae17..a6133d469885 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -858,6 +858,8 @@ static int br_afspec(struct net_bridge *br,
 }
 
 static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
+	[IFLA_BRPORT_UNSPEC]	= { .strict_start_type =
+					IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 },
 	[IFLA_BRPORT_STATE]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_COST]	= { .type = NLA_U32 },
 	[IFLA_BRPORT_PRIORITY]	= { .type = NLA_U16 },
diff --git a/net/bridge/br_netlink_tunnel.c b/net/bridge/br_netlink_tunnel.c
index 8914290c75d4..17abf092f7ca 100644
--- a/net/bridge/br_netlink_tunnel.c
+++ b/net/bridge/br_netlink_tunnel.c
@@ -188,6 +188,9 @@ int br_fill_vlan_tunnel_info(struct sk_buff *skb,
 }
 
 static const struct nla_policy vlan_tunnel_policy[IFLA_BRIDGE_VLAN_TUNNEL_MAX + 1] = {
+	[IFLA_BRIDGE_VLAN_TUNNEL_UNSPEC] = {
+		.strict_start_type = IFLA_BRIDGE_VLAN_TUNNEL_FLAGS + 1
+	},
 	[IFLA_BRIDGE_VLAN_TUNNEL_ID] = { .type = NLA_U32 },
 	[IFLA_BRIDGE_VLAN_TUNNEL_VID] = { .type = NLA_U16 },
 	[IFLA_BRIDGE_VLAN_TUNNEL_FLAGS] = { .type = NLA_U16 },
-- 
2.39.0


* [PATCH net-next 02/16] net: bridge: Add extack to br_multicast_new_port_group()
From: Petr Machata @ 2023-01-26 17:01 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Make it possible to set an extack in br_multicast_new_port_group().
Eventually, this function will check for per-port and per-port-vlan
MDB maximums, and will use the extack to communicate the reason for
the bounce.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_mdb.c       | 5 +++--
 net/bridge/br_multicast.c | 5 +++--
 net/bridge/br_private.h   | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 00e5743647b0..069061366541 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -849,7 +849,7 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg,
 	}
 
 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
-					MCAST_INCLUDE, cfg->rt_protocol);
+					MCAST_INCLUDE, cfg->rt_protocol, extack);
 	if (unlikely(!p)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group");
 		return -ENOMEM;
@@ -1075,7 +1075,8 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg,
 	}
 
 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
-					cfg->filter_mode, cfg->rt_protocol);
+					cfg->filter_mode, cfg->rt_protocol,
+					extack);
 	if (unlikely(!p)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group");
 		return -ENOMEM;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index dea1ee1bd095..de67d176838f 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1284,7 +1284,8 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 			unsigned char flags,
 			const unsigned char *src,
 			u8 filter_mode,
-			u8 rt_protocol)
+			u8 rt_protocol,
+			struct netlink_ext_ack *extack)
 {
 	struct net_bridge_port_group *p;
 
@@ -1387,7 +1388,7 @@ __br_multicast_add_group(struct net_bridge_mcast *brmctx,
 	}
 
 	p = br_multicast_new_port_group(pmctx->port, group, *pp, 0, src,
-					filter_mode, RTPROT_KERNEL);
+					filter_mode, RTPROT_KERNEL, NULL);
 	if (unlikely(!p)) {
 		p = ERR_PTR(-ENOMEM);
 		goto out;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 15ef7fd508ee..1805c468ae03 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -956,7 +956,8 @@ br_multicast_new_port_group(struct net_bridge_port *port,
 			    const struct br_ip *group,
 			    struct net_bridge_port_group __rcu *next,
 			    unsigned char flags, const unsigned char *src,
-			    u8 filter_mode, u8 rt_protocol);
+			    u8 filter_mode, u8 rt_protocol,
+			    struct netlink_ext_ack *extack);
 int br_mdb_hash_init(struct net_bridge *br);
 void br_mdb_hash_fini(struct net_bridge *br);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
-- 
2.39.0


* [PATCH net-next 03/16] net: bridge: Move extack-setting to br_multicast_new_port_group()
From: Petr Machata @ 2023-01-26 17:01 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Now that br_multicast_new_port_group() takes an extack argument, move
setting the extack there. The downside is that the error messages end
up being less specific (the function cannot distinguish between (S,G)
and (*,G) groups). However, the alternative is to check in the caller
whether the callee set the extack, and if it didn't, set it. But that
is only done when the callee is not exactly known. (E.g. in case of a
notifier invocation.)

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_mdb.c       | 9 +++------
 net/bridge/br_multicast.c | 5 ++++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 069061366541..139de8ac532c 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -850,10 +850,9 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg,
 
 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
 					MCAST_INCLUDE, cfg->rt_protocol, extack);
-	if (unlikely(!p)) {
-		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group");
+	if (unlikely(!p))
 		return -ENOMEM;
-	}
+
 	rcu_assign_pointer(*pp, p);
 	if (!(flags & MDB_PG_FLAGS_PERMANENT) && !cfg->src_entry)
 		mod_timer(&p->timer,
@@ -1077,10 +1076,8 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg,
 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
 					cfg->filter_mode, cfg->rt_protocol,
 					extack);
-	if (unlikely(!p)) {
-		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group");
+	if (unlikely(!p))
 		return -ENOMEM;
-	}
 
 	err = br_mdb_add_group_srcs(cfg, p, brmctx, extack);
 	if (err)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de67d176838f..f9f4d54226fd 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1290,8 +1290,10 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	struct net_bridge_port_group *p;
 
 	p = kzalloc(sizeof(*p), GFP_ATOMIC);
-	if (unlikely(!p))
+	if (unlikely(!p)) {
+		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
 		return NULL;
+	}
 
 	p->key.addr = *group;
 	p->key.port = port;
@@ -1306,6 +1308,7 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	if (!br_multicast_is_star_g(group) &&
 	    rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode,
 					  br_sg_port_rht_params)) {
+		NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group");
 		kfree(p);
 		return NULL;
 	}
-- 
2.39.0


* [PATCH net-next 04/16] net: bridge: Add br_multicast_del_port_group()
From: Petr Machata @ 2023-01-26 17:01 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Since cleaning up the effects of br_multicast_new_port_group() just
consists of delisting and freeing the memory, the function
br_mdb_add_group_star_g() inlines the corresponding code. In the following
patches, the number of per-port and per-port-VLAN MDB entries is going to
be maintained, and that counter will have to be updated. Because that
logic is going to be hidden in the br_multicast module, introduce a new
hook intended to remove a newly-created group again.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_mdb.c       |  3 +--
 net/bridge/br_multicast.c | 11 +++++++++++
 net/bridge/br_private.h   |  1 +
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 139de8ac532c..9f22ebfdc518 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -1099,8 +1099,7 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg,
 	return 0;
 
 err_del_port_group:
-	hlist_del_init(&p->mglist);
-	kfree(p);
+	br_multicast_del_port_group(p);
 	return err;
 }
 
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index f9f4d54226fd..08da724ebfdd 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1326,6 +1326,17 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	return p;
 }
 
+void br_multicast_del_port_group(struct net_bridge_port_group *p)
+{
+	struct net_bridge_port *port = p->key.port;
+
+	hlist_del_init(&p->mglist);
+	if (!br_multicast_is_star_g(&p->key.addr))
+		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
+				       br_sg_port_rht_params);
+	kfree(p);
+}
+
 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
 			    struct net_bridge_mdb_entry *mp, bool notify)
 {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1805c468ae03..e4069e27b5c6 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -958,6 +958,7 @@ br_multicast_new_port_group(struct net_bridge_port *port,
 			    unsigned char flags, const unsigned char *src,
 			    u8 filter_mode, u8 rt_protocol,
 			    struct netlink_ext_ack *extack);
+void br_multicast_del_port_group(struct net_bridge_port_group *p);
 int br_mdb_hash_init(struct net_bridge *br);
 void br_mdb_hash_fini(struct net_bridge *br);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
-- 
2.39.0



* [PATCH net-next 05/16] net: bridge: Change a cleanup in br_multicast_new_port_group() to goto
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

This function will have more to clean up in the following patches.
Structuring the cleanups in one labeled block will allow reusing the same
cleanup from several places.
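
The pattern can be sketched in userspace C as follows. This is only an
illustration of goto-based cleanup with stacked labels; struct entry,
entry_new() and entry_free() are hypothetical names and do not appear in
the bridge code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with two separately allocated members. */
struct entry {
	char *name;
	int *slots;
};

/* Each failure jumps to the label that undoes what was set up so far. */
static struct entry *entry_new(const char *name, size_t nslots)
{
	struct entry *e;

	assert(name != NULL);

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->name = malloc(strlen(name) + 1);
	if (!e->name)
		goto free_entry;
	strcpy(e->name, name);

	if (nslots == 0)	/* treated as invalid in this sketch */
		goto free_name;

	e->slots = calloc(nslots, sizeof(*e->slots));
	if (!e->slots)
		goto free_name;

	return e;

free_name:
	free(e->name);
free_entry:
	free(e);
	return NULL;
}

static void entry_free(struct entry *e)
{
	free(e->slots);
	free(e->name);
	free(e);
}
```

Adding a further setup step later only requires one more label above the
existing ones, rather than another copy of the cleanup code.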

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_multicast.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 08da724ebfdd..51b622afdb67 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1309,8 +1309,7 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	    rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode,
 					  br_sg_port_rht_params)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group");
-		kfree(p);
-		return NULL;
+		goto free_out;
 	}
 
 	rcu_assign_pointer(p->next, next);
@@ -1324,6 +1323,10 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 		eth_broadcast_addr(p->eth_addr);
 
 	return p;
+
+free_out:
+	kfree(p);
+	return NULL;
 }
 
 void br_multicast_del_port_group(struct net_bridge_port_group *p)
-- 
2.39.0



* [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel, Steven Rostedt, linux-trace-kernel

The following patch will add two more maximum MDB allowances to the global
one, mcast_hash_max, that exists today. In all these cases, attempts to add
MDB entries above the configured maximums through netlink fail noisily and
obviously. Such visibility is missing when entries are added through
control plane traffic, by IGMP or MLD packets.

To improve visibility in those cases, add a tracepoint that reports the
violation, including the relevant netdevice (be it a slave or the bridge
itself), and the MDB entry parameters:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 10
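
To make the layout of these lines easier to read, here is a userspace
approximation of the print format for an IPv4 entry. It is a sketch only:
format_mdb_full_v4() is a hypothetical helper, and inet_ntop() stands in
for the kernel's %pI4 and %pI6c specifiers. The unused IPv6 and MAC fields
are zeroed, as the tracepoint does for af == AF_INET.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render "dev ... af ... src ... grp ... vid ...". */
static int format_mdb_full_v4(char *buf, size_t len, const char *dev,
			      const char *src, const char *grp,
			      unsigned int vid)
{
	unsigned char src4[4], grp4[4];
	unsigned char zero6[16] = { 0 };	/* unused IPv6 slots */
	unsigned char mac[6] = { 0 };		/* unused MAC slot */
	char s4[INET_ADDRSTRLEN], g4[INET_ADDRSTRLEN];
	char z6[INET6_ADDRSTRLEN], m[18];

	assert(buf && dev && src && grp);

	if (inet_pton(AF_INET, src, src4) != 1 ||
	    inet_pton(AF_INET, grp, grp4) != 1)
		return -1;

	inet_ntop(AF_INET, src4, s4, sizeof(s4));
	inet_ntop(AF_INET, grp4, g4, sizeof(g4));
	inet_ntop(AF_INET6, zero6, z6, sizeof(z6));
	snprintf(m, sizeof(m), "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

	return snprintf(buf, len,
			"dev %s af %d src %s/%s grp %s/%s/%s vid %u",
			dev, AF_INET, s4, z6, g4, z6, m, vid);
}
```

Each address slot is always printed, so a reader picks the meaningful one
by looking at the af field first.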

CC: Steven Rostedt <rostedt@goodmis.org>
CC: linux-trace-kernel@vger.kernel.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 include/trace/events/bridge.h | 67 +++++++++++++++++++++++++++++++++++
 net/core/net-traces.c         |  1 +
 2 files changed, 68 insertions(+)

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
index 6b200059c2c5..00d5e2dcb3ad 100644
--- a/include/trace/events/bridge.h
+++ b/include/trace/events/bridge.h
@@ -122,6 +122,73 @@ TRACE_EVENT(br_fdb_update,
 		  __entry->flags)
 );
 
+TRACE_EVENT(br_mdb_full,
+
+	TP_PROTO(const struct net_device *dev,
+		 const struct br_ip *group),
+
+	TP_ARGS(dev, group),
+
+	TP_STRUCT__entry(
+		__string(dev, dev->name)
+		__field(int, af)
+		__field(u16, vid)
+		__array(__u8, src4, 4)
+		__array(__u8, src6, 16)
+		__array(__u8, grp4, 4)
+		__array(__u8, grp6, 16)
+		__array(__u8, grpmac, ETH_ALEN) /* For af == 0. */
+	),
+
+	TP_fast_assign(
+		__assign_str(dev, dev->name);
+		__entry->vid = group->vid;
+
+		if (!group->proto) {
+			__entry->af = 0;
+
+			memset(__entry->src4, 0, sizeof(__entry->src4));
+			memset(__entry->src6, 0, sizeof(__entry->src6));
+			memset(__entry->grp4, 0, sizeof(__entry->grp4));
+			memset(__entry->grp6, 0, sizeof(__entry->grp6));
+			memcpy(__entry->grpmac, group->dst.mac_addr, ETH_ALEN);
+		} else if (group->proto == htons(ETH_P_IP)) {
+			__be32 *p32;
+
+			__entry->af = AF_INET;
+
+			p32 = (__be32 *) __entry->src4;
+			*p32 = group->src.ip4;
+
+			p32 = (__be32 *) __entry->grp4;
+			*p32 = group->dst.ip4;
+
+			memset(__entry->src6, 0, sizeof(__entry->src6));
+			memset(__entry->grp6, 0, sizeof(__entry->grp6));
+			memset(__entry->grpmac, 0, ETH_ALEN);
+#if IS_ENABLED(CONFIG_IPV6)
+		} else {
+			struct in6_addr *in6;
+
+			__entry->af = AF_INET6;
+
+			in6 = (struct in6_addr *)__entry->src6;
+			*in6 = group->src.ip6;
+
+			in6 = (struct in6_addr *)__entry->grp6;
+			*in6 = group->dst.ip6;
+
+			memset(__entry->src4, 0, sizeof(__entry->src4));
+			memset(__entry->grp4, 0, sizeof(__entry->grp4));
+			memset(__entry->grpmac, 0, ETH_ALEN);
+#endif
+		}
+	),
+
+	TP_printk("dev %s af %u src %pI4/%pI6c grp %pI4/%pI6c/%pM vid %u",
+		  __get_str(dev), __entry->af, __entry->src4, __entry->src6,
+		  __entry->grp4, __entry->grp6, __entry->grpmac, __entry->vid)
+);
 
 #endif /* _TRACE_BRIDGE_H */
 
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index ee7006bbe49b..805b7385dd8d 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -41,6 +41,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(br_mdb_full);
 #endif
 
 #if IS_ENABLED(CONFIG_PAGE_POOL)
-- 
2.39.0



* [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
  is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure an MDB limit
for the bridge as a whole, or any one port as a whole, or any single
port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not disable multicast snooping. It is also
permitted to configure the local limit larger than hash_max, even though
that is not useful.
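
The accounting described above can be modelled in userspace roughly as
follows. This is a simplified sketch, not the patch itself: struct
mcast_ctx, ctx_inc_one(), ctx_dec_one(), ngroups_inc() and ngroups_dec()
are hypothetical names, locking is omitted, and a NULL vlan pointer stands
for "no VLAN context applies".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical counter pair; max_entries == 0 means "no limit". */
struct mcast_ctx {
	unsigned int n_entries;
	unsigned int max_entries;
};

static int ctx_inc_one(struct mcast_ctx *ctx)
{
	if (ctx->max_entries && ctx->n_entries >= ctx->max_entries)
		return -E2BIG;

	ctx->n_entries++;
	return 0;
}

static void ctx_dec_one(struct mcast_ctx *ctx)
{
	assert(ctx->n_entries > 0);
	ctx->n_entries--;
}

/* Count on the port first, then on the VLAN; roll back on failure. */
static int ngroups_inc(struct mcast_ctx *port, struct mcast_ctx *vlan)
{
	int err;

	err = ctx_inc_one(port);
	if (err)
		return err;

	if (!vlan)
		return 0;

	err = ctx_inc_one(vlan);
	if (err)
		ctx_dec_one(port);
	return err;
}

static void ngroups_dec(struct mcast_ctx *port, struct mcast_ctx *vlan)
{
	if (vlan)
		ctx_dec_one(vlan);
	ctx_dec_one(port);
}
```

A rejected addition leaves both counters untouched, and nothing else is
switched off as a side effect, which mirrors the difference from the global
hash_max limit noted above.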

In this patch, introduce only the accounting for the number of entries, and
the max field itself, but not the means to set the max. The next patch
introduces the netlink APIs to set and read the values.

Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
snooping is enabled. The reason for this is that while VLAN snooping is
disabled, permanent entries can be added above the limit imposed by the
configured maximum. Under those circumstances, whatever caused the VLAN
context enablement would need to be rolled back, adding a fair amount of
code that would be rarely hit and tricky to maintain. At the same time,
the feature that this would enable is IMHO not interesting: I posit that
the usefulness of keeping mcast_max_groups intact across
mcast_vlan_snooping toggles is marginal at best.
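
The reset-and-recount step can be pictured with the following userspace
sketch. It assumes a flat array of VIDs in place of the port's group list;
struct vlan_ctx and vlan_ctx_refresh() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port-VLAN context. */
struct vlan_ctx {
	unsigned int vid;
	unsigned int n_entries;
	unsigned int max_entries;	/* 0 means "no limit" */
};

/* Drop the configured maximum and recount the entries on this VLAN. */
static void vlan_ctx_refresh(struct vlan_ctx *ctx,
			     const unsigned int *entry_vids, size_t n)
{
	size_t i;

	assert(ctx != NULL);

	ctx->max_entries = 0;
	ctx->n_entries = 0;
	for (i = 0; i < n; i++) {
		if (entry_vids[i] == ctx->vid)
			ctx->n_entries++;
	}
}
```

Because the maximum is cleared before recounting, the recount cannot fail,
so no rollback path is needed.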

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
 net/bridge/br_private.h   |   2 +
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 51b622afdb67..de531109b947 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -31,6 +31,7 @@
 #include <net/ip6_checksum.h>
 #include <net/addrconf.h>
 #endif
+#include <trace/events/bridge.h>
 
 #include "br_private.h"
 #include "br_private_mcast_eht.h"
@@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
 	return pmctx;
 }
 
+static struct net_bridge_mcast_port *
+br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx = NULL;
+	struct net_bridge_vlan *vlan;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
+		return NULL;
+
+	/* Take RCU to access the vlan. */
+	rcu_read_lock();
+
+	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
+	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
+		pmctx = &vlan->port_mcast_ctx;
+
+	rcu_read_unlock();
+
+	return pmctx;
+}
+
 /* when snooping we need to check if the contexts should be used
  * in the following order:
  * - if pmctx is non-NULL (port), check if it should be used
@@ -668,6 +692,80 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
 	__br_multicast_del_group_src(src);
 }
 
+static int
+br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
+				  struct netlink_ext_ack *extack)
+{
+	if (pmctx->mdb_max_entries &&
+	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
+		return -E2BIG;
+
+	pmctx->mdb_n_entries++;
+	return 0;
+}
+
+static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
+{
+	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
+}
+
+static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
+					 const struct br_ip *group,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mcast_port *pmctx;
+	int err;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	/* Always count on the port context. */
+	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
+				       port->multicast_ctx.mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		return err;
+	}
+
+	/* Only count on the VLAN context if VID is given, and if snooping on
+	 * that VLAN is enabled.
+	 */
+	if (!group->vid)
+		return 0;
+
+	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
+	if (!pmctx)
+		return 0;
+
+	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
+				       pmctx->mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		goto dec_one_out;
+	}
+
+	return 0;
+
+dec_one_out:
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+	return err;
+}
+
+static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (vid) {
+		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
+		if (pmctx)
+			br_multicast_port_ngroups_dec_one(pmctx);
+	}
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+}
+
 static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
 {
 	struct net_bridge_port_group *pg;
@@ -702,6 +800,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	} else {
 		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
 	}
+	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
 	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
 	queue_work(system_long_wq, &br->mcast_gc_work);
 
@@ -1165,6 +1264,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		return mp;
 
 	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+		trace_br_mdb_full(br->dev, group);
 		br_mc_disabled_update(br->dev, false, NULL);
 		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
 		return ERR_PTR(-E2BIG);
@@ -1288,11 +1388,16 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 			struct netlink_ext_ack *extack)
 {
 	struct net_bridge_port_group *p;
+	int err;
+
+	err = br_multicast_port_ngroups_inc(port, group, extack);
+	if (err)
+		return NULL;
 
 	p = kzalloc(sizeof(*p), GFP_ATOMIC);
 	if (unlikely(!p)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
-		return NULL;
+		goto dec_out;
 	}
 
 	p->key.addr = *group;
@@ -1326,18 +1431,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 
 free_out:
 	kfree(p);
+dec_out:
+	br_multicast_port_ngroups_dec(port, group->vid);
 	return NULL;
 }
 
 void br_multicast_del_port_group(struct net_bridge_port_group *p)
 {
 	struct net_bridge_port *port = p->key.port;
+	__u16 vid = p->key.addr.vid;
 
 	hlist_del_init(&p->mglist);
 	if (!br_multicast_is_star_g(&p->key.addr))
 		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
 				       br_sg_port_rht_params);
 	kfree(p);
+	br_multicast_port_ngroups_dec(port, vid);
 }
 
 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
@@ -1951,6 +2060,26 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
 		br_ip4_multicast_add_router(brmctx, pmctx);
 		br_ip6_multicast_add_router(brmctx, pmctx);
 	}
+
+	if (br_multicast_port_ctx_is_vlan(pmctx)) {
+		struct net_bridge_port_group *pg;
+
+		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
+		 * disabled, the mcast_n_groups counter is now wrong. First,
+		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
+		 * are flushed, thus mcast_n_groups after the toggle does not
+		 * reflect the true values. And second, permanent entries added
+		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
+		 * either. Thus we have to refresh the counter.
+		 */
+
+		pmctx->mdb_max_entries = 0;
+		pmctx->mdb_n_entries = 0;
+		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
+			if (pg->key.addr.vid == pmctx->vlan->vid)
+				br_multicast_port_ngroups_inc_one(pmctx, NULL);
+		}
+	}
 }
 
 void br_multicast_enable_port(struct net_bridge_port *port)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e4069e27b5c6..49f411a0a1f1 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -126,6 +126,8 @@ struct net_bridge_mcast_port {
 	struct hlist_node		ip6_rlist;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 	unsigned char			multicast_router;
+	u32				mdb_n_entries;
+	u32				mdb_max_entries;
 #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
 };
 
-- 
2.39.0



* [Bridge] [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
@ 2023-01-26 17:01   ` Petr Machata
  0 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: Petr Machata, Ido Schimmel, bridge

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
  is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure an MDB limit
for the bridge as a whole, or any one port as a whole, or any single
port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not disable multicast snooping. It is also
permitted to configure the local limit larger than hash_max, even though
that is not useful.

In this patch, introduce only the accounting for the number of entries, and
the max field itself, but not the means to set the max. The next patch
introduces the netlink APIs to set and read the values.

Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
snooping is enabled. The reason for this is that while VLAN snooping is
disabled, permanent entries can be added above the limit imposed by the
configured maximum. Under those circumstances, whatever caused the VLAN
context enablement would need to be rolled back, adding a fair amount of
code that would be rarely hit and tricky to maintain. At the same time,
the feature that this would enable is IMHO not interesting: I posit that
the usefulness of keeping mcast_max_groups intact across
mcast_vlan_snooping toggles is marginal at best.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
 net/bridge/br_private.h   |   2 +
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 51b622afdb67..de531109b947 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -31,6 +31,7 @@
 #include <net/ip6_checksum.h>
 #include <net/addrconf.h>
 #endif
+#include <trace/events/bridge.h>
 
 #include "br_private.h"
 #include "br_private_mcast_eht.h"
@@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
 	return pmctx;
 }
 
+static struct net_bridge_mcast_port *
+br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx = NULL;
+	struct net_bridge_vlan *vlan;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
+		return NULL;
+
+	/* Take RCU to access the vlan. */
+	rcu_read_lock();
+
+	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
+	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
+		pmctx = &vlan->port_mcast_ctx;
+
+	rcu_read_unlock();
+
+	return pmctx;
+}
+
 /* when snooping we need to check if the contexts should be used
  * in the following order:
  * - if pmctx is non-NULL (port), check if it should be used
@@ -668,6 +692,80 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
 	__br_multicast_del_group_src(src);
 }
 
+static int
+br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
+				  struct netlink_ext_ack *extack)
+{
+	if (pmctx->mdb_max_entries &&
+	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
+		return -E2BIG;
+
+	pmctx->mdb_n_entries++;
+	return 0;
+}
+
+static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
+{
+	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
+}
+
+static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
+					 const struct br_ip *group,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mcast_port *pmctx;
+	int err;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	/* Always count on the port context. */
+	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
+				       port->multicast_ctx.mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		return err;
+	}
+
+	/* Only count on the VLAN context if VID is given, and if snooping on
+	 * that VLAN is enabled.
+	 */
+	if (!group->vid)
+		return 0;
+
+	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
+	if (!pmctx)
+		return 0;
+
+	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
+				       pmctx->mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		goto dec_one_out;
+	}
+
+	return 0;
+
+dec_one_out:
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+	return err;
+}
+
+static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (vid) {
+		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
+		if (pmctx)
+			br_multicast_port_ngroups_dec_one(pmctx);
+	}
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+}
+
 static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
 {
 	struct net_bridge_port_group *pg;
@@ -702,6 +800,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	} else {
 		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
 	}
+	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
 	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
 	queue_work(system_long_wq, &br->mcast_gc_work);
 
@@ -1165,6 +1264,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		return mp;
 
 	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+		trace_br_mdb_full(br->dev, group);
 		br_mc_disabled_update(br->dev, false, NULL);
 		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
 		return ERR_PTR(-E2BIG);
@@ -1288,11 +1388,16 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 			struct netlink_ext_ack *extack)
 {
 	struct net_bridge_port_group *p;
+	int err;
+
+	err = br_multicast_port_ngroups_inc(port, group, extack);
+	if (err)
+		return NULL;
 
 	p = kzalloc(sizeof(*p), GFP_ATOMIC);
 	if (unlikely(!p)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
-		return NULL;
+		goto dec_out;
 	}
 
 	p->key.addr = *group;
@@ -1326,18 +1431,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 
 free_out:
 	kfree(p);
+dec_out:
+	br_multicast_port_ngroups_dec(port, group->vid);
 	return NULL;
 }
 
 void br_multicast_del_port_group(struct net_bridge_port_group *p)
 {
 	struct net_bridge_port *port = p->key.port;
+	__u16 vid = p->key.addr.vid;
 
 	hlist_del_init(&p->mglist);
 	if (!br_multicast_is_star_g(&p->key.addr))
 		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
 				       br_sg_port_rht_params);
 	kfree(p);
+	br_multicast_port_ngroups_dec(port, vid);
 }
 
 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
@@ -1951,6 +2060,26 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
 		br_ip4_multicast_add_router(brmctx, pmctx);
 		br_ip6_multicast_add_router(brmctx, pmctx);
 	}
+
+	if (br_multicast_port_ctx_is_vlan(pmctx)) {
+		struct net_bridge_port_group *pg;
+
+		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
+		 * disabled, the mcast_n_groups counter is now wrong. First,
+		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
+		 * are flushed, thus mcast_n_groups after the toggle does not
+		 * reflect the true values. And second, permanent entries added
+		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
+		 * either. Thus we have to refresh the counter.
+		 */
+
+		pmctx->mdb_max_entries = 0;
+		pmctx->mdb_n_entries = 0;
+		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
+			if (pg->key.addr.vid == pmctx->vlan->vid)
+				br_multicast_port_ngroups_inc_one(pmctx, NULL);
+		}
+	}
 }
 
 void br_multicast_enable_port(struct net_bridge_port *port)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e4069e27b5c6..49f411a0a1f1 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -126,6 +126,8 @@ struct net_bridge_mcast_port {
 	struct hlist_node		ip6_rlist;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 	unsigned char			multicast_router;
+	u32				mdb_n_entries;
+	u32				mdb_max_entries;
 #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
 };
 
-- 
2.39.0


* [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

The previous patch added accounting for the number of MDB entries per port
and per port-VLAN, and the logic to verify that these values stay within
configured bounds. However, it did not provide a means to configure those
bounds or to read the occupancy. This patch does that.
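
The accounting and the bounds check described above can be modelled in
userspace roughly as follows. This is only a sketch: struct mdb_ctx and the
mdb_ctx_*() helpers are hypothetical names, locking is left out, and the
-E2BIG return value for a full context is an assumption. Only the rule that
a non-zero maximum may not be set below the current occupancy is taken
directly from br_multicast_pmctx_ngroups_set_max() in this patch.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for the two counters kept in
 * struct net_bridge_mcast_port. Names are hypothetical.
 */
struct mdb_ctx {
	uint32_t n_entries;	/* current occupancy */
	uint32_t max_entries;	/* 0 means "no limit" */
};

/* Account one more entry, or refuse if the limit is already reached.
 * The -E2BIG error code is an assumption of this sketch.
 */
static int mdb_ctx_inc(struct mdb_ctx *ctx)
{
	if (ctx->max_entries && ctx->n_entries >= ctx->max_entries)
		return -E2BIG;

	ctx->n_entries++;
	return 0;
}

/* Drop one entry from the count, never going below zero. */
static void mdb_ctx_dec(struct mdb_ctx *ctx)
{
	if (ctx->n_entries)
		ctx->n_entries--;
}

/* Same rule as br_multicast_pmctx_ngroups_set_max(): a non-zero maximum
 * below the current occupancy is rejected, 0 always lifts the limit.
 */
static int mdb_ctx_set_max(struct mdb_ctx *ctx, uint32_t max)
{
	if (max && max < ctx->n_entries)
		return -EINVAL;

	ctx->max_entries = max;
	return 0;
}
```

With two entries accounted, mdb_ctx_set_max(&ctx, 1) fails with -EINVAL,
while mdb_ctx_set_max(&ctx, 2) succeeds and makes the next mdb_ctx_inc()
fail until an entry is removed.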

Two new netlink attributes are added for the MDB occupancy:
IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
And another two for the maximum number of MDB entries:
IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.

Note that the two new IFLA_BRPORT_ attributes prompt bumping of
RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.
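
To illustrate why appending attributes forces the bump, here is a toy model
of a fixed-size attribute table. All DEMO_* names are hypothetical, and the
bounds check is an assumption about how a cap such as RTNL_SLAVE_MAX_TYPE is
typically enforced, not a copy of the rtnetlink code.

```c
#include <errno.h>

/* Hypothetical, shortened attribute enum in the style of IFLA_BRPORT_*. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_A,
	DEMO_ATTR_B,
	DEMO_ATTR_N_GROUPS,	/* newly appended */
	DEMO_ATTR_MAX_GROUPS,	/* newly appended */
	__DEMO_ATTR_MAX
};
#define DEMO_ATTR_MAX (__DEMO_ATTR_MAX - 1)

/* Stand-in for RTNL_SLAVE_MAX_TYPE: the parse table has
 * DEMO_TABLE_MAX_TYPE + 1 slots, indexed by attribute type.
 */
#define DEMO_TABLE_MAX_TYPE 4

/* Refuse a link type whose highest attribute would not fit the table. */
static int demo_check_maxtype(int maxtype)
{
	if (maxtype > DEMO_TABLE_MAX_TYPE)
		return -EINVAL;
	return 0;
}
```

In this model, had DEMO_TABLE_MAX_TYPE stayed at 2 after the two attributes
were appended, demo_check_maxtype(DEMO_ATTR_MAX) would fail. Raising the cap
by the number of added attributes keeps the table large enough.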

The new attributes are used like this:

 # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
                                      mcast_vlan_snooping 1 mcast_querier 1
 # ip link set dev v1 master br
 # bridge vlan add dev v1 vid 2

 # bridge vlan set dev v1 vid 1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
 Error: bridge: Port-VLAN is already a member in mcast_max_groups (1) groups.

 # bridge link set dev v1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
 Error: bridge: Port is already a member in mcast_max_groups (1) groups.

 # bridge -d link show
 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
     [...] mcast_n_groups 1 mcast_max_groups 1

 # bridge -d vlan show
 port              vlan-id
 br                1 PVID Egress Untagged
                     state forwarding mcast_router 1
 v1                1 PVID Egress Untagged
                     [...] mcast_n_groups 1 mcast_max_groups 1
                   2
                     [...] mcast_n_groups 0 mcast_max_groups 0
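
The session above can be replayed against a small userspace model of the
two-level accounting: count on the port first, then on the port-VLAN, and
undo the port count if the VLAN refuses. This is a sketch only. The demo_*
names, the fixed VLAN array and the -E2BIG error code are assumptions made
for illustration. The order of the two counts and the rollback follow
br_multicast_port_ngroups_inc() from the previous patch.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_N_VLANS 4	/* hypothetical: only VIDs 1..3 are modelled */

struct demo_ctx {
	uint32_t n;	/* mcast_n_groups */
	uint32_t max;	/* mcast_max_groups, 0 means "no limit" */
};

struct demo_port {
	struct demo_ctx port;			/* per-port counters */
	struct demo_ctx vlan[DEMO_N_VLANS];	/* per-port-VLAN, by VID */
};

/* Count one entry in a single context, or refuse when it is full. */
static int demo_inc_one(struct demo_ctx *ctx)
{
	if (ctx->max && ctx->n >= ctx->max)
		return -E2BIG;	/* error code is an assumption */
	ctx->n++;
	return 0;
}

/* Always count on the port; count on the VLAN only if a VID is given.
 * If the VLAN context is full, roll back the port-level count.
 */
static int demo_mdb_add(struct demo_port *p, uint16_t vid)
{
	int err;

	err = demo_inc_one(&p->port);
	if (err)
		return err;

	if (!vid || vid >= DEMO_N_VLANS)
		return 0;

	err = demo_inc_one(&p->vlan[vid]);
	if (err)
		p->port.n--;

	return err;
}
```

Setting vlan[1].max to 1 and adding two groups on VID 1 rejects the second
add and leaves the port count at 1. Setting port.max to 1 afterwards makes
an add on VID 2 fail at the port level, which matches the counters shown in
the dumps above.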

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 include/uapi/linux/if_bridge.h |  2 +
 include/uapi/linux/if_link.h   |  2 +
 net/bridge/br_multicast.c      | 96 ++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c        | 19 ++++++-
 net/bridge/br_private.h        | 16 +++++-
 net/bridge/br_vlan.c           | 11 ++--
 net/bridge/br_vlan_options.c   | 33 +++++++++++-
 net/core/rtnetlink.c           |  2 +-
 8 files changed, 173 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index d9de241d90f9..d60c456710b3 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -523,6 +523,8 @@ enum {
 	BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
 	BRIDGE_VLANDB_ENTRY_STATS,
 	BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
+	BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
+	BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
 	__BRIDGE_VLANDB_ENTRY_MAX,
 };
 #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1021a7e47a86..1bed3a72939c 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -564,6 +564,8 @@ enum {
 	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
 	IFLA_BRPORT_LOCKED,
 	IFLA_BRPORT_MAB,
+	IFLA_BRPORT_MCAST_N_GROUPS,
+	IFLA_BRPORT_MCAST_MAX_GROUPS,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de531109b947..04261dd2380b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -766,6 +766,102 @@ static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
 	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
 }
 
+static int
+br_multicast_pmctx_ngroups_set_max(struct net_bridge_mcast_port *pmctx,
+				   u32 max, struct netlink_ext_ack *extack)
+{
+	if (max && max < pmctx->mdb_n_entries) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Can't set mcast_max_groups=%u, which is below mcast_n_groups=%u",
+				       max, pmctx->mdb_n_entries);
+		return -EINVAL;
+	}
+
+	pmctx->mdb_max_entries = max;
+	return 0;
+}
+
+u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port)
+{
+	u32 n;
+
+	spin_lock_bh(&port->br->multicast_lock);
+	n = port->multicast_ctx.mdb_n_entries;
+	spin_unlock_bh(&port->br->multicast_lock);
+
+	return n;
+}
+
+int br_multicast_vlan_ngroups_get(struct net_bridge *br,
+				  const struct net_bridge_vlan *v,
+				  u32 *n)
+{
+	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
+		return -EINVAL;
+
+	spin_lock_bh(&br->multicast_lock);
+	*n = v->port_mcast_ctx.mdb_n_entries;
+	spin_unlock_bh(&br->multicast_lock);
+
+	return 0;
+}
+
+int br_multicast_port_ngroups_set_max(struct net_bridge_port *port, u32 max,
+				      struct netlink_ext_ack *extack)
+{
+	int err;
+
+	spin_lock_bh(&port->br->multicast_lock);
+	err = br_multicast_pmctx_ngroups_set_max(&port->multicast_ctx, max,
+						 extack);
+	spin_unlock_bh(&port->br->multicast_lock);
+
+	return err;
+}
+
+int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
+				      struct net_bridge_vlan *v, u32 max,
+				      struct netlink_ext_ack *extack)
+{
+	int err;
+
+	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
+		NL_SET_ERR_MSG_MOD(extack, "Multicast snooping disabled on this VLAN");
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&br->multicast_lock);
+	err = br_multicast_pmctx_ngroups_set_max(&v->port_mcast_ctx, max,
+						 extack);
+	spin_unlock_bh(&br->multicast_lock);
+
+	return err;
+}
+
+u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port)
+{
+	u32 max;
+
+	spin_lock_bh(&port->br->multicast_lock);
+	max = port->multicast_ctx.mdb_max_entries;
+	spin_unlock_bh(&port->br->multicast_lock);
+
+	return max;
+}
+
+int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
+				      const struct net_bridge_vlan *v,
+				      u32 *max)
+{
+	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
+		return -EINVAL;
+
+	spin_lock_bh(&br->multicast_lock);
+	*max = v->port_mcast_ctx.mdb_max_entries;
+	spin_unlock_bh(&br->multicast_lock);
+
+	return 0;
+}
+
 static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
 {
 	struct net_bridge_port_group *pg;
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a6133d469885..063c1646dfe8 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -202,6 +202,8 @@ static inline size_t br_port_info_size(void)
 		+ nla_total_size_64bit(sizeof(u64)) /* IFLA_BRPORT_HOLD_TIMER */
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MULTICAST_ROUTER */
+		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_N_GROUPS */
+		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_MAX_GROUPS */
 #endif
 		+ nla_total_size(sizeof(u16))	/* IFLA_BRPORT_GROUP_FWD_MASK */
 		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MRP_RING_OPEN */
@@ -298,7 +300,11 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
 			p->multicast_eht_hosts_limit) ||
 	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
-			p->multicast_eht_hosts_cnt))
+			p->multicast_eht_hosts_cnt) ||
+	    nla_put_u32(skb, IFLA_BRPORT_MCAST_N_GROUPS,
+			br_multicast_port_ngroups_get(p)) ||
+	    nla_put_u32(skb, IFLA_BRPORT_MCAST_MAX_GROUPS,
+			br_multicast_port_ngroups_get_max(p)))
 		return -EMSGSIZE;
 #endif
 
@@ -883,6 +889,8 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_MAB] = { .type = NLA_U8 },
 	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
 	[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
+	[IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT },
+	[IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -1017,6 +1025,15 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
 		if (err)
 			return err;
 	}
+
+	if (tb[IFLA_BRPORT_MCAST_MAX_GROUPS]) {
+		u32 max_groups;
+
+		max_groups = nla_get_u32(tb[IFLA_BRPORT_MCAST_MAX_GROUPS]);
+		err = br_multicast_port_ngroups_set_max(p, max_groups, extack);
+		if (err)
+			return err;
+	}
 #endif
 
 	if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 49f411a0a1f1..86b7a221e806 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -978,6 +978,19 @@ void br_multicast_uninit_stats(struct net_bridge *br);
 void br_multicast_get_stats(const struct net_bridge *br,
 			    const struct net_bridge_port *p,
 			    struct br_mcast_stats *dest);
+u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port);
+int br_multicast_vlan_ngroups_get(struct net_bridge *br,
+				  const struct net_bridge_vlan *v,
+				  u32 *n);
+int br_multicast_port_ngroups_set_max(struct net_bridge_port *port,
+				      u32 max, struct netlink_ext_ack *extack);
+int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
+				      struct net_bridge_vlan *v, u32 max,
+				      struct netlink_ext_ack *extack);
+u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port);
+int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
+				      const struct net_bridge_vlan *v,
+				      u32 *max);
 void br_mdb_init(void);
 void br_mdb_uninit(void);
 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
@@ -1761,7 +1774,8 @@ static inline u16 br_vlan_flags(const struct net_bridge_vlan *v, u16 pvid)
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
 			   const struct net_bridge_vlan *range_end);
-bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v);
+bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
+		       const struct net_bridge_port *p);
 size_t br_vlan_opts_nl_size(void);
 int br_vlan_process_options(const struct net_bridge *br,
 			    const struct net_bridge_port *p,
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index bc75fa1e4666..8a3dbc09ba38 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1816,6 +1816,7 @@ static bool br_vlan_stats_fill(struct sk_buff *skb,
 /* v_opts is used to dump the options which must be equal in the whole range */
 static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
 			      const struct net_bridge_vlan *v_opts,
+			      const struct net_bridge_port *p,
 			      u16 flags,
 			      bool dump_stats)
 {
@@ -1842,7 +1843,7 @@ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
 		goto out_err;
 
 	if (v_opts) {
-		if (!br_vlan_opts_fill(skb, v_opts))
+		if (!br_vlan_opts_fill(skb, v_opts, p))
 			goto out_err;
 
 		if (dump_stats && !br_vlan_stats_fill(skb, v_opts))
@@ -1925,7 +1926,7 @@ void br_vlan_notify(const struct net_bridge *br,
 		goto out_kfree;
 	}
 
-	if (!br_vlan_fill_vids(skb, vid, vid_range, v, flags, false))
+	if (!br_vlan_fill_vids(skb, vid, vid_range, v, p, flags, false))
 		goto out_err;
 
 	nlmsg_end(skb, nlh);
@@ -2030,7 +2031,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
 
 			if (!br_vlan_fill_vids(skb, range_start->vid,
 					       range_end->vid, range_start,
-					       vlan_flags, dump_stats)) {
+					       p, vlan_flags, dump_stats)) {
 				err = -EMSGSIZE;
 				break;
 			}
@@ -2056,7 +2057,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
 		else if (!dump_global &&
 			 !br_vlan_fill_vids(skb, range_start->vid,
 					    range_end->vid, range_start,
-					    br_vlan_flags(range_start, pvid),
+					    p, br_vlan_flags(range_start, pvid),
 					    dump_stats))
 			err = -EMSGSIZE;
 	}
@@ -2131,6 +2132,8 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] =
 	[BRIDGE_VLANDB_ENTRY_STATE]	= { .type = NLA_U8 },
 	[BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED },
 	[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER]	= { .type = NLA_U8 },
+	[BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS]	= { .type = NLA_REJECT },
+	[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]	= { .type = NLA_U32 },
 };
 
 static int br_vlan_rtm_process_one(struct net_device *dev,
diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
index a2724d03278c..43d8f11ce79c 100644
--- a/net/bridge/br_vlan_options.c
+++ b/net/bridge/br_vlan_options.c
@@ -48,7 +48,8 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
 	       curr_mc_rtr == range_mc_rtr;
 }
 
-bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
+bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
+		       const struct net_bridge_port *p)
 {
 	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) ||
 	    !__vlan_tun_put(skb, v))
@@ -58,6 +59,20 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
 	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
 		       br_vlan_multicast_router(v)))
 		return false;
+	if (p && !br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
+		u32 mdb_max_entries;
+		u32 mdb_n_entries;
+
+		if (br_multicast_vlan_ngroups_get(p->br, v, &mdb_n_entries) ||
+		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
+				mdb_n_entries))
+			return false;
+		if (br_multicast_vlan_ngroups_get_max(p->br, v,
+						      &mdb_max_entries) ||
+		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
+				mdb_max_entries))
+			return false;
+	}
 #endif
 
 	return true;
@@ -70,6 +85,8 @@ size_t br_vlan_opts_nl_size(void)
 	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_TINFO_ID */
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	       + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_MCAST_ROUTER */
+	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */
+	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */
 #endif
 	       + 0;
 }
@@ -212,6 +229,20 @@ static int br_vlan_process_one_opts(const struct net_bridge *br,
 			return err;
 		*changed = true;
 	}
+	if (tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]) {
+		u32 val;
+
+		if (!p) {
+			NL_SET_ERR_MSG_MOD(extack, "Can't set mcast_max_groups for non-port vlans");
+			return -EINVAL;
+		}
+
+		val = nla_get_u32(tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]);
+		err = br_multicast_vlan_ngroups_set_max(p->br, v, val, extack);
+		if (err)
+			return err;
+		*changed = true;
+	}
 #endif
 
 	return 0;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 64289bc98887..e786255a8360 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -58,7 +58,7 @@
 #include "dev.h"
 
 #define RTNL_MAX_TYPE		50
-#define RTNL_SLAVE_MAX_TYPE	40
+#define RTNL_SLAVE_MAX_TYPE	42
 
 struct rtnl_link {
 	rtnl_doit_func		doit;
-- 
2.39.0


* [PATCH net-next 09/16] selftests: forwarding: Move IGMP- and MLD-related functions to lib
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

These functions will be helpful for other testsuites as well. Extract them
to a common place.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 .../selftests/net/forwarding/bridge_mdb.sh    | 49 -------------------
 tools/testing/selftests/net/forwarding/lib.sh | 49 +++++++++++++++++++
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 2fa5973c0c28..51f2b0d77067 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -1018,26 +1018,6 @@ fwd_test()
 	ip -6 address del fe80::1/64 dev br0
 }
 
-igmpv3_is_in_get()
-{
-	local igmpv3
-
-	igmpv3=$(:
-		)"22:"$(			: Type - Membership Report
-		)"00:"$(			: Reserved
-		)"2a:f8:"$(			: Checksum
-		)"00:00:"$(			: Reserved
-		)"00:01:"$(			: Number of Group Records
-		)"01:"$(			: Record Type - IS_IN
-		)"00:"$(			: Aux Data Len
-		)"00:01:"$(			: Number of Sources
-		)"ef:01:01:01:"$(		: Multicast Address - 239.1.1.1
-		)"c0:00:02:02"$(		: Source Address - 192.0.2.2
-		)
-
-	echo $igmpv3
-}
-
 ctrl_igmpv3_is_in_test()
 {
 	RET=0
@@ -1077,35 +1057,6 @@ ctrl_igmpv3_is_in_test()
 	log_test "IGMPv3 MODE_IS_INCLUE tests"
 }
 
-mldv2_is_in_get()
-{
-	local hbh
-	local icmpv6
-
-	hbh=$(:
-		)"3a:"$(			: Next Header - ICMPv6
-		)"00:"$(			: Hdr Ext Len
-		)"00:00:00:00:00:00:"$(		: Options and Padding
-		)
-
-	icmpv6=$(:
-		)"8f:"$(			: Type - MLDv2 Report
-		)"00:"$(			: Code
-		)"45:39:"$(			: Checksum
-		)"00:00:"$(			: Reserved
-		)"00:01:"$(			: Number of Group Records
-		)"01:"$(			: Record Type - IS_IN
-		)"00:"$(			: Aux Data Len
-		)"00:01:"$(			: Number of Sources
-		)"ff:0e:00:00:00:00:00:00:"$(	: Multicast address - ff0e::1
-		)"00:00:00:00:00:00:00:01:"$(	:
-		)"20:01:0d:b8:00:01:00:00:"$(	: Source Address - 2001:db8:1::2
-		)"00:00:00:00:00:00:00:02:"$(	:
-		)
-
-	echo ${hbh}${icmpv6}
-}
-
 ctrl_mldv2_is_in_test()
 {
 	RET=0
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 1c4f866de7d7..db2534f7e49b 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1692,3 +1692,52 @@ hw_stats_monitor_test()
 
 	log_test "${type}_stats notifications"
 }
+
+igmpv3_is_in_get()
+{
+	local igmpv3
+
+	igmpv3=$(:
+		)"22:"$(			: Type - Membership Report
+		)"00:"$(			: Reserved
+		)"2a:f8:"$(			: Checksum
+		)"00:00:"$(			: Reserved
+		)"00:01:"$(			: Number of Group Records
+		)"01:"$(			: Record Type - IS_IN
+		)"00:"$(			: Aux Data Len
+		)"00:01:"$(			: Number of Sources
+		)"ef:01:01:01:"$(		: Multicast Address - 239.1.1.1
+		)"c0:00:02:02"$(		: Source Address - 192.0.2.2
+		)
+
+	echo $igmpv3
+}
+
+mldv2_is_in_get()
+{
+	local hbh
+	local icmpv6
+
+	hbh=$(:
+		)"3a:"$(			: Next Header - ICMPv6
+		)"00:"$(			: Hdr Ext Len
+		)"00:00:00:00:00:00:"$(		: Options and Padding
+		)
+
+	icmpv6=$(:
+		)"8f:"$(			: Type - MLDv2 Report
+		)"00:"$(			: Code
+		)"45:39:"$(			: Checksum
+		)"00:00:"$(			: Reserved
+		)"00:01:"$(			: Number of Group Records
+		)"01:"$(			: Record Type - IS_IN
+		)"00:"$(			: Aux Data Len
+		)"00:01:"$(			: Number of Sources
+		)"ff:0e:00:00:00:00:00:00:"$(	: Multicast address - ff0e::1
+		)"00:00:00:00:00:00:00:01:"$(	:
+		)"20:01:0d:b8:00:01:00:00:"$(	: Source Address - 2001:db8:1::2
+		)"00:00:00:00:00:00:00:02:"$(	:
+		)
+
+	echo ${hbh}${icmpv6}
+}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH net-next 10/16] selftests: forwarding: bridge_mdb: Fix a typo
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Add the letter missing from the word "INCLUDE".

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/forwarding/bridge_mdb.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 51f2b0d77067..4e16677f02ba 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -1054,7 +1054,7 @@ ctrl_igmpv3_is_in_test()
 
 	bridge mdb del dev br0 port $swp1 grp 239.1.1.1 vid 10
 
-	log_test "IGMPv3 MODE_IS_INCLUE tests"
+	log_test "IGMPv3 MODE_IS_INCLUDE tests"
 }
 
 ctrl_mldv2_is_in_test()
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH net-next 11/16] selftests: forwarding: lib: Add helpers for IP address handling
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to expand IPv4 and IPv6 addresses given as parameters in
mausezahn payload notation. Add helpers that do it.
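
As an illustration of the notation involved, here is a minimal sketch of
such conversions. It is not the code added by this patch: the function
names ipv4_bytes_sketch and ipv6_bytes_sketch are made up for the
example, and the IPv6 variant uses awk instead of the bash parameter
expansions used in lib.sh.

```shell
#!/bin/sh
# Illustrative sketch only; these are not the helpers added to lib.sh.

# 239.1.1.1 -> ef:01:01:01
ipv4_bytes_sketch()
{
	# The command substitution is unquoted on purpose, so that each
	# octet becomes a separate printf argument.
	printf '%02x:%02x:%02x:%02x\n' $(echo "$1" | tr . ' ')
}

# 2001:db8:1::2 -> 20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:02
ipv6_bytes_sketch()
{
	echo "$1" | awk '{
		n = split($0, half, "::")
		nl = split(half[1], l, ":")
		nr = (n > 1) ? split(half[2], r, ":") : 0
		k = 0
		for (i = 1; i <= nl; i++) g[++k] = l[i]
		# Groups elided by the :: token are all-zero.
		for (i = nl + nr; i < 8; i++) g[++k] = "0"
		for (i = 1; i <= nr; i++) g[++k] = r[i]
		out = ""
		for (i = 1; i <= 8; i++) {
			# Pad each 16-bit group to 4 hexadecimal digits.
			h = substr("0000" g[i], length(g[i]) + 1)
			out = out (i > 1 ? ":" : "") substr(h, 1, 2) ":" substr(h, 3, 2)
		}
		print out
	}'
}

ipv4_bytes_sketch 239.1.1.1	# prints ef:01:01:01
ipv6_bytes_sketch ff0e::1
```

The two example addresses are the group addresses that bridge_mdb.sh
currently hard-codes in its IGMPv3 and MLDv2 payloads.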

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index db2534f7e49b..8f7e2cc8b779 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1693,6 +1693,43 @@ hw_stats_monitor_test()
 	log_test "${type}_stats notifications"
 }
 
+ipv4_to_bytes()
+{
+	local IP=$1; shift
+
+	printf '%02x:' ${IP//./ } |
+	    sed 's/:$//'
+}
+
+# Convert a given IPv6 address, `IP' such that the :: token, if present, is
+# expanded, and each 16-bit group is padded with zeroes to be 4 hexadecimal
+# digits. An optional `BYTESEP' parameter can be given to further separate
+# individual bytes of each 16-bit group.
+expand_ipv6()
+{
+	local IP=$1; shift
+	local bytesep=$1; shift
+
+	local cvt_ip=${IP/::/_}
+	local colons=${cvt_ip//[^:]/}
+	local allcol=:::::::
+	# IP where :: -> the appropriate number of colons:
+	local allcol_ip=${cvt_ip/_/${allcol:${#colons}}}
+
+	echo $allcol_ip | tr : '\n' |
+	    sed s/^/0000/ |
+	    sed 's/.*\(..\)\(..\)/\1'"$bytesep"'\2/' |
+	    tr '\n' : |
+	    sed 's/:$//'
+}
+
+ipv6_to_bytes()
+{
+	local IP=$1; shift
+
+	expand_ipv6 "$IP" :
+}
+
 igmpv3_is_in_get()
 {
 	local igmpv3
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH net-next 12/16] selftests: forwarding: lib: Add helpers for checksum handling
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to calculate the packet checksum.

The approach presented in this patch revolves around payload templates
for mausezahn. These are mausezahn-like payload strings (01:23:45:...)
with possibly one 2-byte sequence replaced with the word CHECKSUM. The
main function is payload_template_calc_checksum(), which calculates the
RFC 1071 checksum of the message. There are further helpers to then
convert the checksum to the payload format, and to expand it.
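
To make the calculation concrete, here is a minimal sketch of an RFC
1071 checksum over such a template. It is an assumption-laden stand-in,
not the helper from this patch: it uses shell arithmetic where the patch
pipes to dc, the name template_checksum_sketch is made up, and it prints
the result in hexadecimal.

```shell
#!/bin/sh
# Illustrative sketch only; not payload_template_calc_checksum() itself.
template_checksum_sketch()
(
	sum=0
	hi=
	# The CHECKSUM keyword stands for 00:00 while the sum is computed.
	for b in $(echo "$1" | sed 's/CHECKSUM/00:00/g' | tr : ' '); do
		if [ -z "$hi" ]; then
			hi=$b
		else
			# Add one 16-bit big-endian word.
			sum=$((sum + 0x$hi$b))
			hi=
		fi
	done
	# An odd trailing byte is padded with a zero byte.
	if [ -n "$hi" ]; then
		sum=$((sum + 0x${hi}00))
	fi
	# Fold the carries back in until the sum fits in 16 bits.
	while [ $((sum >> 16)) -ne 0 ]; do
		sum=$(( (sum & 0xffff) + (sum >> 16) ))
	done
	# The checksum is the one's complement of the folded sum.
	printf '%04x\n' $((sum ^ 0xffff))
)

# IGMPv3 IS_IN report for group 239.1.1.1, source 192.0.2.2.
igmpv3="22:00:CHECKSUM:00:00:00:01:01:00:00:01:ef:01:01:01:c0:00:02:02"
template_checksum_sketch "$igmpv3"	# prints 2af8
```

The result agrees with the 2a:f8 checksum hard-coded in the existing
igmpv3_is_in_get().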

For IPv6, the MLDv2 message checksum is computed over a pseudoheader
that differs from the header used in the payload itself. Because the
two messages differ, the checksum needs to be returned as a separate
quantity, instead of being expanded in-place in the payload itself.
Furthermore, the pseudoheader includes the length of the message. Much
like the checksum, this needs to be expanded in mausezahn format, and
likewise the number of addresses for (S,G) entries. Thus there are
several places where a computed quantity needs to be presented in the
payload format. Add a helper u16_to_bytes(), which will be used in all
these cases.
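
A rough sketch of the two conversions described above follows. The
names u16_bytes_sketch and template_nbytes_sketch are hypothetical and
the bodies are simplified stand-ins for the helpers in the diff below.

```shell
#!/bin/sh
# Illustrative sketch only; not the helpers added to lib.sh.

# Present a 16-bit quantity as two payload bytes: 44 -> 00:2c
u16_bytes_sketch()
{
	printf '%02x:%02x\n' $(($1 >> 8 & 0xff)) $(($1 & 0xff))
}

# Count the bytes in a template. CHECKSUM counts as two bytes and a
# trailing colon is ignored.
template_nbytes_sketch()
{
	echo "$1" | sed 's/CHECKSUM/00:00/g; s/:$//' | tr : '\n' | wc -l
}

hdr="8f:00:CHECKSUM:00:00:"
# Unquoted on purpose: some wc implementations pad their output.
u16_bytes_sketch $(template_nbytes_sketch "$hdr")	# prints 00:06
```

The same conversion serves the upper-layer length in the pseudoheader,
the number of sources in a group record, and the checksum itself.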

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 56 +++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 8f7e2cc8b779..1c5ca7552881 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1730,6 +1730,62 @@ ipv6_to_bytes()
 	expand_ipv6 "$IP" :
 }
 
+u16_to_bytes()
+{
+	local u16=$1; shift
+
+	printf "%04x" $u16 | sed 's/^/000/;s/^.*\(..\)\(..\)$/\1:\2/'
+}
+
+# Given a mausezahn-formatted payload (colon-separated bytes given as %02x),
+# possibly with a keyword CHECKSUM stashed where a 16-bit checksum should be,
+# calculate checksum as per RFC 1071, assuming the CHECKSUM field (if any)
+# stands for 00:00.
+payload_template_calc_checksum()
+{
+	local payload=$1; shift
+
+	(
+	    # Set input radix.
+	    echo "16i"
+	    # Push zero for the initial checksum.
+	    echo 0
+
+	    # Pad the payload with a terminating 00: in case we get an odd
+	    # number of bytes.
+	    echo "${payload%:}:00:" |
+		sed 's/CHECKSUM/00:00/g' |
+		tr '[:lower:]' '[:upper:]' |
+		# Add the word to the checksum.
+		sed 's/\(..\):\(..\):/\1\2+\n/g' |
+		# Strip the extra odd byte we pushed if left unconverted.
+		sed 's/\(..\):$//'
+
+	    echo "10000 ~ +"	# Calculate and add carry.
+	    echo "FFFF r - p"	# Bit-flip and print.
+	) |
+	    dc |
+	    tr '[:upper:]' '[:lower:]'
+}
+
+payload_template_expand_checksum()
+{
+	local payload=$1; shift
+	local checksum=$1; shift
+
+	local ckbytes=$(u16_to_bytes $checksum)
+
+	echo "$payload" | sed "s/CHECKSUM/$ckbytes/g"
+}
+
+payload_template_nbytes()
+{
+	local payload=$1; shift
+
+	payload_template_expand_checksum "${payload%:}" 0 |
+		sed 's/:/\n/g' | wc -l
+}
+
 igmpv3_is_in_get()
 {
 	local igmpv3
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH net-next 13/16] selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

In order to generate IGMPv3 and MLDv2 packets on the fly, the
functions that generate these packets need to be able to generate
packets for different groups and different sources. Generating MLDv2
packets further needs the source address of the packet for purposes of
checksum calculation. Add the necessary parameters, and generate the
payload accordingly by dispatching to helpers added in the previous
patches.

Adjust the sole client, bridge_mdb.sh, as well.
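
To show why the packet's source address matters, here is a worked
example of the MLDv2 checksum over pseudoheader plus message, for the
addresses bridge_mdb.sh uses. It is a sketch under assumptions: the
helper csum_words_sketch is made up for the example, and the 16-bit
words are written out by hand rather than generated.

```shell
#!/bin/sh
# Illustrative sketch only: one's complement checksum over 16-bit words
# given as hexadecimal arguments.
csum_words_sketch()
(
	sum=0
	for w in "$@"; do
		sum=$((sum + 0x$w))
	done
	# Fold the carries back in until the sum fits in 16 bits.
	while [ $((sum >> 16)) -ne 0 ]; do
		sum=$(( (sum & 0xffff) + (sum >> 16) ))
	done
	printf '%04x\n' $((sum ^ 0xffff))
)

# Pseudoheader: SIP fe80::1, DIP ff0e::1, upper-layer length 44 (0x2c),
# zero and next header 58 (0x3a, ICMPv6).
pseudo="fe80 0000 0000 0000 0000 0000 0000 0001
	ff0e 0000 0000 0000 0000 0000 0000 0001
	002c 003a"

# MLDv2 report with the checksum field zeroed: type and code, checksum,
# reserved, one group record, IS_IN with no aux data, one source,
# group ff0e::1, source 2001:db8:1::2.
icmpv6="8f00 0000 0000 0001 0100 0001
	ff0e 0000 0000 0000 0000 0000 0000 0001
	2001 0db8 0001 0000 0000 0000 0000 0002"

# Unquoted on purpose, so that every word is a separate argument.
csum_words_sketch $pseudo $icmpv6	# prints 4539
```

The result agrees with the 45:39 checksum that mldv2_is_in_get() used
to hard-code. A different source address changes the pseudoheader and
therefore the checksum, which is why SIP becomes a parameter.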

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../selftests/net/forwarding/bridge_mdb.sh    |  9 ++---
 tools/testing/selftests/net/forwarding/lib.sh | 36 +++++++++++++------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 4e16677f02ba..b48867d8cadf 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -1029,7 +1029,7 @@ ctrl_igmpv3_is_in_test()
 
 	# IS_IN ( 192.0.2.2 )
 	$MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \
-		-t ip proto=2,p=$(igmpv3_is_in_get) -q
+		-t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
 
 	bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -q 192.0.2.2
 	check_fail $? "Permanent entry affected by IGMP packet"
@@ -1042,7 +1042,7 @@ ctrl_igmpv3_is_in_test()
 
 	# IS_IN ( 192.0.2.2 )
 	$MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \
-		-t ip proto=2,p=$(igmpv3_is_in_get) -q
+		-t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
 
 	bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -v "src" | \
 		grep -q 192.0.2.2
@@ -1067,8 +1067,9 @@ ctrl_mldv2_is_in_test()
 		filter_mode include source_list 2001:db8:1::1
 
 	# IS_IN ( 2001:db8:1::2 )
+	local p=$(mldv2_is_in_get fe80::1 ff0e::1 2001:db8:1::2)
 	$MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \
-		-t ip hop=1,next=0,p=$(mldv2_is_in_get) -q
+		-t ip hop=1,next=0,p="$p" -q
 
 	bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | \
 		grep -q 2001:db8:1::2
@@ -1082,7 +1083,7 @@ ctrl_mldv2_is_in_test()
 
 	# IS_IN ( 2001:db8:1::2 )
 	$MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \
-		-t ip hop=1,next=0,p=$(mldv2_is_in_get) -q
+		-t ip hop=1,next=0,p="$p" -q
 
 	bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | grep -v "src" | \
 		grep -q 2001:db8:1::2
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 1c5ca7552881..60d4408610b1 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1788,26 +1788,35 @@ payload_template_nbytes()
 
 igmpv3_is_in_get()
 {
+	local GRP=$1; shift
+	local IP=$1; shift
+
 	local igmpv3
 
+	# IS_IN ( $IP )
 	igmpv3=$(:
 		)"22:"$(			: Type - Membership Report
 		)"00:"$(			: Reserved
-		)"2a:f8:"$(			: Checksum
+		)"CHECKSUM:"$(			: Checksum
 		)"00:00:"$(			: Reserved
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
 		)"00:01:"$(			: Number of Sources
-		)"ef:01:01:01:"$(		: Multicast Address - 239.1.1.1
-		)"c0:00:02:02"$(		: Source Address - 192.0.2.2
+		)"$(ipv4_to_bytes $GRP):"$(	: Multicast Address
+		)"$(ipv4_to_bytes $IP)"$(	: Source Address
 		)
+	local checksum=$(payload_template_calc_checksum "$igmpv3")
 
-	echo $igmpv3
+	payload_template_expand_checksum "$igmpv3" $checksum
 }
 
 mldv2_is_in_get()
 {
+	local SIP=$1; shift
+	local GRP=$1; shift
+	local IP=$1; shift
+
 	local hbh
 	local icmpv6
 
@@ -1820,17 +1829,24 @@ mldv2_is_in_get()
 	icmpv6=$(:
 		)"8f:"$(			: Type - MLDv2 Report
 		)"00:"$(			: Code
-		)"45:39:"$(			: Checksum
+		)"CHECKSUM:"$(			: Checksum
 		)"00:00:"$(			: Reserved
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
 		)"00:01:"$(			: Number of Sources
-		)"ff:0e:00:00:00:00:00:00:"$(	: Multicast address - ff0e::1
-		)"00:00:00:00:00:00:00:01:"$(	:
-		)"20:01:0d:b8:00:01:00:00:"$(	: Source Address - 2001:db8:1::2
-		)"00:00:00:00:00:00:00:02:"$(	:
+		)"$(ipv6_to_bytes $GRP):"$(	: Multicast address
+		)"$(ipv6_to_bytes $IP):"$(	: Source Address
 		)
 
-	echo ${hbh}${icmpv6}
+	local len=$(u16_to_bytes $(payload_template_nbytes $icmpv6))
+	local sudohdr=$(:
+		)"$(ipv6_to_bytes $SIP):"$(	: SIP
+		)"$(ipv6_to_bytes $GRP):"$(	: DIP is multicast address
+	        )"${len}:"$(			: Upper-layer length
+	        )"00:3a:"$(			: Zero and next-header
+	        )
+	local checksum=$(payload_template_calc_checksum ${sudohdr}${icmpv6})
+
+	payload_template_expand_checksum "$hbh$icmpv6" $checksum
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [Bridge] [PATCH net-next 13/16] selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
@ 2023-01-26 17:01   ` Petr Machata
  0 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: Petr Machata, Ido Schimmel, bridge

In order to generate IGMPv3 and MLDv2 packets on the fly, the
functions that generate these packets need to be able to generate
packets for different groups and different sources. Generating MLDv2
packets further needs the source address of the packet for purposes of
checksum calculation. Add the necessary parameters, and generate the
payload accordingly by dispatching to helpers added in the previous
patches.

Adjust the sole client, bridge_mdb.sh, as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../selftests/net/forwarding/bridge_mdb.sh    |  9 ++---
 tools/testing/selftests/net/forwarding/lib.sh | 36 +++++++++++++------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 4e16677f02ba..b48867d8cadf 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -1029,7 +1029,7 @@ ctrl_igmpv3_is_in_test()
 
 	# IS_IN ( 192.0.2.2 )
 	$MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \
-		-t ip proto=2,p=$(igmpv3_is_in_get) -q
+		-t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
 
 	bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -q 192.0.2.2
 	check_fail $? "Permanent entry affected by IGMP packet"
@@ -1042,7 +1042,7 @@ ctrl_igmpv3_is_in_test()
 
 	# IS_IN ( 192.0.2.2 )
 	$MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \
-		-t ip proto=2,p=$(igmpv3_is_in_get) -q
+		-t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
 
 	bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -v "src" | \
 		grep -q 192.0.2.2
@@ -1067,8 +1067,9 @@ ctrl_mldv2_is_in_test()
 		filter_mode include source_list 2001:db8:1::1
 
 	# IS_IN ( 2001:db8:1::2 )
+	local p=$(mldv2_is_in_get fe80::1 ff0e::1 2001:db8:1::2)
 	$MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \
-		-t ip hop=1,next=0,p=$(mldv2_is_in_get) -q
+		-t ip hop=1,next=0,p="$p" -q
 
 	bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | \
 		grep -q 2001:db8:1::2
@@ -1082,7 +1083,7 @@ ctrl_mldv2_is_in_test()
 
 	# IS_IN ( 2001:db8:1::2 )
 	$MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \
-		-t ip hop=1,next=0,p=$(mldv2_is_in_get) -q
+		-t ip hop=1,next=0,p="$p" -q
 
 	bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | grep -v "src" | \
 		grep -q 2001:db8:1::2
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 1c5ca7552881..60d4408610b1 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1788,26 +1788,35 @@ payload_template_nbytes()
 
 igmpv3_is_in_get()
 {
+	local GRP=$1; shift
+	local IP=$1; shift
+
 	local igmpv3
 
+	# IS_IN ( $IP )
 	igmpv3=$(:
 		)"22:"$(			: Type - Membership Report
 		)"00:"$(			: Reserved
-		)"2a:f8:"$(			: Checksum
+		)"CHECKSUM:"$(			: Checksum
 		)"00:00:"$(			: Reserved
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
 		)"00:01:"$(			: Number of Sources
-		)"ef:01:01:01:"$(		: Multicast Address - 239.1.1.1
-		)"c0:00:02:02"$(		: Source Address - 192.0.2.2
+		)"$(ipv4_to_bytes $GRP):"$(	: Multicast Address
+		)"$(ipv4_to_bytes $IP)"$(	: Source Address
 		)
+	local checksum=$(payload_template_calc_checksum "$igmpv3")
 
-	echo $igmpv3
+	payload_template_expand_checksum "$igmpv3" $checksum
 }
 
 mldv2_is_in_get()
 {
+	local SIP=$1; shift
+	local GRP=$1; shift
+	local IP=$1; shift
+
 	local hbh
 	local icmpv6
 
@@ -1820,17 +1829,24 @@ mldv2_is_in_get()
 	icmpv6=$(:
 		)"8f:"$(			: Type - MLDv2 Report
 		)"00:"$(			: Code
-		)"45:39:"$(			: Checksum
+		)"CHECKSUM:"$(			: Checksum
 		)"00:00:"$(			: Reserved
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
 		)"00:01:"$(			: Number of Sources
-		)"ff:0e:00:00:00:00:00:00:"$(	: Multicast address - ff0e::1
-		)"00:00:00:00:00:00:00:01:"$(	:
-		)"20:01:0d:b8:00:01:00:00:"$(	: Source Address - 2001:db8:1::2
-		)"00:00:00:00:00:00:00:02:"$(	:
+		)"$(ipv6_to_bytes $GRP):"$(	: Multicast address
+		)"$(ipv6_to_bytes $IP):"$(	: Source Address
 		)
 
-	echo ${hbh}${icmpv6}
+	local len=$(u16_to_bytes $(payload_template_nbytes $icmpv6))
+	local sudohdr=$(:
+		)"$(ipv6_to_bytes $SIP):"$(	: SIP
+		)"$(ipv6_to_bytes $GRP):"$(	: DIP is multicast address
+	        )"${len}:"$(			: Upper-layer length
+	        )"00:3a:"$(			: Zero and next-header
+	        )
+	local checksum=$(payload_template_calc_checksum ${sudohdr}${icmpv6})
+
+	payload_template_expand_checksum "$hbh$icmpv6" $checksum
 }
-- 
2.39.0



* [PATCH net-next 14/16] selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

The test suite that checks mcast_max_groups functionality will need to
generate IGMP and MLD packets with a configurable number of (S,G)
addresses. To that end, further extend igmpv3_is_in_get() and
mldv2_is_in_get() to accept a list of IP addresses instead of a single
address.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
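Note (illustration only, not part of the patch): the sketch below shows
how a list of source addresses could be turned into the "Number of
Sources" field followed by the address bytes, which is the shape of the
change to igmpv3_is_in_get(). The sketch_* helpers are hypothetical
stand-ins for u16_to_bytes() and ipv4_to_bytes() from lib.sh, whose real
bodies are not shown in this message.

```shell
#!/bin/bash
# Illustration only; sketch_* names are hypothetical, not lib.sh API.

# Print a 16-bit value as two colon-separated hex bytes, e.g. 2 -> 00:02.
sketch_u16_to_bytes()
{
	local u16=$1; shift

	printf "%04x" "$u16" | sed 's/^\(..\)/\1:/'
}

# Print a dotted-quad IPv4 address as colon-separated hex bytes.
sketch_ipv4_to_bytes()
{
	local IP=$1; shift

	printf '%02x:' ${IP//./ } | sed 's/:$//'
}

# Print the source count followed by every source address.
sketch_source_list()
{
	local sources=("$@")
	local src

	echo -n "$(sketch_u16_to_bytes ${#sources[@]}):"
	for src in "${sources[@]}"; do
		sketch_ipv4_to_bytes $src
		echo -n :
	done
}
```

For example, `sketch_source_list 192.0.2.1 192.0.2.2` prints
`00:02:c0:00:02:01:c0:00:02:02:`. The trailing colon mirrors the loop in
the patch, which emits a separator after every address.
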
 tools/testing/selftests/net/forwarding/lib.sh | 22 +++++++++++++------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 60d4408610b1..9f180af2cd81 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1789,11 +1789,12 @@ payload_template_nbytes()
 igmpv3_is_in_get()
 {
 	local GRP=$1; shift
-	local IP=$1; shift
+	local sources=("$@")
 
 	local igmpv3
+	local nsources=$(u16_to_bytes ${#sources[@]})
 
-	# IS_IN ( $IP )
+	# IS_IN ( $sources )
 	igmpv3=$(:
 		)"22:"$(			: Type - Membership Report
 		)"00:"$(			: Reserved
@@ -1802,9 +1803,12 @@ igmpv3_is_in_get()
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
-		)"00:01:"$(			: Number of Sources
+		)"${nsources}:"$(		: Number of Sources
 		)"$(ipv4_to_bytes $GRP):"$(	: Multicast Address
-		)"$(ipv4_to_bytes $IP)"$(	: Source Address
+		)"$(for src in "${sources[@]}"; do
+			ipv4_to_bytes $src
+			echo -n :
+		    done)"$(			: Source Addresses
 		)
 	local checksum=$(payload_template_calc_checksum "$igmpv3")
 
@@ -1815,10 +1819,11 @@ mldv2_is_in_get()
 {
 	local SIP=$1; shift
 	local GRP=$1; shift
-	local IP=$1; shift
+	local sources=("$@")
 
 	local hbh
 	local icmpv6
+	local nsources=$(u16_to_bytes ${#sources[@]})
 
 	hbh=$(:
 		)"3a:"$(			: Next Header - ICMPv6
@@ -1834,9 +1839,12 @@ mldv2_is_in_get()
 		)"00:01:"$(			: Number of Group Records
 		)"01:"$(			: Record Type - IS_IN
 		)"00:"$(			: Aux Data Len
-		)"00:01:"$(			: Number of Sources
+		)"${nsources}:"$(		: Number of Sources
 		)"$(ipv6_to_bytes $GRP):"$(	: Multicast address
-		)"$(ipv6_to_bytes $IP):"$(	: Source Address
+		)"$(for src in "${sources[@]}"; do
+			ipv6_to_bytes $src
+			echo -n :
+		    done)"$(			: Source Addresses
 		)
 
 	local len=$(u16_to_bytes $(payload_template_nbytes $icmpv6))
-- 
2.39.0



* [PATCH net-next 15/16] selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

The test suite that checks mcast_max_groups functionality will need to
wipe the added groups as well. Add helpers to build IGMP and MLD packets
announcing that a host is leaving a given group.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
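Note (illustration only, not part of the patch): the helpers in the diff
below leave a CHECKSUM placeholder in the payload and fill it in through
payload_template_calc_checksum() and payload_template_expand_checksum()
from lib.sh. Their bodies are not part of this message, so the sketch
below is an assumption about the general technique: the standard 16-bit
one's-complement Internet checksum, computed with the checksum field
zeroed. All sketch_* names are hypothetical.

```shell
#!/bin/bash
# Illustration only; sketch_* names are hypothetical, not lib.sh API.

# Print a dotted-quad IPv4 address as colon-separated hex bytes.
sketch_ipv4_to_bytes()
{
	local IP=$1; shift

	printf '%02x:' ${IP//./ } | sed 's/:$//'
}

# 16-bit one's-complement checksum over a colon-separated hex string.
sketch_inet_checksum()
{
	local payload=$1; shift
	local bytes=(${payload//:/ })
	local sum=0
	local i

	# Pad odd-length payloads with a zero byte.
	if ((${#bytes[@]} % 2)); then
		bytes+=(00)
	fi

	# Add up the payload as big-endian 16-bit words.
	for ((i = 0; i < ${#bytes[@]}; i += 2)); do
		sum=$((sum + 0x${bytes[i]}${bytes[i + 1]}))
	done

	# Fold carries back into the low 16 bits.
	while ((sum >> 16)); do
		sum=$(((sum & 0xffff) + (sum >> 16)))
	done

	sum=$((~sum & 0xffff))
	printf '%02x:%02x' $((sum >> 8)) $((sum & 0xff))
}

# Build an IGMPv2 Leave Group payload for the given group address.
sketch_igmpv2_leave()
{
	local GRP=$1; shift
	local payload="17:00:CHECKSUM:$(sketch_ipv4_to_bytes $GRP)"
	local checksum=$(sketch_inet_checksum "${payload/CHECKSUM/00:00}")

	echo "${payload/CHECKSUM/$checksum}"
}
```

As a sanity check, feeding the sketch the IGMPv3 IS_IN payload for
239.1.1.1 / 192.0.2.2 with a zeroed checksum field yields `2a:f8`, the
value that used to be hard-coded in igmpv3_is_in_get().
`sketch_igmpv2_leave 239.1.1.1` prints `17:00:f8:fc:ef:01:01:01`.
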
 tools/testing/selftests/net/forwarding/lib.sh | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 9f180af2cd81..7b3e89a15ccb 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1815,6 +1815,21 @@ igmpv3_is_in_get()
 	payload_template_expand_checksum "$igmpv3" $checksum
 }
 
+igmpv2_leave_get()
+{
+	local GRP=$1; shift
+
+	local payload=$(:
+		)"17:"$(			: Type - Leave Group
+		)"00:"$(			: Max Resp Time - not meaningful
+		)"CHECKSUM:"$(			: Checksum
+		)"$(ipv4_to_bytes $GRP)"$(	: Group Address
+		)
+	local checksum=$(payload_template_calc_checksum "$payload")
+
+	payload_template_expand_checksum "$payload" $checksum
+}
+
 mldv2_is_in_get()
 {
 	local SIP=$1; shift
@@ -1858,3 +1873,38 @@ mldv2_is_in_get()
 
 	payload_template_expand_checksum "$hbh$icmpv6" $checksum
 }
+
+mldv1_done_get()
+{
+	local SIP=$1; shift
+	local GRP=$1; shift
+
+	local hbh
+	local icmpv6
+
+	hbh=$(:
+		)"3a:"$(			: Next Header - ICMPv6
+		)"00:"$(			: Hdr Ext Len
+		)"00:00:00:00:00:00:"$(		: Options and Padding
+		)
+
+	icmpv6=$(:
+		)"84:"$(			: Type - MLDv1 Done
+		)"00:"$(			: Code
+		)"CHECKSUM:"$(			: Checksum
+		)"00:00:"$(			: Max Resp Delay - not meaningful
+		)"00:00:"$(			: Reserved
+		)"$(ipv6_to_bytes $GRP):"$(	: Multicast address
+		)
+
+	local len=$(u16_to_bytes $(payload_template_nbytes $icmpv6))
+	local sudohdr=$(:
+		)"$(ipv6_to_bytes $SIP):"$(	: SIP
+		)"$(ipv6_to_bytes $GRP):"$(	: DIP is multicast address
+	        )"${len}:"$(			: Upper-layer length
+	        )"00:3a:"$(			: Zero and next-header
+	        )
+	local checksum=$(payload_template_calc_checksum ${sudohdr}${icmpv6})
+
+	payload_template_expand_checksum "$hbh$icmpv6" $checksum
+}
-- 
2.39.0



* [PATCH net-next 16/16] selftests: forwarding: bridge_mdb_max: Add a new selftest
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 17:01   ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: bridge, Petr Machata, Ido Schimmel

Add a suite covering mcast_n_groups and mcast_max_groups bridge features.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
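Note (illustration only, not part of the patch): the *_entries_add
helpers in the diff below take a count n and derive n - 1 source
addresses from it. Judging by the checks in ctl4_entries_add() and
test_port_ngroups(), the remaining entry appears to be the (*,G) entry
itself, so "temp 5" amounts to five MDB entries. The sketch below
reproduces just the source-list construction; sketch_src_list is a
hypothetical name that merges cfg4_entries_op() and cfg_src_list() for
brevity.

```shell
#!/bin/bash
# Illustration only; sketch_src_list is a hypothetical name.

# Print the "source_list ..." argument for n MDB entries, i.e. n - 1
# sources. Prints an empty line when no sources are needed.
sketch_src_list()
{
	local n=$1; shift
	local IPs=$(seq -f 192.0.2.%g 1 $((n - 1)))
	local IPstr=$(echo $IPs | tr '[:space:]' , | sed 's/,$//')

	echo ${IPstr:+source_list }${IPstr}
}
```

`sketch_src_list 5` prints
`source_list 192.0.2.1,192.0.2.2,192.0.2.3,192.0.2.4`, while
`sketch_src_list 1` prints an empty line, so that no source_list keyword
is passed to `bridge mdb` when only the (*,G) entry is wanted.
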
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/bridge_mdb_max.sh          | 970 ++++++++++++++++++
 2 files changed, 971 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_mdb_max.sh

diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index 453ae006fbcf..91201ab3c4fc 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -4,6 +4,7 @@ TEST_PROGS = bridge_igmp.sh \
 	bridge_locked_port.sh \
 	bridge_mdb.sh \
 	bridge_mdb_host.sh \
+	bridge_mdb_max.sh \
 	bridge_mdb_port_down.sh \
 	bridge_mld.sh \
 	bridge_port_isolation.sh \
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
new file mode 100755
index 000000000000..20c8831f7cde
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
@@ -0,0 +1,970 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-----------------------+                          +------------------------+
+# | H1 (vrf)              |                          | H2 (vrf)               |
+# | + $h1.10              |                          | + $h2.10               |
+# | | 192.0.2.1/28        |                          | | 192.0.2.2/28         |
+# | | 2001:db8:1::1/64    |                          | | 2001:db8:1::2/64     |
+# | |                     |                          | |                      |
+# | |  + $h1.20           |                          | |  + $h2.20            |
+# | \  | 198.51.100.1/24  |                          | \  | 198.51.100.2/24   |
+# |  \ | 2001:db8:2::1/64 |                          |  \ | 2001:db8:2::2/64  |
+# |   \|                  |                          |   \|                   |
+# |    + $h1              |                          |    + $h2               |
+# +----|------------------+                          +----|-------------------+
+#      |                                                  |
+# +----|--------------------------------------------------|-------------------+
+# | SW |                                                  |                   |
+# | +--|--------------------------------------------------|-----------------+ |
+# | |  + $swp1                   BR0 (802.1q)             + $swp2           | |
+# | |     vid 10                                             vid 10         | |
+# | |     vid 20                                             vid 20         | |
+# | |                                                                       | |
+# | +-----------------------------------------------------------------------+ |
+# +---------------------------------------------------------------------------+
+
+ALL_TESTS="
+	$(: Tests vlan_filtering 0 mcast_vlan_snooping 0. )
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_ngroups_ctl4
+	test_port_maxgroups_ctl4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_ngroups_ctl6
+	test_port_maxgroups_ctl6
+
+	switch_destroy
+	switch_create_8021q
+	setup_wait
+
+	$(: Tests vlan_filtering 1 mcast_vlan_snooping 0. )
+	test_vlan_attributes_off
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_ngroups_ctl4
+	test_port_maxgroups_ctl4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_ngroups_ctl6
+	test_port_maxgroups_ctl6
+
+	switch_destroy
+	switch_create_vlan_snooping
+	setup_wait
+
+	$(: Tests vlan_filtering 1 mcast_vlan_snooping 1. )
+	test_vlan_attributes_on
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_vlan_ngroups_cfg4
+	test_port_vlan_maxgroups_cfg4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_vlan_ngroups_cfg6
+	test_port_vlan_maxgroups_cfg6
+	test_port_vlan_toggle_vlan_snooping
+"
+
+NUM_NETIFS=4
+source lib.sh
+source tc_common.sh
+
+h1_create()
+{
+	simple_if_init $h1
+	vlan_create $h1 10 v$h1 192.0.2.1/28 2001:db8:1::1/64
+	vlan_create $h1 20 v$h1 198.51.100.1/24 2001:db8:2::1/64
+}
+
+h1_destroy()
+{
+	vlan_destroy $h1 20
+	vlan_destroy $h1 10
+	simple_if_fini $h1
+}
+
+h2_create()
+{
+	simple_if_init $h2
+	vlan_create $h2 10 v$h2 192.0.2.2/28
+	vlan_create $h2 20 v$h2 198.51.100.2/24
+}
+
+h2_destroy()
+{
+	vlan_destroy $h2 20
+	vlan_destroy $h2 10
+	simple_if_fini $h2
+}
+
+switch_create_8021d()
+{
+	log_info "802.1d tests"
+
+	ip link add name br0 type bridge vlan_filtering 0 \
+		mcast_snooping 1 \
+		mcast_igmp_version 3 mcast_mld_version 2
+	ip link set dev br0 up
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp1 up
+	bridge link set dev $swp1 fastleave on
+
+	ip link set dev $swp2 master br0
+	ip link set dev $swp2 up
+}
+
+switch_create_8021q()
+{
+	local br_flags=$1; shift
+
+	log_info "802.1q $br_flags${br_flags:+ }tests"
+
+	ip link add name br0 type bridge vlan_filtering 1 vlan_default_pvid 0 \
+		mcast_snooping 1 $br_flags \
+		mcast_igmp_version 3 mcast_mld_version 2
+	bridge vlan add vid 10 dev br0 self
+	bridge vlan add vid 20 dev br0 self
+	ip link set dev br0 up
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp1 up
+	bridge link set dev $swp1 fastleave on
+	bridge vlan add vid 10 dev $swp1
+	bridge vlan add vid 20 dev $swp1
+
+	ip link set dev $swp2 master br0
+	ip link set dev $swp2 up
+	bridge vlan add vid 10 dev $swp2
+	bridge vlan add vid 20 dev $swp2
+}
+
+switch_create_vlan_snooping()
+{
+	switch_create_8021q "mcast_vlan_snooping 1"
+}
+
+switch_destroy()
+{
+	ip link set dev $swp2 down
+	ip link set dev $swp2 nomaster
+
+	ip link set dev $swp1 down
+	ip link set dev $swp1 nomaster
+
+	ip link set dev br0 down
+	ip link del dev br0
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	vrf_prepare
+	forwarding_enable
+
+	h1_create
+	h2_create
+	switch_create_8021d
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	switch_destroy
+	h2_destroy
+	h1_destroy
+
+	forwarding_restore
+	vrf_cleanup
+}
+
+cfg_src_list()
+{
+	local IPs=("$@")
+	local IPstr=$(echo ${IPs[@]} | tr '[:space:]' , | sed 's/,$//')
+
+	echo ${IPstr:+source_list }${IPstr}
+}
+
+cfg_group_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local GRP=$1; shift
+	local state=$1; shift
+	local IPs=("$@")
+
+	local source_list=$(cfg_src_list ${IPs[@]})
+
+	# Everything besides `bridge mdb' uses the "dev X vid Y" syntax,
+	# so we use it here as well and convert.
+	local br_locus=$(echo "$locus" | sed 's/^dev /port /')
+
+	bridge mdb $op dev br0 $br_locus grp $GRP $state \
+	       filter_mode include $source_list
+}
+
+cfg4_entries_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local GRP=239.1.1.1
+	local IPs=$(seq -f 192.0.2.%g 1 $((n - 1)))
+	cfg_group_op "$op" "$locus" "$GRP" "$state" ${IPs[@]}
+}
+
+cfg4_entries_add()
+{
+	cfg4_entries_op add "$@"
+}
+
+cfg4_entries_del()
+{
+	cfg4_entries_op del "$@"
+}
+
+cfg6_entries_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local GRP=ff0e::1
+	local IPs=$(printf "2001:db8:1::%x\n" $(seq 1 $((n - 1))))
+	cfg_group_op "$op" "$locus" "$GRP" "$state" ${IPs[@]}
+}
+
+cfg6_entries_add()
+{
+	cfg6_entries_op add "$@"
+}
+
+cfg6_entries_del()
+{
+	cfg6_entries_op del "$@"
+}
+
+dev_peer()
+{
+	local dev_kw=$1; shift
+	local dev=$1; shift
+	local vid_kw=$1; shift
+	local vid=$1; shift
+
+	echo "$h1.${vid:-10}"
+}
+
+ctl4_entries_add()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local IPs=$(seq -f 192.0.2.%g 1 $((n - 1)))
+	local peer=$(dev_peer $locus)
+	local GRP=239.1.1.1
+	$MZ $peer -c 1 -A 192.0.2.1 -B $GRP \
+		-t ip proto=2,p=$(igmpv3_is_in_get $GRP $IPs) -q
+	sleep 1
+
+	local nn=$(bridge mdb show dev br0 | grep $GRP | wc -l)
+	if ((nn != n)); then
+		echo mcast_max_groups > /dev/stderr
+		false
+	fi
+}
+
+ctl4_entries_del()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local peer=$(dev_peer $locus)
+	local GRP=239.1.1.1
+	$MZ $peer -c 1 -A 192.0.2.1 -B 224.0.0.2 \
+		-t ip proto=2,p=$(igmpv2_leave_get $GRP) -q
+	sleep 1
+	! bridge mdb show dev br0 | grep -q $GRP
+}
+
+ctl6_entries_add()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local IPs=$(printf "2001:db8:1::%x\n" $(seq 1 $((n - 1))))
+	local peer=$(dev_peer $locus)
+	local SIP=fe80::1
+	local GRP=ff0e::1
+	local p=$(mldv2_is_in_get $SIP $GRP $IPs)
+	$MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q
+	sleep 1
+
+	local nn=$(bridge mdb show dev br0 | grep $GRP | wc -l)
+	if ((nn != n)); then
+		echo mcast_max_groups > /dev/stderr
+		false
+	fi
+}
+
+ctl6_entries_del()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local peer=$(dev_peer $locus)
+	local SIP=fe80::1
+	local GRP=ff0e::1
+	local p=$(mldv1_done_get $SIP $GRP)
+	$MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q
+	sleep 1
+	! bridge mdb show dev br0 | grep -q $GRP
+}
+
+bridge_maxgroups_errmsg_check_cfg()
+{
+	local msg=$1; shift
+	local needle=$1; shift
+
+	echo "$msg" | grep -q mcast_max_groups
+	check_err $? "Adding MDB entries failed for the wrong reason: $msg"
+}
+
+bridge_maxgroups_errmsg_check_cfg4()
+{
+	bridge_maxgroups_errmsg_check_cfg "$@"
+}
+
+bridge_maxgroups_errmsg_check_cfg6()
+{
+	bridge_maxgroups_errmsg_check_cfg "$@"
+}
+
+bridge_maxgroups_errmsg_check_ctl4()
+{
+	:
+}
+
+bridge_maxgroups_errmsg_check_ctl6()
+{
+	:
+}
+
+bridge_port_ngroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d link show $locus |
+	    jq '.[].mcast_n_groups'
+}
+
+bridge_port_maxgroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d link show $locus |
+	    jq '.[].mcast_max_groups'
+}
+
+bridge_port_maxgroups_set()
+{
+	local locus=$1; shift
+	local max=$1; shift
+
+	bridge link set $locus mcast_max_groups $max
+}
+
+bridge_port_vlan_ngroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d vlan show $locus |
+	    jq '.[].vlans[].mcast_n_groups'
+}
+
+bridge_port_vlan_maxgroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d vlan show $locus |
+	    jq '.[].vlans[].mcast_max_groups'
+}
+
+bridge_port_vlan_maxgroups_set()
+{
+	local locus=$1; shift
+	local max=$1; shift
+
+	bridge vlan set $locus mcast_max_groups $max
+}
+
+test_port_ngroups()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local n0=$(bridge_port_ngroups_get "dev $swp1")
+	${CFG}_entries_add "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't add MDB entries"
+	local n1=$(bridge_port_ngroups_get "dev $swp1")
+
+	((n1 == n0 + 5))
+	check_err $? "Number of groups was $n0, now is $n1, but $((n0 + 5)) expected"
+
+	${CFG}_entries_del "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't delete MDB entries"
+	local n2=$(bridge_port_ngroups_get "dev $swp1")
+
+	((n2 == n0))
+	check_err $? "Number of groups was $n0, now is $n2, but should be back to $n0"
+
+	log_test "$CFG: Port ngroups"
+}
+
+test_port_ngroups_cfg4()
+{
+	test_port_ngroups cfg4
+}
+
+test_port_ngroups_cfg6()
+{
+	test_port_ngroups cfg6
+}
+
+test_port_ngroups_ctl4()
+{
+	test_port_ngroups ctl4
+}
+
+test_port_ngroups_ctl6()
+{
+	test_port_ngroups ctl6
+}
+
+test_port_vlan_ngroups()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local n10=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n20=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+	${CFG}_entries_add "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't add MDB entries to VLAN 10"
+	local n11=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n21=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n11 == n10 + 5))
+	check_err $? "Number of groups at VLAN 10 was $n10, now is $n11, but 5 entries added on VLAN 10, $((n10 + 5)) expected"
+
+	((n21 == n20))
+	check_err $? "Number of groups at VLAN 20 was $n20, now is $n21, but no change expected on VLAN 20"
+
+	${CFG}_entries_add "dev $swp1 vid 20" temp 5
+	check_err $? "Couldn't add MDB entries to VLAN 20"
+	local n12=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n22=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n12 == n11))
+	check_err $? "Number of groups at VLAN 10 was $n11, now is $n12, but no change expected on VLAN 10"
+
+	((n22 == n21 + 5))
+	check_err $? "Number of groups at VLAN 20 was $n21, now is $n22, but 5 entries added on VLAN 20, $((n21 + 5)) expected"
+
+	${CFG}_entries_del "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't delete MDB entries from VLAN 10"
+	${CFG}_entries_del "dev $swp1 vid 20" temp 5
+	check_err $? "Couldn't delete MDB entries from VLAN 20"
+	local n13=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n23=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n13 == n10))
+	check_err $? "Number of groups at VLAN 10 was $n10, now is $n13, but should be back to $n10"
+
+	((n23 == n20))
+	check_err $? "Number of groups at VLAN 20 was $n20, now is $n23, but should be back to $n20"
+
+	log_test "$CFG: Port-vlan ngroups"
+}
+
+test_port_vlan_ngroups_cfg4()
+{
+	test_port_vlan_ngroups cfg4
+}
+
+test_port_vlan_ngroups_cfg6()
+{
+	test_port_vlan_ngroups cfg6
+}
+
+test_maxgroups_zero()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+	local max
+
+	max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 0))
+	check_err $? "Max groups on $locus should be 0, but $max reported"
+
+	bridge_${context}_maxgroups_set "$locus" 100
+	check_err $? "Failed to set max to 100"
+	max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 100))
+	check_err $? "Max groups expected to be 100, but $max reported"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "Couldn't set maximum to 0"
+
+	# Test that setting 0 explicitly still serves as infinity.
+	${CFG}_entries_add "$locus" temp 5
+	check_err $? "Adding 5 MDB entries failed but should have passed"
+	${CFG}_entries_del "$locus" temp 5
+	check_err $? "Couldn't delete MDB entries"
+
+	log_test "$CFG: $context maxgroups: reporting and treatment of 0"
+}
+
+test_port_maxgroups_zero_cfg4()
+{
+	test_maxgroups_zero cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_zero_ctl4()
+{
+	test_maxgroups_zero ctl4 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_zero_cfg4()
+{
+	test_maxgroups_zero cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_zero cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_zero_cfg6()
+{
+	test_maxgroups_zero cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_zero_ctl6()
+{
+	test_maxgroups_zero ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_zero_cfg6()
+{
+	test_maxgroups_zero cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_zero cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_vlan_maxgroups_zero_cross_vlan()
+{
+	local CFG=$1; shift
+
+	local locus0="dev $swp1"
+	local locus1="dev $swp1 vid 10"
+	local locus2="dev $swp1 vid 20"
+	local max
+
+	RET=0
+
+	bridge_port_vlan_maxgroups_set "$locus1" 100
+	check_err $? "$locus1: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus2" 100
+	check_err $? "$locus2: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_maxgroups_set "$locus0" 100
+	check_err $? "$locus0: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus1" 0
+	check_err $? "$locus1: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus2" 0
+	check_err $? "$locus2: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0 but $max reported"
+
+	bridge_port_maxgroups_set "$locus0" 0
+	check_err $? "$locus0: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0, but $max reported"
+
+	log_test "$CFG: port_vlan maxgroups: isolation of port and per-VLAN maximums"
+}
+
+test_maxgroups_too_low()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local msg
+
+	${CFG}_entries_add "$locus" temp 5
+	msg=$(bridge_${context}_maxgroups_set "$locus" $((n+1)) 2>&1)
+	check_fail $? "$locus: Setting maxgroups to $((n+1)) passed, but should have failed"
+	bridge_maxgroups_errmsg_check_cfg "$msg"
+	${CFG}_entries_del "$locus" temp 5
+	check_err $? "$locus: Couldn't delete MDB entries"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "$locus: Couldn't set maximum to 0"
+
+	log_test "$CFG: $context maxgroups: configure below ngroups"
+}
+
+test_port_maxgroups_too_low_cfg4()
+{
+	test_maxgroups_too_low cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_low_ctl4()
+{
+	test_maxgroups_too_low ctl4 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_low_cfg4()
+{
+	test_maxgroups_too_low cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_low cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_too_low_cfg6()
+{
+	test_maxgroups_too_low cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_too_low_ctl6()
+{
+	test_maxgroups_too_low ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_low_cfg6()
+{
+	test_maxgroups_too_low cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_low cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_maxgroups_too_many_entries()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local msg
+
+	# Configure a low maximum
+	bridge_${context}_maxgroups_set "$locus" $((n+1))
+	check_err $? "$locus: Couldn't set maximum"
+
+	# Try to add more entries than the configured maximum
+	msg=$(${CFG}_entries_add "$locus" temp 5 2>&1)
+	check_fail $? "Adding 5 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	# When adding entries through the control path, as many as possible
+	# get created. That's consistent with the mcast_hash_max behavior.
+	# So there, drop the entries explicitly.
+	if [[ ${CFG%[46]} == ctl ]]; then
+		${CFG}_entries_del "$locus" temp 17 2>&1
+	fi
+
+	local n2=$(bridge_${context}_ngroups_get "$locus")
+	((n2 == n))
+	check_err $? "Number of groups was $n, but after a failed attempt to add MDB entries it changed to $n2"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "$locus: Couldn't set maximum to 0"
+
+	log_test "$CFG: $context maxgroups: add too many MDB entries"
+}
+
+test_port_maxgroups_too_many_entries_cfg4()
+{
+	test_maxgroups_too_many_entries cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_ctl4()
+{
+	test_maxgroups_too_many_entries ctl4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_cfg6()
+{
+	test_maxgroups_too_many_entries cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_ctl6()
+{
+	test_maxgroups_too_many_entries ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_many_entries_cfg4()
+{
+	test_maxgroups_too_many_entries cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_many_entries cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_vlan_maxgroups_too_many_entries_cfg6()
+{
+	test_maxgroups_too_many_entries cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_many_entries cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_cfg4()
+{
+	test_port_maxgroups_zero_cfg4
+	test_port_maxgroups_too_low_cfg4
+	test_port_maxgroups_too_many_entries_cfg4
+}
+
+test_port_maxgroups_ctl4()
+{
+	test_port_maxgroups_zero_ctl4
+	test_port_maxgroups_too_low_ctl4
+	test_port_maxgroups_too_many_entries_ctl4
+}
+
+test_port_maxgroups_cfg6()
+{
+	test_port_maxgroups_zero_cfg6
+	test_port_maxgroups_too_low_cfg6
+	test_port_maxgroups_too_many_entries_cfg6
+}
+
+test_port_maxgroups_ctl6()
+{
+	test_port_maxgroups_zero_ctl6
+	test_port_maxgroups_too_low_ctl6
+	test_port_maxgroups_too_many_entries_ctl6
+}
+
+test_port_vlan_maxgroups_too_many_cross_vlan()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local locus0="dev $swp1"
+	local locus1="dev $swp1 vid 10"
+	local locus2="dev $swp1 vid 20"
+	local n1=$(bridge_port_vlan_ngroups_get "$locus1")
+	local n2=$(bridge_port_vlan_ngroups_get "$locus2")
+	local msg
+
+	if ((n1 > n2)); then
+		local tmp=$n1
+		n1=$n2
+		n2=$tmp
+
+		tmp="$locus1"
+		locus1="$locus2"
+		locus2="$tmp"
+	fi
+
+	# Now 0 <= n1 <= n2.
+	${CFG}_entries_add "$locus2" temp 5
+	check_err $? "Couldn't add 5 entries"
+
+	n2=$(bridge_port_vlan_ngroups_get "$locus2")
+	# Now 0 <= n1 < n2-1.
+
+	# Setting locus1's maxgroups to n2-1 should pass. The number is
+	# smaller than the total number of MDB entries, and in
+	# particular smaller than locus2's number of entries, but it is
+	# large enough to cover locus1's entries. Thus we check that
+	# individual VLANs' ngroups are independent.
+	bridge_port_vlan_maxgroups_set "$locus1" $((n2-1))
+	check_err $? "Setting ${locus1}'s maxgroups to $((n2-1)) failed"
+
+	msg=$(${CFG}_entries_add "$locus1" temp $n2 2>&1)
+	check_fail $? "$locus1: Adding $n2 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	bridge_port_maxgroups_set "$locus0" $((n1 + n2 + 2))
+	check_err $? "$locus0: Couldn't set maximum"
+
+	msg=$(${CFG}_entries_add "$locus1" temp 5 2>&1)
+	check_fail $? "$locus1: Adding 5 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	${CFG}_entries_add "$locus1" temp 2
+	check_err $? "$locus1: Adding 2 MDB entries failed, but should have passed"
+
+	${CFG}_entries_del "$locus1" temp 2
+	check_err $? "Couldn't delete MDB entries"
+
+	${CFG}_entries_del "$locus2" temp 5
+	check_err $? "Couldn't delete MDB entries"
+
+	bridge_port_vlan_maxgroups_set "$locus1" 0
+	check_err $? "$locus1: Couldn't set maximum to 0"
+
+	bridge_port_maxgroups_set "$locus0" 0
+	check_err $? "$locus0: Couldn't set maximum to 0"
+
+	log_test "$CFG: port_vlan maxgroups: isolation of port and per-VLAN ngroups"
+}
+
+test_port_vlan_maxgroups_cfg4()
+{
+	test_port_vlan_maxgroups_zero_cfg4
+	test_port_vlan_maxgroups_zero_cross_vlan cfg4
+	test_port_vlan_maxgroups_too_low_cfg4
+	test_port_vlan_maxgroups_too_many_entries_cfg4
+	test_port_vlan_maxgroups_too_many_cross_vlan cfg4
+}
+
+test_port_vlan_maxgroups_cfg6()
+{
+	test_port_vlan_maxgroups_zero_cfg6
+	test_port_vlan_maxgroups_zero_cross_vlan cfg6
+	test_port_vlan_maxgroups_too_low_cfg6
+	test_port_vlan_maxgroups_too_many_entries_cfg6
+	test_port_vlan_maxgroups_too_many_cross_vlan cfg6
+}
+
+test_vlan_attributes()
+{
+	local locus=$1; shift
+	local expect=$1; shift
+
+	RET=0
+
+	local max=$(bridge_port_vlan_maxgroups_get "$locus")
+	local n=$(bridge_port_vlan_ngroups_get "$locus")
+
+	eval "[[ $max $expect ]]"
+	check_err $? "$locus: maxgroups attribute expected to be $expect, but was $max"
+
+	eval "[[ $n $expect ]]"
+	check_err $? "$locus: ngroups attribute expected to be $expect, but was $n"
+
+	log_test "port_vlan: presence of ngroups and maxgroups attributes"
+}
+
+test_vlan_attributes_off()
+{
+	test_vlan_attributes "dev $swp1 vid 10" "== null"
+}
+
+test_vlan_attributes_on()
+{
+	test_vlan_attributes "dev $swp1 vid 10" "-ge 0"
+}
+
+test_port_vlan_toggle_vlan_snooping_mode()
+{
+	local mode=$1; shift
+
+	RET=0
+
+	local CFG=cfg4
+	local context=port_vlan
+	local locus="dev $swp1 vid 10"
+
+	${CFG}_entries_add "$locus" $mode 5
+	check_err $? "Couldn't add MDB entries"
+
+	bridge_${context}_maxgroups_set "$locus" 100
+	check_err $? "Failed to set max to 100"
+
+	ip link set dev br0 type bridge mcast_vlan_snooping 0
+	sleep 1
+	ip link set dev br0 type bridge mcast_vlan_snooping 1
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local nn=$(bridge mdb show dev br0 | grep $swp1 | wc -l)
+	((nn == n))
+	check_err $? "mcast_n_groups expected to be $nn, but $n reported"
+
+	local max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 0))
+	check_err $? "Max groups expected to be 0 but $max reported"
+
+	log_test "$CFG: $context: $mode: mcast_vlan_snooping toggle"
+}
+
+test_port_vlan_toggle_vlan_snooping()
+{
+	test_port_vlan_toggle_vlan_snooping_mode temp
+	test_port_vlan_toggle_vlan_snooping_mode permanent
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+tests_run
+
+exit $EXIT_STATUS
-- 
2.39.0


* [Bridge] [PATCH net-next 16/16] selftests: forwarding: bridge_mdb_max: Add a new selftest
@ 2023-01-26 17:01   ` Petr Machata
  0 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-26 17:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev
  Cc: Petr Machata, Ido Schimmel, bridge

Add a suite covering mcast_n_groups and mcast_max_groups bridge features.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
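A note on the entry counts used throughout the suite: the cfg4/cfg6
helpers pass one group plus n - 1 sources to `bridge mdb', so a request
for 5 entries is one group entry and four source entries, and the ctl
helpers check for n lines in `bridge mdb show' accordingly. The sketch
below reproduces cfg_src_list() from the patch so that this can be tried
without a bridge. cfg_kind() is a hypothetical helper, not part of the
patch. It only illustrates the ${CFG%[46]} suffix stripping that
test_maxgroups_too_many_entries() uses to tell ctl from cfg.

```shell
#!/bin/bash
# Sketch only: pure-bash pieces of bridge_mdb_max.sh, runnable without
# a bridge device or iproute2.

# Same body as cfg_src_list() in the patch: join the arguments with commas
# and emit the source_list keyword only when the list is non-empty.
cfg_src_list()
{
	local IPs=("$@")
	local IPstr=$(echo ${IPs[@]} | tr '[:space:]' , | sed 's/,$//')

	echo ${IPstr:+source_list }${IPstr}
}

# Hypothetical helper (not in the patch): strip a trailing 4 or 6, the way
# test_maxgroups_too_many_entries() does with ${CFG%[46]}.
cfg_kind()
{
	local CFG=$1; shift

	echo "${CFG%[46]}"
}

# A request for n=5 entries carries n-1 = 4 sources.
n=5
cfg_src_list $(seq -f 192.0.2.%g 1 $((n - 1)))
cfg_kind ctl6
```

With n=1 the source list is empty and no source_list keyword is emitted,
so only the group itself is configured.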
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/bridge_mdb_max.sh          | 970 ++++++++++++++++++
 2 files changed, 971 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
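
test_vlan_attributes() splices an expectation string such as "== null" or
"-ge 0" into a [[ ]] test through eval. The sketch below shows how those
strings evaluate. check_attr() is a hypothetical stand-in for the eval
lines, and the values are hard-coded rather than read from jq, which
prints null for a missing key. One thing worth knowing when reading the
results: inside [[ ]], -ge evaluates its operands arithmetically, so the
bare word null is taken as an unset variable and counts as 0.

```shell
#!/bin/bash
# Sketch only: check_attr is a hypothetical stand-in for the eval lines in
# test_vlan_attributes(). Values are hard-coded instead of coming from jq.

check_attr()
{
	local value=$1; shift
	local expect=$1; shift

	eval "[[ $value $expect ]]"
}

check_attr null "== null"; echo "null == null: $?"
check_attr 0 "-ge 0"; echo "0 -ge 0: $?"
check_attr 7 "== null"; echo "7 == null: $?"
# Arithmetic operators treat the bare word null as an unset variable (0).
check_attr null "-ge 0"; echo "null -ge 0: $?"
```

The last case means that "-ge 0" alone does not distinguish a missing
attribute from a zero one. A stricter variant, if wanted, could check
"!= null" first.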

diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index 453ae006fbcf..91201ab3c4fc 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -4,6 +4,7 @@ TEST_PROGS = bridge_igmp.sh \
 	bridge_locked_port.sh \
 	bridge_mdb.sh \
 	bridge_mdb_host.sh \
+	bridge_mdb_max.sh \
 	bridge_mdb_port_down.sh \
 	bridge_mld.sh \
 	bridge_port_isolation.sh \
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
new file mode 100755
index 000000000000..20c8831f7cde
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
@@ -0,0 +1,970 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-----------------------+                          +------------------------+
+# | H1 (vrf)              |                          | H2 (vrf)               |
+# | + $h1.10              |                          | + $h2.10               |
+# | | 192.0.2.1/28        |                          | | 192.0.2.2/28         |
+# | | 2001:db8:1::1/64    |                          | | 2001:db8:1::2/64     |
+# | |                     |                          | |                      |
+# | |  + $h1.20           |                          | |  + $h2.20            |
+# | \  | 198.51.100.1/24  |                          | \  | 198.51.100.2/24   |
+# |  \ | 2001:db8:2::1/64 |                          |  \ | 2001:db8:2::2/64  |
+# |   \|                  |                          |   \|                   |
+# |    + $h1              |                          |    + $h2               |
+# +----|------------------+                          +----|-------------------+
+#      |                                                  |
+# +----|--------------------------------------------------|-------------------+
+# | SW |                                                  |                   |
+# | +--|--------------------------------------------------|-----------------+ |
+# | |  + $swp1                   BR0 (802.1q)             + $swp2           | |
+# | |     vid 10                                             vid 10         | |
+# | |     vid 20                                             vid 20         | |
+# | |                                                                       | |
+# | +-----------------------------------------------------------------------+ |
+# +---------------------------------------------------------------------------+
+
+ALL_TESTS="
+	$(: Tests vlan_filtering 0 mcast_vlan_snooping 0. )
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_ngroups_ctl4
+	test_port_maxgroups_ctl4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_ngroups_ctl6
+	test_port_maxgroups_ctl6
+
+	switch_destroy
+	switch_create_8021q
+	setup_wait
+
+	$(: Tests vlan_filtering 1 mcast_vlan_snooping 0. )
+	test_vlan_attributes_off
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_ngroups_ctl4
+	test_port_maxgroups_ctl4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_ngroups_ctl6
+	test_port_maxgroups_ctl6
+
+	switch_destroy
+	switch_create_vlan_snooping
+	setup_wait
+
+	$(: Tests vlan_filtering 1 mcast_vlan_snooping 1. )
+	test_vlan_attributes_on
+	test_port_ngroups_cfg4
+	test_port_maxgroups_cfg4
+	test_port_vlan_ngroups_cfg4
+	test_port_vlan_maxgroups_cfg4
+	test_port_ngroups_cfg6
+	test_port_maxgroups_cfg6
+	test_port_vlan_ngroups_cfg6
+	test_port_vlan_maxgroups_cfg6
+	test_port_vlan_toggle_vlan_snooping
+"
+
+NUM_NETIFS=4
+source lib.sh
+source tc_common.sh
+
+h1_create()
+{
+	simple_if_init $h1
+	vlan_create $h1 10 v$h1 192.0.2.1/28 2001:db8:1::1/64
+	vlan_create $h1 20 v$h1 198.51.100.1/24 2001:db8:2::1/64
+}
+
+h1_destroy()
+{
+	vlan_destroy $h1 20
+	vlan_destroy $h1 10
+	simple_if_fini $h1
+}
+
+h2_create()
+{
+	simple_if_init $h2
+	vlan_create $h2 10 v$h2 192.0.2.2/28
+	vlan_create $h2 20 v$h2 198.51.100.2/24
+}
+
+h2_destroy()
+{
+	vlan_destroy $h2 20
+	vlan_destroy $h2 10
+	simple_if_fini $h2
+}
+
+switch_create_8021d()
+{
+	log_info "802.1d tests"
+
+	ip link add name br0 type bridge vlan_filtering 0 \
+		mcast_snooping 1 \
+		mcast_igmp_version 3 mcast_mld_version 2
+	ip link set dev br0 up
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp1 up
+	bridge link set dev $swp1 fastleave on
+
+	ip link set dev $swp2 master br0
+	ip link set dev $swp2 up
+}
+
+switch_create_8021q()
+{
+	local br_flags=$1; shift
+
+	log_info "802.1q $br_flags${br_flags:+ }tests"
+
+	ip link add name br0 type bridge vlan_filtering 1 vlan_default_pvid 0 \
+		mcast_snooping 1 $br_flags \
+		mcast_igmp_version 3 mcast_mld_version 2
+	bridge vlan add vid 10 dev br0 self
+	bridge vlan add vid 20 dev br0 self
+	ip link set dev br0 up
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp1 up
+	bridge link set dev $swp1 fastleave on
+	bridge vlan add vid 10 dev $swp1
+	bridge vlan add vid 20 dev $swp1
+
+	ip link set dev $swp2 master br0
+	ip link set dev $swp2 up
+	bridge vlan add vid 10 dev $swp2
+	bridge vlan add vid 20 dev $swp2
+}
+
+switch_create_vlan_snooping()
+{
+	switch_create_8021q "mcast_vlan_snooping 1"
+}
+
+switch_destroy()
+{
+	ip link set dev $swp2 down
+	ip link set dev $swp2 nomaster
+
+	ip link set dev $swp1 down
+	ip link set dev $swp1 nomaster
+
+	ip link set dev br0 down
+	ip link del dev br0
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	vrf_prepare
+	forwarding_enable
+
+	h1_create
+	h2_create
+	switch_create_8021d
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	switch_destroy
+	h2_destroy
+	h1_destroy
+
+	forwarding_restore
+	vrf_cleanup
+}
+
+cfg_src_list()
+{
+	local IPs=("$@")
+	local IPstr=$(echo ${IPs[@]} | tr '[:space:]' , | sed 's/,$//')
+
+	echo ${IPstr:+source_list }${IPstr}
+}
+
+cfg_group_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local GRP=$1; shift
+	local state=$1; shift
+	local IPs=("$@")
+
+	local source_list=$(cfg_src_list ${IPs[@]})
+
+	# Everything besides `bridge mdb' uses the "dev X vid Y" syntax,
+	# so we use it here as well and convert.
+	local br_locus=$(echo "$locus" | sed 's/^dev /port /')
+
+	bridge mdb $op dev br0 $br_locus grp $GRP $state \
+	       filter_mode include $source_list
+}
+
+cfg4_entries_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local GRP=239.1.1.1
+	local IPs=$(seq -f 192.0.2.%g 1 $((n - 1)))
+	cfg_group_op "$op" "$locus" "$GRP" "$state" ${IPs[@]}
+}
+
+cfg4_entries_add()
+{
+	cfg4_entries_op add "$@"
+}
+
+cfg4_entries_del()
+{
+	cfg4_entries_op del "$@"
+}
+
+cfg6_entries_op()
+{
+	local op=$1; shift
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local GRP=ff0e::1
+	local IPs=$(printf "2001:db8:1::%x\n" $(seq 1 $((n - 1))))
+	cfg_group_op "$op" "$locus" "$GRP" "$state" ${IPs[@]}
+}
+
+cfg6_entries_add()
+{
+	cfg6_entries_op add "$@"
+}
+
+cfg6_entries_del()
+{
+	cfg6_entries_op del "$@"
+}
+
+dev_peer()
+{
+	local dev_kw=$1; shift
+	local dev=$1; shift
+	local vid_kw=$1; shift
+	local vid=$1; shift
+
+	echo "$h1.${vid:-10}"
+}
+
+ctl4_entries_add()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local IPs=$(seq -f 192.0.2.%g 1 $((n - 1)))
+	local peer=$(dev_peer $locus)
+	local GRP=239.1.1.1
+	$MZ $peer -c 1 -A 192.0.2.1 -B $GRP \
+		-t ip proto=2,p=$(igmpv3_is_in_get $GRP $IPs) -q
+	sleep 1
+
+	local nn=$(bridge mdb show dev br0 | grep $GRP | wc -l)
+	if ((nn != n)); then
+		echo mcast_max_groups > /dev/stderr
+		false
+	fi
+}
+
+ctl4_entries_del()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local peer=$(dev_peer $locus)
+	local GRP=239.1.1.1
+	$MZ $peer -c 1 -A 192.0.2.1 -B 224.0.0.2 \
+		-t ip proto=2,p=$(igmpv2_leave_get $GRP) -q
+	sleep 1
+	! bridge mdb show dev br0 | grep -q $GRP
+}
+
+ctl6_entries_add()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local IPs=$(printf "2001:db8:1::%x\n" $(seq 1 $((n - 1))))
+	local peer=$(dev_peer $locus)
+	local SIP=fe80::1
+	local GRP=ff0e::1
+	local p=$(mldv2_is_in_get $SIP $GRP $IPs)
+	$MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q
+	sleep 1
+
+	local nn=$(bridge mdb show dev br0 | grep $GRP | wc -l)
+	if ((nn != n)); then
+		echo mcast_max_groups > /dev/stderr
+		false
+	fi
+}
+
+ctl6_entries_del()
+{
+	local locus=$1; shift
+	local state=$1; shift
+	local n=$1; shift
+
+	local peer=$(dev_peer $locus)
+	local SIP=fe80::1
+	local GRP=ff0e::1
+	local p=$(mldv1_done_get $SIP $GRP)
+	$MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q
+	sleep 1
+	! bridge mdb show dev br0 | grep -q $GRP
+}
+
+bridge_maxgroups_errmsg_check_cfg()
+{
+	local msg=$1; shift
+	local needle=$1; shift
+
+	echo "$msg" | grep -q mcast_max_groups
+	check_err $? "Adding MDB entries failed for the wrong reason: $msg"
+}
+
+bridge_maxgroups_errmsg_check_cfg4()
+{
+	bridge_maxgroups_errmsg_check_cfg "$@"
+}
+
+bridge_maxgroups_errmsg_check_cfg6()
+{
+	bridge_maxgroups_errmsg_check_cfg "$@"
+}
+
+bridge_maxgroups_errmsg_check_ctl4()
+{
+	:
+}
+
+bridge_maxgroups_errmsg_check_ctl6()
+{
+	:
+}
+
+bridge_port_ngroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d link show $locus |
+	    jq '.[].mcast_n_groups'
+}
+
+bridge_port_maxgroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d link show $locus |
+	    jq '.[].mcast_max_groups'
+}
+
+bridge_port_maxgroups_set()
+{
+	local locus=$1; shift
+	local max=$1; shift
+
+	bridge link set $locus mcast_max_groups $max
+}
+
+bridge_port_vlan_ngroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d vlan show $locus |
+	    jq '.[].vlans[].mcast_n_groups'
+}
+
+bridge_port_vlan_maxgroups_get()
+{
+	local locus=$1; shift
+
+	bridge -j -d vlan show $locus |
+	    jq '.[].vlans[].mcast_max_groups'
+}
+
+bridge_port_vlan_maxgroups_set()
+{
+	local locus=$1; shift
+	local max=$1; shift
+
+	bridge vlan set $locus mcast_max_groups $max
+}
+
+test_port_ngroups()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local n0=$(bridge_port_ngroups_get "dev $swp1")
+	${CFG}_entries_add "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't add MDB entries"
+	local n1=$(bridge_port_ngroups_get "dev $swp1")
+
+	((n1 == n0 + 5))
+	check_err $? "Number of groups was $n0, now is $n1, but $((n0 + 5)) expected"
+
+	${CFG}_entries_del "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't delete MDB entries"
+	local n2=$(bridge_port_ngroups_get "dev $swp1")
+
+	((n2 == n0))
+	check_err $? "Number of groups was $n0, now is $n2, but should be back to $n0"
+
+	log_test "$CFG: Port ngroups"
+}
+
+test_port_ngroups_cfg4()
+{
+	test_port_ngroups cfg4
+}
+
+test_port_ngroups_cfg6()
+{
+	test_port_ngroups cfg6
+}
+
+test_port_ngroups_ctl4()
+{
+	test_port_ngroups ctl4
+}
+
+test_port_ngroups_ctl6()
+{
+	test_port_ngroups ctl6
+}
+
+test_port_vlan_ngroups()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local n10=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n20=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+	${CFG}_entries_add "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't add MDB entries to VLAN 10"
+	local n11=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n21=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n11 == n10 + 5))
+	check_err $? "Number of groups at VLAN 10 was $n10, now is $n11, but 5 entries added on VLAN 10, $((n10 + 5)) expected"
+
+	((n21 == n20))
+	check_err $? "Number of groups at VLAN 20 was $n20, now is $n21, but no change expected on VLAN 20"
+
+	${CFG}_entries_add "dev $swp1 vid 20" temp 5
+	check_err $? "Couldn't add MDB entries to VLAN 20"
+	local n12=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n22=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n12 == n11))
+	check_err $? "Number of groups at VLAN 10 was $n11, now is $n12, but no change expected on VLAN 10"
+
+	((n22 == n21 + 5))
+	check_err $? "Number of groups at VLAN 20 was $n21, now is $n22, but 5 entries added on VLAN 20, $((n21 + 5)) expected"
+
+	${CFG}_entries_del "dev $swp1 vid 10" temp 5
+	check_err $? "Couldn't delete MDB entries from VLAN 10"
+	${CFG}_entries_del "dev $swp1 vid 20" temp 5
+	check_err $? "Couldn't delete MDB entries from VLAN 20"
+	local n13=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 10")
+	local n23=$(bridge_port_vlan_ngroups_get "dev $swp1 vid 20")
+
+	((n13 == n10))
+	check_err $? "Number of groups at VLAN 10 was $n10, now is $n13, but should be back to $n10"
+
+	((n23 == n20))
+	check_err $? "Number of groups at VLAN 20 was $n20, now is $n23, but should be back to $n20"
+
+	log_test "$CFG: Port-vlan ngroups"
+}
+
+test_port_vlan_ngroups_cfg4()
+{
+	test_port_vlan_ngroups cfg4
+}
+
+test_port_vlan_ngroups_cfg6()
+{
+	test_port_vlan_ngroups cfg6
+}
+
+test_maxgroups_zero()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+	local max
+
+	max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 0))
+	check_err $? "Max groups on $locus should be 0, but $max reported"
+
+	bridge_${context}_maxgroups_set "$locus" 100
+	check_err $? "Failed to set max to 100"
+	max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 100))
+	check_err $? "Max groups expected to be 100, but $max reported"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "Couldn't set maximum to 0"
+
+	# Test that setting 0 explicitly still serves as infinity.
+	${CFG}_entries_add "$locus" temp 5
+	check_err $? "Adding 5 MDB entries failed but should have passed"
+	${CFG}_entries_del "$locus" temp 5
+	check_err $? "Couldn't delete MDB entries"
+
+	log_test "$CFG: $context maxgroups: reporting and treatment of 0"
+}
+
+test_port_maxgroups_zero_cfg4()
+{
+	test_maxgroups_zero cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_zero_ctl4()
+{
+	test_maxgroups_zero ctl4 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_zero_cfg4()
+{
+	test_maxgroups_zero cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_zero cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_zero_cfg6()
+{
+	test_maxgroups_zero cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_zero_ctl6()
+{
+	test_maxgroups_zero ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_zero_cfg6()
+{
+	test_maxgroups_zero cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_zero cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_vlan_maxgroups_zero_cross_vlan()
+{
+	local CFG=$1; shift
+
+	local locus0="dev $swp1"
+	local locus1="dev $swp1 vid 10"
+	local locus2="dev $swp1 vid 20"
+	local max
+
+	RET=0
+
+	bridge_port_vlan_maxgroups_set "$locus1" 100
+	check_err $? "$locus1: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus2" 100
+	check_err $? "$locus2: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_maxgroups_set "$locus0" 100
+	check_err $? "$locus0: Failed to set max to 100"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus1" 0
+	check_err $? "$locus1: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 100))
+	check_err $? "$locus2: Max groups expected to be 100, but $max reported"
+
+	bridge_port_vlan_maxgroups_set "$locus2" 0
+	check_err $? "$locus2: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 100))
+	check_err $? "$locus0: Max groups expected to be 100, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0 but $max reported"
+
+	bridge_port_maxgroups_set "$locus0" 0
+	check_err $? "$locus0: Failed to set max to 0"
+
+	max=$(bridge_port_maxgroups_get "$locus0")
+	((max == 0))
+	check_err $? "$locus0: Max groups expected to be 0, but $max reported"
+
+	max=$(bridge_port_vlan_maxgroups_get "$locus2")
+	((max == 0))
+	check_err $? "$locus2: Max groups expected to be 0, but $max reported"
+
+	log_test "$CFG: port_vlan maxgroups: isolation of port and per-VLAN maximums"
+}
+
+test_maxgroups_too_low()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local msg
+
+	${CFG}_entries_add "$locus" temp 5
+	msg=$(bridge_${context}_maxgroups_set "$locus" $((n+1)) 2>&1)
+	check_fail $? "$locus: Setting maxgroups to $((n+1)) passed, but should have failed"
+	bridge_maxgroups_errmsg_check_cfg "$msg"
+	${CFG}_entries_del "$locus" temp 5
+	check_err $? "$locus: Couldn't delete MDB entries"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "$locus: Couldn't set maximum to 0"
+
+	log_test "$CFG: $context maxgroups: configure below ngroups"
+}
+
+test_port_maxgroups_too_low_cfg4()
+{
+	test_maxgroups_too_low cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_low_ctl4()
+{
+	test_maxgroups_too_low ctl4 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_low_cfg4()
+{
+	test_maxgroups_too_low cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_low cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_too_low_cfg6()
+{
+	test_maxgroups_too_low cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_too_low_ctl6()
+{
+	test_maxgroups_too_low ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_low_cfg6()
+{
+	test_maxgroups_too_low cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_low cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_maxgroups_too_many_entries()
+{
+	local CFG=$1; shift
+	local context=$1; shift
+	local locus=$1; shift
+
+	RET=0
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local msg
+
+	# Configure a low maximum
+	bridge_${context}_maxgroups_set "$locus" $((n+1))
+	check_err $? "$locus: Couldn't set maximum"
+
+	# Try to add more entries than the configured maximum
+	msg=$(${CFG}_entries_add "$locus" temp 5 2>&1)
+	check_fail $? "Adding 5 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	# When adding entries through the control path, as many as possible
+	# get created. That's consistent with the mcast_hash_max behavior.
+	# In that case, drop the entries explicitly.
+	if [[ ${CFG%[46]} == ctl ]]; then
+		${CFG}_entries_del "$locus" temp 17 2>&1
+	fi
+
+	local n2=$(bridge_${context}_ngroups_get "$locus")
+	((n2 == n))
+	check_err $? "Number of groups was $n, but after a failed attempt to add MDB entries it changed to $n2"
+
+	bridge_${context}_maxgroups_set "$locus" 0
+	check_err $? "$locus: Couldn't set maximum to 0"
+
+	log_test "$CFG: $context maxgroups: add too many MDB entries"
+}
+
+test_port_maxgroups_too_many_entries_cfg4()
+{
+	test_maxgroups_too_many_entries cfg4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_ctl4()
+{
+	test_maxgroups_too_many_entries ctl4 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_cfg6()
+{
+	test_maxgroups_too_many_entries cfg6 port "dev $swp1"
+}
+
+test_port_maxgroups_too_many_entries_ctl6()
+{
+	test_maxgroups_too_many_entries ctl6 port "dev $swp1"
+}
+
+test_port_vlan_maxgroups_too_many_entries_cfg4()
+{
+	test_maxgroups_too_many_entries cfg4 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_many_entries cfg4 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_vlan_maxgroups_too_many_entries_cfg6()
+{
+	test_maxgroups_too_many_entries cfg6 port_vlan "dev $swp1 vid 10"
+	test_maxgroups_too_many_entries cfg6 port_vlan "dev $swp1 vid 20"
+}
+
+test_port_maxgroups_cfg4()
+{
+	test_port_maxgroups_zero_cfg4
+	test_port_maxgroups_too_low_cfg4
+	test_port_maxgroups_too_many_entries_cfg4
+}
+
+test_port_maxgroups_ctl4()
+{
+	test_port_maxgroups_zero_ctl4
+	test_port_maxgroups_too_low_ctl4
+	test_port_maxgroups_too_many_entries_ctl4
+}
+
+test_port_maxgroups_cfg6()
+{
+	test_port_maxgroups_zero_cfg6
+	test_port_maxgroups_too_low_cfg6
+	test_port_maxgroups_too_many_entries_cfg6
+}
+
+test_port_maxgroups_ctl6()
+{
+	test_port_maxgroups_zero_ctl6
+	test_port_maxgroups_too_low_ctl6
+	test_port_maxgroups_too_many_entries_ctl6
+}
+
+test_port_vlan_maxgroups_too_many_cross_vlan()
+{
+	local CFG=$1; shift
+
+	RET=0
+
+	local locus0="dev $swp1"
+	local locus1="dev $swp1 vid 10"
+	local locus2="dev $swp1 vid 20"
+	local n1=$(bridge_port_vlan_ngroups_get "$locus1")
+	local n2=$(bridge_port_vlan_ngroups_get "$locus2")
+	local msg
+
+	if ((n1 > n2)); then
+		local tmp=$n1
+		n1=$n2
+		n2=$tmp
+
+		tmp="$locus1"
+		locus1="$locus2"
+		locus2="$tmp"
+	fi
+
+	# Now 0 <= n1 <= n2.
+	${CFG}_entries_add "$locus2" temp 5
+	check_err $? "Couldn't add 5 entries"
+
+	n2=$(bridge_port_vlan_ngroups_get "$locus2")
+	# Now 0 <= n1 < n2-1.
+
+	# Setting locus1's maxgroups to n2-1 should pass. The number is
+	# smaller than the total number of MDB entries, and in
+	# particular smaller than locus2's number of entries, but it is
+	# large enough to cover locus1's entries. Thus we check that
+	# individual VLANs' ngroups are independent.
+	bridge_port_vlan_maxgroups_set "$locus1" $((n2-1))
+	check_err $? "Setting ${locus1}'s maxgroups to $((n2-1)) failed"
+
+	msg=$(${CFG}_entries_add "$locus1" temp $n2 2>&1)
+	check_fail $? "$locus1: Adding $n2 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	bridge_port_maxgroups_set "$locus0" $((n1 + n2 + 2))
+	check_err $? "$locus0: Couldn't set maximum"
+
+	msg=$(${CFG}_entries_add "$locus1" temp 5 2>&1)
+	check_fail $? "$locus1: Adding 5 MDB entries passed, but should have failed"
+	bridge_maxgroups_errmsg_check_${CFG} "$msg"
+
+	${CFG}_entries_add "$locus1" temp 2
+	check_err $? "$locus1: Adding 2 MDB entries failed, but should have passed"
+
+	${CFG}_entries_del "$locus1" temp 2
+	check_err $? "Couldn't delete MDB entries"
+
+	${CFG}_entries_del "$locus2" temp 5
+	check_err $? "Couldn't delete MDB entries"
+
+	bridge_port_vlan_maxgroups_set "$locus1" 0
+	check_err $? "$locus1: Couldn't set maximum to 0"
+
+	bridge_port_maxgroups_set "$locus0" 0
+	check_err $? "$locus0: Couldn't set maximum to 0"
+
+	log_test "$CFG: port_vlan maxgroups: isolation of port and per-VLAN ngroups"
+}
+
+test_port_vlan_maxgroups_cfg4()
+{
+	test_port_vlan_maxgroups_zero_cfg4
+	test_port_vlan_maxgroups_zero_cross_vlan cfg4
+	test_port_vlan_maxgroups_too_low_cfg4
+	test_port_vlan_maxgroups_too_many_entries_cfg4
+	test_port_vlan_maxgroups_too_many_cross_vlan cfg4
+}
+
+test_port_vlan_maxgroups_cfg6()
+{
+	test_port_vlan_maxgroups_zero_cfg6
+	test_port_vlan_maxgroups_zero_cross_vlan cfg6
+	test_port_vlan_maxgroups_too_low_cfg6
+	test_port_vlan_maxgroups_too_many_entries_cfg6
+	test_port_vlan_maxgroups_too_many_cross_vlan cfg6
+}
+
+test_vlan_attributes()
+{
+	local locus=$1; shift
+	local expect=$1; shift
+
+	RET=0
+
+	local max=$(bridge_port_vlan_maxgroups_get "$locus")
+	local n=$(bridge_port_vlan_ngroups_get "$locus")
+
+	eval "[[ $max $expect ]]"
+	check_err $? "$locus: maxgroups attribute expected to be $expect, but was $max"
+
+	eval "[[ $n $expect ]]"
+	check_err $? "$locus: ngroups attribute expected to be $expect, but was $n"
+
+	log_test "port_vlan: presence of ngroups and maxgroups attributes"
+}
+
+test_vlan_attributes_off()
+{
+	test_vlan_attributes "dev $swp1 vid 10" "== null"
+}
+
+test_vlan_attributes_on()
+{
+	test_vlan_attributes "dev $swp1 vid 10" "-ge 0"
+}
+
+test_port_vlan_toggle_vlan_snooping_mode()
+{
+	local mode=$1; shift
+
+	RET=0
+
+	local CFG=cfg4
+	local context=port_vlan
+	local locus="dev $swp1 vid 10"
+
+	${CFG}_entries_add "$locus" $mode 5
+	check_err $? "Couldn't add MDB entries"
+
+	bridge_${context}_maxgroups_set "$locus" 100
+	check_err $? "Failed to set max to 100"
+
+	ip link set dev br0 type bridge mcast_vlan_snooping 0
+	sleep 1
+	ip link set dev br0 type bridge mcast_vlan_snooping 1
+
+	local n=$(bridge_${context}_ngroups_get "$locus")
+	local nn=$(bridge mdb show dev br0 | grep $swp1 | wc -l)
+	((nn == n))
+	check_err $? "mcast_n_groups expected to be $nn, but $n reported"
+
+	local max=$(bridge_${context}_maxgroups_get "$locus")
+	((max == 0))
+	check_err $? "Max groups expected to be 0 but $max reported"
+
+	log_test "$CFG: $context: $mode: mcast_vlan_snooping toggle"
+}
+
+test_port_vlan_toggle_vlan_snooping()
+{
+	test_port_vlan_toggle_vlan_snooping_mode temp
+	test_port_vlan_toggle_vlan_snooping_mode permanent
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+tests_run
+
+exit $EXIT_STATUS
-- 
2.39.0


* Re: [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-26 17:53     ` Steven Rostedt
  -1 siblings, 0 replies; 90+ messages in thread
From: Steven Rostedt @ 2023-01-26 17:53 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev, bridge, Ido Schimmel,
	linux-trace-kernel

On Thu, 26 Jan 2023 18:01:14 +0100
Petr Machata <petrm@nvidia.com> wrote:

> The following patch will add two more maximum MDB allowances to the global
> one, mcast_hash_max, that exists today. In all these cases, attempts to add
> MDB entries above the configured maximums through netlink, fail noisily and
> obviously. Such visibility is missing when adding entries through the
> control plane traffic, by IGMP or MLD packets.
> 
> To improve visibility in those cases, add a trace point that reports the
> violation, including the relevant netdevice (be it a slave or the bridge
> itself), and the MDB entry parameters:
> 
> 	# perf record -e bridge:br_mdb_full &
> 	# [...]
> 	# perf script | cut -d: -f4-
> 	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 0
> 	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 0
> 	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 10
> 	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 10
> 
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: linux-trace-kernel@vger.kernel.org
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  include/trace/events/bridge.h | 67 +++++++++++++++++++++++++++++++++++
>  net/core/net-traces.c         |  1 +
>  2 files changed, 68 insertions(+)
> 
> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> index 6b200059c2c5..00d5e2dcb3ad 100644
> --- a/include/trace/events/bridge.h
> +++ b/include/trace/events/bridge.h
> @@ -122,6 +122,73 @@ TRACE_EVENT(br_fdb_update,
>  		  __entry->flags)
>  );
>  
> +TRACE_EVENT(br_mdb_full,
> +
> +	TP_PROTO(const struct net_device *dev,
> +		 const struct br_ip *group),
> +
> +	TP_ARGS(dev, group),
> +
> +	TP_STRUCT__entry(
> +		__string(dev, dev->name)
> +		__field(int, af)
> +		__field(u16, vid)
> +		__array(__u8, src4, 4)
> +		__array(__u8, src6, 16)
> +		__array(__u8, grp4, 4)
> +		__array(__u8, grp6, 16)
> +		__array(__u8, grpmac, ETH_ALEN) /* For af == 0. */

Instead of wasting ring buffer space, why not just have:

		__array(__u8, src, 16)
		__array(__u8, grp, 16)


> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev, dev->name);
> +		__entry->vid = group->vid;
> +
> +		if (!group->proto) {
> +			__entry->af = 0;
> +
> +			memset(__entry->src4, 0, sizeof(__entry->src4));
> +			memset(__entry->src6, 0, sizeof(__entry->src6));
> +			memset(__entry->grp4, 0, sizeof(__entry->grp4));
> +			memset(__entry->grp6, 0, sizeof(__entry->grp6));
> +			memcpy(__entry->grpmac, group->dst.mac_addr, ETH_ALEN);
> +		} else if (group->proto == htons(ETH_P_IP)) {
> +			__be32 *p32;
> +
> +			__entry->af = AF_INET;
> +
> +			p32 = (__be32 *) __entry->src4;
> +			*p32 = group->src.ip4;
> +
> +			p32 = (__be32 *) __entry->grp4;
> +			*p32 = group->dst.ip4;

			struct in6_addr *in6;

			in6 = (struct in6_addr *)__entry->src;
			ipv6_addr_set_v4mapped(group->src.ip4, in6);

			in6 = (struct in6_addr *)__entry->grp;
			ipv6_addr_set_v4mapped(group->dst.ip4, in6);

> +
> +			memset(__entry->src6, 0, sizeof(__entry->src6));
> +			memset(__entry->grp6, 0, sizeof(__entry->grp6));
> +			memset(__entry->grpmac, 0, ETH_ALEN);
> +#if IS_ENABLED(CONFIG_IPV6)
> +		} else {
> +			struct in6_addr *in6;
> +
> +			__entry->af = AF_INET6;
> +
> +			in6 = (struct in6_addr *)__entry->src6;
> +			*in6 = group->src.ip6;
> +
> +			in6 = (struct in6_addr *)__entry->grp6;
> +			*in6 = group->dst.ip6;
> +
> +			memset(__entry->src4, 0, sizeof(__entry->src4));
> +			memset(__entry->grp4, 0, sizeof(__entry->grp4));
> +			memset(__entry->grpmac, 0, ETH_ALEN);
> +#endif
> +		}
> +	),
> +
> +	TP_printk("dev %s af %u src %pI4/%pI6c grp %pI4/%pI6c/%pM vid %u",
> +		  __get_str(dev), __entry->af, __entry->src4, __entry->src6,
> +		  __entry->grp4, __entry->grp6, __entry->grpmac, __entry->vid)

And just have: 

	TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u",
		  __get_str(dev), __entry->af, __entry->src, __entry->grp,
		  __entry->grpmac, __entry->vid)

As %pI6c should detect that it's an IPv4-mapped address and show the IPv4 part in dotted-quad notation.

-- Steve


> +);
>  
>  #endif /* _TRACE_BRIDGE_H */
>  
> diff --git a/net/core/net-traces.c b/net/core/net-traces.c
> index ee7006bbe49b..805b7385dd8d 100644
> --- a/net/core/net-traces.c
> +++ b/net/core/net-traces.c
> @@ -41,6 +41,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(br_mdb_full);
>  #endif
>  
>  #if IS_ENABLED(CONFIG_PAGE_POOL)
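
To put a number on the ring buffer saving suggested above, here is a
userspace sketch that compares only the address fields of the two layouts.
The struct names addrs_v1 and addrs_merged and the helper addr_bytes_saved()
are made up for illustration, and the real trace entry also carries the
string, af and vid fields plus any padding the tracing macros add, so treat
the result as an estimate of the per-event difference, not an exact figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6 /* Ethernet address length, same value as in the kernel */

/* Address fields as posted in the patch (hypothetical userspace mirror). */
struct addrs_v1 {
	uint8_t src4[4];
	uint8_t src6[16];
	uint8_t grp4[4];
	uint8_t grp6[16];
	uint8_t grpmac[ETH_ALEN];
};

/* Address fields with IPv4 folded into the 16-byte arrays, as suggested. */
struct addrs_merged {
	uint8_t src[16];
	uint8_t grp[16];
	uint8_t grpmac[ETH_ALEN];
};

/* Byte arrays need no padding, so the sizes are plain sums of the fields. */
static_assert(sizeof(struct addrs_v1) == 4 + 16 + 4 + 16 + ETH_ALEN,
	      "unexpected padding in addrs_v1");
static_assert(sizeof(struct addrs_merged) == 16 + 16 + ETH_ALEN,
	      "unexpected padding in addrs_merged");

/* Address bytes saved per recorded event by the merged layout. */
static size_t addr_bytes_saved(void)
{
	return sizeof(struct addrs_v1) - sizeof(struct addrs_merged);
}
```

The difference is the two 4-byte IPv4 arrays, which the merged layout no
longer needs.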


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-26 19:18     ` Stephen Hemminger
  -1 siblings, 0 replies; 90+ messages in thread
From: Stephen Hemminger @ 2023-01-26 19:18 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev, bridge, Ido Schimmel

On Thu, 26 Jan 2023 18:01:09 +0100
Petr Machata <petrm@nvidia.com> wrote:

>  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
> +	[IFLA_BRPORT_UNSPEC]	= { .strict_start_type =
> +					IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 },

Is the original IFLA_BRPORT a typo? ETH not EHT

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies
  2023-01-26 19:18     ` [Bridge] " Stephen Hemminger
@ 2023-01-26 20:27       ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-26 20:27 UTC (permalink / raw)
  To: Stephen Hemminger, Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, netdev, bridge, Ido Schimmel

On January 26, 2023 9:18:43 PM GMT+02:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Thu, 26 Jan 2023 18:01:09 +0100
>Petr Machata <petrm@nvidia.com> wrote:
>
>>  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
>> +	[IFLA_BRPORT_UNSPEC]	= { .strict_start_type =
>> +					IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 },
>
>Is the original IFLA_BRPORT a typo? ETH not EHT


No, it's not a typo. EHT stands for Explicit Host Tracking.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 00/16] bridge: Limit number of MDB entries per port, port-vlan
  2023-01-26 17:01 ` [Bridge] " Petr Machata
@ 2023-01-26 20:28   ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-26 20:28 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On January 26, 2023 7:01:08 PM GMT+02:00, Petr Machata <petrm@nvidia.com> wrote:
>The MDB maintained by the bridge is limited. When the bridge is configured
>for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
>capacity. In SW datapath, the capacity is configurable through the
>IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
>similar limit exists in the HW datapath for purposes of offloading.
>
>In order to prevent the issue of unilateral exhaustion of MDB resources,
>introduce two parameters in each of two contexts:
>
>- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>  per-port-VLAN number of MDB entries that the port is member in.
>
>- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>  per-port-VLAN maximum permitted number of MDB entries, or 0 for
>  no limit.
>
>Per-port number of entries keeps track of the total number of MDB entries
>configured on a given port. The per-port-VLAN value then keeps track of the
>subset of MDB entries configured specifically for the given VLAN, on that
>port. The number is adjusted as port_groups are created and deleted, and
>therefore under multicast lock.
>
>A maximum value, if non-zero, then places a limit on the number of entries
>that can be configured in a given context. Attempts to add entries above
>the maximum are rejected.
>
>Rejection reason of netlink-based requests to add MDB entries is
>communicated through extack. This channel is unavailable for rejections
>triggered from the control path. To address this lack of visibility, the
>patchset adds a tracepoint, bridge:br_mdb_full:
>
>	# perf record -e bridge:br_mdb_full &
>	# [...]
>	# perf script | cut -d: -f4-
>	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 0
>	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 0
>	 dev v2 af 2 src 192.0.2.1/:: grp 239.1.1.1/::/00:00:00:00:00:00 vid 10
>	 dev v2 af 10 src 0.0.0.0/2001:db8:1::1 grp 0.0.0.0/ff0e::1/00:00:00:00:00:00 vid 10
>
>This tracepoint is triggered for mcast_hash_max exhaustions as well.
>
>The following is an example of how the feature is used. A more extensive
>example is available in patch #8:
>
>	# bridge vlan set dev v1 vid 1 mcast_max_groups 1
>	# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
>	# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
>	Error: bridge: Port-VLAN is already a member in mcast_max_groups (1) groups.
>
>The patchset progresses as follows:
>
>- In patch #1, set strict_start_type at two bridge-related policies. The
>  reason is we are adding a new attribute to one of these, and want the new
>  attribute to be parsed strictly. The other was adjusted for completeness'
>  sake.
>
>- In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
>  following additions smoother.
>
>- In patch #6, add the tracepoint.
>
>- In patch #7, the code to maintain number of MDB entries is added as
>  struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
>  as struct net_bridge_mcast_port::mdb_max_entries, however at this point
>  there is no way to set the value yet, and since 0 is treated as "no
>  limit", the functionality doesn't change at this point. Note however,
>  that mcast_hash_max violations already do trigger at this point.
>
>- In patch #8, netlink plumbing is added: reading of number of entries, and
>  reading and writing of maximum.
>
>  The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
>  in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside IFLA_PROTINFO nest.
>
>  The per-port-vlan values are passed through RTM_GETVLAN / RTM_NEWVLAN
>  messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, _MAX_GROUPS, inside
>  BRIDGE_VLANDB_ENTRY.
>
>The following patches deal with the selftest:
>
>- Patches #9 and #10 clean up and move around some selftest code.
>
>- Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
>  support to allow generating packets with configurable group addresses and
>  varying source lists for (S,G) memberships.
>
>- Patch #15 adds code to generate IGMP leave and MLD done packets.
>
>- Patch #16 finally adds the selftest itself.
>
>Petr Machata (16):
>  net: bridge: Set strict_start_type at two policies
>  net: bridge: Add extack to br_multicast_new_port_group()
>  net: bridge: Move extack-setting to br_multicast_new_port_group()
>  net: bridge: Add br_multicast_del_port_group()
>  net: bridge: Change a cleanup in br_multicast_new_port_group() to goto
>  net: bridge: Add a tracepoint for MDB overflows
>  net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
>  net: bridge: Add netlink knobs for number / maximum MDB entries
>  selftests: forwarding: Move IGMP- and MLD-related functions to lib
>  selftests: forwarding: bridge_mdb: Fix a typo
>  selftests: forwarding: lib: Add helpers for IP address handling
>  selftests: forwarding: lib: Add helpers for checksum handling
>  selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
>  selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2
>  selftests: forwarding: lib: Add helpers to build IGMP/MLD leave
>    packets
>  selftests: forwarding: bridge_mdb_max: Add a new selftest
>
> include/trace/events/bridge.h                 |  67 ++
> include/uapi/linux/if_bridge.h                |   2 +
> include/uapi/linux/if_link.h                  |   2 +
> net/bridge/br_mdb.c                           |  17 +-
> net/bridge/br_multicast.c                     | 255 ++++-
> net/bridge/br_netlink.c                       |  21 +-
> net/bridge/br_netlink_tunnel.c                |   3 +
> net/bridge/br_private.h                       |  22 +-
> net/bridge/br_vlan.c                          |  11 +-
> net/bridge/br_vlan_options.c                  |  33 +-
> net/core/net-traces.c                         |   1 +
> net/core/rtnetlink.c                          |   2 +-
> .../testing/selftests/net/forwarding/Makefile |   1 +
> .../selftests/net/forwarding/bridge_mdb.sh    |  60 +-
> .../net/forwarding/bridge_mdb_max.sh          | 970 ++++++++++++++++++
> tools/testing/selftests/net/forwarding/lib.sh | 216 ++++
> 16 files changed, 1604 insertions(+), 79 deletions(-)
> create mode 100755 tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
>


Nice set, thanks! Please hold off applying until Sunday when I'll be able to properly review it.

Cheers,
  Nik

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows
  2023-01-26 17:53     ` [Bridge] " Steven Rostedt
@ 2023-01-27 14:29       ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-27 14:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, netdev, bridge,
	Ido Schimmel, linux-trace-kernel


Steven Rostedt <rostedt@goodmis.org> writes:

>> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
>> index 6b200059c2c5..00d5e2dcb3ad 100644
>> --- a/include/trace/events/bridge.h
>> +++ b/include/trace/events/bridge.h
>> @@ -122,6 +122,73 @@ TRACE_EVENT(br_fdb_update,
>>  		  __entry->flags)
>>  );
>>  
>> +TRACE_EVENT(br_mdb_full,
>> +
>> +	TP_PROTO(const struct net_device *dev,
>> +		 const struct br_ip *group),
>> +
>> +	TP_ARGS(dev, group),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dev, dev->name)
>> +		__field(int, af)
>> +		__field(u16, vid)
>> +		__array(__u8, src4, 4)
>> +		__array(__u8, src6, 16)
>> +		__array(__u8, grp4, 4)
>> +		__array(__u8, grp6, 16)
>> +		__array(__u8, grpmac, ETH_ALEN) /* For af == 0. */
>
> Instead of wasting ring buffer space, why not just have:
>
> 		__array(__u8, src, 16)
> 		__array(__u8, grp, 16)
>
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dev, dev->name);
>> +		__entry->vid = group->vid;
>> +
>> +		if (!group->proto) {
>> +			__entry->af = 0;
>> +
>> +			memset(__entry->src4, 0, sizeof(__entry->src4));
>> +			memset(__entry->src6, 0, sizeof(__entry->src6));
>> +			memset(__entry->grp4, 0, sizeof(__entry->grp4));
>> +			memset(__entry->grp6, 0, sizeof(__entry->grp6));
>> +			memcpy(__entry->grpmac, group->dst.mac_addr, ETH_ALEN);
>> +		} else if (group->proto == htons(ETH_P_IP)) {
>> +			__be32 *p32;
>> +
>> +			__entry->af = AF_INET;
>> +
>> +			p32 = (__be32 *) __entry->src4;
>> +			*p32 = group->src.ip4;
>> +
>> +			p32 = (__be32 *) __entry->grp4;
>> +			*p32 = group->dst.ip4;
>
> 			struct in6_addr *in6;
>
> 			in6 = (struct in6_addr *)__entry->src;
> 			ipv6_addr_set_v4mapped(group->src.ip4, in6);
>
> 			in6 = (struct in6_addr *)__entry->grp;
> 			ipv6_addr_set_v4mapped(group->dst.ip4, in6);
>
>> +
>> +			memset(__entry->src6, 0, sizeof(__entry->src6));
>> +			memset(__entry->grp6, 0, sizeof(__entry->grp6));
>> +			memset(__entry->grpmac, 0, ETH_ALEN);
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +		} else {
>> +			struct in6_addr *in6;
>> +
>> +			__entry->af = AF_INET6;
>> +
>> +			in6 = (struct in6_addr *)__entry->src6;
>> +			*in6 = group->src.ip6;
>> +
>> +			in6 = (struct in6_addr *)__entry->grp6;
>> +			*in6 = group->dst.ip6;
>> +
>> +			memset(__entry->src4, 0, sizeof(__entry->src4));
>> +			memset(__entry->grp4, 0, sizeof(__entry->grp4));
>> +			memset(__entry->grpmac, 0, ETH_ALEN);
>> +#endif
>> +		}
>> +	),
>> +
>> +	TP_printk("dev %s af %u src %pI4/%pI6c grp %pI4/%pI6c/%pM vid %u",
>> +		  __get_str(dev), __entry->af, __entry->src4, __entry->src6,
>> +		  __entry->grp4, __entry->grp6, __entry->grpmac, __entry->vid)
>
> And just have: 
>
> 	TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u",
> 		  __get_str(dev), __entry->af, __entry->src, __entry->grp,
> 		  __entry->grpmac, __entry->vid)
>
> As %pI6c should detect that it's an IPv4-mapped address and show the IPv4 part in dotted-quad notation.

So the reason I split the fields was that %pI4, %pI6c and %pM do not seem
to work with buffers of the wrong size.

But I can consolidate 4/6 by changing the address to IPv6 like you
propose. I'll do this for v2. Thanks!
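
For readers following along, the consolidation discussed above relies on the
IPv4-mapped IPv6 form, ::ffff:a.b.c.d. Below is a minimal userspace sketch of
that layout. v4mapped_set() and addr_to_str() are hypothetical helpers, not
kernel functions: the first only mimics what ipv6_addr_set_v4mapped() is
understood to write, and the second uses inet_ntop() as a rough stand-in for
%pI6c, on the assumption that both print the mapped IPv4 part as a dotted
quad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical userspace stand-in for ipv6_addr_set_v4mapped(): store the
 * network-order IPv4 address ip4_be into out as ::ffff:a.b.c.d.
 */
static void v4mapped_set(uint32_t ip4_be, uint8_t out[16])
{
	assert(out != NULL);
	memset(out, 0, 10);		/* bytes 0..9 are zero */
	out[10] = 0xff;			/* bytes 10..11 are 0xffff */
	out[11] = 0xff;
	memcpy(out + 12, &ip4_be, 4);	/* bytes 12..15 hold the IPv4 address */
}

/*
 * Format a 16-byte address as text. inet_ntop() plays the role of %pI6c
 * here; that equivalence is an assumption of this sketch.
 * Returns buf on success, NULL on failure.
 */
static const char *addr_to_str(const uint8_t addr[16], char *buf, size_t len)
{
	return inet_ntop(AF_INET6, addr, buf, (socklen_t)len);
}
```

With this layout a single 16-byte array can hold either family, and af still
tells the consumer which one was recorded.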

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:09     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:09 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Make any attributes newly-added to br_port_policy or vlan_tunnel_policy
> parsed strictly, to prevent userspace from passing garbage. Note that this
> patchset only touches the former policy. The latter was adjusted for
> completeness' sake. There do not appear to be other _deprecated calls
> with non-NULL policies.
> 
> Suggested-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_netlink.c        | 2 ++
>  net/bridge/br_netlink_tunnel.c | 3 +++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 4316cc82ae17..a6133d469885 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -858,6 +858,8 @@ static int br_afspec(struct net_bridge *br,
>  }
>  
>  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
> +	[IFLA_BRPORT_UNSPEC]	= { .strict_start_type =
> +					IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 },
>  	[IFLA_BRPORT_STATE]	= { .type = NLA_U8 },
>  	[IFLA_BRPORT_COST]	= { .type = NLA_U32 },
>  	[IFLA_BRPORT_PRIORITY]	= { .type = NLA_U16 },
> diff --git a/net/bridge/br_netlink_tunnel.c b/net/bridge/br_netlink_tunnel.c
> index 8914290c75d4..17abf092f7ca 100644
> --- a/net/bridge/br_netlink_tunnel.c
> +++ b/net/bridge/br_netlink_tunnel.c
> @@ -188,6 +188,9 @@ int br_fill_vlan_tunnel_info(struct sk_buff *skb,
>  }
>  
>  static const struct nla_policy vlan_tunnel_policy[IFLA_BRIDGE_VLAN_TUNNEL_MAX + 1] = {
> +	[IFLA_BRIDGE_VLAN_TUNNEL_UNSPEC] = {
> +		.strict_start_type = IFLA_BRIDGE_VLAN_TUNNEL_FLAGS + 1
> +	},
>  	[IFLA_BRIDGE_VLAN_TUNNEL_ID] = { .type = NLA_U32 },
>  	[IFLA_BRIDGE_VLAN_TUNNEL_VID] = { .type = NLA_U16 },
>  	[IFLA_BRIDGE_VLAN_TUNNEL_FLAGS] = { .type = NLA_U16 },

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
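
The effect of strict_start_type described in the commit message can be
modelled in userspace as follows. This is a simplified sketch, not the
kernel's validation code: the demo_* names are hypothetical, and the only
"strictness" modelled is exact-length checking of the payload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute IDs, standing in for an IFLA_BRPORT_*-style enum. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_OLD,		/* existed before strict checking */
	DEMO_ATTR_NEW,		/* added afterwards */
	__DEMO_ATTR_MAX
};
#define DEMO_ATTR_MAX (__DEMO_ATTR_MAX - 1)

struct demo_policy {
	size_t len;		/* expected payload length */
	int strict_start_type;	/* only meaningful in slot 0 */
};

static const struct demo_policy demo_attr_policy[DEMO_ATTR_MAX + 1] = {
	[DEMO_ATTR_UNSPEC]	= { .strict_start_type = DEMO_ATTR_OLD + 1 },
	[DEMO_ATTR_OLD]		= { .len = 4 },
	[DEMO_ATTR_NEW]		= { .len = 4 },
};

/* Lenient: payload must be at least len bytes. Strict: exactly len bytes. */
static bool demo_attr_valid(int type, size_t payload_len)
{
	int start = demo_attr_policy[DEMO_ATTR_UNSPEC].strict_start_type;
	bool strict;

	if (type <= 0 || type > DEMO_ATTR_MAX)
		return true;	/* unknown attributes are ignored in this model */

	strict = start && type >= start;
	if (strict)
		return payload_len == demo_attr_policy[type].len;
	return payload_len >= demo_attr_policy[type].len;
}
```

In this model an oversized DEMO_ATTR_OLD is still accepted, so existing
userspace keeps working, while the same oversized payload is rejected for
DEMO_ATTR_NEW.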


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 02/16] net: bridge: Add extack to br_multicast_new_port_group()
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:09     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:09 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Make it possible to set an extack in br_multicast_new_port_group().
> Eventually, this function will check for per-port and per-port-vlan
> MDB maximums, and will use the extack to communicate the reason for
> the bounce.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_mdb.c       | 5 +++--
>  net/bridge/br_multicast.c | 5 +++--
>  net/bridge/br_private.h   | 3 ++-
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 03/16] net: bridge: Move extack-setting to br_multicast_new_port_group()
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:09     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:09 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Now that br_multicast_new_port_group() takes an extack argument, move
> setting the extack there. The downside is that the error messages end
> up being less specific (the function cannot distinguish between (S,G)
> and (*,G) groups). However, the alternative is to check in the caller
> whether the callee set the extack, and if it didn't, set it. But that
> is only done when the callee is not exactly known. (E.g. in case of a
> notifier invocation.)
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_mdb.c       | 9 +++------
>  net/bridge/br_multicast.c | 5 ++++-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
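
The trade-off in the commit message (the callee sets the message, so the
caller needs no "did the callee set it?" check) can be sketched in userspace
like this. Everything here is hypothetical: struct demo_extack stands in for
struct netlink_ext_ack, and the message text and error code are made up for
illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack: one message slot. */
struct demo_extack {
	const char *msg;
};

/* Set the message if the caller supplied an extack (it may be NULL). */
static void demo_set_err(struct demo_extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* Callee: knows why it failed, but not whether the group is (S,G) or (*,G),
 * so the message is necessarily generic.
 */
static int demo_new_port_group(int n_entries, int hash_max,
			       struct demo_extack *extack)
{
	if (n_entries >= hash_max) {
		demo_set_err(extack, "Table is full");
		return -E2BIG;
	}
	return 0;
}

/* Caller: only propagates the error and leaves the extack alone. */
static int demo_add_group(int n_entries, int hash_max,
			  struct demo_extack *extack)
{
	return demo_new_port_group(n_entries, hash_max, extack);
}
```

Passing a NULL extack models callers that have no netlink request to report
to; the callee then returns the error code without a message.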



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 04/16] net: bridge: Add br_multicast_del_port_group()
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:11     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:11 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Since cleaning up the effects of br_multicast_new_port_group() just
> consists of delisting and freeing the memory, the function
> br_mdb_add_group_star_g() inlines the corresponding code. In the following
> patches, the number of per-port and per-port-VLAN MDB entries is going to be
> maintained, and that counter will have to be updated. Because that logic
> is going to be hidden in the br_multicast module, introduce a new hook
> intended to again remove a newly-created group.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_mdb.c       |  3 +--
>  net/bridge/br_multicast.c | 11 +++++++++++
>  net/bridge/br_private.h   |  1 +
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
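
A minimal userspace sketch of why a dedicated removal hook helps once a
counter is involved: if creation bumps a counter, open-coding "delist and
free" in a caller would silently skip the decrement. The demo_* types and
functions are hypothetical and much simpler than the bridge's structures.

```c
#include <stdlib.h>

struct demo_port {
	unsigned int n_groups;	/* number of groups this port is a member of */
};

struct demo_group {
	struct demo_port *port;
};

/* Create a group and account for it on the port. */
static struct demo_group *demo_new_port_group(struct demo_port *port)
{
	struct demo_group *pg = calloc(1, sizeof(*pg));

	if (!pg)
		return NULL;
	pg->port = port;
	port->n_groups++;
	return pg;
}

/* Undo demo_new_port_group(): drop the accounting, then free. */
static void demo_del_port_group(struct demo_group *pg)
{
	pg->port->n_groups--;
	free(pg);
}
```

Callers that need to back out of a half-finished add call
demo_del_port_group() and never touch the counter themselves.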



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 05/16] net: bridge: Change a cleanup in br_multicast_new_port_group() to goto
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:11     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:11 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> This function is getting more to clean up in the following patches.
> Structuring the cleanups in one labeled block will allow reusing the same
> cleanup from several places.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
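
The "one labeled block" structure mentioned in the commit message is the
usual goto-based unwinding idiom. A self-contained userspace sketch, with
hypothetical names and a simulated failure in place of the real steps:

```c
#include <errno.h>
#include <stdlib.h>

struct demo_group {
	int *sources;
	int linked;
};

/* Build a group in several steps; every failure path jumps to the label
 * that undoes exactly the steps completed so far.
 */
static int demo_group_create(struct demo_group **out, int fail_link)
{
	struct demo_group *pg;
	int err;

	pg = calloc(1, sizeof(*pg));
	if (!pg)
		return -ENOMEM;

	pg->sources = calloc(4, sizeof(*pg->sources));
	if (!pg->sources) {
		err = -ENOMEM;
		goto err_free_pg;
	}

	if (fail_link) {	/* simulated failure of a later step */
		err = -EINVAL;
		goto err_free_sources;
	}

	pg->linked = 1;
	*out = pg;
	return 0;

err_free_sources:
	free(pg->sources);
err_free_pg:
	free(pg);
	return err;
}
```

Adding another step later only needs one more label at the top of the
unwind sequence; the existing error paths stay as they are.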



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29  9:40     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29  9:40 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> The MDB maintained by the bridge is limited. When the bridge is configured
> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> capacity. In SW datapath, the capacity is configurable through the
> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> similar limit exists in the HW datapath for purposes of offloading.
> 
> In order to prevent the issue of unilateral exhaustion of MDB resources,
> introduce two parameters in each of two contexts:
> 
> - Per-port and per-port-VLAN number of MDB entries that the port
>   is a member of.
> 
> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>   no limit.
> 
> The per-port multicast context is used for tracking of MDB entries for the
> port as a whole. This is available for all bridges.
> 
> The per-port-VLAN multicast context is then only available on
> VLAN-filtering bridges on VLANs that have multicast snooping on.
> 
> With these changes in place, it will be possible to configure MDB limit for
> bridge as a whole, or any one port as a whole, or any single port-VLAN.
> 
> Note that unlike the global limit, exhaustion of the per-port and
> per-port-VLAN maximums does not cause disablement of multicast snooping.
> It is also permitted to configure the local limit larger than hash_max,
> even though that is not useful.
> 
> In this patch, introduce only the accounting for number of entries, and the
> max field itself, but not the means to toggle the max. The next patch
> introduces the netlink APIs to toggle and read the values.
> 
> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> snooping is enabled. The reason for this is that while VLAN snooping is
> disabled, permanent entries can be added above the limit imposed by the
> configured maximum. Under those circumstances, whatever caused the VLAN
> context enablement, would need to be rolled back, adding a fair amount of
> code that would be rarely hit and tricky to maintain. At the same time,
> the feature that this would enable is IMHO not interesting: I posit that
> the usefulness of keeping mcast_max_groups intact across
> mcast_vlan_snooping toggles is marginal at best.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
>  net/bridge/br_private.h   |   2 +
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
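
The accounting described in the commit message (a count and a maximum in
each of two contexts, with 0 meaning no limit) can be sketched in userspace
as below. This is an illustration under assumptions, not the patch's code:
the demo_* names are hypothetical, locking is omitted, and the order in
which the two contexts are checked is a choice made for the sketch.

```c
#include <errno.h>
#include <stddef.h>

struct demo_mcast_ctx {
	unsigned int n_entries;		/* current number of MDB entries */
	unsigned int max_entries;	/* 0 means no limit */
};

/* Account for one more entry in a single context, honouring its limit. */
static int demo_ctx_inc(struct demo_mcast_ctx *ctx)
{
	if (ctx->max_entries && ctx->n_entries >= ctx->max_entries)
		return -E2BIG;
	ctx->n_entries++;
	return 0;
}

/* Account for one more entry on the port and, when a per-port-VLAN context
 * is active (vlan != NULL), on that VLAN as well. Either both counters are
 * bumped or neither is.
 */
static int demo_ngroups_inc(struct demo_mcast_ctx *port,
			    struct demo_mcast_ctx *vlan)
{
	int err;

	if (vlan) {
		err = demo_ctx_inc(vlan);
		if (err)
			return err;
	}

	err = demo_ctx_inc(port);
	if (err && vlan)
		vlan->n_entries--;	/* roll back the VLAN accounting */
	return err;
}
```

Hitting either limit only rejects the new entry; nothing in this model
disables snooping, matching the behaviour the commit message describes for
the per-port and per-port-VLAN maximums.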


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:07     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:07 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> The previous patch added accounting for number of MDB entries per port and
> per port-VLAN, and the logic to verify that these values stay within
> configured bounds. However it didn't provide means to actually configure
> those bounds or read the occupancy. This patch does that.
> 
> Two new netlink attributes are added for the MDB occupancy:
> IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
> BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
> And another two for the maximum number of MDB entries:
> IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
> BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.
> 
> Note that the two new IFLA_BRPORT_ attributes prompt bumping of
> RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.
> 
> The new attributes are used like this:
> 
>  # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
>                                       mcast_vlan_snooping 1 mcast_querier 1
>  # ip link set dev v1 master br
>  # bridge vlan add dev v1 vid 2
> 
>  # bridge vlan set dev v1 vid 1 mcast_max_groups 1
>  # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
>  # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
>  Error: bridge: Port-VLAN is already a member in mcast_max_groups (1) groups.
> 
>  # bridge link set dev v1 mcast_max_groups 1
>  # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
>  Error: bridge: Port is already a member in mcast_max_groups (1) groups.
> 
>  # bridge -d link show
>  5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
>      [...] mcast_n_groups 1 mcast_max_groups 1
> 
>  # bridge -d vlan show
>  port              vlan-id
>  br                1 PVID Egress Untagged
>                      state forwarding mcast_router 1
>  v1                1 PVID Egress Untagged
>                      [...] mcast_n_groups 1 mcast_max_groups 1
>                    2
>                      [...] mcast_n_groups 0 mcast_max_groups 0
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  include/uapi/linux/if_bridge.h |  2 +
>  include/uapi/linux/if_link.h   |  2 +
>  net/bridge/br_multicast.c      | 96 ++++++++++++++++++++++++++++++++++
>  net/bridge/br_netlink.c        | 19 ++++++-
>  net/bridge/br_private.h        | 16 +++++-
>  net/bridge/br_vlan.c           | 11 ++--
>  net/bridge/br_vlan_options.c   | 33 +++++++++++-
>  net/core/rtnetlink.c           |  2 +-
>  8 files changed, 173 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> index d9de241d90f9..d60c456710b3 100644
> --- a/include/uapi/linux/if_bridge.h
> +++ b/include/uapi/linux/if_bridge.h
> @@ -523,6 +523,8 @@ enum {
>  	BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
>  	BRIDGE_VLANDB_ENTRY_STATS,
>  	BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
> +	BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
> +	BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
>  	__BRIDGE_VLANDB_ENTRY_MAX,
>  };
>  #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 1021a7e47a86..1bed3a72939c 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -564,6 +564,8 @@ enum {
>  	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
>  	IFLA_BRPORT_LOCKED,
>  	IFLA_BRPORT_MAB,
> +	IFLA_BRPORT_MCAST_N_GROUPS,
> +	IFLA_BRPORT_MCAST_MAX_GROUPS,
>  	__IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index de531109b947..04261dd2380b 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -766,6 +766,102 @@ static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
>  	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
>  }
>  
> +static int
> +br_multicast_pmctx_ngroups_set_max(struct net_bridge_mcast_port *pmctx,
> +				   u32 max, struct netlink_ext_ack *extack)
> +{
> +	if (max && max < pmctx->mdb_n_entries) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Can't set mcast_max_groups=%u, which is below mcast_n_groups=%u",
> +				       max, pmctx->mdb_n_entries);

Why not? All new entries will be rejected anyway; at most, some will expire and make room.

> +		return -EINVAL;
> +	}
> +
> +	pmctx->mdb_max_entries = max;
> +	return 0;
> +}
> +
> +u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port)
> +{
> +	u32 n;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	n = port->multicast_ctx.mdb_n_entries;
> +	spin_unlock_bh(&port->br->multicast_lock);

This is too much just to read the value: we block all IGMP/MLD processing, and
potentially block packet processing on the same core, just to read it. These reads
are done for notifications, getlink and also for fill_slave_info. I think we can
just use the WRITE_ONCE()/READ_ONCE() helpers to access it, especially since the
lock is taken for both values (max and current count). We still get a snapshot that
can be wrong by the time it's returned, and as for changing it, we'll start
enforcing the new limit with a minor delay, which is not a big deal.
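
A userspace analogue of the lockless access suggested here, as a sketch only.
READ_ONCE()/WRITE_ONCE() are kernel macros, so this uses C11 relaxed atomics
as the closest portable equivalent; the demo_* names are hypothetical and the
semantics are similar but not identical to the kernel helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

struct demo_mcast_ctx {
	_Atomic uint32_t n_entries;
	_Atomic uint32_t max_entries;
};

/* Writer: a plain store that readers cannot observe half-written. */
static void demo_set_max(struct demo_mcast_ctx *ctx, uint32_t max)
{
	atomic_store_explicit(&ctx->max_entries, max, memory_order_relaxed);
}

/* Readers: take a snapshot without any lock. The value may already be
 * stale by the time the caller uses it, which a lock would not prevent
 * either.
 */
static uint32_t demo_get_max(struct demo_mcast_ctx *ctx)
{
	return atomic_load_explicit(&ctx->max_entries, memory_order_relaxed);
}

static uint32_t demo_get_n(struct demo_mcast_ctx *ctx)
{
	return atomic_load_explicit(&ctx->n_entries, memory_order_relaxed);
}
```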

> +
> +	return n;
> +}
> +
> +int br_multicast_vlan_ngroups_get(struct net_bridge *br,
> +				  const struct net_bridge_vlan *v,
> +				  u32 *n)
> +{
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
> +		return -EINVAL;
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	*n = v->port_mcast_ctx.mdb_n_entries;
> +	spin_unlock_bh(&br->multicast_lock);
> +

Ditto, and for all accesses below that require the lock.

> +	return 0;
> +}
> +
> +int br_multicast_port_ngroups_set_max(struct net_bridge_port *port, u32 max,
> +				      struct netlink_ext_ack *extack)
> +{
> +	int err;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	err = br_multicast_pmctx_ngroups_set_max(&port->multicast_ctx, max,
> +						 extack);
> +	spin_unlock_bh(&port->br->multicast_lock);
> +
> +	return err;
> +}
> +
> +int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
> +				      struct net_bridge_vlan *v, u32 max,
> +				      struct netlink_ext_ack *extack)
> +{
> +	int err;
> +
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
> +		NL_SET_ERR_MSG_MOD(extack, "Multicast snooping disabled on this VLAN");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	err = br_multicast_pmctx_ngroups_set_max(&v->port_mcast_ctx, max,
> +						 extack);
> +	spin_unlock_bh(&br->multicast_lock);
> +
> +	return err;
> +}
> +
> +u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port)
> +{
> +	u32 max;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	max = port->multicast_ctx.mdb_max_entries;
> +	spin_unlock_bh(&port->br->multicast_lock);


> +
> +	return max;
> +}
> +
> +int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
> +				      const struct net_bridge_vlan *v,
> +				      u32 *max)
> +{
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
> +		return -EINVAL;
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	*max = v->port_mcast_ctx.mdb_max_entries;
> +	spin_unlock_bh(&br->multicast_lock);


> +
> +	return 0;
> +}
> +
>  static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
>  {
>  	struct net_bridge_port_group *pg;
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index a6133d469885..063c1646dfe8 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -202,6 +202,8 @@ static inline size_t br_port_info_size(void)
>  		+ nla_total_size_64bit(sizeof(u64)) /* IFLA_BRPORT_HOLD_TIMER */
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MULTICAST_ROUTER */
> +		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_N_GROUPS */
> +		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_MAX_GROUPS */
>  #endif
>  		+ nla_total_size(sizeof(u16))	/* IFLA_BRPORT_GROUP_FWD_MASK */
>  		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MRP_RING_OPEN */
> @@ -298,7 +300,11 @@ static int br_port_fill_attrs(struct sk_buff *skb,
>  	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
>  			p->multicast_eht_hosts_limit) ||
>  	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
> -			p->multicast_eht_hosts_cnt))
> +			p->multicast_eht_hosts_cnt) ||
> +	    nla_put_u32(skb, IFLA_BRPORT_MCAST_N_GROUPS,
> +			br_multicast_port_ngroups_get(p)) ||
> +	    nla_put_u32(skb, IFLA_BRPORT_MCAST_MAX_GROUPS,
> +			br_multicast_port_ngroups_get_max(p)))
>  		return -EMSGSIZE;
>  #endif
>  
> @@ -883,6 +889,8 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
>  	[IFLA_BRPORT_MAB] = { .type = NLA_U8 },
>  	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
>  	[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
> +	[IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT },
> +	[IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 },
>  };
>  
>  /* Change the state of the port and notify spanning tree */
> @@ -1017,6 +1025,15 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
>  		if (err)
>  			return err;
>  	}
> +
> +	if (tb[IFLA_BRPORT_MCAST_MAX_GROUPS]) {
> +		u32 max_groups;
> +
> +		max_groups = nla_get_u32(tb[IFLA_BRPORT_MCAST_MAX_GROUPS]);
> +		err = br_multicast_port_ngroups_set_max(p, max_groups, extack);
> +		if (err)
> +			return err;
> +	}
>  #endif
>  
>  	if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 49f411a0a1f1..86b7a221e806 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -978,6 +978,19 @@ void br_multicast_uninit_stats(struct net_bridge *br);
>  void br_multicast_get_stats(const struct net_bridge *br,
>  			    const struct net_bridge_port *p,
>  			    struct br_mcast_stats *dest);
> +u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port);
> +int br_multicast_vlan_ngroups_get(struct net_bridge *br,
> +				  const struct net_bridge_vlan *v,
> +				  u32 *n);
> +int br_multicast_port_ngroups_set_max(struct net_bridge_port *port,
> +				      u32 max, struct netlink_ext_ack *extack);
> +int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
> +				      struct net_bridge_vlan *v, u32 max,
> +				      struct netlink_ext_ack *extack);
> +u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port);
> +int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
> +				      const struct net_bridge_vlan *v,
> +				      u32 *max);
>  void br_mdb_init(void);
>  void br_mdb_uninit(void);
>  void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
> @@ -1761,7 +1774,8 @@ static inline u16 br_vlan_flags(const struct net_bridge_vlan *v, u16 pvid)
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>  bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
>  			   const struct net_bridge_vlan *range_end);
> -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v);
> +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
> +		       const struct net_bridge_port *p);
>  size_t br_vlan_opts_nl_size(void);
>  int br_vlan_process_options(const struct net_bridge *br,
>  			    const struct net_bridge_port *p,
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index bc75fa1e4666..8a3dbc09ba38 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1816,6 +1816,7 @@ static bool br_vlan_stats_fill(struct sk_buff *skb,
>  /* v_opts is used to dump the options which must be equal in the whole range */
>  static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
>  			      const struct net_bridge_vlan *v_opts,
> +			      const struct net_bridge_port *p,
>  			      u16 flags,
>  			      bool dump_stats)
>  {
> @@ -1842,7 +1843,7 @@ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
>  		goto out_err;
>  
>  	if (v_opts) {
> -		if (!br_vlan_opts_fill(skb, v_opts))
> +		if (!br_vlan_opts_fill(skb, v_opts, p))
>  			goto out_err;
>  
>  		if (dump_stats && !br_vlan_stats_fill(skb, v_opts))
> @@ -1925,7 +1926,7 @@ void br_vlan_notify(const struct net_bridge *br,
>  		goto out_kfree;
>  	}
>  
> -	if (!br_vlan_fill_vids(skb, vid, vid_range, v, flags, false))
> +	if (!br_vlan_fill_vids(skb, vid, vid_range, v, p, flags, false))
>  		goto out_err;
>  
>  	nlmsg_end(skb, nlh);
> @@ -2030,7 +2031,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
>  
>  			if (!br_vlan_fill_vids(skb, range_start->vid,
>  					       range_end->vid, range_start,
> -					       vlan_flags, dump_stats)) {
> +					       p, vlan_flags, dump_stats)) {
>  				err = -EMSGSIZE;
>  				break;
>  			}
> @@ -2056,7 +2057,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
>  		else if (!dump_global &&
>  			 !br_vlan_fill_vids(skb, range_start->vid,
>  					    range_end->vid, range_start,
> -					    br_vlan_flags(range_start, pvid),
> +					    p, br_vlan_flags(range_start, pvid),
>  					    dump_stats))
>  			err = -EMSGSIZE;
>  	}
> @@ -2131,6 +2132,8 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] =
>  	[BRIDGE_VLANDB_ENTRY_STATE]	= { .type = NLA_U8 },
>  	[BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED },
>  	[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER]	= { .type = NLA_U8 },
> +	[BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS]	= { .type = NLA_REJECT },
> +	[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]	= { .type = NLA_U32 },
>  };
>  
>  static int br_vlan_rtm_process_one(struct net_device *dev,
> diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
> index a2724d03278c..43d8f11ce79c 100644
> --- a/net/bridge/br_vlan_options.c
> +++ b/net/bridge/br_vlan_options.c
> @@ -48,7 +48,8 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
>  	       curr_mc_rtr == range_mc_rtr;
>  }
>  
> -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
> +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
> +		       const struct net_bridge_port *p)
>  {
>  	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) ||
>  	    !__vlan_tun_put(skb, v))
> @@ -58,6 +59,20 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
>  	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
>  		       br_vlan_multicast_router(v)))
>  		return false;
> +	if (p && !br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
> +		u32 mdb_max_entries;
> +		u32 mdb_n_entries;
> +
> +		if (br_multicast_vlan_ngroups_get(p->br, v, &mdb_n_entries) ||
> +		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
> +				mdb_n_entries))
> +			return false;
> +		if (br_multicast_vlan_ngroups_get_max(p->br, v,
> +						      &mdb_max_entries) ||
> +		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
> +				mdb_max_entries))
> +			return false;
> +	}
>  #endif
>  
>  	return true;
> @@ -70,6 +85,8 @@ size_t br_vlan_opts_nl_size(void)
>  	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_TINFO_ID */
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  	       + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_MCAST_ROUTER */
> +	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */
> +	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */
>  #endif
>  	       + 0;
>  }
> @@ -212,6 +229,20 @@ static int br_vlan_process_one_opts(const struct net_bridge *br,
>  			return err;
>  		*changed = true;
>  	}
> +	if (tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]) {
> +		u32 val;
> +
> +		if (!p) {
> +			NL_SET_ERR_MSG_MOD(extack, "Can't set mcast_max_groups for non-port vlans");
> +			return -EINVAL;
> +		}
> +
> +		val = nla_get_u32(tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]);
> +		err = br_multicast_vlan_ngroups_set_max(p->br, v, val, extack);
> +		if (err)
> +			return err;
> +		*changed = true;
> +	}
>  #endif
>  
>  	return 0;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 64289bc98887..e786255a8360 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -58,7 +58,7 @@
>  #include "dev.h"
>  
>  #define RTNL_MAX_TYPE		50
> -#define RTNL_SLAVE_MAX_TYPE	40
> +#define RTNL_SLAVE_MAX_TYPE	42
>  
>  struct rtnl_link {
>  	rtnl_doit_func		doit;


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Bridge] [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries
@ 2023-01-29 10:07     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:07 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: Ido Schimmel, bridge

On 26/01/2023 19:01, Petr Machata wrote:
> The previous patch added accounting for number of MDB entries per port and
> per port-VLAN, and the logic to verify that these values stay within
> configured bounds. However it didn't provide means to actually configure
> those bounds or read the occupancy. This patch does that.
> 
> Two new netlink attributes are added for the MDB occupancy:
> IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
> BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
> And another two for the maximum number of MDB entries:
> IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
> BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.
> 
> Note that the two new IFLA_BRPORT_ attributes prompt bumping of
> RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.
> 
> The new attributes are used like this:
> 
>  # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
>                                       mcast_vlan_snooping 1 mcast_querier 1
>  # ip link set dev v1 master br
>  # bridge vlan add dev v1 vid 2
> 
>  # bridge vlan set dev v1 vid 1 mcast_max_groups 1
>  # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
>  # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
>  Error: bridge: Port-VLAN is already a member in mcast_max_groups (1) groups.
> 
>  # bridge link set dev v1 mcast_max_groups 1
>  # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
>  Error: bridge: Port is already a member in mcast_max_groups (1) groups.
> 
>  # bridge -d link show
>  5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
>      [...] mcast_n_groups 1 mcast_max_groups 1
> 
>  # bridge -d vlan show
>  port              vlan-id
>  br                1 PVID Egress Untagged
>                      state forwarding mcast_router 1
>  v1                1 PVID Egress Untagged
>                      [...] mcast_n_groups 1 mcast_max_groups 1
>                    2
>                      [...] mcast_n_groups 0 mcast_max_groups 0
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  include/uapi/linux/if_bridge.h |  2 +
>  include/uapi/linux/if_link.h   |  2 +
>  net/bridge/br_multicast.c      | 96 ++++++++++++++++++++++++++++++++++
>  net/bridge/br_netlink.c        | 19 ++++++-
>  net/bridge/br_private.h        | 16 +++++-
>  net/bridge/br_vlan.c           | 11 ++--
>  net/bridge/br_vlan_options.c   | 33 +++++++++++-
>  net/core/rtnetlink.c           |  2 +-
>  8 files changed, 173 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> index d9de241d90f9..d60c456710b3 100644
> --- a/include/uapi/linux/if_bridge.h
> +++ b/include/uapi/linux/if_bridge.h
> @@ -523,6 +523,8 @@ enum {
>  	BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
>  	BRIDGE_VLANDB_ENTRY_STATS,
>  	BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
> +	BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
> +	BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
>  	__BRIDGE_VLANDB_ENTRY_MAX,
>  };
>  #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 1021a7e47a86..1bed3a72939c 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -564,6 +564,8 @@ enum {
>  	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
>  	IFLA_BRPORT_LOCKED,
>  	IFLA_BRPORT_MAB,
> +	IFLA_BRPORT_MCAST_N_GROUPS,
> +	IFLA_BRPORT_MCAST_MAX_GROUPS,
>  	__IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index de531109b947..04261dd2380b 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -766,6 +766,102 @@ static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
>  	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
>  }
>  
> +static int
> +br_multicast_pmctx_ngroups_set_max(struct net_bridge_mcast_port *pmctx,
> +				   u32 max, struct netlink_ext_ack *extack)
> +{
> +	if (max && max < pmctx->mdb_n_entries) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Can't set mcast_max_groups=%u, which is below mcast_n_groups=%u",
> +				       max, pmctx->mdb_n_entries);

Why not? All new entries will be rejected anyway; at most, some will expire and make room.
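
To make the point concrete, here is a minimal userspace sketch (not the kernel code) of how admission would behave if the setter accepted a maximum below the current count. All names here (limit_model, model_can_add, model_set_max, model_add, model_expire) are made up for illustration, and the admission rule "reject once n >= max, unless max is 0" is an assumption based on the patch description rather than on this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a per-port (or per-port-VLAN) context. */
struct limit_model {
	uint32_t n_entries;   /* current number of MDB entries */
	uint32_t max_entries; /* 0 means "no limit" */
};

/* Assumed admission rule: refuse a new entry once the limit is reached. */
static bool model_can_add(const struct limit_model *ctx)
{
	return !ctx->max_entries || ctx->n_entries < ctx->max_entries;
}

/* Setter without the "max below current count" rejection. */
static void model_set_max(struct limit_model *ctx, uint32_t max)
{
	ctx->max_entries = max;
}

/* Add one entry if permitted; returns false when the limit rejects it. */
static bool model_add(struct limit_model *ctx)
{
	if (!model_can_add(ctx))
		return false;
	ctx->n_entries++;
	return true;
}

/* Expire one entry, which is the only way the count can shrink here. */
static void model_expire(struct limit_model *ctx)
{
	assert(ctx->n_entries > 0);
	ctx->n_entries--;
}
```

With five entries present and the maximum lowered to three, model_add() keeps failing until enough entries expire, so the count can only move toward the new limit and never away from it.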

> +		return -EINVAL;
> +	}
> +
> +	pmctx->mdb_max_entries = max;
> +	return 0;
> +}
> +
> +u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port)
> +{
> +	u32 n;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	n = port->multicast_ctx.mdb_n_entries;
> +	spin_unlock_bh(&port->br->multicast_lock);

This is too much just to read the value: we block all IGMP/MLD processing and potentially
block packet processing on the same core just to read it. These reads are done for notifications,
getlink and also for fill_slave_info. I think we can just use the WRITE_ONCE/READ_ONCE helpers to
access it, especially since the lock is taken for both values (max and current count). We still get a
snapshot that can be wrong by the time it's returned, and as for changing it, we'll start enforcing
the new limit with a minor delay, which is not a big deal.
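
Roughly what I have in mind, as a userspace sketch rather than a patch. The READ_ONCE()/WRITE_ONCE() macros below are simplified stand-ins for the kernel helpers (they assume GCC or Clang for __typeof__), and the struct and function names (mcast_ctx_model, model_ngroups_*) are hypothetical. The sketch also assumes that updates to the counter stay serialized by whatever already protects them, so only the readers become lockless.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's helpers. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical model of the two fields discussed above. */
struct mcast_ctx_model {
	uint32_t mdb_n_entries;
	uint32_t mdb_max_entries;
};

/* Lockless getters: each returns a single, untorn snapshot of the field. */
static uint32_t model_ngroups_get(const struct mcast_ctx_model *ctx)
{
	return READ_ONCE(ctx->mdb_n_entries);
}

static uint32_t model_ngroups_get_max(const struct mcast_ctx_model *ctx)
{
	return READ_ONCE(ctx->mdb_max_entries);
}

/* Writers pair every store with WRITE_ONCE() so readers never see a torn
 * value; the read-modify-write itself is assumed to be serialized. */
static void model_ngroups_inc(struct mcast_ctx_model *ctx)
{
	WRITE_ONCE(ctx->mdb_n_entries, ctx->mdb_n_entries + 1);
}

static void model_ngroups_dec(struct mcast_ctx_model *ctx)
{
	assert(ctx->mdb_n_entries > 0);
	WRITE_ONCE(ctx->mdb_n_entries, ctx->mdb_n_entries - 1);
}

static void model_ngroups_set_max(struct mcast_ctx_model *ctx, uint32_t max)
{
	WRITE_ONCE(ctx->mdb_max_entries, max);
}
```

The getters no longer exclude concurrent updates, so the value they return is only a snapshot, which is the same guarantee the locked version gives once the lock is dropped.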

> +
> +	return n;
> +}
> +
> +int br_multicast_vlan_ngroups_get(struct net_bridge *br,
> +				  const struct net_bridge_vlan *v,
> +				  u32 *n)
> +{
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
> +		return -EINVAL;
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	*n = v->port_mcast_ctx.mdb_n_entries;
> +	spin_unlock_bh(&br->multicast_lock);
> +

Ditto, and likewise for all the accesses below that take the lock.

> +	return 0;
> +}
> +
> +int br_multicast_port_ngroups_set_max(struct net_bridge_port *port, u32 max,
> +				      struct netlink_ext_ack *extack)
> +{
> +	int err;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	err = br_multicast_pmctx_ngroups_set_max(&port->multicast_ctx, max,
> +						 extack);
> +	spin_unlock_bh(&port->br->multicast_lock);
> +
> +	return err;
> +}
> +
> +int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
> +				      struct net_bridge_vlan *v, u32 max,
> +				      struct netlink_ext_ack *extack)
> +{
> +	int err;
> +
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
> +		NL_SET_ERR_MSG_MOD(extack, "Multicast snooping disabled on this VLAN");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	err = br_multicast_pmctx_ngroups_set_max(&v->port_mcast_ctx, max,
> +						 extack);
> +	spin_unlock_bh(&br->multicast_lock);
> +
> +	return err;
> +}
> +
> +u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port)
> +{
> +	u32 max;
> +
> +	spin_lock_bh(&port->br->multicast_lock);
> +	max = port->multicast_ctx.mdb_max_entries;
> +	spin_unlock_bh(&port->br->multicast_lock);
> +
> +	return max;
> +}
> +
> +int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
> +				      const struct net_bridge_vlan *v,
> +				      u32 *max)
> +{
> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
> +		return -EINVAL;
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	*max = v->port_mcast_ctx.mdb_max_entries;
> +	spin_unlock_bh(&br->multicast_lock);
> +
> +	return 0;
> +}
> +
>  static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
>  {
>  	struct net_bridge_port_group *pg;
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index a6133d469885..063c1646dfe8 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -202,6 +202,8 @@ static inline size_t br_port_info_size(void)
>  		+ nla_total_size_64bit(sizeof(u64)) /* IFLA_BRPORT_HOLD_TIMER */
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MULTICAST_ROUTER */
> +		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_N_GROUPS */
> +		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_MAX_GROUPS */
>  #endif
>  		+ nla_total_size(sizeof(u16))	/* IFLA_BRPORT_GROUP_FWD_MASK */
>  		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MRP_RING_OPEN */
> @@ -298,7 +300,11 @@ static int br_port_fill_attrs(struct sk_buff *skb,
>  	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
>  			p->multicast_eht_hosts_limit) ||
>  	    nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
> -			p->multicast_eht_hosts_cnt))
> +			p->multicast_eht_hosts_cnt) ||
> +	    nla_put_u32(skb, IFLA_BRPORT_MCAST_N_GROUPS,
> +			br_multicast_port_ngroups_get(p)) ||
> +	    nla_put_u32(skb, IFLA_BRPORT_MCAST_MAX_GROUPS,
> +			br_multicast_port_ngroups_get_max(p)))
>  		return -EMSGSIZE;
>  #endif
>  
> @@ -883,6 +889,8 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
>  	[IFLA_BRPORT_MAB] = { .type = NLA_U8 },
>  	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
>  	[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
> +	[IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT },
> +	[IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 },
>  };
>  
>  /* Change the state of the port and notify spanning tree */
> @@ -1017,6 +1025,15 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
>  		if (err)
>  			return err;
>  	}
> +
> +	if (tb[IFLA_BRPORT_MCAST_MAX_GROUPS]) {
> +		u32 max_groups;
> +
> +		max_groups = nla_get_u32(tb[IFLA_BRPORT_MCAST_MAX_GROUPS]);
> +		err = br_multicast_port_ngroups_set_max(p, max_groups, extack);
> +		if (err)
> +			return err;
> +	}
>  #endif
>  
>  	if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 49f411a0a1f1..86b7a221e806 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -978,6 +978,19 @@ void br_multicast_uninit_stats(struct net_bridge *br);
>  void br_multicast_get_stats(const struct net_bridge *br,
>  			    const struct net_bridge_port *p,
>  			    struct br_mcast_stats *dest);
> +u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port);
> +int br_multicast_vlan_ngroups_get(struct net_bridge *br,
> +				  const struct net_bridge_vlan *v,
> +				  u32 *n);
> +int br_multicast_port_ngroups_set_max(struct net_bridge_port *port,
> +				      u32 max, struct netlink_ext_ack *extack);
> +int br_multicast_vlan_ngroups_set_max(struct net_bridge *br,
> +				      struct net_bridge_vlan *v, u32 max,
> +				      struct netlink_ext_ack *extack);
> +u32 br_multicast_port_ngroups_get_max(const struct net_bridge_port *port);
> +int br_multicast_vlan_ngroups_get_max(struct net_bridge *br,
> +				      const struct net_bridge_vlan *v,
> +				      u32 *max);
>  void br_mdb_init(void);
>  void br_mdb_uninit(void);
>  void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
> @@ -1761,7 +1774,8 @@ static inline u16 br_vlan_flags(const struct net_bridge_vlan *v, u16 pvid)
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>  bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
>  			   const struct net_bridge_vlan *range_end);
> -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v);
> +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
> +		       const struct net_bridge_port *p);
>  size_t br_vlan_opts_nl_size(void);
>  int br_vlan_process_options(const struct net_bridge *br,
>  			    const struct net_bridge_port *p,
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index bc75fa1e4666..8a3dbc09ba38 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1816,6 +1816,7 @@ static bool br_vlan_stats_fill(struct sk_buff *skb,
>  /* v_opts is used to dump the options which must be equal in the whole range */
>  static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
>  			      const struct net_bridge_vlan *v_opts,
> +			      const struct net_bridge_port *p,
>  			      u16 flags,
>  			      bool dump_stats)
>  {
> @@ -1842,7 +1843,7 @@ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range,
>  		goto out_err;
>  
>  	if (v_opts) {
> -		if (!br_vlan_opts_fill(skb, v_opts))
> +		if (!br_vlan_opts_fill(skb, v_opts, p))
>  			goto out_err;
>  
>  		if (dump_stats && !br_vlan_stats_fill(skb, v_opts))
> @@ -1925,7 +1926,7 @@ void br_vlan_notify(const struct net_bridge *br,
>  		goto out_kfree;
>  	}
>  
> -	if (!br_vlan_fill_vids(skb, vid, vid_range, v, flags, false))
> +	if (!br_vlan_fill_vids(skb, vid, vid_range, v, p, flags, false))
>  		goto out_err;
>  
>  	nlmsg_end(skb, nlh);
> @@ -2030,7 +2031,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
>  
>  			if (!br_vlan_fill_vids(skb, range_start->vid,
>  					       range_end->vid, range_start,
> -					       vlan_flags, dump_stats)) {
> +					       p, vlan_flags, dump_stats)) {
>  				err = -EMSGSIZE;
>  				break;
>  			}
> @@ -2056,7 +2057,7 @@ static int br_vlan_dump_dev(const struct net_device *dev,
>  		else if (!dump_global &&
>  			 !br_vlan_fill_vids(skb, range_start->vid,
>  					    range_end->vid, range_start,
> -					    br_vlan_flags(range_start, pvid),
> +					    p, br_vlan_flags(range_start, pvid),
>  					    dump_stats))
>  			err = -EMSGSIZE;
>  	}
> @@ -2131,6 +2132,8 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] =
>  	[BRIDGE_VLANDB_ENTRY_STATE]	= { .type = NLA_U8 },
>  	[BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED },
>  	[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER]	= { .type = NLA_U8 },
> +	[BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS]	= { .type = NLA_REJECT },
> +	[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]	= { .type = NLA_U32 },
>  };
>  
>  static int br_vlan_rtm_process_one(struct net_device *dev,
> diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
> index a2724d03278c..43d8f11ce79c 100644
> --- a/net/bridge/br_vlan_options.c
> +++ b/net/bridge/br_vlan_options.c
> @@ -48,7 +48,8 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
>  	       curr_mc_rtr == range_mc_rtr;
>  }
>  
> -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
> +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
> +		       const struct net_bridge_port *p)
>  {
>  	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) ||
>  	    !__vlan_tun_put(skb, v))
> @@ -58,6 +59,20 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
>  	if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
>  		       br_vlan_multicast_router(v)))
>  		return false;
> +	if (p && !br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) {
> +		u32 mdb_max_entries;
> +		u32 mdb_n_entries;
> +
> +		if (br_multicast_vlan_ngroups_get(p->br, v, &mdb_n_entries) ||
> +		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
> +				mdb_n_entries))
> +			return false;
> +		if (br_multicast_vlan_ngroups_get_max(p->br, v,
> +						      &mdb_max_entries) ||
> +		    nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
> +				mdb_max_entries))
> +			return false;
> +	}
>  #endif
>  
>  	return true;
> @@ -70,6 +85,8 @@ size_t br_vlan_opts_nl_size(void)
>  	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_TINFO_ID */
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  	       + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_MCAST_ROUTER */
> +	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */
> +	       + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */
>  #endif
>  	       + 0;
>  }
> @@ -212,6 +229,20 @@ static int br_vlan_process_one_opts(const struct net_bridge *br,
>  			return err;
>  		*changed = true;
>  	}
> +	if (tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]) {
> +		u32 val;
> +
> +		if (!p) {
> +			NL_SET_ERR_MSG_MOD(extack, "Can't set mcast_max_groups for non-port vlans");
> +			return -EINVAL;
> +		}
> +
> +		val = nla_get_u32(tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]);
> +		err = br_multicast_vlan_ngroups_set_max(p->br, v, val, extack);
> +		if (err)
> +			return err;
> +		*changed = true;
> +	}
>  #endif
>  
>  	return 0;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 64289bc98887..e786255a8360 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -58,7 +58,7 @@
>  #include "dev.h"
>  
>  #define RTNL_MAX_TYPE		50
> -#define RTNL_SLAVE_MAX_TYPE	40
> +#define RTNL_SLAVE_MAX_TYPE	42
>  
>  struct rtnl_link {
>  	rtnl_doit_func		doit;


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 09/16] selftests: forwarding: Move IGMP- and MLD-related functions to lib
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:08     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:08 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> These functions will be helpful for other testsuites as well. Extract them
> to a common place.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  .../selftests/net/forwarding/bridge_mdb.sh    | 49 -------------------
>  tools/testing/selftests/net/forwarding/lib.sh | 49 +++++++++++++++++++
>  2 files changed, 49 insertions(+), 49 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 10/16] selftests: forwarding: bridge_mdb: Fix a typo
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:09     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:09 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Add the letter missing from the word "INCLUDE".
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  tools/testing/selftests/net/forwarding/bridge_mdb.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
> index 51f2b0d77067..4e16677f02ba 100755
> --- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
> +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
> @@ -1054,7 +1054,7 @@ ctrl_igmpv3_is_in_test()
>  
>  	bridge mdb del dev br0 port $swp1 grp 239.1.1.1 vid 10
>  
> -	log_test "IGMPv3 MODE_IS_INCLUE tests"
> +	log_test "IGMPv3 MODE_IS_INCLUDE tests"
>  }
>  
>  ctrl_mldv2_is_in_test()


Oops :)

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 11/16] selftests: forwarding: lib: Add helpers for IP address handling
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:09     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:09 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
> helpers to expand IPv4 and IPv6 addresses given as parameters in
> mausezahn payload notation. Add helpers that do it.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  tools/testing/selftests/net/forwarding/lib.sh | 37 +++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
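
For readers who have not used mausezahn payloads: the notation is colon-separated hex bytes, so expanding an address means printing its raw bytes that way. Below is a minimal C sketch of that idea only. It is not the shell helper added by this patch, and the function names (bytes_to_payload, ip_to_payload) are hypothetical; the address family is guessed from the presence of a ':' in the input, which is an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format len raw bytes as colon-separated hex pairs, e.g. "ef:01:01:01".
 * out must hold at least 3 * len bytes (two digits plus ':' or NUL each). */
static void bytes_to_payload(const unsigned char *bytes, size_t len,
			     char *out, size_t out_len)
{
	size_t i;

	assert(len > 0 && out_len >= 3 * len);
	for (i = 0; i < len; i++)
		snprintf(out + 3 * i, out_len - 3 * i, "%02x%s",
			 (unsigned int)bytes[i], i + 1 < len ? ":" : "");
}

/* Expand a textual IPv4 or IPv6 address into payload notation.
 * Returns 0 on success, -1 on a parse error or a too-small buffer. */
static int ip_to_payload(const char *addr, char *out, size_t out_len)
{
	int family = strchr(addr, ':') ? AF_INET6 : AF_INET;
	size_t len = family == AF_INET6 ? 16 : 4;
	unsigned char raw[16];

	if (out_len < 3 * len)
		return -1;
	if (inet_pton(family, addr, raw) != 1)
		return -1;
	bytes_to_payload(raw, len, out, out_len);
	return 0;
}
```

A 48-byte buffer is enough for either family: an IPv6 address expands to 16 hex pairs and 15 separators plus the terminating NUL.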

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 12/16] selftests: forwarding: lib: Add helpers for checksum handling
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:10     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:10 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
> helpers to calculate the packet checksum.
> 
> The approach presented in this patch revolves around payload templates
> for mausezahn. These are mausezahn-like payload strings (01:23:45:...)
> with possibly one 2-byte sequence replaced with the word PAYLOAD. The
> main function is payload_template_calc_checksum(), which calculates
> RFC 1071 checksum of the message. There are further helpers to then
> convert the checksum to the payload format, and to expand it.
> 
> For IPv6, MLDv2 message checksum is computed using a pseudoheader that
> differs from the header used in the payload itself. The fact that the
> two messages are different means that the checksum needs to be
> returned as a separate quantity, instead of being expanded in-place in
> the payload itself. Furthermore, the pseudoheader includes a length of
> the message. Much like the checksum, this needs to be expanded in
> mausezahn format. And likewise for number of addresses for (S,G)
> entries. Thus we have several places where a computed quantity needs
> to be presented in the payload format. Add a helper u16_to_bytes(),
> which will be used in all these cases.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  tools/testing/selftests/net/forwarding/lib.sh | 56 +++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
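
As an illustration of the two pieces the commit message describes, here is a
minimal sketch of an RFC 1071 checksum over a mausezahn-style payload string
and of a 16-bit-to-bytes conversion. The names payload_checksum and
u16_to_payload are hypothetical and are not the helpers added by the patch.
The sketch also assumes that any placeholder word in the template has already
been replaced by 00:00 before the checksum is computed.

```shell
# Hypothetical helper: print a 16-bit value as two colon-separated hex bytes.
u16_to_payload()
{
	printf '%02x:%02x\n' $(( ($1 >> 8) & 0xff )) $(( $1 & 0xff ))
}

# Hypothetical helper: RFC 1071 checksum of a payload such as "01:23:45:67".
# Prints the checksum as a decimal number.
payload_checksum()
{
	(
		IFS=:
		set -- $1
		sum=0
		while [ $# -gt 0 ]; do
			hi=$1
			lo=${2:-00}	# pad an odd-length payload with a zero byte
			sum=$(( sum + (0x$hi << 8) + 0x$lo ))
			if [ $# -ge 2 ]; then shift 2; else shift; fi
		done
		# Fold the carries back into the low 16 bits.
		while [ $(( sum >> 16 )) -ne 0 ]; do
			sum=$(( (sum & 0xffff) + (sum >> 16) ))
		done
		# The checksum is the one's complement of the folded sum.
		echo $(( ~sum & 0xffff ))
	)
}
```

For the payload 00:01:f2:03:f4:f5:f6:f7, payload_checksum prints 8717, and
feeding that value to u16_to_payload gives 22:0d.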



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 13/16] selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:10     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:10 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> In order to generate IGMPv3 and MLDv2 packets on the fly, the
> functions that generate these packets need to be able to generate
> packets for different groups and different sources. Generating MLDv2
> packets further needs the source address of the packet for purposes of
> checksum calculation. Add the necessary parameters, and generate the
> payload accordingly by dispatching to helpers added in the previous
> patches.
> 
> Adjust the sole client, bridge_mdb.sh, as well.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  .../selftests/net/forwarding/bridge_mdb.sh    |  9 ++---
>  tools/testing/selftests/net/forwarding/lib.sh | 36 +++++++++++++------
>  2 files changed, 31 insertions(+), 14 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 14/16] selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:11     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:11 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> The testsuite that checks for mcast_max_groups functionality will need
> to generate IGMP and MLD packets with configurable number of (S,G)
> addresses. To that end, further extend igmpv3_is_in_get() and
> mldv2_is_in_get() to allow a list of IP addresses instead of one
> address.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  tools/testing/selftests/net/forwarding/lib.sh | 22 +++++++++++++------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 15/16] selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:11     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:11 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> The testsuite that checks for mcast_max_groups functionality will need to
> wipe the added groups as well. Add helpers to build IGMP or MLD packets
> announcing that a host is leaving a given group.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  tools/testing/selftests/net/forwarding/lib.sh | 50 +++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 16/16] selftests: forwarding: bridge_mdb_max: Add a new selftest
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 10:12     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 10:12 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> Add a suite covering mcast_n_groups and mcast_max_groups bridge features.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  .../testing/selftests/net/forwarding/Makefile |   1 +
>  .../net/forwarding/bridge_mdb_max.sh          | 970 ++++++++++++++++++
>  2 files changed, 971 insertions(+)
>  create mode 100755 tools/testing/selftests/net/forwarding/bridge_mdb_max.sh
> 

Nice test coverage!
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries
  2023-01-29 10:07     ` [Bridge] " Nikolay Aleksandrov
@ 2023-01-29 14:58       ` Ido Schimmel
  -1 siblings, 0 replies; 90+ messages in thread
From: Ido Schimmel @ 2023-01-29 14:58 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev, bridge

Thanks for the review, Nik!

On Sun, Jan 29, 2023 at 12:07:31PM +0200, Nikolay Aleksandrov wrote:
> On 26/01/2023 19:01, Petr Machata wrote:
> > +static int
> > +br_multicast_pmctx_ngroups_set_max(struct net_bridge_mcast_port *pmctx,
> > +				   u32 max, struct netlink_ext_ack *extack)
> > +{
> > +	if (max && max < pmctx->mdb_n_entries) {
> > +		NL_SET_ERR_MSG_FMT_MOD(extack, "Can't set mcast_max_groups=%u, which is below mcast_n_groups=%u",
> > +				       max, pmctx->mdb_n_entries);
> 
> Why not? All new entries will be rejected anyway, at most some will expire and make room.

Looking at the code of the global limit ('mcast_hash_max') and also
testing it, I see that the above is not enforced there either, so doing
what you suggest will at least make the port and port-vlan limits
consistent with the global limit in this regard.

> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	pmctx->mdb_max_entries = max;
> > +	return 0;
> > +}
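
To make the semantics under discussion concrete, here is a small shell model
of a limit that can be set below the current count. It is only a sketch: the
variables and functions are hypothetical stand-ins for mdb_n_entries and
mdb_max_entries, not kernel code. One assumption worth calling out is that
once the maximum may be lower than the count, the "is it full" test has to be
"greater than or equal" rather than "equal", otherwise additions would slip
through.

```shell
# Hypothetical stand-ins for pmctx->mdb_n_entries and pmctx->mdb_max_entries.
n_entries=0
max_entries=0	# 0 means "no limit"

# Setting the maximum always succeeds, even below the current count.
set_max()
{
	max_entries=$1
}

# Adding an entry fails once the count has reached or passed the maximum.
add_entry()
{
	if [ "$max_entries" -ne 0 ] && [ "$n_entries" -ge "$max_entries" ]; then
		return 1
	fi
	n_entries=$((n_entries + 1))
}

# Entries that expire or are deleted make room again.
del_entry()
{
	if [ "$n_entries" -gt 0 ]; then
		n_entries=$((n_entries - 1))
	fi
}
```

With three entries present, "set_max 2" succeeds and leaves them in place;
add_entry then fails until two entries have gone away.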

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-26 17:01   ` [Bridge] " Petr Machata
@ 2023-01-29 16:55     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 16:55 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: bridge, Ido Schimmel

On 26/01/2023 19:01, Petr Machata wrote:
> The MDB maintained by the bridge is limited. When the bridge is configured
> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> capacity. In SW datapath, the capacity is configurable through the
> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> similar limit exists in the HW datapath for purposes of offloading.
> 
> In order to prevent the issue of unilateral exhaustion of MDB resources,
> introduce two parameters in each of two contexts:
> 
> - Per-port and per-port-VLAN number of MDB entries that the port
>   is a member of.
> 
> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>   no limit.
> 
> The per-port multicast context is used for tracking of MDB entries for the
> port as a whole. This is available for all bridges.
> 
> The per-port-VLAN multicast context is then only available on
> VLAN-filtering bridges on VLANs that have multicast snooping on.
> 
> With these changes in place, it will be possible to configure MDB limit for
> bridge as a whole, or any one port as a whole, or any single port-VLAN.
> 
> Note that unlike the global limit, exhaustion of the per-port and
> per-port-VLAN maximums does not cause disablement of multicast snooping.
> It is also permitted to configure the local limit larger than hash_max,
> even though that is not useful.
> 
> In this patch, introduce only the accounting for number of entries, and the
> max field itself, but not the means to toggle the max. The next patch
> introduces the netlink APIs to toggle and read the values.
> 
> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> snooping is enabled. The reason for this is that while VLAN snooping is
> disabled, permanent entries can be added above the limit imposed by the
> configured maximum. Under those circumstances, whatever caused the VLAN
> context enablement, would need to be rolled back, adding a fair amount of
> code that would be rarely hit and tricky to maintain. At the same time,
> the feature that this would enable is IMHO not interesting: I posit that
> the usefulness of keeping mcast_max_groups intact across
> mcast_vlan_snooping toggles is marginal at best.
> 

Hmm, I keep thinking about this one and I don't completely agree. It would be
more user-friendly if the max count didn't get reset when mcast snooping is
toggled. Imposing an order of operations (first enable snooping, then configure
max entries) isn't necessary, and it makes sense for someone to first set the
limit and then enable vlan snooping. It would also be consistent with port max
entries; I'd prefer that we have the same behaviour for port and vlan pmctxs.
If we allow setting any maximum at any time, we don't need to roll back
anything. We also already always look up vlans in
br_multicast_port_vid_to_port_ctx() to check if snooping is enabled, so we can
keep the count correct regardless, the same as is done for the ports. Keeping
both limits with consistent semantics seems better to me.

WDYT ?

> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
>  net/bridge/br_private.h   |   2 +
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 51b622afdb67..de531109b947 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -31,6 +31,7 @@
>  #include <net/ip6_checksum.h>
>  #include <net/addrconf.h>
>  #endif
> +#include <trace/events/bridge.h>
>  
>  #include "br_private.h"
>  #include "br_private_mcast_eht.h"
> @@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
>  	return pmctx;
>  }
>  
> +static struct net_bridge_mcast_port *
> +br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx = NULL;
> +	struct net_bridge_vlan *vlan;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
> +		return NULL;
> +
> +	/* Take RCU to access the vlan. */
> +	rcu_read_lock();
> +
> +	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
> +	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
> +		pmctx = &vlan->port_mcast_ctx;
> +
> +	rcu_read_unlock();
> +
> +	return pmctx;
> +}
> +
>  /* when snooping we need to check if the contexts should be used
>   * in the following order:
>   * - if pmctx is non-NULL (port), check if it should be used
> @@ -668,6 +692,80 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
>  	__br_multicast_del_group_src(src);
>  }
>  
> +static int
> +br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
> +				  struct netlink_ext_ack *extack)
> +{
> +	if (pmctx->mdb_max_entries &&
> +	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
> +		return -E2BIG;
> +
> +	pmctx->mdb_n_entries++;
> +	return 0;
> +}
> +
> +static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
> +{
> +	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
> +}
> +
> +static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
> +					 const struct br_ip *group,
> +					 struct netlink_ext_ack *extack)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +	int err;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	/* Always count on the port context. */
> +	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
> +				       port->multicast_ctx.mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		return err;
> +	}
> +
> +	/* Only count on the VLAN context if VID is given, and if snooping on
> +	 * that VLAN is enabled.
> +	 */
> +	if (!group->vid)
> +		return 0;
> +
> +	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
> +	if (!pmctx)
> +		return 0;
> +
> +	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
> +				       pmctx->mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		goto dec_one_out;
> +	}
> +
> +	return 0;
> +
> +dec_one_out:
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +	return err;
> +}
> +
> +static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (vid) {
> +		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
> +		if (pmctx)
> +			br_multicast_port_ngroups_dec_one(pmctx);
> +	}
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +}
> +
>  static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
>  {
>  	struct net_bridge_port_group *pg;
> @@ -702,6 +800,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
>  	} else {
>  		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
>  	}
> +	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
>  	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
>  	queue_work(system_long_wq, &br->mcast_gc_work);
>  
> @@ -1165,6 +1264,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
>  		return mp;
>  
>  	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
> +		trace_br_mdb_full(br->dev, group);
>  		br_mc_disabled_update(br->dev, false, NULL);
>  		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
>  		return ERR_PTR(-E2BIG);
> @@ -1288,11 +1388,16 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  			struct netlink_ext_ack *extack)
>  {
>  	struct net_bridge_port_group *p;
> +	int err;
> +
> +	err = br_multicast_port_ngroups_inc(port, group, extack);
> +	if (err)
> +		return NULL;
>  
>  	p = kzalloc(sizeof(*p), GFP_ATOMIC);
>  	if (unlikely(!p)) {
>  		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
> -		return NULL;
> +		goto dec_out;
>  	}
>  
>  	p->key.addr = *group;
> @@ -1326,18 +1431,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  
>  free_out:
>  	kfree(p);
> +dec_out:
> +	br_multicast_port_ngroups_dec(port, group->vid);
>  	return NULL;
>  }
>  
>  void br_multicast_del_port_group(struct net_bridge_port_group *p)
>  {
>  	struct net_bridge_port *port = p->key.port;
> +	__u16 vid = p->key.addr.vid;
>  
>  	hlist_del_init(&p->mglist);
>  	if (!br_multicast_is_star_g(&p->key.addr))
>  		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
>  				       br_sg_port_rht_params);
>  	kfree(p);
> +	br_multicast_port_ngroups_dec(port, vid);
>  }
>  
>  void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
> @@ -1951,6 +2060,26 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
>  		br_ip4_multicast_add_router(brmctx, pmctx);
>  		br_ip6_multicast_add_router(brmctx, pmctx);
>  	}
> +
> +	if (br_multicast_port_ctx_is_vlan(pmctx)) {
> +		struct net_bridge_port_group *pg;
> +
> +		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
> +		 * disabled, the mcast_n_groups counter is now wrong. First,
> +		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
> +		 * are flushed, thus mcast_n_groups after the toggle does not
> +		 * reflect the true values. And second, permanent entries added
> +		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
> +		 * either. Thus we have to refresh the counter.
> +		 */
> +
> +		pmctx->mdb_max_entries = 0;
> +		pmctx->mdb_n_entries = 0;
> +		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
> +			if (pg->key.addr.vid == pmctx->vlan->vid)
> +				br_multicast_port_ngroups_inc_one(pmctx, NULL);
> +		}
> +	}
>  }
>  
>  void br_multicast_enable_port(struct net_bridge_port *port)
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index e4069e27b5c6..49f411a0a1f1 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -126,6 +126,8 @@ struct net_bridge_mcast_port {
>  	struct hlist_node		ip6_rlist;
>  #endif /* IS_ENABLED(CONFIG_IPV6) */
>  	unsigned char			multicast_router;
> +	u32				mdb_n_entries;
> +	u32				mdb_max_entries;
>  #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
>  };
>  
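
For reference, the two-level accounting in the quoted patch (count on the
port context first, then on the port-VLAN context, and undo the port
increment if the VLAN context is full) can be modelled in a few lines of
shell. This is a sketch for one port and one VLAN only; all names are
hypothetical, and it uses a "greater than or equal" comparison where the
posted patch compares for equality.

```shell
# Hypothetical model of one port and one VLAN on that port.
port_n=0; port_max=0	# 0 means "no limit"
vlan_n=0; vlan_max=0
vlan_snooping=1		# 1 when snooping is enabled for the VLAN

# $1 is "vlan" when the entry carries a VID, anything else otherwise.
ngroups_inc()
{
	# Always count on the port context.
	if [ "$port_max" -ne 0 ] && [ "$port_n" -ge "$port_max" ]; then
		return 1
	fi
	port_n=$((port_n + 1))

	# Only count on the VLAN context if a VID is given and snooping is on.
	if [ "$1" = vlan ] && [ "$vlan_snooping" -eq 1 ]; then
		if [ "$vlan_max" -ne 0 ] && [ "$vlan_n" -ge "$vlan_max" ]; then
			port_n=$((port_n - 1))	# roll back the port increment
			return 1
		fi
		vlan_n=$((vlan_n + 1))
	fi
	return 0
}

ngroups_dec()
{
	if [ "$1" = vlan ] && [ "$vlan_snooping" -eq 1 ] && [ "$vlan_n" -gt 0 ]; then
		vlan_n=$((vlan_n - 1))
	fi
	if [ "$port_n" -gt 0 ]; then
		port_n=$((port_n - 1))
	fi
}
```

The rollback is the part that keeps the port counter honest: a rejected
port-VLAN addition must leave both counters exactly as they were.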


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Bridge] [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
@ 2023-01-29 16:55     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-29 16:55 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev
  Cc: Ido Schimmel, bridge

On 26/01/2023 19:01, Petr Machata wrote:
> The MDB maintained by the bridge is limited. When the bridge is configured
> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> capacity. In SW datapath, the capacity is configurable through the
> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> similar limit exists in the HW datapath for purposes of offloading.
> 
> In order to prevent the issue of unilateral exhaustion of MDB resources,
> introduce two parameters in each of two contexts:
> 
> - Per-port and per-port-VLAN number of MDB entries that the port
>   is a member of.
> 
> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>   no limit.
> 
> The per-port multicast context is used for tracking of MDB entries for the
> port as a whole. This is available for all bridges.
> 
> The per-port-VLAN multicast context is then only available on
> VLAN-filtering bridges on VLANs that have multicast snooping on.
> 
> With these changes in place, it will be possible to configure MDB limit for
> bridge as a whole, or any one port as a whole, or any single port-VLAN.
> 
> Note that unlike the global limit, exhaustion of the per-port and
> per-port-VLAN maximums does not cause disablement of multicast snooping.
> It is also permitted to configure the local limit larger than hash_max,
> even though that is not useful.
> 
> In this patch, introduce only the accounting for number of entries, and the
> max field itself, but not the means to toggle the max. The next patch
> introduces the netlink APIs to toggle and read the values.
> 
> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> snooping is enabled. The reason for this is that while VLAN snooping is
> disabled, permanent entries can be added above the limit imposed by the
> configured maximum. Under those circumstances, whatever caused the VLAN
> context enablement, would need to be rolled back, adding a fair amount of
> code that would be rarely hit and tricky to maintain. At the same time,
> the feature that this would enable is IMHO not interesting: I posit that
> the usefulness of keeping mcast_max_groups intact across
> mcast_vlan_snooping toggles is marginal at best.
> 

Hmm, I keep thinking about this one and I don't completely agree. It would be
more user-friendly if the max count didn't get reset when mcast snooping is
toggled. Imposing an order of operations (first enable snooping, then configure
max entries) isn't necessary, and it makes sense for someone to first set the
limit and then enable vlan snooping. It would also be consistent with port max
entries; I'd prefer that we have the same behaviour for port and vlan pmctxs.
If we allow setting any maximum at any time, we don't need to roll back
anything. We also already always look up vlans in
br_multicast_port_vid_to_port_ctx() to check if snooping is enabled, so we can
keep the count correct regardless, the same as is done for the ports. Keeping
both limits with consistent semantics seems better to me.

WDYT ?

> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
>  net/bridge/br_private.h   |   2 +
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 51b622afdb67..de531109b947 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -31,6 +31,7 @@
>  #include <net/ip6_checksum.h>
>  #include <net/addrconf.h>
>  #endif
> +#include <trace/events/bridge.h>
>  
>  #include "br_private.h"
>  #include "br_private_mcast_eht.h"
> @@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
>  	return pmctx;
>  }
>  
> +static struct net_bridge_mcast_port *
> +br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx = NULL;
> +	struct net_bridge_vlan *vlan;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
> +		return NULL;
> +
> +	/* Take RCU to access the vlan. */
> +	rcu_read_lock();
> +
> +	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
> +	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
> +		pmctx = &vlan->port_mcast_ctx;
> +
> +	rcu_read_unlock();
> +
> +	return pmctx;
> +}
> +
>  /* when snooping we need to check if the contexts should be used
>   * in the following order:
>   * - if pmctx is non-NULL (port), check if it should be used
> @@ -668,6 +692,80 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
>  	__br_multicast_del_group_src(src);
>  }
>  
> +static int
> +br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
> +				  struct netlink_ext_ack *extack)
> +{
> +	if (pmctx->mdb_max_entries &&
> +	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
> +		return -E2BIG;
> +
> +	pmctx->mdb_n_entries++;
> +	return 0;
> +}
> +
> +static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
> +{
> +	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
> +}
> +
> +static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
> +					 const struct br_ip *group,
> +					 struct netlink_ext_ack *extack)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +	int err;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	/* Always count on the port context. */
> +	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
> +				       port->multicast_ctx.mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		return err;
> +	}
> +
> +	/* Only count on the VLAN context if VID is given, and if snooping on
> +	 * that VLAN is enabled.
> +	 */
> +	if (!group->vid)
> +		return 0;
> +
> +	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
> +	if (!pmctx)
> +		return 0;
> +
> +	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
> +				       pmctx->mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		goto dec_one_out;
> +	}
> +
> +	return 0;
> +
> +dec_one_out:
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +	return err;
> +}
> +
> +static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (vid) {
> +		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
> +		if (pmctx)
> +			br_multicast_port_ngroups_dec_one(pmctx);
> +	}
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +}
> +
>  static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
>  {
>  	struct net_bridge_port_group *pg;
> @@ -702,6 +800,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
>  	} else {
>  		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
>  	}
> +	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
>  	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
>  	queue_work(system_long_wq, &br->mcast_gc_work);
>  
> @@ -1165,6 +1264,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
>  		return mp;
>  
>  	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
> +		trace_br_mdb_full(br->dev, group);
>  		br_mc_disabled_update(br->dev, false, NULL);
>  		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
>  		return ERR_PTR(-E2BIG);
> @@ -1288,11 +1388,16 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  			struct netlink_ext_ack *extack)
>  {
>  	struct net_bridge_port_group *p;
> +	int err;
> +
> +	err = br_multicast_port_ngroups_inc(port, group, extack);
> +	if (err)
> +		return NULL;
>  
>  	p = kzalloc(sizeof(*p), GFP_ATOMIC);
>  	if (unlikely(!p)) {
>  		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
> -		return NULL;
> +		goto dec_out;
>  	}
>  
>  	p->key.addr = *group;
> @@ -1326,18 +1431,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  
>  free_out:
>  	kfree(p);
> +dec_out:
> +	br_multicast_port_ngroups_dec(port, group->vid);
>  	return NULL;
>  }
>  
>  void br_multicast_del_port_group(struct net_bridge_port_group *p)
>  {
>  	struct net_bridge_port *port = p->key.port;
> +	__u16 vid = p->key.addr.vid;
>  
>  	hlist_del_init(&p->mglist);
>  	if (!br_multicast_is_star_g(&p->key.addr))
>  		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
>  				       br_sg_port_rht_params);
>  	kfree(p);
> +	br_multicast_port_ngroups_dec(port, vid);
>  }
>  
>  void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
> @@ -1951,6 +2060,26 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
>  		br_ip4_multicast_add_router(brmctx, pmctx);
>  		br_ip6_multicast_add_router(brmctx, pmctx);
>  	}
> +
> +	if (br_multicast_port_ctx_is_vlan(pmctx)) {
> +		struct net_bridge_port_group *pg;
> +
> +		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
> +		 * disabled, the mcast_n_groups counter is now wrong. First,
> +		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
> +		 * are flushed, thus mcast_n_groups after the toggle does not
> +		 * reflect the true values. And second, permanent entries added
> +		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
> +		 * either. Thus we have to refresh the counter.
> +		 */
> +
> +		pmctx->mdb_max_entries = 0;
> +		pmctx->mdb_n_entries = 0;
> +		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
> +			if (pg->key.addr.vid == pmctx->vlan->vid)
> +				br_multicast_port_ngroups_inc_one(pmctx, NULL);
> +		}
> +	}
>  }
>  
>  void br_multicast_enable_port(struct net_bridge_port *port)
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index e4069e27b5c6..49f411a0a1f1 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -126,6 +126,8 @@ struct net_bridge_mcast_port {
>  	struct hlist_node		ip6_rlist;
>  #endif /* IS_ENABLED(CONFIG_IPV6) */
>  	unsigned char			multicast_router;
> +	u32				mdb_n_entries;
> +	u32				mdb_max_entries;
>  #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
>  };
>  


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-29 16:55     ` [Bridge] " Nikolay Aleksandrov
@ 2023-01-30  8:08       ` Ido Schimmel
  -1 siblings, 0 replies; 90+ messages in thread
From: Ido Schimmel @ 2023-01-30  8:08 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev, bridge

On Sun, Jan 29, 2023 at 06:55:26PM +0200, Nikolay Aleksandrov wrote:
> On 26/01/2023 19:01, Petr Machata wrote:
> > The MDB maintained by the bridge is limited. When the bridge is configured
> > for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> > capacity. In SW datapath, the capacity is configurable through the
> > IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> > similar limit exists in the HW datapath for purposes of offloading.
> > 
> > In order to prevent the issue of unilateral exhaustion of MDB resources,
> > introduce two parameters in each of two contexts:
> > 
> > - Per-port and per-port-VLAN number of MDB entries that the port
> >   is member in.
> > 
> > - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
> >   per-port-VLAN maximum permitted number of MDB entries, or 0 for
> >   no limit.
> > 
> > The per-port multicast context is used for tracking of MDB entries for the
> > port as a whole. This is available for all bridges.
> > 
> > The per-port-VLAN multicast context is then only available on
> > VLAN-filtering bridges on VLANs that have multicast snooping on.
> > 
> > With these changes in place, it will be possible to configure MDB limit for
> > bridge as a whole, or any one port as a whole, or any single port-VLAN.
> > 
> > Note that unlike the global limit, exhaustion of the per-port and
> > per-port-VLAN maximums does not cause disablement of multicast snooping.
> > It is also permitted to configure the local limit larger than hash_max,
> > even though that is not useful.
> > 
> > In this patch, introduce only the accounting for number of entries, and the
> > max field itself, but not the means to toggle the max. The next patch
> > introduces the netlink APIs to toggle and read the values.
> > 
> > Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> > snooping is enabled. The reason for this is that while VLAN snooping is
> > disabled, permanent entries can be added above the limit imposed by the
> > configured maximum. Under those circumstances, whatever caused the VLAN
> > context enablement, would need to be rolled back, adding a fair amount of
> > code that would be rarely hit and tricky to maintain. At the same time,
> > the feature that this would enable is IMHO not interesting: I posit that
> > the usefulness of keeping mcast_max_groups intact across
> > mcast_vlan_snooping toggles is marginal at best.
> > 
> 
> Hmm, I keep thinking about this one and I don't completely agree. It would be
> more user-friendly if the max count doesn't get reset when mcast snooping is toggled.
> Imposing order of operations (first enable snooping, then config max entries) isn't necessary
> and it makes sense for someone to first set the limit and then enable vlan snooping.
> Also it would be consistent with port max entries, I'd prefer if we have the same
> behaviour for port and vlan pmctxs. If we allow to set any maximum at any time we
> don't need to rollback anything, also we already always lookup vlans in br_multicast_port_vid_to_port_ctx()
> to check if snooping is enabled so we can keep the count correct regardless, the same as
> it's done for the ports. Keeping both limits with consistent semantics seems better to me.
> 
> WDYT ?

The current approach is strict and prevents user space from performing
configuration that does not make a lot of sense:

1. Setting the maximum to be less than the current count.

2. Increasing the port-VLAN count above port-VLAN maximum when VLAN
snooping is disabled (i.e., maximum is not enforced) and then enabling
VLAN snooping.

However, it is not consistent with similar existing behavior where the
kernel is more liberal. For example:

1. It is possible to set the global maximum to be less than the current
number of entries.

2. Other port-VLAN attributes are not reset when VLAN snooping is
toggled.

And it also results in order of operations problems like you described.

So, it seems to me that we have more good reasons to not reset the
maximum than to reset it. Regardless of which path we take, it is
important to document the behavior in the man page (and in the commit
message, obviously) to avoid "bug reports" later on.
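
To make the more liberal semantics concrete, here is a minimal userspace
sketch. It is not the patch code: struct mdb_ctx and the mdb_ctx_*()
helpers are hypothetical names, and only the two field names are taken
from the patch. It assumes the maximum can be set at any time, including
below the current count, and that admission therefore compares with >=
instead of ==.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model of one multicast context; field names mirror the patch. */
struct mdb_ctx {
	uint32_t mdb_n_entries;   /* current number of entries */
	uint32_t mdb_max_entries; /* 0 means "no limit" */
};

/* Liberal setter: any maximum is accepted, even one below the count. */
static void mdb_ctx_set_max(struct mdb_ctx *ctx, uint32_t max)
{
	ctx->mdb_max_entries = max;
}

/* Use >= so that a count already above the maximum still rejects. */
static int mdb_ctx_inc(struct mdb_ctx *ctx)
{
	if (ctx->mdb_max_entries &&
	    ctx->mdb_n_entries >= ctx->mdb_max_entries)
		return -E2BIG;

	ctx->mdb_n_entries++;
	return 0;
}

static void mdb_ctx_dec(struct mdb_ctx *ctx)
{
	assert(ctx->mdb_n_entries > 0); /* stands in for WARN_ON_ONCE() */
	ctx->mdb_n_entries--;
}
```

With three entries present and the maximum then set to 2, further adds
are rejected until enough entries go away to bring the count below 2.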

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-30  8:08       ` [Bridge] " Ido Schimmel
@ 2023-01-30  8:56         ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 90+ messages in thread
From: Nikolay Aleksandrov @ 2023-01-30  8:56 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev, bridge

On 30/01/2023 10:08, Ido Schimmel wrote:
> On Sun, Jan 29, 2023 at 06:55:26PM +0200, Nikolay Aleksandrov wrote:
>> On 26/01/2023 19:01, Petr Machata wrote:
>>> The MDB maintained by the bridge is limited. When the bridge is configured
>>> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
>>> capacity. In SW datapath, the capacity is configurable through the
>>> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
>>> similar limit exists in the HW datapath for purposes of offloading.
>>>
>>> In order to prevent the issue of unilateral exhaustion of MDB resources,
>>> introduce two parameters in each of two contexts:
>>>
>>> - Per-port and per-port-VLAN number of MDB entries that the port
>>>   is member in.
>>>
>>> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>>>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>>>   no limit.
>>>
>>> The per-port multicast context is used for tracking of MDB entries for the
>>> port as a whole. This is available for all bridges.
>>>
>>> The per-port-VLAN multicast context is then only available on
>>> VLAN-filtering bridges on VLANs that have multicast snooping on.
>>>
>>> With these changes in place, it will be possible to configure MDB limit for
>>> bridge as a whole, or any one port as a whole, or any single port-VLAN.
>>>
>>> Note that unlike the global limit, exhaustion of the per-port and
>>> per-port-VLAN maximums does not cause disablement of multicast snooping.
>>> It is also permitted to configure the local limit larger than hash_max,
>>> even though that is not useful.
>>>
>>> In this patch, introduce only the accounting for number of entries, and the
>>> max field itself, but not the means to toggle the max. The next patch
>>> introduces the netlink APIs to toggle and read the values.
>>>
>>> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
>>> snooping is enabled. The reason for this is that while VLAN snooping is
>>> disabled, permanent entries can be added above the limit imposed by the
>>> configured maximum. Under those circumstances, whatever caused the VLAN
>>> context enablement, would need to be rolled back, adding a fair amount of
>>> code that would be rarely hit and tricky to maintain. At the same time,
>>> the feature that this would enable is IMHO not interesting: I posit that
>>> the usefulness of keeping mcast_max_groups intact across
>>> mcast_vlan_snooping toggles is marginal at best.
>>>
>>
>> Hmm, I keep thinking about this one and I don't completely agree. It would be
>> more user-friendly if the max count doesn't get reset when mcast snooping is toggled.
>> Imposing order of operations (first enable snooping, then config max entries) isn't necessary
>> and it makes sense for someone to first set the limit and then enable vlan snooping.
>> Also it would be consistent with port max entries, I'd prefer if we have the same
>> behaviour for port and vlan pmctxs. If we allow to set any maximum at any time we
>> don't need to rollback anything, also we already always lookup vlans in br_multicast_port_vid_to_port_ctx()
>> to check if snooping is enabled so we can keep the count correct regardless, the same as
>> it's done for the ports. Keeping both limits with consistent semantics seems better to me.
>>
>> WDYT ?
> 
> The current approach is strict and prevents user space from performing
> configuration that does not make a lot of sense:
> 
> 1. Setting the maximum to be less than the current count.
> 
> 2. Increasing the port-VLAN count above port-VLAN maximum when VLAN
> snooping is disabled (i.e., maximum is not enforced) and then enabling
> VLAN snooping.
> 
> However, it is not consistent with similar existing behavior where the
> kernel is more liberal. For example:
> 
> 1. It is possible to set the global maximum to be less than the current
> number of entries.
> 
> 2. Other port-VLAN attributes are not reset when VLAN snooping is
> toggled.
> 

Right, 2) is my main concern and could be surprising. I'd also like to
have consistent behaviour for both limits - port and vlan.

> And it also results in order of operations problems like you described.
> 
> So, it seems to me that we have more good reasons to not reset the
> maximum than to reset it. Regardless of which path we take, it is
> important to document the behavior in the man page (and in the commit
> message, obviously) to avoid "bug reports" later on.

+1
Absolutely agree.

Thanks,
 Nik


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries
  2023-01-29 10:07     ` [Bridge] " Nikolay Aleksandrov
@ 2023-01-30 11:07       ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-30 11:07 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev, bridge, Ido Schimmel


Nikolay Aleksandrov <razor@blackwall.org> writes:

> On 26/01/2023 19:01, Petr Machata wrote:
>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>> index de531109b947..04261dd2380b 100644
>> --- a/net/bridge/br_multicast.c
>> +++ b/net/bridge/br_multicast.c
>> @@ -766,6 +766,102 @@ static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
>>  	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
>>  }
>>  
>> +static int
>> +br_multicast_pmctx_ngroups_set_max(struct net_bridge_mcast_port *pmctx,
>> +				   u32 max, struct netlink_ext_ack *extack)
>> +{
>> +	if (max && max < pmctx->mdb_n_entries) {
>> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Can't set mcast_max_groups=%u, which is below mcast_n_groups=%u",
>> +				       max, pmctx->mdb_n_entries);
>
> Why not? All new entries will be rejected anyway, at most some will expire and make room.

Yeah, as I wrote in the other thread, I can relax the relationship
between max and n.

>> +		return -EINVAL;
>> +	}
>> +
>> +	pmctx->mdb_max_entries = max;
>> +	return 0;
>> +}
>> +
>> +u32 br_multicast_port_ngroups_get(const struct net_bridge_port *port)
>> +{
>> +	u32 n;
>> +
>> +	spin_lock_bh(&port->br->multicast_lock);
>> +	n = port->multicast_ctx.mdb_n_entries;
>> +	spin_unlock_bh(&port->br->multicast_lock);
>
> This is too much just to read the value: we block all IGMP/MLD processing and potentially
> block packet processing on the same core just to read it. These reads are done for notifications,
> getlink and also for fill_slave_info. I think we can just use WRITE/READ_ONCE helpers to access
> it, especially since the lock is taken for both values (max and current count). We still get a
> snapshot that can be wrong by the time it's returned, and as for changing it, we'll start enforcing
> the new limit with a minor delay, which is not a big deal.

Makes sense.

>> +
>> +	return n;
>> +}
>> +
>> +int br_multicast_vlan_ngroups_get(struct net_bridge *br,
>> +				  const struct net_bridge_vlan *v,
>> +				  u32 *n)
>> +{
>> +	if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx))
>> +		return -EINVAL;
>> +
>> +	spin_lock_bh(&br->multicast_lock);
>> +	*n = v->port_mcast_ctx.mdb_n_entries;
>> +	spin_unlock_bh(&br->multicast_lock);
>> +
>
> ditto and for all accesses below that require the lock..

Yah.
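
For reference, a minimal userspace sketch of the lockless access pattern
suggested above. The READ_ONCE()/WRITE_ONCE() macros here are simplified
stand-ins built on a volatile cast and the GCC/Clang __typeof__ extension,
not the kernel definitions. struct mcast_ctx and the mcast_ctx_*() helpers
are hypothetical; in the kernel the writers would still run under
multicast_lock and only the readers would go lockless.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct mcast_ctx {
	uint32_t mdb_n_entries;
	uint32_t mdb_max_entries;
};

/* Writer side: assumed to be serialized by a lock held by the caller. */
static void mcast_ctx_inc(struct mcast_ctx *ctx)
{
	WRITE_ONCE(ctx->mdb_n_entries, ctx->mdb_n_entries + 1);
}

static void mcast_ctx_dec(struct mcast_ctx *ctx)
{
	assert(ctx->mdb_n_entries > 0); /* stands in for WARN_ON_ONCE() */
	WRITE_ONCE(ctx->mdb_n_entries, ctx->mdb_n_entries - 1);
}

static void mcast_ctx_set_max(struct mcast_ctx *ctx, uint32_t max)
{
	WRITE_ONCE(ctx->mdb_max_entries, max);
}

/* Reader side: no lock taken; the result is a possibly stale snapshot. */
static uint32_t mcast_ctx_ngroups_get(const struct mcast_ctx *ctx)
{
	return READ_ONCE(ctx->mdb_n_entries);
}

static uint32_t mcast_ctx_max_get(const struct mcast_ctx *ctx)
{
	return READ_ONCE(ctx->mdb_max_entries);
}
```

The getters never block IGMP/MLD processing. The trade-off is the one
described in the review: the value may already be out of date when it is
reported, and a new maximum takes effect with a small delay.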

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
  2023-01-29 16:55     ` [Bridge] " Nikolay Aleksandrov
@ 2023-01-30 15:02       ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-30 15:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, netdev, bridge, Ido Schimmel


Nikolay Aleksandrov <razor@blackwall.org> writes:

> On 26/01/2023 19:01, Petr Machata wrote:
>> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
>> snooping is enabled. The reason for this is that while VLAN snooping is
>> disabled, permanent entries can be added above the limit imposed by the
>> configured maximum. Under those circumstances, whatever caused the VLAN
>> context enablement, would need to be rolled back, adding a fair amount of
>> code that would be rarely hit and tricky to maintain. At the same time,
>> the feature that this would enable is IMHO not interesting: I posit that
>> the usefulness of keeping mcast_max_groups intact across
>> mcast_vlan_snooping toggles is marginal at best.
>> 
>
> Hmm, I keep thinking about this one and I don't completely agree. It
> would be more user-friendly if the max count doesn't get reset when
> mcast snooping is toggled. Imposing order of operations (first enable
> snooping, then config max entries) isn't necessary and it makes sense
> for someone to first set the limit and then enable vlan snooping.

If you are talking about mcast_snooping, that can be disabled while
mcast_vlan_snooping is enabled. So you can configure everything, then
turn snooping on.

If you are talking about configuring max while mcast_vlan_snooping is
off, then I assumed one shouldn't touch the VLAN context if
br_multicast_port_ctx_vlan_disabled(). So we would need to track the n
and max in some other entity than in the multicast context. But maybe
I'm wrong.

> Also it would be consistent with port max entries, I'd prefer if we
> have the same behaviour for port and vlan pmctxs. If we allow to set
> any maximum at any time we don't need to rollback anything, also we
> already always lookup vlans in br_multicast_port_vid_to_port_ctx() to
> check if snooping is enabled so we can keep the count correct
> regardless, the same as it's done for the ports. Keeping both limits
> with consistent semantics seems better to me.

The idea of requiring max >= current felt so natural to me that I didn't
even check what mcast_hash_max was doing. Sure -- let's be consistent.
This will incidentally make all the rollbacks go away, and happily makes
sense WRT locking, too: since the relation between max and n is somewhat
loose, we don't need to worry too much about sequencing inc-/dec-n vs.
set-max.
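
A rough userspace sketch of what this could look like when a port-VLAN
context is enabled, assuming the count is recomputed from the port's
group list while the configured maximum is left alone. This is only an
illustration of the direction discussed here, not code from the patchset:
struct port_group, struct vlan_ctx and vlan_ctx_refresh() are hypothetical
names, and an array stands in for the port's mglist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a port group entry; only the VLAN ID matters here. */
struct port_group {
	uint16_t vid;
};

/* Stand-in for a per-port-VLAN multicast context. */
struct vlan_ctx {
	uint16_t vid;
	uint32_t mdb_n_entries;
	uint32_t mdb_max_entries; /* 0 means "no limit" */
};

/* Recount entries for this VLAN; the configured maximum is not touched. */
static void vlan_ctx_refresh(struct vlan_ctx *ctx,
			     const struct port_group *groups,
			     size_t n_groups)
{
	uint32_t n = 0;
	size_t i;

	assert(groups || n_groups == 0);

	for (i = 0; i < n_groups; i++) {
		if (groups[i].vid == ctx->vid)
			n++;
	}

	/* The result may exceed mdb_max_entries; that is permitted. */
	ctx->mdb_n_entries = n;
}
```

Because the count is allowed to exceed the maximum, the refresh cannot
fail and nothing has to be rolled back; new entries are simply rejected
until the count drops below the limit again.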

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows
  2023-01-26 17:53     ` [Bridge] " Steven Rostedt
@ 2023-01-30 15:50       ` Petr Machata
  -1 siblings, 0 replies; 90+ messages in thread
From: Petr Machata @ 2023-01-30 15:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, netdev, bridge,
	Ido Schimmel, linux-trace-kernel


Steven Rostedt <rostedt@goodmis.org> writes:

> On Thu, 26 Jan 2023 18:01:14 +0100
> Petr Machata <petrm@nvidia.com> wrote:
>
>> +	TP_printk("dev %s af %u src %pI4/%pI6c grp %pI4/%pI6c/%pM vid %u",
>> +		  __get_str(dev), __entry->af, __entry->src4, __entry->src6,
>> +		  __entry->grp4, __entry->grp6, __entry->grpmac, __entry->vid)
>
> And just have: 
>
> 	TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u",
> 		  __get_str(dev), __entry->af, __entry->src, __entry->grp,
> 		  __entry->grpmac, __entry->vid)
>
> As the %pI6c should detect that it's an IPv4 address and show that.

This means the IP addresses will always be IPv6, even for IPv4 MDB
entries. One can still figure out the true protocol from the address
family field, but it might not be obvious. Plus the IPv4-mapped IPv6
addresses are not really formatted as IPv4, though yeah, IPv4 notation
is embedded in that.
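
To make the formatting concern concrete, here is a minimal userspace sketch of what an IPv4 address looks like once it is stored as an IPv4-mapped IPv6 address and printed as IPv6. It uses the libc `inet_pton()` / `inet_ntop()` calls as a stand-in; the kernel's %pI6c is a separate implementation, so its exact output is not shown here. The helper names `v4_to_mapped()` and `format_mapped()` are made up for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Store an IPv4 address (network byte order) as an IPv4-mapped IPv6
 * address, ::ffff:a.b.c.d, so that one 16-byte field can carry either
 * address family.
 */
static void v4_to_mapped(const struct in_addr *v4, struct in6_addr *v6)
{
	memset(v6, 0, sizeof(*v6));
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, sizeof(v4->s_addr));
}

/*
 * Parse dotted-quad text, map it to IPv6 and format the result as IPv6
 * text. Returns 0 on success, -1 on a parse or formatting error.
 */
static int format_mapped(const char *v4_text, char *buf, socklen_t len)
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, v4_text, &v4) != 1)
		return -1;
	v4_to_mapped(&v4, &v6);
	return inet_ntop(AF_INET6, &v6, buf, len) ? 0 : -1;
}
```

With glibc, `format_mapped("192.0.2.1", buf, sizeof(buf))` leaves `::ffff:192.0.2.1` in `buf`: the dotted quad is visible, but the string as a whole is an IPv6 address, which is the ambiguity described above.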

All the information is still there, but... scrambled? Not sure the
reduction in clarity is worth the 8 bytes that we save. The tracepoint
is unlikely to trigger often.

What say you?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows
  2023-01-30 15:50       ` [Bridge] " Petr Machata
@ 2023-01-30 23:23         ` Steven Rostedt
  -1 siblings, 0 replies; 90+ messages in thread
From: Steven Rostedt @ 2023-01-30 23:23 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, netdev, bridge, Ido Schimmel,
	linux-trace-kernel

On Mon, 30 Jan 2023 16:50:32 +0100
Petr Machata <petrm@nvidia.com> wrote:

> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Thu, 26 Jan 2023 18:01:14 +0100
> > Petr Machata <petrm@nvidia.com> wrote:
> >  
> >> +	TP_printk("dev %s af %u src %pI4/%pI6c grp %pI4/%pI6c/%pM vid %u",
> >> +		  __get_str(dev), __entry->af, __entry->src4, __entry->src6,
> >> +		  __entry->grp4, __entry->grp6, __entry->grpmac, __entry->vid)  
> >
> > And just have: 
> >
> > 	TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u",
> > 		  __get_str(dev), __entry->af, __entry->src, __entry->grp,
> > 		  __entry->grpmac, __entry->vid)
> >
> > As the %pI6c should detect that it's an IPv4 address and show that.
> 
> This means the IP addresses will always be IPv6, even for IPv4 MDB
> entries. One can still figure out the true protocol from the address
> family field, but it might not be obvious. Plus the IPv4-mapped IPv6
> addresses are not really formatted as IPv4, though yeah, IPv4 notation
> is embedded in that.
> 
> All the information is still there, but... scrambled? Not sure the
> reduction in clarity is worth the 8 bytes that we save. The tracepoint
> is unlikely to trigger often.

8 bytes per event, and yes, ring buffer real estate is expensive.
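
The 8-byte figure can be checked with a small sketch. The two structs below are assumptions reconstructed from the field names in the two TP_printk() variants quoted above (4-byte IPv4 and 16-byte IPv6 arrays, a 6-byte MAC, a 16-bit VID); the real tracepoint entry also carries the device name and address family, which are the same in both variants and are omitted here. `MAC_LEN` and the struct names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAC_LEN 6 /* length of an Ethernet MAC address in bytes */

/* Address fields implied by the original format string: separate v4/v6. */
struct entry_split {
	uint8_t src4[4];
	uint8_t src6[16];
	uint8_t grp4[4];
	uint8_t grp6[16];
	uint8_t grpmac[MAC_LEN];
	uint16_t vid;
};

/* Address fields implied by the suggested format string: one per role. */
struct entry_unified {
	uint8_t src[16];
	uint8_t grp[16];
	uint8_t grpmac[MAC_LEN];
	uint16_t vid;
};

/* Dropping the two 4-byte IPv4 arrays saves 8 bytes per recorded event. */
_Static_assert(sizeof(struct entry_split) - sizeof(struct entry_unified) == 8,
	       "unified layout should be 8 bytes smaller");
```

The saving is exactly the two IPv4 arrays, since an IPv4 address then travels inside the 16-byte field.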

And if you use trace-cmd or perf, we can always add a plugin to
libtraceevent that can format this much nicer based on the information that
is there.

-- Steve

^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2023-01-30 23:23 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz / follow: Atom feed)
2023-01-26 17:01 [PATCH net-next 00/16] bridge: Limit number of MDB entries per port, port-vlan Petr Machata
2023-01-26 17:01 ` [Bridge] " Petr Machata
2023-01-26 17:01 ` [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-26 19:18   ` Stephen Hemminger
2023-01-26 19:18     ` [Bridge] " Stephen Hemminger
2023-01-26 20:27     ` Nikolay Aleksandrov
2023-01-26 20:27       ` [Bridge] " Nikolay Aleksandrov
2023-01-29  9:09   ` Nikolay Aleksandrov
2023-01-29  9:09     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 02/16] net: bridge: Add extack to br_multicast_new_port_group() Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29  9:09   ` Nikolay Aleksandrov
2023-01-29  9:09     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 03/16] net: bridge: Move extack-setting " Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29  9:09   ` Nikolay Aleksandrov
2023-01-29  9:09     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 04/16] net: bridge: Add br_multicast_del_port_group() Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29  9:11   ` Nikolay Aleksandrov
2023-01-29  9:11     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 05/16] net: bridge: Change a cleanup in br_multicast_new_port_group() to goto Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29  9:11   ` Nikolay Aleksandrov
2023-01-29  9:11     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 06/16] net: bridge: Add a tracepoint for MDB overflows Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-26 17:53   ` Steven Rostedt
2023-01-26 17:53     ` [Bridge] " Steven Rostedt
2023-01-27 14:29     ` Petr Machata
2023-01-27 14:29       ` [Bridge] " Petr Machata
2023-01-30 15:50     ` Petr Machata
2023-01-30 15:50       ` [Bridge] " Petr Machata
2023-01-30 23:23       ` Steven Rostedt
2023-01-30 23:23         ` [Bridge] " Steven Rostedt
2023-01-26 17:01 ` [PATCH net-next 07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29  9:40   ` Nikolay Aleksandrov
2023-01-29  9:40     ` [Bridge] " Nikolay Aleksandrov
2023-01-29 16:55   ` Nikolay Aleksandrov
2023-01-29 16:55     ` [Bridge] " Nikolay Aleksandrov
2023-01-30  8:08     ` Ido Schimmel
2023-01-30  8:08       ` [Bridge] " Ido Schimmel
2023-01-30  8:56       ` Nikolay Aleksandrov
2023-01-30  8:56         ` [Bridge] " Nikolay Aleksandrov
2023-01-30 15:02     ` Petr Machata
2023-01-30 15:02       ` [Bridge] " Petr Machata
2023-01-26 17:01 ` [PATCH net-next 08/16] net: bridge: Add netlink knobs for number / maximum MDB entries Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:07   ` Nikolay Aleksandrov
2023-01-29 10:07     ` [Bridge] " Nikolay Aleksandrov
2023-01-29 14:58     ` Ido Schimmel
2023-01-29 14:58       ` [Bridge] " Ido Schimmel
2023-01-30 11:07     ` Petr Machata
2023-01-30 11:07       ` [Bridge] " Petr Machata
2023-01-26 17:01 ` [PATCH net-next 09/16] selftests: forwarding: Move IGMP- and MLD-related functions to lib Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:08   ` Nikolay Aleksandrov
2023-01-29 10:08     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 10/16] selftests: forwarding: bridge_mdb: Fix a typo Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:09   ` Nikolay Aleksandrov
2023-01-29 10:09     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 11/16] selftests: forwarding: lib: Add helpers for IP address handling Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:09   ` Nikolay Aleksandrov
2023-01-29 10:09     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 12/16] selftests: forwarding: lib: Add helpers for checksum handling Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:10   ` Nikolay Aleksandrov
2023-01-29 10:10     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 13/16] selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:10   ` Nikolay Aleksandrov
2023-01-29 10:10     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 14/16] selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2 Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:11   ` Nikolay Aleksandrov
2023-01-29 10:11     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 15/16] selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:11   ` Nikolay Aleksandrov
2023-01-29 10:11     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 17:01 ` [PATCH net-next 16/16] selftests: forwarding: bridge_mdb_max: Add a new selftest Petr Machata
2023-01-26 17:01   ` [Bridge] " Petr Machata
2023-01-29 10:12   ` Nikolay Aleksandrov
2023-01-29 10:12     ` [Bridge] " Nikolay Aleksandrov
2023-01-26 20:28 ` [PATCH net-next 00/16] bridge: Limit number of MDB entries per port, port-vlan Nikolay Aleksandrov
2023-01-26 20:28   ` [Bridge] " Nikolay Aleksandrov
