linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
@ 2019-08-01 12:50 Horatiu Vultur
  2019-08-01 14:07 ` Nikolay Aleksandrov
  0 siblings, 1 reply; 8+ messages in thread
From: Horatiu Vultur @ 2019-08-01 12:50 UTC (permalink / raw)
  To: nikolay, idosch, andrew, allan.nielsen
  Cc: davem, roopa, petrm, tglx, fw, netdev, linux-kernel, bridge,
	Horatiu Vultur

Based on the discussion on the topic[1], we extend the functionality of
the 'bridge mdb' command to accept link layer multicast address. This
required only few changes and it fits nicely with the current
implementation and also the old implementation was not changed.

In this patch, we have added a MAC address to the union in 'struct br_ip'.
If we want to continue like this we should properly rename the structure as
it is not an IP any more.

To create a group for two of the front ports the following entries can
be added:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1

Now the entries will be display as following:
dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1

This requires changes to iproute2 as well, see the follogin patch for that.

Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
ports. If we have HW offload support, then the frame will be forwarded by
the switch, and need not to go to the CPU. In a pure SW world, the frame is
forwarded by the SW bridge, which will flooded it only the ports which are
part of the group.

So far so good. This is an important part of the problem we wanted to solve.

But, there is one drawback of this approach. If you want to add two of the
front ports and br0 to receive the frame then I can't see a way of doing it
with the bridge mdb command. To do that it requireds many more changes to
the existing code.

Example:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.

We believe we come a long way by re-using the facilities in MDB (thanks for
convincing us in doing this), but we are still not completely happy with
the result.

If I only look at the user-interface (iproute2), and completely ignore all
the implementation details, then I still think that the FDB sub command is
more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
We could change this such that MDB is for IP-multicast, and FDB is
forwarding in general (we should prevent FDB in install IP-multicast rules,
but we suggest to allow it to install MAC-Multicast rules).

The example from above would now look like this:
bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
bridge fdb add 01:00:00:00:00:04 dev br0 static self master

It would be very similar to the "bridge vlan" command which also allow to
specify groups with and without br0.

Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
we only use/need the net_bridge_port_group/ports member (and the MAC
address, which we hacked into the br_ip struct) when we are a L2-multicast
entry. This type allow use to re-use the br_multicast_flood function
which does a lot of the work for us.

Also, the key used to do the lookup in the FDB is already a MAC address
(no need to hack the br_ip).

Regarding the events generated by switchdev: In the current proposal this
is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
the associated data type would be switchdev_notifier_fdb_info - which also
has the information we need.

Using the FDB database, can still reuse the net_bridge_port_group type (and
associated functions), and I other parts from the MDB call graph as well.

If this sounds appealing, then we can do a proposal based on the idea.

If the MDB patch is what we can agree on, then we will continue polish this
and look for a solution to control the inclusion/exclusion of the br0
device (hints will be most appreciated).

[1] https://patchwork.ozlabs.org/patch/1136878/

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
---
 include/linux/if_bridge.h      |  1 +
 include/uapi/linux/if_bridge.h |  1 +
 net/bridge/br_device.c         |  7 +++++--
 net/bridge/br_forward.c        |  3 ++-
 net/bridge/br_input.c          | 13 ++++++++++--
 net/bridge/br_mdb.c            | 47 +++++++++++++++++++++++++++++++++++-------
 net/bridge/br_multicast.c      |  4 +++-
 net/bridge/br_private.h        |  3 ++-
 8 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index f3fab5d..07b092a 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -16,6 +16,7 @@
 struct br_ip {
 	union {
 		__be32	ip4;
+		__u8	mac[ETH_ALEN];
 #if IS_ENABLED(CONFIG_IPV6)
 		struct in6_addr ip6;
 #endif
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 773e476..e535a81 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -243,6 +243,7 @@ struct br_mdb_entry {
 		union {
 			__be32	ip4;
 			struct in6_addr ip6;
+			__u8	mac[ETH_ALEN];
 		} u;
 		__be16		proto;
 	} addr;
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index c05def8..b2d9041 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -83,7 +83,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		br_flood(br, skb, BR_PKT_BROADCAST, false, true);
 	} else if (is_multicast_ether_addr(dest)) {
 		if (unlikely(netpoll_tx_running(dev))) {
-			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
+			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
 			goto out;
 		}
 		if (br_multicast_rcv(br, NULL, skb, vid)) {
@@ -95,8 +95,11 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
 		    br_multicast_querier_exists(br, eth_hdr(skb)))
 			br_multicast_flood(mdst, skb, false, true);
+		else if (mdst && skb->protocol != htons(ETH_P_IP) &&
+			 skb->protocol != htons(ETH_P_IPV6))
+			br_multicast_flood(mdst, skb, false, true);
 		else
-			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
+			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
 	} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
 		br_forward(dst->dst, skb, false, true);
 	} else {
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 8663700..36b58e8 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -203,7 +203,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 			if (!(p->flags & BR_FLOOD))
 				continue;
 			break;
-		case BR_PKT_MULTICAST:
+		case BR_PKT_MULTICAST_IP:
+		case BR_PKT_MULTICAST_L2:
 			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
 				continue;
 			break;
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 21b74e7..a7db0c5 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -99,9 +99,13 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 			pkt_type = BR_PKT_BROADCAST;
 			local_rcv = true;
 		} else {
-			pkt_type = BR_PKT_MULTICAST;
+			pkt_type = BR_PKT_MULTICAST_IP;
 			if (br_multicast_rcv(br, p, skb, vid))
 				goto drop;
+
+			if (skb->protocol != htons(ETH_P_IP) &&
+			    skb->protocol != htons(ETH_P_IPV6))
+				pkt_type = BR_PKT_MULTICAST_L2;
 		}
 	}
 
@@ -129,7 +133,12 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	}
 
 	switch (pkt_type) {
-	case BR_PKT_MULTICAST:
+	case BR_PKT_MULTICAST_L2:
+		mdst = br_mdb_get(br, skb, vid);
+		if (mdst)
+			mcast_hit = true;
+		break;
+	case BR_PKT_MULTICAST_IP:
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
 		    br_multicast_querier_exists(br, eth_hdr(skb))) {
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index bf6acd3..b337a30 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -67,12 +67,19 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
 	memset(ip, 0, sizeof(struct br_ip));
 	ip->vid = entry->vid;
 	ip->proto = entry->addr.proto;
-	if (ip->proto == htons(ETH_P_IP))
+	switch (ip->proto) {
+	case htons(ETH_P_IP):
 		ip->u.ip4 = entry->addr.u.ip4;
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ip->u.ip6 = entry->addr.u.ip6;
+		break;
 #endif
+	default:
+		ether_addr_copy(ip->u.mac, entry->addr.u.mac);
+		break;
+	}
 }
 
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
@@ -117,12 +124,19 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 			e.ifindex = port->dev->ifindex;
 			e.vid = p->addr.vid;
 			__mdb_entry_fill_flags(&e, p->flags);
-			if (p->addr.proto == htons(ETH_P_IP))
+			switch (p->addr.proto) {
+			case htons(ETH_P_IP):
 				e.addr.u.ip4 = p->addr.u.ip4;
+				break;
 #if IS_ENABLED(CONFIG_IPV6)
-			if (p->addr.proto == htons(ETH_P_IPV6))
+			case htons(ETH_P_IPV6):
 				e.addr.u.ip6 = p->addr.u.ip6;
+				break;
 #endif
+			default:
+				ether_addr_copy(e.addr.u.mac, p->addr.u.mac);
+				break;
+			}
 			e.addr.proto = p->addr.proto;
 			nest_ent = nla_nest_start_noflag(skb,
 							 MDBA_MDB_ENTRY_INFO);
@@ -322,12 +336,19 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
 		.vid = entry->vid,
 	};
 
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = dev;
 	switch (type) {
@@ -367,12 +388,19 @@ static void __br_mdb_notify(struct net_device *dev, struct net_bridge_port *p,
 	int err = -ENOBUFS;
 
 	port_dev = __dev_get_by_index(net, entry->ifindex);
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = port_dev;
 	if (p && port_dev && type == RTM_NEWMDB) {
@@ -423,6 +451,7 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 #if IS_ENABLED(CONFIG_IPV6)
 	entry.addr.u.ip6 = group->u.ip6;
 #endif
+	ether_addr_copy(group->u.mac, entry.addr.u.mac);
 	entry.vid = group->vid;
 	__mdb_entry_fill_flags(&entry, flags);
 	__br_mdb_notify(dev, port, &entry, type);
@@ -510,6 +539,8 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
 		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
 			return false;
 #endif
+	} else if (is_multicast_ether_addr(entry->addr.u.mac)) {
+		;
 	} else
 		return false;
 	if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de22c8f..01250c1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -133,7 +133,9 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 		break;
 #endif
 	default:
-		return NULL;
+		ip.proto = 0;
+		ether_addr_copy(ip.u.mac, eth_hdr(skb)->h_dest);
+		break;
 	}
 
 	return br_mdb_ip_get_rcu(br, &ip);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 159a0e2..60e2430d 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -590,7 +590,8 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 /* br_forward.c */
 enum br_pkt_type {
 	BR_PKT_UNICAST,
-	BR_PKT_MULTICAST,
+	BR_PKT_MULTICAST_IP,
+	BR_PKT_MULTICAST_L2,
 	BR_PKT_BROADCAST
 };
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-01 12:50 [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR Horatiu Vultur
@ 2019-08-01 14:07 ` Nikolay Aleksandrov
  2019-08-01 14:11   ` Nikolay Aleksandrov
  2019-08-02 14:07   ` Allan W. Nielsen
  0 siblings, 2 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-08-01 14:07 UTC (permalink / raw)
  To: Horatiu Vultur, idosch, andrew, allan.nielsen
  Cc: davem, roopa, petrm, tglx, fw, netdev, linux-kernel, bridge

[-- Attachment #1: Type: text/plain, Size: 13857 bytes --]

Hi Horatiu,
Overall I think MDB is the right way, we'd like to contain the multicast code.
A few comments below.

On 01/08/2019 15:50, Horatiu Vultur wrote:
> Based on the discussion on the topic[1], we extend the functionality of
> the 'bridge mdb' command to accept link layer multicast address. This
> required only few changes and it fits nicely with the current
> implementation and also the old implementation was not changed.
> 
> In this patch, we have added a MAC address to the union in 'struct br_ip'.
> If we want to continue like this we should properly rename the structure as
> it is not an IP any more.
> 
> To create a group for two of the front ports the following entries can
> be added:
> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> 
> Now the entries will be display as following:
> dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
> dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1
> 
> This requires changes to iproute2 as well, see the follogin patch for that.
> 
> Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
> ports. If we have HW offload support, then the frame will be forwarded by
> the switch, and need not to go to the CPU. In a pure SW world, the frame is
> forwarded by the SW bridge, which will flooded it only the ports which are
> part of the group.
> 
> So far so good. This is an important part of the problem we wanted to solve.
> 
> But, there is one drawback of this approach. If you want to add two of the
> front ports and br0 to receive the frame then I can't see a way of doing it
> with the bridge mdb command. To do that it requireds many more changes to
> the existing code.
> 
> Example:
> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.
> 
> We believe we come a long way by re-using the facilities in MDB (thanks for
> convincing us in doing this), but we are still not completely happy with
> the result.
> 

Just add self argument for the bridge mdb command, no need to specify it twice.

> If I only look at the user-interface (iproute2), and completely ignore all
> the implementation details, then I still think that the FDB sub command is
> more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
> We could change this such that MDB is for IP-multicast, and FDB is
> forwarding in general (we should prevent FDB in install IP-multicast rules,
> but we suggest to allow it to install MAC-Multicast rules).
> 
> The example from above would now look like this:
> bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
> bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
> bridge fdb add 01:00:00:00:00:04 dev br0 static self master
> 
> It would be very similar to the "bridge vlan" command which also allow to
> specify groups with and without br0.
> 
> Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
> we only use/need the net_bridge_port_group/ports member (and the MAC
> address, which we hacked into the br_ip struct) when we are a L2-multicast
> entry. This type allow use to re-use the br_multicast_flood function
> which does a lot of the work for us.
> 
> Also, the key used to do the lookup in the FDB is already a MAC address
> (no need to hack the br_ip).
> 

Look at it as extending br_ip, it's not a hack but a valid mcast use-case.
In fact br_ip is an internal structure which can easily be renamed.

> Regarding the events generated by switchdev: In the current proposal this
> is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
> type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
> the associated data type would be switchdev_notifier_fdb_info - which also
> has the information we need.
> 
> Using the FDB database, can still reuse the net_bridge_port_group type (and
> associated functions), and I other parts from the MDB call graph as well.
> 

We don't want to mix these.

> If this sounds appealing, then we can do a proposal based on the idea.
> 
> If the MDB patch is what we can agree on, then we will continue polish this
> and look for a solution to control the inclusion/exclusion of the br0
> device (hints will be most appreciated).
> 

Yes, please. Let's work on this implementation. Some code comments below.

> [1] https://patchwork.ozlabs.org/patch/1136878/
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
> Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
> ---
>  include/linux/if_bridge.h      |  1 +
>  include/uapi/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c         |  7 +++++--
>  net/bridge/br_forward.c        |  3 ++-
>  net/bridge/br_input.c          | 13 ++++++++++--
>  net/bridge/br_mdb.c            | 47 +++++++++++++++++++++++++++++++++++-------
>  net/bridge/br_multicast.c      |  4 +++-
>  net/bridge/br_private.h        |  3 ++-
>  8 files changed, 64 insertions(+), 15 deletions(-)
> 

Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the below much
easier and without the checks if you use a per-mdb flag that says it's to be treated
as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code (see the
attached patch based on this one for example). and continue processing it as it is processed today.
We'll keep the fast-path with minimal number of new conditionals.

Something like the patch I've attached to this reply, note that it is not complete
just to show the intent, you'll have to re-work br_mdb_notify() to make it proper
and there're most probably other details I've missed. If you find even better/less
complex way to do it then please do.

Cheers,
 Nik

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index f3fab5d..07b092a 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -16,6 +16,7 @@
>  struct br_ip {
>  	union {
>  		__be32	ip4;
> +		__u8	mac[ETH_ALEN];
>  #if IS_ENABLED(CONFIG_IPV6)
>  		struct in6_addr ip6;
>  #endif
> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> index 773e476..e535a81 100644
> --- a/include/uapi/linux/if_bridge.h
> +++ b/include/uapi/linux/if_bridge.h
> @@ -243,6 +243,7 @@ struct br_mdb_entry {
>  		union {
>  			__be32	ip4;
>  			struct in6_addr ip6;
> +			__u8	mac[ETH_ALEN];
>  		} u;
>  		__be16		proto;
>  	} addr;
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index c05def8..b2d9041 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -83,7 +83,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>  		br_flood(br, skb, BR_PKT_BROADCAST, false, true);
>  	} else if (is_multicast_ether_addr(dest)) {
>  		if (unlikely(netpoll_tx_running(dev))) {
> -			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> +			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
>  			goto out;
>  		}
>  		if (br_multicast_rcv(br, NULL, skb, vid)) {
> @@ -95,8 +95,11 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>  		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
>  		    br_multicast_querier_exists(br, eth_hdr(skb)))
>  			br_multicast_flood(mdst, skb, false, true);
> +		else if (mdst && skb->protocol != htons(ETH_P_IP) &&
> +			 skb->protocol != htons(ETH_P_IPV6))
> +			br_multicast_flood(mdst, skb, false, true);
>  		else
> -			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> +			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
>  	} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
>  		br_forward(dst->dst, skb, false, true);
>  	} else {
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 8663700..36b58e8 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -203,7 +203,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
>  			if (!(p->flags & BR_FLOOD))
>  				continue;
>  			break;
> -		case BR_PKT_MULTICAST:
> +		case BR_PKT_MULTICAST_IP:
> +		case BR_PKT_MULTICAST_L2:
>  			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
>  				continue;
>  			break;
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 21b74e7..a7db0c5 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -99,9 +99,13 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>  			pkt_type = BR_PKT_BROADCAST;
>  			local_rcv = true;
>  		} else {
> -			pkt_type = BR_PKT_MULTICAST;
> +			pkt_type = BR_PKT_MULTICAST_IP;
>  			if (br_multicast_rcv(br, p, skb, vid))
>  				goto drop;
> +
> +			if (skb->protocol != htons(ETH_P_IP) &&
> +			    skb->protocol != htons(ETH_P_IPV6))
> +				pkt_type = BR_PKT_MULTICAST_L2;
>  		}
>  	}
>  
> @@ -129,7 +133,12 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>  	}
>  
>  	switch (pkt_type) {
> -	case BR_PKT_MULTICAST:
> +	case BR_PKT_MULTICAST_L2:
> +		mdst = br_mdb_get(br, skb, vid);
> +		if (mdst)
> +			mcast_hit = true;
> +		break;
> +	case BR_PKT_MULTICAST_IP:
>  		mdst = br_mdb_get(br, skb, vid);
>  		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
>  		    br_multicast_querier_exists(br, eth_hdr(skb))) {
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index bf6acd3..b337a30 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -67,12 +67,19 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
>  	memset(ip, 0, sizeof(struct br_ip));
>  	ip->vid = entry->vid;
>  	ip->proto = entry->addr.proto;
> -	if (ip->proto == htons(ETH_P_IP))
> +	switch (ip->proto) {
> +	case htons(ETH_P_IP):
>  		ip->u.ip4 = entry->addr.u.ip4;
> +		break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	else
> +	case htons(ETH_P_IPV6):
>  		ip->u.ip6 = entry->addr.u.ip6;
> +		break;
>  #endif
> +	default:
> +		ether_addr_copy(ip->u.mac, entry->addr.u.mac);
> +		break;
> +	}
>  }
>  
>  static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
> @@ -117,12 +124,19 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
>  			e.ifindex = port->dev->ifindex;
>  			e.vid = p->addr.vid;
>  			__mdb_entry_fill_flags(&e, p->flags);
> -			if (p->addr.proto == htons(ETH_P_IP))
> +			switch (p->addr.proto) {
> +			case htons(ETH_P_IP):
>  				e.addr.u.ip4 = p->addr.u.ip4;
> +				break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -			if (p->addr.proto == htons(ETH_P_IPV6))
> +			case htons(ETH_P_IPV6):
>  				e.addr.u.ip6 = p->addr.u.ip6;
> +				break;
>  #endif
> +			default:
> +				ether_addr_copy(e.addr.u.mac, p->addr.u.mac);
> +				break;
> +			}
>  			e.addr.proto = p->addr.proto;
>  			nest_ent = nla_nest_start_noflag(skb,
>  							 MDBA_MDB_ENTRY_INFO);
> @@ -322,12 +336,19 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
>  		.vid = entry->vid,
>  	};
>  
> -	if (entry->addr.proto == htons(ETH_P_IP))
> +	switch (entry->addr.proto) {
> +	case htons(ETH_P_IP):
>  		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
> +		break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	else
> +	case htons(ETH_P_IPV6):
>  		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
> +		break;
>  #endif
> +	default:
> +		ether_addr_copy(mdb.addr, entry->addr.u.mac);
> +		break;
> +	}
>  
>  	mdb.obj.orig_dev = dev;
>  	switch (type) {
> @@ -367,12 +388,19 @@ static void __br_mdb_notify(struct net_device *dev, struct net_bridge_port *p,
>  	int err = -ENOBUFS;
>  
>  	port_dev = __dev_get_by_index(net, entry->ifindex);
> -	if (entry->addr.proto == htons(ETH_P_IP))
> +	switch (entry->addr.proto) {
> +	case htons(ETH_P_IP):
>  		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
> +		break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	else
> +	case htons(ETH_P_IPV6):
>  		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
> +		break;
>  #endif
> +	default:
> +		ether_addr_copy(mdb.addr, entry->addr.u.mac);
> +		break;
> +	}
>  
>  	mdb.obj.orig_dev = port_dev;
>  	if (p && port_dev && type == RTM_NEWMDB) {
> @@ -423,6 +451,7 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
>  #if IS_ENABLED(CONFIG_IPV6)
>  	entry.addr.u.ip6 = group->u.ip6;
>  #endif
> +	ether_addr_copy(group->u.mac, entry.addr.u.mac);
>  	entry.vid = group->vid;
>  	__mdb_entry_fill_flags(&entry, flags);
>  	__br_mdb_notify(dev, port, &entry, type);
> @@ -510,6 +539,8 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
>  		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
>  			return false;
>  #endif
> +	} else if (is_multicast_ether_addr(entry->addr.u.mac)) {
> +		;
>  	} else
>  		return false;
>  	if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY)
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index de22c8f..01250c1 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -133,7 +133,9 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
>  		break;
>  #endif
>  	default:
> -		return NULL;
> +		ip.proto = 0;
> +		ether_addr_copy(ip.u.mac, eth_hdr(skb)->h_dest);
> +		break;
>  	}
>  
>  	return br_mdb_ip_get_rcu(br, &ip);
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 159a0e2..60e2430d 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -590,7 +590,8 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
>  /* br_forward.c */
>  enum br_pkt_type {
>  	BR_PKT_UNICAST,
> -	BR_PKT_MULTICAST,
> +	BR_PKT_MULTICAST_IP,
> +	BR_PKT_MULTICAST_L2,
>  	BR_PKT_BROADCAST
>  };
>  int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb);
> 


[-- Attachment #2: 0001-net-bridge-mdb-Extend-with-multicast-LLADDR.patch --]
[-- Type: text/x-patch, Size: 12289 bytes --]

From 611cb2250c06a22d08819bc2d3d67bb7a2867cc4 Mon Sep 17 00:00:00 2001
From: Horatiu Vultur <horatiu.vultur@microchip.com>
Date: Thu, 1 Aug 2019 14:50:40 +0200
Subject: [RFC incomplete] net: bridge: mdb: Extend with multicast LLADDR

Based on the discussion on the topic[1], we extend the functionality of
the 'bridge mdb' command to accept link layer multicast address. This
required only few changes and it fits nicely with the current
implementation and also the old implementation was not changed.

In this patch, we have added a MAC address to the union in 'struct br_ip'.
If we want to continue like this we should properly rename the structure as
it is not an IP any more.

To create a group for two of the front ports the following entries can
be added:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1

Now the entries will be display as following:
dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1

This requires changes to iproute2 as well, see the follogin patch for that.

Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
ports. If we have HW offload support, then the frame will be forwarded by
the switch, and need not to go to the CPU. In a pure SW world, the frame is
forwarded by the SW bridge, which will flooded it only the ports which are
part of the group.

So far so good. This is an important part of the problem we wanted to solve.

But, there is one drawback of this approach. If you want to add two of the
front ports and br0 to receive the frame then I can't see a way of doing it
with the bridge mdb command. To do that it requireds many more changes to
the existing code.

Example:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.

We believe we come a long way by re-using the facilities in MDB (thanks for
convincing us in doing this), but we are still not completely happy with
the result.

If I only look at the user-interface (iproute2), and completely ignore all
the implementation details, then I still think that the FDB sub command is
more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
We could change this such that MDB is for IP-multicast, and FDB is
forwarding in general (we should prevent FDB in install IP-multicast rules,
but we suggest to allow it to install MAC-Multicast rules).

The example from above would now look like this:
bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
bridge fdb add 01:00:00:00:00:04 dev br0 static self master

It would be very similar to the "bridge vlan" command which also allow to
specify groups with and without br0.

Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
we only use/need the net_bridge_port_group/ports member (and the MAC
address, which we hacked into the br_ip struct) when we are a L2-multicast
entry. This type allow use to re-use the br_multicast_flood function
which does a lot of the work for us.

Also, the key used to do the lookup in the FDB is already a MAC address
(no need to hack the br_ip).

Regarding the events generated by switchdev: In the current proposal this
is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
the associated data type would be switchdev_notifier_fdb_info - which also
has the information we need.

Using the FDB database, can still reuse the net_bridge_port_group type (and
associated functions), and I other parts from the MDB call graph as well.

If this sounds appealing, then we can do a proposal based on the idea.

If the MDB patch is what we can agree on, then we will continue polish this
and look for a solution to control the inclusion/exclusion of the br0
device (hints will be most appreciated).

[1] https://patchwork.ozlabs.org/patch/1136878/

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
---
 include/linux/if_bridge.h      |  1 +
 include/uapi/linux/if_bridge.h |  1 +
 net/bridge/br_device.c         |  7 +++--
 net/bridge/br_forward.c        |  3 ++-
 net/bridge/br_input.c          | 13 ++++++++--
 net/bridge/br_mdb.c            | 47 ++++++++++++++++++++++++++++------
 net/bridge/br_multicast.c      |  4 ++-
 net/bridge/br_private.h        |  3 ++-
 8 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 9e57c4411734..68f2558b1a23 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -16,6 +16,7 @@
 struct br_ip {
 	union {
 		__be32	ip4;
+		__u8	mac[ETH_ALEN];
 #if IS_ENABLED(CONFIG_IPV6)
 		struct in6_addr ip6;
 #endif
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 1b3c2b643a02..49a2de0b7420 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -244,6 +244,7 @@ struct br_mdb_entry {
 		union {
 			__be32	ip4;
 			struct in6_addr ip6;
+			__u8	mac[ETH_ALEN];
 		} u;
 		__be16		proto;
 	} addr;
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 681b72862c16..581d6875cdb2 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -84,7 +84,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		br_flood(br, skb, BR_PKT_BROADCAST, false, true);
 	} else if (is_multicast_ether_addr(dest)) {
 		if (unlikely(netpoll_tx_running(dev))) {
-			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
+			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
 			goto out;
 		}
 		if (br_multicast_rcv(br, NULL, skb, vid)) {
@@ -96,8 +96,11 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
 		    br_multicast_querier_exists(br, eth_hdr(skb)))
 			br_multicast_flood(mdst, skb, false, true);
+		else if (mdst && skb->protocol != htons(ETH_P_IP) &&
+			 skb->protocol != htons(ETH_P_IPV6))
+			br_multicast_flood(mdst, skb, false, true);
 		else
-			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
+			br_flood(br, skb, BR_PKT_MULTICAST_IP, false, true);
 	} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
 		br_forward(dst->dst, skb, false, true);
 	} else {
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 86637000f275..36b58e86d719 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -203,7 +203,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 			if (!(p->flags & BR_FLOOD))
 				continue;
 			break;
-		case BR_PKT_MULTICAST:
+		case BR_PKT_MULTICAST_IP:
+		case BR_PKT_MULTICAST_L2:
 			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
 				continue;
 			break;
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 09b1dd8cd853..f69e710a7f55 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -97,9 +97,13 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 			pkt_type = BR_PKT_BROADCAST;
 			local_rcv = true;
 		} else {
-			pkt_type = BR_PKT_MULTICAST;
+			pkt_type = BR_PKT_MULTICAST_IP;
 			if (br_multicast_rcv(br, p, skb, vid))
 				goto drop;
+
+			if (skb->protocol != htons(ETH_P_IP) &&
+			    skb->protocol != htons(ETH_P_IPV6))
+				pkt_type = BR_PKT_MULTICAST_L2;
 		}
 	}
 
@@ -127,7 +131,12 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	}
 
 	switch (pkt_type) {
-	case BR_PKT_MULTICAST:
+	case BR_PKT_MULTICAST_L2:
+		mdst = br_mdb_get(br, skb, vid);
+		if (mdst)
+			mcast_hit = true;
+		break;
+	case BR_PKT_MULTICAST_IP:
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
 		    br_multicast_querier_exists(br, eth_hdr(skb))) {
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 428af1abf8cc..25d182f45fde 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -69,12 +69,19 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
 	memset(ip, 0, sizeof(struct br_ip));
 	ip->vid = entry->vid;
 	ip->proto = entry->addr.proto;
-	if (ip->proto == htons(ETH_P_IP))
+	switch (ip->proto) {
+	case htons(ETH_P_IP):
 		ip->u.ip4 = entry->addr.u.ip4;
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ip->u.ip6 = entry->addr.u.ip6;
+		break;
 #endif
+	default:
+		ether_addr_copy(ip->u.mac, entry->addr.u.mac);
+		break;
+	}
 }
 
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
@@ -119,12 +126,19 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 			e.ifindex = port->dev->ifindex;
 			e.vid = p->addr.vid;
 			__mdb_entry_fill_flags(&e, p->flags);
-			if (p->addr.proto == htons(ETH_P_IP))
+			switch (p->addr.proto) {
+			case htons(ETH_P_IP):
 				e.addr.u.ip4 = p->addr.u.ip4;
+				break;
 #if IS_ENABLED(CONFIG_IPV6)
-			if (p->addr.proto == htons(ETH_P_IPV6))
+			case htons(ETH_P_IPV6):
 				e.addr.u.ip6 = p->addr.u.ip6;
+				break;
 #endif
+			default:
+				ether_addr_copy(e.addr.u.mac, p->addr.u.mac);
+				break;
+			}
 			e.addr.proto = p->addr.proto;
 			nest_ent = nla_nest_start_noflag(skb,
 							 MDBA_MDB_ENTRY_INFO);
@@ -324,12 +338,19 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
 		.vid = entry->vid,
 	};
 
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = dev;
 	switch (type) {
@@ -369,12 +390,19 @@ static void __br_mdb_notify(struct net_device *dev, struct net_bridge_port *p,
 	int err = -ENOBUFS;
 
 	port_dev = __dev_get_by_index(net, entry->ifindex);
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = port_dev;
 	if (p && port_dev && type == RTM_NEWMDB) {
@@ -425,6 +453,7 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 #if IS_ENABLED(CONFIG_IPV6)
 	entry.addr.u.ip6 = group->u.ip6;
 #endif
+	ether_addr_copy(group->u.mac, entry.addr.u.mac);
 	entry.vid = group->vid;
 	__mdb_entry_fill_flags(&entry, flags);
 	__br_mdb_notify(dev, port, &entry, type);
@@ -512,6 +541,8 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
 		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
 			return false;
 #endif
+	} else if (is_multicast_ether_addr(entry->addr.u.mac)) {
+		;
 	} else
 		return false;
 	if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3d4b2817687f..aa28a322ce9d 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -133,7 +133,9 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 		break;
 #endif
 	default:
-		return NULL;
+		ip.proto = 0;
+		ether_addr_copy(ip.u.mac, eth_hdr(skb)->h_dest);
+		break;
 	}
 
 	return br_mdb_ip_get_rcu(br, &ip);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index c4fd307fbfdc..1f2a880a9d17 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -592,7 +592,8 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 /* br_forward.c */
 enum br_pkt_type {
 	BR_PKT_UNICAST,
-	BR_PKT_MULTICAST,
+	BR_PKT_MULTICAST_IP,
+	BR_PKT_MULTICAST_L2,
 	BR_PKT_BROADCAST
 };
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-01 14:07 ` Nikolay Aleksandrov
@ 2019-08-01 14:11   ` Nikolay Aleksandrov
  2019-08-01 14:15     ` Nikolay Aleksandrov
  2019-08-02 14:07   ` Allan W. Nielsen
  1 sibling, 1 reply; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-08-01 14:11 UTC (permalink / raw)
  To: Horatiu Vultur, idosch, andrew, allan.nielsen
  Cc: davem, roopa, petrm, tglx, fw, netdev, linux-kernel, bridge

[-- Attachment #1: Type: text/plain, Size: 1750 bytes --]

On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
> Hi Horatiu,
> Overall I think MDB is the right way, we'd like to contain the multicast code.
> A few comments below.
> 
> On 01/08/2019 15:50, Horatiu Vultur wrote:
[snip]
>>
>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>> Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>> Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>> ---
>>  include/linux/if_bridge.h      |  1 +
>>  include/uapi/linux/if_bridge.h |  1 +
>>  net/bridge/br_device.c         |  7 +++++--
>>  net/bridge/br_forward.c        |  3 ++-
>>  net/bridge/br_input.c          | 13 ++++++++++--
>>  net/bridge/br_mdb.c            | 47 +++++++++++++++++++++++++++++++++++-------
>>  net/bridge/br_multicast.c      |  4 +++-
>>  net/bridge/br_private.h        |  3 ++-
>>  8 files changed, 64 insertions(+), 15 deletions(-)
>>
> 
> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the below much
> easier and without the checks if you use a per-mdb flag that says it's to be treated
> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code (see the
> attached patch based on this one for example). and continue processing it as it is processed today.
> We'll keep the fast-path with minimal number of new conditionals.
> 
> Something like the patch I've attached to this reply, note that it is not complete
> just to show the intent, you'll have to re-work br_mdb_notify() to make it proper
> and there're most probably other details I've missed. If you find even better/less
> complex way to do it then please do.
> 
> Cheers,
>  Nik

Oops, I sent back your original patch. Here's the actually changed version
I was talking about.

Thanks,
 Nik




[-- Attachment #2: 0001-net-bridge-mdb-Extend-with-multicast-LLADDR.patch --]
[-- Type: text/x-patch, Size: 12962 bytes --]

From 01dbe0b22da96efcc6fbf46bd3b22353fca32f5d Mon Sep 17 00:00:00 2001
From: Horatiu Vultur <horatiu.vultur@microchip.com>
Date: Thu, 1 Aug 2019 14:50:40 +0200
Subject: [RFC incomplete] net: bridge: mdb: Extend with multicast LLADDR

Based on the discussion on the topic[1], we extend the functionality of
the 'bridge mdb' command to accept link layer multicast address. This
required only few changes and it fits nicely with the current
implementation and also the old implementation was not changed.

In this patch, we have added a MAC address to the union in 'struct br_ip'.
If we want to continue like this we should properly rename the structure as
it is not an IP any more.

To create a group for two of the front ports the following entries can
be added:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1

Now the entries will be display as following:
dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1

This requires changes to iproute2 as well, see the follogin patch for that.

Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
ports. If we have HW offload support, then the frame will be forwarded by
the switch, and need not to go to the CPU. In a pure SW world, the frame is
forwarded by the SW bridge, which will flooded it only the ports which are
part of the group.

So far so good. This is an important part of the problem we wanted to solve.

But, there is one drawback of this approach. If you want to add two of the
front ports and br0 to receive the frame then I can't see a way of doing it
with the bridge mdb command. To do that it requireds many more changes to
the existing code.

Example:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.

We believe we come a long way by re-using the facilities in MDB (thanks for
convincing us in doing this), but we are still not completely happy with
the result.

If I only look at the user-interface (iproute2), and completely ignore all
the implementation details, then I still think that the FDB sub command is
more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
We could change this such that MDB is for IP-multicast, and FDB is
forwarding in general (we should prevent FDB in install IP-multicast rules,
but we suggest to allow it to install MAC-Multicast rules).

The example from above would now look like this:
bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
bridge fdb add 01:00:00:00:00:04 dev br0 static self master

It would be very similar to the "bridge vlan" command which also allow to
specify groups with and without br0.

Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
we only use/need the net_bridge_port_group/ports member (and the MAC
address, which we hacked into the br_ip struct) when we are a L2-multicast
entry. This type allow use to re-use the br_multicast_flood function
which does a lot of the work for us.

Also, the key used to do the lookup in the FDB is already a MAC address
(no need to hack the br_ip).

Regarding the events generated by switchdev: In the current proposal this
is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
the associated data type would be switchdev_notifier_fdb_info - which also
has the information we need.

Using the FDB database, can still reuse the net_bridge_port_group type (and
associated functions), and I other parts from the MDB call graph as well.

If this sounds appealing, then we can do a proposal based on the idea.

If the MDB patch is what we can agree on, then we will continue polish this
and look for a solution to control the inclusion/exclusion of the br0
device (hints will be most appreciated).

[1] https://patchwork.ozlabs.org/patch/1136878/

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
---
 include/linux/if_bridge.h      |  1 +
 include/uapi/linux/if_bridge.h |  2 ++
 net/bridge/br_device.c         |  2 +-
 net/bridge/br_input.c          |  2 +-
 net/bridge/br_mdb.c            | 58 ++++++++++++++++++++++++++++------
 net/bridge/br_multicast.c      |  6 ++--
 net/bridge/br_private.h        | 10 ++++--
 7 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 9e57c4411734..68f2558b1a23 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -16,6 +16,7 @@
 struct br_ip {
 	union {
 		__be32	ip4;
+		__u8	mac[ETH_ALEN];
 #if IS_ENABLED(CONFIG_IPV6)
 		struct in6_addr ip6;
 #endif
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 1b3c2b643a02..50b4b481fac5 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -238,12 +238,14 @@ struct br_mdb_entry {
 	__u8 state;
 #define MDB_FLAGS_OFFLOAD	(1 << 0)
 #define MDB_FLAGS_FAST_LEAVE	(1 << 1)
+#define MDB_FLAGS_L2MCAST	(1 << 2)
 	__u8 flags;
 	__u16 vid;
 	struct {
 		union {
 			__be32	ip4;
 			struct in6_addr ip6;
+			__u8	mac[ETH_ALEN];
 		} u;
 		__be16		proto;
 	} addr;
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 681b72862c16..8ffceaac4c80 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -94,7 +94,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb)))
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 09b1dd8cd853..331a1ee87c62 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -130,7 +130,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb))) {
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 428af1abf8cc..2da17b769760 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -62,6 +62,8 @@ static void __mdb_entry_fill_flags(struct br_mdb_entry *e, unsigned char flags)
 		e->flags |= MDB_FLAGS_OFFLOAD;
 	if (flags & MDB_PG_FLAGS_FAST_LEAVE)
 		e->flags |= MDB_FLAGS_FAST_LEAVE;
+	if (flags & MDB_PG_FLAGS_L2MCAST)
+		e->flags |= MDB_FLAGS_L2MCAST;
 }
 
 static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
@@ -69,12 +71,19 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
 	memset(ip, 0, sizeof(struct br_ip));
 	ip->vid = entry->vid;
 	ip->proto = entry->addr.proto;
-	if (ip->proto == htons(ETH_P_IP))
+	switch (ip->proto) {
+	case htons(ETH_P_IP):
 		ip->u.ip4 = entry->addr.u.ip4;
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ip->u.ip6 = entry->addr.u.ip6;
+		break;
 #endif
+	default:
+		ether_addr_copy(ip->u.mac, entry->addr.u.mac);
+		break;
+	}
 }
 
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
@@ -110,6 +119,7 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 		      pp = &p->next) {
 			struct nlattr *nest_ent;
 			struct br_mdb_entry e;
+			u16 flags = p->flags;
 
 			port = p->port;
 			if (!port)
@@ -118,13 +128,22 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 			memset(&e, 0, sizeof(e));
 			e.ifindex = port->dev->ifindex;
 			e.vid = p->addr.vid;
-			__mdb_entry_fill_flags(&e, p->flags);
-			if (p->addr.proto == htons(ETH_P_IP))
+			if (mp->l2_mcast)
+				flags |= MDB_PG_FLAGS_L2MCAST;
+			__mdb_entry_fill_flags(&e, flags);
+			switch (p->addr.proto) {
+			case htons(ETH_P_IP):
 				e.addr.u.ip4 = p->addr.u.ip4;
+				break;
 #if IS_ENABLED(CONFIG_IPV6)
-			if (p->addr.proto == htons(ETH_P_IPV6))
+			case htons(ETH_P_IPV6):
 				e.addr.u.ip6 = p->addr.u.ip6;
+				break;
 #endif
+			default:
+				ether_addr_copy(e.addr.u.mac, p->addr.u.mac);
+				break;
+			}
 			e.addr.proto = p->addr.proto;
 			nest_ent = nla_nest_start_noflag(skb,
 							 MDBA_MDB_ENTRY_INFO);
@@ -324,12 +343,19 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
 		.vid = entry->vid,
 	};
 
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = dev;
 	switch (type) {
@@ -369,12 +395,19 @@ static void __br_mdb_notify(struct net_device *dev, struct net_bridge_port *p,
 	int err = -ENOBUFS;
 
 	port_dev = __dev_get_by_index(net, entry->ifindex);
-	if (entry->addr.proto == htons(ETH_P_IP))
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	case htons(ETH_P_IPV6):
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
+		break;
 #endif
+	default:
+		ether_addr_copy(mdb.addr, entry->addr.u.mac);
+		break;
+	}
 
 	mdb.obj.orig_dev = port_dev;
 	if (p && port_dev && type == RTM_NEWMDB) {
@@ -425,6 +458,7 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 #if IS_ENABLED(CONFIG_IPV6)
 	entry.addr.u.ip6 = group->u.ip6;
 #endif
+	ether_addr_copy(group->u.mac, entry.addr.u.mac);
 	entry.vid = group->vid;
 	__mdb_entry_fill_flags(&entry, flags);
 	__br_mdb_notify(dev, port, &entry, type);
@@ -512,8 +546,12 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
 		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
 			return false;
 #endif
-	} else
+	} else if ((entry->flags & MDB_FLAGS_L2MCAST) &&
+		   !is_multicast_ether_addr(entry->addr.u.mac)) {
 		return false;
+	} else {
+		return false;
+	}
 	if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY)
 		return false;
 	if (entry->vid >= VLAN_VID_MASK)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3d4b2817687f..43cfc8b18765 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -133,7 +133,9 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 		break;
 #endif
 	default:
-		return NULL;
+		ip.proto = 0;
+		ether_addr_copy(ip.u.mac, eth_hdr(skb)->h_dest);
+		break;
 	}
 
 	return br_mdb_ip_get_rcu(br, &ip);
@@ -2233,7 +2235,7 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 	memset(&eth, 0, sizeof(eth));
 	eth.h_proto = htons(proto);
 
-	ret = br_multicast_querier_exists(br, &eth);
+	ret = br_multicast_querier_exists(br, &eth, NULL);
 
 unlock:
 	rcu_read_unlock();
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index c4fd307fbfdc..be9e5f327c86 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -200,6 +200,7 @@ struct net_bridge_fdb_entry {
 #define MDB_PG_FLAGS_PERMANENT	BIT(0)
 #define MDB_PG_FLAGS_OFFLOAD	BIT(1)
 #define MDB_PG_FLAGS_FAST_LEAVE	BIT(2)
+#define MDB_PG_FLAGS_L2MCAST	BIT(3)
 
 struct net_bridge_port_group {
 	struct net_bridge_port		*port;
@@ -220,6 +221,7 @@ struct net_bridge_mdb_entry {
 	struct timer_list		timer;
 	struct br_ip			addr;
 	bool				host_joined;
+	bool				l2_mcast;
 	struct hlist_node		mdb_node;
 };
 
@@ -734,7 +736,8 @@ __br_multicast_querier_exists(struct net_bridge *br,
 }
 
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
-					       struct ethhdr *eth)
+					       struct ethhdr *eth,
+					       struct net_bridge_mdb_entry *dst)
 {
 	switch (eth->h_proto) {
 	case (htons(ETH_P_IP)):
@@ -746,7 +749,7 @@ static inline bool br_multicast_querier_exists(struct net_bridge *br,
 			&br->ip6_other_query, true);
 #endif
 	default:
-		return false;
+		return !!(dst && dst->l2_mcast);
 	}
 }
 
@@ -814,7 +817,8 @@ static inline bool br_multicast_is_router(struct net_bridge *br)
 }
 
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
-					       struct ethhdr *eth)
+					       struct ethhdr *eth,
+					       struct net_bridge_mdb_entry *dst)
 {
 	return false;
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-01 14:11   ` Nikolay Aleksandrov
@ 2019-08-01 14:15     ` Nikolay Aleksandrov
  2019-08-01 15:25       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-08-01 14:15 UTC (permalink / raw)
  To: Horatiu Vultur, idosch, andrew, allan.nielsen
  Cc: davem, roopa, petrm, tglx, fw, netdev, linux-kernel, bridge

On 01/08/2019 17:11, Nikolay Aleksandrov wrote:
> On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
>> Hi Horatiu,
>> Overall I think MDB is the right way, we'd like to contain the multicast code.
>> A few comments below.
>>
>> On 01/08/2019 15:50, Horatiu Vultur wrote:
> [snip]
>>>
>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>>> Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>>> Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>>> ---
>>>  include/linux/if_bridge.h      |  1 +
>>>  include/uapi/linux/if_bridge.h |  1 +
>>>  net/bridge/br_device.c         |  7 +++++--
>>>  net/bridge/br_forward.c        |  3 ++-
>>>  net/bridge/br_input.c          | 13 ++++++++++--
>>>  net/bridge/br_mdb.c            | 47 +++++++++++++++++++++++++++++++++++-------
>>>  net/bridge/br_multicast.c      |  4 +++-
>>>  net/bridge/br_private.h        |  3 ++-
>>>  8 files changed, 64 insertions(+), 15 deletions(-)
>>>
>>
>> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the below much
>> easier and without the checks if you use a per-mdb flag that says it's to be treated
>> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code (see the
>> attached patch based on this one for example). and continue processing it as it is processed today.
>> We'll keep the fast-path with minimal number of new conditionals.
>>
>> Something like the patch I've attached to this reply, note that it is not complete
>> just to show the intent, you'll have to re-work br_mdb_notify() to make it proper
>> and there're most probably other details I've missed. If you find even better/less
>> complex way to do it then please do.
>>
>> Cheers,
>>  Nik
> 
> Oops, I sent back your original patch. Here's the actually changed version
> I was talking about.
> 
> Thanks,
>  Nik
> 
> 
> 

The querier exists change is a hack just to get the point, I'd prefer
to re-write that portion in a better way which makes more sense, i.e.
get that check out of there since it doesn't mean that an actual querier
exists. :)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-01 14:15     ` Nikolay Aleksandrov
@ 2019-08-01 15:25       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-08-01 15:25 UTC (permalink / raw)
  To: Horatiu Vultur, idosch, andrew, allan.nielsen
  Cc: davem, roopa, petrm, tglx, fw, netdev, linux-kernel, bridge

On 01/08/2019 17:15, Nikolay Aleksandrov wrote:
> On 01/08/2019 17:11, Nikolay Aleksandrov wrote:
>> On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
>>> Hi Horatiu,
>>> Overall I think MDB is the right way, we'd like to contain the multicast code.
>>> A few comments below.
>>>
>>> On 01/08/2019 15:50, Horatiu Vultur wrote:
>> [snip]
>>>>
>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>>>> Co-developed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>>>> Signed-off-by: Allan W. Nielsen <allan.nielsen@microchip.com>
>>>> ---
>>>>  include/linux/if_bridge.h      |  1 +
>>>>  include/uapi/linux/if_bridge.h |  1 +
>>>>  net/bridge/br_device.c         |  7 +++++--
>>>>  net/bridge/br_forward.c        |  3 ++-
>>>>  net/bridge/br_input.c          | 13 ++++++++++--
>>>>  net/bridge/br_mdb.c            | 47 +++++++++++++++++++++++++++++++++++-------
>>>>  net/bridge/br_multicast.c      |  4 +++-
>>>>  net/bridge/br_private.h        |  3 ++-
>>>>  8 files changed, 64 insertions(+), 15 deletions(-)
>>>>
>>>
>>> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the below much
>>> easier and without the checks if you use a per-mdb flag that says it's to be treated
>>> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code (see the
>>> attached patch based on this one for example). and continue processing it as it is processed today.
>>> We'll keep the fast-path with minimal number of new conditionals.
>>>
>>> Something like the patch I've attached to this reply, note that it is not complete
>>> just to show the intent, you'll have to re-work br_mdb_notify() to make it proper
>>> and there're most probably other details I've missed. If you find even better/less
>>> complex way to do it then please do.
>>>
>>> Cheers,
>>>  Nik
>>
>> Oops, I sent back your original patch. Here's the actually changed version
>> I was talking about.
>>
>> Thanks,
>>  Nik
>>
>>
>>
> 
> The querier exists change is a hack just to get the point, I'd prefer
> to re-write that portion in a better way which makes more sense, i.e.
> get that check out of there since it doesn't mean that an actual querier
> exists. :)
> 

TBH, I'm inclined to just use proto == 0 *internally* as this even though it's reserved,
we're not putting it on the wire or using it to construct packets, it's just internal
use which can change into a flag if some day that value needs to be used. Obviously
to user-space we need it to be a flag, we can't expose or configure it as a proto value
without making it permanent uapi. I haven't looked into detail how feasible this is,
just a thought that might make it simpler.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-01 14:07 ` Nikolay Aleksandrov
  2019-08-01 14:11   ` Nikolay Aleksandrov
@ 2019-08-02 14:07   ` Allan W. Nielsen
  2019-08-02 14:16     ` Nikolay Aleksandrov
  1 sibling, 1 reply; 8+ messages in thread
From: Allan W. Nielsen @ 2019-08-02 14:07 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, idosch, andrew, davem, roopa, petrm, tglx, fw,
	netdev, linux-kernel, bridge

The 08/01/2019 17:07, Nikolay Aleksandrov wrote:
> > To create a group for two of the front ports the following entries can
> > be added:
> > bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> > 
> > Now the entries will be display as following:
> > dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
> > dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1
> > 
> > This requires changes to iproute2 as well, see the follogin patch for that.
> > 
> > Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
> > ports. If we have HW offload support, then the frame will be forwarded by
> > the switch, and need not to go to the CPU. In a pure SW world, the frame is
> > forwarded by the SW bridge, which will flooded it only the ports which are
> > part of the group.
> > 
> > So far so good. This is an important part of the problem we wanted to solve.
> > 
> > But, there is one drawback of this approach. If you want to add two of the
> > front ports and br0 to receive the frame then I can't see a way of doing it
> > with the bridge mdb command. To do that it requireds many more changes to
> > the existing code.
> > 
> > Example:
> > bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.
> > 
> > We believe we come a long way by re-using the facilities in MDB (thanks for
> > convincing us in doing this), but we are still not completely happy with
> > the result.
> Just add self argument for the bridge mdb command, no need to specify it twice.
Like this:
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid self

Then if I want to remove br0 rom the group, should I then have a no-self, and
then it becomes even more strange what to write in the port.

bridge mdb add dev br0 port ?? grp 01:00:00:00:00:04 permanent vid no-self
                            ^^
And, what if it is a group with only br0 (the traffic should go to br0 and
not any of the slave interfaces)?

Also, the 'self' keyword has different meanings in the 'bridge vlan' and the
'bridge fdb' commands where it refres to if the offload rule should be install
in HW - or only in the SW bridge.

The proposed does not look pretty bad, but at least it will be possible to
configured the different scenarios:

bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb del dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1

The more I look at the "bridge mdb { add | del } dev DEV port PORT" command, the
less I understand why do we have both 'dev' and 'port'? The implementation will
only allow this if 'port' has become enslaved to the switch represented by
'dev'. Anyway, what is done is done, and we need to stay backwards compatible,
but we could make it optional, and then it looks a bit less strange to allow the
port to specify a br0.

Like this:

bridge mdb { add | del } [dev DEV] port PORT grp GROUP [ permanent | temp ] [ vid VID ]

bridge mdb add port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add port br0  grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
bridge mdb del port br0  grp 01:00:00:00:00:04 permanent vid 1 // Delete it again

Alternative we could also make the port optional:

bridge mdb { add | del } dev DEV [port PORT] grp GROUP [ permanent | temp ] [ vid VID ]

bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
bridge mdb del dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Delete it again

Any preferences?

/Allan



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-02 14:07   ` Allan W. Nielsen
@ 2019-08-02 14:16     ` Nikolay Aleksandrov
  2019-08-02 14:21       ` Allan W. Nielsen
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-08-02 14:16 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, idosch, andrew, davem, roopa, petrm, tglx, fw,
	netdev, linux-kernel, bridge

On 02/08/2019 17:07, Allan W. Nielsen wrote:
> The 08/01/2019 17:07, Nikolay Aleksandrov wrote:
>>> To create a group for two of the front ports the following entries can
>>> be added:
>>> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
>>> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
>>>
>>> Now the entries will be display as following:
>>> dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
>>> dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1
>>>
>>> This requires changes to iproute2 as well, see the follogin patch for that.
>>>
>>> Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
>>> ports. If we have HW offload support, then the frame will be forwarded by
>>> the switch, and need not to go to the CPU. In a pure SW world, the frame is
>>> forwarded by the SW bridge, which will flooded it only the ports which are
>>> part of the group.
>>>
>>> So far so good. This is an important part of the problem we wanted to solve.
>>>
>>> But, there is one drawback of this approach. If you want to add two of the
>>> front ports and br0 to receive the frame then I can't see a way of doing it
>>> with the bridge mdb command. To do that it requireds many more changes to
>>> the existing code.
>>>
>>> Example:
>>> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
>>> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
>>> bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.
>>>
>>> We believe we come a long way by re-using the facilities in MDB (thanks for
>>> convincing us in doing this), but we are still not completely happy with
>>> the result.
>> Just add self argument for the bridge mdb command, no need to specify it twice.
> Like this:
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid self

What ?! No, that is not what I meant.
bridge mdb add dev br0 grp 01:00:00:00:00:04 permanent vid self
bridge mdb del dev br0 grp 01:00:00:00:00:04 permanent vid self

Similar to fdb. You don't need no-self..
I don't mind a different approach, this was just a suggestion. But please
without "no-self" :)

> 
> Then if I want to remove br0 rom the group, should I then have a no-self, and
> then it becomes even more strange what to write in the port.
> 
> bridge mdb add dev br0 port ?? grp 01:00:00:00:00:04 permanent vid no-self
>                             ^^
> And, what if it is a group with only br0 (the traffic should go to br0 and
> not any of the slave interfaces)?
> 
> Also, the 'self' keyword has different meanings in the 'bridge vlan' and the
> 'bridge fdb' commands where it refres to if the offload rule should be install
> in HW - or only in the SW bridge.

No, it shouldn't. Self means act on the device, in this case act on the bridge,
otherwise master is assumed.

> 
> The proposed does not look pretty bad, but at least it will be possible to
> configured the different scenarios:
> 
> bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb del dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1
> 

That works too, but the "port" part is redundant.

> The more I look at the "bridge mdb { add | del } dev DEV port PORT" command, the
> less I understand why do we have both 'dev' and 'port'? The implementation will
> only allow this if 'port' has become enslaved to the switch represented by
> 'dev'. Anyway, what is done is done, and we need to stay backwards compatible,
> but we could make it optional, and then it looks a bit less strange to allow the
> port to specify a br0.
> 
> Like this:
> 
> bridge mdb { add | del } [dev DEV] port PORT grp GROUP [ permanent | temp ] [ vid VID ]
> 
> bridge mdb add port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add port eth1 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add port br0  grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
> bridge mdb del port br0  grp 01:00:00:00:00:04 permanent vid 1 // Delete it again
> 

br0 is not a port, thus the "self" or just dev or whatever you choose..

> Alternative we could also make the port optional:
> 
> bridge mdb { add | del } dev DEV [port PORT] grp GROUP [ permanent | temp ] [ vid VID ]
> 
> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
> bridge mdb del dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Delete it again
> 

Right. I read this one later. :)

> Any preferences?
> 

Not really, up to you. Any of the above seem fine to me.

> /Allan
> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR
  2019-08-02 14:16     ` Nikolay Aleksandrov
@ 2019-08-02 14:21       ` Allan W. Nielsen
  0 siblings, 0 replies; 8+ messages in thread
From: Allan W. Nielsen @ 2019-08-02 14:21 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, idosch, andrew, davem, roopa, petrm, tglx, fw,
	netdev, linux-kernel, bridge

The 08/02/2019 17:16, Nikolay Aleksandrov wrote:
> On 02/08/2019 17:07, Allan W. Nielsen wrote:
> > The 08/01/2019 17:07, Nikolay Aleksandrov wrote:
> >>> To create a group for two of the front ports the following entries can
> >>> be added:
> >>> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> >>> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> >>>
> >>> Now the entries will be display as following:
> >>> dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
> >>> dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1
> >>>
> >>> This requires changes to iproute2 as well, see the follogin patch for that.
> >>>
> >>> Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
> >>> ports. If we have HW offload support, then the frame will be forwarded by
> >>> the switch, and need not to go to the CPU. In a pure SW world, the frame is
> >>> forwarded by the SW bridge, which will flooded it only the ports which are
> >>> part of the group.
> >>>
> >>> So far so good. This is an important part of the problem we wanted to solve.
> >>>
> >>> But, there is one drawback of this approach. If you want to add two of the
> >>> front ports and br0 to receive the frame then I can't see a way of doing it
> >>> with the bridge mdb command. To do that it requireds many more changes to
> >>> the existing code.
> >>>
> >>> Example:
> >>> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> >>> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> >>> bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.
> >>>
> >>> We believe we come a long way by re-using the facilities in MDB (thanks for
> >>> convincing us in doing this), but we are still not completely happy with
> >>> the result.
> >> Just add self argument for the bridge mdb command, no need to specify it twice.
> > Like this:
> > bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid self
> 
> What ?! No, that is not what I meant.
> bridge mdb add dev br0 grp 01:00:00:00:00:04 permanent vid self
> bridge mdb del dev br0 grp 01:00:00:00:00:04 permanent vid self
> 
> Similar to fdb. You don't need no-self..
> I don't mind a different approach, this was just a suggestion. But please
> without "no-self" :)

Good, then we are in sync on that one :-D

> > 
> > Then if I want to remove br0 rom the group, should I then have a no-self, and
> > then it becomes even more strange what to write in the port.
> > 
> > bridge mdb add dev br0 port ?? grp 01:00:00:00:00:04 permanent vid no-self
> >                             ^^
> > And, what if it is a group with only br0 (the traffic should go to br0 and
> > not any of the slave interfaces)?
> > 
> > Also, the 'self' keyword has different meanings in the 'bridge vlan' and the
> > 'bridge fdb' commands where it refres to if the offload rule should be install
> > in HW - or only in the SW bridge.
> 
> No, it shouldn't. Self means act on the device, in this case act on the bridge,
> otherwise master is assumed.
> 
> > 
> > The proposed does not look pretty bad, but at least it will be possible to
> > configured the different scenarios:
> > 
> > bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb del dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1
> > 
> 
> That works too, but the "port" part is redundant.
> 
> > The more I look at the "bridge mdb { add | del } dev DEV port PORT" command, the
> > less I understand why do we have both 'dev' and 'port'? The implementation will
> > only allow this if 'port' has become enslaved to the switch represented by
> > 'dev'. Anyway, what is done is done, and we need to stay backwards compatible,
> > but we could make it optional, and then it looks a bit less strange to allow the
> > port to specify a br0.
> > 
> > Like this:
> > 
> > bridge mdb { add | del } [dev DEV] port PORT grp GROUP [ permanent | temp ] [ vid VID ]
> > 
> > bridge mdb add port eth0 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add port eth1 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add port br0  grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
> > bridge mdb del port br0  grp 01:00:00:00:00:04 permanent vid 1 // Delete it again
> > 
> 
> br0 is not a port, thus the "self" or just dev or whatever you choose..
Ahh, now I understand what you meant.

> > Alternative we could also make the port optional:
> > 
> > bridge mdb { add | del } dev DEV [port PORT] grp GROUP [ permanent | temp ] [ vid VID ]
> > 
> > bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> > bridge mdb add dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
> > bridge mdb del dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Delete it again
> > 
> 
> Right. I read this one later. :)
> 
> > Any preferences?
> Not really, up to you. Any of the above seem fine to me.
Perfect, I like this one the most:

bridge mdb { add | del } dev DEV [ port PORT ] grp GROUP [ permanent | temp ] [ vid VID ]

bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Add br0 to the gruop
bridge mdb del dev br0           grp 01:00:00:00:00:04 permanent vid 1 // Delete it again

/Allan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-08-02 14:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-01 12:50 [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR Horatiu Vultur
2019-08-01 14:07 ` Nikolay Aleksandrov
2019-08-01 14:11   ` Nikolay Aleksandrov
2019-08-01 14:15     ` Nikolay Aleksandrov
2019-08-01 15:25       ` Nikolay Aleksandrov
2019-08-02 14:07   ` Allan W. Nielsen
2019-08-02 14:16     ` Nikolay Aleksandrov
2019-08-02 14:21       ` Allan W. Nielsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).