netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: roopa@cumulusnetworks.com, netdev@vger.kernel.org
Subject: Re: What is the correct way to install an L2 multicast route into a bridge?
Date: Wed, 8 Jul 2020 15:55:23 +0300	[thread overview]
Message-ID: <9621e436-b674-ea12-eabd-9908ca6d5ee8@cumulusnetworks.com> (raw)
In-Reply-To: <7a64aacf-51fd-4697-6af9-229bbbe97d0b@cumulusnetworks.com>

[-- Attachment #1: Type: text/plain, Size: 2474 bytes --]

On 08/07/2020 14:17, Nikolay Aleksandrov wrote:
> On 08/07/2020 14:07, Nikolay Aleksandrov wrote:
>> On 08/07/2020 12:42, Vladimir Oltean wrote:
>>> On Wed, Jul 08, 2020 at 12:16:27PM +0300, Nikolay Aleksandrov wrote:
>>>> On 08/07/2020 12:04, Vladimir Oltean wrote:
[snip]
>>>>
>>>
>>> Thanks, Nikolay.
>>> Isn't mdb_modify() already netlink-based? I think you're talking about
>>> some changes to 'struct br_mdb_entry' which would be necessary. What
>>> changes would be needed, do you know (both in the 'workaround' case as
>>> well as in 'fully netlink')?
>>>
>>> -Vladimir
>>>
>>
>> That is netlink-based, but the uAPI (used also for add/del/dump) uses a fixed-size struct
>> which is very inconvenient and hard to extend. I plan to add MDBv2 which uses separate
>> netlink attributes and can be easily extended as we plan to add some new features and will
>> need that flexibility. It will use a new container attribute for the notifications as well.
>>
>> In the workaround case IIRC you'd have to add a new protocol type to denote the L2 routes, and
> 
> Actually drop the whole /workaround/ comment altogether. It can be implemented fairly straight-forward
> even with the struct we got now. You don't need any new attributes.
> I just had forgotten the details and spoke too quickly. :)
> 
>> re-work the lookup logic to include L2 in non-IP case. You'd have to edit the multicast fast-path,
>> and everything else that assumes the frame has to be IP/IPv6. I'm sure I'm missing some details as
>> last I did this was over an year ago where I made a quick and dirty hack that implemented it with proto = 0
>> to denote an L2 entry just as a proof of concept.
>> Also you would have to make sure all of that is compatible with current user-space code. For example
>> iproute2/bridge/mdb.c considers that proto can be only IPv4 or IPv6 if it's not v4, i.e. it will
>> print the new L2 entries as :: IPv6 entries until it's fixed.
>>
>> Obviously some of the items for the workaround case are valid in all cases for L2 routes (e.g. fast-path/lookup edit).
>> But I think it's not that hard to implement without affecting the fast path much or even at all.
>>
>> Cheers,
>>  Nik
>>

I found the patch and rebased it against net-next. I want to stress that it is unfinished and
barely tested, it was just a hack to enable L2 entries and forwarding.
If you're interested and find it useful please feel free to take it over as
I don't have time right now.

Thanks,
 Nik



[-- Attachment #2: 0001-net-bridge-multicast-add-support-for-L2-entries.patch --]
[-- Type: text/x-patch, Size: 9150 bytes --]

From 87ad93d5953932eb14739572743d049de1ba8b1d Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Wed, 8 Jul 2020 15:49:26 +0300
Subject: [RFC] net: bridge: multicast: add support for L2 entries

Unfinished.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 include/linux/if_bridge.h      |  1 +
 include/uapi/linux/if_bridge.h |  2 ++
 net/bridge/br_device.c         |  2 +-
 net/bridge/br_input.c          |  2 +-
 net/bridge/br_mdb.c            | 35 ++++++++++++++++++++++++++--------
 net/bridge/br_multicast.c      | 12 +++++++++---
 net/bridge/br_private.h        |  7 +++++--
 7 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3a8d3054af0..4c95eff22c8e 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -20,6 +20,7 @@ struct br_ip {
 		struct in6_addr ip6;
 #endif
 	} u;
+	__u8 mac_addr[ETH_ALEN];
 	__be16		proto;
 	__u16           vid;
 };
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index c114c1c2bd53..73e129370662 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -437,10 +437,12 @@ struct br_mdb_entry {
 	__u8 state;
 #define MDB_FLAGS_OFFLOAD	(1 << 0)
 #define MDB_FLAGS_FAST_LEAVE	(1 << 1)
+#define MDB_FLAGS_L2		(1 << 2)
 	__u8 flags;
 	__u16 vid;
 	struct {
 		union {
+			__u8 mac_addr[ETH_ALEN];
 			__be32	ip4;
 			struct in6_addr ip6;
 		} u;
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 8c7b78f8bc23..b32421f484f3 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -91,7 +91,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb)))
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 59a318b9f646..d31b5c18c6a1 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -134,7 +134,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb))) {
+		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index da5ed4cf9233..c80218a5e6fa 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -62,6 +62,8 @@ static void __mdb_entry_fill_flags(struct br_mdb_entry *e, unsigned char flags)
 		e->flags |= MDB_FLAGS_OFFLOAD;
 	if (flags & MDB_PG_FLAGS_FAST_LEAVE)
 		e->flags |= MDB_FLAGS_FAST_LEAVE;
+	if (flags & MDB_PG_FLAGS_L2)
+		e->flags |= MDB_FLAGS_L2;
 }
 
 static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
@@ -72,9 +74,11 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip)
 	if (ip->proto == htons(ETH_P_IP))
 		ip->u.ip4 = entry->addr.u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	else if (ip->proto == htons(ETH_P_IPV6))
 		ip->u.ip6 = entry->addr.u.ip6;
 #endif
+	else
+		memcpy(ip->mac_addr, entry->addr.u.mac_addr, ETH_ALEN);
 }
 
 static int __mdb_fill_info(struct sk_buff *skb,
@@ -103,9 +107,12 @@ static int __mdb_fill_info(struct sk_buff *skb,
 	if (mp->addr.proto == htons(ETH_P_IP))
 		e.addr.u.ip4 = mp->addr.u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
-	if (mp->addr.proto == htons(ETH_P_IPV6))
+	else if (mp->addr.proto == htons(ETH_P_IPV6))
 		e.addr.u.ip6 = mp->addr.u.ip6;
 #endif
+	else
+		memcpy(e.addr.u.mac_addr, mp->addr.mac_addr, ETH_ALEN);
+
 	e.addr.proto = mp->addr.proto;
 	nest_ent = nla_nest_start_noflag(skb,
 					 MDBA_MDB_ENTRY_INFO);
@@ -399,9 +406,11 @@ static void __br_mdb_notify(struct net_device *dev, struct net_bridge_port *p,
 	if (entry->addr.proto == htons(ETH_P_IP))
 		ip_eth_mc_map(entry->addr.u.ip4, mdb.addr);
 #if IS_ENABLED(CONFIG_IPV6)
-	else
+	else if (entry->addr.proto == htons(ETH_P_IPV6))
 		ipv6_eth_mc_map(&entry->addr.u.ip6, mdb.addr);
 #endif
+	else
+		memcpy(mdb.addr, entry->addr.u.mac_addr, ETH_ALEN);
 
 	mdb.obj.orig_dev = port_dev;
 	if (p && port_dev && type == RTM_NEWMDB) {
@@ -447,6 +456,7 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 		entry.ifindex = port->dev->ifindex;
 	else
 		entry.ifindex = dev->ifindex;
+	memcpy(entry.addr.u.mac_addr, group->mac_addr, ETH_ALEN);
 	entry.addr.proto = group->proto;
 	entry.addr.u.ip4 = group->u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
@@ -529,18 +539,24 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
 	if (entry->ifindex == 0)
 		return false;
 
-	if (entry->addr.proto == htons(ETH_P_IP)) {
+	switch (entry->addr.proto) {
+	case htons(ETH_P_IP):
 		if (!ipv4_is_multicast(entry->addr.u.ip4))
 			return false;
 		if (ipv4_is_local_multicast(entry->addr.u.ip4))
 			return false;
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	} else if (entry->addr.proto == htons(ETH_P_IPV6)) {
+	case htons(ETH_P_IPV6):
 		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
 			return false;
+		break;
 #endif
-	} else
-		return false;
+	default:
+		if (entry->addr.proto != 0)
+			return false;
+	}
+
 	if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY)
 		return false;
 	if (entry->vid >= VLAN_VID_MASK)
@@ -616,6 +632,9 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 			return err;
 	}
 
+	if (mp->l2 && state != MDB_PERMANENT)
+		return -EINVAL;
+
 	/* host join */
 	if (!port) {
 		/* don't allow any flags for host-joined groups */
@@ -642,7 +661,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 	if (unlikely(!p))
 		return -ENOMEM;
 	rcu_assign_pointer(*pp, p);
-	if (state == MDB_TEMPORARY)
+	if (state == MDB_TEMPORARY && !mp->l2)
 		mod_timer(&p->timer, now + br->multicast_membership_interval);
 
 	return 0;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 83490bf73a13..79af5b656df1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -133,7 +133,8 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 		break;
 #endif
 	default:
-		return NULL;
+		ip.proto = 0;
+		memcpy(&ip.mac_addr, eth_hdr(skb)->h_dest, ETH_ALEN);
 	}
 
 	return br_mdb_ip_get_rcu(br, &ip);
@@ -457,6 +458,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 
 	mp->br = br;
 	mp->addr = *group;
+	mp->l2 = !!(group->proto == 0);
 	timer_setup(&mp->timer, br_multicast_group_expired, 0);
 	err = rhashtable_lookup_insert_fast(&br->mdb_hash_tbl, &mp->rhnode,
 					    br_mdb_rht_params);
@@ -486,6 +488,8 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	p->addr = *group;
 	p->port = port;
 	p->flags = flags;
+	if (group->proto == htons(0))
+		p->flags |= MDB_PG_FLAGS_L2;
 	rcu_assign_pointer(p->next, next);
 	hlist_add_head(&p->mglist, &port->mglist);
 	timer_setup(&p->timer, br_multicast_port_group_expired, 0);
@@ -519,7 +523,9 @@ void br_multicast_host_join(struct net_bridge_mdb_entry *mp, bool notify)
 			br_mdb_notify(mp->br->dev, NULL, &mp->addr,
 				      RTM_NEWMDB, 0);
 	}
-	mod_timer(&mp->timer, jiffies + mp->br->multicast_membership_interval);
+	if (!mp->l2)
+		mod_timer(&mp->timer,
+			  jiffies + mp->br->multicast_membership_interval);
 }
 
 void br_multicast_host_leave(struct net_bridge_mdb_entry *mp, bool notify)
@@ -2252,7 +2258,7 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 	memset(&eth, 0, sizeof(eth));
 	eth.h_proto = htons(proto);
 
-	ret = br_multicast_querier_exists(br, &eth);
+	ret = br_multicast_querier_exists(br, &eth, NULL);
 
 unlock:
 	rcu_read_unlock();
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 65d2c163a24a..ea0a13d9fbea 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -213,6 +213,7 @@ struct net_bridge_fdb_entry {
 #define MDB_PG_FLAGS_PERMANENT	BIT(0)
 #define MDB_PG_FLAGS_OFFLOAD	BIT(1)
 #define MDB_PG_FLAGS_FAST_LEAVE	BIT(2)
+#define MDB_PG_FLAGS_L2		BIT(3)
 
 struct net_bridge_port_group {
 	struct net_bridge_port		*port;
@@ -233,6 +234,7 @@ struct net_bridge_mdb_entry {
 	struct timer_list		timer;
 	struct br_ip			addr;
 	bool				host_joined;
+	bool				l2;
 	struct hlist_node		mdb_node;
 };
 
@@ -816,7 +818,8 @@ __br_multicast_querier_exists(struct net_bridge *br,
 }
 
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
-					       struct ethhdr *eth)
+					       struct ethhdr *eth,
+					       const struct net_bridge_mdb_entry *mdb)
 {
 	switch (eth->h_proto) {
 	case (htons(ETH_P_IP)):
@@ -828,7 +831,7 @@ static inline bool br_multicast_querier_exists(struct net_bridge *br,
 			&br->ip6_other_query, true);
 #endif
 	default:
-		return false;
+		return !!(mdb && mdb->l2);
 	}
 }
 
-- 
2.25.4


  reply	other threads:[~2020-07-08 12:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-08  9:04 What is the correct way to install an L2 multicast route into a bridge? Vladimir Oltean
2020-07-08  9:16 ` Nikolay Aleksandrov
2020-07-08  9:42   ` Vladimir Oltean
2020-07-08 11:07     ` Nikolay Aleksandrov
2020-07-08 11:17       ` Nikolay Aleksandrov
2020-07-08 12:55         ` Nikolay Aleksandrov [this message]
2020-07-08 13:10           ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9621e436-b674-ea12-eabd-9908ca6d5ee8@cumulusnetworks.com \
    --to=nikolay@cumulusnetworks.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=roopa@cumulusnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).