linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: bridge: Allow bridge to joing multicast groups
@ 2019-07-25 11:44 Horatiu Vultur
  2019-07-25 13:06 ` Nikolay Aleksandrov
  2019-08-01 19:17 ` Vivien Didelot
  0 siblings, 2 replies; 43+ messages in thread
From: Horatiu Vultur @ 2019-07-25 11:44 UTC (permalink / raw)
  To: roopa, nikolay, davem, bridge, netdev, linux-kernel, allan.nielsen
  Cc: Horatiu Vultur

There is no way to configure the bridge, to receive only specific link
layer multicast addresses. From the description of the command 'bridge
fdb append' is supposed to do that, but there was no way to notify the
network driver that the bridge joined a group, because LLADDR was added
to the unicast netdev_hw_addr_list.

Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
and if the source is NULL, which represent the bridge itself. Then add
address to multicast netdev_hw_addr_list for each bridge interfaces.
And then the .ndo_set_rx_mode function on the driver is called. To notify
the driver that the list of multicast mac addresses changed.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b1d3248..d93746d 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
 	}
 }
 
+static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
+{
+	int err;
+	struct net_bridge_port *p;
+
+	ASSERT_RTNL();
+
+	list_for_each_entry(p, &br->port_list, list) {
+		if (!br_promisc_port(p)) {
+			err = dev_mc_add(p->dev, addr);
+			if (err)
+				goto undo;
+		}
+	}
+
+	return;
+undo:
+	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
+		if (!br_promisc_port(p))
+			dev_mc_del(p->dev, addr);
+	}
+}
+
 /* When a static FDB entry is deleted, the HW address from that entry is
  * also removed from the bridge private HW address list and updates all
  * the ports with needed information.
@@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 	}
 }
 
+static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
+{
+	struct net_bridge_port *p;
+
+	ASSERT_RTNL();
+
+	list_for_each_entry(p, &br->port_list, list) {
+		if (!br_promisc_port(p))
+			dev_mc_del(p->dev, addr);
+	}
+}
+
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
 		       bool swdev_notify)
 {
 	trace_fdb_delete(br, f);
 
-	if (f->is_static)
+	if (f->is_static) {
 		fdb_del_hw_addr(br, f->key.addr.addr);
+		fdb_del_hw_maddr(br, f->key.addr.addr);
+	}
 
 	hlist_del_init_rcu(&f->fdb_node);
 	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
@@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 			fdb->is_local = 1;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
-				fdb_add_hw_addr(br, addr);
+				if (flags & NLM_F_APPEND && !source)
+					fdb_add_hw_maddr(br, addr);
+				else
+					fdb_add_hw_addr(br, addr);
 			}
 		} else if (state & NUD_NOARP) {
 			fdb->is_local = 0;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
-				fdb_add_hw_addr(br, addr);
+				if (flags & NLM_F_APPEND && !source)
+					fdb_add_hw_maddr(br, addr);
+				else
+					fdb_add_hw_addr(br, addr);
 			}
 		} else {
 			fdb->is_local = 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-25 11:44 [PATCH] net: bridge: Allow bridge to joing multicast groups Horatiu Vultur
@ 2019-07-25 13:06 ` Nikolay Aleksandrov
  2019-07-25 13:21   ` Nikolay Aleksandrov
  2019-08-01 19:17 ` Vivien Didelot
  1 sibling, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-25 13:06 UTC (permalink / raw)
  To: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel,
	allan.nielsen

On 25/07/2019 14:44, Horatiu Vultur wrote:
> There is no way to configure the bridge, to receive only specific link
> layer multicast addresses. From the description of the command 'bridge
> fdb append' is supposed to do that, but there was no way to notify the
> network driver that the bridge joined a group, because LLADDR was added
> to the unicast netdev_hw_addr_list.
> 
> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> and if the source is NULL, which represent the bridge itself. Then add
> address to multicast netdev_hw_addr_list for each bridge interfaces.
> And then the .ndo_set_rx_mode function on the driver is called. To notify
> the driver that the list of multicast mac addresses changed.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
> 

Hi,
I'm sorry but this patch is wrong on many levels, some notes below. In general
NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
MDB is used for multicast handling, but both of these are used for forwarding.
The reason the static fdbs are added to the filter is for non-promisc ports, so they can
receive traffic destined for these FDBs for forwarding.
If you'd like to join any multicast group please use the standard way, if you'd like to join
it only on a specific port - join it only on that port (or ports) and the bridge and you'll
have the effect that you're describing. What do you mean there's no way ?

In addition you're allowing a mix of mcast functions to be called with unicast addresses
and vice versa, it is not that big of a deal because the kernel will simply return an error
but still makes no sense.

Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b1d3248..d93746d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  	}
>  }
>  
> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> +	int err;
> +	struct net_bridge_port *p;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		if (!br_promisc_port(p)) {
> +			err = dev_mc_add(p->dev, addr);
> +			if (err)
> +				goto undo;
> +		}
> +	}
> +
> +	return;
> +undo:
> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> +		if (!br_promisc_port(p))
> +			dev_mc_del(p->dev, addr);
> +	}
> +}
> +
>  /* When a static FDB entry is deleted, the HW address from that entry is
>   * also removed from the bridge private HW address list and updates all
>   * the ports with needed information.
> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  	}
>  }
>  
> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> +	struct net_bridge_port *p;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		if (!br_promisc_port(p))
> +			dev_mc_del(p->dev, addr);
> +	}
> +}
> +
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  		       bool swdev_notify)
>  {
>  	trace_fdb_delete(br, f);
>  
> -	if (f->is_static)
> +	if (f->is_static) {
>  		fdb_del_hw_addr(br, f->key.addr.addr);
> +		fdb_del_hw_maddr(br, f->key.addr.addr);

Walking over all ports again for each static delete is a no-go.

> +	}
>  
>  	hlist_del_init_rcu(&f->fdb_node);
>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>  			fdb->is_local = 1;
>  			if (!fdb->is_static) {
>  				fdb->is_static = 1;
> -				fdb_add_hw_addr(br, addr);
> +				if (flags & NLM_F_APPEND && !source)
> +					fdb_add_hw_maddr(br, addr);
> +				else
> +					fdb_add_hw_addr(br, addr);
>  			}
>  		} else if (state & NUD_NOARP) {
>  			fdb->is_local = 0;
>  			if (!fdb->is_static) {
>  				fdb->is_static = 1;
> -				fdb_add_hw_addr(br, addr);
> +				if (flags & NLM_F_APPEND && !source)
> +					fdb_add_hw_maddr(br, addr);
> +				else
> +					fdb_add_hw_addr(br, addr);
>  			}
>  		} else {
>  			fdb->is_local = 0;
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-25 13:06 ` Nikolay Aleksandrov
@ 2019-07-25 13:21   ` Nikolay Aleksandrov
  2019-07-25 14:21     ` Horatiu Vultur
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-25 13:21 UTC (permalink / raw)
  To: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel,
	allan.nielsen

On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> On 25/07/2019 14:44, Horatiu Vultur wrote:
>> There is no way to configure the bridge, to receive only specific link
>> layer multicast addresses. From the description of the command 'bridge
>> fdb append' is supposed to do that, but there was no way to notify the
>> network driver that the bridge joined a group, because LLADDR was added
>> to the unicast netdev_hw_addr_list.
>>
>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
>> and if the source is NULL, which represent the bridge itself. Then add
>> address to multicast netdev_hw_addr_list for each bridge interfaces.
>> And then the .ndo_set_rx_mode function on the driver is called. To notify
>> the driver that the list of multicast mac addresses changed.
>>
>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>> ---
>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 46 insertions(+), 3 deletions(-)
>>
> 
> Hi,
> I'm sorry but this patch is wrong on many levels, some notes below. In general
> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
> MDB is used for multicast handling, but both of these are used for forwarding.
> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
> receive traffic destined for these FDBs for forwarding.
> If you'd like to join any multicast group please use the standard way, if you'd like to join
> it only on a specific port - join it only on that port (or ports) and the bridge and you'll

And obviously this is for the case where you're not enabling port promisc mode (non-default).
In general you'll only need to join the group on the bridge to receive traffic for it
or add it as an mdb entry to forward it.

> have the effect that you're describing. What do you mean there's no way ?
> 
> In addition you're allowing a mix of mcast functions to be called with unicast addresses
> and vice versa, it is not that big of a deal because the kernel will simply return an error
> but still makes no sense.
> 
> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index b1d3248..d93746d 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>  	}
>>  }
>>  
>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>> +{
>> +	int err;
>> +	struct net_bridge_port *p;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	list_for_each_entry(p, &br->port_list, list) {
>> +		if (!br_promisc_port(p)) {
>> +			err = dev_mc_add(p->dev, addr);
>> +			if (err)
>> +				goto undo;
>> +		}
>> +	}
>> +
>> +	return;
>> +undo:
>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
>> +		if (!br_promisc_port(p))
>> +			dev_mc_del(p->dev, addr);
>> +	}
>> +}
>> +
>>  /* When a static FDB entry is deleted, the HW address from that entry is
>>   * also removed from the bridge private HW address list and updates all
>>   * the ports with needed information.
>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>  	}
>>  }
>>  
>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>> +{
>> +	struct net_bridge_port *p;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	list_for_each_entry(p, &br->port_list, list) {
>> +		if (!br_promisc_port(p))
>> +			dev_mc_del(p->dev, addr);
>> +	}
>> +}
>> +
>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>  		       bool swdev_notify)
>>  {
>>  	trace_fdb_delete(br, f);
>>  
>> -	if (f->is_static)
>> +	if (f->is_static) {
>>  		fdb_del_hw_addr(br, f->key.addr.addr);
>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
> 
> Walking over all ports again for each static delete is a no-go.
> 
>> +	}
>>  
>>  	hlist_del_init_rcu(&f->fdb_node);
>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>>  			fdb->is_local = 1;
>>  			if (!fdb->is_static) {
>>  				fdb->is_static = 1;
>> -				fdb_add_hw_addr(br, addr);
>> +				if (flags & NLM_F_APPEND && !source)
>> +					fdb_add_hw_maddr(br, addr);
>> +				else
>> +					fdb_add_hw_addr(br, addr);
>>  			}
>>  		} else if (state & NUD_NOARP) {
>>  			fdb->is_local = 0;
>>  			if (!fdb->is_static) {
>>  				fdb->is_static = 1;
>> -				fdb_add_hw_addr(br, addr);
>> +				if (flags & NLM_F_APPEND && !source)
>> +					fdb_add_hw_maddr(br, addr);
>> +				else
>> +					fdb_add_hw_addr(br, addr);
>>  			}
>>  		} else {
>>  			fdb->is_local = 0;
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-25 13:21   ` Nikolay Aleksandrov
@ 2019-07-25 14:21     ` Horatiu Vultur
  2019-07-26  8:41       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 43+ messages in thread
From: Horatiu Vultur @ 2019-07-25 14:21 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: roopa, davem, bridge, netdev, linux-kernel, allan.nielsen

Hi Nikolay,

The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
> External E-Mail
> 
> 
> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> > On 25/07/2019 14:44, Horatiu Vultur wrote:
> >> There is no way to configure the bridge, to receive only specific link
> >> layer multicast addresses. From the description of the command 'bridge
> >> fdb append' is supposed to do that, but there was no way to notify the
> >> network driver that the bridge joined a group, because LLADDR was added
> >> to the unicast netdev_hw_addr_list.
> >>
> >> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> >> and if the source is NULL, which represent the bridge itself. Then add
> >> address to multicast netdev_hw_addr_list for each bridge interfaces.
> >> And then the .ndo_set_rx_mode function on the driver is called. To notify
> >> the driver that the list of multicast mac addresses changed.
> >>
> >> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> >> ---
> >>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 46 insertions(+), 3 deletions(-)
> >>
> > 
> > Hi,
> > I'm sorry but this patch is wrong on many levels, some notes below. In general
> > NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
> > FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
> > MDB is used for multicast handling, but both of these are used for forwarding.
> > The reason the static fdbs are added to the filter is for non-promisc ports, so they can
> > receive traffic destined for these FDBs for forwarding.
> > If you'd like to join any multicast group please use the standard way, if you'd like to join
> > it only on a specific port - join it only on that port (or ports) and the bridge and you'll
> 
> And obviously this is for the case where you're not enabling port promisc mode (non-default).
> In general you'll only need to join the group on the bridge to receive traffic for it
> or add it as an mdb entry to forward it.
> 
> > have the effect that you're describing. What do you mean there's no way ?

Thanks for the explanation.
There are few things that are not 100% clear to me and maybe you can
explain them, not to go totally in the wrong direction. Currently I am
writing a network driver on which I added switchdev support. Then I was
looking for a way to configure the network driver to copy link layer
multicast address to the CPU port.

If I am using bridge mdb I can do it only for IP multicast addreses,
but how should I do it if I want non IP frames with link layer multicast
address to be copy to CPU? For example: all frames with multicast
address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
command for that?

> > 
> > In addition you're allowing a mix of mcast functions to be called with unicast addresses
> > and vice versa, it is not that big of a deal because the kernel will simply return an error
> > but still makes no sense.
> > 
> > Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> > 
> >> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> >> index b1d3248..d93746d 100644
> >> --- a/net/bridge/br_fdb.c
> >> +++ b/net/bridge/br_fdb.c
> >> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
> >>  	}
> >>  }
> >>  
> >> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> >> +{
> >> +	int err;
> >> +	struct net_bridge_port *p;
> >> +
> >> +	ASSERT_RTNL();
> >> +
> >> +	list_for_each_entry(p, &br->port_list, list) {
> >> +		if (!br_promisc_port(p)) {
> >> +			err = dev_mc_add(p->dev, addr);
> >> +			if (err)
> >> +				goto undo;
> >> +		}
> >> +	}
> >> +
> >> +	return;
> >> +undo:
> >> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> >> +		if (!br_promisc_port(p))
> >> +			dev_mc_del(p->dev, addr);
> >> +	}
> >> +}
> >> +
> >>  /* When a static FDB entry is deleted, the HW address from that entry is
> >>   * also removed from the bridge private HW address list and updates all
> >>   * the ports with needed information.
> >> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
> >>  	}
> >>  }
> >>  
> >> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> >> +{
> >> +	struct net_bridge_port *p;
> >> +
> >> +	ASSERT_RTNL();
> >> +
> >> +	list_for_each_entry(p, &br->port_list, list) {
> >> +		if (!br_promisc_port(p))
> >> +			dev_mc_del(p->dev, addr);
> >> +	}
> >> +}
> >> +
> >>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
> >>  		       bool swdev_notify)
> >>  {
> >>  	trace_fdb_delete(br, f);
> >>  
> >> -	if (f->is_static)
> >> +	if (f->is_static) {
> >>  		fdb_del_hw_addr(br, f->key.addr.addr);
> >> +		fdb_del_hw_maddr(br, f->key.addr.addr);
> > 
> > Walking over all ports again for each static delete is a no-go.
> > 
> >> +	}
> >>  
> >>  	hlist_del_init_rcu(&f->fdb_node);
> >>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> >> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
> >>  			fdb->is_local = 1;
> >>  			if (!fdb->is_static) {
> >>  				fdb->is_static = 1;
> >> -				fdb_add_hw_addr(br, addr);
> >> +				if (flags & NLM_F_APPEND && !source)
> >> +					fdb_add_hw_maddr(br, addr);
> >> +				else
> >> +					fdb_add_hw_addr(br, addr);
> >>  			}
> >>  		} else if (state & NUD_NOARP) {
> >>  			fdb->is_local = 0;
> >>  			if (!fdb->is_static) {
> >>  				fdb->is_static = 1;
> >> -				fdb_add_hw_addr(br, addr);
> >> +				if (flags & NLM_F_APPEND && !source)
> >> +					fdb_add_hw_maddr(br, addr);
> >> +				else
> >> +					fdb_add_hw_addr(br, addr);
> >>  			}
> >>  		} else {
> >>  			fdb->is_local = 0;
> >>
> > 
> 

-- 
/Horatiu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-25 14:21     ` Horatiu Vultur
@ 2019-07-26  8:41       ` Nikolay Aleksandrov
  2019-07-26  9:26         ` Nikolay Aleksandrov
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-26  8:41 UTC (permalink / raw)
  To: Horatiu Vultur; +Cc: roopa, davem, bridge, netdev, linux-kernel, allan.nielsen

On 25/07/2019 17:21, Horatiu Vultur wrote:
> Hi Nikolay,
> 
> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
>> External E-Mail
>>
>>
>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
>>>> There is no way to configure the bridge, to receive only specific link
>>>> layer multicast addresses. From the description of the command 'bridge
>>>> fdb append' is supposed to do that, but there was no way to notify the
>>>> network driver that the bridge joined a group, because LLADDR was added
>>>> to the unicast netdev_hw_addr_list.
>>>>
>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
>>>> and if the source is NULL, which represent the bridge itself. Then add
>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
>>>> the driver that the list of multicast mac addresses changed.
>>>>
>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>>>> ---
>>>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 46 insertions(+), 3 deletions(-)
>>>>
>>>
>>> Hi,
>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
>>> MDB is used for multicast handling, but both of these are used for forwarding.
>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
>>> receive traffic destined for these FDBs for forwarding.
>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
>>
>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
>> In general you'll only need to join the group on the bridge to receive traffic for it
>> or add it as an mdb entry to forward it.
>>
>>> have the effect that you're describing. What do you mean there's no way ?
> 
> Thanks for the explanation.
> There are few things that are not 100% clear to me and maybe you can
> explain them, not to go totally in the wrong direction. Currently I am
> writing a network driver on which I added switchdev support. Then I was
> looking for a way to configure the network driver to copy link layer
> multicast address to the CPU port.
> 
> If I am using bridge mdb I can do it only for IP multicast addreses,
> but how should I do it if I want non IP frames with link layer multicast
> address to be copy to CPU? For example: all frames with multicast
> address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
> command for that?
> 

Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
which needs to receive it and the bridge will send it up automatically since
it's unknown mcast (note that if there's a querier, you'll have to make the
bridge mcast router if it is not the querier itself). It would also flood it to all
other ports so you may want to control that. It really depends on the setup
and the how the hardware is configured.

>>>
>>> In addition you're allowing a mix of mcast functions to be called with unicast addresses
>>> and vice versa, it is not that big of a deal because the kernel will simply return an error
>>> but still makes no sense.
>>>
>>> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>
>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>> index b1d3248..d93746d 100644
>>>> --- a/net/bridge/br_fdb.c
>>>> +++ b/net/bridge/br_fdb.c
>>>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>  	}
>>>>  }
>>>>  
>>>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>> +{
>>>> +	int err;
>>>> +	struct net_bridge_port *p;
>>>> +
>>>> +	ASSERT_RTNL();
>>>> +
>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>> +		if (!br_promisc_port(p)) {
>>>> +			err = dev_mc_add(p->dev, addr);
>>>> +			if (err)
>>>> +				goto undo;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return;
>>>> +undo:
>>>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
>>>> +		if (!br_promisc_port(p))
>>>> +			dev_mc_del(p->dev, addr);
>>>> +	}
>>>> +}
>>>> +
>>>>  /* When a static FDB entry is deleted, the HW address from that entry is
>>>>   * also removed from the bridge private HW address list and updates all
>>>>   * the ports with needed information.
>>>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>  	}
>>>>  }
>>>>  
>>>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>> +{
>>>> +	struct net_bridge_port *p;
>>>> +
>>>> +	ASSERT_RTNL();
>>>> +
>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>> +		if (!br_promisc_port(p))
>>>> +			dev_mc_del(p->dev, addr);
>>>> +	}
>>>> +}
>>>> +
>>>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>>  		       bool swdev_notify)
>>>>  {
>>>>  	trace_fdb_delete(br, f);
>>>>  
>>>> -	if (f->is_static)
>>>> +	if (f->is_static) {
>>>>  		fdb_del_hw_addr(br, f->key.addr.addr);
>>>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
>>>
>>> Walking over all ports again for each static delete is a no-go.
>>>
>>>> +	}
>>>>  
>>>>  	hlist_del_init_rcu(&f->fdb_node);
>>>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>>>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>>>>  			fdb->is_local = 1;
>>>>  			if (!fdb->is_static) {
>>>>  				fdb->is_static = 1;
>>>> -				fdb_add_hw_addr(br, addr);
>>>> +				if (flags & NLM_F_APPEND && !source)
>>>> +					fdb_add_hw_maddr(br, addr);
>>>> +				else
>>>> +					fdb_add_hw_addr(br, addr);
>>>>  			}
>>>>  		} else if (state & NUD_NOARP) {
>>>>  			fdb->is_local = 0;
>>>>  			if (!fdb->is_static) {
>>>>  				fdb->is_static = 1;
>>>> -				fdb_add_hw_addr(br, addr);
>>>> +				if (flags & NLM_F_APPEND && !source)
>>>> +					fdb_add_hw_maddr(br, addr);
>>>> +				else
>>>> +					fdb_add_hw_addr(br, addr);
>>>>  			}
>>>>  		} else {
>>>>  			fdb->is_local = 0;
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26  8:41       ` Nikolay Aleksandrov
@ 2019-07-26  9:26         ` Nikolay Aleksandrov
  2019-07-26 12:02           ` Horatiu Vultur
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-26  9:26 UTC (permalink / raw)
  To: Horatiu Vultur; +Cc: roopa, davem, bridge, netdev, linux-kernel, allan.nielsen

On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
> On 25/07/2019 17:21, Horatiu Vultur wrote:
>> Hi Nikolay,
>>
>> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
>>> External E-Mail
>>>
>>>
>>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
>>>>> There is no way to configure the bridge, to receive only specific link
>>>>> layer multicast addresses. From the description of the command 'bridge
>>>>> fdb append' is supposed to do that, but there was no way to notify the
>>>>> network driver that the bridge joined a group, because LLADDR was added
>>>>> to the unicast netdev_hw_addr_list.
>>>>>
>>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
>>>>> and if the source is NULL, which represent the bridge itself. Then add
>>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
>>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
>>>>> the driver that the list of multicast mac addresses changed.
>>>>>
>>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>>>>> ---
>>>>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>  1 file changed, 46 insertions(+), 3 deletions(-)
>>>>>
>>>>
>>>> Hi,
>>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
>>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
>>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
>>>> MDB is used for multicast handling, but both of these are used for forwarding.
>>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
>>>> receive traffic destined for these FDBs for forwarding.
>>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
>>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
>>>
>>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
>>> In general you'll only need to join the group on the bridge to receive traffic for it
>>> or add it as an mdb entry to forward it.
>>>
>>>> have the effect that you're describing. What do you mean there's no way ?
>>
>> Thanks for the explanation.
>> There are few things that are not 100% clear to me and maybe you can
>> explain them, not to go totally in the wrong direction. Currently I am
>> writing a network driver on which I added switchdev support. Then I was
>> looking for a way to configure the network driver to copy link layer
>> multicast address to the CPU port.
>>
>> If I am using bridge mdb I can do it only for IP multicast addreses,
>> but how should I do it if I want non IP frames with link layer multicast
>> address to be copy to CPU? For example: all frames with multicast
>> address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
>> command for that?
>>
> 
> Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
> which needs to receive it and the bridge will send it up automatically since
> it's unknown mcast (note that if there's a querier, you'll have to make the
> bridge mcast router if it is not the querier itself). It would also flood it to all

Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
traffic will always be flooded by the bridge (and also a copy will be locally sent up).
Thus only the flooding may need to be controlled.

> other ports so you may want to control that. It really depends on the setup
> and the how the hardware is configured.
> 
>>>>
>>>> In addition you're allowing a mix of mcast functions to be called with unicast addresses
>>>> and vice versa, it is not that big of a deal because the kernel will simply return an error
>>>> but still makes no sense.
>>>>
>>>> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>
>>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>>> index b1d3248..d93746d 100644
>>>>> --- a/net/bridge/br_fdb.c
>>>>> +++ b/net/bridge/br_fdb.c
>>>>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>> +{
>>>>> +	int err;
>>>>> +	struct net_bridge_port *p;
>>>>> +
>>>>> +	ASSERT_RTNL();
>>>>> +
>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>> +		if (!br_promisc_port(p)) {
>>>>> +			err = dev_mc_add(p->dev, addr);
>>>>> +			if (err)
>>>>> +				goto undo;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	return;
>>>>> +undo:
>>>>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
>>>>> +		if (!br_promisc_port(p))
>>>>> +			dev_mc_del(p->dev, addr);
>>>>> +	}
>>>>> +}
>>>>> +
>>>>>  /* When a static FDB entry is deleted, the HW address from that entry is
>>>>>   * also removed from the bridge private HW address list and updates all
>>>>>   * the ports with needed information.
>>>>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>> +{
>>>>> +	struct net_bridge_port *p;
>>>>> +
>>>>> +	ASSERT_RTNL();
>>>>> +
>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>> +		if (!br_promisc_port(p))
>>>>> +			dev_mc_del(p->dev, addr);
>>>>> +	}
>>>>> +}
>>>>> +
>>>>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>>>  		       bool swdev_notify)
>>>>>  {
>>>>>  	trace_fdb_delete(br, f);
>>>>>  
>>>>> -	if (f->is_static)
>>>>> +	if (f->is_static) {
>>>>>  		fdb_del_hw_addr(br, f->key.addr.addr);
>>>>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
>>>>
>>>> Walking over all ports again for each static delete is a no-go.
>>>>
>>>>> +	}
>>>>>  
>>>>>  	hlist_del_init_rcu(&f->fdb_node);
>>>>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>>>>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>>>>>  			fdb->is_local = 1;
>>>>>  			if (!fdb->is_static) {
>>>>>  				fdb->is_static = 1;
>>>>> -				fdb_add_hw_addr(br, addr);
>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>> +				else
>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>  			}
>>>>>  		} else if (state & NUD_NOARP) {
>>>>>  			fdb->is_local = 0;
>>>>>  			if (!fdb->is_static) {
>>>>>  				fdb->is_static = 1;
>>>>> -				fdb_add_hw_addr(br, addr);
>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>> +				else
>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>  			}
>>>>>  		} else {
>>>>>  			fdb->is_local = 0;
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26  9:26         ` Nikolay Aleksandrov
@ 2019-07-26 12:02           ` Horatiu Vultur
  2019-07-26 12:31             ` Nikolay Aleksandrov
  2019-07-26 13:46             ` Andrew Lunn
  0 siblings, 2 replies; 43+ messages in thread
From: Horatiu Vultur @ 2019-07-26 12:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: roopa, davem, bridge, netdev, linux-kernel, allan.nielsen

Hi Nikolay,

The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> External E-Mail
> 
> 
> On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
> > On 25/07/2019 17:21, Horatiu Vultur wrote:
> >> Hi Nikolay,
> >>
> >> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
> >>> External E-Mail
> >>>
> >>>
> >>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> >>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
> >>>>> There is no way to configure the bridge, to receive only specific link
> >>>>> layer multicast addresses. From the description of the command 'bridge
> >>>>> fdb append' is supposed to do that, but there was no way to notify the
> >>>>> network driver that the bridge joined a group, because LLADDR was added
> >>>>> to the unicast netdev_hw_addr_list.
> >>>>>
> >>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> >>>>> and if the source is NULL, which represent the bridge itself. Then add
> >>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
> >>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
> >>>>> the driver that the list of multicast mac addresses changed.
> >>>>>
> >>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> >>>>> ---
> >>>>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
> >>>>>  1 file changed, 46 insertions(+), 3 deletions(-)
> >>>>>
> >>>>
> >>>> Hi,
> >>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
> >>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
> >>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
> >>>> MDB is used for multicast handling, but both of these are used for forwarding.
> >>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
> >>>> receive traffic destined for these FDBs for forwarding.
> >>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
> >>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
> >>>
> >>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
> >>> In general you'll only need to join the group on the bridge to receive traffic for it
> >>> or add it as an mdb entry to forward it.
> >>>
> >>>> have the effect that you're describing. What do you mean there's no way ?
> >>
> >> Thanks for the explanation.
> >> There are few things that are not 100% clear to me and maybe you can
> >> explain them, not to go totally in the wrong direction. Currently I am
> >> writing a network driver on which I added switchdev support. Then I was
> >> looking for a way to configure the network driver to copy link layer
> >> multicast address to the CPU port.
> >>
> >> If I am using bridge mdb I can do it only for IP multicast addreses,
> >> but how should I do it if I want non IP frames with link layer multicast
> >> address to be copy to CPU? For example: all frames with multicast
> >> address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
> >> command for that?
> >>
> > 
> > Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
> > which needs to receive it and the bridge will send it up automatically since
> > it's unknown mcast (note that if there's a querier, you'll have to make the
> > bridge mcast router if it is not the querier itself). It would also flood it to all
> 
> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> Thus only the flooding may need to be controlled.

OK, I see, but the part which is not clear to me is, which bridge
command(from iproute2) to use so the bridge would notify the network
driver(using swichdev or not) to configure the HW to copy all the frames
with dmac '01-21-6C-00-00-01' to CPU? So that the bridge can receive
those frames and then just to pass them up.
Definitly I would not like to set the front ports in promisc mode, to
copy all the frames to CPU because I think that would overkill it.
Thanks,

> 
> > other ports so you may want to control that. It really depends on the setup
> > and the how the hardware is configured.
> > 
> >>>>
> >>>> In addition you're allowing a mix of mcast functions to be called with unicast addresses
> >>>> and vice versa, it is not that big of a deal because the kernel will simply return an error
> >>>> but still makes no sense.
> >>>>
> >>>> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>>>
> >>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> >>>>> index b1d3248..d93746d 100644
> >>>>> --- a/net/bridge/br_fdb.c
> >>>>> +++ b/net/bridge/br_fdb.c
> >>>>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
> >>>>>  	}
> >>>>>  }
> >>>>>  
> >>>>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> >>>>> +{
> >>>>> +	int err;
> >>>>> +	struct net_bridge_port *p;
> >>>>> +
> >>>>> +	ASSERT_RTNL();
> >>>>> +
> >>>>> +	list_for_each_entry(p, &br->port_list, list) {
> >>>>> +		if (!br_promisc_port(p)) {
> >>>>> +			err = dev_mc_add(p->dev, addr);
> >>>>> +			if (err)
> >>>>> +				goto undo;
> >>>>> +		}
> >>>>> +	}
> >>>>> +
> >>>>> +	return;
> >>>>> +undo:
> >>>>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> >>>>> +		if (!br_promisc_port(p))
> >>>>> +			dev_mc_del(p->dev, addr);
> >>>>> +	}
> >>>>> +}
> >>>>> +
> >>>>>  /* When a static FDB entry is deleted, the HW address from that entry is
> >>>>>   * also removed from the bridge private HW address list and updates all
> >>>>>   * the ports with needed information.
> >>>>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
> >>>>>  	}
> >>>>>  }
> >>>>>  
> >>>>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> >>>>> +{
> >>>>> +	struct net_bridge_port *p;
> >>>>> +
> >>>>> +	ASSERT_RTNL();
> >>>>> +
> >>>>> +	list_for_each_entry(p, &br->port_list, list) {
> >>>>> +		if (!br_promisc_port(p))
> >>>>> +			dev_mc_del(p->dev, addr);
> >>>>> +	}
> >>>>> +}
> >>>>> +
> >>>>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
> >>>>>  		       bool swdev_notify)
> >>>>>  {
> >>>>>  	trace_fdb_delete(br, f);
> >>>>>  
> >>>>> -	if (f->is_static)
> >>>>> +	if (f->is_static) {
> >>>>>  		fdb_del_hw_addr(br, f->key.addr.addr);
> >>>>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
> >>>>
> >>>> Walking over all ports again for each static delete is a no-go.
> >>>>
> >>>>> +	}
> >>>>>  
> >>>>>  	hlist_del_init_rcu(&f->fdb_node);
> >>>>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> >>>>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
> >>>>>  			fdb->is_local = 1;
> >>>>>  			if (!fdb->is_static) {
> >>>>>  				fdb->is_static = 1;
> >>>>> -				fdb_add_hw_addr(br, addr);
> >>>>> +				if (flags & NLM_F_APPEND && !source)
> >>>>> +					fdb_add_hw_maddr(br, addr);
> >>>>> +				else
> >>>>> +					fdb_add_hw_addr(br, addr);
> >>>>>  			}
> >>>>>  		} else if (state & NUD_NOARP) {
> >>>>>  			fdb->is_local = 0;
> >>>>>  			if (!fdb->is_static) {
> >>>>>  				fdb->is_static = 1;
> >>>>> -				fdb_add_hw_addr(br, addr);
> >>>>> +				if (flags & NLM_F_APPEND && !source)
> >>>>> +					fdb_add_hw_maddr(br, addr);
> >>>>> +				else
> >>>>> +					fdb_add_hw_addr(br, addr);
> >>>>>  			}
> >>>>>  		} else {
> >>>>>  			fdb->is_local = 0;
> >>>>>
> >>>>
> >>>
> >>
> > 
> 

-- 
/Horatiu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 12:02           ` Horatiu Vultur
@ 2019-07-26 12:31             ` Nikolay Aleksandrov
  2019-07-29 12:14               ` Allan W. Nielsen
  2019-08-01 14:22               ` Allan W. Nielsen
  2019-07-26 13:46             ` Andrew Lunn
  1 sibling, 2 replies; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-26 12:31 UTC (permalink / raw)
  To: Horatiu Vultur; +Cc: roopa, davem, bridge, netdev, linux-kernel, allan.nielsen

On 26/07/2019 15:02, Horatiu Vultur wrote:
> Hi Nikolay,
> 
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>> External E-Mail
>>
>>
>> On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
>>> On 25/07/2019 17:21, Horatiu Vultur wrote:
>>>> Hi Nikolay,
>>>>
>>>> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
>>>>> External E-Mail
>>>>>
>>>>>
>>>>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
>>>>>>> There is no way to configure the bridge, to receive only specific link
>>>>>>> layer multicast addresses. From the description of the command 'bridge
>>>>>>> fdb append' is supposed to do that, but there was no way to notify the
>>>>>>> network driver that the bridge joined a group, because LLADDR was added
>>>>>>> to the unicast netdev_hw_addr_list.
>>>>>>>
>>>>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
>>>>>>> and if the source is NULL, which represent the bridge itself. Then add
>>>>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
>>>>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
>>>>>>> the driver that the list of multicast mac addresses changed.
>>>>>>>
>>>>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
>>>>>>> ---
>>>>>>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>>  1 file changed, 46 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
>>>>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
>>>>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
>>>>>> MDB is used for multicast handling, but both of these are used for forwarding.
>>>>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
>>>>>> receive traffic destined for these FDBs for forwarding.
>>>>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
>>>>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
>>>>>
>>>>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
>>>>> In general you'll only need to join the group on the bridge to receive traffic for it
>>>>> or add it as an mdb entry to forward it.
>>>>>
>>>>>> have the effect that you're describing. What do you mean there's no way ?
>>>>
>>>> Thanks for the explanation.
>>>> There are few things that are not 100% clear to me and maybe you can
>>>> explain them, not to go totally in the wrong direction. Currently I am
>>>> writing a network driver on which I added switchdev support. Then I was
>>>> looking for a way to configure the network driver to copy link layer
>>>> multicast address to the CPU port.
>>>>
>>>> If I am using bridge mdb I can do it only for IP multicast addreses,
>>>> but how should I do it if I want non IP frames with link layer multicast
>>>> address to be copy to CPU? For example: all frames with multicast
>>>> address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
>>>> command for that?
>>>>
>>>
>>> Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
>>> which needs to receive it and the bridge will send it up automatically since
>>> it's unknown mcast (note that if there's a querier, you'll have to make the
>>> bridge mcast router if it is not the querier itself). It would also flood it to all
>>
>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>> Thus only the flooding may need to be controlled.
> 
> OK, I see, but the part which is not clear to me is, which bridge
> command(from iproute2) to use so the bridge would notify the network
> driver(using swichdev or not) to configure the HW to copy all the frames
> with dmac '01-21-6C-00-00-01' to CPU? So that the bridge can receive
> those frames and then just to pass them up.
> Definitly I would not like to set the front ports in promisc mode, to
> copy all the frames to CPU because I think that would overkill it.
> Thanks,
> 

For non-IP traffic there's no such command, if it was IP you could catch
switchdev mdb notifications, but just for a mac address you'll have to configure
the port yourself, e.g. via the ip maddr command if you don't want to put it in
promisc mode. You know that in order to not run in promisc mode you'll have to disable
port flooding and port learning, right ? Otherwise they're always put in promisc.
If you do that you could do a simple hack in your driver with the current situation:
$ bridge fdb add 01:21:6C:00:00:01 dev swpX master static
If swpX is not promisc you'll get dev_uc_add() which will call __dev_set_rx_mode()
and the notifier so you can catch that address being added to the device, traffic
for that mac will *not* be forwarded to swpX later because it's considered multicast
and unknown at that so it will be flooded to all who have the FLOOD flag and will be
sent up the bridge as well.
But this is very hacky, I'd prefer the ip maddr add command on the port where you wish
to receive that traffic, the bridge will pass it up always due to it being unknown.


>>
>>> other ports so you may want to control that. It really depends on the setup
>>> and the how the hardware is configured.
>>>
>>>>>>
>>>>>> In addition you're allowing a mix of mcast functions to be called with unicast addresses
>>>>>> and vice versa, it is not that big of a deal because the kernel will simply return an error
>>>>>> but still makes no sense.
>>>>>>
>>>>>> Nacked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>>>
>>>>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>>>>> index b1d3248..d93746d 100644
>>>>>>> --- a/net/bridge/br_fdb.c
>>>>>>> +++ b/net/bridge/br_fdb.c
>>>>>>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>>>  	}
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>>>> +{
>>>>>>> +	int err;
>>>>>>> +	struct net_bridge_port *p;
>>>>>>> +
>>>>>>> +	ASSERT_RTNL();
>>>>>>> +
>>>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p)) {
>>>>>>> +			err = dev_mc_add(p->dev, addr);
>>>>>>> +			if (err)
>>>>>>> +				goto undo;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return;
>>>>>>> +undo:
>>>>>>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p))
>>>>>>> +			dev_mc_del(p->dev, addr);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>>  /* When a static FDB entry is deleted, the HW address from that entry is
>>>>>>>   * also removed from the bridge private HW address list and updates all
>>>>>>>   * the ports with needed information.
>>>>>>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>>>  	}
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>>>> +{
>>>>>>> +	struct net_bridge_port *p;
>>>>>>> +
>>>>>>> +	ASSERT_RTNL();
>>>>>>> +
>>>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p))
>>>>>>> +			dev_mc_del(p->dev, addr);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>>>>>  		       bool swdev_notify)
>>>>>>>  {
>>>>>>>  	trace_fdb_delete(br, f);
>>>>>>>  
>>>>>>> -	if (f->is_static)
>>>>>>> +	if (f->is_static) {
>>>>>>>  		fdb_del_hw_addr(br, f->key.addr.addr);
>>>>>>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
>>>>>>
>>>>>> Walking over all ports again for each static delete is a no-go.
>>>>>>
>>>>>>> +	}
>>>>>>>  
>>>>>>>  	hlist_del_init_rcu(&f->fdb_node);
>>>>>>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>>>>>>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>>>>>>>  			fdb->is_local = 1;
>>>>>>>  			if (!fdb->is_static) {
>>>>>>>  				fdb->is_static = 1;
>>>>>>> -				fdb_add_hw_addr(br, addr);
>>>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>>>> +				else
>>>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>>>  			}
>>>>>>>  		} else if (state & NUD_NOARP) {
>>>>>>>  			fdb->is_local = 0;
>>>>>>>  			if (!fdb->is_static) {
>>>>>>>  				fdb->is_static = 1;
>>>>>>> -				fdb_add_hw_addr(br, addr);
>>>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>>>> +				else
>>>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>>>  			}
>>>>>>>  		} else {
>>>>>>>  			fdb->is_local = 0;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 12:02           ` Horatiu Vultur
  2019-07-26 12:31             ` Nikolay Aleksandrov
@ 2019-07-26 13:46             ` Andrew Lunn
  2019-07-26 19:50               ` Allan W. Nielsen
  1 sibling, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2019-07-26 13:46 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: Nikolay Aleksandrov, roopa, davem, bridge, netdev, linux-kernel,
	allan.nielsen

On Fri, Jul 26, 2019 at 02:02:15PM +0200, Horatiu Vultur wrote:
> Hi Nikolay,
> 
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> > External E-Mail
> > 
> > 
> > On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
> > > On 25/07/2019 17:21, Horatiu Vultur wrote:
> > >> Hi Nikolay,
> > >>
> > >> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
> > >>> External E-Mail
> > >>>
> > >>>
> > >>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> > >>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
> > >>>>> There is no way to configure the bridge, to receive only specific link
> > >>>>> layer multicast addresses. From the description of the command 'bridge
> > >>>>> fdb append' is supposed to do that, but there was no way to notify the
> > >>>>> network driver that the bridge joined a group, because LLADDR was added
> > >>>>> to the unicast netdev_hw_addr_list.
> > >>>>>
> > >>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> > >>>>> and if the source is NULL, which represent the bridge itself. Then add
> > >>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
> > >>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
> > >>>>> the driver that the list of multicast mac addresses changed.
> > >>>>>
> > >>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> > >>>>> ---
> > >>>>>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
> > >>>>>  1 file changed, 46 insertions(+), 3 deletions(-)
> > >>>>>
> > >>>>
> > >>>> Hi,
> > >>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
> > >>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
> > >>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
> > >>>> MDB is used for multicast handling, but both of these are used for forwarding.
> > >>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
> > >>>> receive traffic destined for these FDBs for forwarding.
> > >>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
> > >>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
> > >>>
> > >>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
> > >>> In general you'll only need to join the group on the bridge to receive traffic for it
> > >>> or add it as an mdb entry to forward it.
> > >>>
> > >>>> have the effect that you're describing. What do you mean there's no way ?
> > >>
> > >> Thanks for the explanation.
> > >> There are few things that are not 100% clear to me and maybe you can
> > >> explain them, not to go totally in the wrong direction. Currently I am
> > >> writing a network driver on which I added switchdev support. Then I was
> > >> looking for a way to configure the network driver to copy link layer
> > >> multicast address to the CPU port.
> > >>
> > >> If I am using bridge mdb I can do it only for IP multicast addreses,
> > >> but how should I do it if I want non IP frames with link layer multicast
> > >> address to be copy to CPU? For example: all frames with multicast
> > >> address '01-21-6C-00-00-01' to be copy to CPU. What is the user space
> > >> command for that?
> > >>
> > > 
> > > Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
> > > which needs to receive it and the bridge will send it up automatically since
> > > it's unknown mcast (note that if there's a querier, you'll have to make the
> > > bridge mcast router if it is not the querier itself). It would also flood it to all
> > 
> > Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> > traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> > Thus only the flooding may need to be controlled.
> 
> OK, I see, but the part which is not clear to me is, which bridge
> command(from iproute2) to use so the bridge would notify the network
> driver(using swichdev or not) to configure the HW to copy all the frames
> with dmac '01-21-6C-00-00-01' to CPU? So that the bridge can receive
> those frames and then just to pass them up.

Hi Horatiu

Something to keep in mind.

My default, multicast should be flooded, and that includes the CPU
port for a DSA driver. Adding an MDB entry allows for optimisations,
limiting which ports a multicast frame goes out of. But it is just an
optimisation.

	Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 13:46             ` Andrew Lunn
@ 2019-07-26 19:50               ` Allan W. Nielsen
  2019-07-27  3:02                 ` Andrew Lunn
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-26 19:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge,
	netdev, linux-kernel

Hi All,

I'm working on the same project as Horatiu.

The 07/26/2019 15:46, Andrew Lunn wrote:
> My default, multicast should be flooded, and that includes the CPU
> port for a DSA driver. Adding an MDB entry allows for optimisations,
> limiting which ports a multicast frame goes out of. But it is just an
> optimisation.

Do you do this for all VLANs, or is there a way to only do this for VLANs that
the CPU is suppose to take part of?

I assume we could limit the behavioral to only do this for VLANs which the
Bridge interface is part of - but I'm not sure if this is what you suggest.

As you properly guessed, this model is quite different from what we are used to.
Just for the context: The ethernet switches done by Vitesse, which was acquired
by Microsemi, and now become Microchip, has until now been supported by a MIT
licensed API (running in user-space) and a protocol stack running on top of the
API. In this model we have been used to explicitly configure what packets should
go to the CPU. Typically this would be the MAC addresses of the interface it
self, multicast addresses required by the IP stack (4 and 6), and the broadcast
address. In this model, will only do this on VLANs which is configured as L3
interfaces.

We may be able to make it work by flood all multicast traffic by default, and
use a low priority CPU queue. But I'm having a hard time getting used to this
model (maybe time will help).

Is it considered required to include the CPU in all multicast flood masks? Or do
you know if this is different from driver to driver?

Alternative would it make sense to make this behavioral configurable?

/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 19:50               ` Allan W. Nielsen
@ 2019-07-27  3:02                 ` Andrew Lunn
  2019-07-28 19:15                   ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2019-07-27  3:02 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge,
	netdev, linux-kernel

> As you properly guessed, this model is quite different from what we are used to.

Yes, it takes a while to get the idea that the hardware is just an
accelerator for what the Linux stack can already do. And if the switch
cannot do some feature, pass the frame to Linux so it can handle it.

You need to keep in mind that there could be other ports in the bridge
than switch ports, and those ports might be interested in the
multicast traffic. Hence the CPU needs to see the traffic. But IGMP
snooping can be used to optimise this. But you still need to be
careful, eg. IPv6 Neighbour discovery has often been broken on
mv88e6xxx because we have been too aggressive with filtering
multicast.

	Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-27  3:02                 ` Andrew Lunn
@ 2019-07-28 19:15                   ` Allan W. Nielsen
  2019-07-28 23:07                     ` Andrew Lunn
  2019-07-29  6:09                     ` Ido Schimmel
  0 siblings, 2 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-28 19:15 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge,
	netdev, linux-kernel

The 07/27/2019 05:02, Andrew Lunn wrote:
> > As you properly guessed, this model is quite different from what we are used to.
> 
> Yes, it takes a while to get the idea that the hardware is just an
> accelerator for what the Linux stack can already do. And if the switch
> cannot do some feature, pass the frame to Linux so it can handle it.
This is understood, and not that different from what we are used to.

The surprise was to make all multicast traffic to go to the CPU.

> You need to keep in mind that there could be other ports in the bridge
> than switch ports, and those ports might be interested in the
> multicast traffic. Hence the CPU needs to see the traffic.
This is a good argument, but I was under the impression that not all HW/drivers
supports foreign interfaces (see ocelot_netdevice_dev_check and
mlxsw_sp_port_dev_check).

> But IGMP snooping can be used to optimise this.
Yes, IGMP snooping can limit the multicast storm of multicast IP traffic, but
not for L2 non-IP multicast traffic.

We could really use something similar for non-IP multicast MAC addresses.

Trying to get back to the original problem:

We have a network which implements the ODVA/DLR ring protocol. This protocol
sends out a beacon frame as often as every 3 us (as far as I recall, default I
believe is 400 us) to this MAC address: 01:21:6C:00:00:01.

Try take a quick look at slide 10 in [1].

If we assume that the SwitchDev driver implemented such that all multicast
traffic goes to the CPU, then we should really have a way to install a HW
offload path in the silicon, such that these packets does not go to the CPU (as
they are known not to be use full, and a frame every 3 us is a significant load
on small DMA connections and CPU resources).

If we assume that the SwitchDev driver implemented such that only "needed"
multicast packets goes to the CPU, then we need a way to get these packets in
case we want to implement the DLR protocol.

I'm sure that both models can work, and I do not think that this is the main
issue here.

Our initial attempt was to allow install static L2-MAC entries and append
multiple ports to such an entry in the MAC table. This was rejected, for several
good reasons it seems. But I'm not sure it was clear what we wanted to achieve,
and why we find it to be important. Hopefully this is clear with a real world
use-case.

Any hints or ideas on what would be a better way to solve this problems will be
much appreciated.

/Allan

[1] https://www.odva.org/Portals/0/Library/Conference/2017-ODVA-Conference_Woods_High%20Availability_Guidelines%20for%20Use%20of%20DLR%20in%20EtherNetIP%20Networks_FINAL%20PPT.pdf

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-28 19:15                   ` Allan W. Nielsen
@ 2019-07-28 23:07                     ` Andrew Lunn
  2019-07-29  6:09                     ` Ido Schimmel
  1 sibling, 0 replies; 43+ messages in thread
From: Andrew Lunn @ 2019-07-28 23:07 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge,
	netdev, linux-kernel

> Trying to get back to the original problem:
> 
> We have a network which implements the ODVA/DLR ring protocol. This protocol
> sends out a beacon frame as often as every 3 us (as far as I recall, default I
> believe is 400 us) to this MAC address: 01:21:6C:00:00:01.
> 
> Try take a quick look at slide 10 in [1].
> 
> If we assume that the SwitchDev driver implemented such that all multicast
> traffic goes to the CPU, then we should really have a way to install a HW
> offload path in the silicon, such that these packets does not go to the CPU (as
> they are known not to be use full, and a frame every 3 us is a significant load
> on small DMA connections and CPU resources).
> 
> If we assume that the SwitchDev driver implemented such that only "needed"
> multicast packets goes to the CPU, then we need a way to get these packets in
> case we want to implement the DLR protocol.
> 
> I'm sure that both models can work, and I do not think that this is the main
> issue here.
> 
> Our initial attempt was to allow install static L2-MAC entries and append
> multiple ports to such an entry in the MAC table. This was rejected, for several
> good reasons it seems. But I'm not sure it was clear what we wanted to achieve,
> and why we find it to be important. Hopefully this is clear with a real world
> use-case.
> 
> Any hints or ideas on what would be a better way to solve this problems will be
> much appreciated.

I always try to think about how this would work if i had a bunch of
discrete network interfaces, not a switch. What APIs are involved in
configuring such a system? How does the Linux network stack perform
software DLR? How is the reception and blocking of the multicast group
performed?

Once you understand how it works in the software implement, it should
then be more obvious which switchdev hooks should be used to
accelerate this using hardware.

	   Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-28 19:15                   ` Allan W. Nielsen
  2019-07-28 23:07                     ` Andrew Lunn
@ 2019-07-29  6:09                     ` Ido Schimmel
  2019-07-29 12:43                       ` Allan W. Nielsen
  1 sibling, 1 reply; 43+ messages in thread
From: Ido Schimmel @ 2019-07-29  6:09 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Andrew Lunn, Horatiu Vultur, Nikolay Aleksandrov, roopa, davem,
	bridge, netdev, linux-kernel

On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> If we assume that the SwitchDev driver implemented such that all multicast
> traffic goes to the CPU, then we should really have a way to install a HW
> offload path in the silicon, such that these packets does not go to the CPU (as
> they are known not to be use full, and a frame every 3 us is a significant load
> on small DMA connections and CPU resources).
> 
> If we assume that the SwitchDev driver implemented such that only "needed"
> multicast packets goes to the CPU, then we need a way to get these packets in
> case we want to implement the DLR protocol.

I'm not familiar with the HW you're working with, so the below might not
be relevant.

In case you don't want to send all multicast traffic to the CPU (I'll
refer to it later), you can install an ingress tc filter that traps to
the CPU the packets you do want to receive. Something like:

# tc qdisc add dev swp1 clsact
# tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
	01:21:6C:00:00:01 action trap

If your HW supports sharing the same filter among multiple ports, then
you can install your filter in a tc shared block and bind multiple ports
to it.

Another option is to always send a *copy* of multicast packets to the
CPU, but make sure the HW uses a policer that prevents the CPU from
being overwhelmed. To avoid packets being forwarded twice (by HW and
SW), you will need to mark such packets in your driver with
'skb->offload_fwd_mark = 1'.

Now, in case user wants to allow the CPU to receive certain packets at a
higher rate, a tc filter can be used. It will be identical to the filter
I mentioned earlier, but with a 'police' action chained before 'trap'.

I don't think this is currently supported by any driver, but I believe
it's the right way to go: By default the CPU receives all the traffic it
should receive and user can fine-tune it using ACLs.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 12:31             ` Nikolay Aleksandrov
@ 2019-07-29 12:14               ` Allan W. Nielsen
  2019-07-29 12:22                 ` Nikolay Aleksandrov
  2019-08-01 14:22               ` Allan W. Nielsen
  1 sibling, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-29 12:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

Hi Nikolay,

First of all, as mentioned further down in this thread, I realized that our
implementation of the multicast floodmasks does not align with the existing SW
implementation. We will change this, such that all multicast packets goes to the
SW bridge.

This changes things a bit, not that much.

I actually think you summarized the issue we have (after changing to multicast
flood-masks) right here:

The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >> Thus only the flooding may need to be controlled.

This seems to be exactly what we need.

Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
on a network where we want to limit the flooding of frames with dmac
01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.

One way of doing this could potentially be to support the following command:

bridge fdb add    01:21:6C:00:00:01 port eth0
bridge fdb append 01:21:6C:00:00:01 port eth1

On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> >>>>>>  In general NLM_F_APPEND is only used in vxlan, the bridge does not
> >>>>>>  handle that flag at all.  FDB is only for *unicast*, nothing is joined
> >>>>>>  and no multicast should be used with fdbs. MDB is used for multicast
> >>>>>>  handling, but both of these are used for forwarding.
This is true, and this should have been addressed in the patch, we were too
focused on setting up the offload patch in the driver, and forgot to do the SW
implementation.

Do you see any issues in supporting this flag, and updating the SW
forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
to be a multicast?

/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 12:14               ` Allan W. Nielsen
@ 2019-07-29 12:22                 ` Nikolay Aleksandrov
  2019-07-29 12:50                   ` Nikolay Aleksandrov
  2019-07-29 13:14                   ` Allan W. Nielsen
  0 siblings, 2 replies; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-29 12:22 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

Hi Allan,
On 29/07/2019 15:14, Allan W. Nielsen wrote:
> Hi Nikolay,
> 
> First of all, as mentioned further down in this thread, I realized that our
> implementation of the multicast floodmasks does not align with the existing SW
> implementation. We will change this, such that all multicast packets goes to the
> SW bridge.
> 
> This changes things a bit, not that much.
> 
> I actually think you summarized the issue we have (after changing to multicast
> flood-masks) right here:
> 
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>> Thus only the flooding may need to be controlled.
> 
> This seems to be exactly what we need.
> 
> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> on a network where we want to limit the flooding of frames with dmac
> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> 
> One way of doing this could potentially be to support the following command:
> 
> bridge fdb add    01:21:6C:00:00:01 port eth0
> bridge fdb append 01:21:6C:00:00:01 port eth1
> 
> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>>>>  In general NLM_F_APPEND is only used in vxlan, the bridge does not
>>>>>>>>  handle that flag at all.  FDB is only for *unicast*, nothing is joined
>>>>>>>>  and no multicast should be used with fdbs. MDB is used for multicast
>>>>>>>>  handling, but both of these are used for forwarding.
> This is true, and this should have been addressed in the patch, we were too
> focused on setting up the offload patch in the driver, and forgot to do the SW
> implementation.
> 
> Do you see any issues in supporting this flag, and updating the SW
> forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
> to be a multicast?
> 

Yes, all of the multicast code is handled differently, it doesn't go through the fdb
lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
multicast mac address, take a look at br_handle_frame_finish() and you'll notice
that when a multicast dmac is detected then we use the bridge mcast code for lookups
and forwarding. If you're trying to achieve Rx only on the bridge of these then
why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?

If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
possible to achieve via other methods, no need to go through the bridge.

> /Allan
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29  6:09                     ` Ido Schimmel
@ 2019-07-29 12:43                       ` Allan W. Nielsen
  0 siblings, 0 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-29 12:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Horatiu Vultur, Nikolay Aleksandrov, roopa, davem,
	bridge, netdev, linux-kernel

Hi Ido,

The 07/29/2019 09:09, Ido Schimmel wrote:
> External E-Mail
> 
> 
> On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> > If we assume that the SwitchDev driver implemented such that all multicast
> > traffic goes to the CPU, then we should really have a way to install a HW
> > offload path in the silicon, such that these packets does not go to the CPU (as
> > they are known not to be use full, and a frame every 3 us is a significant load
> > on small DMA connections and CPU resources).
> > 
> > If we assume that the SwitchDev driver implemented such that only "needed"
> > multicast packets goes to the CPU, then we need a way to get these packets in
> > case we want to implement the DLR protocol.
> 
> I'm not familiar with the HW you're working with, so the below might not
> be relevant.
> 
> In case you don't want to send all multicast traffic to the CPU (I'll
> refer to it later), you can install an ingress tc filter that traps to
> the CPU the packets you do want to receive. Something like:
> 
> # tc qdisc add dev swp1 clsact
> # tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
> 	01:21:6C:00:00:01 action trap
I have actually been looking at this, and it may an idea to go down this road.
But so far we have chosen not to for the following reasons:
- It is not only about trapping traffic to the CPU, we also needs to capability
  to limit the flooding on the front ports.
- In our case (the silicon), this feature really belongs to the MAC-table, which
  is why we did prefer to do it via the FDB entries.
  - But the HW does have TCAM resources, and we are planning on exposing these
    resources via the tc-flower interface. It is just that we have more MAC
    table resoruces than TCAM resources, which is another argument for using the
    MAC table.

> If your HW supports sharing the same filter among multiple ports, then
> you can install your filter in a tc shared block and bind multiple ports
> to it.
It does, thanks for making us aware of this optimization option.

> Another option is to always send a *copy* of multicast packets to the
> CPU, but make sure the HW uses a policer that prevents the CPU from
> being overwhelmed. To avoid packets being forwarded twice (by HW and
> SW), you will need to mark such packets in your driver with
> 'skb->offload_fwd_mark = 1'.
Understood

> Now, in case user wants to allow the CPU to receive certain packets at a
> higher rate, a tc filter can be used. It will be identical to the filter
> I mentioned earlier, but with a 'police' action chained before 'trap'.
I see.

> I don't think this is currently supported by any driver, but I believe
> it's the right way to go: By default the CPU receives all the traffic it
> should receive and user can fine-tune it using ACLs.
If all the frames goes to the CPU, then how can I fine-tune frames not to go to
the CPU?? I can do a TRAP (to get it to the CPU) a DROP (to drop it before
forwarding), but how can I forward a multicast packet, but prevent it from going
to the CPU?

I have seen that the mirror command can do re-direction, but not to a list of
ports...

All in all, thanks a lot for the suggestions, but to begin with I think we will
explore the MAC table option a bit more. But we will get back to TC to support
the ACL functions.

/Allan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 12:22                 ` Nikolay Aleksandrov
@ 2019-07-29 12:50                   ` Nikolay Aleksandrov
  2019-07-29 13:14                   ` Allan W. Nielsen
  1 sibling, 0 replies; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-29 12:50 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> Hi Allan,
> On 29/07/2019 15:14, Allan W. Nielsen wrote:
>> Hi Nikolay,
>>
>> First of all, as mentioned further down in this thread, I realized that our
>> implementation of the multicast floodmasks does not align with the existing SW
>> implementation. We will change this, such that all multicast packets goes to the
>> SW bridge.
>>
>> This changes things a bit, not that much.
>>
>> I actually think you summarized the issue we have (after changing to multicast
>> flood-masks) right here:
>>
>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>>> Thus only the flooding may need to be controlled.
>>
>> This seems to be exactly what we need.
>>
>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
>> on a network where we want to limit the flooding of frames with dmac
>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
>>
>> One way of doing this could potentially be to support the following command:
>>
>> bridge fdb add    01:21:6C:00:00:01 port eth0
>> bridge fdb append 01:21:6C:00:00:01 port eth1
>>

And the fdbs become linked lists ? So we'll increase the complexity for something
that is already supported by ACLs (e.g. tc) and also bridge per-port multicast
flood flag ?

I'm sorry but that doesn't sound good to me for a case which is very rare and
there are existing ways to solve without incurring performance hits or increasing
code complexity.

>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>>>>>  In general NLM_F_APPEND is only used in vxlan, the bridge does not
>>>>>>>>>  handle that flag at all.  FDB is only for *unicast*, nothing is joined
>>>>>>>>>  and no multicast should be used with fdbs. MDB is used for multicast
>>>>>>>>>  handling, but both of these are used for forwarding.
>> This is true, and this should have been addressed in the patch, we were too
>> focused on setting up the offload patch in the driver, and forgot to do the SW
>> implementation.
>>
>> Do you see any issues in supporting this flag, and updating the SW
>> forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
>> to be a multicast?
>>
> 
> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
> that when a multicast dmac is detected then we use the bridge mcast code for lookups
> and forwarding. If you're trying to achieve Rx only on the bridge of these then
> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
> 
> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
> possible to achieve via other methods, no need to go through the bridge.
> 
>> /Allan
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 12:22                 ` Nikolay Aleksandrov
  2019-07-29 12:50                   ` Nikolay Aleksandrov
@ 2019-07-29 13:14                   ` Allan W. Nielsen
  2019-07-29 13:42                     ` Nikolay Aleksandrov
  1 sibling, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-29 13:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/29/2019 15:22, Nikolay Aleksandrov wrote:
> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
> that when a multicast dmac is detected then we use the bridge mcast code for lookups
> and forwarding.

Here is my thinking (needs much more elaboration, which will come if we do a
patch to test it out):


In br_pkt_type

Rename BR_PKT_MULTICAST to BR_PKT_MULTICAST_IP
Add a new type called BR_PKT_MULTICAST_L2

In br_handle_frame_finish

	if (is_multicast_ether_addr(dest)) {
		/* by definition the broadcast is also a multicast address */
		if (is_broadcast_ether_addr(dest)) {
			pkt_type = BR_PKT_BROADCAST;
			local_rcv = true;
		} else {
			pkt_type = BR_PKT_MULTICAST;
			if (br_multicast_rcv(br, p, skb, vid))
				goto drop;
		}
	}

Change the code above to detect if it is a BR_PKT_MULTICAST_IP or a
BR_PKT_MULTICAST_L2


In this section:

switch (pkt_type) {
....
}

if (dst) {
} else {
}

Add awareness to the BR_PKT_MULTICAST_L2 type, and allow it do forwarding
according to the static entry if it is there.

> If you're trying to achieve Rx only on the bridge of these then
> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
> 
> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
> possible to achieve via other methods, no need to go through the bridge.

Well, I wanted the SW bridge implementation to behave the same with an without
HW offload.

And also, I believe that is conceptually belongs to the MAC tables.

/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 13:14                   ` Allan W. Nielsen
@ 2019-07-29 13:42                     ` Nikolay Aleksandrov
  2019-07-29 13:52                       ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-29 13:42 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

On 29/07/2019 16:14, Allan W. Nielsen wrote:
> The 07/29/2019 15:22, Nikolay Aleksandrov wrote:
>> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
>> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
>> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
>> that when a multicast dmac is detected then we use the bridge mcast code for lookups
>> and forwarding.
> 
> Here is my thinking (needs much more elaboration, which will come if we do a
> patch to test it out):
> 
> 
> In br_pkt_type
> 
> Rename BR_PKT_MULTICAST to BR_PKT_MULTICAST_IP
> Add a new type called BR_PKT_MULTICAST_L2
> 
> In br_handle_frame_finish
> 
> 	if (is_multicast_ether_addr(dest)) {
> 		/* by definition the broadcast is also a multicast address */
> 		if (is_broadcast_ether_addr(dest)) {
> 			pkt_type = BR_PKT_BROADCAST;
> 			local_rcv = true;
> 		} else {
> 			pkt_type = BR_PKT_MULTICAST;
> 			if (br_multicast_rcv(br, p, skb, vid))
> 				goto drop;
> 		}
> 	}
> 
> Change the code above to detect if it is a BR_PKT_MULTICAST_IP or a
> BR_PKT_MULTICAST_L2
> 
> 
> In this section:
> 
> switch (pkt_type) {
> ....
> }
> 
> if (dst) {
> } else {
> }
> 
> Add awareness to the BR_PKT_MULTICAST_L2 type, and allow it do forwarding
> according to the static entry if it is there.
> 

All of these add overhead to everyone - standard unicast and multicast forwarding.
Also as mentioned in my second email the fdb would need changes which would further
increase that overhead and also the code complexity for everyone for a use-case which
is very rare when there are alternatives today which avoid all of that. Offload tc rules
and add a rule to allow that traffic on ingress for particular ports, then either use
the bridge multicast flood flag or tc again to restrict flood to other egress ports.

Alternatively use ip maddr add which calls dev_mc_add() which is what the patch was
trying to do in the first place to allow the Rx of these packets and then control
the flooding via one of the above methods.

Alternatively offload nft and use that to control the traffic.

I'm sure there are more ways that I'm missing. :)

If you find a way to achieve this without incurring a performance hit or significant
code complexity increase, and without breaking current use-cases (e.g. unexpected default
forwarding behaviour changes) then please send a patch and we can discuss it further with
all details present. People have provided enough alternatives which avoid all of the
problems.

>> If you're trying to achieve Rx only on the bridge of these then
>> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
>>
>> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
>> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
>> possible to achieve via other methods, no need to go through the bridge.
> 
> Well, I wanted the SW bridge implementation to behave the same with an without
> HW offload.
> 
> And also, I believe that is conceptually belongs to the MAC tables.
> 
> /Allan
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 13:42                     ` Nikolay Aleksandrov
@ 2019-07-29 13:52                       ` Allan W. Nielsen
  2019-07-29 14:21                         ` Nikolay Aleksandrov
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-29 13:52 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> > Hi Allan,
> > On 29/07/2019 15:14, Allan W. Nielsen wrote:
> >> First of all, as mentioned further down in this thread, I realized that our
> >> implementation of the multicast floodmasks does not align with the existing SW
> >> implementation. We will change this, such that all multicast packets goes to the
> >> SW bridge.
> >>
> >> This changes things a bit, not that much.
> >>
> >> I actually think you summarized the issue we have (after changing to multicast
> >> flood-masks) right here:
> >>
> >> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >>>>> Thus only the flooding may need to be controlled.
> >>
> >> This seems to be exactly what we need.
> >>
> >> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> >> on a network where we want to limit the flooding of frames with dmac
> >> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> >>
> >> One way of doing this could potentially be to support the following command:
> >>
> >> bridge fdb add    01:21:6C:00:00:01 port eth0
> >> bridge fdb append 01:21:6C:00:00:01 port eth1
> >>
> 
> And the fdbs become linked lists?
Yes, it will most likely become a linked list

> So we'll increase the complexity for something that is already supported by
> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
I do not think it can be supported with the facilities we have today in tc.

We can do half of it (copy more fraems to the CPU) with tc, but we can not limit
the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
ports).

> I'm sorry but that doesn't sound good to me for a case which is very rare and
> there are existing ways to solve without incurring performance hits or increasing
> code complexity.
I do not consider it rarely, controling the forwarding of L2 multicast frames is
quite common in the applications we are doing.

> If you find a way to achieve this without incurring a performance hit or significant
> code complexity increase, and without breaking current use-cases (e.g. unexpected default
> forwarding behaviour changes) then please send a patch and we can discuss it further with
> all details present. People have provided enough alternatives which avoid all of the
> problems.
Will do, thanks for the guidance.

/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 13:52                       ` Allan W. Nielsen
@ 2019-07-29 14:21                         ` Nikolay Aleksandrov
  2019-07-29 14:35                           ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-29 14:21 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

On 29/07/2019 16:52, Allan W. Nielsen wrote:
> The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
>> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
>>> Hi Allan,
>>> On 29/07/2019 15:14, Allan W. Nielsen wrote:
>>>> First of all, as mentioned further down in this thread, I realized that our
>>>> implementation of the multicast floodmasks does not align with the existing SW
>>>> implementation. We will change this, such that all multicast packets goes to the
>>>> SW bridge.
>>>>
>>>> This changes things a bit, not that much.
>>>>
>>>> I actually think you summarized the issue we have (after changing to multicast
>>>> flood-masks) right here:
>>>>
>>>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>>>>> Thus only the flooding may need to be controlled.
>>>>
>>>> This seems to be exactly what we need.
>>>>
>>>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
>>>> on a network where we want to limit the flooding of frames with dmac
>>>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
>>>>
>>>> One way of doing this could potentially be to support the following command:
>>>>
>>>> bridge fdb add    01:21:6C:00:00:01 port eth0
>>>> bridge fdb append 01:21:6C:00:00:01 port eth1
>>>>
>>
>> And the fdbs become linked lists?
> Yes, it will most likely become a linked list
> 
>> So we'll increase the complexity for something that is already supported by
>> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
> I do not think it can be supported with the facilities we have today in tc.
> 
> We can do half of it (copy more fraems to the CPU) with tc, but we can not limit
> the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
> ports).
> 

Why not ? You attach an egress filter for the ports and allow that dmac on only
2 of the ports.

>> I'm sorry but that doesn't sound good to me for a case which is very rare and
>> there are existing ways to solve without incurring performance hits or increasing
>> code complexity.
> I do not consider it rarely, controling the forwarding of L2 multicast frames is
> quite common in the applications we are doing.
> 
>> If you find a way to achieve this without incurring a performance hit or significant
>> code complexity increase, and without breaking current use-cases (e.g. unexpected default
>> forwarding behaviour changes) then please send a patch and we can discuss it further with
>> all details present. People have provided enough alternatives which avoid all of the
>> problems.
> Will do, thanks for the guidance.
> 
> /Allan
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 14:21                         ` Nikolay Aleksandrov
@ 2019-07-29 14:35                           ` Allan W. Nielsen
  2019-07-29 17:51                             ` Ido Schimmel
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-29 14:35 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/29/2019 17:21, Nikolay Aleksandrov wrote:
> On 29/07/2019 16:52, Allan W. Nielsen wrote:
> > The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
> >> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> >>> Hi Allan,
> >>> On 29/07/2019 15:14, Allan W. Nielsen wrote:
> >>>> First of all, as mentioned further down in this thread, I realized that our
> >>>> implementation of the multicast floodmasks does not align with the existing SW
> >>>> implementation. We will change this, such that all multicast packets goes to the
> >>>> SW bridge.
> >>>>
> >>>> This changes things a bit, not that much.
> >>>>
> >>>> I actually think you summarized the issue we have (after changing to multicast
> >>>> flood-masks) right here:
> >>>>
> >>>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >>>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >>>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >>>>>>> Thus only the flooding may need to be controlled.
> >>>>
> >>>> This seems to be exactly what we need.
> >>>>
> >>>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> >>>> on a network where we want to limit the flooding of frames with dmac
> >>>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> >>>>
> >>>> One way of doing this could potentially be to support the following command:
> >>>>
> >>>> bridge fdb add    01:21:6C:00:00:01 port eth0
> >>>> bridge fdb append 01:21:6C:00:00:01 port eth1
> >> And the fdbs become linked lists?
> > Yes, it will most likely become a linked list
> > 
> >> So we'll increase the complexity for something that is already supported by
> >> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
> > I do not think it can be supported with the facilities we have today in tc.
> > 
> > We can do half of it (copy more fraems to the CPU) with tc, but we can not limit
> > the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
> > ports).
> Why not ? You attach an egress filter for the ports and allow that dmac on only
> 2 of the ports.
Because we want a solution which we eventually can offload in HW. And the HW
facilities we have is doing ingress processing (we have no egress ACLs in this
design), and if we try to offload an egress rule, with an ingress HW facility,
then we will run into other issues.

/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 14:35                           ` Allan W. Nielsen
@ 2019-07-29 17:51                             ` Ido Schimmel
  2019-07-30  6:27                               ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Ido Schimmel @ 2019-07-29 17:51 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

On Mon, Jul 29, 2019 at 04:35:09PM +0200, Allan W. Nielsen wrote:
> The 07/29/2019 17:21, Nikolay Aleksandrov wrote:
> > On 29/07/2019 16:52, Allan W. Nielsen wrote:
> > > The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
> > >> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> > >>> Hi Allan,
> > >>> On 29/07/2019 15:14, Allan W. Nielsen wrote:
> > >>>> First of all, as mentioned further down in this thread, I realized that our
> > >>>> implementation of the multicast floodmasks does not align with the existing SW
> > >>>> implementation. We will change this, such that all multicast packets goes to the
> > >>>> SW bridge.
> > >>>>
> > >>>> This changes things a bit, not that much.
> > >>>>
> > >>>> I actually think you summarized the issue we have (after changing to multicast
> > >>>> flood-masks) right here:
> > >>>>
> > >>>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> > >>>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> > >>>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> > >>>>>>> Thus only the flooding may need to be controlled.
> > >>>>
> > >>>> This seems to be exactly what we need.
> > >>>>
> > >>>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> > >>>> on a network where we want to limit the flooding of frames with dmac
> > >>>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> > >>>>
> > >>>> One way of doing this could potentially be to support the following command:
> > >>>>
> > >>>> bridge fdb add    01:21:6C:00:00:01 port eth0
> > >>>> bridge fdb append 01:21:6C:00:00:01 port eth1
> > >> And the fdbs become linked lists?
> > > Yes, it will most likely become a linked list
> > > 
> > >> So we'll increase the complexity for something that is already supported by
> > >> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
> > > I do not think it can be supported with the facilities we have today in tc.
> > > 
> > > We can do half of it (copy more fraems to the CPU) with tc, but we can not limit
> > > the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
> > > ports).
> > Why not ? You attach an egress filter for the ports and allow that dmac on only
> > 2 of the ports.
> Because we want a solution which we eventually can offload in HW. And the HW
> facilities we have is doing ingress processing (we have no egress ACLs in this
> design), and if we try to offload an egress rule, with an ingress HW facility,
> then we will run into other issues.

Can you please clarify what you're trying to achieve? I just read the
thread again and my impression is that you're trying to locally receive
packets with a certain link layer multicast address. Nik suggested
SIOCADDMULTI and I suggested a tc filter to get the packet to the CPU.

If you now want to limit the ports to which this packet is flooded, then
you can use tc filters in *software*:

# tc qdisc add dev eth2 clsact
# tc filter add dev eth2 egress pref 1 flower skip_hw \
	dst_mac 01:21:6C:00:00:01 action drop

If you want to forward the packet in hardware and locally receive it,
you can chain several mirred action and then a trap action.

Both options avoid HW egress ACLs which your design does not support.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-29 17:51                             ` Ido Schimmel
@ 2019-07-30  6:27                               ` Allan W. Nielsen
  2019-07-30  7:06                                 ` Ido Schimmel
  2019-07-30 14:34                                 ` Andrew Lunn
  0 siblings, 2 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30  6:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

The 07/29/2019 20:51, Ido Schimmel wrote:
> Can you please clarify what you're trying to achieve? I just read the
> thread again and my impression is that you're trying to locally receive
> packets with a certain link layer multicast address.
Yes. The thread is also a bit confusing because we half way through realized
that we misunderstood how the multicast packets should be handled (sorry about
that). To begin with we had a driver where multicast packets was only copied to
the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
how other drivers are doing it, so we changed the driver to include the CPU in
the default multicast flood-mask.

This changes the objective a bit. To begin with we needed to get more packets to
the CPU (which could have been done using tc ingress rules and a trap action).

Now after we changed the driver, we realized that we need something to limit the
flooding of certain L2 multicast packets. This is the new problem we are trying
to solve!

Example: Say we have a bridge with 4 slave interfaces, then we want to install a
forwarding rule saying that packets to a given L2-multicast MAC address, should
only be flooded to 2 of the 4 ports.

(instead of adding rules to get certain packets to the CPU, we are now adding
other rules to prevent other packets from going to the CPU and other ports where
they are not needed/wanted).

This is exactly the same thing as IGMP snooping does dynamically, but only for
IP multicast.

The "bridge mdb" allow users to manually/static add/del a port to a multicast
group, but still it operates on IP multicast address (not L2 multicast
addresses).

> Nik suggested SIOCADDMULTI.
It is not clear to me how this should be used to limit the flooding, maybe we
can make some hacks, but as far as I understand the intend of this is maintain
the list of addresses an interface should receive. I'm not sure this should
influence how for forwarding decisions are being made.

> and I suggested a tc filter to get the packet to the CPU.
The TC solution is a good solution to the original problem where wanted to copy
more frames to the CPU. But we were convinced that this is not the right
approach, and that the CPU by default should receive all multicast packets, and
we should instead try to find a way to limit the flooding of certain frames as
an optimization.

> If you now want to limit the ports to which this packet is flooded, then
> you can use tc filters in *software*:
> 
> # tc qdisc add dev eth2 clsact
> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> 	dst_mac 01:21:6C:00:00:01 action drop
Yes. This can work in the SW bridge.

> If you want to forward the packet in hardware and locally receive it,
> you can chain several mirred action and then a trap action.
I'm not I fully understand how this should be done, but it does sound like it
becomes quite complicated. Also, as far as I understand it will mean that we
will be using TCAM/ACL resources to do something that could have been done with
a simple MAC entry.

> Both options avoid HW egress ACLs which your design does not support.
True, but what is wrong with expanding the functionality of the normal
forwarding/MAC operations to allow multiple destinations?

It is not an uncommon feature (I just browsed the manual of some common L2
switches and they all has this feature).

It seems to fit nicely into the existing user-interface:

bridge fdb add    01:21:6C:00:00:01 port eth0
bridge fdb append 01:21:6C:00:00:01 port eth1

It seems that it can be added to the existing implementation with out adding
significant complexity.

It will be easy to offload in HW.

I do not believe that it will be a performance issue, if this is a concern then
we may have to do a bit of benchmarking, or we can make it a configuration
option.

Long story short, we (Horatiu and I) learned a lot from the discussion here, and
I think we should try do a new patch with the learning we got. Then it is easier
to see what it actually means to the exiting code, complexity, exiting drivers,
performance, default behavioral, backwards compatibly, and other valid concerns.

If the patch is no good, and cannot be fixed, then we will go back and look
further into alternative solutions.

-- 
/Allan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  6:27                               ` Allan W. Nielsen
@ 2019-07-30  7:06                                 ` Ido Schimmel
  2019-07-30  8:30                                   ` Allan W. Nielsen
  2019-07-30 14:34                                 ` Andrew Lunn
  1 sibling, 1 reply; 43+ messages in thread
From: Ido Schimmel @ 2019-07-30  7:06 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
> The 07/29/2019 20:51, Ido Schimmel wrote:
> > Can you please clarify what you're trying to achieve? I just read the
> > thread again and my impression is that you're trying to locally receive
> > packets with a certain link layer multicast address.
> Yes. The thread is also a bit confusing because we half way through realized
> that we misunderstood how the multicast packets should be handled (sorry about
> that). To begin with we had a driver where multicast packets was only copied to
> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
> how other drivers are doing it, so we changed the driver to include the CPU in
> the default multicast flood-mask.

OK, so what prevents you from removing all other ports from the
flood-mask and letting the CPU handle the flooding? Then you can install
software tc filters to limit the flooding.

> This changes the objective a bit. To begin with we needed to get more packets to
> the CPU (which could have been done using tc ingress rules and a trap action).
> 
> Now after we changed the driver, we realized that we need something to limit the
> flooding of certain L2 multicast packets. This is the new problem we are trying
> to solve!
> 
> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
> forwarding rule saying that packets to a given L2-multicast MAC address, should
> only be flooded to 2 of the 4 ports.
> 
> (instead of adding rules to get certain packets to the CPU, we are now adding
> other rules to prevent other packets from going to the CPU and other ports where
> they are not needed/wanted).
> 
> This is exactly the same thing as IGMP snooping does dynamically, but only for
> IP multicast.
> 
> The "bridge mdb" allow users to manually/static add/del a port to a multicast
> group, but still it operates on IP multicast address (not L2 multicast
> addresses).
> 
> > Nik suggested SIOCADDMULTI.
> It is not clear to me how this should be used to limit the flooding, maybe we
> can make some hacks, but as far as I understand the intend of this is maintain
> the list of addresses an interface should receive. I'm not sure this should
> influence how for forwarding decisions are being made.
> 
> > and I suggested a tc filter to get the packet to the CPU.
> The TC solution is a good solution to the original problem where wanted to copy
> more frames to the CPU. But we were convinced that this is not the right
> approach, and that the CPU by default should receive all multicast packets, and
> we should instead try to find a way to limit the flooding of certain frames as
> an optimization.

This can still work. In Linux, ingress tc filters are executed before the
bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
eth2:

# tc filter add dev eth0 ingress pref 1 flower skip_sw \
	dst_mac 01:21:6C:00:00:01 action trap
# tc filter add dev eth2 egress pref 1 flower skip_hw \
	dst_mac 01:21:6C:00:00:01 action drop

The first filter is only present in HW ('skip_sw') and should result in
your HW passing you the sole copy of the packet.

The second filter is only present in SW ('skip_hw', not using HW egress
ACL that you don't have) and drops the packet after it was flooded by
the SW bridge.

As I mentioned earlier, you can install the filter once in your HW and
share it between different ports using a shared block. This means you
only consume one TCAM entry.

Note that this allows you to keep flooding all other multicast packets
in HW.

> > If you now want to limit the ports to which this packet is flooded, then
> > you can use tc filters in *software*:
> > 
> > # tc qdisc add dev eth2 clsact
> > # tc filter add dev eth2 egress pref 1 flower skip_hw \
> > 	dst_mac 01:21:6C:00:00:01 action drop
> Yes. This can work in the SW bridge.
> 
> > If you want to forward the packet in hardware and locally receive it,
> > you can chain several mirred action and then a trap action.
> I'm not I fully understand how this should be done, but it does sound like it
> becomes quite complicated. Also, as far as I understand it will mean that we
> will be using TCAM/ACL resources to do something that could have been done with
> a simple MAC entry.
> 
> > Both options avoid HW egress ACLs which your design does not support.
> True, but what is wrong with expanding the functionality of the normal
> forwarding/MAC operations to allow multiple destinations?
> 
> It is not an uncommon feature (I just browsed the manual of some common L2
> switches and they all has this feature).
> 
> It seems to fit nicely into the existing user-interface:
> 
> bridge fdb add    01:21:6C:00:00:01 port eth0
> bridge fdb append 01:21:6C:00:00:01 port eth1

Wouldn't it be better to instead extend the MDB entries so that they are
either keyed by IP or MAC? I believe FDB should remain as unicast-only.
As a bonus, existing drivers could benefit from it, as MDB entries are
already notified by MAC.

> 
> It seems that it can be added to the existing implementation with out adding
> significant complexity.
> 
> It will be easy to offload in HW.
> 
> I do not believe that it will be a performance issue, if this is a concern then
> we may have to do a bit of benchmarking, or we can make it a configuration
> option.
> 
> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
> I think we should try do a new patch with the learning we got. Then it is easier
> to see what it actually means to the exiting code, complexity, exiting drivers,
> performance, default behavioral, backwards compatibly, and other valid concerns.
> 
> If the patch is no good, and cannot be fixed, then we will go back and look
> further into alternative solutions.

Overall, I tend to agree with Nik. I think your use case is too specific
to justify the amount of changes you want to make in the bridge driver.
We also provided other alternatives. That being said, you're more than
welcome to send the patches and we can continue the discussion then.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  7:06                                 ` Ido Schimmel
@ 2019-07-30  8:30                                   ` Allan W. Nielsen
  2019-07-30  8:58                                     ` Nikolay Aleksandrov
  2019-07-30 10:04                                     ` Ido Schimmel
  0 siblings, 2 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30  8:30 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

The 07/30/2019 10:06, Ido Schimmel wrote:
> On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
> > The 07/29/2019 20:51, Ido Schimmel wrote:
> > > Can you please clarify what you're trying to achieve? I just read the
> > > thread again and my impression is that you're trying to locally receive
> > > packets with a certain link layer multicast address.
> > Yes. The thread is also a bit confusing because we half way through realized
> > that we misunderstood how the multicast packets should be handled (sorry about
> > that). To begin with we had a driver where multicast packets was only copied to
> > the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
> > how other drivers are doing it, so we changed the driver to include the CPU in
> > the default multicast flood-mask.
> OK, so what prevents you from removing all other ports from the
> flood-mask and letting the CPU handle the flooding? Then you can install
> software tc filters to limit the flooding.
I do not have the bandwidth to forward the multicast traffic in the CPU.

It will also cause enormous latency on the forwarding of L2 multicast packets.

> > This changes the objective a bit. To begin with we needed to get more packets to
> > the CPU (which could have been done using tc ingress rules and a trap action).
> > 
> > Now after we changed the driver, we realized that we need something to limit the
> > flooding of certain L2 multicast packets. This is the new problem we are trying
> > to solve!
> > 
> > Example: Say we have a bridge with 4 slave interfaces, then we want to install a
> > forwarding rule saying that packets to a given L2-multicast MAC address, should
> > only be flooded to 2 of the 4 ports.
> > 
> > (instead of adding rules to get certain packets to the CPU, we are now adding
> > other rules to prevent other packets from going to the CPU and other ports where
> > they are not needed/wanted).
> > 
> > This is exactly the same thing as IGMP snooping does dynamically, but only for
> > IP multicast.
> > 
> > The "bridge mdb" allow users to manually/static add/del a port to a multicast
> > group, but still it operates on IP multicast address (not L2 multicast
> > addresses).
> > 
> > > Nik suggested SIOCADDMULTI.
> > It is not clear to me how this should be used to limit the flooding, maybe we
> > can make some hacks, but as far as I understand the intend of this is maintain
> > the list of addresses an interface should receive. I'm not sure this should
> > influence how for forwarding decisions are being made.
> > 
> > > and I suggested a tc filter to get the packet to the CPU.
> > The TC solution is a good solution to the original problem where wanted to copy
> > more frames to the CPU. But we were convinced that this is not the right
> > approach, and that the CPU by default should receive all multicast packets, and
> > we should instead try to find a way to limit the flooding of certain frames as
> > an optimization.
> 
> This can still work. In Linux, ingress tc filters are executed before the
> bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
> performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
> you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
> eth2:
> 
> # tc filter add dev eth0 ingress pref 1 flower skip_sw \
> 	dst_mac 01:21:6C:00:00:01 action trap
> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> 	dst_mac 01:21:6C:00:00:01 action drop
> 
> The first filter is only present in HW ('skip_sw') and should result in
> your HW passing you the sole copy of the packet.
Agree.

> The second filter is only present in SW ('skip_hw', not using HW egress
> ACL that you don't have) and drops the packet after it was flooded by
> the SW bridge.
Agree.

> As I mentioned earlier, you can install the filter once in your HW and
> share it between different ports using a shared block. This means you
> only consume one TCAM entry.
> 
> Note that this allows you to keep flooding all other multicast packets
> in HW.
Yes, but the frames we want to limit the flood-mask on are the exact frames
which occurs at a very high rate, and where latency is important.

I really do not consider it as an option to forward this in SW, when it is
something that can easily be offloaded in HW.

> > > If you now want to limit the ports to which this packet is flooded, then
> > > you can use tc filters in *software*:
> > > 
> > > # tc qdisc add dev eth2 clsact
> > > # tc filter add dev eth2 egress pref 1 flower skip_hw \
> > > 	dst_mac 01:21:6C:00:00:01 action drop
> > Yes. This can work in the SW bridge.
> > 
> > > If you want to forward the packet in hardware and locally receive it,
> > > you can chain several mirred action and then a trap action.
> > I'm not I fully understand how this should be done, but it does sound like it
> > becomes quite complicated. Also, as far as I understand it will mean that we
> > will be using TCAM/ACL resources to do something that could have been done with
> > a simple MAC entry.
> > 
> > > Both options avoid HW egress ACLs which your design does not support.
> > True, but what is wrong with expanding the functionality of the normal
> > forwarding/MAC operations to allow multiple destinations?
> > 
> > It is not an uncommon feature (I just browsed the manual of some common L2
> > switches and they all has this feature).
> > 
> > It seems to fit nicely into the existing user-interface:
> > 
> > bridge fdb add    01:21:6C:00:00:01 port eth0
> > bridge fdb append 01:21:6C:00:00:01 port eth1
> 
> Wouldn't it be better to instead extend the MDB entries so that they are
> either keyed by IP or MAC? I believe FDB should remain as unicast-only.

You might be right, it was not clear to me which of the two would fit the
purpose best.

From a user-space iproute2 perspective I prefer using the "bridge fdb" command
as it already supports the needed syntax, and I do not think it will be too
pretty if we squeeze this into the "bridge mdb" command syntax.

But that does not mean that it need to go into the FDB database in the
implementation.

Last evening when I looked at it again, I was considering keeping the
net_bridge_fdb_entry structure as is, and add a new hashtable with the
following:

struct net_bridge_fdbmc_entry {
	struct rhash_head		rhnode;
	struct net_bridge_fdbmc_ports   *dst;

	struct net_bridge_fdb_key	key;
	struct hlist_node		fdb_node;
	unsigned char			offloaded:1;

	struct rcu_head			rcu;
};

If we go with this approach then we can look at the MAC address and see if it is
a unicast which will cause a lookup in the fdb, l3-multicast (33:33:* or
01:00:5e:*) which will cause a lookup in the mdb, or finally a fdbmc which will
need to do a lookup in this new hashtable.

Alternative it would be like this:

struct net_bridge_fdb_entry {
	struct rhash_head		rhnode;
	union net_bridge_port_or_list	*dst;

	struct net_bridge_fdb_key	key;
	struct hlist_node		fdb_node;
	unsigned char			is_local:1,
					is_static:1,
					is_sticky:1,
					added_by_user:1,
					added_by_external_learn:1,
					offloaded:1;
					multi_dst:1;

	/* write-heavy members should not affect lookups */
	unsigned long			updated ____cacheline_aligned_in_smp;
	unsigned long			used;

	struct rcu_head			rcu;
};

Both solutions should require fairly few changes, and should not cause any
measurable performance hit.

Making it fit into the net_bridge_mdb_entry seems to be harder.

> As a bonus, existing drivers could benefit from it, as MDB entries are already
> notified by MAC.
Not sure I follow. When FDB entries are added, it also generates notification
events.

> > It seems that it can be added to the existing implementation with out adding
> > significant complexity.
> > 
> > It will be easy to offload in HW.
> > 
> > I do not believe that it will be a performance issue, if this is a concern then
> > we may have to do a bit of benchmarking, or we can make it a configuration
> > option.
> > 
> > Long story short, we (Horatiu and I) learned a lot from the discussion here, and
> > I think we should try do a new patch with the learning we got. Then it is easier
> > to see what it actually means to the exiting code, complexity, exiting drivers,
> > performance, default behavioral, backwards compatibly, and other valid concerns.
> > 
> > If the patch is no good, and cannot be fixed, then we will go back and look
> > further into alternative solutions.
> Overall, I tend to agree with Nik. I think your use case is too specific
> to justify the amount of changes you want to make in the bridge driver.
> We also provided other alternatives. That being said, you're more than
> welcome to send the patches and we can continue the discussion then.
Okay, good to know. I'm not sure I agree that the alternative solutions really
solves the issue this is trying to solve, nor do I agree that this is specific
to our needs.

But lets take a look at a new patch, and see what is the amount of changes we
are talking about. Without having the patch it is really hard to know for sure.

-- 
/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  8:30                                   ` Allan W. Nielsen
@ 2019-07-30  8:58                                     ` Nikolay Aleksandrov
  2019-07-30  9:21                                       ` Allan W. Nielsen
  2019-07-30 10:04                                     ` Ido Schimmel
  1 sibling, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-30  8:58 UTC (permalink / raw)
  To: Allan W. Nielsen, Ido Schimmel
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

On 30/07/2019 11:30, Allan W. Nielsen wrote:
> The 07/30/2019 10:06, Ido Schimmel wrote:
>> On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
>>> The 07/29/2019 20:51, Ido Schimmel wrote:
>>>> Can you please clarify what you're trying to achieve? I just read the
>>>> thread again and my impression is that you're trying to locally receive
>>>> packets with a certain link layer multicast address.
>>> Yes. The thread is also a bit confusing because we half way through realized
>>> that we misunderstood how the multicast packets should be handled (sorry about
>>> that). To begin with we had a driver where multicast packets was only copied to
>>> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
>>> how other drivers are doing it, so we changed the driver to include the CPU in
>>> the default multicast flood-mask.
>> OK, so what prevents you from removing all other ports from the
>> flood-mask and letting the CPU handle the flooding? Then you can install
>> software tc filters to limit the flooding.
> I do not have the bandwidth to forward the multicast traffic in the CPU.
> 
> It will also cause enormous latency on the forwarding of L2 multicast packets.
> 
>>> This changes the objective a bit. To begin with we needed to get more packets to
>>> the CPU (which could have been done using tc ingress rules and a trap action).
>>>
>>> Now after we changed the driver, we realized that we need something to limit the
>>> flooding of certain L2 multicast packets. This is the new problem we are trying
>>> to solve!
>>>
>>> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
>>> forwarding rule saying that packets to a given L2-multicast MAC address, should
>>> only be flooded to 2 of the 4 ports.
>>>
>>> (instead of adding rules to get certain packets to the CPU, we are now adding
>>> other rules to prevent other packets from going to the CPU and other ports where
>>> they are not needed/wanted).
>>>
>>> This is exactly the same thing as IGMP snooping does dynamically, but only for
>>> IP multicast.
>>>
>>> The "bridge mdb" allow users to manually/static add/del a port to a multicast
>>> group, but still it operates on IP multicast address (not L2 multicast
>>> addresses).
>>>
>>>> Nik suggested SIOCADDMULTI.
>>> It is not clear to me how this should be used to limit the flooding, maybe we
>>> can make some hacks, but as far as I understand the intend of this is maintain
>>> the list of addresses an interface should receive. I'm not sure this should
>>> influence how for forwarding decisions are being made.
>>>
>>>> and I suggested a tc filter to get the packet to the CPU.
>>> The TC solution is a good solution to the original problem where wanted to copy
>>> more frames to the CPU. But we were convinced that this is not the right
>>> approach, and that the CPU by default should receive all multicast packets, and
>>> we should instead try to find a way to limit the flooding of certain frames as
>>> an optimization.
>>
>> This can still work. In Linux, ingress tc filters are executed before the
>> bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
>> performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
>> you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
>> eth2:
>>
>> # tc filter add dev eth0 ingress pref 1 flower skip_sw \
>> 	dst_mac 01:21:6C:00:00:01 action trap
>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
>> 	dst_mac 01:21:6C:00:00:01 action drop
>>
>> The first filter is only present in HW ('skip_sw') and should result in
>> your HW passing you the sole copy of the packet.
> Agree.
> 
>> The second filter is only present in SW ('skip_hw', not using HW egress
>> ACL that you don't have) and drops the packet after it was flooded by
>> the SW bridge.
> Agree.
> 
>> As I mentioned earlier, you can install the filter once in your HW and
>> share it between different ports using a shared block. This means you
>> only consume one TCAM entry.
>>
>> Note that this allows you to keep flooding all other multicast packets
>> in HW.
> Yes, but the frames we want to limit the flood-mask on are the exact frames
> which occurs at a very high rate, and where latency is important.
> 
> I really do not consider it as an option to forward this in SW, when it is
> something that can easily be offloaded in HW.
> 
>>>> If you now want to limit the ports to which this packet is flooded, then
>>>> you can use tc filters in *software*:
>>>>
>>>> # tc qdisc add dev eth2 clsact
>>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
>>>> 	dst_mac 01:21:6C:00:00:01 action drop
>>> Yes. This can work in the SW bridge.
>>>
>>>> If you want to forward the packet in hardware and locally receive it,
>>>> you can chain several mirred action and then a trap action.
>>> I'm not I fully understand how this should be done, but it does sound like it
>>> becomes quite complicated. Also, as far as I understand it will mean that we
>>> will be using TCAM/ACL resources to do something that could have been done with
>>> a simple MAC entry.
>>>
>>>> Both options avoid HW egress ACLs which your design does not support.
>>> True, but what is wrong with expanding the functionality of the normal
>>> forwarding/MAC operations to allow multiple destinations?
>>>
>>> It is not an uncommon feature (I just browsed the manual of some common L2
>>> switches and they all has this feature).
>>>
>>> It seems to fit nicely into the existing user-interface:
>>>
>>> bridge fdb add    01:21:6C:00:00:01 port eth0
>>> bridge fdb append 01:21:6C:00:00:01 port eth1
>>
>> Wouldn't it be better to instead extend the MDB entries so that they are
>> either keyed by IP or MAC? I believe FDB should remain as unicast-only.
> 
> You might be right, it was not clear to me which of the two would fit the
> purpose best.
> 
> From a user-space iproute2 perspective I prefer using the "bridge fdb" command
> as it already supports the needed syntax, and I do not think it will be too
> pretty if we squeeze this into the "bridge mdb" command syntax.
> 

MDB is a much better fit as Ido already suggested. FDB should remain unicast
and mixing them is not a good idea, we already have a good ucast/mcast separation
and we'd like to keep it that way.

> But that does not mean that it need to go into the FDB database in the
> implementation.
> 
> Last evening when I looked at it again, I was considering keeping the
> net_bridge_fdb_entry structure as is, and add a new hashtable with the
> following:
> 
> struct net_bridge_fdbmc_entry {
> 	struct rhash_head		rhnode;
> 	struct net_bridge_fdbmc_ports   *dst;
> 
> 	struct net_bridge_fdb_key	key;
> 	struct hlist_node		fdb_node;
> 	unsigned char			offloaded:1;
> 
> 	struct rcu_head			rcu;
> };
> 

What would the notification for this look like ?

> If we go with this approach then we can look at the MAC address and see if it is
> a unicast which will cause a lookup in the fdb, l3-multicast (33:33:* or
> 01:00:5e:*) which will cause a lookup in the mdb, or finally a fdbmc which will
> need to do a lookup in this new hashtable.

That sounds wrong, you will change the current default behaviour of flooding these
packets. This will have to be well hidden behind a new option and enabled only on user
request.

> 
> Alternative it would be like this:
> 
> struct net_bridge_fdb_entry {
> 	struct rhash_head		rhnode;
> 	union net_bridge_port_or_list	*dst;
> 
> 	struct net_bridge_fdb_key	key;
> 	struct hlist_node		fdb_node;
> 	unsigned char			is_local:1,
> 					is_static:1,
> 					is_sticky:1,
> 					added_by_user:1,
> 					added_by_external_learn:1,
> 					offloaded:1;
> 					multi_dst:1;
> 
> 	/* write-heavy members should not affect lookups */
> 	unsigned long			updated ____cacheline_aligned_in_smp;
> 	unsigned long			used;
> 
> 	struct rcu_head			rcu;
> };
> 
> Both solutions should require fairly few changes, and should not cause any
> measurable performance hit.
> 

You'll have to convert these bits to use the proper atomic bitops if you go with
the second solution. That has to be done even today, but the second case would
make it a must.

> Making it fit into the net_bridge_mdb_entry seems to be harder.
> 

But it is the correct abstraction from bridge POV, so please stop trying to change
the FDB code and try to keep to the multicast code.

>> As a bonus, existing drivers could benefit from it, as MDB entries are already
>> notified by MAC.
> Not sure I follow. When FDB entries are added, it also generates notification
> events.
> 

Could you please show fdb event with multiple ports ?

>>> It seems that it can be added to the existing implementation with out adding
>>> significant complexity.
>>>
>>> It will be easy to offload in HW.
>>>
>>> I do not believe that it will be a performance issue, if this is a concern then
>>> we may have to do a bit of benchmarking, or we can make it a configuration
>>> option.
>>>
>>> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
>>> I think we should try do a new patch with the learning we got. Then it is easier
>>> to see what it actually means to the exiting code, complexity, exiting drivers,
>>> performance, default behavioral, backwards compatibly, and other valid concerns.
>>>
>>> If the patch is no good, and cannot be fixed, then we will go back and look
>>> further into alternative solutions.
>> Overall, I tend to agree with Nik. I think your use case is too specific
>> to justify the amount of changes you want to make in the bridge driver.
>> We also provided other alternatives. That being said, you're more than
>> welcome to send the patches and we can continue the discussion then.
> Okay, good to know. I'm not sure I agree that the alternative solutions really
> solves the issue this is trying to solve, nor do I agree that this is specific
> to our needs.
> 
> But lets take a look at a new patch, and see what is the amount of changes we
> are talking about. Without having the patch it is really hard to know for sure.
> 

Please keep in mind that this case is the exception, not the norm, thus it should
not under any circumstance affect the standard deployments.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  8:58                                     ` Nikolay Aleksandrov
@ 2019-07-30  9:21                                       ` Allan W. Nielsen
  2019-07-30  9:55                                         ` Nikolay Aleksandrov
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30  9:21 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/30/2019 11:58, Nikolay Aleksandrov wrote:
> On 30/07/2019 11:30, Allan W. Nielsen wrote:
> > The 07/30/2019 10:06, Ido Schimmel wrote:
> >> On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
> >>> The 07/29/2019 20:51, Ido Schimmel wrote:
> >>>> Can you please clarify what you're trying to achieve? I just read the
> >>>> thread again and my impression is that you're trying to locally receive
> >>>> packets with a certain link layer multicast address.
> >>> Yes. The thread is also a bit confusing because we half way through realized
> >>> that we misunderstood how the multicast packets should be handled (sorry about
> >>> that). To begin with we had a driver where multicast packets was only copied to
> >>> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
> >>> how other drivers are doing it, so we changed the driver to include the CPU in
> >>> the default multicast flood-mask.
> >> OK, so what prevents you from removing all other ports from the
> >> flood-mask and letting the CPU handle the flooding? Then you can install
> >> software tc filters to limit the flooding.
> > I do not have the bandwidth to forward the multicast traffic in the CPU.
> > 
> > It will also cause enormous latency on the forwarding of L2 multicast packets.
> > 
> >>> This changes the objective a bit. To begin with we needed to get more packets to
> >>> the CPU (which could have been done using tc ingress rules and a trap action).
> >>>
> >>> Now after we changed the driver, we realized that we need something to limit the
> >>> flooding of certain L2 multicast packets. This is the new problem we are trying
> >>> to solve!
> >>>
> >>> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
> >>> forwarding rule saying that packets to a given L2-multicast MAC address, should
> >>> only be flooded to 2 of the 4 ports.
> >>>
> >>> (instead of adding rules to get certain packets to the CPU, we are now adding
> >>> other rules to prevent other packets from going to the CPU and other ports where
> >>> they are not needed/wanted).
> >>>
> >>> This is exactly the same thing as IGMP snooping does dynamically, but only for
> >>> IP multicast.
> >>>
> >>> The "bridge mdb" allow users to manually/static add/del a port to a multicast
> >>> group, but still it operates on IP multicast address (not L2 multicast
> >>> addresses).
> >>>
> >>>> Nik suggested SIOCADDMULTI.
> >>> It is not clear to me how this should be used to limit the flooding, maybe we
> >>> can make some hacks, but as far as I understand the intend of this is maintain
> >>> the list of addresses an interface should receive. I'm not sure this should
> >>> influence how for forwarding decisions are being made.
> >>>
> >>>> and I suggested a tc filter to get the packet to the CPU.
> >>> The TC solution is a good solution to the original problem where wanted to copy
> >>> more frames to the CPU. But we were convinced that this is not the right
> >>> approach, and that the CPU by default should receive all multicast packets, and
> >>> we should instead try to find a way to limit the flooding of certain frames as
> >>> an optimization.
> >>
> >> This can still work. In Linux, ingress tc filters are executed before the
> >> bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
> >> performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
> >> you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
> >> eth2:
> >>
> >> # tc filter add dev eth0 ingress pref 1 flower skip_sw \
> >> 	dst_mac 01:21:6C:00:00:01 action trap
> >> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> >> 	dst_mac 01:21:6C:00:00:01 action drop
> >>
> >> The first filter is only present in HW ('skip_sw') and should result in
> >> your HW passing you the sole copy of the packet.
> > Agree.
> > 
> >> The second filter is only present in SW ('skip_hw', not using HW egress
> >> ACL that you don't have) and drops the packet after it was flooded by
> >> the SW bridge.
> > Agree.
> > 
> >> As I mentioned earlier, you can install the filter once in your HW and
> >> share it between different ports using a shared block. This means you
> >> only consume one TCAM entry.
> >>
> >> Note that this allows you to keep flooding all other multicast packets
> >> in HW.
> > Yes, but the frames we want to limit the flood-mask on are the exact frames
> > which occurs at a very high rate, and where latency is important.
> > 
> > I really do not consider it as an option to forward this in SW, when it is
> > something that can easily be offloaded in HW.
> > 
> >>>> If you now want to limit the ports to which this packet is flooded, then
> >>>> you can use tc filters in *software*:
> >>>>
> >>>> # tc qdisc add dev eth2 clsact
> >>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> >>>> 	dst_mac 01:21:6C:00:00:01 action drop
> >>> Yes. This can work in the SW bridge.
> >>>
> >>>> If you want to forward the packet in hardware and locally receive it,
> >>>> you can chain several mirred action and then a trap action.
> >>> I'm not I fully understand how this should be done, but it does sound like it
> >>> becomes quite complicated. Also, as far as I understand it will mean that we
> >>> will be using TCAM/ACL resources to do something that could have been done with
> >>> a simple MAC entry.
> >>>
> >>>> Both options avoid HW egress ACLs which your design does not support.
> >>> True, but what is wrong with expanding the functionality of the normal
> >>> forwarding/MAC operations to allow multiple destinations?
> >>>
> >>> It is not an uncommon feature (I just browsed the manual of some common L2
> >>> switches and they all has this feature).
> >>>
> >>> It seems to fit nicely into the existing user-interface:
> >>>
> >>> bridge fdb add    01:21:6C:00:00:01 port eth0
> >>> bridge fdb append 01:21:6C:00:00:01 port eth1
> >>
> >> Wouldn't it be better to instead extend the MDB entries so that they are
> >> either keyed by IP or MAC? I believe FDB should remain as unicast-only.
> > 
> > You might be right, it was not clear to me which of the two would fit the
> > purpose best.
> > 
> > From a user-space iproute2 perspective I prefer using the "bridge fdb" command
> > as it already supports the needed syntax, and I do not think it will be too
> > pretty if we squeeze this into the "bridge mdb" command syntax.
> > 
> 
> MDB is a much better fit as Ido already suggested. FDB should remain unicast
> and mixing them is not a good idea, we already have a good ucast/mcast separation
> and we'd like to keep it that way.
Okay. We will explore that option.


> > But that does not mean that it need to go into the FDB database in the
> > implementation.
> > 
> > Last evening when I looked at it again, I was considering keeping the
> > net_bridge_fdb_entry structure as is, and add a new hashtable with the
> > following:
> > 
> > struct net_bridge_fdbmc_entry {
> > 	struct rhash_head		rhnode;
> > 	struct net_bridge_fdbmc_ports   *dst;
> > 
> > 	struct net_bridge_fdb_key	key;
> > 	struct hlist_node		fdb_node;
> > 	unsigned char			offloaded:1;
> > 
> > 	struct rcu_head			rcu;
> > };
> > 
> 
> What would the notification for this look like ?
Not sure. But we will change the direction and use the MDB structures instead.

> > If we go with this approach then we can look at the MAC address and see if it is
> > a unicast which will cause a lookup in the fdb, l3-multicast (33:33:* or
> > 01:00:5e:*) which will cause a lookup in the mdb, or finally a fdbmc which will
> > need to do a lookup in this new hashtable.
> 
> That sounds wrong, you will change the current default behaviour of flooding these
> packets. This will have to be well hidden behind a new option and enabled only on user
> request.
It will only affect users who install a static L2-multicast entry. If no entry
is found, it will default to flooding, which will be the same as before.

> > Alternative it would be like this:
> > 
> > struct net_bridge_fdb_entry {
> > 	struct rhash_head		rhnode;
> > 	union net_bridge_port_or_list	*dst;
> > 
> > 	struct net_bridge_fdb_key	key;
> > 	struct hlist_node		fdb_node;
> > 	unsigned char			is_local:1,
> > 					is_static:1,
> > 					is_sticky:1,
> > 					added_by_user:1,
> > 					added_by_external_learn:1,
> > 					offloaded:1;
> > 					multi_dst:1;
> > 
> > 	/* write-heavy members should not affect lookups */
> > 	unsigned long			updated ____cacheline_aligned_in_smp;
> > 	unsigned long			used;
> > 
> > 	struct rcu_head			rcu;
> > };
> > 
> > Both solutions should require fairly few changes, and should not cause any
> > measurable performance hit.
> > 
> 
> You'll have to convert these bits to use the proper atomic bitops if you go with
> the second solution. That has to be done even today, but the second case would
> make it a must.
Good to know.

Just for my understanding, is this because this is the "current" guide lines on
how things should be done, or is this because the multi_dst as a special need.

The multi_dst flag will never be changed in the life-cycle of the structure, and
the structure is protected by rcu. If this is causeing a raise, then I do not
see it.

> > Making it fit into the net_bridge_mdb_entry seems to be harder.
> > 
> 
> But it is the correct abstraction from bridge POV, so please stop trying to change
> the FDB code and try to keep to the multicast code.
We are planning on letting the net_bridge_port_or_list union use the
net_bridge_port_group structure, which will mean that we can re-use the
br_multicast_flood function (if we change the signatire to accept the ports
instead of the entry).

> >> As a bonus, existing drivers could benefit from it, as MDB entries are already
> >> notified by MAC.
> > Not sure I follow. When FDB entries are added, it also generates notification
> > events.
> > 
> 
> Could you please show fdb event with multiple ports ?
We will get to that. Maybe this is an argument for converting to mdb. We have
not looked into the details of this yet.

> >>> It seems that it can be added to the existing implementation with out adding
> >>> significant complexity.
> >>>
> >>> It will be easy to offload in HW.
> >>>
> >>> I do not believe that it will be a performance issue, if this is a concern then
> >>> we may have to do a bit of benchmarking, or we can make it a configuration
> >>> option.
> >>>
> >>> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
> >>> I think we should try do a new patch with the learning we got. Then it is easier
> >>> to see what it actually means to the exiting code, complexity, exiting drivers,
> >>> performance, default behavioral, backwards compatibly, and other valid concerns.
> >>>
> >>> If the patch is no good, and cannot be fixed, then we will go back and look
> >>> further into alternative solutions.
> >> Overall, I tend to agree with Nik. I think your use case is too specific
> >> to justify the amount of changes you want to make in the bridge driver.
> >> We also provided other alternatives. That being said, you're more than
> >> welcome to send the patches and we can continue the discussion then.
> > Okay, good to know. I'm not sure I agree that the alternative solutions really
> > solves the issue this is trying to solve, nor do I agree that this is specific
> > to our needs.
> > 
> > But lets take a look at a new patch, and see what is the amount of changes we
> > are talking about. Without having the patch it is really hard to know for sure.
> Please keep in mind that this case is the exception, not the norm, thus it should
> not under any circumstance affect the standard deployments.
Understood - no surprises.

-- 
/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  9:21                                       ` Allan W. Nielsen
@ 2019-07-30  9:55                                         ` Nikolay Aleksandrov
  2019-07-30 11:23                                           ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-30  9:55 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Ido Schimmel, Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

On 30/07/2019 12:21, Allan W. Nielsen wrote:
> The 07/30/2019 11:58, Nikolay Aleksandrov wrote:
>> On 30/07/2019 11:30, Allan W. Nielsen wrote:
>>> The 07/30/2019 10:06, Ido Schimmel wrote:
>>>> On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
>>>>> The 07/29/2019 20:51, Ido Schimmel wrote:
>>>>>> Can you please clarify what you're trying to achieve? I just read the
>>>>>> thread again and my impression is that you're trying to locally receive
>>>>>> packets with a certain link layer multicast address.
>>>>> Yes. The thread is also a bit confusing because we half way through realized
>>>>> that we misunderstood how the multicast packets should be handled (sorry about
>>>>> that). To begin with we had a driver where multicast packets was only copied to
>>>>> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
>>>>> how other drivers are doing it, so we changed the driver to include the CPU in
>>>>> the default multicast flood-mask.
>>>> OK, so what prevents you from removing all other ports from the
>>>> flood-mask and letting the CPU handle the flooding? Then you can install
>>>> software tc filters to limit the flooding.
>>> I do not have the bandwidth to forward the multicast traffic in the CPU.
>>>
>>> It will also cause enormous latency on the forwarding of L2 multicast packets.
>>>
>>>>> This changes the objective a bit. To begin with we needed to get more packets to
>>>>> the CPU (which could have been done using tc ingress rules and a trap action).
>>>>>
>>>>> Now after we changed the driver, we realized that we need something to limit the
>>>>> flooding of certain L2 multicast packets. This is the new problem we are trying
>>>>> to solve!
>>>>>
>>>>> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
>>>>> forwarding rule saying that packets to a given L2-multicast MAC address, should
>>>>> only be flooded to 2 of the 4 ports.
>>>>>
>>>>> (instead of adding rules to get certain packets to the CPU, we are now adding
>>>>> other rules to prevent other packets from going to the CPU and other ports where
>>>>> they are not needed/wanted).
>>>>>
>>>>> This is exactly the same thing as IGMP snooping does dynamically, but only for
>>>>> IP multicast.
>>>>>
>>>>> The "bridge mdb" allow users to manually/static add/del a port to a multicast
>>>>> group, but still it operates on IP multicast address (not L2 multicast
>>>>> addresses).
>>>>>
>>>>>> Nik suggested SIOCADDMULTI.
>>>>> It is not clear to me how this should be used to limit the flooding, maybe we
>>>>> can make some hacks, but as far as I understand the intend of this is maintain
>>>>> the list of addresses an interface should receive. I'm not sure this should
>>>>> influence how for forwarding decisions are being made.
>>>>>
>>>>>> and I suggested a tc filter to get the packet to the CPU.
>>>>> The TC solution is a good solution to the original problem where wanted to copy
>>>>> more frames to the CPU. But we were convinced that this is not the right
>>>>> approach, and that the CPU by default should receive all multicast packets, and
>>>>> we should instead try to find a way to limit the flooding of certain frames as
>>>>> an optimization.
>>>>
>>>> This can still work. In Linux, ingress tc filters are executed before the
>>>> bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
>>>> performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
>>>> you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
>>>> eth2:
>>>>
>>>> # tc filter add dev eth0 ingress pref 1 flower skip_sw \
>>>> 	dst_mac 01:21:6C:00:00:01 action trap
>>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
>>>> 	dst_mac 01:21:6C:00:00:01 action drop
>>>>
>>>> The first filter is only present in HW ('skip_sw') and should result in
>>>> your HW passing you the sole copy of the packet.
>>> Agree.
>>>
>>>> The second filter is only present in SW ('skip_hw', not using HW egress
>>>> ACL that you don't have) and drops the packet after it was flooded by
>>>> the SW bridge.
>>> Agree.
>>>
>>>> As I mentioned earlier, you can install the filter once in your HW and
>>>> share it between different ports using a shared block. This means you
>>>> only consume one TCAM entry.
>>>>
>>>> Note that this allows you to keep flooding all other multicast packets
>>>> in HW.
>>> Yes, but the frames we want to limit the flood-mask on are the exact frames
>>> which occurs at a very high rate, and where latency is important.
>>>
>>> I really do not consider it as an option to forward this in SW, when it is
>>> something that can easily be offloaded in HW.
>>>
>>>>>> If you now want to limit the ports to which this packet is flooded, then
>>>>>> you can use tc filters in *software*:
>>>>>>
>>>>>> # tc qdisc add dev eth2 clsact
>>>>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
>>>>>> 	dst_mac 01:21:6C:00:00:01 action drop
>>>>> Yes. This can work in the SW bridge.
>>>>>
>>>>>> If you want to forward the packet in hardware and locally receive it,
>>>>>> you can chain several mirred action and then a trap action.
>>>>> I'm not I fully understand how this should be done, but it does sound like it
>>>>> becomes quite complicated. Also, as far as I understand it will mean that we
>>>>> will be using TCAM/ACL resources to do something that could have been done with
>>>>> a simple MAC entry.
>>>>>
>>>>>> Both options avoid HW egress ACLs which your design does not support.
>>>>> True, but what is wrong with expanding the functionality of the normal
>>>>> forwarding/MAC operations to allow multiple destinations?
>>>>>
>>>>> It is not an uncommon feature (I just browsed the manual of some common L2
>>>>> switches and they all has this feature).
>>>>>
>>>>> It seems to fit nicely into the existing user-interface:
>>>>>
>>>>> bridge fdb add    01:21:6C:00:00:01 port eth0
>>>>> bridge fdb append 01:21:6C:00:00:01 port eth1
>>>>
>>>> Wouldn't it be better to instead extend the MDB entries so that they are
>>>> either keyed by IP or MAC? I believe FDB should remain as unicast-only.
>>>
>>> You might be right, it was not clear to me which of the two would fit the
>>> purpose best.
>>>
>>> From a user-space iproute2 perspective I prefer using the "bridge fdb" command
>>> as it already supports the needed syntax, and I do not think it will be too
>>> pretty if we squeeze this into the "bridge mdb" command syntax.
>>>
>>
>> MDB is a much better fit as Ido already suggested. FDB should remain unicast
>> and mixing them is not a good idea, we already have a good ucast/mcast separation
>> and we'd like to keep it that way.
> Okay. We will explore that option.
> 
> 
Great, thanks.

>>> But that does not mean that it need to go into the FDB database in the
>>> implementation.
>>>
>>> Last evening when I looked at it again, I was considering keeping the
>>> net_bridge_fdb_entry structure as is, and add a new hashtable with the
>>> following:
>>>
>>> struct net_bridge_fdbmc_entry {
>>> 	struct rhash_head		rhnode;
>>> 	struct net_bridge_fdbmc_ports   *dst;
>>>
>>> 	struct net_bridge_fdb_key	key;
>>> 	struct hlist_node		fdb_node;
>>> 	unsigned char			offloaded:1;
>>>
>>> 	struct rcu_head			rcu;
>>> };
>>>
>>
>> What would the notification for this look like ?
> Not sure. But we will change the direction and use the MDB structures instead.
> 

Ack

>>> If we go with this approach then we can look at the MAC address and see if it is
>>> a unicast which will cause a lookup in the fdb, l3-multicast (33:33:* or
>>> 01:00:5e:*) which will cause a lookup in the mdb, or finally a fdbmc which will
>>> need to do a lookup in this new hashtable.
>>
>> That sounds wrong, you will change the current default behaviour of flooding these
>> packets. This will have to be well hidden behind a new option and enabled only on user
>> request.
> It will only affect users who install a static L2-multicast entry. If no entry
> is found, it will default to flooding, which will be the same as before.
> 

Ack

>>> Alternative it would be like this:
>>>
>>> struct net_bridge_fdb_entry {
>>> 	struct rhash_head		rhnode;
>>> 	union net_bridge_port_or_list	*dst;
>>>
>>> 	struct net_bridge_fdb_key	key;
>>> 	struct hlist_node		fdb_node;
>>> 	unsigned char			is_local:1,
>>> 					is_static:1,
>>> 					is_sticky:1,
>>> 					added_by_user:1,
>>> 					added_by_external_learn:1,
>>> 					offloaded:1;
>>> 					multi_dst:1;
>>>
>>> 	/* write-heavy members should not affect lookups */
>>> 	unsigned long			updated ____cacheline_aligned_in_smp;
>>> 	unsigned long			used;
>>>
>>> 	struct rcu_head			rcu;
>>> };
>>>
>>> Both solutions should require fairly few changes, and should not cause any
>>> measurable performance hit.
>>>
>>
>> You'll have to convert these bits to use the proper atomic bitops if you go with
>> the second solution. That has to be done even today, but the second case would
>> make it a must.
> Good to know.
> 
> Just for my understanding, is this because this is the "current" guide lines on
> how things should be done, or is this because the multi_dst as a special need.
> 
> The multi_dst flag will never be changed in the life-cycle of the structure, and
> the structure is protected by rcu. If this is causeing a raise, then I do not
> see it.
> 

These flags are changed from different contexts without any locking and you can end
up with wrong values since these are not atomic ops. We need to move to atomic ops
to guarantee consistent results. It is not only multi_dst issue, it's a problem
for all of them, it's just not critical today since you'll end up with wrong
flag values in such cases, but with multi_dst it will be important because the union
pointer will have to be treated different based on the multi_dst value.

>>> Making it fit into the net_bridge_mdb_entry seems to be harder.
>>>
>>
>> But it is the correct abstraction from bridge POV, so please stop trying to change
>> the FDB code and try to keep to the multicast code.
> We are planning on letting the net_bridge_port_or_list union use the
> net_bridge_port_group structure, which will mean that we can re-use the
> br_multicast_flood function (if we change the signatire to accept the ports
> instead of the entry).
> 

That sounds great, definitely a step in the right direction.

>>>> As a bonus, existing drivers could benefit from it, as MDB entries are already
>>>> notified by MAC.
>>> Not sure I follow. When FDB entries are added, it also generates notification
>>> events.
>>>
>>
>> Could you please show fdb event with multiple ports ?
> We will get to that. Maybe this is an argument for converting to mdb. We have
> not looked into the details of this yet.
> 

I can see a few potential problems, the important thing here would be to keep
backwards compatibility.

>>>>> It seems that it can be added to the existing implementation with out adding
>>>>> significant complexity.
>>>>>
>>>>> It will be easy to offload in HW.
>>>>>
>>>>> I do not believe that it will be a performance issue, if this is a concern then
>>>>> we may have to do a bit of benchmarking, or we can make it a configuration
>>>>> option.
>>>>>
>>>>> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
>>>>> I think we should try do a new patch with the learning we got. Then it is easier
>>>>> to see what it actually means to the exiting code, complexity, exiting drivers,
>>>>> performance, default behavioral, backwards compatibly, and other valid concerns.
>>>>>
>>>>> If the patch is no good, and cannot be fixed, then we will go back and look
>>>>> further into alternative solutions.
>>>> Overall, I tend to agree with Nik. I think your use case is too specific
>>>> to justify the amount of changes you want to make in the bridge driver.
>>>> We also provided other alternatives. That being said, you're more than
>>>> welcome to send the patches and we can continue the discussion then.
>>> Okay, good to know. I'm not sure I agree that the alternative solutions really
>>> solves the issue this is trying to solve, nor do I agree that this is specific
>>> to our needs.
>>>
>>> But lets take a look at a new patch, and see what is the amount of changes we
>>> are talking about. Without having the patch it is really hard to know for sure.
>> Please keep in mind that this case is the exception, not the norm, thus it should
>> not under any circumstance affect the standard deployments.
> Understood - no surprises.
> 

Great, thanks again. Will continue discussing when the new patch arrives.

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  8:30                                   ` Allan W. Nielsen
  2019-07-30  8:58                                     ` Nikolay Aleksandrov
@ 2019-07-30 10:04                                     ` Ido Schimmel
  2019-07-30 12:19                                       ` Allan W. Nielsen
  1 sibling, 1 reply; 43+ messages in thread
From: Ido Schimmel @ 2019-07-30 10:04 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

On Tue, Jul 30, 2019 at 10:30:28AM +0200, Allan W. Nielsen wrote:
> The 07/30/2019 10:06, Ido Schimmel wrote:
> > As a bonus, existing drivers could benefit from it, as MDB entries are already
> > notified by MAC.
> Not sure I follow. When FDB entries are added, it also generates notification
> events.

I meant the switchdev notification sent to drivers:

/* SWITCHDEV_OBJ_ID_PORT_MDB */
struct switchdev_obj_port_mdb {
	struct switchdev_obj obj;
	unsigned char addr[ETH_ALEN];
	u16 vid;
};

By extending MDB entries to also be keyed by MAC you basically get a lot
of things for free without duplicating the same code for multicast FDBs.

AFAICS, then only change in the fast path is in br_mdb_get() where you
need to use DMAC as key in case Ethertype is not IPv4/IPv6.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  9:55                                         ` Nikolay Aleksandrov
@ 2019-07-30 11:23                                           ` Allan W. Nielsen
  0 siblings, 0 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30 11:23 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/30/2019 12:55, Nikolay Aleksandrov wrote:
> On 30/07/2019 12:21, Allan W. Nielsen wrote:
> > The 07/30/2019 11:58, Nikolay Aleksandrov wrote:
> >> On 30/07/2019 11:30, Allan W. Nielsen wrote:
> >>> The 07/30/2019 10:06, Ido Schimmel wrote:
> >>>> On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
> >>>>> The 07/29/2019 20:51, Ido Schimmel wrote:
> >>>>>> Can you please clarify what you're trying to achieve? I just read the
> >>>>>> thread again and my impression is that you're trying to locally receive
> >>>>>> packets with a certain link layer multicast address.
> >>>>> Yes. The thread is also a bit confusing because we half way through realized
> >>>>> that we misunderstood how the multicast packets should be handled (sorry about
> >>>>> that). To begin with we had a driver where multicast packets was only copied to
> >>>>> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
> >>>>> how other drivers are doing it, so we changed the driver to include the CPU in
> >>>>> the default multicast flood-mask.
> >>>> OK, so what prevents you from removing all other ports from the
> >>>> flood-mask and letting the CPU handle the flooding? Then you can install
> >>>> software tc filters to limit the flooding.
> >>> I do not have the bandwidth to forward the multicast traffic in the CPU.
> >>>
> >>> It will also cause enormous latency on the forwarding of L2 multicast packets.
> >>>
> >>>>> This changes the objective a bit. To begin with we needed to get more packets to
> >>>>> the CPU (which could have been done using tc ingress rules and a trap action).
> >>>>>
> >>>>> Now after we changed the driver, we realized that we need something to limit the
> >>>>> flooding of certain L2 multicast packets. This is the new problem we are trying
> >>>>> to solve!
> >>>>>
> >>>>> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
> >>>>> forwarding rule saying that packets to a given L2-multicast MAC address, should
> >>>>> only be flooded to 2 of the 4 ports.
> >>>>>
> >>>>> (instead of adding rules to get certain packets to the CPU, we are now adding
> >>>>> other rules to prevent other packets from going to the CPU and other ports where
> >>>>> they are not needed/wanted).
> >>>>>
> >>>>> This is exactly the same thing as IGMP snooping does dynamically, but only for
> >>>>> IP multicast.
> >>>>>
> >>>>> The "bridge mdb" allow users to manually/static add/del a port to a multicast
> >>>>> group, but still it operates on IP multicast address (not L2 multicast
> >>>>> addresses).
> >>>>>
> >>>>>> Nik suggested SIOCADDMULTI.
> >>>>> It is not clear to me how this should be used to limit the flooding, maybe we
> >>>>> can make some hacks, but as far as I understand the intend of this is maintain
> >>>>> the list of addresses an interface should receive. I'm not sure this should
> >>>>> influence how for forwarding decisions are being made.
> >>>>>
> >>>>>> and I suggested a tc filter to get the packet to the CPU.
> >>>>> The TC solution is a good solution to the original problem where wanted to copy
> >>>>> more frames to the CPU. But we were convinced that this is not the right
> >>>>> approach, and that the CPU by default should receive all multicast packets, and
> >>>>> we should instead try to find a way to limit the flooding of certain frames as
> >>>>> an optimization.
> >>>>
> >>>> This can still work. In Linux, ingress tc filters are executed before the
> >>>> bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
> >>>> performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
> >>>> you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
> >>>> eth2:
> >>>>
> >>>> # tc filter add dev eth0 ingress pref 1 flower skip_sw \
> >>>> 	dst_mac 01:21:6C:00:00:01 action trap
> >>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> >>>> 	dst_mac 01:21:6C:00:00:01 action drop
> >>>>
> >>>> The first filter is only present in HW ('skip_sw') and should result in
> >>>> your HW passing you the sole copy of the packet.
> >>> Agree.
> >>>
> >>>> The second filter is only present in SW ('skip_hw', not using HW egress
> >>>> ACL that you don't have) and drops the packet after it was flooded by
> >>>> the SW bridge.
> >>> Agree.
> >>>
> >>>> As I mentioned earlier, you can install the filter once in your HW and
> >>>> share it between different ports using a shared block. This means you
> >>>> only consume one TCAM entry.
> >>>>
> >>>> Note that this allows you to keep flooding all other multicast packets
> >>>> in HW.
> >>> Yes, but the frames we want to limit the flood-mask on are the exact frames
> >>> which occurs at a very high rate, and where latency is important.
> >>>
> >>> I really do not consider it as an option to forward this in SW, when it is
> >>> something that can easily be offloaded in HW.
> >>>
> >>>>>> If you now want to limit the ports to which this packet is flooded, then
> >>>>>> you can use tc filters in *software*:
> >>>>>>
> >>>>>> # tc qdisc add dev eth2 clsact
> >>>>>> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> >>>>>> 	dst_mac 01:21:6C:00:00:01 action drop
> >>>>> Yes. This can work in the SW bridge.
> >>>>>
> >>>>>> If you want to forward the packet in hardware and locally receive it,
> >>>>>> you can chain several mirred action and then a trap action.
> >>>>> I'm not I fully understand how this should be done, but it does sound like it
> >>>>> becomes quite complicated. Also, as far as I understand it will mean that we
> >>>>> will be using TCAM/ACL resources to do something that could have been done with
> >>>>> a simple MAC entry.
> >>>>>
> >>>>>> Both options avoid HW egress ACLs which your design does not support.
> >>>>> True, but what is wrong with expanding the functionality of the normal
> >>>>> forwarding/MAC operations to allow multiple destinations?
> >>>>>
> >>>>> It is not an uncommon feature (I just browsed the manual of some common L2
> >>>>> switches and they all has this feature).
> >>>>>
> >>>>> It seems to fit nicely into the existing user-interface:
> >>>>>
> >>>>> bridge fdb add    01:21:6C:00:00:01 port eth0
> >>>>> bridge fdb append 01:21:6C:00:00:01 port eth1
> >>>>
> >>>> Wouldn't it be better to instead extend the MDB entries so that they are
> >>>> either keyed by IP or MAC? I believe FDB should remain as unicast-only.
> >>>
> >>> You might be right, it was not clear to me which of the two would fit the
> >>> purpose best.
> >>>
> >>> From a user-space iproute2 perspective I prefer using the "bridge fdb" command
> >>> as it already supports the needed syntax, and I do not think it will be too
> >>> pretty if we squeeze this into the "bridge mdb" command syntax.
> >>>
> >>
> >> MDB is a much better fit as Ido already suggested. FDB should remain unicast
> >> and mixing them is not a good idea, we already have a good ucast/mcast separation
> >> and we'd like to keep it that way.
> > Okay. We will explore that option.
> > 
> > 
> Great, thanks.
> 
> >>> But that does not mean that it need to go into the FDB database in the
> >>> implementation.
> >>>
> >>> Last evening when I looked at it again, I was considering keeping the
> >>> net_bridge_fdb_entry structure as is, and add a new hashtable with the
> >>> following:
> >>>
> >>> struct net_bridge_fdbmc_entry {
> >>> 	struct rhash_head		rhnode;
> >>> 	struct net_bridge_fdbmc_ports   *dst;
> >>>
> >>> 	struct net_bridge_fdb_key	key;
> >>> 	struct hlist_node		fdb_node;
> >>> 	unsigned char			offloaded:1;
> >>>
> >>> 	struct rcu_head			rcu;
> >>> };
> >>>
> >>
> >> What would the notification for this look like ?
> > Not sure. But we will change the direction and use the MDB structures instead.
> > 
> 
> Ack
> 
> >>> If we go with this approach then we can look at the MAC address and see if it is
> >>> a unicast which will cause a lookup in the fdb, l3-multicast (33:33:* or
> >>> 01:00:5e:*) which will cause a lookup in the mdb, or finally a fdbmc which will
> >>> need to do a lookup in this new hashtable.
> >>
> >> That sounds wrong, you will change the current default behaviour of flooding these
> >> packets. This will have to be well hidden behind a new option and enabled only on user
> >> request.
> > It will only affect users who install a static L2-multicast entry. If no entry
> > is found, it will default to flooding, which will be the same as before.
> > 
> 
> Ack
> 
> >>> Alternative it would be like this:
> >>>
> >>> struct net_bridge_fdb_entry {
> >>> 	struct rhash_head		rhnode;
> >>> 	union net_bridge_port_or_list	*dst;
> >>>
> >>> 	struct net_bridge_fdb_key	key;
> >>> 	struct hlist_node		fdb_node;
> >>> 	unsigned char			is_local:1,
> >>> 					is_static:1,
> >>> 					is_sticky:1,
> >>> 					added_by_user:1,
> >>> 					added_by_external_learn:1,
> >>> 					offloaded:1;
> >>> 					multi_dst:1;
> >>>
> >>> 	/* write-heavy members should not affect lookups */
> >>> 	unsigned long			updated ____cacheline_aligned_in_smp;
> >>> 	unsigned long			used;
> >>>
> >>> 	struct rcu_head			rcu;
> >>> };
> >>>
> >>> Both solutions should require fairly few changes, and should not cause any
> >>> measurable performance hit.
> >>>
> >>
> >> You'll have to convert these bits to use the proper atomic bitops if you go with
> >> the second solution. That has to be done even today, but the second case would
> >> make it a must.
> > Good to know.
> > 
> > Just for my understanding, is this because this is the "current" guide lines on
> > how things should be done, or is this because the multi_dst as a special need.
> > 
> > The multi_dst flag will never be changed in the life-cycle of the structure, and
> > the structure is protected by rcu. If this is causeing a raise, then I do not
> > see it.
> > 
> 
> These flags are changed from different contexts without any locking and you can end
> up with wrong values since these are not atomic ops. We need to move to atomic ops
> to guarantee consistent results. It is not only multi_dst issue, it's a problem
> for all of them, it's just not critical today since you'll end up with wrong
> flag values in such cases, but with multi_dst it will be important because the union
> pointer will have to be treated different based on the multi_dst value.
Okay, thanks for explaining.

> >>> Making it fit into the net_bridge_mdb_entry seems to be harder.
> >>>
> >>
> >> But it is the correct abstraction from bridge POV, so please stop trying to change
> >> the FDB code and try to keep to the multicast code.
> > We are planning on letting the net_bridge_port_or_list union use the
> > net_bridge_port_group structure, which will mean that we can re-use the
> > br_multicast_flood function (if we change the signatire to accept the ports
> > instead of the entry).
> > 
> 
> That sounds great, definitely a step in the right direction.
> 
> >>>> As a bonus, existing drivers could benefit from it, as MDB entries are already
> >>>> notified by MAC.
> >>> Not sure I follow. When FDB entries are added, it also generates notification
> >>> events.
> >>>
> >>
> >> Could you please show fdb event with multiple ports ?
> > We will get to that. Maybe this is an argument for converting to mdb. We have
> > not looked into the details of this yet.
> I can see a few potential problems, the important thing here would be to keep
> backwards compatibility.
Okay, good to be aware of

> >>>>> It seems that it can be added to the existing implementation with out adding
> >>>>> significant complexity.
> >>>>>
> >>>>> It will be easy to offload in HW.
> >>>>>
> >>>>> I do not believe that it will be a performance issue, if this is a concern then
> >>>>> we may have to do a bit of benchmarking, or we can make it a configuration
> >>>>> option.
> >>>>>
> >>>>> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
> >>>>> I think we should try do a new patch with the learning we got. Then it is easier
> >>>>> to see what it actually means to the exiting code, complexity, exiting drivers,
> >>>>> performance, default behavioral, backwards compatibly, and other valid concerns.
> >>>>>
> >>>>> If the patch is no good, and cannot be fixed, then we will go back and look
> >>>>> further into alternative solutions.
> >>>> Overall, I tend to agree with Nik. I think your use case is too specific
> >>>> to justify the amount of changes you want to make in the bridge driver.
> >>>> We also provided other alternatives. That being said, you're more than
> >>>> welcome to send the patches and we can continue the discussion then.
> >>> Okay, good to know. I'm not sure I agree that the alternative solutions really
> >>> solves the issue this is trying to solve, nor do I agree that this is specific
> >>> to our needs.
> >>>
> >>> But lets take a look at a new patch, and see what is the amount of changes we
> >>> are talking about. Without having the patch it is really hard to know for sure.
> >> Please keep in mind that this case is the exception, not the norm, thus it should
> >> not under any circumstance affect the standard deployments.
> > Understood - no surprises.
> > 
> 
> Great, thanks again. Will continue discussing when the new patch arrives.
Sure, looking forward to that.

/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30 10:04                                     ` Ido Schimmel
@ 2019-07-30 12:19                                       ` Allan W. Nielsen
  0 siblings, 0 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30 12:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, Horatiu Vultur, roopa, davem, bridge,
	netdev, linux-kernel

The 07/30/2019 13:04, Ido Schimmel wrote:
> On Tue, Jul 30, 2019 at 10:30:28AM +0200, Allan W. Nielsen wrote:
> > The 07/30/2019 10:06, Ido Schimmel wrote:
> > > As a bonus, existing drivers could benefit from it, as MDB entries are already
> > > notified by MAC.
> > Not sure I follow. When FDB entries are added, it also generates notification
> > events.
> 
> I meant the switchdev notification sent to drivers:
> 
> /* SWITCHDEV_OBJ_ID_PORT_MDB */
> struct switchdev_obj_port_mdb {
> 	struct switchdev_obj obj;
> 	unsigned char addr[ETH_ALEN];
> 	u16 vid;
> };
> 
> By extending MDB entries to also be keyed by MAC you basically get a lot
> of things for free without duplicating the same code for multicast FDBs.
Agree, this should be the same.

> AFAICS, then only change in the fast path is in br_mdb_get() where you
> need to use DMAC as key in case Ethertype is not IPv4/IPv6.
That would be nice.

-- 
/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30  6:27                               ` Allan W. Nielsen
  2019-07-30  7:06                                 ` Ido Schimmel
@ 2019-07-30 14:34                                 ` Andrew Lunn
  2019-07-30 19:00                                   ` Allan W. Nielsen
  1 sibling, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2019-07-30 14:34 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

Hi Allan

Just throwing out another idea....

The whole offloading story has been you use the hardware to accelerate
what the Linux stack can already do.

In this case, you want to accelerate Device Level Ring, DLR.  But i've
not yet seen a software implementation of DLR. Should we really be
considering first adding DLR to the SW bridge? Make it an alternative
to the STP code? Once we have a generic implementation we can then
look at how it can be accelerated using switchdev.

     Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30 14:34                                 ` Andrew Lunn
@ 2019-07-30 19:00                                   ` Allan W. Nielsen
  2019-07-31  3:31                                     ` Andrew Lunn
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-30 19:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

The 07/30/2019 16:34, Andrew Lunn wrote:
> The whole offloading story has been you use the hardware to accelerate
> what the Linux stack can already do.
It is true, I have been quite keen on finding a way to control the forwarding of
L2-multicast which will work in the same way with and without HW acceleration
(and which we can HW offlaod with the HW I'm working on).

> In this case, you want to accelerate Device Level Ring, DLR.
It is actually not only for DLR, there are other ring protocols which has the
same needs the same MRP (media redundancy protocol) is an other example.

I just used DLR as an example because this is the one we expect to implement the
protocol for first. There are other just as important use-cases.

> But i've not yet seen a software implementation of DLR. Should we really be
> considering first adding DLR to the SW bridge?
We have actually (slowly) stared to work on a DLR SW implementation. We want to
do this as a Linux driver instead of a user-space implementation, because there
are other HW facilities we would like to offload (the HW has a automatic frame
generator, which can generate the beacon frames, and a unit which can terminate
the beacon frames, and generate an interrupt if the beacon frames are not
received).

Our plan was to implement this in pure SW, and then look at how to HW offload
it.

But this will take some time before we have anything meaning full to show.

> Make it an alternative to the STP code?
I'm still working on learning the details of DLR, but I actually believe that it
in some situations may co-exists with STP ;-)

DLR only works on ring topologies, but it is possible to connect a ring to a
classic STP network. If doing so, then you are suppose to run DLR on the two
ring ports, and (M)STP on the ports connecting to the remaining part of the
network.

As far as I recall, this is called a gateway node. But supporting this is
optional, and will properly not be supported in the first implementation.

> Once we have a generic implementation we can then look at how it can
> be accelerated using switchdev.
I agree with you that we need a SW implementation of DLR because we can offload
the DLR protocol to HW.

But what we are looking at here, is to offload a
non-aware-(DLR|MRP)-switch which happens to be placed in a network with
these protocols running.

It is not really DLR specific, which is why it seems reasonable to implement
this without a DLR SW implementation up front.

-- 
/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-30 19:00                                   ` Allan W. Nielsen
@ 2019-07-31  3:31                                     ` Andrew Lunn
  2019-07-31  8:01                                       ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2019-07-31  3:31 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

> Our plan was to implement this in pure SW, and then look at how to HW offload
> it.

Great.

> But this will take some time before we have anything meaning full to show.
> 
> > Make it an alternative to the STP code?
> I'm still working on learning the details of DLR, but I actually believe that it
> in some situations may co-exists with STP ;-)

The PDF you linked to suggests this as well. But i think you will need
to make some core changes to the bridge. At the moment, STP is a
bridge level property. But you are going to need it to be a per-port
option. You can then use DLR on the ring ports, and optionally STP on
the other ports.

> But what we are looking at here, is to offload a
> non-aware-(DLR|MRP)-switch which happens to be placed in a network
> with these protocols running.

So we need to think about why we are passing traffic to the CPU port,
and under what conditions can it be blocked.

1) The interface is not part of a bridge. In this case, we only need
the switch to pass to the CPU port MC addresses which have been set
via set_rx_mode().

I think this case does not apply for what you want. You have two ports
bridges together as part of the ring.

2) The interface is part of a bridge. There are a few sub-cases

a) IGMP snooping is being performed. We can block multicast where
there is no interest in the group. But this is limited to IP
multicast.

b) IGMP snooping is not being used and all interfaces in the bridge
are ports of the switch. IP Multicast can be blocked to the CPU.

c) IGMP snooping is not being used and there is a non-switch interface
in the bridge. Multicast needed is needed, so it can be flooded out
this port.

d) set_rx_mode() has been called on the br0 interface, indicating
there is interest in the packets on the host. They must be sent to the
CPU so they can be delivered locally.

e) ????

Does the Multicast MAC address being used by DLR also map to an IP
mmulticast address? 01:21:6C:00:00:0[123] appear to be the MAC
addresses used by DLR. IPv4 multicast MAC addresses are
01:00:5E:XX:XX:XX. IPv6 multicast MAC addresses are 33:33:XX:XX:XX:XX.

So one possibility here is to teach the SW bridge about non-IP
multicast addresses. Initially the switch should forward all MAC
multicast frames to the CPU. If the frame is not an IPv4 or IPv6
frame, and there has not been a call to set_rx_mode() for the MAC
address on the br0 interface, and the bridge only contains switch
ports, switchdev could be used to block the multicast to the CPU
frame, but forward it out all other ports of the bridge.

      Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-31  3:31                                     ` Andrew Lunn
@ 2019-07-31  8:01                                       ` Allan W. Nielsen
  2019-07-31 13:45                                         ` Andrew Lunn
  0 siblings, 1 reply; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-31  8:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

The 07/31/2019 05:31, Andrew Lunn wrote:
> The PDF you linked to suggests this as well. But i think you will need
> to make some core changes to the bridge. At the moment, STP is a
> bridge level property. But you are going to need it to be a per-port
> option. You can then use DLR on the ring ports, and optionally STP on
> the other ports.
I think you are right. I have not looked into the details of this. Our plan was
to not implement this to begin with. This will mean that it cannot act as a
gateway until this is done. But it still has good value to be able to implement
a DLR ring node.

> > But what we are looking at here, is to offload a
> > non-aware-(DLR|MRP)-switch which happens to be placed in a network
> > with these protocols running.
> 
> So we need to think about why we are passing traffic to the CPU port,
> and under what conditions can it be blocked.
> 
> 1) The interface is not part of a bridge. In this case, we only need
> the switch to pass to the CPU port MC addresses which have been set
> via set_rx_mode().
Yes. In our HW we are using the MAC table to do this, but this is an
implementation detail.

> I think this case does not apply for what you want. You have two ports
> bridges together as part of the ring.
Correct.

> 2) The interface is part of a bridge. There are a few sub-cases
> 
> a) IGMP snooping is being performed. We can block multicast where
> there is no interest in the group. But this is limited to IP
> multicast.
Agree. And this is done today by installing an explicit offload rule to limit
the flooding of a specific group.

> b) IGMP snooping is not being used and all interfaces in the bridge
> are ports of the switch. IP Multicast can be blocked to the CPU.
Does it matter if IGMP snooping is used or not? A more general statement could
be:

- "All interfaces in the bridge are ports of the switch. IP Multicast can be
  blocked to the CPU."

This assumes that if br0 joins a IP multicast group, then the needed MAC
addresses is added via the set_rx_mode (which I'm pretty sure is the case.

Or much more aggressive (which is where we started this entire discussion):

- "All interfaces in the bridge are ports of the switch. All Multicast (except
  ff:ff:ff:ff:ff:ff, 33:33:xx:xx:xx:xx, 01:80:C2:xx:xx:xx) can be blocked to the
  CPU."

But then we are back to having the need of requesting additional L2-multicast
mac addresses to the CPU. This has been discussed, and I do not want to re-start
the discussion, just wanted to mention it to have the complete picture.

> c) IGMP snooping is not being used and there is a non-switch interface
> in the bridge. Multicast needed is needed, so it can be flooded out
> this port.
Yes. And L2-multicast also always needs to go to the CPU regardless of IGMP.

> d) set_rx_mode() has been called on the br0 interface, indicating
> there is interest in the packets on the host. They must be sent to the
> CPU so they can be delivered locally.
Yes - but today set_rx_mode is not doing anything on br0. This was one problem
we had a few days ago, when we did not forward all traffic to the CPU by
default.

When looking at these cases, I'm not sure it matters if IGMP snooping is running
or not.

Here is how I understand what you say:

              || Foreign interfaces   |  No Foreign interfaces
==============||======================|=========================
IGMP Snoop on || IP-Multicast must    |  IP-Multicast is not needed
              || go to CPU by default |  in the CPU (by default*)
--------------||----------------------|-------------------------
IGMP Snoop off|| IP-Multicast must    |  IP-Multicast is not needed 
              || go to CPU by default |  in the CPU (by default*)

* This assumes that set_rx_mode starts to do something for the br0 interface
  (which does not seem to be the case right now - see br_dev_set_multicast_list).


> e) ????
e) Another case to consider is similar to d), but with VLANs. Assume that
br0.100 joins a IP-Multicast group, which will cause set_rx_mode() to be called.
As I read the code (I have not tried it, so I might have gotten this wrong),
then we loose the VLAN information, because the mc list is not VLAN aware.

This means that if br0.100 joins a group, then we will get that group for all
VLANs.

Maybe it is better (for now) to hold on to the exiting behavioral and let all
multicast go to the CPU by default, leave br_dev_set_multicast_list empty, use
IGMP snooping to optimize the flooding, and hopefully we will have a way to
limit the L2-multicast flooding as an optimization which can be applied to
"noisy" l2 multicast streams.

> Does the Multicast MAC address being used by DLR also map to an IP
> mmulticast address?
No.

> 01:21:6C:00:00:0[123] appear to be the MAC addresses used by DLR.
Yes.

> IPv4 multicast MAC addresses are 01:00:5E:XX:XX:XX. IPv6 multicast MAC
> addresses are 33:33:XX:XX:XX:XX.
Yes, and there are no overlap.

> So one possibility here is to teach the SW bridge about non-IP
> multicast addresses. Initially the switch should forward all MAC
> multicast frames to the CPU. If the frame is not an IPv4 or IPv6
> frame, and there has not been a call to set_rx_mode() for the MAC
> address on the br0 interface, and the bridge only contains switch
> ports, switchdev could be used to block the multicast to the CPU
> frame, but forward it out all other ports of the bridge.
Close, but not exactly (due to the arguments above).

Here is how I see it:

Teach the SW bridge about non-IP multicast addresses. Initially the switch
should forward all MAC multicast frames to the CPU. Today MDB rules can be
installed (either static or dynamic by IGMP), which limit the flooding of IPv4/6
multicast streams. In the same way, we should have a way to install a rule
(FDM/ or MDB) to limit the flooding of L2 multicast frames.

If foreign interfaces (or br0 it self) is part of the destination list, then
traffic also needs to go to the CPU.

By doing this, we can for explicitly configured dst mac address:
- limit the flooding on the on the SW bridge interfaces
- limit the flooding on the on the HW bridge interfaces
- prevent them to go to the CPU if they are not needed


-- 
/Allan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-31  8:01                                       ` Allan W. Nielsen
@ 2019-07-31 13:45                                         ` Andrew Lunn
  2019-07-31 19:38                                           ` Allan W. Nielsen
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2019-07-31 13:45 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

> > 2) The interface is part of a bridge. There are a few sub-cases
> > 
> > a) IGMP snooping is being performed. We can block multicast where
> > there is no interest in the group. But this is limited to IP
> > multicast.
> Agree. And this is done today by installing an explicit offload rule to limit
> the flooding of a specific group.
> 
> > b) IGMP snooping is not being used and all interfaces in the bridge
> > are ports of the switch. IP Multicast can be blocked to the CPU.
> Does it matter if IGMP snooping is used or not? A more general statement could
> be:
> 
> - "All interfaces in the bridge are ports of the switch. IP Multicast can be
>   blocked to the CPU."

We have seen IPv6 neighbour discovery break in conditions like this. I
don't know the exact details.

You also need to watch out for 224.0.0.0/24. This is the link local
multicast range. There is no need to join MC addresses in that
range. It is assumed they will always be received. So even if IGMP is
enabled, you still need to pass some multicast traffic to the CPU.

> > So one possibility here is to teach the SW bridge about non-IP
> > multicast addresses. Initially the switch should forward all MAC
> > multicast frames to the CPU. If the frame is not an IPv4 or IPv6
> > frame, and there has not been a call to set_rx_mode() for the MAC
> > address on the br0 interface, and the bridge only contains switch
> > ports, switchdev could be used to block the multicast to the CPU
> > frame, but forward it out all other ports of the bridge.
> Close, but not exactly (due to the arguments above).
> 
> Here is how I see it:
> 
> Teach the SW bridge about non-IP multicast addresses. Initially the switch
> should forward all MAC multicast frames to the CPU. Today MDB rules can be
> installed (either static or dynamic by IGMP), which limit the flooding of IPv4/6
> multicast streams. In the same way, we should have a way to install a rule
> (FDM/ or MDB) to limit the flooding of L2 multicast frames.
> 
> If foreign interfaces (or br0 it self) is part of the destination list, then
> traffic also needs to go to the CPU.
> 
> By doing this, we can for explicitly configured dst mac address:
> - limit the flooding on the on the SW bridge interfaces
> - limit the flooding on the on the HW bridge interfaces
> - prevent them to go to the CPU if they are not needed

This is all very complex because of all the different corner cases. So
i don't think we want a user API to do the CPU part, we want the
network stack to do it. Otherwise the user is going to get is wrong,
break their network, and then come running to the list for help.

This also fits with how we do things in DSA. There is deliberately no
user space concept for configuring the DSA CPU port. To user space,
the switch is just a bunch of Linux interfaces. Everything to do with
the CPU port is hidden away in the DSA core layer, the DSA drivers,
and a little bit in the bridge.

       Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-31 13:45                                         ` Andrew Lunn
@ 2019-07-31 19:38                                           ` Allan W. Nielsen
  0 siblings, 0 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-07-31 19:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel

The 07/31/2019 15:45, Andrew Lunn wrote:
> > Here is how I see it:
> > 
> > Teach the SW bridge about non-IP multicast addresses. Initially the switch
> > should forward all MAC multicast frames to the CPU. Today MDB rules can be
> > installed (either static or dynamic by IGMP), which limit the flooding of IPv4/6
> > multicast streams. In the same way, we should have a way to install a rule
> > (FDM/ or MDB) to limit the flooding of L2 multicast frames.
> > 
> > If foreign interfaces (or br0 it self) is part of the destination list, then
> > traffic also needs to go to the CPU.
> > 
> > By doing this, we can for explicitly configured dst mac address:
> > - limit the flooding on the on the SW bridge interfaces
> > - limit the flooding on the on the HW bridge interfaces
> > - prevent them to go to the CPU if they are not needed
> This is all very complex because of all the different corner cases. So
> i don't think we want a user API to do the CPU part, we want the
> network stack to do it. Otherwise the user is going to get is wrong,
> break their network, and then come running to the list for help.
Not sure I really understand what to conclude from this... Their are already
many ways the user can break it (tc has great hooks for that), and I not think
we can really prevent the user in configuring something that break stuff (but
we should not make it too easy either).

Anyway, Horatiu has come a long way in creating a (surprising simple) patch
which allow us to limit the flooding of L2-multicast. It is following the
guidance from Nikolay, it is using the MDB database, and I beleive it is well
aligned with the existing sw-bridge design.

I hope it will be ready tomorrow, then we can have a look at it and see if it is
any good.

> This also fits with how we do things in DSA. There is deliberately no
> user space concept for configuring the DSA CPU port. To user space,
> the switch is just a bunch of Linux interfaces. Everything to do with
> the CPU port is hidden away in the DSA core layer, the DSA drivers,
> and a little bit in the bridge.

Understood, but as far as I understand, in DSA you still have the br0 interface,
which kind-of represent the traffic going to the CPU (like in pure SW bridge,
and SwitchDev offloaded SW-bridge).

/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-26 12:31             ` Nikolay Aleksandrov
  2019-07-29 12:14               ` Allan W. Nielsen
@ 2019-08-01 14:22               ` Allan W. Nielsen
  1 sibling, 0 replies; 43+ messages in thread
From: Allan W. Nielsen @ 2019-08-01 14:22 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel

The 07/26/2019 15:31, Nikolay Aleksandrov wrote:
...

> You know that in order to not run in promisc mode you'll have to disable
> port flooding and port learning, right ? Otherwise they're always put in promisc.
Yes, we have spend some time looking at nbp_update_port_count and trying to
understand the reasoning behind it.

Our understanding is that this is to make it work with a pure SW bridge
implementation, and this is actually an optimization to allow disable promisc
mode if all forwarding is static (no flooding and no learning).

We also noticed that the Ocelot and the Rocker drivers avoids this "issue" by
not implementing promisc mode.

But promisc mode is a really nice feature for debugging, and we would actually
like to have it, and when HW that can do learning/flooding it does not seem to
be necessary.

I tried to understand how this is handled in the Mellanox drivers, but gave up.
Too big, and we lack the insight in their design.

Do you know if there are better ways to prevent switchdev-offloaded-slave
interfaces to go to promisc mode?

/Allan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-07-25 11:44 [PATCH] net: bridge: Allow bridge to joing multicast groups Horatiu Vultur
  2019-07-25 13:06 ` Nikolay Aleksandrov
@ 2019-08-01 19:17 ` Vivien Didelot
  2019-08-01 19:48   ` Horatiu Vultur
  1 sibling, 1 reply; 43+ messages in thread
From: Vivien Didelot @ 2019-08-01 19:17 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: roopa, nikolay, davem, bridge, netdev, linux-kernel,
	allan.nielsen, Horatiu Vultur

Hi Horatiu,

On Thu, 25 Jul 2019 13:44:04 +0200, Horatiu Vultur <horatiu.vultur@microchip.com> wrote:
> There is no way to configure the bridge, to receive only specific link
> layer multicast addresses. From the description of the command 'bridge
> fdb append' is supposed to do that, but there was no way to notify the
> network driver that the bridge joined a group, because LLADDR was added
> to the unicast netdev_hw_addr_list.
> 
> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> and if the source is NULL, which represent the bridge itself. Then add
> address to multicast netdev_hw_addr_list for each bridge interfaces.
> And then the .ndo_set_rx_mode function on the driver is called. To notify
> the driver that the list of multicast mac addresses changed.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b1d3248..d93746d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  	}
>  }
>  
> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> +	int err;
> +	struct net_bridge_port *p;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		if (!br_promisc_port(p)) {
> +			err = dev_mc_add(p->dev, addr);
> +			if (err)
> +				goto undo;
> +		}
> +	}
> +
> +	return;
> +undo:
> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> +		if (!br_promisc_port(p))
> +			dev_mc_del(p->dev, addr);
> +	}
> +}
> +
>  /* When a static FDB entry is deleted, the HW address from that entry is
>   * also removed from the bridge private HW address list and updates all
>   * the ports with needed information.
> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  	}
>  }
>  
> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> +	struct net_bridge_port *p;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		if (!br_promisc_port(p))
> +			dev_mc_del(p->dev, addr);
> +	}
> +}
> +
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  		       bool swdev_notify)
>  {
>  	trace_fdb_delete(br, f);
>  
> -	if (f->is_static)
> +	if (f->is_static) {
>  		fdb_del_hw_addr(br, f->key.addr.addr);
> +		fdb_del_hw_maddr(br, f->key.addr.addr);
> +	}
>  
>  	hlist_del_init_rcu(&f->fdb_node);
>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>  			fdb->is_local = 1;
>  			if (!fdb->is_static) {
>  				fdb->is_static = 1;
> -				fdb_add_hw_addr(br, addr);
> +				if (flags & NLM_F_APPEND && !source)
> +					fdb_add_hw_maddr(br, addr);
> +				else
> +					fdb_add_hw_addr(br, addr);
>  			}
>  		} else if (state & NUD_NOARP) {
>  			fdb->is_local = 0;
>  			if (!fdb->is_static) {
>  				fdb->is_static = 1;
> -				fdb_add_hw_addr(br, addr);
> +				if (flags & NLM_F_APPEND && !source)
> +					fdb_add_hw_maddr(br, addr);
> +				else
> +					fdb_add_hw_addr(br, addr);
>  			}
>  		} else {
>  			fdb->is_local = 0;
> -- 
> 2.7.4
> 

I'm a bit late in the conversation. Isn't this what you want?

    ip address add <multicast IPv4 address> dev br0 autojoin


Thanks,
Vivien

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-08-01 19:17 ` Vivien Didelot
@ 2019-08-01 19:48   ` Horatiu Vultur
  2019-08-01 20:08     ` Vivien Didelot
  0 siblings, 1 reply; 43+ messages in thread
From: Horatiu Vultur @ 2019-08-01 19:48 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: roopa, nikolay, davem, bridge, netdev, linux-kernel, allan.nielsen

Hi Vivien,

The 08/01/2019 15:17, Vivien Didelot wrote:
> External E-Mail
> 
> I'm a bit late in the conversation. Isn't this what you want?
> 
>     ip address add <multicast IPv4 address> dev br0 autojoin
> 

Not really, I was looking in a way to register the ports to link layer
multicast address. Sorry for the confusion, my description of the patch
was totally missleaning.

If you follow this thread you will get a better idea what we wanted to
achive. We got some really good comments and based on these we send a
RFC[1]. 

> 
> Thanks,
> Vivien

[1] https://patchwork.ozlabs.org/patch/1140468/

-- 
/Horatiu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
  2019-08-01 19:48   ` Horatiu Vultur
@ 2019-08-01 20:08     ` Vivien Didelot
  0 siblings, 0 replies; 43+ messages in thread
From: Vivien Didelot @ 2019-08-01 20:08 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: roopa, nikolay, davem, bridge, netdev, linux-kernel, allan.nielsen

Hi Horatiu,

On Thu, 1 Aug 2019 21:48:02 +0200, Horatiu Vultur <horatiu.vultur@microchip.com> wrote:
> > I'm a bit late in the conversation. Isn't this what you want?
> > 
> >     ip address add <multicast IPv4 address> dev br0 autojoin
> > 
> 
> Not really, I was looking in a way to register the ports to link layer
> multicast address. Sorry for the confusion, my description of the patch
> was totally missleaning.
> 
> If you follow this thread you will get a better idea what we wanted to
> achive. We got some really good comments and based on these we send a
> RFC[1]. 

OK great! Keep me in the loop, I enjoy bridge and multicast very much ;-)


Thanks,

	Vivien

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-08-01 20:08 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-25 11:44 [PATCH] net: bridge: Allow bridge to joing multicast groups Horatiu Vultur
2019-07-25 13:06 ` Nikolay Aleksandrov
2019-07-25 13:21   ` Nikolay Aleksandrov
2019-07-25 14:21     ` Horatiu Vultur
2019-07-26  8:41       ` Nikolay Aleksandrov
2019-07-26  9:26         ` Nikolay Aleksandrov
2019-07-26 12:02           ` Horatiu Vultur
2019-07-26 12:31             ` Nikolay Aleksandrov
2019-07-29 12:14               ` Allan W. Nielsen
2019-07-29 12:22                 ` Nikolay Aleksandrov
2019-07-29 12:50                   ` Nikolay Aleksandrov
2019-07-29 13:14                   ` Allan W. Nielsen
2019-07-29 13:42                     ` Nikolay Aleksandrov
2019-07-29 13:52                       ` Allan W. Nielsen
2019-07-29 14:21                         ` Nikolay Aleksandrov
2019-07-29 14:35                           ` Allan W. Nielsen
2019-07-29 17:51                             ` Ido Schimmel
2019-07-30  6:27                               ` Allan W. Nielsen
2019-07-30  7:06                                 ` Ido Schimmel
2019-07-30  8:30                                   ` Allan W. Nielsen
2019-07-30  8:58                                     ` Nikolay Aleksandrov
2019-07-30  9:21                                       ` Allan W. Nielsen
2019-07-30  9:55                                         ` Nikolay Aleksandrov
2019-07-30 11:23                                           ` Allan W. Nielsen
2019-07-30 10:04                                     ` Ido Schimmel
2019-07-30 12:19                                       ` Allan W. Nielsen
2019-07-30 14:34                                 ` Andrew Lunn
2019-07-30 19:00                                   ` Allan W. Nielsen
2019-07-31  3:31                                     ` Andrew Lunn
2019-07-31  8:01                                       ` Allan W. Nielsen
2019-07-31 13:45                                         ` Andrew Lunn
2019-07-31 19:38                                           ` Allan W. Nielsen
2019-08-01 14:22               ` Allan W. Nielsen
2019-07-26 13:46             ` Andrew Lunn
2019-07-26 19:50               ` Allan W. Nielsen
2019-07-27  3:02                 ` Andrew Lunn
2019-07-28 19:15                   ` Allan W. Nielsen
2019-07-28 23:07                     ` Andrew Lunn
2019-07-29  6:09                     ` Ido Schimmel
2019-07-29 12:43                       ` Allan W. Nielsen
2019-08-01 19:17 ` Vivien Didelot
2019-08-01 19:48   ` Horatiu Vultur
2019-08-01 20:08     ` Vivien Didelot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).