[PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
@ 2015-03-20 16:58 roopa
  2015-03-20 17:11 ` John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: roopa @ 2015-03-20 16:58 UTC (permalink / raw)
  To: davem, sfeldma, jiri, ronen.arad; +Cc: netdev

From: Roopa Prabhu <roopa@cumulusnetworks.com>

On a Linux bridge with bridge forwarding offloaded to switch ASIC,
there is a need to not re-forward frames that have already been
forwarded in hardware.

Typically these are broadcast or multicast frames forwarded by the
hardware to multiple destination ports including sending a copy of
the packet to the cpu (kernel e.g. an arp broadcast).
The bridge driver will try to forward the packet again, resulting in
two copies of the same packet.

These packets can also come up to the kernel for logging when they hit
a LOG acl rule in hardware. In such cases, you do want the packet
to go through the bridge netfilter hooks. Hence, this patch adds the
required checks just before the packet is being xmited.

v2:
	- Add a new hw_fwded flag in skbuff to indicate that the packet
	is already hardware forwarded. Switch driver will set this flag.
	I have been trying to avoid having this flag in the skb
	and thats why this patch has been in my tree for long. Cant think
	of other better alternatives. Suggestions are welcome. I have put
	this under CONFIG_NET_SWITCHDEV to minimize the impact.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
---
 include/linux/skbuff.h  |    7 +++++--
 net/bridge/br_forward.c |   11 +++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bba1330..1973b5c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -560,8 +560,11 @@ struct sk_buff {
 				fclone:2,
 				peeked:1,
 				head_frag:1,
-				xmit_more:1;
-	/* one bit hole */
+				xmit_more:1,
+#ifdef CONFIG_NET_SWITCHDEV
+				hw_fwded:1;
+#endif
+	/* one bit hole if CONFIG_NET_SWITCHDEV not defined */
 	kmemcheck_bitfield_end(flags1);
 
 	/* fields enclosed in headers_start/headers_end are copied
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 3304a54..b60b96e 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -37,6 +37,17 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct sk_buff *skb)
 {
+
+#ifdef CONFIG_NET_SWITCHDEV
+	/* If skb is already hw forwarded and the port being forwarded
+	 * to is a switch port, dont reforward
+	 */
+	if (skb->hw_fwded && (skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD)) {
+		kfree_skb(skb);
+		return 0;
+	}
+
+#endif
 	if (!is_skb_forwardable(skb->dev, skb)) {
 		kfree_skb(skb);
 	} else {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 16:58 [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets roopa
@ 2015-03-20 17:11 ` John Fastabend
  2015-03-20 18:13   ` Scott Feldman
  2015-03-20 21:03   ` roopa
  2015-03-20 18:03 ` Scott Feldman
  2015-03-20 20:36 ` David Miller
  2 siblings, 2 replies; 51+ messages in thread
From: John Fastabend @ 2015-03-20 17:11 UTC (permalink / raw)
  To: roopa, davem, sfeldma, jiri, ronen.arad; +Cc: netdev

On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
> there is a need to not re-forward frames that have already been
> forwarded in hardware.
> 
> Typically these are broadcast or multicast frames forwarded by the
> hardware to multiple destination ports including sending a copy of
> the packet to the cpu (kernel e.g. an arp broadcast).
> The bridge driver will try to forward the packet again, resulting in
> two copies of the same packet.
> 
> These packets can also come up to the kernel for logging when they hit
> a LOG acl rule in hardware. In such cases, you do want the packet
> to go through the bridge netfilter hooks. Hence, this patch adds the
> required checks just before the packet is being xmited.
> 
> v2:
> 	- Add a new hw_fwded flag in skbuff to indicate that the packet
> 	is already hardware forwarded. Switch driver will set this flag.
> 	I have been trying to avoid having this flag in the skb
> 	and thats why this patch has been in my tree for long. Cant think
> 	of other better alternatives. Suggestions are welcome. I have put
> 	this under CONFIG_NET_SWITCHDEV to minimize the impact.
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> ---

Interesting. I completely avoid this problem by not instantiating a
software bridge ;) When these pkts come up the stack I either use a
raw socket to capture them, put a 'tc' ingress rule to do something,
or have OVS handle them in some special way. It seems to me that this
is where the sw/hw model starts to break when you have these magic
bits to handle the packets differently.

How do you know to set the skb bit? Do you have some indicator in the
descriptor? I don't have any good way to learn this on my hardware. But
I can assume if it reached the CPU it was because of some explicit rule.
 
>  include/linux/skbuff.h  |    7 +++++--
>  net/bridge/br_forward.c |   11 +++++++++++
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bba1330..1973b5c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -560,8 +560,11 @@ struct sk_buff {
>  				fclone:2,
>  				peeked:1,
>  				head_frag:1,
> -				xmit_more:1;
> -	/* one bit hole */
> +				xmit_more:1,
> +#ifdef CONFIG_NET_SWITCHDEV
> +				hw_fwded:1;
> +#endif
> +	/* one bit hole if CONFIG_NET_SWITCHDEV not defined */
>  	kmemcheck_bitfield_end(flags1);
>  
>  	/* fields enclosed in headers_start/headers_end are copied
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 3304a54..b60b96e 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -37,6 +37,17 @@ static inline int should_deliver(const struct net_bridge_port *p,
>  
>  int br_dev_queue_push_xmit(struct sk_buff *skb)
>  {
> +
> +#ifdef CONFIG_NET_SWITCHDEV
> +	/* If skb is already hw forwarded and the port being forwarded
> +	 * to is a switch port, dont reforward
> +	 */
> +	if (skb->hw_fwded && (skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD)) {
> +		kfree_skb(skb);
> +		return 0;
> +	}
> +
> +#endif
>  	if (!is_skb_forwardable(skb->dev, skb)) {
>  		kfree_skb(skb);
>  	} else {
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 16:58 [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets roopa
  2015-03-20 17:11 ` John Fastabend
@ 2015-03-20 18:03 ` Scott Feldman
  2015-03-20 21:20   ` roopa
  2015-03-20 20:36 ` David Miller
  2 siblings, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-20 18:03 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David S. Miller, Jiří Pírko, Arad, Ronen, Netdev

On Fri, Mar 20, 2015 at 9:58 AM,  <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
> there is a need to not re-forward frames that have already been
> forwarded in hardware.
>
> Typically these are broadcast or multicast frames forwarded by the
> hardware to multiple destination ports including sending a copy of
> the packet to the cpu (kernel e.g. an arp broadcast).
> The bridge driver will try to forward the packet again, resulting in
> two copies of the same packet.
>
> These packets can also come up to the kernel for logging when they hit
> a LOG acl rule in hardware. In such cases, you do want the packet
> to go through the bridge netfilter hooks. Hence, this patch adds the
> required checks just before the packet is being xmited.
>
> v2:
>         - Add a new hw_fwded flag in skbuff to indicate that the packet
>         is already hardware forwarded. Switch driver will set this flag.
>         I have been trying to avoid having this flag in the skb
>         and thats why this patch has been in my tree for long. Cant think
>         of other better alternatives. Suggestions are welcome. I have put
>         this under CONFIG_NET_SWITCHDEV to minimize the impact.
>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> ---
>  include/linux/skbuff.h  |    7 +++++--
>  net/bridge/br_forward.c |   11 +++++++++++
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bba1330..1973b5c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -560,8 +560,11 @@ struct sk_buff {
>                                 fclone:2,
>                                 peeked:1,
>                                 head_frag:1,
> -                               xmit_more:1;
> -       /* one bit hole */
> +                               xmit_more:1,
> +#ifdef CONFIG_NET_SWITCHDEV
> +                               hw_fwded:1;
> +#endif
> +       /* one bit hole if CONFIG_NET_SWITCHDEV not defined */

Did you want this flag not copied in __copy_skb_header()?  Seems those
flags are special cased.  There is room for a bit here:

#ifdef CONFIG_IPV6_NDISC_NODETYPE
        __u8                    ndisc_nodetype:2;
#endif
        __u8                    ipvs_property:1;
        __u8                    inner_protocol_type:1;
        __u8                    remcsum_offload:1;
        /* 3 or 5 bit hole */
         <<<<<<<<

>         kmemcheck_bitfield_end(flags1);
>
>         /* fields enclosed in headers_start/headers_end are copied
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 3304a54..b60b96e 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -37,6 +37,17 @@ static inline int should_deliver(const struct net_bridge_port *p,
>
>  int br_dev_queue_push_xmit(struct sk_buff *skb)
>  {
> +
> +#ifdef CONFIG_NET_SWITCHDEV
> +       /* If skb is already hw forwarded and the port being forwarded
> +        * to is a switch port, dont reforward
> +        */
> +       if (skb->hw_fwded && (skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD)) {

The  check for skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD is redundant.

> +               kfree_skb(skb);
> +               return 0;
> +       }
> +
> +#endif
>         if (!is_skb_forwardable(skb->dev, skb)) {
>                 kfree_skb(skb);
>         } else {
> --
> 1.7.10.4
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 17:11 ` John Fastabend
@ 2015-03-20 18:13   ` Scott Feldman
  2015-03-20 18:30     ` John Fastabend
  2015-03-20 22:06     ` roopa
  2015-03-20 21:03   ` roopa
  1 sibling, 2 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-20 18:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: Roopa Prabhu, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On Fri, Mar 20, 2015 at 10:11 AM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>> there is a need to not re-forward frames that have already been
>> forwarded in hardware.
>>
>> Typically these are broadcast or multicast frames forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the cpu (kernel e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl rule in hardware. In such cases, you do want the packet
>> to go through the bridge netfilter hooks. Hence, this patch adds the
>> required checks just before the packet is being xmited.
>>
>> v2:
>>       - Add a new hw_fwded flag in skbuff to indicate that the packet
>>       is already hardware forwarded. Switch driver will set this flag.
>>       I have been trying to avoid having this flag in the skb
>>       and thats why this patch has been in my tree for long. Cant think
>>       of other better alternatives. Suggestions are welcome. I have put
>>       this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>> ---
>
> Interesting. I completely avoid this problem by not instantiating a
> software bridge ;) When these pkts come up the stack I either use a
> raw socket to capture them, put a 'tc' ingress rule to do something,
> or have OVS handle them in some special way. It seems to me that this
> is where the sw/hw model starts to break when you have these magic
> bits to handle the packets differently.
>
> How do you know to set the skb bit? Do you have some indicator in the
> descriptor? I don't have any good way to learn this on my hardware. But
> I can assume if it reached the CPU it was because of some explicit rule.

I was wondering that also, since there was no example.

This features seems like it belongs in the bridge.  We already have
BR_FLOOD to indicate whether unknown unicast traffic is flooded to a
bridge port.  Can we add another BR_FLOOD_BCAST (or some name) for
this new feature?  You would set/clear this flag on the bridge
(master) port.  The default is set.  And now:

- #define BR_AUTO_MASK          (BR_FLOOD | BR_LEARNING)
+ #define BR_AUTO_MASK          (BR_FLOOD | BR_FLOOD_BCAST | BR_LEARNING)

Does this work for your use-case, Roopa?

-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 18:13   ` Scott Feldman
@ 2015-03-20 18:30     ` John Fastabend
  2015-03-20 22:06     ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: John Fastabend @ 2015-03-20 18:30 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, Roopa Prabhu, David S. Miller,
	Jiří Pírko, Arad, Ronen, Netdev

On 03/20/2015 11:13 AM, Scott Feldman wrote:
> On Fri, Mar 20, 2015 at 10:11 AM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>> there is a need to not re-forward frames that have already been
>>> forwarded in hardware.
>>>
>>> Typically these are broadcast or multicast frames forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>> required checks just before the packet is being xmited.
>>>
>>> v2:
>>>        - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>        is already hardware forwarded. Switch driver will set this flag.
>>>        I have been trying to avoid having this flag in the skb
>>>        and thats why this patch has been in my tree for long. Cant think
>>>        of other better alternatives. Suggestions are welcome. I have put
>>>        this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>
>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>> ---
>>
>> Interesting. I completely avoid this problem by not instantiating a
>> software bridge ;) When these pkts come up the stack I either use a
>> raw socket to capture them, put a 'tc' ingress rule to do something,
>> or have OVS handle them in some special way. It seems to me that this
>> is where the sw/hw model starts to break when you have these magic
>> bits to handle the packets differently.
>>
>> How do you know to set the skb bit? Do you have some indicator in the
>> descriptor? I don't have any good way to learn this on my hardware. But
>> I can assume if it reached the CPU it was because of some explicit rule.
>
> I was wondering that also, since there was no example.
>
> This features seems like it belongs in the bridge.  We already have
> BR_FLOOD to indicate whether unknown unicast traffic is flooded to a
> bridge port.  Can we add another BR_FLOOD_BCAST (or some name) for
> this new feature?  You would set/clear this flag on the bridge
> (master) port.  The default is set.  And now:
>
> - #define BR_AUTO_MASK          (BR_FLOOD | BR_LEARNING)
> + #define BR_AUTO_MASK          (BR_FLOOD | BR_FLOOD_BCAST | BR_LEARNING)
>
> Does this work for your use-case, Roopa?

I'm probably being a bit dense but I can't think of a case where I would
want pkts forwarded back to the hardware. If you could explain a bit
more why this would be useful that would help me at least. Maybe a flag
to disable forwarding on the port would work. Perhaps using the
BR_STATE_* bits would be good enough?

.John

>
> -scott
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 16:58 [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets roopa
  2015-03-20 17:11 ` John Fastabend
  2015-03-20 18:03 ` Scott Feldman
@ 2015-03-20 20:36 ` David Miller
  2015-03-20 21:36   ` roopa
  2 siblings, 1 reply; 51+ messages in thread
From: David Miller @ 2015-03-20 20:36 UTC (permalink / raw)
  To: roopa; +Cc: sfeldma, jiri, ronen.arad, netdev

From: roopa@cumulusnetworks.com
Date: Fri, 20 Mar 2015 09:58:34 -0700

> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
> there is a need to not re-forward frames that have already been
> forwarded in hardware.

It's impossible to validate this without seeing a use case where
the device marks packets appropriately in order to trigger this
new code.

And as John said, some devices may not even be able to do that.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 17:11 ` John Fastabend
  2015-03-20 18:13   ` Scott Feldman
@ 2015-03-20 21:03   ` roopa
  2015-03-20 21:23     ` John Fastabend
  1 sibling, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-20 21:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, sfeldma, jiri, ronen.arad, netdev, Wilson Kok

On 3/20/15, 10:11 AM, John Fastabend wrote:
> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>> there is a need to not re-forward frames that have already been
>> forwarded in hardware.
>>
>> Typically these are broadcast or multicast frames forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the cpu (kernel e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl rule in hardware. In such cases, you do want the packet
>> to go through the bridge netfilter hooks. Hence, this patch adds the
>> required checks just before the packet is being xmited.
>>
>> v2:
>> 	- Add a new hw_fwded flag in skbuff to indicate that the packet
>> 	is already hardware forwarded. Switch driver will set this flag.
>> 	I have been trying to avoid having this flag in the skb
>> 	and thats why this patch has been in my tree for long. Cant think
>> 	of other better alternatives. Suggestions are welcome. I have put
>> 	this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>> ---
> Interesting. I completely avoid this problem by not instantiating a
> software bridge ;)
>   When these pkts come up the stack I either use a
> raw socket to capture them, put a 'tc' ingress rule to do something,
> or have OVS handle them in some special way.
> It seems to me that this
> is where the sw/hw model starts to break when you have these magic
> bits to handle the packets differently.

  In-kernel bridge driver is proven very useful for us to run stp,
or recently igmp reports (dont know the details here) etc in software.
I wonder how you handle these. If you don't use the in-kernel bridge
driver, I suspect you down the lane you will end-up having to duplicate a
lot of things that bridge driver already does in your switch driver.
>
> How do you know to set the skb bit? Do you have some indicator in the
> descriptor? I don't have any good way to learn this on my hardware. But
> I can assume if it reached the CPU it was because of some explicit rule.

Right now we tag all packets except for some igmp frames so that they 
get handled by software (in kernel bridge driver).
(But the igmp frames check is a bit of a hack right now). We don't use 
it today, but, the sdk
can give us some details about the reason the packet was sent to CPU (It 
possibly gets it from the descriptor).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 18:03 ` Scott Feldman
@ 2015-03-20 21:20   ` roopa
  0 siblings, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-20 21:20 UTC (permalink / raw)
  To: Scott Feldman
  Cc: David S. Miller, Jiří Pírko, Arad, Ronen, Netdev

On 3/20/15, 11:03 AM, Scott Feldman wrote:
> On Fri, Mar 20, 2015 at 9:58 AM,  <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>> there is a need to not re-forward frames that have already been
>> forwarded in hardware.
>>
>> Typically these are broadcast or multicast frames forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the cpu (kernel e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl rule in hardware. In such cases, you do want the packet
>> to go through the bridge netfilter hooks. Hence, this patch adds the
>> required checks just before the packet is being xmited.
>>
>> v2:
>>          - Add a new hw_fwded flag in skbuff to indicate that the packet
>>          is already hardware forwarded. Switch driver will set this flag.
>>          I have been trying to avoid having this flag in the skb
>>          and thats why this patch has been in my tree for long. Cant think
>>          of other better alternatives. Suggestions are welcome. I have put
>>          this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>> ---
>>   include/linux/skbuff.h  |    7 +++++--
>>   net/bridge/br_forward.c |   11 +++++++++++
>>   2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index bba1330..1973b5c 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -560,8 +560,11 @@ struct sk_buff {
>>                                  fclone:2,
>>                                  peeked:1,
>>                                  head_frag:1,
>> -                               xmit_more:1;
>> -       /* one bit hole */
>> +                               xmit_more:1,
>> +#ifdef CONFIG_NET_SWITCHDEV
>> +                               hw_fwded:1;
>> +#endif
>> +       /* one bit hole if CONFIG_NET_SWITCHDEV not defined */
> Did you want this flag not copied in __copy_skb_header()?  Seems those
> flags are special cased.  There is room for a bit here:
>
> #ifdef CONFIG_IPV6_NDISC_NODETYPE
>          __u8                    ndisc_nodetype:2;
> #endif
>          __u8                    ipvs_property:1;
>          __u8                    inner_protocol_type:1;
>          __u8                    remcsum_offload:1;
>          /* 3 or 5 bit hole */
>           <<<<<<<<

thx, yes I can add it here.
(I found the first 1 bit hole and added it there).
>
>>          kmemcheck_bitfield_end(flags1);
>>
>>          /* fields enclosed in headers_start/headers_end are copied
>> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
>> index 3304a54..b60b96e 100644
>> --- a/net/bridge/br_forward.c
>> +++ b/net/bridge/br_forward.c
>> @@ -37,6 +37,17 @@ static inline int should_deliver(const struct net_bridge_port *p,
>>
>>   int br_dev_queue_push_xmit(struct sk_buff *skb)
>>   {
>> +
>> +#ifdef CONFIG_NET_SWITCHDEV
>> +       /* If skb is already hw forwarded and the port being forwarded
>> +        * to is a switch port, dont reforward
>> +        */
>> +       if (skb->hw_fwded && (skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD)) {
> The  check for skb->dev->features & NETIF_F_HW_SWITCH_OFFLOAD is redundant.

The skb->dev is the device it is getting forwarded to. The hw_fwded flag 
was set by the
device that originated the skb. The NETIF_F_HW_SWITCH_OFFLOAD flag is 
required because
the device being forwarded to can be a non switch port.

thanks,
Roopa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 21:03   ` roopa
@ 2015-03-20 21:23     ` John Fastabend
  2015-03-20 22:04       ` Andrew Lunn
  2015-03-20 23:12       ` roopa
  0 siblings, 2 replies; 51+ messages in thread
From: John Fastabend @ 2015-03-20 21:23 UTC (permalink / raw)
  To: roopa
  Cc: John Fastabend, davem, sfeldma, jiri, ronen.arad, netdev,
	Wilson Kok, Stephen Hemminger

On 03/20/2015 02:03 PM, roopa wrote:
> On 3/20/15, 10:11 AM, John Fastabend wrote:
>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>> there is a need to not re-forward frames that have already been
>>> forwarded in hardware.
>>>
>>> Typically these are broadcast or multicast frames forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>> required checks just before the packet is being xmited.
>>>
>>> v2:
>>>     - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>     is already hardware forwarded. Switch driver will set this flag.
>>>     I have been trying to avoid having this flag in the skb
>>>     and thats why this patch has been in my tree for long. Cant think
>>>     of other better alternatives. Suggestions are welcome. I have put
>>>     this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>
>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>> ---
>> Interesting. I completely avoid this problem by not instantiating a
>> software bridge ;)
>>   When these pkts come up the stack I either use a
>> raw socket to capture them, put a 'tc' ingress rule to do something,
>> or have OVS handle them in some special way.
>> It seems to me that this
>> is where the sw/hw model starts to break when you have these magic
>> bits to handle the packets differently.
>
>   In-kernel bridge driver is proven very useful for us to run stp,
> or recently igmp reports (dont know the details here) etc in software.
> I wonder how you handle these. If you don't use the in-kernel bridge
> driver, I suspect you down the lane you will end-up having to duplicate a
> lot of things that bridge driver already does in your switch driver.

I think that code is in need of some serious love before it is usable. I
actually don't know who is using STP anymore if anyone. I suspect
everyone is using their own agents. I know Stephen had RSTP code for
awhile. Anyways it all runs in userspace and doesn't depend on the sw
bridge. I think it is a better model to run the control protocols in
userspace like this. I'm not an expert though, maybe Stephen will chime
in.

>>
>> How do you know to set the skb bit? Do you have some indicator in the
>> descriptor? I don't have any good way to learn this on my hardware. But
>> I can assume if it reached the CPU it was because of some explicit rule.
>
> Right now we tag all packets except for some igmp frames so that they
> get handled by software (in kernel bridge driver).
> (But the igmp frames check is a bit of a hack right now). We don't use
> it today, but, the sdk
> can give us some details about the reason the packet was sent to CPU (It
> possibly gets it from the descriptor).
>

hmm I agree with Scott then it seems like if your just tagging every
packet (or nearly every packet) you can turn forwarding off at the
port layer. then we save the bit in the skb for something else. And I
guess if you turn forwarding off at the port layer and have the control
traffic handled by a userspace agent there is no need for the software
bridge which is my case. Just my opinion though.

>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 20:36 ` David Miller
@ 2015-03-20 21:36   ` roopa
  2015-03-20 22:09     ` Andrew Lunn
  0 siblings, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-20 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: sfeldma, jiri, ronen.arad, netdev

On 3/20/15, 1:36 PM, David Miller wrote:
> From: roopa@cumulusnetworks.com
> Date: Fri, 20 Mar 2015 09:58:34 -0700
>
>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>> there is a need to not re-forward frames that have already been
>> forwarded in hardware.
> It's impossible to validate this without seeing a use case where
> the device marks packets appropriately in order to trigger this
> new code.
>
> And as John said, some devices may not even be able to do that.
agreed, and that's why I am still sending this patch as RFC and not
for inclusion right away.

since we have discussed this problem multiple times in switchdev meetings,
the intent of this RFC is to see get the code out and also see if rocker 
or any other in-kernel
driver can use it.

thanks,
Roopa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 21:23     ` John Fastabend
@ 2015-03-20 22:04       ` Andrew Lunn
  2015-03-20 23:12       ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2015-03-20 22:04 UTC (permalink / raw)
  To: John Fastabend
  Cc: roopa, John Fastabend, davem, sfeldma, jiri, ronen.arad, netdev,
	Wilson Kok, Stephen Hemminger

> >  In-kernel bridge driver is proven very useful for us to run stp,
> >or recently igmp reports (dont know the details here) etc in software.
> >I wonder how you handle these. If you don't use the in-kernel bridge
> >driver, I suspect you down the lane you will end-up having to duplicate a
> >lot of things that bridge driver already does in your switch driver.
> 
> I think that code is in need of some serious love before it is usable. I
> actually don't know who is using STP anymore if anyone.

We are using STP for the DSA switches. It is a good minimum to start
with, for these little switches with only a small number of ports, and
use in home WiFi routers etc.

    Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 18:13   ` Scott Feldman
  2015-03-20 18:30     ` John Fastabend
@ 2015-03-20 22:06     ` roopa
  2015-03-20 22:37       ` Scott Feldman
  1 sibling, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-20 22:06 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On 3/20/15, 11:13 AM, Scott Feldman wrote:
> On Fri, Mar 20, 2015 at 10:11 AM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>> there is a need to not re-forward frames that have already been
>>> forwarded in hardware.
>>>
>>> Typically these are broadcast or multicast frames forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>> required checks just before the packet is being xmited.
>>>
>>> v2:
>>>        - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>        is already hardware forwarded. Switch driver will set this flag.
>>>        I have been trying to avoid having this flag in the skb
>>>        and thats why this patch has been in my tree for long. Cant think
>>>        of other better alternatives. Suggestions are welcome. I have put
>>>        this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>
>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>> ---
>> Interesting. I completely avoid this problem by not instantiating a
>> software bridge ;) When these pkts come up the stack I either use a
>> raw socket to capture them, put a 'tc' ingress rule to do something,
>> or have OVS handle them in some special way. It seems to me that this
>> is where the sw/hw model starts to break when you have these magic
>> bits to handle the packets differently.
>>
>> How do you know to set the skb bit? Do you have some indicator in the
>> descriptor? I don't have any good way to learn this on my hardware. But
>> I can assume if it reached the CPU it was because of some explicit rule.
> I was wondering that also, since there was no example.
>
> This features seems like it belongs in the bridge.
yes, it does, the check today is really in the bridge.
> We already have
> BR_FLOOD to indicate whether unknown unicast traffic is flooded to a
> bridge port.  Can we add another BR_FLOOD_BCAST (or some name) for
> this new feature?  You would set/clear this flag on the bridge
> (master) port.  The default is set.  And now:
>
> - #define BR_AUTO_MASK          (BR_FLOOD | BR_LEARNING)
> + #define BR_AUTO_MASK          (BR_FLOOD | BR_FLOOD_BCAST | BR_LEARNING)
>
> Does this work for your use-case, Roopa?
Note my first RFC patch, sort of did this:
https://marc.info/?l=linux-netdev&m=142147999420017&w=2

But there are open things there as listed in the comment and also in the 
subsequent
discussion on the thread.

We discussed this flag before and i think it does not allow the case 
where hw switch ports are  bridged with non-hw ports.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 21:36   ` roopa
@ 2015-03-20 22:09     ` Andrew Lunn
  2015-03-20 23:43       ` Florian Fainelli
  2015-03-23  0:22       ` Guenter Roeck
  0 siblings, 2 replies; 51+ messages in thread
From: Andrew Lunn @ 2015-03-20 22:09 UTC (permalink / raw)
  To: roopa; +Cc: David Miller, sfeldma, jiri, ronen.arad, netdev

> since we have discussed this problem multiple times in switchdev meetings,
> the intent of this RFC is to see get the code out and also see if
> rocker or any other in-kernel
> driver can use it.

The Marvell switches in DSA don't have any way to mark packets why
they where forwarded towards the host. So i don't see how we could use
this feature with these chips.

     Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 22:06     ` roopa
@ 2015-03-20 22:37       ` Scott Feldman
  2015-03-20 23:30         ` roopa
  0 siblings, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-20 22:37 UTC (permalink / raw)
  To: roopa
  Cc: John Fastabend, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On Fri, Mar 20, 2015 at 3:06 PM, roopa <roopa@cumulusnetworks.com> wrote:
> On 3/20/15, 11:13 AM, Scott Feldman wrote:
>>
>> On Fri, Mar 20, 2015 at 10:11 AM, John Fastabend
>> <john.r.fastabend@intel.com> wrote:
>>>
>>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>>>
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>>> there is a need to not re-forward frames that have already been
>>>> forwarded in hardware.
>>>>
>>>> Typically these are broadcast or multicast frames forwarded by the
>>>> hardware to multiple destination ports including sending a copy of
>>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>>> The bridge driver will try to forward the packet again, resulting in
>>>> two copies of the same packet.
>>>>
>>>> These packets can also come up to the kernel for logging when they hit
>>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>>> required checks just before the packet is being xmited.
>>>>
>>>> v2:
>>>>        - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>>        is already hardware forwarded. Switch driver will set this flag.
>>>>        I have been trying to avoid having this flag in the skb
>>>>        and thats why this patch has been in my tree for long. Cant think
>>>>        of other better alternatives. Suggestions are welcome. I have put
>>>>        this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>>
>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>>> ---
>>>
>>> Interesting. I completely avoid this problem by not instantiating a
>>> software bridge ;) When these pkts come up the stack I either use a
>>> raw socket to capture them, put a 'tc' ingress rule to do something,
>>> or have OVS handle them in some special way. It seems to me that this
>>> is where the sw/hw model starts to break when you have these magic
>>> bits to handle the packets differently.
>>>
>>> How do you know to set the skb bit? Do you have some indicator in the
>>> descriptor? I don't have any good way to learn this on my hardware. But
>>> I can assume if it reached the CPU it was because of some explicit rule.
>>
>> I was wondering that also, since there was no example.
>>
>> This features seems like it belongs in the bridge.
>
> yes, it does, the check today is really in the bridge.
>>
>> We already have
>> BR_FLOOD to indicate whether unknown unicast traffic is flooded to a
>> bridge port.  Can we add another BR_FLOOD_BCAST (or some name) for
>> this new feature?  You would set/clear this flag on the bridge
>> (master) port.  The default is set.  And now:
>>
>> - #define BR_AUTO_MASK          (BR_FLOOD | BR_LEARNING)
>> + #define BR_AUTO_MASK          (BR_FLOOD | BR_FLOOD_BCAST | BR_LEARNING)
>>
>> Does this work for your use-case, Roopa?
>
> Note my first RFC patch, sort of did this:
> https://marc.info/?l=linux-netdev&m=142147999420017&w=2
>
> But there are open things there as listed in the comment and also in the
> subsequent
> discussion on the thread.
>
> We discussed this flag before and i think it does not allow the case where
> hw switch ports are  bridged with non-hw ports.

I went back and read the thread just to remind me what the pros/cons
where.  I think the mixed case isn't a concern since this
BR_FLOOD_BCAST check is made on egress to the bridge port.  So only
clear BR_FLOOD_BCAST on hw switch ports (if hw did the flood already
amongst its ports), and leave it set for non-hw-ports.   It seems the
patch should mostly be a clone of how BR_FLOOD is handled.  Is there
more to it?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 21:23     ` John Fastabend
  2015-03-20 22:04       ` Andrew Lunn
@ 2015-03-20 23:12       ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-20 23:12 UTC (permalink / raw)
  To: John Fastabend
  Cc: John Fastabend, davem, sfeldma, jiri, ronen.arad, netdev,
	Wilson Kok, Stephen Hemminger

On 3/20/15, 2:23 PM, John Fastabend wrote:
> On 03/20/2015 02:03 PM, roopa wrote:
>> On 3/20/15, 10:11 AM, John Fastabend wrote:
>>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>>> there is a need to not re-forward frames that have already been
>>>> forwarded in hardware.
>>>>
>>>> Typically these are broadcast or multicast frames forwarded by the
>>>> hardware to multiple destination ports including sending a copy of
>>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>>> The bridge driver will try to forward the packet again, resulting in
>>>> two copies of the same packet.
>>>>
>>>> These packets can also come up to the kernel for logging when they hit
>>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>>> required checks just before the packet is being xmited.
>>>>
>>>> v2:
>>>>     - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>>     is already hardware forwarded. Switch driver will set this flag.
>>>>     I have been trying to avoid having this flag in the skb
>>>>     and thats why this patch has been in my tree for long. Cant think
>>>>     of other better alternatives. Suggestions are welcome. I have put
>>>>     this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>>
>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>>> ---
>>> Interesting. I completely avoid this problem by not instantiating a
>>> software bridge ;)
>>>   When these pkts come up the stack I either use a
>>> raw socket to capture them, put a 'tc' ingress rule to do something,
>>> or have OVS handle them in some special way.
>>> It seems to me that this
>>> is where the sw/hw model starts to break when you have these magic
>>> bits to handle the packets differently.
>>
>>   In-kernel bridge driver is proven very useful for us to run stp,
>> or recently igmp reports (dont know the details here) etc in software.
>> I wonder how you handle these. If you don't use the in-kernel bridge
>> driver, I suspect you down the lane you will end-up having to 
>> duplicate a
>> lot of things that bridge driver already does in your switch driver.
>
> I think that code is in need of some serious love before it is usable. I
> actually don't know who is using STP anymore if anyone. I suspect
> everyone is using their own agents. I know Stephen had RSTP code for
> awhile. 
we run stp in userspace but also allow stp to run in kernel.
But the stp in userspace always work with the bridge in the kernel AFAIK.
We also use igmp in the bridge driver. I am guessing Stephens userspace 
RSTP also needs
a linux bridge to be created to work with.
> Anyways it all runs in userspace and doesn't depend on the sw
> bridge. I think it is a better model to run the control protocols in
> userspace like this. I'm not an expert though, maybe Stephen will chime
> in.

I agree with pushing control protocols to userspace. But they do work
with or use netdevices created in the kernel (eg team daemon in userspace
needs team net-devices).

and stephen can confirm on RSTP.
I know one of the userspace open source mstp daemons we use works with the
linux bridge device.

>
>>>
>>> How do you know to set the skb bit? Do you have some indicator in the
>>> descriptor? I don't have any good way to learn this on my hardware. But
>>> I can assume if it reached the CPU it was because of some explicit 
>>> rule.
>>
>> Right now we tag all packets except for some igmp frames so that they
>> get handled by software (in kernel bridge driver).
>> (But the igmp frames check is a bit of a hack right now). We don't use
>> it today, but, the sdk
>> can give us some details about the reason the packet was sent to CPU (It
>> possibly gets it from the descriptor).
>>
>
> hmm I agree with Scott then it seems like if your just tagging every
> packet (or nearly every packet) you can turn forwarding off at the
> port layer. then we save the bit in the skb for something else. 
I am all for saving the bit in the skb if I can. I will look at scotts 
flag a bit more.
My earlier patch on this subject has also been a user settable flag on 
the bridge
port.

> And I
> guess if you turn forwarding off at the port layer and have the control
> traffic handled by a userspace agent there is no need for the software
> bridge which is my case. Just my opinion though.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 22:37       ` Scott Feldman
@ 2015-03-20 23:30         ` roopa
  2015-03-21  0:26           ` Scott Feldman
  0 siblings, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-20 23:30 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On 3/20/15, 3:37 PM, Scott Feldman wrote:
> On Fri, Mar 20, 2015 at 3:06 PM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/20/15, 11:13 AM, Scott Feldman wrote:
>>> On Fri, Mar 20, 2015 at 10:11 AM, John Fastabend
>>> <john.r.fastabend@intel.com> wrote:
>>>> On 03/20/2015 09:58 AM, roopa@cumulusnetworks.com wrote:
>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>
>>>>> On a Linux bridge with bridge forwarding offloaded to switch ASIC,
>>>>> there is a need to not re-forward frames that have already been
>>>>> forwarded in hardware.
>>>>>
>>>>> Typically these are broadcast or multicast frames forwarded by the
>>>>> hardware to multiple destination ports including sending a copy of
>>>>> the packet to the cpu (kernel e.g. an arp broadcast).
>>>>> The bridge driver will try to forward the packet again, resulting in
>>>>> two copies of the same packet.
>>>>>
>>>>> These packets can also come up to the kernel for logging when they hit
>>>>> a LOG acl rule in hardware. In such cases, you do want the packet
>>>>> to go through the bridge netfilter hooks. Hence, this patch adds the
>>>>> required checks just before the packet is being xmited.
>>>>>
>>>>> v2:
>>>>>         - Add a new hw_fwded flag in skbuff to indicate that the packet
>>>>>         is already hardware forwarded. Switch driver will set this flag.
>>>>>         I have been trying to avoid having this flag in the skb
>>>>>         and thats why this patch has been in my tree for long. Cant think
>>>>>         of other better alternatives. Suggestions are welcome. I have put
>>>>>         this under CONFIG_NET_SWITCHDEV to minimize the impact.
>>>>>
>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>>>> ---
>>>> Interesting. I completely avoid this problem by not instantiating a
>>>> software bridge ;) When these pkts come up the stack I either use a
>>>> raw socket to capture them, put a 'tc' ingress rule to do something,
>>>> or have OVS handle them in some special way. It seems to me that this
>>>> is where the sw/hw model starts to break when you have these magic
>>>> bits to handle the packets differently.
>>>>
>>>> How do you know to set the skb bit? Do you have some indicator in the
>>>> descriptor? I don't have any good way to learn this on my hardware. But
>>>> I can assume if it reached the CPU it was because of some explicit rule.
>>> I was wondering that also, since there was no example.
>>>
>>> This features seems like it belongs in the bridge.
>> yes, it does, the check today is really in the bridge.
>>> We already have
>>> BR_FLOOD to indicate whether unknown unicast traffic is flooded to a
>>> bridge port.  Can we add another BR_FLOOD_BCAST (or some name) for
>>> this new feature?  You would set/clear this flag on the bridge
>>> (master) port.  The default is set.  And now:
>>>
>>> - #define BR_AUTO_MASK          (BR_FLOOD | BR_LEARNING)
>>> + #define BR_AUTO_MASK          (BR_FLOOD | BR_FLOOD_BCAST | BR_LEARNING)
>>>
>>> Does this work for your use-case, Roopa?
>> Note my first RFC patch, sort of did this:
>> https://marc.info/?l=linux-netdev&m=142147999420017&w=2
>>
>> But there are open things there as listed in the comment and also in the
>> subsequent
>> discussion on the thread.
>>
>> We discussed this flag before and i think it does not allow the case where
>> hw switch ports are  bridged with non-hw ports.
> I went back and read the thread just to remind me what the pros/cons
> where.  I think the mixed case isn't a concern since this
> BR_FLOOD_BCAST check is made on egress to the bridge port.  So only
> clear BR_FLOOD_BCAST on hw switch ports (if hw did the flood already
> amongst its ports), and leave it set for non-hw-ports.   It seems the
> patch should mostly be a clone of how BR_FLOOD is handled.  Is there
> more to it?
That may work.  But, we have recently moved igmp handling to software in 
kernel
and i was trying to make this work for that case. I am going to try what 
you suggest
  by finding a work around for the igmp case.

I will get back to you.

Thanks!
-Roopa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 22:09     ` Andrew Lunn
@ 2015-03-20 23:43       ` Florian Fainelli
  2015-03-23  0:22       ` Guenter Roeck
  1 sibling, 0 replies; 51+ messages in thread
From: Florian Fainelli @ 2015-03-20 23:43 UTC (permalink / raw)
  To: Andrew Lunn, roopa; +Cc: David Miller, sfeldma, jiri, ronen.arad, netdev

On 20/03/15 15:09, Andrew Lunn wrote:
>> since we have discussed this problem multiple times in switchdev meetings,
>> the intent of this RFC is to see get the code out and also see if
>> rocker or any other in-kernel
>> driver can use it.
> 
> The Marvell switches in DSA don't have any way to mark packets why
> they where forwarded towards the host. So i don't see how we could use
> this feature with these chips.

FWIW, the Broadcom tag format has a reason code which tells you why the
packet was forwarded, which you will typically enable plus the
corresponding protocol snooping (IGMP, MLD, ARP, DHCP etc...) to know
what to do. For these kinds of switches it would not be too hard to
discard such packets because you know why you got them in the first place.

Is it true just for the "old" DSA format and not for e.g: EDSA?
-- 
Florian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 23:30         ` roopa
@ 2015-03-21  0:26           ` Scott Feldman
  2015-03-21  5:53             ` roopa
  0 siblings, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-21  0:26 UTC (permalink / raw)
  To: roopa
  Cc: John Fastabend, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On Fri, Mar 20, 2015 at 4:30 PM, roopa <roopa@cumulusnetworks.com> wrote:
> On 3/20/15, 3:37 PM, Scott Feldman wrote:
[cut]
>>
>> I went back and read the thread just to remind me what the pros/cons
>> where.  I think the mixed case isn't a concern since this
>> BR_FLOOD_BCAST check is made on egress to the bridge port.  So only
>> clear BR_FLOOD_BCAST on hw switch ports (if hw did the flood already
>> amongst its ports), and leave it set for non-hw-ports.   It seems the
>> patch should mostly be a clone of how BR_FLOOD is handled.  Is there
>> more to it?
>
> That may work.  But, we have recently moved igmp handling to software in
> kernel and i was trying to make this work for that case. I am going to try what you
> suggest  by finding a work around for the igmp case.

Wait, you lost me on the IGMP comment...that wasn't in your commit
msg.  Do you mean IGMP snooping?  What are you trying to get to work?

It's hard to understand the pieces you're considering without example
implementations.  Can you use rocker or DSA to show example?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-21  0:26           ` Scott Feldman
@ 2015-03-21  5:53             ` roopa
  0 siblings, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-21  5:53 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, David S. Miller, Jiří Pírko, Arad,
	Ronen, Netdev

On 3/20/15, 5:26 PM, Scott Feldman wrote:
> On Fri, Mar 20, 2015 at 4:30 PM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/20/15, 3:37 PM, Scott Feldman wrote:
> [cut]
>>> I went back and read the thread just to remind me what the pros/cons
>>> where.  I think the mixed case isn't a concern since this
>>> BR_FLOOD_BCAST check is made on egress to the bridge port.  So only
>>> clear BR_FLOOD_BCAST on hw switch ports (if hw did the flood already
>>> amongst its ports), and leave it set for non-hw-ports.   It seems the
>>> patch should mostly be a clone of how BR_FLOOD is handled.  Is there
>>> more to it?
>> That may work.  But, we have recently moved igmp handling to software in
>> kernel and i was trying to make this work for that case. I am going to try what you
>> suggest  by finding a work around for the igmp case.
> Wait, you lost me on the IGMP comment...that wasn't in your commit
> msg.  Do you mean IGMP snooping?  What are you trying to get to work?
scott, I brought up igmp as an example because we do some selective 
forwarding there.
The point i was trying to make is disabling forwarding on ports is not 
good enough. In some cases, we
do end up making decisions in the switch driver if the packet should be 
forwarded.
  And, that's the reason why having a way to indicate this per packet is 
more flexible.
>
> It's hard to understand the pieces you're considering without example
> implementations.
agreed.

> Can you use rocker or DSA to show example?
yep, let me see what i can do.
I don't plan to add it as a bridge port flag right away,  because 
thinking through this again,
we might hit something else down the lane and we might end up piling up 
flags.
lets see as more use-cases pop up.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-20 22:09     ` Andrew Lunn
  2015-03-20 23:43       ` Florian Fainelli
@ 2015-03-23  0:22       ` Guenter Roeck
  2015-03-23  1:33         ` John Fastabend
  2015-03-23 14:00         ` roopa
  1 sibling, 2 replies; 51+ messages in thread
From: Guenter Roeck @ 2015-03-23  0:22 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: roopa, David Miller, sfeldma, jiri, ronen.arad, netdev

On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
> > since we have discussed this problem multiple times in switchdev meetings,
> > the intent of this RFC is to see get the code out and also see if
> > rocker or any other in-kernel
> > driver can use it.
> 
> The Marvell switches in DSA don't have any way to mark packets why
> they where forwarded towards the host. So i don't see how we could use
> this feature with these chips.
> 

If we (re-)enable unknown address flooding in the Marvell switch chips,
we could simply mark all packets received from the switch as "forwarded
by hardware". Sure, there is no bit in the header, but we would know
from the chip configuration that the packets were forwarded.

There may be a different problem, though: The driver won't know if
the packet still needs to be forwarded by the soft bridge, for
example to a port of a switch on another network interface
which is part of the same bridge group.

		+---+
		|br0|
		+---+
		 | |
	+--------+ +----+
	|		|
      +---+	      +---+
      |sw0|	      |sw1|
      +---+	      +---+
       | +---+	        |
     +--+  +--+       +--+
     |p0|  |p1|	      |p2|
     +--+  +--+	      +--+

In this scenarion, sw0 can only know that it forwarded a packet to ports
on the same switch. It does not know know that the packet needs to be
forwarded to p2 as well. It would forward the packet from p0 to p1, and
thus presumably set the hw_fwded bit, but br0 still needs to forward it.

Maybe the check should be "if the packet was HW forwarded, the destination
is a switch, and the destination is the same switch, don't forward the packet".
This would be expensive, but on the other side it should not affect too
many packets.

Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  0:22       ` Guenter Roeck
@ 2015-03-23  1:33         ` John Fastabend
  2015-03-23  2:57           ` Guenter Roeck
  2015-03-23 17:10           ` roopa
  2015-03-23 14:00         ` roopa
  1 sibling, 2 replies; 51+ messages in thread
From: John Fastabend @ 2015-03-23  1:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Andrew Lunn, roopa, David Miller, sfeldma, jiri, ronen.arad, netdev

On 03/22/2015 05:22 PM, Guenter Roeck wrote:
> On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
>>> since we have discussed this problem multiple times in switchdev meetings,
>>> the intent of this RFC is to see get the code out and also see if
>>> rocker or any other in-kernel
>>> driver can use it.
>>
>> The Marvell switches in DSA don't have any way to mark packets why
>> they where forwarded towards the host. So i don't see how we could use
>> this feature with these chips.
>>
>
> If we (re-)enable unknown address flooding in the Marvell switch chips,
> we could simply mark all packets received from the switch as "forwarded
> by hardware". Sure, there is no bit in the header, but we would know
> from the chip configuration that the packets were forwarded.
>
> There may be a different problem, though: The driver won't know if
> the packet still needs to be forwarded by the soft bridge, for
> example to a port of a switch on another network interface
> which is part of the same bridge group.
>
> 		+---+
> 		|br0|
> 		+---+
> 		 | |
> 	+--------+ +----+
> 	|		|
>        +---+	      +---+
>        |sw0|	      |sw1|
>        +---+	      +---+
>         | +---+	        |
>       +--+  +--+       +--+
>       |p0|  |p1|	      |p2|
>       +--+  +--+	      +--+
>
> In this scenarion, sw0 can only know that it forwarded a packet to ports
> on the same switch. It does not know know that the packet needs to be
> forwarded to p2 as well. It would forward the packet from p0 to p1, and
> thus presumably set the hw_fwded bit, but br0 still needs to forward it.
>
> Maybe the check should be "if the packet was HW forwarded, the destination
> is a switch, and the destination is the same switch, don't forward the packet".
> This would be expensive, but on the other side it should not affect too
> many packets.

I think you might get away with only forwarding if the switch_id is
different then the ingress switch_id if they are the same then drop it
and assume the hardware already did the forward operation. We could
also add a port setting to turn it on/off if that is important.

I'm not sure why you would want to forward a packet back on a switch
port of the same switch it was received on. If you want to do this I
think its a special case and can be handled outside the bridge software
via 'tc', 'ovs', 'netfilters', etc. Maybe I missed a case though so
would be glad to hear it if there is one.

>
> Guenter
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  1:33         ` John Fastabend
@ 2015-03-23  2:57           ` Guenter Roeck
  2015-03-23  3:18             ` John Fastabend
  2015-03-23 17:10           ` roopa
  1 sibling, 1 reply; 51+ messages in thread
From: Guenter Roeck @ 2015-03-23  2:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: Andrew Lunn, roopa, David Miller, sfeldma, jiri, ronen.arad, netdev

On 03/22/2015 06:33 PM, John Fastabend wrote:
> On 03/22/2015 05:22 PM, Guenter Roeck wrote:
>> On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
>>>> since we have discussed this problem multiple times in switchdev meetings,
>>>> the intent of this RFC is to see get the code out and also see if
>>>> rocker or any other in-kernel
>>>> driver can use it.
>>>
>>> The Marvell switches in DSA don't have any way to mark packets why
>>> they where forwarded towards the host. So i don't see how we could use
>>> this feature with these chips.
>>>
>>
>> If we (re-)enable unknown address flooding in the Marvell switch chips,
>> we could simply mark all packets received from the switch as "forwarded
>> by hardware". Sure, there is no bit in the header, but we would know
>> from the chip configuration that the packets were forwarded.
>>
>> There may be a different problem, though: The driver won't know if
>> the packet still needs to be forwarded by the soft bridge, for
>> example to a port of a switch on another network interface
>> which is part of the same bridge group.
>>
>>         +---+
>>         |br0|
>>         +---+
>>          | |
>>     +--------+ +----+
>>     |        |
>>        +---+          +---+
>>        |sw0|          |sw1|
>>        +---+          +---+
>>         | +---+            |
>>       +--+  +--+       +--+
>>       |p0|  |p1|          |p2|
>>       +--+  +--+          +--+
>>
>> In this scenarion, sw0 can only know that it forwarded a packet to ports
>> on the same switch. It does not know know that the packet needs to be
>> forwarded to p2 as well. It would forward the packet from p0 to p1, and
>> thus presumably set the hw_fwded bit, but br0 still needs to forward it.
>>
>> Maybe the check should be "if the packet was HW forwarded, the destination
>> is a switch, and the destination is the same switch, don't forward the packet".
>> This would be expensive, but on the other side it should not affect too
>> many packets.
>
> I think you might get away with only forwarding if the switch_id is
> different then the ingress switch_id if they are the same then drop it

That is what I tried to say above with  "if ... the destination is the same
switch, don't forward the packet".

> and assume the hardware already did the forward operation. We could
> also add a port setting to turn it on/off if that is important.
>
> I'm not sure why you would want to forward a packet back on a switch
> port of the same switch it was received on. If you want to do this I
> think its a special case and can be handled outside the bridge software
> via 'tc', 'ovs', 'netfilters', etc. Maybe I missed a case though so
> would be glad to hear it if there is one.
>
I don't. The point I was trying to make is that the patch as written
doesn't support the above case, where multiple switches are associated
with the same bridge group through different network interfaces. Granted,
that may not be a likely case, but it should still be supported.

Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  2:57           ` Guenter Roeck
@ 2015-03-23  3:18             ` John Fastabend
  2015-03-23  3:33               ` Guenter Roeck
  0 siblings, 1 reply; 51+ messages in thread
From: John Fastabend @ 2015-03-23  3:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Andrew Lunn, roopa, David Miller, sfeldma, jiri, ronen.arad, netdev

On 03/22/2015 07:57 PM, Guenter Roeck wrote:
> On 03/22/2015 06:33 PM, John Fastabend wrote:
>> On 03/22/2015 05:22 PM, Guenter Roeck wrote:
>>> On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
>>>>> since we have discussed this problem multiple times in switchdev
>>>>> meetings,
>>>>> the intent of this RFC is to see get the code out and also see if
>>>>> rocker or any other in-kernel
>>>>> driver can use it.
>>>>
>>>> The Marvell switches in DSA don't have any way to mark packets why
>>>> they where forwarded towards the host. So i don't see how we could use
>>>> this feature with these chips.
>>>>
>>>
>>> If we (re-)enable unknown address flooding in the Marvell switch chips,
>>> we could simply mark all packets received from the switch as "forwarded
>>> by hardware". Sure, there is no bit in the header, but we would know
>>> from the chip configuration that the packets were forwarded.
>>>
>>> There may be a different problem, though: The driver won't know if
>>> the packet still needs to be forwarded by the soft bridge, for
>>> example to a port of a switch on another network interface
>>> which is part of the same bridge group.
>>>
>>>         +---+
>>>         |br0|
>>>         +---+
>>>          | |
>>>     +--------+ +----+
>>>     |        |
>>>        +---+          +---+
>>>        |sw0|          |sw1|
>>>        +---+          +---+
>>>         | +---+            |
>>>       +--+  +--+       +--+
>>>       |p0|  |p1|          |p2|
>>>       +--+  +--+          +--+
>>>
>>> In this scenarion, sw0 can only know that it forwarded a packet to ports
>>> on the same switch. It does not know know that the packet needs to be
>>> forwarded to p2 as well. It would forward the packet from p0 to p1, and
>>> thus presumably set the hw_fwded bit, but br0 still needs to forward it.
>>>
>>> Maybe the check should be "if the packet was HW forwarded, the
>>> destination
>>> is a switch, and the destination is the same switch, don't forward
>>> the packet".
>>> This would be expensive, but on the other side it should not affect too
>>> many packets.
>>
>> I think you might get away with only forwarding if the switch_id is
>> different then the ingress switch_id if they are the same then drop it
>
> That is what I tried to say above with  "if ... the destination is the same
> switch, don't forward the packet".

Sorry I probably wasn't being clear. I'm just saying we don't need to
have the driver tell us if the packet has been forwarded. All we have
to do in software is the switch check and assume all packets sent to us
from the driver have already been handled by the hardware. Roopa is
working on this I believe.

>
>> and assume the hardware already did the forward operation. We could
>> also add a port setting to turn it on/off if that is important.
>>
>> I'm not sure why you would want to forward a packet back on a switch
>> port of the same switch it was received on. If you want to do this I
>> think its a special case and can be handled outside the bridge software
>> via 'tc', 'ovs', 'netfilters', etc. Maybe I missed a case though so
>> would be glad to hear it if there is one.
>>
> I don't. The point I was trying to make is that the patch as written
> doesn't support the above case, where multiple switches are associated
> with the same bridge group through different network interfaces. Granted,
> that may not be a likely case, but it should still be supported.

Agreed. Thanks.

>
> Guenter
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  3:18             ` John Fastabend
@ 2015-03-23  3:33               ` Guenter Roeck
  2015-03-23 17:12                 ` roopa
  0 siblings, 1 reply; 51+ messages in thread
From: Guenter Roeck @ 2015-03-23  3:33 UTC (permalink / raw)
  To: John Fastabend
  Cc: Andrew Lunn, roopa, David Miller, sfeldma, jiri, ronen.arad, netdev

On 03/22/2015 08:18 PM, John Fastabend wrote:
[ ... ]
>
> Sorry I probably wasn't being clear. I'm just saying we don't need to
> have the driver tell us if the packet has been forwarded. All we have
> to do in software is the switch check and assume all packets sent to us
> from the driver have already been handled by the hardware. Roopa is
> working on this I believe.
>

Ah, ok. Yes, you are correct. Sorry, I missed that.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  0:22       ` Guenter Roeck
  2015-03-23  1:33         ` John Fastabend
@ 2015-03-23 14:00         ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-23 14:00 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Andrew Lunn, David Miller, sfeldma, jiri, ronen.arad, netdev

On 3/22/15, 5:22 PM, Guenter Roeck wrote:
> On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
>>> since we have discussed this problem multiple times in switchdev meetings,
>>> the intent of this RFC is to see get the code out and also see if
>>> rocker or any other in-kernel
>>> driver can use it.
>> The Marvell switches in DSA don't have any way to mark packets why
>> they where forwarded towards the host. So i don't see how we could use
>> this feature with these chips.
>>
> If we (re-)enable unknown address flooding in the Marvell switch chips,
> we could simply mark all packets received from the switch as "forwarded
> by hardware". Sure, there is no bit in the header, but we would know
> from the chip configuration that the packets were forwarded.
>
> There may be a different problem, though: The driver won't know if
> the packet still needs to be forwarded by the soft bridge, for
> example to a port of a switch on another network interface
> which is part of the same bridge group.
>
> 		+---+
> 		|br0|
> 		+---+
> 		 | |
> 	+--------+ +----+
> 	|		|
>        +---+	      +---+
>        |sw0|	      |sw1|
>        +---+	      +---+
>         | +---+	        |
>       +--+  +--+       +--+
>       |p0|  |p1|	      |p2|
>       +--+  +--+	      +--+
>
> In this scenarion, sw0 can only know that it forwarded a packet to ports
> on the same switch. It does not know know that the packet needs to be
> forwarded to p2 as well. It would forward the packet from p0 to p1, and
> thus presumably set the hw_fwded bit, but br0 still needs to forward it.
>
> Maybe the check should be "if the packet was HW forwarded, the destination
> is a switch, and the destination is the same switch, don't forward the packet".

correct. And that's why my patch had the check in the bridge driver and
the check included  check for 'hw forwarded' flag in the packet and also 
checked that the port being forwarded to was
a switch port (port has NETIF_F_HW_SWITCH_OFFLOAD). I don't have the 
'same switch' check yet
for simplicity.
> This would be expensive, but on the other side it should not affect too
> many packets.
>
agreed.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  1:33         ` John Fastabend
  2015-03-23  2:57           ` Guenter Roeck
@ 2015-03-23 17:10           ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-23 17:10 UTC (permalink / raw)
  To: John Fastabend
  Cc: Guenter Roeck, Andrew Lunn, David Miller, sfeldma, jiri,
	ronen.arad, netdev

On 3/22/15, 6:33 PM, John Fastabend wrote:
> On 03/22/2015 05:22 PM, Guenter Roeck wrote:
>> On Fri, Mar 20, 2015 at 11:09:46PM +0100, Andrew Lunn wrote:
>>>> since we have discussed this problem multiple times in switchdev 
>>>> meetings,
>>>> the intent of this RFC is to see get the code out and also see if
>>>> rocker or any other in-kernel
>>>> driver can use it.
>>>
>>> The Marvell switches in DSA don't have any way to mark packets why
>>> they where forwarded towards the host. So i don't see how we could use
>>> this feature with these chips.
>>>
>>
>> If we (re-)enable unknown address flooding in the Marvell switch chips,
>> we could simply mark all packets received from the switch as "forwarded
>> by hardware". Sure, there is no bit in the header, but we would know
>> from the chip configuration that the packets were forwarded.
>>
>> There may be a different problem, though: The driver won't know if
>> the packet still needs to be forwarded by the soft bridge, for
>> example to a port of a switch on another network interface
>> which is part of the same bridge group.
>>
>>         +---+
>>         |br0|
>>         +---+
>>          | |
>>     +--------+ +----+
>>     |        |
>>        +---+          +---+
>>        |sw0|          |sw1|
>>        +---+          +---+
>>         | +---+            |
>>       +--+  +--+       +--+
>>       |p0|  |p1|          |p2|
>>       +--+  +--+          +--+
>>
>> In this scenarion, sw0 can only know that it forwarded a packet to ports
>> on the same switch. It does not know know that the packet needs to be
>> forwarded to p2 as well. It would forward the packet from p0 to p1, and
>> thus presumably set the hw_fwded bit, but br0 still needs to forward it.
>>
>> Maybe the check should be "if the packet was HW forwarded, the 
>> destination
>> is a switch, and the destination is the same switch, don't forward 
>> the packet".
>> This would be expensive, but on the other side it should not affect too
>> many packets.
>
> I think you might get away with only forwarding if the switch_id is
> different then the ingress switch_id if they are the same then drop it
> and assume the hardware already did the forward operation. We could
> also add a port setting to turn it on/off if that is important.

agreed.
>
> I'm not sure why you would want to forward a packet back on a switch
> port of the same switch it was received on.
As i mentioned in one of the earlier threads, we have found the need to 
move to software forwarding in some cases (igmp etc).
And having an option to do that would be more flexible so, RFC v2 was 
showing code that did that.
But, agree that supporting the most common case today, ie disallowing 
forwarding between
  two switch ports is a good starting point.


> If you want to do this I
> think its a special case and can be handled outside the bridge software
> via 'tc', 'ovs', 'netfilters', etc. Maybe I missed a case though so
> would be glad to hear it if there is one.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23  3:33               ` Guenter Roeck
@ 2015-03-23 17:12                 ` roopa
  2015-03-24  5:59                   ` Scott Feldman
  0 siblings, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-23 17:12 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: John Fastabend, Andrew Lunn, David Miller, sfeldma, jiri,
	ronen.arad, netdev

On 3/22/15, 8:33 PM, Guenter Roeck wrote:
> On 03/22/2015 08:18 PM, John Fastabend wrote:
> [ ... ]
>>
>> Sorry I probably wasn't being clear. I'm just saying we don't need to
>> have the driver tell us if the packet has been forwarded. All we have
>> to do in software is the switch check and assume all packets sent to us
>> from the driver have already been handled by the hardware. Roopa is
>> working on this I believe.
>>
>
> Ah, ok. Yes, you are correct. Sorry, I missed that.
yep, so my first RFC listed three ways to do this,
1) flag on the bridge port
2) check if the port being forwarded to is a switch port, using
             - the offload flag
             - the parent id (as john fastabend pointed out)
3) A per packet flag which switch driver sets indicating that the packet 
is hw forwarded.
     This is because we have run into cases where we want to move to 
software forwarding
     of certain packets like igmp reports.  (I will get some more 
details on the particular igmp problem).
     In such case, hardware punts the igmp packet to cpu and cpu will do 
the forwarding.
     I think we may hit more cases like this in the future.

my RFC v1 was based on 1). RFC v2 was based on 3) above.

But, for now, agree that we can just support the more common case using 2).
And, we can move to 3) in the future if needed.

thanks,
Roopa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-23 17:12                 ` roopa
@ 2015-03-24  5:59                   ` Scott Feldman
  2015-03-24 13:13                     ` Guenter Roeck
  2015-03-24 14:29                     ` Jiri Pirko
  0 siblings, 2 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-24  5:59 UTC (permalink / raw)
  To: roopa
  Cc: Guenter Roeck, John Fastabend, Andrew Lunn, David Miller,
	Jiří Pírko, Arad, Ronen, Netdev

On Mon, Mar 23, 2015 at 10:12 AM, roopa <roopa@cumulusnetworks.com> wrote:
> On 3/22/15, 8:33 PM, Guenter Roeck wrote:
>>
>> On 03/22/2015 08:18 PM, John Fastabend wrote:
>> [ ... ]
>>>
>>>
>>> Sorry I probably wasn't being clear. I'm just saying we don't need to
>>> have the driver tell us if the packet has been forwarded. All we have
>>> to do in software is the switch check and assume all packets sent to us
>>> from the driver have already been handled by the hardware. Roopa is
>>> working on this I believe.
>>>
>>
>> Ah, ok. Yes, you are correct. Sorry, I missed that.
>
> yep, so my first RFC listed three ways to do this,
> 1) flag on the bridge port
> 2) check if the port being forwarded to is a switch port, using
>             - the offload flag
>             - the parent id (as john fastabend pointed out)
> 3) A per packet flag which switch driver sets indicating that the packet is
> hw forwarded.
>     This is because we have run into cases where we want to move to software
> forwarding
>     of certain packets like igmp reports.  (I will get some more details on
> the particular igmp problem).
>     In such case, hardware punts the igmp packet to cpu and cpu will do the
> forwarding.
>     I think we may hit more cases like this in the future.
>
> my RFC v1 was based on 1). RFC v2 was based on 3) above.
>
> But, for now, agree that we can just support the more common case using 2).
> And, we can move to 3) in the future if needed.

Roopa, I think it may be possible to do this without any changes to
the bridge code or switchdev code by dropping duplicate pkts in the
swdev driver itself.  The skb is marked with skb_iif set to ifindex of
ingress port, so when the driver goes to egress a pkt on the port, if
the skb_iif is one of the other device ports, we can assume the device
did the fwd already so we can drop the duplicate pkt.  Below is the
change to rocker.  The driver can get as fancy as it wants in its test
to drop or not.  This solution works for mixed offload and
non-offloaded ports in a bridge, or ports from different offload
devices in the same bridge.

Yes, the bridge is spending overhead to clone pkts to flood to its
ports.  IGMP snooping mitigates this for mcast.  BR_FLOOD can be
turned off on the bridge ports to mitigate this for unknown unicast
floods.  So what's left is bcasts.

What do you think?   napi_id looked like another field that could be
used to find the src of the pkt, but skb_iif seemed more straight
forward to use.

diff --git a/drivers/net/ethernet/rocker/rocker.c
b/drivers/net/ethernet/rocker/rocker.c
index aab962c..0f7217f7 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3931,15 +3931,28 @@ unmap_frag:
        return -EMSGSIZE;
 }

+static bool rocker_port_dev_check(struct net_device *dev);
+
 static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
net_device *dev)
 {
        struct rocker_port *rocker_port = netdev_priv(dev);
        struct rocker *rocker = rocker_port->rocker;
        struct rocker_desc_info *desc_info;
        struct rocker_tlv *frags;
+       struct net_device *in_dev;
        int i;
        int err;

+       if (rocker_port_is_bridged(rocker_port)) {
+               rcu_read_lock();
+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
+               if (in_dev && rocker_port_dev_check(in_dev)) {
+                       rcu_read_unlock();
+                       goto skip;
+               }
+               rcu_read_unlock();
+       }
+
        desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
        if (unlikely(!desc_info)) {
                if (net_ratelimit())
@@ -3951,7 +3964,7 @@ static netdev_tx_t rocker_port_xmit(struct
sk_buff *skb, struct net_device *dev)

        frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
        if (!frags)
-               goto out;
+               goto drop;
        err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
                                          skb->data, skb_headlen(skb));
        if (err)
@@ -3983,9 +3996,10 @@ unmap_frags:
        rocker_tx_desc_frags_unmap(rocker_port, desc_info);
 nest_cancel:
        rocker_tlv_nest_cancel(desc_info, frags);
-out:
-       dev_kfree_skb(skb);
+drop:
        dev->stats.tx_dropped++;
+skip:
+       dev_kfree_skb(skb);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24  5:59                   ` Scott Feldman
@ 2015-03-24 13:13                     ` Guenter Roeck
  2015-03-24 18:08                       ` Scott Feldman
  2015-03-24 14:29                     ` Jiri Pirko
  1 sibling, 1 reply; 51+ messages in thread
From: Guenter Roeck @ 2015-03-24 13:13 UTC (permalink / raw)
  To: Scott Feldman, roopa
  Cc: John Fastabend, Andrew Lunn, David Miller,
	Jiří Pírko, Arad, Ronen, Netdev

On 03/23/2015 10:59 PM, Scott Feldman wrote:
> On Mon, Mar 23, 2015 at 10:12 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/22/15, 8:33 PM, Guenter Roeck wrote:
>>>
[ ... ]
>>
>> yep, so my first RFC listed three ways to do this,
>> 1) flag on the bridge port
>> 2) check if the port being forwarded to is a switch port, using
>>              - the offload flag
>>              - the parent id (as john fastabend pointed out)
>> 3) A per packet flag which switch driver sets indicating that the packet is
>> hw forwarded.
>>      This is because we have run into cases where we want to move to software
>> forwarding
>>      of certain packets like igmp reports.  (I will get some more details on
>> the particular igmp problem).
>>      In such case, hardware punts the igmp packet to cpu and cpu will do the
>> forwarding.
>>      I think we may hit more cases like this in the future.
>>
>> my RFC v1 was based on 1). RFC v2 was based on 3) above.
>>
>> But, for now, agree that we can just support the more common case using 2).
>> And, we can move to 3) in the future if needed.
>
> Roopa, I think it may be possible to do this without any changes to
> the bridge code or switchdev code by dropping duplicate pkts in the
> swdev driver itself.  The skb is marked with skb_iif set to ifindex of
> ingress port, so when the driver goes to egress a pkt on the port, if
> the skb_iif is one of the other device ports, we can assume the device
> did the fwd already so we can drop the duplicate pkt.  Below is the
> change to rocker.  The driver can get as fancy as it wants in its test
> to drop or not.  This solution works for mixed offload and
> non-offloaded ports in a bridge, or ports from different offload
> devices in the same bridge.
>
> Yes, the bridge is spending overhead to clone pkts to flood to its
> ports.  IGMP snooping mitigates this for mcast.  BR_FLOOD can be
> turned off on the bridge ports to mitigate this for unknown unicast
> floods.  So what's left is bcasts.
>
You would still want the soft bridge code to flood from non-switch ports
to switch ports and vice versa, as well as across multiple switches.
So I am not entirely sure I understand how turning off BR_FLOOD would help.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24  5:59                   ` Scott Feldman
  2015-03-24 13:13                     ` Guenter Roeck
@ 2015-03-24 14:29                     ` Jiri Pirko
  2015-03-24 16:01                       ` Guenter Roeck
  1 sibling, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2015-03-24 14:29 UTC (permalink / raw)
  To: Scott Feldman
  Cc: roopa, Guenter Roeck, John Fastabend, Andrew Lunn, David Miller,
	Arad, Ronen, Netdev

Tue, Mar 24, 2015 at 06:59:43AM CET, sfeldma@gmail.com wrote:
>On Mon, Mar 23, 2015 at 10:12 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/22/15, 8:33 PM, Guenter Roeck wrote:
>>>
>>> On 03/22/2015 08:18 PM, John Fastabend wrote:
>>> [ ... ]
>>>>
>>>>
>>>> Sorry I probably wasn't being clear. I'm just saying we don't need to
>>>> have the driver tell us if the packet has been forwarded. All we have
>>>> to do in software is the switch check and assume all packets sent to us
>>>> from the driver have already been handled by the hardware. Roopa is
>>>> working on this I believe.
>>>>
>>>
>>> Ah, ok. Yes, you are correct. Sorry, I missed that.
>>
>> yep, so my first RFC listed three ways to do this,
>> 1) flag on the bridge port
>> 2) check if the port being forwarded to is a switch port, using
>>             - the offload flag
>>             - the parent id (as john fastabend pointed out)
>> 3) A per packet flag which switch driver sets indicating that the packet is
>> hw forwarded.
>>     This is because we have run into cases where we want to move to software
>> forwarding
>>     of certain packets like igmp reports.  (I will get some more details on
>> the particular igmp problem).
>>     In such case, hardware punts the igmp packet to cpu and cpu will do the
>> forwarding.
>>     I think we may hit more cases like this in the future.
>>
>> my RFC v1 was based on 1). RFC v2 was based on 3) above.
>>
>> But, for now, agree that we can just support the more common case using 2).
>> And, we can move to 3) in the future if needed.
>
>Roopa, I think it may be possible to do this without any changes to
>the bridge code or switchdev code by dropping duplicate pkts in the
>swdev driver itself.  The skb is marked with skb_iif set to ifindex of
>ingress port, so when the driver goes to egress a pkt on the port, if
>the skb_iif is one of the other device ports, we can assume the device
>did the fwd already so we can drop the duplicate pkt.  Below is the
>change to rocker.  The driver can get as fancy as it wants in its test
>to drop or not.  This solution works for mixed offload and
>non-offloaded ports in a bridge, or ports from different offload
>devices in the same bridge.
>
>Yes, the bridge is spending overhead to clone pkts to flood to its
>ports.  IGMP snooping mitigates this for mcast.  BR_FLOOD can be
>turned off on the bridge ports to mitigate this for unknown unicast
>floods.  So what's left is bcasts.
>
>What do you think?   napi_id looked like another field that could be
>used to find the src of the pkt, but skb_iif seemed more straight
>forward to use.
>
>diff --git a/drivers/net/ethernet/rocker/rocker.c
>b/drivers/net/ethernet/rocker/rocker.c
>index aab962c..0f7217f7 100644
>--- a/drivers/net/ethernet/rocker/rocker.c
>+++ b/drivers/net/ethernet/rocker/rocker.c
>@@ -3931,15 +3931,28 @@ unmap_frag:
>        return -EMSGSIZE;
> }
>
>+static bool rocker_port_dev_check(struct net_device *dev);
>+
> static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>net_device *dev)
> {
>        struct rocker_port *rocker_port = netdev_priv(dev);
>        struct rocker *rocker = rocker_port->rocker;
>        struct rocker_desc_info *desc_info;
>        struct rocker_tlv *frags;
>+       struct net_device *in_dev;
>        int i;
>        int err;
>
>+       if (rocker_port_is_bridged(rocker_port)) {
>+               rcu_read_lock();
>+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);

Hmm, you iterate over all ports for every xmit call :/
Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.


>+               if (in_dev && rocker_port_dev_check(in_dev)) {
>+                       rcu_read_unlock();
>+                       goto skip;
>+               }
>+               rcu_read_unlock();
>+       }
>+
>        desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
>        if (unlikely(!desc_info)) {
>                if (net_ratelimit())
>@@ -3951,7 +3964,7 @@ static netdev_tx_t rocker_port_xmit(struct
>sk_buff *skb, struct net_device *dev)
>
>        frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
>        if (!frags)
>-               goto out;
>+               goto drop;
>        err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
>                                          skb->data, skb_headlen(skb));
>        if (err)
>@@ -3983,9 +3996,10 @@ unmap_frags:
>        rocker_tx_desc_frags_unmap(rocker_port, desc_info);
> nest_cancel:
>        rocker_tlv_nest_cancel(desc_info, frags);
>-out:
>-       dev_kfree_skb(skb);
>+drop:
>        dev->stats.tx_dropped++;
>+skip:
>+       dev_kfree_skb(skb);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 14:29                     ` Jiri Pirko
@ 2015-03-24 16:01                       ` Guenter Roeck
  2015-03-24 17:45                         ` roopa
  2015-03-24 17:58                         ` Scott Feldman
  0 siblings, 2 replies; 51+ messages in thread
From: Guenter Roeck @ 2015-03-24 16:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Scott Feldman, roopa, John Fastabend, Andrew Lunn, David Miller,
	Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
> >
> >diff --git a/drivers/net/ethernet/rocker/rocker.c
> >b/drivers/net/ethernet/rocker/rocker.c
> >index aab962c..0f7217f7 100644
> >--- a/drivers/net/ethernet/rocker/rocker.c
> >+++ b/drivers/net/ethernet/rocker/rocker.c
> >@@ -3931,15 +3931,28 @@ unmap_frag:
> >        return -EMSGSIZE;
> > }
> >
> >+static bool rocker_port_dev_check(struct net_device *dev);
> >+
> > static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
> >net_device *dev)
> > {
> >        struct rocker_port *rocker_port = netdev_priv(dev);
> >        struct rocker *rocker = rocker_port->rocker;
> >        struct rocker_desc_info *desc_info;
> >        struct rocker_tlv *frags;
> >+       struct net_device *in_dev;
> >        int i;
> >        int err;
> >
> >+       if (rocker_port_is_bridged(rocker_port)) {
> >+               rcu_read_lock();
> >+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
> 
> Hmm, you iterate over all ports for every xmit call :/
> Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
> 
It may be easier (and faster) to loop through all rocker ports and try to find
one with the same ifindex. Then the dev_check call would not be necessary.

If the rocker switch can be attached to multiple bridges, it may also be
necessary to check if the bridge group is the same for both ports.

Guenter

> 
> >+               if (in_dev && rocker_port_dev_check(in_dev)) {
> >+                       rcu_read_unlock();
> >+                       goto skip;
> >+               }
> >+               rcu_read_unlock();
> >+       }
> >+
> >        desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
> >        if (unlikely(!desc_info)) {
> >                if (net_ratelimit())
> >@@ -3951,7 +3964,7 @@ static netdev_tx_t rocker_port_xmit(struct
> >sk_buff *skb, struct net_device *dev)
> >
> >        frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
> >        if (!frags)
> >-               goto out;
> >+               goto drop;
> >        err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
> >                                          skb->data, skb_headlen(skb));
> >        if (err)
> >@@ -3983,9 +3996,10 @@ unmap_frags:
> >        rocker_tx_desc_frags_unmap(rocker_port, desc_info);
> > nest_cancel:
> >        rocker_tlv_nest_cancel(desc_info, frags);
> >-out:
> >-       dev_kfree_skb(skb);
> >+drop:
> >        dev->stats.tx_dropped++;
> >+skip:
> >+       dev_kfree_skb(skb);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 16:01                       ` Guenter Roeck
@ 2015-03-24 17:45                         ` roopa
  2015-03-24 17:58                           ` Guenter Roeck
  2015-03-24 17:58                         ` Scott Feldman
  1 sibling, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-24 17:45 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jiri Pirko, Scott Feldman, John Fastabend, Andrew Lunn,
	David Miller, Arad, Ronen, Netdev

On 3/24/15, 9:01 AM, Guenter Roeck wrote:
> On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>>> diff --git a/drivers/net/ethernet/rocker/rocker.c
>>> b/drivers/net/ethernet/rocker/rocker.c
>>> index aab962c..0f7217f7 100644
>>> --- a/drivers/net/ethernet/rocker/rocker.c
>>> +++ b/drivers/net/ethernet/rocker/rocker.c
>>> @@ -3931,15 +3931,28 @@ unmap_frag:
>>>         return -EMSGSIZE;
>>> }
>>>
>>> +static bool rocker_port_dev_check(struct net_device *dev);
>>> +
>>> static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>>> net_device *dev)
>>> {
>>>         struct rocker_port *rocker_port = netdev_priv(dev);
>>>         struct rocker *rocker = rocker_port->rocker;
>>>         struct rocker_desc_info *desc_info;
>>>         struct rocker_tlv *frags;
>>> +       struct net_device *in_dev;
>>>         int i;
>>>         int err;
>>>
>>> +       if (rocker_port_is_bridged(rocker_port)) {
>>> +               rcu_read_lock();
>>> +               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>> Hmm, you iterate over all ports for every xmit call :/
>> Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>
> It may be easier (and faster) to loop through all rocker ports and try to find
> one with the same ifindex. Then the dev_check call would not be necessary.
>
This is still overhead for every packet on the switches we support. The 
number of ports can go close to 128
(40G ports can be broken into 4x10G ports).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 16:01                       ` Guenter Roeck
  2015-03-24 17:45                         ` roopa
@ 2015-03-24 17:58                         ` Scott Feldman
  1 sibling, 0 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-24 17:58 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jiri Pirko, roopa, John Fastabend, Andrew Lunn, David Miller,
	Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 9:01 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>> >
>> >diff --git a/drivers/net/ethernet/rocker/rocker.c
>> >b/drivers/net/ethernet/rocker/rocker.c
>> >index aab962c..0f7217f7 100644
>> >--- a/drivers/net/ethernet/rocker/rocker.c
>> >+++ b/drivers/net/ethernet/rocker/rocker.c
>> >@@ -3931,15 +3931,28 @@ unmap_frag:
>> >        return -EMSGSIZE;
>> > }
>> >
>> >+static bool rocker_port_dev_check(struct net_device *dev);
>> >+
>> > static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>> >net_device *dev)
>> > {
>> >        struct rocker_port *rocker_port = netdev_priv(dev);
>> >        struct rocker *rocker = rocker_port->rocker;
>> >        struct rocker_desc_info *desc_info;
>> >        struct rocker_tlv *frags;
>> >+       struct net_device *in_dev;
>> >        int i;
>> >        int err;
>> >
>> >+       if (rocker_port_is_bridged(rocker_port)) {
>> >+               rcu_read_lock();
>> >+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>>
>> Hmm, you iterate over all ports for every xmit call :/
>> Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>
> It may be easier (and faster) to loop through all rocker ports and try to find
> one with the same ifindex. Then the dev_check call would not be necessary.
>
> If the rocker switch can be attached to multiple bridges, it may also be
> necessary to check if the bridge group is the same for both ports.
>
> Guenter

Yes, good suggestions.  This is the flexibility part if we do the
check in the driver rather than code above.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 17:45                         ` roopa
@ 2015-03-24 17:58                           ` Guenter Roeck
  2015-03-24 18:14                             ` Scott Feldman
  2015-03-24 18:48                             ` David Christensen
  0 siblings, 2 replies; 51+ messages in thread
From: Guenter Roeck @ 2015-03-24 17:58 UTC (permalink / raw)
  To: roopa
  Cc: Jiri Pirko, Scott Feldman, John Fastabend, Andrew Lunn,
	David Miller, Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
> >On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
> >>>diff --git a/drivers/net/ethernet/rocker/rocker.c
> >>>b/drivers/net/ethernet/rocker/rocker.c
> >>>index aab962c..0f7217f7 100644
> >>>--- a/drivers/net/ethernet/rocker/rocker.c
> >>>+++ b/drivers/net/ethernet/rocker/rocker.c
> >>>@@ -3931,15 +3931,28 @@ unmap_frag:
> >>>        return -EMSGSIZE;
> >>>}
> >>>
> >>>+static bool rocker_port_dev_check(struct net_device *dev);
> >>>+
> >>>static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
> >>>net_device *dev)
> >>>{
> >>>        struct rocker_port *rocker_port = netdev_priv(dev);
> >>>        struct rocker *rocker = rocker_port->rocker;
> >>>        struct rocker_desc_info *desc_info;
> >>>        struct rocker_tlv *frags;
> >>>+       struct net_device *in_dev;
> >>>        int i;
> >>>        int err;
> >>>
> >>>+       if (rocker_port_is_bridged(rocker_port)) {
> >>>+               rcu_read_lock();
> >>>+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
> >>Hmm, you iterate over all ports for every xmit call :/
> >>Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
> >>
> >It may be easier (and faster) to loop through all rocker ports and try to find
> >one with the same ifindex. Then the dev_check call would not be necessary.
> >
> This is still overhead for every packet on the switches we support. The
> number of ports can go close to 128
> (40G ports can be broken into 4x10G ports).
> 
Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
device pointer, it may actually be (much) faster (and the above "iterate
over all ports" is a bit misleading).

I tested the above approach with DSA and a Marvell switch chip. It works,
but I am a bit concerned about the per-packet overhead, especially
in larger networks. I would prefer if there would be a means to 'catch'
duplicate packets earlier - before they are even created, if that is
possible.

Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 13:13                     ` Guenter Roeck
@ 2015-03-24 18:08                       ` Scott Feldman
  0 siblings, 0 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-24 18:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: roopa, John Fastabend, Andrew Lunn, David Miller,
	Jiří Pírko, Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 6:13 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> On 03/23/2015 10:59 PM, Scott Feldman wrote:
>>
>> On Mon, Mar 23, 2015 at 10:12 AM, roopa <roopa@cumulusnetworks.com> wrote:
>>>
>>> On 3/22/15, 8:33 PM, Guenter Roeck wrote:
>>>>
>>>>
> [ ... ]
>
>>>
>>> yep, so my first RFC listed three ways to do this,
>>> 1) flag on the bridge port
>>> 2) check if the port being forwarded to is a switch port, using
>>>              - the offload flag
>>>              - the parent id (as john fastabend pointed out)
>>> 3) A per packet flag which switch driver sets indicating that the packet
>>> is
>>> hw forwarded.
>>>      This is because we have run into cases where we want to move to
>>> software
>>> forwarding
>>>      of certain packets like igmp reports.  (I will get some more details
>>> on
>>> the particular igmp problem).
>>>      In such case, hardware punts the igmp packet to cpu and cpu will do
>>> the
>>> forwarding.
>>>      I think we may hit more cases like this in the future.
>>>
>>> my RFC v1 was based on 1). RFC v2 was based on 3) above.
>>>
>>> But, for now, agree that we can just support the more common case using
>>> 2).
>>> And, we can move to 3) in the future if needed.
>>
>>
>> Roopa, I think it may be possible to do this without any changes to
>> the bridge code or switchdev code by dropping duplicate pkts in the
>> swdev driver itself.  The skb is marked with skb_iif set to ifindex of
>> ingress port, so when the driver goes to egress a pkt on the port, if
>> the skb_iif is one of the other device ports, we can assume the device
>> did the fwd already so we can drop the duplicate pkt.  Below is the
>> change to rocker.  The driver can get as fancy as it wants in its test
>> to drop or not.  This solution works for mixed offload and
>> non-offloaded ports in a bridge, or ports from different offload
>> devices in the same bridge.
>>
>> Yes, the bridge is spending overhead to clone pkts to flood to its
>> ports.  IGMP snooping mitigates this for mcast.  BR_FLOOD can be
>> turned off on the bridge ports to mitigate this for unknown unicast
>> floods.  So what's left is bcasts.
>>
> You would still want the soft bridge code to flood from non-switch ports
> to switch ports and vice versa, as well as across multiple switches.
> So I am not entirely sure I understand how turning off BR_FLOOD would help.

Ya, you're right, turning off BR_FLOOD wouldn't help for those cases.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 17:58                           ` Guenter Roeck
@ 2015-03-24 18:14                             ` Scott Feldman
  2015-03-25  3:10                               ` Guenter Roeck
  2015-03-25  3:46                               ` Florian Fainelli
  2015-03-24 18:48                             ` David Christensen
  1 sibling, 2 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-24 18:14 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: roopa, Jiri Pirko, John Fastabend, Andrew Lunn, David Miller,
	Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 10:58 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
>> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
>> >On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>> >>>diff --git a/drivers/net/ethernet/rocker/rocker.c
>> >>>b/drivers/net/ethernet/rocker/rocker.c
>> >>>index aab962c..0f7217f7 100644
>> >>>--- a/drivers/net/ethernet/rocker/rocker.c
>> >>>+++ b/drivers/net/ethernet/rocker/rocker.c
>> >>>@@ -3931,15 +3931,28 @@ unmap_frag:
>> >>>        return -EMSGSIZE;
>> >>>}
>> >>>
>> >>>+static bool rocker_port_dev_check(struct net_device *dev);
>> >>>+
>> >>>static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>> >>>net_device *dev)
>> >>>{
>> >>>        struct rocker_port *rocker_port = netdev_priv(dev);
>> >>>        struct rocker *rocker = rocker_port->rocker;
>> >>>        struct rocker_desc_info *desc_info;
>> >>>        struct rocker_tlv *frags;
>> >>>+       struct net_device *in_dev;
>> >>>        int i;
>> >>>        int err;
>> >>>
>> >>>+       if (rocker_port_is_bridged(rocker_port)) {
>> >>>+               rcu_read_lock();
>> >>>+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>> >>Hmm, you iterate over all ports for every xmit call :/
>> >>Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>> >>
>> >It may be easier (and faster) to loop through all rocker ports and try to find
>> >one with the same ifindex. Then the dev_check call would not be necessary.
>> >
>> This is still overhead for every packet on the switches we support. The
>> number of ports can go close to 128
>> (40G ports can be broken into 4x10G ports).
>>
> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
> device pointer, it may actually be (much) faster (and the above "iterate
> over all ports" is a bit misleading).
>
> I tested the above approach with DSA and a Marvell switch chip. It works,
> but I am a bit concerned about the per-packet overhead, especially
> in larger networks. I would prefer if there would be a means to 'catch'
> duplicate packets earlier - before they are even created, if that is
> possible.

I'm not so concerned about the per-packet overhead.  For multicast, we
have IGMP snooping.  And big switches are going to have rate controls
on CPU bound traffic, so the CPU should be able to handle the
per-packet overhead with ease.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 17:58                           ` Guenter Roeck
  2015-03-24 18:14                             ` Scott Feldman
@ 2015-03-24 18:48                             ` David Christensen
  1 sibling, 0 replies; 51+ messages in thread
From: David Christensen @ 2015-03-24 18:48 UTC (permalink / raw)
  To: Guenter Roeck, roopa
  Cc: Jiri Pirko, Scott Feldman, John Fastabend, Andrew Lunn,
	David Miller, Arad, Ronen, Netdev

> > This is still overhead for every packet on the switches we support. The
> > number of ports can go close to 128
> > (40G ports can be broken into 4x10G ports).
> >
> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
> device pointer, it may actually be (much) faster (and the above "iterate
> over all ports" is a bit misleading).
> 
> I tested the above approach with DSA and a Marvell switch chip. It works,
> but I am a bit concerned about the per-packet overhead, especially
> in larger networks. I would prefer if there would be a means to 'catch'
> duplicate packets earlier - before they are even created, if that is
> possible.

Seems like we're running into issues that would be encountered by stacked
switches.  Should we consider similar solutions, i.e. a stacking protocol
like HiGig?  In this situation each packet would be passed between switches
along with a stacking data structure that indicates relevant information
such as where the packet was first received (i.e. an ingress module ID) 
and a bit indicating that it's a broadcast packet.  The receiving switch
then knows which ports the broadcast frame has already egressed and can
then flood remaining ports if necessary.  You can also create a CPU bit
if you want to forward the frame specifically to the host CPU.

It requires more upfront work but pays off later in flexibility.

Dave

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 18:14                             ` Scott Feldman
@ 2015-03-25  3:10                               ` Guenter Roeck
  2015-03-25  3:46                               ` Florian Fainelli
  1 sibling, 0 replies; 51+ messages in thread
From: Guenter Roeck @ 2015-03-25  3:10 UTC (permalink / raw)
  To: Scott Feldman
  Cc: roopa, Jiri Pirko, John Fastabend, Andrew Lunn, David Miller,
	Arad, Ronen, Netdev

On 03/24/2015 11:14 AM, Scott Feldman wrote:
> On Tue, Mar 24, 2015 at 10:58 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
>>> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
>>>> On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>>>>>> diff --git a/drivers/net/ethernet/rocker/rocker.c
>>>>>> b/drivers/net/ethernet/rocker/rocker.c
>>>>>> index aab962c..0f7217f7 100644
>>>>>> --- a/drivers/net/ethernet/rocker/rocker.c
>>>>>> +++ b/drivers/net/ethernet/rocker/rocker.c
>>>>>> @@ -3931,15 +3931,28 @@ unmap_frag:
>>>>>>         return -EMSGSIZE;
>>>>>> }
>>>>>>
>>>>>> +static bool rocker_port_dev_check(struct net_device *dev);
>>>>>> +
>>>>>> static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>>>>>> net_device *dev)
>>>>>> {
>>>>>>         struct rocker_port *rocker_port = netdev_priv(dev);
>>>>>>         struct rocker *rocker = rocker_port->rocker;
>>>>>>         struct rocker_desc_info *desc_info;
>>>>>>         struct rocker_tlv *frags;
>>>>>> +       struct net_device *in_dev;
>>>>>>         int i;
>>>>>>         int err;
>>>>>>
>>>>>> +       if (rocker_port_is_bridged(rocker_port)) {
>>>>>> +               rcu_read_lock();
>>>>>> +               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>>>>> Hmm, you iterate over all ports for every xmit call :/
>>>>> Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>>>>
>>>> It may be easier (and faster) to loop through all rocker ports and try to find
>>>> one with the same ifindex. Then the dev_check call would not be necessary.
>>>>
>>> This is still overhead for every packet on the switches we support. The
>>> number of ports can go close to 128
>>> (40G ports can be broken into 4x10G ports).
>>>
>> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
>> device pointer, it may actually be (much) faster (and the above "iterate
>> over all ports" is a bit misleading).
>>
>> I tested the above approach with DSA and a Marvell switch chip. It works,
>> but I am a bit concerned about the per-packet overhead, especially
>> in larger networks. I would prefer if there would be a means to 'catch'
>> duplicate packets earlier - before they are even created, if that is
>> possible.
>
> I'm not so concerned about the per-packet overhead.  For multicast, we
> have IGMP snooping.  And big switches are going to have rate controls
> on CPU bound traffic, so the CPU should be able to handle the
> per-packet overhead with ease.
>

Ok, next question: Are there any legitimate reasons why a packet might be
sent out on the same interface ? Examples might be packets received through
a VPN or other tunnel and forwarded to the local network, or packets forwarded
in L3 (for example if there are multiple L3 networks on the same link).
Would skb_iif be set for such packets ?

Guenter

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-24 18:14                             ` Scott Feldman
  2015-03-25  3:10                               ` Guenter Roeck
@ 2015-03-25  3:46                               ` Florian Fainelli
  2015-03-25  5:06                                 ` Scott Feldman
  1 sibling, 1 reply; 51+ messages in thread
From: Florian Fainelli @ 2015-03-25  3:46 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Guenter Roeck, roopa, Jiri Pirko, John Fastabend, Andrew Lunn,
	David Miller, Arad, Ronen, Netdev

2015-03-24 11:14 GMT-07:00 Scott Feldman <sfeldma@gmail.com>:
> On Tue, Mar 24, 2015 at 10:58 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
>>> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
>>> >On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>>> >>>diff --git a/drivers/net/ethernet/rocker/rocker.c
>>> >>>b/drivers/net/ethernet/rocker/rocker.c
>>> >>>index aab962c..0f7217f7 100644
>>> >>>--- a/drivers/net/ethernet/rocker/rocker.c
>>> >>>+++ b/drivers/net/ethernet/rocker/rocker.c
>>> >>>@@ -3931,15 +3931,28 @@ unmap_frag:
>>> >>>        return -EMSGSIZE;
>>> >>>}
>>> >>>
>>> >>>+static bool rocker_port_dev_check(struct net_device *dev);
>>> >>>+
>>> >>>static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>>> >>>net_device *dev)
>>> >>>{
>>> >>>        struct rocker_port *rocker_port = netdev_priv(dev);
>>> >>>        struct rocker *rocker = rocker_port->rocker;
>>> >>>        struct rocker_desc_info *desc_info;
>>> >>>        struct rocker_tlv *frags;
>>> >>>+       struct net_device *in_dev;
>>> >>>        int i;
>>> >>>        int err;
>>> >>>
>>> >>>+       if (rocker_port_is_bridged(rocker_port)) {
>>> >>>+               rcu_read_lock();
>>> >>>+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>>> >>Hmm, you iterate over all ports for every xmit call :/
>>> >>Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>> >>
>>> >It may be easier (and faster) to loop through all rocker ports and try to find
>>> >one with the same ifindex. Then the dev_check call would not be necessary.
>>> >
>>> This is still overhead for every packet on the switches we support. The
>>> number of ports can go close to 128
>>> (40G ports can be broken into 4x10G ports).
>>>
>> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
>> device pointer, it may actually be (much) faster (and the above "iterate
>> over all ports" is a bit misleading).
>>
>> I tested the above approach with DSA and a Marvell switch chip. It works,
>> but I am a bit concerned about the per-packet overhead, especially
>> in larger networks. I would prefer if there would be a means to 'catch'
>> duplicate packets earlier - before they are even created, if that is
>> possible.
>
> I'm not so concerned about the per-packet overhead.  For multicast, we
> have IGMP snooping.  And big switches are going to have rate controls
> on CPU bound traffic, so the CPU should be able to handle the
> per-packet overhead with ease.

Ok, that works for a model where you are only processing exception
traffic, but this may not always be the case, there are things you
don't/can't offload on smaller devices, such that you still want the
overhead to be a lightweight as possible.
--
Florian

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-25  3:46                               ` Florian Fainelli
@ 2015-03-25  5:06                                 ` Scott Feldman
  2015-03-25 17:01                                   ` roopa
  0 siblings, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-25  5:06 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Guenter Roeck, roopa, Jiri Pirko, John Fastabend, Andrew Lunn,
	David Miller, Arad, Ronen, Netdev

On Tue, Mar 24, 2015 at 8:46 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2015-03-24 11:14 GMT-07:00 Scott Feldman <sfeldma@gmail.com>:
>> On Tue, Mar 24, 2015 at 10:58 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>> On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
>>>> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
>>>> >On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>>>> >>>diff --git a/drivers/net/ethernet/rocker/rocker.c
>>>> >>>b/drivers/net/ethernet/rocker/rocker.c
>>>> >>>index aab962c..0f7217f7 100644
>>>> >>>--- a/drivers/net/ethernet/rocker/rocker.c
>>>> >>>+++ b/drivers/net/ethernet/rocker/rocker.c
>>>> >>>@@ -3931,15 +3931,28 @@ unmap_frag:
>>>> >>>        return -EMSGSIZE;
>>>> >>>}
>>>> >>>
>>>> >>>+static bool rocker_port_dev_check(struct net_device *dev);
>>>> >>>+
>>>> >>>static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>>>> >>>net_device *dev)
>>>> >>>{
>>>> >>>        struct rocker_port *rocker_port = netdev_priv(dev);
>>>> >>>        struct rocker *rocker = rocker_port->rocker;
>>>> >>>        struct rocker_desc_info *desc_info;
>>>> >>>        struct rocker_tlv *frags;
>>>> >>>+       struct net_device *in_dev;
>>>> >>>        int i;
>>>> >>>        int err;
>>>> >>>
>>>> >>>+       if (rocker_port_is_bridged(rocker_port)) {
>>>> >>>+               rcu_read_lock();
>>>> >>>+               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>>>> >>Hmm, you iterate over all ports for every xmit call :/
>>>> >>Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>>> >>
>>>> >It may be easier (and faster) to loop through all rocker ports and try to find
>>>> >one with the same ifindex. Then the dev_check call would not be necessary.
>>>> >
>>>> This is still overhead for every packet on the switches we support. The
>>>> number of ports can go close to 128
>>>> (40G ports can be broken into 4x10G ports).
>>>>
>>> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
>>> device pointer, it may actually be (much) faster (and the above "iterate
>>> over all ports" is a bit misleading).
>>>
>>> I tested the above approach with DSA and a Marvell switch chip. It works,
>>> but I am a bit concerned about the per-packet overhead, especially
>>> in larger networks. I would prefer if there would be a means to 'catch'
>>> duplicate packets earlier - before they are even created, if that is
>>> possible.
>>
>> I'm not so concerned about the per-packet overhead.  For multicast, we
>> have IGMP snooping.  And big switches are going to have rate controls
>> on CPU bound traffic, so the CPU should be able to handle the
>> per-packet overhead with ease.
>
> Ok, that works for a model where you are only processing exception
> traffic, but this may not always be the case, there are things you
> don't/can't offload on smaller devices, such that you still want the
> overhead to be a lightweight as possible.

Ya, that makes sense.  Well, I'll concede this solution.  It has
helped to draw out the requirements, so that's a positive.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-25  5:06                                 ` Scott Feldman
@ 2015-03-25 17:01                                   ` roopa
  2015-03-26  7:44                                     ` Scott Feldman
  0 siblings, 1 reply; 51+ messages in thread
From: roopa @ 2015-03-25 17:01 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Florian Fainelli, Guenter Roeck, Jiri Pirko, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On 3/24/15, 10:06 PM, Scott Feldman wrote:
> On Tue, Mar 24, 2015 at 8:46 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2015-03-24 11:14 GMT-07:00 Scott Feldman <sfeldma@gmail.com>:
>>> On Tue, Mar 24, 2015 at 10:58 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> On Tue, Mar 24, 2015 at 10:45:03AM -0700, roopa wrote:
>>>>> On 3/24/15, 9:01 AM, Guenter Roeck wrote:
>>>>>> On Tue, Mar 24, 2015 at 03:29:21PM +0100, Jiri Pirko wrote:
>>>>>>>> diff --git a/drivers/net/ethernet/rocker/rocker.c
>>>>>>>> b/drivers/net/ethernet/rocker/rocker.c
>>>>>>>> index aab962c..0f7217f7 100644
>>>>>>>> --- a/drivers/net/ethernet/rocker/rocker.c
>>>>>>>> +++ b/drivers/net/ethernet/rocker/rocker.c
>>>>>>>> @@ -3931,15 +3931,28 @@ unmap_frag:
>>>>>>>>         return -EMSGSIZE;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static bool rocker_port_dev_check(struct net_device *dev);
>>>>>>>> +
>>>>>>>> static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct
>>>>>>>> net_device *dev)
>>>>>>>> {
>>>>>>>>         struct rocker_port *rocker_port = netdev_priv(dev);
>>>>>>>>         struct rocker *rocker = rocker_port->rocker;
>>>>>>>>         struct rocker_desc_info *desc_info;
>>>>>>>>         struct rocker_tlv *frags;
>>>>>>>> +       struct net_device *in_dev;
>>>>>>>>         int i;
>>>>>>>>         int err;
>>>>>>>>
>>>>>>>> +       if (rocker_port_is_bridged(rocker_port)) {
>>>>>>>> +               rcu_read_lock();
>>>>>>>> +               in_dev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
>>>>>>> Hmm, you iterate over all ports for every xmit call :/
>>>>>>> Would be nicer if skb_iif would be netdev poiter. Not sure it is doable.
>>>>>>>
>>>>>> It may be easier (and faster) to loop through all rocker ports and try to find
>>>>>> one with the same ifindex. Then the dev_check call would not be necessary.
>>>>>>
>>>>> This is still overhead for every packet on the switches we support. The
>>>>> number of ports can go close to 128
>>>>> (40G ports can be broken into 4x10G ports).
>>>>>
>>>> Agreed. Given that, and since dev_get_by_index_rcu uses a hash to find the
>>>> device pointer, it may actually be (much) faster (and the above "iterate
>>>> over all ports" is a bit misleading).
>>>>
>>>> I tested the above approach with DSA and a Marvell switch chip. It works,
>>>> but I am a bit concerned about the per-packet overhead, especially
>>>> in larger networks. I would prefer if there would be a means to 'catch'
>>>> duplicate packets earlier - before they are even created, if that is
>>>> possible.
>>> I'm not so concerned about the per-packet overhead.  For multicast, we
>>> have IGMP snooping.  And big switches are going to have rate controls
>>> on CPU bound traffic, so the CPU should be able to handle the
>>> per-packet overhead with ease.
>> Ok, that works for a model where you are only processing exception
>> traffic, but this may not always be the case, there are things you
>> don't/can't offload on smaller devices, such that you still want the
>> overhead to be a lightweight as possible.
> Ya, that makes sense.  Well, I'll concede this solution.  It has
> helped to draw out the requirements, so that's a positive.
indeed, thanks scott.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-25 17:01                                   ` roopa
@ 2015-03-26  7:44                                     ` Scott Feldman
  2015-03-26  8:20                                       ` Jiri Pirko
  2015-03-30 14:06                                       ` roopa
  0 siblings, 2 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-26  7:44 UTC (permalink / raw)
  To: roopa
  Cc: Florian Fainelli, Guenter Roeck, Jiri Pirko, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:

[cut]

So just to keep the discussion alive (because we really need to solve
this problem), my current thinking is back to Roopa's RFC patch to
mark the skb to avoid fwding in bridge driver.  One idea (sorry if
this was already suggested, thread is long) is to use
swdev_parent_id_get op in the following way:

1) when port interface is added to bridge, bridge calls
swdev_parent_id_get() on port to get switch id.
swdev_parent_id_get() needs to be modified to work on stacked drivers.
For example, if a bond is the new bridge port, swdev_parent_id_get()
on the bond interface should get switch_id for bond member.  We stash
the switch_id in the bridge port private structure for later
comparison.

2) port driver knows the switch_id for the port, so any pkts it sends
up to the CPU which has already been flooded/fwded by the device are
marked with the switch_id.  So the skb is marked, somehow.  Some
options:

  a) add a new skb switch_id field that's wrapped with
CONFIG_NET_SWITCHDEV; seems bad, to add a new field.
  b) put switch_id into skb->cb, but not sure how this doesn't get
stomped on by upper drivers, or how
      bridge knows if something valid is in there or not.  Too bad we
don't have a TLV format for skb->cb, so
      layers could pile things on.  But 48 bytes isn't much to play with.
  c) squash switch_id into u32 skb->mark.  We loose information here
and could collide between switch_ids.

3) bridge driver, in br_flood(), does check if skb switch_id mark
matches dst port switch_id.  If so, skips fwding pkt to that port.
The switch_id compare check compares switch_id len and contents.  If
skb has no switch_id mark, then compare can be skipped.

The only tough part is figuring out 2).  Just need someway to stuff
switch_id into skb.  With bridge driver doing match on switch_id on a
per-packet basis, we can support Florian's case where sometimes we
want the bridge driver to fwd pkts (in those cases, the driver just
leaves skb switch_id mark empty).   Mixed offloaded and non-offloaded
ports works because switch_id comparison fails for non-offload ports.
Same for mixed switches bridged together.  The per-pkt overhead
concerns are minimized.

-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26  7:44                                     ` Scott Feldman
@ 2015-03-26  8:20                                       ` Jiri Pirko
  2015-03-26 14:28                                         ` Scott Feldman
  2015-03-30 14:06                                       ` roopa
  1 sibling, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2015-03-26  8:20 UTC (permalink / raw)
  To: Scott Feldman
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

Thu, Mar 26, 2015 at 08:44:27AM CET, sfeldma@gmail.com wrote:
>On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
>
>[cut]
>
>So just to keep the discussion alive (because we really need to solve
>this problem), my current thinking is back to Roopa's RFC patch to
>mark the skb to avoid fwding in bridge driver.  One idea (sorry if
>this was already suggested, thread is long) is to use
>swdev_parent_id_get op in the following way:
>
>1) when port interface is added to bridge, bridge calls
>swdev_parent_id_get() on port to get switch id.
>swdev_parent_id_get() needs to be modified to work on stacked drivers.
>For example, if a bond is the new bridge port, swdev_parent_id_get()
>on the bond interface should get switch_id for bond member.  We stash
>the switch_id in the bridge port private structure for later
>comparison.

Nope, that cannot work. You can bond 2 ports each belonging to a
different switch.

swdev_parent_id_get should not work on stacked devices ever.

>
>2) port driver knows the switch_id for the port, so any pkts it sends
>up to the CPU which has already been flooded/fwded by the device are
>marked with the switch_id.  So the skb is marked, somehow.  Some
>options:
>
>  a) add a new skb switch_id field that's wrapped with
>CONFIG_NET_SWITCHDEV; seems bad, to add a new field.
>  b) put switch_id into skb->cb, but not sure how this doesn't get
>stomped on by upper drivers, or how
>      bridge knows if something valid is in there or not.  Too bad we
>don't have a TLV format for skb->cb, so
>      layers could pile things on.  But 48 bytes isn't much to play with.
>  c) squash switch_id into u32 skb->mark.  We loose information here
>and could collide between switch_ids.
>
>3) bridge driver, in br_flood(), does check if skb switch_id mark
>matches dst port switch_id.  If so, skips fwding pkt to that port.
>The switch_id compare check compares switch_id len and contents.  If
>skb has no switch_id mark, then compare can be skipped.
>
>
>The only tough part is figuring out 2).  Just need someway to stuff
>switch_id into skb.  With bridge driver doing match on switch_id on a
>per-packet basis, we can support Florian's case where sometimes we
>want the bridge driver to fwd pkts (in those cases, the driver just
>leaves skb switch_id mark empty).   Mixed offloaded and non-offloaded
>ports works because switch_id comparison fails for non-offload ports.
>Same for mixed switches bridged together.  The per-pkt overhead
>concerns are minimized.
>
>-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26  8:20                                       ` Jiri Pirko
@ 2015-03-26 14:28                                         ` Scott Feldman
  2015-03-26 14:49                                           ` Jiri Pirko
  0 siblings, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-26 14:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On Thu, Mar 26, 2015 at 1:20 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Mar 26, 2015 at 08:44:27AM CET, sfeldma@gmail.com wrote:
>>On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
>>
>>[cut]
>>
>>So just to keep the discussion alive (because we really need to solve
>>this problem), my current thinking is back to Roopa's RFC patch to
>>mark the skb to avoid fwding in bridge driver.  One idea (sorry if
>>this was already suggested, thread is long) is to use
>>swdev_parent_id_get op in the following way:
>>
>>1) when port interface is added to bridge, bridge calls
>>swdev_parent_id_get() on port to get switch id.
>>swdev_parent_id_get() needs to be modified to work on stacked drivers.
>>For example, if a bond is the new bridge port, swdev_parent_id_get()
>>on the bond interface should get switch_id for bond member.  We stash
>>the switch_id in the bridge port private structure for later
>>comparison.
>
> Nope, that cannot work. You can bond 2 ports each belonging to a
> different switch.

Are you thinking about two switch ASICs in the same box, and bonding
ports from each?  Or are you thinking about bonding ports from
different boxes, ala MLAG?

In the first case the bond would report NULL switch_id if the member
ports don't all have the same switch_id.  If bond switch_id is NULL,
the bridge driver would fwd pkts to bond and now bond would make same
check as bridge: if dst port switch_id is same as skb switch_id, then
drop pkt.  In bridge, if bond switch_id is non-NULL and matches skb
switch_id, then drop pkt.  So it works as desired for this case.  It
requires the bonding/teaming driver to modify the default behavior for
swdev_parent_id_get() to only return switch_id if all ports agree on
switch_id.

For second case using MLAG, I suspect bond member port switch_ids
would likely be different, and so with same logic in bonding/bridge
drivers as above in first case, the pkt would be fwded down.

Is there another case to consider?  I think converting
swdev_parent_id_get() to use same algo we have for stp, allowing for
any layer to override like in my bonding example, will have benefits
down the road.

What is the argument for not allowing stacked version of swdev_parent_id_get()?

-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26 14:28                                         ` Scott Feldman
@ 2015-03-26 14:49                                           ` Jiri Pirko
  2015-03-27  1:08                                             ` Simon Horman
  2015-03-27  6:43                                             ` Scott Feldman
  0 siblings, 2 replies; 51+ messages in thread
From: Jiri Pirko @ 2015-03-26 14:49 UTC (permalink / raw)
  To: Scott Feldman
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:
>On Thu, Mar 26, 2015 at 1:20 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Mar 26, 2015 at 08:44:27AM CET, sfeldma@gmail.com wrote:
>>>On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
>>>
>>>[cut]
>>>
>>>So just to keep the discussion alive (because we really need to solve
>>>this problem), my current thinking is back to Roopa's RFC patch to
>>>mark the skb to avoid fwding in bridge driver.  One idea (sorry if
>>>this was already suggested, thread is long) is to use
>>>swdev_parent_id_get op in the following way:
>>>
>>>1) when port interface is added to bridge, bridge calls
>>>swdev_parent_id_get() on port to get switch id.
>>>swdev_parent_id_get() needs to be modified to work on stacked drivers.
>>>For example, if a bond is the new bridge port, swdev_parent_id_get()
>>>on the bond interface should get switch_id for bond member.  We stash
>>>the switch_id in the bridge port private structure for later
>>>comparison.
>>
>> Nope, that cannot work. You can bond 2 ports each belonging to a
>> different switch.
>
>Are you thinking about two switch ASICs in the same box, and bonding
>ports from each?  Or are you thinking about bonding ports from
>different boxes, ala MLAG?

One machine, 2 switches.

>
>In the first case the bond would report NULL switch_id if the member
>ports don't all have the same switch_id.  If bond switch_id is NULL,
>the bridge driver would fwd pkts to bond and now bond would make same
>check as bridge: if dst port switch_id is same as skb switch_id, then
>drop pkt.  In bridge, if bond switch_id is non-NULL and matches skb
>switch_id, then drop pkt.  So it works as desired for this case.  It
>requires the bonding/teaming driver to modify the default behavior for
>swdev_parent_id_get() to only return switch_id if all ports agree on
>switch_id.
>
>For second case using MLAG, I suspect bond member port switch_ids
>would likely be different, and so with same logic in bonding/bridge
>drivers as above in first case, the pkt would be fwded down.
>
>Is there another case to consider?  I think converting
>swdev_parent_id_get() to use same algo we have for stp, allowing for
>any layer to override like in my bonding example, will have benefits
>down the road.
>
>What is the argument for not allowing stacked version of swdev_parent_id_get()?

That was suppose wo identify a switch port. "ip link" will show you that
and you see right away what is going on. If bond implements that, that
brigs a mess. I don't like that.

>
>-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26 14:49                                           ` Jiri Pirko
@ 2015-03-27  1:08                                             ` Simon Horman
  2015-03-27  6:02                                               ` Jiri Pirko
  2015-03-27  6:43                                             ` Scott Feldman
  1 sibling, 1 reply; 51+ messages in thread
From: Simon Horman @ 2015-03-27  1:08 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Scott Feldman, roopa, Florian Fainelli, Guenter Roeck,
	John Fastabend, Andrew Lunn, David Miller, Arad, Ronen, Netdev

On Thu, Mar 26, 2015 at 03:49:22PM +0100, Jiri Pirko wrote:
> Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:
> >On Thu, Mar 26, 2015 at 1:20 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >> Thu, Mar 26, 2015 at 08:44:27AM CET, sfeldma@gmail.com wrote:
> >>>On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
> >>>
> >>>[cut]
> >>>
> >>>So just to keep the discussion alive (because we really need to solve
> >>>this problem), my current thinking is back to Roopa's RFC patch to
> >>>mark the skb to avoid fwding in bridge driver.  One idea (sorry if
> >>>this was already suggested, thread is long) is to use
> >>>swdev_parent_id_get op in the following way:
> >>>
> >>>1) when port interface is added to bridge, bridge calls
> >>>swdev_parent_id_get() on port to get switch id.
> >>>swdev_parent_id_get() needs to be modified to work on stacked drivers.
> >>>For example, if a bond is the new bridge port, swdev_parent_id_get()
> >>>on the bond interface should get switch_id for bond member.  We stash
> >>>the switch_id in the bridge port private structure for later
> >>>comparison.
> >>
> >> Nope, that cannot work. You can bond 2 ports each belonging to a
> >> different switch.
> >
> >Are you thinking about two switch ASICs in the same box, and bonding
> >ports from each?  Or are you thinking about bonding ports from
> >different boxes, ala MLAG?
> 
> One machine, 2 switches.
> 
> >
> >In the first case the bond would report NULL switch_id if the member
> >ports don't all have the same switch_id.  If bond switch_id is NULL,
> >the bridge driver would fwd pkts to bond and now bond would make same
> >check as bridge: if dst port switch_id is same as skb switch_id, then
> >drop pkt.  In bridge, if bond switch_id is non-NULL and matches skb
> >switch_id, then drop pkt.  So it works as desired for this case.  It
> >requires the bonding/teaming driver to modify the default behavior for
> >swdev_parent_id_get() to only return switch_id if all ports agree on
> >switch_id.
> >
> >For second case using MLAG, I suspect bond member port switch_ids
> >would likely be different, and so with same logic in bonding/bridge
> >drivers as above in first case, the pkt would be fwded down.
> >
> >Is there another case to consider?  I think converting
> >swdev_parent_id_get() to use same algo we have for stp, allowing for
> >any layer to override like in my bonding example, will have benefits
> >down the road.
> >
> >What is the argument for not allowing stacked version of swdev_parent_id_get()?
> 
> That was suppose wo identify a switch port. "ip link" will show you that
> and you see right away what is going on. If bond implements that, that
> brigs a mess. I don't like that.

I'm not sure that I follow how this messes things up from a bridging point
of view. Would it help if bonds consistently returned a NULL parent id
even if all its slaves have the same parent id?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-27  1:08                                             ` Simon Horman
@ 2015-03-27  6:02                                               ` Jiri Pirko
  0 siblings, 0 replies; 51+ messages in thread
From: Jiri Pirko @ 2015-03-27  6:02 UTC (permalink / raw)
  To: Simon Horman
  Cc: Scott Feldman, roopa, Florian Fainelli, Guenter Roeck,
	John Fastabend, Andrew Lunn, David Miller, Arad, Ronen, Netdev

Fri, Mar 27, 2015 at 02:08:23AM CET, simon.horman@netronome.com wrote:
>On Thu, Mar 26, 2015 at 03:49:22PM +0100, Jiri Pirko wrote:
>> Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:
>> >On Thu, Mar 26, 2015 at 1:20 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> >> Thu, Mar 26, 2015 at 08:44:27AM CET, sfeldma@gmail.com wrote:
>> >>>On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> >>>
>> >>>[cut]
>> >>>
>> >>>So just to keep the discussion alive (because we really need to solve
>> >>>this problem), my current thinking is back to Roopa's RFC patch to
>> >>>mark the skb to avoid fwding in bridge driver.  One idea (sorry if
>> >>>this was already suggested, thread is long) is to use
>> >>>swdev_parent_id_get op in the following way:
>> >>>
>> >>>1) when port interface is added to bridge, bridge calls
>> >>>swdev_parent_id_get() on port to get switch id.
>> >>>swdev_parent_id_get() needs to be modified to work on stacked drivers.
>> >>>For example, if a bond is the new bridge port, swdev_parent_id_get()
>> >>>on the bond interface should get switch_id for bond member.  We stash
>> >>>the switch_id in the bridge port private structure for later
>> >>>comparison.
>> >>
>> >> Nope, that cannot work. You can bond 2 ports each belonging to a
>> >> different switch.
>> >
>> >Are you thinking about two switch ASICs in the same box, and bonding
>> >ports from each?  Or are you thinking about bonding ports from
>> >different boxes, ala MLAG?
>> 
>> One machine, 2 switches.
>> 
>> >
>> >In the first case the bond would report NULL switch_id if the member
>> >ports don't all have the same switch_id.  If bond switch_id is NULL,
>> >the bridge driver would fwd pkts to bond and now bond would make same
>> >check as bridge: if dst port switch_id is same as skb switch_id, then
>> >drop pkt.  In bridge, if bond switch_id is non-NULL and matches skb
>> >switch_id, then drop pkt.  So it works as desired for this case.  It
>> >requires the bonding/teaming driver to modify the default behavior for
>> >swdev_parent_id_get() to only return switch_id if all ports agree on
>> >switch_id.
>> >
>> >For second case using MLAG, I suspect bond member port switch_ids
>> >would likely be different, and so with same logic in bonding/bridge
>> >drivers as above in first case, the pkt would be fwded down.
>> >
>> >Is there another case to consider?  I think converting
>> >swdev_parent_id_get() to use same algo we have for stp, allowing for
>> >any layer to override like in my bonding example, will have benefits
>> >down the road.
>> >
>> >What is the argument for not allowing stacked version of swdev_parent_id_get()?
>> 
>> That was suppose wo identify a switch port. "ip link" will show you that
>> and you see right away what is going on. If bond implements that, that
>> brigs a mess. I don't like that.
>
>I'm not sure that I follow how this messes things up from a bridging point
>of view. Would it help if bonds consistently returned a NULL parent id
>even if all its slaves have the same parent id?

It's not about bridging point of view. Now, it is clear that if switchid
is defined for a device, that device is a switch port. According to the
switchid, you can find out multiple ports are part of single switch
chip. It is very clear for user.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26 14:49                                           ` Jiri Pirko
  2015-03-27  1:08                                             ` Simon Horman
@ 2015-03-27  6:43                                             ` Scott Feldman
  2015-03-27  7:01                                               ` Jiri Pirko
  1 sibling, 1 reply; 51+ messages in thread
From: Scott Feldman @ 2015-03-27  6:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On Thu, Mar 26, 2015 at 7:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:

>>What is the argument for not allowing stacked version of swdev_parent_id_get()?
>
> That was suppose wo identify a switch port. "ip link" will show you that
> and you see right away what is going on. If bond implements that, that
> brigs a mess. I don't like that.

A bond is an aggregator of ports and showing switch_id for bond is no
more messy than showing link speed for bond, derived from link speed
on member ports.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-27  6:43                                             ` Scott Feldman
@ 2015-03-27  7:01                                               ` Jiri Pirko
  2015-03-27 23:19                                                 ` Scott Feldman
  0 siblings, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2015-03-27  7:01 UTC (permalink / raw)
  To: Scott Feldman
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

Fri, Mar 27, 2015 at 07:43:35AM CET, sfeldma@gmail.com wrote:
>On Thu, Mar 26, 2015 at 7:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:
>
>>>What is the argument for not allowing stacked version of swdev_parent_id_get()?
>>
>> That was suppose wo identify a switch port. "ip link" will show you that
>> and you see right away what is going on. If bond implements that, that
>> brigs a mess. I don't like that.
>
>A bond is an aggregator of ports and showing switch_id for bond is no
>more messy than showing link speed for bond, derived from link speed
>on member ports.

Well, I don't think so. I think the switchid should be showed on port
devices only.

Please remind me why exactly you need this?

Also should switch id be showed on other devices? Like bridges, vlans,
macvlans, etc?
For bond it sometimes will be shown (2ports of the same switch)
sometimes not (2ports each of the different switch).
That looks messy to me. I might be wrong of course...

Jiri

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-27  7:01                                               ` Jiri Pirko
@ 2015-03-27 23:19                                                 ` Scott Feldman
  0 siblings, 0 replies; 51+ messages in thread
From: Scott Feldman @ 2015-03-27 23:19 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: roopa, Florian Fainelli, Guenter Roeck, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On Fri, Mar 27, 2015 at 12:01 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Mar 27, 2015 at 07:43:35AM CET, sfeldma@gmail.com wrote:
>>On Thu, Mar 26, 2015 at 7:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Thu, Mar 26, 2015 at 03:28:28PM CET, sfeldma@gmail.com wrote:
>>
>>>>What is the argument for not allowing stacked version of swdev_parent_id_get()?
>>>
>>> That was suppose wo identify a switch port. "ip link" will show you that
>>> and you see right away what is going on. If bond implements that, that
>>> brigs a mess. I don't like that.
>>
>>A bond is an aggregator of ports and showing switch_id for bond is no
>>more messy than showing link speed for bond, derived from link speed
>>on member ports.
>
> Well, I don't think so. I think the switchid should be showed on port
> devices only.

That's doable by making switchid sysfs and netlink interfaces (the
user-visible interfaces) not recurse, and for internal usage, to get
switchid of aggregate device, allow recursion.  I'll be posting
patches soon so you'll see what I mean from code.

> Please remind me why exactly you need this?

Central theme of this thread: to know if bcast/mcast/unknown ucast
pkts should be flooded by bridge or not on ports hardware may have
already flooded.

> Also should switch id be showed on other devices? Like bridges, vlans,
> macvlans, etc?
> For bond it sometimes will be shown (2ports of the same switch)
> sometimes not (2ports each of the different switch).
> That looks messy to me. I might be wrong of course...

No problem, we'll make only the base port device show switchid.

-scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets
  2015-03-26  7:44                                     ` Scott Feldman
  2015-03-26  8:20                                       ` Jiri Pirko
@ 2015-03-30 14:06                                       ` roopa
  1 sibling, 0 replies; 51+ messages in thread
From: roopa @ 2015-03-30 14:06 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Florian Fainelli, Guenter Roeck, Jiri Pirko, John Fastabend,
	Andrew Lunn, David Miller, Arad, Ronen, Netdev

On 3/26/15, 12:44 AM, Scott Feldman wrote:
> On Wed, Mar 25, 2015 at 10:01 AM, roopa <roopa@cumulusnetworks.com> wrote:
>
> [cut]
>
> So just to keep the discussion alive (because we really need to solve
> this problem), my current thinking is back to Roopa's RFC patch to
> mark the skb to avoid fwding in bridge driver.  One idea (sorry if
> this was already suggested, thread is long) is to use
> swdev_parent_id_get op in the following way:
>
> 1) when port interface is added to bridge, bridge calls
> swdev_parent_id_get() on port to get switch id.
> swdev_parent_id_get() needs to be modified to work on stacked drivers.
> For example, if a bond is the new bridge port, swdev_parent_id_get()
> on the bond interface should get switch_id for bond member.  We stash
> the switch_id in the bridge port private structure for later
> comparison.
>
> 2) port driver knows the switch_id for the port, so any pkts it sends
> up to the CPU which has already been flooded/fwded by the device are
> marked with the switch_id.  So the skb is marked, somehow.  Some
> options:
>
>    a) add a new skb switch_id field that's wrapped with
> CONFIG_NET_SWITCHDEV; seems bad, to add a new field.
>    b) put switch_id into skb->cb, but not sure how this doesn't get
> stomped on by upper drivers, or how
>        bridge knows if something valid is in there or not.  Too bad we
> don't have a TLV format for skb->cb, so
>        layers could pile things on.  But 48 bytes isn't much to play with.
>    c) squash switch_id into u32 skb->mark.  We loose information here
> and could collide between switch_ids.
>
> 3) bridge driver, in br_flood(), does check if skb switch_id mark
> matches dst port switch_id.  If so, skips fwding pkt to that port.
> The switch_id compare check compares switch_id len and contents.  If
> skb has no switch_id mark, then compare can be skipped.
>
>
> The only tough part is figuring out 2).

c) might be out of the question if userspace is using any markings and 
it may get overwritten.

> Just need someway to stuff
> switch_id into skb.  With bridge driver doing match on switch_id on a
> per-packet basis, we can support Florian's case where sometimes we
> want the bridge driver to fwd pkts (in those cases, the driver just
> leaves skb switch_id mark empty).
I have this case too and that's why i had the flag in the skb.
Agree, having switchid there will help with the overhead associated with 
looking up the switchid again.

>    Mixed offloaded and non-offloaded
> ports works because switch_id comparison fails for non-offload ports.
> Same for mixed switches bridged together.  The per-pkt overhead
> concerns are minimized.
>
Thanks for keeping this discussion going.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2015-03-30 14:06 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-20 16:58 [PATCH net-next RFC v2] switchdev: bridge: drop hardware forwarded packets roopa
2015-03-20 17:11 ` John Fastabend
2015-03-20 18:13   ` Scott Feldman
2015-03-20 18:30     ` John Fastabend
2015-03-20 22:06     ` roopa
2015-03-20 22:37       ` Scott Feldman
2015-03-20 23:30         ` roopa
2015-03-21  0:26           ` Scott Feldman
2015-03-21  5:53             ` roopa
2015-03-20 21:03   ` roopa
2015-03-20 21:23     ` John Fastabend
2015-03-20 22:04       ` Andrew Lunn
2015-03-20 23:12       ` roopa
2015-03-20 18:03 ` Scott Feldman
2015-03-20 21:20   ` roopa
2015-03-20 20:36 ` David Miller
2015-03-20 21:36   ` roopa
2015-03-20 22:09     ` Andrew Lunn
2015-03-20 23:43       ` Florian Fainelli
2015-03-23  0:22       ` Guenter Roeck
2015-03-23  1:33         ` John Fastabend
2015-03-23  2:57           ` Guenter Roeck
2015-03-23  3:18             ` John Fastabend
2015-03-23  3:33               ` Guenter Roeck
2015-03-23 17:12                 ` roopa
2015-03-24  5:59                   ` Scott Feldman
2015-03-24 13:13                     ` Guenter Roeck
2015-03-24 18:08                       ` Scott Feldman
2015-03-24 14:29                     ` Jiri Pirko
2015-03-24 16:01                       ` Guenter Roeck
2015-03-24 17:45                         ` roopa
2015-03-24 17:58                           ` Guenter Roeck
2015-03-24 18:14                             ` Scott Feldman
2015-03-25  3:10                               ` Guenter Roeck
2015-03-25  3:46                               ` Florian Fainelli
2015-03-25  5:06                                 ` Scott Feldman
2015-03-25 17:01                                   ` roopa
2015-03-26  7:44                                     ` Scott Feldman
2015-03-26  8:20                                       ` Jiri Pirko
2015-03-26 14:28                                         ` Scott Feldman
2015-03-26 14:49                                           ` Jiri Pirko
2015-03-27  1:08                                             ` Simon Horman
2015-03-27  6:02                                               ` Jiri Pirko
2015-03-27  6:43                                             ` Scott Feldman
2015-03-27  7:01                                               ` Jiri Pirko
2015-03-27 23:19                                                 ` Scott Feldman
2015-03-30 14:06                                       ` roopa
2015-03-24 18:48                             ` David Christensen
2015-03-24 17:58                         ` Scott Feldman
2015-03-23 17:10           ` roopa
2015-03-23 14:00         ` roopa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.