All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1
@ 2021-10-03 22:23 Vladimir Oltean
  2021-10-03 22:23 ` [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 Vladimir Oltean
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-03 22:23 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, David S. Miller
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Tobias Waldekranz

This is part 1 of a series of fixes to the bridge TX forwarding offload
feature introduced for v5.15. Sadly, the other fixes are so intrusive
that they cannot be reasonably be sent to the "net" tree, as they also
include API changes. So they are left as part 2 for net-next.

Vladimir Oltean (2):
  net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware
    bridges using VID 0
  net: dsa: fix bridge_num not getting cleared after ports leaving the
    bridge

 net/dsa/dsa2.c    |  2 +-
 net/dsa/tag_dsa.c | 20 ++------------------
 2 files changed, 3 insertions(+), 19 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
  2021-10-03 22:23 [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Vladimir Oltean
@ 2021-10-03 22:23 ` Vladimir Oltean
  2021-10-04 10:55   ` Tobias Waldekranz
  2021-10-03 22:23 ` [PATCH net 2/2] net: dsa: fix bridge_num not getting cleared after ports leaving the bridge Vladimir Oltean
  2021-10-04  6:38 ` [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Tobias Waldekranz
  2 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-03 22:23 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, David S. Miller
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Tobias Waldekranz

The present code is structured this way due to an incomplete thought
process. In Documentation/networking/switchdev.rst we document that if a
bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
port (or on the bridge itself, for that matter) should not affect the
ability to receive and transmit tagged or untagged packets.

If the bridge on behalf of which we are sending this packet is
VLAN-aware, then the TX forwarding offload API ensures that the skb will
be VLAN-tagged (if the packet was sent by user space as untagged, it
will get transmitted town to the driver as tagged with the bridge
device's pvid). But if the bridge is VLAN-unaware, it may or may not be
VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
idea that we should emulate what is being done in the VLAN-aware case.
But we shouldn't.

It appears that injecting packets using a VLAN ID of 0 serves the
purpose of forwarding the packets to the egress port with no VLAN tag
added or stripped by the hardware, and no filtering being performed.
So we can simply remove the superfluous logic.

There are in fact two independent reasons why having this logic is broken:

(1) When CONFIG_BRIDGE_VLAN_FILTERING=n, we call br_vlan_get_pvid_rcu()
    but that returns an error and we do error out, dropping all packets
    on xmit. Not really smart. This is also an issue when the user
    deletes the bridge pvid:

    $ bridge vlan del dev br0 vid 1 self

    As mentioned, in both cases, packets should still flow freely, and
    they do just that on any net device where the bridge is not offloaded,
    but on mv88e6xxx they don't.

(2) that code actually triggers a lockdep warning due to the fact that
    it dereferences bridge private data that assumes rcu_preempt protection
    (rcu_read_lock), but rcu_read_lock is not actually held during
    .ndo_start_xmit, but rather rcu_read_lock_bh (rcu_bh), which has its
    own lockdep keys.

The solution to both problems is the same: delete the broken code.

Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_dsa.c | 20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 77d0ce89ab77..7e35bcda91c9 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -129,12 +129,9 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 	u8 tag_dev, tag_port;
 	enum dsa_cmd cmd;
 	u8 *dsa_header;
-	u16 pvid = 0;
-	int err;
 
 	if (skb->offload_fwd_mark) {
 		struct dsa_switch_tree *dst = dp->ds->dst;
-		struct net_device *br = dp->bridge_dev;
 
 		cmd = DSA_CMD_FORWARD;
 
@@ -144,19 +141,6 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 		 */
 		tag_dev = dst->last_switch + 1 + dp->bridge_num;
 		tag_port = 0;
-
-		/* If we are offloading forwarding for a VLAN-unaware bridge,
-		 * inject packets to hardware using the bridge's pvid, since
-		 * that's where the packets ingressed from.
-		 */
-		if (!br_vlan_enabled(br)) {
-			/* Safe because __dev_queue_xmit() runs under
-			 * rcu_read_lock_bh()
-			 */
-			err = br_vlan_get_pvid_rcu(br, &pvid);
-			if (err)
-				return NULL;
-		}
 	} else {
 		cmd = DSA_CMD_FROM_CPU;
 		tag_dev = dp->ds->index;
@@ -188,8 +172,8 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 
 		dsa_header[0] = (cmd << 6) | tag_dev;
 		dsa_header[1] = tag_port << 3;
-		dsa_header[2] = pvid >> 8;
-		dsa_header[3] = pvid & 0xff;
+		dsa_header[2] = 0;
+		dsa_header[3] = 0;
 	}
 
 	return skb;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 2/2] net: dsa: fix bridge_num not getting cleared after ports leaving the bridge
  2021-10-03 22:23 [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Vladimir Oltean
  2021-10-03 22:23 ` [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 Vladimir Oltean
@ 2021-10-03 22:23 ` Vladimir Oltean
  2021-10-04  6:38 ` [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Tobias Waldekranz
  2 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-03 22:23 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, David S. Miller
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Tobias Waldekranz

The dp->bridge_num is zero-based, with -1 being the encoding for an
invalid value. But dsa_bridge_num_put used to check for an invalid value
by comparing bridge_num with 0, which is of course incorrect.

The result is that the bridge_num will never get cleared by
dsa_bridge_num_put, and further port joins to other bridges will get a
bridge_num larger than the previous one, and once all the available
bridges with TX forwarding offload supported by the hardware get
exhausted, the TX forwarding offload feature is simply disabled.

In the case of sja1105, 7 iterations of the loop below are enough to
exhaust the TX forwarding offload bits, and further bridge joins operate
without that feature.

ip link add br0 type bridge vlan_filtering 1

while :; do
        ip link set sw0p2 master br0 && sleep 1
        ip link set sw0p2 nomaster && sleep 1
done

This issue is enough of an indication that having the dp->bridge_num
invalid encoding be a negative number is prone to bugs, so this will be
changed to a one-based value, with the dp->bridge_num of zero being the
indication of no bridge. However, that is material for net-next.

Fixes: f5e165e72b29 ("net: dsa: track unique bridge numbers across all DSA switch trees")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index b29262eee00b..6d5cc0217133 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -170,7 +170,7 @@ void dsa_bridge_num_put(const struct net_device *bridge_dev, int bridge_num)
 	/* Check if the bridge is still in use, otherwise it is time
 	 * to clean it up so we can reuse this bridge_num later.
 	 */
-	if (!dsa_bridge_num_find(bridge_dev))
+	if (dsa_bridge_num_find(bridge_dev) < 0)
 		clear_bit(bridge_num, &dsa_fwd_offloading_bridges);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1
  2021-10-03 22:23 [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Vladimir Oltean
  2021-10-03 22:23 ` [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 Vladimir Oltean
  2021-10-03 22:23 ` [PATCH net 2/2] net: dsa: fix bridge_num not getting cleared after ports leaving the bridge Vladimir Oltean
@ 2021-10-04  6:38 ` Tobias Waldekranz
  2021-10-04  9:38   ` Vladimir Oltean
  2 siblings, 1 reply; 9+ messages in thread
From: Tobias Waldekranz @ 2021-10-04  6:38 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Jakub Kicinski, David S. Miller
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 01:23, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> This is part 1 of a series of fixes to the bridge TX forwarding offload
> feature introduced for v5.15. Sadly, the other fixes are so intrusive
> that they cannot be reasonably be sent to the "net" tree, as they also
> include API changes. So they are left as part 2 for net-next.

Please give me some time to test this before merging. My spider sense is
tingling.

Have you tried this (1) on a multi-chip system and (2) in a multi-bridge
setup?

--
Tobias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1
  2021-10-04  6:38 ` [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Tobias Waldekranz
@ 2021-10-04  9:38   ` Vladimir Oltean
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-04  9:38 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 08:38:44AM +0200, Tobias Waldekranz wrote:
> On Mon, Oct 04, 2021 at 01:23, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > This is part 1 of a series of fixes to the bridge TX forwarding offload
> > feature introduced for v5.15. Sadly, the other fixes are so intrusive
> > that they cannot be reasonably be sent to the "net" tree, as they also
> > include API changes. So they are left as part 2 for net-next.
> 
> Please give me some time to test this before merging. My spider sense is
> tingling.
> 
> Have you tried this (1) on a multi-chip system and (2) in a multi-bridge
> setup?

Hey, nice hearing from you :) more testing and feedback is always welcome.

When you say "this", which patch are you talking about specifically?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
  2021-10-03 22:23 ` [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 Vladimir Oltean
@ 2021-10-04 10:55   ` Tobias Waldekranz
  2021-10-04 11:16     ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Waldekranz @ 2021-10-04 10:55 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Jakub Kicinski, David S. Miller
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 01:23, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> The present code is structured this way due to an incomplete thought
> process. In Documentation/networking/switchdev.rst we document that if a
> bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
> port (or on the bridge itself, for that matter) should not affect the
> ability to receive and transmit tagged or untagged packets.
>
> If the bridge on behalf of which we are sending this packet is
> VLAN-aware, then the TX forwarding offload API ensures that the skb will
> be VLAN-tagged (if the packet was sent by user space as untagged, it
> will get transmitted town to the driver as tagged with the bridge
> device's pvid). But if the bridge is VLAN-unaware, it may or may not be
> VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
> idea that we should emulate what is being done in the VLAN-aware case.
> But we shouldn't.

IMO, the problem here stems from a discrepancy between LinkStreet
devices and the bridge, in how PVID is interpreted. For the bridge, when
VLAN filtering is disabled, ingressing traffic will be assigned to VID
0. This is true even if the port's PVID is set. A mv88e6xxx port who's
QMode bits are set to 00 (802.1Q disabled) OTOH, will assign ingressing
traffic to its PVID.

So, in order to match the bridge's behavior, I think we need to rethink
how mv88e6xxx deals with non-filtering bridges. At first, one might be
tempted to simply leave the hardware PVID at 0. The PVT can then be used
to create isolation barriers between different bridges. ATU isolation is
really what kills this approach. Since there is no VLAN information in
the tag, there is no way to separate flows from different bridges into
different FIDs. This is the issue I discovered with the forward
offloading series.

> It appears that injecting packets using a VLAN ID of 0 serves the
> purpose of forwarding the packets to the egress port with no VLAN tag
> added or stripped by the hardware, and no filtering being performed.
> So we can simply remove the superfluous logic.

The problem with this patch is that return traffic from the CPU is sent
asymmetrically over a different VLAN, which in turn means that it will
perform the DA lookup in a different FID (0). The result is that traffic
does flow, but for the wrong reason. CPU -> port traffic is now flooded
as unknown unicast. An example:

(:aa / 10.1)
    br0
   /   \
sw0p1 sw0p2
\         /
 \       /
  \     /
    CPU
     |
  .--0--.
  | sw0 |
  '-1-2-'
    | '-- sniffer
    '---- host (:bb / 10.2)

br0 is created using the default settings. sw0 will have (among others)
static entries for the CPU:

    fid:0 addr:aa type:static port:0
    fid:1 addr:aa type:static port:0

1. host sends an ARP for 10.1.

2. sw0 will add this entry (since vlan_default_pvid is 1):

    fid:1 addr:bb type:age-7 port:1

3. CPU replies with a FORWARD (VID 0).

4. sw0 will perform a DA lookup in FID 0, missing the entry learned in
   step 2.

5. sw0 floods the frame as unknown unicast to both host and sniffer.

Conversely, if flooding of unknown unicast is disabled on sw0p1:

    $ bridge link set dev sw0p1 flood off

host can no longer communicate with the CPU.

As I alluded to in the forward offloading thread, I think we need to
move a scheme where:

1. mv88e6xxx clears ds->configure_vlan_while_not_filtering.
2. Assigns a free VID (and by extension a FID) in the VTU to each
   non-filtering bridge.

With this in place, the tagger could use the VID associated with the
egressing port's bridge in the tag.

> There are in fact two independent reasons why having this logic is broken:
>
> (1) When CONFIG_BRIDGE_VLAN_FILTERING=n, we call br_vlan_get_pvid_rcu()
>     but that returns an error and we do error out, dropping all packets
>     on xmit. Not really smart. This is also an issue when the user
>     deletes the bridge pvid:
>
>     $ bridge vlan del dev br0 vid 1 self
>
>     As mentioned, in both cases, packets should still flow freely, and
>     they do just that on any net device where the bridge is not offloaded,
>     but on mv88e6xxx they don't.
>
> (2) that code actually triggers a lockdep warning due to the fact that
>     it dereferences bridge private data that assumes rcu_preempt protection
>     (rcu_read_lock), but rcu_read_lock is not actually held during
>     .ndo_start_xmit, but rather rcu_read_lock_bh (rcu_bh), which has its
>     own lockdep keys.
>
> The solution to both problems is the same: delete the broken code.
>
> Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/
> Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/dsa/tag_dsa.c | 20 ++------------------
>  1 file changed, 2 insertions(+), 18 deletions(-)
>
> diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
> index 77d0ce89ab77..7e35bcda91c9 100644
> --- a/net/dsa/tag_dsa.c
> +++ b/net/dsa/tag_dsa.c
> @@ -129,12 +129,9 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
>  	u8 tag_dev, tag_port;
>  	enum dsa_cmd cmd;
>  	u8 *dsa_header;
> -	u16 pvid = 0;
> -	int err;
>  
>  	if (skb->offload_fwd_mark) {
>  		struct dsa_switch_tree *dst = dp->ds->dst;
> -		struct net_device *br = dp->bridge_dev;
>  
>  		cmd = DSA_CMD_FORWARD;
>  
> @@ -144,19 +141,6 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
>  		 */
>  		tag_dev = dst->last_switch + 1 + dp->bridge_num;
>  		tag_port = 0;
> -
> -		/* If we are offloading forwarding for a VLAN-unaware bridge,
> -		 * inject packets to hardware using the bridge's pvid, since
> -		 * that's where the packets ingressed from.
> -		 */
> -		if (!br_vlan_enabled(br)) {
> -			/* Safe because __dev_queue_xmit() runs under
> -			 * rcu_read_lock_bh()
> -			 */
> -			err = br_vlan_get_pvid_rcu(br, &pvid);
> -			if (err)
> -				return NULL;
> -		}
>  	} else {
>  		cmd = DSA_CMD_FROM_CPU;
>  		tag_dev = dp->ds->index;
> @@ -188,8 +172,8 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
>  
>  		dsa_header[0] = (cmd << 6) | tag_dev;
>  		dsa_header[1] = tag_port << 3;
> -		dsa_header[2] = pvid >> 8;
> -		dsa_header[3] = pvid & 0xff;
> +		dsa_header[2] = 0;
> +		dsa_header[3] = 0;
>  	}
>  
>  	return skb;
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
  2021-10-04 10:55   ` Tobias Waldekranz
@ 2021-10-04 11:16     ` Vladimir Oltean
  2021-10-04 13:45       ` Tobias Waldekranz
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-04 11:16 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 12:55:27PM +0200, Tobias Waldekranz wrote:
> On Mon, Oct 04, 2021 at 01:23, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > The present code is structured this way due to an incomplete thought
> > process. In Documentation/networking/switchdev.rst we document that if a
> > bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
> > port (or on the bridge itself, for that matter) should not affect the
> > ability to receive and transmit tagged or untagged packets.
> >
> > If the bridge on behalf of which we are sending this packet is
> > VLAN-aware, then the TX forwarding offload API ensures that the skb will
> > be VLAN-tagged (if the packet was sent by user space as untagged, it
> > will get transmitted town to the driver as tagged with the bridge
> > device's pvid). But if the bridge is VLAN-unaware, it may or may not be
> > VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
> > idea that we should emulate what is being done in the VLAN-aware case.
> > But we shouldn't.
> 
> IMO, the problem here stems from a discrepancy between LinkStreet
> devices and the bridge, in how PVID is interpreted. For the bridge, when
> VLAN filtering is disabled, ingressing traffic will be assigned to VID
> 0. This is true even if the port's PVID is set. A mv88e6xxx port who's
> QMode bits are set to 00 (802.1Q disabled) OTOH, will assign ingressing
> traffic to its PVID.
> 
> So, in order to match the bridge's behavior, I think we need to rethink
> how mv88e6xxx deals with non-filtering bridges. At first, one might be
> tempted to simply leave the hardware PVID at 0. The PVT can then be used
> to create isolation barriers between different bridges. ATU isolation is
> really what kills this approach. Since there is no VLAN information in
> the tag, there is no way to separate flows from different bridges into
> different FIDs. This is the issue I discovered with the forward
> offloading series.
> 
> > It appears that injecting packets using a VLAN ID of 0 serves the
> > purpose of forwarding the packets to the egress port with no VLAN tag
> > added or stripped by the hardware, and no filtering being performed.
> > So we can simply remove the superfluous logic.
> 
> The problem with this patch is that return traffic from the CPU is sent
> asymmetrically over a different VLAN, which in turn means that it will
> perform the DA lookup in a different FID (0). The result is that traffic
> does flow, but for the wrong reason. CPU -> port traffic is now flooded
> as unknown unicast. An example:
> 
> (:aa / 10.1)
>     br0
>    /   \
> sw0p1 sw0p2
> \         /
>  \       /
>   \     /
>     CPU
>      |
>   .--0--.
>   | sw0 |
>   '-1-2-'
>     | '-- sniffer
>     '---- host (:bb / 10.2)
> 
> br0 is created using the default settings. sw0 will have (among others)
> static entries for the CPU:
> 
>     fid:0 addr:aa type:static port:0
>     fid:1 addr:aa type:static port:0
> 
> 1. host sends an ARP for 10.1.
> 
> 2. sw0 will add this entry (since vlan_default_pvid is 1):
> 
>     fid:1 addr:bb type:age-7 port:1

Well, that's precisely mv88e6xxx's problem, it should not make its
ports' pvid inherit that of the bridge if the bridge is not VLAN aware.
Other drivers inherit the bridge pvid only when VLAN filtering is turned
on. See sja1105, ocelot, mt7530 at the very least. So the entry should
have been learned in FID 0 here.

> 3. CPU replies with a FORWARD (VID 0).
> 
> 4. sw0 will perform a DA lookup in FID 0, missing the entry learned in
>    step 2.
> 
> 5. sw0 floods the frame as unknown unicast to both host and sniffer.
> 
> Conversely, if flooding of unknown unicast is disabled on sw0p1:
> 
>     $ bridge link set dev sw0p1 flood off
> 
> host can no longer communicate with the CPU.
> 
> As I alluded to in the forward offloading thread, I think we need to
> move a scheme where:
> 
> 1. mv88e6xxx clears ds->configure_vlan_while_not_filtering.

No, that's the wrong answer, nobody should clear ds->configure_vlan_while_not_filtering.
mv88e6xxx should leave the pvid at zero* when joining a bridge that is
not VLAN-aware. It should inherit the bridge pvid when that bridge
becomes VLAN-aware, and it should reset the pvid to zero* when that
bridge becomes VLAN-unaware.

> 2. Assigns a free VID (and by extension a FID) in the VTU to each
>    non-filtering bridge.

*with the mention that the pvid of zero will only solve the first half
of the problem, the discrepancy between the VLAN classified on xmit and
the VLAN classified on rcv.

It will not solve the ATU (FDB) isolation problem. But to solve the FDB
isolation problem you need this:
https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/

> With this in place, the tagger could use the VID associated with the
> egressing port's bridge in the tag.

So the patch is not incorrect, it is incomplete. And there's nothing
further I can add to the tagger logic to make it more complete, at least
not now.

That's one of the reasons why this is merely a "part 1".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
  2021-10-04 11:16     ` Vladimir Oltean
@ 2021-10-04 13:45       ` Tobias Waldekranz
  2021-10-04 13:49         ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Waldekranz @ 2021-10-04 13:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 11:16, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> On Mon, Oct 04, 2021 at 12:55:27PM +0200, Tobias Waldekranz wrote:
>> On Mon, Oct 04, 2021 at 01:23, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> > The present code is structured this way due to an incomplete thought
>> > process. In Documentation/networking/switchdev.rst we document that if a
>> > bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
>> > port (or on the bridge itself, for that matter) should not affect the
>> > ability to receive and transmit tagged or untagged packets.
>> >
>> > If the bridge on behalf of which we are sending this packet is
>> > VLAN-aware, then the TX forwarding offload API ensures that the skb will
>> > be VLAN-tagged (if the packet was sent by user space as untagged, it
>> > will get transmitted town to the driver as tagged with the bridge
>> > device's pvid). But if the bridge is VLAN-unaware, it may or may not be
>> > VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
>> > idea that we should emulate what is being done in the VLAN-aware case.
>> > But we shouldn't.
>> 
>> IMO, the problem here stems from a discrepancy between LinkStreet
>> devices and the bridge, in how PVID is interpreted. For the bridge, when
>> VLAN filtering is disabled, ingressing traffic will be assigned to VID
>> 0. This is true even if the port's PVID is set. A mv88e6xxx port who's
>> QMode bits are set to 00 (802.1Q disabled) OTOH, will assign ingressing
>> traffic to its PVID.
>> 
>> So, in order to match the bridge's behavior, I think we need to rethink
>> how mv88e6xxx deals with non-filtering bridges. At first, one might be
>> tempted to simply leave the hardware PVID at 0. The PVT can then be used
>> to create isolation barriers between different bridges. ATU isolation is
>> really what kills this approach. Since there is no VLAN information in
>> the tag, there is no way to separate flows from different bridges into
>> different FIDs. This is the issue I discovered with the forward
>> offloading series.
>> 
>> > It appears that injecting packets using a VLAN ID of 0 serves the
>> > purpose of forwarding the packets to the egress port with no VLAN tag
>> > added or stripped by the hardware, and no filtering being performed.
>> > So we can simply remove the superfluous logic.
>> 
>> The problem with this patch is that return traffic from the CPU is sent
>> asymmetrically over a different VLAN, which in turn means that it will
>> perform the DA lookup in a different FID (0). The result is that traffic
>> does flow, but for the wrong reason. CPU -> port traffic is now flooded
>> as unknown unicast. An example:
>> 
>> (:aa / 10.1)
>>     br0
>>    /   \
>> sw0p1 sw0p2
>> \         /
>>  \       /
>>   \     /
>>     CPU
>>      |
>>   .--0--.
>>   | sw0 |
>>   '-1-2-'
>>     | '-- sniffer
>>     '---- host (:bb / 10.2)
>> 
>> br0 is created using the default settings. sw0 will have (among others)
>> static entries for the CPU:
>> 
>>     fid:0 addr:aa type:static port:0
>>     fid:1 addr:aa type:static port:0
>> 
>> 1. host sends an ARP for 10.1.
>> 
>> 2. sw0 will add this entry (since vlan_default_pvid is 1):
>> 
>>     fid:1 addr:bb type:age-7 port:1
>
> Well, that's precisely mv88e6xxx's problem, it should not make its
> ports' pvid inherit that of the bridge if the bridge is not VLAN aware.
> Other drivers inherit the bridge pvid only when VLAN filtering is turned
> on. See sja1105, ocelot, mt7530 at the very least. So the entry should
> have been learned in FID 0 here.
>
>> 3. CPU replies with a FORWARD (VID 0).
>> 
>> 4. sw0 will perform a DA lookup in FID 0, missing the entry learned in
>>    step 2.
>> 
>> 5. sw0 floods the frame as unknown unicast to both host and sniffer.
>> 
>> Conversely, if flooding of unknown unicast is disabled on sw0p1:
>> 
>>     $ bridge link set dev sw0p1 flood off
>> 
>> host can no longer communicate with the CPU.
>> 
>> As I alluded to in the forward offloading thread, I think we need to
>> move a scheme where:
>> 
>> 1. mv88e6xxx clears ds->configure_vlan_while_not_filtering.
>
> No, that's the wrong answer, nobody should clear ds->configure_vlan_while_not_filtering.
> mv88e6xxx should leave the pvid at zero* when joining a bridge that is
> not VLAN-aware. It should inherit the bridge pvid when that bridge
> becomes VLAN-aware, and it should reset the pvid to zero* when that
> bridge becomes VLAN-unaware.

Fair enough, even better!

>> 2. Assigns a free VID (and by extension a FID) in the VTU to each
>>    non-filtering bridge.
>
> *with the mention that the pvid of zero will only solve the first half
> of the problem, the discrepancy between the VLAN classified on xmit and
> the VLAN classified on rcv.
>
> It will not solve the ATU (FDB) isolation problem. But to solve the FDB
> isolation problem you need this:
> https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/
>
>> With this in place, the tagger could use the VID associated with the
>> egressing port's bridge in the tag.
>
> So the patch is not incorrect, it is incomplete. And there's nothing
> further I can add to the tagger logic to make it more complete, at least
> not now.
>
> That's one of the reasons why this is merely a "part 1".

Understood. But perhaps you could add the PVID-wrangling-patch you
suggested above to this series? That way we don't surprise any users on
stable by suddenly flooding traffic that used to be forwarded.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
  2021-10-04 13:45       ` Tobias Waldekranz
@ 2021-10-04 13:49         ` Vladimir Oltean
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-10-04 13:49 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

On Mon, Oct 04, 2021 at 03:45:51PM +0200, Tobias Waldekranz wrote:
> > So the patch is not incorrect, it is incomplete. And there's nothing
> > further I can add to the tagger logic to make it more complete, at least
> > not now.
> >
> > That's one of the reasons why this is merely a "part 1".
> 
> Understood. But perhaps you could add the PVID-wrangling-patch you
> suggested above to this series? That way we don't surprise any users on
> stable by suddenly flooding traffic that used to be forwarded.

Sure, I suppose I could. v2 coming.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-10-04 13:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-03 22:23 [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Vladimir Oltean
2021-10-03 22:23 ` [PATCH net 1/2] net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 Vladimir Oltean
2021-10-04 10:55   ` Tobias Waldekranz
2021-10-04 11:16     ` Vladimir Oltean
2021-10-04 13:45       ` Tobias Waldekranz
2021-10-04 13:49         ` Vladimir Oltean
2021-10-03 22:23 ` [PATCH net 2/2] net: dsa: fix bridge_num not getting cleared after ports leaving the bridge Vladimir Oltean
2021-10-04  6:38 ` [PATCH net 0/2] DSA bridge TX forwarding offload fixes - part 1 Tobias Waldekranz
2021-10-04  9:38   ` Vladimir Oltean

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.