* [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
@ 2018-12-12 23:09 ` Florian Fainelli
  0 siblings, 0 replies; 23+ messages in thread
From: Florian Fainelli @ 2018-12-12 23:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, jiri, idosch, vivien.didelot, nikolay, roopa,
	bridge, cphealy, Florian Fainelli

This patch provides details on the expected behavior of switchdev
enabled network devices when operating in a "stand alone" mode, as well
as when being bridge members. This clarifies a number of things that
recently came up during a bug fixing session on the b53 DSA switch
driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Hi all,

Please review carefully, and let me know if you think some of the
behaviors described below do not make any sense. Thanks!

 Documentation/networking/switchdev.txt | 86 ++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 82236a17b5e6..8c83174b477b 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -392,3 +392,89 @@ switchdev_trans_item_dequeue()
 
 If a transaction is aborted during "prepare" phase, switchdev code will handle
 cleanup of the queued-up objects.
+
+Switchdev enabled network device expected behavior
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Below is a set of defined behaviors that switchdev enabled network devices must
+adhere to.
+
+Configuration-less state
+------------------------
+
+Upon driver bring up, the network devices must be fully operational, and the
+backing driver must configure the network device such that it is possible to
+send and receive traffic to this network device and it is properly separated
+from other network devices/ports (e.g: as is frequent with a switch ASIC). How
+this is achieved is heavily hardware dependent, but a simple solution can be to
+use per-port VLAN identifiers.
+
+The network device must be capable of running a full IP protocol stack,
+including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
+appropriate filters for VLAN, multicast, unicast etc. The underlying device
+driver must effectively be configured in a similar fashion to what it would do
+when IGMP snooping is enabled for IP multicast over these switchdev network
+devices, and unsolicited multicast must be filtered as early as possible in the
+hardware.
+
+When configuring VLANs on top of the network device, all VLANs must be working,
+irrespective of the state of other network devices (e.g: other ports being part
+of a VLAN aware bridge doing ingress VID checking). See below for details.
+
+Bridged network devices
+-----------------------
+
+When a switchdev enabled network device is added as a bridge member, it should
+not disrupt any functionality of non-bridged network devices, which should
+continue to behave as normal network devices. The expected behavior for each of
+the bridge configuration knobs is documented below.
+
+VLAN filtering
+~~~~~~~~~~~~~~
+
+The Linux bridge allows the configuration of a VLAN filtering mode (compile and
+run time) which must be observed by the underlying switchdev network
+device/hardware:
+
+- with VLAN filtering turned off: frames ingressing the device with a VID that
+  is not programmed into the bridge/switch's VLAN table must be forwarded.
+
+- with VLAN filtering turned on: frames ingressing the device with a VID that is
+  not programmed into the bridge/switch's VLAN table must be dropped.
+
+Non-bridged network ports of the same switch fabric must not be disturbed in any
+way, shape or form by the enabling of VLAN filtering.
+
+VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
+which is a bridge port member must also observe the following behavior:
+
+- with VLAN filtering turned off, these VLAN devices must be fully functional
+  since the hardware accepts frames with any VID
+
+- with VLAN filtering turned on, these VLAN devices are not going to be
+  functional unless the bridge's VLAN database is also configured to have that
+  VID enabled for the underlying network device/port
+  (e.g: bridge vlan add vid 100 dev sw0p1)
+
+Because VLAN filtering can be turned on/off at runtime, the switchdev driver
+must be able to re-configure the underlying hardware on the fly to honor the
+toggling of that option and behave appropriately.
+
+IGMP snooping
+~~~~~~~~~~~~~
+
+The Linux bridge allows the configuration of IGMP snooping (compile and run
+time) which must be observed by the underlying switchdev network device/hardware
+in the following way:
+
+- when IGMP snooping is turned off, multicast traffic must be flooded to all
+  switch ports within the same broadcast domain, including the CPU/management
+  port of the switch (if handled separately).
+
+- when IGMP snooping is turned on, multicast traffic must selectively flow to
+  the appropriate network ports, including the CPU/management port, and must
+  not flood the entire switch.
+
+Because IGMP snooping can be turned on/off at runtime, the switchdev driver must
+be able to re-configure the underlying hardware on the fly to honor the toggling
+of that option and behave appropriately.
-- 
2.17.1

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-12 23:09 ` [Bridge] " Florian Fainelli
@ 2018-12-13  9:26   ` Andrew Lunn
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-13  9:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, jiri, idosch, vivien.didelot, nikolay, roopa,
	bridge, cphealy

> +VLAN filtering
> +~~~~~~~~~~~~~~
> +
> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
> +run time) which must be observed by the underlying switchdev network
> +device/hardware:
> +
> +- with VLAN filtering turned off: frames ingressing the device with a VID that
> +  is not programmed into the bridge/switch's VLAN table must be forwarded.
> +
> +- with VLAN filtering turned on: frames ingressing the device with a VID that is
> +  not programmed into the bridge/switch's VLAN table must be dropped.

Hi Florian

I forget the details, but there are some differences between VLAN
filtering being disabled at compile time and being disabled at runtime.
I think the expected behaviour is the same, but the switchdev API usage
is slightly different.

> +- when IGMP snooping is turned on, multicast traffic must selectively flow to
> +  the appropriate network ports, including the CPU/management port, and must
> +  not flood the entire switch.

224.0.0.X/32 should always be flooded; IGMP is optional for those
groups in the local subnet.
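
The flooding rule discussed here can be sketched in userspace C. This is
only an illustrative model, not kernel or driver code: the helper names
and the "port_joined" flag are hypothetical, addresses are assumed to be
in host byte order, and "224.0.0.X" is read as the 224.0.0.0/24 range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for 224.0.0.0/24 (host byte order). */
static bool ipv4_is_link_local_mcast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/* Hypothetical helper: true for any IPv4 multicast address (224.0.0.0/4). */
static bool ipv4_is_mcast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/*
 * Decide whether a frame destined to multicast 'group' is sent to a port.
 * 'port_joined' models whether snooping learned a listener on that port.
 */
static bool mcast_forward_to_port(bool snooping_on, uint32_t group,
				  bool port_joined)
{
	if (!ipv4_is_mcast(group))
		return false;	/* not multicast, outside this sketch */
	if (!snooping_on)
		return true;	/* snooping off: flood the broadcast domain */
	if (ipv4_is_link_local_mcast(group))
		return true;	/* 224.0.0.X: always flooded */
	return port_joined;	/* snooping on: only to listeners */
}
```

With snooping on, 224.0.0.5 reaches a port that never joined, while
224.1.2.3 only reaches ports that did.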

	Andrew

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-12 23:09 ` [Bridge] " Florian Fainelli
@ 2018-12-15 19:35   ` David Miller
  -1 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2018-12-15 19:35 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, andrew, jiri, idosch, vivien.didelot, nikolay, roopa,
	bridge, cphealy

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 12 Dec 2018 15:09:43 -0800

> This patch provides details on the expected behavior of switchdev
> enabled network devices when operating in a "stand alone" mode, as well
> as when being bridge members. This clarifies a number of things that
> recently came up during a bug fixing session on the b53 DSA switch
> driver.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Hi all,
> 
> Please review carefully, and let me know if you think some of the
> behaviors described below do not make any sense. Thanks!

Looks like Andrew had some feedback, so I'm expecting a new version
of this.

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-12 23:09 ` [Bridge] " Florian Fainelli
@ 2018-12-16  8:25   ` Ido Schimmel
  -1 siblings, 0 replies; 23+ messages in thread
From: Ido Schimmel @ 2018-12-16  8:25 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: andrew, nikolay, netdev, roopa, bridge, vivien.didelot,
	Jiri Pirko, davem

On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:
> This patch provides details on the expected behavior of switchdev
> enabled network devices when operating in a "stand alone" mode, as well
> as when being bridge members. This clarifies a number of things that
> recently came up during a bug fixing session on the b53 DSA switch
> driver.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Hi all,
> 
> Please review carefully, and let me know if you think some of the
> behaviors described below do not make any sense. Thanks!
> 
>  Documentation/networking/switchdev.txt | 86 ++++++++++++++++++++++++++
>  1 file changed, 86 insertions(+)
> 
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> index 82236a17b5e6..8c83174b477b 100644
> --- a/Documentation/networking/switchdev.txt
> +++ b/Documentation/networking/switchdev.txt
> @@ -392,3 +392,89 @@ switchdev_trans_item_dequeue()
>  
>  If a transaction is aborted during "prepare" phase, switchdev code will handle
>  cleanup of the queued-up objects.
> +
> +Switchdev enabled network device expected behavior
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Below is a set of defined behaviors that switchdev enabled network devices must
> +adhere to.
> +
> +Configuration-less state
> +------------------------
> +
> +Upon driver bring up, the network devices must be fully operational, and the
> +backing driver must configure the network device such that it is possible to
> +send and receive traffic to this network device and it is properly separated
> +from other network devices/ports (e.g: as is frequent with a switch ASIC). How
> +this is achieved is heavily hardware dependent, but a simple solution can be to
> +use per-port VLAN identifiers.
> +
> +The network device must be capable of running a full IP protocol stack,
> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
> +driver must effectively be configured in a similar fashion to what it would do
> +when IGMP snooping is enabled for IP multicast over these switchdev network
> +devices, and unsolicited multicast must be filtered as early as possible in the
> +hardware.
> +
> +When configuring VLANs on top of the network device, all VLANs must be working,
> +irrespective of the state of other network devices (e.g: other ports being part
> +of a VLAN aware bridge doing ingress VID checking). See below for details.
> +
> +Bridged network devices
> +-----------------------
> +
> +When a switchdev enabled network device is added as a bridge member, it should
> +not disrupt any functionality of non-bridged network devices, which should
> +continue to behave as normal network devices. The expected behavior for each of
> +the bridge configuration knobs is documented below.
> +
> +VLAN filtering
> +~~~~~~~~~~~~~~
> +
> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
> +run time) which must be observed by the underlying switchdev network
> +device/hardware:
> +
> +- with VLAN filtering turned off: frames ingressing the device with a VID that
> +  is not programmed into the bridge/switch's VLAN table must be forwarded.

mlxsw doesn't support it. These bridges are mainly used with VLAN
devices where the packets ingress the bridge untagged. When configured
over physical ports, we only allow untagged packets into such a bridge.

> +
> +- with VLAN filtering turned on: frames ingressing the device with a VID that is
> +  not programmed into the bridge/switch's VLAN table must be dropped.

ack

> +
> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> +way, shape or form by the enabling of VLAN filtering.
> +
> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> +which is a bridge port member must also observe the following behavior:
> +
> +- with VLAN filtering turned off, these VLAN devices must be fully functional
> +  since the hardware accepts frames with any VID
> +
> +- with VLAN filtering turned on, these VLAN devices are not going to be
> +  functional unless the bridge's VLAN database is also configured to have that
> +  VID enabled for the underlying network device/port
> +  (e.g: bridge vlan add vid 100 dev sw0p1)

mlxsw forbids the enslavement of VLAN devices to VLAN-aware bridges. It
doesn't really make sense to enable VLAN filtering when all the packets
are untagged.

But I disagree with the comment about the underlying port. When you
configured the VLAN device, it should have enabled the VLAN filters on
the real device via ndo_vlan_rx_add_vid().
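
As a rough illustration of this point, the per-port VID filter that an
ndo_vlan_rx_add_vid() / ndo_vlan_rx_kill_vid() pair would maintain can be
modelled in userspace C as a bitmap. Everything below is a hypothetical
sketch (struct and function names are invented for the example); a real
driver would program hardware tables instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_MODEL_N_VID 4096	/* size of the 12-bit VID space */

/* Hypothetical per-port filter state: one bit per VID. */
struct port_vlan_filter {
	uint64_t vids[PORT_MODEL_N_VID / 64];
};

/* Models what creating a VLAN device (e.g. sw0p1.100) would trigger. */
static int port_vlan_rx_add_vid(struct port_vlan_filter *f, uint16_t vid)
{
	if (vid >= PORT_MODEL_N_VID)
		return -EINVAL;
	f->vids[vid / 64] |= (uint64_t)1 << (vid % 64);
	return 0;
}

/* Models what deleting the VLAN device would trigger. */
static int port_vlan_rx_kill_vid(struct port_vlan_filter *f, uint16_t vid)
{
	if (vid >= PORT_MODEL_N_VID)
		return -EINVAL;
	f->vids[vid / 64] &= ~((uint64_t)1 << (vid % 64));
	return 0;
}

/* True if the port filter currently lets frames with 'vid' through. */
static bool port_vlan_accepts(const struct port_vlan_filter *f, uint16_t vid)
{
	if (vid >= PORT_MODEL_N_VID)
		return false;
	return (f->vids[vid / 64] >> (vid % 64)) & 1;
}
```

In this model the VID is enabled on the real device as a side effect of
creating the VLAN device, with no separate bridge VLAN entry involved.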

> +
> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
> +must be able to re-configure the underlying hardware on the fly to honor the
> +toggling of that option and behave appropriately.

Please mention that switchdev drivers can refuse the operation.
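
A minimal userspace sketch of what "refusing the operation" could look
like. The struct, its capability flag and the choice of -EOPNOTSUPP are
assumptions made for illustration, not taken from any particular driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state for this sketch. */
struct sw_port {
	bool can_toggle_at_runtime;	/* assumed hardware capability */
	bool vlan_filtering;		/* current mode */
};

/*
 * Model of a handler for a VLAN filtering change request. It returns 0
 * on success or a negative errno when the request is refused, leaving
 * the current mode untouched.
 */
static int sw_port_set_vlan_filtering(struct sw_port *p, bool enable)
{
	if (p->vlan_filtering == enable)
		return 0;		/* nothing to change */
	if (!p->can_toggle_at_runtime)
		return -EOPNOTSUPP;	/* refuse: state stays as it was */
	p->vlan_filtering = enable;
	return 0;
}
```

The caller is expected to check the return value and report the failure
to the user rather than assume the hardware followed the request.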

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-16  8:25   ` [Bridge] " Ido Schimmel
@ 2018-12-16 17:14     ` Florian Fainelli
  -1 siblings, 0 replies; 23+ messages in thread
From: Florian Fainelli @ 2018-12-16 17:14 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: andrew, nikolay, netdev, roopa, bridge, vivien.didelot,
	Jiri Pirko, davem

On 12/16/18 at 12:25 AM, Ido Schimmel wrote:
> On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:
>> This patch provides details on the expected behavior of switchdev
>> enabled network devices when operating in a "stand alone" mode, as well
>> as when being bridge members. This clarifies a number of things that
>> recently came up during a bug fixing session on the b53 DSA switch
>> driver.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Hi all,
>>
>> Please review carefully, and let me know if you think some of the
>> behaviors described below do not make any sense. Thanks!
>>
>>  Documentation/networking/switchdev.txt | 86 ++++++++++++++++++++++++++
>>  1 file changed, 86 insertions(+)
>>
>> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>> index 82236a17b5e6..8c83174b477b 100644
>> --- a/Documentation/networking/switchdev.txt
>> +++ b/Documentation/networking/switchdev.txt
>> @@ -392,3 +392,89 @@ switchdev_trans_item_dequeue()
>>  
>>  If a transaction is aborted during "prepare" phase, switchdev code will handle
>>  cleanup of the queued-up objects.
>> +
>> +Switchdev enabled network device expected behavior
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Below is a set of defined behaviors that switchdev enabled network devices must
>> +adhere to.
>> +
>> +Configuration-less state
>> +------------------------
>> +
>> +Upon driver bring up, the network devices must be fully operational, and the
>> +backing driver must configure the network device such that it is possible to
>> +send and receive traffic to this network device and it is properly separated
>> +from other network devices/ports (e.g: as is frequent with a switch ASIC). How
>> +this is achieved is heavily hardware dependent, but a simple solution can be to
>> +use per-port VLAN identifiers.
>> +
>> +The network device must be capable of running a full IP protocol stack,
>> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
>> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
>> +driver must effectively be configured in a similar fashion to what it would do
>> +when IGMP snooping is enabled for IP multicast over these switchdev network
>> +devices, and unsolicited multicast must be filtered as early as possible in the
>> +hardware.
>> +
>> +When configuring VLANs on top of the network device, all VLANs must be working,
>> +irrespective of the state of other network devices (e.g: other ports being part
>> +of a VLAN aware bridge doing ingress VID checking). See below for details.
>> +
>> +Bridged network devices
>> +-----------------------
>> +
>> +When a switchdev enabled network device is added as a bridge member, it should
>> +not disrupt any functionality of non-bridged network devices, which should
>> +continue to behave as normal network devices. The expected behavior for each of
>> +the bridge configuration knobs is documented below.
>> +
>> +VLAN filtering
>> +~~~~~~~~~~~~~~
>> +
>> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
>> +run time) which must be observed by the underlying switchdev network
>> +device/hardware:
>> +
>> +- with VLAN filtering turned off: frames ingressing the device with a VID that
>> +  is not programmed into the bridge/switch's VLAN table must be forwarded.
> 
> mlxsw doesn't support it. These bridges are mainly used with VLAN
> devices where the packets ingress the bridge untagged. When configured
> over physical ports, we only allow untagged packets into such a bridge.

I suppose I got confused about the meaning of VLAN filtering on a Linux
bridge when offloaded to a switch: VLAN filtering turned off effectively
means no VLAN awareness, everything untagged.
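
For reference, the ingress rule as worded in the patch can be modelled in
a few lines of userspace C. The struct and function names are
hypothetical, and the sketch follows the patch text literally; it does
not model hardware that only admits untagged frames into a VLAN-unaware
bridge.

```c
#include <stdbool.h>
#include <stdint.h>

#define BR_MODEL_N_VID 4096	/* size of the 12-bit VID space */

/* Hypothetical model of a bridge and its VLAN table. */
struct bridge_model {
	bool vlan_filtering;			/* the runtime knob */
	bool vid_member[BR_MODEL_N_VID];	/* programmed VIDs */
};

/* True if a frame carrying 'vid' is admitted on ingress. */
static bool bridge_ingress_allowed(const struct bridge_model *br,
				   uint16_t vid)
{
	if (!br->vlan_filtering)
		return true;	/* VLAN-unaware: the VID is not checked */
	return vid < BR_MODEL_N_VID && br->vid_member[vid];
}
```

The only input that changes the outcome for an unprogrammed VID is the
vlan_filtering flag, which is why toggling it at runtime requires
reprogramming the hardware.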

There are really many misnomers within the bridge code then, like
MC_DISABLED, which really means "flood or do not flood multicast", not
"disable multicast", which would be madness.

> 
>> +
>> +- with VLAN filtering turned on: frames ingressing the device with a VID that is
>> +  not programmed into the bridge/switch's VLAN table must be dropped.
> 
> ack
> 
>> +
>> +Non-bridged network ports of the same switch fabric must not be disturbed in any
>> +way, shape or form by the enabling of VLAN filtering.
>> +
>> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
>> +which is a bridge port member must also observe the following behavior:
>> +
>> +- with VLAN filtering turned off, these VLAN devices must be fully functional
>> +  since the hardware accepts frames with any VID
>> +
>> +- with VLAN filtering turned on, these VLAN devices are not going to be
>> +  functional unless the bridge's VLAN database is also configured to have that
>> +  VID enabled for the underlying network device/port
>> +  (e.g: bridge vlan add vid 100 dev sw0p1)
> 
> mlxsw forbids the enslavement of VLAN devices to VLAN-aware bridges. It
> doesn't really make sense to enable VLAN filtering when all the packets
> are untagged.

Did you mean VLAN-unaware here, otherwise that would contradict the
statement that VLAN-aware bridges mean everything untagged, or am I
incorrectly understanding things here?

> 
> But I disagree with the comment about the underlying port. When you
> configured the VLAN device, it should have enabled the VLAN filters on
> the real device via ndo_vlan_rx_add_vid().

That is really why I submitted this patch, because right now I have a
patch (yet to be submitted) which adds ndo_vlan_rx_{add,kill}_vid() and
if the underlying device is enslaved into a bridge, I just do nothing
and let the bridge control the VLAN membership, hence my comment and
example here.
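
The behavior described above can be sketched roughly as follows. This is a
stand-alone model, not the unsubmitted patch: struct model_sw_port and the
model_vlan_rx_*() helpers are hypothetical names that only mirror the role of
ndo_vlan_rx_{add,kill}_vid(), and the per-port filter array is an assumption
made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VLAN_N_VID 4096

/* Hypothetical per-port state; not a kernel structure. */
struct model_sw_port {
	bool bridged;				/* port is a bridge member */
	bool vid_filter[MODEL_VLAN_N_VID];	/* VIDs accepted by the hardware */
};

/* Models the add hook that runs when a VLAN device such as sw0p1.100 is created. */
static int model_vlan_rx_add_vid(struct model_sw_port *port, uint16_t vid)
{
	if (vid >= MODEL_VLAN_N_VID)
		return -EINVAL;
	/* Bridged port: do nothing and leave VLAN membership to the bridge. */
	if (port->bridged)
		return 0;
	port->vid_filter[vid] = true;
	return 0;
}

/* Models the matching kill hook that runs when the VLAN device is removed. */
static int model_vlan_rx_kill_vid(struct model_sw_port *port, uint16_t vid)
{
	if (vid >= MODEL_VLAN_N_VID)
		return -EINVAL;
	if (port->bridged)
		return 0;
	port->vid_filter[vid] = false;
	return 0;
}
```

With this model, a bridged port needs the VID added through the bridge (e.g:
bridge vlan add vid 100 dev sw0p1) before the VLAN device can pass traffic,
which is the point under discussion.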

What you are saying is that we should have these two cases:

1) VLAN devices on top of VLAN unaware bridge: allow the VLAN device and
program VLAN filter on the underlying switch port to permit VLAN tagging

2) VLAN devices on top of a VLAN aware bridge: deny the VLAN device
creation and let the bridge, which is VLAN aware manage the port VLAN
membership

In case 1) or 2) if the desire is to have a VLAN aware network device
this can be either done through a VLAN device on top of the switch port,
or through a VLAN device on top of the bridge master itself, and in
either case, this amounts to doing about the same thing.

Did I get this right?

> 
>> +
>> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
>> +must be able to re-configure the underlying hardware on the fly to honor the
>> +toggling of that option and behave appropriately.
> 
> Please mention that switchdev drivers can refuse the operation.
> 

Will do, thanks!
-- 
Florian


* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-16  8:25   ` [Bridge] " Ido Schimmel
@ 2018-12-17  3:36     ` Florian Fainelli
  -1 siblings, 0 replies; 23+ messages in thread
From: Florian Fainelli @ 2018-12-17  3:36 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: andrew, nikolay, netdev, roopa, bridge, vivien.didelot,
	Jiri Pirko, davem



On December 16, 2018 12:25:19 AM PST, Ido Schimmel <idosch@mellanox.com> wrote:
>On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:

>
>mlxsw doesn't support it. These bridges are mainly used with VLAN
>devices where the packets ingress the bridge untagged. When configured
>over physical ports, we only allow untagged packets into such a bridge.

There is another complication with at least some of the DSA switches: turning off VLAN filtering is a global operation, so we must deny it if another bridge device spanning the same switch device also requests VLAN filtering to be on. This is not necessarily a problem in a larger switch fabric composed of multiple switches (the D in DSA), since each switch could conceptually have different VLAN filtering rules, but that complicates the matter significantly.

The more I think about supporting toggling VLAN filtering at runtime the less it seems to have a good return on investment:

- the bridge layer does not remove VLAN entries created while the bridge was VLAN aware, which complicates the on-to-off transition: we need to make the switch port an untagged member of all VLANs, and some older switches don't have a "join all VLANs" shorthand for that, so that means programming up to 4K VLAN entries... slow.

- no reasonable use case comes to mind which would not involve knowing whether a bridge should be VLAN aware ahead of time.

I am therefore convinced that adopting the mlxsw behavior wrt. VLAN filtering toggling is a good approach. Thanks!
-- 
Florian


* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-16 17:14     ` [Bridge] " Florian Fainelli
@ 2018-12-18  7:01       ` Ido Schimmel
  -1 siblings, 0 replies; 23+ messages in thread
From: Ido Schimmel @ 2018-12-18  7:01 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, andrew, Jiri Pirko, vivien.didelot, nikolay,
	roopa, bridge, cphealy

On Sun, Dec 16, 2018 at 09:14:09AM -0800, Florian Fainelli wrote:
> On 12/16/18 at 12:25 AM, Ido Schimmel wrote:
> > On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:
> >> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> >> +way, shape or form by the enabling of VLAN filtering.
> >> +
> >> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> >> +which is a bridge port member must also observe the following behavior:
> >> +
> >> +- with VLAN filtering turned off, these VLAN devices must be fully functional
> >> +  since the hardware is allowed VID frames
> >> +
> >> +- with VLAN filtering turned on, these VLAN devices are not going to be
> >> +  functional unless the bridge's VLAN database is also configured to have that
> >> +  VID enabled for the underlying network device/port
> >> +  (e.g: bridge vlan add vid 100 dev sw0p1)
> > 
> > mlxsw forbids the enslavement of VLAN devices to VLAN-aware bridges. It
> > doesn't really make sense to enable VLAN filtering when all the packets
> > are untagged.
> 
> Did you mean VLAN-unaware here, otherwise that would contradict the
> statement that VLAN-aware bridges mean everything untagged, or am I
> incorrectly understanding things here?

I meant VLAN-aware... In a VLAN-unaware bridge the VLAN is meaningless.
For example, there is no filtering based on VLAN at ingress/egress and
FDB entries are only searched based on MAC (VLAN is always 0). This is
in contrast to a VLAN-aware bridge.
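
As a rough illustration of the FDB point above, the lookup key can be modeled
like this. The struct and helper names are hypothetical and do not correspond
to the bridge or mlxsw code; the only behavior taken from the discussion is
that a VLAN-unaware bridge always looks up with VLAN 0.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical FDB key, for illustration only. */
struct model_fdb_key {
	uint8_t mac[6];
	uint16_t vid;
};

/* Build the lookup key for a frame with the given MAC address and VID. */
static struct model_fdb_key model_fdb_make_key(const uint8_t mac[6],
					       uint16_t vid, bool vlan_aware)
{
	struct model_fdb_key key;

	memset(&key, 0, sizeof(key));
	memcpy(key.mac, mac, sizeof(key.mac));
	/* A VLAN-unaware bridge ignores the VID: the key always uses 0. */
	key.vid = vlan_aware ? vid : 0;
	return key;
}

/* Two keys match only if both the MAC and the VID are equal. */
static bool model_fdb_key_equal(const struct model_fdb_key *a,
				const struct model_fdb_key *b)
{
	return a->vid == b->vid &&
	       memcmp(a->mac, b->mac, sizeof(a->mac)) == 0;
}
```

In the VLAN-unaware case the same MAC seen with VIDs 10 and 20 maps to one
entry; in the VLAN-aware case it maps to two distinct entries.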

When you enslave VLAN netdevs to a bridge, the bridge sees untagged
packets. The VLAN tag is pulled from the packet in Rx path and then the
packet is injected to the bridge via the Rx handler configured on the
VLAN netdev. Therefore, there is no point in enslaving these devices to a
VLAN-aware bridge.
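
The Rx-path step described above, where the tag is removed before the bridge
sees the frame, can be sketched on a raw Ethernet buffer as below. This is a
simplified model with hypothetical names; it assumes a single 802.1Q tag with
TPID 0x8100 carried in the frame itself and says nothing about how the kernel
actually stores the tag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN   6
#define MODEL_VLAN_HLEN  4
#define MODEL_TPID_8021Q 0x8100

/*
 * If the frame carries an 802.1Q tag, remove it in place and store its VID
 * in *vid. Returns the new frame length. Untagged or too-short frames are
 * left unchanged and *vid is set to 0.
 */
static size_t model_vlan_pop(uint8_t *frame, size_t len, uint16_t *vid)
{
	const size_t tag_off = 2 * MODEL_ETH_ALEN;	/* after dst + src MAC */
	uint16_t tpid, tci;

	*vid = 0;
	if (len < tag_off + MODEL_VLAN_HLEN)
		return len;

	tpid = (uint16_t)((frame[tag_off] << 8) | frame[tag_off + 1]);
	if (tpid != MODEL_TPID_8021Q)
		return len;

	tci = (uint16_t)((frame[tag_off + 2] << 8) | frame[tag_off + 3]);
	*vid = tci & 0x0fff;	/* the VID is the low 12 bits of the TCI */

	/* Close the 4-byte gap so the EtherType follows the source MAC. */
	memmove(frame + tag_off, frame + tag_off + MODEL_VLAN_HLEN,
		len - tag_off - MODEL_VLAN_HLEN);
	return len - MODEL_VLAN_HLEN;
}
```

After the pop, the frame handed to the bridge is indistinguishable from one
that arrived untagged, so VLAN filtering in the bridge has nothing to act on.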

Also, mlxsw only supports a single VLAN-aware bridge. You can, however,
configure 1K VLAN-unaware bridges.

> > But I disagree with the comment about the underlying port. When you
> > configured the VLAN device, it should have enabled the VLAN filters on
> > the real device via ndo_vlan_rx_add_vid().
> 
> That is really why I submitted this patch, because right now I have a
> patch (yet to be submitted) which adds ndo_vlan_rx_{add,kill}_vid() and
> if the underlying device is enslaved into a bridge, I just do nothing
> and let the bridge control the VLAN membership, hence my comment and
> example here.
> 
> What you are saying is that we should have these two cases:
> 
> 1) VLAN devices on top of VLAN unaware bridge: allow the VLAN device and
> program VLAN filter on the underlying switch port to permit VLAN tagging

When you say "on top" you mean enslaved to?

> 2) VLAN devices on top of a VLAN aware bridge: deny the VLAN device
> creation and let the bridge, which is VLAN aware manage the port VLAN
> membership

mlxsw does not forbid the creation of the VLAN device. It only forbids
its enslavement to a VLAN-aware bridge.

> In case 1) or 2) if the desire is to have a VLAN aware network device
> this can be either done through a VLAN device on top of the switch port,
> or through a VLAN device on top of the bridge master itself, and in
> either case, this amounts to doing about the same thing.
> 
> Did I get this right?

I'm saying that a VLAN-aware bridge with VIDs 10-100 (for example) is
equivalent to VLAN devices with VIDs 10-100 enslaved to br10-br100,
respectively. It is up to you if you want to / can support both modes.
We support both, but most / all users use the VLAN-aware bridge.


* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-18  7:01       ` [Bridge] " Ido Schimmel
@ 2018-12-18 20:13         ` Florian Fainelli
  -1 siblings, 0 replies; 23+ messages in thread
From: Florian Fainelli @ 2018-12-18 20:13 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, andrew, Jiri Pirko, vivien.didelot, nikolay,
	roopa, bridge, cphealy

On 12/17/18 11:01 PM, Ido Schimmel wrote:
> On Sun, Dec 16, 2018 at 09:14:09AM -0800, Florian Fainelli wrote:
>> On 12/16/18 at 12:25 AM, Ido Schimmel wrote:
>>> On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:
>>>> +Non-bridged network ports of the same switch fabric must not be disturbed in any
>>>> +way, shape or form by the enabling of VLAN filtering.
>>>> +
>>>> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
>>>> +which is a bridge port member must also observe the following behavior:
>>>> +
>>>> +- with VLAN filtering turned off, these VLAN devices must be fully functional
>>>> +  since the hardware is allowed VID frames
>>>> +
>>>> +- with VLAN filtering turned on, these VLAN devices are not going to be
>>>> +  functional unless the bridge's VLAN database is also configured to have that
>>>> +  VID enabled for the underlying network device/port
>>>> +  (e.g: bridge vlan add vid 100 dev sw0p1)
>>>
>>> mlxsw forbids the enslavement of VLAN devices to VLAN-aware bridges. It
>>> doesn't really make sense to enable VLAN filtering when all the packets
>>> are untagged.
>>
>> Did you mean VLAN-unaware here, otherwise that would contradict the
>> statement that VLAN-aware bridges mean everything untagged, or am I
>> incorrectly understanding things here?
> 
> I meant VLAN-aware... In a VLAN-unaware bridge the VLAN is meaningless.
> For example, there is no filtering based on VLAN at ingress/egress and
> FDB entries are only searched based on MAC (VLAN is always 0). This is
> in contrast to a VLAN-aware bridge.
> 
> When you enslave VLAN netdevs to a bridge, the bridge sees untagged
> packets. The VLAN tag is pulled from the packet in Rx path and then the
> packet is injected to the bridge via the Rx handler configured on the
> VLAN netdev. Therefore, there is no point in enslaving these devices to a
> VLAN-aware bridge.

I see what you describe and that is not quite what I was talking about,
see below.

> 
> Also, mlxsw only supports a single VLAN-aware bridge. You can however,
> configure 1K VLAN-unaware bridges.

OK, how do you enforce that in the driver? I was going to do something
as basic as: loop over all ports other than the one whose VLAN filtering
attribute is being changed. If the associated bridge device is non-NULL,
br_vlan_enabled() returns true for that bridge, and we want to turn off
VLAN filtering, then this is not possible, since that would break the
other bridge devices we have which are VLAN filtering enabled.
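
A minimal stand-alone sketch of that check, under stated assumptions: the
model_* types are hypothetical and are not kernel or DSA structures, the
vlan_filtering flag stands in for what br_vlan_enabled() would report, and
skipping ports that belong to the bridge being changed is my assumption about
the intent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not kernel structures. */
struct model_bridge {
	bool vlan_filtering;	/* stands in for br_vlan_enabled() */
};

struct model_port {
	struct model_bridge *bridge;	/* NULL when the port is standalone */
};

/*
 * Return true if VLAN filtering may be turned off for the bridge that owns
 * ports[target], on a switch where filtering is a global setting.
 */
static bool model_can_disable_vlan_filtering(const struct model_port *ports,
					     size_t nports, size_t target)
{
	for (size_t i = 0; i < nports; i++) {
		const struct model_bridge *br = ports[i].bridge;

		/* Skip the port being changed and standalone ports. */
		if (i == target || !br)
			continue;
		/* Assumption: ports of the same bridge follow the new setting. */
		if (br == ports[target].bridge)
			continue;
		/* Another bridge on this switch still needs filtering: deny. */
		if (br->vlan_filtering)
			return false;
	}
	return true;
}
```

A driver built along these lines would refuse the attribute change when the
function returns false, rather than silently breaking the other bridge.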

> 
>>> But I disagree with the comment about the underlying port. When you
>>> configured the VLAN device, it should have enabled the VLAN filters on
>>> the real device via ndo_vlan_rx_add_vid().
>>
>> That is really why I submitted this patch, because right now I have a
>> patch (yet to be submitted) which adds ndo_vlan_rx_{add,kill}_vid() and
>> if the underlying device is enslaved into a bridge, I just do nothing
>> and let the bridge control the VLAN membership, hence my comment and
>> example here.
>>
>> What you are saying is that we should have these two cases:
>>
>> 1) VLAN devices on top of VLAN unaware bridge: allow the VLAN device and
>> program VLAN filter on the underlying switch port to permit VLAN tagging
> 
> When you say "on top" you mean enslaved to?

I meant to write: a VLAN device created on (top of) a switch port, and
this switch port being a bridge member. The VLAN device would not be
added as a bridge member (did not really think about it).

> 
>> 2) VLAN devices on top of a VLAN aware bridge: deny the VLAN device
>> creation and let the bridge, which is VLAN aware manage the port VLAN
>> membership
> 
> mlxsw does not forbid the creation of the VLAN device. It only forbids
> its enslavement to a VLAN-aware bridge.

That's done in mlxsw_sp_netdevice_port_vlan_event() right?

> 
>> In case 1) or 2) if the desire is to have a VLAN aware network device
>> this can be either done through a VLAN device on top of the switch port,
>> or through a VLAN device on top of the bridge master itself, and in
>> either case, this amounts to doing about the same thing.
>>
>> Did I get this right?
> 
> I'm saying that a VLAN-aware bridge with VIDs 10-100 (for example) is
> equivalent to VLAN devices with VIDs 10-100 enslaved to br10-br100,
> respectively. It is up to you if you want to / can support both modes.
> We support both, but most / all users use the VLAN-aware bridge.
> 

Right, and it is also equivalent to have these two things:

- a VLAN aware bridge, adding VIDs 10-100 to sw0p1 through "bridge vlan
add vid .." and creating br0.10..br0.100 devices

and:

- a VLAN aware bridge, creating sw0p1.10..sw0p1.100 (VLAN devices) while
sw0p1 is a bridge port member
-- 
Florian


* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2018-12-18 20:13         ` [Bridge] " Florian Fainelli
@ 2018-12-22 20:29           ` Ido Schimmel
  -1 siblings, 0 replies; 23+ messages in thread
From: Ido Schimmel @ 2018-12-22 20:29 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Ido Schimmel, netdev, davem, andrew, Jiri Pirko, vivien.didelot,
	nikolay, roopa, bridge, cphealy

On Tue, Dec 18, 2018 at 12:13:38PM -0800, Florian Fainelli wrote:
> On 12/17/18 11:01 PM, Ido Schimmel wrote:
> > On Sun, Dec 16, 2018 at 09:14:09AM -0800, Florian Fainelli wrote:
> >> On 12/16/18 at 12:25 AM, Ido Schimmel wrote:
> >>> On Wed, Dec 12, 2018 at 03:09:43PM -0800, Florian Fainelli wrote:
> >>>> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> >>>> +way, shape or form by the enabling of VLAN filtering.
> >>>> +
> >>>> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> >>>> +which is a bridge port member must also observe the following behavior:
> >>>> +
> >>>> +- with VLAN filtering turned off, these VLAN devices must be fully functional
> >>>> +  since the hardware is allowed VID frames
> >>>> +
> >>>> +- with VLAN filtering turned on, these VLAN devices are not going to be
> >>>> +  functional unless the bridge's VLAN database is also configured to have that
> >>>> +  VID enabled for the underlying network device/port
> >>>> +  (e.g: bridge vlan add vid 100 dev sw0p1)
> >>>
> >>> mlxsw forbids the enslavement of VLAN devices to VLAN-aware bridges. It
> >>> doesn't really make sense to enable VLAN filtering when all the packets
> >>> are untagged.
> >>
> >> Did you mean VLAN-unaware here, otherwise that would contradict the
> >> statement that VLAN-aware bridges mean everything untagged, or am I
> >> incorrectly understanding things here?
> > 
> > I meant VLAN-aware... In a VLAN-unaware bridge the VLAN is meaningless.
> > For example, there is no filtering based on VLAN at ingress/egress and
> > FDB entries are only searched based on MAC (VLAN is always 0). This is
> > in contrast to a VLAN-aware bridge.
> > 
> > When you enslave VLAN netdevs to a bridge, the bridge sees untagged
> > packets. The VLAN tag is pulled from the packet in Rx path and then the
> > packet is injected to the bridge via the Rx handler configured on the
> > VLAN netdev. Therefore, there is no point in enslaving these devices to a
> > VLAN-aware bridge.
> 
> I see what you describe and that is not quite what I was talking about,
> see below.
> 
> > 
> > Also, mlxsw only supports a single VLAN-aware bridge. You can however,
> > configure 1K VLAN-unaware bridges.
> 
> OK, how do you enforce that in the driver? I was going to do something
> as basic as: loop around all ports that are not the one being changed by
> VLAN filtering attribute, if bridge device associated is non-NULL and
> br_vlan_enabled() returns true for that bridge and we want to turn off
> VLAN filtering, then this is not possible since that would break the
> other bridge devices we have which are VLAN filtering enabled.

See mlxsw_sp_bridge_device_create(). We basically keep a list of bridges
we care about. If one is already VLAN aware, then we fail the creation
of another bridge.
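
As a rough illustration of that bookkeeping, here is a toy model in Python. It is
not the mlxsw code: the class and method names are made up for this sketch, and
it assumes that only a second VLAN-aware bridge is refused while VLAN-unaware
bridges stay allowed.

```python
class BridgeRegistry:
    """Toy model of a driver-side list of offloaded bridges (hypothetical)."""

    def __init__(self):
        self.bridges = {}  # bridge name -> True if VLAN-aware

    def create(self, name, vlan_aware):
        # Assumption: only a second VLAN-aware bridge is rejected;
        # VLAN-unaware bridges can still be added.
        if vlan_aware and any(self.bridges.values()):
            raise ValueError("only one VLAN-aware bridge is supported")
        self.bridges[name] = vlan_aware


registry = BridgeRegistry()
registry.create("br0", vlan_aware=True)      # first VLAN-aware bridge: accepted
registry.create("br10", vlan_aware=False)    # VLAN-unaware bridge: accepted
try:
    registry.create("br1", vlan_aware=True)  # second VLAN-aware bridge
    second_rejected = False
except ValueError:
    second_rejected = True
```

A real driver would do this check from its bridge join path and return an error
to the caller rather than raise an exception.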

> 
> > 
> >>> But I disagree with the comment about the underlying port. When you
> >>> configured the VLAN device, it should have enabled the VLAN filters on
> >>> the real device via ndo_vlan_rx_add_vid().
> >>
> >> That is really why I submitted this patch, because right now I have a
> >> patch (yet to be submitted) which adds ndo_vlan_rx_{add,kill}_vid() and
> >> if the underlying device is enslaved into a bridge, I just do nothing
> >> and let the bridge control the VLAN membership, hence my comment and
> >> example here.
> >>
> >> What you are saying is that we should have these two cases:
> >>
> >> 1) VLAN devices on top of VLAN unaware bridge: allow the VLAN device and
> >> program VLAN filter on the underlying switch port to permit VLAN tagging
> > 
> > When you say "on top" you mean enslaved to?
> 
> I meant to write: a VLAN device created on (top of) a switch port, and
> this switch port being a bridge member. The VLAN device would not be
> added as a bridge member (did not really think about it).
> 
> > 
> >> 2) VLAN devices on top of a VLAN aware bridge: deny the VLAN device
> >> creation and let the bridge, which is VLAN aware manage the port VLAN
> >> membership
> > 
> > mlxsw does not forbid the creation of the VLAN device. It only forbids
> > its enslavement to a VLAN-aware bridge.
> 
> That's done in mlxsw_sp_netdevice_port_vlan_event() right?

Almost :) See mlxsw_sp_bridge_8021q_port_join()

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2020-07-23 22:58   ` Florian Fainelli
@ 2020-07-24  0:43     ` Vladimir Oltean
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Oltean @ 2020-07-24  0:43 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, andrew, vivien.didelot, cphealy, idosch, jiri,
	bridge, nikolay, roopa, rdunlap, ilias.apalodimas,
	ivan.khoronzhuk, kuba

On Thu, Jul 23, 2020 at 03:58:24PM -0700, Florian Fainelli wrote:
> On 7/23/20 3:11 PM, Vladimir Oltean wrote:
> > On Wed, Jul 22, 2020 at 03:52:53PM -0700, Florian Fainelli wrote:
> >> This patch provides details on the expected behavior of switchdev
> >> enabled network devices when operating in a "stand alone" mode, as well
> >> as when being bridge members. This clarifies a number of things that
> >> recently came up during a bug fixing session on the b53 DSA switch
> >> driver.
> >>
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >> ---
> >> Since it has been a while since this was submitted, removing the patch numbering,
> >> but previous patches and discussions can be found here:
> >>
> >> http://patchwork.ozlabs.org/project/netdev/patch/20190103224702.21541-1-f.fainelli@gmail.com/
> >> http://patchwork.ozlabs.org/project/netdev/patch/20190109043930.8534-1-f.fainelli@gmail.com/
> >> http://patchwork.ozlabs.org/project/netdev/patch/20190110193206.9872-1-f.fainelli@gmail.com/
> >>
> >> David, I would like to hear from Vladimir and Ido specifically to make
> >> sure that the documentation is up to date with expectations or desired
> >> behavior so we can move forward with Vladimir's DSA rx filtering patch
> >> series. So don't apply this just yet ;)
> >>
> >> Thanks!
> >>
> > 
> > Thanks for giving me the opportunity to speak up.
> > 
> > Your addition to switchdev.rst is more than welcome, and the content is
> > good. I had opened this file a few days ago searching for a few words on
> > address filtering, but alas, there were none. And even now, with your
> > addition - there is something, but it's more focused on multicast, and I
> > haven't used that nearly enough to have a strong opinion about it. Let
> > me try to add my 2 cents about what concerns me on this particular topic.
> > 
> > I'm not asking you to add it to your documentation patch - not that you
> > could even do that, as I'm talking more about how things should be than
> > how they are, things like IVDF aren't even in mainline.
> > 
> > If people agree at least in principle with my words below, I can make
> > the necessary changes to the bridge driver to conform to this
> > interpretation of things.
> > 
> > 
> > Address filtering
> > ^^^^^^^^^^^^^^^^^
> > 
> > With regular network interface cards, address filters are used to drop
> > in hardware the frames that have a destination address different than
> > what the card is configured to perform termination on.
> > 
> > With switchdev, the hardware is usually geared towards accepting traffic
> > regardless of destination MAC address, because the primary objective is
> > forwarding to another host based on that address, and not termination.
> > 
> > Therefore, the address filters of a switchdev interface cannot typically
> > be implemented at the same hardware layer as they are for a regular
> > interface. The behavior as seen by the operating system should, however,
> > be the same.
> > 
> > In the case of a regular NIC, the expectation is that only the frames
> > having a destination that is present in the RX filtering lists (managed
> > through dev_uc_add() and dev_mc_add()) are accepted, while the others
> > are dropped. The filters can be bypassed using the IFF_PROMISC and
> > IFF_ALLMULTI flags.
> > 
> > A switchdev interface that is capable of offloading the Linux bridge
> > should have hardware provisioning for flooding unknown destination
> > addresses and learning from source addresses. Strictly speaking, the
> > hardware design of such an interface should be promiscuous out of
> > necessity: as long as flooding is enabled, hardware promiscuity is
> > implied.
> > 
> > However, this is of no relevance to the operating system. Since flooding
> > and forwarding happen autonomously, it makes no difference to the end
> > result whether the forwarded and flooded addresses are, or aren't,
> > present in the address list of the network interface corresponding to
> > the switchdev port.
> > 
> > To achieve a similar behavior between switchdev and non-switchdev
> > interfaces, address filtering for switchdev can be defined in terms of
> > the frames that the CPU sees.
> 
> Nit here, I don't know if we want to refer to CPU, host or management
> interface of the switch, all terms are IMHO interchangeable and clear
> in the context below, though I wonder what "pure" switchdev drivers
> would prefer to see being used.
> 

I really meant 'CPU' and not 'CPU port'. As in, 'the thing on which
Linux runs'. I can change to 'operating system' if that is clearer.

> > 
> > - Primary MAC address: A driver should deliver to the CPU, and only to
> >   the CPU, for termination purposes, frames having a destination address
> >   that matches the MAC address of the ingress interface.
> 
> Ack. We could go one step further and say that this is the MAC address
> of the Ethernet MAC connected to the CPU port. As we say in French this
> would be busting through an open door.
> 

We should.
This stuff may be basic, but I want it to be very clear.

> > 
> > - Secondary MAC addresses: A driver should deliver to the CPU frames
> >   having a destination address that matches an entry added with
> >   dev_uc_add() or dev_mc_add(). These typically correspond to upper
> >   interfaces configured on top of the switchdev interface, such as
> >   8021q, bridge, macvlan.
> 
> Ack.
> 
> > 
> > - A driver is allowed to not deliver to the CPU frames that don't have a
> >   match in the ingress interface's primary and secondary address lists.
> >   An exception to this rule is when the interface is configured as
> >   promiscuous, or to receive all multicast traffic.
> 
> Ack.
> 
> > 
> > - An interface can be configured as promiscuous when it is required that
> >   the CPU sees frames with an unknown destination (same as in the
> >   non-switchdev case). Otherwise said, promiscuous mode manages the
> >   presence (or the absence) of the CPU in the flooding domain of the
> >   switch. A similar comment applies to IFF_ALLMULTI, although that case
> >   applies only to unknown multicast traffic.
> 
> Ack.
> 
> > 
> > - Other layers of the network stack that actively make use of switchdev
> >   offloads should not request promiscuous mode for the sole purpose of
> >   accepting ingress frames that will end up reinjected in the hardware
> >   data path anyway. The switchdev framework can be considered to offload
> >   the need of promiscuity for this purpose. An example of valid use of
> >   promiscuous mode for a switchdev driver is when it is bridged with a
> >   non-switchdev interface, and the CPU needs to perform termination
> >   (from the hardware's perspective) of unknown-destination traffic, in
> >   order to forward it in software to the other network interfaces.
> 
> Ack.
> 
> > 
> > - If the hardware supports filtering MAC addresses per VLAN domain, then
> >   CPU membership of a VLAN could be managed through IVDF (Individual
> >   Virtual Device Filtering). Namely, the CPU should join the VLAN of all
> >   IVDF addresses in its filter list, and can exit all VLANs that are not
> >   there.
> 
> Ack. A complication can exist if VLAN filtering applies globally to the
> switch and the CPU interface is put in promiscuous mode. We would then
> expect the CPU interface to join all VLANs for the sake of receiving all
> frames.
> 

Yes, if a network interface is part of a switch that has other ports in
a VLAN-aware bridge, then this interface should join all 4096 VLANs when
put in IFF_PROMISC mode, to deactivate the VLAN filtering the hard way,
while also preserving VLAN information in frames sent to the CPU. Note
that there is a risk for untagged frames and pvid-tagged frames to be
indistinguishable from one another if you do that. This is where
installing a reserved VLAN such as 4095, which cannot be sent on the
network, as pvid can come in handy.
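
To make the ambiguity concrete, here is a small Python model. It is only a
sketch: the function name is hypothetical, and it assumes the switch classifies
untagged frames to the port's pvid before handing them to the CPU.

```python
RESERVED_PVID = 4095  # assumption: reserved VID that never appears on the wire


def vid_seen_by_cpu(frame_vid, port_pvid):
    """Return the VID the CPU sees; frame_vid is None for an untagged frame."""
    # Untagged frames are classified to the port's pvid by the switch.
    return port_pvid if frame_vid is None else frame_vid


# With an ordinary pvid of 1, an untagged frame and a frame tagged with
# VID 1 look identical to the CPU.
ambiguous = vid_seen_by_cpu(None, port_pvid=1) == vid_seen_by_cpu(1, port_pvid=1)

# With the reserved pvid, untagged traffic keeps a VID of its own.
distinct = (vid_seen_by_cpu(None, port_pvid=RESERVED_PVID)
            != vid_seen_by_cpu(1, port_pvid=RESERVED_PVID))
```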

> > 
> >>  Documentation/networking/switchdev.rst | 118 +++++++++++++++++++++++++
> >>  1 file changed, 118 insertions(+)
> >>
> >> diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
> >> index ddc3f35775dc..2e4f50e6c63c 100644
> >> --- a/Documentation/networking/switchdev.rst
> >> +++ b/Documentation/networking/switchdev.rst
> >> @@ -385,3 +385,121 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
> >>  NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
> >>  for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
> >>  to know when arp_tbl neighbor entries are purged from the port.
> >> +
> >> +Device driver expected behavior
> >> +-------------------------------
> >> +
> >> +Below is a set of defined behavior that switchdev enabled network devices must
> >> +adhere to.
> >> +
> >> +Configuration less state
> >> +^^^^^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +Upon driver bring up, the network devices must be fully operational, and the
> >> +backing driver must configure the network device such that it is possible to
> >> +send and receive traffic to this network device and it is properly separated
> >> +from other network devices/ports (e.g.: as is frequent with a switch ASIC). How
> >> +this is achieved is heavily hardware dependent, but a simple solution can be to
> >> +use per-port VLAN identifiers unless a better mechanism is available
> >> +(proprietary metadata for each network port for instance).
> >> +
> >> +The network device must be capable of running a full IP protocol stack
> >> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
> >> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
> >> +driver must effectively be configured in a similar fashion to what it would do
> >> +when IGMP snooping is enabled for IP multicast over these switchdev network
> >> +devices and unsolicited multicast must be filtered as early as possible into
> >> +the hardware.
> >> +
> >> +When configuring VLANs on top of the network device, all VLANs must be working,
> >> +irrespective of the state of other network devices (e.g.: other ports being part
> >> +of a VLAN aware bridge doing ingress VID checking). See below for details.
> >> +
> >> +If the device implements e.g.: VLAN filtering, putting the interface in
> >> +promiscuous mode should allow the reception of all VLAN tags (including those
> >> +not present in the filter(s)).
> >> +
> >> +Bridged switch ports
> >> +^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +When a switchdev enabled network device is added as a bridge member, it should
> >> +not disrupt any functionality of non-bridged network devices and they
> >> +should continue to behave as normal network devices. Depending on the bridge
> >> +configuration knobs below, the expected behavior is documented.
> >> +
> >> +Bridge VLAN filtering
> >> +^^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
> >> +run time) which must be observed by the underlying switchdev network
> > 
> > s/compile and run time/statically, at interface creation time, and dynamically/
> 
> Thanks.
> 
> > 
> >> +device/hardware:
> >> +
> >> +- with VLAN filtering turned off: the bridge is strictly VLAN unaware and its
> >> +  data path will only process untagged Ethernet frames. Frames ingressing the
> >> +  device with a VID that is not programmed into the bridge/switch's VLAN table
> >> +  must be forwarded and may be processed using a VLAN device (see below).
> >> +
> >> +- with VLAN filtering turned on: the bridge is VLAN aware and frames ingressing
> >> +  the device with a VID that is not programmed into the bridges/switch's VLAN
> >> +  table must be dropped (strict VID checking).
> >> +
> >> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> >> +way by the enabling of VLAN filtering on the bridge device(s).
> >> +
> >> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> >> +which is a bridge port member must also observe the following behavior:
> >> +
> >> +- with VLAN filtering turned off, enslaving VLAN devices into the bridge might
> >> +  be allowed provided that there is sufficient separation using e.g.: a
> >> +  reserved VLAN ID (4095 for instance) for untagged traffic. The VLAN data path
> >> +  is used to pop/push the VLAN tag such that the bridge's data path only
> >> +  processes untagged traffic.
> >> +
> > 
> > Why does the bridge's data path only process untagged traffic?
> > It should process frames that are untagged or have a VLAN ID which does
> > not match the VLAN ID of any 8021q upper of the ingress interface.
> > Which brings me to the question: how is a VLAN frame having an unknown
> > (to 8021q) VLAN ID going to be treated by such a switchdev interface? It
> > should accept it. Will it? Well, I don't really understand the advice
> > given here, about the separation, and how does the pvid of 4095 help
> > with frames that are already VLAN-tagged.
> 
> I was trying to capture a discussion Ido and I had on IRC a while ago.
> He clarified that the VLAN-unaware bridge should only see untagged
> traffic within its data path. To answer your question, a VLAN frame with
> an unknown VID may be accepted by the switch hardware, but should not be
> delivered to the bridge data path because there is no software VLAN to
> process that VID.
> 

In my understanding it is the exact opposite: a VLAN frame is delivered
to the bridge data path _only_if_ there is no software VLAN to consume
it. At least, this is what is happening with software-only interfaces.
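
A minimal Python model of that ordering, as I understand it for software-only
interfaces (the function and the returned labels are made up for illustration,
this is not the kernel receive path itself):

```python
def rx_target(frame_vid, vlan_uppers, bridged):
    """Pick the software consumer of a frame received on a port.

    frame_vid   -- VID of the frame, or None if untagged
    vlan_uppers -- set of VIDs that have an 8021q upper on the port, e.g. {100}
    bridged     -- True if the port itself is a bridge member
    """
    if frame_vid is not None and frame_vid in vlan_uppers:
        return "vlan"    # consumed by the matching VLAN device, e.g. sw0p1.100
    if bridged:
        return "bridge"  # untagged, or tagged with a VID no 8021q upper claims
    return "port"        # plain standalone port
```

With sw0p1 bridged and sw0p1.100 present, rx_target(100, {100}, True) gives
"vlan", while rx_target(200, {100}, True) gives "bridge".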

> The advice regarding separation is about the following use case: we have
> two physical switch ports sw0p0 and sw0p1. sw0p0.100 is created to
> terminate VID 100 tagged, and sw0p1.100 is created to terminate VID 100
> tagged as well.
> 
> sw0p0 is added to a bridge, and so is sw0p1.100, it seems to me that
> sw0p0.100 and sw0p1.100 should still be separate because they are not
> part of the same broadcast domain. One port (sw0p0) is part of the
> bridge, whereas the other (sw0p1) is not. Without a FID or internal
> double tagging, I am not sure how you can maintain that separation.
> 
> Maybe this is not worth mentioning, or maybe I am wrong, having some
> feedback would be welcome here.
> 

No, I think it's definitely worth mentioning, corner cases are always
the trickiest.

If we interpret an 8021q upper as "deliver this VLAN only to the CPU,
extract it from the hardware data path", then we're ok, given that we
can satisfy that request. Both sw0p0.100 and sw0p1.100 are delivered to
the CPU, where they are isolated enough that they are not going to be
software-bridged. That is, _if_ sw0p0.100 exists. In this model, we
might end up having to create it, just in order to maintain the
isolation.

By the way, I think that with the current model, offloading more fluid
setups like this sw0p0 <-> sw0p1.100 scenario is going to be a little
tricky. Maybe it would be wiser to simply bridge sw0p0 and sw0p1, and
add a 'matchall action vlan push' to the egress qdisc of sw0p1 and a
'flower protocol 802.1Q vlan_id 100 action vlan pop' to its ingress
qdisc. This lends itself a lot better to offloading.

> > 
> >> +- with VLAN filtering turned on, these VLAN devices can be created as long as
> >> +  there is not an existing VLAN entry into the bridge with an identical VID and
> >> +  port membership. These VLAN devices cannot be enslaved into the bridge
> >> +  because they duplicate functionality/use case with the bridge's VLAN data path
> >> +  processing.
> >> +
> > 
> > The way I visualize things for myself, it's not so much that the bridge
> > and 8021q modules are duplicating functionality, but rather that the
> > requirements are contradictory. 'bridge vlan add ...' wants to configure
> > the forwarding data path, while 'ip link add link ... type vlan' wants
> > to steal frames from the data path and deliver them to the CPU.
> 
> Yes, that is a good way to look at it. With a VLAN aware bridge you can
> terminate VLAN traffic at the bridge level too, if your bridge master is
> also part of the VLAN group, which is why I felt that explaining that
> would be necessary.
> 

Correct. I wasn't thinking of the 'bridge fdb add dev br0
00:01:02:03:04:05 self master' and 'bridge vlan add dev swp0 vid 101
master' cases, but this is a good point. These addresses should be sent
upstream, towards the CPU. And a VLAN that is added 'in bulk' without a
specific IVDF address, such as one added through 'bridge vlan ..
master', should also contribute to the flood mask of the CPU.
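
Putting this together with the address filtering rules quoted earlier in this
mail, the per-frame decision could be modeled roughly as below. This is a Python
sketch under my own simplifications: the names are hypothetical, MAC addresses
are lowercase colon-separated strings, promiscuous mode is treated as "deliver
everything", and broadcast and per-VLAN (IVDF) filtering are not modeled.

```python
def is_multicast(mac):
    # The least significant bit of the first octet marks a group address.
    return int(mac.split(":")[0], 16) & 1 == 1


def deliver_to_cpu(dst, primary, secondary, promisc=False, allmulti=False):
    """Decide whether a frame with destination dst should reach the CPU."""
    if promisc:
        return True   # CPU is part of the flooding domain
    if dst == primary or dst in secondary:
        return True   # primary MAC, or added via dev_uc_add()/dev_mc_add()
    if allmulti and is_multicast(dst):
        return True   # IFF_ALLMULTI: unknown multicast is also delivered
    return False      # the driver is allowed not to deliver this frame


PRIMARY = "00:01:02:03:04:05"
SECONDARY = {"00:01:02:03:04:06", "01:00:5e:00:00:fb"}
```

For example, deliver_to_cpu("00:aa:bb:cc:dd:ee", PRIMARY, SECONDARY) is False,
and becomes True once promisc=True is passed.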

> > 
> >> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
> >> +must be able to re-configure the underlying hardware on the fly to honor the
> >> +toggling of that option and behave appropriately.
> >> +
> >> +A switchdev driver can also refuse to support dynamic toggling of the VLAN
> >> +filtering knob at runtime and require a destruction of the bridge device(s) and
> >> +creation of new bridge device(s) with a different VLAN filtering value to
> >> +ensure VLAN awareness is pushed down to the HW.
> >> +
> >> +Finally, even when VLAN filtering in the bridge is turned off, the underlying
> >> +switch hardware and driver may still configure itself in a VLAN aware mode
> >> +provided that the behavior described above is observed.
> >> +
> > 
> > Otherwise stated: VLAN filtering shall be considered from the
> > perspective of observable behavior, and not from the perspective of
> > hardware configuration.
> 
> Yes, that is clearer.
> 
> > 
> >> +Bridge IGMP snooping
> >> +^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +The Linux bridge allows the configuration of IGMP snooping (compile and run
> >> +time) which must be observed by the underlying switchdev network device/hardware
> > 
> > Same comment about "compile and run time" as above.
> > 
> >> +in the following way:
> >> +
> >> +- when IGMP snooping is turned off, multicast traffic must be flooded to all
> >> +  switch ports within the same broadcast domain. The CPU/management port
> > 
> > I think that if mc_disabled == true, multicast should be flooded only to
> > the ports which have mc_flood == true.
> 
> OK.
> 
> > 
> >> +  should ideally not be flooded and continue to learn multicast traffic through
> > 
> > unless the ingress interface has IFF_ALLMULTI or IFF_PROMISC, then I
> > suppose the CPU should be flooded from that particular port.
> 
> Yes indeed.
> 
> > 
> >> +  the network stack notifications. If the hardware is not capable of doing that
> >> +  then the CPU/management port must also be flooded and multicast filtering
> >> +  happens in software.
> >> +
> >> +- when IGMP snooping is turned on, multicast traffic must selectively flow
> >> +  to the appropriate network ports (including CPU/management port) and must
> >> +  not be flooded unnecessarily.
> >> +
> > 
> > I believe that when mc_disabled == false, unknown multicast should be
> > flooded only to the ports connected to a multicast router. The local
> > device may also act as a multicast router.
> 
> OK that makes sense.
> 
> > 
> >> +The switch must adhere to RFC 4541 and flood multicast traffic accordingly
> >> +since that is what the Linux bridge implementation does.
> >> +
> > 
> > I have a lot of questions in this area.
> > Mainly, what should a driver do if the hardware can't parse IGMP/MLD but
> > can only route by (maskable) layer 2 destination address?
> 
> Which hardware would fall in that category? Would that be sja1105 for
> instance?
> -- 
> Florian

Yes, that would be sja1105, although I'm sure it isn't the only one to
fall into this category.

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2020-07-23 22:11 ` Vladimir Oltean
@ 2020-07-23 22:58   ` Florian Fainelli
  2020-07-24  0:43     ` Vladimir Oltean
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Fainelli @ 2020-07-23 22:58 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, davem, andrew, vivien.didelot, cphealy, idosch, jiri,
	bridge, nikolay, roopa, rdunlap, ilias.apalodimas,
	ivan.khoronzhuk, kuba

On 7/23/20 3:11 PM, Vladimir Oltean wrote:
> On Wed, Jul 22, 2020 at 03:52:53PM -0700, Florian Fainelli wrote:
>> This patch provides details on the expected behavior of switchdev
>> enabled network devices when operating in a "stand alone" mode, as well
>> as when being bridge members. This clarifies a number of things that
>> recently came up during a bug fixing session on the b53 DSA switch
>> driver.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Since it has been a while since this was submitted, removing the patch numbering,
>> but previous patches and discussions can be found here:
>>
>> http://patchwork.ozlabs.org/project/netdev/patch/20190103224702.21541-1-f.fainelli@gmail.com/
>> http://patchwork.ozlabs.org/project/netdev/patch/20190109043930.8534-1-f.fainelli@gmail.com/
>> http://patchwork.ozlabs.org/project/netdev/patch/20190110193206.9872-1-f.fainelli@gmail.com/
>>
>> David, I would like to hear from Vladimir and Ido specifically to make
>> sure that the documentation is up to date with expectations or desired
>> behavior so we can move forward with Vladimir's DSA rx filtering patch
>> series. So don't apply this just yet ;)
>>
>> Thanks!
>>
> 
> Thanks for giving me the opportunity to speak up.
> 
> Your addition to switchdev.rst is more than welcome, and the content is
> good. I had opened this file a few days ago searching for a few words on
> address filtering, but alas, there were none. And even now, with your
> addition - there is something, but it's more focused on multicast, and I
> haven't used that nearly enough to have a strong opinion about it. Let
> me try to add my 2 cents about what concerns me on this particular topic.
> 
> I'm not asking you to add it to your documentation patch - not that you
> could even do that, as I'm talking more about how things should be than
> how they are, things like IVDF aren't even in mainline.
> 
> If people agree at least in principle with my words below, I can make
> the necessary changes to the bridge driver to conform to this
> interpretation of things.
> 
> 
> Address filtering
> ^^^^^^^^^^^^^^^^^
> 
> With regular network interface cards, address filters are used to drop
> in hardware the frames that have a destination address different than
> what the card is configured to perform termination on.
> 
> With switchdev, the hardware is usually geared towards accepting traffic
> regardless of destination MAC address, because the primary objective is
> forwarding to another host based on that address, and not termination.
> 
> Therefore, the address filters of a switchdev interface cannot typically
> be implemented at the same hardware layer as they are for a regular
> interface. The behavior as seen by the operating system should, however,
> be the same.
> 
> In the case of a regular NIC, the expectation is that only the frames
> having a destination that is present in the RX filtering lists (managed
> through dev_uc_add() and dev_mc_add()) are accepted, while the others
> are dropped. The filters can be bypassed using the IFF_PROMISC and
> IFF_ALLMULTI flags.
> 
> A switchdev interface that is capable of offloading the Linux bridge
> should have hardware provisioning for flooding unknown destination
> addresses and learning from source addresses. Strictly speaking, the
> hardware design of such an interface should be promiscuous out of
> necessity: as long as flooding is enabled, hardware promiscuity is
> implied.
> 
> However, this is of no relevance to the operating system. Since flooding
> and forwarding happen autonomously, it makes no difference to the end
> result whether the forwarded and flooded addresses are, or aren't,
> present in the address list of the network interface corresponding to
> the switchdev port.
> 
> To achieve a similar behavior between switchdev and non-switchdev
> interfaces, address filtering for switchdev can be defined in terms of
> the frames that the CPU sees.

Nit here: I don't know if we want to refer to the CPU, host or management
interface of the switch. All terms are IMHO interchangeable and clear
in the context below, though I wonder what "pure" switchdev drivers
would prefer to see being used.

> 
> - Primary MAC address: A driver should deliver to the CPU, and only to
>   the CPU, for termination purposes, frames having a destination address
>   that matches the MAC address of the ingress interface.

Ack. We could go one step further and say that this is the MAC address
of the Ethernet MAC connected to the CPU port. As we say in French this
would be busting through an open door.

> 
> - Secondary MAC addresses: A driver should deliver to the CPU frames
>   having a destination address that matches an entry added with
>   dev_uc_add() or dev_mc_add(). These typically correspond to upper
>   interfaces configured on top of the switchdev interface, such as
>   8021q, bridge, macvlan.

Ack.

> 
> - A driver is allowed to not deliver to the CPU frames that don't have a
>   match in the ingress interface's primary and secondary address lists.
>   An exception to this rule is when the interface is configured as
>   promiscuous, or to receive all multicast traffic.

Ack.

> 
> - An interface can be configured as promiscuous when it is required that
>   the CPU sees frames with an unknown destination (same as in the
>   non-switchdev case). Otherwise said, promiscuous mode manages the
>   presence (or the absence) of the CPU in the flooding domain of the
>   switch. A similar comment applies to IFF_ALLMULTI, although that case
>   applies only to unknown multicast traffic.

Ack.

> 
> - Other layers of the network stack that actively make use of switchdev
>   offloads should not request promiscuous mode for the sole purpose of
>   accepting ingress frames that will end up reinjected in the hardware
>   data path anyway. The switchdev framework can be considered to offload
>   the need of promiscuity for this purpose. An example of valid use of
>   promiscuous mode for a switchdev driver is when it is bridged with a
>   non-switchdev interface, and the CPU needs to perform termination
>   (from the hardware's perspective) of unknown-destination traffic, in
>   order to forward it in software to the other network interfaces.

Ack.

> 
> - If the hardware supports filtering MAC addresses per VLAN domain, then
>   CPU membership of a VLAN could be managed through IVDF (Individual
>   Virtual Device Filtering). Namely, the CPU should join the VLAN of all
>   IVDF addresses in its filter list, and can exit all VLANs that are not
>   there.

Ack. A complication can exist if VLAN filtering applies globally to the
switch and the CPU interface is put in promiscuous mode. We would then
expect the CPU interface to join all VLANs for the sake of receiving all
frames.
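
To make the IVDF rule and the promiscuous complication concrete, here is a
minimal user-space sketch. It is only a model under stated assumptions: IVDF
is not in mainline, and struct vlan_addr and cpu_vlan_membership() are
hypothetical names invented for illustration, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_N_VID 4096	/* 12-bit VID space; 0 and 4095 are not usable VLANs */

/* Hypothetical per-VLAN filter entry: a MAC address scoped to one VID. */
struct vlan_addr {
	uint8_t mac[6];
	uint16_t vid;
};

/*
 * Model of CPU VLAN membership: the CPU joins the VLAN of every address in
 * its filter list and leaves all other VLANs. In promiscuous mode it joins
 * every usable VLAN so that it can receive all frames.
 */
static void cpu_vlan_membership(const struct vlan_addr *filters, size_t n,
				bool promisc, bool member[VLAN_N_VID])
{
	size_t i;
	unsigned int vid;

	for (vid = 0; vid < VLAN_N_VID; vid++)
		member[vid] = promisc && vid >= 1 && vid <= 4094;

	for (i = 0; i < n; i++)
		if (filters[i].vid >= 1 && filters[i].vid <= 4094)
			member[filters[i].vid] = true;
}
```

The membership is recomputed from scratch on each call, which keeps the model
simple; a real driver would more likely update it incrementally.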

> 
>>  Documentation/networking/switchdev.rst | 118 +++++++++++++++++++++++++
>>  1 file changed, 118 insertions(+)
>>
>> diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
>> index ddc3f35775dc..2e4f50e6c63c 100644
>> --- a/Documentation/networking/switchdev.rst
>> +++ b/Documentation/networking/switchdev.rst
>> @@ -385,3 +385,121 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
>>  NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
>>  for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
>>  to know when arp_tbl neighbor entries are purged from the port.
>> +
>> +Device driver expected behavior
>> +-------------------------------
>> +
>> +Below is a set of defined behavior that switchdev enabled network devices must
>> +adhere to.
>> +
>> +Configuration less state
>> +^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Upon driver bring up, the network devices must be fully operational, and the
>> +backing driver must configure the network device such that it is possible to
>> +send and receive traffic to this network device and it is properly separated
>> +from other network devices/ports (e.g.: as is frequent with a switch ASIC). How
>> +this is achieved is heavily hardware dependent, but a simple solution can be to
>> +use per-port VLAN identifiers unless a better mechanism is available
>> +(proprietary metadata for each network port for instance).
>> +
>> +The network device must be capable of running a full IP protocol stack
>> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
>> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
>> +driver must effectively be configured in a similar fashion to what it would do
>> +when IGMP snooping is enabled for IP multicast over these switchdev network
>> +devices and unsolicited multicast must be filtered as early as possible in
>> +the hardware.
>> +
>> +When configuring VLANs on top of the network device, all VLANs must be working,
>> +irrespective of the state of other network devices (e.g.: other ports being part
>> +of a VLAN aware bridge doing ingress VID checking). See below for details.
>> +
>> +If the device implements e.g.: VLAN filtering, putting the interface in
>> +promiscuous mode should allow the reception of all VLAN tags (including those
>> +not present in the filter(s)).
>> +
>> +Bridged switch ports
>> +^^^^^^^^^^^^^^^^^^^^
>> +
>> +When a switchdev enabled network device is added as a bridge member, it should
>> +not disrupt any functionality of non-bridged network devices and they
>> +should continue to behave as normal network devices. Depending on the bridge
>> +configuration knobs below, the expected behavior is documented.
>> +
>> +Bridge VLAN filtering
>> +^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
>> +run time) which must be observed by the underlying switchdev network
> 
> s/compile and run time/statically, at interface creation time, and dynamically/

Thanks.

> 
>> +device/hardware:
>> +
>> +- with VLAN filtering turned off: the bridge is strictly VLAN unaware and its
>> +  data path will only process untagged Ethernet frames. Frames ingressing the
>> +  device with a VID that is not programmed into the bridge/switch's VLAN table
>> +  must be forwarded and may be processed using a VLAN device (see below).
>> +
>> +- with VLAN filtering turned on: the bridge is VLAN aware and frames ingressing
>> +  the device with a VID that is not programmed into the bridge/switch's VLAN
>> +  table must be dropped (strict VID checking).
>> +
>> +Non-bridged network ports of the same switch fabric must not be disturbed in any
>> +way by the enabling of VLAN filtering on the bridge device(s).
>> +
>> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
>> +which is a bridge port member must also observe the following behavior:
>> +
>> +- with VLAN filtering turned off, enslaving VLAN devices into the bridge might
>> +  be allowed provided that there is sufficient separation using e.g.: a
>> +  reserved VLAN ID (4095 for instance) for untagged traffic. The VLAN data path
>> +  is used to pop/push the VLAN tag such that the bridge's data path only
>> +  processes untagged traffic.
>> +
> 
> Why does the bridge's data path only process untagged traffic?
> It should process frames that are untagged or have a VLAN ID which does
> not match the VLAN ID of any 8021q upper of the ingress interface.
> Which brings me to the question: how is a VLAN frame having an unknown
> (to 8021q) VLAN ID going to be treated by such a switchdev interface? It
> should accept it. Will it? Well, I don't really understand the advice
> given here, about the separation, and how does the pvid of 4095 help
> with frames that are already VLAN-tagged.

I was trying to capture a discussion Ido and I had on IRC a while ago.
He clarified that the VLAN-unaware bridge should only see untagged
traffic within its data path. To answer your question, a VLAN frame with
an unknown VID may be accepted by the switch hardware, but should not be
delivered to the bridge data path because there is no software VLAN to
process that VID.

The advice regarding separation is about the following use case: we have
two physical switch ports sw0p0 and sw0p1. sw0p0.100 is created to
terminate VID 100 tagged, and sw0p1.100 is created to terminate VID 100
tagged as well.

sw0p0 is added to a bridge, and so is sw0p1.100. It seems to me that
sw0p0.100 and sw0p1.100 should still be separate because they are not
part of the same broadcast domain. One port (sw0p0) is part of the
bridge, whereas the other (sw0p1) is not. Without a FID or internal
double tagging, I am not sure how you can maintain that separation.

Maybe this is not worth mentioning, or maybe I am wrong; some feedback
would be welcome here.

> 
>> +- with VLAN filtering turned on, these VLAN devices can be created as long as
>> +  there is not an existing VLAN entry in the bridge with an identical VID and
>> +  port membership. These VLAN devices cannot be enslaved into the bridge since
>> +  they duplicate functionality/use case with the bridge's VLAN data path
>> +  processing.
>> +
> 
> The way I visualize things for myself, it's not so much that the bridge
> and 8021q modules are duplicating functionality, but rather that the
> requirements are contradictory. 'bridge vlan add ...' wants to configure
> the forwarding data path, while 'ip link add link ... type vlan' wants
> to steal frames from the data path and deliver them to the CPU.

Yes, that is a good way to look at it. With a VLAN aware bridge you can
terminate VLAN traffic at the bridge level too, if your bridge master is
also part of the VLAN group, which is why I felt that explaining that
would be necessary.

> 
>> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
>> +must be able to re-configure the underlying hardware on the fly to honor the
>> +toggling of that option and behave appropriately.
>> +
>> +A switchdev driver can also refuse to support dynamic toggling of the VLAN
>> +filtering knob at runtime and require a destruction of the bridge device(s) and
>> +creation of new bridge device(s) with a different VLAN filtering value to
>> +ensure VLAN awareness is pushed down to the HW.
>> +
>> +Finally, even when VLAN filtering in the bridge is turned off, the underlying
>> +switch hardware and driver may still configure itself in a VLAN aware mode
>> +provided that the behavior described above is observed.
>> +
> 
> Otherwise stated: VLAN filtering shall be considered from the
> perspective of observable behavior, and not from the perspective of
> hardware configuration.

Yes, that is clearer.

> 
>> +Bridge IGMP snooping
>> +^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Linux bridge allows the configuration of IGMP snooping (compile and run
>> +time) which must be observed by the underlying switchdev network device/hardware
> 
> Same comment about "compile and run time" as above.
> 
>> +in the following way:
>> +
>> +- when IGMP snooping is turned off, multicast traffic must be flooded to all
>> +  switch ports within the same broadcast domain. The CPU/management port
> 
> I think that if mc_disabled == true, multicast should be flooded only to
> the ports which have mc_flood == true.

OK.

> 
>> +  should ideally not be flooded and continue to learn multicast traffic through
> 
> unless the ingress interface has IFF_ALLMULTI or IFF_PROMISC, then I
> suppose the CPU should be flooded from that particular port.

Yes indeed.

> 
>> +  the network stack notifications. If the hardware is not capable of doing that
>> +  then the CPU/management port must also be flooded and multicast filtering
>> +  happens in software.
>> +
>> +- when IGMP snooping is turned on, multicast traffic must selectively flow
>> +  to the appropriate network ports (including CPU/management port) and not be
>> +  unnecessarily flooded.
>> +
> 
> I believe that when mc_disabled == false, unknown multicast should be
> flooded only to the ports connected to a multicast router. The local
> device may also act as a multicast router.

OK that makes sense.

> 
>> +The switch must adhere to RFC 4541 and flood multicast traffic accordingly
>> +since that is what the Linux bridge implementation does.
>> +
> 
> I have a lot of questions in this area.
> Mainly, what should a driver do if the hardware can't parse IGMP/MLD but
> can only route by (maskable) layer 2 destination address?

Which hardware would fall in that category? Would that be sja1105 for
instance?
-- 
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2020-07-22 22:52 Florian Fainelli
  2020-07-23  2:25 ` Randy Dunlap
@ 2020-07-23 22:11 ` Vladimir Oltean
  2020-07-23 22:58   ` Florian Fainelli
  1 sibling, 1 reply; 23+ messages in thread
From: Vladimir Oltean @ 2020-07-23 22:11 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, andrew, vivien.didelot, cphealy, idosch, jiri,
	bridge, nikolay, roopa, rdunlap, ilias.apalodimas,
	ivan.khoronzhuk, kuba

On Wed, Jul 22, 2020 at 03:52:53PM -0700, Florian Fainelli wrote:
> This patch provides details on the expected behavior of switchdev
> enabled network devices when operating in a "stand alone" mode, as well
> as when being bridge members. This clarifies a number of things that
> recently came up during a bug fixing session on the b53 DSA switch
> driver.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Since this has not been submitted in a while, removing the patch numbering,
> but previous patches and discussions can be found here:
> 
> http://patchwork.ozlabs.org/project/netdev/patch/20190103224702.21541-1-f.fainelli@gmail.com/
> http://patchwork.ozlabs.org/project/netdev/patch/20190109043930.8534-1-f.fainelli@gmail.com/
> http://patchwork.ozlabs.org/project/netdev/patch/20190110193206.9872-1-f.fainelli@gmail.com/
> 
> David, I would like to hear from Vladimir and Ido specifically to make
> sure that the documentation is up to date with expectations or desired
> behavior so we can move forward with Vladimir's DSA rx filtering patch
> series. So don't apply this just yet ;)
> 
> Thanks!
> 

Thanks for giving me the opportunity to speak up.

Your addition to switchdev.rst is more than welcome, and the content is
good. I had opened this file a few days ago searching for a few words on
address filtering, but alas, there were none. And even now, with your
addition - there is something, but it's more focused on multicast, and I
haven't used that nearly enough to have a strong opinion about it. Let
me try to add my 2 cents about what concerns me on this particular topic.

I'm not asking you to add it to your documentation patch - not that you
could even do that, as I'm talking more about how things should be than
how they are; things like IVDF aren't even in mainline.

If people agree at least in principle with my words below, I can make
the necessary changes to the bridge driver to conform to this
interpretation of things.


Address filtering
^^^^^^^^^^^^^^^^^

With regular network interface cards, address filters are used to drop
in hardware the frames that have a destination address different than
what the card is configured to perform termination on.

With switchdev, the hardware is usually geared towards accepting traffic
regardless of destination MAC address, because the primary objective is
forwarding to another host based on that address, and not termination.

Therefore, the address filters of a switchdev interface cannot typically
be implemented at the same hardware layer as they are for a regular
interface. The behavior as seen by the operating system should, however,
be the same.

In the case of a regular NIC, the expectation is that only the frames
having a destination that is present in the RX filtering lists (managed
through dev_uc_add() and dev_mc_add()) are accepted, while the others
are dropped. The filters can be bypassed using the IFF_PROMISC and
IFF_ALLMULTI flags.

A switchdev interface that is capable of offloading the Linux bridge
should have hardware provisioning for flooding unknown destination
addresses and learning from source addresses. Strictly speaking, the
hardware design of such an interface should be promiscuous out of
necessity: as long as flooding is enabled, hardware promiscuity is
implied.

However, this is of no relevance to the operating system. Since flooding
and forwarding happen autonomously, it makes no difference to the end
result whether the forwarded and flooded addresses are, or aren't,
present in the address list of the network interface corresponding to
the switchdev port.

To achieve a similar behavior between switchdev and non-switchdev
interfaces, address filtering for switchdev can be defined in terms of
the frames that the CPU sees.

- Primary MAC address: A driver should deliver to the CPU, and only to
  the CPU, for termination purposes, frames having a destination address
  that matches the MAC address of the ingress interface.

- Secondary MAC addresses: A driver should deliver to the CPU frames
  having a destination address that matches an entry added with
  dev_uc_add() or dev_mc_add(). These typically correspond to upper
  interfaces configured on top of the switchdev interface, such as
  8021q, bridge, macvlan.

- A driver is allowed to not deliver to the CPU frames that don't have a
  match in the ingress interface's primary and secondary address lists.
  An exception to this rule is when the interface is configured as
  promiscuous, or to receive all multicast traffic.

- An interface can be configured as promiscuous when it is required that
  the CPU sees frames with an unknown destination (same as in the
  non-switchdev case). Otherwise said, promiscuous mode manages the
  presence (or the absence) of the CPU in the flooding domain of the
  switch. A similar comment applies to IFF_ALLMULTI, although that case
  applies only to unknown multicast traffic.

- Other layers of the network stack that actively make use of switchdev
  offloads should not request promiscuous mode for the sole purpose of
  accepting ingress frames that will end up reinjected in the hardware
  data path anyway. The switchdev framework can be considered to offload
  the need of promiscuity for this purpose. An example of valid use of
  promiscuous mode for a switchdev driver is when it is bridged with a
  non-switchdev interface, and the CPU needs to perform termination
  (from the hardware's perspective) of unknown-destination traffic, in
  order to forward it in software to the other network interfaces.

- If the hardware supports filtering MAC addresses per VLAN domain, then
  CPU membership of a VLAN could be managed through IVDF (Individual
  Virtual Device Filtering). Namely, the CPU should join the VLAN of all
  IVDF addresses in its filter list, and can exit all VLANs that are not
  there.
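
The delivery rules in the list above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not driver
code: struct port_filter and must_deliver_to_cpu() are hypothetical names,
the secondary list stands in for entries added with dev_uc_add() and
dev_mc_add(), and the two flags stand in for IFF_PROMISC and IFF_ALLMULTI.
Broadcast gets no special treatment here because the list does not mention it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical model of the RX filter state of one switchdev port. */
struct port_filter {
	unsigned char primary[ETH_ALEN];		/* interface MAC address */
	const unsigned char (*secondary)[ETH_ALEN];	/* uc/mc list entries */
	size_t num_secondary;
	bool promisc;					/* models IFF_PROMISC */
	bool allmulti;					/* models IFF_ALLMULTI */
};

static bool is_multicast(const unsigned char *addr)
{
	return addr[0] & 0x01;	/* group bit of the first octet */
}

/*
 * Returns true when the rules require delivery to the CPU. A false result
 * means the driver is allowed, but not obliged, to keep the frame away
 * from the CPU.
 */
static bool must_deliver_to_cpu(const struct port_filter *f,
				const unsigned char *dst)
{
	size_t i;

	if (memcmp(dst, f->primary, ETH_ALEN) == 0)
		return true;

	for (i = 0; i < f->num_secondary; i++)
		if (memcmp(dst, f->secondary[i], ETH_ALEN) == 0)
			return true;

	if (f->promisc)
		return true;

	return f->allmulti && is_multicast(dst);
}
```

The point of the model is that the decision depends only on what the CPU
sees, regardless of how much traffic the hardware forwards or floods
autonomously.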

>  Documentation/networking/switchdev.rst | 118 +++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
> 
> diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
> index ddc3f35775dc..2e4f50e6c63c 100644
> --- a/Documentation/networking/switchdev.rst
> +++ b/Documentation/networking/switchdev.rst
> @@ -385,3 +385,121 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
>  NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
>  for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
>  to know when arp_tbl neighbor entries are purged from the port.
> +
> +Device driver expected behavior
> +-------------------------------
> +
> +Below is a set of defined behavior that switchdev enabled network devices must
> +adhere to.
> +
> +Configuration less state
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Upon driver bring up, the network devices must be fully operational, and the
> +backing driver must configure the network device such that it is possible to
> +send and receive traffic to this network device and it is properly separated
> +from other network devices/ports (e.g.: as is frequent with a switch ASIC). How
> +this is achieved is heavily hardware dependent, but a simple solution can be to
> +use per-port VLAN identifiers unless a better mechanism is available
> +(proprietary metadata for each network port for instance).
> +
> +The network device must be capable of running a full IP protocol stack
> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
> +driver must effectively be configured in a similar fashion to what it would do
> +when IGMP snooping is enabled for IP multicast over these switchdev network
> +devices and unsolicited multicast must be filtered as early as possible in
> +the hardware.
> +
> +When configuring VLANs on top of the network device, all VLANs must be working,
> +irrespective of the state of other network devices (e.g.: other ports being part
> +of a VLAN aware bridge doing ingress VID checking). See below for details.
> +
> +If the device implements e.g.: VLAN filtering, putting the interface in
> +promiscuous mode should allow the reception of all VLAN tags (including those
> +not present in the filter(s)).
> +
> +Bridged switch ports
> +^^^^^^^^^^^^^^^^^^^^
> +
> +When a switchdev enabled network device is added as a bridge member, it should
> +not disrupt any functionality of non-bridged network devices and they
> +should continue to behave as normal network devices. Depending on the bridge
> +configuration knobs below, the expected behavior is documented.
> +
> +Bridge VLAN filtering
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
> +run time) which must be observed by the underlying switchdev network

s/compile and run time/statically, at interface creation time, and dynamically/

> +device/hardware:
> +
> +- with VLAN filtering turned off: the bridge is strictly VLAN unaware and its
> +  data path will only process untagged Ethernet frames. Frames ingressing the
> +  device with a VID that is not programmed into the bridge/switch's VLAN table
> +  must be forwarded and may be processed using a VLAN device (see below).
> +
> +- with VLAN filtering turned on: the bridge is VLAN aware and frames ingressing
> +  the device with a VID that is not programmed into the bridge/switch's VLAN
> +  table must be dropped (strict VID checking).
> +
> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> +way by the enabling of VLAN filtering on the bridge device(s).
> +
> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> +which is a bridge port member must also observe the following behavior:
> +
> +- with VLAN filtering turned off, enslaving VLAN devices into the bridge might
> +  be allowed provided that there is sufficient separation using e.g.: a
> +  reserved VLAN ID (4095 for instance) for untagged traffic. The VLAN data path
> +  is used to pop/push the VLAN tag such that the bridge's data path only
> +  processes untagged traffic.
> +

Why does the bridge's data path only process untagged traffic?
It should process frames that are untagged or have a VLAN ID which does
not match the VLAN ID of any 8021q upper of the ingress interface.
Which brings me to the question: how is a VLAN frame having an unknown
(to 8021q) VLAN ID going to be treated by such a switchdev interface? It
should accept it. Will it? Well, I don't really understand the advice
given here, about the separation, and how does the pvid of 4095 help
with frames that are already VLAN-tagged.
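
The steering described above (frames go to the bridge data path unless
their VLAN ID matches an 8021q upper of the ingress interface) can be
sketched as follows. This is a user-space model of that interpretation only;
enum rx_path, struct rx_frame and classify_rx() are hypothetical names, and
the sketch says nothing about how the hardware would implement it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum rx_path {
	RX_TO_VLAN_UPPER,	/* terminated by an 8021q upper */
	RX_TO_BRIDGE,		/* handled by the VLAN-unaware bridge */
};

struct rx_frame {
	bool tagged;
	uint16_t vid;		/* only meaningful when tagged is true */
};

/*
 * upper_vids lists the VLAN IDs of the 8021q uppers configured on the
 * ingress interface. Untagged frames, and tagged frames whose VID has no
 * matching upper, are left to the bridge data path.
 */
static enum rx_path classify_rx(const struct rx_frame *frame,
				const uint16_t *upper_vids, size_t num_uppers)
{
	size_t i;

	if (frame->tagged) {
		for (i = 0; i < num_uppers; i++)
			if (upper_vids[i] == frame->vid)
				return RX_TO_VLAN_UPPER;
	}

	return RX_TO_BRIDGE;
}
```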

> +- with VLAN filtering turned on, these VLAN devices can be created as long as
> +  there is not an existing VLAN entry in the bridge with an identical VID and
> +  port membership. These VLAN devices cannot be enslaved into the bridge since
> +  they duplicate functionality/use case with the bridge's VLAN data path
> +  processing.
> +

The way I visualize things for myself, it's not so much that the bridge
and 8021q modules are duplicating functionality, but rather that the
requirements are contradictory. 'bridge vlan add ...' wants to configure
the forwarding data path, while 'ip link add link ... type vlan' wants
to steal frames from the data path and deliver them to the CPU.

> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
> +must be able to re-configure the underlying hardware on the fly to honor the
> +toggling of that option and behave appropriately.
> +
> +A switchdev driver can also refuse to support dynamic toggling of the VLAN
> +filtering knob at runtime and require a destruction of the bridge device(s) and
> +creation of new bridge device(s) with a different VLAN filtering value to
> +ensure VLAN awareness is pushed down to the HW.
> +
> +Finally, even when VLAN filtering in the bridge is turned off, the underlying
> +switch hardware and driver may still configure itself in a VLAN aware mode
> +provided that the behavior described above is observed.
> +

Otherwise stated: VLAN filtering shall be considered from the
perspective of observable behavior, and not from the perspective of
hardware configuration.

> +Bridge IGMP snooping
> +^^^^^^^^^^^^^^^^^^^^
> +
> +The Linux bridge allows the configuration of IGMP snooping (compile and run
> +time) which must be observed by the underlying switchdev network device/hardware

Same comment about "compile and run time" as above.

> +in the following way:
> +
> +- when IGMP snooping is turned off, multicast traffic must be flooded to all
> +  switch ports within the same broadcast domain. The CPU/management port

I think that if mc_disabled == true, multicast should be flooded only to
the ports which have mc_flood == true.

> +  should ideally not be flooded and continue to learn multicast traffic through

unless the ingress interface has IFF_ALLMULTI or IFF_PROMISC, then I
suppose the CPU should be flooded from that particular port.

> +  the network stack notifications. If the hardware is not capable of doing that
> +  then the CPU/management port must also be flooded and multicast filtering
> +  happens in software.
> +
> +- when IGMP snooping is turned on, multicast traffic must selectively flow
> +  to the appropriate network ports (including CPU/management port) and not be
> +  unnecessarily flooded.
> +

I believe that when mc_disabled == false, unknown multicast should be
flooded only to the ports connected to a multicast router. The local
device may also act as a multicast router.
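
The flooding comments above can be summarized in a short user-space sketch.
It models only the behavior suggested in this mail, not necessarily what the
bridge does today. The names mc_disabled and mc_flood follow the wording
used above; struct mc_port, struct mc_bridge, the mrouter field and
unknown_mc_egress_mask() are hypothetical names for illustration. The CPU
port is left out of the model.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 32

struct mc_port {
	bool mc_flood;	/* port accepts flooded multicast */
	bool mrouter;	/* a multicast router is attached to the port */
};

struct mc_bridge {
	bool mc_disabled;	/* true when snooping is turned off */
	unsigned int num_ports;
	struct mc_port ports[MAX_PORTS];
};

/*
 * Returns a bitmask of egress ports for an unknown multicast frame that
 * ingressed on port `in`:
 *  - snooping off: only ports with mc_flood set
 *  - snooping on:  only ports connected to a multicast router
 */
static uint32_t unknown_mc_egress_mask(const struct mc_bridge *br,
				       unsigned int in)
{
	uint32_t mask = 0;
	unsigned int p;

	for (p = 0; p < br->num_ports && p < MAX_PORTS; p++) {
		const struct mc_port *port = &br->ports[p];

		if (p == in)
			continue;	/* never send back to the ingress port */

		if (br->mc_disabled ? port->mc_flood : port->mrouter)
			mask |= UINT32_C(1) << p;
	}

	return mask;
}
```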

> +The switch must adhere to RFC 4541 and flood multicast traffic accordingly
> +since that is what the Linux bridge implementation does.
> +

I have a lot of questions in this area.
Mainly, what should a driver do if the hardware can't parse IGMP/MLD but
can only route by (maskable) layer 2 destination address?

> +Because IGMP snooping can be turned on/off at runtime, the switchdev driver
> +must be able to re-configure the underlying hardware on the fly to honor the
> +toggling of that option and behave appropriately.
> +
> +A switchdev driver can also refuse to support dynamic toggling of the multicast
> +snooping knob at runtime and require the destruction of the bridge device(s)
> +and creation of new bridge device(s) with a different multicast snooping
> +value.
> -- 
> 2.17.1
> 

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
  2020-07-22 22:52 Florian Fainelli
@ 2020-07-23  2:25 ` Randy Dunlap
  2020-07-23 22:11 ` Vladimir Oltean
  1 sibling, 0 replies; 23+ messages in thread
From: Randy Dunlap @ 2020-07-23  2:25 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: davem, andrew, vivien.didelot, cphealy, idosch, jiri, bridge,
	nikolay, roopa, ilias.apalodimas, ivan.khoronzhuk, olteanv, kuba

Hi,

This mostly looks good to me. I have a few edits below.


On 7/22/20 3:52 PM, Florian Fainelli wrote:
> This patch provides details on the expected behavior of switchdev
> enabled network devices when operating in a "stand alone" mode, as well
> as when being bridge members. This clarifies a number of things that
> recently came up during a bug fixing session on the b53 DSA switch
> driver.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---

>  Documentation/networking/switchdev.rst | 118 +++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
> 
> diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
> index ddc3f35775dc..2e4f50e6c63c 100644
> --- a/Documentation/networking/switchdev.rst
> +++ b/Documentation/networking/switchdev.rst
> @@ -385,3 +385,121 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
>  NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
>  for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
>  to know when arp_tbl neighbor entries are purged from the port.
> +
> +Device driver expected behavior
> +-------------------------------
> +
> +Below is a set of defined behavior that switchdev enabled network devices must
> +adhere to.
> +
> +Configuration less state

   Configuration-less state

> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Upon driver bring up, the network devices must be fully operational, and the
> +backing driver must configure the network device such that it is possible to
> +send and receive traffic to this network device and it is properly separated
> +from other network devices/ports (e.g.: as is frequent with a switch ASIC). How
> +this is achieved is heavily hardware dependent, but a simple solution can be to
> +use per-port VLAN identifiers unless a better mechanism is available
> +(proprietary metadata for each network port for instance).
> +
> +The network device must be capable of running a full IP protocol stack
> +including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
> +appropriate filters for VLAN, multicast, unicast etc. The underlying device
> +driver must effectively be configured in a similar fashion to what it would do
> +when IGMP snooping is enabled for IP multicast over these switchdev network
> +devices and unsolicited multicast must be filtered as early as possible into
> +the hardware.
> +
> +When configuring VLANs on top of the network device, all VLANs must be working,
> +irrespective of the state of other network devices (e.g.: other ports being part
> +of a VLAN aware bridge doing ingress VID checking). See below for details.

        VLAN-aware

> +
> +If the device implements e.g.: VLAN filtering, putting the interface in
> +promiscuous mode should allow the reception of all VLAN tags (including those
> +not present in the filter(s)).
> +
> +Bridged switch ports
> +^^^^^^^^^^^^^^^^^^^^
> +
> +When a switchdev enabled network device is added as a bridge member, it should

          switchdev-enabled

> +not disrupt any functionality of non-bridged network devices and they
> +should continue to behave as normal network devices. Depending on the bridge
> +configuration knobs below, the expected behavior is documented.
> +
> +Bridge VLAN filtering
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +The Linux bridge allows the configuration of a VLAN filtering mode (compile and
> +run time) which must be observed by the underlying switchdev network
> +device/hardware:
> +
> +- with VLAN filtering turned off: the bridge is strictly VLAN unaware and its
> +  data path will only process untagged Ethernet frames. Frames ingressing the
> +  device with a VID that is not programmed into the bridge/switch's VLAN table
> +  must be forwarded and may be processed using a VLAN device (see below).
> +
> +- with VLAN filtering turned on: the bridge is VLAN aware and frames ingressing
> +  the device with a VID that is not programmed into the bridges/switch's VLAN
> +  table must be dropped (strict VID checking).
> +
> +Non-bridged network ports of the same switch fabric must not be disturbed in any
> +way by the enabling of VLAN filtering on the bridge device(s).
> +
> +VLAN devices configured on top of a switchdev network device (e.g: sw0p1.100)
> +which is a bridge port member must also observe the following behavior:
> +
> +- with VLAN filtering turned off, enslaving VLAN devices into the bridge might
> +  be allowed provided that there is sufficient separation using e.g.: a
> +  reserved VLAN ID (4095 for instance) for untagged traffic. The VLAN data path
> +  is used to pop/push the VLAN tag such that the bridge's data path only
> +  processes untagged traffic.
> +
> +- with VLAN filtering turned on, these VLAN devices can be created as long as
> +  there is not an existing VLAN entry into the bridge with an identical VID and
> +  port membership. These VLAN devices cannot be enslaved into the bridge since
> +  because they duplicate functionality/use case with the bridge's VLAN data path

drop one of: since / because

> +  processing.
> +
> +Because VLAN filtering can be turned on/off at runtime, the switchdev driver
> +must be able to re-configure the underlying hardware on the fly to honor the

                   reconfigure

> +toggling of that option and behave appropriately.
> +
> +A switchdev driver can also refuse to support dynamic toggling of the VLAN
> +filtering knob at runtime and require a destruction of the bridge device(s) and
> +creation of new bridge device(s) with a different VLAN filtering value to
> +ensure VLAN awareness is pushed down to the HW.

              (preferably)                     hardware.

> +
> +Finally, even when VLAN filtering in the bridge is turned off, the underlying
> +switch hardware and driver may still configured itself in a VLAN aware mode

                                        configure              VLAN-aware

> +provided that the behavior described above is observed.
> +
> +Bridge IGMP snooping
> +^^^^^^^^^^^^^^^^^^^^
> +
> +The Linux bridge allows the configuration of IGMP snooping (compile and run
> +time) which must be observed by the underlying switchdev network device/hardware
> +in the following way:
> +
> +- when IGMP snooping is turned off, multicast traffic must be flooded to all
> +  switch ports within the same broadcast domain. The CPU/management port
> +  should ideally not be flooded and continue to learn multicast traffic through
> +  the network stack notifications. If the hardware is not capable of doing that
> +  then the CPU/management port must also be flooded and multicast filtering
> +  happens in software.
> +
> +- when IGMP snooping is turned on, multicast traffic must selectively flow
> +  to the appropriate network ports (including CPU/management port) and not be
> +  unnecessarily flooding.
> +
> +The switch must adhere to RFC 4541 and flood multicast traffic accordingly
> +since that is what the Linux bridge implementation does.
> +
> +Because IGMP snooping can be turned on/off at runtime, the switchdev driver
> +must be able to re-configure the underlying hardware on the fly to honor the

                   reconfigure

> +toggling of that option and behave appropriately.
> +
> +A switchdev driver can also refuse to support dynamic toggling of the multicast
> +snooping knob at runtime and require the destruction of the bridge device(s)
> +and creation of a new bridge device(s) with a different multicast snooping
> +value.


thanks.
-- 
~Randy



* [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior
@ 2020-07-22 22:52 Florian Fainelli
  2020-07-23  2:25 ` Randy Dunlap
  2020-07-23 22:11 ` Vladimir Oltean
  0 siblings, 2 replies; 23+ messages in thread
From: Florian Fainelli @ 2020-07-22 22:52 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, davem, andrew, vivien.didelot, cphealy, idosch,
	jiri, bridge, nikolay, roopa, rdunlap, ilias.apalodimas,
	ivan.khoronzhuk, olteanv, kuba

This patch provides details on the expected behavior of switchdev-enabled
network devices when operating in a "standalone" mode, as well as when
being bridge members. This clarifies a number of things that recently
came up during a bug-fixing session on the b53 DSA switch driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Since it has been a while since this was last submitted, I am removing the
patch numbering, but previous patches and discussions can be found here:

http://patchwork.ozlabs.org/project/netdev/patch/20190103224702.21541-1-f.fainelli@gmail.com/
http://patchwork.ozlabs.org/project/netdev/patch/20190109043930.8534-1-f.fainelli@gmail.com/
http://patchwork.ozlabs.org/project/netdev/patch/20190110193206.9872-1-f.fainelli@gmail.com/

David, I would like to hear from Vladimir and Ido specifically to make
sure that the documentation is up to date with expectations or desired
behavior so we can move forward with Vladimir's DSA rx filtering patch
series. So don't apply this just yet ;)

Thanks!

 Documentation/networking/switchdev.rst | 118 +++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
index ddc3f35775dc..2e4f50e6c63c 100644
--- a/Documentation/networking/switchdev.rst
+++ b/Documentation/networking/switchdev.rst
@@ -385,3 +385,121 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
 NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
 for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
 to know when arp_tbl neighbor entries are purged from the port.
+
+Device driver expected behavior
+-------------------------------
+
+Below is a set of defined behaviors that switchdev-enabled network devices
+must adhere to.
+
+Configuration-less state
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Upon driver bring-up, the network devices must be fully operational, and the
+backing driver must configure the network device such that it is possible to
+send and receive traffic on this network device and it is properly separated
+from other network devices/ports (e.g., as is common with a switch ASIC). How
+this is achieved is heavily hardware dependent, but a simple solution can be to
+use per-port VLAN identifiers unless a better mechanism is available
+(proprietary metadata for each network port for instance).
+
+The network device must be capable of running a full IP protocol stack
+including multicast, DHCP, IPv4/6, etc. If necessary, it should program the
+appropriate filters for VLAN, multicast, unicast, etc. The underlying device
+driver must effectively be configured in a similar fashion to what it would do
+when IGMP snooping is enabled for IP multicast over these switchdev network
+devices, and unsolicited multicast must be filtered as early as possible in
+the hardware.
+
+When configuring VLANs on top of the network device, all VLANs must be working,
+irrespective of the state of other network devices (e.g., other ports being part
+of a VLAN-aware bridge doing ingress VID checking). See below for details.
+
+If the device implements, e.g., VLAN filtering, putting the interface in
+promiscuous mode should allow the reception of all VLAN tags (including those
+not present in the filter(s)).
+
+Bridged switch ports
+^^^^^^^^^^^^^^^^^^^^
+
+When a switchdev-enabled network device is added as a bridge member, it should
+not disrupt any functionality of non-bridged network devices, and they
+should continue to behave as normal network devices. The expected behavior
+for each of the bridge configuration knobs is documented below.
+
+Bridge VLAN filtering
+^^^^^^^^^^^^^^^^^^^^^
+
+The Linux bridge allows the configuration of a VLAN filtering mode (compile and
+run time) which must be observed by the underlying switchdev network
+device/hardware:
+
+- with VLAN filtering turned off: the bridge is strictly VLAN-unaware and its
+  data path will only process untagged Ethernet frames. Frames ingressing the
+  device with a VID that is not programmed into the bridge/switch's VLAN table
+  must be forwarded and may be processed using a VLAN device (see below).
+
+- with VLAN filtering turned on: the bridge is VLAN-aware and frames ingressing
+  the device with a VID that is not programmed into the bridge/switch's VLAN
+  table must be dropped (strict VID checking).
+
+Non-bridged network ports of the same switch fabric must not be disturbed in any
+way by the enabling of VLAN filtering on the bridge device(s).
+
+VLAN devices configured on top of a switchdev network device (e.g., sw0p1.100)
+which is a bridge port member must also observe the following behavior:
+
+- with VLAN filtering turned off, enslaving VLAN devices into the bridge might
+  be allowed provided that there is sufficient separation using, e.g., a
+  reserved VLAN ID (4095 for instance) for untagged traffic. The VLAN data path
+  is used to pop/push the VLAN tag such that the bridge's data path only
+  processes untagged traffic.
+
+- with VLAN filtering turned on, these VLAN devices can be created as long as
+  there is not an existing VLAN entry in the bridge with an identical VID and
+  port membership. These VLAN devices cannot be enslaved into the bridge
+  because they duplicate functionality/use case with the bridge's VLAN data path
+  processing.
+
+Because VLAN filtering can be turned on/off at runtime, the switchdev driver
+must be able to reconfigure the underlying hardware on the fly to honor the
+toggling of that option and behave appropriately.
+
+A switchdev driver can also refuse to support dynamic toggling of the VLAN
+filtering knob at runtime and require the destruction of the bridge device(s)
+and creation of new bridge device(s) with a different VLAN filtering value to
+ensure VLAN awareness is pushed down to the hardware.
+
+Finally, even when VLAN filtering in the bridge is turned off, the underlying
+switch hardware and driver may still configure itself in a VLAN-aware mode
+provided that the behavior described above is observed.
+
+Bridge IGMP snooping
+^^^^^^^^^^^^^^^^^^^^
+
+The Linux bridge allows the configuration of IGMP snooping (compile and run
+time) which must be observed by the underlying switchdev network device/hardware
+in the following way:
+
+- when IGMP snooping is turned off, multicast traffic must be flooded to all
+  switch ports within the same broadcast domain. The CPU/management port
+  should ideally not be flooded and continue to learn multicast traffic through
+  the network stack notifications. If the hardware is not capable of doing that
+  then the CPU/management port must also be flooded and multicast filtering
+  happens in software.
+
+- when IGMP snooping is turned on, multicast traffic must selectively flow
+  to the appropriate network ports (including CPU/management port) and not be
+  unnecessarily flooded.
+
+The switch must adhere to RFC 4541 and flood multicast traffic accordingly
+since that is what the Linux bridge implementation does.
+
+Because IGMP snooping can be turned on/off at runtime, the switchdev driver
+must be able to reconfigure the underlying hardware on the fly to honor the
+toggling of that option and behave appropriately.
+
+A switchdev driver can also refuse to support dynamic toggling of the multicast
+snooping knob at runtime and require the destruction of the bridge device(s)
+and creation of new bridge device(s) with a different multicast snooping
+value.
-- 
2.17.1
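
The "Bridge VLAN filtering" rules in the patch above can be read as a small
decision function. What follows is only a sketch in Python, not kernel code:
`BridgePort` and `ingress_verdict` are hypothetical names made up for this
illustration, and only tagged frames are modeled (untagged frames and PVID
handling are left out).

```python
from dataclasses import dataclass, field


@dataclass
class BridgePort:
    """Hypothetical model of a bridged switch port (not a kernel structure)."""
    name: str
    vlan_table: set = field(default_factory=set)  # VIDs programmed on this port


def ingress_verdict(port, vid, vlan_filtering):
    """Return "forward" or "drop" for a frame arriving tagged with `vid`."""
    if not vlan_filtering:
        # VLAN-unaware bridge: a VID missing from the VLAN table must still
        # be forwarded (and may be handled by a VLAN device on top).
        return "forward"
    # VLAN-aware bridge: strict VID checking against the VLAN table.
    return "forward" if vid in port.vlan_table else "drop"


port = BridgePort("sw0p1", {100})
print(ingress_verdict(port, 200, vlan_filtering=False))  # forward
print(ingress_verdict(port, 200, vlan_filtering=True))   # drop
print(ingress_verdict(port, 100, vlan_filtering=True))   # forward
```

A driver that cannot change this decision at runtime would, per the patch,
refuse the toggle and require the bridge to be re-created instead.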

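The "Bridge IGMP snooping" rules in the patch above can be sketched the same
way. This is a simplified Python model under stated assumptions, not the
bridge implementation: `egress_ports` and its parameters are hypothetical,
and the RFC 4541 details (router ports, unregistered groups, link-local
groups) are deliberately not modeled.

```python
def egress_ports(ports, members, ingress, snooping,
                 cpu="cpu", cpu_can_filter=True):
    """Return the set of ports a multicast frame is sent to.

    ports:          all ports in the broadcast domain, including `cpu`
    members:        ports that joined the destination group
    cpu_can_filter: assumption that hardware can keep floods off the CPU port
    """
    others = set(ports) - {ingress}
    if snooping:
        # Snooping on: deliver only to ports that joined the group
        # (the CPU/management port counts like any other member).
        return others & set(members)
    # Snooping off: flood every user port in the broadcast domain.
    out = others - {cpu}
    # The CPU port is ideally not flooded; it receives the frame only if it
    # joined the group, or if the hardware cannot exclude it from flooding.
    if cpu in others and (cpu in members or not cpu_can_filter):
        out.add(cpu)
    return out


ports = {"sw0p0", "sw0p1", "sw0p2", "cpu"}
print(sorted(egress_ports(ports, {"sw0p1"}, "sw0p0", snooping=True)))
print(sorted(egress_ports(ports, {"sw0p1"}, "sw0p0", snooping=False)))
print(sorted(egress_ports(ports, {"sw0p1"}, "sw0p0", snooping=False,
                          cpu_can_filter=False)))
```

In the last call the CPU port is flooded as well, which corresponds to the
case where multicast filtering has to happen in software.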


Thread overview: 23+ messages
2018-12-12 23:09 [PATCH net-next] Documentation: networking: Clarify switchdev devices behavior Florian Fainelli
2018-12-12 23:09 ` [Bridge] " Florian Fainelli
2018-12-13  9:26 ` Andrew Lunn
2018-12-13  9:26   ` [Bridge] " Andrew Lunn
2018-12-15 19:35 ` David Miller
2018-12-15 19:35   ` [Bridge] " David Miller
2018-12-16  8:25 ` Ido Schimmel
2018-12-16  8:25   ` [Bridge] " Ido Schimmel
2018-12-16 17:14   ` Florian Fainelli
2018-12-16 17:14     ` [Bridge] " Florian Fainelli
2018-12-18  7:01     ` Ido Schimmel
2018-12-18  7:01       ` [Bridge] " Ido Schimmel
2018-12-18 20:13       ` Florian Fainelli
2018-12-18 20:13         ` [Bridge] " Florian Fainelli
2018-12-22 20:29         ` Ido Schimmel
2018-12-22 20:29           ` [Bridge] " Ido Schimmel
2018-12-17  3:36   ` Florian Fainelli
2018-12-17  3:36     ` [Bridge] " Florian Fainelli
2020-07-22 22:52 Florian Fainelli
2020-07-23  2:25 ` Randy Dunlap
2020-07-23 22:11 ` Vladimir Oltean
2020-07-23 22:58   ` Florian Fainelli
2020-07-24  0:43     ` Vladimir Oltean
