netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] RFC CPSW switchdev mode
@ 2018-05-24  6:56 Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h Ilias Apalodimas
                   ` (5 more replies)
  0 siblings, 6 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Hello, 

This is adding a new mode on the cpsw driver based around switchdev.
In order to enable this you need to enable CONFIG_NET_SWITCHDEV, 
CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV
and add to udev config: 

SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
Since the phys_switch_id is based on cpsw version, users with different 
version will need to do 'ip -d link show dev sw0p0 | grep switchid' and 
replace with the correct value.

This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.
sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices
while sw0p0 is the switch 'cpu facing port'.
sw0p0 will be unable to receive and transmit traffic and it's not 100% within
switchdev scope but, it's used to configure switch cpu port individually as 
this is needed for various switch features and configuration scenarios.

Bridge setup:
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev br0 type bridge vlan_filtering 1

ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p0 up
ip link set dev sw0p0 master br0
ip link set dev sw0p2 master br0
ip link set dev sw0p1 master br0

ip link set br0 address $(cat /sys/class/net/sw0p1/address)
ifconfig br0 up

VLAN config:
untagged:
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 pvid untagged master

tagged:
bridge vlan add dev sw0p1 vid 100 master
bridge vlan add dev sw0p2 vid 100 master

IP address on br0:
bridge vlan add dev br0 vid 100 pvid untagged self
bridge vlan add dev sw0p0 vid 100 pvid untagged master
udhcpc -i br0

FDBs:
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master

MDBs:
single vlan:
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100

all vlans:
bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent
bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent

Multicast:
setting multicast on and off will affect registered multicast
setting allmulti on and off will affect unregistered multicast
This muct occur before adding VLANs on the interfaces. If you change the
flag after the VLAN configuration you need to re-issue the VLAN config 
commands.

Promiscuous mode:
Adding/removing sw0p0 on the bridge will enable/disable ALE_P0_UNI_FLOOD

NFS:
The only way for NFS to work is by chrooting to a minimal environment when 
switch configuration that will affect connectivity is needed.
Assuming you are booting NFS with eth1 interface(the script is hacky and 
it's just there to prove NFS is doable).

setup.sh:
#!/bin/sh
mkdir proc
mount -t proc none /proc

ifconfig br0  > /dev/null
if [ $? -ne 0 ]; then
        echo "Setting up bridge"
        ip link add name br0 type bridge
        ip link set dev br0 type bridge ageing_time 1000
        ip link set dev br0 type bridge vlan_filtering 1

        ip link set eth1 down 
        ip link set eth1 name sw0p1 
        ip link set dev sw0p1 up
        ip link set dev sw0p2 up
        ip link set dev sw0p0 up
        ip link set dev sw0p0 master br0 
        ip link set dev sw0p2 master br0
        ip link set dev sw0p1 master br0
        ifconfig sw0p1 0.0.0.0
        udhchc -i br0
fi
umount /proc

run_nfs.sh:
#!/bin/sh
mkdir /tmp/root/bin -p
mkdir /tmp/root/lib -p

cp -r /lib/ /tmp/root/
cp -r /bin/ /tmp/root/
cp /sbin/ip /tmp/root/bin
cp /sbin/bridge /tmp/root/bin
cp /sbin/ifconfig /tmp/root/bin
cp /sbin/udhcpc /tmp/root/bin
cp /path/to/setup.sh /tmp/root/bin
chroot /tmp/root/ busybox sh /bin/run_nfs.sh

run ./run_nfs.sh

This is on top of 4.17-rc2 tree.

P.S: I am not 100% sure that the promiscuity handling is correct.
Please let me know if i should change anything on that

Ilias Apalodimas (4):
  cpsw: move common headers definitions to cpsw_priv.h
  cpsw_ale: add support functions for switchdev
  cpsw_switchdev: add switchdev support files
  cpsw: add switchdev support

 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw.c           | 610 ++++++++++++++++++++++---------
 drivers/net/ethernet/ti/cpsw_ale.c       | 129 +++++++
 drivers/net/ethernet/ti/cpsw_ale.h       |   8 +
 drivers/net/ethernet/ti/cpsw_priv.h      | 148 ++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 8 files changed, 1039 insertions(+), 169 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
@ 2018-05-24  6:56 ` Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 2/4] cpsw_ale: add support functions for switchdev Ilias Apalodimas
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c      | 111 +---------------------------
 drivers/net/ethernet/ti/cpsw_priv.h | 141 ++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 110 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3037127..b16e7cf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -41,6 +41,7 @@
 
 #include "cpsw.h"
 #include "cpsw_ale.h"
+#include "cpsw_priv.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -88,7 +89,6 @@ do {								\
 #define CPSW_VERSION_3		0x19010f
 #define CPSW_VERSION_4		0x190112
 
-#define HOST_PORT_NUM		0
 #define CPSW_ALE_PORTS_NUM	3
 #define SLIVER_SIZE		0x40
 
@@ -309,16 +309,6 @@ struct cpsw_ss_regs {
 #define CPSW_MAX_BLKS_TX_SHIFT		4
 #define CPSW_MAX_BLKS_RX		5
 
-struct cpsw_host_regs {
-	u32	max_blks;
-	u32	blk_cnt;
-	u32	tx_in_ctl;
-	u32	port_vlan;
-	u32	tx_pri_map;
-	u32	cpdma_tx_pri_map;
-	u32	cpdma_rx_chan_map;
-};
-
 struct cpsw_sliver_regs {
 	u32	id_ver;
 	u32	mac_control;
@@ -370,105 +360,6 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
-struct cpsw_slave_data {
-	struct device_node *phy_node;
-	char		phy_id[MII_BUS_ID_SIZE];
-	int		phy_if;
-	u8		mac_addr[ETH_ALEN];
-	u16		dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
-};
-
-struct cpsw_platform_data {
-	struct cpsw_slave_data	*slave_data;
-	u32	ss_reg_ofs;	/* Subsystem control register offset */
-	u32	channels;	/* number of cpdma channels (symmetric) */
-	u32	slaves;		/* number of slave cpgmac ports */
-	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
-	u32	ale_entries;	/* ale table size */
-	u32	bd_ram_size;  /*buffer descriptor ram size */
-	u32	mac_control;	/* Mac control register */
-	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
-};
-
-struct cpsw_slave {
-	void __iomem			*regs;
-	struct cpsw_sliver_regs __iomem	*sliver;
-	int				slave_num;
-	u32				mac_control;
-	struct cpsw_slave_data		*data;
-	struct phy_device		*phy;
-	struct net_device		*ndev;
-	u32				port_vlan;
-};
-
-static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
-{
-	return readl_relaxed(slave->regs + offset);
-}
-
-static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
-{
-	writel_relaxed(val, slave->regs + offset);
-}
-
-struct cpsw_vector {
-	struct cpdma_chan *ch;
-	int budget;
-};
-
-struct cpsw_common {
-	struct device			*dev;
-	struct cpsw_platform_data	data;
-	struct napi_struct		napi_rx;
-	struct napi_struct		napi_tx;
-	struct cpsw_ss_regs __iomem	*regs;
-	struct cpsw_wr_regs __iomem	*wr_regs;
-	u8 __iomem			*hw_stats;
-	struct cpsw_host_regs __iomem	*host_port_regs;
-	u32				version;
-	u32				coal_intvl;
-	u32				bus_freq_mhz;
-	int				rx_packet_max;
-	struct cpsw_slave		*slaves;
-	struct cpdma_ctlr		*dma;
-	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
-	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
-	struct cpsw_ale			*ale;
-	bool				quirk_irq;
-	bool				rx_irq_disabled;
-	bool				tx_irq_disabled;
-	u32 irqs_table[IRQ_NUM];
-	struct cpts			*cpts;
-	int				rx_ch_num, tx_ch_num;
-	int				speed;
-	int				usage_count;
-};
-
-struct cpsw_priv {
-	struct net_device		*ndev;
-	struct device			*dev;
-	u32				msg_enable;
-	u8				mac_addr[ETH_ALEN];
-	bool				rx_pause;
-	bool				tx_pause;
-	u32 emac_port;
-	struct cpsw_common *cpsw;
-};
-
-struct cpsw_stats {
-	char stat_string[ETH_GSTRING_LEN];
-	int type;
-	int sizeof_stat;
-	int stat_offset;
-};
-
-enum {
-	CPSW_STATS,
-	CPDMA_RX_STATS,
-	CPDMA_TX_STATS,
-};
-
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
new file mode 100644
index 0000000..3b02a83
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -0,0 +1,141 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+
+#define HOST_PORT_NUM		0
+#define IRQ_NUM			2
+#define CPSW_MAX_QUEUES		8
+
+#define CPSW_VERSION_1		0x19010a
+#define CPSW_VERSION_2		0x19010c
+#define CPSW_VERSION_3		0x19010f
+#define CPSW_VERSION_4		0x190112
+
+/* CPSW_PORT_V1 */
+#define CPSW1_MAX_BLKS      0x00 /* Maximum FIFO Blocks */
+#define CPSW1_BLK_CNT       0x04 /* FIFO Block Usage Count (Read Only) */
+#define CPSW1_TX_IN_CTL     0x08 /* Transmit FIFO Control */
+#define CPSW1_PORT_VLAN     0x0c /* VLAN Register */
+#define CPSW1_TX_PRI_MAP    0x10 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW1_TS_CTL        0x14 /* Time Sync Control */
+#define CPSW1_TS_SEQ_LTYPE  0x18 /* Time Sync Sequence ID Offset and Msg Type */
+#define CPSW1_TS_VLAN       0x1c /* Time Sync VLAN1 and VLAN2 */
+
+/* CPSW_PORT_V2 */
+#define CPSW2_CONTROL       0x00 /* Control Register */
+#define CPSW2_MAX_BLKS      0x08 /* Maximum FIFO Blocks */
+#define CPSW2_BLK_CNT       0x0c /* FIFO Block Usage Count (Read Only) */
+#define CPSW2_TX_IN_CTL     0x10 /* Transmit FIFO Control */
+#define CPSW2_PORT_VLAN     0x14 /* VLAN Register */
+#define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
+
+struct cpsw_slave_data {
+	struct	device_node *phy_node;
+	char	phy_id[MII_BUS_ID_SIZE];
+	int	phy_if;
+	u8	mac_addr[ETH_ALEN];
+	u16	dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
+};
+
+struct cpsw_platform_data {
+	struct cpsw_slave_data	*slave_data;
+	u32	ss_reg_ofs;	/* Subsystem control register offset */
+	u32	channels;	/* number of cpdma channels (symmetric) */
+	u32	slaves;		/* number of slave cpgmac ports */
+	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
+	u32	ale_entries;	/* ale table size */
+	u32	bd_ram_size;  /*buffer descriptor ram size */
+	u32	mac_control;	/* Mac control register */
+	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
+	bool	dual_emac;	/* Enable Dual EMAC mode */
+};
+
+struct cpsw_slave {
+	void __iomem			*regs;
+	struct cpsw_sliver_regs __iomem	*sliver;
+	int				slave_num;
+	u32				mac_control;
+	struct cpsw_slave_data		*data;
+	struct phy_device		*phy;
+	struct net_device		*ndev;
+	u32				port_vlan;
+};
+
+struct cpsw_vector {
+	struct cpdma_chan *ch;
+	int budget;
+};
+
+struct cpsw_common {
+	struct device			*dev;
+	struct cpsw_platform_data	data;
+	struct napi_struct		napi_rx;
+	struct napi_struct		napi_tx;
+	struct cpsw_ss_regs __iomem	*regs;
+	struct cpsw_wr_regs __iomem	*wr_regs;
+	u8 __iomem			*hw_stats;
+	struct cpsw_host_regs __iomem	*host_port_regs;
+	u32				version;
+	u32				coal_intvl;
+	u32				bus_freq_mhz;
+	int				rx_packet_max;
+	struct cpsw_slave		*slaves;
+	struct cpdma_ctlr		*dma;
+	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
+	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
+	struct cpsw_ale			*ale;
+	bool				quirk_irq;
+	bool				rx_irq_disabled;
+	bool				tx_irq_disabled;
+	u32				irqs_table[IRQ_NUM];
+	struct cpts			*cpts;
+	int				rx_ch_num, tx_ch_num;
+	int				speed;
+	int				usage_count;
+};
+
+struct cpsw_priv {
+	struct net_device	*ndev;
+	struct device		*dev;
+	u32			msg_enable;
+	u8			mac_addr[ETH_ALEN];
+	bool			rx_pause;
+	bool			tx_pause;
+	u8			port_state[3];
+	u32			emac_port;
+	struct cpsw_common	*cpsw;
+};
+
+struct cpsw_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int type;
+	int sizeof_stat;
+	int stat_offset;
+};
+
+enum {
+	CPSW_STATS,
+	CPDMA_RX_STATS,
+	CPDMA_TX_STATS,
+};
+
+struct cpsw_host_regs {
+	u32	max_blks;
+	u32	blk_cnt;
+	u32	tx_in_ctl;
+	u32	port_vlan;
+	u32	tx_pri_map;
+	u32	cpdma_tx_pri_map;
+	u32	cpdma_rx_chan_map;
+};
+
+static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
+{
+	return readl_relaxed(slave->regs + offset);
+}
+
+static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
+{
+	writel_relaxed(val, slave->regs + offset);
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/4] cpsw_ale: add support functions for switchdev
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h Ilias Apalodimas
@ 2018-05-24  6:56 ` Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 3/4] cpsw_switchdev: add switchdev support files Ilias Apalodimas
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw_ale.c | 129 +++++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_ale.h |   8 +++
 2 files changed, 137 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 93dc05c..0b7383f 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -409,6 +409,41 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_mcast);
 
+static int cpsw_ale_read_mc(struct cpsw_ale *ale, u8 *addr, int flags, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_port_mask(ale_entry, ale->port_mask_bits);
+}
+
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state)
+{
+	int mcast_members, ret;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) | port_mask;
+	ret = cpsw_ale_add_mcast(ale, addr, mcast_members, flags, vid, 0);
+
+	return ret;
+}
+
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid)
+{
+	int mcast_members, ret;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) & ~port_mask;
+	ret = cpsw_ale_del_mcast(ale, addr, mcast_members, flags, vid);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_mcast_del_modify);
+
 /* ALE NetCP NU switch specific vlan functions */
 static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 				    int reg_mcast, int unreg_mcast)
@@ -424,6 +459,52 @@ static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 	writel(unreg_mcast, ale->params.ale_regs + ALE_VLAN_MASK_MUX(idx));
 }
 
+static int cpsw_ale_read_untagged(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_untag_force(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns mask of current members for specificed vlan */
+static int cpsw_ale_read_vlan_members(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_member_list(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns mask of registered/unregistered multicast registration */
+static int cpsw_ale_read_reg_unreg_mc(struct cpsw_ale *ale, u16 vid, bool unreg)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+	int ret;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	if (unreg)
+		ret = cpsw_ale_get_vlan_unreg_mcast(ale_entry,
+						    ale->vlan_field_bits);
+	else
+		ret = cpsw_ale_get_vlan_reg_mcast(ale_entry,
+						  ale->vlan_field_bits);
+
+	return ret;
+}
+
 int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 		      int reg_mcast, int unreg_mcast)
 {
@@ -482,6 +563,54 @@ int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_vlan);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mask, int unreg_mask)
+{
+	int ret = 0;
+	int vlan_members = cpsw_ale_read_vlan_members(ale, vid) & ~port_mask;
+	int reg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 0) & ~port_mask;
+	int unreg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 1) & ~port_mask;
+	int untag_members = cpsw_ale_read_untagged(ale, vid) & ~port_mask;
+
+	vlan_members |= port_mask;
+	untag_members |= untag_mask;
+	reg_mcast_members |= reg_mask;
+	unreg_mcast_members |= unreg_mask;
+
+	ret = cpsw_ale_add_vlan(ale, vid, vlan_members, untag_members,
+				reg_mcast_members, unreg_mcast_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to add vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev,  "port mask 0x%x untag 0x%x\n", vlan_members,
+		untag_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_add_modify);
+
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask)
+{
+	int ret = 0;
+	int vlan_members;
+
+	vlan_members = cpsw_ale_read_vlan_members(ale, vid);
+	vlan_members &= ~port_mask;
+
+	ret = cpsw_ale_del_vlan(ale, vid, vlan_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to del vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev, "port mask 0x%x\n", port_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_del_modify);
+
 void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti)
 {
 	u32 ale_entry[ALE_ENTRY_WORDS];
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index d4fe901..bc29616 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -123,4 +123,12 @@ int cpsw_ale_control_set(struct cpsw_ale *ale, int port,
 			 int control, int value);
 void cpsw_ale_dump(struct cpsw_ale *ale, u32 *data);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mcast, int unreg_mcast);
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask);
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state);
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port,
+			      int flags, u16 vid);
+
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/4] cpsw_switchdev: add switchdev support files
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h Ilias Apalodimas
  2018-05-24  6:56 ` [PATCH 2/4] cpsw_ale: add support functions for switchdev Ilias Apalodimas
@ 2018-05-24  6:56 ` Ilias Apalodimas
  2018-05-24 10:00   ` Maxim Uvarov
  2018-05-27  4:39   ` kbuild test robot
  2018-05-24  6:56 ` [PATCH 4/4] cpsw: add switchdev support Ilias Apalodimas
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 4 files changed, 313 insertions(+)
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 48a541e..b22ae7d 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -73,6 +73,15 @@ config TI_CPSW
 	  To compile this driver as a module, choose M here: the module
 	  will be called cpsw.
 
+config TI_CPSW_SWITCHDEV
+	bool "TI CPSW switchdev support"
+	depends on TI_CPSW
+	depends on NET_SWITCHDEV
+	help
+	  Enable switchdev support on TI's CPSW Ethernet Switch.
+
+	  This will allow you to configure the switch using standard tools.
+
 config TI_CPTS
 	bool "TI Common Platform Time Sync (CPTS) Support"
 	depends on TI_CPSW || TI_KEYSTONE_NETCP
diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
index 0be551d..3926c6a 100644
--- a/drivers/net/ethernet/ti/Makefile
+++ b/drivers/net/ethernet/ti/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
 obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
 obj-$(CONFIG_TI_CPTS_MOD) += cpts.o
 obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
+ti_cpsw-objs:= cpsw_switchdev.o
 ti_cpsw-y := cpsw.o
 
 obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
new file mode 100644
index 0000000..bf8c1bf
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Texas Instruments switchdev Driver
+ *
+ * Copyright (C) 2018 Texas Instruments
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/if_bridge.h>
+#include <net/switchdev.h>
+#include "cpsw.h"
+#include "cpsw_priv.h"
+#include "cpsw_ale.h"
+
+static u32 cpsw_switchdev_get_ver(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	return cpsw->version;
+}
+
+static int cpsw_port_attr_set(struct net_device *dev,
+			      const struct switchdev_attr *attr,
+			      struct switchdev_trans *trans)
+{
+	return -EOPNOTSUPP;
+}
+
+static int cpsw_port_attr_get(struct net_device *dev,
+			      struct switchdev_attr *attr)
+{
+	u32 cpsw_ver;
+	int err = 0;
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		cpsw_ver = cpsw_switchdev_get_ver(dev);
+		attr->u.ppid.id_len = sizeof(cpsw_ver);
+		memcpy(&attr->u.ppid.id, &cpsw_ver, attr->u.ppid.id_len);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static u16 cpsw_get_pvid(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	u32 __iomem *port_vlan_reg;
+	u32 pvid;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		pvid = slave_read(cpsw->slaves + (priv->emac_port - 1), reg);
+	} else {
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		pvid = readl(port_vlan_reg);
+	}
+
+	pvid = pvid & 0xfff;
+
+	return pvid;
+}
+
+static void cpsw_set_pvid(struct cpsw_priv *priv, u16 vid, bool cfi, u32 cos)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	void __iomem *port_vlan_reg;
+	u32 pvid;
+
+	pvid = vid;
+	pvid |= cfi ? BIT(12) : 0;
+	pvid |= (cos & 0x7) << 13;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		/* no barrier */
+		slave_write(cpsw->slaves + (priv->emac_port - 1), pvid, reg);
+	} else {
+		/* CPU port */
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		writel(pvid, port_vlan_reg);
+	}
+}
+
+static int cpsw_port_vlan_add(struct cpsw_priv *priv, bool untag, bool pvid,
+			      u16 vid)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask = BIT(priv->emac_port);
+	int unreg_mcast_mask = 0;
+	int reg_mcast_mask = 0;
+	int untag_mask = 0;
+	int ret = 0;
+
+	if (priv->ndev->flags & IFF_ALLMULTI)
+		unreg_mcast_mask = port_mask;
+
+	if (priv->ndev->flags & IFF_MULTICAST)
+		reg_mcast_mask = port_mask;
+
+	if (untag)
+		untag_mask = port_mask;
+
+	ret = cpsw_ale_vlan_add_modify(cpsw->ale, vid, port_mask, untag_mask,
+				       reg_mcast_mask, unreg_mcast_mask);
+	if (ret) {
+		dev_err(priv->dev, "Unable to add vlan\n");
+		return ret;
+	}
+
+	if (!pvid)
+		return ret;
+
+	cpsw_set_pvid(priv, vid, 0, 0);
+
+	dev_dbg(priv->dev, "VID: %u dev: %s port: %u\n", vid,
+		priv->ndev->name, priv->emac_port);
+
+	return ret;
+}
+
+static int cpsw_port_vlan_del(struct cpsw_priv *priv, u16 vid)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask = BIT(priv->emac_port);
+	int ret = 0;
+
+	ret = cpsw_ale_vlan_del_modify(cpsw->ale, vid, port_mask);
+	if (ret != 0)
+		return ret;
+
+	ret = cpsw_ale_del_ucast(cpsw->ale, priv->mac_addr,
+				 HOST_PORT_NUM, ALE_VLAN, vid);
+
+	if (vid == cpsw_get_pvid(priv))
+		cpsw_set_pvid(priv, 0, 0, 0);
+
+	if (ret != 0) {
+		dev_dbg(priv->dev, "Failed to delete unicast entry\n");
+		ret = 0;
+	}
+
+	ret = cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
+				 0, ALE_VLAN, vid);
+	if (ret != 0) {
+		dev_dbg(priv->dev, "Failed to delete multicast entry\n");
+		ret = 0;
+	}
+
+	return ret;
+}
+
+static int cpsw_port_vlans_add(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan,
+			       struct switchdev_trans *trans)
+{
+	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
+	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
+	u16 vid;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_add(priv, untagged, pvid, vid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_vlans_del(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan)
+
+{
+	u16 vid;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_del(priv, vid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_mdb_add(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb,
+			     struct switchdev_trans *trans)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask;
+	int err;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	port_mask = BIT(priv->emac_port);
+	err = cpsw_ale_mcast_add_modify(cpsw->ale, mdb->addr, port_mask,
+					ALE_VLAN, mdb->vid, 0);
+
+	return err;
+}
+
+static int cpsw_port_mdb_del(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb)
+
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int del_mask;
+	int err;
+
+	del_mask = BIT(priv->emac_port);
+	err = cpsw_ale_mcast_del_modify(cpsw->ale, mdb->addr, del_mask,
+					ALE_VLAN, mdb->vid);
+
+	return err;
+}
+
+static int cpsw_port_obj_add(struct net_device *ndev,
+			     const struct switchdev_obj *obj,
+			     struct switchdev_trans *trans)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_add(priv, vlan, trans);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+		err = cpsw_port_mdb_add(priv, mdb, trans);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int cpsw_port_obj_del(struct net_device *ndev,
+			     const struct switchdev_obj *obj)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_del(priv, vlan);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+		err = cpsw_port_mdb_del(priv, SWITCHDEV_OBJ_PORT_MDB(obj));
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static const struct switchdev_ops cpsw_port_switchdev_ops = {
+	.switchdev_port_attr_set	= cpsw_port_attr_set,
+	.switchdev_port_attr_get	= cpsw_port_attr_get,
+	.switchdev_port_obj_add		= cpsw_port_obj_add,
+	.switchdev_port_obj_del		= cpsw_port_obj_del,
+};
+
+void cpsw_port_switchdev_init(struct net_device *ndev)
+{
+	ndev->switchdev_ops = &cpsw_port_switchdev_ops;
+}
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.h b/drivers/net/ethernet/ti/cpsw_switchdev.h
new file mode 100644
index 0000000..4940462
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.h
@@ -0,0 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <net/switchdev.h>
+
+void cpsw_port_switchdev_init(struct net_device *ndev);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 4/4] cpsw: add switchdev support
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
                   ` (2 preceding siblings ...)
  2018-05-24  6:56 ` [PATCH 3/4] cpsw_switchdev: add switchdev support files Ilias Apalodimas
@ 2018-05-24  6:56 ` Ilias Apalodimas
  2018-05-24 13:12   ` Andrew Lunn
  2018-05-24  8:05 ` [PATCH 0/4] RFC CPSW switchdev mode Jiri Pirko
  2018-06-01 21:29 ` Grygorii Strashko
  5 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c      | 503 +++++++++++++++++++++++++++++++-----
 drivers/net/ethernet/ti/cpsw_priv.h |   9 +-
 2 files changed, 450 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index b16e7cf..8f8ebd8 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -18,12 +18,10 @@
 #include <linux/clk.h>
 #include <linux/timer.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
 #include <linux/irqreturn.h>
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/etherdevice.h>
-#include <linux/netdevice.h>
 #include <linux/net_tstamp.h>
 #include <linux/phy.h>
 #include <linux/workqueue.h>
@@ -42,6 +40,7 @@
 #include "cpsw.h"
 #include "cpsw_ale.h"
 #include "cpsw_priv.h"
+#include "cpsw_switchdev.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -146,9 +145,6 @@ do {								\
 #define CPSW_CMINTMAX_INTVL	(1000 / CPSW_CMINTMIN_CNT)
 #define CPSW_CMINTMIN_INTVL	((1000 / CPSW_CMINTMAX_CNT) + 1)
 
-#define cpsw_slave_index(cpsw, priv)				\
-		((cpsw->data.dual_emac) ? priv->emac_port :	\
-		cpsw->data.active_slave)
 #define IRQ_NUM			2
 #define CPSW_MAX_QUEUES		8
 #define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
@@ -360,6 +356,13 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
+struct cpsw_switchdev_event_work {
+	struct work_struct work;
+	struct switchdev_notifier_fdb_info fdb_info;
+	struct cpsw_priv *priv;
+	unsigned long event;
+};
+
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
@@ -433,18 +436,22 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 		struct cpsw_slave *slave;				\
 		struct cpsw_common *cpsw = (priv)->cpsw;		\
 		int n;							\
-		if (cpsw->data.dual_emac)				\
-			(func)((cpsw)->slaves + priv->emac_port, ##arg);\
-		else							\
+		if (cpsw->data.switch_mode) {				\
+			if (priv->emac_port == HOST_PORT_NUM)		\
+				break;					\
+			(func)((cpsw)->slaves + (priv->emac_port - 1),  \
+			       ##arg);\
+		} else {						\
 			for (n = cpsw->data.slaves,			\
 					slave = cpsw->slaves;		\
 					n; n--)				\
 				(func)(slave++, ##arg);			\
+		}							\
 	} while (0)
 
 #define cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb)		\
 	do {								\
-		if (!cpsw->data.dual_emac)				\
+		if (!cpsw->data.switch_mode)				\
 			break;						\
 		if (CPDMA_RX_SOURCE_PORT(status) == 1) {		\
 			ndev = cpsw->slaves[0].ndev;			\
@@ -456,11 +463,13 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 	} while (0)
 #define cpsw_add_mcast(cpsw, priv, addr)				\
 	do {								\
-		if (cpsw->data.dual_emac) {				\
+		if (cpsw->data.switch_mode) {				\
 			struct cpsw_slave *slave = cpsw->slaves +	\
-						priv->emac_port;	\
+						(priv->emac_port - 1);	\
 			int slave_port = cpsw_get_slave_port(		\
 						slave->slave_num);	\
+			if (priv->emac_port == HOST_PORT_NUM)		\
+				break;					\
 			cpsw_ale_add_mcast(cpsw->ale, addr,		\
 				1 << slave_port | ALE_PORT_HOST,	\
 				ALE_VLAN, slave->port_vlan, 0);		\
@@ -476,13 +485,41 @@ static inline int cpsw_get_slave_port(u32 slave_num)
 	return slave_num + 1;
 }
 
+static int cpsw_is_dual_mac(u8 switch_mode)
+{
+	return switch_mode == CPSW_DUAL_EMAC;
+}
+
+static int cpsw_is_switchdev(u8 switch_mode)
+{
+	return switch_mode == CPSW_SWITCHDEV;
+}
+
+static int cpsw_is_switch(u8 switch_mode)
+{
+	return switch_mode == CPSW_TI_SWITCH;
+}
+
+static int cpsw_slave_index(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (priv->emac_port == HOST_PORT_NUM)
+		return -1;
+#endif
+
+	return cpsw->data.switch_mode ? priv->emac_port - 1 :
+		cpsw->data.active_slave;
+}
+
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 {
 	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 	struct cpsw_ale *ale = cpsw->ale;
 	int i;
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		bool flag = false;
 
 		/* Enabling promiscuous mode for one interface will be
@@ -508,7 +545,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			cpsw_ale_control_set(ale, 0, ALE_BYPASS, 0);
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
-	} else {
+	} else if (cpsw_is_switch(cpsw->data.switch_mode)) {
 		if (enable) {
 			unsigned long timeout = jiffies + HZ;
 
@@ -548,6 +585,18 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			}
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
+	} else if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		/* When interfaces are placed into a bridge they'll switch to
+		 * promiscuous mode. In switchdev case ALE_P0_UNI_FLOOD is
+		 * changed whether the cpu port participates in the bridge
+		 */
+		struct cpsw_priv *priv = netdev_priv(ndev);
+		int slave_idx = cpsw_slave_index(priv);
+		int slave_num;
+
+		slave_num = cpsw_get_slave_port(slave_idx);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NOLEARN, 0);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NO_SA_UPDATE, 0);
 	}
 }
 
@@ -555,10 +604,11 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int vid;
 
-	if (cpsw->data.dual_emac)
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
+		vid = cpsw->slaves[slave_no].port_vlan;
 	else
 		vid = cpsw->data.default_vlan;
 
@@ -629,8 +679,9 @@ static void cpsw_tx_handler(void *token, int len, int status)
 static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 {
 	struct cpsw_priv *priv = netdev_priv(skb->dev);
-	struct cpsw_common *cpsw = priv->cpsw;
 	u32 rx_vlan_encap_hdr = *((u32 *)skb->data);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	u16 vtag, vid, prio, pkt_type;
 
 	/* Remove VLAN header encapsulation word */
@@ -651,8 +702,8 @@ static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 	if (!vid)
 		return;
 	/* Ignore default vlans in dual mac mode */
-	if (cpsw->data.dual_emac &&
-	    vid == cpsw->slaves[priv->emac_port].port_vlan)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode) &&
+	    vid == cpsw->slaves[slave_no].port_vlan)
 		return;
 
 	prio = (rx_vlan_encap_hdr >>
@@ -681,9 +732,9 @@ static void cpsw_rx_handler(void *token, int len, int status)
 	cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb);
 
 	if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
-		/* In dual emac mode check for all interfaces */
-		if (cpsw->data.dual_emac && cpsw->usage_count &&
-		    (status >= 0)) {
+		/* In any other that switch mode check for all interfaces */
+		if (!cpsw_is_switch(cpsw->data.switch_mode) &&
+		    cpsw->usage_count && status >= 0) {
 			/* The packet received is for the interface which
 			 * is already down and the other interface is up
 			 * and running, instead of freeing which results
@@ -699,6 +750,11 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		return;
 	}
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		skb->offload_fwd_mark = 1;
+#endif
+
 	new_skb = netdev_alloc_skb_ip_align(ndev, cpsw->rx_packet_max);
 	if (new_skb) {
 		skb_copy_queue_mapping(new_skb, skb);
@@ -1206,11 +1262,10 @@ static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
 					struct sk_buff *skb,
 					struct cpdma_chan *txch)
 {
-	struct cpsw_common *cpsw = priv->cpsw;
-
 	skb_tx_timestamp(skb);
+
 	return cpdma_chan_submit(txch, skb, skb->data, skb->len,
-				 priv->emac_port + cpsw->data.dual_emac);
+				 priv->emac_port);
 }
 
 static inline void cpsw_add_dual_emac_def_ale_entries(
@@ -1283,7 +1338,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 
 	slave_port = cpsw_get_slave_port(slave->slave_num);
 
-	if (cpsw->data.dual_emac)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
 		cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
 	else
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1362,8 +1417,8 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	control_reg = readl(&cpsw->regs->control);
 	control_reg |= CPSW_VLAN_AWARE | CPSW_RX_VLAN_ENCAP;
 	writel(control_reg, &cpsw->regs->control);
-	fifo_mode = (cpsw->data.dual_emac) ? CPSW_FIFO_DUAL_MAC_MODE :
-		     CPSW_FIFO_NORMAL_MODE;
+	fifo_mode = cpsw_is_dual_mac(cpsw->data.switch_mode) ?
+		CPSW_FIFO_DUAL_MAC_MODE : CPSW_FIFO_NORMAL_MODE;
 	writel(fifo_mode, &cpsw->host_port_regs->tx_in_ctl);
 
 	/* setup host port priority mapping */
@@ -1374,7 +1429,7 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM,
 			     ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
 
-	if (!cpsw->data.dual_emac) {
+	if (!cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		cpsw_ale_add_ucast(cpsw->ale, priv->mac_addr, HOST_PORT_NUM,
 				   0, 0);
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1474,14 +1529,19 @@ static int cpsw_ndo_open(struct net_device *ndev)
 	/* Initialize host and slave ports */
 	if (!cpsw->usage_count)
 		cpsw_init_host_port(priv);
-	for_each_slave(priv, cpsw_slave_open, priv);
 
-	/* Add default VLAN */
-	if (!cpsw->data.dual_emac)
-		cpsw_add_default_vlan(priv);
-	else
-		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
-				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
+	if (!IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV) ||
+	    (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV) &&
+	     priv->emac_port != HOST_PORT_NUM)) {
+		for_each_slave(priv, cpsw_slave_open, priv);
+
+		/* Add default VLAN */
+		if (cpsw_is_dual_mac(cpsw->data.switch_mode))
+			cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
+					  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
+		else
+			cpsw_add_default_vlan(priv);
+	}
 
 	/* initialize shared resources for every ndev */
 	if (!cpsw->usage_count) {
@@ -1575,6 +1635,13 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 	struct cpdma_chan *txch;
 	int ret, q_idx;
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (priv->emac_port == HOST_PORT_NUM) {
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+#endif
+
 	if (skb_padto(skb, CPSW_MIN_PACKET_SIZE)) {
 		cpsw_err(priv, tx_err, "packet pad failed\n");
 		ndev->stats.tx_dropped++;
@@ -1655,8 +1722,12 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
 	struct cpsw_slave *slave;
 	struct cpsw_common *cpsw = priv->cpsw;
 	u32 ctrl, mtype;
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return;
 
-	slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+	slave = &cpsw->slaves[slave_no];
 
 	ctrl = slave_read(slave, CPSW2_CONTROL);
 	switch (cpsw->version) {
@@ -1791,11 +1862,14 @@ static int cpsw_ndo_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
 {
 	struct cpsw_priv *priv = netdev_priv(dev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
 
 	if (!netif_running(dev))
 		return -EINVAL;
 
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
+
 	switch (cmd) {
 	case SIOCSHWTSTAMP:
 		return cpsw_hwtstamp_set(dev, req);
@@ -1832,6 +1906,7 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct sockaddr *addr = (struct sockaddr *)p;
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int flags = 0;
 	u16 vid = 0;
 	int ret;
@@ -1845,8 +1920,8 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		vid = cpsw->slaves[slave_no].port_vlan;
 		flags = ALE_VLAN;
 	}
 
@@ -1884,8 +1959,11 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 	u32 port_mask;
 	struct cpsw_common *cpsw = priv->cpsw;
 
-	if (cpsw->data.dual_emac) {
-		port_mask = (1 << (priv->emac_port + 1)) | ALE_PORT_HOST;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return -EOPNOTSUPP;
+
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		port_mask = (1 << priv->emac_port) | ALE_PORT_HOST;
 
 		if (priv->ndev->flags & IFF_ALLMULTI)
 			unreg_mcast_mask = port_mask;
@@ -1929,6 +2007,9 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return 0;
+
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
@@ -1938,7 +2019,7 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		/* In dual EMAC, reserved VLAN id should not be used for
 		 * creating VLAN interfaces as this can break the dual
 		 * EMAC port separation
@@ -1965,6 +2046,9 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return 0;
+
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
@@ -1974,7 +2058,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -1999,6 +2083,24 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	return ret;
 }
 
+static int cpsw_ndo_get_phys_port_name(struct net_device *ndev, char *name,
+				       size_t len)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int err;
+
+	if (!cpsw_is_switchdev(cpsw->data.switch_mode))
+		return -EOPNOTSUPP;
+
+	err = snprintf(name, len, "p%d", priv->emac_port);
+
+	if (err >= len)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
@@ -2065,6 +2167,7 @@ static const struct net_device_ops cpsw_netdev_ops = {
 #endif
 	.ndo_vlan_rx_add_vid	= cpsw_ndo_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= cpsw_ndo_vlan_rx_kill_vid,
+	.ndo_get_phys_port_name = cpsw_ndo_get_phys_port_name,
 };
 
 static int cpsw_get_regs_len(struct net_device *ndev)
@@ -2152,7 +2255,10 @@ static int cpsw_get_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (!cpsw->slaves[slave_no].phy)
 		return -EOPNOTSUPP;
@@ -2166,7 +2272,10 @@ static int cpsw_set_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_ksettings_set(cpsw->slaves[slave_no].phy,
@@ -2179,7 +2288,10 @@ static void cpsw_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return;
 
 	wol->supported = 0;
 	wol->wolopts = 0;
@@ -2192,7 +2304,10 @@ static int cpsw_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_wol(cpsw->slaves[slave_no].phy, wol);
@@ -2451,7 +2566,10 @@ static int cpsw_get_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_get_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2463,7 +2581,10 @@ static int cpsw_set_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2475,7 +2596,10 @@ static int cpsw_nway_reset(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return genphy_restart_aneg(cpsw->slaves[slave_no].phy);
@@ -2626,7 +2750,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 	data->mac_control = prop;
 
 	if (of_property_read_bool(node, "dual_emac"))
-		data->dual_emac = 1;
+		data->switch_mode = CPSW_DUAL_EMAC;
+
+	/* switchdev overrides DTS */
+	if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
+		data->switch_mode = CPSW_SWITCHDEV;
 
 	/*
 	 * Populate all the child nodes here...
@@ -2707,7 +2835,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			if (ret)
 				return ret;
 		}
-		if (data->dual_emac) {
+		if (cpsw_is_dual_mac(data->switch_mode)) {
 			if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 						 &prop)) {
 				dev_err(&pdev->dev, "Missing dual_emac_res_vlan in DT.\n");
@@ -2787,9 +2915,13 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	}
 	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);
 
-	priv_sl2->emac_port = 1;
+	priv_sl2->emac_port = 2;
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ndev->features |= NETIF_F_NETNS_LOCAL;
+		cpsw_port_switchdev_init(ndev);
+	}
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
@@ -2806,6 +2938,49 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	return ret;
 }
 
+static int cpsw_probe_cpu_port(struct cpsw_common *cpsw)
+{
+	struct cpsw_priv *priv_sl2;
+	struct net_device *ndev;
+	int ret = 0;
+
+	ndev = alloc_etherdev_mq(sizeof(struct cpsw_priv), CPSW_MAX_QUEUES);
+	if (!ndev) {
+		dev_err(cpsw->dev, "cpsw: error allocating net_device\n");
+		return -ENOMEM;
+	}
+
+	priv_sl2 = netdev_priv(ndev);
+	priv_sl2->cpsw = cpsw;
+	priv_sl2->ndev = ndev;
+	priv_sl2->dev  = &ndev->dev;
+	priv_sl2->msg_enable = netif_msg_init(debug_level, CPSW_DEBUG);
+
+	random_ether_addr(priv_sl2->mac_addr);
+	dev_info(cpsw->dev, "cpu port: Random MACID = %pM\n",
+		 priv_sl2->mac_addr);
+
+	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);
+
+	priv_sl2->emac_port = HOST_PORT_NUM;
+	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_NETNS_LOCAL;
+
+	ndev->netdev_ops = &cpsw_netdev_ops;
+
+	/* register the network device */
+	SET_NETDEV_DEV(ndev, cpsw->dev);
+	cpsw_port_switchdev_init(ndev);
+	ret = register_netdev(ndev);
+	if (ret) {
+		dev_err(cpsw->dev, "cpsw: error registering net device\n");
+		free_netdev(ndev);
+		ret = -ENODEV;
+	}
+	cpsw->master = ndev;
+
+	return ret;
+}
+
 #define CPSW_QUIRK_IRQ		BIT(0)
 
 static const struct platform_device_id cpsw_devtype[] = {
@@ -2844,6 +3019,187 @@ static const struct of_device_id cpsw_of_mtable[] = {
 };
 MODULE_DEVICE_TABLE(of, cpsw_of_mtable);
 
+static bool cpsw_port_dev_check(const struct net_device *dev)
+{
+	return dev->netdev_ops == &cpsw_netdev_ops;
+}
+
+static void cpsw_fdb_offload_notify(struct net_device *ndev,
+				    struct switchdev_notifier_fdb_info *rcv)
+{
+	struct switchdev_notifier_fdb_info info;
+
+	info.addr = rcv->addr;
+	info.vid = rcv->vid;
+	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				 ndev, &info.info);
+}
+
+static void cpsw_switchdev_event_work(struct work_struct *work)
+{
+	struct cpsw_switchdev_event_work *switchdev_work =
+		container_of(work, struct cpsw_switchdev_event_work, work);
+	struct cpsw_priv *priv = switchdev_work->priv;
+	struct switchdev_notifier_fdb_info *fdb;
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	rtnl_lock();
+	switch (switchdev_work->event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		cpsw_ale_add_ucast(cpsw->ale, (u8 *)fdb->addr, priv->emac_port,
+				   ALE_VLAN | ALE_SECURE, fdb->vid);
+		cpsw_fdb_offload_notify(priv->ndev, fdb);
+		break;
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		cpsw_ale_del_ucast(cpsw->ale, (u8 *)fdb->addr, priv->emac_port,
+				   ALE_VLAN | ALE_SECURE, fdb->vid);
+		break;
+	default:
+		break;
+	}
+	rtnl_unlock();
+
+	kfree(switchdev_work->fdb_info.addr);
+	kfree(switchdev_work);
+	dev_put(priv->ndev);
+}
+
+/* called under rcu_read_lock() */
+static int cpsw_switchdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = switchdev_notifier_info_to_dev(ptr);
+	struct switchdev_notifier_fdb_info *fdb_info = ptr;
+	struct cpsw_switchdev_event_work *switchdev_work;
+	struct cpsw_priv *priv = netdev_priv(ndev);
+
+	if (!cpsw_port_dev_check(ndev))
+		return NOTIFY_DONE;
+
+	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
+	if (WARN_ON(!switchdev_work))
+		return NOTIFY_BAD;
+
+	INIT_WORK(&switchdev_work->work, cpsw_switchdev_event_work);
+	switchdev_work->priv = priv;
+	switchdev_work->event = event;
+
+	switch (event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		memcpy(&switchdev_work->fdb_info, ptr,
+		       sizeof(switchdev_work->fdb_info));
+		switchdev_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
+		ether_addr_copy((u8 *)switchdev_work->fdb_info.addr,
+				fdb_info->addr);
+		dev_hold(ndev);
+		break;
+	default:
+		kfree(switchdev_work);
+		return NOTIFY_DONE;
+	}
+
+	queue_work(system_long_wq, &switchdev_work->work);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_switchdev_notifier = {
+	.notifier_call = cpsw_switchdev_event,
+};
+
+static void cpsw_netdevice_port_link(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	if (priv->emac_port != HOST_PORT_NUM)
+		return;
+
+	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD, 1);
+}
+
+static void cpsw_netdevice_port_unlink(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	if (priv->emac_port != HOST_PORT_NUM)
+		return;
+
+	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD, 0);
+}
+
+/* netdev notifier */
+static int cpsw_netdevice_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		info = ptr;
+		if (!info->master)
+			goto out;
+		if (info->linking)
+			cpsw_netdevice_port_link(ndev);
+		else
+			cpsw_netdevice_port_unlink(ndev);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+out:
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_netdevice_nb __read_mostly = {
+	.notifier_call = cpsw_netdevice_event,
+};
+
+static int cpsw_register_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = register_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register netdevice notifier\n");
+		return ret;
+	}
+
+	ret = register_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register switchdev notifier\n");
+		goto unreg_netdevice;
+	}
+
+	return ret;
+
+unreg_netdevice:
+	ret = unregister_netdevice_notifier(&cpsw_netdevice_nb);
+
+	return ret;
+}
+
+static int cpsw_unregister_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = unregister_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret)
+		dev_err(priv->dev, "can't unregister switchdev notifier\n");
+
+	ret += unregister_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret)
+		dev_err(priv->dev, "can't unregister netdevice notifier\n");
+
+	return ret;
+}
+
 static int cpsw_probe(struct platform_device *pdev)
 {
 	struct clk			*clk;
@@ -2935,7 +3291,11 @@ static int cpsw_probe(struct platform_device *pdev)
 		cpsw->slaves[i].slave_num = i;
 
 	cpsw->slaves[0].ndev = ndev;
-	priv->emac_port = 0;
+
+	if (cpsw_is_switch(cpsw->data.switch_mode))
+		priv->emac_port = HOST_PORT_NUM;
+	else
+		priv->emac_port = 1;
 
 	clk = devm_clk_get(&pdev->dev, "fck");
 	if (IS_ERR(clk)) {
@@ -3076,8 +3436,17 @@ static int cpsw_probe(struct platform_device *pdev)
 			cpsw->quirk_irq = true;
 	}
 
-	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ret = cpsw_probe_cpu_port(cpsw);
+		if (ret) {
+			cpsw_err(priv, probe, "error probe cpu interface\n");
+			goto clean_dma_ret;
+		}
+		cpsw_port_switchdev_init(ndev);
+		ndev->features |= NETIF_F_NETNS_LOCAL;
+	}
 
+	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
 	netif_napi_add(ndev, &cpsw->napi_rx, cpsw_rx_poll, CPSW_POLL_WEIGHT);
@@ -3093,7 +3462,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		ret = cpsw_probe_dual_emac(priv);
 		if (ret) {
 			cpsw_err(priv, probe, "error probe slave 2 emac interface\n");
@@ -3139,6 +3508,12 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ret = cpsw_register_notifiers(priv);
+		if (ret)
+			goto clean_dma_ret;
+	}
+
 	cpsw_notice(priv, probe,
 		    "initialized device (regs %pa, irq %d, pool size %d)\n",
 		    &ss_res->start, ndev->irq, dma_params.descs_pool_size);
@@ -3164,7 +3539,8 @@ static int cpsw_probe(struct platform_device *pdev)
 static int cpsw_remove(struct platform_device *pdev)
 {
 	struct net_device *ndev = platform_get_drvdata(pdev);
-	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
 	ret = pm_runtime_get_sync(&pdev->dev);
@@ -3173,7 +3549,10 @@ static int cpsw_remove(struct platform_device *pdev)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac)
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		ret = cpsw_unregister_notifiers(priv);
+
+	if (cpsw->data.switch_mode)
 		unregister_netdev(cpsw->slaves[1].ndev);
 	unregister_netdev(ndev);
 
@@ -3182,8 +3561,10 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	if (cpsw->data.dual_emac)
+	if (cpsw->data.switch_mode)
 		free_netdev(cpsw->slaves[1].ndev);
+	if (cpsw->master)
+		free_netdev(cpsw->master);
 	free_netdev(ndev);
 	return 0;
 }
@@ -3195,7 +3576,7 @@ static int cpsw_suspend(struct device *dev)
 	struct net_device	*ndev = platform_get_drvdata(pdev);
 	struct cpsw_common	*cpsw = ndev_to_cpsw(ndev);
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw->data.switch_mode) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -3224,7 +3605,7 @@ static int cpsw_resume(struct device *dev)
 
 	/* shut up ASSERT_RTNL() warning in netif_set_real_num_tx/rx_queues */
 	rtnl_lock();
-	if (cpsw->data.dual_emac) {
+	if (cpsw->data.switch_mode) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 3b02a83..4be5ffc 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -30,6 +30,12 @@
 #define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
 #define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
 
+enum {
+	CPSW_TI_SWITCH,
+	CPSW_DUAL_EMAC,
+	CPSW_SWITCHDEV,
+};
+
 struct cpsw_slave_data {
 	struct	device_node *phy_node;
 	char	phy_id[MII_BUS_ID_SIZE];
@@ -48,7 +54,7 @@ struct cpsw_platform_data {
 	u32	bd_ram_size;  /*buffer descriptor ram size */
 	u32	mac_control;	/* Mac control register */
 	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
+	u8	switch_mode;    /* Enable Dual EMAC/switchdev mode */
 };
 
 struct cpsw_slave {
@@ -80,6 +86,7 @@ struct cpsw_common {
 	u32				coal_intvl;
 	u32				bus_freq_mhz;
 	int				rx_packet_max;
+	struct net_device		*master; /* used for switchdev */
 	struct cpsw_slave		*slaves;
 	struct cpdma_ctlr		*dma;
 	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
                   ` (3 preceding siblings ...)
  2018-05-24  6:56 ` [PATCH 4/4] cpsw: add switchdev support Ilias Apalodimas
@ 2018-05-24  8:05 ` Jiri Pirko
  2018-05-24  8:48   ` Ilias Apalodimas
  2018-06-01 21:29 ` Grygorii Strashko
  5 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2018-05-24  8:05 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, ivecera,
	francois.ozog, yogeshs, spatton

Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
>Hello, 
>
>This is adding a new mode on the cpsw driver based around switchdev.
>In order to enable this you need to enable CONFIG_NET_SWITCHDEV, 
>CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV
>and add to udev config: 
>
>SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
>        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
>Since the phys_switch_id is based on cpsw version, users with different 
>version will need to do 'ip -d link show dev sw0p0 | grep switchid' and 
>replace with the correct value.
>
>This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.
>sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices
>while sw0p0 is the switch 'cpu facing port'.

Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

What is this device? Could you give me some pointer to description?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24  8:05 ` [PATCH 0/4] RFC CPSW switchdev mode Jiri Pirko
@ 2018-05-24  8:48   ` Ilias Apalodimas
  2018-05-24 12:54     ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24  8:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, ivecera,
	francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
The reason is that TI wants this configured differently from customer facing 
ports. Apparently there are existing customers already using the "feature". 
So OR'ing and adding the cpu port on every operation (add/del vlans add 
ucast/mcast entries etc) was less favoured. 
> 
> What is this device? Could you give me some pointer to description?
This is the switch used on TI's AM5728 and BBB boards. I am pretty sure there 
are other platforms i am not aware of.
http://www.ti.com/lit/ug/spruhz6j/spruhz6j.pdf is the techincal reference
manual. Section 24.11.5.4 "Initialization and Configuration of CPSW" is the
switch part.

Thanks,
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/4] cpsw_switchdev: add switchdev support files
  2018-05-24  6:56 ` [PATCH 3/4] cpsw_switchdev: add switchdev support files Ilias Apalodimas
@ 2018-05-24 10:00   ` Maxim Uvarov
  2018-05-27  4:39   ` kbuild test robot
  1 sibling, 0 replies; 53+ messages in thread
From: Maxim Uvarov @ 2018-05-24 10:00 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

2018-05-24 9:56 GMT+03:00 Ilias Apalodimas <ilias.apalodimas@linaro.org>:
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>  drivers/net/ethernet/ti/Kconfig          |   9 +
>  drivers/net/ethernet/ti/Makefile         |   1 +
>  drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++++++++++++++++++
>  drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
>  4 files changed, 313 insertions(+)
>  create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
>  create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h
>
> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
> index 48a541e..b22ae7d 100644
> --- a/drivers/net/ethernet/ti/Kconfig
> +++ b/drivers/net/ethernet/ti/Kconfig
> @@ -73,6 +73,15 @@ config TI_CPSW
>           To compile this driver as a module, choose M here: the module
>           will be called cpsw.
>
> +config TI_CPSW_SWITCHDEV
> +       bool "TI CPSW switchdev support"
> +       depends on TI_CPSW
> +       depends on NET_SWITCHDEV
> +       help
> +         Enable switchdev support on TI's CPSW Ethernet Switch.
> +
> +         This will allow you to configure the switch using standard tools.
> +
>  config TI_CPTS
>         bool "TI Common Platform Time Sync (CPTS) Support"
>         depends on TI_CPSW || TI_KEYSTONE_NETCP
> diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
> index 0be551d..3926c6a 100644
> --- a/drivers/net/ethernet/ti/Makefile
> +++ b/drivers/net/ethernet/ti/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
>  obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
>  obj-$(CONFIG_TI_CPTS_MOD) += cpts.o
>  obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
> +ti_cpsw-objs:= cpsw_switchdev.o
>  ti_cpsw-y := cpsw.o
>
>  obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
> diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
> new file mode 100644
> index 0000000..bf8c1bf
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
> @@ -0,0 +1,299 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Texas Instruments switchdev Driver
> + *
> + * Copyright (C) 2018 Texas Instruments
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation version 2.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/etherdevice.h>
> +#include <linux/if_bridge.h>
> +#include <net/switchdev.h>
> +#include "cpsw.h"
> +#include "cpsw_priv.h"
> +#include "cpsw_ale.h"
> +
> +static u32 cpsw_switchdev_get_ver(struct net_device *ndev)
> +{
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       struct cpsw_common *cpsw = priv->cpsw;
> +
> +       return cpsw->version;
> +}
> +
> +static int cpsw_port_attr_set(struct net_device *dev,
> +                             const struct switchdev_attr *attr,
> +                             struct switchdev_trans *trans)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static int cpsw_port_attr_get(struct net_device *dev,
> +                             struct switchdev_attr *attr)
> +{
> +       u32 cpsw_ver;
> +       int err = 0;
> +
> +       switch (attr->id) {
> +       case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
> +               cpsw_ver = cpsw_switchdev_get_ver(dev);
> +               attr->u.ppid.id_len = sizeof(cpsw_ver);
> +               memcpy(&attr->u.ppid.id, &cpsw_ver, attr->u.ppid.id_len);
> +               break;
> +       default:
> +               return -EOPNOTSUPP;
> +       }
> +
> +       return err;

err is always 0 here.

> +}
> +
> +static u16 cpsw_get_pvid(struct cpsw_priv *priv)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       u32 __iomem *port_vlan_reg;
> +       u32 pvid;
> +
> +       if (priv->emac_port) {
> +               int reg = CPSW2_PORT_VLAN;
> +
> +               if (cpsw->version == CPSW_VERSION_1)
> +                       reg = CPSW1_PORT_VLAN;
> +               pvid = slave_read(cpsw->slaves + (priv->emac_port - 1), reg);
> +       } else {
> +               port_vlan_reg = &cpsw->host_port_regs->port_vlan;
> +               pvid = readl(port_vlan_reg);
> +       }
> +
> +       pvid = pvid & 0xfff;
> +
> +       return pvid;
> +}
> +
> +static void cpsw_set_pvid(struct cpsw_priv *priv, u16 vid, bool cfi, u32 cos)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       void __iomem *port_vlan_reg;
> +       u32 pvid;
> +
> +       pvid = vid;
> +       pvid |= cfi ? BIT(12) : 0;
> +       pvid |= (cos & 0x7) << 13;
> +
> +       if (priv->emac_port) {
> +               int reg = CPSW2_PORT_VLAN;
> +
> +               if (cpsw->version == CPSW_VERSION_1)
> +                       reg = CPSW1_PORT_VLAN;
> +               /* no barrier */
> +               slave_write(cpsw->slaves + (priv->emac_port - 1), pvid, reg);
> +       } else {
> +               /* CPU port */
> +               port_vlan_reg = &cpsw->host_port_regs->port_vlan;
> +               writel(pvid, port_vlan_reg);
> +       }
> +}
> +
> +static int cpsw_port_vlan_add(struct cpsw_priv *priv, bool untag, bool pvid,
> +                             u16 vid)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask = BIT(priv->emac_port);
> +       int unreg_mcast_mask = 0;
> +       int reg_mcast_mask = 0;
> +       int untag_mask = 0;
> +       int ret = 0;
> +
> +       if (priv->ndev->flags & IFF_ALLMULTI)
> +               unreg_mcast_mask = port_mask;
> +
> +       if (priv->ndev->flags & IFF_MULTICAST)
> +               reg_mcast_mask = port_mask;
> +
> +       if (untag)
> +               untag_mask = port_mask;
> +
> +       ret = cpsw_ale_vlan_add_modify(cpsw->ale, vid, port_mask, untag_mask,
> +                                      reg_mcast_mask, unreg_mcast_mask);
> +       if (ret) {
> +               dev_err(priv->dev, "Unable to add vlan\n");
> +               return ret;
> +       }
> +
> +       if (!pvid)
> +               return ret;
> +
> +       cpsw_set_pvid(priv, vid, 0, 0);
> +
> +       dev_dbg(priv->dev, "VID: %u dev: %s port: %u\n", vid,
> +               priv->ndev->name, priv->emac_port);
> +
> +       return ret;
> +}
> +
> +static int cpsw_port_vlan_del(struct cpsw_priv *priv, u16 vid)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask = BIT(priv->emac_port);
> +       int ret = 0;

no need to set it to 0 here.

> +
> +       ret = cpsw_ale_vlan_del_modify(cpsw->ale, vid, port_mask);
> +       if (ret != 0)
> +               return ret;
> +
> +       ret = cpsw_ale_del_ucast(cpsw->ale, priv->mac_addr,
> +                                HOST_PORT_NUM, ALE_VLAN, vid);
> +
> +       if (vid == cpsw_get_pvid(priv))
> +               cpsw_set_pvid(priv, 0, 0, 0);
> +
> +       if (ret != 0) {
> +               dev_dbg(priv->dev, "Failed to delete unicast entry\n");
> +               ret = 0;

no need to set it to 0.

> +       }
> +
> +       ret = cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
> +                                0, ALE_VLAN, vid);
> +       if (ret != 0) {
> +               dev_dbg(priv->dev, "Failed to delete multicast entry\n");
> +               ret = 0;
> +       }
> +

just return 0 as it always returned.

> +       return ret;
> +}
> +
> +static int cpsw_port_vlans_add(struct cpsw_priv *priv,
> +                              const struct switchdev_obj_port_vlan *vlan,
> +                              struct switchdev_trans *trans)
> +{
> +       bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
> +       bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
> +       u16 vid;
> +
> +       if (switchdev_trans_ph_prepare(trans))
> +               return 0;
> +
> +       for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
> +               int err;
> +
> +               err = cpsw_port_vlan_add(priv, untagged, pvid, vid);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int cpsw_port_vlans_del(struct cpsw_priv *priv,
> +                              const struct switchdev_obj_port_vlan *vlan)
> +
> +{
> +       u16 vid;
> +
> +       for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
> +               int err;
> +
> +               err = cpsw_port_vlan_del(priv, vid);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int cpsw_port_mdb_add(struct cpsw_priv *priv,
> +                            struct switchdev_obj_port_mdb *mdb,
> +                            struct switchdev_trans *trans)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask;
> +       int err;
> +
> +       if (switchdev_trans_ph_prepare(trans))
> +               return 0;
> +
> +       port_mask = BIT(priv->emac_port);
> +       err = cpsw_ale_mcast_add_modify(cpsw->ale, mdb->addr, port_mask,
> +                                       ALE_VLAN, mdb->vid, 0);
> +
> +       return err;
> +}
> +
> +static int cpsw_port_mdb_del(struct cpsw_priv *priv,
> +                            struct switchdev_obj_port_mdb *mdb)
> +
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int del_mask;
> +       int err;
> +
> +       del_mask = BIT(priv->emac_port);
> +       err = cpsw_ale_mcast_del_modify(cpsw->ale, mdb->addr, del_mask,
> +                                       ALE_VLAN, mdb->vid);
> +
> +       return err;
> +}
> +
> +static int cpsw_port_obj_add(struct net_device *ndev,
> +                            const struct switchdev_obj *obj,
> +                            struct switchdev_trans *trans)
> +{
> +       struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> +       struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       int err = 0;
> +
> +       switch (obj->id) {
> +       case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +               err = cpsw_port_vlans_add(priv, vlan, trans);
> +               break;
> +       case SWITCHDEV_OBJ_ID_PORT_MDB:
> +               err = cpsw_port_mdb_add(priv, mdb, trans);
> +               break;
> +       default:
> +               err = -EOPNOTSUPP;
> +               break;
> +       }
> +
> +       return err;
> +}
> +
> +static int cpsw_port_obj_del(struct net_device *ndev,
> +                            const struct switchdev_obj *obj)
> +{
> +       struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       int err = 0;
> +
> +       switch (obj->id) {
> +       case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +               err = cpsw_port_vlans_del(priv, vlan);
> +               break;
> +       case SWITCHDEV_OBJ_ID_PORT_MDB:
> +               err = cpsw_port_mdb_del(priv, SWITCHDEV_OBJ_PORT_MDB(obj));
> +               break;
> +       default:
> +               err = -EOPNOTSUPP;
> +               break;
> +       }
> +
> +       return err;
> +}
> +
> +static const struct switchdev_ops cpsw_port_switchdev_ops = {
> +       .switchdev_port_attr_set        = cpsw_port_attr_set,
> +       .switchdev_port_attr_get        = cpsw_port_attr_get,
> +       .switchdev_port_obj_add         = cpsw_port_obj_add,
> +       .switchdev_port_obj_del         = cpsw_port_obj_del,
> +};
> +
> +void cpsw_port_switchdev_init(struct net_device *ndev)
> +{
> +       ndev->switchdev_ops = &cpsw_port_switchdev_ops;
> +}
> diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.h b/drivers/net/ethernet/ti/cpsw_switchdev.h
> new file mode 100644
> index 0000000..4940462
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/cpsw_switchdev.h
> @@ -0,0 +1,4 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <net/switchdev.h>
> +
> +void cpsw_port_switchdev_init(struct net_device *ndev);
> --
> 2.7.4
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24  8:48   ` Ilias Apalodimas
@ 2018-05-24 12:54     ` Andrew Lunn
  2018-05-24 13:44       ` Ivan Vecera
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 12:54 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jiri Pirko, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	ivecera, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
> > Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
> > Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
> The reason is that TI wants this configured differently from customer facing 
> ports. Apparently there are existing customers already using the "feature". 
> So OR'ing and adding the cpu port on every operation (add/del vlans add 
> ucast/mcast entries etc) was less favoured. 

Hi Ilias

Nice to see this device moving away from its custom model and towards
the switchdev model.

Did you consider making a clean break from the existing code and write
a new driver. Let the existing customers using the existing
driver. Have the new switchdev driver fully conform to switchdev.

I don't like having this 'cpu' interface. As you say, it breaks the
switchhdev model. If we need to extend the switchdev model to support
some use case, lets do that. Please can you fully describe the use
cases, so we can discuss how to implement them cleanly within the
switchdev model.

Thanks
	  Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-05-24  6:56 ` [PATCH 4/4] cpsw: add switchdev support Ilias Apalodimas
@ 2018-05-24 13:12   ` Andrew Lunn
  2018-05-24 13:32     ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 13:12 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

> @@ -2626,7 +2750,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
>  	data->mac_control = prop;
>  
>  	if (of_property_read_bool(node, "dual_emac"))
> -		data->dual_emac = 1;
> +		data->switch_mode = CPSW_DUAL_EMAC;
> +
> +	/* switchdev overrides DTS */
> +	if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
> +		data->switch_mode = CPSW_SWITCHDEV;

Device tree is supposed to describe the hardware. Using that hardware
in different ways is not something you should describe in DT.

There are also a lot of IS_ENABLED() here, which i don't like. It is a
lot better than #ifdef, but we should try to do better. It would be
good to split this cleanly into three parts. A generic library, which
does not care about DUAL_MAC or SWITCHDEV. A driver which implements
legacy DUAL MAC etc. And a driver which implements SWITCHDEV. We can
then give this new switchdev driver a different compatible. It i still
encoding in device tree how to use the hardware, but it is more
implicit, rather than explicit.

	  Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-05-24 13:12   ` Andrew Lunn
@ 2018-05-24 13:32     ` Ilias Apalodimas
  2018-05-24 16:39       ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24 13:32 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> Device tree is supposed to describe the hardware. Using that hardware
> in different ways is not something you should describe in DT.
> 
The new switchdev mode is applied with a .config option in the kernel. What you
see is pre-existing code, so i am not sure if i should change it in this
patchset. Your point is valid though and we are on the same page.
> There are also a lot of IS_ENABLED() here, which i don't like. It is a
> lot better than #ifdef, but we should try to do better. 
I don't like it either i just tried to clean up code in "hot path" with ifdefs.
In theory this should replace "switch mode" in the near future so the ifdefs
will go away
> It would be
> good to split this cleanly into three parts. A generic library, which
> does not care about DUAL_MAC or SWITCHDEV. A driver which implements
> legacy DUAL MAC etc. And a driver which implements SWITCHDEV. We can
> then give this new switchdev driver a different compatible. It i still
> encoding in device tree how to use the hardware, but it is more
> implicit, rather than explicit.
Good idea, i'll sent the next version like that

Thanks,
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 12:54     ` Andrew Lunn
@ 2018-05-24 13:44       ` Ivan Vecera
  2018-05-24 14:08         ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Ivan Vecera @ 2018-05-24 13:44 UTC (permalink / raw)
  To: Andrew Lunn, Ilias Apalodimas
  Cc: Jiri Pirko, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	francois.ozog, yogeshs, spatton

On 24.5.2018 14:54, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
>> The reason is that TI wants this configured differently from customer facing 
>> ports. Apparently there are existing customers already using the "feature". 
>> So OR'ing and adding the cpu port on every operation (add/del vlans add 
>> ucast/mcast entries etc) was less favoured. 
> 
> Hi Ilias
> 
> Nice to see this device moving away from its custom model and towards
> the switchdev model.
+1

> Did you consider making a clean break from the existing code and write
> a new driver. Let the existing customers using the existing
> driver. Have the new switchdev driver fully conform to switchdev.

I would also prefer fresh new driver. The existing one can be marked as
'bugfix-only' and later pertinently deprecated/removed.
> 
> I don't like having this 'cpu' interface. As you say, it breaks the
> switchhdev model. If we need to extend the switchdev model to support
> some use case, lets do that. Please can you fully describe the use
> cases, so we can discuss how to implement them cleanly within the
> switchdev model.
+1

Ivan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 13:44       ` Ivan Vecera
@ 2018-05-24 14:08         ` Ilias Apalodimas
  2018-05-24 14:54           ` Andrew Lunn
  2018-06-02 23:28           ` Grygorii Strashko
  0 siblings, 2 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24 14:08 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: Andrew Lunn, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote:
> On 24.5.2018 14:54, Andrew Lunn wrote:
> > On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
> >> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
> >>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
> >>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
> >> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
> >> The reason is that TI wants this configured differently from customer facing 
> >> ports. Apparently there are existing customers already using the "feature". 
> >> So OR'ing and adding the cpu port on every operation (add/del vlans add 
> >> ucast/mcast entries etc) was less favoured. 
> > 
> > Hi Ilias
> > 
> > Nice to see this device moving away from its custom model and towards
> > the switchdev model.
> +1
Thanks. To be honest it opens up so many posibilities for common configuration
from userspace across vendors that doing something new without it doesn't make
any sense (at least to me).

> 
> > Did you consider making a clean break from the existing code and write
> > a new driver. Let the existing customers using the existing
> > driver. Have the new switchdev driver fully conform to switchdev.
> 
> I would also prefer fresh new driver. The existing one can be marked as
> 'bugfix-only' and later pertinently deprecated/removed.
Yes, but given the driver and the platforms it's used at, we ended up patching 
the existing driver. I am not opposed to the idea, but Grygorii is more suited 
to reply on that.

> > 
> > I don't like having this 'cpu' interface. As you say, it breaks the
> > switchhdev model. If we need to extend the switchdev model to support
> > some use case, lets do that. Please can you fully describe the use
> > cases, so we can discuss how to implement them cleanly within the
> > switchdev model.
> +1
There's configuration needs from customers adding or not adding a VLAN to the
CPU port. In my configuration examples for instance, if the cpu port is not
added to the bridge, you cannot get an ip address on it. 
Similar cases exist for customers on adding MDBs as far as i know. So they want
the "customer facing ports" to have the MDBs present but not the cpu port.
In some cases (where the CPE/device that has the switch) participates in the 
traffic they want the cpu port to have the samne MDBs installed.
This is just two simple cases that come in mind, again Grygorii is more suited
to answer and explain existing/more complex use cases better than me.

Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on
the other hand you can access it's configuration using the same userspace tools
and the same commands you do for the "normal" ports. Extending switchdev might
be the proper solution here.

Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 14:08         ` Ilias Apalodimas
@ 2018-05-24 14:54           ` Andrew Lunn
  2018-05-24 15:07             ` Ilias Apalodimas
  2018-06-02 23:28           ` Grygorii Strashko
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 14:54 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> There's configuration needs from customers adding or not adding a VLAN to the
> CPU port. In my configuration examples for instance, if the cpu port is not
> added to the bridge, you cannot get an ip address on it. 

If you cannot get an IP address, it is plain broken. The whole idea is
that switch port interfaces are just linux interfaces. A linux
interface which cannot get an IP address is broken.

> Similar cases exist for customers on adding MDBs as far as i know. So they want
> the "customer facing ports" to have the MDBs present but not the cpu port.

That i can understand. And it should actually work now with
switchdev. It performs IGMP snooping, and if there is nothing joining
the group on the CPU, it won't add an MDB entry to forward traffic to
the CPU.

> Adding a cpu port that cannot transmit or receive traffic is a bit "weird"

And how is it supposed to send BPDUs? STP is going to be broken....

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 14:54           ` Andrew Lunn
@ 2018-05-24 15:07             ` Ilias Apalodimas
  2018-05-24 15:25               ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24 15:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 04:54:41PM +0200, Andrew Lunn wrote:
> If you cannot get an IP address, it is plain broken. The whole idea is
> that switch port interfaces are just linux interfaces. A linux
> interface which cannot get an IP address is broken.
The switch interfaces can get ip addresses just like every linux interface. The
cpu port can't (sw0p0)
> 
> > Similar cases exist for customers on adding MDBs as far as i know. So they want
> > the "customer facing ports" to have the MDBs present but not the cpu port.
> 
> That i can understand. And it should actually work now with
> switchdev. It performs IGMP snooping, and if there is nothing joining
> the group on the CPU, it won't add an MDB entry to forward traffic to
> the CPU.
Yes, but this should be configurable (i.e the customer can deny adding the MDB
on the cpu port)
> 
> > Adding a cpu port that cannot transmit or receive traffic is a bit "weird"
> 
> And how is it supposed to send BPDUs? STP is going to be broken....
Not sure about this, i'll have to check

Regards
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 15:07             ` Ilias Apalodimas
@ 2018-05-24 15:25               ` Andrew Lunn
  2018-05-24 16:02                 ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 15:25 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> > That i can understand. And it should actually work now with
> > switchdev. It performs IGMP snooping, and if there is nothing joining
> > the group on the CPU, it won't add an MDB entry to forward traffic to
> > the CPU.

> Yes, but this should be configurable (i.e the customer can deny adding the MDB
> on the cpu port)

O.K, back to the basic idea. Switch ports are just normal Linux
interfaces.

How would you configure this with two e1000e put in a bridge? I want
multicast to be bridged between the two e1000e, but the host stack
should not see the packets.

	Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 15:25               ` Andrew Lunn
@ 2018-05-24 16:02                 ` Ilias Apalodimas
  2018-05-24 16:33                   ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-24 16:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:
> O.K, back to the basic idea. Switch ports are just normal Linux
> interfaces.
> 
> How would you configure this with two e1000e put in a bridge? I want
> multicast to be bridged between the two e1000e, but the host stack
> should not see the packets.
I am not sure i am following. I might be missing something. In your case you
got two ethernet pci/pcie interfaces bridged through software. You can filter
those if needed. In the case we are trying to cover, you got a hardware that
offers that capability. Since not all switches are pcie based shouldn't we be
able to allow this ?

Regards
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 16:02                 ` Ilias Apalodimas
@ 2018-05-24 16:33                   ` Andrew Lunn
  2018-05-25  6:29                     ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 16:33 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:
> > O.K, back to the basic idea. Switch ports are just normal Linux
> > interfaces.
> > 
> > How would you configure this with two e1000e put in a bridge? I want
> > multicast to be bridged between the two e1000e, but the host stack
> > should not see the packets.
> I am not sure i am following. I might be missing something. In your case you
> got two ethernet pci/pcie interfaces bridged through software. You can filter
> those if needed. In the case we are trying to cover, you got a hardware that
> offers that capability. Since not all switches are pcie based shouldn't we be
> able to allow this ?

switchdev is about offloading what Linux can do to hardware to
accelerate it. The switch is a block of accelerator hardware, like a
GPU is for accelerating graphics. Linux can render OpenGL, but it is
better to hand it over to the GPU accelerator.

Same applies here. The Linux bridge can bridge multicast. Using the
switchdev API, you can push that down to the accelerator, and let it
do it.

So you need to think about, how do you make the Linux bridge not pass
multicast traffic to the host stack. Then how do you extend the
switchdev API so you can push this down to the accelerator.

To really get switchdev, you often need to pivot your point of view a
bit. People often think, switchdev is about writing drivers for
switches. Its not, its about how you offload networking which Linux
can do down to a switch. And if the switch cannot accelerate it, you
leave Linux to do it.

When you get in the details, i think you will find the switchdev API
actually already has what you need for this use case. What you need to
figure out is how you make the Linux bridge not pass multicast to the
host. Well, actually, not pass multicast it has not asked for. Then
accelerate it.

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-05-24 13:32     ` Ilias Apalodimas
@ 2018-05-24 16:39       ` Andrew Lunn
  2018-05-25  4:56         ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-24 16:39 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> > Device tree is supposed to describe the hardware. Using that hardware
> > in different ways is not something you should describe in DT.
> > 
> The new switchdev mode is applied with a .config option in the kernel. What you
> see is pre-existing code, so i am not sure if i should change it in this
> patchset.

If you break the code up into a library and two drivers, it becomes a
moot point.

But what i don't like here is that the device tree says to do dual
mac. But you ignore that and do sometime else. I would prefer that if
DT says dual mac, and switchdev is compiled in, the probe fails with
EINVAL. Rather than ignore something, make it clear it is invalid.

      Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-05-24 16:39       ` Andrew Lunn
@ 2018-05-25  4:56         ` Ilias Apalodimas
  2018-06-01 21:48           ` Florian Fainelli
  0 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-25  4:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> > On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> > > Device tree is supposed to describe the hardware. Using that hardware
> > > in different ways is not something you should describe in DT.
> > > 
> > The new switchdev mode is applied with a .config option in the kernel. What you
> > see is pre-existing code, so i am not sure if i should change it in this
> > patchset.
> 
> If you break the code up into a library and two drivers, it becomes a
> moot point.
Agree

> 
> But what i don't like here is that the device tree says to do dual
> mac. But you ignore that and do sometime else. I would prefer that if
> DT says dual mac, and switchdev is compiled in, the probe fails with
> EINVAL. Rather than ignore something, make it clear it is invalid.
The switch has 3 modes of operation as is.
1. switch mode, to enable that you don't need to add anything on
the DTS and linux registers a single netdev interface.
2. dual mac mode, this is when you need to add dual_emac; on the DTS.
3. switchdev mode which is controlled by a .config option, since as you 
pointed out DTS was not made for controlling config options. 

I agree that this is far from beautiful. If the driver remains as in though,
i'd prefer either keeping what's there or making "switchdev" a DTS option, 
following the pre-existing erroneous usage rather than making the device 
unusable.  If we end up returning some error and refuse to initialize, users 
that remote upgrade their equipment, without taking a good look at changelog,
will loose access to their devices with no means of remotely fixing that.

Regards
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 16:33                   ` Andrew Lunn
@ 2018-05-25  6:29                     ` Ilias Apalodimas
  2018-05-25 10:28                       ` Ilias Apalodimas
  2018-05-25 12:09                       ` Andrew Lunn
  0 siblings, 2 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-25  6:29 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:
> > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:
> > > O.K, back to the basic idea. Switch ports are just normal Linux
> > > interfaces.
> > > 
> > > How would you configure this with two e1000e put in a bridge? I want
> > > multicast to be bridged between the two e1000e, but the host stack
> > > should not see the packets.
> > I am not sure i am following. I might be missing something. In your case you
> > got two ethernet pci/pcie interfaces bridged through software. You can filter
> > those if needed. In the case we are trying to cover, you got a hardware that
> > offers that capability. Since not all switches are pcie based shouldn't we be
> > able to allow this ?
> 
> switchdev is about offloading what Linux can do to hardware to
> accelerate it. The switch is a block of accelerator hardware, like a
> GPU is for accelerating graphics. Linux can render OpenGL, but it is
> better to hand it over to the GPU accelerator.
> 
> Same applies here. The Linux bridge can bridge multicast. Using the
> switchdev API, you can push that down to the accelerator, and let it
> do it.
> 
> So you need to think about, how do you make the Linux bridge not pass
> multicast traffic to the host stack. Then how do you extend the
> switchdev API so you can push this down to the accelerator.
> 
> To really get switchdev, you often need to pivot your point of view a
> bit. People often think, switchdev is about writing drivers for
> switches. Its not, its about how you offload networking which Linux
> can do down to a switch. And if the switch cannot accelerate it, you
> leave Linux to do it.
> 
> When you get in the details, i think you will find the switchdev API
> actually already has what you need for this use case. What you need to
> figure out is how you make the Linux bridge not pass multicast to the
> host. Well, actually, not pass multicast it has not asked for. Then
> accelerate it.
> 
Understood, if we missed back anything on handling multicast for
the cpu port we'll go back and fix it (i am assuming snooping is the answer
here). Multicasting is only one part of the equation though. What about the 
need for vlans/FDBs on that port though? 

Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-25  6:29                     ` Ilias Apalodimas
@ 2018-05-25 10:28                       ` Ilias Apalodimas
  2018-05-25 11:59                         ` Andrew Lunn
  2018-05-25 12:09                       ` Andrew Lunn
  1 sibling, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-25 10:28 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Fri, May 25, 2018 at 09:29:02AM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote:
> > On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:
> > > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:
> > > > O.K, back to the basic idea. Switch ports are just normal Linux
> > > > interfaces.
> > > > 
> > > > How would you configure this with two e1000e put in a bridge? I want
> > > > multicast to be bridged between the two e1000e, but the host stack
> > > > should not see the packets.
> > > I am not sure i am following. I might be missing something. In your case you
> > > got two ethernet pci/pcie interfaces bridged through software. You can filter
> > > those if needed. In the case we are trying to cover, you got a hardware that
> > > offers that capability. Since not all switches are pcie based shouldn't we be
> > > able to allow this ?
> > 
> > switchdev is about offloading what Linux can do to hardware to
> > accelerate it. The switch is a block of accelerator hardware, like a
> > GPU is for accelerating graphics. Linux can render OpenGL, but it is
> > better to hand it over to the GPU accelerator.
> > 
> > Same applies here. The Linux bridge can bridge multicast. Using the
> > switchdev API, you can push that down to the accelerator, and let it
> > do it.
> > 
> > So you need to think about, how do you make the Linux bridge not pass
> > multicast traffic to the host stack. Then how do you extend the
> > switchdev API so you can push this down to the accelerator.
> > 
> > To really get switchdev, you often need to pivot your point of view a
> > bit. People often think, switchdev is about writing drivers for
> > switches. Its not, its about how you offload networking which Linux
> > can do down to a switch. And if the switch cannot accelerate it, you
> > leave Linux to do it.
> > 
> > When you get in the details, i think you will find the switchdev API
> > actually already has what you need for this use case. What you need to
> > figure out is how you make the Linux bridge not pass multicast to the
> > host. Well, actually, not pass multicast it has not asked for. Then
> > accelerate it.
> > 
> Understood, if we missed back anything on handling multicast for
> the cpu port we'll go back and fix it (i am assuming snooping is the answer
> here). Multicasting is only one part of the equation though. What about the 
> need for vlans/FDBs on that port though? 
> 
I just noticed this: https://www.spinics.net/lists/netdev/msg504760.html
I tried doing the "bridge vlan add vid 2 dev br0 self" in my initial attempts 
but didn't get a notification to program the CPU port(with the sef argument). 
This obviously solves our vlan configuration issue. So it's only static FBDs 
left.

Regards
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-25 10:28                       ` Ilias Apalodimas
@ 2018-05-25 11:59                         ` Andrew Lunn
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Lunn @ 2018-05-25 11:59 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> So it's only static FBDs left.

Please describe the use case. Why would i want to put static FDB
addresses on a CPU port? What in the linux network stack needs this?

	  Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-25  6:29                     ` Ilias Apalodimas
  2018-05-25 10:28                       ` Ilias Apalodimas
@ 2018-05-25 12:09                       ` Andrew Lunn
  2018-05-31 15:27                         ` Ilias Apalodimas
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-05-25 12:09 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> Understood, if we missed back anything on handling multicast for
> the cpu port we'll go back and fix it (i am assuming snooping is the answer
> here).

It is part of the answer. You should also look at .ndo_set_rx_mode.
When the switch ports are not in a bridge, this call i used to pass a
list of MAC addresses which the network stack would like to receiver.
You probably want to turn that list into MBD and FDBs.

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/4] cpsw_switchdev: add switchdev support files
  2018-05-24  6:56 ` [PATCH 3/4] cpsw_switchdev: add switchdev support files Ilias Apalodimas
  2018-05-24 10:00   ` Maxim Uvarov
@ 2018-05-27  4:39   ` kbuild test robot
  1 sibling, 0 replies; 53+ messages in thread
From: kbuild test robot @ 2018-05-27  4:39 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: kbuild-all, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	jiri, ivecera, francois.ozog, yogeshs, spatton, Ilias Apalodimas

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

Hi Ilias,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v4.17-rc6]
[cannot apply to net-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ilias-Apalodimas/RFC-CPSW-switchdev-mode/20180527-102334
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/ti/cpsw_switchdev.c: In function 'cpsw_port_switchdev_init':
>> drivers/net/ethernet/ti/cpsw_switchdev.c:298:8: error: 'struct net_device' has no member named 'switchdev_ops'; did you mean 'netdev_ops'?
     ndev->switchdev_ops = &cpsw_port_switchdev_ops;
           ^~~~~~~~~~~~~
           netdev_ops

vim +298 drivers/net/ethernet/ti/cpsw_switchdev.c

   295	
   296	void cpsw_port_switchdev_init(struct net_device *ndev)
   297	{
 > 298		ndev->switchdev_ops = &cpsw_port_switchdev_ops;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33598 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-25 12:09                       ` Andrew Lunn
@ 2018-05-31 15:27                         ` Ilias Apalodimas
  0 siblings, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-05-31 15:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

Sorry for the late response i had some time to take another look and do some
extra testing

> switchdev is about offloading what Linux can do to hardware to
> accelerate it. The switch is a block of accelerator hardware, like a
> GPU is for accelerating graphics. Linux can render OpenGL, but it is
> better to hand it over to the GPU accelerator.
>
> Same applies here. The Linux bridge can bridge multicast. Using the
> switchdev API, you can push that down to the accelerator, and let it
> do it.
>
> So you need to think about, how do you make the Linux bridge not pass
> multicast traffic to the host stack. Then how do you extend the
> switchdev API so you can push this down to the accelerator.
>

> To really get switchdev, you often need to pivot your point of view a
> bit. People often think, switchdev is about writing drivers for
> switches. Its not, its about how you offload networking which Linux
> can do down to a switch. And if the switch cannot accelerate it, you
> leave Linux to do it.
>
> When you get in the details, i think you will find the switchdev API
> actually already has what you need for this use case. What you need to
> figure out is how you make the Linux bridge not pass multicast to the
> host. Well, actually, not pass multicast it has not asked for. Then
> accelerate it.
The current driver is already working like that. The difference between the
modes of operation is this:
By registering the 'cpu port' we choose if the linux host is going to see the
br_ip4_multicast_igmp3_report or br_multicast_ipv4_rcv (by configuring the vlan
it participates) and trigger switchdev to add the MDBs
If the cpu port is member of that VLAN then the dynamic entry shows on 'bridge
mdb show' command i.e dev br0 port sw0p1 grp 239.1.1.1 temp offload vid 100
If not the user is able to add it manually.

Anyway i got the main points of the RFC, if Petr's patch get accepted i might be
able to respin this without registering a CPU port. 

Regards
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
                   ` (4 preceding siblings ...)
  2018-05-24  8:05 ` [PATCH 0/4] RFC CPSW switchdev mode Jiri Pirko
@ 2018-06-01 21:29 ` Grygorii Strashko
  2018-06-02 14:08   ` Andrew Lunn
  5 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-01 21:29 UTC (permalink / raw)
  To: Ilias Apalodimas, netdev, ivan.khoronzhuk, nsekhar, jiri, ivecera
  Cc: francois.ozog, yogeshs, spatton

Hi Ilias,

On 05/24/2018 01:56 AM, Ilias Apalodimas wrote:
> Hello,
> 
> This is adding a new mode on the cpsw driver based around switchdev.
> In order to enable this you need to enable CONFIG_NET_SWITCHDEV,
> CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV
> and add to udev config:
> 
> SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
>          ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
> Since the phys_switch_id is based on cpsw version, users with different
> version will need to do 'ip -d link show dev sw0p0 | grep switchid' and
> replace with the correct value.
> 
> This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.
> sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices
> while sw0p0 is the switch 'cpu facing port'.
> sw0p0 will be unable to receive and transmit traffic and it's not 100% within
> switchdev scope but, it's used to configure switch cpu port individually as
> this is needed for various switch features and configuration scenarios.

First, sorry for delayed reply it was very tough week (new SoC).

Second, Thanks a lot for your great work. I'm still testing it with different
 use cases and trying to consolidate my reply for all questions.

All, thanks for your comments.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-05-25  4:56         ` Ilias Apalodimas
@ 2018-06-01 21:48           ` Florian Fainelli
  2018-06-02 10:34             ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Florian Fainelli @ 2018-06-01 21:48 UTC (permalink / raw)
  To: Ilias Apalodimas, Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton



On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>> in different ways is not something you should describe in DT.
>>>>
>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>> see is pre-existing code, so i am not sure if i should change it in this
>>> patchset.
>>
>> If you break the code up into a library and two drivers, it becomes a
>> moot point.
> Agree
> 
>>
>> But what i don't like here is that the device tree says to do dual
>> mac. But you ignore that and do sometime else. I would prefer that if
>> DT says dual mac, and switchdev is compiled in, the probe fails with
>> EINVAL. Rather than ignore something, make it clear it is invalid.
> The switch has 3 modes of operation as is.
> 1. switch mode, to enable that you don't need to add anything on
> the DTS and linux registers a single netdev interface.
> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
> 3. switchdev mode which is controlled by a .config option, since as you 
> pointed out DTS was not made for controlling config options. 
> 
> I agree that this is far from beautiful. If the driver remains as in though,
> i'd prefer either keeping what's there or making "switchdev" a DTS option, 
> following the pre-existing erroneous usage rather than making the device 
> unusable.  If we end up returning some error and refuse to initialize, users 
> that remote upgrade their equipment, without taking a good look at changelog,
> will loose access to their devices with no means of remotely fixing that.

It seems to me that the mistake here is seeing multiple modes of
operations for the cpsw. There are not actually many, there is one
usage, and then there is what you can and cannot offload. The basic
premise with switchdev and DSA (which uses switchdev) is that each
user-facing port of your switch needs to work as if it were a normal
Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
create a bridge and you enslave those ports into the bridge, you need to
have forwarding done in hardware between these two ports when the
SMAC/DMAC are not for the host/CPU/management interface and you must
simultaneously still have the host have the ability to send/receive
traffic through the bridge device.

It seems to me like this is entirely doable given that the dual MAC use
case is supported already.

switchdev is just a stateless framework to get notified from the
networking stack about what you can possibly offload in hardware, so
having a DTS option gate that is unfortunately wrong because it is
really implementing a SW policy in DTS which is not what it is meant for.
-- 
Florian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-06-01 21:48           ` Florian Fainelli
@ 2018-06-02 10:34             ` Ilias Apalodimas
  2018-06-02 16:10               ` Florian Fainelli
  2018-06-05 21:03               ` Grygorii Strashko
  0 siblings, 2 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-06-02 10:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	jiri, ivecera, francois.ozog, yogeshs, spatton

Hi Florian, 

Thanks for taking time to look into this

On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
> 
> 
> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> >>>> Device tree is supposed to describe the hardware. Using that hardware
> >>>> in different ways is not something you should describe in DT.
> >>>>
> >>> The new switchdev mode is applied with a .config option in the kernel. What you
> >>> see is pre-existing code, so i am not sure if i should change it in this
> >>> patchset.
> >>
> >> If you break the code up into a library and two drivers, it becomes a
> >> moot point.
> > Agree
> > 
> >>
> >> But what i don't like here is that the device tree says to do dual
> >> mac. But you ignore that and do sometime else. I would prefer that if
> >> DT says dual mac, and switchdev is compiled in, the probe fails with
> >> EINVAL. Rather than ignore something, make it clear it is invalid.
> > The switch has 3 modes of operation as is.
> > 1. switch mode, to enable that you don't need to add anything on
> > the DTS and linux registers a single netdev interface.
> > 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
> > 3. switchdev mode which is controlled by a .config option, since as you 
> > pointed out DTS was not made for controlling config options. 
> > 
> > I agree that this is far from beautiful. If the driver remains as in though,
> > i'd prefer either keeping what's there or making "switchdev" a DTS option, 
> > following the pre-existing erroneous usage rather than making the device 
> > unusable.  If we end up returning some error and refuse to initialize, users 
> > that remote upgrade their equipment, without taking a good look at changelog,
> > will loose access to their devices with no means of remotely fixing that.
> 
> It seems to me that the mistake here is seeing multiple modes of
> operations for the cpsw. There are not actually many, there is one
> usage, and then there is what you can and cannot offload. 
CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
called ALE in the current driver) by-pass(which is used in dual emac for 
example) and other features. Again Grygorii is better suited to answer the 
exact differences.
> The basic
> premise with switchdev and DSA (which uses switchdev) is that each
> user-facing port of your switch needs to work as if it were a normal
> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
> create a bridge and you enslave those ports into the bridge, you need to
> have forwarding done in hardware between these two ports when the
> SMAC/DMAC are not for the host/CPU/management interface and you must
> simultaneously still have the host have the ability to send/receive
> traffic through the bridge device.
Yes dual emac does that. But dual emac configures the port facing VLAN to the
CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
That's exactly what the current RFC does as well, with the addition of
registering a sw0p0 (i'll explain why later on this mail)
A little more detail on the issue we are having. On my description 
sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
that have PHYs attached. 

When we start in the new switchdev mode all interfaces are added to VLAN 0
so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
and sw0p2 are working as you describe. So those 2 interfaces can send/receive
traffic normally which matches the switchdev case.

When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
>From this point on the whole fuunctionality just collapses. The switch will 
work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to 
get an ip address (since VLAN1 is not a member of the CPU port and the packet 
gets dropped). 
IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
switchdev and configure the MDBs on the ports.  i am prety sure there are other
fail scenarios which i haven't discovered already, but those 2 are the most 
basic ones.  If we add VLAN1 on the CPU port, everything works as intended of 
course.

That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
traffic, but you can configure the CPU port independantly and not be forced to
do an OR on every VLAN add/delete grouping the CPU port with your port command.
The TL;DR version of this is that the switch is working exactly as switchdev is
expecting offloading traffic to the hardware when possible as long as the CPU
port is member of the proper VLANs

Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
when to add the CPU port or not. 

There are still a couple of cases that are not covered though, if we don't 
register the CPU port. We cant decide when to forward multicast
traffic on the CPU port if a join hasn't been sent from br0.
So let's say you got 2 hosts doing multicast and for whatever reason the host
wants to see that traffic. 
With the CPU port present you can do a 
"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
the traffic to the CPU port and thus the host. If this goes away we are forced
to send a join.
 
> It seems to me like this is entirely doable given that the dual MAC use
> case is supported already.
> 
> switchdev is just a stateless framework to get notified from the
> networking stack about what you can possibly offload in hardware, so
> having a DTS option gate that is unfortunately wrong because it is
> really implementing a SW policy in DTS which is not what it is meant for.
The DTS option for configuration pre-existed, i don't like that either switchdev
mode is activated by a .config option not DTS(it just overrides whatever config 
you have on the DTS). Far from pretty though fair point, i am open to 
suggestions.
> -- 
> Florian

Thanks!
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-01 21:29 ` Grygorii Strashko
@ 2018-06-02 14:08   ` Andrew Lunn
  2018-06-05 22:59     ` Grygorii Strashko
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-02 14:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, netdev, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote:
> Hi Ilias,


> Second, Thanks a lot for your great work. I'm still testing it with different
>  use cases and trying to consolidate my reply for all questions.
> 
> All, thanks for your comments.

Hi Grygorii

Something i've said to Ilias already. I would recommend you don't try
to cover all your uses cases with the first version. Keep it simple
and clean, don't do anything controversial and get it merged. Then add
more features one by one. We can then discuss any odd ball features
while being able to look at the complete system, driver, switchdev and
the network stack.

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-06-02 10:34             ` Ilias Apalodimas
@ 2018-06-02 16:10               ` Florian Fainelli
  2018-06-02 16:52                 ` Ilias Apalodimas
  2018-06-05 21:03               ` Grygorii Strashko
  1 sibling, 1 reply; 53+ messages in thread
From: Florian Fainelli @ 2018-06-02 16:10 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Andrew Lunn, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	jiri, ivecera, francois.ozog, yogeshs, spatton

On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
>Hi Florian, 
>
>Thanks for taking time to look into this
>
>On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>> 
>> 
>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>> >>>> Device tree is supposed to describe the hardware. Using that
>hardware
>> >>>> in different ways is not something you should describe in DT.
>> >>>>
>> >>> The new switchdev mode is applied with a .config option in the
>kernel. What you
>> >>> see is pre-existing code, so i am not sure if i should change it
>in this
>> >>> patchset.
>> >>
>> >> If you break the code up into a library and two drivers, it
>becomes a
>> >> moot point.
>> > Agree
>> > 
>> >>
>> >> But what i don't like here is that the device tree says to do dual
>> >> mac. But you ignore that and do sometime else. I would prefer that
>if
>> >> DT says dual mac, and switchdev is compiled in, the probe fails
>with
>> >> EINVAL. Rather than ignore something, make it clear it is invalid.
>> > The switch has 3 modes of operation as is.
>> > 1. switch mode, to enable that you don't need to add anything on
>> > the DTS and linux registers a single netdev interface.
>> > 2. dual mac mode, this is when you need to add dual_emac; on the
>DTS.
>> > 3. switchdev mode which is controlled by a .config option, since as
>you 
>> > pointed out DTS was not made for controlling config options. 
>> > 
>> > I agree that this is far from beautiful. If the driver remains as
>in though,
>> > i'd prefer either keeping what's there or making "switchdev" a DTS
>option, 
>> > following the pre-existing erroneous usage rather than making the
>device 
>> > unusable.  If we end up returning some error and refuse to
>initialize, users 
>> > that remote upgrade their equipment, without taking a good look at
>changelog,
>> > will loose access to their devices with no means of remotely fixing
>that.
>> 
>> It seems to me that the mistake here is seeing multiple modes of
>> operations for the cpsw. There are not actually many, there is one
>> usage, and then there is what you can and cannot offload. 
>CPSW has in fact 2 modes of operation, different FIFO usage/lookup
>entry(it's
>called ALE in the current driver) by-pass(which is used in dual emac
>for 
>example) and other features. Again Grygorii is better suited to answer
>the 
>exact differences.
>> The basic
>> premise with switchdev and DSA (which uses switchdev) is that each
>> user-facing port of your switch needs to work as if it were a normal
>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when
>you
>> create a bridge and you enslave those ports into the bridge, you need
>to
>> have forwarding done in hardware between these two ports when the
>> SMAC/DMAC are not for the host/CPU/management interface and you must
>> simultaneously still have the host have the ability to send/receive
>> traffic through the bridge device.
>Yes dual emac does that. But dual emac configures the port facing VLAN
>to the
>CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
>configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU
>port
>That's exactly what the current RFC does as well, with the addition of
>registering a sw0p0 (i'll explain why later on this mail)
>A little more detail on the issue we are having. On my description 
>sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the
>ports
>that have PHYs attached. 
>
>When we start in the new switchdev mode all interfaces are added to
>VLAN 0
>so CPU port + port1 + port2 are all in the same VLAN group. In that
>case sw0p1
>and sw0p2 are working as you describe. So those 2 interfaces can
>send/receive
>traffic normally which matches the switchdev case.
>
>When we add them on a bridge(let's say br0), VLAN1(or any default
>bridge VLAN)
>is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
>From this point on the whole fuunctionality just collapses. The switch
>will 
>work and offload traffic between sw0p1/sw0p2 but the bridge won't be
>able to 
>get an ip address (since VLAN1 is not a member of the CPU port and the
>packet 
>gets dropped). 
>IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
>switchdev and configure the MDBs on the ports.  i am prety sure there
>are other
>fail scenarios which i haven't discovered already, but those 2 are the
>most 
>basic ones.  If we add VLAN1 on the CPU port, everything works as
>intended of 
>course.
>
>That's the reason we registered sw0p0 as the CPU port. It can't do any
>"real"
>traffic, but you can configure the CPU port independantly and not be
>forced to
>do an OR on every VLAN add/delete grouping the CPU port with your port
>command.
>The TL;DR version of this is that the switch is working exactly as
>switchdev is
>expecting offloading traffic to the hardware when possible as long as
>the CPU
>port is member of the proper VLANs
>
>Petr's patch solves this for us
>(9c86ce2c1ae337fc10568a12aea812ed03de8319).
>We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and
>decide
>when to add the CPU port or not. 
>
>There are still a couple of cases that are not covered though, if we
>don't 
>register the CPU port. We cant decide when to forward multicast
>traffic on the CPU port if a join hasn't been sent from br0.
>So let's say you got 2 hosts doing multicast and for whatever reason
>the host
>wants to see that traffic. 
>With the CPU port present you can do a 
>"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will
>offload
>the traffic to the CPU port and thus the host. If this goes away we are
>forced
>to send a join.

Thanks for the detailed explanation. Somehow I was under the impression that cpsw had the ability, through specific DMA descriptor bits to direct traffic towards one external port or another and conversely, have that information from the HW when receiving packets. What you describe is exactly the same problem we have in DSA when the switch advertises DSA_TAG_PROTO_NONE where only VLAN tags could help differentiate traffic from external ports. At some point there was a discuss of making DSA_TAG_PROTO_NONE automatically create one VLAN per port but this is a good source for other problems...

Looking forward to your follow-up patch series!

-- 
Florian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-06-02 16:10               ` Florian Fainelli
@ 2018-06-02 16:52                 ` Ilias Apalodimas
  0 siblings, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-06-02 16:52 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar,
	jiri, ivecera, francois.ozog, yogeshs, spatton

On Sat, Jun 02, 2018 at 09:10:08AM -0700, Florian Fainelli wrote:
> On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> >Hi Florian, 
> >
> >Thanks for taking time to look into this
> >
> >On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
> >> 
> >> 
> >> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
> >> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> >> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> >> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> >> >>>> Device tree is supposed to describe the hardware. Using that
> >hardware
> >> >>>> in different ways is not something you should describe in DT.
> >> >>>>
> >> >>> The new switchdev mode is applied with a .config option in the
> >kernel. What you
> >> >>> see is pre-existing code, so i am not sure if i should change it
> >in this
> >> >>> patchset.
> >> >>
> >> >> If you break the code up into a library and two drivers, it
> >becomes a
> >> >> moot point.
> >> > Agree
> >> > 
> >> >>
> >> >> But what i don't like here is that the device tree says to do dual
> >> >> mac. But you ignore that and do sometime else. I would prefer that
> >if
> >> >> DT says dual mac, and switchdev is compiled in, the probe fails
> >with
> >> >> EINVAL. Rather than ignore something, make it clear it is invalid.
> >> > The switch has 3 modes of operation as is.
> >> > 1. switch mode, to enable that you don't need to add anything on
> >> > the DTS and linux registers a single netdev interface.
> >> > 2. dual mac mode, this is when you need to add dual_emac; on the
> >DTS.
> >> > 3. switchdev mode which is controlled by a .config option, since as
> >you 
> >> > pointed out DTS was not made for controlling config options. 
> >> > 
> >> > I agree that this is far from beautiful. If the driver remains as
> >in though,
> >> > i'd prefer either keeping what's there or making "switchdev" a DTS
> >option, 
> >> > following the pre-existing erroneous usage rather than making the
> >device 
> >> > unusable.  If we end up returning some error and refuse to
> >initialize, users 
> >> > that remote upgrade their equipment, without taking a good look at
> >changelog,
> >> > will loose access to their devices with no means of remotely fixing
> >that.
> >> 
> >> It seems to me that the mistake here is seeing multiple modes of
> >> operations for the cpsw. There are not actually many, there is one
> >> usage, and then there is what you can and cannot offload. 
> >CPSW has in fact 2 modes of operation, different FIFO usage/lookup
> >entry(it's
> >called ALE in the current driver) by-pass(which is used in dual emac
> >for 
> >example) and other features. Again Grygorii is better suited to answer
> >the 
> >exact differences.
> >> The basic
> >> premise with switchdev and DSA (which uses switchdev) is that each
> >> user-facing port of your switch needs to work as if it were a normal
> >> Ethernet NIC, that is what you call dual-MAC I believe. Then, when
> >you
> >> create a bridge and you enslave those ports into the bridge, you need
> >to
> >> have forwarding done in hardware between these two ports when the
> >> SMAC/DMAC are not for the host/CPU/management interface and you must
> >> simultaneously still have the host have the ability to send/receive
> >> traffic through the bridge device.
> >Yes dual emac does that. But dual emac configures the port facing VLAN
> >to the
> >CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
> >configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU
> >port
> >That's exactly what the current RFC does as well, with the addition of
> >registering a sw0p0 (i'll explain why later on this mail)
> >A little more detail on the issue we are having. On my description 
> >sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the
> >ports
> >that have PHYs attached. 
> >
> >When we start in the new switchdev mode all interfaces are added to
> >VLAN 0
> >so CPU port + port1 + port2 are all in the same VLAN group. In that
> >case sw0p1
> >and sw0p2 are working as you describe. So those 2 interfaces can
> >send/receive
> >traffic normally which matches the switchdev case.
> >
> >When we add them on a bridge(let's say br0), VLAN1(or any default
> >bridge VLAN)
> >is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
> >From this point on the whole fuunctionality just collapses. The switch
> >will 
> >work and offload traffic between sw0p1/sw0p2 but the bridge won't be
> >able to 
> >get an ip address (since VLAN1 is not a member of the CPU port and the
> >packet 
> >gets dropped). 
> >IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
> >switchdev and configure the MDBs on the ports.  i am prety sure there
> >are other
> >fail scenarios which i haven't discovered already, but those 2 are the
> >most 
> >basic ones.  If we add VLAN1 on the CPU port, everything works as
> >intended of 
> >course.
> >
> >That's the reason we registered sw0p0 as the CPU port. It can't do any
> >"real"
> >traffic, but you can configure the CPU port independantly and not be
> >forced to
> >do an OR on every VLAN add/delete grouping the CPU port with your port
> >command.
> >The TL;DR version of this is that the switch is working exactly as
> >switchdev is
> >expecting offloading traffic to the hardware when possible as long as
> >the CPU
> >port is member of the proper VLANs
> >
> >Petr's patch solves this for us
> >(9c86ce2c1ae337fc10568a12aea812ed03de8319).
> >We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and
> >decide
> >when to add the CPU port or not. 
> >
> >There are still a couple of cases that are not covered though, if we
> >don't 
> >register the CPU port. We cant decide when to forward multicast
> >traffic on the CPU port if a join hasn't been sent from br0.
> >So let's say you got 2 hosts doing multicast and for whatever reason
> >the host
> >wants to see that traffic. 
> >With the CPU port present you can do a 
> >"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will
> >offload
> >the traffic to the CPU port and thus the host. If this goes away we are
> >forced
> >to send a join.
> 
> Thanks for the detailed explanation. Somehow I was under the impression that cpsw had the ability, through specific DMA descriptor bits to direct traffic towards one external port or another and conversely, have that information from the HW when receiving packets. 
That's one mode of operation when by-passing the ALE if my understanding of
the hardware is correct. You can choose not to do that. I am still earning all
the details of the ahrdware myself. On Rx though you still need the CPU to
participate to receive the packet(and yes the descriptor indicates the port)

> What you describe is exactly the same problem we have in DSA when the switch advertises DSA_TAG_PROTO_NONE where only VLAN tags could help differentiate traffic from external ports. At some point there was a discuss of making DSA_TAG_PROTO_NONE automatically create one VLAN per port but this is a good source for other problems...
I am pretty sure i am not the first one to encounter this kind of problems
> 
> Looking forward to your follow-up patch series!
Will do!
> 
> -- 
> Florian

Thanks
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-05-24 14:08         ` Ilias Apalodimas
  2018-05-24 14:54           ` Andrew Lunn
@ 2018-06-02 23:28           ` Grygorii Strashko
  2018-06-03  0:08             ` Andrew Lunn
                               ` (3 more replies)
  1 sibling, 4 replies; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-02 23:28 UTC (permalink / raw)
  To: Ilias Apalodimas, Ivan Vecera, Andrew Lunn
  Cc: Jiri Pirko, netdev, ivan.khoronzhuk, nsekhar, francois.ozog,
	yogeshs, spatton, Jiri Pirko

Hi All,

Sry, for delayed reply.

On 05/24/2018 09:08 AM, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote:
>> On 24.5.2018 14:54, Andrew Lunn wrote:
>>> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
>>>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
>>>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
>>>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
>>>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here.
>>>> The reason is that TI wants this configured differently from customer facing
>>>> ports. Apparently there are existing customers already using the "feature".
>>>> So OR'ing and adding the cpu port on every operation (add/del vlans add
>>>> ucast/mcast entries etc) was less favoured.
>>>
>>> Hi Ilias
>>>
>>> Nice to see this device moving away from its custom model and towards
>>> the switchdev model.
>> +1
> Thanks. To be honest it opens up so many posibilities for common configuration
> from userspace across vendors that doing something new without it doesn't make
> any sense (at least to me).
> 
>>
>>> Did you consider making a clean break from the existing code and write
>>> a new driver. Let the existing customers using the existing
>>> driver. Have the new switchdev driver fully conform to switchdev.
>>
>> I would also prefer fresh new driver. The existing one can be marked as
>> 'bugfix-only' and later pertinently deprecated/removed.
> Yes, but given the driver and the platforms it's used at, we ended up patching
> the existing driver. I am not opposed to the idea, but Grygorii is more suited
> to reply on that.

Correct, we considered two options - start from scratch or hack existing driver
to get working prototype as fast as possible.
Hacking of existing driver was just faster way to go and i agree that new driver 
might be better approach for the future.

> 
>>>
>>> I don't like having this 'cpu' interface. As you say, it breaks the
>>> switchhdev model. If we need to extend the switchdev model to support
>>> some use case, lets do that. Please can you fully describe the use
>>> cases, so we can discuss how to implement them cleanly within the
>>> switchdev model.
>> +1
> There's configuration needs from customers adding or not adding a VLAN to the
> CPU port. In my configuration examples for instance, if the cpu port is not
> added to the bridge, you cannot get an ip address on it.
> Similar cases exist for customers on adding MDBs as far as i know. So they want
> the "customer facing ports" to have the MDBs present but not the cpu port.
> In some cases (where the CPE/device that has the switch) participates in the
> traffic they want the cpu port to have the samne MDBs installed.
> This is just two simple cases that come in mind, again Grygorii is more suited
> to answer and explain existing/more complex use cases better than me.
> 
> Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on
> the other hand you can access it's configuration using the same userspace tools
> and the same commands you do for the "normal" ports. Extending switchdev might
> be the proper solution here.

I'd try to provide more information about this switch and why we end up adding eth0 port.
Please, be patient to me as this is new area for me and I might not be right in some
conclusions or can missing smth. 

*Before this patch set*: Current CPSW driver in switch mode can be described as below
          +----------------------------------------+
          |Linux Host 0                            |
          |                                        |
          |   TI tool (ioctl)/netdev               |
          |  + +                                   |
          +-----------------+----------------------+
             | |            |
Control MMIO | |            | Data DMA
             | |            |
         +------------+-----+-----+-----------------+
         |   | |      |   CPDMA   |        CPSW     |
         |   v |      |           |                 |
         |     |      +-----------+                 |
         |     |      |    P0     |                 |
         |     |      |           |                 |
         |     |      +-----+-----+                 |
         |     |            |  FDB: static MAC Port=0
         |     |      +-----+-----+                 |
         |     +------>    ALE    +-----+           |
         |     +------+           |     |           |
         |     |      +-----------+     |           |
         | +---+----+              +----+---+       |
         | |        |              |        |       |
         | |  P1    |              |  P2    |       |
         +------------------------------------------+
           |        |              |        |
           |  PHY   |              |  PHY   |
           +---+----+              +----+---+
               |                        |
           +---+----+               +---+----+
           |        |               |        |
           |  Host 1|               | Host 2 |
           +--------+               +--------+

Note. This is embedded world and in many cases network configuration is
static, everything which is not allowed - drop (means no such things like
IGMP or even ARP).

The core Part of CPSW is ALE module which perform switching ops according 
to ALE table (fdb/mdb/vlan). From ALE point of view *all* port absolutely equal.
And Linux Host 0 in most of customer use cases is the no much different from Hosts 1 or 2,
so allowed net traffic to Linux Host 0 have to be very carefully configured as Apps running
on it must continue working even if network failed or overloaded (packet storms).

So, as per my understanding, P0 meaning here is absolutely not the same as CPU port in DSA.
It just another switch port with bad luck to be connected directly to CPU.

Current CPSW driver offloads basic switch configuration in HW and additional configuration
can be done using TI custom tool (kernel updated to accept additional IOCTL).
Current driver creates one netdev ETH0 and all net traffic enter/exit through this device
- it possible to distinguish which external port p1/2 received packet or send packet directly
to external port p1/2, but current CPSW doesn't do this.
Static Unicast FDB entries created in ALE table for Port 0 to specify which 
unicast traffic need to be accepted by Linux Host 0.  By default only registered multicast 
or broadcast packets forwarded to Linux Host 0.



*After this patch set*: goal keep things working the same as max as possible and
get rid of TI custom tool.

        +-------------------------------------------------------+
        | Linux Host 0                                          |
        |                            +------------------+       |
        |                            |            br0   |       |
        |                            |                  |       |
        |                            +---+--------------+       |
        |                                |             |data    |
        |     +-------+--------------+---------------+ |        |
        |     |       ^              ^   |           | |        |
        |     |      ++-----+      +-+---+-+    +----+-+-+      |
        |     |      |sw0p0 |      | sw0p1 |    |sw0p2   |      |
        |     |      +------+      +----^--+    +-----^--+      |
        |     |                         |             |         |
        |     |             +-----------+-------------+         |
        |     |             |                                   |
        +----+v+------------------------------------------------+
             | |            |
Control MMIO | |            | Data DMA
             | |            |
         +------------+-----+-----+-----------------------------+
         |   | |      |   CPDMA   |        CPSW                 |
         |   v |      |           |                             |
         |     |      +-----------+                             |
         |     |      |    P0     |                             |
         |     |      |           |                             |
         |     |      +-----+-----+                             |
         |     |            |        FDB: static MACP1 port=0   |
         |     |      +-----+-----+  FDB: static MACP2 port=0   |
         |     +------>    ALE    +-----+                       |
         |     +------+           |     |                       |
         |     |      +-----------+     |                       |
         | +---+----+              +----+---+                   |
         | |        |              |        |                   |
         | |  P1    |              |  P2    |                   |
         +------------------------------------------------------+
           |        |              |        |
           |  PHY   |              |  PHY   |
           +--------+              +--------+

In this implementation switch egress traffic to Linux Host 0 always
split between sw0p1 and sw0p2 depending on which external port P1/P2
packet was received, sw0p0 doesn't produce any traffic.
On switch ingress from Linux Host 0 packets are always sent directly to
external ports P1/P2, with assumption that Linux Bridge knows where packet
should go, and ALE bypassed in this case. sw0p0 doesn't allows to
send any traffic and serves for configuration purposes only.

Below I've described some tested use cases (not include full static configuration),
but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
CPDMA TX channels shapers need to be configured, and in case 
of sw0p1/sw0p2 internal FIFOS. 
sw0p0 also expected to be used to configure CPDMA interface in general -
number of tx/rx channels, rates, ring sizes.
In addition there is set of global CPSW parameters (not related to P1/P2, like
MAC Authorization Mode, OUI Deny Mode, crc ) which I've 
thought can be added to sw0p0 (using ethtool -priv-flags).

Additional headache is PTP: we have on PHC, but both external interfaces P1/P2
can timestamp packets.

[1] https://lkml.org/lkml/2018/5/18/1134


Below use cases were tried with this approach tried with current LKML,
so possible changes to Net/Bridge framework not considered:

 
1) boot, ping no vlan

# ip link add name br0 type bridge
# echo 0 > /sys/class/net/br0/bridge/default_pvid
# ip link set dev eth2 master br0
# ip link set dev eth0 master br0
# ip link set dev eth1 master br0
# ifconfig br0 192.168.1.2

*Note*: I've had to disable default_pvid as otherwise linux Bridge adds
and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
+  CPSW specific - it can't untag packets for P0.
Another option I've found:
# ip link set dev br0 type bridge vlan_filtering 1.
but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0


2) add vlans.

Host 1 - not vlan capable,
Host 2/ Linux Host 0 can handle vlans

# bridge vlan add dev sw0p1 vid 100 pvid untagged master
# bridge vlan add dev sw0p2 vid 100 master
# bridge vlan add dev sw0p0 vid 100 master

Any combination expected to work.

*Note*. Ilias reused IFF_ALLMULTI and IFF_MULTICAST on each interface to
 configure registered and unregistered multicast ports masks for each ALE VLAN
entries. So, 

# ifconfig eth0 -multicast
# ifconfig eth0 --allmulti
# bridge vlan add dev sw0p0 vid 100 master
expected to stop forwarding  any multicast traffic to Linux Host 0

3) MDB

Allow mcast 239.1.1.1 between P1/P2
# bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent
# bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent

Allow mcast 239.1.1.2 between P0/P2
# bridge mdb add dev br0 port sw0p0 grp 239.1.1.2 permanent
# bridge mdb add dev br0 port sw0p2 grp 239.1.1.2 permanent

*Note !!!!!*. I've found no proper way to add L2 mcast addresses as
01-80-C2-00-00-00 or 01-1B-19-00-00-00. probably I missing smth.

4) stp

I did some updates (added port stp state offload/stp mcats addr configuration
- will post diff on Monday) and tried stp (kernel STP) by connecting P1/P2
ports to the same switch:

# ip link add name br0 type bridge
# echo 0 > /sys/class/net/br0/bridge/default_pvid
# ip link set dev sw0p2 master br0
# ip link set dev sw0p0 master br0
# ip link set dev sw0p1 master br0
# brctl show
# ifconfig br0 192.168.1.2
# brctl stp br0 on
# brctl showstp br0

sw0p1 (3)
 port id                8003                    state                  blocking
sw0p2 (1)
 port id                8001                    state                forwarding


Don't know howto:
1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST
address = blocked MAC
2) add multicast MAC address with Supervisory Packet flag set. 
Such packets will bypass most of checks inside ALE and will be forwarded in all port's
states except "disabled".
3) add "unknown vlan configuration" : ALE provides possibility to configure
default behavior for tagged packets with "unknown vlan" by configuring 
- Unknown VLAN Force Untagged Egress ports Mask.
- Unknown VLAN Registered Multicast Flood Ports Mask
- Unknown VLAN Multicast Flood ports Mask
- Unknown VLAN Member ports List
4) The way to detect "brctl stp br0 on/off"


If you are here. Thanks for reading this and your patience.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-02 23:28           ` Grygorii Strashko
@ 2018-06-03  0:08             ` Andrew Lunn
  2018-06-05 21:18               ` Grygorii Strashko
  2018-06-03  0:26             ` Andrew Lunn
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-03  0:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:

Hi Grygorii

I'm just picking out one thing here... there is lots more good stuff here.

> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2
> can timestamp packets.
 
This should not be a problem. The Marvell switches have one PHC, but
each port can time stamp packets using this counter. Each port has its
own receive and transmit time stamp registers. So i don't think this
will cause you problems.

     Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-02 23:28           ` Grygorii Strashko
  2018-06-03  0:08             ` Andrew Lunn
@ 2018-06-03  0:26             ` Andrew Lunn
  2018-06-05 23:23               ` Grygorii Strashko
  2018-06-03  0:37             ` Andrew Lunn
  2018-06-03  0:49             ` Andrew Lunn
  3 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-03  0:26 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> *After this patch set*: goal keep things working the same as max as
> possible and get rid of TI custom tool.

We are happy to keep things the same, if they fit with the switchdev
model. Anything in your customer TI tool/model which does not fit the
switchdev model you won't be able to keep, except if we agree to
extend the model.

I can say now, sw0p0 is going to cause problems. I really do suggest
you drop it for the moment in order to get a minimal driver
accepted. sw0p0 does not fit the switchdev model.

> Below I've described some tested use cases (not include full static configuration),
> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
> CPDMA TX channels shapers need to be configured, and in case 
> of sw0p1/sw0p2 internal FIFOS. 
> sw0p0 also expected to be used to configure CPDMA interface in general -
> number of tx/rx channels, rates, ring sizes.

Can this be derives from the configuration on sw0p1 and sw0p2? 
sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx
channels?

> In addition there is set of global CPSW parameters (not related to P1/P2, like
> MAC Authorization Mode, OUI Deny Mode, crc ) which I've 
> thought can be added to sw0p0 (using ethtool -priv-flags).

You should describe these features, and then we can figure out how
best to model them. devlink might be an option if they are switch
global.

     Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-02 23:28           ` Grygorii Strashko
  2018-06-03  0:08             ` Andrew Lunn
  2018-06-03  0:26             ` Andrew Lunn
@ 2018-06-03  0:37             ` Andrew Lunn
  2018-06-05 21:31               ` Grygorii Strashko
  2018-06-03  0:49             ` Andrew Lunn
  3 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-03  0:37 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> 1) boot, ping no vlan
> 
> # ip link add name br0 type bridge
> # echo 0 > /sys/class/net/br0/bridge/default_pvid
> # ip link set dev eth2 master br0
> # ip link set dev eth0 master br0
> # ip link set dev eth1 master br0
> # ifconfig br0 192.168.1.2
> 
> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds
> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
> +  CPSW specific - it can't untag packets for P0.
> Another option I've found:
> # ip link set dev br0 type bridge vlan_filtering 1.
> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0

There are three different configurations here you need to worry about,
with respect to vlans:

# CONFIG_VLAN_8021Q is not set

So you don't have any vlan support in the kernel.

CONFIG_VLAN_8021Q=y, vlan_filtering = 0

So you have vlans, but filtering is off

CONFIG_VLAN_8021Q=y, vlan_filtering = 1

So you have vlans, and filtering is on.

Even with vlan_filtering off, the bridge still does a little with
vlans.

And you need all three to work correctly. 

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-02 23:28           ` Grygorii Strashko
                               ` (2 preceding siblings ...)
  2018-06-03  0:37             ` Andrew Lunn
@ 2018-06-03  0:49             ` Andrew Lunn
  2018-06-05 22:45               ` Grygorii Strashko
  3 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-03  0:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

Hi Grygorii

> Don't know howto:
> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST
> address = blocked MAC
> 2) add multicast MAC address with Supervisory Packet flag set. 
> Such packets will bypass most of checks inside ALE and will be forwarded in all port's
> states except "disabled".
> 3) add "unknown vlan configuration" : ALE provides possibility to configure
> default behavior for tagged packets with "unknown vlan" by configuring 
> - Unknown VLAN Force Untagged Egress ports Mask.
> - Unknown VLAN Registered Multicast Flood Ports Mask
> - Unknown VLAN Multicast Flood ports Mask
> - Unknown VLAN Member ports List
> 4) The way to detect "brctl stp br0 on/off"

You are probably looking at this from the wrong direction. Yes, the
switch can do these things. But the real question is, why would the
network stack want to do this? As i've said before, you are
accelerating the network stack by offloading things to the hardware.

Does the software bridge support FDB with a blocked flag? I don't
think it does. So you first need to extend the software bridge with
this concept. Then you can offload it to the hardware to accelerate
it.

Does the network stack need for forward specific multicast MAC
addresses between bridge ports independent of the state? If there is
no need for it, you don't need to accelerate it.

   Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-06-02 10:34             ` Ilias Apalodimas
  2018-06-02 16:10               ` Florian Fainelli
@ 2018-06-05 21:03               ` Grygorii Strashko
  2018-06-05 21:37                 ` Florian Fainelli
  1 sibling, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 21:03 UTC (permalink / raw)
  To: Ilias Apalodimas, Florian Fainelli
  Cc: Andrew Lunn, netdev, ivan.khoronzhuk, nsekhar, jiri, ivecera,
	francois.ozog, yogeshs, spatton



On 06/02/2018 05:34 AM, Ilias Apalodimas wrote:
> Hi Florian,
> 
> Thanks for taking time to look into this
> 
> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>>
>>
>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>>>> in different ways is not something you should describe in DT.
>>>>>>
>>>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>>>> see is pre-existing code, so i am not sure if i should change it in this
>>>>> patchset.
>>>>
>>>> If you break the code up into a library and two drivers, it becomes a
>>>> moot point.
>>> Agree
>>>
>>>>
>>>> But what i don't like here is that the device tree says to do dual
>>>> mac. But you ignore that and do sometime else. I would prefer that if
>>>> DT says dual mac, and switchdev is compiled in, the probe fails with
>>>> EINVAL. Rather than ignore something, make it clear it is invalid.
>>> The switch has 3 modes of operation as is.
>>> 1. switch mode, to enable that you don't need to add anything on
>>> the DTS and linux registers a single netdev interface.
>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
>>> 3. switchdev mode which is controlled by a .config option, since as you
>>> pointed out DTS was not made for controlling config options.
>>>
>>> I agree that this is far from beautiful. If the driver remains as in though,
>>> i'd prefer either keeping what's there or making "switchdev" a DTS option,
>>> following the pre-existing erroneous usage rather than making the device
>>> unusable.  If we end up returning some error and refuse to initialize, users
>>> that remote upgrade their equipment, without taking a good look at changelog,
>>> will loose access to their devices with no means of remotely fixing that.
>>
>> It seems to me that the mistake here is seeing multiple modes of
>> operations for the cpsw. There are not actually many, there is one
>> usage, and then there is what you can and cannot offload.

> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
> called ALE in the current driver) by-pass(which is used in dual emac for
> example) and other features. Again Grygorii is better suited to answer the
> exact differences.

dual_mac is HW enabled mode of operation, so having DT option is pretty
reasonable as for me. 
1) when enabled it configures internal FIFOs in special way so both
external Ports became equal in the direction toward to Host port 0.

TRM "The intention of this mode is to allow packets from both ethernet ports
to be written into the FIFO without one port starving the other port."

2) ALE, out of the box, does not support this mode and, as result, two
default vlan have to be created to direct traffic P1->P0 (vlan1) and
P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed
(and will bypass ALE). This way traffic separated on cpsw egress towards to P0, 

>> The basic
>> premise with switchdev and DSA (which uses switchdev) is that each
>> user-facing port of your switch needs to work as if it were a normal
>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
>> create a bridge and you enslave those ports into the bridge, you need to
>> have forwarding done in hardware between these two ports when the
>> SMAC/DMAC are not for the host/CPU/management interface and you must
>> simultaneously still have the host have the ability to send/receive
>> traffic through the bridge device.

TRM
"When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0
and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging
between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address."

So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be 
completely independent without any packet leaking between interfaces.

!! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering
- only registered vlans
- only registered mcast/bcast
- ingress mcast/bcast rate limiting (it's actually more like coalescing -
 limits number of mcast/bcast packets per sec. 

And all offloading ALE (val/mdb) entries should always contain two ports in masks:
p1&p0 or p2&p0. Never ever all three ports. 

> Yes dual emac does that. But dual emac configures the port facing VLAN to the
> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
> That's exactly what the current RFC does as well, with the addition of
> registering a sw0p0 (i'll explain why later on this mail)
> A little more detail on the issue we are having. On my description
> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
> that have PHYs attached.
> 
> When we start in the new switchdev mode all interfaces are added to VLAN 0
> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
> and sw0p2 are working as you describe. So those 2 interfaces can send/receive
> traffic normally which matches the switchdev case.
> 
> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
> is now configured on sw0p1 and sw0p2 but *not* on the CPU port.
>  From this point on the whole fuunctionality just collapses. The switch will
> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to
> get an ip address (since VLAN1 is not a member of the CPU port and the packet
> gets dropped).
> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger
> switchdev and configure the MDBs on the ports.  i am prety sure there are other
> fail scenarios which i haven't discovered already, but those 2 are the most
> basic ones.  If we add VLAN1 on the CPU port, everything works as intended of
> course.
> 
> That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
> traffic, but you can configure the CPU port independantly and not be forced to
> do an OR on every VLAN add/delete grouping the CPU port with your port command.
> The TL;DR version of this is that the switch is working exactly as switchdev is
> expecting offloading traffic to the hardware when possible as long as the CPU
> port is member of the proper VLANs
> 
> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
> when to add the CPU port or not.
> 
> There are still a couple of cases that are not covered though, if we don't
> register the CPU port. We cant decide when to forward multicast
> traffic on the CPU port if a join hasn't been sent from br0.
> So let's say you got 2 hosts doing multicast and for whatever reason the host
> wants to see that traffic.
> With the CPU port present you can do a
> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
> the traffic to the CPU port and thus the host. If this goes away we are forced
> to send a join.
>   
>> It seems to me like this is entirely doable given that the dual MAC use
>> case is supported already.
>>
>> switchdev is just a stateless framework to get notified from the
>> networking stack about what you can possibly offload in hardware, so
>> having a DTS option gate that is unfortunately wrong because it is
>> really implementing a SW policy in DTS which is not what it is meant for.
> The DTS option for configuration pre-existed, i don't like that either switchdev
> mode is activated by a .config option not DTS(it just overrides whatever config
> you have on the DTS). Far from pretty though fair point, i am open to
> suggestions.

Again this is option describing HW mode which not expected to be changed on the fly.
I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - 
right now I don't see how dual_mac can be supported with switchdev as per above.
The same way as I do not see how we can re-use switchdev with 50% of devices which 
have "only one" user-facing external port (P1 or P2).


-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-03  0:08             ` Andrew Lunn
@ 2018-06-05 21:18               ` Grygorii Strashko
  2018-06-05 21:28                 ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 21:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton



On 06/02/2018 07:08 PM, Andrew Lunn wrote:
> On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:
> 
> Hi Grygorii
> 
> I'm just picking out one thing here... there is lots more good stuff here.
> 
>> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2
>> can timestamp packets.
>   
> This should not be a problem. The Marvell switches have one PHC, but
> each port can time stamp packets using this counter. Each port has its
> own receive and transmit time stamp registers. So i don't think this
> will cause you problems.

I hope you are right - question is always in number of available options
and which one to select - and, most important, explain it to the end user :( 


For example:
phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
so which intf should return phc_index?

Still not tested, so jut hope ...


-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 21:18               ` Grygorii Strashko
@ 2018-06-05 21:28                 ` Andrew Lunn
  2018-06-05 21:42                   ` Grygorii Strashko
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 21:28 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> I hope you are right - question is always in number of available options
> and which one to select - and, most important, explain it to the end user :( 

The end customer being ptp4linux? At least for Marvell switches, it is
happy about everything except that the switch is a bit slow, so we
need to modify some of the time outs in the configuration file.

> For example:
> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
> so which intf should return phc_index?

It is not a 1:1 relationship. See:

https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61

All interfaces return the same index.

In fact, for a switch, having a PHC per port would be odd. That would
mean you need to sync the PHCs in order to act as a boundary clock.

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-03  0:37             ` Andrew Lunn
@ 2018-06-05 21:31               ` Grygorii Strashko
  2018-06-05 21:37                 ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 21:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton



On 06/02/2018 07:37 PM, Andrew Lunn wrote:
>> 1) boot, ping no vlan
>>
>> # ip link add name br0 type bridge
>> # echo 0 > /sys/class/net/br0/bridge/default_pvid
>> # ip link set dev eth2 master br0
>> # ip link set dev eth0 master br0
>> # ip link set dev eth1 master br0
>> # ifconfig br0 192.168.1.2
>>
>> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds
>> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
>> +  CPSW specific - it can't untag packets for P0.
>> Another option I've found:
>> # ip link set dev br0 type bridge vlan_filtering 1.
>> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0
> 
> There are three different configurations here you need to worry about,
> with respect to vlans:
> 
> # CONFIG_VLAN_8021Q is not set
> 
> So you don't have any vlan support in the kernel.
> 
> CONFIG_VLAN_8021Q=y, vlan_filtering = 0
> 
> So you have vlans, but filtering is off
> 
> CONFIG_VLAN_8021Q=y, vlan_filtering = 1
> 
> So you have vlans, and filtering is on.
> 
> Even with vlan_filtering off, the bridge still does a little with
> vlans.
> 
> And you need all three to work correctly.
> 

Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge
when vlan_filtering == 0?

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/4] cpsw: add switchdev support
  2018-06-05 21:03               ` Grygorii Strashko
@ 2018-06-05 21:37                 ` Florian Fainelli
  0 siblings, 0 replies; 53+ messages in thread
From: Florian Fainelli @ 2018-06-05 21:37 UTC (permalink / raw)
  To: Grygorii Strashko, Ilias Apalodimas
  Cc: Andrew Lunn, netdev, ivan.khoronzhuk, nsekhar, jiri, ivecera,
	francois.ozog, yogeshs, spatton

On 06/05/2018 02:03 PM, Grygorii Strashko wrote:
> 
> 
> On 06/02/2018 05:34 AM, Ilias Apalodimas wrote:
>> Hi Florian,
>>
>> Thanks for taking time to look into this
>>
>> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>>>
>>>
>>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>>>>> in different ways is not something you should describe in DT.
>>>>>>>
>>>>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>>>>> see is pre-existing code, so i am not sure if i should change it in this
>>>>>> patchset.
>>>>>
>>>>> If you break the code up into a library and two drivers, it becomes a
>>>>> moot point.
>>>> Agree
>>>>
>>>>>
>>>>> But what i don't like here is that the device tree says to do dual
>>>>> mac. But you ignore that and do sometime else. I would prefer that if
>>>>> DT says dual mac, and switchdev is compiled in, the probe fails with
>>>>> EINVAL. Rather than ignore something, make it clear it is invalid.
>>>> The switch has 3 modes of operation as is.
>>>> 1. switch mode, to enable that you don't need to add anything on
>>>> the DTS and linux registers a single netdev interface.
>>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
>>>> 3. switchdev mode which is controlled by a .config option, since as you
>>>> pointed out DTS was not made for controlling config options.
>>>>
>>>> I agree that this is far from beautiful. If the driver remains as in though,
>>>> i'd prefer either keeping what's there or making "switchdev" a DTS option,
>>>> following the pre-existing erroneous usage rather than making the device
>>>> unusable.  If we end up returning some error and refuse to initialize, users
>>>> that remote upgrade their equipment, without taking a good look at changelog,
>>>> will loose access to their devices with no means of remotely fixing that.
>>>
>>> It seems to me that the mistake here is seeing multiple modes of
>>> operations for the cpsw. There are not actually many, there is one
>>> usage, and then there is what you can and cannot offload.
> 
>> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
>> called ALE in the current driver) by-pass(which is used in dual emac for
>> example) and other features. Again Grygorii is better suited to answer the
>> exact differences.
> 
> dual_mac is HW enabled mode of operation, so having DT option is pretty
> reasonable as for me. 
> 1) when enabled it configures internal FIFOs in special way so both
> external Ports became equal in the direction toward to Host port 0.
> 
> TRM "The intention of this mode is to allow packets from both ethernet ports
> to be written into the FIFO without one port starving the other port."
> 
> 2) ALE, out of the box, does not support this mode and, as result, two
> default vlan have to be created to direct traffic P1->P0 (vlan1) and
> P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed
> (and will bypass ALE). This way traffic separated on cpsw egress towards to P0, 
> 
>>> The basic
>>> premise with switchdev and DSA (which uses switchdev) is that each
>>> user-facing port of your switch needs to work as if it were a normal
>>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
>>> create a bridge and you enslave those ports into the bridge, you need to
>>> have forwarding done in hardware between these two ports when the
>>> SMAC/DMAC are not for the host/CPU/management interface and you must
>>> simultaneously still have the host have the ability to send/receive
>>> traffic through the bridge device.
> 
> TRM
> "When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0
> and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging
> between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address."
> 
> So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be 
> completely independent without any packet leaking between interfaces.
> 
> !! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering
> - only registered vlans
> - only registered mcast/bcast
> - ingress mcast/bcast rate limiting (it's actually more like coalescing -
>  limits number of mcast/bcast packets per sec. 
> 
> And all offloading ALE (val/mdb) entries should always contain two ports in masks:
> p1&p0 or p2&p0. Never ever all three ports. 
> 
>> Yes dual emac does that. But dual emac configures the port facing VLAN to the
>> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
>> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
>> That's exactly what the current RFC does as well, with the addition of
>> registering a sw0p0 (i'll explain why later on this mail)
>> A little more detail on the issue we are having. On my description
>> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
>> that have PHYs attached.
>>
>> When we start in the new switchdev mode all interfaces are added to VLAN 0
>> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
>> and sw0p2 are working as you describe. So those 2 interfaces can send/receive
>> traffic normally which matches the switchdev case.
>>
>> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
>> is now configured on sw0p1 and sw0p2 but *not* on the CPU port.
>>  From this point on the whole fuunctionality just collapses. The switch will
>> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to
>> get an ip address (since VLAN1 is not a member of the CPU port and the packet
>> gets dropped).
>> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger
>> switchdev and configure the MDBs on the ports.  i am prety sure there are other
>> fail scenarios which i haven't discovered already, but those 2 are the most
>> basic ones.  If we add VLAN1 on the CPU port, everything works as intended of
>> course.
>>
>> That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
>> traffic, but you can configure the CPU port independantly and not be forced to
>> do an OR on every VLAN add/delete grouping the CPU port with your port command.
>> The TL;DR version of this is that the switch is working exactly as switchdev is
>> expecting offloading traffic to the hardware when possible as long as the CPU
>> port is member of the proper VLANs
>>
>> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
>> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
>> when to add the CPU port or not.
>>
>> There are still a couple of cases that are not covered though, if we don't
>> register the CPU port. We cant decide when to forward multicast
>> traffic on the CPU port if a join hasn't been sent from br0.
>> So let's say you got 2 hosts doing multicast and for whatever reason the host
>> wants to see that traffic.
>> With the CPU port present you can do a
>> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
>> the traffic to the CPU port and thus the host. If this goes away we are forced
>> to send a join.
>>   
>>> It seems to me like this is entirely doable given that the dual MAC use
>>> case is supported already.
>>>
>>> switchdev is just a stateless framework to get notified from the
>>> networking stack about what you can possibly offload in hardware, so
>>> having a DTS option gate that is unfortunately wrong because it is
>>> really implementing a SW policy in DTS which is not what it is meant for.
>> The DTS option for configuration pre-existed, i don't like that either switchdev
>> mode is activated by a .config option not DTS(it just overrides whatever config
>> you have on the DTS). Far from pretty though fair point, i am open to
>> suggestions.
> 
> Again this is option describing HW mode which not expected to be changed on the fly.
> I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - 
> right now I don't see how dual_mac can be supported with switchdev as per above.
> The same way as I do not see how we can re-use switchdev with 50% of devices which 
> have "only one" user-facing external port (P1 or P2).

This is still not appropriate for Device Tree, because this is
completely orthogonal from one another. Also, I think you tend to
conflate what the switch can do, with in which mode does it start by
default, which are, again, two different things.
-- 
Florian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 21:31               ` Grygorii Strashko
@ 2018-06-05 21:37                 ` Andrew Lunn
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 21:37 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge
> when vlan_filtering == 0?

I have no idea!

I just made sure the Marvell driver works with this.

You might want to do a git blame and find out who added it, and it
might say why.

      Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 21:28                 ` Andrew Lunn
@ 2018-06-05 21:42                   ` Grygorii Strashko
  2018-06-05 21:55                     ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 21:42 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton



On 06/05/2018 04:28 PM, Andrew Lunn wrote:
>> I hope you are right - question is always in number of available options
>> and which one to select - and, most important, explain it to the end user :(
> 
> The end customer being ptp4linux? At least for Marvell switches, it is
> happy about everything except that the switch is a bit slow, so we
> need to modify some of the time outs in the configuration file.
> 
>> For example:
>> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
>> so which intf should return phc_index?
> 
> It is not a 1:1 relationship. See:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61
> 
> All interfaces return the same index.
> 
> In fact, for a switch, having a PHC per port would be odd. That would
> mean you need to sync the PHCs in order to act as a boundary clock.

PHC only one, but hw timestamping blocks are per port.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 21:42                   ` Grygorii Strashko
@ 2018-06-05 21:55                     ` Andrew Lunn
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 21:55 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> PHC only one, but hw timestamping blocks are per port.

Yes, same as the Marvell. Per port, there are two receive time stamps
and one transmit time stamp.

    Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-03  0:49             ` Andrew Lunn
@ 2018-06-05 22:45               ` Grygorii Strashko
  2018-06-05 23:40                 ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 22:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

Hi Andrew,

Thanks a lot for you comments.

On 06/02/2018 07:49 PM, Andrew Lunn wrote:
> Hi Grygorii
> 
>> Don't know howto:
>> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST
>> address = blocked MAC
>> 2) add multicast MAC address with Supervisory Packet flag set.
>> Such packets will bypass most of checks inside ALE and will be forwarded in all port's
>> states except "disabled".
>> 3) add "unknown vlan configuration" : ALE provides possibility to configure
>> default behavior for tagged packets with "unknown vlan" by configuring
>> - Unknown VLAN Force Untagged Egress ports Mask.
>> - Unknown VLAN Registered Multicast Flood Ports Mask
>> - Unknown VLAN Multicast Flood ports Mask
>> - Unknown VLAN Member ports List
>> 4) The way to detect "brctl stp br0 on/off"
> 
> You are probably looking at this from the wrong direction. Yes, the
> switch can do these things. But the real question is, why would the
> network stack want to do this? As i've said before, you are
> accelerating the network stack by offloading things to the hardware.

Right. Thanks. 

> 
> Does the software bridge support FDB with a blocked flag? I don't
> think it does. So you first need to extend the software bridge with
> this concept. Then you can offload it to the hardware to accelerate
> it.

Sry, for possible misunderstanding: in "Don't know howto" i've listed
things I was not able to discover from code or documentation with hope
that expert opinion will help to confirm if this this a real/possible gap
or I/we've just missed smth. And if this is a real gap - get "green" or "red"
flag for future work (which need to be planned somehow). 

So, my understanding for (1) "blocked FDB entry support" is reasonable
extension for bridge/switchdev ("green").

> 
> Does the network stack need for forward specific multicast MAC
> addresses between bridge ports independent of the state? If there is
> no need for it, you don't need to accelerate it.

Assume this is about item 2 - this question is related to STP packets.
CPSW/ALE will drop STP packets if receiving port is in blocking/learning states 
unless corresponding  mcast entry exist in ALE entry with (Supervisory Packet) flag set
(Actually ALE mcast entry has two fields (TRM): 
Supervisory Packet (SUPER): When set, this field indicates that the packet
 with a matching multicast destination address is a supervisory packet.
Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port
on a destination address lookup in order for the multicast packet to be forwarded to
the transmit port(s). A transmit port must be in the Forwarding state in
order to forward the packet.)

Question 4 was asked with assumption that if (2) not supported and "red" flag
- then option (4) can be used as w/a (again if "green" flag) and STP mcast address
can be added in ALE on event "stp on".


** "unknown vlan configuration"

This is about following use case. Non static network configuration when
CPSW based device knows what traffic it can accept (Host port 0), but
it has no knowledge (or limited) about network segments attached to Port 1 and Port 2.

For example: Host 0 can accept only vlan 100 traffic coming from Port 1.
ALE entry: vid =100, port_mask 0x3

But there can be vlan 50 created in attached network segments.
Unknown VLAN Force Untagged Egress ports Mask = 0x0
Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2)
Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2)
Unknown VLAN Member ports List  = 0x6 (P1|P2)

with above configuration packets with "unknown vlan" (no ALE entry) will
still be forwarded between port 1 and 2, but not Port 0. 

So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev
if not exist yet? will any other hw known benefit from it?

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-02 14:08   ` Andrew Lunn
@ 2018-06-05 22:59     ` Grygorii Strashko
  2018-06-05 23:53       ` Andrew Lunn
  0 siblings, 1 reply; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 22:59 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, netdev, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton



On 06/02/2018 09:08 AM, Andrew Lunn wrote:
> On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote:
>> Hi Ilias,
> 
> 
>> Second, Thanks a lot for your great work. I'm still testing it with different
>>   use cases and trying to consolidate my reply for all questions.
>>
>> All, thanks for your comments.
> 
> Hi Grygorii
> 
> Something i've said to Ilias already. I would recommend you don't try
> to cover all your uses cases with the first version. Keep it simple
> and clean, don't do anything controversial and get it merged. Then add
> more features one by one. We can then discuss any odd ball features
> while being able to look at the complete system, driver, switchdev and
> the network stack.
> 

yes. It definitely no problem from my side, except basic customer use-cases 
simply not working without sw0p0, at least with current LKML :(

And I just have to look a little bit in the future as selected approach
expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)
 where we going to have more features, like TSN, EST and packet Policing and Classification.

And I very, very appreciated for your (and all others) time and comments.

Thank you.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-03  0:26             ` Andrew Lunn
@ 2018-06-05 23:23               ` Grygorii Strashko
  2018-06-05 23:49                 ` Andrew Lunn
  2018-06-06  8:23                 ` Ivan Khoronzhuk
  0 siblings, 2 replies; 53+ messages in thread
From: Grygorii Strashko @ 2018-06-05 23:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton



On 06/02/2018 07:26 PM, Andrew Lunn wrote:
>> *After this patch set*: goal keep things working the same as max as
>> possible and get rid of TI custom tool.
> 
> We are happy to keep things the same, if they fit with the switchdev
> model. Anything in your customer TI tool/model which does not fit the
> switchdev model you won't be able to keep, except if we agree to
> extend the model.

Right. That's the main goal of RFC to identify those gaps.

> 
> I can say now, sw0p0 is going to cause problems. I really do suggest
> you drop it for the moment in order to get a minimal driver
> accepted. sw0p0 does not fit the switchdev model.

Honestly, this is not the first patchset and we started without sw0p0,
but then.... (with current LKML)
- default vlan offloading breaks traffic reception to P0
 (Ilias saying it's fixed in next - good)
- adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)
- mcast - no way to manually add static record and include or exclude P0.


:( above are basic functionality required.

> 
>> Below I've described some tested use cases (not include full static configuration),
>> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
>> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
>> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
>> CPDMA TX channels shapers need to be configured, and in case
>> of sw0p1/sw0p2 internal FIFOS.
>> sw0p0 also expected to be used to configure CPDMA interface in general -
>> number of tx/rx channels, rates, ring sizes.
> 
> Can this be derives from the configuration on sw0p1 and sw0p2?
> sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx
> channels?

This not exactly what is required, sry I probably will need just repeat what Ivan 
described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow.

Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense
if we will not be able to fit Ivan's work in new CPSW driver model ;..(
and do AVB bridge.

> 
>> In addition there is set of global CPSW parameters (not related to P1/P2, like
>> MAC Authorization Mode, OUI Deny Mode, crc ) which I've
>> thought can be added to sw0p0 (using ethtool -priv-flags).
> 
> You should describe these features, and then we can figure out how
> best to model them. devlink might be an option if they are switch
> global.

Ok. that can be postponed.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 22:45               ` Grygorii Strashko
@ 2018-06-05 23:40                 ` Andrew Lunn
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 23:40 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

> So, my understanding for (1) "blocked FDB entry support" is reasonable
> extension for bridge/switchdev ("green").

You might have to justify it, but yes.

> > Does the network stack need for forward specific multicast MAC
> > addresses between bridge ports independent of the state? If there is
> > no need for it, you don't need to accelerate it.
> 
> Assume this is about item 2 - this question is related to STP packets.
> CPSW/ALE will drop STP packets if receiving port is in blocking/learning states 
> unless corresponding  mcast entry exist in ALE entry with (Supervisory Packet) flag set
> (Actually ALE mcast entry has two fields (TRM): 
> Supervisory Packet (SUPER): When set, this field indicates that the packet
>  with a matching multicast destination address is a supervisory packet.
> Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port
> on a destination address lookup in order for the multicast packet to be forwarded to
> the transmit port(s). A transmit port must be in the Forwarding state in
> order to forward the packet.)

So i this case, i would expect your driver to just add these entries
to the ALE. No need for configuration for above. Just do it as soon as
a port is made a member of a bridge. And remove it when the port
leaves the bridge.

DSA devices are smarter. They all have hardware which looks for BPDU
and forwards them to the host independent of the state of the port.
They also tend to have hardware looking for IGMP packets, and will
forward them to the host, even if the multicast address is not being
forwarded to the host.

> ** "unknown vlan configuration"
> 
> This is about following use case. Non static network configuration when
> CPSW based device knows what traffic it can accept (Host port 0), but
> it has no knowledge (or limited) about network segments attached to Port 1 and Port 2.
> 
> For example: Host 0 can accept only vlan 100 traffic coming from Port 1.
> ALE entry: vid =100, port_mask 0x3
> 
> But there can be vlan 50 created in attached network segments.
> Unknown VLAN Force Untagged Egress ports Mask = 0x0
> Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2)
> Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2)
> Unknown VLAN Member ports List  = 0x6 (P1|P2)
> 
> with above configuration packets with "unknown vlan" (no ALE entry) will
> still be forwarded between port 1 and 2, but not Port 0. 
> 
> So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev
> if not exist yet? will any other hw known benefit from it?

You need to think about the case of two e1000e. How do you configure
this setup to do what you want? It probably is already possible.

     Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 23:23               ` Grygorii Strashko
@ 2018-06-05 23:49                 ` Andrew Lunn
  2018-06-06  8:23                 ` Ivan Khoronzhuk
  1 sibling, 0 replies; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 23:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton

On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote:
> 
> 
> On 06/02/2018 07:26 PM, Andrew Lunn wrote:
> >> *After this patch set*: goal keep things working the same as max as
> >> possible and get rid of TI custom tool.
> > 
> > We are happy to keep things the same, if they fit with the switchdev
> > model. Anything in your customer TI tool/model which does not fit the
> > switchdev model you won't be able to keep, except if we agree to
> > extend the model.
> 
> Right. That's the main goal of RFC to identify those gaps.
> 
> > 
> > I can say now, sw0p0 is going to cause problems. I really do suggest
> > you drop it for the moment in order to get a minimal driver
> > accepted. sw0p0 does not fit the switchdev model.
> 
> Honestly, this is not the first patchset and we started without sw0p0,
> but then.... (with current LKML)
> - default vlan offloading breaks traffic reception to P0
>  (Ilias saying it's fixed in next - good)
> - adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)
> - mcast - no way to manually add static record and include or exclude P0.
> 
> 
> :( above are basic functionality required.

For a DSA driver, this is way more than basic. A basic DSA driver just
provides interfaces, and does everything in software. No offload at
all. Generally, FDB offload is next, then MDB, and then VLAN, each as
separate patch sets.

> Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense
> if we will not be able to fit Ivan's work in new CPSW driver model ;..(
> and do AVB bridge.

AVB bridge should fit the switchdev model. You can offload TC via
switchdev e.g. the b53 has mirred, mellonex has flower and a lot more.

      Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 22:59     ` Grygorii Strashko
@ 2018-06-05 23:53       ` Andrew Lunn
  2018-06-06  6:42         ` Ilias Apalodimas
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Lunn @ 2018-06-05 23:53 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, netdev, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

> And I just have to look a little bit in the future as selected approach
> expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)
> where we going to have more features, like TSN, EST and packet Policing and Classification.

You should probably took a closer look at the Mellonex driver. It has
a lot of TC offload, which is what policing and classification is.

TSN is being worked on in general, and i think the i210 is taking the
lead. So you probably want to keep an eye on that, and join the
discussion.

	Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 23:53       ` Andrew Lunn
@ 2018-06-06  6:42         ` Ilias Apalodimas
  0 siblings, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2018-06-06  6:42 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Grygorii Strashko, netdev, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton

Hi Andrew,

>On Wed, Jun 06, 2018 at 01:53:56AM +0200, Andrew Lunn wrote:
> > And I just have to look a little bit in the future as selected approach
> > expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)
> > where we going to have more features, like TSN, EST and packet Policing and Classification.
> 
> You should probably took a closer look at the Mellonex driver. It has
> a lot of TC offload, which is what policing and classification is.
> 
I did take a close look to both Mellanox and rocker drivers before issuing this
RFC and we seem to be close on the approach. What Grygorii is reffering to, is
that for CBS to work properly on CPSW there *must* be a way to configure the
CPU port individually.

> TSN is being worked on in general, and i think the i210 is taking the
> lead. So you probably want to keep an eye on that, and join the
> discussion.
> 
TSN is not just "one thing". It's a few IEEE standards bundled up to provide the
needed functionality. i210 is only implementing CBS at the moment and there's
an RFC out there to support what they call "Time based scheduling".
I am already having discussions with Jesus on their current work.

The idea behind using switchdev is that due to it's design, it's a very
convenient way of utilizing netlink and iproute2/tc to configure any kind of
future shapers. Since net_device_ops now has a callback to configure hardware
schedulers being able to expose netdevs as hardware ports and configure them
individually is great. As you can understand you'll end up with TSN capable
switches and NICs being configured with the same commands from a userspace point
of view. I am not sure this is the proper way to go, but at least for me, 
sounds like a viable solution.

Thanks
Ilias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/4] RFC CPSW switchdev mode
  2018-06-05 23:23               ` Grygorii Strashko
  2018-06-05 23:49                 ` Andrew Lunn
@ 2018-06-06  8:23                 ` Ivan Khoronzhuk
  1 sibling, 0 replies; 53+ messages in thread
From: Ivan Khoronzhuk @ 2018-06-06  8:23 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Andrew Lunn, Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	nsekhar, francois.ozog, yogeshs, spatton

On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote:
>
>
>On 06/02/2018 07:26 PM, Andrew Lunn wrote:
>>> *After this patch set*: goal keep things working the same as max as
>>> possible and get rid of TI custom tool.
>>
>> We are happy to keep things the same, if they fit with the switchdev
>> model. Anything in your customer TI tool/model which does not fit the
>> switchdev model you won't be able to keep, except if we agree to
>> extend the model.
>
>Right. That's the main goal of RFC to identify those gaps.
>
>>
>> I can say now, sw0p0 is going to cause problems. I really do suggest
>> you drop it for the moment in order to get a minimal driver
>> accepted. sw0p0 does not fit the switchdev model.
>
>Honestly, this is not the first patchset and we started without sw0p0,
>but then.... (with current LKML)
>- default vlan offloading breaks traffic reception to P0
> (Ilias saying it's fixed in next - good)
>- adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)
>- mcast - no way to manually add static record and include or exclude P0.
>
>
>:( above are basic functionality required.
>
>>
>>> Below I've described some tested use cases (not include full static configuration),
>>> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
>>> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
>>> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
>>> CPDMA TX channels shapers need to be configured, and in case
>>> of sw0p1/sw0p2 internal FIFOS.
>>> sw0p0 also expected to be used to configure CPDMA interface in general -
>>> number of tx/rx channels, rates, ring sizes.
>>
>> Can this be derives from the configuration on sw0p1 and sw0p2?
>> sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx
>> channels?
>
>This not exactly what is required, sry I probably will need just repeat what Ivan
>described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow.
>
>Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense
>if we will not be able to fit Ivan's work in new CPSW driver model ;..(
>and do AVB bridge.

Not sure I tracked everything, but current link example only for dual-emac mode,
where i guess no switchdev and thus we have only 2 ports having phys and no need
in some common configuration. Only common resources tx queues that are easily
shared between two ports. In case of switchdev the same, no need in port 0, at
least for cpsw implementation (not sure about others), tx queues cpdma shapers
are configured separately and can be done with any netdev presenting ports,
thus p0 can be avoided in this case also. For instance, I've created short
description, based on current switchdev RFC, showing simple configuration
commands for shapers configuration from host side and CBS for every port,
w/o p0 usage:
https://git.linaro.org/people/ivan.khoronzhuk/tsn_conf.git/tree/README_switchdev

Will add it once switchdev decision is stabilized and applied. Basically nothing
changed with dual-emac configuration except different rates set for cpdma
shapers from host side.

Probably, p0 is needed for other configurations, or for compatibility reasons
with old versions, but definitely should be created list of all cases, and on my
opinion, any common resource for both ports can be configured with any netdev
presenting ports if it doesn't conflict with ports configuration itself.

>
>>
>>> In addition there is set of global CPSW parameters (not related to P1/P2, like
>>> MAC Authorization Mode, OUI Deny Mode, crc ) which I've
>>> thought can be added to sw0p0 (using ethtool -priv-flags).
>>
>> You should describe these features, and then we can figure out how
>> best to model them. devlink might be an option if they are switch
>> global.
>
>Ok. that can be postponed.
>
>-- 
>regards,
>-grygorii

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2018-06-06  8:23 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24  6:56 [PATCH 0/4] RFC CPSW switchdev mode Ilias Apalodimas
2018-05-24  6:56 ` [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h Ilias Apalodimas
2018-05-24  6:56 ` [PATCH 2/4] cpsw_ale: add support functions for switchdev Ilias Apalodimas
2018-05-24  6:56 ` [PATCH 3/4] cpsw_switchdev: add switchdev support files Ilias Apalodimas
2018-05-24 10:00   ` Maxim Uvarov
2018-05-27  4:39   ` kbuild test robot
2018-05-24  6:56 ` [PATCH 4/4] cpsw: add switchdev support Ilias Apalodimas
2018-05-24 13:12   ` Andrew Lunn
2018-05-24 13:32     ` Ilias Apalodimas
2018-05-24 16:39       ` Andrew Lunn
2018-05-25  4:56         ` Ilias Apalodimas
2018-06-01 21:48           ` Florian Fainelli
2018-06-02 10:34             ` Ilias Apalodimas
2018-06-02 16:10               ` Florian Fainelli
2018-06-02 16:52                 ` Ilias Apalodimas
2018-06-05 21:03               ` Grygorii Strashko
2018-06-05 21:37                 ` Florian Fainelli
2018-05-24  8:05 ` [PATCH 0/4] RFC CPSW switchdev mode Jiri Pirko
2018-05-24  8:48   ` Ilias Apalodimas
2018-05-24 12:54     ` Andrew Lunn
2018-05-24 13:44       ` Ivan Vecera
2018-05-24 14:08         ` Ilias Apalodimas
2018-05-24 14:54           ` Andrew Lunn
2018-05-24 15:07             ` Ilias Apalodimas
2018-05-24 15:25               ` Andrew Lunn
2018-05-24 16:02                 ` Ilias Apalodimas
2018-05-24 16:33                   ` Andrew Lunn
2018-05-25  6:29                     ` Ilias Apalodimas
2018-05-25 10:28                       ` Ilias Apalodimas
2018-05-25 11:59                         ` Andrew Lunn
2018-05-25 12:09                       ` Andrew Lunn
2018-05-31 15:27                         ` Ilias Apalodimas
2018-06-02 23:28           ` Grygorii Strashko
2018-06-03  0:08             ` Andrew Lunn
2018-06-05 21:18               ` Grygorii Strashko
2018-06-05 21:28                 ` Andrew Lunn
2018-06-05 21:42                   ` Grygorii Strashko
2018-06-05 21:55                     ` Andrew Lunn
2018-06-03  0:26             ` Andrew Lunn
2018-06-05 23:23               ` Grygorii Strashko
2018-06-05 23:49                 ` Andrew Lunn
2018-06-06  8:23                 ` Ivan Khoronzhuk
2018-06-03  0:37             ` Andrew Lunn
2018-06-05 21:31               ` Grygorii Strashko
2018-06-05 21:37                 ` Andrew Lunn
2018-06-03  0:49             ` Andrew Lunn
2018-06-05 22:45               ` Grygorii Strashko
2018-06-05 23:40                 ` Andrew Lunn
2018-06-01 21:29 ` Grygorii Strashko
2018-06-02 14:08   ` Andrew Lunn
2018-06-05 22:59     ` Grygorii Strashko
2018-06-05 23:53       ` Andrew Lunn
2018-06-06  6:42         ` Ilias Apalodimas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).